Project Description:
The Movie Recommendation System is a machine learning project that recommends similar movies based on a selected movie. Using content-based filtering, the system analyzes movie metadata such as genres, cast, director, plot keywords, and overview to find similar films.
With NLP techniques and cosine similarity, the system ranks movies by how closely their metadata matches that of the selected movie. A Streamlit web interface lets users select a movie and view the top 5 recommended movies along with their posters, fetched from the OMDb API.
Key Features:
- Data Preprocessing:
- Remove duplicates and handle missing values
- Clean movie metadata including genres, overview, plot keywords, director, and cast
- Combine metadata into a single text column for analysis
- Feature Engineering:
- Apply text preprocessing: lowercase conversion, space removal, and stemming with PorterStemmer
- Convert text into numerical vectors using CountVectorizer
- Recommendation Engine:
- Compute cosine similarity between movie vectors
- Recommend top 5 movies similar to the selected movie
- Display recommended movies with poster images
- Web Interface:
- Built with Streamlit
- User-friendly dropdown to select a movie
- Dynamic display of recommended movies and posters
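The feature engineering and recommendation steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the four toy movies and their tag strings are made up, and the recommend function and its top_n parameter are hypothetical names. It uses PorterStemmer from nltk and CountVectorizer and cosine_similarity from scikit-learn, which are the libraries the project names.

```python
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the real dataset: title -> combined tags text (made-up values).
movies = {
    "Space Quest": "scifi space adventure astronaut rescue",
    "Star Voyage": "scifi space adventure alien journey",
    "Love in Paris": "romance paris love couple wedding",
    "Galaxy Rescue": "scifi astronaut rescue space mission",
}
titles = list(movies)

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase the text and stem every whitespace-separated token."""
    return " ".join(stemmer.stem(word) for word in text.lower().split())


tags = [preprocess(movies[title]) for title in titles]

# Bag-of-words counts: one row per movie, one column per distinct token.
vectors = CountVectorizer().fit_transform(tags)

# Square matrix where entry [i, j] is the cosine similarity of movies i and j.
similarity = cosine_similarity(vectors)


def recommend(title, top_n=5):
    """Return up to top_n titles most similar to the given one, best first."""
    idx = titles.index(title)
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    # Skip the selected movie itself, which always has similarity 1.0.
    return [titles[i] for i, _ in ranked if i != idx][:top_n]


print(recommend("Space Quest", top_n=2))
```

With these toy tags, "Galaxy Rescue" shares four of five tokens with "Space Quest" and "Star Voyage" shares three, so they rank first and second. On the full dataset the similarity matrix grows quadratically with the number of movies, so it may be worth computing it once and caching it.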
Technology / Tools:
- Programming Language: Python
- Libraries: pandas, numpy, scikit-learn, nltk, streamlit, requests
- Tools: Jupyter Notebook, Streamlit, OMDb API
Project Workflow:
- Load the dataset (25k IMDb movie Dataset.csv) and clean it
- Remove duplicates and handle missing values
- Process metadata fields: genres, cast, director, plot keywords, and overview
- Combine all text fields into a single tags column
- Apply text preprocessing: lowercasing, space removal, and stemming
- Convert tags into numerical vectors using CountVectorizer
- Compute cosine similarity between all movies
- Build a recommendation function to return top 5 similar movies
- Fetch movie posters using OMDb API
- Develop a Streamlit web interface to display recommendations
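The poster-fetching step in this workflow might look something like the sketch below. It assumes the OMDb API accepts a title in the t query parameter and a key in apikey, and that the JSON reply carries Response and Poster fields; these details should be checked against the OMDb documentation. The function names extract_poster_url and fetch_poster are hypothetical, and a real API key has to be supplied by the user.

```python
import requests

OMDB_URL = "https://www.omdbapi.com/"


def extract_poster_url(payload):
    """Return the poster URL from an OMDb-style JSON payload, or None if unavailable."""
    if payload.get("Response") != "True":
        return None
    poster = payload.get("Poster")
    # Assumption: OMDb reports a missing poster as the string "N/A".
    return poster if poster and poster != "N/A" else None


def fetch_poster(title, api_key, timeout=10):
    """Look up a movie by title and return its poster URL, or None if not found."""
    response = requests.get(
        OMDB_URL,
        params={"t": title, "apikey": api_key},
        timeout=timeout,
    )
    response.raise_for_status()
    return extract_poster_url(response.json())
```

Keeping the parsing in its own function means the missing-poster handling can be checked without a network call, and the interface can fall back to a placeholder image whenever None comes back.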
Dataset:
- Source: 25k IMDb Movie Dataset.csv
- Columns Used:
- movie title: Movie name
- Generes: Movie genres
- Overview: Short description of the movie
- Plot Keyword: Plot-related keywords
- Director: Director names
- Top 5 Casts: Main cast of the movie
- Preprocessing:
- Dropped unnecessary columns (path, rating, runtime, writer, year)
- Handled missing values and duplicates
- Converted list-like fields into lists using ast.literal_eval
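The preprocessing steps above could be sketched with pandas and ast.literal_eval as follows. The sample rows are invented, the dropped column name year is assumed to match the real file, and Director is assumed to be a plain string rather than a list. Dropping rows with missing values is only one way to handle them, since the description does not say which approach was used. The helpers parse_list, squash, and build_tags are hypothetical names.

```python
import ast

import pandas as pd

# Invented sample rows that mimic the columns listed above.
raw = pd.DataFrame({
    "movie title": ["Space Quest", "Space Quest", "Love in Paris", "Lost Reel"],
    "Generes": ["['Sci-Fi', 'Adventure']", "['Sci-Fi', 'Adventure']", "['Romance']", "['Drama']"],
    "Overview": ["An astronaut is stranded.", "An astronaut is stranded.", "Two strangers meet.", None],
    "Plot Keyword": ["['space rescue']", "['space rescue']", "['paris']", "['film']"],
    "Director": ["Jane Doe", "Jane Doe", "John Roe", "Sam Poe"],
    "Top 5 Casts": ["['Ann Lee', 'Bo Kim']", "['Ann Lee', 'Bo Kim']", "['Cy Day']", "['Di Fox']"],
    "year": [2001, 2001, 1999, 1985],
})


def parse_list(value):
    """Parse a stringified Python list; return an empty list on bad input."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return []
    return parsed if isinstance(parsed, list) else []


def squash(names):
    """Remove inner spaces so 'Ann Lee' becomes the single token 'AnnLee'."""
    return [name.replace(" ", "") for name in names]


# Drop an unused column, then exact duplicate rows, then rows with missing values.
df = raw.drop(columns=["year"]).drop_duplicates().dropna().reset_index(drop=True)

for col in ["Generes", "Plot Keyword", "Top 5 Casts"]:
    df[col] = df[col].apply(parse_list).apply(squash)


def build_tags(row):
    """Join the overview words and all metadata tokens into one lowercase string."""
    parts = (
        row["Overview"].split()
        + row["Generes"]
        + row["Plot Keyword"]
        + row["Top 5 Casts"]
        + [row["Director"].replace(" ", "")]
    )
    return " ".join(parts).lower()


df["tags"] = df.apply(build_tags, axis=1)
print(df[["movie title", "tags"]])
```

Squashing the spaces keeps multi-word names such as cast members together as one token, so two different people who share a first name do not count as a match when the tags are vectorized.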
Download Source Code:
Project Setup Instructions:
⚠️ Note: This project is for educational purposes only. Not for commercial sale.