Movie Recommendation System with Python & Streamlit

Project Overview

The Movie Recommendation System is a machine learning project that recommends movies similar to one the user selects. It uses content-based filtering: the system compares movie metadata (genres, cast, director, plot keywords, and overview) to find films with overlapping content.

The metadata text is converted into word-count vectors, and movies are ranked by the cosine similarity between those vectors. A Streamlit web interface lets users select a movie and view the top 5 recommended movies along with their posters, which are fetched from the OMDb API.
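To make the similarity measure concrete, here is a minimal sketch that computes cosine similarity between two short tag strings from raw word counts. The function name and the sample strings are illustrative and not taken from the project code; the project lists scikit-learn among its libraries and presumably relies on it for this step.

```python
import math
from collections import Counter


def cosine_similarity_counts(text_a, text_b):
    """Cosine similarity between the word-count vectors of two strings."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product only needs the words that occur in both texts.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical tag strings: three of the four words are shared.
score = cosine_similarity_counts(
    "space alien adventure astronaut",
    "space alien astronaut voyage",
)
print(round(score, 2))
```

A score near 1 means the two movies share most of their tag words; a score of 0 means they share none.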

Key Features

Data Preprocessing:
- Remove duplicates and handle missing values
- Clean movie metadata including genres, overview, plot keywords, director, and cast
- Combine metadata into a single text column for analysis
Feature Engineering:
- Apply text preprocessing: convert to lowercase, remove spaces, and apply stemming with PorterStemmer
- Convert text into numerical vectors using CountVectorizer
Recommendation Engine:
- Compute cosine similarity between movie vectors
- Recommend top 5 movies similar to the selected movie
- Display recommended movies with poster images
Web Interface:
- Built with Streamlit
- User-friendly dropdown to select a movie
- Dynamic display of recommended movies and posters
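The feature engineering and recommendation steps listed above can be sketched as follows. This is a simplified illustration rather than the project's actual code: the titles and tag strings are made up, the helper names `build_similarity` and `recommend` are hypothetical, the use of English stop words is an assumption, and stemming is left out to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue: each title has an already-cleaned tags string.
titles = [
    "Space Quest", "Star Voyage", "Galaxy War", "Paris Romance",
    "Love Letters", "Robot Dawn", "Quiet Farm",
]
tags = [
    "space alien astronaut rescue mission",
    "space alien astronaut voyage",
    "space war fleet",
    "romance paris wedding",
    "romance letters love",
    "robot future war",
    "farm family drama",
]


def build_similarity(tag_texts):
    """Vectorize tag strings as word counts and return the pairwise cosine matrix."""
    vectorizer = CountVectorizer(stop_words="english")  # stop-word removal is an assumption
    vectors = vectorizer.fit_transform(tag_texts)
    return cosine_similarity(vectors)


def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)
    ranked = sorted(range(len(titles)), key=lambda i: similarity[idx][i], reverse=True)
    return [titles[i] for i in ranked if i != idx][:top_n]


similarity = build_similarity(tags)
print(recommend("Space Quest", titles, similarity))
```

Computing the full similarity matrix once and reusing it for every lookup keeps each recommendation cheap, at the cost of memory that grows with the square of the number of movies.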

Technologies Used

Programming Language: Python
Libraries: pandas, numpy, scikit-learn, nltk, streamlit, requests
Tools: Jupyter Notebook, Streamlit, OMDb API

Project Workflow

Load the dataset (25k IMDb movie Dataset.csv)
Remove duplicates and handle missing values
Process metadata fields: genres, cast, director, plot keywords, and overview
Combine all text fields into a single tags column
Apply text preprocessing: lowercasing, space removal, and stemming
Convert tags into numerical vectors using CountVectorizer
Compute cosine similarity between all movies
Build a recommendation function to return top 5 similar movies
Fetch movie posters using OMDb API
Develop a Streamlit web interface to display recommendations
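The poster lookup in the workflow above might look like the sketch below. It assumes the OMDb title search (the `t` and `apikey` query parameters) and the `Poster` and `Response` fields of its JSON reply; the function name `fetch_poster` and the injectable `http_get` argument are hypothetical conveniences, the latter added so the function can be exercised without a network call. An API key has to be obtained from OMDb separately.

```python
import requests

OMDB_URL = "https://www.omdbapi.com/"


def fetch_poster(title, api_key, http_get=requests.get):
    """Return the poster URL OMDb reports for a title, or None if there is none."""
    response = http_get(OMDB_URL, params={"t": title, "apikey": api_key}, timeout=10)
    response.raise_for_status()
    data = response.json()
    # OMDb signals "not found" with Response == "False" rather than an HTTP error.
    if data.get("Response") != "True":
        return None
    poster = data.get("Poster")
    # Movies without artwork come back with the literal string "N/A".
    return poster if poster and poster != "N/A" else None
```

In the Streamlit app, a returned URL could be passed to st.image next to the movie title, while a None result is a cue to show a placeholder instead.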

Dataset

Source: 25k IMDb Movie Dataset.csv
Columns Used:
- movie title – Movie name
- Generes – Movie genres
- Overview – Short description of the movie
- Plot Keyword – Plot-related keywords
- Director – Director names
- Top 5 Casts – Main cast of the movie
Preprocessing:
- Dropped unnecessary columns (path, rating, runtime, writer, year)
- Handled missing values and duplicates
- Converted list-like fields into lists using ast.literal_eval
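A rough sketch of the list conversion and tag building is shown below, using plain dictionaries in place of DataFrame rows. The helper names `parse_list_field` and `build_tags` are hypothetical, the sample row is invented, and the sketch assumes that "remove spaces" means collapsing spaces inside multi-word entries so that each name becomes a single token. Which columns are stored as list-like strings is also an assumption, so plain strings are accepted as well.

```python
import ast


def parse_list_field(value):
    """Turn a string such as "['Action', 'Sci-Fi']" into a list of strings."""
    if not isinstance(value, str) or not value.strip():
        return []  # missing or empty values become an empty list
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return [value.strip()]  # not a Python literal: keep the raw text as one item
    if isinstance(parsed, (list, tuple)):
        return [str(item) for item in parsed]
    return [str(parsed)]


def build_tags(row):
    """Combine the metadata columns of one movie into a single lowercase tags string."""
    parts = row["Overview"].split()  # assumes missing overviews were handled earlier
    for column in ("Generes", "Plot Keyword", "Director", "Top 5 Casts"):
        # Collapse inner spaces so "Jane Doe" is counted as one token, "JaneDoe".
        parts += [item.replace(" ", "") for item in parse_list_field(row[column])]
    return " ".join(parts).lower()


# Hypothetical row, invented for illustration.
row = {
    "Overview": "A thief steals secrets",
    "Generes": "['Action', 'Sci-Fi']",
    "Plot Keyword": "['dream', 'heist']",
    "Director": "Jane Doe",
    "Top 5 Casts": "['John Smith', 'Ann Lee']",
}
print(build_tags(row))
```

With pandas, the same helper could be applied column by column, for example df[column].apply(parse_list_field).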