Movie Recommendation System

The Movie Recommendation System is a machine learning project that recommends similar movies based on a selected movie. Using content-based filtering, the system analyzes movie metadata such as genres, cast, director, plot keywords, and overview to find similar films.

The system vectorizes each movie's metadata text and ranks the other movies by cosine similarity to the selected one. A Streamlit web interface lets users select a movie and view the top 5 recommended movies along with their posters, which are fetched from the OMDb API.


  • Data Preprocessing:
    • Remove duplicates and handle missing values
    • Clean movie metadata including genres, overview, plot keywords, director, and cast
    • Combine metadata into a single text column for analysis
  • Feature Engineering:
    • Apply text preprocessing: lowercase conversion, space removal, and stemming with PorterStemmer
    • Convert text into numerical vectors using CountVectorizer
  • Recommendation Engine:
    • Compute cosine similarity between movie vectors
    • Recommend top 5 movies similar to the selected movie
    • Display recommended movies with poster images
  • Web Interface:
    • Built with Streamlit
    • User-friendly dropdown to select a movie
    • Dynamic display of recommended movies and posters
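
The recommendation engine described above comes down to two operations: turning each movie's text into token counts, then ranking the other movies by cosine similarity. The sketch below is a standard-library stand-in for scikit-learn's CountVectorizer and cosine_similarity, written so the arithmetic is visible. It is not the project's code; the function names and the toy movie data are hypothetical.

```python
import math
from collections import Counter


def count_vector(tags: str) -> Counter:
    """Bag-of-words token counts for one movie's tags string."""
    return Counter(tags.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tags string is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title: str, tags_by_title: dict[str, str], top_n: int = 5) -> list[str]:
    """Return up to top_n titles, most similar to `title` first."""
    vectors = {t: count_vector(tags) for t, tags in tags_by_title.items()}
    target = vectors[title]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != title  # never recommend the selected movie itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:top_n]]


# Hypothetical toy data, not taken from the project's dataset.
movies = {
    "Space Quest": "scifi space adventure alien",
    "Star Drift": "scifi space alien war",
    "Love Notes": "romance drama music",
    "Alien Dawn": "scifi alien horror",
}
print(recommend("Space Quest", movies, top_n=2))  # ['Star Drift', 'Alien Dawn']
```

On a dataset of this size, a vectorized similarity matrix is likely much faster than these pairwise loops, but the ranking logic is the same.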

  1. Programming Language: Python
  2. Libraries: pandas, numpy, scikit-learn, nltk, streamlit, requests
  3. Tools: Jupyter Notebook, Streamlit, OMDb API

  1. Load the dataset (25k IMDb movie Dataset.csv) and clean it
  2. Remove duplicates and handle missing values
  3. Process metadata fields: genres, cast, director, plot keywords, and overview
  4. Combine all text fields into a single tags column
  5. Apply text preprocessing: lowercasing, stemming, and space removal
  6. Convert tags into numerical vectors using CountVectorizer
  7. Compute cosine similarity between all movies
  8. Build a recommendation function to return top 5 similar movies
  9. Fetch movie posters using OMDb API
  10. Develop a Streamlit web interface to display recommendations
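
Step 9 can be sketched as a title lookup against OMDb. The example uses the API's `t` (title) and `apikey` query parameters and reads the `Poster` field of the JSON response. It uses `urllib` from the standard library to stay self-contained, whereas the project lists `requests`. The function names and the timeout are assumptions, not the project's code.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_ENDPOINT = "https://www.omdbapi.com/"


def build_omdb_url(title: str, api_key: str) -> str:
    """Build a lookup-by-title URL for the OMDb API."""
    return OMDB_ENDPOINT + "?" + urlencode({"t": title, "apikey": api_key})


def poster_from_payload(payload: dict) -> Optional[str]:
    """Return the poster URL, or None if the lookup failed or has no poster."""
    if payload.get("Response") != "True":
        return None  # OMDb signals a failed lookup with Response == "False"
    poster = payload.get("Poster")
    return poster if poster and poster != "N/A" else None


def fetch_poster(title: str, api_key: str, timeout: float = 5.0) -> Optional[str]:
    """Query OMDb over the network and return the poster URL, if any."""
    with urlopen(build_omdb_url(title, api_key), timeout=timeout) as response:
        return poster_from_payload(json.load(response))
```

Returning `None` rather than raising lets the interface fall back to a placeholder image when a title has no poster. Calling `fetch_poster` requires a valid OMDb API key.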

  • Source: 25k IMDb Movie Dataset.csv
  • Columns Used:
    • movie title – Movie name
    • Generes – Movie genres
    • Overview – Short description of the movie
    • Plot Keyword – Plot-related keywords
    • Director – Director names
    • Top 5 Casts – Main cast of the movie
  • Preprocessing:
    • Dropped unnecessary columns (path, rating, runtime, writer, year)
    • Handled missing values and duplicates
    • Converted list-like fields into lists using ast.literal_eval
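
As a rough illustration of the preprocessing above, the sketch below parses list-like strings with `ast.literal_eval` and joins the fields into one tags string. Several choices here are assumptions rather than details taken from the project: which columns are stored as list-like strings, treating space removal as collapsing multi-word names into single tokens, and the helper names. Stemming is left out to keep the example dependency-free.

```python
import ast


def parse_list_field(raw: str) -> list[str]:
    """Turn a string such as "['Action', 'Drama']" into a list of strings."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []  # malformed cell: treat as empty rather than crash
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return []


def collapse(name: str) -> str:
    """Lowercase and drop inner spaces so 'Tom Hanks' becomes 'tomhanks'."""
    return name.replace(" ", "").lower()


def build_tags(row: dict) -> str:
    """Combine one movie's metadata fields into a single tags string."""
    tokens = []
    # Assumed list-like columns; the real dataset may differ.
    for column in ("Generes", "Plot Keyword", "Top 5 Casts"):
        tokens += [collapse(item) for item in parse_list_field(row[column])]
    tokens.append(collapse(row["Director"]))
    tokens += row["Overview"].lower().split()
    return " ".join(tokens)


# Hypothetical row, not taken from the dataset.
row = {
    "Generes": "['Action', 'Sci-Fi']",
    "Plot Keyword": "['time travel']",
    "Top 5 Casts": "['Jane Doe', 'John Roe']",
    "Director": "Alex Poe",
    "Overview": "A pilot jumps through time.",
}
print(build_tags(row))
```

Collapsing names keeps "Jane Doe" from matching every other "Jane" in the dataset, which is one plausible reason for the space-removal step.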


⚠️ Note: This project is for educational purposes only. Not for commercial sale.