Resume Scanning

This project is an intelligent Resume Screening System that automatically analyzes and ranks candidate resumes against a given job description using Natural Language Processing (NLP) and Machine Learning techniques.

The system extracts text from PDF resumes, cleans and processes the content using NLP, converts text into numerical vectors using TF-IDF, and calculates similarity scores using Cosine Similarity to determine how well each resume matches the job requirements.
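
Here is a minimal sketch of the TF-IDF and cosine-similarity scoring described above. It assumes scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The function name `score_resumes` and the sample texts are hypothetical and are not taken from the project's code. The sketch fits one vocabulary over the job description and all resumes so every vector has the same dimensions. It then compares row 0 (the job description) with each resume row.

```python
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def score_resumes(job_description: str, resumes: List[str]) -> List[float]:
    """Return a 0-100 match percentage for each resume against the job description."""
    # Fit one vocabulary over the job description and all resumes so the vectors align.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([job_description] + resumes)
    # Row 0 is the job description; rows 1..n are the resumes.
    similarities = cosine_similarity(matrix[0:1], matrix[1:])[0]
    return [round(float(s) * 100, 2) for s in similarities]


if __name__ == "__main__":
    jd = "python developer with machine learning and nlp experience"
    resumes = [
        "experienced python developer skilled in nlp and machine learning",
        "graphic designer focused branding illustration",
    ]
    print(score_resumes(jd, resumes))
```

Fitting the vectorizer on the job description together with the resumes keeps the IDF weights consistent within one screening batch. Scoring a different batch of resumes will shift those weights slightly.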

It classifies each resume as Highly Matched, Moderately Matched, or Not Matched, helping recruiters quickly identify the most suitable candidates and reduce manual screening time.
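
The category rule fits in a small function. This sketch assumes scores are percentages from 0 to 100 and uses the 70% and 40% cutoffs listed in the workflow steps. It treats the middle band as "at least 40% but below 70%" so that fractional scores such as 69.5% still land in a category. The name `classify_match` is hypothetical.

```python
def classify_match(score: float) -> str:
    """Map a 0-100 match percentage to one of the three match categories."""
    if score >= 70:
        return "Highly Matched"
    if score >= 40:
        return "Moderately Matched"
    # Anything below 40% falls into the lowest bucket.
    return "Not Matched"


for pct in (85.0, 69.5, 40.0, 12.3):
    print(pct, classify_match(pct))
```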

Features
  • Automatic text extraction from PDF resumes
  • Text cleaning with NLP (lowercasing, punctuation removal, stopword removal)
  • TF-IDF vectorization for text representation
  • Cosine-similarity-based resume ranking
  • Classification into match categories
  • Batch processing of multiple resumes from a folder
  • CSV report with each resume's match percentage
  • Fully automated pipeline from PDF input to ranked CSV output
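
Here is a minimal sketch of the text-cleaning feature (lowercasing, punctuation removal, stopword removal). The project names NLTK for stopwords. To keep the sketch runnable without downloading NLTK's corpus, it substitutes scikit-learn's built-in `ENGLISH_STOP_WORDS`, which is a different list. Setting `STOPWORDS = set(nltk.corpus.stopwords.words("english"))` after running `nltk.download("stopwords")` would match the stated stack. The name `clean_text` is hypothetical.

```python
import string

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Assumption: scikit-learn's English stopword list stands in for NLTK's.
STOPWORDS = ENGLISH_STOP_WORDS

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords from raw resume text."""
    text = text.lower().translate(_PUNCT_TABLE)
    # split() with no argument also collapses runs of spaces and newlines.
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


print(clean_text("The candidate knows Python, SQL & NLP!"))
```

`string.punctuation` covers only ASCII punctuation. Typographic characters that often come out of PDFs, such as curly quotes or bullet glyphs, would need extra handling.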

How It Works
  1. The system reads a Job Description PDF.
  2. It extracts text from multiple resume PDFs stored in a folder.
  3. The text is cleaned using NLP techniques:
    • Lowercasing
    • Punctuation removal
    • Stopword removal
  4. The cleaned text is converted into numerical form using a TF-IDF vectorizer.
  5. Cosine similarity is calculated between the job description and each resume.
  6. Each resume is categorized by its similarity score:
    • ≥ 70% → Highly Matched
    • 40% to below 70% → Moderately Matched
    • < 40% → Not Matched
  7. Results are sorted by score and saved to a CSV file for recruiters.
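
The workflow above can be sketched end to end as follows, assuming PyPDF2's `PdfReader`, pandas, and scikit-learn. The function names, the `results.csv` default, and the column names are hypothetical. PyPDF2 is imported inside `pdf_to_text` so that the `extract` parameter can be swapped out, for example to test with plain-text files. As a shortcut for the NLTK cleaning step, scoring uses scikit-learn's built-in `stop_words="english"` option. `pd.cut` with `right=False` applies the 40% and 70% cutoffs as half-open bands.

```python
from pathlib import Path
from typing import Callable

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def pdf_to_text(path: Path) -> str:
    """Concatenate the text of every page in a PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # imported lazily so other extractors work without it

    reader = PdfReader(str(path))
    # extract_text() may return little or no text for scanned, image-only pages.
    return " ".join(page.extract_text() or "" for page in reader.pages)


def screen_folder(
    jd_text: str,
    folder: str,
    out_csv: str = "results.csv",
    extract: Callable[[Path], str] = pdf_to_text,
) -> pd.DataFrame:
    """Score, classify, and rank every PDF resume in `folder`, then write a CSV report."""
    paths = sorted(Path(folder).glob("*.pdf"))
    if not paths:
        raise FileNotFoundError(f"no PDF resumes found in {folder!r}")
    texts = [extract(p) for p in paths]

    # One shared TF-IDF vocabulary; row 0 is the job description.
    matrix = TfidfVectorizer(stop_words="english").fit_transform([jd_text] + texts)
    scores = cosine_similarity(matrix[0:1], matrix[1:])[0] * 100

    df = pd.DataFrame({"resume": [p.name for p in paths], "match_percent": scores.round(2)})
    # Bands: below 40 -> Not, 40 to below 70 -> Moderately, 70 and above -> Highly Matched.
    df["category"] = pd.cut(
        df["match_percent"],
        bins=[float("-inf"), 40, 70, float("inf")],
        right=False,
        labels=["Not Matched", "Moderately Matched", "Highly Matched"],
    )
    df = df.sort_values("match_percent", ascending=False).reset_index(drop=True)
    df.to_csv(out_csv, index=False)
    return df
```

Following step 1, the job description would also come from a PDF, for example `screen_folder(pdf_to_text(Path("job_description.pdf")), "resumes")`. Both paths in that call are hypothetical.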

Tech Stack
  • Python
  • NLTK for Natural Language Processing (NLP)
  • Machine Learning Concepts
  • TF-IDF Vectorization
  • Cosine Similarity
  • PyPDF2 for PDF text extraction
  • Pandas for data handling
  • Scikit-learn for vectorization and similarity
  • CSV for result storage

Requirements
  • Python 3.8 or above
  • Streamlit
  • python-docx

Benefits
  • Reduces manual resume shortlisting
  • Saves significant recruiter time and effort
  • Ranks candidates consistently using a reproducible similarity score
  • Speeds up the initial screening stage of hiring
  • Useful for HR departments, consultancies, and job portals
  • Can be extended into a full ATS (Applicant Tracking System)

Note: This project is for educational purposes only. Not for commercial sale.