Data and Document Completeness Checker

Validate your data and documents in a few clicks. The Data & Document Completeness Checker is a Python + Streamlit tool: you upload a CSV file, a rules DOCX file, and one or more PDF documents, and it automatically flags missing data fields and missing documents. It is aimed at compliance, finance, insurance, and data teams.


  • Automatic CSV Validation: Instantly detect missing or incomplete fields in your dataset.
  • PDF Document Matching: Scan multiple PDFs and verify if required documents are present.
  • Rule-Based Checking: Use a DOCX file to define required fields and documents per report type.
  • Per-Client Report View: Expand each client's record to view missing fields and unmatched documents.
  • Summary Dashboard: Get an overview of total cases, complete vs. incomplete cases, and the most frequently missing fields.
  • Easy-to-Use Interface: Built on Streamlit for a clean and fast experience.
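As a rough illustration of the CSV validation feature, the sketch below flags rows whose required fields are empty or absent. It is not the tool's actual code: the function name `find_missing_fields`, the column names, and the sample data are all hypothetical, and it uses the standard-library `csv` module rather than Pandas so that it runs without any installation.

```python
import csv
import io


def find_missing_fields(csv_text, required_fields):
    """Return {row_number: [missing field names]} for rows with gaps.

    Hypothetical helper: a field counts as missing when its column is
    absent or its value is empty or whitespace only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = {}
    for row_number, row in enumerate(reader, start=1):
        missing = [
            field
            for field in required_fields
            if not (row.get(field) or "").strip()
        ]
        if missing:
            problems[row_number] = missing
    return problems


# Made-up sample data: row 2 has no name, row 3 has no policy number.
sample = (
    "client_id,name,policy_no\n"
    "C001,Alice,P-17\n"
    "C002,,P-18\n"
    "C003,Carol,\n"
)
print(find_missing_fields(sample, ["name", "policy_no"]))
```

Row numbers count data rows only, so the header line is not included in the numbering.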

  1. Upload a CSV file containing your client or record data.
  2. Upload a Rules DOCX file that lists required fields and documents for each type of report.
  3. Upload one or more PDFs with the supporting documents.
  4. The system scans and compares the data with the rules.
  5. Instantly see which clients have missing fields or missing documents.
  6. View an easy-to-read summary and download reports if needed.
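The comparison and summary steps above can be sketched as follows. This is a simplified sketch under several assumptions, not the tool's implementation: the rules are taken to be already parsed from the DOCX file into a dictionary, documents are matched by client ID and document name appearing in the PDF filename (the real tool may inspect PDF contents with PyMuPDF instead), and every name, key, and sample value here is hypothetical.

```python
from collections import Counter


def check_completeness(records, rules, pdf_names):
    """Compare each record against the rules for its report type.

    records:   list of dicts, one per client (assumed already read from the CSV)
    rules:     {report_type: {"fields": [...], "documents": [...]}}
    pdf_names: filenames of the uploaded PDFs
    """
    lowered = [name.lower() for name in pdf_names]
    report = []
    for record in records:
        rule = rules.get(record.get("report_type"), {"fields": [], "documents": []})
        missing_fields = [
            field for field in rule["fields"]
            if not str(record.get(field) or "").strip()
        ]
        client = str(record.get("client_id") or "").lower()
        # Assumption: a document is present if some filename contains both
        # the client ID and the document name.
        missing_documents = [
            doc for doc in rule["documents"]
            if not any(client and client in name and doc.lower() in name
                       for name in lowered)
        ]
        report.append({
            "client_id": record.get("client_id"),
            "missing_fields": missing_fields,
            "missing_documents": missing_documents,
        })
    return report


def summarize(report):
    """Count complete vs. incomplete cases and the most frequently missing fields."""
    incomplete = [r for r in report if r["missing_fields"] or r["missing_documents"]]
    field_counts = Counter(f for r in report for f in r["missing_fields"])
    return {
        "total": len(report),
        "complete": len(report) - len(incomplete),
        "incomplete": len(incomplete),
        "most_missing_fields": field_counts.most_common(3),
    }


# Made-up sample inputs.
records = [
    {"client_id": "C001", "report_type": "claim", "name": "Alice", "policy_no": "P-17"},
    {"client_id": "C002", "report_type": "claim", "name": "", "policy_no": "P-18"},
]
rules = {"claim": {"fields": ["name", "policy_no"], "documents": ["id_proof", "claim_form"]}}
pdf_names = ["C001_id_proof.pdf", "C001_claim_form.pdf", "C002_id_proof.pdf"]

report = check_completeness(records, rules, pdf_names)
print(report)
print(summarize(report))
```

Keeping the per-client report separate from the summary mirrors the two views described in the feature list: the report feeds the expandable client records, and the summary feeds the dashboard.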

  • Save hours of manual data and document checking.
  • Improve accuracy and ensure data completeness.
  • Detect issues early to avoid delays in compliance workflows.
  • Works locally: no sensitive data leaves your system.
  • 100% open source and easy to customize.

  • Python 3.8 or above
  • Streamlit
  • Pandas
  • PyMuPDF
  • python-docx
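A quick way to confirm that the requirements above are in place is a small check script such as the sketch below. The helper name `missing_dependencies` is hypothetical. Note that two packages are imported under different names than they are installed under: PyMuPDF as `fitz` and python-docx as `docx`.

```python
import importlib.util
import sys

# Package name -> module name used in "import" statements.
REQUIRED_MODULES = {
    "Streamlit": "streamlit",
    "Pandas": "pandas",
    "PyMuPDF": "fitz",
    "python-docx": "docx",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if sys.version_info < (3, 8):
    print("Python 3.8 or above is required.")
print("Missing packages:", missing_dependencies() or "none")
```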


โš ๏ธ Note: This project is for educational purposes only. Not for commercial sale.