Project Description:
Easily validate and verify your data and documents in just a few clicks. The Data & Document Completeness Checker is a powerful yet simple Python + Streamlit tool: upload a CSV file, a rules DOCX file, and one or more PDF documents, and it automatically checks for missing data fields and missing documents. Ideal for compliance, finance, insurance, and data teams.
Key Features:
- Automatic CSV Validation: Instantly detect missing or incomplete fields in your dataset.
- PDF Document Matching: Scan multiple PDFs and verify whether the required documents are present.
- Rule-Based Checking: Use a DOCX file to define required fields and documents per report type.
- Per-Client Report View: Expand each client's record to view missing fields and unmatched documents.
- Summary Dashboard: Get a complete overview of total cases, completed vs. incomplete cases, and the most frequently missing fields.
- Easy-to-Use Interface: Built on Streamlit for a clean and fast experience.
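The missing-field check and the dashboard totals described above could look roughly like the pandas sketch below. It is not taken from the project's source: the function names `find_missing_fields` and `summarize`, the `id_column` parameter, and the rule that blank or whitespace-only strings count as missing are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

import pandas as pd


def find_missing_fields(
    df: pd.DataFrame, required: List[str], id_column: Optional[str] = None
) -> Dict[object, List[str]]:
    """Map each incomplete record to the required fields it is missing (hypothetical helper)."""
    # Required columns absent from the CSV are added as all-NaN, so they count as missing.
    subset = df.reindex(columns=required)
    # ASSUMPTION: empty or whitespace-only strings are treated as missing, alongside NaN/None.
    blank = subset.apply(lambda col: col.astype(str).str.strip().eq(""))
    is_missing = subset.isna() | blank
    if id_column is not None:
        # Label rows by client ID instead of the default 0..n-1 row index.
        is_missing.index = df[id_column].tolist()
    report = {}
    for key, flags in is_missing.iterrows():
        missing = [field for field in required if flags[field]]
        if missing:
            report[key] = missing
    return report


def summarize(df: pd.DataFrame, report: Dict[object, List[str]], top_n: int = 3) -> dict:
    """Dashboard-style totals plus the most frequently missing fields (hypothetical helper)."""
    field_counts = Counter(field for fields in report.values() for field in fields)
    return {
        "total_cases": len(df),
        "complete": len(df) - len(report),
        "incomplete": len(report),
        "most_missing": field_counts.most_common(top_n),
    }


if __name__ == "__main__":
    clients = pd.DataFrame(
        {
            "client_id": ["C1", "C2", "C3"],
            "name": ["Ann", "", "Cy"],
            "policy_no": ["P1", None, "  "],
        }
    )
    report = find_missing_fields(clients, ["name", "policy_no"], id_column="client_id")
    print(report)  # {'C2': ['name', 'policy_no'], 'C3': ['policy_no']}
    print(summarize(clients, report))
```

In an actual app the DataFrame would typically come from `pd.read_csv(...)` on the uploaded file, where empty cells arrive as NaN. Keying the report by client ID assumes IDs are unique; duplicate IDs would overwrite each other in the dictionary.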
How It Works:
- Upload a CSV file containing your client or record data.
- Upload a Rules DOCX file that lists required fields and documents for each type of report.
- Upload one or more PDFs with the supporting documents.
- The tool scans the PDFs and compares the data and documents against the rules.
- Instantly see which clients have missing fields or missing documents.
- View an easy-to-read summary and download reports if needed.
Benefits:
- Save hours of manual data and document checking.
- Improve accuracy and ensure data completeness.
- Detect issues early to avoid delays in compliance workflows.
- Works locally: no sensitive data leaves your system.
- 100% open source and easy to customize.
System Requirements:
- Python 3.8 or above
- Streamlit
- Pandas
- PyMuPDF
- python-docx
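To confirm that an environment meets these requirements, a small check like the one below could help. It is an illustrative sketch rather than part of the project; the names `REQUIRED_PACKAGES`, `python_version_ok` and `missing_packages` are hypothetical. Note that two packages are imported under names that differ from their install names: PyMuPDF as `fitz` and python-docx as `docx`.

```python
import importlib.util
import sys
from typing import List

# PyPI package name -> module name used in `import` (two of them differ).
REQUIRED_PACKAGES = {
    "streamlit": "streamlit",
    "pandas": "pandas",
    "PyMuPDF": "fitz",
    "python-docx": "docx",
}


def python_version_ok(minimum=(3, 8)) -> bool:
    """True if the running interpreter meets the stated minimum version."""
    return sys.version_info[:2] >= minimum


def missing_packages() -> List[str]:
    """PyPI names of required packages whose import name cannot be found."""
    return [
        pkg
        for pkg, module in REQUIRED_PACKAGES.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All requirements found.")
```

`find_spec` only confirms that a module with the expected import name exists, so an unrelated package that also installs a `docx` or `fitz` module would pass this check.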
⚠️ Note: This project is for educational purposes only and is not for commercial sale.