😫 The Pain Point
Your customer list has 5,000 rows.
- Row 1: “Nguyen Van A - 090xxx”
- Row 500: “Nguyen V. A - 090xxx”
Excel’s “Remove Duplicates” only catches exact matches. It fails when there’s a typo, a slight variation, or missing data. Emailing the same client twice looks unprofessional.
🚀 Agentic Solution
A “Smart Deduplication” tool using Fuzzy Logic (matching by similarity, not exactness).
Key Features:
- Fuzzy Match: Detects “Jonh Doe” and “John Doe” as the same person (about 88% similar by rapidfuzz’s default ratio, which clears an 85% threshold).
- Merge Strategy: Intelligently merges data (e.g., keeps the longest email, the newest phone number).
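The two features can be sketched in a few lines. This is a minimal illustration, not the tool itself: it uses the standard library's `difflib` in place of rapidfuzz so it runs without extra installs, and the `updated` field used to decide which phone number is "newest" is a hypothetical column that your data may not have.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case and outer whitespace."""
    return 100 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def merge_records(a: dict, b: dict) -> dict:
    """Merge two duplicates: keep the longest email and the newest record's phone."""
    # Assumption: "updated" holds ISO dates (YYYY-MM-DD), which sort correctly as strings.
    newest = max(a, b, key=lambda r: r["updated"])
    return {
        "name": newest["name"],
        "email": max(a["email"], b["email"], key=len),
        "phone": newest["phone"],
        "updated": newest["updated"],
    }


# Made-up sample rows for illustration only.
old = {"name": "Jonh Doe", "email": "j@x.io", "phone": "0901 111 111", "updated": "2023-01-05"}
new = {"name": "John Doe", "email": "john.doe@x.io", "phone": "0902 222 222", "updated": "2024-03-10"}

merged = None
if similarity(old["name"], new["name"]) >= 85:  # 85% threshold, as in the instructions
    merged = merge_records(old, new)
```

Taking the name from the newest record is a design choice made for this sketch; a real tool would likely let the reviewer pick per field.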
⚔️ Phase 1: Commander (Quick Fix)
For a quick cleanup of a specific file.
Prompt:
“I have a file `customers.csv`. Find potential duplicates based on the ‘Phone’ and ‘Email’ columns. Normalize the phone numbers first (remove spaces/dots). If two rows have the same phone, mark them as duplicates. Save the list of duplicates to `duplicates.csv` for me to review.”
Result: A list of duplicates to manually check.
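For reference, the script an agent produces for this prompt might look roughly like the sketch below. It is an assumption about the output, not a guaranteed result: the function names are made up, it strips every non-digit character (slightly broader than "spaces/dots"), and it does not reconcile country-code prefixes such as `+84` versus a leading `0`.

```python
import csv
import re
from collections import defaultdict


def normalize_phone(raw: str) -> str:
    """Keep digits only, so '090.123 4567' and '0901234567' compare equal."""
    return re.sub(r"\D", "", raw or "")


def find_phone_duplicates(rows: list[dict]) -> list[dict]:
    """Return every row whose normalized phone appears more than once."""
    by_phone = defaultdict(list)
    for row in rows:
        key = normalize_phone(row.get("Phone", ""))
        if key:  # rows without a phone cannot be matched this way
            by_phone[key].append(row)
    return [row for group in by_phone.values() if len(group) > 1 for row in group]


def export_duplicates(src: str = "customers.csv", dst: str = "duplicates.csv") -> int:
    """Read src, write the duplicate rows to dst, and return how many were found."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fields = reader.fieldnames or []
    dupes = find_phone_duplicates(rows)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(dupes)
    return len(dupes)
```

Calling `export_duplicates()` with the default paths mirrors the prompt: it reads `customers.csv` and leaves the suspects in `duplicates.csv` for manual review.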
🏗️ Phase 2: Architect (Permanent Tool)
For Data Analysts/CRM Admins.
Engineering Prompt:
**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Advanced Fuzzy Deduplicator" Desktop App
**Objective:** A desktop application to clean dirty customer data using fuzzy logic matching.
**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Algorithms: rapidfuzz, pandas
* Packaging: PyInstaller
**Functional Requirements:**
1. **UI Layout (PyQt6):**
* **Import:** Excel/CSV File Loader.
* **Config:** Checkboxes for "Match Columns" (Name, Email, Phone).
* **Tuning:** "Similarity Threshold" Slider (e.g., 85%).
* **Review:** Side-by-side comparison of potential merge groups.
2. **Core Logic:**
* **Fuzzy Match:** Compute similarity scores using `rapidfuzz`.
* **Grouping:** Cluster records whose similarity score exceeds the threshold.
* **Threading:** Run data processing in a background thread so the UI stays responsive.
3. **Deliverables:**
* `main.py`: Complete source code.
* `requirements.txt`: Dependencies.
* **Build Instructions:**
* Windows: `pyinstaller --onefile --noconsole main.py`
* macOS: `pyinstaller --windowed main.py`
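Separately from the prompt text, it helps to know what the "Grouping" requirement means in practice so you can judge the generated code. One common approach, sketched here as an assumption about how the app might do it, is to compare every pair of records and link those above the threshold with a union-find structure. The sketch uses the standard library's `difflib` instead of rapidfuzz, and `group_duplicates` is a hypothetical name.

```python
from difflib import SequenceMatcher
from itertools import combinations


def score(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(names: list[str], threshold: float = 85.0) -> list[list[int]]:
    """Cluster row indices that are linked, directly or transitively, above the threshold."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if score(names[i], names[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(names)):
        groups.setdefault(find(i), []).append(i)
    # Only clusters with at least two rows are merge candidates.
    return [g for g in groups.values() if len(g) > 1]


clusters = group_duplicates(["John Doe", "Jane Smith", "Jonh Doe"])
```

Comparing all pairs is quadratic: 5,000 rows means roughly 12.5 million comparisons, which is one reason the prompt asks for a background thread and a fast library such as rapidfuzz.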
🧠 Prompt Decoding
- Fuzzy Logic: Standard programming checks if `A == B`. Fuzzy matching checks if `Distance(A, B) < Small_Amount`. This allows for human-like flexibility in detecting errors.
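To make `Distance(A, B)` concrete, here is a minimal sketch using Levenshtein edit distance, one of several possible distance measures (the prompt's rapidfuzz library offers others). The helper name `is_probable_duplicate` and the cutoff of 2 edits are illustrative assumptions, and the check uses `<=` rather than a strict `<`.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def is_probable_duplicate(a: str, b: str, max_distance: int = 2) -> bool:
    """Treat two strings as the same entity when they are within max_distance edits."""
    return levenshtein(a.lower(), b.lower()) <= max_distance
```

With this measure, “Jonh Doe” and “John Doe” are 2 edits apart (the swapped letters count as two substitutions), so they pass the check, while unrelated names do not.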
🛠️ Instructions
- Copy Prompt -> Paste -> Run.
- Load Data -> Set Threshold 85% -> Scan.