The Pain Point
Your data has inconsistent text: " JOHN SMITH " vs "john smith" vs "John Smith". Before processing, you need everything standardized.
Agentic Solution
A Text Cleaner that applies consistent formatting rules.
Key Features:
- Whitespace Cleanup: Remove extra spaces, trim edges.
- Case Normalization: UPPER, lower, Title Case.
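- Unicode Normalization: Convert text to a single normalization form (NFC or NFD) so Vietnamese characters that look identical are also stored identically.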
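The three features can be combined into a single per-string function. The following is a minimal sketch using only the standard library; `normalize_text` and its `case` parameter (`"upper"`, `"lower"`, `"title"`) are hypothetical names chosen for illustration, not part of any tool described here.

```python
import re
import unicodedata
from typing import Optional

# Any run of Unicode whitespace (spaces, tabs, non-breaking spaces) becomes one space.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_text(value: str, case: Optional[str] = None, form: str = "NFC") -> str:
    """Hypothetical helper applying the three Key Features to one string."""
    # Unicode normalization first, so the later steps operate on one canonical form.
    text = unicodedata.normalize(form, value)
    # Whitespace cleanup: collapse internal runs, then trim the edges.
    text = _WHITESPACE_RUN.sub(" ", text).strip()
    # Case normalization (None leaves the case untouched).
    if case == "upper":
        text = text.upper()
    elif case == "lower":
        text = text.lower()
    elif case == "title":
        text = text.title()
    return text


if __name__ == "__main__":
    for raw in [" JOHN   SMITH ", "john smith", "John Smith"]:
        print(repr(normalize_text(raw, case="title")))
```

All three sample inputs print `'John Smith'`. Note that `str.title()` treats any non-letter as a word boundary, so "o'neil" becomes "O'Neil"; names with unusual capitalization may need a manual review.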
Phase 1: Commander (Quick Fix)
For quick normalization.
Prompt:
"I have an Excel file `data.xlsx` with text columns. Write a Python script to:
- Trim: Remove leading/trailing whitespace.
- Collapse: Multiple spaces to single space.
- Unicode: Normalize to NFC form.
- Case: Apply Title Case to the "Name" column.
- Output: Save as `data_normalized.xlsx`. Print a sample before/after."
Result: Clean, consistent text data.
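The generated script will vary, but one plausible shape for it is sketched below. This is a minimal sketch, assuming pandas is installed (plus `openpyxl`, which pandas needs to read and write `.xlsx`), that the name column is literally called `Name`, and that only string cells should be modified; `normalize_frame` and `run` are hypothetical names.

```python
from pathlib import Path

import pandas as pd

INPUT_PATH = "data.xlsx"
OUTPUT_PATH = "data_normalized.xlsx"
TITLE_CASE_COLUMNS = ["Name"]  # assumption: the name column is called "Name"


def normalize_frame(df: pd.DataFrame, title_columns: list[str]) -> pd.DataFrame:
    """Return a cleaned copy of df; only string cells are modified."""
    out = df.copy()
    for col in out.columns:
        # Numbers, dates and empty cells keep their original values.
        is_str = out[col].map(lambda v: isinstance(v, str))
        if not is_str.any():
            continue
        cleaned = (
            out.loc[is_str, col]
            .str.strip()                           # Trim the edges
            .str.replace(r"\s+", " ", regex=True)  # Collapse internal runs
            .str.normalize("NFC")                  # Unicode normalization
        )
        if col in title_columns:
            cleaned = cleaned.str.title()          # Title Case for names
        out.loc[is_str, col] = cleaned
    return out


def run(input_path: str = INPUT_PATH, output_path: str = OUTPUT_PATH) -> None:
    df = pd.read_excel(input_path)  # .xlsx I/O requires the openpyxl package
    cleaned = normalize_frame(df, TITLE_CASE_COLUMNS)
    print("Before:\n", df.head())
    print("After:\n", cleaned.head())
    cleaned.to_excel(output_path, index=False)


if __name__ == "__main__":
    if Path(INPUT_PATH).exists():
        run()
    else:
        print(f"{INPUT_PATH} not found; nothing to do.")
```

Masking with `isinstance(v, str)` is a deliberate choice: the `.str` accessor turns non-string cells in a mixed column into NaN, so only string positions are written back.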
Phase 2: Architect (Permanent Tool)
For Data Cleaners.
Engineering Prompt:
**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Text Normalizer" Desktop App
**Objective:** A batch cleaning tool to fix whitespace, casing, and encoding issues in datasets.
**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Data Engine: Pandas
* Packaging: PyInstaller
**Functional Requirements:**
1. **UI Layout (PyQt6):**
* **Data:** Excel Input.
* **Columns:** Checkbox list of text columns to clean.
* **Ops:** Toggles for "Trim Whitespace", "Fix Case (Title/Upper)", "Normalize Unicode (NFC)".
* **View:** Preview of 5 sample rows.
2. **Core Logic:**
* Vectorized string operations in Pandas.
* `unicodedata.normalize` for Unicode normalization (NFC).
* **Threading:** Run processing in a background thread (e.g., `QThread`) so the UI stays responsive.
3. **Deliverables:**
* `main.py`: Complete source code.
* `requirements.txt`: Dependencies.
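* **Build Instructions** (PyInstaller does not cross-compile; run each build on its target OS):
* Windows: `pyinstaller --onefile --noconsole main.py`
* macOS: `pyinstaller --windowed main.py` (`--windowed` and `--noconsole` are the same option; this produces a `.app` bundle)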
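The Core Logic requirements can be sketched independently of the GUI. This minimal sketch assumes the checkbox list arrives as a list of column names and the toggles as a hypothetical `CleanOptions` dataclass. To stay runnable without PyQt6, it swaps in the standard library's `ThreadPoolExecutor` as a stand-in for the Qt worker thread; `text_columns`, `clean`, and `start_clean` are likewise hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class CleanOptions:
    """Hypothetical mirror of the three UI toggles."""
    trim_whitespace: bool = True
    normalize_unicode: bool = True
    case: Optional[str] = None  # None, "title" or "upper"


def text_columns(df: pd.DataFrame) -> list[str]:
    """Columns holding at least one string: candidates for the checkbox list."""
    return [c for c in df.columns if df[c].map(lambda v: isinstance(v, str)).any()]


def clean(df: pd.DataFrame, columns: list[str], opts: CleanOptions) -> pd.DataFrame:
    """Apply the enabled operations to the checked columns (string cells only)."""
    out = df.copy()
    for col in columns:
        is_str = out[col].map(lambda v: isinstance(v, str))
        if not is_str.any():
            continue
        s = out.loc[is_str, col]
        if opts.trim_whitespace:
            s = s.str.strip().str.replace(r"\s+", " ", regex=True)
        if opts.normalize_unicode:
            # Normalize before changing case so case mapping sees composed letters.
            s = s.str.normalize("NFC")
        if opts.case == "title":
            s = s.str.title()
        elif opts.case == "upper":
            s = s.str.upper()
        out.loc[is_str, col] = s
    return out


# Stand-in for a Qt worker thread: a single background worker runs jobs in order.
_worker = ThreadPoolExecutor(max_workers=1)


def start_clean(df: pd.DataFrame, columns: list[str], opts: CleanOptions) -> "Future[pd.DataFrame]":
    """Submit the job without blocking the caller; returns a Future."""
    return _worker.submit(clean, df, columns, opts)
```

In the PyQt6 app, `clean` would more likely run inside a `QThread` worker that emits the finished DataFrame through a signal, and `result.head(5)` would feed the 5-row preview.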
Prompt Decoding
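- Unicode Normalization: Vietnamese letters can be stored in more than one way. "ệ", for example, can be a single precomposed code point (NFC) or a base "e" followed by combining diacritics (NFD). Both render identically but compare as different strings, which silently breaks matching, deduplication, and lookups. NFC is the form the W3C recommends for web content.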
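The problem can be reproduced with Python's built-in `unicodedata` module. This short sketch builds the word "Việt" in both forms and shows that they only compare equal after normalization:

```python
import unicodedata

composed = "Vi\u1ec7t"  # "Việt" with "ệ" as one precomposed code point, U+1EC7
decomposed = unicodedata.normalize("NFD", composed)  # "e" plus two combining marks

print(composed == decomposed)  # False: identical on screen, different strings
print(len(composed), len(decomposed))  # 4 6
print([f"U+{ord(ch):04X}" for ch in decomposed])
# ['U+0056', 'U+0069', 'U+0065', 'U+0323', 'U+0302', 'U+0074']
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The combining dot below (U+0323) precedes the circumflex (U+0302) because canonical ordering sorts combining marks by their combining class.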
Instructions
- Copy Prompt → Run.