
Text Normalizer

Clean and standardize text data: remove extra spaces, fix capitalization, normalize Unicode.

Document ⭐ Beginner ⏱️ 2 minutes

😫 The Pain Point

Your data has inconsistent text: “ JOHN SMITH ” vs “john smith” vs “John Smith”. Before processing, you need everything standardized.

🚀 Agentic Solution

A Text Cleaner that applies consistent formatting rules.

Key Features:

  • Whitespace Cleanup: Remove extra spaces, trim edges.
  • Case Normalization: UPPER, lower, Title Case.
  • Unicode Normalization: NFC/NFD forms, so accented text such as Vietnamese compares consistently.
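The three features can be sketched for a single string with only the standard library (`re`, `unicodedata`). This is a minimal illustration; the function name `normalize_text` and the `case`/`form` option values are hypothetical, not part of any tool described here.

```python
import re
import unicodedata

_WHITESPACE_RUN = re.compile(r"\s+")  # tabs, newlines, NBSP and plain spaces


def normalize_text(text: str, case: str = "title", form: str = "NFC") -> str:
    """Clean one string: whitespace, casing, then the requested Unicode form.

    case: "upper", "lower", "title" or "keep" (hypothetical option names).
    form: a form accepted by unicodedata.normalize, e.g. "NFC" or "NFD".
    """
    # Compose first, so str.title() sees "ễ" as one cased letter instead of
    # "e" followed by combining marks (which would capitalize the next letter).
    text = unicodedata.normalize("NFC", text)
    text = _WHITESPACE_RUN.sub(" ", text).strip()  # collapse runs, trim edges
    if case == "upper":
        text = text.upper()
    elif case == "lower":
        text = text.lower()
    elif case == "title":
        text = text.title()
    elif case != "keep":
        raise ValueError(f"unknown case option: {case!r}")
    # Convert to the requested form last; this also re-normalizes anything
    # a case mapping may have decomposed.
    return unicodedata.normalize(form, text)
```

For example, `normalize_text("  JOHN   SMITH ")` gives `"John Smith"`. Be aware that `str.title()` also capitalizes after apostrophes: `"o'neil"` becomes `"O'Neil"`, but `"it's"` becomes `"It'S"`, so names with apostrophes may need a manual check.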

βš”οΈ Phase 1: Commander (Quick Fix)

For quick normalization.

Prompt:

“I have an Excel file, data.xlsx, with text columns. Write a Python script to:

  1. Trim: Remove leading/trailing whitespace.
  2. Collapse: Multiple spaces to single space.
  3. Unicode: Normalize to NFC form.
  4. Case: Apply Title Case to the ‘Name’ column.
  5. Output: Save as data_normalized.xlsx.

Print a sample of rows before and after.”

Result: Clean, consistent text data.
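The quick-fix prompt might yield a script along these lines. It is a sketch, assuming pandas and openpyxl (which pandas uses for .xlsx files) are installed; `clean_text` and `main` are hypothetical names. It deliberately touches only string cells, because the `.str` accessor turns numbers in a mixed column into NaN.

```python
import pandas as pd


def clean_text(s: pd.Series, title: bool = False) -> pd.Series:
    """Trim, collapse inner whitespace and NFC-normalize the string cells of s.

    Non-string cells (numbers, dates, NaN) are left as they are: calling the
    .str accessor on them would silently turn them into NaN.
    """
    is_text = s.map(lambda v: isinstance(v, str)).astype(bool)
    if not is_text.any():
        return s  # nothing to clean (numeric, date or empty column)
    cleaned = (
        s[is_text]
        .str.normalize("NFC")                  # Unicode: compose accents first
        .str.strip()                           # Trim leading/trailing spaces
        .str.replace(r"\s+", " ", regex=True)  # Collapse runs of whitespace
    )
    if title:
        cleaned = cleaned.str.title()          # Case: Title Case
    out = s.copy()
    out.loc[is_text] = cleaned
    return out


def main(src: str = "data.xlsx", dst: str = "data_normalized.xlsx") -> None:
    df = pd.read_excel(src)
    before = df.head(5).copy()
    for col in df.columns:
        df[col] = clean_text(df[col], title=(col == "Name"))
    print("Before:\n", before, "\n\nAfter:\n", df.head(5))
    df.to_excel(dst, index=False)
```

Call `main()` to process data.xlsx. NFC runs before the whitespace and case steps so that `title()` always sees composed letters.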

πŸ—οΈ Phase 2: Architect (Permanent Tool)

For Data Cleaners.

Engineering Prompt:

**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Text Normalizer" Desktop App

**Objective:** A batch cleaning tool to fix whitespace, casing, and encoding issues in datasets.

**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Data Engine: Pandas
* Packaging: PyInstaller

**Functional Requirements:**
1.  **UI Layout (PyQt6):**
    *   **Data:** Excel file input.
    *   **Columns:** Checkbox list of text columns to clean.
    *   **Ops:** Toggles for "Trim Whitespace", "Fix Case (Title/Upper)", "Normalize Unicode (NFC)".
    *   **View:** Preview of 5 sample rows.

2.  **Core Logic:**
    *   Vectorized string operations in Pandas.
    *   `unicodedata.normalize` for Unicode normalization (NFC).
    *   **Threading:** Run processing in a background thread so the UI stays responsive.

3.  **Deliverables:**
    *   `main.py`: Complete source code.
    *   `requirements.txt`: Dependencies.
    *   **Build Instructions:**
        *   Windows: `pyinstaller --onefile --noconsole main.py`
        *   macOS: `pyinstaller --windowed main.py`
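The spec asks for a checkbox list of text columns and a 5-row preview but does not say how to choose the columns. A minimal sketch of both helpers follows, assuming a column counts as "text" only when every non-empty cell is a string (so mixed columns are left out). The names `text_columns` and `preview` are hypothetical. Keeping them free of Qt code means the UI or a background worker thread can call them, and they can be tested without a display.

```python
import pandas as pd


def text_columns(df: pd.DataFrame) -> list:
    """Columns whose non-empty cells are all strings: candidates for the checkbox list."""
    cols = []
    for col in df.columns:
        values = df[col].dropna()
        if len(values) > 0 and values.map(lambda v: isinstance(v, str)).all():
            cols.append(col)
    return cols


def preview(before: pd.DataFrame, after: pd.DataFrame, columns: list, n: int = 5) -> pd.DataFrame:
    """Interleave before/after values of the selected columns for the first n rows."""
    return pd.DataFrame(
        {
            f"{col} ({tag})": frame[col].head(n)
            for col in columns
            for tag, frame in (("before", before), ("after", after))
        }
    )
```

In the app, `text_columns(df)` would populate the checkbox list after loading the file, and `preview(df, cleaned, selected)` would feed the 5-row preview table.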

🧠 Prompt Decoding

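  • Unicode Normalization: Vietnamese letters with diacritics can be stored precomposed (one code point, e.g. ệ) or decomposed (a base letter plus combining marks). Both render identically but compare as different strings, so lookups and deduplication can silently miss matches. NFC (composed) is the form generally recommended for web content.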
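To see the problem concretely, the short check below builds “Việt” in both forms using explicit escapes (so the result does not depend on how your editor saved the file) and compares them. `same_text` is a hypothetical helper name.

```python
import unicodedata

composed = "Vi\u1ec7t"                                # "Việt", ệ as one code point (U+1EC7)
decomposed = unicodedata.normalize("NFD", composed)  # e + combining dot below + circumflex


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)             # False: different code points
print(len(composed), len(decomposed))     # 4 6
print(same_text(composed, decomposed))    # True
```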

πŸ› οΈ Instructions

  1. Copy the prompt → run it.

