๐ซ The Pain Point
After multiple backups, your hard drive is full of backup1, copy_of_backup. You have 5,000 photos, but 2,000 are duplicates.
Scanning by โFilenameโ fails because one is named img001.jpg and the other backup_img001.jpg.
๐ Agentic Solution
Visual Fingerprinting (Hashing): It looks at the content of the image. Even if you resize or rename it, the โfingerprintโ remains similar.
Key Features:
- Visual Compare: Shows you the two images side-by-side to verify before deleting.
- Smart Select: Suggests keeping the higher-resolution version.
โ๏ธ Phase 1: Commander (Quick Fix)
For a quick scan.
Prompt:
โScan this directory for duplicate images using
imagehash(phash algorithm). Print out groups of duplicate files.โ
Result: A list of redundant files.
๐๏ธ Phase 2: Architect (Permanent Tool)
For Clean-up Freaks.
Engineering Prompt:
**Role:** Python AI Developer
**Task:** Create a "Visual Image Deduplicator".
**Requirements:**
1. **Tech Stack:** `imagehash`, `pillow`, `tkinter`.
2. **GUI:**
* Select Scan Folder.
* "Scan" button.
* **Review Mode (Crucial):** Display duplicates side-by-side. Checkbox to delete "Image A" or "Image B".
3. **Logic:**
* Compute `phash` for every image.
* Store in dict `{hash: [file_list]}`.
* If len(list) > 1 -> Duplicate.
4. **Deliverables:** `dedup_img.py`, `run.bat` (Windows), `run.sh` (Mac).
๐ง Prompt Decoding
- Perceptual Hashing: Unlike cryptographic hashing (MD5) where 1 pixel change alters the whole hash, Perceptual Hashing produces similar hashes for similar looking images. This is the secret sauce.
๐ ๏ธ Instructions
- Copy Prompt -> Paste -> Run.
- Scan -> Review Matches -> Clean.