Classifying a Decade of Gmail with AI
My Gmail inbox had roughly ten years of uncategorised mail in it — banks, game launchers, newsletters, job boards, Amazon, random services I'd signed up to in 2016 and completely forgotten about. The standard advice is to set up filters manually. That works fine when you're starting fresh. When you're retrofitting a decade of noise, you automate it.
The result is a fully AI-driven pipeline: fetch headers, cluster by sender, classify with an LLM, generate Gmail filter rules, and backfill the entire mailbox — without touching a single message body.
Step 1 — Fetch headers in bulk
fetch.py downloads every message's From, Subject, and List-Unsubscribe headers using 50 parallel worker threads, writing to a .jsonl file as it goes so you can interrupt and resume safely. For 50,000 messages, the full fetch takes around eight minutes.
python -m gmail_organizer.fetch
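The interrupt-and-resume behaviour can be sketched roughly as follows. This is a minimal illustration, not the actual fetch.py: the names load_done_ids, ends_with_newline, fetch_all and fake_fetch are hypothetical, and fake_fetch stands in for whatever Gmail API call returns the three headers.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_done_ids(path):
    """Return the message IDs already written to the .jsonl file."""
    done = set()
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                continue  # blank or truncated line: that message is fetched again
    return done


def ends_with_newline(path):
    """True if the file is missing, empty, or ends on a complete line."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def fetch_all(message_ids, fetch_headers, out_path, workers=50):
    """Fetch headers for IDs not yet in out_path; return how many were fetched."""
    done = load_done_ids(out_path)
    todo = [m for m in message_ids if m not in done]
    needs_newline = not ends_with_newline(out_path)
    with open(out_path, "a", encoding="utf-8") as out, \
            ThreadPoolExecutor(max_workers=workers) as pool:
        if needs_newline:
            out.write("\n")  # close off a line cut short by an interrupt
        futures = {pool.submit(fetch_headers, m): m for m in todo}
        for future in as_completed(futures):
            record = {"id": futures[future], **future.result()}
            # Only this loop writes, so no lock is needed around the file.
            out.write(json.dumps(record) + "\n")
            out.flush()
    return len(todo)


def fake_fetch(message_id):
    """Hypothetical stand-in for a Gmail API call that returns three headers."""
    return {"From": "Shop <noreply@shopify.com>",
            "Subject": f"Receipt {message_id}",
            "List-Unsubscribe": ""}
```

Running fetch_all a second time with the same output path only fetches the IDs that are not in the file yet, which is what makes an interrupted run cheap to restart.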
Step 2 — Build threads and cluster by sender
thread_builder.py groups fetched messages into threads, collecting unique subjects and body samples per sender. thread_clusterer.py then groups those threads by sender domain and, within each domain, by a token-based subject key. All the receipts from noreply@shopify.com end up in one cluster, separate from the shipping notifications sent from the same address, instead of the whole sender being treated as a single flat blob.
python -m gmail_organizer.thread_builder
python -m gmail_organizer.thread_clusterer
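One way to picture a token-based subject key is the sketch below. It is an assumption about the general technique, not the real thread_clusterer.py: the stopword list, the two-token key length, and the names sender_domain, subject_key and cluster_threads are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real one would be longer.
STOPWORDS = {"your", "the", "a", "an", "for", "has", "is", "of", "to", "re", "fwd"}


def sender_domain(from_header):
    """Pull the domain out of a From header such as 'Shop <noreply@shopify.com>'."""
    match = re.search(r"@([\w.-]+)", from_header)
    return match.group(1).lower() if match else "unknown"


def subject_key(subject, max_tokens=2):
    """Reduce a subject to its first few meaningful words."""
    tokens = re.split(r"[^a-z0-9]+", subject.lower())
    # Keep purely alphabetic tokens so order numbers and dates do not split clusters.
    words = [t for t in tokens if t.isalpha() and t not in STOPWORDS]
    return " ".join(words[:max_tokens])


def cluster_threads(threads):
    """Group threads by (sender domain, subject key)."""
    clusters = defaultdict(list)
    for thread in threads:
        key = (sender_domain(thread["from"]), subject_key(thread["subject"]))
        clusters[key].append(thread)
    return dict(clusters)


threads = [
    {"from": "Shop <noreply@shopify.com>", "subject": "Your receipt for order #1234"},
    {"from": "Shop <noreply@shopify.com>", "subject": "Your receipt for order #5678"},
    {"from": "Shop <noreply@shopify.com>", "subject": "Your order has shipped"},
]
clusters = cluster_threads(threads)
# The two receipts share the key "receipt order"; the shipping notice gets "order shipped".
```

Dropping numeric tokens is the part that matters most here: without it, every order number would produce its own cluster.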
An optional embedding_cluster_refiner step runs after clustering to further split or merge clusters using semantic embeddings — useful for high-volume senders that send genuinely different types of email from the same address.
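The merge half of that idea could look something like this. It is only a sketch: it assumes each cluster already has a centroid vector from some embedding model (how those vectors are produced is not shown), the 0.9 threshold is arbitrary, and cosine and merge_similar are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_similar(centroids, threshold=0.9):
    """Greedily group cluster names whose centroid vectors are close.

    centroids maps a cluster name to its vector. The first cluster placed in a
    group acts as that group's representative for later comparisons.
    """
    groups = []  # list of (representative vector, [cluster names])
    for name, vector in centroids.items():
        for representative, names in groups:
            if cosine(representative, vector) >= threshold:
                names.append(name)
                break
        else:
            groups.append((vector, [name]))
    return [names for _, names in groups]


# Toy two-dimensional vectors; real embeddings have hundreds of dimensions.
centroids = {
    "receipt order": [1.0, 0.0],
    "receipt purchase": [0.98, 0.05],
    "order shipped": [0.0, 1.0],
}
merged = merge_similar(centroids)
```

Splitting works the other way round: a cluster whose members sit far from their own centroid is a candidate for being broken up.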
Step 3 — AI classification
cluster_classifier.py sends each cluster to an LLM with the sender domain, representative subjects, and any body samples collected during threading. The model returns a label, a tier, and any subject patterns needed to distinguish this cluster from others on the same domain. Results are cached per cluster so re-runs only hit the API for new or changed senders.
python -m gmail_organizer.cluster_classifier
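Per-cluster caching can be sketched like this, assuming the cache key is a hash of whatever gets sent to the model. The names cluster_fingerprint, classify_clusters and fake_llm are hypothetical, the cache is a plain dict where a real run would presumably persist it to a file, and fake_llm stands in for the actual model call.

```python
import hashlib
import json


def cluster_fingerprint(cluster):
    """Stable hash of the fields sent to the model; it changes when they change."""
    payload = json.dumps(
        {"domain": cluster["domain"], "subjects": sorted(cluster["subjects"])},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def classify_clusters(clusters, call_llm, cache):
    """Classify each cluster, calling call_llm only on a cache miss."""
    results = {}
    for cluster in clusters:
        fingerprint = cluster_fingerprint(cluster)
        if fingerprint not in cache:
            cache[fingerprint] = call_llm(cluster)
        results[(cluster["domain"], cluster["key"])] = cache[fingerprint]
    return results


calls = []  # records every simulated API call


def fake_llm(cluster):
    """Hypothetical stand-in for the LLM request; returns a fixed answer."""
    calls.append(cluster["domain"])
    return {"label": "Shopping/Receipts", "tier": 1, "subject_patterns": ["receipt"]}
```

Because the key is derived from the cluster's content, a sender whose subjects have changed since the last run gets a new fingerprint and is classified again, while everything else is served from the cache.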
The four tiers control what Gmail does with matched messages:
- Tier 1 — IMPORTANT: apply label, keep in inbox — billing, banking, security alerts
- Tier 2 — USEFUL: apply label, skip inbox — social, gaming, job alerts
- Tier 3 — LOW NOISE: apply label, skip inbox, leave unread — light newsletters
- Tier 4 — UNSUBSCRIBE: apply label, skip inbox, mark read — pure marketing noise
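In Gmail API terms, "skip inbox" amounts to removing the INBOX system label and "mark read" to removing UNREAD. The tier list above could be encoded along these lines; TIER_ACTIONS and label_changes are hypothetical names, and label_id is assumed to be the ID of a user label that already exists.

```python
# Actions per tier, read literally from the list above.
TIER_ACTIONS = {
    1: {"skip_inbox": False, "mark_read": False},  # IMPORTANT
    2: {"skip_inbox": True, "mark_read": False},   # USEFUL
    3: {"skip_inbox": True, "mark_read": False},   # LOW NOISE
    4: {"skip_inbox": True, "mark_read": True},    # UNSUBSCRIBE
}


def label_changes(tier, label_id):
    """Translate a tier into the label IDs to add and remove on a message."""
    actions = TIER_ACTIONS[tier]
    remove = []
    if actions["skip_inbox"]:
        remove.append("INBOX")   # archiving is the removal of the INBOX label
    if actions["mark_read"]:
        remove.append("UNREAD")  # marking read is the removal of the UNREAD label
    return {"addLabelIds": [label_id], "removeLabelIds": remove}
```

Read this literally and tiers 2 and 3 produce the same label changes; in this sketch they differ only in which label the classifier assigns.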
Step 4 — Build rules and export
rule_builder.py merges classified clusters by (sender_domain, label) into a clean filter_rules.json, adding a gmail_query field for each rule. gmail_filter_export.py converts that into Gmail's XML import format.
python -m gmail_organizer.rule_builder
python -m gmail_organizer.gmail_filter_export
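A rough sketch of both steps follows, under several assumptions: the field names on the classified clusters, the choice to keep the lowest tier number when merged clusters disagree, and the function names build_rules and rules_to_xml are all hypothetical. The property names (hasTheWord, label, shouldArchive, shouldMarkAsRead) are the ones that appear in filter files exported from Gmail's settings; the rest of the feed is trimmed down and worth checking against a real export before importing.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "http://www.w3.org/2005/Atom"
APPS = "http://schemas.google.com/apps/2006"


def build_rules(classified):
    """Merge classified clusters that share (sender_domain, label) into one rule."""
    merged = defaultdict(lambda: {"tier": None, "patterns": []})
    for cluster in classified:
        rule = merged[(cluster["sender_domain"], cluster["label"])]
        # Assumption: on disagreement, the more important (lower) tier wins.
        tiers = [t for t in (rule["tier"], cluster["tier"]) if t is not None]
        rule["tier"] = min(tiers)
        for pattern in cluster.get("subject_patterns", []):
            if pattern not in rule["patterns"]:
                rule["patterns"].append(pattern)
    rules = []
    for (domain, label), rule in merged.items():
        query = f"from:{domain}"
        if rule["patterns"]:
            quoted = " OR ".join(f'"{p}"' for p in rule["patterns"])
            query += f" subject:({quoted})"
        rules.append({"sender_domain": domain, "label": label,
                      "tier": rule["tier"], "gmail_query": query})
    return rules


def rules_to_xml(rules):
    """Render rules as a minimal Atom feed of Gmail filter entries."""
    ET.register_namespace("", ATOM)
    ET.register_namespace("apps", APPS)
    feed = ET.Element(f"{{{ATOM}}}feed")
    for rule in rules:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}category", term="filter")
        props = {"hasTheWord": rule["gmail_query"], "label": rule["label"]}
        if rule["tier"] >= 2:
            props["shouldArchive"] = "true"      # skip inbox
        if rule["tier"] == 4:
            props["shouldMarkAsRead"] = "true"   # mark read
        for name, value in props.items():
            ET.SubElement(entry, f"{{{APPS}}}property", name=name, value=value)
    return ET.tostring(feed, encoding="unicode")
```

Putting the whole query into hasTheWord keeps one rule to one filter entry, at the cost of a less readable filter list in Gmail's settings page.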
Step 5 — Backfill the mailbox
gmail_backfill.py applies labels thread-wise to everything that already exists using the Gmail API's batchModify endpoint, 1,000 messages per batch, which is the most the endpoint accepts in one request. Fifty thousand messages come to around 50 requests and take about two minutes.
python -m gmail_organizer.gmail_backfill --dry-run # preview
python -m gmail_organizer.gmail_backfill # apply
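The batching and the dry-run switch can be sketched as below. This is not the real gmail_backfill.py: chunked, build_batches and backfill are hypothetical names, and send is a callable you would supply. The request body shape (ids, addLabelIds, removeLabelIds) is the one batchModify expects.

```python
def chunked(ids, size=1000):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_batches(message_ids, add_label_ids, remove_label_ids=(), size=1000):
    """Build one batchModify request body per chunk of message IDs."""
    batches = []
    for chunk in chunked(message_ids, size):
        body = {"ids": chunk, "addLabelIds": list(add_label_ids)}
        if remove_label_ids:
            body["removeLabelIds"] = list(remove_label_ids)
        batches.append(body)
    return batches


def backfill(message_ids, add_label_ids, remove_label_ids=(), send=None, dry_run=True):
    """Send every batch through `send`, or only count them on a dry run."""
    batches = build_batches(message_ids, add_label_ids, remove_label_ids)
    if not dry_run and send is not None:
        for body in batches:
            # With google-api-python-client, send could wrap something like:
            # service.users().messages().batchModify(userId="me", body=body).execute()
            send(body)
    return len(batches)


ids = [f"m{i}" for i in range(50_000)]
planned = backfill(ids, ["Label_1"], ["INBOX"])  # dry run: 50 batches, nothing sent
```

Defaulting dry_run to True means a forgotten flag previews the work rather than relabelling the mailbox.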
Apart from the LLM calls in step 3 and the Gmail API calls in step 5, everything after the initial fetch runs locally. No message bodies ever hit disk, only headers and subject samples used for classification.