Building a 'Karachi-Scale' Lead Gen Bot

Why Most Lead Gen Bots Are Garbage

The average "automated outreach bot" does three things: scrapes a list of emails, pastes them into a template with a first-name variable substitution, and sends 500 identical emails at 8 AM. Reply rate: 0.3-0.8%. This is not automation — it is high-speed spam with extra steps. It damages your sender reputation, produces no useful signal about lead quality, and burns through prospect lists that took real effort to compile.

What I mean by a "Karachi-scale" lead gen bot is architecturally different. It is a system that, for every single lead it processes, generates a genuinely unique and personalized pitch rooted in real data about that specific business's revenue gaps. It does not send volume — it sends signal. The output is not 500 emails; it is 500 diagnostic gifts, each one different, each one demonstrating competence about the recipient's specific situation before making any ask.

This requires 11-source enrichment, multi-tier AI logic, and a scoring system that can differentiate between a high-opportunity lead and a time-waster at the enrichment stage before any AI compute is spent on pitch generation.

The 11-Source Enrichment Architecture

Every lead that enters the pipeline is run through eleven data sources before any AI gets involved. The enrichment runs in parallel (using Python asyncio and ThreadPoolExecutor) to keep total enrichment time under 8 seconds per lead even with 11 simultaneous API calls:

Google PageSpeed Insights (PSI API): Mobile and desktop performance scores plus Core Web Vitals (LCP, INP, CLS; INP replaced FID as a Core Web Vital in March 2024). A business with a PSI score below 40 is likely losing a large share of its mobile visitors (our working estimate is 50%+). This is a quantifiable revenue leak we can reference precisely in the pitch.
Wappalyzer tech stack detection: What CMS, analytics, CRM, live chat, payment processor, and marketing tools are they running? No Google Analytics = flying blind. No email capture = no lifecycle marketing. No CRM = manual follow-up only. Each absence is a pitch angle.
Hunter.io email discovery: Find the primary decision-maker email with confidence scoring. Skip leads where Hunter confidence is below 70% — bounces hurt sender reputation.
WHOIS domain data: Domain age, registrar, expiry date. A domain registered 6 months ago signals a new business that may be more open to infrastructure investment. A 10-year-old domain with no SSL signals neglect.
Open Graph metadata scrape: What does their homepage meta title and description say? A meta description that is "Welcome to our website" signals zero SEO investment — another pitch angle.
Social media presence check: Do they have active Instagram, Facebook, LinkedIn? Last post date? A business with 2,000 Instagram followers and no posts since Q3 2024 has a content marketing gap we can fill.
Abstract API geolocation + company data: Confirm location, extract company size signals, validate that the business is active.
Trustpilot review scrape: Average rating, review count, most recent review date and sentiment. A 3.1/5 average with 12 reviews is a reputation management opportunity.
Google Maps / TripAdvisor scrape (for local businesses): Ratings, review velocity, response rate to reviews. A restaurant that never responds to reviews has a CRM gap.
LinkedIn company page check: Employee count, growth rate, recent posts, hiring signals. A company hiring 3 sales roles suggests revenue growth and budget availability.
Unsplash / visual asset check: For agencies pitching content services, check if the business is using generic stock images. Custom photography vs. stock is a visible creative quality signal.
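
The parallel fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the production engine: the three `fetch_*` functions are hypothetical stand-ins for the real source clients (only three of the eleven are shown), and the 8-second budget is applied here as a per-source timeout, which is an assumption about how the limit is enforced.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for blocking source clients. A real version would
# wrap one HTTP call each (PSI, Hunter.io, WHOIS, and so on).
def fetch_pagespeed(domain: str) -> dict:
    time.sleep(0.05)  # simulate network latency
    return {"mobile_score": 34}


def fetch_hunter(domain: str) -> dict:
    time.sleep(0.05)
    return {"email": f"owner@{domain}", "confidence": 82}


def fetch_whois(domain: str) -> dict:
    raise TimeoutError("registrar lookup timed out")  # simulate a failing source


SOURCES = {"pagespeed": fetch_pagespeed, "hunter": fetch_hunter, "whois": fetch_whois}


async def enrich(domain: str, sources=None, timeout: float = 8.0) -> dict:
    """Run every source concurrently; one failing source never sinks the lead."""
    sources = sources or SOURCES
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        calls = [
            asyncio.wait_for(loop.run_in_executor(pool, fetch, domain), timeout)
            for fetch in sources.values()
        ]
        # return_exceptions=True keeps partial results when a source fails.
        results = await asyncio.gather(*calls, return_exceptions=True)

    data, errors = {}, {}
    for name, result in zip(sources, results):
        if isinstance(result, Exception):
            errors[name] = repr(result)
        else:
            data[name] = result
    return {"domain": domain, "data": data, "errors": errors}


def is_contactable(lead: dict, min_confidence: int = 70) -> bool:
    """Apply the Hunter rule: skip leads below the confidence floor."""
    hunter = lead["data"].get("hunter")
    return bool(hunter) and hunter["confidence"] >= min_confidence


if __name__ == "__main__":
    lead = asyncio.run(enrich("example.com"))
    print(sorted(lead["data"]), sorted(lead["errors"]), is_contactable(lead))
```

One caveat with this design: `asyncio.wait_for` stops waiting on a slow source, but it cannot interrupt the worker thread, so each blocking client should also set its own HTTP timeout.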

The Three-Tier AI Scoring and Pitch Logic

Raw enrichment data is not a pitch. It is signal. The AI layer converts signal into a scored lead and a personalized pitch using a three-tier architecture that balances cost and quality:

Tier 1 — Gemini 2.5 Flash (scoring): Every enriched lead is scored on a 0-100 "revenue leak probability" scale using a structured prompt that weighs the enrichment signals. PSI below 40 (+20 points), no CRM detected (+15), no email capture (+15), review rating below 3.5 (+10), last social post over 90 days (+10), etc. Flash is fast and cheap — this runs on every lead, including the ones we will discard.
Tier 2: Gemini 2.5 Pro (lead research, score above 50): For leads that score above 50, Pro does a deeper analysis pass. It examines the combination of signals, identifies the single highest-impact pitch angle, and generates the diagnostic framing (how do we present this as a discovery, not a criticism?).
Tier 3: Claude Sonnet 4.6 (pitch generation, score above 65): Only leads scoring above 65 get a full Claude-generated pitch. Sonnet writes the personalized opener, the diagnostic narrative, and the CTA, using the Pro analysis as context. This is where the quality that drives 14-18% reply rates comes from. Claude costs roughly 40x more per token than Flash, so we only spend it where it matters.
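
To make the gating concrete, here is a minimal deterministic sketch of the weights and thresholds listed above. In the pipeline itself the score comes from a structured Flash prompt, not from hard-coded rules, so treat this as an illustration of the routing logic only. The `Signals` fields and stage names are hypothetical, and only the five weights named above are included (the "etc." signals are left out, so this sketch tops out at 70).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Signals:
    """Hypothetical subset of enrichment output used for scoring."""
    psi_mobile: Optional[int] = None          # PageSpeed mobile score, 0-100
    has_crm: bool = True
    has_email_capture: bool = True
    review_rating: Optional[float] = None     # average rating out of 5
    days_since_last_post: Optional[int] = None


def leak_score(s: Signals) -> int:
    """Sum the weights named in the article; missing data adds nothing."""
    score = 0
    if s.psi_mobile is not None and s.psi_mobile < 40:
        score += 20
    if not s.has_crm:
        score += 15
    if not s.has_email_capture:
        score += 15
    if s.review_rating is not None and s.review_rating < 3.5:
        score += 10
    if s.days_since_last_post is not None and s.days_since_last_post > 90:
        score += 10
    return min(score, 100)


def stages_for(score: int) -> Tuple[str, ...]:
    """Every lead is scored; research needs > 50, a full pitch needs > 65."""
    stages = ["flash_score"]
    if score > 50:
        stages.append("pro_research")
    if score > 65:
        stages.append("sonnet_pitch")  # the pitch reuses the research as context
    return tuple(stages)


if __name__ == "__main__":
    lead = Signals(psi_mobile=34, has_crm=False, has_email_capture=False,
                   review_rating=3.1, days_since_last_post=120)
    score = leak_score(lead)
    print(score, stages_for(score))
```

The thresholds are read as strict ("above 50", "above 65"); if the intended rule is inclusive, the comparisons would change to `>=`.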

The Output: What Actually Gets Sent

After enrichment and three-tier AI processing, a high-scoring lead gets a pitch that looks nothing like a template. It references their specific PSI score ("Your mobile site loads in 8.4 seconds — you're losing roughly 6 out of 10 visitors before they see your product"). It names the missing tool ("I noticed you're not running any email capture — that means every visitor who doesn't buy immediately is gone forever"). It frames the problem as a cost, not a criticism. And it offers something specific and free as the CTA — a full audit report, not a "discovery call."

This pipeline processes 500 leads per day on a single laptop running n8n workflows connected to the Python enrichment engine. About 18% of those 500 leads score above 65 and get a full Claude pitch. The remaining 82% get scored, archived, and optionally contacted with a lighter-touch sequence. Total compute cost per day: approximately $12-18 in API fees. Reply volume: 12-18 qualified responses per day from the top-scored cohort.
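
As a sanity check, the figures above can be reproduced with a few lines of arithmetic. This sketch assumes the 14-18% reply rate quoted earlier applies to the Claude-pitched cohort only; the variable names are illustrative.

```python
LEADS_PER_DAY = 500
TOP_COHORT_SHARE = 0.18          # share of leads scoring above 65
REPLY_RATE = (0.14, 0.18)        # assumed to apply to the top cohort only
DAILY_COST_USD = (12, 18)        # total API fees per day

# Leads that receive a full Claude pitch each day.
top_cohort = round(LEADS_PER_DAY * TOP_COHORT_SHARE)

# Expected replies per day at the low and high end of the reply rate.
replies = tuple(round(top_cohort * rate, 1) for rate in REPLY_RATE)

# Blended API cost per processed lead, including the archived ones.
cost_per_lead = tuple(round(cost / LEADS_PER_DAY, 3) for cost in DAILY_COST_USD)

if __name__ == "__main__":
    print(f"pitched leads/day: {top_cohort}")
    print(f"expected replies/day: {replies[0]} to {replies[1]}")
    print(f"cost per lead: ${cost_per_lead[0]} to ${cost_per_lead[1]}")
```

That works out to 90 pitched leads, about 12.6 to 16.2 replies, and roughly $0.024 to $0.036 per lead, which sits inside the 12-18 replies per day quoted above.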

See the full Karachi agency methodology for how this pipeline feeds into the client acquisition funnel, or the SEO Audit tool to understand what the enrichment data looks like in practice.
