2.2 — AutoGen: Building Conversational Agents
While CrewAI is great for role-based task pipelines, Microsoft AutoGen is the king of conversational agents that "Talk and Code" their way to a solution. In this lesson, we learn how to architect an AutoGen swarm where agents iterate on each other's work autonomously.
The AutoGen Conversation Logic
- The User Proxy: Acts as the bridge between you and the swarm.
- The Assistant Agent: The primary worker (e.g., The Coder).
- The Critic Agent: Reviews the output and provides feedback loops until the goal is met.
Technical Snippet: A Basic AutoGen Swarm
import autogen

# Model config: swap in a real API key before running.
config_list = [{"model": "gpt-4", "api_key": "..."}]

# The Assistant writes code; the User Proxy executes it in ./coding,
# feeding output back until the Assistant signals it is finished.
assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})

user_proxy.initiate_chat(assistant, message="Write a Python script to scrape 10 leads from LinkedIn.")
Nuance: Code Execution Safety
AutoGen agents can execute the code they write. This is powerful but dangerous. A professional architect always runs AutoGen inside a Docker container so the agents cannot accidentally delete files on the host machine.
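A minimal sketch of that sandboxing, assuming the AutoGen 0.2-style API and a running local Docker daemon; use_docker is the relevant switch:

import autogen

# With use_docker=True, each generated code block runs inside a
# container instead of directly on the host machine.
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",      # fully autonomous; no human prompt per turn
    code_execution_config={
        "work_dir": "coding",      # files the agents create land here
        "use_docker": True,        # sandbox all code execution in Docker
    },
)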
AutoGen vs. CrewAI: When to Use Which
Understanding when to reach for AutoGen vs. CrewAI saves you significant time in architectural decisions:
| Dimension | AutoGen | CrewAI |
|---|---|---|
| Primary strength | Conversational code iteration loops | Role-based task orchestration |
| Best use case | Code writing, testing, and self-correction | Multi-step research, content pipelines, analysis |
| Agent communication | Conversational back-and-forth | Sequential or hierarchical task passing |
| Code execution | Built-in, sandboxable | Requires external tools |
| Learning curve | Moderate | Moderate |
| Cost per run (PKR) | ~PKR 3-5 per session | ~PKR 2-4 per session |
| Pakistani use cases | Upwork proposal bots, automated code QA | Market research, SEO pipelines, competitor analysis |
Rule of thumb:
- Need agents to write AND test code? → AutoGen
- Need agents to work through a research or content pipeline? → CrewAI
- Need both? → Use AutoGen within a CrewAI task (they're composable; see the sketch below)
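They compose cleanly because an entire AutoGen conversation can hide behind a single function call. A minimal sketch, assuming CrewAI's tool decorator import path (it has moved between versions) and the AutoGen 0.2-style API; treat it as a pattern, not a drop-in:

import autogen
from crewai.tools import tool  # in older releases: from crewai_tools import tool

config_list = [{"model": "gpt-4", "api_key": "..."}]

@tool("autogen_code_loop")
def autogen_code_loop(task: str) -> str:
    """Delegate a coding task to an AutoGen write-test-fix loop."""
    assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
    proxy = autogen.UserProxyAgent(
        "proxy",
        human_input_mode="NEVER",
        code_execution_config={"work_dir": "coding", "use_docker": True},
    )
    # The proxy executes the Assistant's code until done or max_turns is hit.
    result = proxy.initiate_chat(assistant, message=task, max_turns=5)
    return result.summary  # the final answer flows back into the CrewAI pipeline

# A CrewAI agent can now receive this loop via tools=[autogen_code_loop].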
Pakistan Application: The Upwork Proposal Bot
Build an AutoGen system that writes Upwork proposals for Pakistani freelancers:
Agent Setup:
- User Proxy: Feeds the job description
- Assistant: Writes the initial proposal (mentioning PKR pricing, Karachi timezone, relevant skills)
- Critic: Reviews the proposal and checks for: (1) Grammar errors (2) Missing keywords from the job post (3) Price competitiveness
The Loop:
- Assistant writes proposal → Critic reviews → "Your opening line is too generic. Add a specific technical insight about the client's tech stack."
- Assistant rewrites → Critic approves → "This is ready to submit."
Why this matters for Pakistani freelancers: On Upwork, 90% of proposals are generic copy-paste. An AutoGen swarm that iterates 3-4 times produces proposals that read like they were hand-crafted. Pakistani freelancers using this approach report 3x higher response rates.
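A hedged sketch of this three-agent wiring using AutoGen's GroupChat; the agent names, system messages, and round limit are illustrative assumptions, not a tested configuration:

import autogen

config_list = [{"model": "gpt-4", "api_key": "..."}]
llm_config = {"config_list": config_list}

proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,  # this swarm writes prose, not code
)
writer = autogen.AssistantAgent(
    "proposal_writer",
    system_message="Write Upwork proposals: PKR pricing, PKT timezone, job-specific skills.",
    llm_config=llm_config,
)
critic = autogen.AssistantAgent(
    "proposal_critic",
    system_message=(
        "Review proposals for grammar, missing job-post keywords, and price "
        "competitiveness. Reply APPROVED only when the proposal is ready to submit."
    ),
    llm_config=llm_config,
)

# The manager routes turns between writer and critic until approval or max_round.
group = autogen.GroupChat(agents=[proxy, writer, critic], messages=[], max_round=8)
manager = autogen.GroupChatManager(groupchat=group, llm_config=llm_config)
proxy.initiate_chat(manager, message="JOB POST: paste the Upwork job description here")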
Extending the Proposal Bot: Adding Context Awareness
The basic Upwork proposal bot improves dramatically when the Critic has access to market data:
# Enhanced Critic Agent with market context
CRITIC_SYSTEM_PROMPT = """
You are a ruthless Upwork proposal editor. You have access to:
MARKET CONTEXT:
- Current average reply rate for Pakistani freelancers: 8%
- Top 1% reply rate: 24%
- Most common rejection reason: Generic opening line
RUBRIC (reject if ANY rule fails):
1. First 10 words must reference something specific from the job post
(NOT: "I am a developer", YES: "Your Django app's auth issue sounds like...")
2. No generic phrases: "I am passionate", "I am skilled", "I have experience"
3. Must name the timezone explicitly if client is US/UK-based
4. Must include ONE concrete technical insight about their problem
5. Must end with a specific question (not a closing statement)
6. Under 130 words total
For every rejection, give:
- Which rule failed
- The exact phrase that violated the rule
- One example of how to fix it
"""
# This context transforms the Critic from a grammar checker
# to a conversion-rate optimizer
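Attaching the rubric to an agent is then one constructor call, assuming the AutoGen 0.2-style AssistantAgent API and the config_list from earlier:

import autogen

critic = autogen.AssistantAgent(
    "proposal_critic",
    system_message=CRITIC_SYSTEM_PROMPT,  # the rubric above becomes the agent's persona
    llm_config={"config_list": config_list},
)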
The iteration quality curve:
AUTOGEN PROPOSAL QUALITY PER ITERATION
Iteration 1 (initial draft):
Quality: Generic, could be from any freelancer
Reply rate equivalent: ~3-5%
Iteration 2 (after first critique):
Quality: More specific, still has generic phrases
Reply rate equivalent: ~8-10%
Iteration 3 (after second critique):
Quality: Client-specific, technical insight present
Reply rate equivalent: ~15-20%
Iteration 4+ (diminishing returns):
Quality: Marginal improvements to phrasing
Reply rate equivalent: ~20-24%
OPTIMAL STOP POINT: 3 iterations
Cost: 3x LLM calls (~PKR 2-3 per proposal set)
Value: 3-5x higher reply rate vs. 1 iteration
ROI: 50-100x on proposal generation cost
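To enforce that stop point in code, cap the turn count and terminate early on approval. A sketch reusing CRITIC_SYSTEM_PROMPT and config_list from above; the "APPROVED" marker is our own convention, while max_turns and is_termination_msg are standard AutoGen 0.2 hooks:

import autogen

llm_config = {"config_list": config_list}

writer = autogen.AssistantAgent(
    "proposal_writer",
    llm_config=llm_config,
    # stop as soon as the Critic's reply contains the approval marker
    is_termination_msg=lambda msg: "APPROVED" in (msg.get("content") or ""),
)
critic = autogen.AssistantAgent(
    "proposal_critic",
    system_message=CRITIC_SYSTEM_PROMPT + "\nReply APPROVED when every rule passes.",
    llm_config=llm_config,
)

# max_turns=6 caps the chat at three writer/critic round trips,
# the optimal stop point from the curve above.
critic.initiate_chat(writer, message="JOB POST: ...", max_turns=6)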
Visual Reference
AutoGen Conversational Loop
┌───────────────────────────────────────┐
│ USER PROXY AGENT (You)                │
│ "Write code to scrape 10 restaurants" │
└───────────────────┬───────────────────┘
                    │
                    ↓
           ┌─────────────────┐
           │ ASSISTANT AGENT │
           │ (The Coder)     │
           └────────┬────────┘
                    │
                    │ "Here's my Python code..."
                    ↓
           ┌─────────────────┐
           │ CRITIC AGENT    │
           │ (Test & Review) │
           └────────┬────────┘
                    │
                    │ "Tests passed" OR
                    │ "ERROR: NameError on line 5"
                    ↓
           ┌─────────────────┐
           │ Loop Decision   │
           │ Pass? → Stop    │
           │ Fail? → Retry   │
           └────────┬────────┘
                    │
                    └──→ Back to ASSISTANT AGENT
                         (Max 5 iterations)
Pakistan Case Study: The Self-Correcting Proposal Bot
Adnan was a DevOps engineer from Gulberg, Lahore. He built a simple two-agent AutoGen system: an Assistant that wrote Upwork proposals and a Critic that reviewed them against a strict rubric.
His Critic Agent's rubric:
PROPOSAL CRITIC RULES:
1. First sentence must reference a specific detail from the job post
2. No generic phrases: "I am a skilled developer with X years"
3. Must mention timezone overlap (PKT vs. EST/GMT)
4. Must include ONE specific technical insight about the client's stack
5. Must end with a SINGLE clear question (not "looking forward to hearing from you")
6. Under 120 words
If ANY rule fails: Reject with specific reason. Force rewrite.
Sample critic output (iteration 1):
"REJECTED: Rule 5 failed. Closing line is generic — 'I look forward to hearing from you.' Replace with a specific question about their deployment environment or timeline. Also: Rule 1 fails — first sentence says 'I am an experienced DevOps engineer' instead of referencing their stated blocker (AWS Lambda cold start issue)."
After 3 iterations, approved proposal:
"Your Lambda cold start issue is causing real latency — I've fixed this pattern twice using provisioned concurrency + connection pooling. I can map your entire function architecture in 30 minutes and tell you exactly where the latency is coming from. What's your current average cold start time? That number determines the approach."
Results over 30 days (150 proposals submitted):
| Metric | Before AutoGen | After AutoGen |
|---|---|---|
| Avg iterations per proposal | 1 | 3.2 |
| Reply rate | 6% | 22% |
| Client interview rate | 2% | 9% |
| Contract win rate | 1.2% | 6.5% |
| Monthly income | PKR 65,000 | PKR 240,000 |
Adnan's key insight: "The critic is ruthless. It rejects good proposals if they're not great. That ruthlessness is the feature, not the bug."
Practice Lab: The Feedback Loop
Exercise 1:
- Setup: Create an Assistant and a Critic agent using either AutoGen or two separate Claude chat sessions.
- Task: Ask them to "Write a viral headline for a Pakistani food brand."
- Loop: Watch the Critic reject the first 3 headlines and force the Assistant to improve the emotional hook. Note how the final output is 10x better than the first attempt.
Exercise 2: Build a self-healing code assistant (a starter test harness for the Critic is sketched below).
- The Assistant writes a Python function (e.g., "Sort a list of Pakistani cities by population").
- The Critic tests it with 3 edge cases: empty list, single item, duplicate values.
- If any test fails, the Critic sends the error message back; the Assistant fixes and resubmits.
- Stop when all 3 tests pass.
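A starter harness for the Critic's side of Exercise 2. The function name sort_cities and the (name, population) tuple format are hypothetical choices; the point is that the Critic returns a machine-readable pass/fail string the Assistant can act on.

# Hypothetical contract: sort_cities takes [(name, population), ...] and
# returns the list sorted by population in ascending order.
def run_edge_cases(sort_cities):
    cases = [
        [],                                                # empty list
        [("Karachi", 20_000_000)],                         # single item
        [("Lahore", 13_000_000), ("Multan", 13_000_000)],  # duplicate values
    ]
    for case in cases:
        try:
            expected = sorted(case, key=lambda city: city[1])
            result = sort_cities(case)
            assert result == expected, f"wrong order for {case!r}"
        except Exception as exc:
            return f"FAIL: {exc}"  # the Critic sends this back to the Assistant
    return "PASS: all 3 edge cases passed"  # loop terminates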
Exercise 3: Apply the Proposal Bot critic from this lesson to your own workflow.
- Take your last 3 Upwork proposals and run them through the 6-rule rubric manually.
- How many would be rejected at Rule 1 (generic opening)?
- Rewrite each to pass all 6 rules, then submit the rewritten versions and compare response rates.
Key Takeaways
- AutoGen specializes in conversational code generation loops. The Critic Agent forces iteration until the output passes tests — not just until it looks right.
- Code execution safety requires Docker containers. An agent that can write AND execute code can also delete your files if not sandboxed.
- The feedback loop architecture (Assistant → Critic → Retry) is the core pattern for self-healing systems. Set a max iteration limit (5-7) to prevent infinite loops.
- For Pakistani freelancers, Upwork proposal iteration (3-4 critique cycles) consistently outperforms one-shot generation — higher reply rates, more specific tone.
- The Critic's rubric is the most important component. A vague rubric produces vague improvements. A precise rubric (6 numbered rules with examples) produces measurable quality gains.
- AutoGen and CrewAI are complementary: AutoGen handles code iteration, CrewAI handles multi-step pipelines. Build with both when your workflow requires both.
Homework: The Self-Correcting Coder
Design an AutoGen system that: (1) writes a Python function, (2) runs a test on it, and (3) if the test fails, reads the error and fixes the code autonomously. Use Gemini 2.5 Flash as the model; it is far cheaper than GPT-4 for Pakistani developers on a budget (approximately PKR 0.50 per iteration vs. PKR 5+ for GPT-4).
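A hedged sketch of the model configuration, assuming your AutoGen install includes the Gemini extra (pip install "pyautogen[gemini]"); the exact model string and api_type value depend on the installed version:

import autogen

# Point AutoGen at Gemini instead of GPT-4 (assumed config keys; verify
# against your AutoGen version's Gemini docs).
config_list = [{
    "model": "gemini-2.5-flash",   # the model named in the homework brief
    "api_key": "...",              # your Google AI Studio key
    "api_type": "google",
}]

coder = autogen.AssistantAgent("coder", llm_config={"config_list": config_list})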