2.2 — AutoGen: Building Conversational Agents
While CrewAI is great for role-based task pipelines, Microsoft AutoGen is the king of conversational agents that "Talk and Code" their way to a solution. In this lesson, we learn how to architect an AutoGen swarm where agents iterate on each other's work autonomously.
The AutoGen Conversation Logic
- The User Proxy: Acts as the bridge between you and the swarm.
- The Assistant Agent: The primary worker (e.g., The Coder).
- The Critic Agent: Reviews the output and provides feedback loops until the goal is met.
Technical Snippet: A Basic AutoGen Swarm
import autogen

# Model config: swap in a real API key before running.
config_list = [{"model": "gpt-4", "api_key": "..."}]

# The Assistant writes code; the User Proxy executes it in ./coding,
# feeding output back until the Assistant signals it is finished.
assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})

user_proxy.initiate_chat(assistant, message="Write a Python script to scrape 10 leads from LinkedIn.")
Nuance: Code Execution Safety
AutoGen agents can execute the code they write. This is powerful but dangerous. A professional architect always runs AutoGen inside a Docker container so the agents cannot accidentally delete files on the host machine.
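A minimal sketch of that sandboxing, assuming the AutoGen 0.2-style API and a running local Docker daemon; use_docker is the relevant switch:

import autogen

# With use_docker=True, each generated code block runs inside a
# container instead of directly on the host machine.
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",      # fully autonomous; no human prompt per turn
    code_execution_config={
        "work_dir": "coding",      # files the agents create land here
        "use_docker": True,        # sandbox all code execution in Docker
    },
)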
AutoGen vs. CrewAI: When to Use Which
Understanding when to reach for AutoGen vs. CrewAI saves you significant time in architectural decisions:
| Dimension | AutoGen | CrewAI |
|---|---|---|
| Primary strength | Conversational code iteration loops | Role-based task orchestration |
| Best use case | Code writing, testing, and self-correction | Multi-step research, content pipelines, analysis |
| Agent communication | Conversational back-and-forth | Sequential or hierarchical task passing |
| Code execution | Built-in, sandboxable | Requires external tools |
| Learning curve | Moderate | Moderate |
| Cost per run (PKR) | ~PKR 3-5 per session | ~PKR 2-4 per session |
| Pakistani use cases | Upwork proposal bots, automated code QA | Market research, SEO pipelines, competitor analysis |
Rule of thumb:
- Need agents to write AND test code? → AutoGen
- Need agents to work through a research or content pipeline? → CrewAI
- Need both? → Use AutoGen within a CrewAI task (they're composable; see the sketch below)
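They compose cleanly because an entire AutoGen conversation can hide behind a single function call. A minimal sketch, assuming CrewAI's tool decorator import path (it has moved between versions) and the AutoGen 0.2-style API; treat it as a pattern, not a drop-in:

import autogen
from crewai.tools import tool  # in older releases: from crewai_tools import tool

config_list = [{"model": "gpt-4", "api_key": "..."}]

@tool("autogen_code_loop")
def autogen_code_loop(task: str) -> str:
    """Delegate a coding task to an AutoGen write-test-fix loop."""
    assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
    proxy = autogen.UserProxyAgent(
        "proxy",
        human_input_mode="NEVER",
        code_execution_config={"work_dir": "coding", "use_docker": True},
    )
    # The proxy executes the Assistant's code until done or max_turns is hit.
    result = proxy.initiate_chat(assistant, message=task, max_turns=5)
    return result.summary  # the final answer flows back into the CrewAI pipeline

# A CrewAI agent can now receive this loop via tools=[autogen_code_loop].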
Pakistan Application: The Upwork Proposal Bot
Build an AutoGen system that writes Upwork proposals for Pakistani freelancers:
Agent Setup:
- User Proxy: Feeds the job description
- Assistant: Writes the initial proposal (mentioning PKR pricing, Karachi timezone, relevant skills)
- Critic: Reviews the proposal and checks for: (1) Grammar errors (2) Missing keywords from the job post (3) Price competitiveness
The Loop:
- Assistant writes proposal → Critic reviews → "Your opening line is too generic. Add a specific technical insight about the client's tech stack."
- Assistant rewrites → Critic approves → "This is ready to submit."
Why this matters for Pakistani freelancers: On Upwork, 90% of proposals are generic copy-paste. An AutoGen swarm that iterates 3-4 times produces proposals that read like they were hand-crafted. Pakistani freelancers using this approach report 3x higher response rates.
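A hedged sketch of this three-agent wiring using AutoGen's GroupChat; the agent names, system messages, and round limit are illustrative assumptions, not a tested configuration:

import autogen

config_list = [{"model": "gpt-4", "api_key": "..."}]
llm_config = {"config_list": config_list}

proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,  # this swarm writes prose, not code
)
writer = autogen.AssistantAgent(
    "proposal_writer",
    system_message="Write Upwork proposals: PKR pricing, PKT timezone, job-specific skills.",
    llm_config=llm_config,
)
critic = autogen.AssistantAgent(
    "proposal_critic",
    system_message=(
        "Review proposals for grammar, missing job-post keywords, and price "
        "competitiveness. Reply APPROVED only when the proposal is ready to submit."
    ),
    llm_config=llm_config,
)

# The manager routes turns between writer and critic until approval or max_round.
group = autogen.GroupChat(agents=[proxy, writer, critic], messages=[], max_round=8)
manager = autogen.GroupChatManager(groupchat=group, llm_config=llm_config)
proxy.initiate_chat(manager, message="JOB POST: paste the Upwork job description here")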
Extending the Proposal Bot: Adding Context Awareness
The basic Upwork proposal bot improves dramatically when the Critic has access to market data:
# Enhanced Critic Agent with market context
CRITIC_SYSTEM_PROMPT = """
You are a ruthless Upwork proposal editor. You have access to:
MARKET CONTEXT:
- Current average reply rate for Pakistani freelancers: 8%
- Top 1% reply rate: 24%
- Most common rejection reason: Generic opening line
RUBRIC (reject if ANY rule fails):
1. First 10 words must reference something specific from the job post
(NOT: "I am a developer", YES: "Your Django app's auth issue sounds like...")
2. No generic phrases: "I am passionate", "I am skilled", "I have experience"
3. Must name the timezone explicitly if client is US/UK-based
4. Must include ONE concrete technical insight about their problem
5. Must end with a specific question (not a closing statement)
6. Under 130 words total
For every rejection, give:
- Which rule failed
- The exact phrase that violated the rule
- One example of how to fix it
"""
# This context transforms the Critic from a grammar checker
# to a conversion-rate optimizer
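Attaching the rubric to an agent is then one constructor call, assuming the AutoGen 0.2-style AssistantAgent API and the config_list from earlier:

import autogen

critic = autogen.AssistantAgent(
    "proposal_critic",
    system_message=CRITIC_SYSTEM_PROMPT,  # the rubric above becomes the agent's persona
    llm_config={"config_list": config_list},
)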
The iteration quality curve:
AUTOGEN PROPOSAL QUALITY PER ITERATION
Iteration 1 (initial draft):
Quality: Generic, could be from any freelancer
Reply rate equivalent: ~3-5%
Iteration 2 (after first critique):
Quality: More specific, still has generic phrases
Reply rate equivalent: ~8-10%
Iteration 3 (after second critique):
Quality: Client-specific, technical insight present
Reply rate equivalent: ~15-20%
Iteration 4+ (diminishing returns):
Quality: Marginal improvements to phrasing
Reply rate equivalent: ~20-24%
OPTIMAL STOP POINT: 3 iterations
Cost: 3x LLM calls (~PKR 2-3 per proposal set)
Value: 3-5x higher reply rate vs. 1 iteration
ROI: 50-100x on proposal generation cost
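To enforce that stop point in code, cap the turn count and terminate early on approval. A sketch reusing CRITIC_SYSTEM_PROMPT and config_list from above; the "APPROVED" marker is our own convention, while max_turns and is_termination_msg are standard AutoGen 0.2 hooks:

import autogen

llm_config = {"config_list": config_list}

writer = autogen.AssistantAgent(
    "proposal_writer",
    llm_config=llm_config,
    # stop as soon as the Critic's reply contains the approval marker
    is_termination_msg=lambda msg: "APPROVED" in (msg.get("content") or ""),
)
critic = autogen.AssistantAgent(
    "proposal_critic",
    system_message=CRITIC_SYSTEM_PROMPT + "\nReply APPROVED when every rule passes.",
    llm_config=llm_config,
)

# max_turns=6 caps the chat at three writer/critic round trips,
# the optimal stop point from the curve above.
critic.initiate_chat(writer, message="JOB POST: ...", max_turns=6)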
Visual Reference
AutoGen Conversational Loop
┌───────────────────────────────────────┐
│ USER PROXY AGENT (You)                │
│ "Write code to scrape 10 restaurants" │
└───────────────────┬───────────────────┘
                    │
                    ↓
           ┌─────────────────┐
           │ ASSISTANT AGENT │
           │ (The Coder)     │
           └────────┬────────┘
                    │
                    │ "Here's my Python code..."
                    ↓
           ┌─────────────────┐
           │ CRITIC AGENT    │
           │ (Test & Review) │
           └────────┬────────┘
                    │
                    │ "Tests passed" OR
                    │ "ERROR: NameError on line 5"
                    ↓
           ┌─────────────────┐
           │ Loop Decision   │
           │ Pass? → Stop    │
           │ Fail? → Retry   │
           └────────┬────────┘
                    │
                    └──→ Back to ASSISTANT AGENT
                         (Max 5 iterations)
Pakistan Case Study: The Self-Correcting Proposal Bot
Adnan was a DevOps engineer from Gulberg, Lahore. He built a simple two-agent AutoGen system: an Assistant that wrote Upwork proposals and a Critic that reviewed them against a strict rubric.
His Critic Agent's rubric:
PROPOSAL CRITIC RULES:
1. First sentence must reference a specific detail from the job post
2. No generic phrases: "I am a skilled developer with X years"
3. Must mention timezone overlap (PKT vs. EST/GMT)
4. Must include ONE specific technical insight about the client's stack
5. Must end with a SINGLE clear question (not "looking forward to hearing from you")
6. Under 120 words
If ANY rule fails: Reject with specific reason. Force rewrite.
Sample critic output (iteration 1):
"REJECTED: Rule 5 failed. Closing line is generic — 'I look forward to hearing from you.' Replace with a specific question about their deployment environment or timeline. Also: Rule 1 fails — first sentence says 'I am an experienced DevOps engineer' instead of referencing their stated blocker (AWS Lambda cold start issue)."
After 3 iterations, approved proposal:
"Your Lambda cold start issue is causing real latency — I've fixed this pattern twice using provisioned concurrency + connection pooling. I can map your entire function architecture in 30 minutes and tell you exactly where the latency is coming from. What's your current average cold start time? That number determines the approach."
Results over 30 days (150 proposals submitted):
| Metric | Before AutoGen | After AutoGen |
|---|---|---|
| Avg iterations per proposal | 1 | 3.2 |
| Reply rate | 6% | 22% |
| Client interview rate | 2% | 9% |
| Contract win rate | 1.2% | 6.5% |
| Monthly income | PKR 65,000 | PKR 240,000 |
Adnan's key insight: "The critic is ruthless. It rejects good proposals if they're not great. That ruthlessness is the feature, not the bug."
Practice Lab: The Feedback Loop
Exercise 1:
- Setup: Create an Assistant and a Critic agent using either AutoGen or two separate Claude chat sessions.
- Task: Ask them to "Write a viral headline for a Pakistani food brand."
- Loop: Watch the Critic reject the first 3 headlines and force the Assistant to improve the emotional hook. Note how the final output is 10x better than the first attempt.
Exercise 2: Build a self-healing code assistant (a starter test harness for the Critic is sketched below).
- The Assistant writes a Python function (e.g., "Sort a list of Pakistani cities by population").
- The Critic tests it with 3 edge cases: empty list, single item, duplicate values.
- If any test fails, the Critic sends the error message back; the Assistant fixes and resubmits.
- Stop when all 3 tests pass.
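A starter harness for the Critic's side of Exercise 2. The function name sort_cities and the (name, population) tuple format are hypothetical choices; the point is that the Critic returns a machine-readable pass/fail string the Assistant can act on.

# Hypothetical contract: sort_cities takes [(name, population), ...] and
# returns the list sorted by population in ascending order.
def run_edge_cases(sort_cities):
    cases = [
        [],                                                # empty list
        [("Karachi", 20_000_000)],                         # single item
        [("Lahore", 13_000_000), ("Multan", 13_000_000)],  # duplicate values
    ]
    for case in cases:
        try:
            expected = sorted(case, key=lambda city: city[1])
            result = sort_cities(case)
            assert result == expected, f"wrong order for {case!r}"
        except Exception as exc:
            return f"FAIL: {exc}"  # the Critic sends this back to the Assistant
    return "PASS: all 3 edge cases passed"  # loop terminates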
Exercise 3: Apply the Proposal Bot critic from this lesson to your own workflow.
- Take your last 3 Upwork proposals and run them through the 6-rule rubric manually.
- How many would be rejected at Rule 1 (generic opening)?
- Rewrite each to pass all 6 rules, then submit the rewritten versions and compare response rates.
Key Takeaways
- AutoGen specializes in conversational code generation loops. The Critic Agent forces iteration until the output passes tests — not just until it looks right.
- Code execution safety requires Docker containers. An agent that can write AND execute code can also delete your files if not sandboxed.
- The feedback loop architecture (Assistant → Critic → Retry) is the core pattern for self-healing systems. Set a max iteration limit (5-7) to prevent infinite loops.
- For Pakistani freelancers, Upwork proposal iteration (3-4 critique cycles) consistently outperforms one-shot generation — higher reply rates, more specific tone.
- The Critic's rubric is the most important component. A vague rubric produces vague improvements. A precise rubric (6 numbered rules with examples) produces measurable quality gains.
- AutoGen and CrewAI are complementary: AutoGen handles code iteration, CrewAI handles multi-step pipelines. Build with both when your workflow requires both.
Homework: The Self-Correcting Coder
Design an AutoGen system that: (1) writes a Python function, (2) runs a test on it, and (3) if the test fails, reads the error and fixes the code autonomously. Use Gemini 2.5 Flash as the model; it is far cheaper than GPT-4 for Pakistani developers on a budget (approximately PKR 0.50 per iteration vs. PKR 5+ for GPT-4).
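A hedged sketch of the model configuration, assuming your AutoGen install includes the Gemini extra (pip install "pyautogen[gemini]"); the exact model string and api_type value depend on the installed version:

import autogen

# Point AutoGen at Gemini instead of GPT-4 (assumed config keys; verify
# against your AutoGen version's Gemini docs).
config_list = [{
    "model": "gemini-2.5-flash",   # the model named in the homework brief
    "api_key": "...",              # your Google AI Studio key
    "api_type": "google",
}]

coder = autogen.AssistantAgent("coder", llm_config={"config_list": config_list})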