Autonomous AI Agents - Module 2

2.2 AutoGen: Building Conversational Agents

35 min · 5 code blocks · Practice Lab · Homework · Quiz (5Q)

AutoGen: Building Conversational Agents

While CrewAI is great for role-based tasks, Microsoft AutoGen is the king of conversational agents that "talk and code" their way to a solution. In this lesson, you learn how to architect an AutoGen swarm whose agents iterate on each other's work autonomously.

The Autogen Conversation Logic

  1. The User Proxy: Acts as the bridge between you and the swarm.
  2. The Assistant Agent: The primary worker (e.g., The Coder).
  3. The Critic Agent: Reviews the output and provides feedback loops until the goal is met.

Technical Snippet: A Basic AutoGen Swarm

python
import autogen

# Replace "..." with your actual API key.
config_list = [{"model": "gpt-4", "api_key": "..."}]

# The Assistant writes the code; the User Proxy executes it in ./coding.
assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # run autonomously instead of pausing for your input
    code_execution_config={"work_dir": "coding"},
)

user_proxy.initiate_chat(assistant, message="Write a Python script to scrape 10 leads from LinkedIn.")

Nuance: Code Execution Safety

AutoGen agents can execute the code they write. This is powerful but dangerous: a professional architect always runs AutoGen code execution inside a Docker container so the agents cannot accidentally delete files on the host machine.
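
As a concrete illustration, here is the shape of a sandboxed execution config. The key names follow the classic pyautogen `code_execution_config` dict; verify them against your installed version before relying on them.

```python
# Sketch: a sandboxed code_execution_config for autogen.UserProxyAgent.
# Key names follow the classic pyautogen API; check your installed version.
sandboxed_exec_config = {
    "work_dir": "coding",  # files the agent writes land here (mounted into the container)
    "use_docker": True,    # execute generated code inside a Docker container, not on the host
    "timeout": 60,         # kill runaway scripts after 60 seconds
}

# Usage (with agents configured as in the snippet above):
# user_proxy = autogen.UserProxyAgent("user_proxy",
#                                     code_execution_config=sandboxed_exec_config)
```

With `use_docker` set, each execution happens in a throwaway container, so a bad `rm -rf` only destroys the container's filesystem.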

AutoGen vs. CrewAI: When to Use Which

Understanding when to reach for AutoGen vs. CrewAI saves you significant time in architectural decisions:

| Dimension | AutoGen | CrewAI |
|---|---|---|
| Primary strength | Conversational code iteration loops | Role-based task orchestration |
| Best use case | Code writing, testing, and self-correction | Multi-step research, content pipelines, analysis |
| Agent communication | Conversational back-and-forth | Sequential or hierarchical task passing |
| Code execution | Built-in, sandboxable | Requires external tools |
| Learning curve | Moderate | Moderate |
| Cost per run | ~PKR 3-5 per session | ~PKR 2-4 per session |
| Pakistani use cases | Upwork proposal bots, automated code QA | Market research, SEO pipelines, competitor analysis |

Rule of thumb:

  • Need agents to write AND test code? → AutoGen
  • Need agents to work through a research or content pipeline? → CrewAI
  • Need both? → Use AutoGen within a CrewAI task (they're composable)
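
One way to compose the two is to expose the AutoGen loop as a plain Python function that a CrewAI tool (or any orchestrator) can call. A minimal stubbed sketch: the actual AutoGen call is commented out, and `run_code_iteration` is a name invented here for illustration.

```python
# Illustrative composition sketch: wrap an AutoGen conversation in a plain
# function so a CrewAI tool (or any orchestrator) can invoke it.
# The real AutoGen call is stubbed out.
def run_code_iteration(task: str) -> str:
    # Real version, given configured agents:
    #   user_proxy.initiate_chat(assistant, message=task)
    #   return user_proxy.last_message()["content"]
    return f"[reviewed code for: {task}]"

print(run_code_iteration("Scrape 10 restaurant leads"))
```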

Pakistan Application: The Upwork Proposal Bot

Build an AutoGen system that writes Upwork proposals for Pakistani freelancers:

Agent Setup:

  • User Proxy: Feeds the job description
  • Assistant: Writes the initial proposal (mentioning PKR pricing, Karachi timezone, relevant skills)
  • Critic: Reviews the proposal and checks for: (1) Grammar errors (2) Missing keywords from the job post (3) Price competitiveness

The Loop:

  1. Assistant writes proposal → Critic reviews → "Your opening line is too generic. Add a specific technical insight about the client's tech stack."
  2. Assistant rewrites → Critic approves → "This is ready to submit."

Why this matters for Pakistani freelancers: On Upwork, 90% of proposals are generic copy-paste. An AutoGen swarm that iterates 3-4 times produces proposals that read like they were hand-crafted. Pakistani freelancers using this approach report 3x higher response rates.

Extending the Proposal Bot: Adding Context Awareness

The basic Upwork proposal bot improves dramatically when the Critic has access to market data:

python
# Enhanced Critic Agent with market context

CRITIC_SYSTEM_PROMPT = """
You are a ruthless Upwork proposal editor. You have access to:

MARKET CONTEXT:
- Current average reply rate for Pakistani freelancers: 8%
- Top 1% reply rate: 24%
- Most common rejection reason: Generic opening line

RUBRIC (reject if ANY rule fails):
1. First 10 words must reference something specific from the job post
   (NOT: "I am a developer", YES: "Your Django app's auth issue sounds like...")
2. No generic phrases: "I am passionate", "I am skilled", "I have experience"
3. Must name the timezone explicitly if client is US/UK-based
4. Must include ONE concrete technical insight about their problem
5. Must end with a specific question (not a closing statement)
6. Under 130 words total

For every rejection, give:
- Which rule failed
- The exact phrase that violated the rule
- One example of how to fix it
"""

# This context transforms the Critic from a grammar checker
# to a conversion-rate optimizer

The iteration quality curve:

code
AUTOGEN PROPOSAL QUALITY PER ITERATION

Iteration 1 (initial draft):
  Quality: Generic, could be from any freelancer
  Reply rate equivalent: ~3-5%

Iteration 2 (after first critique):
  Quality: More specific, still has generic phrases
  Reply rate equivalent: ~8-10%

Iteration 3 (after second critique):
  Quality: Client-specific, technical insight present
  Reply rate equivalent: ~15-20%

Iteration 4+ (diminishing returns):
  Quality: Marginal improvements to phrasing
  Reply rate equivalent: ~20-24%

OPTIMAL STOP POINT: 3 iterations
  Cost: 3x LLM calls (~PKR 2-3 per proposal set)
  Value: 3-5x higher reply rate vs. 1 iteration
  ROI: 50-100x on proposal generation cost
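
The stop-point arithmetic above is easy to sanity-check in code. The numbers below are the lesson's own estimates (midpoints of the ranges in the quality curve), not measurements.

```python
# Sanity-check the iteration economics using the lesson's own estimates.
cost_per_iteration_pkr = 1.0  # ~PKR 2-3 for a 3-iteration set, i.e. ~PKR 1/iteration

# Midpoints of the reply-rate ranges from the quality curve above.
reply_rate = {1: 0.04, 2: 0.09, 3: 0.175, 4: 0.22}

for n in (1, 2, 3, 4):
    cost = n * cost_per_iteration_pkr
    print(f"{n} iteration(s): ~PKR {cost:.0f}, est. reply rate {reply_rate[n]:.0%}")

# The marginal gain from iteration 3 -> 4 is small compared to 1 -> 3,
# which is why the lesson stops at 3.
gain_1_to_3 = reply_rate[3] - reply_rate[1]
gain_3_to_4 = reply_rate[4] - reply_rate[3]
```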

Visual Reference

code
AutoGen Conversational Loop

┌─────────────────────────────────────┐
│ USER PROXY AGENT (You)              │
│ "Write code to scrape restaurants"  │
└────────────┬────────────────────────┘
             │
             ↓
    ┌────────────────┐
    │ ASSISTANT      │
    │ AGENT          │
    │ (The Coder)    │
    └────┬───────────┘
         │
         │ "Here's my Python code..."
         ↓
    ┌────────────────┐
    │ CRITIC AGENT   │
    │ (Test & Review)│
    └────┬───────────┘
         │
         │ "Tests passed" OR
         │ "ERROR: NameError on line 5"
         ↓
    ┌────────────────┐
    │ Loop Decision  │
    │ Pass? → Stop   │
    │ Fail? → Retry  │
    └────┬───────────┘
         │
         └──→ Back to ASSISTANT AGENT
              (Max 5 iterations)
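
The loop in the diagram can be sketched as a plain Python driver. The "LLM" here is faked (the coder succeeds on its third attempt) so only the control flow is real; names like `fake_assistant` are placeholders, not AutoGen API.

```python
# Minimal sketch of the Assistant -> Critic -> retry loop with an iteration cap.
# The model calls are stubbed: the coder succeeds on its third attempt.
MAX_ITERATIONS = 5

def fake_assistant(attempt: int) -> str:
    return f"code draft v{attempt}"

def fake_critic(draft: str, attempt: int) -> tuple[bool, str]:
    if attempt < 3:
        return False, f"ERROR in {draft}: NameError on line 5"
    return True, "Tests passed"

def run_loop() -> tuple[str, int]:
    draft = ""
    for attempt in range(1, MAX_ITERATIONS + 1):
        draft = fake_assistant(attempt)          # Assistant writes
        passed, feedback = fake_critic(draft, attempt)  # Critic tests & reviews
        print(f"Iteration {attempt}: {feedback}")
        if passed:                               # Pass? -> stop
            return draft, attempt
    return draft, MAX_ITERATIONS                 # give up after the cap

final_draft, iterations = run_loop()
```

The cap matters: without it, a critic that never approves would loop forever and burn API credit.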

Pakistan Case Study: The Self-Correcting Proposal Bot

Adnan was a DevOps engineer from Gulberg, Lahore. He built a simple two-agent AutoGen system: an Assistant that wrote Upwork proposals and a Critic that reviewed them against a strict rubric.

His Critic Agent's rubric:

code
PROPOSAL CRITIC RULES:
1. First sentence must reference a specific detail from the job post
2. No generic phrases: "I am a skilled developer with X years"
3. Must mention timezone overlap (PKT vs. EST/GMT)
4. Must include ONE specific technical insight about the client's stack
5. Must end with a SINGLE clear question (not "looking forward to hearing from you")
6. Under 120 words
If ANY rule fails: Reject with specific reason. Force rewrite.

Sample critic output (iteration 1):

"REJECTED: Rule 5 failed. Closing line is generic — 'I look forward to hearing from you.' Replace with a specific question about their deployment environment or timeline. Also: Rule 1 fails — first sentence says 'I am an experienced DevOps engineer' instead of referencing their stated blocker (AWS Lambda cold start issue)."

After 3 iterations, approved proposal:

"Your Lambda cold start issue is causing real latency — I've fixed this pattern twice using provisioned concurrency + connection pooling. I can map your entire function architecture in 30 minutes and tell you exactly where the latency is coming from. What's your current average cold start time? That number determines the approach."

Results over 30 days (150 proposals submitted):

| Metric | Before AutoGen | After AutoGen |
|---|---|---|
| Avg iterations per proposal | 1 | 3.2 |
| Reply rate | 6% | 22% |
| Client interview rate | 2% | 9% |
| Contract win rate | 1.2% | 6.5% |
| Monthly income | PKR 65,000 | PKR 240,000 |

Adnan's key insight: "The critic is ruthless. It rejects good proposals if they're not great. That ruthlessness is the feature, not the bug."
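
Several of Adnan's rules are mechanically checkable before spending an LLM call on the Critic. A rough pre-flight sketch covering rules 2 and 6: the phrase list and threshold are illustrative, not Adnan's actual code.

```python
# Pre-flight checks for the mechanical parts of the rubric (rules 2 and 6).
# Phrase list and word limit are illustrative assumptions.
GENERIC_PHRASES = (
    "i am a skilled",
    "i am passionate",
    "i have experience",
    "looking forward to hearing from you",
)

def preflight(proposal: str) -> list[str]:
    """Return a list of rubric failures; empty list means it may proceed to the Critic."""
    failures = []
    text = proposal.lower()
    for phrase in GENERIC_PHRASES:
        if phrase in text:
            failures.append(f"Rule 2/5 failed: generic phrase '{phrase}'")
    if len(proposal.split()) > 120:
        failures.append("Rule 6 failed: over 120 words")
    return failures

print(preflight("I am a skilled developer with 5 years of experience."))
```

Running cheap checks first means the Critic's LLM budget is spent on judgment calls (specificity, technical insight), not on catching boilerplate.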

Practice Lab

Practice Lab: The Feedback Loop

Exercise 1: Setup: Create an Assistant and a Critic agent using either AutoGen or two separate Claude chat sessions. Task: Ask them to "Write a viral headline for a Pakistani food brand." Loop: Watch the Critic reject the first 3 headlines and force the Assistant to improve the emotional hook. Note how the final output is 10x better than the first attempt.

Exercise 2: Build a self-healing code assistant. Assistant writes a Python function (e.g., "Sort a list of Pakistani cities by population"). Critic tests it with 3 edge cases (empty list, single item, duplicate values). If any test fails: Critic sends the error message back. Assistant fixes and resubmits. Stop when all 3 tests pass.
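
Exercise 2's test harness can be written up front, independent of the agents. A sketch, where the city data and the function name `sort_cities` are made up for illustration (the Assistant would supply its own implementation):

```python
# Edge-case harness for Exercise 2: run the Critic's three tests against
# whatever function the Assistant produced. Data and names are illustrative.
def sort_cities(cities):
    # Stand-in for the Assistant's submission: sort (name, population)
    # pairs by population, descending.
    return sorted(cities, key=lambda c: c[1], reverse=True)

def run_critic_tests(fn) -> list[str]:
    errors = []
    if fn([]) != []:                                   # edge case 1: empty list
        errors.append("FAIL: empty list")
    if fn([("Quetta", 1001000)]) != [("Quetta", 1001000)]:  # edge case 2: single item
        errors.append("FAIL: single item")
    dup = [("A", 5), ("B", 5), ("C", 9)]               # edge case 3: duplicate values
    if [c[1] for c in fn(dup)] != [9, 5, 5]:
        errors.append("FAIL: duplicate populations")
    return errors

print(run_critic_tests(sort_cities))  # empty list means all three tests pass
```

In the real exercise, the Critic sends any `FAIL` lines back to the Assistant verbatim and the loop repeats until the list comes back empty.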

Exercise 3: Add the Proposal Bot critic from this lesson to your own workflow. Take your last 3 Upwork proposals. Run them through the 6-rule rubric manually. How many would be rejected at Rule 1 (generic opening)? Rewrite each to pass all 6 rules. Submit the rewritten versions and compare response rates.

Key Takeaways

  • AutoGen specializes in conversational code generation loops. The Critic Agent forces iteration until the output passes tests — not just until it looks right.
  • Code execution safety requires Docker containers. An agent that can write AND execute code can also delete your files if not sandboxed.
  • The feedback loop architecture (Assistant → Critic → Retry) is the core pattern for self-healing systems. Set a max iteration limit (5-7) to prevent infinite loops.
  • For Pakistani freelancers, Upwork proposal iteration (3-4 critique cycles) consistently outperforms one-shot generation — higher reply rates, more specific tone.
  • The Critic's rubric is the most important component. A vague rubric produces vague improvements. A precise rubric (6 numbered rules with examples) produces measurable quality gains.
  • AutoGen and CrewAI are complementary: AutoGen handles code iteration, CrewAI handles multi-step pipelines. Build with both when your workflow requires both.
Homework

Homework: The Self-Correcting Coder

Design an AutoGen system that: (1) writes a Python function, (2) runs a test on it, and (3) if the test fails, reads the error and fixes the code autonomously. Use Gemini 2.5 Flash as the model (cheaper than GPT-4 for Pakistani developers on a budget: approximately PKR 0.50 per iteration vs. PKR 5+ for GPT-4).

Lesson Summary

Includes: hands-on practice lab · homework assignment · 5 runnable code examples · 5-question knowledge check below

Quiz: AutoGen Conversational Agents

5 questions to test your understanding. Score 60% or higher to pass.