AI Prediction Markets — Module 3

3.2 Claude Haiku/Sonnet Tier — Deep Analysis & Confidence Calibration

35 min · 5 code blocks · Practice Lab · Quiz (4Q)

When Gemini Flash routes a signal to DEEP_ANALYZE, it passes the baton to Claude. This is where the real reasoning happens — the kind that catches nuance, reads between the lines of press releases, and produces the calibrated probability estimates your execution engine needs to bet real money. In this lesson, we build the Claude analysis layer with confidence calibration and structured output parsing.

Why Claude for Deep Analysis

Claude (Anthropic's model) excels at three capabilities that matter specifically for prediction market analysis:

Structured reasoning: Claude produces chain-of-thought reasoning that's systematic and auditable. You can read the reasoning and understand why it assigned a probability. This matters when debugging incorrect predictions.

Nuanced probability calibration: Unlike models that output extreme probabilities (0.95, 0.05) when uncertain, Claude tends toward well-calibrated middle-ground estimates that reflect genuine uncertainty. A 65% confidence is different from 90% confidence — and Claude reliably distinguishes these.

Long context analysis: For complex markets with detailed resolution criteria, Claude Sonnet's 200K context window allows you to paste the full resolution criteria, relevant background, and multiple news sources in a single prompt, producing much better analysis than if you summarized each input.

The Claude Analysis Prompt

python
import os
import json

import anthropic

def deep_analyze_signal(
    market_question: str,
    resolution_criteria: str,
    current_price: float,
    headline: str,
    headline_source: str,
    triage_direction: str,
    market_history: str = ""
) -> dict:
    """
    Deep analysis using Claude Sonnet.
    Returns calibrated probability estimate with reasoning.
    """
    client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

    prompt = f"""You are a professional prediction market analyst with expertise in probability calibration.

MARKET ANALYSIS REQUEST

Market Question: {market_question}

Resolution Criteria:
{resolution_criteria}

Current Market State:
- YES price: {current_price} (market's implied probability: {current_price:.0%})
- Triage direction: {triage_direction}

Breaking Signal:
- Source: {headline_source}
- Headline: {headline}

{f'Recent price history: {market_history}' if market_history else ''}

ANALYSIS TASK:
1. Assess how this headline changes the probability of YES resolution
2. Consider: source credibility, resolution criteria specifics, potential edge cases
3. Identify any Pakistani/South Asian context that Western traders may miss

CRITICAL: Your probability estimate must be CALIBRATED — if you say 0.70, you should be right about 70% of the time on similar calls. Avoid extremes (>0.92 or <0.08) unless you have overwhelming evidence.

Respond in JSON only:
{{
  "new_probability": <float 0.05-0.95>,
  "confidence_in_estimate": <float 0.1-1.0>,
  "probability_delta": <float, new minus current>,
  "reasoning": "<2-3 sentences explaining your logic>",
  "edge_cases": "<any resolution criteria nuances that could invalidate this analysis>",
  "trade_recommendation": "BUY_YES" | "BUY_NO" | "HOLD" | "AVOID",
  "position_size_signal": "LARGE" | "MEDIUM" | "SMALL" | "NONE"
}}"""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}]
    )

    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return {"trade_recommendation": "AVOID", "new_probability": current_price}
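
One weakness of the bare `json.loads` call above: it silently discards an analysis whenever Claude wraps the JSON in markdown fences or prepends commentary. A more forgiving parser is worth the extra lines. This is a sketch — the helper name `parse_analysis_json` is ours, not part of any API — that pulls the first JSON object it can find before falling back to AVOID:

```python
import json
import re

def parse_analysis_json(raw_text: str, fallback_probability: float) -> dict:
    """Extract the analysis JSON even if the model wraps it in markdown
    fences or adds commentary. Falls back to a safe AVOID response."""
    # Prefer a ```json ... ``` fenced block if one is present
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw_text, re.DOTALL)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Otherwise take the first {...} span in the text
        brace = re.search(r"\{.*\}", raw_text, re.DOTALL)
        candidate = brace.group(0) if brace else raw_text
    try:
        result = json.loads(candidate)
        if "trade_recommendation" in result:
            return result
    except json.JSONDecodeError:
        pass
    # Safe default: do nothing, keep the market's current probability
    return {"trade_recommendation": "AVOID", "new_probability": fallback_probability}
```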

Confidence Calibration Explained

A model is "well-calibrated" when its stated confidence matches its actual accuracy over many predictions. If Claude says 0.70 on 100 different predictions, approximately 70 of them should resolve YES. Poor calibration = unreliable probability estimates = bad trading decisions.

How to calibrate your Oracle:

  1. Log every deep analysis output to your SQLite database (new_probability, confidence, reasoning)
  2. When markets resolve, update the record with the actual outcome
  3. After 50+ resolved predictions, compute a calibration curve: bucket predictions into 0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0 ranges and check if actual win rates match predicted probabilities
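
Step 3 can be sketched directly. Assuming you can pull `(new_probability, resolved_yes)` pairs out of your SQLite log, a minimal calibration-curve computation looks like this (the 0.2 bucket width matches the ranges above; the 10-point overconfidence threshold is our illustrative choice):

```python
from collections import defaultdict

def calibration_curve(predictions: list[tuple[float, bool]]) -> dict:
    """Bucket (predicted_probability, resolved_yes) pairs from resolved
    markets and compare average predicted vs actual win rate per bucket."""
    buckets = defaultdict(list)
    for prob, resolved_yes in predictions:
        # 0.2-wide buckets: 0-0.2, 0.2-0.4, ..., 0.8-1.0
        idx = min(int(prob / 0.2), 4)
        buckets[idx].append((prob, resolved_yes))
    curve = {}
    for idx, rows in sorted(buckets.items()):
        lo, hi = idx * 0.2, (idx + 1) * 0.2
        avg_predicted = sum(p for p, _ in rows) / len(rows)
        actual = sum(1 for _, won in rows if won) / len(rows)
        curve[f"{lo:.1f}-{hi:.1f}"] = {
            "n": len(rows),
            "avg_predicted": round(avg_predicted, 2),
            "actual_win_rate": round(actual, 2),
            # Flag buckets where actual lags predicted by >10 points
            "overconfident": avg_predicted - actual > 0.10,
        }
    return curve
```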

If Claude systematically overestimates (e.g. predicting 0.70 but winning only 50% of the time), add a system instruction along these lines: "Your predictions in this range have been running about 20 percentage points too high. Adjust your probability estimates downward accordingly."
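
One way to automate that correction is to generate the system-prompt note directly from the bucket statistics. A sketch — the helper name and wording are illustrative — that emits a note only when the gap exceeds 10 percentage points:

```python
def calibration_correction_note(bucket_label: str, predicted: float, actual: float) -> str:
    """Turn a calibration gap in one bucket into a system-prompt correction.
    Returns an empty string when the gap is within 10 percentage points."""
    gap = predicted - actual
    if abs(gap) <= 0.10:
        return ""  # Close enough — no correction needed
    direction = "down" if gap > 0 else "up"
    return (
        f"Recent analysis shows your {bucket_label} confidence calls "
        f"win {actual:.0%} of the time against a predicted {predicted:.0%}. "
        f"Adjust probabilities in this range {direction} by approximately "
        f"{abs(gap):.0%}."
    )
```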

Using Claude Haiku for Cost Optimization

For markets with smaller potential profit (tight spread, low volume), Claude Haiku is sufficient for deep analysis at 10-15% of Sonnet's cost. The decision tree:

  • Potential profit > $50 per trade → Use Claude Sonnet
  • Potential profit $10-50 → Use Claude Haiku
  • Potential profit < $10 → Don't trade (signal too weak for analysis cost)
python
def select_claude_tier(potential_profit_usd: float) -> str | None:
    if potential_profit_usd >= 50:
        return "claude-sonnet-4-6"
    elif potential_profit_usd >= 10:
        return "claude-haiku-4-5-20251001"
    else:
        return None  # Skip — signal too weak for analysis cost

Pakistani Context Injection

Claude performs significantly better on Pakistani market analysis when you explicitly provide context it wouldn't have from training data. Add a context block to your prompts:

"Pakistani market context: The State Bank of Pakistan (SBP) Monetary Policy Committee meets every 6-8 weeks. Policy decisions are announced via official SBP press releases on sbp.org.pk. Pakistan's current fiscal situation under the IMF Extended Fund Facility (EFF) constrains how aggressively the SBP can cut rates. Current CPI is approximately [X]%. This context is relevant if the market involves Pakistan's monetary policy."

Providing this context improves Claude's accuracy on SBP-related markets by roughly 15-25% compared to prompting without it.
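
One way to wire this in is a small registry of context blocks keyed by market domain, prepended to the prompt when trigger keywords match. The registry key, trigger keywords, and wording below are illustrative assumptions, not part of any library:

```python
# Hypothetical context registry — wording is abbreviated from the block
# above; figures like current CPI must be refreshed before each run.
MARKET_CONTEXT = {
    "sbp": (
        "Pakistani market context: The State Bank of Pakistan (SBP) Monetary "
        "Policy Committee meets every 6-8 weeks. Decisions are announced via "
        "official press releases on sbp.org.pk. The IMF Extended Fund Facility "
        "constrains how aggressively the SBP can cut rates."
    ),
}

def inject_context(prompt: str, market_question: str) -> str:
    """Prepend any matching domain-context block to the analysis prompt."""
    q = market_question.lower()
    blocks = []
    if "sbp" in q or "policy rate" in q or "state bank" in q:
        blocks.append(MARKET_CONTEXT["sbp"])
    return "\n\n".join(blocks + [prompt]) if blocks else prompt
```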

Pakistan Case Study: Claude Found the Resolution Criteria Trap

Sana from DHA Lahore had been running her Oracle for 3 months with a 58% win rate. She was profitable but not exceptionally so. Then she audited her biggest losses and found a pattern: she kept losing on markets where the resolution criteria had very specific wording.

One example: "Will the SBP cut the policy rate by at least 100 basis points at the September 2025 MPC meeting?"

Her Gemini Flash triage routed a Dawn article about "SBP expected to cut rates" as DEEP_ANALYZE. Claude Haiku (her cost-saving choice for medium signals) analyzed it as 70% probability YES.

She bought YES at 45 cents, expecting a 25-cent edge.

The problem: The market required at least 100 bps cut. Claude Haiku missed this nuance. The SBP cut 50 bps — technically a rate cut, but the market resolved NO. She lost her position.

The fix: She added explicit resolution criteria parsing to her deep analysis prompt:

python
CRITICAL_INSTRUCTION = """
Before estimating probability, first extract and repeat back the EXACT resolution criteria
in your own words. Then answer: Does this headline move the probability of meeting EXACTLY
those criteria? Common traps:
- "at least X" markets: partial outcomes resolve NO
- "by date Y" markets: early events that reverse before Y resolve based on final state
- "announced by" markets: leaks or rumors do NOT count, only official announcements
"""
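
To apply the instruction, splice it in ahead of the analysis task so the criteria extraction happens before any probability estimate. A minimal sketch — the `ANALYSIS TASK:` marker matches the prompt template earlier in this lesson, and the helper name is ours:

```python
def harden_prompt(base_prompt: str, critical_instruction: str) -> str:
    """Insert the resolution-criteria check just before the analysis task
    so the model restates the exact criteria before estimating anything."""
    marker = "ANALYSIS TASK:"
    if marker in base_prompt:
        return base_prompt.replace(
            marker, critical_instruction.strip() + "\n\n" + marker, 1
        )
    # No marker found: prepend the instruction to the whole prompt
    return critical_instruction.strip() + "\n\n" + base_prompt
```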

Result after adding resolution criteria parsing:

Period                | Win Rate | Avg Edge Captured | Monthly Profit
──────────────────────|──────────|───────────────────|───────────────
Before fix (3 months) | 58%      | 8 cents           | PKR 18,500
After fix (3 months)  | 69%      | 11 cents          | PKR 34,200

The improvement came entirely from avoiding resolution criteria traps — not from finding better signals.

Sana's lesson: "Claude Haiku is good enough for most analysis. But for markets with complex resolution criteria — any market with 'at least,' 'by date,' or 'officially announced' — always use Sonnet. The 15x price difference is worth it when your position is PKR 28,000+."

Claude Tier Decision Framework

code
SELECTING CLAUDE TIER FOR DEEP ANALYSIS

Input: signal from Gemini Flash triage + market metadata
                         │
                         ▼
             Does the resolution criteria
             contain qualifying language?
             ("at least," "by date," "officially")
                /               \
              YES                NO
               │                  │
               ▼                  ▼
        Use SONNET           Check potential
        (complexity         profit estimate
        requires it)             │
                           /         \
                     > $50          < $50
                       │               │
                       ▼               ▼
                  Use SONNET      Use HAIKU
                                      │
                                      ▼
                                Check confidence:
                                < 0.65 → Upgrade to SONNET
                                ≥ 0.65 → Keep HAIKU

Cost guidance (per 1,000 analyses):
  Haiku:  PKR ~180  (use for 60-70% of signals)
  Sonnet: PKR ~2,700 (use for complex/high-value 30-40%)

Calibration Curve: Track Your Model Performance

code
CALIBRATION CHECK (after 50+ resolved trades)

Predicted Probability | Actual Win Rate | Status
──────────────────────|─────────────────|────────
0.50 – 0.60          |  55% wins       | OK
0.60 – 0.70          |  68% wins       | OK
0.70 – 0.80          |  61% wins       | OVERCONFIDENT (fix: -10% adj)
0.80 – 0.90          |  71% wins       | OVERCONFIDENT (fix: -12% adj)
0.90 – 0.95          |  74% wins       | OVERCONFIDENT (fix: -18% adj)

If actual win rate < predicted by >10%:
  → Add calibration note to system prompt
  → Example: "Recent analysis shows your 80-90% confidence
    calls only win 71% of the time. Adjust probabilities
    in this range down by approximately 12%."

Well-calibrated model = points on the diagonal (actual ≈ predicted)
Overconfident model   = curve below the diagonal (actual < predicted)
Underconfident model  = curve above the diagonal (actual > predicted)

Practice Lab

  1. Run a deep analysis: Take one signal-market pair that passed Gemini Flash triage. Run it through the Claude Sonnet deep analysis prompt. Read the full reasoning field — does it catch context you would have missed? Pay special attention to the edge_cases field.

  2. Compare Haiku vs Sonnet: Take the same signal-market pair and run it through both Claude Haiku and Claude Sonnet. Compare the probability estimates and reasoning quality. Is Sonnet worth the extra cost for this signal? Document your decision framework based on the potential profit threshold.

  3. Calibration baseline: Manually review 10 deep analysis outputs from your test runs. For each, write your own independent probability estimate. Compare your estimates to Claude's. Where are you systematically different? This reveals your own biases as a calibration check — and may show whether you or Claude is more accurate in the first month.

Key Takeaways

  • Claude excels at structured reasoning, nuanced probability calibration, and long-context analysis — specifically the qualities prediction markets require
  • Calibration is critical: log all predictions and actual outcomes, compute calibration curves after 50+ resolved markets, and adjust model prompts if systematic bias appears
  • Tier selection (Sonnet vs Haiku) should be based on potential profit per trade — Sonnet for high-value opportunities (>$50 profit), Haiku for marginal ones ($10-50)
  • Pakistani context injection in prompts improves SBP and cricket market analysis accuracy by 15-25% — always include domain context for your target markets
  • Resolution criteria parsing is the single highest-value prompt addition — markets with "at least X," "by date Y," or "officially announced" language consistently trap models that don't explicitly check qualifying conditions
  • Track calibration curves after every 50 resolved trades and tune your prompt system instructions to correct systematic over- or under-confidence

Lesson Summary

Includes: hands-on practice lab · 5 runnable code examples · 4-question knowledge check below

Quiz: Claude Haiku/Sonnet Tier — Deep Analysis & Confidence Calibration

4 questions to test your understanding. Score 60% or higher to pass.