3.1 — Gemini Flash Tier — Fast Triage & Initial Scoring
The 4-tier AI brain of your Oracle works like a hospital emergency department: triage happens fast and cheap, advanced diagnosis is reserved for cases that pass initial screening. Gemini Flash is your triage nurse — processing hundreds of market-headline pairs per hour, filtering the vast majority as irrelevant, and flagging the small percentage that deserve deeper analysis.
Why a Tiered AI Architecture
A naive approach would run every market signal through Claude Sonnet or Gemini Pro — the most capable models. At scale, this is financially unsustainable. Processing 200 market-signal pairs per hour through Claude Sonnet at current pricing costs roughly $15-50/day; the same volume through Gemini Flash costs $0.30-1.50/day — one to two orders of magnitude cheaper.
The tiered approach works because most news is irrelevant to most markets. A headline about US Fed policy is irrelevant to a cricket match market. A Dawn headline about CPEC is irrelevant to a US Presidential election market. Gemini Flash makes this irrelevance determination in milliseconds at minimal cost, ensuring that only the 5-10% of genuinely relevant signals reach the expensive reasoning tier.
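The economics above can be checked with back-of-envelope arithmetic. The per-call prices in this sketch are illustrative assumptions, not vendor quotes; plug in current pricing for your own models:

```python
# Illustrative per-call prices (USD) — assumptions, not quotes.
FLASH_COST_PER_CALL = 0.0005
SONNET_COST_PER_CALL = 0.02

def daily_cost(pairs_per_day: int, pass_rate: float) -> dict:
    """Compare triaging every pair with Flash (then deep-analyzing only
    the passing fraction) against running Sonnet on everything."""
    flash = pairs_per_day * FLASH_COST_PER_CALL
    sonnet = pairs_per_day * pass_rate * SONNET_COST_PER_CALL
    naive = pairs_per_day * SONNET_COST_PER_CALL  # Sonnet on all pairs
    tiered = flash + sonnet
    return {
        "tiered": tiered,
        "naive": naive,
        "savings_pct": round(100 * (1 - tiered / naive), 1),
    }
```

At 200 pairs/hour (4,800/day) and a 10% pass rate, the tiered pipeline costs a small fraction of the naive one; the exact ratio depends on the pass rate you calibrate later in this lesson.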
Gemini Flash's Role: Three Triage Tasks
Task 1 — Relevance check: "Is this headline relevant to this market question?" Binary answer: YES/NO. This eliminates 80-90% of signal-market pairs.
Task 2 — Sentiment direction: "If this headline is relevant, does it make YES or NO more likely?" Three-way: BULLISH_YES / BEARISH_YES / NEUTRAL.
Task 3 — Urgency classification: "How time-sensitive is this signal?" Three tiers: HIGH (act within 5 minutes), MEDIUM (act within 30 minutes), LOW (monitor, no immediate action).
The combination of these three outputs produces a routing decision: DISCARD (80-90%), DEEP_ANALYZE (8-15%), or MONITOR (2-5%).
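One way the three outputs might combine into a route is sketched below. The `route` helper and its exact policy are hypothetical — in the prompt shown next, the model itself returns the route, but making the logic explicit in code is useful for testing and overrides:

```python
def route(relevant: bool, direction: str, urgency: str) -> str:
    """Collapse the three triage outputs into one routing decision.
    (A sketch of the policy described above — tune it to your bot.)"""
    if not relevant or direction == "NEUTRAL":
        return "DISCARD"
    if urgency in ("HIGH", "MEDIUM"):
        return "DEEP_ANALYZE"
    return "MONITOR"  # relevant but LOW urgency: log and watch
```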
The Gemini Flash Triage Prompt
```python
import os
import json

from google import genai

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

def triage_signal(headline: str, market_question: str, current_price: float) -> dict:
    """Fast triage using Gemini Flash. Target: < 500ms response time."""
    prompt = f"""TRIAGE TASK. Respond in JSON only. No explanation.
Market: "{market_question}" | Current YES price: {current_price}
Headline: "{headline}"
Return:
{{
  "relevant": true/false,
  "direction": "BULLISH_YES" | "BEARISH_YES" | "NEUTRAL",
  "urgency": "HIGH" | "MEDIUM" | "LOW",
  "route": "DISCARD" | "DEEP_ANALYZE" | "MONITOR"
}}"""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    try:
        return json.loads(response.text.strip())
    except json.JSONDecodeError:
        # Malformed model output: fail safe by discarding the pair.
        return {"relevant": False, "route": "DISCARD"}
```
```python
# Process a batch of signals
def triage_batch(signals: list, markets: list) -> list:
    """Triage every signal-market pair. Note: this loop is sequential;
    use the async variant in the next section at scale."""
    results = []
    for signal in signals:
        for market in markets:
            triage = triage_signal(
                headline=signal["title"],
                market_question=market["question"],
                current_price=market["yes_price"],
            )
            if triage["route"] != "DISCARD":
                results.append({
                    "signal": signal,
                    "market": market,
                    "triage": triage,
                })
    return results
```
Optimizing for Speed
Gemini Flash's target response time is 200-800ms per call. For a bot processing 20 signals against 50 active markets (1,000 triage calls per cycle), sequential processing takes 200-800 seconds — unacceptably slow. Use async calls:
```python
import asyncio
import json

async def triage_signal_async(headline, market_question, current_price):
    # Build the same triage prompt as the synchronous version above.
    prompt = f"""TRIAGE TASK. Respond in JSON only. No explanation.
Market: "{market_question}" | Current YES price: {current_price}
Headline: "{headline}"
Return: {{"relevant": ..., "direction": ..., "urgency": ..., "route": ...}}"""
    response = await client.aio.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    try:
        return json.loads(response.text.strip())
    except json.JSONDecodeError:
        return {"relevant": False, "route": "DISCARD"}

async def triage_all_pairs(signals, markets):
    tasks = [
        triage_signal_async(s["title"], m["question"], m["yes_price"])
        for s in signals for m in markets
    ]
    results = await asyncio.gather(*tasks)
    return [r for r in results if r.get("route") != "DISCARD"]
```
With async processing, 1,000 triage calls complete in 3-8 seconds. This is the difference between a bot that reacts in 10 seconds and one that misses the laggard window entirely.
Calibrating the Triage Threshold
After running your bot for 2-3 weeks, analyze triage performance:
- False negatives (relevant signals classified as DISCARD): These are missed trades. Review them manually weekly.
- False positives (irrelevant signals classified as DEEP_ANALYZE): These waste Tier 2 compute. If more than 30% of signals passing triage are irrelevant, tighten your prompts.
The ideal triage configuration passes 8-12% of signals to Tier 2. Below 5% means you're too aggressive (missing opportunities). Above 20% means too permissive (wasting Tier 2 budget).
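A small audit helper makes this weekly check mechanical. The sketch below assumes your triage log is a list of the result dicts returned by `triage_signal()` (i.e. each entry has a `"route"` key); adapt it to however you persist logs:

```python
def audit_pass_rate(triage_log: list) -> str:
    """Compare the DEEP_ANALYZE pass rate against the 8-12% target band.
    Assumes each log entry is a triage result dict with a 'route' key."""
    passed = sum(1 for t in triage_log if t["route"] == "DEEP_ANALYZE")
    rate = passed / len(triage_log)
    if rate < 0.05:
        return f"{rate:.1%} — too aggressive: likely missing opportunities"
    if rate > 0.20:
        return f"{rate:.1%} — too permissive: tighten the triage prompt"
    return f"{rate:.1%} — within the healthy band"
```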
Pakistani Market Calibration
Gemini Flash needs calibration for Pakistani market context. Out of the box, it correctly identifies Western news relevance but struggles with Pakistani-specific signals. Add context to your triage prompt:
"Context: Pakistani traders have edge in these market categories: State Bank of Pakistan (SBP) policy decisions, Pakistan-India geopolitical events, ICC cricket matches involving Pakistan, IMF program for Pakistan. When assessing relevance, weight these categories higher."
This single addition significantly improves triage accuracy for your target markets.
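Mechanically, this is just a prefix on the triage prompt. A minimal helper, assuming you keep the context as a module-level string (`PK_CONTEXT` and `with_pk_context` are hypothetical names):

```python
# The Pakistani-market calibration context from the section above.
PK_CONTEXT = (
    "Context: Pakistani traders have edge in these market categories: "
    "State Bank of Pakistan (SBP) policy decisions, Pakistan-India "
    "geopolitical events, ICC cricket matches involving Pakistan, "
    "IMF program for Pakistan. When assessing relevance, weight these "
    "categories higher.\n\n"
)

def with_pk_context(base_prompt: str) -> str:
    """Prepend the calibration context to any triage prompt (sketch)."""
    return PK_CONTEXT + base_prompt
```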
Pakistan Case Study: When the Triage Was Too Loose
Arif from Rawalpindi built his Oracle bot and ran it for the first two weeks on live markets. His costs were spiraling: Gemini Flash triage was cheap, but his Claude Sonnet deep analysis bills were unexpectedly high — PKR 8,400 in the first week alone.
He audited his triage logs and found the problem: 45% of signals were passing triage and reaching Claude Sonnet deep analysis. The target should be 8-15%.
The root cause: His triage prompt had no market context injection. Gemini Flash was classifying any headline containing "Pakistan" as relevant to ANY Pakistan-related market. So a Dawn headline about a Karachi robbery was being flagged as relevant to "Will SBP cut rates at the June MPC meeting?" — which is obviously wrong.
The fix — specific context injection:
Before:
```python
prompt = f'Market: "{market_question}" | Headline: "{headline}" | Respond in JSON: ...'
```
After:
```python
context = """Pakistan-specific market guidance:
- SBP rate markets: relevant headlines mention inflation, CPI, IMF, interest rates, monetary policy, SBP Governor
- Cricket markets: relevant headlines mention team selection, match conditions, injuries, pitch report
- Election markets: relevant headlines mention polling data, party alliances, ECP, court rulings
- Geo/border markets: relevant headlines mention LOC, India-Pakistan, diplomatic, military exercises
ONLY mark as relevant if the headline directly pertains to the market's resolution criteria."""

prompt = f'{context}\n\nMarket: "{market_question}"\nHeadline: "{headline}"\n\nRespond in JSON:'
```
Results after the fix:
| Metric | Before | After |
|---|---|---|
| Pass rate to deep analysis | 45% | 9% |
| Claude Sonnet cost/week | PKR 8,400 | PKR 1,680 |
| False positive trades | 6/week | 0/week |
| Genuine edge trades | 3/week | 4/week |
The context injection didn't just reduce costs — it actually improved deep analysis quality because Claude Sonnet was now only processing genuinely relevant signals, not noise.
Arif's lesson: "My triage prompt was the weakest link. I spent two days writing better market context and saved PKR 6,700/week. That's the ROI on prompt engineering."
Triage Architecture at Scale
```
TRIAGE PROCESSING LOOP (every 5 minutes)

New headlines batch: 20-50 per cycle
        │
        ▼
Active markets: 50-200 in database
        │
        ▼
Signal × Market pairs: 1,000-10,000 combinations
        │
        ├── asyncio.gather() — all pairs in parallel
        │     └── Target: complete in under 10 seconds
        │
        ▼
Routing results:
        ├── DISCARD:      80-90% (dropped, logged)
        ├── MONITOR:      2-5%   (logged, no action now)
        └── DEEP_ANALYZE: 8-15%  (sent to Claude tier)

Ideal outcome per cycle: 3-8 signals reach deep analysis
  Too many (>20): tighten relevance threshold
  Too few  (<2):  loosen threshold or add more news sources

PKR cost per 1,000 triage calls:  ~PKR 25-85
PKR cost per 1,000 deep analyses: ~PKR 2,500-8,500
  → At an 8-15% pass rate, good triage eliminates roughly 85-92% of Tier 2 costs
```
Cost Comparison Table
| Processing Method | Volume | Daily Cost (PKR) | Speed |
|---|---|---|---|
| Gemini Flash (triage) | 1,000 pairs | PKR 85–420 | 3–8 sec (async) |
| Claude Sonnet (all) | 1,000 pairs | PKR 4,200–14,000 | 800+ sec (seq) |
| Tiered (Flash + Sonnet) | 1,000 pairs | PKR 300–800 | 8–15 sec |
| Gemini Flash only (no Sonnet) | 1,000 pairs | PKR 85–420 | 3–8 sec |
Bottom line: The tiered approach costs roughly 90-95% less than running Sonnet on everything, with similar quality because Flash handles the obvious 90% correctly.
Practice Lab
- Benchmark triage speed: Call the triage function 10 times sequentially and measure total time. Then run the same 10 calls with asyncio.gather(). Calculate the speedup ratio. On a real bot this speedup is the difference between catching the laggard window and missing it.
- Calibration test: Take 20 headlines from Dawn — 10 clearly relevant to a Pakistan interest rate market, 10 clearly irrelevant. Run all 20 through triage. How many does it get right? Where does it fail? Document each false positive or false negative and what context injection would fix it.
- Batch processing: Build the triage_all_pairs() async function. Run it with 5 signals against 10 markets (50 pairs). Measure how many pass to Tier 2 and verify the routing decisions make sense. If more than 15% pass, tighten your Pakistani context injection.
Key Takeaways
- Gemini Flash is the cheapest and fastest triage layer — processing hundreds of signal-market pairs per hour for PKR 85–420/day vs. PKR 4,200–14,000/day for using Sonnet-tier models directly
- Three triage outputs (relevance, direction, urgency) produce a routing decision: DISCARD (80-90%), DEEP_ANALYZE (8-15%), MONITOR (2-5%)
- Async processing is mandatory at scale — 1,000 sequential triage calls take 800 seconds; async completes in 8 seconds — this is the difference between catching and missing the laggard window
- Calibrate for Pakistani markets explicitly: add context about SBP decisions, Pakistan cricket, and South Asian geopolitics to improve triage accuracy on your target market categories
- The ideal pass-through rate is 8-12%; below 5% means missed opportunities, above 20% means wasted Sonnet budget — audit weekly and tune your prompts
- Prompt engineering on your triage context block delivers the highest ROI of any optimization in the Oracle system — Arif's example shows PKR 6,700/week saved from a 2-day prompt improvement effort