3.1 — Gemini Flash Tier — Fast Triage & Initial Scoring
The 4-tier AI brain of your Oracle works like a hospital emergency department: triage happens fast and cheap, advanced diagnosis is reserved for cases that pass initial screening. Gemini Flash is your triage nurse — processing hundreds of market-headline pairs per hour, filtering the vast majority as irrelevant, and flagging the small percentage that deserve deeper analysis.
Why a Tiered AI Architecture
A naive approach would run every market signal through Claude Sonnet or Gemini Pro — the most capable models. At scale, this is financially unsustainable. Processing 200 market-signal pairs per hour through Claude Sonnet at current pricing costs roughly $15-50/day; the same volume through Gemini Flash costs $0.30-1.50/day — one to two orders of magnitude cheaper.
The tiered approach works because most news is irrelevant to most markets. A headline about US Fed policy is irrelevant to a cricket match market. A Dawn headline about CPEC is irrelevant to a US Presidential election market. Gemini Flash makes this irrelevance determination in milliseconds at minimal cost, ensuring that only the 5-10% of genuinely relevant signals reach the expensive reasoning tier.
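The economics above can be checked with back-of-envelope arithmetic. The per-call prices in this sketch are illustrative assumptions, not vendor quotes; plug in current pricing for your own models:

```python
# Illustrative per-call prices (USD) — assumptions, not quotes.
FLASH_COST_PER_CALL = 0.0005
SONNET_COST_PER_CALL = 0.02

def daily_cost(pairs_per_day: int, pass_rate: float) -> dict:
    """Compare triaging every pair with Flash (then deep-analyzing only
    the passing fraction) against running Sonnet on everything."""
    flash = pairs_per_day * FLASH_COST_PER_CALL
    sonnet = pairs_per_day * pass_rate * SONNET_COST_PER_CALL
    naive = pairs_per_day * SONNET_COST_PER_CALL  # Sonnet on all pairs
    tiered = flash + sonnet
    return {
        "tiered": tiered,
        "naive": naive,
        "savings_pct": round(100 * (1 - tiered / naive), 1),
    }
```

At 200 pairs/hour (4,800/day) and a 10% pass rate, the tiered pipeline costs a small fraction of the naive one; the exact ratio depends on the pass rate you calibrate later in this lesson.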
Gemini Flash's Role: Three Triage Tasks
Task 1 — Relevance check: "Is this headline relevant to this market question?" Binary answer: YES/NO. This eliminates 80-90% of signal-market pairs.
Task 2 — Sentiment direction: "If this headline is relevant, does it make YES or NO more likely?" Three-way: BULLISH_YES / BEARISH_YES / NEUTRAL.
Task 3 — Urgency classification: "How time-sensitive is this signal?" Three tiers: HIGH (act within 5 minutes), MEDIUM (act within 30 minutes), LOW (monitor, no immediate action).
The combination of these three outputs produces a routing decision: DISCARD (80-90%), DEEP_ANALYZE (8-15%), or MONITOR (2-5%).
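One way the three outputs might combine into a route is sketched below. The `route` helper and its exact policy are hypothetical — in the prompt shown next, the model itself returns the route, but making the logic explicit in code is useful for testing and overrides:

```python
def route(relevant: bool, direction: str, urgency: str) -> str:
    """Collapse the three triage outputs into one routing decision.
    (A sketch of the policy described above — tune it to your bot.)"""
    if not relevant or direction == "NEUTRAL":
        return "DISCARD"
    if urgency in ("HIGH", "MEDIUM"):
        return "DEEP_ANALYZE"
    return "MONITOR"  # relevant but LOW urgency: log and watch
```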
The Gemini Flash Triage Prompt
```python
import os
import json

from google import genai

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

def triage_signal(headline: str, market_question: str, current_price: float) -> dict:
    """Fast triage using Gemini Flash. Target: < 500ms response time."""
    prompt = f"""TRIAGE TASK. Respond in JSON only. No explanation.
Market: "{market_question}" | Current YES price: {current_price}
Headline: "{headline}"
Return:
{{
  "relevant": true/false,
  "direction": "BULLISH_YES" | "BEARISH_YES" | "NEUTRAL",
  "urgency": "HIGH" | "MEDIUM" | "LOW",
  "route": "DISCARD" | "DEEP_ANALYZE" | "MONITOR"
}}"""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    try:
        return json.loads(response.text.strip())
    except json.JSONDecodeError:
        # Malformed model output: fail safe by discarding the pair.
        return {"relevant": False, "route": "DISCARD"}
```
```python
# Process a batch of signals
def triage_batch(signals: list, markets: list) -> list:
    """Triage every signal-market pair. Note: this loop is sequential;
    use the async variant in the next section at scale."""
    results = []
    for signal in signals:
        for market in markets:
            triage = triage_signal(
                headline=signal["title"],
                market_question=market["question"],
                current_price=market["yes_price"],
            )
            if triage["route"] != "DISCARD":
                results.append({
                    "signal": signal,
                    "market": market,
                    "triage": triage,
                })
    return results
```
Optimizing for Speed
Gemini Flash's target response time is 200-800ms per call. For a bot processing 20 signals against 50 active markets (1,000 triage calls per cycle), sequential processing takes 200-800 seconds — unacceptably slow. Use async calls:
```python
import asyncio
import json

async def triage_signal_async(headline, market_question, current_price):
    # Build the same triage prompt as the synchronous version above.
    prompt = f"""TRIAGE TASK. Respond in JSON only. No explanation.
Market: "{market_question}" | Current YES price: {current_price}
Headline: "{headline}"
Return: {{"relevant": ..., "direction": ..., "urgency": ..., "route": ...}}"""
    response = await client.aio.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    try:
        return json.loads(response.text.strip())
    except json.JSONDecodeError:
        return {"relevant": False, "route": "DISCARD"}

async def triage_all_pairs(signals, markets):
    tasks = [
        triage_signal_async(s["title"], m["question"], m["yes_price"])
        for s in signals for m in markets
    ]
    results = await asyncio.gather(*tasks)
    return [r for r in results if r.get("route") != "DISCARD"]
```
With async processing, 1,000 triage calls complete in 3-8 seconds. This is the difference between a bot that reacts in 10 seconds and one that misses the laggard window entirely.
Calibrating the Triage Threshold
After running your bot for 2-3 weeks, analyze triage performance:
- False negatives (relevant signals classified as DISCARD): These are missed trades. Review them manually weekly.
- False positives (irrelevant signals classified as DEEP_ANALYZE): These waste Tier 2 compute. If more than 30% of signals passing triage are irrelevant, tighten your prompts.
The ideal triage configuration passes 8-12% of signals to Tier 2. Below 5% means you're too aggressive (missing opportunities). Above 20% means too permissive (wasting Tier 2 budget).
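A small audit helper makes this weekly check mechanical. The sketch below assumes your triage log is a list of the result dicts returned by `triage_signal()` (i.e. each entry has a `"route"` key); adapt it to however you persist logs:

```python
def audit_pass_rate(triage_log: list) -> str:
    """Compare the DEEP_ANALYZE pass rate against the 8-12% target band.
    Assumes each log entry is a triage result dict with a 'route' key."""
    passed = sum(1 for t in triage_log if t["route"] == "DEEP_ANALYZE")
    rate = passed / len(triage_log)
    if rate < 0.05:
        return f"{rate:.1%} — too aggressive: likely missing opportunities"
    if rate > 0.20:
        return f"{rate:.1%} — too permissive: tighten the triage prompt"
    return f"{rate:.1%} — within the healthy band"
```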
Pakistani Market Calibration
Gemini Flash needs calibration for Pakistani market context. Out of the box, it correctly identifies Western news relevance but struggles with Pakistani-specific signals. Add context to your triage prompt:
"Context: Pakistani traders have edge in these market categories: State Bank of Pakistan (SBP) policy decisions, Pakistan-India geopolitical events, ICC cricket matches involving Pakistan, IMF program for Pakistan. When assessing relevance, weight these categories higher."
This single addition significantly improves triage accuracy for your target markets.
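Mechanically, this is just a prefix on the triage prompt. A minimal helper, assuming you keep the context as a module-level string (`PK_CONTEXT` and `with_pk_context` are hypothetical names):

```python
# The Pakistani-market calibration context from the section above.
PK_CONTEXT = (
    "Context: Pakistani traders have edge in these market categories: "
    "State Bank of Pakistan (SBP) policy decisions, Pakistan-India "
    "geopolitical events, ICC cricket matches involving Pakistan, "
    "IMF program for Pakistan. When assessing relevance, weight these "
    "categories higher.\n\n"
)

def with_pk_context(base_prompt: str) -> str:
    """Prepend the calibration context to any triage prompt (sketch)."""
    return PK_CONTEXT + base_prompt
```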
Pakistan Case Study: When the Triage Was Too Loose
Arif from Rawalpindi built his Oracle bot and ran it for the first two weeks on live markets. His costs were spiraling: Gemini Flash triage was cheap, but his Claude Sonnet deep analysis bills were unexpectedly high — PKR 8,400 in the first week alone.
He audited his triage logs and found the problem: 45% of signals were passing triage and reaching Claude Sonnet deep analysis. The target should be 8-15%.
The root cause: His triage prompt had no market context injection. Gemini Flash was classifying any headline containing "Pakistan" as relevant to ANY Pakistan-related market. So a Dawn headline about a Karachi robbery was being flagged as relevant to "Will SBP cut rates at the June MPC meeting?" — which is obviously wrong.
The fix — specific context injection:
Before:
```python
prompt = f'Market: "{market_question}" | Headline: "{headline}" | Respond in JSON: ...'
```
After:
```python
context = """Pakistan-specific market guidance:
- SBP rate markets: relevant headlines mention inflation, CPI, IMF, interest rates, monetary policy, SBP Governor
- Cricket markets: relevant headlines mention team selection, match conditions, injuries, pitch report
- Election markets: relevant headlines mention polling data, party alliances, ECP, court rulings
- Geo/border markets: relevant headlines mention LOC, India-Pakistan, diplomatic, military exercises
ONLY mark as relevant if the headline directly pertains to the market's resolution criteria."""

prompt = f'{context}\n\nMarket: "{market_question}"\nHeadline: "{headline}"\n\nRespond in JSON:'
```
Results after the fix:
| Metric | Before | After |
|---|---|---|
| Pass rate to deep analysis | 45% | 9% |
| Claude Sonnet cost/week | PKR 8,400 | PKR 1,680 |
| False positive trades | 6/week | 0/week |
| Genuine edge trades | 3/week | 4/week |
The context injection didn't just reduce costs — it actually improved deep analysis quality because Claude Sonnet was now only processing genuinely relevant signals, not noise.
Arif's lesson: "My triage prompt was the weakest link. I spent two days writing better market context and saved PKR 6,700/week. That's the ROI on prompt engineering."
Triage Architecture at Scale
```
TRIAGE PROCESSING LOOP (every 5 minutes)

New headlines batch: 20-50 per cycle
        │
        ▼
Active markets: 50-200 in database
        │
        ▼
Signal × Market pairs: 1,000-10,000 combinations
        │
        ├── asyncio.gather() — all pairs in parallel
        │     └── Target: complete in under 10 seconds
        │
        ▼
Routing results:
        ├── DISCARD:      80-90% (dropped, logged)
        ├── MONITOR:      2-5%   (logged, no action now)
        └── DEEP_ANALYZE: 8-15%  (sent to Claude tier)

Ideal outcome per cycle: 3-8 signals reach deep analysis
  Too many (>20): tighten relevance threshold
  Too few  (<2):  loosen threshold or add more news sources

PKR cost per 1,000 triage calls:  ~PKR 25-85
PKR cost per 1,000 deep analyses: ~PKR 2,500-8,500
  → At an 8-15% pass rate, good triage eliminates roughly 85-92% of Tier 2 costs
```
Cost Comparison Table
| Processing Method | Volume | Daily Cost (PKR) | Speed |
|---|---|---|---|
| Gemini Flash (triage) | 1,000 pairs | PKR 85–420 | 3–8 sec (async) |
| Claude Sonnet (all) | 1,000 pairs | PKR 4,200–14,000 | 800+ sec (seq) |
| Tiered (Flash + Sonnet) | 1,000 pairs | PKR 300–800 | 8–15 sec |
| Gemini Flash only (no Sonnet) | 1,000 pairs | PKR 85–420 | 3–8 sec |
Bottom line: The tiered approach costs roughly 90-95% less than running Sonnet on everything, with similar quality because Flash handles the obvious 90% correctly.
Practice Lab
- Benchmark triage speed: Call the triage function 10 times sequentially and measure total time. Then run the same 10 calls with asyncio.gather(). Calculate the speedup ratio. On a real bot this speedup is the difference between catching the laggard window and missing it.
- Calibration test: Take 20 headlines from Dawn — 10 clearly relevant to a Pakistan interest rate market, 10 clearly irrelevant. Run all 20 through triage. How many does it get right? Where does it fail? Document each false positive or false negative and what context injection would fix it.
- Batch processing: Build the triage_all_pairs() async function. Run it with 5 signals against 10 markets (50 pairs). Measure how many pass to Tier 2 and verify the routing decisions make sense. If more than 15% pass, tighten your Pakistani context injection.
Key Takeaways
- Gemini Flash is the cheapest and fastest triage layer — processing hundreds of signal-market pairs per hour for PKR 85–420/day vs. PKR 4,200–14,000/day for using Sonnet-tier models directly
- Three triage outputs (relevance, direction, urgency) produce a routing decision: DISCARD (80-90%), DEEP_ANALYZE (8-15%), MONITOR (2-5%)
- Async processing is mandatory at scale — 1,000 sequential triage calls take 800 seconds; async completes in 8 seconds — this is the difference between catching and missing the laggard window
- Calibrate for Pakistani markets explicitly: add context about SBP decisions, Pakistan cricket, and South Asian geopolitics to improve triage accuracy on your target market categories
- The ideal pass-through rate is 8-12%; below 5% means missed opportunities, above 20% means wasted Sonnet budget — audit weekly and tune your prompts
- Prompt engineering on your triage context block delivers the highest ROI of any optimization in the Oracle system — Arif's example shows PKR 6,700/week saved from a 2-day prompt improvement effort