AI Prediction Markets — Module 2

2.3 Building Your Market Database — SQLite & Historical Tracking

25 min · 2 code blocks · Practice Lab · Quiz (4Q)


A prediction market bot without a database is blind. You might make good trades today, but without historical data you cannot know which strategies actually work, which markets you have edge in, or when your bot's performance is degrading. This lesson builds the SQLite database layer that powers your Oracle's memory and enables systematic backtesting.

Why SQLite for a Trading Bot

SQLite is the correct choice for a personal/small-team trading bot for three reasons:

  1. No server required: SQLite runs embedded in your Python process — no PostgreSQL server to manage, no connection strings to configure, no cloud database costs
  2. Sufficient performance: A single SQLite database handles millions of rows with sub-millisecond reads on indexed columns. A trading bot generating 10,000 market snapshots/day produces ~3.6M rows/year — well within SQLite's range
  3. Portable: The entire database is a single .db file you can copy, version, and backup trivially

When you eventually need distributed storage (multiple bots, multiple machines), migrate to PostgreSQL. For now, SQLite is right.
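The embedded model is easy to see in a few lines: Python's standard-library sqlite3 module opens a plain local file, with no server process anywhere. A minimal sketch (the filename and table are illustrative, not part of the Oracle schema):

```python
import os
import sqlite3

# Open (or create) the database -- it's just a local file, no server process.
conn = sqlite3.connect("demo_oracle.db")
conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v REAL)")
conn.execute("INSERT INTO t (v) VALUES (?)", (0.42,))
conn.commit()
conn.close()

# The entire database is this one file -- copying it IS your backup.
print(os.path.exists("demo_oracle.db"))  # True
```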

The Database Schema

Your Oracle needs three core tables:

sql
-- Table 1: Market catalog (updated daily)
CREATE TABLE IF NOT EXISTS markets (
    id TEXT PRIMARY KEY,
    question TEXT NOT NULL,
    resolution_criteria TEXT,
    end_date TEXT,
    total_volume REAL DEFAULT 0,
    category TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    last_updated TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Table 2: Price snapshots (updated every 5 minutes)
CREATE TABLE IF NOT EXISTS price_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    market_id TEXT NOT NULL,
    yes_price REAL NOT NULL,
    no_price REAL NOT NULL,
    spread REAL,
    volume_24h REAL,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (market_id) REFERENCES markets(id)
);

-- Table 3: Trade log (every trade your bot executes)
CREATE TABLE IF NOT EXISTS trades (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    market_id TEXT NOT NULL,
    side TEXT NOT NULL,          -- 'YES' or 'NO'
    action TEXT NOT NULL,        -- 'BUY' or 'SELL'
    shares REAL NOT NULL,
    price REAL NOT NULL,
    total_cost_usd REAL,
    signal_source TEXT,          -- which news headline triggered this
    signal_confidence REAL,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
    resolved INTEGER DEFAULT 0,  -- 0=open, 1=won, -1=lost
    pnl_usd REAL
);

-- Indexes for fast queries
CREATE INDEX IF NOT EXISTS idx_snapshots_market_time
    ON price_snapshots(market_id, timestamp);
CREATE INDEX IF NOT EXISTS idx_trades_market
    ON trades(market_id, timestamp);
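One SQLite quirk worth knowing: the FOREIGN KEY constraint on price_snapshots above is not enforced unless you enable it per connection. A minimal sketch of the behavior, using a trimmed-down version of the schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # OFF by default in SQLite
conn.execute("CREATE TABLE markets (id TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE price_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    market_id TEXT NOT NULL REFERENCES markets(id),
    yes_price REAL NOT NULL)""")

# With the pragma ON, a snapshot for an unknown market is rejected:
try:
    conn.execute("INSERT INTO price_snapshots (market_id, yes_price) "
                 "VALUES ('ghost', 0.5)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # FOREIGN KEY constraint failed
```

Without the pragma, SQLite silently accepts orphan snapshot rows, so issue it right after every connect().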

The Database Layer Python Class

python
import sqlite3
from contextlib import contextmanager
from datetime import datetime

class OracleDB:
    def __init__(self, db_path="oracle.db"):
        self.db_path = db_path
        self._init_schema()

    @contextmanager
    def get_conn(self):
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row  # rows as dict-like objects
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            conn.close()

    def _init_schema(self):
        with self.get_conn() as conn:
            conn.executescript("""
                CREATE TABLE IF NOT EXISTS markets (
                    id TEXT PRIMARY KEY,
                    question TEXT NOT NULL,
                    end_date TEXT,
                    total_volume REAL DEFAULT 0
                );
                CREATE TABLE IF NOT EXISTS price_snapshots (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    market_id TEXT,
                    yes_price REAL,
                    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
                );
                CREATE TABLE IF NOT EXISTS trades (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    market_id TEXT,
                    side TEXT,
                    action TEXT,
                    price REAL,
                    shares REAL,
                    pnl_usd REAL,
                    resolved INTEGER DEFAULT 0,  -- 0=open, 1=won, -1=lost
                    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
                );
            """)

    def save_price_snapshot(self, market_id: str, yes_price: float):
        with self.get_conn() as conn:
            conn.execute(
                "INSERT INTO price_snapshots (market_id, yes_price) VALUES (?, ?)",
                (market_id, yes_price)
            )

    def log_trade(self, market_id: str, side: str, action: str,
                  price: float, shares: float):
        with self.get_conn() as conn:
            conn.execute(
                """INSERT INTO trades (market_id, side, action, price, shares)
                   VALUES (?, ?, ?, ?, ?)""",
                (market_id, side, action, price, shares)
            )

    def get_win_rate(self, market_id=None) -> dict:
        query = "SELECT resolved, COUNT(*) AS count FROM trades WHERE resolved != 0"
        params = []
        if market_id:
            query += " AND market_id = ?"  # parameterized -- never f-string SQL
            params.append(market_id)
        query += " GROUP BY resolved"
        with self.get_conn() as conn:
            results = conn.execute(query, params).fetchall()

        won = next((r["count"] for r in results if r["resolved"] == 1), 0)
        lost = next((r["count"] for r in results if r["resolved"] == -1), 0)
        total = won + lost
        return {
            "won": won, "lost": lost, "total": total,
            "win_rate": won / total if total > 0 else 0
        }
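The win-rate aggregation can be sanity-checked in isolation against an in-memory database — a standalone sketch of the same query, independent of the OracleDB class (the market id is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("""CREATE TABLE trades (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    market_id TEXT, resolved INTEGER DEFAULT 0)""")
# 2 wins, 1 loss, 1 still open (resolved = 0, excluded from the win rate)
for resolved in (1, 1, -1, 0):
    conn.execute("INSERT INTO trades (market_id, resolved) VALUES (?, ?)",
                 ("sbp-rate-cut", resolved))

rows = conn.execute(
    "SELECT resolved, COUNT(*) AS count FROM trades "
    "WHERE resolved != 0 GROUP BY resolved").fetchall()
won = next((r["count"] for r in rows if r["resolved"] == 1), 0)
lost = next((r["count"] for r in rows if r["resolved"] == -1), 0)
print(won, lost, round(won / (won + lost), 3))  # 2 1 0.667
```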

Historical Price Tracking for Backtesting

Price history lets you answer critical questions: does the YES price typically rise 3-6 hours before an SBP announcement? Do cricket markets overreact to team selection news? These patterns are only visible in historical data.

Run a cron job (or a simple scheduler loop) that snapshots all active markets every 5 minutes. That is 288 snapshots per market per day; after 30 days you have roughly 8,600 per market — enough to identify patterns that repeat across similar event types.
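With snapshots accumulating, a backtest question like "did YES rise in the hours before the announcement?" reduces to a range query over price_snapshots. A sketch, assuming timestamps are stored in SQLite's default `YYYY-MM-DD HH:MM:SS` text format:

```python
import sqlite3

def price_change_before(conn, market_id, event_time, hours=6):
    """Return (first, last) YES price in the window ending at event_time,
    or None if fewer than two snapshots fall inside the window."""
    rows = conn.execute(
        """SELECT yes_price FROM price_snapshots
           WHERE market_id = ?
             AND timestamp BETWEEN datetime(?, ?) AND ?
           ORDER BY timestamp""",
        (market_id, event_time, f"-{hours} hours", event_time)).fetchall()
    if len(rows) < 2:
        return None
    return rows[0][0], rows[-1][0]
```

Run this over every past SBP announcement date and you can measure whether a pre-announcement drift actually exists, instead of guessing.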

Pakistani Market Performance Analytics

The database enables Pakistan-specific analytics that Western traders won't build:

  • SBP rate decision markets: Track your win rate specifically on "Will SBP cut rates?" type markets. If you're winning 70%+ on these, it's evidence of genuine edge.
  • Cricket markets: Track performance on Pakistan cricket markets separately. Cultural knowledge should show in the numbers.
  • Time-of-day analysis: Does your bot perform better when trading during PKT daytime (laggard window) vs. Pakistani night?

These analytics require the database. Without it, you're flying blind.
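The per-category breakdown described above is one GROUP BY away, assuming trades are joined back to the markets catalog (the category values and market ids below are illustrative):

```python
import sqlite3

CATEGORY_WIN_RATE = """
SELECT m.category,
       SUM(CASE WHEN t.resolved = 1 THEN 1 ELSE 0 END)  AS won,
       SUM(CASE WHEN t.resolved = -1 THEN 1 ELSE 0 END) AS lost
FROM trades t
JOIN markets m ON m.id = t.market_id
WHERE t.resolved != 0
GROUP BY m.category
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE markets (id TEXT PRIMARY KEY, category TEXT)")
conn.execute("CREATE TABLE trades (market_id TEXT, resolved INTEGER)")
conn.executemany("INSERT INTO markets VALUES (?, ?)",
                 [("sbp-1", "economics"), ("cric-1", "cricket")])
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [("sbp-1", 1), ("sbp-1", 1), ("cric-1", -1), ("cric-1", 0)])
for category, won, lost in conn.execute(CATEGORY_WIN_RATE):
    print(category, won, lost, round(won / (won + lost), 2))
```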

Practice Lab

  1. Set up the OracleDB: Instantiate the database class (the constructor runs _init_schema() for you). Use a SQLite browser (SQLiteOnline.com or DB Browser for SQLite) to verify the tables are created correctly.

  2. Populate 10 market records: Take 10 markets from your Module 2.1 scan and insert them into the markets table. Then run a price snapshot for each.

  3. Simulate a trade log: Log 5 hypothetical trades (2 wins, 2 losses, 1 open). Call get_win_rate() and verify the output. What does a 50% win rate mean for profitability if you're buying at 40c a share (a winning share pays $1.00, i.e. 60c profit)?
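For question 3, the expected value per share works out as follows — a worked check, not part of the lab code:

```python
# Buying YES at 40c: a win pays out $1.00 (60c profit); a loss forfeits the 40c stake.
price = 0.40
win_rate = 0.50
ev_per_share = win_rate * (1 - price) - (1 - win_rate) * price
print(round(ev_per_share, 2))  # 0.1 -> a 50% win rate is profitable at these odds
```

The general point: at entry price p, the break-even win rate is exactly p, so a 50% win rate is only an edge when you are consistently buying below 50c.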

Key Takeaways

  • SQLite is the right database for a personal trading bot — no infrastructure, no cost, handles millions of rows easily
  • Three tables are the minimum: markets (catalog), price_snapshots (time series), trades (execution log)
  • Historical price data enables backtesting and pattern detection — 30 days of 5-minute snapshots reveals market dynamics invisible to day traders
  • Pakistan-specific win rate tracking (SBP decisions, cricket markets) quantifies whether your cultural information advantage actually translates to edge

Lesson Summary

Includes a hands-on practice lab, 2 runnable code examples, and a 4-question knowledge check below.

Quiz: Building Your Market Database — SQLite & Historical Tracking

4 questions to test your understanding. Score 60% or higher to pass.