AI Infrastructure & Local LLMs, Module 7

7.3 API Gateway & Authentication for AI Products


You have a working AI API behind a load balancer. Now you need to decide: who can use it, how much they can use, and how you charge them. An API gateway sits in front of your AI services and handles authentication, rate limiting, usage tracking, and billing — the infrastructure that turns a technical project into a business.

What an API Gateway Does

code
Client request
    │
    ▼
┌───────────────────────────────────┐
│  API GATEWAY                      │
│  ├── Authentication (who are you?)│
│  ├── Rate limiting (slow down!)   │
│  ├── Usage tracking (logging)     │
│  ├── Request routing (which API?) │
│  ├── Response caching (faster!)   │
│  └── Billing (how much to charge) │
└───────────────┬───────────────────┘
                │
    ┌───────────┼───────────┐
    ▼           ▼           ▼
  LLM API    Image API   Voice API
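
The order of the stages in the diagram matters: a request is authenticated first, then rate limited, and only then routed to a backend. The following is a minimal sketch of that ordering in plain Python, not a production gateway. The upstream URLs, the `handle` function, and the `allow` callback are hypothetical names chosen for illustration.

```python
from typing import Callable, Optional, Set, Tuple

# Hypothetical upstream addresses; replace with your own service URLs
ROUTES = {
    "/v1/chat": "http://llm-api:8001",
    "/v1/image": "http://image-api:8002",
    "/v1/voice": "http://voice-api:8003",
}

def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream base URL for a request path, or None if unrouted."""
    for prefix, upstream in ROUTES.items():
        # Match the prefix itself or any sub-path, but not "/v1/chatty"
        if path == prefix or path.startswith(prefix + "/"):
            return upstream
    return None

def handle(path: str, api_key: str, known_keys: Set[str],
           allow: Callable[[str], bool]) -> Tuple[int, Optional[str]]:
    """Run the gateway stages in order and return (status_code, upstream)."""
    if api_key not in known_keys:      # 1. authentication
        return 401, None
    if not allow(api_key):             # 2. rate limiting
        return 429, None
    upstream = resolve_upstream(path)  # 3. routing
    if upstream is None:
        return 404, None
    return 200, upstream
```

Rejecting unauthenticated requests before the rate-limit check keeps unknown callers from consuming quota bookkeeping, and rejecting over-limit requests before routing keeps them off the GPU-backed services entirely.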

API Key Authentication

Simple API Key System with FastAPI

python
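# auth.py
import hashlib
import secrets
import sqlite3
from contextlib import closing

from fastapi import HTTPException, Security
from fastapi.security import APIKeyHeader

DB_PATH = "api_keys.db"
api_key_header = APIKeyHeader(name="X-API-Key")

def init_db() -> None:
    """Create the api_keys table if needed (call once at startup)."""
    # Assumed schema, inferred from the queries used below
    with closing(sqlite3.connect(DB_PATH)) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS api_keys (
                key_hash   TEXT PRIMARY KEY,
                label      TEXT NOT NULL,
                tier       TEXT NOT NULL DEFAULT 'free',
                is_active  INTEGER NOT NULL DEFAULT 1,
                created_at TEXT NOT NULL
            )
        """)
        conn.commit()

def hash_key(key: str) -> str:
    # Only the SHA-256 digest is stored; the plaintext key is never persisted
    return hashlib.sha256(key.encode()).hexdigest()

def create_api_key(label: str, tier: str = "free") -> str:
    key = f"sk-{secrets.token_hex(24)}"  # "sk-" plus 48 random hex characters

    with closing(sqlite3.connect(DB_PATH)) as conn:
        conn.execute("""
            INSERT INTO api_keys (key_hash, label, tier, created_at)
            VALUES (?, ?, ?, datetime('now'))
        """, (hash_key(key), label, tier))
        conn.commit()

    return key  # Only returned once; the client must save it

async def validate_key(key: str = Security(api_key_header)) -> dict:
    key_hash = hash_key(key)
    with closing(sqlite3.connect(DB_PATH)) as conn:
        row = conn.execute(
            "SELECT label, tier, is_active FROM api_keys WHERE key_hash = ?",
            (key_hash,)
        ).fetchone()

    if not row:
        raise HTTPException(status_code=401, detail="Invalid API key")
    if not row[2]:
        raise HTTPException(status_code=403, detail="API key disabled")

    # key_hash is returned so routes can use it for rate limiting and usage logs
    return {"key_hash": key_hash, "label": row[0], "tier": row[1]}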
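
The `is_active` column implies a key lifecycle: issue, validate, revoke. Here is a minimal sketch of that lifecycle, assuming an in-memory SQLite database so it can run on its own. The `issue_key`, `revoke_key`, and `lookup` names are hypothetical and are not part of auth.py.

```python
import hashlib
import secrets
import sqlite3

def hash_key(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()

# In-memory database for illustration only; a real service would use a file
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE api_keys (
        key_hash  TEXT PRIMARY KEY,
        label     TEXT,
        tier      TEXT,
        is_active INTEGER DEFAULT 1
    )
""")

def issue_key(label: str, tier: str = "free") -> str:
    """Generate a key, store only its hash, and return the plaintext once."""
    key = f"sk-{secrets.token_hex(24)}"
    conn.execute(
        "INSERT INTO api_keys (key_hash, label, tier) VALUES (?, ?, ?)",
        (hash_key(key), label, tier),
    )
    return key

def revoke_key(key: str) -> bool:
    """Disable a key without deleting its row; True if the key exists."""
    cur = conn.execute(
        "UPDATE api_keys SET is_active = 0 WHERE key_hash = ?",
        (hash_key(key),),
    )
    return cur.rowcount == 1

def lookup(key: str) -> str:
    """Return 'ok', 'disabled', or 'unknown' for a presented key."""
    row = conn.execute(
        "SELECT is_active FROM api_keys WHERE key_hash = ?",
        (hash_key(key),),
    ).fetchone()
    if row is None:
        return "unknown"
    return "ok" if row[0] else "disabled"
```

Disabling a key instead of deleting it keeps the hash available for joining against historical usage logs, which matters once billing depends on them.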

Using It in Routes

python
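from fastapi import FastAPI, Security
from pydantic import BaseModel

from auth import validate_key

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str  # illustrative field; use your real request schema

@app.post("/v1/chat")
async def chat(request: ChatRequest, api_key: dict = Security(validate_key)):
    # api_key describes the caller, e.g. label "acme-corp" and tier "pro"
    # Use tier to set limits
    max_tokens = 4096 if api_key["tier"] == "pro" else 512
    # ... process request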

Rate Limiting

Per-Key Rate Limits

python
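# rate_limiter.py
import time
from collections import defaultdict

class RateLimiter:
    """In-memory sliding-window limiter (single process; state is lost on restart)."""

    # rpm = requests per minute, rpd = requests per day.
    # tokens_per_day is listed for reference; check() does not enforce it.
    TIER_LIMITS = {
        "free":  {"rpm": 10,  "rpd": 100,   "tokens_per_day": 10000},
        "basic": {"rpm": 60,  "rpd": 1000,  "tokens_per_day": 100000},
        "pro":   {"rpm": 300, "rpd": 10000, "tokens_per_day": 1000000},
    }

    def __init__(self):
        self.requests = defaultdict(list)  # key_hash -> request timestamps

    def check(self, key_hash: str, tier: str) -> bool:
        """Record the request and return True if it is within the tier's limits."""
        now = time.time()
        # Unknown tiers fall back to the most restrictive limits
        limits = self.TIER_LIMITS.get(tier, self.TIER_LIMITS["free"])

        # Drop entries older than 24 hours
        self.requests[key_hash] = [
            t for t in self.requests[key_hash] if now - t < 86400
        ]

        # Check requests per minute
        recent_minute = [t for t in self.requests[key_hash] if now - t < 60]
        if len(recent_minute) >= limits["rpm"]:
            return False

        # Check requests per day
        if len(self.requests[key_hash]) >= limits["rpd"]:
            return False

        # Only allowed requests are counted against the quota
        self.requests[key_hash].append(now)
        return True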
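
The tier table lists a `tokens_per_day` limit, but `check()` only counts requests. One way to enforce the token limit is a rolling 24-hour budget, sketched below. The `TokenBudget` class and its method names are hypothetical, and the optional `now` parameter exists only to make the example easy to test.

```python
import time
from collections import defaultdict

# Daily token limits copied from the tier table in this lesson
TOKENS_PER_DAY = {"free": 10_000, "basic": 100_000, "pro": 1_000_000}

class TokenBudget:
    """Hypothetical helper: rolling 24-hour token budget per API key."""

    def __init__(self):
        # key_hash -> list of (timestamp, tokens) pairs from the last 24 hours
        self.spend = defaultdict(list)

    def used(self, key_hash, now=None):
        """Return tokens spent in the last 24 hours, pruning older entries."""
        now = time.time() if now is None else now
        self.spend[key_hash] = [
            (t, n) for t, n in self.spend[key_hash] if now - t < 86400
        ]
        return sum(n for _, n in self.spend[key_hash])

    def try_spend(self, key_hash, tier, tokens, now=None):
        """Record the spend and return True only if it fits the daily budget."""
        now = time.time() if now is None else now
        if self.used(key_hash, now) + tokens > TOKENS_PER_DAY[tier]:
            return False
        self.spend[key_hash].append((now, tokens))
        return True
```

In practice the output token count is only known after generation finishes, so a service would typically check the budget against an estimate before the call and record the actual total afterwards.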

Rate Limit Headers

Tell clients their remaining quota:

python
from fastapi import Response, Security

@app.post("/v1/chat")
async def chat(request: ChatRequest, response: Response,
               api_key: dict = Security(validate_key)):

    # Assumes validate_key also returns "key_hash", and that RateLimiter has a
    # get_remaining() helper returning "limit", "remaining", and "reset_at"
    remaining = rate_limiter.get_remaining(api_key["key_hash"])
    response.headers["X-RateLimit-Limit"] = str(remaining["limit"])
    response.headers["X-RateLimit-Remaining"] = str(remaining["remaining"])
    response.headers["X-RateLimit-Reset"] = str(remaining["reset_at"])

    # ... process request
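
The route above calls `rate_limiter.get_remaining(...)`, which the `RateLimiter` class in this lesson does not define. One possible shape for it is sketched here under stated assumptions: the limiter remembers the tier it last saw for each key so that `get_remaining` can be called with the key hash alone, and `reset_at` is a Unix timestamp. The `MinuteQuota` class name is hypothetical.

```python
import time
from collections import defaultdict

# Per-minute limits copied from the lesson's tiers
RPM_LIMITS = {"free": 10, "basic": 60, "pro": 300}

class MinuteQuota:
    """Hypothetical helper: sliding one-minute window per API key."""

    def __init__(self):
        self.requests = defaultdict(list)  # key_hash -> request timestamps
        self.tiers = {}                    # key_hash -> tier seen at last check

    def check(self, key_hash, tier, now=None):
        """Record the request and return True if it is within the RPM limit."""
        now = time.time() if now is None else now
        self.tiers[key_hash] = tier
        window = [t for t in self.requests[key_hash] if now - t < 60]
        self.requests[key_hash] = window
        if len(window) >= RPM_LIMITS[tier]:
            return False
        window.append(now)
        return True

    def get_remaining(self, key_hash, now=None):
        """Return the values needed for the X-RateLimit-* headers."""
        now = time.time() if now is None else now
        limit = RPM_LIMITS[self.tiers.get(key_hash, "free")]
        window = [t for t in self.requests[key_hash] if now - t < 60]
        # The oldest request in the window expires 60 seconds after it was made
        reset_at = int(window[0] + 60) if window else int(now)
        return {
            "limit": limit,
            "remaining": max(0, limit - len(window)),
            "reset_at": reset_at,
        }
```

When `check()` returns False, the conventional response is HTTP 429 with the same headers attached, so clients can back off until `reset_at`.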

Usage Tracking & Billing

Logging Every Request

python
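# usage.py
import sqlite3
from datetime import datetime, timezone

# SQLite date modifiers for each supported reporting period
PERIODS = {"day": "-1 day", "week": "-7 days", "month": "-1 month"}

def log_usage(key_hash: str, endpoint: str, tokens_in: int,
              tokens_out: int, latency_ms: float):
    # Store UTC in SQLite's own "YYYY-MM-DD HH:MM:SS" format so that string
    # comparisons against datetime('now', ...) behave correctly
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    conn = sqlite3.connect("usage.db")
    conn.execute("""
        INSERT INTO usage_logs
        (key_hash, endpoint, tokens_in, tokens_out, latency_ms, timestamp)
        VALUES (?, ?, ?, ?, ?, ?)
    """, (key_hash, endpoint, tokens_in, tokens_out, latency_ms, timestamp))
    conn.commit()
    conn.close()

def get_usage_summary(key_hash: str, period: str = "month") -> dict:
    conn = sqlite3.connect("usage.db")
    # The period is a rolling window ending now, not a calendar month
    row = conn.execute("""
        SELECT COUNT(*), SUM(tokens_in), SUM(tokens_out)
        FROM usage_logs
        WHERE key_hash = ? AND timestamp > datetime('now', ?)
    """, (key_hash, PERIODS[period])).fetchone()
    conn.close()

    # SUM() returns NULL when no rows match, so default to 0
    tokens_in, tokens_out = row[1] or 0, row[2] or 0
    return {
        "total_requests": row[0],
        "total_tokens_in": tokens_in,
        "total_tokens_out": tokens_out,
        # calculate_cost() must be supplied separately; it maps tokens to a price
        "estimated_cost": calculate_cost(tokens_in, tokens_out),
    }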
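
`get_usage_summary` calls `calculate_cost()`, which is not shown in this lesson. A minimal per-token version might look like the sketch below. The two rates are placeholders chosen only for illustration, not prices from the lesson, and `Decimal` is used so that money is rounded predictably.

```python
from decimal import Decimal

# Hypothetical rates in PKR per 1,000 tokens; substitute your own pricing
RATE_IN_PER_1K = Decimal("0.50")
RATE_OUT_PER_1K = Decimal("1.50")

def calculate_cost(tokens_in: int, tokens_out: int) -> float:
    """Return the estimated cost in PKR, rounded to two decimal places."""
    cost = (
        Decimal(tokens_in) * RATE_IN_PER_1K
        + Decimal(tokens_out) * RATE_OUT_PER_1K
    ) / 1000
    return float(cost.quantize(Decimal("0.01")))
```

Output tokens are priced higher than input tokens here because generation is usually the more expensive step, but that ratio is also an assumption.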

Usage Dashboard Endpoint

python
@app.get("/v1/usage")
async def get_usage(api_key: dict = Security(validate_key)):
    return get_usage_summary(api_key["key_hash"])

Pricing Your AI API

Pricing Models

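| Model               | How It Works                   | Best For                            |
|---------------------|--------------------------------|-------------------------------------|
| Per-token           | Charge per input/output token  | LLM APIs (like OpenAI)              |
| Per-request         | Fixed price per API call       | Simple models (classification, OCR) |
| Tiered subscription | Monthly plans with limits      | SaaS products                       |
| Pay-as-you-go       | Usage-based with no commitment | Enterprise clients                  |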

Example Tier Structure

code
┌─────────────────────────────────────────────────┐
│  FREE TIER — PKR 0/month                        │
│  ├── 100 requests/day                           │
│  ├── 10,000 tokens/day                          │
│  ├── 10 RPM                                     │
│  └── Community support only                     │
├─────────────────────────────────────────────────┤
│  STARTER — PKR 5,000/month ($18)                │
│  ├── 1,000 requests/day                         │
│  ├── 100,000 tokens/day                         │
│  ├── 60 RPM                                     │
│  └── Email support                              │
├─────────────────────────────────────────────────┤
│  PRO — PKR 25,000/month ($90)                   │
│  ├── 10,000 requests/day                        │
│  ├── 1,000,000 tokens/day                       │
│  ├── 300 RPM                                    │
│  ├── Priority support                           │
│  └── Custom model fine-tuning                   │
├─────────────────────────────────────────────────┤
│  ENTERPRISE — Custom pricing                    │
│  ├── Unlimited requests                         │
│  ├── Dedicated GPU instances                    │
│  ├── SLA guarantee (99.9%)                      │
│  └── On-premise deployment option               │
└─────────────────────────────────────────────────┘
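
The tier structure above can also be kept as data, which makes it easy to answer "which plan does this customer need?". The sketch below encodes the three published tiers and returns the cheapest one that covers a usage profile. The Starter row carries the same limits as the `basic` tier in the rate limiter code. The `Tier` type and `cheapest_tier` function are hypothetical names.

```python
from typing import NamedTuple, Optional

class Tier(NamedTuple):
    name: str
    price_pkr: int
    requests_per_day: int
    tokens_per_day: int
    rpm: int

# Published tiers from the lesson, ordered cheapest first
TIERS = [
    Tier("free", 0, 100, 10_000, 10),
    Tier("starter", 5_000, 1_000, 100_000, 60),
    Tier("pro", 25_000, 10_000, 1_000_000, 300),
]

def cheapest_tier(requests_per_day: int, tokens_per_day: int,
                  peak_rpm: int) -> Optional[Tier]:
    """Return the cheapest tier whose limits cover the usage, else None."""
    for tier in TIERS:
        if (requests_per_day <= tier.requests_per_day
                and tokens_per_day <= tier.tokens_per_day
                and peak_rpm <= tier.rpm):
            return tier
    return None  # usage exceeds every published tier: an Enterprise conversation
```

A customer who makes few requests but sends long prompts can still land in a higher tier, because every limit has to fit, not just the request count.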

API Gateway Options

Self-Hosted

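| Tool        | Complexity  | Features                   | Cost       |
|-------------|-------------|----------------------------|------------|
| Nginx + Lua | Medium      | Auth, rate limit, routing  | Free       |
| Kong        | Medium-High | Full gateway, plugins      | Free (OSS) |
| Traefik     | Low-Medium  | Auto-config, Docker-native | Free       |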

Managed (Cloud)

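| Service            | Best For             | Cost                     |
|--------------------|----------------------|--------------------------|
| AWS API Gateway    | AWS infrastructure   | $3.50/million requests   |
| GCP API Gateway    | GCP infrastructure   | $3.00/million requests   |
| Cloudflare Workers | Edge routing, global | $5/month + $0.50/million |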
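
To put the per-request prices in perspective, the arithmetic below estimates a monthly bill from daily request volume. It uses only the per-million rates from the table and deliberately ignores free allowances, data transfer, and any other charges, so treat the result as a rough lower bound. The function name is hypothetical, and current provider pricing should be checked before relying on these rates.

```python
# Per-million-request rates taken from the table in this lesson
PER_MILLION_USD = {"aws_api_gateway": 3.50, "gcp_api_gateway": 3.00}

def monthly_gateway_cost(requests_per_day: int, per_million_usd: float,
                         days: int = 30) -> float:
    """Estimate the monthly request charge in USD for a managed gateway."""
    monthly_requests = requests_per_day * days
    return monthly_requests * per_million_usd / 1_000_000
```

For example, a key on the 10,000 requests/day limit sends at most 300,000 requests in 30 days, so `monthly_gateway_cost(10_000, PER_MILLION_USD["aws_api_gateway"])` comes to about $1.05 in request charges.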

The Practical Choice for Pakistan

For most Pakistani AI startups:

  1. Start with: Nginx + custom FastAPI middleware (free, full control)
  2. Scale to: Kong or Traefik when you have 5+ microservices
  3. Enterprise: AWS/GCP API Gateway when clients require it

Practice Lab

Task 1: API Key System
Implement the API key authentication system from this lesson. Create a /admin/keys endpoint that generates new keys. Protect your /v1/chat endpoint with key validation.

Task 2: Rate Limiter
Add per-key rate limiting with 3 tiers (free/basic/pro). Test by hitting the API rapidly and verify that rate limit headers are returned and requests are blocked after the limit.

Task 3: Usage Dashboard
Build a /v1/usage endpoint that returns total requests, tokens consumed, and estimated cost for the authenticated key this month.

Pakistan Case Study

Meet Hamza — built an Urdu sentiment analysis API for Pakistani e-commerce brands.

His API business model:

  • Free tier: 100 calls/day (enough for testing)
  • Starter: PKR 8,000/month for 5,000 calls/day
  • Pro: PKR 30,000/month for unlimited calls + custom model

His infrastructure:

  • FastAPI + custom auth middleware (no expensive gateway)
  • SQLite for API keys and usage (simple, works)
  • Nginx reverse proxy with rate limiting
  • Single Hetzner GPU VPS (PKR 12,000/month)

Revenue after 6 months:

  • 3 free tier users (future conversions)
  • 8 Starter clients: PKR 64,000/month
  • 2 Pro clients: PKR 60,000/month
  • Total revenue: PKR 124,000/month
  • Infrastructure cost: PKR 12,000/month
  • Profit margin: 90%
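
The revenue figures above can be checked with a few lines of arithmetic. This sketch counts only the infrastructure cost listed in the case study, so the margin it produces excludes anything the case study does not mention, such as Hamza's own time. The `gross_margin` function name is hypothetical.

```python
def gross_margin(revenue: int, cost: int) -> float:
    """Return (revenue - cost) / revenue as a fraction."""
    return (revenue - cost) / revenue

starter_revenue = 8 * 8_000   # 8 Starter clients at PKR 8,000/month
pro_revenue = 2 * 30_000      # 2 Pro clients at PKR 30,000/month
revenue = starter_revenue + pro_revenue
infrastructure = 12_000       # single GPU VPS, PKR/month

margin = gross_margin(revenue, infrastructure)
print(f"Revenue: PKR {revenue:,}/month, margin: {margin:.1%}")
```

The totals match the case study: PKR 124,000 in monthly revenue and a margin of roughly 90% on infrastructure cost.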

His key decision: "I could have used AWS API Gateway and paid $200/month for something I built in 2 hours with FastAPI. At our scale, self-hosted is the right call. I'll switch to managed when we hit 50+ clients."

Key Takeaways

  • API gateways handle auth, rate limiting, usage tracking, and billing
  • Start simple: FastAPI middleware + SQLite is enough for 0-50 clients
  • API key auth is the standard for AI APIs — hash keys, never store plaintext
  • Rate limit by tier: free (10 RPM), basic (60 RPM), pro (300 RPM)
  • Track every request for billing: key, tokens in/out, latency, timestamp
  • Price based on value tiers, not just tokens — subscription models create predictable revenue
  • Self-hosted beats managed gateways until you hit enterprise scale

Next lesson: Cloud cost analysis — comparing AWS, GCP, and local infrastructure for Pakistan.
