7.3 — API Gateway & Authentication for AI Products
You have a working AI API behind a load balancer. Now you need to decide: who can use it, how much they can use, and how you charge them. An API gateway sits in front of your AI services and handles authentication, rate limiting, usage tracking, and billing — the infrastructure that turns a technical project into a business.
What an API Gateway Does
```text
            Client request
                  │
                  ▼
┌────────────────────────────────────┐
│            API GATEWAY             │
│ ├── Authentication (who are you?)  │
│ ├── Rate limiting (slow down!)     │
│ ├── Usage tracking (logging)       │
│ ├── Request routing (which API?)   │
│ ├── Response caching (faster!)     │
│ └── Billing (how much to charge)   │
└─────────────────┬──────────────────┘
                  │
      ┌───────────┼───────────┐
      ▼           ▼           ▼
   LLM API    Image API   Voice API
```
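The order in the diagram matters: a request is authenticated before it is rate limited, and only then routed to a backend. A minimal, framework-free sketch of that pipeline follows. It assumes a hypothetical in-memory key table and made-up upstream URLs; `KEYS`, `ROUTES`, `LIMITS`, and `handle` are illustrative names, not part of any library.

```python
from collections import defaultdict
from typing import Optional, Tuple

# Hypothetical key table (key_hash -> tier) and upstream services
KEYS = {"hash-abc": "free", "hash-xyz": "pro"}
ROUTES = {
    "/v1/chat": "http://llm-service:8000",
    "/v1/image": "http://image-service:8001",
    "/v1/voice": "http://voice-service:8002",
}
LIMITS = {"free": 2, "pro": 100}  # requests per window, kept tiny for the demo

request_counts = defaultdict(int)


def handle(key_hash: Optional[str], path: str) -> Tuple[int, str]:
    """Apply the gateway steps in order; return (status, upstream or error)."""
    # 1. Authentication: who are you?
    tier = KEYS.get(key_hash) if key_hash else None
    if tier is None:
        return 401, "Invalid API key"
    # 2. Rate limiting: has this key used up its window?
    if request_counts[key_hash] >= LIMITS[tier]:
        return 429, "Rate limit exceeded"
    request_counts[key_hash] += 1  # counts every authenticated call, even 404s
    # 3. Routing: which backend serves this path?
    upstream = ROUTES.get(path)
    if upstream is None:
        return 404, "Unknown endpoint"
    # 4. Usage tracking and billing would record the call here
    return 200, upstream
```

A missing or unknown key never reaches the rate limiter, so anonymous traffic cannot consume a paying customer's quota.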
API Key Authentication
Simple API Key System with FastAPI
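```python
# auth.py
import hashlib
import secrets
import sqlite3

from fastapi import HTTPException, Security
from fastapi.security import APIKeyHeader

DB_PATH = "api_keys.db"
api_key_header = APIKeyHeader(name="X-API-Key")  # read the key from this header

def init_db() -> None:
    # Call once at startup so the table exists before keys are created
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS api_keys (
            key_hash   TEXT PRIMARY KEY,
            label      TEXT NOT NULL,
            tier       TEXT NOT NULL DEFAULT 'free',
            is_active  INTEGER NOT NULL DEFAULT 1,
            created_at TEXT NOT NULL
        )
    """)
    conn.commit()
    conn.close()

def hash_key(key: str) -> str:
    # Only this SHA-256 digest is stored, never the plaintext key
    return hashlib.sha256(key.encode()).hexdigest()

def create_api_key(label: str, tier: str = "free") -> str:
    key = f"sk-{secrets.token_hex(24)}"
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""
        INSERT INTO api_keys (key_hash, label, tier, created_at)
        VALUES (?, ?, ?, datetime('now'))
    """, (hash_key(key), label, tier))
    conn.commit()
    conn.close()
    return key  # Only returned once; the client must save it

async def validate_key(key: str = Security(api_key_header)) -> dict:
    key_hash = hash_key(key)
    conn = sqlite3.connect(DB_PATH)
    row = conn.execute(
        "SELECT label, tier, is_active FROM api_keys WHERE key_hash = ?",
        (key_hash,)
    ).fetchone()
    conn.close()
    if not row:
        raise HTTPException(status_code=401, detail="Invalid API key")
    if not row[2]:
        raise HTTPException(status_code=403, detail="API key disabled")
    # key_hash is returned so routes can use it for rate limiting and usage logs
    return {"key_hash": key_hash, "label": row[0], "tier": row[1]}
```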
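Because only the hash is stored, a leaked database does not reveal usable keys, and validation simply hashes the presented key and looks up the result. A standalone sketch of that round trip, using an in-memory SQLite database; the `issue` and `lookup` helpers are illustrative and not part of `auth.py`.

```python
import hashlib
import secrets
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_keys (key_hash TEXT PRIMARY KEY, tier TEXT)")


def hash_key(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()


def issue(tier: str) -> str:
    key = f"sk-{secrets.token_hex(24)}"  # 48 hex characters after the prefix
    conn.execute("INSERT INTO api_keys VALUES (?, ?)", (hash_key(key), tier))
    return key  # the plaintext exists only in this return value


def lookup(key: str) -> Optional[str]:
    row = conn.execute(
        "SELECT tier FROM api_keys WHERE key_hash = ?", (hash_key(key),)
    ).fetchone()
    return row[0] if row else None


key = issue("pro")
stored = conn.execute("SELECT key_hash FROM api_keys").fetchone()[0]
print(key in stored)      # False: the plaintext never reaches the table
print(lookup(key))        # pro
print(lookup(key + "x"))  # None: any change to the key fails the lookup
```

A plain, unsalted SHA-256 is reasonable here because the keys are 192 bits of random data; passwords chosen by humans would instead need a slow, salted hash.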
Using It in Routes
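```python
from fastapi import FastAPI, Security
from pydantic import BaseModel
from auth import validate_key

app = FastAPI()
class ChatRequest(BaseModel):
    message: str  # hypothetical request schema

@app.post("/v1/chat")
async def chat(request: ChatRequest, api_key: dict = Security(validate_key)):
    # api_key contains e.g. {"label": "acme-corp", "tier": "pro"}
    # Use tier to set limits
    max_tokens = 4096 if api_key["tier"] == "pro" else 512
    # ... process request
```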
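If clients are allowed to send their own `max_tokens` (an assumption; the lesson's request schema does not define it), the tier value should act as a ceiling rather than a fixed number. A small sketch that keeps the route's caps in a table and clamps the request; unknown tiers fall back to 512, matching the `else` branch above.

```python
from typing import Optional

# Output caps from the route above: "pro" gets 4096, every other tier 512
TIER_MAX_TOKENS = {"free": 512, "pro": 4096}


def effective_max_tokens(tier: str, requested: Optional[int] = None) -> int:
    cap = TIER_MAX_TOKENS.get(tier, 512)  # basic and unknown tiers get 512
    if requested is None:
        return cap
    return max(1, min(requested, cap))  # never above the cap, never below 1
```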
Rate Limiting
Per-Key Rate Limits
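```python
# rate_limiter.py
import time
from collections import defaultdict


class RateLimiter:
    """Sliding-window limiter kept in process memory. State is not shared
    across multiple workers or servers, and it resets on restart."""

    TIER_LIMITS = {
        "free":  {"rpm": 10,  "rpd": 100,   "tokens_per_day": 10_000},
        "basic": {"rpm": 60,  "rpd": 1000,  "tokens_per_day": 100_000},
        "pro":   {"rpm": 300, "rpd": 10000, "tokens_per_day": 1_000_000},
    }  # tokens_per_day is not enforced by this class

    def __init__(self):
        # key_hash -> timestamps of accepted requests in the last 24 hours
        self.requests = defaultdict(list)

    def check(self, key_hash: str, tier: str) -> bool:
        now = time.time()
        limits = self.TIER_LIMITS[tier]
        # Drop entries older than 24 hours
        self.requests[key_hash] = [
            t for t in self.requests[key_hash] if now - t < 86400
        ]
        # Check requests per minute
        recent_minute = [t for t in self.requests[key_hash] if now - t < 60]
        if len(recent_minute) >= limits["rpm"]:
            return False
        # Check requests per day
        if len(self.requests[key_hash]) >= limits["rpd"]:
            return False
        self.requests[key_hash].append(now)
        return True

    def get_remaining(self, key_hash: str, tier: str) -> dict:
        """Per-minute quota for the X-RateLimit-* headers."""
        now = time.time()
        limit = self.TIER_LIMITS[tier]["rpm"]
        recent = [t for t in self.requests[key_hash] if now - t < 60]
        # A slot frees up when the oldest recent request turns 60 s old
        reset_at = (min(recent) + 60) if recent else now
        return {"limit": limit,
                "remaining": max(0, limit - len(recent)),
                "reset_at": int(reset_at)}  # Unix timestamp, seconds


rate_limiter = RateLimiter()  # one shared instance per process
```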
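`TIER_LIMITS` also defines `tokens_per_day`, but `check()` only counts requests. A minimal sketch of a separate daily token budget follows, assuming token counts are known once each response completes and that budgets reset at midnight UTC. `TokenBudget` is a hypothetical class, not part of the lesson's code.

```python
import time
from collections import defaultdict
from typing import Optional

# Daily token budgets, copied from RateLimiter.TIER_LIMITS
TOKENS_PER_DAY = {"free": 10_000, "basic": 100_000, "pro": 1_000_000}


class TokenBudget:
    """Hypothetical daily token budget; days roll over at midnight UTC."""

    def __init__(self):
        # (key_hash, day number) -> tokens used that day; old days are
        # never pruned in this sketch
        self.used = defaultdict(int)

    @staticmethod
    def _day(now: float) -> int:
        return int(now // 86400)  # whole days since the Unix epoch (UTC)

    def remaining(self, key_hash: str, tier: str, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        used = self.used[(key_hash, self._day(now))]
        return max(0, TOKENS_PER_DAY[tier] - used)

    def allow(self, key_hash: str, tier: str, now: Optional[float] = None) -> bool:
        # Check before calling the model
        return self.remaining(key_hash, tier, now) > 0

    def record(self, key_hash: str, tokens: int, now: Optional[float] = None) -> None:
        # Call after the model responds, with tokens_in + tokens_out
        now = time.time() if now is None else now
        self.used[(key_hash, self._day(now))] += tokens
```

Because tokens are counted after the response, one large request can overshoot the budget slightly; capping `max_tokens` per tier bounds that overshoot.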
Rate Limit Headers
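Tell clients their remaining quota, and reject requests once it is used up:
```python
# Continues the routes module (app, ChatRequest, validate_key defined above)
from fastapi import HTTPException, Response, Security
from rate_limiter import RateLimiter

rate_limiter = RateLimiter()

@app.post("/v1/chat")
async def chat(request: ChatRequest, response: Response,
               api_key: dict = Security(validate_key)):
    key_hash, tier = api_key["key_hash"], api_key["tier"]
    allowed = rate_limiter.check(key_hash, tier)  # records the request if allowed
    quota = rate_limiter.get_remaining(key_hash, tier)
    headers = {"X-RateLimit-Limit": str(quota["limit"]),
               "X-RateLimit-Remaining": str(quota["remaining"]),
               "X-RateLimit-Reset": str(quota["reset_at"])}
    if not allowed:  # 429 Too Many Requests, with the quota headers attached
        raise HTTPException(status_code=429, detail="Rate limit exceeded", headers=headers)
    response.headers.update(headers)
    # ... process request
```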
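On the client side, these headers let a caller pause instead of retrying blindly. A small sketch, assuming `X-RateLimit-Reset` is a Unix timestamp in seconds; `seconds_to_wait` is a hypothetical helper, and the plain `dict` lookups here are case-sensitive, unlike the header objects most HTTP client libraries return.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], status: int,
                    now: Optional[float] = None) -> float:
    """How long a client should pause before its next request."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if status != 429 and remaining > 0:
        return 0.0  # quota left: no need to wait
    # Out of quota: wait until the advertised reset (default: one minute)
    reset_at = float(headers.get("X-RateLimit-Reset", now + 60))
    return max(0.0, reset_at - now)
```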
Usage Tracking & Billing
Logging Every Request
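```python
# usage.py
import sqlite3

DB_PATH = "usage.db"
# Hypothetical prices in PKR per 1,000 tokens; set these to your own rates
PRICE_PER_1K_IN = 0.5
PRICE_PER_1K_OUT = 1.5
# SQLite date modifiers marking the start of each reporting period
PERIOD_START = {"day": "start of day", "month": "start of month"}

def init_db() -> None:
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS usage_logs (
            key_hash TEXT, endpoint TEXT, tokens_in INTEGER,
            tokens_out INTEGER, latency_ms REAL, timestamp TEXT
        )
    """)
    conn.commit()
    conn.close()

def calculate_cost(tokens_in: int, tokens_out: int) -> float:
    return round(tokens_in / 1000 * PRICE_PER_1K_IN
                 + tokens_out / 1000 * PRICE_PER_1K_OUT, 2)

def log_usage(key_hash: str, endpoint: str, tokens_in: int,
              tokens_out: int, latency_ms: float):
    conn = sqlite3.connect(DB_PATH)
    # datetime('now') stores UTC as "YYYY-MM-DD HH:MM:SS", the same format
    # the summary query compares against
    conn.execute("""
        INSERT INTO usage_logs
            (key_hash, endpoint, tokens_in, tokens_out, latency_ms, timestamp)
        VALUES (?, ?, ?, ?, ?, datetime('now'))
    """, (key_hash, endpoint, tokens_in, tokens_out, latency_ms))
    conn.commit()
    conn.close()

def get_usage_summary(key_hash: str, period: str = "month") -> dict:
    conn = sqlite3.connect(DB_PATH)
    row = conn.execute("""
        SELECT COUNT(*), SUM(tokens_in), SUM(tokens_out)
        FROM usage_logs
        WHERE key_hash = ? AND timestamp >= datetime('now', ?)
    """, (key_hash, PERIOD_START[period])).fetchone()
    conn.close()
    tokens_in, tokens_out = row[1] or 0, row[2] or 0  # SUM is NULL with no rows
    return {
        "total_requests": row[0],
        "total_tokens_in": tokens_in,
        "total_tokens_out": tokens_out,
        "estimated_cost": calculate_cost(tokens_in, tokens_out),
    }
```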
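`log_usage()` needs a latency and token counts, and both have to be measured around the model call itself. A sketch of a wrapper that does this with `time.perf_counter()`, assuming a hypothetical model function that returns its text plus input and output token counts. The `log` callback stands in for `log_usage`, and `fake_model` is a placeholder, not a real LLM client.

```python
import time
from typing import Callable, Tuple

# Hypothetical model signature: prompt -> (text, tokens_in, tokens_out)
ModelFn = Callable[[str], Tuple[str, int, int]]


def call_and_log(model: ModelFn, prompt: str, key_hash: str, endpoint: str,
                 log: Callable[..., None]) -> str:
    start = time.perf_counter()
    text, tokens_in, tokens_out = model(prompt)  # if this raises, nothing is logged
    latency_ms = (time.perf_counter() - start) * 1000
    # Same argument order as log_usage(key_hash, endpoint, tokens_in, tokens_out, latency_ms)
    log(key_hash, endpoint, tokens_in, tokens_out, latency_ms)
    return text


def fake_model(prompt: str) -> Tuple[str, int, int]:
    # Stand-in for a real LLM call; counts words as "tokens" for the demo
    time.sleep(0.01)
    return "ok", len(prompt.split()), 1
```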
Usage Dashboard Endpoint
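```python
from usage import get_usage_summary

@app.get("/v1/usage")
async def get_usage(api_key: dict = Security(validate_key)):
    # Requests, tokens, and estimated cost for this key in the current month
    return get_usage_summary(api_key["key_hash"], period="month")
```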
Pricing Your AI API
Pricing Models
| Model | How It Works | Best For |
|---|---|---|
| Per-token | Charge per input/output token | LLM APIs (like OpenAI) |
| Per-request | Fixed price per API call | Simple models (classification, OCR) |
| Tiered subscription | Monthly plans with limits | SaaS products |
| Pay-as-you-go | Usage-based with no commitment | Enterprise clients |
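To see how the first two models diverge, here is a worked comparison for one hypothetical customer. Every price below is an illustrative assumption in PKR, not a recommendation.

```python
# Illustrative prices (PKR); assumptions for this example only
PER_1K_INPUT = 0.5   # per-token model: input tokens
PER_1K_OUTPUT = 1.5  # per-token model: output tokens
PER_REQUEST = 0.75   # per-request model: flat fee per call


def per_token_bill(requests: int, avg_in: int, avg_out: int) -> float:
    total_in = requests * avg_in
    total_out = requests * avg_out
    return total_in / 1000 * PER_1K_INPUT + total_out / 1000 * PER_1K_OUTPUT


def per_request_bill(requests: int) -> float:
    return requests * PER_REQUEST


# One month of 30,000 calls averaging 400 input and 200 output tokens
print(per_token_bill(30_000, 400, 200))  # 15000.0
print(per_request_bill(30_000))          # 22500.0
```

Per-token billing follows the real compute cost when prompt sizes vary widely; a flat per-request fee is easier to explain when every call does roughly the same work, as with classification or OCR.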
Example Tier Structure
```text
┌────────────────────────────────────────┐
│ FREE TIER: PKR 0/month                 │
│ ├── 100 requests/day                   │
│ ├── 10,000 tokens/day                  │
│ ├── 10 RPM                             │
│ └── Community support only             │
├────────────────────────────────────────┤
│ STARTER: PKR 5,000/month ($18)         │
│ ├── 1,000 requests/day                 │
│ ├── 100,000 tokens/day                 │
│ ├── 60 RPM                             │
│ └── Email support                      │
├────────────────────────────────────────┤
│ PRO: PKR 25,000/month ($90)            │
│ ├── 10,000 requests/day                │
│ ├── 1,000,000 tokens/day               │
│ ├── 300 RPM                            │
│ ├── Priority support                   │
│ └── Custom model fine-tuning           │
├────────────────────────────────────────┤
│ ENTERPRISE: Custom pricing             │
│ ├── Unlimited requests                 │
│ ├── Dedicated GPU instances            │
│ ├── SLA guarantee (99.9%)              │
│ └── On-premise deployment option       │
└────────────────────────────────────────┘
```
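Given this table, a customer belongs on the cheapest plan whose limits cover their peak usage. A sketch that encodes the table's numbers; the function name and the rule of checking all three limits together are assumptions for illustration.

```python
from typing import Optional, Tuple

# (name, PKR per month, limits) from the tier table, cheapest first
TIERS = [
    ("FREE", 0, {"requests_per_day": 100, "tokens_per_day": 10_000, "rpm": 10}),
    ("STARTER", 5_000, {"requests_per_day": 1_000, "tokens_per_day": 100_000, "rpm": 60}),
    ("PRO", 25_000, {"requests_per_day": 10_000, "tokens_per_day": 1_000_000, "rpm": 300}),
]


def cheapest_tier(requests_per_day: int, tokens_per_day: int,
                  peak_rpm: int) -> Tuple[str, Optional[int]]:
    for name, price, limits in TIERS:
        if (requests_per_day <= limits["requests_per_day"]
                and tokens_per_day <= limits["tokens_per_day"]
                and peak_rpm <= limits["rpm"]):
            return name, price
    return "ENTERPRISE", None  # beyond PRO limits: custom pricing
```

Note that a single limit can force an upgrade: a customer with modest daily volume but bursts above 60 RPM already needs PRO.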
API Gateway Options
Self-Hosted
| Tool | Complexity | Features | Cost |
|---|---|---|---|
| Nginx + Lua | Medium | Auth, rate limit, routing | Free |
| Kong | Medium-High | Full gateway, plugins | Free (OSS) |
| Traefik | Low-Medium | Auto-config, Docker-native | Free |
Managed (Cloud)
| Service | Best For | Cost |
|---|---|---|
| AWS API Gateway | AWS infrastructure | $3.50/million requests (REST APIs; HTTP APIs cost less) |
| GCP API Gateway | GCP infrastructure | $3.00/million requests |
| Cloudflare Workers | Edge routing, global | $5/month + $0.50/million |
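Managed gateway fees scale linearly with traffic. A worked estimate using only the headline rates in the table, deliberately ignoring free allowances, included request quotas, and data-transfer charges, all of which change the numbers at low volume:

```python
# Headline rates from the table above, USD; free allowances and
# data-transfer charges are deliberately ignored in this sketch
PRICING = {
    "AWS API Gateway": {"base": 0.0, "per_million": 3.50},
    "GCP API Gateway": {"base": 0.0, "per_million": 3.00},
    "Cloudflare Workers": {"base": 5.0, "per_million": 0.50},
}


def monthly_cost(service: str, requests: int) -> float:
    p = PRICING[service]
    return round(p["base"] + requests / 1_000_000 * p["per_million"], 2)


for name in PRICING:
    print(f"{name}: ${monthly_cost(name, 3_000_000):.2f} for 3M requests/month")
# AWS API Gateway: $10.50 for 3M requests/month
# GCP API Gateway: $9.00 for 3M requests/month
# Cloudflare Workers: $6.50 for 3M requests/month
```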
The Practical Choice for Pakistan
For most Pakistani AI startups:
- Start with: Nginx + custom FastAPI middleware (free, full control)
- Scale to: Kong or Traefik when you have 5+ microservices
- Enterprise: AWS/GCP API Gateway when clients require it
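The "custom FastAPI middleware" option can be as small as a plain ASGI wrapper, since FastAPI apps are ASGI applications. A sketch that rejects HTTP requests without an `X-API-Key` header before they reach any route. It relies only on the ASGI spec (header names in `scope["headers"]` are lowercase byte strings); `RequireAPIKeyMiddleware` is an illustrative name, and real validation would reuse the `validate_key` lookup rather than the presence check shown here.

```python
import json


class RequireAPIKeyMiddleware:
    """Pure ASGI middleware: returns 401 for HTTP requests lacking X-API-Key."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)  # pass lifespan/websocket through
            return
        headers = dict(scope["headers"])  # ASGI header names are lowercase bytes
        if not headers.get(b"x-api-key"):
            body = json.dumps({"detail": "Missing API key"}).encode()
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"application/json"),
                            (b"content-length", str(len(body)).encode())],
            })
            await send({"type": "http.response.body", "body": body})
            return
        await self.app(scope, receive, send)
```

With FastAPI it would be registered as `app.add_middleware(RequireAPIKeyMiddleware)`.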
Practice Lab
Task 1: API Key System
Implement the API key authentication system from this lesson. Create a /admin/keys endpoint that generates new keys. Protect your /v1/chat endpoint with key validation.
Task 2: Rate Limiter
Add per-key rate limiting with 3 tiers (free/basic/pro). Test by hitting the API rapidly and verify that rate limit headers are returned and that requests are blocked once the limit is reached.
Task 3: Usage Dashboard
Build a /v1/usage endpoint that returns total requests, tokens consumed, and estimated cost for the authenticated key this month.
Pakistan Case Study
Meet Hamza — built an Urdu sentiment analysis API for Pakistani e-commerce brands.
His API business model:
- Free tier: 100 calls/day (enough for testing)
- Starter: PKR 8,000/month for 5,000 calls/day
- Pro: PKR 30,000/month for unlimited calls + custom model
His infrastructure:
- FastAPI + custom auth middleware (no expensive gateway)
- SQLite for API keys and usage (simple, works)
- Nginx reverse proxy with rate limiting
- Single Hetzner GPU VPS (PKR 12,000/month)
Revenue after 6 months:
- 3 free tier users (future conversions)
- 8 Starter clients: PKR 64,000/month
- 2 Pro clients: PKR 60,000/month
- Total revenue: PKR 124,000/month
- Infrastructure cost: PKR 12,000/month
- Margin after infrastructure costs: about 90% ((124,000 - 12,000) / 124,000 = 90.3%)
His key decision: "I could have used AWS API Gateway and paid $200/month for something I built in 2 hours with FastAPI. At our scale, self-hosted is the right call. I'll switch to managed when we hit 50+ clients."
Key Takeaways
- API gateways handle auth, rate limiting, usage tracking, and billing
- Start simple: FastAPI middleware + SQLite is enough for 0-50 clients
- API key auth is the standard for AI APIs — hash keys, never store plaintext
- Rate limit by tier: free (10 RPM), basic (60 RPM), pro (300 RPM)
- Track every request for billing: key, tokens in/out, latency, timestamp
- Price based on value tiers, not just tokens — subscription models create predictable revenue
- Self-hosted beats managed gateways until you hit enterprise scale
Next lesson: Cloud cost analysis — comparing AWS, GCP, and local infrastructure for Pakistan.