API Gateway & Authentication for AI Products

You have a working AI API behind a load balancer. Now you need to decide: who can use it, how much they can use, and how you charge them. An API gateway sits in front of your AI services and handles authentication, rate limiting, usage tracking, and billing — the infrastructure that turns a technical project into a business.

What an API Gateway Does

code

Client request
    │
    ▼
┌───────────────────────────────────┐
│  API GATEWAY                      │
│  ├── Authentication (who are you?)│
│  ├── Rate limiting (slow down!)   │
│  ├── Usage tracking (logging)     │
│  ├── Request routing (which API?) │
│  ├── Response caching (faster!)   │
│  └── Billing (how much to charge) │
└───────────────┬───────────────────┘
                │
    ┌───────────┼───────────┐
    ▼           ▼           ▼
  LLM API    Image API   Voice API

API Key Authentication

Simple API Key System with FastAPI

python

# auth.py
import sqlite3
import hashlib
import secrets
from fastapi import HTTPException, Security
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key")

def hash_key(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()

def create_api_key(label: str, tier: str = "free") -> str:
    key = f"sk-{secrets.token_hex(24)}"

    conn = sqlite3.connect("api_keys.db")
    conn.execute("""
        INSERT INTO api_keys (key_hash, label, tier, created_at)
        VALUES (?, ?, ?, datetime('now'))
    """, (hash_key(key), label, tier))
    conn.commit()
    conn.close()

    return key  # Only returned once — client must save it

async def validate_key(key: str = Security(api_key_header)) -> dict:
    key_hash = hash_key(key)
    conn = sqlite3.connect("api_keys.db")
    try:
        row = conn.execute(
            "SELECT label, tier, is_active FROM api_keys WHERE key_hash = ?",
            (key_hash,)
        ).fetchone()
    finally:
        conn.close()  # Close the connection even if the query fails

    if not row:
        raise HTTPException(status_code=401, detail="Invalid API key")
    if not row[2]:
        raise HTTPException(status_code=403, detail="API key disabled")

    # key_hash is returned so rate limiting and usage tracking can identify the caller
    return {"key_hash": key_hash, "label": row[0], "tier": row[1]}
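The functions above assume that an api_keys table already exists in api_keys.db. The lesson does not show the schema, so the following is only a minimal sketch of a one-time setup step: the column types, the is_active default, and the init_db name are assumptions chosen to match the INSERT and SELECT statements in auth.py.

```python
import sqlite3

def init_db(path: str = "api_keys.db") -> None:
    """Create the api_keys table if it does not exist (assumed schema)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS api_keys (
                key_hash   TEXT PRIMARY KEY,            -- SHA-256 hex digest, never the raw key
                label      TEXT NOT NULL,
                tier       TEXT NOT NULL DEFAULT 'free',
                is_active  INTEGER NOT NULL DEFAULT 1,  -- 1 = enabled, 0 = disabled
                created_at TEXT NOT NULL
            )
        """)
        conn.commit()
    finally:
        conn.close()
```

Calling init_db() once at application startup is enough; because of IF NOT EXISTS, calling it again is harmless. Making key_hash the primary key also gives the lookup in validate_key an index.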

Using It in Routes

python

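from fastapi import FastAPI, Security
from pydantic import BaseModel
from auth import validate_key

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str  # Assumed request shape for illustration

@app.post("/v1/chat")
async def chat(request: ChatRequest, api_key: dict = Security(validate_key)):
    # api_key includes the key's label and tier, e.g. "acme-corp" and "pro"
    # Use tier to set limits
    max_tokens = 4096 if api_key["tier"] == "pro" else 512
    # ... process request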

Rate Limiting

Per-Key Rate Limits

python

# rate_limiter.py
import time
from collections import defaultdict

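class RateLimiter:
    # rpm = requests per minute, rpd = requests per day.
    # tokens_per_day is listed for reference; check() below does not enforce it.
    TIER_LIMITS = {
        "free":  {"rpm": 10,  "rpd": 100,   "tokens_per_day": 10000},
        "basic": {"rpm": 60,  "rpd": 1000,  "tokens_per_day": 100000},
        "pro":   {"rpm": 300, "rpd": 10000, "tokens_per_day": 1000000},
    }

    def __init__(self):
        # key_hash -> timestamps of accepted requests in the last 24 hours
        # (in memory only: lost on restart and not shared between processes)
        self.requests = defaultdict(list)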

    def check(self, key_hash: str, tier: str) -> bool:
        now = time.time()
        limits = self.TIER_LIMITS[tier]

        # Clean old entries
        self.requests[key_hash] = [
            t for t in self.requests[key_hash] if now - t < 86400
        ]

        # Check requests per minute
        recent_minute = [t for t in self.requests[key_hash] if now - t < 60]
        if len(recent_minute) >= limits["rpm"]:
            return False

        # Check requests per day
        if len(self.requests[key_hash]) >= limits["rpd"]:
            return False

        self.requests[key_hash].append(now)
        return True
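A later route in this lesson calls rate_limiter.get_remaining(...), which the class above does not define. One way such a helper could work is sketched below as a standalone function over a key's recorded timestamps. The function signature, the choice to report the per-minute window, and the reset_at definition are all assumptions, not the lesson's implementation.

```python
import time
from typing import List, Optional

def get_remaining(timestamps: List[float], rpm_limit: int,
                  now: Optional[float] = None) -> dict:
    """Report the per-minute quota for one key from its recorded request times."""
    now = time.time() if now is None else now

    # Only requests from the last 60 seconds count against the per-minute limit.
    recent = [t for t in timestamps if now - t < 60]
    remaining = max(0, rpm_limit - len(recent))

    # A slot frees up when the oldest request in the window turns 60 seconds old.
    reset_at = int(min(recent) + 60) if recent else int(now)

    return {"limit": rpm_limit, "remaining": remaining, "reset_at": reset_at}
```

To fold this into RateLimiter, the method would read self.requests[key_hash] for the timestamps and self.TIER_LIMITS[tier]["rpm"] for the limit, which means it needs the tier as well as the key hash.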

Rate Limit Headers

Tell clients their remaining quota:

python

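from fastapi import HTTPException, Response, Security
from rate_limiter import RateLimiter

rate_limiter = RateLimiter()  # One shared, in-memory instance per process

@app.post("/v1/chat")
async def chat(request: ChatRequest, response: Response,
               api_key: dict = Security(validate_key)):
    # Reject the request before doing any model work if the key is over its limit
    if not rate_limiter.check(api_key["key_hash"], api_key["tier"]):
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

    # get_remaining() is not part of RateLimiter as written above; it is assumed
    # to return a dict with "limit", "remaining", and "reset_at" for this key
    remaining = rate_limiter.get_remaining(api_key["key_hash"])
    response.headers["X-RateLimit-Limit"] = str(remaining["limit"])
    response.headers["X-RateLimit-Remaining"] = str(remaining["remaining"])
    response.headers["X-RateLimit-Reset"] = str(remaining["reset_at"])

    # ... process request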
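When a key is over its limit, the conventional response is HTTP 429 (Too Many Requests), usually with a Retry-After header telling the client how long to wait. The sketch below builds the status code and headers together as plain Python, independent of any web framework. The function name and argument layout are assumptions for illustration; the X-RateLimit-* names are a widely used convention rather than a formal standard.

```python
import math
from typing import Dict, Tuple

def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_at: int, now: float) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers) for a rate-limit decision."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_at),
    }
    if allowed:
        return 200, headers

    # Retry-After is in whole seconds; round up so clients do not retry too early,
    # and never send a value below 1.
    headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
    return 429, headers
```

In a FastAPI route, the blocked case would typically be raised as an HTTPException with status_code=429 and these headers attached, while the allowed case copies the headers onto the normal response.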

Usage Tracking & Billing

Logging Every Request

python

# usage.py
import sqlite3

def log_usage(key_hash: str, endpoint: str, tokens_in: int,
              tokens_out: int, latency_ms: float):
    conn = sqlite3.connect("usage.db")
    # datetime('now') stores UTC as 'YYYY-MM-DD HH:MM:SS', the same format
    # that SQLite's date functions produce when the logs are queried later
    conn.execute("""
        INSERT INTO usage_logs
        (key_hash, endpoint, tokens_in, tokens_out, latency_ms, timestamp)
        VALUES (?, ?, ?, ?, ?, datetime('now'))
    """, (key_hash, endpoint, tokens_in, tokens_out, latency_ms))
    conn.commit()
    conn.close()

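def get_usage_summary(key_hash: str, period: str = "month") -> dict:
    # Map the period name to a SQLite date modifier.
    # These are rolling windows (e.g. the last month), not calendar months.
    windows = {"day": "-1 day", "week": "-7 days", "month": "-1 month"}
    if period not in windows:
        raise ValueError(f"Unsupported period: {period}")

    conn = sqlite3.connect("usage.db")
    # datetime(timestamp) normalizes stored values before comparing them
    row = conn.execute("""
        SELECT COUNT(*), SUM(tokens_in), SUM(tokens_out)
        FROM usage_logs
        WHERE key_hash = ? AND datetime(timestamp) > datetime('now', ?)
    """, (key_hash, windows[period])).fetchone()
    conn.close()

    # SUM() returns NULL (None) when no rows match, so fall back to 0
    tokens_in, tokens_out = row[1] or 0, row[2] or 0
    return {
        "total_requests": row[0],
        "total_tokens_in": tokens_in,
        "total_tokens_out": tokens_out,
        # calculate_cost(tokens_in, tokens_out) must be supplied separately;
        # it is not defined in usage.py
        "estimated_cost": calculate_cost(tokens_in, tokens_out)
    }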
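get_usage_summary calls calculate_cost, which the lesson leaves undefined. A minimal sketch of a per-token version follows. The two prices are hypothetical placeholders (PKR per 1,000 tokens), not figures from this lesson, and Decimal is used only to avoid floating-point drift while summing money.

```python
from decimal import Decimal

# Hypothetical rates in PKR per 1,000 tokens; replace with your own pricing.
PRICE_PER_1K_INPUT = Decimal("0.50")
PRICE_PER_1K_OUTPUT = Decimal("1.50")

def calculate_cost(tokens_in: int, tokens_out: int) -> float:
    """Estimate cost in PKR from token counts using the assumed rates above."""
    cost = (Decimal(tokens_in) * PRICE_PER_1K_INPUT
            + Decimal(tokens_out) * PRICE_PER_1K_OUTPUT) / Decimal(1000)
    # Round to 2 decimal places and return a float so the value is JSON-serializable.
    return float(round(cost, 2))
```

Output tokens are priced higher than input tokens here because generation usually costs more compute than reading the prompt; whether that holds for your model is something to measure rather than assume.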

Usage Dashboard Endpoint

python

@app.get("/v1/usage")
async def get_usage(api_key: dict = Security(validate_key)):
    return get_usage_summary(api_key["key_hash"])

Pricing Your AI API

Pricing Models

Model	How It Works	Best For
Per-token	Charge per input/output token	LLM APIs (like OpenAI)
Per-request	Fixed price per API call	Simple models (classification, OCR)
Tiered subscription	Monthly plans with limits	SaaS products
Pay-as-you-go	Usage-based with no commitment	Enterprise clients

Example Tier Structure

code

┌─────────────────────────────────────────────────┐
│  FREE TIER — PKR 0/month                        │
│  ├── 100 requests/day                           │
│  ├── 10,000 tokens/day                          │
│  ├── 10 RPM                                     │
│  └── Community support only                     │
├─────────────────────────────────────────────────┤
│  STARTER — PKR 5,000/month ($18)                │
│  ├── 1,000 requests/day                         │
│  ├── 100,000 tokens/day                         │
│  ├── 60 RPM                                     │
│  └── Email support                              │
├─────────────────────────────────────────────────┤
│  PRO — PKR 25,000/month ($90)                   │
│  ├── 10,000 requests/day                        │
│  ├── 1,000,000 tokens/day                       │
│  ├── 300 RPM                                    │
│  ├── Priority support                           │
│  └── Custom model fine-tuning                   │
├─────────────────────────────────────────────────┤
│  ENTERPRISE — Custom pricing                    │
│  ├── Unlimited requests                         │
│  ├── Dedicated GPU instances                    │
│  ├── SLA guarantee (99.9%)                      │
│  └── On-premise deployment option               │
└─────────────────────────────────────────────────┘

API Gateway Options

Self-Hosted

Tool	Complexity	Features	Cost
Nginx + Lua	Medium	Auth, rate limit, routing	Free
Kong	Medium-High	Full gateway, plugins	Free (OSS)
Traefik	Low-Medium	Auto-config, Docker-native	Free

Managed (Cloud)

Service	Best For	Cost (approximate list price; check current pricing)
AWS API Gateway	AWS infrastructure	$3.50/million requests
GCP API Gateway	GCP infrastructure	$3.00/million requests
Cloudflare Workers	Edge routing, global	$5/month + $0.50/million

The Practical Choice for Pakistan

For most Pakistani AI startups:

Start with: Nginx + custom FastAPI middleware (free, full control)
Scale to: Kong or Traefik when you have 5+ microservices
Enterprise: AWS/GCP API Gateway when clients require it

Practice Lab

Task 1: API Key System
Implement the API key authentication system from this lesson. Create a /admin/keys endpoint that generates new keys. Protect your /v1/chat endpoint with key validation.

Task 2: Rate Limiter
Add per-key rate limiting with 3 tiers (free/basic/pro). Test by hitting the API rapidly and verify that rate limit headers are returned and requests are blocked after the limit.

Task 3: Usage Dashboard
Build a /v1/usage endpoint that returns total requests, tokens consumed, and estimated cost for the authenticated key this month.

Pakistan Case Study

Meet Hamza, who built an Urdu sentiment analysis API for Pakistani e-commerce brands.

His API business model:

Free tier: 100 calls/day (enough for testing)
Starter: PKR 8,000/month for 5,000 calls/day
Pro: PKR 30,000/month for unlimited calls + custom model

His infrastructure:

FastAPI + custom auth middleware (no expensive gateway)
SQLite for API keys and usage (simple, works)
Nginx reverse proxy with rate limiting
Single Hetzner GPU VPS (PKR 12,000/month)

Revenue after 6 months:

3 free tier users (future conversions)
8 Starter clients: PKR 64,000/month
2 Pro clients: PKR 60,000/month
Total revenue: PKR 124,000/month
Infrastructure cost: PKR 12,000/month
Profit margin: 90%
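The figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in the case study; note that the margin counts infrastructure as the only cost, so it leaves out things like Hamza's own time, payment fees, and taxes.

```python
# All amounts in PKR per month, taken from the case study.
starter_clients, starter_price = 8, 8_000
pro_clients, pro_price = 2, 30_000
infrastructure_cost = 12_000

revenue = starter_clients * starter_price + pro_clients * pro_price
profit = revenue - infrastructure_cost
margin = profit / revenue

print(revenue)           # 124000
print(profit)            # 112000
print(f"{margin:.1%}")   # 90.3%
```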

His key decision: "I could have used AWS API Gateway and paid $200/month for something I built in 2 hours with FastAPI. At our scale, self-hosted is the right call. I'll switch to managed when we hit 50+ clients."

Key Takeaways

API gateways handle auth, rate limiting, usage tracking, and billing
Start simple: FastAPI middleware + SQLite is enough for 0-50 clients
API key auth is the standard for AI APIs — hash keys, never store plaintext
Rate limit by tier: free (10 RPM), basic (60 RPM), pro (300 RPM)
Track every request for billing: key, tokens in/out, latency, timestamp
Price based on value tiers, not just tokens — subscription models create predictable revenue
Self-hosted beats managed gateways until you hit enterprise scale

Next lesson: Cloud cost analysis — comparing AWS, GCP, and local infrastructure for Pakistan.
