API Gateway — Architecture notes
2025-01-10 • Go · Proxy · Rate limiting
Notes on designing a high-throughput API gateway for AI workloads with per-tenant rate limiting, fan-out, and aggregation.
### Summary
The gateway routes requests to multiple model backends, enforces per-tenant limits, and caches hot responses. Token-bucket rate limiting and circuit breakers protect downstream services from overload.