AI Gateway
A service layer between an application/agent and one or more model providers that standardizes access to models and centralizes operational controls.
Details
Unlike a generic API gateway, an AI gateway is model-aware: it understands token-based rate limiting, model routing (model selection and fallbacks), prompt caching (including semantic caching, which returns stored responses to semantically similar queries), and LLM-specific observability.
A gateway presents a unified API across multiple inference providers, decoupling application code from any single provider's API shape. This reduces vendor lock-in and simplifies failover, A/B testing across providers, and migration to open-weight models or self-hosted deployments: the application switches models by changing a routing configuration rather than rewriting integration code.
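Config-driven routing with fallback can be sketched as follows. The provider names, models, and `call_provider` function here are illustrative assumptions, not a real SDK; application code names a route, and the gateway tries the configured candidates in order.

```python
def call_provider(provider: str, model: str, prompt: str) -> str:
    # Hypothetical provider backends; one is made to fail to show fallback.
    backends = {
        "alpha": lambda p: f"[alpha/{model}] {p}",
        "beta": lambda p: f"[beta/{model}] {p}",
    }
    if provider == "alpha" and model == "big-model":
        raise RuntimeError("provider unavailable")  # simulated outage
    return backends[provider](prompt)

ROUTING = {
    # route name -> ordered (provider, model) candidates; first success wins
    "chat-default": [("alpha", "big-model"), ("beta", "small-model")],
}

def complete(route: str, prompt: str) -> str:
    """Unified entry point: callers name a route, never a provider."""
    errors = []
    for provider, model in ROUTING[route]:
        try:
            return call_provider(provider, model, prompt)
        except RuntimeError as exc:
            errors.append((provider, model, str(exc)))  # try next candidate
    raise RuntimeError(f"all candidates failed: {errors}")
```

Switching providers, or adding a fallback, is an edit to `ROUTING`; the calling application is unchanged.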
The operational control surface (unified auth and key management, quota management, cost controls, PII handling, retries/timeouts, and logging/metrics) becomes critical as the number of models and consuming teams grows, because it centralizes policies that would otherwise be duplicated across every service that calls a model.
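As one example of such a centralized policy, a retry wrapper applied to every model call configures backoff behavior once at the gateway instead of in each consuming service. This is a sketch under stated assumptions; `base_delay=0.0` is used only to keep the example fast, and a real gateway would use a non-zero delay with jitter.

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.0):
    # Centralized retry policy: every outbound model call is wrapped here,
    # so backoff is configured once, not per consuming service.
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except RuntimeError as exc:
            last_exc = exc
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_exc

calls = {"n": 0}

def flaky_model_call():
    # Fails twice, then succeeds: simulates a transient provider error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"
```

The same wrapper is a natural place to hang per-call timeouts and metrics, since every request already flows through it.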
Examples
- A gateway that normalizes chat-completion APIs across providers so the application switches models by changing a routing config, not application code.
- A centralized key-management layer that rotates per-provider credentials without touching downstream services.
- A cost-control gateway that enforces per-team spend limits and falls back to a cheaper model when a budget threshold is reached.
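The budget-threshold fallback in the last example can be sketched as below. The model names, per-token prices, and the 80% threshold are illustrative assumptions.

```python
class BudgetRouter:
    """Per-team spend tracking with a cheaper-model fallback.

    Prices, model names, and the 80% threshold are illustrative.
    """
    PRICE_PER_1K_TOKENS = {"premium-model": 0.03, "economy-model": 0.002}

    def __init__(self, team_budgets: dict[str, float]):
        self.budgets = dict(team_budgets)   # budget per team, in dollars
        self.spend: dict[str, float] = {}   # dollars spent so far per team

    def choose_model(self, team: str) -> str:
        # Fall back to the cheaper model once 80% of the budget is used.
        used = self.spend.get(team, 0.0)
        if used >= 0.8 * self.budgets[team]:
            return "economy-model"
        return "premium-model"

    def record_usage(self, team: str, model: str, tokens: int) -> None:
        cost = self.PRICE_PER_1K_TOKENS[model] * tokens / 1000
        self.spend[team] = self.spend.get(team, 0.0) + cost
```

Because the gateway sees every request, it can meter spend and enforce the threshold in one place; no consuming service needs to know model prices.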
Synonyms
model gateway, LLM gateway