AI Gateway

A service layer between an application/agent and one or more model providers that standardizes access to models and centralizes operational controls.

Details

Unlike a generic API gateway, an AI gateway is model-aware: it understands token-based rate limiting, model routing (model selection, fallbacks), prompt caching (including semantic caching for returning stored responses to semantically similar queries), and LLM-specific observability.
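Token-based rate limiting is the clearest of these model-aware behaviors: the gateway meters estimated tokens, not request counts. A minimal sketch, assuming a hypothetical per-minute token quota (real gateways would load quotas from per-team or per-provider config):

```python
import time

class TokenBudgetLimiter:
    """Token-bucket limiter that meters model tokens rather than requests.

    `tokens_per_minute` is a hypothetical quota for illustration; a real
    gateway would read it from provider or per-team configuration.
    """

    def __init__(self, tokens_per_minute: int):
        self.capacity = tokens_per_minute
        self.available = float(tokens_per_minute)
        self.refill_rate = tokens_per_minute / 60.0  # tokens regained per second
        self.last = time.monotonic()

    def allow(self, estimated_tokens: int) -> bool:
        # Refill the bucket in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.refill_rate)
        self.last = now
        if estimated_tokens <= self.available:
            self.available -= estimated_tokens
            return True
        return False
```

A request estimated at 500 tokens can pass while an identical request-count limiter would have admitted it; here the second 500-token call is rejected until the bucket refills.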

A gateway presents a unified API across multiple inference providers, decoupling application code from any single provider's API shape. This reduces vendor lock-in and simplifies failover, A/B testing across providers, and migration to open-weight models or self-hosted deployments: the application switches models by changing a routing configuration rather than rewriting integration code.
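The decoupling can be sketched as a routing table that maps a logical route name to an ordered fallback chain of provider adapters. Everything here is hypothetical for illustration (the adapter functions and route names are made up; real adapters would wrap each vendor's SDK):

```python
# Hypothetical provider adapters; real ones would call each vendor's SDK.
def call_provider_a(prompt: str) -> str:
    raise ConnectionError("provider A unavailable")  # simulate an outage

def call_provider_b(prompt: str) -> str:
    return f"[provider-b] {prompt}"

# Routing config: a logical route name maps to an ordered fallback chain.
# Swapping providers means editing this table, not the call sites.
ROUTES = {"chat-default": [call_provider_a, call_provider_b]}

def complete(prompt: str, route: str = "chat-default") -> str:
    """Try each backend in order; application code never names a vendor."""
    last_error = None
    for backend in ROUTES[route]:
        try:
            return backend(prompt)
        except Exception as exc:
            last_error = exc  # remember the failure, try the next backend
    raise RuntimeError(f"all backends failed for route {route!r}") from last_error
```

With provider A down, `complete("hi")` transparently falls through to provider B; application code is unchanged.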

The operational control surface (unified auth and key management, quota management, cost controls, PII handling, retries/timeouts, and logging/metrics) becomes critical as the number of models and consuming teams grows: the gateway centralizes policies that would otherwise be duplicated across every service that calls a model.

Examples

  • A gateway that normalizes chat-completion APIs across providers so the application switches models by changing a routing config, not application code.
  • A centralized key-management layer that rotates per-provider credentials without touching downstream services.
  • A cost-control gateway that enforces per-team spend limits and falls back to a cheaper model when a budget threshold is reached.
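The third example above can be sketched as a small router that tracks per-team spend and switches to a cheaper model past a budget threshold. The model names, prices, and threshold are hypothetical:

```python
class CostRouter:
    """Route to a primary model until a team's spend nears its budget,
    then fall back to a cheaper model; refuse calls once over budget.
    Model names and the 80% threshold are illustrative assumptions."""

    def __init__(self, budget_usd: float, threshold: float = 0.8):
        self.budget = budget_usd
        self.threshold = threshold
        self.spend: dict[str, float] = {}  # team -> USD spent so far

    def pick_model(self, team: str) -> str:
        used = self.spend.get(team, 0.0)
        if used >= self.budget:
            raise RuntimeError(f"team {team!r} is over budget")
        if used >= self.threshold * self.budget:
            return "small-model"  # cheaper fallback past the threshold
        return "large-model"

    def record(self, team: str, cost_usd: float) -> None:
        """Called after each completion with the provider-reported cost."""
        self.spend[team] = self.spend.get(team, 0.0) + cost_usd
```

A team with a $10 budget gets the large model until it has spent $8, the cheap model from $8 to $10, and an error afterward; the policy lives in one place rather than in every calling service.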

Synonyms

model gateway, LLM gateway