# A3S Gateway
The traffic layer for AI-native services — SSE streaming, scale-to-zero, safe model rollouts, single Rust binary.
A3S Gateway is the traffic layer for AI-native services. It is built on hyper and rustls, ships as a single statically-linked binary, and is configured exclusively in HCL (HashiCorp Configuration Language).
## Why A3S Gateway
AI services have fundamentally different operational patterns from web applications. Tools built for the web era make poor assumptions when applied to the AI stack:
| Concern | Web assumption | AI reality |
|---|---|---|
| Response shape | Small, bounded JSON | Unbounded token stream (SSE) |
| Latency profile | Milliseconds | Seconds (model inference) |
| Idle cost | Negligible | High (GPU memory reservation) |
| Deployment risk | Low, stateless | High (model quality regression) |
| Protocols needed | HTTP request/response | SSE, WebSocket, gRPC, HTTP/2 |
A3S Gateway is designed for the right column:
- SSE/Streaming — chunked transfer relay with zero response buffering. The first token reaches the client the moment the model emits it, not when the full response is assembled.
- Scale-to-zero with request buffering — when a model replica is cold, incoming requests are held in memory and replayed the instant the replica is ready. No request is dropped; no client sees a 503.
- Revision traffic splitting — send a configurable percentage of live traffic to a new model version. Automatic rollback fires if error rate or p99 latency crosses a threshold.
- Traffic mirroring — shadow-test a new model against real production requests before it handles a single live response. The client sees only the primary backend's reply.
- Circuit breaker — when an AI backend is under load (slow inference, GPU OOM), open the circuit automatically to stop amplifying the problem.
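The circuit-breaker pattern above can be sketched as a small state machine. This is an illustrative sketch, not A3S Gateway's internal API — all type and method names here are hypothetical:

```rust
// Minimal circuit-breaker sketch (names are illustrative).
// Closed: requests pass; failures are counted.
// Open: requests are rejected until a cooldown elapses.
// HalfOpen: one probe request decides whether to close or re-open.
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum CircuitState { Closed, Open, HalfOpen }

pub struct CircuitBreaker {
    state: CircuitState,
    failures: u32,
    failure_threshold: u32,
    cooldown: Duration,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            state: CircuitState::Closed,
            failures: 0,
            failure_threshold,
            cooldown,
            opened_at: None,
        }
    }

    /// Returns true if the request may be forwarded to the backend.
    pub fn allow(&mut self) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::Open => {
                // After the cooldown, let exactly one probe request through.
                if self.opened_at.map_or(false, |t| t.elapsed() >= self.cooldown) {
                    self.state = CircuitState::HalfOpen;
                    true
                } else {
                    false
                }
            }
            CircuitState::HalfOpen => true,
        }
    }

    pub fn record_success(&mut self) {
        self.failures = 0;
        self.state = CircuitState::Closed;
    }

    pub fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.failure_threshold || self.state == CircuitState::HalfOpen {
            self.state = CircuitState::Open;
            self.opened_at = Some(Instant::now());
        }
    }
}
```

The key design point is that opening the circuit sheds load immediately, so a slow or OOMing GPU backend stops receiving traffic instead of accumulating an ever-deeper queue.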
Everything else — routing, TLS, rate limiting, auth, Prometheus — is standard infrastructure that A3S Gateway includes so you don't need a second tool.
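To make the HCL-only configuration concrete, a minimal gateway definition might look like the following. The block and attribute names here are hypothetical — this overview does not document the real schema, so treat this as a sketch of the shape, not a copy-pasteable config:

```hcl
# Hypothetical configuration sketch — block and attribute names are
# illustrative; consult the shipped reference for the actual schema.
entrypoint "web" {
  address = ":443"
  tls {
    acme {
      email   = "ops@example.com"
      domains = ["api.example.com"]
    }
  }
}

router "chat" {
  rule        = "Host(`api.example.com`) && PathPrefix(`/v1/chat`)"
  service     = "llm"
  middlewares = ["rate-limit", "jwt-auth"]
}

service "llm" {
  backend { url = "http://llm-0:8000" }
  backend { url = "http://llm-1:8000" }
  load_balancer = "least_connections"
}
```

Only the rule syntax (`Host()`, `PathPrefix()`, `&&`) is taken from the feature list below; everything else is assumed structure.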
## Architecture
```
Client Request
      ↓
Entrypoint (HTTP / HTTPS / TCP / UDP listener)
      ↓
TLS Termination (rustls + ACME / Let's Encrypt)
      ↓
Router (Host / Path / Headers / Method / SNI matching)
      ↓
Middleware Pipeline (Auth → RateLimit → CircuitBreaker → CORS → ...)
      ↓
Service (Load Balancer + Health Check + Failover + Mirror)
      ↓
Scaling (Knative autoscaler · request buffer · revision router)
      ↓
Proxy (HTTP / WebSocket / SSE / gRPC / TCP / UDP)
      ↓
Backend Service
```

Hot reload replaces the router table and service registry atomically via an Arc swap at the Service layer. No connection is dropped during a configuration change.
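The atomic-swap idea can be sketched with std types alone. This is illustrative, not the gateway's actual internals (which may use a lock-free `ArcSwap` rather than an `RwLock`):

```rust
// Illustrative hot-reload sketch: readers clone an Arc pointing at the current
// router table; reload builds a complete new table off to the side, then
// publishes it with a single pointer swap. In-flight requests keep whatever
// snapshot they started with, so no request sees a half-applied config.
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type RouterTable = HashMap<String, String>; // rule -> service name (simplified)

pub struct SharedConfig {
    current: RwLock<Arc<RouterTable>>,
}

impl SharedConfig {
    pub fn new(table: RouterTable) -> Self {
        Self { current: RwLock::new(Arc::new(table)) }
    }

    /// Each request takes a cheap snapshot; the Arc keeps that snapshot alive
    /// even if a reload happens while the request is still in flight.
    pub fn snapshot(&self) -> Arc<RouterTable> {
        self.current.read().unwrap().clone()
    }

    /// Hot reload: publish a fully-built table in one pointer swap.
    pub fn swap(&self, table: RouterTable) {
        *self.current.write().unwrap() = Arc::new(table);
    }
}
```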
## Core Components
### Protocol Support
### Gateway Lifecycle
```rust
pub enum GatewayState {
    Created,   // Configured, listeners not yet bound
    Starting,  // Binding ports, compiling routes and middleware pipelines
    Running,   // Accepting and proxying requests
    Reloading, // Atomically swapping router table and service registry
    Stopping,  // Draining in-flight requests
    Stopped,   // All listeners closed
}
```

## Key Features
### AI-native patterns
- SSE/streaming relay — zero-buffer chunked transfer for LLM outputs
- Scale-to-zero — Knative autoscaling formula with cold-start request buffering
- Gradual rollout — step-by-step traffic shift with automatic rollback on error rate or latency breach
- Traffic mirroring — shadow-test new model versions with live production traffic
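The Knative-style autoscaling decision boils down to one formula: desired replicas = ⌈observed concurrency ÷ target concurrency per replica⌉, clamped to a maximum, with zero concurrency scaling the revision to zero. A sketch, with all names and the clamping behavior assumed for illustration:

```rust
// Illustrative Knative-style autoscaling calculation:
//   desired = ceil(observed_concurrency / target_per_replica)
// clamped to [1, max] while there is traffic; zero traffic scales to zero
// (incoming requests are then buffered until a replica is warm again).
pub fn desired_replicas(observed_concurrency: f64, target_per_replica: f64, max: u32) -> u32 {
    if observed_concurrency <= 0.0 {
        return 0; // scale to zero
    }
    let desired = (observed_concurrency / target_per_replica).ceil() as u32;
    desired.clamp(1, max)
}
```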
### Core proxy
- Traefik-compatible rule syntax — `Host()`, `PathPrefix()`, `Headers()`, `Method()`, `&&`
- 4 load balancing strategies — Round Robin, Weighted, Least Connections, Random
- Active HTTP health probes + passive error-count backend eviction
- Sticky sessions — cookie-based affinity with TTL and LRU eviction
- Failover — automatic switch to secondary pool when primary has no healthy backends
- Pure Rust TLS via rustls — ACME/Let's Encrypt with HTTP-01 and DNS-01 (Cloudflare, Route53)
- Hot reload — inotify/kqueue file watcher, zero dropped connections
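Of the four balancing strategies, Least Connections is the most relevant for AI backends, where request durations vary wildly: it routes each new request to the healthy backend with the fewest in-flight requests. A hypothetical sketch (the `Backend` struct and function name are illustrative):

```rust
// Illustrative least-connections pick: among healthy backends, choose the one
// with the fewest in-flight requests (first match wins on ties). With
// seconds-long inference requests, this avoids piling new work onto a replica
// that is already busy with a slow generation.
pub struct Backend {
    pub url: String,
    pub healthy: bool,
    pub in_flight: usize,
}

pub fn pick_least_connections(backends: &[Backend]) -> Option<&Backend> {
    backends
        .iter()
        .filter(|b| b.healthy)         // passive eviction: skip unhealthy backends
        .min_by_key(|b| b.in_flight)   // fewest in-flight requests wins
}
```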
### Middleware (15 built-in)
- Auth: JWT (HS256), API Key, Basic Auth, Forward Auth
- Traffic: Rate Limit (in-process), Rate Limit (Redis distributed), Circuit Breaker, Retry, Body Limit
- Transform: CORS, Headers, Strip Prefix, Compress (brotli/gzip/deflate)
- Network: IP Allowlist, TCP Filter
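The in-process rate limiter is commonly implemented as a token bucket; whether A3S Gateway uses exactly this algorithm is not stated here, so the following is a generic sketch with illustrative names:

```rust
// Minimal token-bucket sketch for an in-process rate limiter.
// Tokens refill continuously at `rate` per second, capped at `burst`;
// each admitted request consumes one token.
use std::time::Instant;

pub struct TokenBucket {
    rate: f64,    // refill rate, tokens per second
    burst: f64,   // bucket capacity
    tokens: f64,  // current token count
    last: Instant,
}

impl TokenBucket {
    pub fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, tokens: burst, last: Instant::now() }
    }

    /// Returns true if the request is admitted, false if it should be
    /// rejected (typically with 429 Too Many Requests).
    pub fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        // Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = (self.tokens + self.last.elapsed().as_secs_f64() * self.rate)
            .min(self.burst);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

The Redis-distributed variant follows the same logic but keeps the token count in Redis so that multiple gateway instances share one budget.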
### Observability
- Prometheus metrics — per-router/service/backend request counts, latency histograms, autoscaler state
- Structured JSON access log — client IP, method, path, status, duration, backend, router
- Distributed tracing — W3C Trace Context and B3/Zipkin propagation, child span injection
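W3C Trace Context propagation revolves around the `traceparent` header, whose format (`version-traceid-spanid-flags` with 2/32/16/2 hex digits) is fixed by the spec. A parsing sketch — the struct and function names are illustrative, not the gateway's API:

```rust
// Sketch of W3C Trace Context parsing. A `traceparent` header looks like
//   00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
// i.e. version (2 hex) - trace-id (32 hex) - parent-span-id (16 hex) - flags (2 hex).
pub struct TraceParent {
    pub trace_id: String,
    pub parent_span_id: String,
    pub sampled: bool,
}

pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4
        || parts[0].len() != 2
        || parts[1].len() != 32
        || parts[2].len() != 16
        || parts[3].len() != 2
    {
        return None;
    }
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    Some(TraceParent {
        trace_id: parts[1].to_string(),
        parent_span_id: parts[2].to_string(),
        sampled: flags & 0x01 == 1, // bit 0 is the "sampled" flag
    })
}
```

Child span injection then means generating a fresh span id and forwarding the header with the same trace id, so the backend's spans attach under the gateway's.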
### Operations
- Single statically-linked binary — no runtime dependencies, no OpenSSL
- HCL-only configuration — consistent, human-readable, version-controlled
- Built-in dashboard API — health, metrics, config, routes, services, backends, version
- Helm chart for Kubernetes, Docker image, Homebrew formula
- 877 tests across all modules