# A3S Gateway
The traffic layer for AI-native services — SSE streaming, scale-to-zero, safe model rollouts, single Rust binary.
A3S Gateway is the traffic layer for AI-native services. It is built on hyper and rustls, ships as a single statically-linked binary, and is configured exclusively in HCL (HashiCorp Configuration Language).
## Why A3S Gateway
AI services have fundamentally different operational patterns from web applications. Tools built for the web era make poor assumptions when applied to the AI stack:
| Concern | Web assumption | AI reality |
|---|---|---|
| Response shape | Small, bounded JSON | Unbounded token stream (SSE) |
| Latency profile | Milliseconds | Seconds (model inference) |
| Idle cost | Negligible | High (GPU memory reservation) |
| Deployment risk | Low, stateless | High (model quality regression) |
| Protocols needed | HTTP request/response | SSE, WebSocket, gRPC, HTTP/2 |
A3S Gateway is designed for the right column:
- SSE/Streaming — chunked transfer relay with zero response buffering. The first token reaches the client the moment the model emits it, not when the full response is assembled.
- Scale-to-zero with request buffering — when a model replica is cold, incoming requests are held in memory and replayed the instant the replica is ready. No request is dropped; no client sees a 503.
- Revision traffic splitting — send a configurable percentage of live traffic to a new model version. Automatic rollback fires if error rate or p99 latency crosses a threshold.
- Traffic mirroring — shadow-test a new model against real production requests before it handles a single live response. The client sees only the primary backend's reply.
- Circuit breaker — when an AI backend is under load (slow inference, GPU OOM), open the circuit automatically to stop amplifying the problem.
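The circuit-breaker pattern above can be sketched as a small state machine. This is an illustrative sketch, not A3S Gateway's internal API — all type and method names here are hypothetical:

```rust
// Minimal circuit-breaker sketch (names are illustrative).
// Closed: requests pass; failures are counted.
// Open: requests are rejected until a cooldown elapses.
// HalfOpen: one probe request decides whether to close or re-open.
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum CircuitState { Closed, Open, HalfOpen }

pub struct CircuitBreaker {
    state: CircuitState,
    failures: u32,
    failure_threshold: u32,
    cooldown: Duration,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            state: CircuitState::Closed,
            failures: 0,
            failure_threshold,
            cooldown,
            opened_at: None,
        }
    }

    /// Returns true if the request may be forwarded to the backend.
    pub fn allow(&mut self) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::Open => {
                // After the cooldown, let exactly one probe request through.
                if self.opened_at.map_or(false, |t| t.elapsed() >= self.cooldown) {
                    self.state = CircuitState::HalfOpen;
                    true
                } else {
                    false
                }
            }
            CircuitState::HalfOpen => true,
        }
    }

    pub fn record_success(&mut self) {
        self.failures = 0;
        self.state = CircuitState::Closed;
    }

    pub fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.failure_threshold || self.state == CircuitState::HalfOpen {
            self.state = CircuitState::Open;
            self.opened_at = Some(Instant::now());
        }
    }
}
```

The key design point is that opening the circuit sheds load immediately, so a slow or OOMing GPU backend stops receiving traffic instead of accumulating an ever-deeper queue.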
Everything else — routing, TLS, rate limiting, auth, Prometheus — is standard infrastructure that A3S Gateway includes so you don't need a second tool.
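To make the HCL-only configuration concrete, a minimal gateway definition might look like the following. The block and attribute names here are hypothetical — this overview does not document the real schema, so treat this as a sketch of the shape, not a copy-pasteable config:

```hcl
# Hypothetical configuration sketch — block and attribute names are
# illustrative; consult the shipped reference for the actual schema.
entrypoint "web" {
  address = ":443"
  tls {
    acme {
      email   = "ops@example.com"
      domains = ["api.example.com"]
    }
  }
}

router "chat" {
  rule        = "Host(`api.example.com`) && PathPrefix(`/v1/chat`)"
  service     = "llm"
  middlewares = ["rate-limit", "jwt-auth"]
}

service "llm" {
  backend { url = "http://llm-0:8000" }
  backend { url = "http://llm-1:8000" }
  load_balancer = "least_connections"
}
```

Only the rule syntax (`Host()`, `PathPrefix()`, `&&`) is taken from the feature list below; everything else is assumed structure.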
## Architecture
```
Client Request
      ↓
Entrypoint (HTTP / HTTPS / TCP / UDP listener)
      ↓
TLS Termination (rustls + ACME / Let's Encrypt)
      ↓
Router (Host / Path / Headers / Method / SNI matching)
      ↓
Middleware Pipeline (Auth → RateLimit → CircuitBreaker → CORS → ...)
      ↓
Service (Load Balancer + Health Check + Failover + Mirror)
      ↓
Scaling (Knative autoscaler · request buffer · revision router)
      ↓
Proxy (HTTP / WebSocket / SSE / gRPC / TCP / UDP)
      ↓
Backend Service
```

Hot reload replaces the router table and service registry atomically via an Arc swap at the Service layer. No connection is dropped during a configuration change.
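The atomic-swap idea can be sketched with std types alone. This is illustrative, not the gateway's actual internals (which may use a lock-free `ArcSwap` rather than an `RwLock`):

```rust
// Illustrative hot-reload sketch: readers clone an Arc pointing at the current
// router table; reload builds a complete new table off to the side, then
// publishes it with a single pointer swap. In-flight requests keep whatever
// snapshot they started with, so no request sees a half-applied config.
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type RouterTable = HashMap<String, String>; // rule -> service name (simplified)

pub struct SharedConfig {
    current: RwLock<Arc<RouterTable>>,
}

impl SharedConfig {
    pub fn new(table: RouterTable) -> Self {
        Self { current: RwLock::new(Arc::new(table)) }
    }

    /// Each request takes a cheap snapshot; the Arc keeps that snapshot alive
    /// even if a reload happens while the request is still in flight.
    pub fn snapshot(&self) -> Arc<RouterTable> {
        self.current.read().unwrap().clone()
    }

    /// Hot reload: publish a fully-built table in one pointer swap.
    pub fn swap(&self, table: RouterTable) {
        *self.current.write().unwrap() = Arc::new(table);
    }
}
```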
## Core Components
### Protocol Support
### Gateway Lifecycle
```rust
pub enum GatewayState {
    Created,   // Configured, listeners not yet bound
    Starting,  // Binding ports, compiling routes and middleware pipelines
    Running,   // Accepting and proxying requests
    Reloading, // Atomically swapping router table and service registry
    Stopping,  // Draining in-flight requests
    Stopped,   // All listeners closed
}
```

## Key Features
### AI-native patterns
- SSE/streaming relay — zero-buffer chunked transfer for LLM outputs
- Scale-to-zero — Knative autoscaling formula with cold-start request buffering
- Gradual rollout — step-by-step traffic shift with automatic rollback on error rate or latency breach
- Traffic mirroring — shadow-test new model versions with live production traffic
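The Knative-style autoscaling decision boils down to one formula: desired replicas = ⌈observed concurrency ÷ target concurrency per replica⌉, clamped to a maximum, with zero concurrency scaling the revision to zero. A sketch, with all names and the clamping behavior assumed for illustration:

```rust
// Illustrative Knative-style autoscaling calculation:
//   desired = ceil(observed_concurrency / target_per_replica)
// clamped to [1, max] while there is traffic; zero traffic scales to zero
// (incoming requests are then buffered until a replica is warm again).
pub fn desired_replicas(observed_concurrency: f64, target_per_replica: f64, max: u32) -> u32 {
    if observed_concurrency <= 0.0 {
        return 0; // scale to zero
    }
    let desired = (observed_concurrency / target_per_replica).ceil() as u32;
    desired.clamp(1, max)
}
```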
### Core proxy
- Traefik-compatible rule syntax — `Host()`, `PathPrefix()`, `Headers()`, `Method()`, `&&`
- 4 load balancing strategies — Round Robin, Weighted, Least Connections, Random
- Active HTTP health probes + passive error-count backend eviction
- Sticky sessions — cookie-based affinity with TTL and LRU eviction
- Failover — automatic switch to secondary pool when primary has no healthy backends
- Pure Rust TLS via rustls — ACME/Let's Encrypt with HTTP-01 and DNS-01 (Cloudflare, Route53)
- Hot reload — inotify/kqueue file watcher, zero dropped connections
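Of the four balancing strategies, Least Connections is the most relevant for AI backends, where request durations vary wildly: it routes each new request to the healthy backend with the fewest in-flight requests. A hypothetical sketch (the `Backend` struct and function name are illustrative):

```rust
// Illustrative least-connections pick: among healthy backends, choose the one
// with the fewest in-flight requests (first match wins on ties). With
// seconds-long inference requests, this avoids piling new work onto a replica
// that is already busy with a slow generation.
pub struct Backend {
    pub url: String,
    pub healthy: bool,
    pub in_flight: usize,
}

pub fn pick_least_connections(backends: &[Backend]) -> Option<&Backend> {
    backends
        .iter()
        .filter(|b| b.healthy)         // passive eviction: skip unhealthy backends
        .min_by_key(|b| b.in_flight)   // fewest in-flight requests wins
}
```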
### Middleware (15 built-in)
- Auth: JWT (HS256), API Key, Basic Auth, Forward Auth
- Traffic: Rate Limit (in-process), Rate Limit (Redis distributed), Circuit Breaker, Retry, Body Limit
- Transform: CORS, Headers, Strip Prefix, Compress (brotli/gzip/deflate)
- Network: IP Allowlist, TCP Filter
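The in-process rate limiter is commonly implemented as a token bucket; whether A3S Gateway uses exactly this algorithm is not stated here, so the following is a generic sketch with illustrative names:

```rust
// Minimal token-bucket sketch for an in-process rate limiter.
// Tokens refill continuously at `rate` per second, capped at `burst`;
// each admitted request consumes one token.
use std::time::Instant;

pub struct TokenBucket {
    rate: f64,    // refill rate, tokens per second
    burst: f64,   // bucket capacity
    tokens: f64,  // current token count
    last: Instant,
}

impl TokenBucket {
    pub fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, tokens: burst, last: Instant::now() }
    }

    /// Returns true if the request is admitted, false if it should be
    /// rejected (typically with 429 Too Many Requests).
    pub fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        // Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = (self.tokens + self.last.elapsed().as_secs_f64() * self.rate)
            .min(self.burst);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

The Redis-distributed variant follows the same logic but keeps the token count in Redis so that multiple gateway instances share one budget.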
### Observability
- Prometheus metrics — per-router/service/backend request counts, latency histograms, autoscaler state
- Structured JSON access log — client IP, method, path, status, duration, backend, router
- Distributed tracing — W3C Trace Context and B3/Zipkin propagation, child span injection
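W3C Trace Context propagation revolves around the `traceparent` header, whose format (`version-traceid-spanid-flags` with 2/32/16/2 hex digits) is fixed by the spec. A parsing sketch — the struct and function names are illustrative, not the gateway's API:

```rust
// Sketch of W3C Trace Context parsing. A `traceparent` header looks like
//   00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
// i.e. version (2 hex) - trace-id (32 hex) - parent-span-id (16 hex) - flags (2 hex).
pub struct TraceParent {
    pub trace_id: String,
    pub parent_span_id: String,
    pub sampled: bool,
}

pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.split('-').collect();
    if parts.len() != 4
        || parts[0].len() != 2
        || parts[1].len() != 32
        || parts[2].len() != 16
        || parts[3].len() != 2
    {
        return None;
    }
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    Some(TraceParent {
        trace_id: parts[1].to_string(),
        parent_span_id: parts[2].to_string(),
        sampled: flags & 0x01 == 1, // bit 0 is the "sampled" flag
    })
}
```

Child span injection then means generating a fresh span id and forwarding the header with the same trace id, so the backend's spans attach under the gateway's.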
### Operations
- Single statically-linked binary — no runtime dependencies, no OpenSSL
- HCL-only configuration — consistent, human-readable, version-controlled
- Built-in dashboard API — health, metrics, config, routes, services, backends, version
- Helm chart for Kubernetes, Docker image, Homebrew formula
- 877 tests across all modules