
A3S Gateway

The traffic layer for AI-native services — SSE streaming, scale-to-zero, safe model rollouts, single Rust binary.


A3S Gateway is the traffic layer for AI-native services. It is built on hyper and rustls, ships as a single statically-linked binary, and is configured exclusively in HCL (HashiCorp Configuration Language).

Why A3S Gateway

AI services have fundamentally different operational patterns from web applications. Tools built for the Web era make poor assumptions when applied to the AI stack:

Concern            Web assumption           AI reality
Response shape     Small, bounded JSON      Unbounded token stream (SSE)
Latency profile    Milliseconds             Seconds (model inference)
Idle cost          Negligible               High (GPU memory reservation)
Deployment risk    Low, stateless           High (model quality regression)
Protocols needed   HTTP request/response    SSE, WebSocket, gRPC, HTTP/2

A3S Gateway is designed for the right column:

  • SSE/Streaming — chunked transfer relay with zero response buffering. The first token reaches the client the moment the model emits it, not when the full response is assembled.
  • Scale-to-zero with request buffering — when a model replica is cold, incoming requests are held in memory and replayed the instant the replica is ready. No request is dropped; no client sees a 503.
  • Revision traffic splitting — send a configurable percentage of live traffic to a new model version. Automatic rollback fires if error rate or p99 latency crosses a threshold.
  • Traffic mirroring — shadow-test a new model against real production requests before it handles a single live response. The client sees only the primary backend's reply.
  • Circuit breaker — when an AI backend is under load (slow inference, GPU OOM), open the circuit automatically to stop amplifying the problem.
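Of these, the circuit breaker is the easiest to sketch in isolation. The following is a minimal illustration of the pattern; the state names, thresholds, and API here are assumptions for the example, not A3S Gateway's internals:

```rust
use std::time::{Duration, Instant};

enum State {
    Closed { failures: u32 }, // normal operation, counting consecutive errors
    Open { since: Instant },  // tripped: reject requests until cooldown passes
    HalfOpen,                 // probing: one trial request is allowed through
}

/// Minimal circuit breaker: opens after `max_failures` consecutive errors,
/// then allows a single trial request once `cooldown` has elapsed.
struct CircuitBreaker {
    state: State,
    max_failures: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn new(max_failures: u32, cooldown: Duration) -> Self {
        Self { state: State::Closed { failures: 0 }, max_failures, cooldown }
    }

    /// May this request be forwarded to the backend?
    fn allow(&mut self) -> bool {
        if let State::Open { since } = self.state {
            if since.elapsed() >= self.cooldown {
                self.state = State::HalfOpen; // probe with one trial request
                return true;
            }
            return false; // still cooling down: fail fast, don't hit the backend
        }
        true
    }

    /// Record the outcome of the last forwarded request.
    fn record(&mut self, ok: bool) {
        if ok {
            self.state = State::Closed { failures: 0 };
            return;
        }
        self.state = match self.state {
            State::Closed { failures } if failures + 1 < self.max_failures => {
                State::Closed { failures: failures + 1 }
            }
            _ => State::Open { since: Instant::now() }, // trip the circuit
        };
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(3, Duration::from_millis(50));
    for _ in 0..3 {
        assert!(cb.allow());
        cb.record(false);
    }
    assert!(!cb.allow()); // open: requests are rejected immediately
    std::thread::sleep(Duration::from_millis(60));
    assert!(cb.allow()); // half-open: one trial request goes through
    cb.record(true);
    assert!(cb.allow()); // success closed the circuit again
}
```

The point of the pattern: once a backend is failing, rejecting requests locally is cheaper than queuing more work onto a GPU that is already falling behind.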

Everything else — routing, TLS, rate limiting, auth, Prometheus — is standard infrastructure that A3S Gateway includes so you don't need a second tool.

Architecture

Client Request
      ↓
Entrypoint  (HTTP / HTTPS / TCP / UDP listener)
      ↓
TLS Termination  (rustls + ACME / Let's Encrypt)
      ↓
Router  (Host / Path / Headers / Method / SNI matching)
      ↓
Middleware Pipeline  (Auth → RateLimit → CircuitBreaker → CORS → ...)
      ↓
Service  (Load Balancer + Health Check + Failover + Mirror)
      ↓
Scaling  (Knative autoscaler · request buffer · revision router)
      ↓
Proxy  (HTTP / WebSocket / SSE / gRPC / TCP / UDP)
      ↓
Backend Service

Hot reload atomically replaces the router table and service registry via an Arc swap at the Service layer. No connection is dropped during a configuration change.
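A minimal standard-library sketch of that swap pattern (`RouterTable` and the field names here are stand-ins for illustration, not the gateway's real types):

```rust
use std::sync::{Arc, RwLock};

// Stand-in for the compiled router table; immutable once built.
struct RouterTable {
    routes: Vec<String>,
}

struct Gateway {
    // Each request clones the inner Arc and keeps using the table it started
    // with; reload swaps the Arc in a single write, so old and new tables
    // coexist until the last in-flight request drops its clone.
    router: RwLock<Arc<RouterTable>>,
}

impl Gateway {
    /// Snapshot of the current routing table for one request.
    fn current(&self) -> Arc<RouterTable> {
        Arc::clone(&self.router.read().unwrap())
    }

    /// Atomically install a freshly compiled table.
    fn reload(&self, new_table: RouterTable) {
        *self.router.write().unwrap() = Arc::new(new_table);
    }
}

fn main() {
    let gw = Gateway {
        router: RwLock::new(Arc::new(RouterTable { routes: vec!["v1".into()] })),
    };
    let in_flight = gw.current(); // request pinned to the old table
    gw.reload(RouterTable { routes: vec!["v1".into(), "v2".into()] });
    assert_eq!(in_flight.routes.len(), 1);    // in-flight request is unaffected
    assert_eq!(gw.current().routes.len(), 2); // new requests see the new table
}
```

Production code typically replaces the `RwLock` with a lock-free pointer swap (e.g. the arc-swap crate) to keep the hot read path contention-free; the ownership semantics are the same.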

Core Components


Gateway Lifecycle

pub enum GatewayState {
    Created,     // Configured, listeners not yet bound
    Starting,    // Binding ports, compiling routes and middleware pipelines
    Running,     // Accepting and proxying requests
    Reloading,   // Atomically swapping router table and service registry
    Stopping,    // Draining in-flight requests
    Stopped,     // All listeners closed
}

Key Features

AI-native patterns

  • SSE/streaming relay — zero-buffer chunked transfer for LLM outputs
  • Scale-to-zero — Knative autoscaling formula with cold-start request buffering
  • Gradual rollout — step-by-step traffic shift with automatic rollback on error rate or latency breach
  • Traffic mirroring — shadow-test new model versions with live production traffic
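The Knative-style autoscaling formula mentioned above reduces to: observed in-flight concurrency divided by the per-replica target, rounded up, clamped to configured bounds. A deliberately simplified sketch (parameter names are illustrative; a real autoscaler also averages over stable/panic windows):

```rust
/// Desired replica count for a given level of observed concurrency.
/// With min_replicas = 0, an idle service scales all the way to zero.
fn desired_replicas(
    observed_concurrency: f64,
    target_per_replica: f64,
    min_replicas: u32,
    max_replicas: u32,
) -> u32 {
    if observed_concurrency <= 0.0 {
        return min_replicas; // no traffic: scale-to-zero if allowed
    }
    let raw = (observed_concurrency / target_per_replica).ceil() as u32;
    // At least one replica is needed to serve the observed traffic.
    raw.clamp(min_replicas.max(1), max_replicas)
}

fn main() {
    assert_eq!(desired_replicas(0.0, 10.0, 0, 5), 0);  // idle: scale to zero
    assert_eq!(desired_replicas(25.0, 10.0, 0, 5), 3); // ceil(25 / 10) = 3
    assert_eq!(desired_replicas(90.0, 10.0, 0, 5), 5); // capped at max_replicas
}
```

While the count is zero and a request arrives, the gateway buffers it (as described above) and replays it once the first replica reports ready.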

Core proxy

  • Traefik-compatible rule syntax — Host(), PathPrefix(), Headers(), Method(), &&
  • 4 load balancing strategies — Round Robin, Weighted, Least Connections, Random
  • Active HTTP health probes + passive error-count backend eviction
  • Sticky sessions — cookie-based affinity with TTL and LRU eviction
  • Failover — automatic switch to secondary pool when primary has no healthy backends
  • Pure Rust TLS via rustls — ACME/Let's Encrypt with HTTP-01 and DNS-01 (Cloudflare, Route53)
  • Hot reload — inotify/kqueue file watcher, zero dropped connections
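The simplest of the four balancing strategies, Round Robin, can be sketched as follows (types and field names are assumptions for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Round-robin selection: each request takes the next backend in order,
/// wrapping around. An atomic counter makes picks safe across threads.
struct RoundRobin {
    backends: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.backends.len();
        &self.backends[i]
    }
}

fn main() {
    let lb = RoundRobin {
        backends: vec!["a:8080".into(), "b:8080".into(), "c:8080".into()],
        next: AtomicUsize::new(0),
    };
    let picks: Vec<&str> = (0..4).map(|_| lb.pick()).collect();
    assert_eq!(picks, ["a:8080", "b:8080", "c:8080", "a:8080"]);
}
```

Weighted, Least Connections, and Random differ only in the `pick` logic; health checking composes with all four by filtering the candidate set before selection.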

Middleware (15 built-in)

  • Auth: JWT (HS256), API Key, Basic Auth, Forward Auth
  • Traffic: Rate Limit (in-process), Rate Limit (Redis distributed), Circuit Breaker, Retry, Body Limit
  • Transform: CORS, Headers, Strip Prefix, Compress (brotli/gzip/deflate)
  • Network: IP Allowlist, TCP Filter
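As an example of the traffic middleware, an in-process rate limiter is commonly built as a token bucket. A minimal sketch (the rate/burst parameters and names are assumptions, not A3S Gateway's actual implementation):

```rust
use std::time::Instant;

/// Token bucket: `rate` tokens refill per second up to `burst`;
/// each request spends one token or is rejected (HTTP 429).
struct TokenBucket {
    rate: f64,
    burst: f64,
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(rate: f64, burst: f64) -> Self {
        Self { rate, burst, tokens: burst, last: Instant::now() }
    }

    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        // Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = (self.tokens + now.duration_since(self.last).as_secs_f64() * self.rate)
            .min(self.burst);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(10.0, 2.0); // 10 req/s, burst of 2
    assert!(bucket.try_acquire());
    assert!(bucket.try_acquire());
    assert!(!bucket.try_acquire()); // burst exhausted: reject with 429
}
```

The Redis-backed variant applies the same bookkeeping against shared state so the limit holds across gateway replicas.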

Observability

  • Prometheus metrics — per-router/service/backend request counts, latency histograms, autoscaler state
  • Structured JSON access log — client IP, method, path, status, duration, backend, router
  • Distributed tracing — W3C Trace Context and B3/Zipkin propagation, child span injection

Operations

  • Single statically-linked binary — no runtime dependencies, no OpenSSL
  • HCL-only configuration — consistent, human-readable, version-controlled
  • Built-in dashboard API — health, metrics, config, routes, services, backends, version
  • Helm chart for Kubernetes, Docker image, Homebrew formula
  • 877 tests across all modules
