Services
Load balancing, health checks, sticky sessions, traffic mirroring, failover, and autoscaling
Services define backend pools with load balancing, health checks, sticky sessions, traffic mirroring, failover, and optional Knative-style autoscaling.
Load Balancing Strategies
services "api" {
  load_balancer {
    strategy        = "weighted"
    request_timeout = "30s"
    servers = [
      { url = "http://127.0.0.1:8001", weight = 3 },
      { url = "http://127.0.0.1:8002", weight = 1 }
    ]
  }
}
request_timeout controls the maximum time the gateway waits for a buffered plain HTTP upstream response before returning 504 Gateway Timeout. It accepts millisecond, second, or minute values such as "500ms", "30s", or "2m".
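To make the accepted request_timeout formats concrete, here is a minimal Python sketch of how such a value could be parsed. The helper name parse_duration_ms is hypothetical, and the sketch assumes whole-number amounts and only the three units named above; the gateway's own parser may accept more.

```python
import re

# Assumption: only the three units named in the text are supported.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000}


def parse_duration_ms(value: str) -> int:
    """Convert a string such as "500ms", "30s", or "2m" to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MS[unit]
```

Under these assumptions, a value without a unit (for example "30") is rejected rather than silently treated as seconds.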
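The weighted example above gives the first server a weight of 3 and the second a weight of 1. The Python sketch below shows one common way to honor such weights, smooth weighted round-robin. It assumes weights are proportional shares of traffic; this page does not state which algorithm the gateway uses, so treat the Server class and pick function as illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Server:
    url: str
    weight: int
    current: int = 0  # running score used by smooth weighted round-robin


def pick(servers: list[Server]) -> Server:
    """Select the next server so that picks follow the weight ratio."""
    total = sum(s.weight for s in servers)
    for s in servers:
        s.current += s.weight          # every server earns its weight
    chosen = max(servers, key=lambda s: s.current)
    chosen.current -= total            # the winner pays back the total
    return chosen


servers = [
    Server("http://127.0.0.1:8001", weight=3),
    Server("http://127.0.0.1:8002", weight=1),
]
order = [pick(servers).url for _ in range(4)]
```

With these weights, every run of four picks contains the first server three times and the second once, and the picks are interleaved rather than sent in bursts.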
Health Checks
Active HTTP probes with configurable thresholds. Unhealthy backends are automatically excluded from load balancing.
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [
      { url = "http://127.0.0.1:8001" },
      { url = "http://127.0.0.1:8002" }
    ]
    health_check {
      path                = "/health"
      interval            = "10s"
      timeout             = "5s"
      unhealthy_threshold = 3
      healthy_threshold   = 1
    }
  }
}
Passive health checks also track error counts and automatically remove backends that exceed the failure threshold.
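The interaction of unhealthy_threshold and healthy_threshold can be sketched as a small state machine. This Python example assumes the thresholds count consecutive probe results, which is the usual convention but is not confirmed here; BackendHealth is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    unhealthy_threshold: int = 3
    healthy_threshold: int = 1
    healthy: bool = True
    failures: int = 0   # consecutive failed probes
    successes: int = 0  # consecutive successful probes

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

With the values from the example config, a backend leaves the pool after three failed probes in a row and returns after a single successful one.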
Sticky Sessions
Route the same client to the same backend using cookies:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
    sticky {
      cookie = "srv_id"
    }
  }
}
On the first request, the gateway selects a backend and sets a Set-Cookie header. Subsequent requests with that cookie are routed to the same backend. Sessions are evicted on TTL expiry or when max sessions is reached.
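The session lifecycle described above can be sketched in Python as a bounded table keyed by cookie value. Several details are assumptions made for illustration: the names ttl_secs and max_sessions, the random cookie value, the choice to evict the oldest session when the table is full, and the fact that the TTL is not refreshed on use.

```python
import time
import uuid
from collections import OrderedDict


class StickyTable:
    def __init__(self, backends, ttl_secs=3600.0, max_sessions=10_000,
                 clock=time.monotonic):
        self.backends = backends
        self.ttl_secs = ttl_secs
        self.max_sessions = max_sessions
        self.clock = clock
        self.sessions = OrderedDict()  # cookie value -> (backend, expiry)
        self._next = 0                 # round-robin cursor for new sessions

    def route(self, cookie=None):
        """Return (backend, cookie_value), creating a session if needed."""
        now = self.clock()
        entry = self.sessions.get(cookie) if cookie else None
        if entry and entry[1] > now:
            return entry[0], cookie           # known, unexpired session
        self.sessions.pop(cookie, None)       # drop an expired session
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        if len(self.sessions) >= self.max_sessions:
            self.sessions.popitem(last=False)  # assumed policy: evict oldest
        new_cookie = uuid.uuid4().hex          # value sent via Set-Cookie
        self.sessions[new_cookie] = (backend, now + self.ttl_secs)
        return backend, new_cookie
```

A caller would send the returned cookie value back to the client under the configured cookie name (srv_id in the example) whenever it differs from the one the request carried.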
Traffic Mirroring
Copy a percentage of live traffic to a shadow service for testing without affecting the primary response:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  mirror {
    service    = "shadow-backend"
    percentage = 10
  }
}
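One simple way to apply a mirror percentage is to sample each request independently. The Python sketch below assumes that model and a percentage on a 0 to 100 scale; should_mirror is a hypothetical helper, and the gateway may instead use a deterministic counter or hashing.

```python
import random


def should_mirror(percentage: float, rng: random.Random) -> bool:
    """Decide whether one request is also copied to the shadow service."""
    # rng.random() is in [0, 1), so 0 never mirrors and 100 always does.
    return rng.random() * 100 < percentage
```

Because the decision is made per request, the mirrored share only approaches the configured percentage over many requests. The primary response is returned either way, so the shadow service's result can be discarded.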
Failover
Automatic fallback to a secondary backend pool when the primary has zero healthy backends:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
    health_check {
      path = "/health"
    }
  }
  failover {
    service = "backup-pool"
  }
}
services "backup-pool" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://10.0.0.1:8001" }]
  }
}
Knative-Style Autoscaling
Optional autoscaling with scale-to-zero, request buffering, and concurrency-based decisions:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  scaling {
    min_replicas          = 0
    max_replicas          = 10
    container_concurrency = 50
    target_utilization    = 0.7
    scale_down_delay_secs = 300
    buffer_enabled        = true
    buffer_timeout_secs   = 30
    buffer_size           = 100
    executor              = "box"
  }
}
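A concurrency-based scaling decision can be sketched as follows. The formula mirrors the general Knative approach (in-flight requests divided by the per-replica target), but whether this gateway computes it exactly this way is an assumption. The sketch ignores scale_down_delay_secs and request buffering, and desired_replicas is a hypothetical name.

```python
import math


def desired_replicas(in_flight: int,
                     container_concurrency: int = 50,
                     target_utilization: float = 0.7,
                     min_replicas: int = 0,
                     max_replicas: int = 10) -> int:
    """Replica count needed to serve the current in-flight requests."""
    # Each replica is asked to run at a fraction of its hard limit,
    # about 35 concurrent requests with the defaults above.
    per_replica_target = container_concurrency * target_utilization
    wanted = math.ceil(in_flight / per_replica_target)
    return max(min_replicas, min(max_replicas, wanted))
```

With the example settings, zero in-flight requests yields zero replicas (scale-to-zero), 100 in-flight requests yields three replicas, and very high load is capped at max_replicas.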
Revision Traffic Splitting
Split traffic across named revisions for canary deployments:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  revisions = [
    { name = "v1", traffic_percent = 90, strategy = "round-robin",
      servers = [{ url = "http://127.0.0.1:8001" }] },
    { name = "v2", traffic_percent = 10, strategy = "round-robin",
      servers = [{ url = "http://127.0.0.1:8003" }] }
  ]
}
Traffic percentages must sum to 100.
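The sum-to-100 rule and the split itself can be illustrated with a short Python sketch. It assumes each request is assigned to a revision by weighted random choice; the validate and choose_revision helpers are hypothetical and are not part of the gateway's API.

```python
import random


def validate(revisions: list[dict]) -> None:
    """Reject a revision list whose percentages do not sum to 100."""
    total = sum(r["traffic_percent"] for r in revisions)
    if total != 100:
        raise ValueError(f"traffic_percent values sum to {total}, expected 100")


def choose_revision(revisions: list[dict], rng: random.Random) -> str:
    """Pick a revision name with probability equal to its traffic share."""
    validate(revisions)
    names = [r["name"] for r in revisions]
    weights = [r["traffic_percent"] for r in revisions]
    return rng.choices(names, weights=weights, k=1)[0]


revisions = [
    {"name": "v1", "traffic_percent": 90},
    {"name": "v2", "traffic_percent": 10},
]
```

Over many requests, roughly one in ten lands on v2. Once a revision is chosen, its own strategy and servers would handle backend selection.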
Gradual Rollout
Automatically shift traffic from one revision to another with safety guardrails:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  rollout {
    from                 = "v1"
    to                   = "v2"
    step_percent         = 10
    step_interval_secs   = 60
    error_rate_threshold = 0.05
    latency_threshold_ms = 5000
  }
}
The rollout shifts step_percent of traffic every step_interval_secs. If the error rate exceeds error_rate_threshold or p99 latency exceeds latency_threshold_ms, the rollout automatically rolls back.
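The stepping and rollback behavior can be sketched as a function called once per step_interval_secs. This Python example assumes that a rollback returns all traffic to the from revision and that the rollout stops checking once it reaches 100 percent; neither detail is specified here, and the Rollout class is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    step_percent: int = 10
    error_rate_threshold: float = 0.05
    latency_threshold_ms: float = 5000
    to_percent: int = 0        # share currently sent to the `to` revision
    rolled_back: bool = False

    def tick(self, error_rate: float, p99_latency_ms: float) -> int:
        """Advance one step and return the new share for the `to` revision."""
        if self.rolled_back or self.to_percent >= 100:
            return self.to_percent      # finished, nothing left to do
        unhealthy = (error_rate > self.error_rate_threshold
                     or p99_latency_ms > self.latency_threshold_ms)
        if unhealthy:
            self.to_percent = 0         # assumed: all traffic back to `from`
            self.rolled_back = True
        else:
            self.to_percent = min(100, self.to_percent + self.step_percent)
        return self.to_percent
```

With the example values, a healthy rollout finishes after ten steps, which is about ten minutes at a 60-second interval.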