Services
Services define backend pools with load balancing, health checks, sticky sessions, traffic mirroring, failover, and optional Knative-style autoscaling.
Load Balancing Strategies
The strategy controls how requests are distributed across the backend pool. The example below uses weighted, which routes to each server in proportion to its weight; the other examples on this page use round-robin.
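As a sketch of what weighted means here (illustrative Python, not the gateway's implementation): each server receives requests in proportion to its weight, so a 3:1 weighting serves three requests on one backend for every one on the other.

```python
# Illustrative weighted selection (not the gateway's actual code):
# expand each server `weight` times and cycle through the result,
# so weights 3 and 1 yield a 3:1 request split.
import itertools

def weighted_cycle(servers):
    expanded = [s["url"] for s in servers for _ in range(s["weight"])]
    return itertools.cycle(expanded)

servers = [
    {"url": "http://127.0.0.1:8001", "weight": 3},
    {"url": "http://127.0.0.1:8002", "weight": 1},
]
picker = weighted_cycle(servers)
first_round = [next(picker) for _ in range(4)]  # three 8001s, one 8002
```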
services "api" {
  load_balancer {
    strategy = "weighted"
    servers = [
      { url = "http://127.0.0.1:8001", weight = 3 },
      { url = "http://127.0.0.1:8002", weight = 1 }
    ]
  }
}

Health Checks
Active HTTP probes with configurable thresholds. Unhealthy backends are automatically excluded from load balancing.
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [
      { url = "http://127.0.0.1:8001" },
      { url = "http://127.0.0.1:8002" }
    ]
    health_check {
      path = "/health"
      interval = "10s"
      timeout = "5s"
      unhealthy_threshold = 3
      healthy_threshold = 1
    }
  }
}
Passive health checks also track error counts and automatically remove backends that exceed the failure threshold.
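The threshold logic can be sketched as follows (illustrative Python, not the gateway's code): consecutive failure and success counters drive the healthy flag, mirroring the unhealthy_threshold and healthy_threshold settings above.

```python
# Illustrative threshold counting for active health checks: a backend is
# marked unhealthy after `unhealthy_threshold` consecutive failed probes
# and healthy again after `healthy_threshold` consecutive successes.
class BackendHealth:
    def __init__(self, unhealthy_threshold=3, healthy_threshold=1):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._fails = 0
        self._oks = 0

    def record(self, probe_ok):
        if probe_ok:
            self._fails, self._oks = 0, self._oks + 1
            if not self.healthy and self._oks >= self.healthy_threshold:
                self.healthy = True
        else:
            self._oks, self._fails = 0, self._fails + 1
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False

b = BackendHealth()
for _ in range(3):
    b.record(False)          # three consecutive failed probes
after_failures = b.healthy   # excluded from load balancing
b.record(True)               # one success restores it (threshold 1)
after_recovery = b.healthy
```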
Sticky Sessions
Route the same client to the same backend using cookies:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
    sticky {
      cookie = "srv_id"
    }
  }
}

On the first request, the gateway selects a backend and sets a Set-Cookie header. Subsequent requests with that cookie are routed to the same backend. Sessions are evicted when their TTL expires or when the maximum session count is reached.
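A minimal sketch of cookie-based stickiness (illustrative Python with a toy session id; the real gateway's eviction also honors the session TTL):

```python
# Illustrative sticky routing: on a cookie miss, pick a backend, store the
# mapping, and emit a new cookie value (Set-Cookie: srv_id=<id>); on a
# hit, reuse the stored backend.
import itertools

class StickyRouter:
    def __init__(self, backends, max_sessions=100):
        self._rr = itertools.cycle(backends)  # fall back to round-robin
        self.max_sessions = max_sessions
        self.sessions = {}                    # cookie value -> backend URL
        self._next_id = 0

    def route(self, cookie_value=None):
        """Return (backend, cookie_value); a new value means Set-Cookie."""
        if cookie_value in self.sessions:
            return self.sessions[cookie_value], cookie_value
        if len(self.sessions) >= self.max_sessions:
            self.sessions.pop(next(iter(self.sessions)))  # evict oldest
        self._next_id += 1
        new_id = str(self._next_id)
        self.sessions[new_id] = next(self._rr)
        return self.sessions[new_id], new_id

router = StickyRouter(["http://127.0.0.1:8001", "http://127.0.0.1:8002"])
backend, sid = router.route()   # first request: no cookie yet
repeat, _ = router.route(sid)   # cookie sent back: same backend
```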
Traffic Mirroring
Copy a percentage of live traffic to a shadow service for testing without affecting the primary response:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  mirror {
    service = "shadow-backend"
    percentage = 10
  }
}
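The percentage gate for mirroring can be sketched deterministically (illustrative Python; the gateway may well sample randomly instead):

```python
# Illustrative mirroring gate: out of every 100 requests, `percentage`
# copies go to the shadow service. The primary response is returned to
# the client either way; the mirrored copy's response is discarded.
class Mirror:
    def __init__(self, percentage):
        self.percentage = percentage
        self._count = 0

    def should_mirror(self):
        self._count += 1
        return (self._count % 100) < self.percentage

m = Mirror(10)
mirrored = sum(m.should_mirror() for _ in range(100))  # 10 copies per 100
```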
Failover
Automatic fallback to a secondary backend pool when the primary has zero healthy backends:
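The decision itself is simple: use the primary pool while it has any healthy backend, otherwise switch to the failover service. An illustrative Python sketch (not the gateway's code):

```python
# Illustrative failover selection: pools are lists of (url, healthy)
# pairs, where `healthy` comes from the health checks.
def select_pool(primary, backup):
    healthy = [url for url, ok in primary if ok]
    if healthy:
        return healthy                        # normal path: primary pool
    return [url for url, ok in backup if ok]  # zero healthy: fail over

normal = select_pool([("http://127.0.0.1:8001", True)],
                     [("http://10.0.0.1:8001", True)])
failed_over = select_pool([("http://127.0.0.1:8001", False)],
                          [("http://10.0.0.1:8001", True)])
```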
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
    health_check {
      path = "/health"
    }
  }
  failover {
    service = "backup-pool"
  }
}

services "backup-pool" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://10.0.0.1:8001" }]
  }
}

Knative-Style Autoscaling
Optional autoscaling with scale-to-zero, request buffering, and concurrency-based decisions:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  scaling {
    min_replicas = 0
    max_replicas = 10
    container_concurrency = 50
    target_utilization = 0.7
    scale_down_delay_secs = 300
    buffer_enabled = true
    buffer_timeout_secs = 30
    buffer_size = 100
    executor = "box"
  }
}
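One plausible sizing rule in the Knative style (illustrative only; the parameter names match the scaling block above, but the gateway's exact algorithm is not specified here): desired replicas is the in-flight request count divided by container_concurrency × target_utilization, clamped to the replica bounds.

```python
# Illustrative concurrency-based sizing: each replica is sized to carry
# container_concurrency requests at target_utilization; with
# min_replicas = 0, an idle service scales to zero and buffered requests
# trigger scale-up.
import math

def desired_replicas(in_flight, container_concurrency=50,
                     target_utilization=0.7,
                     min_replicas=0, max_replicas=10):
    if in_flight == 0:
        return min_replicas
    want = math.ceil(in_flight / (container_concurrency * target_utilization))
    return max(min_replicas, min(max_replicas, want))

idle = desired_replicas(0)        # scale to zero
busy = desired_replicas(70)       # 70 / (50 * 0.7) = 2 replicas
burst = desired_replicas(10_000)  # capped at max_replicas
```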
Revision Traffic Splitting
Split traffic across named revisions for canary deployments:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  revisions = [
    { name = "v1", traffic_percent = 90, strategy = "round-robin",
      servers = [{ url = "http://127.0.0.1:8001" }] },
    { name = "v2", traffic_percent = 10, strategy = "round-robin",
      servers = [{ url = "http://127.0.0.1:8003" }] }
  ]
}

Traffic percentages must sum to 100.
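Revision selection can be sketched as a cumulative-percentage lookup (illustrative Python, not the gateway's code):

```python
# Illustrative revision pick: a roll in [0, 100) falls into cumulative
# traffic_percent buckets, so ~90% of requests hit v1 and ~10% hit v2.
def pick_revision(revisions, roll):
    cumulative = 0
    for rev in revisions:
        cumulative += rev["traffic_percent"]
        if roll < cumulative:
            return rev["name"]
    return revisions[-1]["name"]  # guard; percentages sum to 100

revs = [{"name": "v1", "traffic_percent": 90},
        {"name": "v2", "traffic_percent": 10}]
common = pick_revision(revs, 42.0)  # lands in the v1 bucket
canary = pick_revision(revs, 95.0)  # lands in the v2 bucket
```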
Gradual Rollout
Automatically shift traffic from one revision to another with safety guardrails:
services "api" {
  load_balancer {
    strategy = "round-robin"
    servers = [{ url = "http://127.0.0.1:8001" }]
  }
  rollout {
    from = "v1"
    to = "v2"
    step_percent = 10
    step_interval_secs = 60
    error_rate_threshold = 0.05
    latency_threshold_ms = 5000
  }
}

The rollout shifts step_percent of traffic from the from revision to the to revision every step_interval_secs. If the observed error rate exceeds error_rate_threshold or p99 latency exceeds latency_threshold_ms, the rollout automatically rolls back.
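The step-and-guardrail loop can be sketched as follows (illustrative Python; the metric inputs are hypothetical stand-ins for whatever the gateway observes):

```python
# Illustrative rollout step: each interval either advances the `to`
# revision's traffic share by step_percent, finishes at 100%, or rolls
# all traffic back when a guardrail trips.
def rollout_step(to_percent, step_percent, error_rate, p99_ms,
                 error_rate_threshold=0.05, latency_threshold_ms=5000):
    if error_rate > error_rate_threshold or p99_ms > latency_threshold_ms:
        return 0, "rolled_back"   # all traffic back to the `from` revision
    advanced = min(100, to_percent + step_percent)
    return advanced, "done" if advanced == 100 else "advancing"

first = rollout_step(10, 10, error_rate=0.01, p99_ms=800)   # healthy step
tripped = rollout_step(20, 10, error_rate=0.08, p99_ms=800) # guardrail hit
final = rollout_step(90, 10, error_rate=0.01, p99_ms=800)   # last step
```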