
Services

Services define backend pools with load balancing, health checks, sticky sessions, traffic mirroring, failover, and optional Knative-style autoscaling.

Load Balancing Strategies

Each service picks a strategy in its load_balancer block. The example below uses the weighted strategy, sending three times as many requests to the first backend as to the second:
services "api" {
  load_balancer {
    strategy = "weighted"
    servers  = [
      { url = "http://127.0.0.1:8001", weight = 3 },
      { url = "http://127.0.0.1:8002", weight = 1 }
    ]
  }
}
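As a sketch of how a weighted strategy can behave, here is smooth weighted round-robin (the interleaving algorithm popularized by nginx) in Python. This is an illustration of the 3:1 split above, not the gateway's actual implementation:

```python
def smooth_wrr(servers):
    """Smooth weighted round-robin: yields backends in proportion to
    their weights, interleaving picks rather than bursting them."""
    current = {s["url"]: 0 for s in servers}
    total = sum(s["weight"] for s in servers)
    while True:
        # Each round, every backend gains its weight; the leader is
        # picked and pays back the total, which spreads picks evenly.
        for s in servers:
            current[s["url"]] += s["weight"]
        best = max(servers, key=lambda s: current[s["url"]])
        current[best["url"]] -= total
        yield best["url"]

servers = [
    {"url": "http://127.0.0.1:8001", "weight": 3},
    {"url": "http://127.0.0.1:8002", "weight": 1},
]
gen = smooth_wrr(servers)
picks = [next(gen) for _ in range(4)]
# Over any four consecutive picks, :8001 is chosen three times and :8002 once.
```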

Health Checks

Active HTTP probes with configurable thresholds. Unhealthy backends are automatically excluded from load balancing.

services "api" {
  load_balancer {
    strategy = "round-robin"
    servers  = [
      { url = "http://127.0.0.1:8001" },
      { url = "http://127.0.0.1:8002" }
    ]
    health_check {
      path                = "/health"
      interval            = "10s"
      timeout             = "5s"
      unhealthy_threshold = 3
      healthy_threshold   = 1
    }
  }
}


Passive health checks also track error counts and automatically remove backends that exceed the failure threshold.
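The threshold semantics can be pictured as a small state machine: consecutive failed probes flip a backend to unhealthy, consecutive successes flip it back. A minimal Python sketch, assuming the thresholds count consecutive probe results as configured above:

```python
class BackendHealth:
    """Tracks consecutive active-probe results for one backend and flips
    its state once a threshold is crossed (assumed semantics for the
    unhealthy_threshold / healthy_threshold settings above)."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=1):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True      # backends start in rotation
        self._fails = 0          # consecutive failed probes
        self._oks = 0            # consecutive successful probes

    def record(self, probe_ok):
        if probe_ok:
            self._oks, self._fails = self._oks + 1, 0
            if not self.healthy and self._oks >= self.healthy_threshold:
                self.healthy = True   # back into load balancing
        else:
            self._fails, self._oks = self._fails + 1, 0
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False  # excluded until it recovers
```

With the config above, a backend is pulled from rotation after three consecutive failed probes of /health and restored after a single success.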

Sticky Sessions

Route the same client to the same backend using cookies:

services "api" {
  load_balancer {
    strategy = "round-robin"
    servers  = [{ url = "http://127.0.0.1:8001" }]
    sticky {
      cookie = "srv_id"
    }
  }
}

On the first request, the gateway selects a backend and sets a Set-Cookie header. Subsequent requests carrying that cookie are routed to the same backend. Sessions are evicted when their TTL expires or when the maximum session count is reached.
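Cookie-based stickiness amounts to a lookup table in front of the normal balancer. The class below is illustrative only; the cookie value format, eviction order, and in-memory session store are assumptions, not the gateway's implementation:

```python
import secrets

class StickyRouter:
    """Routes by a session cookie; falls back to the underlying
    load-balancing strategy on a miss. Sketch with assumed details."""

    def __init__(self, cookie_name, pick_backend, max_sessions=10_000):
        self.cookie_name = cookie_name
        self.pick_backend = pick_backend   # the underlying strategy
        self.max_sessions = max_sessions
        self.sessions = {}                 # cookie value -> backend url

    def route(self, cookies):
        sid = cookies.get(self.cookie_name)
        if sid in self.sessions:
            return self.sessions[sid], None        # no Set-Cookie needed
        backend = self.pick_backend()
        if len(self.sessions) >= self.max_sessions:
            # Evict the oldest session (dicts preserve insertion order).
            self.sessions.pop(next(iter(self.sessions)))
        sid = secrets.token_hex(8)
        self.sessions[sid] = backend
        return backend, f"{self.cookie_name}={sid}"  # Set-Cookie value

router = StickyRouter("srv_id", lambda: "http://127.0.0.1:8001")
backend, set_cookie = router.route({})   # first request: cookie issued
```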

Traffic Mirroring

Copy a percentage of live traffic to a shadow service for testing without affecting the primary response:

services "api" {
  load_balancer {
    strategy = "round-robin"
    servers  = [{ url = "http://127.0.0.1:8001" }]
  }
  mirror {
    service    = "shadow-backend"
    percentage = 10
  }
}
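Percentage-based mirroring is essentially a probabilistic copy of each request. A simplified Python sketch; in practice the gateway mirrors asynchronously so the shadow call cannot delay the primary response:

```python
import random

def handle(request, primary, shadow, percentage=10):
    """Forward to the primary; copy roughly `percentage` percent of
    requests to the shadow service and discard its response. Sketch only."""
    response = primary(request)
    if random.uniform(0, 100) < percentage:
        try:
            shadow(request)      # shadow response is ignored
        except Exception:
            pass                 # shadow failures never affect the client
    return response
```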


Failover

Automatic fallback to a secondary backend pool when the primary has zero healthy backends:

services "api" {
  load_balancer {
    strategy = "round-robin"
    servers  = [{ url = "http://127.0.0.1:8001" }]
    health_check {
      path = "/health"
    }
  }
  failover {
    service = "backup-pool"
  }
}

services "backup-pool" {
  load_balancer {
    strategy = "round-robin"
    servers  = [{ url = "http://10.0.0.1:8001" }]
  }
}
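The failover rule — use the backup pool only when the primary has zero healthy backends — can be sketched as:

```python
def healthy_backends(primary, failover):
    """Return the primary pool while it has any healthy backend;
    switch to the failover pool only when it has none. Sketch of the
    rule described above, not the gateway's code."""
    alive = [s for s in primary if s["healthy"]]
    if alive:
        return alive
    return [s for s in failover if s["healthy"]]
```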

Knative-Style Autoscaling

Optional autoscaling with scale-to-zero, request buffering, and concurrency-based decisions:

services "api" {
  load_balancer {
    strategy = "round-robin"
    servers  = [{ url = "http://127.0.0.1:8001" }]
  }
  scaling {
    min_replicas          = 0
    max_replicas          = 10
    container_concurrency = 50
    target_utilization    = 0.7
    scale_down_delay_secs = 300
    buffer_enabled        = true
    buffer_timeout_secs   = 30
    buffer_size           = 100
    executor              = "box"
  }
}
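The concurrency-based decision follows the Knative pattern: size the pool so each replica runs at roughly target_utilization of its concurrency limit, clamped between min_replicas and max_replicas. A sketch of the arithmetic using the values from the config above (assumed semantics, not the gateway's exact algorithm):

```python
import math

def desired_replicas(in_flight, container_concurrency=50,
                     target_utilization=0.7,
                     min_replicas=0, max_replicas=10):
    """Knative-style concurrency scaling: each replica should carry
    about container_concurrency * target_utilization requests."""
    if in_flight == 0:
        # Scale to zero when idle (in practice, only after
        # scale_down_delay_secs with no traffic).
        return min_replicas
    target_per_replica = container_concurrency * target_utilization  # 35.0
    want = math.ceil(in_flight / target_per_replica)
    return max(min_replicas, min(max_replicas, want))
```

When min_replicas is 0 and the pool is scaled to zero, buffer_enabled holds incoming requests (up to buffer_size, for at most buffer_timeout_secs) while a replica starts.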


Revision Traffic Splitting

Split traffic across named revisions for canary deployments:

services "api" {
  load_balancer {
    strategy = "round-robin"
    servers  = [{ url = "http://127.0.0.1:8001" }]
  }
  revisions = [
    { name = "v1", traffic_percent = 90, strategy = "round-robin",
      servers = [{ url = "http://127.0.0.1:8001" }] },
    { name = "v2", traffic_percent = 10, strategy = "round-robin",
      servers = [{ url = "http://127.0.0.1:8003" }] }
  ]
}

Traffic percentages must sum to 100.
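Percentage-based revision selection can be sketched as a cumulative draw over the traffic_percent values — illustrative only:

```python
import random

def pick_revision(revisions):
    """Choose a revision in proportion to traffic_percent.
    The percentages must sum to 100, as the config requires."""
    assert sum(r["traffic_percent"] for r in revisions) == 100
    roll = random.uniform(0, 100)
    acc = 0
    for r in revisions:
        acc += r["traffic_percent"]
        if roll < acc:
            return r["name"]
    return revisions[-1]["name"]  # guard against float edge cases
```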

Gradual Rollout

Automatically shift traffic from one revision to another with safety guardrails:

services "api" {
  load_balancer {
    strategy = "round-robin"
    servers  = [{ url = "http://127.0.0.1:8001" }]
  }
  rollout {
    from                 = "v1"
    to                   = "v2"
    step_percent         = 10
    step_interval_secs   = 60
    error_rate_threshold = 0.05
    latency_threshold_ms = 5000
  }
}

The rollout shifts step_percent of traffic every step_interval_secs seconds. If the error rate exceeds error_rate_threshold or p99 latency exceeds latency_threshold_ms, the rollout rolls back automatically.
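One rollout step can be sketched as: advance traffic to the new revision by step_percent unless a guardrail trips, in which case return all traffic to the old revision. Simplified — a real controller also tracks timing and metric windows:

```python
def rollout_step(current_percent, step_percent, error_rate, p99_ms,
                 error_rate_threshold=0.05, latency_threshold_ms=5000):
    """Advance one rollout step or roll back if a guardrail trips.
    Returns (new traffic percent for the target revision, status)."""
    if error_rate > error_rate_threshold or p99_ms > latency_threshold_ms:
        return 0, "rolled_back"            # all traffic back to `from`
    nxt = min(100, current_percent + step_percent)
    return nxt, ("complete" if nxt == 100 else "advancing")
```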
