Observability

Health Endpoint

GET /health — returns server status, TEE state, and loaded model count.

curl http://localhost:11434/health

{
  "status": "ok",
  "version": "0.4.2",
  "uptime_seconds": 3600,
  "loaded_models": 2,
  "tee": {
    "enabled": true,
    "type": "sev-snp",
    "models_verified": true
  }
}
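A monitoring script or readiness probe can consume the /health response directly. The Python sketch below uses only the standard library. Its health policy is an assumption and is not enforced by Power: status must be "ok", and models_verified must be true whenever TEE is enabled. parse_health and check_health are hypothetical helper names.

```python
import json
import urllib.request


def parse_health(body: str) -> dict:
    """Summarize a /health response and list anything that looks unhealthy.

    The pass/fail rules below are an assumed policy, not Power's own.
    """
    health = json.loads(body)
    tee = health.get("tee") or {}
    problems = []
    if health.get("status") != "ok":
        problems.append(f"status is {health.get('status')!r}")
    if tee.get("enabled") and not tee.get("models_verified"):
        problems.append("TEE enabled but models not verified")
    return {
        "version": health.get("version"),
        "loaded_models": health.get("loaded_models", 0),
        # Only report a TEE type when the TEE is actually enabled.
        "tee_type": tee.get("type") if tee.get("enabled") else None,
        "problems": problems,
    }


def check_health(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch GET /health from a running server and summarize it."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))
```

Applied to the sample response above, parse_health reports version 0.4.2, two loaded models, TEE type sev-snp, and no problems. Calling check_health() against a live server produces the same summary for the current state.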

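Prometheus Metrics

Power exposes metrics for Prometheus at GET /metrics.

Request Metrics

max_concurrent_requests applies backpressure instead of rejecting excess inference requests. Requests beyond the limit wait for a free slot rather than failing. The number of waiting requests is reported in power_requests_waiting, and each admitted request holds its permit until its streamed response has completed.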
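The admission behavior described above works like a counting semaphore. The asyncio sketch below models those semantics only and is not Power's implementation. AdmissionGate, stream_response, and the requests_waiting attribute are hypothetical names, with requests_waiting standing in for power_requests_waiting.

```python
import asyncio


class AdmissionGate:
    """Hypothetical model of max_concurrent_requests backpressure."""

    def __init__(self, max_concurrent_requests: int):
        self._sem = asyncio.Semaphore(max_concurrent_requests)
        self.requests_waiting = 0  # plays the role of power_requests_waiting

    async def acquire(self) -> None:
        self.requests_waiting += 1
        try:
            await self._sem.acquire()  # wait for a permit instead of rejecting
        finally:
            self.requests_waiting -= 1  # no longer waiting (admitted or cancelled)

    def release(self) -> None:
        self._sem.release()


async def stream_response(gate: AdmissionGate, chunks, out: list) -> None:
    """Send a streamed response while holding one permit for its whole duration."""
    await gate.acquire()
    try:
        for chunk in chunks:
            out.append(chunk)
            await asyncio.sleep(0)  # yield between chunks, as a real stream would
    finally:
        gate.release()  # the permit is returned only after the last chunk
```

With a limit of 1, a second request waits until the first stream has delivered all of its chunks. While it waits, requests_waiting is 1. Because the permit is released in a finally block after the final chunk, a long streamed response keeps its slot until the stream ends.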

Inference Metrics

Model Metrics

GPU Metrics

TEE Metrics

Prometheus Scrape Config

scrape_configs:
  - job_name: a3s-power
    static_configs:
      - targets: ["localhost:11434"]
    metrics_path: /metrics
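For a quick check without a Prometheus server, the same endpoint can be read directly. This Python sketch assumes /metrics serves the standard Prometheus text exposition format. parse_prometheus_text and fetch_metrics are hypothetical helpers. The parser deliberately skips # HELP and # TYPE metadata and handles only plain sample lines.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict:
    """Map each sample's series (metric name plus label set) to its value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        if "{" in line:
            close = line.rindex("}")  # the last brace closes the label set
            series, rest = line[: close + 1], line[close + 1 :].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])  # an optional second field is a timestamp
    return samples


def fetch_metrics(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch /metrics from a running server and parse it."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Assuming power_requests_waiting is exported without labels, fetch_metrics().get("power_requests_waiting") returns the current number of requests queued behind max_concurrent_requests.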

Audit Logging

Power writes structured audit logs in JSONL format (one JSON object per line). Each inference request is logged with its request ID, model name, timing, and token counts, which can optionally be rounded. When redact_logs = true, prompt and response content is never written to the audit log.
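The fields listed above map naturally onto one JSON object per line. The Python sketch below shows how such a record might be assembled. The exact schema is not documented here, so the field names (request_id, model, duration_ms, prompt_tokens, completion_tokens) are hypothetical. The same applies to the round-up-to-a-bucket rounding scheme and the token_bucket parameter. The sketch also assumes content is included only when redact_logs is false.

```python
import json


def round_tokens(count: int, bucket: int) -> int:
    """Round count up to the next multiple of bucket (hypothetical scheme)."""
    if bucket <= 1:
        return count  # rounding disabled
    return -(-count // bucket) * bucket  # ceiling division, then scale back up


def audit_line(request_id: str, model: str, started: float, finished: float,
               prompt_tokens: int, completion_tokens: int,
               prompt: str = "", response: str = "",
               redact_logs: bool = True, token_bucket: int = 0) -> str:
    """Build one JSONL audit line; all field names are illustrative only."""
    record = {
        "request_id": request_id,
        "model": model,
        "duration_ms": round((finished - started) * 1000),
        "prompt_tokens": round_tokens(prompt_tokens, token_bucket),
        "completion_tokens": round_tokens(completion_tokens, token_bucket),
    }
    if not redact_logs:
        # Assumption: content is only ever recorded when redaction is off.
        record["prompt"] = prompt
        record["response"] = response
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return json.dumps(record, separators=(",", ":"))
```

Appending each returned string plus a newline to the log file yields valid JSONL. Rounding to a coarse bucket, here 37 prompt tokens recorded as 48 with token_bucket=16, makes the counts less precise and so reveals less about individual prompts. Power's actual rounding direction and granularity are not specified here.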

Audit logs are flushed on graceful shutdown (SIGTERM / Ctrl-C) before the process exits.
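The Python code below only illustrates the shutdown ordering described above: buffered records are written out before the process exits on SIGTERM or SIGINT (Ctrl-C). BufferedAuditLog, make_shutdown_handler, and install_shutdown_flush are hypothetical names, and in-memory batching is an assumption made to show why a flush matters.

```python
import json
import signal
import sys


class BufferedAuditLog:
    """Hypothetical JSONL writer that batches records in memory."""

    def __init__(self, path: str):
        self._file = open(path, "a", encoding="utf-8")
        self._pending = []

    def write(self, record: dict) -> None:
        self._pending.append(json.dumps(record))

    def flush(self) -> None:
        for line in self._pending:
            self._file.write(line + "\n")  # one JSON object per line
        self._pending.clear()
        self._file.flush()

    def close(self) -> None:
        self.flush()
        self._file.close()


def make_shutdown_handler(log: BufferedAuditLog):
    def handler(signum, frame):
        log.close()  # drain buffered audit records first
        sys.exit(0)  # then exit by raising SystemExit
    return handler


def install_shutdown_flush(log: BufferedAuditLog) -> None:
    handler = make_shutdown_handler(log)
    signal.signal(signal.SIGTERM, handler)  # graceful stop from a supervisor
    signal.signal(signal.SIGINT, handler)   # Ctrl-C
```

Closing the log inside the handler, before sys.exit, is the point of the design: records accepted before the signal arrives reach disk even though the process stops immediately afterwards.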

Logging

Power logs through the tracing and tracing-subscriber crates. Set the log level with the RUST_LOG environment variable:

RUST_LOG=info a3s-power serve
RUST_LOG=debug a3s-power serve
RUST_LOG=a3s_power=debug,tower_http=info a3s-power serve
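The third command combines per-target directives. The Python sketch below is a simplified model of how such a RUST_LOG value selects a level for a given log target. The most specific (longest) matching target prefix wins, a bare level acts as the default, and targets that no directive matches are treated as disabled. This approximates tracing-subscriber's EnvFilter for illustration only. It ignores span and field filters and uses plain string-prefix matching, and the function names are hypothetical.

```python
LEVELS = ["error", "warn", "info", "debug", "trace"]  # most to least severe


def parse_directives(spec: str):
    """Split a RUST_LOG-style spec into a default level and per-target levels."""
    default, targets = None, {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            target, level = part.split("=", 1)
            targets[target.strip()] = level.strip().lower()
        else:
            default = part.lower()  # a bare level applies to every target
    return default, targets


def effective_level(spec: str, target: str):
    """Return the level chosen for target, or None if no directive applies."""
    default, targets = parse_directives(spec)
    matches = [t for t in targets if target.startswith(t)]
    if matches:
        return targets[max(matches, key=len)]  # most specific directive wins
    return default


def enabled(spec: str, target: str, level: str) -> bool:
    """True if an event at `level` from `target` would be emitted (sketch)."""
    max_level = effective_level(spec, target)
    if max_level is None or max_level == "off":
        return False  # assumption: unmatched targets are disabled
    return LEVELS.index(level) <= LEVELS.index(max_level)
```

Under RUST_LOG=a3s_power=debug,tower_http=info, this model emits debug events from a3s_power::server. From tower_http, it emits info events and suppresses debug events.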

When redact_logs = true, all inference content is stripped from log output regardless of log level.
