Providers & Configuration
Complete HCL configuration reference — LLM providers, model fields, OpenAI-compatible APIs, and local models via Ollama / GPUStack / A3S Power / OneAPI.
All configuration lives in a single .hcl file. This page is the complete reference for every field, plus ready-to-paste configs for common LLM providers and local inference servers.
A3S Code uses HCL (HashiCorp Configuration Language) as its primary config format. The env() function reads environment variables at parse time — use it to keep secrets out of config files.
Config File Location
Agent::new("agent.hcl") resolves the path relative to the current working directory. Any filename and any path (relative or absolute) work.
| Convention | Use case |
|---|---|
| ./agent.hcl | Per-project config, can be checked into the repo |
| ~/.a3s/config.hcl | User-level default shared across projects |
| /etc/a3s/config.hcl | System-wide config for multi-user servers |
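Resolution follows ordinary filesystem semantics; the sketch below shows how the conventions above expand (plain Python for illustration, not the A3S loader, and whether the loader itself expands ~ or leaves that to the shell is not specified here):

```python
import os

def resolve_config_path(path, cwd="/home/dev/project"):
    """Expand ~ and make relative paths absolute against the working directory."""
    expanded = os.path.expanduser(path)
    if os.path.isabs(expanded):
        return expanded  # already absolute, e.g. /etc/a3s/config.hcl
    return os.path.normpath(os.path.join(cwd, expanded))

resolve_config_path("./agent.hcl")         # /home/dev/project/agent.hcl
resolve_config_path("/etc/a3s/config.hcl") # unchanged: already absolute
```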
Minimal Config
The only required top-level field is default_model. For cloud providers, api_key at the provider level is also required.
default_model = "anthropic/claude-sonnet-4-20250514"
providers {
name = "anthropic"
api_key = env("ANTHROPIC_API_KEY")
models {
id = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4"
tool_call = true
}
}
For fully local providers (e.g., Ollama) no api_key is needed; only a base_url is required.
Top-Level Fields
| Prop | Type | Notes |
|---|---|---|
| default_model | string | Required. A "provider/model-id" reference |
| max_tool_rounds | number | Maximum tool-call rounds per turn (default: 50) |
| thinking_budget | number | Optional reasoning token budget |
| skill_dirs | list(string) | Directories with *.md skill files |
| agent_dirs | list(string) | Directories with agent definition files |
| storage_backend | string | "memory", "file", or "custom" |
| sessions_dir | string | Sessions path for the "file" backend |
| storage_url | string | URL for the "custom" backend |
| providers | block (repeatable) | Provider definitions (see below) |
| queue | block | Optional queue settings |
| search | block | Optional search settings |
Provider Fields
| Prop | Type | Notes |
|---|---|---|
| name | string | Provider name; the prefix in "provider/model-id" references |
| api_key | string | Required for cloud providers; default for all models |
| base_url | string | Endpoint override; default for all models |
| models | block (repeatable) | Model definitions (see below) |
Model Fields
| Prop | Type | Notes |
|---|---|---|
| id | string | Model ID sent to the API |
| name | string | Human-readable display name |
| family | string | Model family, e.g. "claude-sonnet", "llama" |
| tool_call | bool | Model supports tool calling |
| reasoning | bool | Model produces reasoning/thinking tokens |
| temperature | bool | Model accepts a temperature parameter |
| attachment | bool | Model accepts file attachments |
| release_date | string | Release date, e.g. "2025-05-14" |
| api_key | string | Overrides the provider-level api_key |
| base_url | string | Overrides the provider-level base_url |
| modalities | block | Input/output content types (see below) |
| cost | block | Pricing per million tokens (see below) |
| limit | block | Context window and output caps (see below) |
modalities block
Declares what content types the model can receive and produce:
# Multimodal model (text + vision)
modalities {
input = ["text", "image", "pdf"]
output = ["text"]
}
# Text-only model
modalities {
input = ["text"]
output = ["text"]
}
cost block
Pricing in USD per million tokens, consumed by the Telemetry module for cost tracking:
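For intuition, each counter contributes tokens / 1,000,000 × rate. The sketch below shows the arithmetic with a hypothetical helper (the real Telemetry module's API is not documented here; token counts are made-up examples):

```python
def request_cost_usd(usage, rates):
    """Sum each token counter times its per-million-token USD rate."""
    return sum(usage.get(kind, 0) / 1_000_000 * rate for kind, rate in rates.items())

# Rates matching the cost block for Claude Sonnet 4.
rates = {"input": 3.0, "output": 15.0, "cache_read": 0.3, "cache_write": 3.75}
usage = {"input": 10_000, "output": 2_000, "cache_read": 50_000}

request_cost_usd(usage, rates)  # 0.03 + 0.03 + 0.015 = 0.075 USD
```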
cost {
input = 3.0 # per 1M input tokens
output = 15.0 # per 1M output tokens
cache_read = 0.3 # per 1M prompt-cache read tokens (Anthropic)
cache_write = 3.75 # per 1M prompt-cache write tokens (Anthropic)
}
limit block
limit {
context = 200000 # context window in tokens
output = 64000 # max output tokens per response
}
Multiple Providers
Define as many providers as needed. Switch between them per session via SessionOptions:
default_model = "anthropic/claude-sonnet-4-20250514"
providers {
name = "anthropic"
api_key = env("ANTHROPIC_API_KEY")
models {
id = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4"
tool_call = true
}
models {
id = "claude-opus-4-5-20251101"
name = "Claude Opus 4.5"
reasoning = true
tool_call = true
}
}
providers {
name = "openai"
api_key = env("OPENAI_API_KEY")
models {
id = "gpt-4o"
name = "GPT-4o"
tool_call = true
}
}
Select at runtime:
TypeScript:
const session = agent.session('.', { model: 'openai/gpt-4o' });
Python:
session = agent.session(".", model="openai/gpt-4o")
Per-Model API Key & Base URL
Provider-level api_key and base_url are defaults; model-level values override them. This enables:
- API key rotation — different keys per model
- Proxy routing — send specific models through a gateway
- Regional endpoints — lower latency or data-residency compliance
- Self-hosted endpoints — point individual models to local servers
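The precedence is plain fallback: use the model-level value if present, otherwise the provider default. A sketch of that resolution (illustrative Python, not the actual A3S loader):

```python
def resolve(field, model, provider):
    """Return the model-level value if set, else fall back to the provider default."""
    value = model.get(field) or provider.get(field)
    if value is None:
        raise ValueError(f"no {field} configured for model {model.get('id')}")
    return value

provider = {"api_key": "default-key", "base_url": "https://api.anthropic.com"}
eu_model = {"id": "claude-sonnet-4-20250514", "base_url": "https://eu.api.anthropic.com"}

resolve("base_url", eu_model, provider)  # model-level override wins
resolve("api_key", eu_model, provider)   # falls back to the provider default
```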
providers {
name = "anthropic"
api_key = env("ANTHROPIC_API_KEY") # default for all models
base_url = "https://api.anthropic.com" # default base URL
models {
id = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4"
tool_call = true
# inherits provider api_key and base_url
}
models {
id = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4 (EU)"
api_key = env("ANTHROPIC_EU_KEY") # override
base_url = "https://eu.api.anthropic.com" # override
tool_call = true
}
}
OpenAI-Compatible Providers
Any service implementing the OpenAI Chat Completions API (POST /v1/chat/completions) works as a provider. Set base_url to point to the endpoint. The name field is arbitrary.
This covers: Ollama, GPUStack, A3S Power, OneAPI, Together AI, Groq, Fireworks, LM Studio, vLLM, llama.cpp, and any other OpenAI-compatible backend.
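For example, a self-hosted vLLM server on its default port follows the same pattern (the model ID and limits below are placeholders; match them to whatever your server actually serves):

```hcl
providers {
  name     = "vllm"
  base_url = "http://localhost:8000/v1"  # vLLM's default OpenAI-compatible endpoint
  models {
    id        = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder; use your served model name
    name      = "Llama 3.2 3B (vLLM)"
    tool_call = true
    limit { context = 128000 output = 4096 }
  }
}
```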
Local LLM Providers
Run inference locally for privacy, cost control, offline use, or to serve fine-tuned models. All four options below expose an OpenAI-compatible endpoint and follow the same config pattern.
Ollama
Ollama runs open-source models locally and serves them at http://localhost:11434/v1.
Setup:
# macOS
brew install ollama
ollama serve
# Pull models
ollama pull llama3.2
ollama pull qwen2.5-coder:7b
ollama pull deepseek-r1:8b
Config:
default_model = "ollama/llama3.2"
providers {
name = "ollama"
base_url = "http://localhost:11434/v1" # no api_key needed
models {
id = "llama3.2"
name = "Llama 3.2 (Ollama)"
family = "llama"
tool_call = true
temperature = true
modalities {
input = ["text"]
output = ["text"]
}
limit {
context = 128000
output = 4096
}
}
models {
id = "qwen2.5-coder:7b"
name = "Qwen 2.5 Coder 7B (Ollama)"
family = "qwen"
tool_call = true
temperature = true
limit { context = 32000 output = 4096 }
}
models {
id = "deepseek-r1:8b"
name = "DeepSeek-R1 8B (Ollama)"
family = "deepseek"
tool_call = false
reasoning = true
temperature = false
limit { context = 64000 output = 8192 }
}
}
Ollama does not require an API key. If the client library requires a non-empty value, use any placeholder string (e.g., api_key = "ollama").
GPUStack
GPUStack manages GPU clusters and serves models via an OpenAI-compatible API. API keys use the format gpustack_<token>. Model IDs follow the convention model--<org>--<model-name> — find the exact value in your GPUStack dashboard.
Single-node config:
default_model = "gpustack/model--zhipuai--glm-4.7"
providers {
name = "gpustack"
api_key = env("GPUSTACK_API_KEY")
base_url = "http://your-gpustack-host/v1" # shared for all models
models {
id = "model--zhipuai--glm-4.7"
name = "GLM-4.7 (GPUStack)"
family = "glm"
tool_call = true
temperature = true
limit { context = 128000 output = 4096 }
}
models {
id = "model--meta--llama-3.2-3b-instruct"
name = "Llama 3.2 3B Instruct (GPUStack)"
family = "llama"
tool_call = true
temperature = true
limit { context = 128000 output = 4096 }
}
}
Multi-node config: override base_url per model to route requests to different nodes:
providers {
name = "gpustack"
api_key = env("GPUSTACK_API_KEY")
models {
id = "model--zhipuai--glm-4.7"
name = "GLM-4.7 (node-1)"
base_url = "http://node-1.gpustack.internal/v1"
tool_call = true
temperature = true
limit { context = 128000 output = 4096 }
}
models {
id = "model--meta--llama-3.2-3b-instruct"
name = "Llama 3.2 3B (node-2)"
base_url = "http://node-2.gpustack.internal/v1"
tool_call = true
temperature = true
limit { context = 128000 output = 4096 }
}
}
A3S Power
A3S Power provides managed inference backed by A3S infrastructure, with an OpenAI-compatible API.
default_model = "power/qwen2.5-72b"
providers {
name = "power"
api_key = env("A3S_POWER_API_KEY")
base_url = "http://your-a3s-power-host/v1"
models {
id = "qwen2.5-72b"
name = "Qwen 2.5 72B (A3S Power)"
family = "qwen"
tool_call = true
temperature = true
reasoning = false
modalities {
input = ["text"]
output = ["text"]
}
limit {
context = 128000
output = 8192
}
}
models {
id = "deepseek-r1:70b"
name = "DeepSeek-R1 70B (A3S Power)"
family = "deepseek"
tool_call = true
reasoning = true
temperature = false
limit { context = 64000 output = 8192 }
}
}
OneAPI
OneAPI is an open-source API aggregator that proxies multiple upstream providers (Anthropic, OpenAI, Azure, Gemini, etc.) behind a single endpoint and key.
default_model = "oneapi/claude-sonnet-4-20250514"
providers {
name = "oneapi"
api_key = env("ONEAPI_TOKEN")
base_url = "http://your-oneapi-host/v1"
models {
id = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4 (via OneAPI)"
tool_call = true
temperature = true
limit { context = 200000 output = 8192 }
}
models {
id = "gpt-4o"
name = "GPT-4o (via OneAPI)"
tool_call = true
temperature = true
limit { context = 128000 output = 4096 }
}
models {
id = "deepseek-chat"
name = "DeepSeek Chat (via OneAPI)"
tool_call = true
temperature = true
limit { context = 64000 output = 4096 }
}
}
The id field must match the channel model name configured in your OneAPI instance, not necessarily the upstream provider's model ID. Verify the exact value in your OneAPI admin panel.
Full Configuration Reference
# ─── Global ──────────────────────────────────────────────────────────────────
default_model = "anthropic/claude-sonnet-4-20250514"
max_tool_rounds = 20 # default: 50
thinking_budget = 4096 # reasoning token budget (optional)
# ─── Extensions ──────────────────────────────────────────────────────────────
skill_dirs = ["./skills"] # directories with *.md skill files
agent_dirs = ["./agents"] # directories with agent definition files
# ─── Storage ─────────────────────────────────────────────────────────────────
storage_backend = "file" # "memory" | "file" | "custom"
sessions_dir = "~/.a3s/sessions" # path for "file" backend
storage_url = "redis://localhost:6379" # URL for "custom" backend
# ─── Cloud Provider ──────────────────────────────────────────────────────────
providers {
name = "anthropic"
api_key = env("ANTHROPIC_API_KEY")
base_url = "https://api.anthropic.com" # optional override
models {
id = "claude-sonnet-4-20250514"
name = "Claude Sonnet 4"
family = "claude-sonnet"
tool_call = true
reasoning = false
temperature = true
attachment = true
release_date = "2025-05-14"
modalities {
input = ["text", "image", "pdf"]
output = ["text"]
}
cost {
input = 3.0
output = 15.0
cache_read = 0.3
cache_write = 3.75
}
limit {
context = 200000
output = 64000
}
}
}
# ─── OpenAI-compatible cloud ─────────────────────────────────────────────────
providers {
name = "openai"
api_key = env("OPENAI_API_KEY")
models {
id = "gpt-4o"
name = "GPT-4o"
tool_call = true
temperature = true
attachment = true
modalities {
input = ["text", "image"]
output = ["text"]
}
cost {
input = 2.5
output = 10.0
}
limit {
context = 128000
output = 16384
}
}
# Per-model API key + base_url override
models {
id = "gpt-4o"
name = "GPT-4o (via proxy)"
api_key = env("PROXY_API_KEY")
base_url = "https://proxy.example.com/v1"
tool_call = true
temperature = true
limit { context = 128000 output = 4096 }
}
}
# ─── Local provider (Ollama) ─────────────────────────────────────────────────
providers {
name = "ollama"
base_url = "http://localhost:11434/v1"
models {
id = "llama3.2"
name = "Llama 3.2 (Ollama)"
tool_call = true
temperature = true
limit { context = 128000 output = 4096 }
}
}
# ─── Queue (optional) ────────────────────────────────────────────────────────
queue {
query_max_concurrency = 5 # default: 5
execute_max_concurrency = 2 # default: 2
generate_max_concurrency = 1 # default: 1
enable_metrics = true
enable_dlq = true
retry_policy {
strategy = "exponential" # "fixed" | "exponential"
max_retries = 3
initial_delay_ms = 100
}
}
# ─── Search (optional) ───────────────────────────────────────────────────────
search {
timeout = 30
health {
max_failures = 3
suspend_seconds = 60
}
engine {
ddg { enabled = true weight = 1.5 }
wiki { enabled = true weight = 1.2 }
brave { enabled = true weight = 1.0 timeout = 20 }
}
}
The env() Function
Use env("VAR") anywhere a string value is expected. The variable is resolved once at parse time — if unset, Agent::new() returns an error immediately at startup.
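The semantics can be emulated in a few lines (illustrative Python, not the A3S parser): substitute each env("VAR") with the variable's value while reading the file, and fail immediately if the variable is unset.

```python
import os
import re

def render_env(hcl_text, environ=os.environ):
    """Replace env("VAR") with the variable's value at parse time; fail fast if unset."""
    def lookup(match):
        var = match.group(1)
        if var not in environ:
            raise RuntimeError(f"environment variable {var} is not set")
        return f'"{environ[var]}"'
    return re.sub(r'env\("([A-Za-z_][A-Za-z0-9_]*)"\)', lookup, hcl_text)

render_env('api_key = env("ANTHROPIC_API_KEY")', {"ANTHROPIC_API_KEY": "sk-ant-test"})
# returns 'api_key = "sk-ant-test"'
```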
providers {
name = "anthropic"
api_key = env("ANTHROPIC_API_KEY")
}
bash / zsh:
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GPUSTACK_API_KEY="gpustack_..."
PowerShell:
$env:ANTHROPIC_API_KEY = "sk-ant-..."
$env:OPENAI_API_KEY = "sk-..."
$env:GPUSTACK_API_KEY = "gpustack_..."
cmd.exe:
set ANTHROPIC_API_KEY=sk-ant-...
set OPENAI_API_KEY=sk-...
set GPUSTACK_API_KEY=gpustack_...