Quick Start

Get A3S Power running with local model inference in minutes.

Install

# Via Homebrew (macOS)
brew tap a3s-lab/tap https://github.com/A3S-Lab/homebrew-tap
brew install a3s-power

# Via Cargo (pure Rust inference, no C++ needed)
cargo install a3s-power

# Build from source
git clone https://github.com/A3S-Lab/Power.git && cd Power
cargo build --release

Start the Server

a3s-power serve
# Listening on http://127.0.0.1:11434

Pull a Model from HuggingFace Hub

# By quantization tag (resolves filename automatically)
curl -N http://localhost:11434/v1/models/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}'

Progress streams as SSE:

data: {"status":"downloading","completed":209715200,"total":2147483648}
data: {"status":"verifying"}
data: {"status":"success","id":"bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}
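
To turn these events into a progress readout, a client can parse each `data:` line as JSON. The sketch below assumes only the fields shown in the sample above (`status`, `completed`, `total`); the helper name `parse_sse_progress` is hypothetical and not part of A3S Power.

```python
import json


def parse_sse_progress(lines):
    """Yield (status, percent or None) for each `data:` line of a pull stream.

    Assumes the event shape shown in the sample above; other fields are ignored.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        event = json.loads(line[len("data:"):].strip())
        completed, total = event.get("completed"), event.get("total")
        percent = None
        if completed is not None and total:
            percent = round(100 * completed / total, 1)
        yield event.get("status"), percent


# The three sample events from the section above.
sample = [
    'data: {"status":"downloading","completed":209715200,"total":2147483648}',
    'data: {"status":"verifying"}',
    'data: {"status":"success","id":"bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}',
]
for status, percent in parse_sse_progress(sample):
    print(status, percent)
```

In a real client, `lines` would be the decoded lines of the streaming HTTP response rather than a fixed list.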

Register a Local Model

curl -X POST http://localhost:11434/v1/models \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2:3b", "path": "/path/to/model.gguf"}'
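
The same registration can be issued from Python with only the standard library. This is a minimal sketch that mirrors the curl call above (same URL, header, and JSON fields); the function names are hypothetical, and the shape of the server's reply is not documented here, so it is returned as raw text.

```python
import json
import urllib.request

MODELS_URL = "http://localhost:11434/v1/models"  # endpoint from the curl example


def build_register_request(name, path, url=MODELS_URL):
    """Build (but do not send) the POST request that registers a local GGUF file."""
    body = json.dumps({"name": name, "path": path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_model(name, path):
    """Send the request; requires a running server. Returns the raw response text."""
    with urllib.request.urlopen(build_register_request(name, path)) as resp:
        return resp.read().decode("utf-8")


req = build_register_request("llama3.2:3b", "/path/to/model.gguf")
print(req.get_method(), req.full_url)
```

Calling `register_model("llama3.2:3b", "/path/to/model.gguf")` sends the request once the server is running.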

Chat

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Use with OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Explain Rust in one sentence"}]
)
print(response.choices[0].message.content)

// The same request with the OpenAI Node.js SDK (TypeScript/JavaScript)
import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "unused" });
const response = await client.chat.completions.create({
  model: "llama3.2:3b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

Streaming

# Reuses the Python `client` created in "Use with OpenAI SDK"
stream = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Check Health

curl http://localhost:11434/health

Response:

{
  "status": "ok",
  "version": "0.2.0",
  "uptime_seconds": 42,
  "loaded_models": 1,
  "tee": { "enabled": false, "type": "none", "models_verified": false }
}
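
Scripts that launch the server and then send requests can poll this endpoint until it answers. The sketch below assumes that `"status": "ok"` indicates readiness and that `loaded_models` counts models currently in memory, both inferred from the sample response above; the function names, retry count, and delay are illustrative.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:11434/health"  # endpoint from the curl example


def fetch_health(url=HEALTH_URL, timeout=2.0):
    """GET the health endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_ready(health, require_model=False):
    """Treat status == "ok" as ready (assumption); optionally require a loaded model."""
    if health.get("status") != "ok":
        return False
    return not require_model or health.get("loaded_models", 0) > 0


def wait_until_ready(fetch=fetch_health, attempts=10, delay=0.5, require_model=False):
    """Poll `fetch` up to `attempts` times; return True as soon as the server is ready."""
    for attempt in range(attempts):
        try:
            if is_ready(fetch(), require_model):
                return True
        except (OSError, ValueError):
            pass  # connection refused or timed out, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_ready(require_model=True)` returns `True` only once the server reports at least one loaded model.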
