A3S Power
Quick Start
Get A3S Power running with local model inference in minutes
Install
# Via Homebrew (macOS)
brew tap a3s-lab/tap https://github.com/A3S-Lab/homebrew-tap
brew install a3s-power
# Via Cargo (pure Rust inference, no C++ needed)
cargo install a3s-power
# Build from source
git clone https://github.com/A3S-Lab/Power.git && cd Power
cargo build --release
Start the Server
a3s-power serve
# Listening on http://127.0.0.1:11434
Pull a Model from HuggingFace Hub
# By quantization tag (resolves filename automatically)
curl -N http://localhost:11434/v1/models/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}'
Progress streams as SSE:
data: {"status":"downloading","completed":209715200,"total":2147483648}
data: {"status":"verifying"}
data: {"status":"success","id":"bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}
Register a Local Model
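The SSE progress events shown for the pull request above can be turned into a readable progress indicator on the client. The following is a minimal sketch, assuming only the event fields visible in that sample output (`status`, `completed`, `total`); the helper names `parse_progress` and `percent_done` are hypothetical and not part of A3S Power.

```python
import json


def parse_progress(line: str):
    """Decode one SSE line; return the event dict, or None for non-data lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def percent_done(event: dict):
    """Return progress as a percentage, or None if the event has no byte counts."""
    total = event.get("total")
    if not total:
        return None
    return 100.0 * event.get("completed", 0) / total


# Sample lines copied from the pull output above.
sample = [
    'data: {"status":"downloading","completed":209715200,"total":2147483648}',
    'data: {"status":"verifying"}',
    'data: {"status":"success","id":"bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}',
]

for raw in sample:
    event = parse_progress(raw)
    pct = percent_done(event)
    suffix = f" ({pct:.1f}%)" if pct is not None else ""
    print(event["status"] + suffix)
```

In a real client the lines would come from the streaming HTTP response rather than a list; events without byte counts, such as `verifying`, are printed without a percentage.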
curl -X POST http://localhost:11434/v1/models \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2:3b", "path": "/path/to/model.gguf"}'
Chat
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Use with OpenAI SDK
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Explain Rust in one sentence"}]
)
print(response.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "unused" });
const response = await client.chat.completions.create({
  model: "llama3.2:3b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
Streaming
stream = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
)
# Print each text fragment as it arrives; some chunks carry no content.
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Check Health
curl http://localhost:11434/health
{
  "status": "ok",
  "version": "0.2.0",
  "uptime_seconds": 42,
  "loaded_models": 1,
  "tee": { "enabled": false, "type": "none", "models_verified": false }
}
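A script that waits for the server before sending requests could inspect this response. The following is a minimal sketch, assuming that `"status": "ok"` is the healthy value and that `loaded_models` counts models ready for inference, as the sample suggests; the function `is_ready` and its `min_models` parameter are hypothetical.

```python
import json


def is_ready(health: dict, min_models: int = 1) -> bool:
    """Return True if the health payload reports ok and enough loaded models."""
    return (
        health.get("status") == "ok"
        and health.get("loaded_models", 0) >= min_models
    )


# Sample payload copied from the health response above.
sample = json.loads("""
{
  "status": "ok",
  "version": "0.2.0",
  "uptime_seconds": 42,
  "loaded_models": 1,
  "tee": { "enabled": false, "type": "none", "models_verified": false }
}
""")

print(is_ready(sample))
```

In practice the payload would be fetched from the `/health` endpoint and the check repeated with a short delay until it passes or a timeout expires.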