
OpenAI API

OpenAI-compatible API endpoint reference

A3S Power provides a full OpenAI-compatible API. Any OpenAI SDK works out of the box by pointing base_url at the Power server.

Endpoints


Chat Completions

POST /v1/chat/completions

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": false
  }'

Python SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

# Non-streaming
response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Explain Rust"}],
    temperature=0.7,
    max_tokens=512
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
)
for chunk in stream:
    # The final chunk may carry an empty choices list, and deltas may omit content
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

TypeScript SDK

import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "unused" });

const response = await client.chat.completions.create({
  model: "llama3.2:3b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

Text Completions

POST /v1/completions

curl http://localhost:11434/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "prompt": "Once upon a time",
    "max_tokens": 100
  }'

Embeddings

POST /v1/embeddings

curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-embed", "input": ["Hello", "World"]}'
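The response follows the standard OpenAI embeddings shape: each item in `data` carries an `index` and an `embedding` float vector. As a sketch of consuming it with only the standard library — the helper names (`embed`, `cosine`) are illustrative, not part of the API:

```python
import json
import math
import urllib.request

def embed(texts, model="qwen3-embed", base="http://localhost:11434/v1"):
    """POST /v1/embeddings and return one vector per input string."""
    body = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(
        f"{base}/embeddings", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    # Items carry an "index" field; sort to match the input order.
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]

def cosine(a, b):
    """Cosine similarity between two vectors (plain math, no dependencies)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Usage (requires a running Power server):
#   hello, world = embed(["Hello", "World"])
#   print(f"similarity: {cosine(hello, world):.3f}")
```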

Model Management

# List models
curl http://localhost:11434/v1/models

# Register a local model
curl -X POST http://localhost:11434/v1/models \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2:3b", "path": "/models/llama3.2-q4.gguf"}'

# Delete a model
curl -X DELETE http://localhost:11434/v1/models/llama3.2:3b
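The same three operations from Python with only the standard library. The endpoints and the `name`/`path` body mirror the curl calls above; the helper names and error handling are illustrative:

```python
import json
import urllib.request

BASE = "http://localhost:11434/v1"

def register_payload(name, path):
    """Request body for POST /v1/models, matching the curl example above."""
    return {"name": name, "path": path}

def api(method, path, payload=None):
    """Minimal urllib wrapper for the model-management endpoints."""
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    req = urllib.request.Request(f"{BASE}{path}", data=data, method=method, headers=headers)
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else None  # DELETE may return an empty body

# Usage (requires a running Power server):
#   api("GET", "/models")                                                   # list
#   api("POST", "/models", register_payload("llama3.2:3b",
#                                           "/models/llama3.2-q4.gguf"))    # register
#   api("DELETE", "/models/llama3.2:3b")                                    # delete
```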

Pull from HuggingFace Hub

POST /v1/models/pull — streams SSE progress events.

curl -N http://localhost:11434/v1/models/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}'

Check pull status (persists across restarts):

curl http://localhost:11434/v1/models/pull/bartowski%2FLlama-3.2-3B-Instruct-GGUF:Q4_K_M/status
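Pull progress arrives as server-sent events, one `data: …` line per event; the exact event fields are not specified here, so the sketch below simply decodes each line as JSON. Note that the status URL percent-encodes the `/` in the repo name but keeps the `:` before the quant tag literal, which `urllib.parse.quote(name, safe=":")` reproduces. Function names are illustrative:

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:11434/v1"

def status_path(name):
    """Encode the model name the way the status endpoint expects:
    '/' becomes %2F, the ':' before the quant tag stays literal."""
    return f"/models/pull/{urllib.parse.quote(name, safe=':')}/status"

def parse_sse_line(line):
    """Decode one 'data: ...' SSE line as JSON; return None for
    blank keep-alive lines and comments."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

def pull(name):
    """POST /v1/models/pull and print each progress event as it streams in."""
    body = json.dumps({"name": name}).encode()
    req = urllib.request.Request(
        f"{BASE}/models/pull", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # iterate the SSE stream line by line
            event = parse_sse_line(raw.decode())
            if event is not None:
                print(event)

# Usage (requires a running Power server):
#   pull("bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M")
#   api_status = urllib.request.urlopen(BASE + status_path(
#       "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"))
```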

Attestation

GET /v1/attestation — returns a TEE attestation report, or 503 Service Unavailable if no TEE is enabled.

# Basic
curl http://localhost:11434/v1/attestation

# With nonce (prevents replay attacks)
curl "http://localhost:11434/v1/attestation?nonce=deadbeef01234567"

# Bind to a specific model's SHA-256
curl "http://localhost:11434/v1/attestation?model=llama3.2:3b"

Example response:

{
  "tee_type": "sev-snp",
  "report": "base64-encoded-raw-report",
  "report_data": "hex-encoded-64-bytes",
  "measurement": "hex-encoded-48-bytes",
  "timestamp": "2026-02-21T00:00:00Z"
}
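A client can at least sanity-check the documented field sizes before doing real verification. Full verification — validating the report signature against the TEE vendor's certificate chain and comparing the measurement against an expected value — is out of scope here; this sketch only checks the shapes stated above, and the helper names are illustrative:

```python
import binascii
import json
import urllib.request

def fetch_attestation(nonce=None, model=None, base="http://localhost:11434/v1"):
    """GET /v1/attestation with optional nonce/model query parameters."""
    params = []
    if nonce:
        params.append(f"nonce={nonce}")
    if model:
        params.append(f"model={model}")
    query = "?" + "&".join(params) if params else ""
    with urllib.request.urlopen(f"{base}/attestation{query}") as resp:
        return json.load(resp)

def check_shape(att):
    """Verify the documented field sizes: report_data is 64 bytes and
    measurement is 48 bytes, both hex-encoded. Returns the TEE type."""
    assert len(binascii.unhexlify(att["report_data"])) == 64
    assert len(binascii.unhexlify(att["measurement"])) == 48
    return att["tee_type"]

# Usage (requires a running Power server with TEE enabled):
#   att = fetch_attestation(nonce="deadbeef01234567")
#   print(check_shape(att))
```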
