A3S Power
Models
Model formats, storage, HuggingFace Hub pull, and lifecycle management
Supported Formats
The registration examples below cover the gguf, safetensors, vision, and huggingface formats; format is omitted in the GGUF example.
Register a Model
# GGUF model
curl -X POST http://localhost:11434/v1/models \
-H "Content-Type: application/json" \
-d '{"name": "llama3.2:3b", "path": "/models/llama3.2-3b-q4_k_m.gguf"}'
# SafeTensors chat model (ISQ quantization on load)
curl -X POST http://localhost:11434/v1/models \
-H "Content-Type: application/json" \
-d '{
"name": "qwen2.5:7b",
"path": "/models/Qwen2.5-7B-Instruct",
"format": "safetensors",
"default_parameters": {"isq": "Q4K"}
}'
# Vision model
curl -X POST http://localhost:11434/v1/models \
-H "Content-Type: application/json" \
-d '{"name": "llava:7b", "path": "/models/llava-7b", "format": "vision"}'
# Embedding model
curl -X POST http://localhost:11434/v1/models \
-H "Content-Type: application/json" \
-d '{"name": "qwen3-embed", "path": "/models/Qwen3-Embedding", "format": "huggingface"}'
Pull from HuggingFace Hub
Requires the hf feature (included in default builds).
# By quantization tag — resolves filename via HF API
curl -N http://localhost:11434/v1/models/pull \
-H "Content-Type: application/json" \
-d '{"name": "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M"}'
# By exact filename
curl -N http://localhost:11434/v1/models/pull \
-H "Content-Type: application/json" \
-d '{"name": "bartowski/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf"}'
# Private/gated model with HF token
curl -N http://localhost:11434/v1/models/pull \
-H "Content-Type: application/json" \
-d '{"name": "meta-llama/Llama-3.1-8B-Instruct/Meta-Llama-3.1-8B-Q4_K_M.gguf", "token": "hf_..."}'
# Force re-download
curl -N http://localhost:11434/v1/models/pull \
-H "Content-Type: application/json" \
-d '{"name": "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M", "force": true}'
Set the HF_TOKEN env var as an alternative to passing the token in the request body.
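Programmatically, a pull is a single POST whose response streams progress events. A minimal Python sketch, assuming the third-party requests library is available (endpoint and fields as in the curl examples above):

```python
import json


def parse_sse_line(line):
    """Decode one 'data: {...}' SSE line; return None for other lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def pull_model(name, base_url="http://localhost:11434", token=None):
    """POST to /v1/models/pull and yield progress events as dicts."""
    import requests  # third-party; assumed available

    body = {"name": name}
    if token is not None:
        body["token"] = token
    with requests.post(f"{base_url}/v1/models/pull", json=body, stream=True) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            event = parse_sse_line(raw or "")
            if event is not None:
                yield event
```

The generator yields events until a "success" (or error) event arrives; the event shapes are documented in the next section.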
SSE Progress Events
data: {"status":"resuming","offset":104857600,"total":2147483648}
data: {"status":"downloading","completed":524288000,"total":2147483648}
data: {"status":"verifying"}
data: {"status":"success","id":"bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M","object":"model"}
Interrupted downloads resume automatically: the partial file is identified by a SHA-256 hash of the download URL and resumed via HTTP Range requests.
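The resume bookkeeping can be sketched as follows. The state-file naming (pulls/&lt;sha256-of-url&gt;.json) comes from the storage layout later on this page; the helper names themselves are illustrative:

```python
import hashlib
from pathlib import Path


def pull_state_path(url, root=Path.home() / ".a3s" / "power"):
    """Resume state lives at pulls/<sha256-of-url>.json."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return root / "pulls" / f"{digest}.json"


def range_header(offset):
    """HTTP header asking the server to resume from a byte offset."""
    return {"Range": f"bytes={offset}-"}
```

Resuming from the "resuming" event above (offset 104857600) would send Range: bytes=104857600-.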
Check Pull Status
# Get persisted pull progress (survives server restarts)
curl http://localhost:11434/v1/models/pull/bartowski%2FLlama-3.2-3B-Instruct-GGUF:Q4_K_M/status
{"name": "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M", "status": "pulling", "completed": 524288000, "total": 2147483648}
List and Inspect Models
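These endpoints take the model name as a single path segment, so a name containing / (as in the pull-status example above) must be percent-encoded, while : can stay literal. A sketch using Python's standard library:

```python
from urllib.parse import quote


def model_url(base, name):
    """Encode the model name as one path segment, keeping ':' literal."""
    return f"{base}/{quote(name, safe=':')}"


status_url = model_url("http://localhost:11434/v1/models/pull",
                       "bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M") + "/status"
```

Names without a slash, such as llama3.2:3b, pass through unchanged.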
# List all registered models
curl http://localhost:11434/v1/models
# Get a specific model
curl http://localhost:11434/v1/models/llama3.2:3b
Delete a Model
curl -X DELETE http://localhost:11434/v1/models/llama3.2:3b
Storage Layout
~/.a3s/power/
├── models/
│   ├── manifests/              # JSON metadata per model
│   │   ├── llama3.2-3b.json
│   │   └── qwen2.5-7b.json
│   └── blobs/                  # Content-addressed by SHA-256
│       ├── sha256-a1b2c3...    # Model weights
│       └── sha256-d4e5f6...
└── pulls/                      # Pull progress state (resume)
    └── <sha256-of-url>.json
Model blobs are stored by SHA-256 hash; identical weights shared across model names are stored only once.
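Content addressing follows directly from the layout above: a blob's filename is derived from its own bytes, so two manifests that point at identical weights resolve to the same file. A sketch with illustrative helper names:

```python
import hashlib
from pathlib import Path


def blob_path(models_dir, weights):
    """Blobs are named sha256-<digest of the blob's own bytes>."""
    digest = hashlib.sha256(weights).hexdigest()
    return Path(models_dir) / "blobs" / f"sha256-{digest}"


# Registering the same weights under two names resolves to one blob file,
# so the bytes are stored only once.
weights = b"example weight bytes"
path_a = blob_path("models", weights)
path_b = blob_path("models", weights)
```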
ISQ Quantization Types (SafeTensors)
Pass an ISQ type via default_parameters.isq when registering a SafeTensors model, as in the Q4K example above; the weights are quantized in place when the model loads.
Model Lifecycle
Register/Pull → Auto-load on first request → Serve → Idle → Evict (LRU)
      ↑                                                        │
      └────────────────────────────────────────────────────────┘
- Models load automatically on first inference request
- Idle models unload after the keep_alive duration (default 5m)
- When max_loaded_models is reached, the least-recently-used model is evicted
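The two eviction rules above (idle timeout and LRU capacity) can be sketched with an ordered map from model name to last-used timestamp. The class and method names here are illustrative, not the server's internals:

```python
from collections import OrderedDict


class LoadedModels:
    """Sketch of keep_alive idle eviction plus LRU capacity eviction."""

    def __init__(self, max_loaded_models=2, keep_alive=300.0):  # 5m default
        self.max_loaded_models = max_loaded_models
        self.keep_alive = keep_alive
        self.loaded = OrderedDict()  # model name -> last-used timestamp

    def touch(self, name, now):
        """Auto-load on first request and mark as most recently used."""
        self.loaded[name] = now
        self.loaded.move_to_end(name)
        while len(self.loaded) > self.max_loaded_models:
            self.loaded.popitem(last=False)  # evict least recently used

    def evict_idle(self, now):
        """Unload any model idle for longer than keep_alive."""
        for name, last_used in list(self.loaded.items()):
            if now - last_used > self.keep_alive:
                del self.loaded[name]


cache = LoadedModels(max_loaded_models=2)
for step, name in enumerate(["a", "b", "a", "c"]):
    cache.touch(name, now=float(step))
# loading "c" exceeds max_loaded_models, so "b" (least recently used) is evicted
```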