Cluster Extension Points
The seams a cluster host uses to run long-lived agent sessions across many nodes without forking the framework.
A cluster host platform runs long-lived agent sessions across many nodes. The framework does not ship a scheduler or a placement engine. Instead it exposes a small set of seams: it defines the decision points, emits structured events, and lets the host supply the policy. Everything below is something you wire from outside the framework — you never fork it.
This page is precise about which seams are available from both SDKs (shown with Node.js + Python code) and which are configured in the Rust core today (described in prose, with SDK wiring to follow).
Identity labels
Every session can carry four opaque identity labels. The framework never interprets them — it propagates them to hooks, traces, and SessionData, and restores them on resume. This is how a host attributes a session to a tenant, a principal, an agent template, and a wider correlation chain.
Pair identity labels with a sessionStore / session_store so the labels survive a process restart. On resume, caller-supplied options win, so you can relabel a session as you move it between nodes.
```javascript
const session = agent.session('/path/to/project', {
  tenantId: 'acme-corp',
  principal: 'user:42',
  agentTemplateId: 'reviewer-v3',
  correlationId: 'req-9f2c',
});

// Getters return string | null
console.log(session.tenantId); // 'acme-corp'
console.log(session.principal); // 'user:42'
console.log(session.agentTemplateId); // 'reviewer-v3'
console.log(session.correlationId); // 'req-9f2c'
```

```python
opts = SessionOptions()
opts.tenant_id = 'acme-corp'
opts.principal = 'user:42'
opts.agent_template_id = 'reviewer-v3'
opts.correlation_id = 'req-9f2c'
session = agent.session('/path/to/project', opts)

# Getters are methods, return str | None
print(session.tenant_id())  # 'acme-corp'
print(session.principal())  # 'user:42'
print(session.agent_template_id())  # 'reviewer-v3'
print(session.correlation_id())  # 'req-9f2c'
```

Budget / cost guard
A budget guard lets the host gate every LLM call against a cost or token budget. The framework calls your guard before each LLM request and after it returns. The guard is policy you own; the framework only enforces the decision you hand back.
```javascript
session.setBudgetGuard({
  checkBeforeLlm(sessionId, estimatedTokens) {
    if (overLimit(sessionId, estimatedTokens)) {
      return { decision: 'deny', resource: 'tokens', reason: 'monthly cap reached' };
    }
    return { decision: 'allow' };
  },
  recordAfterLlm(sessionId, usage) {
    meter(sessionId, usage);
  },
});

// Clear the guard
session.setBudgetGuard(null);
```

Guard callbacks must not throw: a thrown error is treated as Allow.

```python
class MyGuard:
    def check_before_llm(self, session_id, estimated_tokens):
        if over_limit(session_id, estimated_tokens):
            return {'decision': 'deny', 'resource': 'tokens', 'reason': 'monthly cap reached'}
        return {'decision': 'allow'}

    def record_after_llm(self, session_id, usage):
        meter(session_id, usage)

opts = SessionOptions()
opts.budget_guard = MyGuard()
session = agent.session('/path/to/project', opts)

# To clear: set opts.budget_guard = None and re-create the session.
```

The decision shape is identical across both SDKs:
| Return value | Effect |
|---|---|
| None / null / { decision: 'allow' } | Proceed with the LLM call. |
| { decision: 'soft', resource, consumed, limit, message? } | Emits BudgetThresholdHit (kind soft) and proceeds. |
| { decision: 'deny', resource, reason } | Aborts the LLM call. Python raises RuntimeError("Budget exhausted..."); Node rejects with "Budget exhausted...". |
Robustness is intentional: a missing guard method is treated as the permissive default, and a callback error falls back to Allow. A misbehaving guard can never halt a live session — only an explicit deny does that.
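Because a callback error falls back to Allow, a host that wants to fail closed has to catch errors inside the guard itself. The following is a minimal sketch of such a guard, not part of the SDK: the class name `TokenBudgetGuard`, the in-memory counter, the 80% soft threshold, and the assumption that `usage` is a dict with a `total_tokens` key are all illustrative. Only the two method names and the decision shapes come from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudgetGuard:
    """Illustrative in-memory per-session token budget (hypothetical helper)."""

    limit: int                  # hard cap on tokens per session
    soft_ratio: float = 0.8     # warn once projected usage reaches this fraction
    consumed: dict = field(default_factory=dict)  # session_id -> tokens used

    def check_before_llm(self, session_id, estimated_tokens):
        try:
            used = self.consumed.get(session_id, 0)
            projected = used + estimated_tokens
            if projected > self.limit:
                return {"decision": "deny", "resource": "tokens",
                        "reason": f"token cap of {self.limit} reached"}
            if projected >= self.limit * self.soft_ratio:
                return {"decision": "soft", "resource": "tokens",
                        "consumed": used, "limit": self.limit,
                        "message": "approaching token cap"}
            return {"decision": "allow"}
        except Exception:
            # Fail closed: an exception escaping the guard would be treated as Allow.
            return {"decision": "deny", "resource": "tokens",
                    "reason": "budget guard internal error"}

    def record_after_llm(self, session_id, usage):
        try:
            # Assumption: usage is a dict carrying a 'total_tokens' count.
            spent = int(usage.get("total_tokens", 0))
            self.consumed[session_id] = self.consumed.get(session_id, 0) + spent
        except Exception:
            # Metering errors are swallowed so the callback never throws.
            pass
```

An instance would be assigned to `opts.budget_guard` exactly as `MyGuard()` is above. Whether to fail open or closed on internal errors is a host policy choice; this sketch picks fail closed only to show where that decision lives.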
Cluster event vocabulary
The host emits cluster-level decisions as structured AgentEvent variants through its hook executor. In-session hooks subscribe to them uniformly — the same way they observe any other event — so policy authored at the host shows up to the agent's own hooks without special casing.
The cluster vocabulary is:
- BudgetThresholdHit { resource, kind, consumed, limit, message? }: a budget guard returned a soft decision (or the host crossed a threshold it tracks itself). The kind field distinguishes soft warnings from harder limits.
- PassivationRequested { reason, deadline_ms? }: the host is asking the session to reach a safe, persistable state so it can be evicted from this node. deadline_ms, when present, is the grace window before forced eviction.
- PeerInvocation { from_session_id, from_tenant_id?, correlation_id? }: another session invoked this one. The labels let the receiver attribute the call back to its origin tenant and correlation chain.
These are observed through the same verified hook API your in-session hooks already use — session.registerHook in Node, session.register_hook in Python (see Hooks). Treat the three variants above as the documented contract; the host is responsible for emitting them via its hook executor.
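As a rough illustration of handling the three variants uniformly, the sketch below routes an event to an action string. It assumes the event reaches the hook as a plain dict with a `type` key naming the variant; that delivery shape and the function name `route_cluster_event` are hypothetical, and only the field names come from the vocabulary above. The actual registration call is covered in Hooks and is deliberately not shown here.

```python
def route_cluster_event(event):
    """Map a cluster event dict (assumed shape) to a short action string."""
    kind = event.get("type")

    if kind == "BudgetThresholdHit":
        # 'kind' inside the payload distinguishes soft warnings from harder limits.
        return "warn: {resource} at {consumed}/{limit} ({kind})".format(
            resource=event["resource"],
            consumed=event["consumed"],
            limit=event["limit"],
            kind=event["kind"],
        )

    if kind == "PassivationRequested":
        # deadline_ms is optional; treat a missing value as "no stated grace window".
        deadline = event.get("deadline_ms")
        window = "no deadline" if deadline is None else f"deadline {deadline} ms"
        return f"checkpoint: {event['reason']} ({window})"

    if kind == "PeerInvocation":
        # Both attribution labels are optional on this variant.
        tenant = event.get("from_tenant_id") or "unknown"
        corr = event.get("correlation_id") or "none"
        return f"peer-call: from {event['from_session_id']} tenant={tenant} corr={corr}"

    # Anything outside the documented contract is left alone.
    return "ignore"
```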
Deterministic IDs and time (replay)
A cluster that wants bit-identical replay of a run on a different node must remove the two sources of nondeterminism in a normal run: random IDs and the wall clock. The Rust core models both behind a HostEnv { id_generator, clock }. The default pairs a UUID generator with the system clock; replay tooling swaps in a SequentialIdGenerator and a FixedClock so that re-executing the same inputs produces the same IDs and timestamps, and therefore the same output, on any node.
This is configured in the Rust core today. It is not yet exposed on the JS/Python option surface, so there is no Node/Python code for it — SDK wiring may follow.
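To make the idea concrete, here is a small Python model of the pattern, not the Rust API and not an SDK surface. The `HostEnv` dataclass, `default_env`, `replay_env`, and `run_step` are illustrative stand-ins: IDs and time come from an injected environment, so swapping in a sequential generator and a fixed clock makes two executions of the same inputs produce identical records.

```python
import itertools
import time
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass
class HostEnv:
    """Illustrative stand-in for the core's HostEnv { id_generator, clock }."""
    next_id: Callable[[], str]
    now_ms: Callable[[], int]


def default_env():
    # Normal run: random UUIDs and the wall clock, both nondeterministic.
    return HostEnv(next_id=lambda: str(uuid.uuid4()),
                   now_ms=lambda: int(time.time() * 1000))


def replay_env(start_ms, prefix="id"):
    # Replay run: sequential IDs and a clock frozen at start_ms.
    counter = itertools.count(1)
    return HostEnv(next_id=lambda: f"{prefix}-{next(counter)}",
                   now_ms=lambda: start_ms)


def run_step(env, payload):
    # Every ID and timestamp is drawn from env, never from globals.
    return {"id": env.next_id(), "at_ms": env.now_ms(), "payload": payload}
```

Running the same payloads through two separate `replay_env(...)` instances yields equal lists of records, which is the property replay tooling relies on.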
Loop checkpoints and run resumption
With a sessionStore / session_store configured, the agent loop persists a checkpoint after each completed tool round, keyed by run id. Any node that shares the same store can rehydrate the run and continue it.
```javascript
import { FileSessionStore } from '@a3s-lab/code';

const session = agent.session(workspace, {
  sessionStore: new FileSessionStore('./.a3s/sessions'),
  sessionId: 'session-from-node-a',
});
const result = await session.resumeRun('run-id-from-node-a');
```

```python
from a3s_code import FileSessionStore

opts = SessionOptions()
opts.session_store = FileSessionStore('./.a3s/sessions')
opts.session_id = 'session-from-node-a'
session = agent.session(workspace, opts)
result = session.resume_run('run-id-from-node-a')
```

A new run id is allocated for the resumed work; the original run is left intact in the store. Two error paths are worth handling:

- "resume_run requires a session_store": no store was configured; fall back to a fresh session.
- "no loop checkpoint found for run 'X'": the run never reached its first checkpoint, or it was pruned; retry later or treat the run as lost.
Because checkpoints are taken only between tool rounds, never mid-tool, a resumed run never replays a half-executed tool. See Persistence for store details.
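One way to handle the two resume error paths is to classify the failure by its message and fall back only when no store was configured. This is a sketch under assumptions: the exception type raised by `resume_run` is not specified here, so the code catches `Exception` and matches on the documented message text. The helper names `classify_resume_error` and `resume_or_fallback`, and the callables passed in, are hypothetical.

```python
def classify_resume_error(message):
    """Classify a resume failure by the documented message fragments."""
    if "requires a session_store" in message:
        return "no_store"
    if "no loop checkpoint found" in message:
        return "no_checkpoint"
    return "unknown"


def resume_or_fallback(resume, start_fresh, run_id):
    """Try to resume run_id; start a fresh session only when no store exists.

    resume:      callable taking a run id, e.g. session.resume_run
    start_fresh: zero-argument callable that begins a new session or run
    """
    try:
        return resume(run_id)
    except Exception as exc:
        if classify_resume_error(str(exc)) == "no_store":
            # Nothing can be rehydrated without a store, so begin from scratch.
            return start_fresh()
        # A missing checkpoint (or anything unexpected) is left to the caller,
        # who can retry later or treat the run as lost.
        raise
```

Matching on message text is brittle. If the SDK exposes distinct error types or codes, those would be the better thing to branch on.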
Retention caps for long-running sessions
A session that runs for hours or days accumulates state in four in-memory stores: run records, per-run event buffers, trace events, and terminal subagent task snapshots. Left unbounded, these grow with session age — fine for short-lived sessions, a real leak for long-lived ones.
SessionRetentionLimits caps each of the four stores. Every cap is optional: None means the unbounded default. Eviction is strict FIFO, and running subagent tasks are never dropped — only terminal (completed/failed) snapshots are evicted.
This is configured via the Rust core SessionRetentionLimits today; the SDK shapes land in a follow-up, so there is no Node/Python code for it yet. See Limits for the per-session resource caps that are already on the SDK surface.
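The eviction rule for subagent task snapshots can be modeled in a few lines. This is an illustrative Python model, not the Rust `SessionRetentionLimits` type: the class `TaskSnapshotStore`, its `max_terminal` parameter, and the status strings are assumptions. It shows an optional cap (`None` means unbounded), FIFO eviction in insertion order, and running tasks being exempt.

```python
TERMINAL = {"completed", "failed"}


class TaskSnapshotStore:
    """Illustrative model of a capped store for subagent task snapshots."""

    def __init__(self, max_terminal=None):
        self.max_terminal = max_terminal  # None means no cap
        self._tasks = {}                  # task_id -> status, insertion-ordered

    def put(self, task_id, status):
        # Updating an existing task keeps its original insertion position.
        self._tasks[task_id] = status
        self._evict()

    def ids(self):
        return list(self._tasks)

    def _evict(self):
        if self.max_terminal is None:
            return
        # Only terminal snapshots are eviction candidates; running tasks stay.
        terminal = [tid for tid, st in self._tasks.items() if st in TERMINAL]
        overflow = max(0, len(terminal) - self.max_terminal)
        for tid in terminal[:overflow]:   # oldest first
            del self._tasks[tid]
```

With a cap of two, inserting one running task followed by three terminal ones drops only the oldest terminal snapshot and leaves the running task in place.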
See also: Multi-machine · Persistence · Limits · Hooks