Prove a turn is done with verification commands and reports instead of trusting the model's claim

Verification

The harness treats "done" as something that must be proven, not merely claimed. When the model says a task is complete, that assertion is worth nothing on its own. Verification turns the claim into evidence: you declare commands that must succeed, the runtime executes them, and the result carries a report you can inspect, gate on, or surface to a user.

Verification is session-scoped. The Rust core runs each command, records its exit status and output, and rolls every report up into a single summary that travels alongside the turn result.
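
As a rough illustration of what "runs each command, records its exit status and output" involves, the sketch below does the same thing with Python's standard `subprocess` module. It is not the Rust core's implementation; `CheckReport` and `run_check` are hypothetical names invented for this example, and treating a timeout as a failure is an assumption.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CheckReport:
    """Hypothetical record of one executed check (not an SDK type)."""
    check_id: str
    exit_code: Optional[int]  # None when the command timed out
    output: str
    passed: bool


def run_check(check_id: str, argv: Sequence[str], timeout_s: float = 60.0) -> CheckReport:
    """Run one command and keep the evidence: exit status plus captured output."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Assumption: a check that does not finish in time counts as failed.
        return CheckReport(check_id, None, "timed out", False)
    return CheckReport(check_id, proc.returncode, proc.stdout + proc.stderr, proc.returncode == 0)


# Two stand-in commands: one exits 0, the other exits 3.
ok = run_check("ok", [sys.executable, "-c", "print('built')"])
bad = run_check("bad", [sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok.passed, bad.exit_code)  # True 3
```

The point of the record is that a pass or fail verdict is derived from the exit status rather than from anything the model says about the run.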

Running Verification Commands

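A verification command is a small, named check: an id, a kind, a human-readable description, and the command to run. Mark a check as required when its failure should count as a hard failure rather than a warning. The examples below also set a timeout in milliseconds on the build check (120000 ms is two minutes) and omit it on the test check.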

// TypeScript: run two required checks as one batch labeled 'release-readiness'.
const report = await session.verifyCommands('release-readiness', [
  {
    id: 'build',
    kind: 'build',
    description: 'Project compiles',
    command: 'cargo build --all-features',
    required: true,
    timeoutMs: 120000,
  },
  {
    id: 'tests',
    kind: 'test',
    description: 'Unit tests pass',
    command: 'cargo test',
    required: true,
  },
]);

console.log(report);

# Python: the same batch; option keys use snake_case (timeout_ms).
report = session.verify_commands('release-readiness', [
    {
        "id": "build",
        "kind": "build",
        "description": "Project compiles",
        "command": "cargo build --all-features",
        "required": True,
        "timeout_ms": 120000,
    },
    {
        "id": "tests",
        "kind": "test",
        "description": "Unit tests pass",
        "command": "cargo test",
        "required": True,
    },
])

print(report)

The subject (here release-readiness) labels the batch so multiple verification passes within one session stay distinct in the reports.
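
To make the roles of the subject and the required flag concrete, here is a minimal sketch of one way outcomes could be rolled up per subject. The `Outcome` type, the `roll_up` function, and the status names `passed`, `warning`, and `failed` are all hypothetical; the values the runtime actually reports may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical result of one check within a named batch."""
    subject: str
    check_id: str
    required: bool
    passed: bool


def roll_up(outcomes):
    """Group outcomes by subject and derive one status per subject."""
    by_subject = defaultdict(list)
    for outcome in outcomes:
        by_subject[outcome.subject].append(outcome)

    statuses = {}
    for subject, items in by_subject.items():
        if any(o.required and not o.passed for o in items):
            statuses[subject] = "failed"   # a required check failed: hard failure
        elif any(not o.passed for o in items):
            statuses[subject] = "warning"  # only optional checks failed
        else:
            statuses[subject] = "passed"
    return statuses


outcomes = [
    Outcome("release-readiness", "build", required=True, passed=True),
    Outcome("release-readiness", "lint", required=False, passed=False),
    Outcome("post-fix", "tests", required=True, passed=False),
]
statuses = roll_up(outcomes)
print(statuses)  # {'release-readiness': 'warning', 'post-fix': 'failed'}
```

Keeping the subject on every outcome is what lets two passes in the same session be judged separately.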

Reading The Post-Turn Summary

Every turn's send() result also carries read-only verification fields, so you can gate on the outcome without issuing a separate verification call. Use these to decide whether the turn actually accomplished what it claimed.

// TypeScript: read the verification fields carried on the turn result.
const result = await session.send('Apply the fix and run the checks');

console.log(result.verificationStatus);
console.log(result.pendingVerificationCount);
console.log(result.failedVerificationCount);
console.log(result.verificationReportCount);
console.log(result.verificationSummaryText);

if (result.failedVerificationCount > 0) {
  throw new Error('Turn reported done but verification failed');
}

# Python: the same fields, in snake_case.
result = session.send('Apply the fix and run the checks')

print(result.verification_status)
print(result.pending_verification_count)
print(result.failed_verification_count)
print(result.verification_report_count)
print(result.verification_summary_text)

if result.failed_verification_count > 0:
    raise RuntimeError('Turn reported done but verification failed')
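
The check above gates only on failures. A stricter, fail-closed policy might also refuse a turn whose checks are still pending or that produced no reports at all. The sketch below shows that policy against a stand-in object; `TurnResult` and `blocking_reasons` are hypothetical, only the three field names come from the examples above, and whether pending or missing reports should block is a policy assumption rather than SDK behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TurnResult:
    """Stand-in with the count fields shown above; not the SDK's class."""
    failed_verification_count: int
    pending_verification_count: int
    verification_report_count: int


def blocking_reasons(result) -> List[str]:
    """Return every reason to reject the turn; an empty list means accept."""
    reasons = []
    if result.failed_verification_count > 0:
        reasons.append(f"{result.failed_verification_count} check(s) failed")
    if result.pending_verification_count > 0:
        # Assumption: unfinished checks are treated as unproven, so they block.
        reasons.append(f"{result.pending_verification_count} check(s) still pending")
    if result.verification_report_count == 0:
        # Assumption: a turn with no evidence at all is not accepted.
        reasons.append("no verification reports were recorded")
    return reasons


clean = TurnResult(failed_verification_count=0, pending_verification_count=0, verification_report_count=2)
shaky = TurnResult(failed_verification_count=1, pending_verification_count=1, verification_report_count=2)
print(blocking_reasons(clean))  # []
print(blocking_reasons(shaky))  # ['1 check(s) failed', '1 check(s) still pending']
```

Collecting all reasons before raising gives a more useful error message than stopping at the first failed condition.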

Inspecting Reports And Summaries

Beyond the per-turn fields, the session exposes the full set of reports, a structured summary, the available presets, and a human-readable digest. The digest is the quickest way to show a person why a turn passed or failed.

// TypeScript: fetch the reports, the structured summary, and the presets.
import { formatVerificationSummary } from '@a3s-lab/code';

const reports = session.verificationReports();
const summary = session.verificationSummary();
const presets = session.verificationPresets();

// Either the session helper or the standalone formatter yields readable text.
console.log(session.verificationSummaryText());
console.log(formatVerificationSummary(summary));

# Python: the same accessors, in snake_case.
reports = session.verification_reports()
summary = session.verification_summary()
presets = session.verification_presets()

# The session helper returns a ready-to-print human-readable digest.
print(session.verification_summary_text())

verificationPresets() (verification_presets() in Python) returns workspace-aware check templates inferred from files such as Cargo.toml, package.json, pyproject.toml, and go.mod. Treat them as starting points: review the commands, timeouts, and required flags for the project before gating releases or user-visible automation.
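
The idea of inferring checks from manifest files can be sketched in a few lines. This is not how the runtime builds its presets: `MANIFEST_CHECKS` and `suggest_checks` are hypothetical, and the commands mapped to each file are common defaults chosen for illustration, so the real presets may differ.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from manifest file to a check template.
MANIFEST_CHECKS = {
    "Cargo.toml": {"id": "cargo-test", "kind": "test", "command": "cargo test"},
    "package.json": {"id": "npm-test", "kind": "test", "command": "npm test"},
    "pyproject.toml": {"id": "pytest", "kind": "test", "command": "python -m pytest"},
    "go.mod": {"id": "go-test", "kind": "test", "command": "go test ./..."},
}


def suggest_checks(workspace):
    """Return a template for each known manifest found at the workspace root."""
    root = Path(workspace)
    suggestions = []
    for manifest, template in MANIFEST_CHECKS.items():
        if (root / manifest).is_file():
            # Copy the template and leave it non-required until a person reviews it.
            suggestions.append({**template, "required": False})
    return suggestions


# Demonstrate against a throwaway workspace that has two manifests.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Cargo.toml").write_text("[package]\n")
    (Path(tmp) / "go.mod").write_text("module example\n")
    found = suggest_checks(tmp)

print([check["id"] for check in found])  # ['cargo-test', 'go-test']
```

Defaulting the suggestions to non-required mirrors the advice above: an inferred check is a starting point until someone confirms it fits the project.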

Why This Matters

Without verification, an agent run ends on the model's word. With it, the run ends on observable evidence: a build that compiled, a test suite that passed, a linter that stayed quiet. The summary text gives you the audit trail; the counts on the result let you fail closed in automation.

Related

Telemetry: inspect trace events and verification reports as runtime evidence.
Limits: bound how much work a turn can do before verification runs.
