ARB Middleware — Agent Self-Reporting Receipts¶

This document is the deep-dive reference for ARB (Agent Self-Reporting), the QA middleware layer that wraps real verification steps (lint, type check, tests, coverage, docs build) and emits structured JSON receipts used as attestation evidence.

The binding rules — the em-dash enrichment pattern, the canonical-invocations table, lane behavior (Lite warn / Heavy fail-closed), and the worked example — live in AGENTS.md § Attestation. Read that section first; this file expands the middleware surface the rules depend on.

Consolidation lineage: .gzkit/rules/attestation-enrichment.md (retired 2026-04-23, ADR-0.0.20 OBPI-03) → AGENTS.md § Attestation (binding) + this file (deep-dive).

ARB Middleware — Core Concept¶

ARB runs each verification step through the real tool and emits a structured JSON receipt for that run. Each claim in the canonical-invocations table in AGENTS.md § Attestation maps to a thin wrapper over a real tool; the receipt that wrapper emits is the deterministic evidence artifact.

ARB intercepts QA command execution and records:

Execution metadata (timestamp, duration, environment)
Input/output (command, arguments, exit code, stderr/stdout)
Structured findings (linting violations, type errors, test failures)
Receipt artifacts (JSON schema-validated, persistent)

This lets agents and humans:

Validate QA step outcomes programmatically
Aggregate recurring patterns across runs
File issues with deterministic evidence
Audit compliance and enforcement

Available commands¶

The canonical invocations (binding) live in AGENTS.md § Attestation. The commands below are the practical surface for producing and consuming receipts.

Wrap a QA tool¶

Bash

uv run gz arb ruff
uv run gz arb ruff --fix
uv run gz arb step --name unittest -- uv run -m unittest -q
uv run gz arb typecheck
uv run gz arb coverage run -m unittest discover -s tests -t .

Validate and analyze receipts¶

Bash

uv run gz arb validate
uv run gz arb validate --limit 50
uv run gz arb advise
uv run gz arb advise --limit 10

Extract recurring anti-patterns¶

Bash

uv run gz arb patterns
uv run gz arb patterns --compact
uv run gz arb patterns --json

Receipt schema and storage¶

Lint receipt schema: data/schemas/arb_lint_receipt.schema.json ($id: gzkit.arb.lint_receipt.schema.json)
Step receipt schema: data/schemas/arb_step_receipt.schema.json ($id: gzkit.arb.step_receipt.schema.json)
Storage: artifacts/receipts/ (configurable via arb.receipts_root in .gzkit.json)
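The storage rule above can be sketched in a few lines. This is an illustrative stand-in, not gzkit's resolver (the real one is `gzkit.arb.paths.receipts_root()`); it assumes the dotted key `arb.receipts_root` corresponds to a nested `{"arb": {"receipts_root": ...}}` object in `.gzkit.json`, and it reuses the `<id>.json` file naming that the receipt-binding gate reads. The function names are hypothetical.

```python
import json
from pathlib import Path

DEFAULT_RECEIPTS_ROOT = "artifacts/receipts"  # default location named above


def resolve_receipts_root(project_dir: Path) -> Path:
    """Return the receipts directory, honoring arb.receipts_root when set."""
    config_path = project_dir / ".gzkit.json"
    configured = None
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # Assumption: "arb.receipts_root" means {"arb": {"receipts_root": "..."}}.
        arb_section = config.get("arb") or {}
        configured = arb_section.get("receipts_root")
    return project_dir / (configured or DEFAULT_RECEIPTS_ROOT)


def load_receipt(project_dir: Path, receipt_id: str) -> dict:
    """Read one receipt by ID; raises FileNotFoundError when it is absent."""
    path = resolve_receipts_root(project_dir) / f"{receipt_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Schema validation against the two schema files is deliberately left out of this sketch; `uv run gz arb validate` is the supported way to check receipts.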

Receipt-binding gate¶

Authored under ADR-0.0.24. The receipt-binding gate is the mechanical floor under the previously-narrative covenant in AGENTS.md § Attestation that "the citing agent must verify the receipt exists and status matches the claim." Verification is no longer narrative discipline; the gate fires automatically on every attestation surface.

Invocation point¶

The gate is invoked pre-emission inside:

uv run gz obpi complete --attestation-text … --attestor …
uv run gz adr emit-receipt … --attestor …
uv run gz validate --attestation-receipts <text|@file> [--lane heavy|lite] [--kind foundation|feature] — the standalone surface, also used by gz check and the OBPI pipeline's pre-flight checklist (gz obpi precomplete).

The gate parses the attestation string for inline IDs of shape arb-(ruff|step-<name>)-[a-f0-9]{32}, reads each receipt from artifacts/receipts/<id>.json (resolved via gzkit.arb.paths.receipts_root()), and asserts each receipt:

Exists and parses as JSON conforming to its declared schema (gzkit.arb.lint_receipt.v1 or gzkit.arb.step_receipt.v1).
Has exit_status == 0.
Belongs to the claim category named adjacent to its citation in the attestation text. The category is derived from the ID shape: arb-ruff-* → lint; arb-step-<name>-* → <name>, keyed to CANONICAL_STEP_COMMANDS in src/gzkit/arb/validator.py.
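The parse-and-assert flow above can be approximated as follows. This is a minimal sketch under stated assumptions, not the code in `src/gzkit/arb/validator.py`: it assumes step names consist of lowercase letters, digits, underscores, and hyphens; it assumes "adjacent" means a `category:` label directly before the ID; it treats an unreadable or unparseable file as `missing`; and it skips schema validation and the `CANONICAL_STEP_COMMANDS` lookup. The names `RECEIPT_ID`, `CLAIM_BEFORE`, and `verify_attestation` are hypothetical.

```python
import json
import re
from pathlib import Path

# ID shape from the text above; the step-name character set is an assumption.
RECEIPT_ID = re.compile(
    r"\barb-(?:ruff|step-(?P<step>[a-z0-9_-]+?))-[a-f0-9]{32}(?![a-f0-9])"
)
# Assumed adjacency rule: a "category:" label immediately before the ID.
CLAIM_BEFORE = re.compile(r"\b([a-z][a-z0-9_-]*):\s*$")


def verify_attestation(text: str, receipts_root: Path) -> dict:
    """Map each cited ID to resolved / missing / status_mismatch / claim_mismatch."""
    results = {}
    for match in RECEIPT_ID.finditer(text):
        receipt_id = match.group(0)
        # arb-ruff-* -> lint; arb-step-<name>-* -> <name>
        category = "lint" if match.group("step") is None else match.group("step")
        claim = CLAIM_BEFORE.search(text[: match.start()])
        path = receipts_root / f"{receipt_id}.json"
        try:
            receipt = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Assumption: unreadable or invalid JSON is reported like a missing file.
            results[receipt_id] = "missing"
            continue
        if not isinstance(receipt, dict) or receipt.get("exit_status") != 0:
            results[receipt_id] = "status_mismatch"
        elif claim is None or claim.group(1) != category:
            results[receipt_id] = "claim_mismatch"
        else:
            results[receipt_id] = "resolved"
    return results
```

An empty result means the attestation text cited no receipts, which corresponds to the `no_ids` case; how that case is handled depends on lane and kind.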

`arb-meta-receipt-bind-…` family¶

When the gate ratifies an attestation it writes a self-attesting ledger event of family arb-meta-receipt-bind-<id> — the gate's own evidence trail. The meta-receipt records the cited receipt IDs, the per-ID verification result (resolved / missing / status_mismatch / claim_mismatch), the lane/kind axes the gate was evaluated under, and the verdict it returned. Audits that need to verify "the gate fired on attestation X" read the meta-receipt; the gate-fired condition is itself observable, dated, and replayable from the ledger rather than inferred from the absence of a defect.
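To make the recorded fields concrete, here is one way the meta-receipt contents described above could be modeled in memory. The class name, field names, and helper functions are hypothetical; the actual ledger event format is defined by gzkit and ADR-0.0.24, not by this sketch. Only the four per-ID outcomes are taken from the text.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetaReceipt:
    """Hypothetical in-memory shape for an arb-meta-receipt-bind event."""

    results: dict  # receipt ID -> resolved / missing / status_mismatch / claim_mismatch
    lane: str  # lane axis the gate was evaluated under
    kind: str  # kind axis the gate was evaluated under
    verdict: str  # verdict the gate returned
    recorded_at: str  # ISO 8601 timestamp, so the event is dated


def build_meta_receipt(results, lane, kind, verdict, now=None) -> MetaReceipt:
    """Assemble the record; `now` is injectable so replays are deterministic."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return MetaReceipt(dict(results), lane, kind, verdict, stamp)


def unresolved_ids(meta: MetaReceipt) -> list:
    """Receipt IDs whose verification result was anything other than resolved."""
    return sorted(rid for rid, outcome in meta.results.items() if outcome != "resolved")
```

Serializing with `json.dumps(asdict(meta))` yields a plain JSON object that an audit can read back without importing the class.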

The family is reserved in CANONICAL_STEP_COMMANDS under the same extend-only rule as the rest of the table (AGENTS.md § Canonical invocations) — adding a new gate-emitted family requires an ADR or OBPI naming the surface; shrinking the table is forbidden.

Failure modes¶

| Failure | Gate verdict | Lane behavior |
| --- | --- | --- |
| `missing`: cited receipt file not present in `artifacts/receipts/` | Reject the attestation; meta-receipt records the missing IDs | Heavy / foundation = exit 3 (fail-closed); lite-non-foundation = warning, attestation still records |
| `status_mismatch`: receipt found but `exit_status != 0` | Reject the attestation; meta-receipt records the offending receipt + status | Heavy / foundation = exit 3; lite-non-foundation = warning |
| `claim_mismatch`: receipt category from shape does not match the category named adjacent to the citation (e.g. `lint:` adjacent to an `arb-step-typecheck-…` receipt) | Reject the attestation; meta-receipt records both categories | Heavy / foundation = exit 3; lite-non-foundation = warning |
| `no_ids`: attestation string contains zero `arb-…` citations | Reject on heavy/foundation; warn on lite-non-foundation | Heavy / foundation = exit 3 (the canonical "narrative-only" failure); lite-non-foundation = warning |

Lane / kind matrix¶

The gate evaluates the same three-way OR predicate as the rest of the attestation surface (foundation kind OR heavy lane OR security sensitivity → fail-closed; otherwise warn). The matrix below is the gate-verdict projection of _requires_human_obpi_attestation (see AGENTS.md § Lane & Kind & Sensitivity Attestation Matrix for the full predicate):

| Lane × Kind | Gate verdict on `missing` / `status_mismatch` / `claim_mismatch` / `no_ids` |
| --- | --- |
| Heavy / any kind | Exit 3 (fail-closed) |
| Any lane / `foundation` kind | Exit 3 (fail-closed) |
| Lite / `feature` kind / `sensitivity: security` | Exit 3 (fail-closed) |
| Lite / `feature` kind / no security sensitivity | Warning; attestation records narrative-only |
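The matrix reduces to a single boolean test. The sketch below restates it in Python for illustration; it is not the body of `_requires_human_obpi_attestation`, and the function names, string values for the axes, and verdict labels are assumptions chosen to mirror the table.

```python
from typing import Optional


def fails_closed(lane: str, kind: str, sensitivity: Optional[str] = None) -> bool:
    """Three-way OR: foundation kind OR heavy lane OR security sensitivity."""
    return kind == "foundation" or lane == "heavy" or sensitivity == "security"


def gate_verdict(has_failure: bool, lane: str, kind: str,
                 sensitivity: Optional[str] = None) -> str:
    """Project a gate failure onto the lane / kind matrix."""
    if not has_failure:
        return "pass"
    # Fail-closed rows exit 3; the remaining row only warns.
    return "fail_closed" if fails_closed(lane, kind, sensitivity) else "warn"
```

Because the predicate is an OR, any single axis is enough to fail closed: `gate_verdict(True, "lite", "feature", "security")` lands on the third row of the matrix, and only the combination lite, feature, and no security sensitivity produces a warning.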

Why a gate, not a reviewer checklist¶

ADR-0.0.24 § Alternatives Considered #1 records the rejection of "keep advisory, raise visibility via reviewer checklist": the Opus 4.7 system card § 2.3.6.2 documents a model writing six memory files about a verification rule and then re-violating it. Discipline-only enforcement is demonstrably insufficient at the current capability frontier; the gate is the mechanical backstop under the narrative discipline.

Cross-references¶

AGENTS.md § Attestation — binding rules, canonical-invocations table, lane behavior bullets that cite this gate
AGENTS.md § Lane & Kind & Sensitivity Attestation Matrix — three-axis predicate the gate evaluates
docs/user/commands/validate.md § --attestation-receipts — operator-facing CLI surface and EXAMPLES
ADR-0.0.24-attestation-receipt-binding — Decision text, non-goals (no --skip-receipt-binding, no git pre-receive enforcement, no fail-closed on lite-non-foundation), and consequences

Exit codes¶

0: Command succeeded; receipt created
1: Command failed; receipt created with error status
2: ARB internal error
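A caller that shells out to a wrapped command needs to treat exit 1 as a normal, receipt-bearing outcome rather than an exception. The following sketch shows one way to do that from Python; the helper names are hypothetical, and it assumes `uv` and `gz` are available on the path.

```python
import subprocess

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "command succeeded; receipt created",
    1: "command failed; receipt created with error status",
    2: "ARB internal error",
}


def describe_exit(code: int) -> str:
    """Translate a wrapped-command exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def receipt_documented(code: int) -> bool:
    """True for the codes this document says produce a receipt (0 and 1)."""
    return code in (0, 1)


def run_wrapped(arb_args: list) -> int:
    """Run `uv run gz arb <args>`; check=False because exit 1 is not exceptional."""
    completed = subprocess.run(["uv", "run", "gz", "arb", *arb_args], check=False)
    return completed.returncode
```

For example, `describe_exit(run_wrapped(["ruff"]))` reports the outcome of a lint run. Exit 3 does not appear in this list; in this document it is the fail-closed verdict of the receipt-binding gate.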

Rationale¶

Why receipts, not narrative¶

Narrative recall is post-hoc reconstruction: the reporting pathway and the execution pathway are structurally separate (Lindsey et al. 2025 — the math-explanation pathway and the math-execution pathway are distinct circuits; a model can produce a plausible explanation of reasoning it did not actually perform). The only faithful record of a QA step is the wrapped-command receipt.

Why canonical commands¶

GHI #199 traces the class of failure where an ARB receipt reported exit 0 against ty check . while the governance gate (gz typecheck → ty check src) reported exit 1. Parallel approximations (different scope, different target tree, different flags) drift from the gate. gz arb typecheck (GHI #199) wraps uv run ty check src — the same command gz typecheck and gz closeout invoke.

TDD RED evidence is not ARB-shaped (GHI #157)¶

ARB step receipts encode exit_status=0 as success and exit_status=1 as failure. A TDD RED test is the inverse — a first-run failure is the correct outcome. Until the dedicated RED/GREEN receipt stream lands (tracked under ADR-pool.tdd-receipt-stream), Gate 2 TDD claims cite ARB receipts only for the GREEN side (arb-step-unittest-*); the RED side is recorded as per-increment observed-output pasted into the commit body or OBPI verification section, under the same observed-evidence discipline that governs routing-skill output claims.

Eval-feedback-source trailer (ADR-0.0.26)¶

Rule edits that land as a result of the evaluation feedback loop carry an Eval-feedback-source: <event-id-or-artifact-path> commit trailer alongside the existing Task: / Ceremony: trailers. The trailer is validated by gz validate --commit-trailers. See ADR-0.0.26 for the full loop doctrine.
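As a rough illustration of what a trailer check involves, the sketch below pulls `Key: value` lines from the last paragraph of a commit message and looks for `Eval-feedback-source`. It is a simplification, not the logic behind `gz validate --commit-trailers`; `git interpret-trailers` remains the authoritative parser, and the function names and sample values are hypothetical.

```python
import re

# Simplified trailer shape: a token, a colon, then a non-empty value.
TRAILER = re.compile(r"^(?P<key>[A-Za-z][A-Za-z0-9-]*):\s*(?P<value>\S.*)$")


def parse_trailers(message: str) -> dict:
    """Collect trailers from the final paragraph of a commit message."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        return {}  # a lone subject line carries no trailer block
    trailers = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER.match(line.strip())
        if match:
            trailers[match.group("key")] = match.group("value").strip()
    return trailers


def has_eval_feedback_source(message: str) -> bool:
    """True when the commit message carries an Eval-feedback-source trailer."""
    return "Eval-feedback-source" in parse_trailers(message)
```

A message whose final paragraph reads `Task: example-task` followed by `Eval-feedback-source: artifacts/example.json` passes this check; a message with only a subject line does not.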
