gz readiness evaluate¶

Run the instruction architecture eval suite with positive and negative controls across four readiness dimensions.

Usage¶

Bash

gz readiness evaluate [--json]

What It Evaluates¶

Ten baseline eval cases exercise instruction surfaces across:

Codex loading — AGENTS.md as primary surface, agent-exclusion filtering
Claude loading — .claude/rules/ sync from .github/instructions/, orphan detection
Workflow relocation — skills in .gzkit/skills/, command documentation coverage
Drift detection — applyTo reachability, foreign references, instruction/rule sync

Each case is classified by:

Surface: codex, claude, or shared
Dimension: outcome, process, style, or efficiency
Control: positive (expected to pass) or negative (expected to detect problems)

Scoring is 0.00-3.00 per dimension. The command fails when any eval case fails.

Example¶

Bash

uv run gz readiness evaluate

Extensibility¶

New eval cases can be added by appending to the BASELINE_CASES list in src/gzkit/instruction_eval.py or by passing custom cases to run_eval_suite(). No model changes required.

Options¶

Option	Description
`--json`	Emit machine-readable output