gz readiness evaluate

Run the instruction architecture eval suite with positive and negative controls across four readiness dimensions.


Usage

Bash
gz readiness evaluate [--json]

What It Evaluates

Ten baseline eval cases exercise instruction surfaces across:

  • Codex loading — AGENTS.md as primary surface, agent-exclusion filtering
  • Claude loading — .claude/rules/ sync from .github/instructions/, orphan detection
  • Workflow relocation — skills in .gzkit/skills/, command documentation coverage
  • Drift detection — applyTo reachability, foreign references, instruction/rule sync

Each case is classified by:

  • Surface: codex, claude, or shared
  • Dimension: outcome, process, style, or efficiency
  • Control: positive (expected to pass) or negative (expected to detect problems)

Each dimension is scored on a scale from 0.00 to 3.00. The command fails if any eval case fails.
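The classification and the pass/fail rule above can be sketched roughly as follows. This is an illustrative model only: `EvalCase`, `problem_detected`, `case_passed`, `suite_passed`, and the case names are hypothetical and are not taken from gzkit's source. The sketch assumes that a negative control passes when the check detects the planted problem. Scoring is left out because the aggregation method is not described here.

```python
from dataclasses import dataclass
from typing import Literal


# Hypothetical shape for illustration only; gzkit's real case model may differ.
@dataclass(frozen=True)
class EvalCase:
    name: str
    surface: Literal["codex", "claude", "shared"]
    dimension: Literal["outcome", "process", "style", "efficiency"]
    control: Literal["positive", "negative"]
    problem_detected: bool  # what the check observed when it ran


def case_passed(case: EvalCase) -> bool:
    """Positive controls pass when no problem is found;
    negative controls pass when the planted problem is detected (assumed)."""
    if case.control == "positive":
        return not case.problem_detected
    return case.problem_detected


def suite_passed(cases: list[EvalCase]) -> bool:
    """The suite fails as soon as any single case fails."""
    return all(case_passed(c) for c in cases)


# Made-up cases, one per surface.
cases = [
    EvalCase("agents-md-primary", "codex", "outcome", "positive", problem_detected=False),
    EvalCase("orphan-rule", "claude", "process", "negative", problem_detected=True),
    EvalCase("foreign-reference", "shared", "outcome", "negative", problem_detected=False),
]

print([case_passed(c) for c in cases])  # [True, True, False]
print(suite_passed(cases))  # False
```

The third case shows why negative controls matter: a check that stays silent on a known-bad input fails the suite even though nothing "broke".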


Example

Bash
uv run gz readiness evaluate
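For scripted use, such as a CI step, the command can be wrapped in a small Python helper. The sketch below makes two assumptions that this page does not confirm: that a failing suite produces a non-zero exit status, and that `--json` writes a single JSON document to standard output. The helper name `run_json_command` is hypothetical, and no particular keys are assumed in the JSON payload.

```python
import json
import subprocess

# The invocation documented on this page, with the --json option added.
GZ_EVALUATE_CMD = ["uv", "run", "gz", "readiness", "evaluate", "--json"]


def run_json_command(cmd: list[str]) -> tuple[bool, object]:
    """Run a command and return (succeeded, parsed JSON or None).

    Assumes a zero exit status means success and that stdout holds
    one JSON document; the output schema is deliberately not inspected.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = None  # output was empty or not valid JSON
    return proc.returncode == 0, payload
```

Calling `run_json_command(GZ_EVALUATE_CMD)` from the repository root would return the pass/fail flag along with whatever report the command emitted, leaving interpretation of the report to the caller.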

Extensibility

To add eval cases, append them to the BASELINE_CASES list in src/gzkit/instruction_eval.py or pass custom cases to run_eval_suite(). No model changes are required.


Options

Option    Description
--json    Emit machine-readable output