Audit Template: Agent Readiness
Applying the Four-Discipline Taxonomy
This audit applies the four-discipline taxonomy:
- Prompt craft
- Context engineering
- Intent engineering
- Specification engineering
It also applies the five specification primitives.
Use this template to compare two active projects, or to audit a single project by marking the Project B tables and rows as N/A.
Each item is scored 0-3:
- 0: Not present. No artifact, no practice, no awareness.
- 1: Informal. The knowledge exists in a maintainer's head but is not encoded for agents.
- 2: Partial. Documentation exists but is incomplete, inconsistent, or not structured for agent use.
- 3: Agent-ready. A competent agent can execute reliably without additional human context.
Expect many items to score 1 or 2; that is the normal baseline.
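If scores are later collected in a script or spreadsheet export, the 0-3 scale above could be encoded roughly as follows. This is a minimal sketch, not part of the taxonomy itself: the names `Readiness` and `parse_score` are hypothetical, and treating a blank or N/A cell as "unscored" is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Readiness(IntEnum):
    """The 0-3 scale used throughout this template (names are hypothetical)."""
    NOT_PRESENT = 0   # no artifact, no practice, no awareness
    INFORMAL = 1      # known to a maintainer, not encoded for agents
    PARTIAL = 2       # documented, but incomplete or not structured for agents
    AGENT_READY = 3   # an agent can execute without additional human context


def parse_score(raw: str) -> Optional[Readiness]:
    """Convert a Score cell to a Readiness level.

    Assumption: a blank cell or "N/A" means the item was not scored.
    Anything else that is not 0, 1, 2, or 3 raises ValueError.
    """
    text = raw.strip()
    if text == "" or text.upper() == "N/A":
        return None
    return Readiness(int(text))


print(parse_score("2").name)
print(parse_score("N/A"))
```

Rejecting out-of-range values early keeps a mistyped score from silently skewing the summary at the end of the audit.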
Discipline 1: Prompt Craft
Project A: ______
| Item | Score | Current State | Gap |
| --- | --- | --- | --- |
| Standard prompts for common tasks | | | |
| Examples of good and bad output for recurring work | | | |
| Explicit output format expectations | | | |
| Guardrails for known failure modes | | | |
| Ambiguity resolution rules | | | |
Project B: ______
| Item | Score | Current State | Gap |
| --- | --- | --- | --- |
| Standard prompts for common tasks | | | |
| Examples of good and bad output for recurring work | | | |
| Explicit output format expectations | | | |
| Guardrails for known failure modes | | | |
| Ambiguity resolution rules | | | |
Discipline 2: Context Engineering
Project A: ______
| Item | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project-level agent instructions (AGENTS.md, CLAUDE.md, or equivalent) | | | |
| .github/copilot-instructions.md (or equivalent) | | | |
| Architecture docs an agent can read | | | |
| ADR index and decision history | | | |
| README that orients a new agent session | | | |
| Schema/domain docs current enough for safe implementation | | | |
| Known conventions documented (naming, file placement, test patterns) | | | |
| Context rot mitigation (stale docs cleaned/flagged) | | | |
Project B: ______
| Item | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project-level agent instructions (AGENTS.md, CLAUDE.md, or equivalent) | | | |
| .github/copilot-instructions.md (or equivalent) | | | |
| Architecture docs an agent can read | | | |
| ADR index and decision history | | | |
| README that orients a new agent session | | | |
| Schema/domain docs current enough for safe implementation | | | |
| Known conventions documented (naming, file placement, test patterns) | | | |
| Context rot mitigation (stale docs cleaned/flagged) | | | |
Discipline 3: Intent Engineering
Project A: ______
| Item | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project purpose statement an agent can read | | | |
| Trade-off hierarchy (rigor vs. speed) | | | |
| Quality bar definition | | | |
| Scope boundaries (in vs. out) | | | |
| Relationship to upstream/canonical sources defined | | | |
| Decision escalation rules | | | |
| Self-application of governance (dogfooding) | | | |
Project B: ______
| Item | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project purpose statement an agent can read | | | |
| Trade-off hierarchy (rigor vs. speed) | | | |
| Quality bar definition | | | |
| Scope boundaries (in vs. out) | | | |
| Relationship to upstream/canonical sources defined | | | |
| Decision escalation rules | | | |
| Self-application of governance (dogfooding) | | | |
Discipline 4: Specification Engineering
Primitive 1: Self-Contained Problem Statements
| Project | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project A: Are tasks/issues written with full context? | | | |
| Project A: Do tasks specify environment, schema/version, and test expectations? | | | |
| Project B: Are tasks/issues written with full context? | | | |
| Project B: Do tasks specify environment, schema/version, and test expectations? | | | |
Primitive 2: Acceptance Criteria
| Project | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project A: Do tasks have explicit "done looks like" statements? | | | |
| Project A: Are outputs independently verifiable? | | | |
| Project B: Do tasks have explicit "done looks like" statements? | | | |
| Project B: Are outputs independently verifiable? | | | |
Primitive 3: Constraint Architecture
| Project | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project A: Musts | | | |
| Project A: Must-nots | | | |
| Project A: Preferences | | | |
| Project A: Escalation triggers | | | |
| Project B: Musts | | | |
| Project B: Must-nots | | | |
| Project B: Preferences | | | |
| Project B: Escalation triggers | | | |
Primitive 4: Decomposition
| Project | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project A: Work decomposed into independent milestones | | | |
| Project A: Typical units are sub-2-hour agent tasks | | | |
| Project A: Input/output boundaries are explicit | | | |
| Project B: Work decomposed into independent milestones | | | |
| Project B: Typical units are sub-2-hour agent tasks | | | |
| Project B: Input/output boundaries are explicit | | | |
Primitive 5: Evaluation Design
| Project | Score | Current State | Gap |
| --- | --- | --- | --- |
| Project A: TDD evidence (unit-test gate or equivalent) is required and auditable | | | |
| Project A: BDD evidence (behavior/acceptance gate or equivalent) is required and auditable | | | |
| Project A: Eval cases for recurring workflows | | | |
| Project A: Regression checks run after model/instruction changes | | | |
| Project B: TDD evidence (unit-test gate or equivalent) is required and auditable | | | |
| Project B: BDD evidence (behavior/acceptance gate or equivalent) is required and auditable | | | |
| Project B: Eval cases for recurring workflows | | | |
| Project B: Regression checks run after model/instruction changes | | | |
Summary and Next Actions
After scoring, identify the three highest-impact gaps (largest score improvement for lowest effort) per project.
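One way to make "largest score improvement for lowest effort" concrete is to rank each open gap by its possible score gain divided by an effort estimate. The sketch below assumes a 1-3 effort estimate per item, which this template does not collect; the `Gap` class, the `top_gaps` function, and the sample scores are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass

MAX_SCORE = 3  # top of the 0-3 scale


@dataclass
class Gap:
    item: str
    score: int   # current 0-3 score from the audit
    effort: int  # assumed estimate: 1 = low, 2 = medium, 3 = high


def top_gaps(gaps: list[Gap], n: int = 3) -> list[Gap]:
    """Rank open gaps by possible score gain per unit of effort, highest first."""
    open_gaps = [g for g in gaps if g.score < MAX_SCORE]
    ranked = sorted(
        open_gaps,
        key=lambda g: (MAX_SCORE - g.score) / g.effort,
        reverse=True,
    )
    return ranked[:n]


# Illustrative values only, not real audit results.
gaps = [
    Gap("Decision escalation rules", score=0, effort=1),
    Gap("ADR index and decision history", score=1, effort=3),
    Gap("Quality bar definition", score=1, effort=1),
    Gap("Explicit output format expectations", score=2, effort=1),
    Gap("README that orients a new agent session", score=3, effort=1),
]

for gap in top_gaps(gaps):
    print(gap.item)
```

The ratio is only a starting point for discussion; an item that blocks other work may deserve a higher rank than the arithmetic suggests.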
Project A: Top 3 gaps
1.
2.
3.
Project B: Top 3 gaps
1.
2.
3.
Cross-project observations
(Patterns that appear in both projects, indicating a systematic gap rather than a project-specific one.)
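If both projects' scores are available as simple item-to-score mappings, shared low scores can be listed mechanically before writing the observations. This is a sketch under stated assumptions: the function name `shared_gaps`, the cutoff of 1 ("Informal" or lower), and the sample scores are hypothetical.

```python
def shared_gaps(
    scores_a: dict[str, int],
    scores_b: dict[str, int],
    threshold: int = 1,
) -> list[str]:
    """Return items scored at or below `threshold` in both projects.

    Assumption: a score of 1 or lower in both projects hints at a
    systematic gap rather than a project-specific one.
    """
    common_items = scores_a.keys() & scores_b.keys()
    return sorted(
        item
        for item in common_items
        if scores_a[item] <= threshold and scores_b[item] <= threshold
    )


# Illustrative values only, not real audit results.
project_a = {
    "Decision escalation rules": 1,
    "Quality bar definition": 0,
    "Architecture docs an agent can read": 3,
}
project_b = {
    "Decision escalation rules": 0,
    "Quality bar definition": 2,
    "Architecture docs an agent can read": 1,
}

print(shared_gaps(project_a, project_b))
```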
- Date:
- Auditor:
- Taxonomy source:
- Corroborating sources:
- Framework version: