Agent-Era Prompting Summary (Nate B. Jones Transcript)

Source discussed by operator: "If You're Prompting Like It's Last Month, You're Already Late" (February 2026).

This summary captures the transcript's design-relevant claims for GovZero and gzkit.


Core Thesis

Prompting has split into four different disciplines. Chat-era prompt craft alone is no longer sufficient for long-running autonomous agents.

Agents that run for hours or days require intent, context, and specification to be encoded before execution starts.


Four Disciplines

  1. Prompt Craft
     • Clear instructions, examples/counterexamples, guardrails, output shape, ambiguity handling.
     • Still required, but now table stakes.

  2. Context Engineering
     • Curate the token environment, not just one prompt.
     • Control surfaces, retrieval, memory, and project conventions determine quality ceiling.

  3. Intent Engineering
     • Encode goals, values, trade-off rules, and escalation boundaries.
     • Prevents "optimizing the wrong metric" failures.

  4. Specification Engineering
     • Write executable specs for work that spans long time horizons.
     • Treat organizational documents as machine-operable specifications.

Five Specification Primitives

  1. Self-contained problem statements
     • Task is solvable without implicit tribal context.

  2. Acceptance criteria
     • "Done" is independently verifiable.

  3. Constraint architecture
     • Musts, must-nots, preferences, escalation triggers.

  4. Decomposition
     • Work broken into independently executable/verifiable units.

  5. Evaluation design
     • Recurring test/eval cases with regression detection after updates.
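
One way to make the five primitives concrete is a small record type with one field per primitive, plus a completeness check. The sketch below is illustrative only and is not gzkit's or the transcript's data model: TaskSpec, Constraints, missing_primitives, and the example task are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """Constraint architecture: musts, must-nots, preferences, escalation triggers."""
    musts: list[str] = field(default_factory=list)
    must_nots: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)
    escalation_triggers: list[str] = field(default_factory=list)


@dataclass
class TaskSpec:
    """Hypothetical spec record with one field per primitive."""
    problem_statement: str                       # self-contained problem statement
    acceptance_criteria: list[str]               # independently verifiable "done"
    constraints: Constraints                     # constraint architecture
    subtasks: list["TaskSpec"] = field(default_factory=list)  # decomposition
    eval_cases: list[str] = field(default_factory=list)       # evaluation design


def missing_primitives(spec: TaskSpec) -> list[str]:
    """Return the names of primitives that the spec leaves empty."""
    missing = []
    if not spec.problem_statement.strip():
        missing.append("problem_statement")
    if not spec.acceptance_criteria:
        missing.append("acceptance_criteria")
    if not (spec.constraints.musts or spec.constraints.must_nots):
        missing.append("constraints")
    if not spec.eval_cases:
        missing.append("eval_cases")
    # Subtasks are optional, but each one must itself be a complete spec.
    for index, sub in enumerate(spec.subtasks):
        missing.extend(f"subtasks[{index}].{name}" for name in missing_primitives(sub))
    return missing


# Hypothetical example task: everything is filled in except the eval cases.
spec = TaskSpec(
    problem_statement="Add a --json flag to the report command.",
    acceptance_criteria=["report --json prints valid JSON"],
    constraints=Constraints(musts=["Keep the existing text output unchanged"]),
)
print(missing_primitives(spec))  # ['eval_cases']
```

A check like this only confirms that each primitive is present. Whether a problem statement is truly self-contained still needs review.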

TDD and BDD Mapping

gzkit's existing gate model directly supports Nate's framework:

  • Gate 2 (TDD) strengthens acceptance criteria and evaluation design through executable unit-test evidence.
  • Gate 4 (BDD) strengthens self-contained problem statements and intent alignment through behavior-level contract checks.
  • Together, TDD + BDD reduce "looks-right" outputs by enforcing measurable correctness at both implementation and behavior layers.

Why This Matters For GovZero

GovZero already encodes several agent-era strengths:

  • Gate model and lane doctrine
  • Human authority boundary (attestation)
  • Evidence-first closeout and audit flow
  • Canonical control surfaces (AGENTS.md, CLAUDE.md, instructions, skills)

The transcript implies the next maturity step: readiness must be measured as a runtime contract, not only described in docs.


gzkit Design Implications

  1. Make readiness executable
     • Add and maintain gz readiness audit with deterministic PASS/FAIL semantics.

  2. Keep context surfaces synchronized
     • Discovery index, control surfaces, runbooks, and command docs must stay coherent.

  3. Elevate specification quality
     • OBPI/ADR templates must preserve self-contained tasks, constraints, and acceptance criteria.

  4. Institutionalize evaluation
     • Include parity/readiness regression checks in routine quality gates.

  5. Preserve human judgment boundary
     • Automation can score readiness, but final authority remains with human attestation.
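
To illustrate what "deterministic PASS/FAIL semantics" can mean in practice, here is a minimal sketch of an audit runner. It does not describe how gz readiness audit is actually implemented: run_audit, file_exists, CheckResult, and the check names are assumptions made for this example. Checks run in sorted-name order and use no clocks, randomness, or network, so the same repository state always yields the same verdict.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool


def file_exists(root: Path, relative: str) -> Callable[[], bool]:
    """Build a check that passes when root/relative is an existing file."""
    return lambda: (root / relative).is_file()


def run_audit(checks: dict[str, Callable[[], bool]]) -> tuple[str, list[CheckResult]]:
    """Run every check in sorted-name order and return (verdict, results)."""
    results = [CheckResult(name, bool(checks[name]())) for name in sorted(checks)]
    verdict = "PASS" if all(r.passed for r in results) else "FAIL"
    return verdict, results


# Demo against a throwaway directory that has AGENTS.md but no CLAUDE.md.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("# agents\n")
    checks = {
        "control-surface:AGENTS.md": file_exists(root, "AGENTS.md"),
        "control-surface:CLAUDE.md": file_exists(root, "CLAUDE.md"),
    }
    verdict, results = run_audit(checks)

for r in results:
    print(f"{'PASS' if r.passed else 'FAIL'}  {r.name}")
print(f"readiness: {verdict}")
```

The runner only scores readiness. Consistent with the human judgment boundary above, it has no attestation step; that decision stays with a person.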

Practical Operating Rule

For long-running agent tasks, do not rely on real-time correction. Encode context, intent, constraints, and verification criteria up front, then run deterministic checks before completion claims.
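
The last clause of the rule can be sketched as a gate that refuses to emit a completion claim while any check fails. This is an assumption-laden illustration, not gzkit behavior: claim_completion, CompletionBlocked, and the check names are hypothetical.

```python
from typing import Callable


class CompletionBlocked(Exception):
    """Raised when a completion claim is attempted while checks are failing."""


def claim_completion(task: str, checks: dict[str, Callable[[], bool]]) -> str:
    """Return a completion message only if every check passes."""
    # Sorted order keeps the failure report stable from run to run.
    failed = [name for name in sorted(checks) if not checks[name]()]
    if failed:
        raise CompletionBlocked(f"{task}: failing checks: {', '.join(failed)}")
    return f"{task}: all {len(checks)} checks passed; ready for human attestation"


# Hypothetical checks standing in for real test and scenario runs.
demo_checks = {
    "unit-tests": lambda: True,
    "behavior-scenarios": lambda: False,
}

try:
    message = claim_completion("demo-task", demo_checks)
except CompletionBlocked as exc:
    message = str(exc)

print(message)  # demo-task: failing checks: behavior-scenarios
```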

Companion Reference

For a deeper practitioner-level synthesis with Anthropic/OpenAI corroboration, see: docs/user/reference/agent-input-disciplines.md