Doc‑Driven Development — Principles
The Problem with Current AI-Coding Approaches
Software development is experiencing a fundamental shift as AI agents become capable programming partners. The industry has responded with various methodologies: spec-driven development, AI-enhanced TDD, structured prompting frameworks, and workflow optimizations. While these approaches offer incremental improvements, they share a critical limitation — they are human programming practices remixed for AI, not ground-up designs for AI capabilities.
Most current AI-coding methodologies ask: "How can we modify existing development workflows to work better with AI?" But this misses the deeper question: "If we designed software development from scratch for AI agents, what would it look like?"
The difference is significant. Human-oriented practices evolved around human cognitive limitations: working memory constraints, context-switching costs, and the difficulty of maintaining mental models across large codebases. AI agents have entirely different constraints: they excel at rapid iteration and pattern generation but struggle with consistency across sessions, maintaining context over long conversations, and distinguishing between hallucination and valid solutions.
Traditional approaches try to fit AI into human-shaped processes. They focus on better prompts, more structured inputs, and clearer specifications — essentially teaching AI to work within frameworks designed for human cognition. This is like optimizing horse carriages instead of inventing the automobile.
Document-Driven Development takes the alternative approach: redesigning the entire development process around AI capabilities and limitations. Rather than asking how to make AI better at human workflows, it asks what programming methodology would emerge if we started from first principles with AI as the primary implementer.
The Economic Shift
AI assistants have fundamentally altered the economics of software creation. Activities that once consumed significant human effort — writing code, updating documentation, refactoring existing implementations — can now be automated or substantially accelerated. This economic inversion transforms the traditional development calculus across multiple dimensions:
Code Generation: Scaffolding, boilerplate, tests, and even complex implementations can be generated in minutes rather than hours.
Documentation Maintenance: Updating specs, refreshing README files, and maintaining API documentation become automated workflow steps rather than manual overhead.
Refactoring Operations: Restructuring code that already works — traditionally a hard-to-justify business expense due to the effort-to-benefit ratio — becomes routine maintenance within the development cycle.
The result is a shifted value equation: individual artifacts become expendable, while clarity, architectural insight, and strategic decision-making become the primary sources of durable value.
Document-Driven Development emerges from this shifted landscape, reversing the traditional implementation-first flow.
Core Principles
AI as Generator, Human as Editor: The AI produces comprehensive artifacts (documentation, specifications, plans, tests, implementations) while the human focuses on simplification, risk identification, and constraint setting. This division leverages each party's strengths — AI's generative capacity and human's editorial judgment.
Disposable Artifacts, Durable Insight: All implementations, documentation, and tests are treated as expendable drafts. The lasting value lies in the clarity extracted through the development process and captured in meta-documentation. This removes psychological barriers to refactoring and experimentation.
Parsimony Over Extensibility: Prefer the simplest mechanism that solves today's problem rather than abstract frameworks designed for hypothetical future needs. This principle counters AI systems' tendency toward comprehensive, layered solutions.
System Legibility: Design for transparent, inspectable execution that both humans and AI can reason about reliably.
Two Modes of DocDD
Document-Driven Development operates in two distinct modes depending on the project phase and level of uncertainty:
Discovery Mode: For novel solutions, uncertain requirements, or exploratory work where you need to validate concepts and discover constraints. Uses the full four-document harness (SPEC/PLAN/README/LEARNINGS) with toy model discipline to systematically explore and validate approaches.
Execution Mode: For established architectures where the patterns are known and you're building on proven foundations. Uses CODE_MAP.md as the central orchestration document plus mandatory refactoring after each feature implementation or integration step.
When to Use Discovery Mode
- Implementing novel algorithms or approaches
- Uncertain requirements or problem definition
- Exploring new technologies or frameworks
- Building foundational components where the patterns aren't established
- Any work that requires systematic experimentation
When to Use Execution Mode
- Adding features to established codebases
- Building on proven architectural patterns
- Straightforward implementations without novel components
- Post-MVP development where core patterns are validated
- Any work where the main challenge is orchestration rather than discovery
The key insight: most development is execution work that doesn't require the heavy documentation discipline of discovery mode. But when uncertainty exists, the discovery approach prevents costly architectural mistakes through systematic exploration.
Discovery Workflow
When working on novel solutions, uncertain requirements, or exploratory work, DocDD uses a structured approach centered on the four-document harness and systematic experimentation.
Core Artifacts: The Meta-Document Harness
The four core artifacts form a harness system that guides AI agents while preserving human control:
- SPEC.md — The bit: precise contract keeping the pull straight
  - Purpose: Comprehensive behavioral contract for the current scope
  - Must contain: Input/output formats, invariants, internal state shapes, operations, validation rules, error semantics, test scenarios, success criteria
- PLAN.md — The yoke: aligns effort into test-first steps
  - Purpose: Strategic roadmap using Docs → Tests → Implementation cadence
  - Must contain: What to test vs. skip, order of steps, timeboxing, dependencies, risks, explicit success checkboxes per step
- README.md — The map: concise orientation for integration
  - Purpose: A 100–200 word context refresh on library functionality
  - Must contain: Header + one-liner, 2–3 sentence purpose, 3–5 essential method signatures, core concepts, gotchas/caveats, representative test path
- LEARNINGS.md — The tracks: record of constraints and lessons
  - Purpose: Retrospective capturing architectural insights, pivots, fragile seams, production readiness, reusable patterns
  - Must contain: What held, what failed, why, and next constraints discovered
Together these artifacts let the human act as driver, ensuring the cart (implementation) moves forward under control, with clarity preserved and ambiguity eliminated.
High-Level Workflow
The Document-Driven Development cycle follows four sequential phases:
1. Documentation
- Generate or update SPEC.md and PLAN.md for the current, minimal slice of scope
- Keep README.md for any touched library crisp and current
2. Tests
- Derive executable tests (or rubrics) directly from SPEC.md
- Golden examples and negative/error-path cases are required
3. Implementation
- Provide the minimal code to pass tests; keep changes tightly scoped
- Prefer single-file spikes for first proofs
4. Learnings
- Update LEARNINGS.md with what held, what failed, why, and next constraints
Napkin Physics (Overview)
Upstream simplification to avoid scope drift before writing specs and plans.
- Idea: capture problem, assumptions, invariant, minimal mechanism, and first try.
- Why: enforces parsimony; prevents new layers/nouns without deletion elsewhere.
See: Napkin Physics
Toy Models (Overview)
Toy models are small, discardable experiments to extract architectural insight.
- Idea: SPEC → PLAN → Tests → Minimal Impl → LEARNINGS cycle under TDD and minimal deps.
- Why: validate invariants, data shapes, and APIs early; reduce risk and rework.
- Integration: build via two‑at‑a‑time merges; keep scope small and focused.
See: Toy‑Model Rationale
CLI + JSON as Debugger (Overview)
The debugger mindset makes execution legible and falsifiable for both humans and agents.
- Idea: expose pure CLIs with JSON I/O and structured errors; favor deterministic pipelines.
- Why: enables single‑step reasoning, bisecting, and stable golden tests.
- Outcome: predictable behavior and inspectable state across the system.
See: Debugger Mindset
Repo Layout, Guardrails, Workflow (Overview)
How we structure repos and constrain work to stay simple and safe.
- Layout: clear locations for docs, CLIs, tests, and schemas.
- Guardrails: dependency, complexity, and error-handling constraints.
- Workflow: self-audit metrics and human review gates.
See: Repo Layout, Guardrails, Workflow
Example in Practice: Case Study II: Spatial MUD Database demonstrates discovery workflow in action, showing how toy model discipline and systematic experimentation addressed complex technical challenges through four focused prototypes and multi-system integration.
Napkin Physics
The term derives from Fermi estimation and "back-of-the-envelope" calculations — rough approximations simple enough to sketch on a restaurant napkin. In software development, napkin physics applies this same principle to problem framing: upstream simplification to prevent scope drift by capturing the essential mechanism at the highest level of abstraction.
The technique draws inspiration from Einstein's principle: "Everything should be made as simple as possible, but no simpler." Rather than diving directly into implementation details, napkin physics forces problem definition at the conceptual level, before any specification or code exists.
This approach counters the natural tendency of AI systems to generate comprehensive, layered solutions. By establishing conceptual constraints upfront, the methodology guides subsequent SPEC and PLAN generation toward parsimony without losing essential complexity.
Structure
Problem: Single sentence defining what needs to be solved.
Assumptions: 3–5 bullets listing what can be taken as given.
Invariant/Contract: One precise property that must hold across all operations.
Mechanism: ≤5 bullets describing the minimal viable path (single‑file spike preferred).
First Try: Short paragraph outlining the simplest possible approach.
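As a brief illustration, here is a hypothetical napkin for a small, made-up problem (merging duplicate notes across two export files); the specifics are invented solely to show the shape:
Problem: Merge the notes from two export files into one list with no duplicates.
Assumptions: Both files fit in memory; note titles are unique identifiers; ordering does not matter; the newer file wins on conflict.
Invariant/Contract: Every input title appears exactly once in the output.
Mechanism: Load both files; index notes by title; overwrite earlier entries with later ones; emit a single JSON array.
First Try: A single script that reads both JSON files into a dictionary keyed by title and dumps the merged values.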
Constraints
Prohibitions: No frameworks, no new architectural layers, no new abstractions unless two existing ones are removed.
Scope Limitation: Focus on the essential mechanism only — defer integration, optimization, and edge cases to subsequent phases.
Application
Napkin physics serves as the foundation step before SPEC and PLAN generation. By establishing conceptual boundaries first, it prevents scope drift and over-engineering in downstream documentation.
The exercise forces identification of the core problem without implementation assumptions. This clarity propagates through the entire development cycle, maintaining focus on essential functionality rather than comprehensive feature sets.
Effectiveness
The technique leverages AI systems' sensitivity to framing. Abstract, constraint-focused prompts produce fundamentally different outputs than implementation-focused ones. The napkin physics format consistently guides AI toward minimal viable solutions rather than maximal complete ones.
See also: Doc‑Driven Principles, DocDD AGENTS.md Template
Toy‑Model Rationale
Toy models are scientific experiments, not products. Their purpose is to learn, reduce risk, and sharpen architectural clarity—not to ship.
What Toy Models Are
- Focused experiments: Each toy validates a single technical idea.
- Cheap and discardable: Code is expendable; insight is what matters.
- Architectural probes: They test assumptions, reveal edge cases, and expose integration challenges.
- Learning accelerators: Fast cycles of building, testing, and documenting.
What Toy Models Are Not
- Not production systems
- Not comprehensive solutions
- Not sacred code to preserve
- Not shortcuts to “done”
The Toy Model Cycle
1. Specification (SPEC.md)
Define the experiment before you run it.
- Data structures, operations, and expected behaviors
- Edge cases and failure conditions
- Clear success criteria
2. Planning (PLAN.md)
Lay out the steps like a recipe.
- Sequence of test-first steps
- Risks and dependencies
- What to validate at each stage
3. Implementation
Run the experiment under strict discipline.
- Write failing tests first
- Add only enough code to make them pass
- Capture errors clearly and specifically
- Stop when the hypothesis is validated
4. Learning Extraction (LEARNINGS.md)
Distill the insight.
- What worked, what failed
- Patterns worth reusing
- Integration implications
- Strategic takeaways
Exit Criteria
- All step-level success criteria checked
- Insights recorded
- Follow-up scope cut
Guiding Principles
- Test-Driven Development is mandatory: The red-green cycle keeps experiments honest, forces clarity, and documents usage.
- Error messages are for humans and AIs: Be specific, actionable, and structured. Good errors guide both debugging and future automation.
- Event sourcing is your microscope: Record every operation so you can replay, inspect, and debug how state evolved.
- Minimal dependencies, maximum clarity: Use proven libraries, avoid frameworks, keep the system transparent.
- Export in multiple formats: JSON for state, DOT for graphs, CSV for tabular views. Make insights portable (see the sketch after this list).
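The sketch below illustrates the multi-format idea with a made-up toy graph; the data and file names are hypothetical, and Python is used only as an example language.
"""Minimal sketch: exporting one toy state as JSON, DOT, and CSV (hypothetical data and file names)."""
import csv
import json

rooms = {"cave": ["tunnel"], "tunnel": ["cave", "ledge"], "ledge": []}  # toy adjacency list

with open("rooms.json", "w") as f:          # JSON for state
    json.dump(rooms, f, indent=2)

with open("rooms.dot", "w") as f:           # DOT for graphs
    f.write("digraph rooms {\n")
    for src, targets in rooms.items():
        for dst in targets:
            f.write(f'  "{src}" -> "{dst}";\n')
    f.write("}\n")

with open("rooms.csv", "w", newline="") as f:  # CSV for tabular views
    writer = csv.writer(f)
    writer.writerow(["room", "exit"])
    for src, targets in rooms.items():
        for dst in targets:
            writer.writerow([src, dst])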
Strategic Guidance
- Pivot early when better approaches appear; persist when the gain is marginal.
- Preserve learnings even when abandoning code.
- Keep APIs clean and data formats consistent across toys.
- Discard code without guilt—the artifact that matters is the documentation of insight.
North Star
Toy models are gardening, not construction.
You’re cultivating understanding, not building monuments.
The point is clarity, not permanence.
Toy Integration Convention
- Each toyN_* directory must contain exactly one SPEC.md, PLAN.md, and LEARNINGS.md.
- If a SPEC or PLAN grows too large or unfocused, split scope into new toyN_* experiments.
- Integration toys (e.g. toy5_, toy6_) exist to recombine validated sub-toys.
- Replace in place: update LEARNINGS.md rather than creating multiples for the same toy.
- When consolidating, fold prior learnings into a single current doc; discard stale versions.
- Always bias toward minimal scope: smaller toys, fewer docs, clearer insights.
Axis Principle for Toy Models
- A base toy isolates exactly one axis of complexity (a single invariant, mechanism, or seam).
- An integration toy merges exactly two axes to probe their interaction.
- Never exceed two axes per toy; more belongs to higher‑order integration or production scope.
- This discipline keeps learnings sharp, avoids doc bloat, and mirrors controlled experiments.
Debugger Mindset
Once documentation provides structure and AI agents have clear specifications, a critical challenge remains: how can agents execute systems reliably without becoming lost in hidden state? The solution lies in adopting a debugger mindset — treating all system components as if they operate in debugger mode, with every execution step exposed in machine-readable form.
System Legibility for AI Agents
Traditional software development tolerates hidden state, implicit context, and opaque execution flows. Human developers navigate these complexities through experience and debugging tools. AI agents, however, require explicit, deterministic interfaces to maintain consistency across execution sessions.
The core principle is system legibility: making all execution state visible and falsifiable. This enables agents to:
- Verify intermediate results against specifications
- Reproduce exact execution sequences
- Identify failure points without ambiguity
- Maintain consistent behavior across sessions
CLI + JSON Architecture
The most effective substrate for AI-legible systems combines command-line interfaces with JSON data interchange:
Interface Contract:
- stdin: JSON input parameters
- stdout: JSON output results
- stderr: Structured error JSON
Execution Rules:
- Deterministic behavior: identical inputs produce identical outputs
- No hidden state dependencies
- Pure functions with explicit side effects
- Machine-parsable error handling
Error Format:
{
"type": "ERR_CODE",
"message": "human-readable description",
"hint": "actionable remediation steps"
}
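A minimal sketch of this interface contract, written here in Python; the "items" parameter and the error code are invented for illustration, not part of any prescribed schema:
#!/usr/bin/env python3
"""Sketch of a JSON-in/JSON-out CLI module following the debugger-mindset contract."""
import json
import sys

def main() -> int:
    try:
        params = json.load(sys.stdin)      # stdin: JSON input parameters
        items = params["items"]            # hypothetical required field
        result = {"count": len(items)}     # deterministic, pure transformation
        json.dump(result, sys.stdout)      # stdout: JSON output results
        return 0
    except (KeyError, TypeError, ValueError) as exc:  # json.JSONDecodeError is a ValueError
        error = {                          # stderr: structured error JSON
            "type": "ERR_BAD_INPUT",
            "message": f"invalid input: {exc}",
            "hint": "pass a JSON object with an 'items' array on stdin",
        }
        json.dump(error, sys.stderr)
        return 1

if __name__ == "__main__":
    sys.exit(main())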
Pipeline Composition
JSON-based CLIs enable UNIX-style pipeline composition that agents can inspect and validate:
moduleA < input.json > intermediate.json
moduleB < intermediate.json > result.json
moduleC --transform < result.json > output.json
Each pipeline stage produces inspectable artifacts. Agents can:
- Validate intermediate results against expected schemas
- Isolate failure points by examining individual stages
- Reproduce partial executions for testing and debugging
- Generate comprehensive execution traces
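For example, an agent might check intermediate.json from the pipeline above against an expected shape before invoking the next stage. This sketch assumes the third-party jsonschema package and an invented "records" field:
"""Sketch: validating a pipeline intermediate against a schema (hypothetical shape)."""
import json
from jsonschema import ValidationError, validate  # third-party: pip install jsonschema

INTERMEDIATE_SCHEMA = {
    "type": "object",
    "required": ["records"],
    "properties": {"records": {"type": "array", "items": {"type": "object"}}},
}

with open("intermediate.json") as f:
    artifact = json.load(f)

try:
    validate(instance=artifact, schema=INTERMEDIATE_SCHEMA)
    print("intermediate.json matches the expected shape")
except ValidationError as exc:
    print(f"stage output is malformed: {exc.message}")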
Golden Test Integration
Every module should provide a canonical golden test demonstrating expected behavior:
# Single command that validates core functionality
./module --golden-test
Golden tests serve as deterministic checkpoints that:
- Establish baseline behavior before modifications
- Prevent specification drift during development
- Provide concrete examples of correct input/output pairs
- Enable agents to verify their understanding of system behavior
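A golden test can be as simple as replaying a stored input and comparing the parsed result against a stored baseline. The paths and module name in this sketch are hypothetical:
"""Sketch of a golden-test check: stored input in, parsed output compared against a stored baseline."""
import json
import subprocess

def run_golden_test() -> bool:
    with open("tests/golden/input.json") as f:
        golden_input = f.read()
    with open("tests/golden/expected_output.json") as f:
        expected = json.load(f)
    proc = subprocess.run(                      # run the module exactly as a pipeline would
        ["./module"], input=golden_input, capture_output=True, text=True, check=True
    )
    return json.loads(proc.stdout) == expected  # deterministic checkpoint

if __name__ == "__main__":
    print("PASS" if run_golden_test() else "FAIL")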
Implementation Patterns
Module Structure:
- Single executable per logical function
- JSON schema validation for inputs/outputs
- Comprehensive error handling with structured messages
- Built-in golden test modes
System Design:
- Prefer composition over complex monolithic tools
- Minimize interdependencies between modules
- Expose all configuration through explicit parameters
- Maintain audit trails of execution decisions
Benefits for AI Development
The debugger mindset transforms AI-system interaction from guesswork to systematic execution:
Predictability: Agents can reason about system behavior through explicit interfaces rather than implicit behavior patterns.
Testability: Every system interaction produces verifiable artifacts that can be validated against specifications.
Debuggability: Execution traces provide clear failure attribution and remediation paths.
Reproducibility: Deterministic interfaces enable exact recreation of execution sequences for analysis and refinement.
This approach establishes a foundation where human oversight and AI execution can coexist productively, with clear boundaries and verifiable outcomes at every step.
Repository Layout, Guardrails, Workflow
Document-Driven Development requires minimal repository structure to enable parallel experimentation through toy models. Each toy is a self-contained experiment with complete meta-documentation.
DocDD is inherently flexible and modular - different projects require different flavors. A spatial database benefits from CLI+JSON debugging and strict TDD practices, while a TUI project might emphasize human user testing over JSON pipelines. This book provides foundational patterns to help you discover the right DocDD variant for your specific problem domain.
Note: If this were an RFC, most recommendations would be SHOULDs not MUSTs - adapt the patterns to fit your context rather than following them rigidly.
Toy-Based Structure
toys/
  toy1_short_name/
    SPEC.md        - Initial contract for this experiment
    PLAN.md        - Initial implementation roadmap
    SPEC_2.md      - Refined contract after first iteration
    PLAN_2.md      - Updated roadmap for next stage
    README.md      - Living orientation document (updated each stage)
    LEARNINGS.md   - Accumulating insights (updated each stage)
    [implementation files as needed]
  toy2_another_name/
    [same structure]
Core Principles
Toy Independence: Each toy contains everything needed to understand and reproduce the experiment. No shared dependencies on global documentation or complex directory hierarchies.
Language Agnostic: Directory structure and conventions emerge naturally from language choice (Python, Rust, JavaScript, etc.). DocDD imposes no language-specific requirements.
Iteration Cheapness: Code can be rewritten freely since LLMs make implementation cheap. The meta-documents capture lasting insights while code remains malleable.
Staged Evolution: SPEC and PLAN documents can be versioned (SPEC_2.md, PLAN_2.md) for major iterations within a toy. README and LEARNINGS are living documents updated after each stage to accumulate insights.
Essential Constraints
Constrained Vocabulary: When working with LLMs for content generation, limit vocabulary to well-defined terms to reduce hallucination and improve consistency.
Meta-Document Discipline: The four-document pattern (SPEC, PLAN, README, LEARNINGS) provides structure without prescribing implementation details.
Clear Error Handling: Structure errors for machine parsing when building CLI tools. Avoid leaking secrets or credentials in error messages or logs.
What DocDD Doesn't Prescribe
- File organization within toys (language-dependent)
- Testing frameworks or strategies (project-dependent)
- Code complexity metrics (emerge from practice)
- Dependency management approaches (language-dependent)
- Directory structures beyond the basic toy pattern
Toy to Production Evolution
The README serves as production documentation, written and updated alongside implementation. README and LEARNINGS must reflect current reality - stale documentation is not permitted for these living documents. Historical SPEC and PLAN versions can remain as archival documentation or be cleaned up according to preference.
When a toy proves valuable enough to ship, its mature meta-documents become the definitive production specs. Archive Browser demonstrates this path: the toy's evolved documentation serves as the shipped NPM package's complete specification.
The methodology's strength lies in its minimal constraints that enable focused experimentation rather than comprehensive rules that must be followed.
Execution Workflow
Most software development happens within established systems. Once you've moved past the research and experimentation phase—once the core patterns are proven and the architectural approach is validated—the heavy documentation discipline of discovery mode becomes unnecessary overhead. Execution mode addresses this common reality with a lighter, more focused approach.
Execution workflow recognizes that the primary challenge shifts from discovery to orchestration. Instead of validating uncertain approaches through systematic experimentation, you're building features within known constraints, following established patterns, and maintaining architectural coherence across a growing system.
The Economic Reality of Established Systems
When working within mature codebases, the development economics change fundamentally. The core abstractions already exist. The data patterns are established. The integration points are defined. The technology choices have been made and validated through use.
In this context, the four-document harness of discovery mode—with its emphasis on systematic experimentation and risk mitigation—becomes process overhead that slows development without adding proportional value. The uncertainty that discovery mode addresses has already been resolved through earlier work.
Execution mode emerged from recognizing this shifted context. Rather than applying discovery discipline uniformly across all development work, DocDD adapts its approach to match the actual challenges: maintaining system legibility, ensuring architectural consistency, and preventing quality degradation as the system grows.
Living Architecture Through Code Maps
The cornerstone of execution workflow is the CODE_MAP.md—a living architectural document that serves as the primary coordination mechanism between human developers, AI agents, and the evolving codebase.
Unlike traditional architectural documentation that becomes stale and misleading over time, the code map maintains currency through discipline: it's updated with every commit that changes system structure. This creates a reliable source of truth that both humans and AI can depend on for understanding how the system is organized.
The code map serves multiple audiences simultaneously. For human developers returning to a project or working in unfamiliar areas, it provides rapid orientation without requiring them to reverse-engineer system structure from implementation details. For AI agents, it supplies the architectural context necessary to make changes that fit existing patterns rather than introducing inconsistencies.
See: Code Maps
Refactoring as System Maintenance
Execution workflow treats refactoring not as an occasional cleanup activity, but as mandatory system maintenance performed after every feature implementation or integration step. This shift from optional to required reflects the economic reality that AI assistance has made refactoring dramatically less expensive.
Traditionally, refactoring was difficult to justify because the effort-to-benefit ratio was poor. Improving working code consumed significant developer time while providing unclear business value. With AI assistance, refactoring becomes routine maintenance—similar to how automatic garbage collection eliminated manual memory management as a developer concern.
The mandatory refactoring step serves multiple purposes. It cleans up integration seams between new and existing components, maintaining clean boundaries and clear interfaces. It extracts emerging patterns and eliminates duplication that accumulates during feature development. Most importantly, it ensures that new code follows established architectural patterns rather than introducing inconsistencies that compound over time.
This discipline prevents the gradual degradation that typically occurs in software systems. Instead of accumulating technical debt that eventually requires expensive remediation, the system maintains consistent quality through continuous small improvements.
See: Refactoring with AI Agents
The Integration Development Cycle
Execution workflow follows a five-step cycle designed for efficiency within established systems:
Orient by reading the CODE_MAP.md to understand current architecture and constraints. This step ensures that new work builds on existing foundations rather than working against them.
Plan with brief, focused planning that emphasizes integration points and architectural fit. Unlike discovery mode's systematic experimentation planning, execution planning assumes known approaches and focuses on execution details.
Implement the feature following established patterns and architectural guidelines. The implementation phase benefits from clear constraints and proven approaches, enabling faster development cycles.
Refactor to clean up integration seams and improve code quality. This mandatory step ensures that the system's quality improves continuously rather than degrading over time.
Update the CODE_MAP.md to reflect any structural changes introduced during implementation. This maintains the currency and reliability of the architectural documentation.
When Execution Mode Applies
Execution workflow works best when uncertainty has been resolved through previous work. The architectural approaches are proven and understood. Requirements are clear and well-defined. Technical constraints and limitations are documented and stable. Core systems and interfaces have matured through use.
These conditions indicate that the primary development challenge has shifted from discovery to execution. The system's fundamental patterns are established, and new work primarily involves extending these patterns to address additional requirements.
Execution mode acknowledges this shift explicitly. Rather than applying discovery-oriented processes to execution-oriented work, it provides lightweight discipline that maintains system quality without unnecessary overhead.
Relationship to Discovery Mode
Execution workflow doesn't replace discovery mode—it complements it. When uncertainty arises during execution work, the methodology supports switching back to discovery mode's systematic experimentation approach. This might happen when requirements reveal gaps in the established patterns, when new technologies need evaluation, or when performance constraints require architectural changes.
The key insight is recognizing which mode fits the current development challenge. Most work in established systems benefits from execution workflow's lighter approach. But when uncertainty emerges, discovery mode's more rigorous discipline becomes valuable again.
This flexibility prevents the common antipattern of applying heavy process uniformly across all development work. Instead, DocDD adapts its methodology to match the actual challenges and uncertainty levels present in different phases of system development.
Example in Practice: Case Study I: ChatGPT Export Viewer demonstrates execution workflow in action, showing how CODE_MAP.md and refactoring discipline supported the development of a shipped NPM package with clean human-AI collaboration boundaries.
Code Maps
CODE_MAP.md serves as living architectural documentation that provides structural orientation for both humans and AI agents. It's the central orchestration document in Execution Mode, updated with every commit to reflect the current system state.
Purpose and Philosophy
Code maps bridge the gap between high-level architecture and implementation details. They give both humans and AI a clear mental model of how the codebase is organized without requiring them to reverse-engineer structure from code.
For Humans: Quick orientation when returning to a project or understanding unfamiliar areas.
For AI Agents: Essential context for understanding existing structure before making changes.
For Teams: Shared understanding of system organization and component responsibilities.
Structure and Contents
Architecture Overview
High-level purpose, design philosophy, and data flow patterns that define the system's approach.
Key Directories
Functional organization with clear responsibilities - what each major directory contains and why.
Component Documentation
Each major module/library documented with:
- Key functions and their purposes
- Primary interfaces and data shapes
- How the component fits into the larger system
Integration Patterns
How components connect and depend on each other:
- Data flow between major systems
- Interface boundaries and contracts
- Orchestration and coordination patterns
Practical Insights
- Known issues and gotchas that developers encounter
- Fragile areas that require careful modification
- Safety patterns and common pitfalls
- Performance considerations and optimization notes
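To make the shape concrete, here is a compressed, hypothetical skeleton of a CODE_MAP.md; the directory and module names are invented and should be replaced with your system's actual structure:
# CODE_MAP.md
## Architecture Overview
CLI toolkit that turns export archives into browsable views; data flows archive -> index -> renderer.
## Key Directories
- src/archive/: archive reading and entry indexing
- src/tui/: terminal rendering and keyboard handling
## Components
- indexer: builds entry metadata records (name, size, offset)
- viewer: renders the entry list; consumes the indexer's record shape
## Integration Patterns
The indexer's JSON records feed the viewer directly; no shared mutable state.
## Practical Insights
- The viewer assumes metadata is precomputed; indexing lazily here causes visible lag.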
Maintenance Discipline
Updated with Every Commit
The CODE_MAP must always reflect current reality. When code structure changes, the map changes too.
Focus on Structure Over Details
Capture architectural insight, not implementation specifics. The goal is orientation, not exhaustive documentation.
AI-Agent Friendly
Written to help agents understand the system quickly and make appropriate changes that fit existing patterns.
Change-Sensitive Sections
Explicitly flag areas that are fragile, experimental, or require special care when modifying.
Writing Effective Code Maps
Start with Purpose
Begin with a clear statement of what the system does and its core design philosophy.
Show Data Flow
Trace typical execution paths through the system to illustrate how components interact.
Document the Why
Explain architectural decisions and trade-offs, not just what exists.
Keep It Current
Treat the CODE_MAP as a living document that evolves with the codebase.
Be Selective
Include what helps understanding, skip what adds noise. Focus on the most important 80% of the system.
Integration with Execution Workflow
Code maps work best when integrated into the standard development cycle:
- Orient: Read CODE_MAP.md before starting work
- Plan: Consider how changes fit existing architecture
- Implement: Build following established patterns
- Refactor: Clean up integration seams
- Update: Refresh CODE_MAP.md for structural changes
The code map becomes the foundation that enables confident refactoring and consistent architectural decisions across the development cycle.
Refactoring with AI Agents
Traditional refactoring advice assumes human developers who naturally write DRY, well-structured code from the start. AI agents exhibit fundamentally different coding patterns that require adapted refactoring strategies. Rather than fighting these patterns, effective AI collaboration embraces them and integrates systematic cleanup into the development workflow.
The AI Verbosity Problem
Large language models demonstrate a consistent tendency toward verbose, repetitive code generation. Even when explicitly prompted to follow DRY principles or write clean, modular code, AI agents typically produce implementations with significant duplication and unnecessarily complex structures.
This pattern emerges from how LLMs process context and generate code. They excel at pattern matching and rapid code production, but struggle with the architectural discipline that humans develop through experience. The result is functional code that works correctly but contains substantial redundancy and missed abstraction opportunities.
Attempting to force DRY principles during initial code generation creates friction and slows development without proportional benefit. AI agents often misunderstand abstraction requests, leading to over-engineered solutions or incomplete implementations that require more correction effort than systematic post-generation cleanup.
The Three-Phase Refactoring Cycle
Effective AI collaboration adopts a three-phase approach that separates code generation from code optimization:
Phase 1: Generate to Green
Allow the AI agent to write repetitive, verbose code without DRY constraints. Focus entirely on functionality and test coverage. The goal is working code that passes all tests, regardless of structural quality.
This phase leverages AI agents' natural strengths while avoiding their weaknesses. Agents excel at rapid implementation when freed from architectural constraints. The repetitive code they generate typically follows consistent patterns that become clear refactoring targets in subsequent phases.
Phase 2: Plan the Cleanup
Once tests are passing, prompt the AI agent to review its own implementation and propose a refactoring plan. This meta-cognitive step often produces better results than upfront architectural guidance because the agent can analyze actual code patterns rather than working from abstract requirements.
The refactoring plan should identify specific duplication patterns, extract common abstractions, and propose architectural improvements. The human developer reviews this plan, suggests modifications, and approves the refactoring strategy before implementation begins.
Phase 3: Execute Refactoring
Implement the approved refactoring plan while maintaining test coverage. This phase benefits from the safety net that TDD provides—comprehensive tests catch regressions introduced during restructuring operations.
The AI agent performs the mechanical refactoring work under human oversight. The human ensures that the refactoring preserves intended behavior and maintains architectural consistency with the broader system.
Why This Approach Works
The three-phase cycle addresses the fundamental mismatch between AI code generation patterns and human architectural expectations. Rather than forcing AI agents to work against their natural tendencies, it creates a workflow that maximizes their contributions while maintaining code quality.
Separation of Concerns: Code generation and optimization become distinct activities with different success criteria. Generation focuses on functionality; optimization focuses on structure.
Leveraging AI Strengths: AI agents excel at rapid implementation and mechanical refactoring operations. The workflow emphasizes these strengths while minimizing exposure to architectural decision-making where they perform poorly.
Human Oversight: Critical architectural decisions remain under human control through the plan review process. This ensures that refactoring improves rather than degrades system architecture.
Safety Through Testing: TDD provides continuous validation throughout the refactoring process. This safety net enables aggressive restructuring that would be risky without comprehensive test coverage.
Application in Different Workflow Modes
Discovery Mode Refactoring
During discovery workflow, refactoring serves architectural exploration. As toy models reveal effective patterns, aggressive refactoring extracts these patterns into reusable forms. The three-phase cycle accelerates this extraction process while maintaining the experimental velocity that discovery mode requires.
Discovery refactoring often involves more radical restructuring as understanding evolves. The AI agent's willingness to perform extensive mechanical changes becomes particularly valuable when architectural insights require significant code reorganization.
Execution Mode Refactoring
In execution workflow, refactoring maintains architectural consistency as the system grows. The three-phase cycle becomes mandatory after each feature implementation, preventing the gradual degradation that typically occurs in evolving codebases.
Execution refactoring focuses on integration seams and pattern consistency rather than architectural discovery. The AI agent identifies where new code deviates from established patterns and proposes alignment strategies.
Practical Implementation
The refactoring cycle integrates naturally into both workflow modes through consistent prompting patterns:
Generation Phase: "Implement [feature] to make tests pass. Focus on functionality over code organization."
Planning Phase: "Review the implementation and identify opportunities for reducing duplication and improving structure. Propose a specific refactoring plan."
Execution Phase: "Implement the approved refactoring plan while maintaining all test coverage."
This structured approach transforms what traditional development treats as occasional cleanup into routine system maintenance. The economic reality that AI assistance makes refactoring dramatically less expensive enables this shift from optional to mandatory practice.
Economic Impact
The three-phase cycle fundamentally changes refactoring economics. Traditional development delayed refactoring due to high manual effort costs. With AI assistance, refactoring becomes routine maintenance rather than expensive technical debt remediation.
This economic shift enables continuous quality improvement rather than gradual degradation. Systems maintain architectural integrity through incremental improvements rather than requiring periodic major restructuring efforts.
The result is codebases that improve consistently over time while maintaining rapid development velocity. The AI agent handles the mechanical aspects of refactoring while human oversight ensures architectural coherence and quality improvement.
DocDD AGENTS.md Template
This chapter provides a sample AGENTS.md you can drop into a repository to guide a coding agent in using Document‑Driven Development (DocDD). Treat it as a template: adapt roles, guardrails, and the DocDD loop to your project's constraints and goals.
# AGENTS.md
## 1. Purpose
Doc-Driven Development (DocDD) turns ambiguous problems into deterministic, legible systems through lightweight docs, disposable toy models, and incremental integrations.
---
## 2. Core Principles
- **Docs as control surfaces** — SPEC, PLAN, LEARNINGS, README.
- **Toys, not monuments** — throwaway code, durable insights.
- **Parsimony** — the simplest mechanism that works today.
- **Determinism** — same input → same output; minimize hidden state.
- **Legibility** — JSON + simple CLIs; human + agent inspectable.
- **Two-at-a-time integration** — never combine more than two at once.
---
## 3. The DocDD Loop
1. SPEC — define minimal contract (inputs, outputs, invariants).
2. PLAN — outline the smallest testable step.
3. Implementation — write only enough to satisfy the contract.
4. LEARNINGS — capture outcomes and constraints.
5. README - publish tool/API docs for future use.
---
## 4. Napkin Physics
Quick pre-spec simplification:
- Problem (1 sentence)
- Assumptions (a few bullets)
- Invariant (one crisp property)
- Mechanism (≤5 bullets)
Rule: no frameworks, no new nouns unless two are deleted.
---
## 5. Kickoff: The Binary-Weave
The kickoff is a sequential weave:
- Introduce exactly one primitive (Toy A, Toy B …).
- Integrate it with the current product (A+B=C, C+D=E …).
- Each integration yields the new current product.
- Continue until the final product emerges.
End state: name the final product, summarize woven primitives, state durable invariants, discard toys, keep docs and learnings.
Goal: compounding clarity
Anti-Goal: combinatorial drift
---
## 6. Toy Models
Small, sharply scoped, fully specced implementations designed to be discarded.
Cycle: SPEC → PLAN → Tests → Minimal code → LEARNINGS.
Axis discipline: a base toy isolates one axis; an integration toy merges exactly two.
---
## 7. CLI + JSON Convention
Modules behave like debuggers:
- stdin: JSON input
- stdout: JSON output
- stderr: structured error JSON
- Purity: same input → same output
Error JSON shape:
{ "type": "ERR_CODE", "message": "text", "hint": "fix" }
Schema-first: document I/O schemas in SPEC.
---
## 8. Pipelines & Golden Tests
Compose CLIs as UNIX-style pipelines with inspectable intermediates, but only when this makes sense. It's not a good fit for every project.
---
## 9. Guardrails & Heuristics
Habits to constrain complexity:
- Default import allowlist; justify exceptions.
- Prefer single-file spikes.
- Two-Function Rule: parse(input)→state; apply(state,input)→state|output.
- No new nouns unless two removed.
- Handle top errors with structured JSON.
- Record cost, latency, privacy notes in LEARNINGS.
---
## 10. Roles
- **Agent** — generates docs, toys, integrations; pushes forward.
- **Human** — spotter: nudges when the agent stalls or drifts, and makes judgment calls the agent cannot.
Authoring Guides
Agent-oriented writing templates for Doc-Driven Development.
These guides are designed to be copied into your repository and referenced by AI agents before authoring documents. They provide structured templates, constraints, and examples that help agents produce consistent, high-quality documentation.
How to Use These Guides
- Copy to your repo: Place relevant guides in your project's docs/ or .ai/ directory
- Reference in prompts: Tell your agent to "read the spec writing guide before creating SPEC.md"
- Maintain consistency: Use across projects to build a library of well-structured documents
Available Guides
- Spec Writing: Structure, examples, and validation to make behavior falsifiable
- Plan Writing: Step templates and TDD discipline for actionable plans
- Kickoff Writing: "Napkin physics" approach to project initialization
- README Writing: Concise orientation docs for internal libraries
- Learnings Writing: Capturing evidence, pivots, and architectural insights
Each guide includes templates, constraints, anti-patterns, and real examples to help agents author documents that integrate seamlessly with the DocDD workflow.
Spec Writing
This chapter provides agent-oriented documentation for writing SPEC.md files in DocDD projects. Drop this guide into your repository as SPEC_WRITING.md to help AI agents understand how to create precise behavioral contracts for toy models.
# SPEC_WRITING.md
## Purpose
A **SPEC.md is a contract spike**: it defines what the system must accept, produce, and guarantee.
It exists to make implementation falsifiable — to ensure tests and validation have clear ground truth.
---
## What a SPEC.md Is / Is Not
### ❌ Not
- Implementation details (classes, functions, algorithms)
- Internal design notes (unless exposed in the contract)
- Tutorials, manuals, or user guides
- Vague aspirations ("the system should work well")
### ✅ Is
- Precise input/output formats
- Defined state transitions or invariants
- Operation semantics (commands, APIs, behaviors)
- Error and validation rules
- Concrete test scenarios and acceptance criteria
---
## Core Structure
### 1. Header
Toy Model N: System Name Specification
One-line purpose statement
### 2. Overview
- **What it does:** core purpose in 2–3 sentences
- **Key principles:** 3–5 bullets on design philosophy
- **Integration context:** if relevant, note inputs/outputs to other toys
### 3. Data Model
Define external data formats with **realistic examples**:
- All required fields shown
- Nested structures expanded
- Field purposes explained
- JSON schemas when clarity demands
### 4. Core Operations
Document commands or APIs with a consistent pattern:
- **Syntax** (formal usage)
- **Parameters** (required/optional, ranges, defaults)
- **Examples** (simple + complex)
- **Behavior** (state changes, outputs, side effects)
- **Validation** (rules, errors, edge cases)
### 5. Test Scenarios
3 categories:
1. **Simple** — minimal case
2. **Complex** — realistic usage
3. **Error** — invalid inputs, edge handling
Optionally, **Integration** — only if toy touches another system.
### 6. Success Criteria
Checkboxes phrased as falsifiable conditions, e.g.:
- [ ] Operation X preserves invariant Y
- [ ] Error messages are structured JSON
- [ ] Round-trip import/export retains labels
---
## Quality Heuristics
High-quality SPECs are:
- **Precise** — eliminate ambiguity
- **Minimal** — only cover one axis of complexity
- **Falsifiable** — every statement testable
- **Contextual** — note integration points when they matter
Low-quality SPECs are:
- Vague ("system processes data")
- Over-prescriptive (dictating implementation)
- Bloated with internal details
- Missing testable criteria
---
## Conclusion
A SPEC.md is not a design novel.
It is a **minimal, precise contract** that locks in what must hold true, so tests and implementations can be judged unambiguously. If multiple axes of complexity emerge, split them into separate toy models.
Plan Writing
This chapter provides agent-oriented documentation for writing PLAN.md files in DocDD projects. Drop this guide into your repository as PLAN_WRITING.md to help AI agents create strategic roadmaps for toy model implementation.
# PLAN_WRITING.md
## What a PLAN.md Actually Is
A **PLAN.md is a strategic roadmap** describing **what to build and how to build it step-by-step**. It enforces clarity, sequencing, and validation.
### ❌ NOT:
- Implementation code
- Literal test code
- Copy-paste ready
- Exhaustive details
### ✅ IS:
- Stepwise development roadmap
- TDD methodology guide
- Illustrative code patterns only
- Success criteria with checkboxes
---
## Structure
### Header
- **Overview**: Goal, scope, priorities
- **Methodology**: TDD principles; what to test vs. not test
### Step Template
## Step N: <Feature Name> **<PRIORITY>**
### Goal
Why this step matters
### Step N.a: Write Tests
- Outline test strategy (no literal code)
- Key cases: core, error, integration
- Expected validation behavior
### Step N.b: Implement
- Tasks: file/module creation, core ops, integration
- Code patterns for illustration only
- State and error handling guidance
### Success Criteria
- [ ] Clear, testable checkpoints
- [ ] Functional + quality standards met
---
## Key Practices
### TDD Discipline
- Write failing tests first
- Red → Green → Next
- Focus on interfaces and contracts
- Cover error paths explicitly
### Test Scope
- ✅ Test: core features, errors, integration points
- ❌ Skip: helpers, edge cases, perf, internals
### Code Patterns
Use examples as **patterns**, not literal code:
cmdWalk(cells, direction) {
  if (!(direction in DIRECTIONS)) throw Error(`Invalid: ${direction}`);
  const [dx, dy] = DIRECTIONS[direction];
  this.cursor.x += cells * dx;
  this.cursor.y += cells * dy;
}
### Tasks
Break implementation into minimal units:
1. Create directory/files
2. Implement core command parsing
3. Add integration test path
4. Error handling
### Success Criteria
Always check with concrete, objective boxes:
- [ ] Parser initializes cleanly
- [ ] Commands mutate state correctly
- [ ] Errors raised for invalid input
- [ ] Test suite runs with single command
---
## Anti-Patterns
- ❌ Full test code in Plan (use bullet outlines)
- ❌ Full implementation code (use patterns only)
- ❌ Over-detail (Plan guides, does not replace dev thinking)
---
## Why This Works
- **Clear sequencing**: prevents scope drift
- **TDD enforcement**: quality-first mindset
- **Concrete validation**: objective step completion
- **Minimal guidance**: gives direction without over-specifying
---
## Conclusion
A good PLAN.md is a **map, not the territory**. It sequences work, enforces TDD, and defines success. It avoids detail bloat while ensuring implementers know exactly **what to test, what to build, and when it's done**.
Kickoff Writing
This chapter provides agent-oriented documentation for writing KICKOFF.md files in DocDD projects. Drop this guide into your repository as KICKOFF_WRITING.md to help AI agents structure project kickoffs using napkin physics and binary-weave integration patterns.
# KICKOFF_WRITING.md
This document instructs the agent how to write a kickoff document for a new DocDD project.
The goal is to produce a single, explicit binary-weave plan — not a flat list of toys, not parallel streams.
The weave always alternates: *new primitive → integration with prior product*.
---
## Core Shape of a Kickoff
1. **Napkin Physics**:
- Problem (1 sentence)
- Assumptions (3–5 bullets)
- Invariant (one crisp property that must always hold)
- Mechanism (≤5 bullets describing the minimal path)
2. **Binary-Weave Plan**:
- Always introduce **one new primitive at a time** (Toy A, Toy B, Toy C …).
- Always follow by **integrating it with the prior product** (A+B=C, C+D=E, …).
- Each integration produces the **new “current product”**.
- No step introduces more than one new primitive.
- No integration combines more than two things.
- Continue until the final product emerges.
3. **End State**:
- Name the final product.
- Summarize which primitives and integrations were woven.
- State the durable invariants.
- Clarify that only the final docs + system remain; toys are discarded but learnings are kept.
---
## Formatting Expectations
- **Stage numbering is sequential.**
- *Stage 1*: Primitive A, Primitive B
- *Stage 2*: A + B = C
- *Stage 3*: Primitive D
- *Stage 4*: C + D = E
- *Stage 5*: Primitive F
- *Stage 6*: E + F = G
- …continue until final product.
- **Each stage entry must have**:
- **Name** (Toy or Integration)
- **What it does** (one sentence)
- **Invariant** (instantaneous, non-blocking, etc.)
- **Avoid parallel numbering.** Don’t list “Stage 2.3” or “Stage 2.4”.
- **Avoid over-specification.** The kickoff is a weave map, not a spec.
- **Avoid skipping.** Each stage should follow the weave pattern strictly.
---
## Tone & Style
- Write plainly and compactly — scaffolding, not prose.
- Prioritize clarity of the weave over detail of implementation.
- Keep invariants crisp and behavioral, not vague.
- Use ≤2 bullets per primitive/integration when possible.
---
## One-Shot Checklist
- [ ] Napkin Physics included?
- [ ] Sequential stages?
- [ ] Exactly one new primitive per stage?
- [ ] Integration always combines current product with one new primitive?
- [ ] Final product and invariants stated at end?
If all are checked, the kickoff is valid.
README Writing
This chapter provides agent-oriented documentation for writing README.md files in DocDD projects. Drop this guide into your repository as README_WRITING.md to help AI agents create effective context refresh documentation.
# README_WRITING.md
## Purpose
These READMEs serve as **context refresh documents** for AI assistants working with the codebase. They should quickly re-establish understanding of what each library does, how to use it, and what to watch out for.
**Target audience**: AI assistants needing to quickly understand library purpose and usage patterns
**Length target**: 100–200 words total
**Focus**: Dense, essential information only
---
## Required Structure
### **1. Header + One-Liner**
# library_name
Brief description of what it does and key technology/pattern
### **2. Purpose (2–3 sentences)**
- What core problem this solves
- Key architectural approach or design pattern
- How it fits in the broader system/integration
### **3. Key API (essential methods only)**
# 3-5 most important methods with type hints
primary_method(param: Type) -> ReturnType
secondary_method(param: Type) -> ReturnType
### **4. Core Concepts (bullet list)**
- Key data structures or abstractions
- Critical constraints or assumptions
- Integration points with other libraries
- Important design patterns
### **5. Gotchas & Caveats**
- Known limitations or scale constraints
- Common usage mistakes
- Performance considerations
- Integration pitfalls
### **6. Quick Test**
pytest tests/test_basic.py # or most representative test
---
## Writing Guidelines
### **Be Concise**
- Use bullet points over paragraphs
- Focus on essential information only
- Assume reader has basic programming knowledge
### **Be Specific**
- Include actual method signatures, not generic descriptions
- Mention specific constraints (e.g., "max 1000 rooms before performance degrades")
- Reference specific test files for examples
### **Be Practical**
- Lead with most commonly used methods
- Highlight integration points with other libraries
- Focus on "what you need to know to use this correctly"
### **Avoid**
- Marketing language or feature lists
- Detailed implementation explanations
- Extensive examples (link to tests instead)
- Installation instructions (assume internal development environment)
---
## Template
# library_name
Brief description of what it does
## Purpose
2–3 sentences covering the core problem solved, architectural approach, and role in broader integration.
## Key API
most_important_method(params: Type) -> ReturnType
second_most_important(params: Type) -> ReturnType
utility_method(params: Type) -> ReturnType
## Core Concepts
- Key data structure or abstraction
- Critical constraint or assumption
- Integration point with other libraries
- Important design pattern
## Gotchas
- Known limitation or performance constraint
- Common usage mistake to avoid
- Integration pitfall with other libraries
## Quick Test
pytest tests/test_representative.py
---
## Quality Check
A good library README should allow an AI assistant to:
1. **Understand purpose** in 10 seconds
2. **Know primary methods** to call
3. **Avoid common mistakes** through gotchas section
4. **Validate functionality** through quick test
If any of these takes longer than expected, the README needs to be more concise or better organized.
Learnings Writing
This chapter provides agent-oriented documentation for writing LEARNINGS.md files in DocDD projects. Drop this guide into your repository as LEARNINGS_WRITING.md to help AI agents create effective retrospective documentation that captures architectural insights and constraints.
# LEARNINGS_WRITING.md
## Purpose
A **LEARNINGS.md** is a short, dense retrospective.
Its job: extract maximum value from an experiment by recording **what worked, what failed, what remains uncertain, and why.**
---
## What It Is / Is Not
### ❌ Not
- A feature list
- Implementation details
- A user manual
- Purely positive
- Hype or speculation without evidence
### ✅ Is
- A record of validated insights
- A log of failures and limitations
- A map of open questions
- A pointer to architectural reuse
- A calibration tool for future experiments
---
## Essential Sections
### Header
# Toy Model N: System Name – Learnings
Duration: X days | Status: Complete/Incomplete | Estimate: Y days
### Summary
- Built: 1 line
- Worked: 1–2 key successes
- Failed: 1–2 key failures
- Uncertain: open question
### Evidence
- ✅ Validated: concise finding with evidence
- ⚠️ Challenged: difficulty, workaround, lesson
- ❌ Failed: explicit dead end
- 🌀 Uncertain: still unresolved
### Pivots
- Original approach → New approach, why, and what remains unknown.
### Impact
- Reusable pattern or asset
- Architectural consequence
- Estimate calibration (time/effort vs. outcome)
---
## Style
- Keep it **short and factual**.
- Prefer **bullet points** over prose.
- Note **failures and unknowns** as explicitly as successes.
- One page max — dense, parsimonious, reusable.
LEARNINGS.md is not a diary. It is a **distilled record of architectural insights** that prevent future agents from repeating failures and help them understand what constraints actually matter.
Case Study I: ChatGPT Export Viewer
This archive lists concrete, working examples referenced throughout the book. It is intentionally small and current.
Archive Browser Project
A complete DocDD example demonstrating the full development cycle from kickoff to shipped product. This real-world project produced chatgpt‑export‑viewer, a suite of composable CLI tools for browsing ChatGPT export archives.
Project outcome: A cross-platform toolkit with clean human-AI collaboration boundaries:
- Human role: Product direction, UX decisions, constraint setting, edge case validation
- Agent role: Implementation, refactoring, shared pattern extraction, packaging polish
Key architectural decisions:
- CLI + JSON I/O for deterministic, testable composition
- Keyboard-first TUI with instant responsiveness (`/` search, `n`/`N` navigation)
- Modular libraries: ZIP access, terminal primitives, cross-platform launchers
- Publishing discipline: proper `bin` entries, dependency management, lint/format gates
The example demonstrates DocDD's strength in AI-first development: clear documentation boundaries enable effective human-agent collaboration while maintaining code quality and user experience standards.
Kickoff Document
Initial project definition using "napkin physics" to establish core constraints and approach.
Spec Document
Technical specification defining invariants, contracts, and behaviors for Stage 1 primitives.
Plan Document
Step-by-step implementation plan with TDD methodology, success criteria, and risk mitigation.
Code Map Document
Living architectural documentation providing structural orientation for both humans and AI agents.
Notes
- Keep examples practical and minimal; link them from relevant chapters.
- Export formats: when useful, include small JSON/DOT/CSV snippets alongside examples.
Archive Browser Kickoff
This example demonstrates the binary-weave kickoff process for a real DocDD project. This KICKOFF.md file guided the development of the Archive Browser, a shipped NPM package for viewing ChatGPT conversation exports.
# KICKOFF.md
A clarity-first, agent-oriented development plan for building a lightning-fast TUI archive browser for ChatGPT export ZIPs.
This document is disposable scaffolding: clarity and validated integrations are the true outputs.
## Napkin Physics
- **Problem**: We need a non-laggy, lightning-fast TUI to browse ChatGPT export ZIPs containing JSON, HTML, and hundreds of image files.
- **Assumptions**:
- Archives can grow large (thousands of entries, gigabytes in size).
- Performance > developer ergonomics (the developer is an LLM).
- Implementation target: Node.js v20 with `yauzl` (ZIP) + `terminal-kit` (TUI).
- Responsiveness depends on lazy rendering, precomputed metadata, and diff-based screen updates.
- **Invariant**: Every user interaction (scrolling, searching, previewing) must feel instantaneous — no noticeable lag.
- **Mechanism**:
1. Use `yauzl` to stream central directory and precompute metadata.
2. Render scrollable file list with `terminal-kit`.
3. Display metadata in side panel; update on highlight.
4. Lazy-load previews (JSON/HTML inline, images as stubs/external).
5. Add fuzzy search to filter entries without slowing navigation.
---
## Binary-Weave Plan
Each stage introduces at most one new primitive, then integrates it with an existing validated toy. This continues until the final product emerges.
### Stage 1 — Primitives
- **Toy A**: ZIP Reader
Reads central directory, outputs JSON metadata for all entries.
- **Toy B**: TUI List
Displays an array of strings in a scrollable, instant navigation list.
### Stage 2 — First Integration
- **C = A + B** → Archive Lister
Combine ZIP Reader with TUI List.
Behavior: display archive entries in a scrollable list.
Invariant: open + scroll is instant regardless of archive size.
### Stage 3 — New Primitive
- **Toy D**: Metadata Panel
Displays key-value metadata (size, method, CRC, etc.) in a side panel.
### Stage 4 — Second Integration
- **E = C + D** → Entry Browser
Combine Archive Lister with Metadata Panel.
Behavior: highlight an entry in list, show details in panel.
Invariant: panel update is instant on keypress.
### Stage 5 — New Primitive
- **Toy F**: File Previewer (text only)
Opens JSON/HTML entries, streams to popup.
Must be cancelable and non-blocking.
### Stage 6 — Third Integration
- **G = E + F** → Text Browser
Combine Entry Browser with File Previewer.
Behavior: press Enter on a JSON/HTML file to preview inline.
Invariant: browsing stays snappy; previews load lazily.
### Stage 7 — New Primitive
- **Toy H**: Image Stub
Detects image files, shows placeholder `[IMG] filename`.
Optional: launch via external viewer.
### Stage 8 — Fourth Integration
- **I = G + H** → Full Viewer
Combine Text Browser with Image Stub.
Behavior: one interface handles JSON, HTML, and image entries.
Invariant: non-text files never block UI.
### Stage 9 — New Primitive
- **Toy J**: Search/Filter
Provides fuzzy filename search.
Invariant: filtering large lists is instant.
### Stage 10 — Final Integration
- **K = I + J** → Archive Browser Product
Combine Full Viewer with Search/Filter.
Final features:
- Scrollable list of entries
- Metadata side panel
- Inline preview for JSON/HTML
- Image handling stub
- Fuzzy search/filter
Invariant: every keypress (nav, search, preview) feels immediate.
---
## End State
- **Final Product**: Lightning-fast Node.js archive browser for ChatGPT exports.
- **Process**: 6 primitives (A, B, D, F, H, J) woven into 5 integrations (C, E, G, I, K).
Archive Browser Spec
This example demonstrates a complete SPEC.md file from the Archive Browser project. This specification guided development of the shipped NPM package for viewing ChatGPT conversation exports.
# SPEC.md
Archive Browser, Stage 1 (Toy A + Toy B)
Scope: Implement the first two primitives for the Archive Browser kickoff (Stage 1), without integration.
- Toy A: ZIP Metadata Reader (`zipmeta`)
- Toy B: TUI List (`tuilist`)
No extrapolation beyond existing docs: this SPEC defines minimal contracts and invariants to enable TDD for Stage 1.
## 1. Invariants
- Determinism: Same input → same output; no hidden state in outputs.
- Legibility: JSON I/O; structured error JSON on stderr.
- Parsimony: Avoid unnecessary work; stream and avoid file content reads for Toy A.
- Responsiveness (Toy B): Rendering a large list (≥10k items) does not visibly lag on navigation.
---
## 2. Toy A — ZIP Metadata Reader (`zipmeta`)
### Purpose
Read a ZIP file’s central directory and emit a JSON array of entries (metadata only). Do not read file contents.
### Input (stdin JSON)
{
"zip_path": "./path/to/archive.zip"
}
### Output (stdout JSON)
Array of entry objects, ordered by central directory order.
[
{
"name": "conversations/2024-09-30.json",
"compressed_size": 12345,
"uncompressed_size": 67890,
"method": "deflate",
"crc32": "89abcd12",
"last_modified": "2024-09-30T12:34:56Z",
"is_directory": false
}
]
Notes:
- `method` is a human-readable label derived from the ZIP compression method code.
- `last_modified` is normalized to UTC ISO8601 if available; otherwise it is omitted.
### Errors (stderr JSON)
On failure, emit a single JSON object to stderr; no stdout payload.
{ "type": "ERR_ZIP_OPEN", "message": "cannot open zip", "hint": "check path and permissions" }
Other representative errors:
- `ERR_ZIP_NOT_FOUND` — path does not exist
- `ERR_ZIP_INVALID` — invalid/corrupt ZIP central directory
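The contract above is implementation-agnostic. As a rough sketch only (not the shipped code), a `zipmeta` built on `yauzl` might look like the following; the method-label mapping and error handling here are illustrative assumptions consistent with this SPEC.

```js
// zipmeta sketch: stdin {"zip_path": "..."} -> stdout JSON array of entry metadata.
// Illustrative only; field mapping and labels are assumptions based on this SPEC.
import yauzl from 'yauzl';

const emitError = (type, message, hint) => {
  process.stderr.write(JSON.stringify({ type, message, hint }) + '\n');
  process.exit(1);
};

let raw = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', (chunk) => (raw += chunk));
process.stdin.on('end', () => {
  let zipPath;
  try {
    zipPath = JSON.parse(raw).zip_path;
  } catch {
    return emitError('ERR_ZIP_OPEN', 'invalid stdin JSON', 'expected {"zip_path": "..."}');
  }
  yauzl.open(zipPath, { lazyEntries: true }, (err, zipfile) => {
    if (err) return emitError('ERR_ZIP_OPEN', 'cannot open zip', 'check path and permissions');
    const entries = [];
    zipfile.on('entry', (entry) => {
      entries.push({
        name: entry.fileName,
        compressed_size: entry.compressedSize,
        uncompressed_size: entry.uncompressedSize,
        method: entry.compressionMethod === 0 ? 'store' : 'deflate', // simplified label mapping
        crc32: entry.crc32.toString(16).padStart(8, '0'),
        last_modified: entry.getLastModDate().toISOString(),
        is_directory: entry.fileName.endsWith('/'),
      });
      zipfile.readEntry(); // central directory only; file contents are never read
    });
    zipfile.on('end', () => process.stdout.write(JSON.stringify(entries)));
    zipfile.readEntry();
  });
});
```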
---
## 3. Toy B — TUI List (`tuilist`)
### Purpose
Render a scrollable list of strings with instant navigation. Stage 1 validates rendering performance and interaction loop structure; selection semantics may be finalized during integration.
### Input / Output
- Input: JSON array of strings on stdin.
- Output: For Stage 1, stdout may be empty or a minimal confirmation object; interaction is the focus. Errors use structured JSON on stderr if startup fails.
Example input (stdin):
["a.json", "b.json", "c.html"]
Example minimal output (stdout):
{ "ok": true }
### Errors (stderr JSON)
{ "type": "ERR_TUI_INIT", "message": "terminal init failed", "hint": "verify terminal supports required features" }
---
## 4. Operations
- Toy A `zipmeta`:
- Open ZIP, iterate central directory entries, map metadata to the output schema.
- Never read file contents; stream and collect metadata only.
- Normalize fields (method label, crc32 hex, optional last_modified).
- Toy B `tuilist`:
- Initialize terminal UI, render list from stdin array.
- Provide non-blocking navigation; ensure smooth, low-latency scroll.
---
## 5. Validation Rules
- Toy A produces identical JSON given the same `zip_path`.
- Toy A handles non-existent/invalid ZIP paths with structured errors.
- Toy B starts without throwing; renders lists up to ≥10k items without visible lag.
- Toy B exits cleanly on user quit (e.g., Esc/Ctrl-C), leaving the terminal in a clean state.
---
## 6. Test Scenarios (Golden + Error Cases)
- Toy A Golden: small known ZIP → stable JSON array (order and fields match).
- Toy A Large: large ZIP central directory streams without memory blow-up; completes.
- Toy A Errors: missing path; corrupt file → structured error.
- Toy B Golden: feed 100 sample items → UI initializes and returns `{ "ok": true }` on immediate quit.
- Toy B Stress: feed ≥10k items → navigation is smooth; startup succeeds.
---
## 7. Success Criteria
- [ ] `zipmeta` emits correct JSON metadata without reading file contents.
- [ ] `zipmeta` error paths return structured JSON on stderr only.
- [ ] `tuilist` initializes, renders, and exits cleanly.
- [ ] `tuilist` remains responsive with large lists (subjective but observable).
- [ ] Both tools follow CLI + JSON purity (no hidden state in outputs; logs allowed).
*** End of Stage 1 SPEC ***
Archive Browser Plan
This example demonstrates a complete PLAN.md file from the Archive Browser project. This strategic roadmap guided TDD implementation of the shipped NPM package for viewing ChatGPT conversation exports.
# PLAN.md
Archive Browser, Stage 1 (Toy A + Toy B)
Overview: Build and validate the two Stage 1 primitives from KICKOFF — Toy A (`zipmeta`) and Toy B (`tuilist`) — using TDD. Keep changes minimal and test-first. Use A/B structure per Plan Writing guide.
Methodology: TDD discipline; tests-first; Red → Green → Next; explicit success criteria; focus on interfaces and contracts.
## Step 1: Toy A — ZIP Metadata Reader (HIGH)
### Step 1.a: Write Tests
- Golden: Small known ZIP → stable JSON array (fields: name, sizes, method label, crc32 hex, optional last_modified, is_directory). Order matches central directory.
- Large: Big ZIP central directory streams without memory blow-up; completes within acceptable time (manual check).
- Errors: Non-existent path → `ERR_ZIP_NOT_FOUND`; corrupt ZIP → `ERR_ZIP_INVALID`; open failure → `ERR_ZIP_OPEN`. No stdout on error.
- Determinism: Same input JSON (`{ zip_path }`) yields identical output JSON.
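These tests could be expressed with Node's built-in test runner roughly as in the sketch below; the fixture and golden-snapshot paths are placeholders, not files from the real repository.

```js
// Golden, determinism, and error-path tests for cli/zipmeta.js (sketch).
// Fixture and snapshot paths are hypothetical placeholders.
import { test } from 'node:test';
import assert from 'node:assert/strict';
import { execFileSync } from 'node:child_process';
import { readFileSync } from 'node:fs';

const runZipmeta = (zipPath) =>
  execFileSync('node', ['cli/zipmeta.js'], {
    input: JSON.stringify({ zip_path: zipPath }),
    encoding: 'utf8',
  });

test('golden: small known ZIP yields the expected metadata array', () => {
  const actual = JSON.parse(runZipmeta('tests/fixtures/small.zip'));
  const expected = JSON.parse(readFileSync('tests/fixtures/small.golden.json', 'utf8'));
  assert.deepEqual(actual, expected);
});

test('determinism: repeated runs produce identical output', () => {
  assert.equal(runZipmeta('tests/fixtures/small.zip'), runZipmeta('tests/fixtures/small.zip'));
});

test('error: missing path emits structured JSON on stderr only', () => {
  try {
    runZipmeta('does/not/exist.zip');
    assert.fail('expected a non-zero exit');
  } catch (err) {
    assert.equal(err.stdout, '');          // no stdout payload on error
    const e = JSON.parse(err.stderr);
    assert.match(e.type, /^ERR_ZIP_/);     // e.g. ERR_ZIP_NOT_FOUND
  }
});
```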
### Step 1.b: Implement
- Use `yauzl` to open and iterate central directory; do not read file contents.
- Map method code → human label; hex-encode crc32 to 8-char lowercase; include `is_directory`.
- Normalize optional `last_modified` to UTC ISO8601 if available (omit if unknown).
- Emit JSON array to stdout; on error, emit structured error JSON to stderr only.
### Success Criteria
- [ ] Golden output matches expected JSON exactly.
- [ ] Error cases emit structured JSON on stderr and no stdout payload.
- [ ] Determinism verified for repeated runs.
- [ ] No file content reads; central directory only.
---
## Step 2: Toy B — TUI List (HIGH)
### Step 2.a: Write Tests
- Startup: Given JSON array of ~100 strings on stdin, TUI initializes and exits cleanly on immediate quit; stdout may emit `{ "ok": true }`.
- Stress: Given ≥10k items, navigation remains smooth (manual check); startup is non-blocking.
- Error: TUI init failure emits `ERR_TUI_INIT` to stderr; no stdout payload.
### Step 2.b: Implement
- Use `terminal-kit` to render a scrollable list from stdin strings.
- Ensure non-blocking input handling; leave terminal state clean on exit.
- Keep Stage 1 output minimal (e.g., `{ "ok": true }`); finalize selection semantics post-integration.
### Success Criteria
- [ ] TUI starts, renders, and exits cleanly on quit.
- [ ] Stress navigation does not visibly lag (subjective but observable).
- [ ] Structured errors on init failure.
---
## Out of Scope (Stage 1)
- Integration (Stage 2) — combining A + B
- Metadata panel, previewer, image stub, search/filter (later stages)
---
## Risks & Mitigations
- Large ZIPs: stream central directory and avoid content reads to preserve memory.
- TUI responsiveness: keep drawing minimal; avoid synchronous blocking operations.
- Terminal variance: handle init errors gracefully; restore terminal state on exit.
---
## Completion Check
- [ ] Step 1 success criteria all pass
- [ ] Step 2 success criteria all pass
- [ ] SPEC and PLAN reflect actual behavior
*** End of Stage 1 PLAN ***
Archive Browser Code Map
This example demonstrates a complete CODE_MAP.md file from the Archive Browser project. This architectural documentation provided ongoing orientation for both human developers and AI agents throughout development of the shipped NPM package.
# CODE_MAP.md
This document orients you to the project structure, what each file does, and how the pieces fit together at a high level.
## Architecture Overview
- Purpose: Terminal-based tools to explore ChatGPT export ZIPs quickly with keyboard-centric UX.
- Design: Small composable CLIs built on a thin library layer.
- UI: `terminal-kit` powers list menus, panes, and key handling.
- ZIP I/O: `yauzl` streams metadata and file contents without full extraction.
- GPT utils: Helpers reduce OpenAI export mappings into readable message sequences.
- OS helpers: Minimal macOS integration for "open externally" convenience.
- Data Flow (typical):
1. CLI parses args or JSON from stdin (`lib/io.js`).
2. ZIP operations query or stream entries (`lib/zip.js`).
3. TUI renders lists/panels and handles keys (`lib/terminal.js`).
4. Optional: spawn specialized viewers (JSON tree, GPT browser) or export to files.
## Key Directories
- `cli/`: Executable entry points for each tool (users run these via npm scripts).
- `lib/`: Reusable helpers for ZIP I/O, GPT export reduction, terminal UI, and small OS shims.
- `backup1/`: Example data from a ChatGPT export (for local dev/testing).
- `zips/`: Sample export ZIPs used by the tools.
- `artifacts/`: Logs/JSON artifacts from runs.
## Libraries (`lib/`)
- `lib/zip.js`
- Thin wrappers around `yauzl` for reading ZIPs:
- `listNames(zipPath)`: stream all entry names.
- `readMetadata(zipPath)`: emit metadata for each entry (name, sizes, method, crc32, directory flag, last_modified).
- `readEntryText(zipPath, entryName)`: read a specific entry as UTF-8 text.
- `extractEntry(zipPath, entryName, destPath)`: extract one entry to disk.
- `extractToTemp(zipPath, entryName, prefix)`: extract a single entry into a unique temp dir; returns `{ dir, file }`.
- `cleanupTemp(path)`: best-effort recursive removal.
- `lib/gpt.js`
- Utilities to transform ChatGPT export structures:
- `extractTextFromContent(content)`: normalize message content to text (handles common shapes, multimodal parts).
- `extractAuthor(message)`: infer `user`/`assistant`/fallback from message author fields.
- `buildMainPathIds(mapping, currentNodeId)`: follow parent pointers to root.
- `autoDetectLeafId(mapping)`: choose a reasonable leaf when `current_node` is absent.
- `reduceMappingToMessages(mapping, { currentNodeId, includeRoles })`: produce a minimal `[{ author, text }]` sequence along the main path.
- `buildPlainTextTranscript(messages)`: render a readable transcript string.
- `exportConversationPlain(title, messages)`: write a plain-text transcript to `exports/` and return the file path.
- `lib/open_external.js`
- `openExternal(path)`: cross‑platform opener. macOS: `open`; Windows: `cmd /c start`; Linux/Unix: try `xdg-open`, `gio open`, `gnome-open`, `kde-open`, `wslview`.
- `lib/terminal.js`
- Terminal helpers built on `terminal-kit`:
- Exported `term` instance plus utilities: `paneWidth`, `status`, `statusKeys`, `statusSearch`, `printHighlighted`, `drawMetaPanel`.
- Cursor/cleanup: `ensureCursorOnExit()`, `restoreCursor()`, `terminalEnter()`, `terminalLeave()`.
- Menus: `withMenu()` (callback-based singleColumnMenu), `listMenu()` (fixed viewport, keyboard-driven; supports highlight query), `listMenuWrapped()` (wrapped multi-line items), `makeListSearch()` (reusable "/" + n/N search wiring for lists), `wrapLines()` (text wrapper).
- Note: `cursorToggle()` wraps terminal-kit’s `hideCursor()`, which actually toggles cursor visibility (misnamed upstream). Our name reflects the real behavior.
- `lib/io.js`
- I/O utilities shared by CLIs:
- `emitError(type, message, hint)`: structured JSON errors to stderr.
- `readStdin()`: read entire stdin as UTF-8.
- `resolvePathFromArgOrStdin({ key })`: accept arg or JSON stdin for paths.
- `jsonParseSafe(text)`: tolerant JSON parse.
- `safeFilename(title)`: sanitize to a portable file name base.
- `ensureDir(dirPath)`, `writeFileUnique(dir, base, ext, content)`: mkdir -p + non-clobbering writes.
- `lib/viewers.js`
- `showJsonTreeFile(filePath)`: spawn the JSON tree viewer for a file.
- `showJsonTreeFromObject(obj, opts)`: write a temp JSON and spawn the viewer; cleans up temp dir.
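As a rough composition example (not code from the repository), the GPT transcript-export path might be wired together like this; it assumes the signatures summarized above and treats the ZIP and export helpers as Promise-friendly, so details may differ from the actual code.

```js
// Sketch: export the first conversation of a ChatGPT export ZIP as plain text,
// composing the lib/ helpers described above. Signatures are assumed from this map.
import { readEntryText } from './lib/zip.js';
import {
  reduceMappingToMessages,
  buildPlainTextTranscript,
  exportConversationPlain,
} from './lib/gpt.js';

const zipPath = process.argv[2]; // path to a ChatGPT export ZIP

// conversations.json is an array of conversations, each carrying a `mapping` graph.
const conversations = JSON.parse(await readEntryText(zipPath, 'conversations.json'));
const first = conversations[0];

// Walk the main path through the mapping and keep user/assistant messages.
const messages = reduceMappingToMessages(first.mapping, {
  currentNodeId: first.current_node,
  includeRoles: ['user', 'assistant'], // assumed shape of includeRoles
});

console.log(buildPlainTextTranscript(messages));                   // readable transcript
const file = await exportConversationPlain(first.title, messages); // writes under exports/
console.log(`exported: ${file}`);
```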
## CLIs (`cli/`)
- `cli/zipmeta.js`
- Emits ZIP metadata as JSON. Accepts path via argv or stdin JSON (`{"zip_path":"..."}`) and writes an array of entry objects to stdout.
- `cli/listzip.js`
- Scrollable list of ZIP entry names using `terminal-kit`'s single-column menu. Pure navigation; Enter exits.
- `cli/tuilist.js`
- Scrollable list sourced from stdin JSON array of strings. Useful as a standalone picker component.
- `cli/jsontree.js`
- Inline JSON tree viewer with expand/collapse, arrow/j/k/h/l movement, page/top/bottom, and a compact status line.
- Accepts a file path arg or JSON via stdin.
- `cli/browsezip.js`
- Two-pane browser for a ZIP:
- Left: instant-scroll list of entries (keyboard-driven viewport).
- Right: live metadata panel (`drawMetaPanel`).
- Enter: inline JSON tree preview for `*.json` (extracts to temp, spawns JSON tree via `lib/viewers.js`).
- `o`: open highlighted entry externally (extracts to temp, calls macOS `open`).
- `v`: if `conversations.json` at ZIP root, launches `cli/gptbrowser.js` for a specialized view.
- `cli/mapping_reduce.js`
- Utility to convert a `mapping` (or an item from `conversations.json`) into a minimal message sequence JSON.
- `cli/gptbrowser.js`
- Specialized viewer for ChatGPT export ZIPs:
- Reads `conversations.json` from the ZIP, shows a searchable conversation list with live match highlighting.
- Opens a conversation into a large scrollable text view with role-colored lines, next/prev message jumps, and in-text search (`/`, `n/N`).
- Export: writes plain-text transcripts to `exports/<title>.txt` (non-clobbering).
## Root and Supporting Files
- `package.json`
- Declares `type: module`, `bin` entries for each CLI, and npm scripts for local runs.
- Dependencies: `terminal-kit`, `yauzl`, `string-kit`.
- `README.md`
- User-facing overview, requirements, usage examples, and key bindings.
- `KICKOFF.md`
- Project kickoff notes (context and initial direction).
- `split_chapter.sh`
- Shell script utility (repo-local helper; not used by the CLIs).
- `readme.html`
- HTML export of the README for viewing in a browser.
- Data/example folders:
- `backup1/`: Sample JSON files (e.g., `conversations.json`) for local experiments.
- `zips/`: Example ChatGPT export ZIPs.
- `artifacts/`: Run logs and metadata captures.
## How Things Work Together
- ZIP Browsing Path
- `cli/browsezip.js` → `lib/io.js` (args) → `lib/zip.js` (metadata) → `lib/terminal.js` (list + panel) →
- Enter: extract + spawn `cli/jsontree.js` for JSON.
- `o`: extract + `lib/open_external.js` to open externally.
- `v`: spawn `cli/gptbrowser.js` for GPT-specialized browsing.
- GPT Browsing Path
- `cli/gptbrowser.js` → `lib/zip.js.readEntryText('conversations.json')` → `lib/gpt.js.reduceMappingToMessages()` →
UI render via `lib/terminal.js` (search, highlight, navigation) →
Optional export via `lib/io.js.writeFileUnique()`.
- Safety and UX
- Cursor visibility and TTY cleanup are handled centrally by `ensureCursorOnExit()`.
- CLIs print structured JSON errors to stderr for deterministic automation.
- Temporary files are isolated and cleaned when possible; external open uses a temp cache on macOS.
## Notes / Known Issues
- `cli/gptbrowser.js` references `statusSearch` in its message view but must import it from `lib/terminal.js` to avoid a `ReferenceError`.
- Cursor control: terminal-kit's `hideCursor()` is a toggle; we expose it as `cursorToggle()` to make that explicit. Cursor is restored on exit by `ensureCursorOnExit()`.
- Inline preview in `cli/browsezip.js` is JSON-only; consider extending to small text types (`.txt`, `.md`) or surfacing a status hint when Enter does nothing.
Case Study II: Spatial MUD Database
Multi-Scale Spatial Architecture for MUDs
This case study documents the application of DocDD to a spatial reasoning system for text-based virtual worlds. The project involved multi-scale spatial coordination, AI-guided world generation, and algorithmic spatial reasoning.
The work demonstrates how toy model discipline can address complex technical challenges through systematic experimentation and integration.
Foundation: Four Validated Spatial Prototypes
The project began with four successful toy models, each validating a specific aspect of multi-scale spatial architecture:
Toy 1 (Blueprint Parser): Text-based spatial planning interface that isolates coordinate abstraction challenges.
Toy 2 (Spatial Graph Operations): Room manipulation system that validates graph-based approaches to spatial relationships.
Toy 3 (Scout System): AI-driven content generation that explores LLM integration for procedural world building.
Toy 4 (Indoor/Outdoor Glue): Scale-bridging system that addresses multi-level spatial coordination challenges.
Result: All four systems validated their core concepts with test coverage and error handling. This provided an experimental foundation spanning room-level detail to world-scale geography.
Integration Challenge: Toy5 (Outdoor Integration)
With four validated individual systems, the project addressed integrating the Scout system (Toy 3) with the Indoor/Outdoor Glue (Toy 4) to enable LLM-guided hierarchical world subdivision. This integration required:
- Semantic spatial reasoning: LLMs understanding and maintaining geographic consistency
- Bidirectional data flow: Scout observations driving quadtree subdivision decisions
- Format compatibility: Ensuring clean data exchange between systems designed independently
- Emergent spatial logic: Geographic constraints creating self-reinforcing consistency
This required moving from validated individual components to a system involving AI collaboration and spatial reasoning.
Technical Breakthroughs: Three Critical Experiments
The integration was addressed through three experiments that isolated specific technical risks:
Experiment 5a: Hierarchical Subdivision
Challenge: LLM-guided semantic splitting of world quadrants based on geographic observations.
Breakthrough: LLMs exhibit natural spatial reasoning patterns that systems should accommodate rather than constrain. Working with AI cognitive tendencies proved more effective than forcing predetermined formats.
Critical Discovery: Configuration consistency across integration points proved essential for reliable behavior. Silent failures caused by mismatched settings underscored the need for explicit validation at every boundary.
Experiment 5b: Geographic Constraints
Challenge: Maintaining spatial consistency when LLMs generate procedural content.
Outcome: Strategic pivot away from complex constraint systems toward simpler, more reliable approaches.
Learning: For systematic world-building, LLM creativity becomes a liability rather than an asset. Predictability and constraint adherence matter more than narrative richness.
Experiment 5c: Scout Path Iteration
Challenge: Bidirectional spatial consistency: scout observations create geographic constraints that guide future observations.
Breakthrough: Bidirectional information flow creates emergent consistency. When system outputs become inputs for subsequent operations, careful design can achieve self-reinforcing reliability rather than accumulated drift.
Technical Discovery: Constrained AI creativity often produces more reliable results than enhanced creativity. Prompt engineering with explicit constraints and deterministic settings proved essential for reliable spatial reasoning.
Architecture Validation
The experiments validated multi-scale system coordination patterns. Each scale handled different aspects of the problem domain while maintaining clean integration boundaries. The toy model approach allowed isolated validation of individual scales before attempting integration, significantly reducing the complexity of debugging multi-system interactions.
Methodology Insights
Constraint Over Creativity: Effective AI collaboration required constraining rather than enhancing LLM creativity. This involved prompt engineering to establish clear boundaries and consistent behavior patterns.
Integration-First Testing: The most dangerous bugs occurred at system boundaries: format-compatibility issues that worked in isolation but failed silently during integration. Comprehensive data-flow validation became the highest-priority testing strategy.
Adaptive Development Cycles: Natural spatial subdivision required flexible progression (3 reports instead of the planned 2), demonstrating that rigid iteration counts don't match organic spatial reasoning patterns.
Experimental Impact
The spatial architecture demonstrates DocDD's effectiveness for complex technical challenges:
Proof of Concept: Demonstrates that DocDD's toy model discipline scales to genuinely complex technical challenges involving AI collaboration and multi-system integration.
Methodology Validation: Shows how systematic experimentation through focused toys can tackle problems that would be overwhelming as monolithic projects.
Process Insights: Reveals patterns for AI collaboration, integration testing priorities, and iterative refinement that apply beyond the specific domain.
Next Phase: Integration with Evennia MUD framework to validate DocDD's effectiveness for legacy system integration.
Complete technical details are in `/docs/legacy/INITIAL_LEARNINGS.md` and `/docs/legacy/EXPERIMENT_LEARNINGS.md`.
FAQ: Doc-Driven Development
What problem is DocDD actually solving?
It tames the chaos of AI-assisted coding done without shared context. DocDD adapts to the development phase: Discovery Mode handles uncertain work through systematic experimentation, while Execution Mode handles established systems through lightweight documentation and refactoring discipline. Both keep you in control.
Is this just another spec-first approach with fancy terms?
Nope. The key difference is: you don't write the docs — the AI does. In Discovery Mode, the AI generates comprehensive docs (specs, plans, tests, code) and you review/edit. In Execution Mode, the AI maintains CODE_MAP.md and handles refactoring. Either way, you stay in the editor role.
Why not just use a PRD and vibe code from there?
PRDs describe what to build, but not how. Without technical scaffolding, the AI will guess — and its guesses change session to session. DocDD provides that scaffolding through different approaches depending on whether you're discovering new patterns or executing within established ones.
What's the difference between a toy model and a prototype?
Toy models are intentionally tiny and throwaway — built to learn, not to ship. They help validate structure or assumptions early. Prototypes often turn into half-baked production code. Toy models are lab experiments.
What does "one axis of complexity" mean?
It means keeping every step simple: build a new primitive, combine two things, or add one thing to an existing system. Nothing more. This keeps both you and the AI from getting overwhelmed.
Why JSON and CLI? Why not a full framework or GUI?
Because JSON + CLI = total visibility. You can inspect the whole state, write golden tests, and keep everything small and composable. Frameworks tend to hide structure — this makes it explicit.
Do I need to be an AI whisperer to use this?
Nope. You just need to get into the habit of asking the AI to explain itself — through Discovery Mode's four-document harness or Execution Mode's CODE_MAP.md and refactoring discipline. The system helps you keep it aligned.
Is this for solo devs or teams?
Works great for solo builders using AI as a partner. But the doc artifacts also make async teamwork smoother — people can ramp into the context just by reading the SPEC/PLAN.
What if I already wrote code — can I still apply this retroactively?
Yep. Just ask the AI to generate docs based on the existing codebase. Use that to lock in structure, then resume with the DocDD flow going forward. Think of it as reverse-engineering clarity.
Why is this worth the upfront structure? Doesn't it slow me down?
It feels slower on Day 1, but you gain huge speed by Day 5. You'll spend less time debugging, rewriting, and explaining stuff to the AI — because now it's building from a shared understanding.
How do I know when to use Discovery Mode vs. Execution Mode?
If you're unsure about the approach, requirements, or architecture, use Discovery Mode. If you're adding features to an established codebase with proven patterns, use Execution Mode. Most development is actually Execution work, so when in doubt, try Execution first and switch to Discovery if you hit uncertainty.
What do I do when the AI generates repetitive, verbose code despite asking for DRY principles?
Don't fight it upfront — let the AI write repetitive code until tests pass, then use the three-phase refactoring cycle: Generate to Green → Plan the Cleanup → Execute Refactoring. This works with AI tendencies rather than against them.
How often should I update my CODE_MAP.md? Every commit seems excessive.
Yes, every commit that affects the architecture — that cadence would be excessive for human engineers, but it's optimal for LLM agents. If a change requires the code map to change, update the map as part of that same commit. AI agents need current architectural context to make good decisions, and the economic shift makes constant updates feasible.