Add Plan Mode to replace TODO system

Plan Mode is a cognitive forcing system that requires reasoning about: - Happy path - Negative case - Boundary condition New tools: - plan_read: Read current plan for session - plan_write: Create/update plan with YAML content (validates structure) - plan_approve: Mark current revision as approved New command: - /feature <description>: Start Plan Mode for a new feature Plan schema requires: - plan_id, revision, approved_revision - items with id, description, state, touches, checks (happy/negative/boundary) - evidence and notes required when marking items done Verification: - plan_verify() called automatically when all items are done/blocked Removed: - todo_read, todo_write tools - todo.rs module and related tests
2026-02-02 14:38:25 +11:00
parent 7fc9eb0778
commit a63950d8f5
12 changed files with 997 additions and 942 deletions
--- a/crates/g3-core/src/prompts.rs
+++ b/crates/g3-core/src/prompts.rs
@@ -18,70 +18,92 @@ IMPORTANT: You must call tools to achieve goals. When you receive a request:
 For shell commands: Use the shell tool with the exact command needed. Always use `rg` (ripgrep) instead of `grep` - it's faster, has better defaults, and respects .gitignore. Avoid commands that produce a large amount of output, and consider piping those outputs to files. Example: If asked to list files, immediately call the shell tool with command parameter \"ls\".
 If you create temporary files for verification, place these in a subdir named 'tmp'. Do NOT pollute the current dir.";

-const SHARED_TODO_SECTION: &str = "\
-# Task Management with TODO Tools
+const SHARED_PLAN_SECTION: &str = "\
+# Task Management with Plan Mode

-**REQUIRED for multi-step tasks.** Use TODO tools when your task involves ANY of:
+**REQUIRED for multi-step tasks.** Use Plan Mode when your task involves ANY of:
 - Multiple files to create/modify (2+)
 - Multiple distinct steps (3+)
 - Dependencies between steps
 - Testing or verification needed
 - Uncertainty about approach

+Plan Mode is a cognitive forcing system that prevents:
+- Attention collapse
+- False claims of completeness
+- Happy-path-only implementations
+- Duplication/contradiction with existing code
+
 ## Workflow

-Every multi-step task follows this pattern:
-1. **Start**: Call todo_read, then todo_write to create your plan
-2. **During**: Execute steps, then todo_read and todo_write to mark progress
-3. **End**: Call todo_read to verify all items complete
-4. **Finally**, call `remember` to save info on new features created or discovered
+1. **Draft**: Call `plan_read` to check for existing plan, then `plan_write` to create/update
+2. **Approval**: Ask user to approve before coding (\"'approve', or edit plan?\")
+3. **Execute**: Implement items, updating plan with `plan_write` to mark progress
+4. **Complete**: When all items are done/blocked, verification runs automatically
+5. **Remember**: Call `remember` to save discovered code locations

-Note: todo_write replaces the entire todo.g3.md file, so always read first to preserve content. TODO lists are scoped to the current session and stored in the session directory.
+## Plan Schema

-## Examples
+Each plan item MUST have:
+- `id`: Stable identifier (e.g., \"I1\", \"I2\")
+- `description`: What will be done
+- `state`: todo | doing | done | blocked
+- `touches`: Paths/modules this affects (forces \"where does this live?\")
+- `checks`: Three required perspectives:
+  - `happy`: {desc, target} - Normal successful operation
+  - `negative`: {desc, target} - Error handling, invalid input
+  - `boundary`: {desc, target} - Edge cases, limits
+- `evidence`: (required when done) File:line refs, test names
+- `notes`: (required when done) Short implementation explanation

-**Example 1: Feature Implementation**
-User asks: \"Add user authentication with tests\"
+## Rules

-First action:
-{\"tool\": \"todo_read\", \"args\": {}}
+When drafting a plan, you MUST:
+- Keep items ≤ 7 by default
+- Commit to where the work will live (touches)
+- Provide all three checks (happy, negative, boundary)

-Then create plan:
-{\"tool\": \"todo_write\", \"args\": {\"content\": \"- [ ] Add user authentication\\n  - [ ] Create User struct\\n  - [ ] Add login endpoint\\n  - [ ] Add password hashing\\n  - [ ] Write unit tests\\n  - [ ] Write integration tests\"}}
+When updating a plan:
+- Cannot remove items from an approved plan (mark as blocked instead)
+- Must provide evidence and notes when marking item as done

-After completing User struct:
-{\"tool\": \"todo_read\", \"args\": {}}
-{\"tool\": \"todo_write\", \"args\": {\"content\": \"- [ ] Add user authentication\\n  - [x] Create User struct\\n  - [ ] Add login endpoint\\n  - [ ] Add password hashing\\n  - [ ] Write unit tests\\n  - [ ] Write integration tests\"}}
+## Example Plan Item

-**Example 2: Bug Fix**
-User asks: \"Fix the memory leak in cache module\"
+```yaml
+- id: I1
+  description: \"Add CSV import for comic book metadata\"
+  state: todo
+  touches: [\"src/import\", \"src/library\"]
+  checks:
+    happy:
+      desc: \"Valid CSV imports 3 comics\"
+      target: \"import::csv\"
+    negative:
+      desc: \"Missing column errors with MissingColumn\"
+      target: \"import::csv\"
+    boundary:
+      desc: \"Empty file yields empty import without error\"
+      target: \"import::csv\"
+```

-{\"tool\": \"todo_read\", \"args\": {}}
-{\"tool\": \"todo_write\", \"args\": {\"content\": \"- [ ] Fix memory leak\\n  - [ ] Review cache.rs\\n  - [ ] Check for unclosed resources\\n  - [ ] Add drop implementation\\n  - [ ] Write test to verify fix\"}}
-
-**Example 3: Refactoring**
-User asks: \"Refactor database layer to use async/await\"
-
-{\"tool\": \"todo_read\", \"args\": {}}
-{\"tool\": \"todo_write\", \"args\": {\"content\": \"- [ ] Refactor to async\\n  - [ ] Update function signatures\\n  - [ ] Replace blocking calls\\n  - [ ] Update all callers\\n  - [ ] Update tests\"}}
-
-## Format
-
-Use markdown checkboxes:
- \"- [ ]\" for incomplete tasks
- \"- [x]\" for completed tasks
- Indent with 2 spaces for subtasks
-
-Keep items short, specific, and action-oriented.
+When done, add evidence and notes:
+```yaml
+  state: done
+  evidence:
+    - \"src/import/csv.rs:42-118\"
+    - \"tests/import_csv.rs::test_valid_csv\"
+  notes: \"Extended existing parser instead of creating duplicate\"
+```

 ## Benefits

 ✓ Prevents missed steps
 ✓ Makes progress visible
 ✓ Helps recover from interruptions
-✓ Creates better summaries
+✓ Forces consideration of edge cases
+✓ Provides audit trail with evidence

-If you can complete it with 1-2 tool calls, skip TODO.";
+If you can complete it with 1-2 tool calls, skip Plan Mode.";

 const SHARED_TEMPORARY_FILES: &str = "\
 # Temporary files
@@ -153,7 +175,7 @@ Do NOT save duplicates - check the Workspace Memory section (loaded at startup)

 After discovering how session continuation works:

-{\"tool\": \"remember\", \"args\": {\"notes\": \"### Session Continuation\\nSave/restore session state across g3 invocations using symlink-based approach.\\n\\n- `crates/g3-core/src/session_continuation.rs`\\n  - `SessionContinuation` [850..2100] - artifact struct with session state, TODO snapshot, context %\\n  - `save_continuation()` [5765..7200] - saves to `.g3/sessions/<id>/latest.json`, updates symlink\\n  - `load_continuation()` [7250..8900] - follows `.g3/session` symlink to restore\\n  - `find_incomplete_agent_session()` [10500..13200] - finds sessions with incomplete TODOs for agent resume\"}}
+{\"tool\": \"remember\", \"args\": {\"notes\": \"### Session Continuation\\nSave/restore session state across g3 invocations using symlink-based approach.\\n\\n- `crates/g3-core/src/session_continuation.rs`\\n  - `SessionContinuation` [850..2100] - artifact struct with session state, plan snapshot, context %\\n  - `save_continuation()` [5765..7200] - saves to `.g3/sessions/<id>/latest.json`, updates symlink\\n  - `load_continuation()` [7250..8900] - follows `.g3/session` symlink to restore\\n  - `find_incomplete_agent_session()` [10500..13200] - finds sessions with incomplete plans for agent resume\"}}

 After discovering a useful pattern:

@@ -213,13 +235,17 @@ Short description for providers without native calling specs:
  - Format: {\"tool\": \"str_replace\", \"args\": {\"file_path\": \"path/to/file\", \"diff\": \"--- old\\n-old text\\n+++ new\\n+new text\"}}
  - Example: {\"tool\": \"str_replace\", \"args\": {\"file_path\": \"src/main.rs\", \"diff\": \"--- old\\n-old_code();\\n+++ new\\n+new_code();\"}}

- **todo_read**: Read the current session's TODO list from todo.g3.md (session-scoped)
-  - Format: {\"tool\": \"todo_read\", \"args\": {}}
-  - Example: {\"tool\": \"todo_read\", \"args\": {}}
+- **plan_read**: Read the current Plan for this session
+  - Format: {\"tool\": \"plan_read\", \"args\": {}}
+  - Example: {\"tool\": \"plan_read\", \"args\": {}}

- **todo_write**: Write or overwrite the session's todo.g3.md file (WARNING: overwrites completely, always read first)
-  - Format: {\"tool\": \"todo_write\", \"args\": {\"content\": \"- [ ] Task 1\\n- [ ] Task 2\"}}
-  - Example: {\"tool\": \"todo_write\", \"args\": {\"content\": \"- [ ] Implement feature\\n  - [ ] Write tests\\n  - [ ] Run tests\"}}
+- **plan_write**: Create or update the Plan with YAML content
+  - Format: {\"tool\": \"plan_write\", \"args\": {\"plan\": \"plan_id: my-plan\\nitems: [...]\"}}
+  - Example: {\"tool\": \"plan_write\", \"args\": {\"plan\": \"plan_id: feature-x\\nitems:\\n  - id: I1\\n    description: Add feature\\n    state: todo\\n    touches: [src/lib.rs]\\n    checks:\\n      happy: {desc: Works, target: lib}\\n      negative: {desc: Errors, target: lib}\\n      boundary: {desc: Edge, target: lib}\"}}
+
+- **plan_approve**: Approve the current plan revision (called by user)
+  - Format: {\"tool\": \"plan_approve\", \"args\": {}}
+  - Example: {\"tool\": \"plan_approve\", \"args\": {}}

 - **code_search**: Syntax-aware code search using tree-sitter. Supports Rust, Python, JavaScript, TypeScript.
  - Format: {\"tool\": \"code_search\", \"args\": {\"searches\": [{\"name\": \"label\", \"query\": \"tree-sitter query\", \"language\": \"rust|python|javascript|typescript\", \"paths\": [\"src/\"], \"context_lines\": 0}]}}
@@ -269,11 +295,6 @@ write_file(\"file2.txt\", \"...\")
 write_file(\"helper.rs\", \"...\")
 [DONE]";

-const NON_NATIVE_TODO_ADDENDUM: &str = "
-
-IMPORTANT: If you are provided with a SHA256 hash of the requirements file, you MUST include it as the very first line of the todo.g3.md file in the following format:
-`{{Based on the requirements file with SHA256: <SHA>}}`
-This ensures the TODO list is tracked against the specific version of requirements it was generated from.";

 // ============================================================================
 // COMPOSED PROMPTS
@@ -284,7 +305,7 @@ pub fn get_system_prompt_for_native() -> String {
    format!(
        "{}\n\n{}\n\n{}\n\n{}\n\n{}\n\n{}",
        SHARED_INTRO,
-        SHARED_TODO_SECTION,
+        SHARED_PLAN_SECTION,
        SHARED_TEMPORARY_FILES,
        SHARED_WEB_RESEARCH,
        SHARED_WORKSPACE_MEMORY,
@@ -295,12 +316,11 @@ pub fn get_system_prompt_for_native() -> String {
 /// System prompt for providers without native tool calling (embedded models)
 pub fn get_system_prompt_for_non_native() -> String {
    format!(
-        "{}\n\n{}\n\n{}\n\n{}{}\n\n{}\n\n{}\n\n{}",
+        "{}\n\n{}\n\n{}\n\n{}\n\n{}\n\n{}\n\n{}",
        SHARED_INTRO,
        NON_NATIVE_TOOL_FORMAT,
        NON_NATIVE_INSTRUCTIONS,
-        SHARED_TODO_SECTION,
-        NON_NATIVE_TODO_ADDENDUM,
+        SHARED_PLAN_SECTION,
        SHARED_WEB_RESEARCH,
        SHARED_WORKSPACE_MEMORY,
        SHARED_RESPONSE_GUIDELINES
@@ -311,7 +331,7 @@ pub fn get_system_prompt_for_non_native() -> String {
 const G3_IDENTITY_LINE: &str = "You are G3, an AI programming agent of the same skill level as a seasoned engineer at a major technology company. You analyze given tasks and write code to achieve goals.";

 /// Generate a system prompt for agent mode by combining the agent's custom prompt
-/// with the full G3 system prompt (including TODO tools, code search, webdriver, coding style, etc.)
+/// with the full G3 system prompt (including plan tools, code search, webdriver, coding style, etc.)
 ///
 /// The agent_prompt replaces only the G3 identity line at the start of the prompt.
 /// Everything else (tool instructions, coding guidelines, etc.) is preserved.
@@ -374,12 +394,12 @@ mod tests {
    }

    #[test]
-    fn test_both_prompts_have_todo_section() {
+    fn test_both_prompts_have_plan_section() {
        let native = get_system_prompt_for_native();
        let non_native = get_system_prompt_for_non_native();
        
-        assert!(native.contains("# Task Management with TODO Tools"));
-        assert!(non_native.contains("# Task Management with TODO Tools"));
+        assert!(native.contains("# Task Management with Plan Mode"));
+        assert!(non_native.contains("# Task Management with Plan Mode"));
    }

    #[test]