When a plan reaches a terminal state (all items done or blocked) in
interactive mode, automatically exit plan mode and return to normal
prompt.
Changes:
- Add Agent::is_plan_terminal() method to check if plan is complete
- Add check_and_exit_plan_mode_if_terminal() helper in interactive.rs
- Call the helper after each execute_user_input() to detect completion
Fixes issue where plan mode prompt ' >> ' persisted after plan completion.
- Fix build warnings: add #[allow(dead_code)] to unused deserialization fields
- Fix plan approval gate bug: block file changes when no plan exists (not just
when plan exists but is unapproved)
- Add "Create a plan: " prefix to first user message in plan mode
- Add prepare_plan_mode_input() helper function for testability
- Reset is_first_plan_message flag when entering plan mode via /plan command
- Add tests for approval gate (no plan + no changes, no plan + changes)
- Add tests for prepare_plan_mode_input (happy, negative, boundary cases)
- Add format_envelope_markdown() function in invariants.rs for rich markdown
formatting of ActionEnvelope facts
- Add format_yaml_value_markdown() helper for recursive YAML value display
- Update execute_plan_read() to append rulespec and envelope sections
- Update execute_plan_write() to append envelope section alongside rulespec
- Add 3 tests for format_envelope_markdown (empty, with facts, null values)
When plan_read or plan_write is called, the output now includes:
- Plan YAML (as before)
- Rulespec section (if rulespec.yaml exists) with invariants grouped by source
- Envelope section (if envelope.yaml exists) with facts in readable format
Missing files show placeholder text rather than errors.
- Rewrite SKILL.md with inline instructions to spawn g3 --agent scout directly
- Extend read_file to handle embedded skill paths (<embedded:name>/SKILL.md)
- Remove scripts field from EmbeddedSkill struct (no longer needed)
- Delete extraction.rs module (was only for script extraction)
- Delete g3-research bash script
- Remove obsolete Async Research Tool section from workspace memory
Skills are now fully portable - they work when g3 is installed as a
binary without access to source files. Agents can read embedded skill
content via read_file with the special <embedded:...> path syntax.
- Remove is_embedded_skill() from discovery.rs (unused)
- Remove get_embedded_skills_map() from embedded.rs (unused)
- Remove associated tests for deleted functions
- Inline path check in test_repo_overrides_embedded test
This eliminates dead code warnings and reduces module surface area
without changing any behavior.
Agent: fowler
- Web Research instructions now come from skills/research/SKILL.md
- Skills are dynamically loaded and injected via generate_skills_prompt()
- Remove test_both_prompts_have_web_research test (no longer applicable)
- Remove unused G3Status::research_complete() function
This completes the externalization of research as a skill.
Replaces the built-in research/research_status tools with a portable
skill-based approach:
- Add embedded skills infrastructure (skills compiled into binary)
- Add repo-local skills/ directory support (highest priority)
- Create research skill with SKILL.md and g3-research shell script
- Script extraction to .g3/bin/ with version tracking
- Filesystem-based handoff via .g3/research/<id>/status.json
- Remove PendingResearchManager and all research tool code
- Update system prompt to reference skill instead of tool
Benefits:
- No special tool infrastructure needed (just shell + read_file)
- Context-efficient (reports stay on disk until needed)
- Crash-resilient (state persisted to filesystem)
- Portable (skill can be overridden per-workspace)
Breaking change: research tool calls now return a deprecation message
pointing to the research skill.
- Move system prompt for native tool calling models to prompts/system/native.md
- Use include_str! to embed at compile time
- Remove concatenated SHARED_* string constants
- Prompt is now readable/editable as a complete markdown document
- Non-native prompt still uses Rust constants (acceptable for now)
- Add in_plan_mode flag to Agent struct
- Add set_plan_mode() and is_plan_mode() methods
- Gate check now only runs when in_plan_mode is true
- CLI calls set_plan_mode(true) on /plan command and EnterPlanMode
- CLI calls set_plan_mode(false) on approval and CTRL-D exit
- Update integration test to enable plan mode
- Fix test YAML to use Vec<Check> for negative/boundary checks
Change Plan Mode to allow multiple negative and boundary checks per item,
while keeping happy path as a single check.
Schema change:
- checks.negative: Check -> Vec<Check> (>=1 required)
- checks.boundary: Check -> Vec<Check> (>=1 required)
- checks.happy: Check (unchanged, single)
This better reflects real-world tasks where there are often multiple
error conditions and edge cases worth tracking.
Changes:
- Update Checks struct to use Vec<Check> for negative/boundary
- Update validation to require at least 1 of each
- Update prompts and tool definitions with new array syntax
- Add 4 new tests for multi-check scenarios
- Add check_plan_approval_gate() in tools/plan.rs that runs after each tool call
- Detects file changes via git status --porcelain when plan exists but not approved
- Reverts changes: git checkout for modified files, rm for new untracked files
- Returns blocking message instructing LLM to create/approve plan first
- Add ApprovalGateResult enum with Allowed/Blocked/NotGitRepo variants
- Add set_session_id() and set_working_dir() methods on Agent for testing
- Add integration test using MockProvider to simulate blocked write_file
- Add --resume CLI flag that conflicts with --new-session
- Add load_continuation_by_id() to load sessions by full or partial ID
- Support loading from latest.json or falling back to session.json
- Handle --resume in both normal and agent modes
- Agent mode validates session belongs to correct agent
Adds rulespec.yaml and envelope.yaml support for machine-readable
invariant checking during plan completion.
- Add invariants module with Rulespec, ActionEnvelope, and evaluation logic
- Add Invariants section to system prompt with workflow instructions
- Show rulespec/envelope file status in plan verification output
- Rulespec written during planning (captures constraints from task)
- Envelope written after implementation (documents what was built)
Implements the Agent Skills specification (https://agentskills.io) for
portable skill packages that give the agent new capabilities.
Changes:
- Add skills module with SKILL.md parser (YAML frontmatter + markdown body)
- Implement skill discovery from ~/.g3/skills/, config extra_paths, and .g3/skills/
- Generate <available_skills> XML for system prompt injection
- Add SkillsConfig to g3-config with enabled flag and extra_paths
- Wire skills discovery into CLI startup
- Add 29 unit tests for parser, discovery, and prompt generation
- Update README with Agent Skills documentation
Skill locations (priority order):
1. ~/.g3/skills/ (global)
2. Config extra_paths
3. .g3/skills/ (workspace, highest priority)
At startup, g3 scans skill directories and injects a summary into the
system prompt. When the agent needs a skill, it reads the full SKILL.md
using the read_file tool.
- Add auto-approval logic in execute_plan_write() when ctx.is_autonomous is true
- Update system prompt to document auto-approval behavior
- Plans still require explicit approval in interactive mode
Added plan_approve to the compact tool list in format_tool_result_summary()
so it displays in the same format as other tools like read_file and write_file.
The format_plan_approve_summary() function already existed but was never
called because plan_approve was missing from the matches! block.
Plan tools (plan_read, plan_write) now display with elegant tree-style
formatting similar to the old todo_write UI:
- State indicators: □ (todo), ◐ (doing), ■ (done), ⊘ (blocked)
- Tree prefixes (├/└) for items with child details
- Strikethrough for completed items
- Shows touches and all three checks (happy/negative/boundary)
- Displays plan file path link at the end
plan_approve uses compact single-line format like read_file:
- Shows approval status and revision number
- Handles already-approved and error cases
Changes:
- Add print_plan_compact() to UiWriter trait with default impl
- Implement print_plan_compact() in ConsoleUiWriter
- Call print_plan_compact() from execute_plan_read/write
- Add plan_read/plan_write to is_self_handled_tool()
- Add plan_approve to is_compact_tool() with format_plan_approve_summary()
- Add serde_yaml dependency to g3-cli
Adds a verification system that checks evidence in completed plan items:
- Evidence parsing: supports code locations (file:line, file:line-line, file only)
and test references (file::test_name)
- Code location verification: checks file exists, validates line numbers in range
- Test reference verification: checks test file exists, searches for fn pattern
- Verification results: Verified, Warning, Error, Skipped statuses
- Loud output formatting with emoji indicators for warnings/errors
- Integration with execute_plan_write(): runs when plan is complete and approved
- 12 new unit tests covering parsing and verification
Warnings are advisory (don't block), errors are loud but also don't block.
Blocked items are skipped during verification.
Plan Mode is a cognitive forcing system that requires reasoning about:
- Happy path
- Negative case
- Boundary condition
New tools:
- plan_read: Read current plan for session
- plan_write: Create/update plan with YAML content (validates structure)
- plan_approve: Mark current revision as approved
New command:
- /feature <description>: Start Plan Mode for a new feature
Plan schema requires:
- plan_id, revision, approved_revision
- items with id, description, state, touches, checks (happy/negative/boundary)
- evidence and notes required when marking items done
Verification:
- plan_verify() called automatically when all items are done/blocked
Removed:
- todo_read, todo_write tools
- todo.rs module and related tests
Fixes issues in the last 11 commits:
1. pending_research.rs: Fix flaky test_generate_id_uniqueness
- Replaced random u16 suffix with atomic counter for guaranteed uniqueness
- The timestamp+random approach could collide when generating IDs rapidly
- Now uses static AtomicU32 counter that increments monotonically
2. embedded/adapters/glm.rs: Remove unused in_code_fence field
- Field was written but never read (dead code)
- Removed from struct definition, constructor, and reset()
3. embedded/adapters/glm.rs: Fix orphaned tests
- Two tests (test_strip_code_fences, test_code_fenced_tool_call) were
outside the #[cfg(test)] mod tests block
- Moved closing brace to include them in the test module
All 446 library tests pass.
Agent: fowler
When the LLM 'stutters' and emits incomplete tool call fragments like:
{"tool": "shell", "args": {...}}
{"tool":
{"tool": "shell", "args": {...}}
The parser would get stuck waiting for the incomplete fragment to complete,
causing the entire response to be lost (no tool executed, no text displayed).
This was observed in butler session butler_c6ab59af2e4f991c where the user's
'send!' command produced no response.
Fix: Enhanced is_json_invalidated() to detect when a new tool call pattern
({"tool"}) appears after a newline while parsing an incomplete JSON fragment.
This indicates the previous fragment was abandoned and should be invalidated.
Safety:
- Tool patterns inside JSON strings (e.g., writing example code) are not
affected because the check only runs outside strings
- Added tests for the stuttering pattern and the file-writing edge case
When background research completes, g3 now immediately prints a status
message instead of waiting for the next user interaction:
- Added ResearchCompletionNotification and broadcast channel to
PendingResearchManager for push-based notifications
- Added spawn_research_notification_handler() in interactive mode that
listens for completions in a background task
- When idle (at prompt): clears line, prints status, reprints prompt
- When busy (processing): prints status inline (interleaving is fine)
- Added G3Status::research_complete() for consistent formatting
- Added enable_research_notifications() method to Agent
Output format: "g3: 1 research report ... [done]"
The research tool now spawns the scout agent in a background tokio task
and returns immediately with a research_id placeholder. This allows the
agent to continue working while research runs (30-120 seconds).
Key changes:
- New PendingResearchManager for tracking async research tasks
- research tool returns immediately with placeholder containing research_id
- research_status tool to check progress of pending research
- Auto-injection of completed research at natural break points:
- Start of each tool iteration (before LLM call)
- Before prompting user in interactive mode
- /research CLI command to list all research tasks
- Updated system prompt to explain async behavior
The agent can:
- Continue with other work while research runs
- Check status with research_status tool
- Yield turn to user if results are critical before continuing
README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.
Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names
The LLM can still read README.md on-demand via read_file when needed.
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.
Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.
After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()
Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`
Net reduction: ~43 lines (-109/+66)
All 190 tests pass.
Agent: fowler
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
- API call count and cache hit count
- Hit rate percentage
- Total input tokens and cache read/creation tokens
- Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
- Fix test_rehydrate_success race condition by using UUID for unique session IDs
- Add #[serial] attribute to prevent parallel execution conflicts
- Improve cleanup to remove entire session directory tree
- Add characterization test for resize_image_to_dimensions fallback behavior
(documents fix from commit af8b849 for media type preservation)
Agent: hopper
The previous implementation added the summary as a System message, which
caused "Conversation must start with a user message" errors because the
first non-system message after compaction was Assistant (the preserved
last assistant message).
Fix: Change summary from System to User message, creating valid alternation:
[System Prompt] -> [Summary as USER] -> [Last Assistant] -> [Latest User]
This also prevents system message bloat across multiple compactions since
the summary is now part of the conversation flow and gets replaced on
each compaction.
Added test_second_compaction_no_bloat to verify no accumulation.
When context window compaction occurs, the last assistant message is now
preserved in addition to the system prompt, README, and summary. This
improves continuity after compaction by keeping the LLM's most recent
response, which often contains important context about what was just
done or what comes next.
New message order after compaction:
[System Prompt] -> [README/AGENTS.md] -> [ACD Stub?] -> [Summary] -> [Last Assistant] -> [Latest User?]
Changes:
- Add last_assistant_message field to PreservedMessages struct
- Modify extract_preserved_messages() to find last assistant message
- Modify reset_with_summary_and_stub() to include last assistant message
- Add comprehensive integration tests using MockProvider
Tests cover edge cases:
- No assistant message exists
- Tool-call-only assistant messages (still preserved)
- Multiple assistant messages (only last one preserved)
- No trailing user message
When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.
This caused Anthropic API errors like:
'Image does not match the provided media type image/jpeg'
Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.
Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md
11 files changed, ~36 occurrences updated.