1. str_replace: Show insertion/deletion counts with colors
   "✅ +N insertions | -M deletions" (green/red)
2. write_file: Compact format with human-readable sizes
   "✅ wrote N lines | Xk chars"
3. read_file: Cleaner format
   "🔍 N lines read" instead of "📄 File content (N lines)"
4. webdriver_quit: Show correct driver name (safaridriver vs chromedriver)
5. read_file: When start position exceeds file length, read last 100 chars
   with explanation instead of failing
6. shell: Remove redundant "Command failed:" prefix from error messages
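The read_file fallback in item 5 could look roughly like the sketch below. The helper name read_from, the TAIL_CHARS constant, the wording of the note, and the choice to count positions in characters rather than bytes are all assumptions made for illustration, not the actual implementation.

```rust
/// Number of trailing characters to return when the start is out of range
/// (100 per the commit message; the constant name is hypothetical).
const TAIL_CHARS: usize = 100;

/// Hypothetical helper: returns the text from `start` (a char offset) onward.
/// If `start` is past the end of the content, it falls back to the last
/// `TAIL_CHARS` characters and returns a note explaining what happened.
fn read_from(content: &str, start: usize) -> (String, Option<String>) {
    let total = content.chars().count();
    if start <= total {
        // Normal case: the requested range exists.
        return (content.chars().skip(start).collect(), None);
    }
    // Out-of-range start: show the tail instead of failing.
    let tail_start = total.saturating_sub(TAIL_CHARS);
    let tail: String = content.chars().skip(tail_start).collect();
    let note = format!(
        "start position {start} exceeds file length {total}; showing last {} chars",
        total - tail_start
    );
    (tail, Some(note))
}
```

Counting in characters keeps the fallback safe for multi-byte text; a byte-based variant would need to snap to a character boundary first.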
When a code block ended without a trailing newline after the closing ```,
two bugs occurred in flush_incomplete():
1. The closing ``` was included as part of the code block content
   (displayed with syntax highlighting)
2. The same ``` was then emitted again as literal text because
   current_line was not cleared after being pushed to block_buffer
The fix:
- Check if current_line is the closing fence before adding to block_buffer
- Always clear current_line after processing in the CodeBlock case
Added two tests:
- test_code_fence_after_blank_line: code fence with trailing newline
- test_code_fence_no_trailing_newline: code fence without trailing newline
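A minimal sketch of the fixed behavior, assuming a much smaller state struct than the real formatter. The Formatter struct, the in_code_block flag, and the return type are hypothetical; only flush_incomplete, current_line, and block_buffer are names taken from the message above.

```rust
/// Hypothetical, stripped-down stand-in for the streaming formatter state.
#[derive(Default)]
struct Formatter {
    current_line: String,
    block_buffer: Vec<String>,
    in_code_block: bool,
}

impl Formatter {
    /// Called at end of stream. Returns the lines that belong to the
    /// code block, without the closing fence.
    fn flush_incomplete(&mut self) -> Vec<String> {
        if self.in_code_block {
            // Taking the line guarantees current_line is always cleared,
            // so it cannot be emitted a second time as literal text.
            let line = std::mem::take(&mut self.current_line);
            let trimmed = line.trim();
            // A closing fence is a line made only of three or more backticks.
            let is_closing_fence =
                trimmed.len() >= 3 && trimmed.chars().all(|c| c == '`');
            if is_closing_fence {
                self.in_code_block = false;
            } else if !line.is_empty() {
                self.block_buffer.push(line);
            }
        }
        std::mem::take(&mut self.block_buffer)
    }
}
```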
The agent mode header now shows:
- Agent name in uppercase with box art
- Working directory (truncated if too long)
- Status indicators for README, AGENTS.md, and Memory loading
- Task preview if provided
Also exports truncate_for_display and adds truncate_path_for_display
helper functions in project_files module.
The JSON tool call filter was outputting newlines immediately as they
were encountered. When the LLM output contained multiple newlines before
a tool call, each newline was output before the tool call JSON was
detected and suppressed, leaving orphaned blank lines in the output.
Changes:
- Add pending_newlines field to FilterState to buffer newlines at line start
- Output the first newline after content immediately; buffer subsequent ones
- When a tool call is confirmed, clear pending_newlines (suppressing extra blanks)
- When the content is not a tool call, output pending_newlines along with the buffer
- Add flush_json_tool_filter() to flush pending content at end of streaming
- Update tests to reflect new behavior
- Add tests for newline suppression behavior
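The buffering rule can be illustrated with a simplified, line-oriented sketch. The real filter works on streamed characters; here push_line, is_tool_call, and the assumed JSON prefix for tool calls are hypothetical, and flush stands in for flush_json_tool_filter().

```rust
/// Simplified stand-in for the filter state: only the newline buffer.
#[derive(Default)]
struct FilterState {
    pending_newlines: usize,
}

/// Assumption for this sketch: a tool call is a line starting with {"tool".
fn is_tool_call(line: &str) -> bool {
    line.trim_start().starts_with("{\"tool\"")
}

impl FilterState {
    /// Feed one complete line (without its trailing newline).
    fn push_line(&mut self, line: &str, out: &mut String) {
        if line.is_empty() {
            // Blank line at line start: hold it until we know what follows.
            self.pending_newlines += 1;
        } else if is_tool_call(line) {
            // Tool call confirmed: drop the buffered blanks and the JSON.
            self.pending_newlines = 0;
        } else {
            // Ordinary content: the buffered blanks were real, emit them.
            self.flush(out);
            out.push_str(line);
            out.push('\n');
        }
    }

    /// Emit whatever is still pending at end of streaming.
    fn flush(&mut self, out: &mut String) {
        for _ in 0..self.pending_newlines {
            out.push('\n');
        }
        self.pending_newlines = 0;
    }
}
```

With this rule, blank lines between two paragraphs survive, while blank lines that only precede a suppressed tool call disappear with it.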
Extract three cohesive modules from the monolithic lib.rs (3188 -> 2785 lines):
- metrics.rs (147 lines): Turn metrics tracking and histogram generation
  - TurnMetrics struct
  - format_elapsed_time() for human-readable durations
  - generate_turn_histogram() for performance visualization
  - Added unit tests for core functions
- project_files.rs (181 lines): Project file reading utilities
  - read_agents_config() for AGENTS.md loading
  - read_project_readme() for README detection
  - read_project_memory() for .g3/memory.md
  - extract_readme_heading() for display
  - Added unit tests
- coach_feedback.rs (129 lines): Coach feedback extraction from session logs
  - extract_from_logs() main entry point
  - Helper functions for log parsing and text extraction
All modules have clear single responsibilities, improved documentation,
and maintain identical behavior to the original inline functions.
Agent: carmack
Collapsed nested if statements that check related conditions into
single conditions using &&. This improves readability by making
the logical relationship between conditions explicit.
Files changed:
- feedback_extraction.rs: 3 instances of tool_use/final_output checks
- tools/todo.rs: 1 instance of todo completion check
Agent: fowler
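As an illustration of the refactor, here is a before/after pair on a made-up type. The Event struct and both function names are hypothetical and are not taken from feedback_extraction.rs or tools/todo.rs.

```rust
/// Hypothetical log event used only for this illustration.
struct Event {
    kind: &'static str,
    name: &'static str,
}

/// Before: two nested checks on related conditions.
fn is_final_output_nested(e: &Event) -> bool {
    if e.kind == "tool_use" {
        if e.name == "final_output" {
            return true;
        }
    }
    false
}

/// After: one condition joined with &&, same behavior.
fn is_final_output(e: &Event) -> bool {
    e.kind == "tool_use" && e.name == "final_output"
}
```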
The function had two branches that both returned line.to_string():
- when !should_truncate
- when line.chars().count() <= max_width
Merged into a single condition. Also updated format! to use
inline variable syntax per clippy suggestion.
Agent: fowler
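A sketch of what the merged condition might look like. The function name truncate_line, its signature, and the trailing ellipsis character are assumptions; the commit message does not name the function.

```rust
/// Hypothetical truncation helper with the two early-return branches merged.
fn truncate_line(line: &str, max_width: usize, should_truncate: bool) -> String {
    // Both former branches returned the line unchanged, so one check suffices.
    if !should_truncate || line.chars().count() <= max_width {
        return line.to_string();
    }
    // Keep room for the ellipsis so the result is max_width chars wide.
    let kept: String = line.chars().take(max_width.saturating_sub(1)).collect();
    // Inline variable syntax in format!, as clippy suggests.
    format!("{kept}…")
}
```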
Agent mode was only loading README.md but not AGENTS.md or project
memory (.g3/memory.md). This meant agents were missing important
context that normal mode had access to.
Now agent mode uses the same read_agents_config(), read_project_readme(),
and read_project_memory() functions as normal mode, combining all three
into the agent context.
Simplify auto-memory by always enabling it in agent mode instead of
requiring the --auto-memory flag. This makes sense because:
- Agent mode is non-interactive, so blocking is acceptable
- Agents benefit from automatically saving discoveries to memory
- Reduces flag complexity for users
The --auto-memory flag still works for other modes if desired.
The --auto-memory flag was not being passed to run_agent_mode() and
send_auto_memory_reminder() was not being called after agent task
execution.
Changes:
- Add auto_memory parameter to the run_agent_mode() function signature
- Pass the --auto-memory flag value to run_agent_mode() at the call site
- Call agent.set_auto_memory(true) when flag is enabled
- Call send_auto_memory_reminder() after execute_task() in agent mode
Adds a new --auto-memory CLI flag that automatically sends a reminder
to the LLM after each turn where tools were called, prompting it to
call the remember tool if it discovered any key code locations.
Changes:
- Add auto_memory field and set_auto_memory() method to Agent
- Add tool_calls_this_turn tracking in execute_tool_in_dir()
- Add send_auto_memory_reminder() that sends reminder after tool use
- Add --auto-memory CLI flag and wire it up in console/machine modes
- Call send_auto_memory_reminder() in single-shot and interactive modes
- Add visible status messages for auto-memory actions
Fixes bug where tool calls were not being tracked when execute_tool_in_dir
was called directly with working_dir=None.
The format_header() function was not calling format_inline_content()
to process inline formatting like **bold**, *italic*, and `code`
within headers. This caused raw markdown markers to appear in output.
Added 4 tests to verify the fix:
- test_bold_inside_header
- test_italic_inside_header
- test_code_inside_header
- test_mixed_formatting_inside_header
The code fence (```) was not being properly detected during streaming,
causing it to be rendered as inline code instead of a code block.
Root cause: When buffering a code fence after seeing ```, the code
was returning early for ALL characters including newlines. This meant
handle_newline() was never called and block_state was never set to
BlockState::CodeBlock.
Fixes:
- Don't return early for newlines when buffering code fence, allow them
  to fall through to handle_newline()
- Support indented code fences (up to 3 spaces per CommonMark spec) by
  using trim_start() when checking for ``` at line start
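The indented-fence check could be sketched as follows. The commit uses trim_start(); this version additionally enforces the three-space limit explicitly, which is an assumption about the intended behavior, and the function name is hypothetical.

```rust
/// Hypothetical check: does this line open (or close) a code fence?
/// Accepts up to three leading spaces, followed by at least three backticks.
fn is_code_fence_start(line: &str) -> bool {
    // Strip leading spaces only, and measure how many were removed.
    let rest = line.trim_start_matches(' ');
    let indent = line.len() - rest.len();
    // Count the run of backticks at the start of the remaining text.
    let backticks = rest.chars().take_while(|&c| c == '`').count();
    indent <= 3 && backticks >= 3
}
```

Four or more leading spaces are rejected here because CommonMark treats such lines as indented code rather than as a fence.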
- Remove IMPORTANT FOR CODING section (~1,500 chars of coding guidelines)
- Remove <use_parallel_tool_calls> block (~500 chars)
- Remove unused const_format dependency from g3-core
- Simplify get_system_prompt_for_native() to just return base prompt
- Response Guidelines now cleanly ends the static prompt
Prompt reduced from ~8,500 to ~6,500 characters.
- Add description field to SessionContinuation struct
- Extract first user message (truncated to ~60 chars at word boundary)
- Display as quoted text instead of session ID hash
- Fall back to session ID if no description available
Example: [2 hours ago] 'when I call /resume it only shows me 2 sessions...'
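One way the description could be derived from the first user message is sketched below. The function name describe, the whitespace normalization, and the hard cut for a single overlong word are assumptions; the "..." suffix follows the example above.

```rust
/// Hypothetical helper: shorten a message to at most `max_chars` characters,
/// cutting at a word boundary and appending "..." when text was dropped.
fn describe(first_message: &str, max_chars: usize) -> String {
    // Collapse runs of whitespace (including newlines) into single spaces.
    let text = first_message.split_whitespace().collect::<Vec<_>>().join(" ");
    if text.chars().count() <= max_chars {
        return text;
    }
    let mut out = String::new();
    for word in text.split(' ') {
        let separator = if out.is_empty() { 0 } else { 1 };
        if out.chars().count() + separator + word.chars().count() > max_chars {
            break;
        }
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(word);
    }
    if out.is_empty() {
        // A single word longer than the limit: fall back to a hard cut.
        out = text.chars().take(max_chars).collect();
    }
    out.push_str("...");
    out
}
```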
- Change run_autonomous to return Agent instead of () so session
  continuation is properly saved in accumulative mode
- Update format_session_time to show relative times ("2 hours ago",
  "yesterday") for recent sessions and dates for older ones
- Handle Ctrl+C cancellation gracefully with informative message
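The relative-time formatting might follow a shape like this sketch. The thresholds, the "just now" label, and treating 24 to 48 hours as "yesterday" are assumptions; the real format_session_time may use calendar days and a different signature.

```rust
const MINUTE: u64 = 60;
const HOUR: u64 = 60 * MINUTE;
const DAY: u64 = 24 * HOUR;

/// Hypothetical helper: "1 hour ago" / "2 hours ago".
fn ago(n: u64, unit: &str) -> String {
    if n == 1 {
        format!("1 {unit} ago")
    } else {
        format!("{n} {unit}s ago")
    }
}

/// Returns a relative label for recent sessions, or None when the session
/// is old enough that the caller should print a date instead.
fn format_relative(seconds_ago: u64) -> Option<String> {
    match seconds_ago {
        s if s < MINUTE => Some("just now".to_string()),
        s if s < HOUR => Some(ago(s / MINUTE, "minute")),
        s if s < DAY => Some(ago(s / HOUR, "hour")),
        s if s < 2 * DAY => Some("yesterday".to_string()),
        _ => None,
    }
}
```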
Fix session detection:
- Add save_session_continuation() calls at all session exit points
- Sessions now properly create .g3/session symlink for resume detection
- Fixes issue where g3 wasn't offering to resume previous sessions
Add /resume command:
- New list_sessions_for_directory() to scan available sessions
- New switch_to_session() method to safely switch between sessions
- Shows numbered list with timestamps, context %, and TODO status
- Saves current session before switching (can be resumed later)
- Restores full context if <80% used, otherwise uses summary
- Machine mode supports /resume and /resume <number>
Documentation:
- Add /clear and /resume to CONTROL_COMMANDS.md
- Update /help output with new commands
When the scout agent fails (e.g., context window exhaustion), now:
- Captures both stdout and stderr from the scout process
- Detects context window exhaustion errors with specific patterns
- Provides detailed, actionable error messages to the user
- Shows suggestions for how to work around the issue
- Includes technical details (exit code, error output) for debugging
Handles two failure modes:
1. Scout agent exits with non-zero status
2. Scout agent exits successfully but doesn't produce valid report markers
Both cases now surface clear error messages instead of cryptic failures.
Runs automatically when --chrome-headless flag is used, checking:
- ChromeDriver installation and PATH
- Chrome/Chromium installation
- Chrome and ChromeDriver version compatibility
- config.toml chrome_binary setting
- Chrome for Testing installation
- ChromeDriver executable permissions (macOS quarantine)
Displays a detailed report with:
- Summary of detected versions and paths
- Pass/warning/error status for each check
- Specific fix suggestions for any issues found
Users can then ask g3 to help fix any detected issues.
The timing footer (e.g., ⏱️ 19.4s | 💭 4.7s) was being saved to the
conversation history as a separate assistant message. This happened
because stream_completion_with_tools returns the timing footer in
TaskResult.response for display, but the caller was also saving it
to context.
Fix: Strip the timing footer (identified by \n\n⏱️) before saving
to context window. The timing footer remains display-only.
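A minimal sketch of the stripping step, assuming the footer is always the last thing in the response. The function and constant names are hypothetical.

```rust
/// Two newlines followed by the stopwatch emoji (U+23F1) mark the footer.
const TIMING_FOOTER_MARKER: &str = "\n\n\u{23F1}";

/// Hypothetical helper: returns the response without the timing footer.
/// Responses that have no footer are returned unchanged.
fn strip_timing_footer(response: &str) -> &str {
    // rfind: only the last occurrence is treated as the footer.
    match response.rfind(TIMING_FOOTER_MARKER) {
        Some(idx) => &response[..idx],
        None => response,
    }
}
```

The caller would still display the full response, and pass only the stripped text to the context window.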
Also includes:
- Research tool blank line fix: only add visual separator for research
  tool output, not all tools
- Research tool webdriver propagation: pass parent's webdriver browser
  choice (Safari vs Chrome headless) to scout subprocess
final_output removal:
- Remove final_output from tool definitions and dispatch
- Update system prompts to request summaries as regular text
- Remove final_output_called field from StreamingState
- Update auto_continue tests to remove final_output_called parameter
- Remove final_output test from tool_execution_test.rs
- Update planner and flock prompts to not reference final_output
- Keep backwards-compat code in feedback_extraction.rs and task_result.rs
Scout report handback:
- Change from file-based to delimiter-based report extraction
- Scout outputs report between ---SCOUT_REPORT_START/END--- markers
- Research tool extracts content between markers, strips ANSI codes
- Add comprehensive tests for extraction and ANSI stripping
657 tests pass.
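The delimiter-based extraction could be sketched as below. It reads the marker shorthand above as two separate markers, ---SCOUT_REPORT_START--- and ---SCOUT_REPORT_END---, which is an interpretation; the function names are hypothetical, and only CSI-style ANSI sequences (ESC followed by an opening bracket) are handled.

```rust
const REPORT_START: &str = "---SCOUT_REPORT_START---";
const REPORT_END: &str = "---SCOUT_REPORT_END---";

/// Removes CSI escape sequences such as color codes from the input.
fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume the opening bracket
            // Skip parameter bytes up to and including the final byte.
            for n in chars.by_ref() {
                if ('@'..='~').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Returns the text between the two markers, cleaned of ANSI codes,
/// or None when either marker is missing.
fn extract_report(output: &str) -> Option<String> {
    let start = output.find(REPORT_START)? + REPORT_START.len();
    let end = start + output[start..].find(REPORT_END)?;
    Some(strip_ansi(&output[start..end]).trim().to_string())
}
```

Returning None when a marker is missing gives the caller a clear signal to report that the scout produced no valid report.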
The research tool now streams the underlying scout agent's output
to the CLI in real-time for visual indication of progress. This
output is displayed but not added to the conversation context.
Simplified the main system prompt's web research section to just direct
users to the research tool. Moved the detailed WebDriver usage instructions
to scout.md where they belong, since the scout agent is the one that
actually uses WebDriver for research.
Main prompt now simply says: use the research tool for web research.
Scout agent now has the full WebDriver best practices documentation.
New tool that spawns a scout agent to perform web research and return
a structured research brief. The scout agent uses webdriver to browse
the web and returns a decision-ready report.
Changes:
- Added 'research' tool definition (12 core tools total)
- Added research tool dispatch in tool_dispatch.rs
- Created tools/research.rs implementation:
  - Spawns 'g3 --agent scout <query>' as subprocess
  - Captures stdout and extracts last line (report file path)
  - Reads and returns the report file contents
- Added exclude_research flag to ToolConfig
- Scout agent (agent_name == 'scout') does NOT have access to research
  tool to prevent infinite recursion
- Updated system prompts to describe when to use research tool
- Added scout.md agent prompt with research brief output contract
The research tool is preferred for complex research tasks (APIs, SDKs,
libraries, approaches, bugs). WebDriver can still be used directly for
simple lookups or fine-grained control.
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.
Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.
- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
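The boundary-safe truncation can be sketched with char_indices() as follows. The function name is hypothetical, and keeping the head of the buffer (rather than the tail) is an assumption; the actual code in filter_json.rs may differ.

```rust
/// Hypothetical helper: truncates `s` to at most `max_bytes` bytes without
/// ever cutting through the middle of a multi-byte character.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // char_indices() yields the byte offset where each character starts,
    // so the largest offset not exceeding max_bytes is a safe cut point.
    let end = s
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```

Slicing with a raw offset such as &s[..3] on a string containing an emoji at that position would panic, which is the failure mode the commit describes.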
When the LLM executes a tool and then outputs text (e.g., analysis after
reading images), the text was being displayed during streaming but never
saved to the context window. This caused:
1. The response to appear truncated in the session log
2. Loss of context for subsequent turns
3. The LLM losing track of what it had already said
The fix saves current_response to the context window before breaking
out of the streaming loop for auto-continue after tool execution.
Reproduction scenario:
- User asks LLM to read images and analyze them
- LLM calls read_image tool
- Tool executes successfully
- LLM outputs analysis text ("Now I can see the results...")
- Text was displayed but lost from session log
Now the text is properly persisted to the context window.
- Remove final_output from tool definitions, dispatch, and misc tools
- Update system prompts to request summaries as regular markdown text
- Remove print_final_output from UiWriter trait and all implementations
- Remove final_output handling from agent core logic
- Rename final_output_summary → summary in session continuation
- Delete final_output test files
- Update tool count tests (12→11, 27→26)
This allows LLM summaries to stream through the markdown formatter
for a more natural, responsive user experience instead of buffering
everything into a tool call.