- Add description field to SessionContinuation struct
- Extract first user message (truncated to ~60 chars at word boundary)
- Display as quoted text instead of session ID hash
- Fall back to session ID if no description available
Example: [2 hours ago] 'when I call /resume it only shows me 2 sessions...'
- Change run_autonomous to return Agent instead of () so session
continuation is properly saved in accumulative mode
- Update format_session_time to show relative times ("2 hours ago",
"yesterday") for recent sessions and dates for older ones
- Handle Ctrl+C cancellation gracefully with informative message
Fix session detection:
- Add save_session_continuation() calls at all session exit points
- Sessions now properly create .g3/session symlink for resume detection
- Fixes issue where g3 wasn't offering to resume previous sessions
Add /resume command:
- New list_sessions_for_directory() to scan available sessions
- New switch_to_session() method to safely switch between sessions
- Shows numbered list with timestamps, context %, and TODO status
- Saves current session before switching (can be resumed later)
- Restores full context if <80% used, otherwise uses summary
- Machine mode supports /resume and /resume <number>
Documentation:
- Add /clear and /resume to CONTROL_COMMANDS.md
- Update /help output with new commands
When the scout agent fails (e.g., context window exhaustion), now:
- Captures both stdout and stderr from the scout process
- Detects context window exhaustion errors with specific patterns
- Provides detailed, actionable error messages to the user
- Shows suggestions for how to work around the issue
- Includes technical details (exit code, error output) for debugging
Handles two failure modes:
1. Scout agent exits with non-zero status
2. Scout agent exits successfully but doesn't produce valid report markers
Both cases now surface clear error messages instead of cryptic failures.
The timing footer (e.g., ⏱️ 19.4s | 💭 4.7s) was being saved to the
conversation history as a separate assistant message. This happened
because stream_completion_with_tools returns the timing footer in
TaskResult.response for display, but the caller was also saving it
to context.
Fix: Strip the timing footer (identified by \n\n⏱️) before saving
to context window. The timing footer remains display-only.
Also includes:
- Research tool blank line fix: only add visual separator for research
tool output, not all tools
- Research tool webdriver propagation: pass parent's webdriver browser
choice (Safari vs Chrome headless) to scout subprocess
final_output removal:
- Remove final_output from tool definitions and dispatch
- Update system prompts to request summaries as regular text
- Remove final_output_called field from StreamingState
- Update auto_continue tests to remove final_output_called parameter
- Remove final_output test from tool_execution_test.rs
- Update planner and flock prompts to not reference final_output
- Keep backwards-compat code in feedback_extraction.rs and task_result.rs
Scout report handback:
- Change from file-based to delimiter-based report extraction
- Scout outputs report between ---SCOUT_REPORT_START/END--- markers
- Research tool extracts content between markers, strips ANSI codes
- Add comprehensive tests for extraction and ANSI stripping
657 tests pass.
The research tool now streams the underlying scout agent's output
to the CLI in real-time for visual indication of progress. This
output is displayed but not added to the conversation context.
Simplified the main system prompt's web research section to just direct
users to the research tool. Moved the detailed WebDriver usage instructions
to scout.md where they belong, since the scout agent is the one that
actually uses WebDriver for research.
Main prompt now simply says: use the research tool for web research.
Scout agent now has the full WebDriver best practices documentation.
New tool that spawns a scout agent to perform web research and return
a structured research brief. The scout agent uses webdriver to browse
the web and returns a decision-ready report.
Changes:
- Added 'research' tool definition (12 core tools total)
- Added research tool dispatch in tool_dispatch.rs
- Created tools/research.rs implementation:
- Spawns 'g3 --agent scout <query>' as subprocess
- Captures stdout and extracts last line (report file path)
- Reads and returns the report file contents
- Added exclude_research flag to ToolConfig
- Scout agent (agent_name == 'scout') does NOT have access to research
tool to prevent infinite recursion
- Updated system prompts to describe when to use research tool
- Added scout.md agent prompt with research brief output contract
The research tool is preferred for complex research tasks (APIs, SDKs,
libraries, approaches, bugs). WebDriver can still be used directly for
simple lookups or fine-grained control.
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.
Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.
- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
When the LLM executes a tool and then outputs text (e.g., analysis after
reading images), the text was being displayed during streaming but never
saved to the context window. This caused:
1. The response to appear truncated in the session log
2. Loss of context for subsequent turns
3. The LLM losing track of what it had already said
The fix saves current_response to the context window before breaking
out of the streaming loop for auto-continue after tool execution.
Reproduction scenario:
- User asks LLM to read images and analyze them
- LLM calls read_image tool
- Tool executes successfully
- LLM outputs analysis text ("Now I can see the results...")
- Text was displayed but lost from session log
Now the text is properly persisted to the context window.
- Remove final_output from tool definitions, dispatch, and misc tools
- Update system prompts to request summaries as regular markdown text
- Remove print_final_output from UiWriter trait and all implementations
- Remove final_output handling from agent core logic
- Rename final_output_summary → summary in session continuation
- Delete final_output test files
- Update tool count tests (12→11, 27→26)
This allows LLM summaries to stream through the markdown formatter
for a more natural, responsive user experience instead of buffering
everything into a tool call.
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.
Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection
Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
Agent: carmack
Add get_session() helper function that:
- Checks if webdriver is enabled
- Acquires the session read lock
- Returns the cloned session or an error message
Refactored 12 webdriver tool functions to use this helper:
- execute_webdriver_navigate
- execute_webdriver_get_url
- execute_webdriver_get_title
- execute_webdriver_find_element
- execute_webdriver_find_elements
- execute_webdriver_click
- execute_webdriver_send_keys
- execute_webdriver_execute_script
- execute_webdriver_get_page_source
- execute_webdriver_screenshot
- execute_webdriver_back
- execute_webdriver_forward
- execute_webdriver_refresh
Each function previously had ~10 lines of identical boilerplate.
Now reduced to 4 lines using the helper.
Net reduction: 68 lines (678 -> 610)
All tests pass. Behavior unchanged.
Agent: carmack
databricks.rs:
- Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple
- Add decode_utf8_streaming() helper for cleaner UTF-8 handling
- Add is_incomplete_json_error() helper for JSON parse error detection
- Add make_final_chunk() helper to reduce duplication
- Add finalize_tool_calls() to convert accumulators to final format
- Refactor parse_streaming_response from ~270 lines to ~100 lines
- Reduce nesting depth from 8+ levels to 4 levels
- Use early returns and let-else for cleaner control flow
file_ops.rs:
- Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table
- Use match expression instead of nested if-else
- Reduce extract_path_and_content from 44 lines to 20 lines
All tests pass. Behavior unchanged.
- Add set_agent_mode() to UiWriter trait for visual mode differentiation
- ConsoleUiWriter uses royal blue (ANSI 256 color 69) for tool names in agent mode
- Fix extract_readme_heading() to search only README section of combined content
(was incorrectly showing AGENTS.md heading instead of README heading)
- streaming_parser.rs: Rename has_message_like_keys to args_contain_prose_fragments
with improved documentation explaining the heuristic for detecting malformed
tool calls where LLM prose leaked into JSON keys
- context_window.rs: Simplify build_thin_result_message using early return
pattern and match expression for cleaner control flow
Agent: carmack
Auto-continue was incorrectly triggering when the LLM asked questions
in interactive/chat mode. Now auto-continue only activates when
is_autonomous is true, allowing proper back-and-forth conversation
in interactive mode.
Agent: fowler
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
Blackbox CLI tests: help/version flags, argument validation,
conflicting modes, flock mode requirements
- crates/g3-core/tests/tool_execution_test.rs (20 tests)
Tool call structure tests and unified diff application:
read_file, write_file, str_replace, shell, background_process,
todo, final_output, code_search, take_screenshot
- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
Round-trip serialization tests for Message, MessageRole,
CacheControl, and Tool types. Covers Unicode, special chars,
and edge cases.
All tests follow blackbox/integration-first principles with
documentation of what they protect and intentionally do not assert.
- Extract check_duplicate_in_previous_message() helper to reduce nesting
from 6+ levels to 2 levels in stream_completion_with_tools
- Create do_thin_context() and do_thin_context_all() helpers to centralize
context thinning with event tracking
- Use provider_config::parse_provider_ref() in additional call sites
- All 295 tests pass
This continues the refactoring to eliminate code-path aliasing and
reduce cyclomatic complexity in the Agent implementation.
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and am/pm. When users type or paste these paths,
they use regular spaces, causing file-not-found errors.
Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes
Fixes issue where LLM would call final_output after completing first phase,
causing agent to stop prematurely instead of continuing with remaining phases.
- Move tool_executed = true after duplicate check to prevent auto-continue
from triggering when only duplicate tools were detected
- Reset parser state when duplicate detected to clear any partial/polluted
state from LLM stuttering or example tool calls in markdown blocks
- Print └─ before images to break out of tool output box
- Print ┌─ after images to resume tool output box
- Remove │ prefix from image preview and info lines
- Info line uses single space prefix, dimmed text
- Only include error messages in tool result (success info printed via imgcat)
- Remove │ prefix before image preview, use single space instead
- Keep info line on its own line with │ prefix
- Keep blank line spacing between images
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
TODO lists are now stored in .g3/sessions/<session_id>/todo.g3.md instead
of the workspace root. This prevents different g3 sessions from accidentally
picking up or overwriting each other's TODOs.
Changes:
- Add get_session_todo_path() function in paths.rs
- Update todo_read/todo_write handlers to use session-specific paths
- Remove TODO loading at Agent initialization (sessions start fresh)
- Update prompts to reflect session-scoped behavior
Fallback behavior preserved for planner mode (G3_TODO_PATH env var).
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.
Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait
Usage: Use shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>
Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
When the LLM outputs text containing tool call patterns (e.g., reading
log files, showing examples, or discussing tool calls), the parser's
has_unexecuted_tool_call() would detect these as real tool calls and
trigger auto-continue, leading to repeated empty responses.
The fix: mark the parser buffer as consumed when content is displayed.
This prevents tool-call-like patterns in displayed text from triggering
false positives later. The fix is safe because:
1. Only runs when no tool was detected (inside 'if !tool_executed')
2. Legitimate tool calls are detected first by process_chunk()
3. Matches existing pattern of calling mark_tool_calls_consumed()
after tool execution
The auto-continue logic was adding User continue prompts without first
adding an Assistant message when the LLM returned an empty response.
This caused consecutive User messages in the conversation history,
which confused the LLM and caused it to return more empty responses.
The fix ensures an Assistant message is always added before the continue
prompt, using '[empty response]' as a placeholder when the LLM returned
nothing substantive. This maintains proper User/Assistant alternation.