- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
- Move tool_executed = true after duplicate check to prevent auto-continue
from triggering when only duplicate tools were detected
- Reset parser state when duplicate detected to clear any partial/polluted
state from LLM stuttering or example tool calls in markdown blocks
- Print └─ before images to break out of tool output box
- Print ┌─ after images to resume tool output box
- Remove │ prefix from image preview and info lines
- Info line uses single space prefix, dimmed text
- Only include error messages in tool result (success info printed via imgcat)
- Remove │ prefix before image preview, use single space instead
- Keep info line on its own line with │ prefix
- Keep blank line spacing between images
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
TODO lists are now stored in .g3/sessions/<session_id>/todo.g3.md instead
of the workspace root. This prevents different g3 sessions from accidentally
picking up or overwriting each other's TODOs.
Changes:
- Add get_session_todo_path() function in paths.rs
- Update todo_read/todo_write handlers to use session-specific paths
- Remove TODO loading at Agent initialization (sessions start fresh)
- Update prompts to reflect session-scoped behavior
Fallback behavior preserved for planner mode (G3_TODO_PATH env var).
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.
Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait
Usage: Use shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>
Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
When the LLM outputs text containing tool call patterns (e.g., reading
log files, showing examples, or discussing tool calls), the parser's
has_unexecuted_tool_call() would detect these as real tool calls and
trigger auto-continue, leading to repeated empty responses.
The fix: mark the parser buffer as consumed when content is displayed.
This prevents tool-call-like patterns in displayed text from triggering
false positives later. The fix is safe because:
1. Only runs when no tool was detected (inside 'if !tool_executed')
2. Legitimate tool calls are detected first by process_chunk()
3. Matches existing pattern of calling mark_tool_calls_consumed()
after tool execution
The auto-continue logic was adding User continue prompts without first
adding an Assistant message when the LLM returned an empty response.
This caused consecutive User messages in the conversation history,
which confused the LLM and caused it to return more empty responses.
The fix ensures an Assistant message is always added before the continue
prompt, using '[empty response]' as a placeholder when the LLM returned
nothing substantive. This maintains proper User/Assistant alternation.
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:
- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction
All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.
Added a quality-of-life feature that displays:
- Tokens used in the current turn (from LLM response, not estimated)
- Current context window usage percentage
These are displayed dimmed after the timing info:
⏱️ 1.2s | 💭 0.3s 1234tk | 45% ctx
The token count comes directly from the LLM's usage response data,
not from any estimation. If no usage data is available from the LLM,
only the context percentage is shown.
Added 13 tests to verify that duplicate detection only catches
IMMEDIATELY SEQUENTIAL duplicates:
- test_find_complete_json_object_end_* - Tests for JSON parsing helper
- test_same_tool_with_text_between_not_duplicate - Key test ensuring
tool calls separated by text are NOT duplicates
- test_different_tools_back_to_back_not_duplicate
- test_same_tool_different_args_not_duplicate
- test_identical_tool_calls_back_to_back_are_duplicates
- test_has_text_after_tool_call - Tests text detection logic
- test_tool_call_with_newlines_between
- test_tool_call_with_whitespace_text_between
- test_tool_call_in_middle_of_text
- test_multiple_different_tool_calls_with_text
Also made find_complete_json_object_end public for testing.
1. Set tool_executed=true when a tool call is detected, even if skipped
as a duplicate. This prevents the raw JSON from being printed to screen
when a tool call is detected but not executed.
2. Remove session-level duplicate detection entirely. All tools should be
allowed to be called multiple times in a session.
3. Fix sequential duplicate detection to only catch IMMEDIATELY sequential
duplicates:
- DUP IN CHUNK: Now only checks if the PREVIOUS tool call in the chunk
is the same (not any tool call in the chunk)
- DUP IN MSG: Now only checks if the LAST tool call in the previous
message matches AND there's no text after it. If there's any
non-whitespace text between tool calls, they're not considered
duplicates.
This allows legitimate re-use of tools while still catching cases where
the LLM stutters and outputs the same tool call twice in a row.
The bug was in the chunk.finished block inside stream_completion_with_tools.
When no tool was executed in the CURRENT iteration (!tool_executed), the code
would return early without checking if tools were executed in PREVIOUS iterations
(any_tool_executed) and final_output was never called.
This caused the agent to terminate prematurely after executing tools like
todo_read when the LLM responded with text instead of calling final_output.
The fix adds a check: if any_tool_executed && !final_output_called, we break
to let the outer loop's auto-continue logic prompt the LLM to continue.
Also fixed missing debug! import in g3-console/src/main.rs.
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.
Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
The bug: When the LLM emitted multiple tool calls in one response (e.g.,
str_replace followed by shell), only the first tool was executed. The
remaining tools were lost because mark_tool_calls_consumed() was called
BEFORE processing, marking ALL tools as consumed even when only ONE was
being processed.
This caused has_unexecuted_tool_call() to return false after executing
the first tool, so the parser was reset and the remaining tool calls
were discarded. The auto-continue logic never triggered because it
thought all tools had been handled.
The fix: Remove the premature mark_tool_calls_consumed() call. The
existing logic at line 4696-4699 already handles marking tools as
consumed AFTER execution, and correctly checks for remaining unexecuted
tools before deciding whether to reset the parser.
- Add last_consumed_position tracking to StreamingToolParser to prevent
re-detecting already-executed tool calls
- Add mark_tool_calls_consumed() method to mark tool calls as processed
- Add find_first_tool_call_start() for forward scanning of tool patterns
- Replace try_parse_json_tool_call_from_buffer() with
try_parse_all_json_tool_calls_from_buffer() to find ALL tool calls
- Update has_incomplete_tool_call() and has_unexecuted_tool_call() to
only check unconsumed portion of buffer
- Fix tool execution loop to not reset parser when unexecuted tools remain
- Simplify should_auto_continue logic (remove redundant condition)
- Add comprehensive tests for auto-continue condition logic
Properly separates UI display concern from core library:
- fixed_filter_json module now lives in g3-cli (UI layer)
- UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods
- g3-core delegates filtering to UI layer via trait methods
- Different UiWriter implementations can choose their own filtering behavior
- ConsoleUiWriter filters JSON tool calls for clean terminal display
- MachineUiWriter/NullUiWriter use default pass-through
Benefits:
- Proper separation of concerns
- Core stays clean without display-specific logic
- Testability - filter can be tested independently in g3-cli
- Add final_output_called flag to track if LLM properly completed
- Auto-continue with prompt if tools executed but final_output missing
- Remove unused last_action_was_tool and any_text_response variables
- Simplifies previous complex incomplete response detection logic
Chrome headless has too many issues:
- Session creation hangs when Chrome is already running
- Cloudflare and other bot protection blocks headless browsers
- Version mismatch issues between Chrome and ChromeDriver
Safari is more reliable for web automation on macOS.
Chrome headless is still available via --chrome-headless flag.
- Add unique user-data-dir per process to avoid profile conflicts
- Add 30-second timeout to connection attempts to prevent indefinite hangs
- Fix borrow checker issue with ClientBuilder
The session creation was hanging because ChromeDriver was trying to
use the same profile as the running Chrome browser. Using a unique
temp directory (/tmp/g3-chrome-{pid}) isolates the headless session.
- Add setup script (scripts/setup-chrome-for-testing.sh) that downloads
matching Chrome and ChromeDriver versions from Google's CDN
- Add chrome_binary config option to specify custom Chrome binary path
- Update ChromeDriver to support custom binary via with_port_headless_and_binary()
- Update README with Chrome for Testing setup instructions
- Update config.example.toml with chrome_binary documentation
Chrome for Testing is Google's dedicated browser for automated testing
that guarantees version compatibility with ChromeDriver, avoiding the
common 'version mismatch' errors when Chrome auto-updates.
- Add --safari flag to CLI for explicitly choosing Safari
- Update --chrome-headless flag description to indicate it's the default
- Update README to reflect Chrome headless as default
- Remove broken link to non-existent docs/webdriver-setup.md
- Add Safari flag handling in all webdriver config locations
The config already had ChromeHeadless as the default, this commit
updates the CLI and documentation to match.