- Extract check_duplicate_in_previous_message() helper to reduce nesting
from 6+ levels to 2 levels in stream_completion_with_tools
- Create do_thin_context() and do_thin_context_all() helpers to centralize
context thinning with event tracking
- Use provider_config::parse_provider_ref() in additional call sites
- All 295 tests pass
This continues the refactoring to eliminate code-path aliasing and
reduce cyclomatic complexity in the Agent implementation.
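The nesting reduction presumably relies on early returns in the extracted helper. A minimal sketch, assuming a hypothetical signature and matching rule (only the helper name comes from this commit):

```rust
// Hypothetical sketch: early returns replace deep conditional nesting.
// The real signature and duplicate-matching rules may differ.
fn check_duplicate_in_previous_message(previous: Option<&str>, call: &str) -> bool {
    // No previous assistant message: nothing to compare against.
    let Some(prev) = previous else { return false };
    // Only the last tool-call line of the previous message can be a duplicate.
    let Some(last) = prev
        .lines()
        .rev()
        .find(|l| l.trim_start().starts_with("{\"tool\""))
    else {
        return false;
    };
    last.trim() == call.trim()
}
```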
The blanket *.json ignore is not canonical for Rust projects.
JSON files that need ignoring are already covered by:
- .g3/ for session logs
- logs/ for error logs
- .build for Swift build artifacts
The Euler agent must now update AGENTS.md after generating artifacts:
- Add/update 'Dependency Analysis Artifacts' section
- Table listing each file in analysis/deps/ with one-line descriptions
- No findings, metrics, or recommendations in AGENTS.md
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and AM/PM. When users type or paste these paths, the
U+202F is usually replaced with a regular space, causing file-not-found
errors.
Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
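The fallback idea can be sketched as follows. The function name matches the commit; the body is an assumption (the real version may try more variants):

```rust
use std::path::{Path, PathBuf};

// Sketch: if the literal path doesn't exist, retry with the regular space
// before the AM/PM marker swapped for U+202F, as macOS screenshots use.
fn resolve_path_with_unicode_fallback(path: &str) -> Option<PathBuf> {
    let p = Path::new(path);
    if p.exists() {
        return Some(p.to_path_buf());
    }
    for ampm in ["AM", "PM", "am", "pm"] {
        let candidate = path.replace(&format!(" {ampm}"), &format!("\u{202F}{ampm}"));
        let c = Path::new(&candidate);
        if c.exists() {
            return Some(c.to_path_buf());
        }
    }
    None
}
```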
- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes
Fixes an issue where the LLM would call final_output after completing the
first phase, causing the agent to stop prematurely instead of continuing
with the remaining phases.
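The gate described above might look like this sketch (function name, TODO markers, and error text are assumptions; only the autonomous-only behavior comes from the commit):

```rust
// Sketch: in autonomous mode, final_output is rejected while unchecked
// TODO items ("- [ ]") remain; other modes pass through unchanged.
fn check_final_output_allowed(autonomous: bool, todo_md: &str) -> Result<(), String> {
    if !autonomous {
        return Ok(()); // interactive/chat modes are unaffected
    }
    let incomplete = todo_md
        .lines()
        .filter(|l| l.trim_start().starts_with("- [ ]"))
        .count();
    if incomplete > 0 {
        Err(format!(
            "{incomplete} TODO item(s) incomplete; continue working before calling final_output"
        ))
    } else {
        Ok(())
    }
}
```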
- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
- Move tool_executed = true after duplicate check to prevent auto-continue
from triggering when only duplicate tools were detected
- Reset parser state when duplicate detected to clear any partial/polluted
state from LLM stuttering or example tool calls in markdown blocks
- Print └─ before images to break out of tool output box
- Print ┌─ after images to resume tool output box
- Remove │ prefix from image preview and info lines
- Info line uses single space prefix, dimmed text
- Only include error messages in tool result (success info printed via imgcat)
- Remove │ prefix before image preview, use single space instead
- Keep info line on its own line with │ prefix
- Keep blank line spacing between images
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
TODO lists are now stored in .g3/sessions/<session_id>/todo.g3.md instead
of the workspace root. This prevents different g3 sessions from accidentally
picking up or overwriting each other's TODOs.
Changes:
- Add get_session_todo_path() function in paths.rs
- Update todo_read/todo_write handlers to use session-specific paths
- Remove TODO loading at Agent initialization (sessions start fresh)
- Update prompts to reflect session-scoped behavior
Fallback behavior preserved for planner mode (G3_TODO_PATH env var).
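A minimal sketch of the path helper, assuming a signature (the function name, layout, and G3_TODO_PATH fallback come from this commit; everything else is illustrative):

```rust
use std::path::PathBuf;

// Sketch: session-scoped TODO path with the planner-mode env override.
fn get_session_todo_path(session_id: &str) -> PathBuf {
    if let Ok(p) = std::env::var("G3_TODO_PATH") {
        return PathBuf::from(p); // planner mode fallback
    }
    PathBuf::from(".g3")
        .join("sessions")
        .join(session_id)
        .join("todo.g3.md")
}
```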
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.
Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait
Usage: Use the shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>
Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
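The launch step can be sketched as follows; the real module also tracks processes by name and cleans up via Drop, and the log-file naming here is an assumption:

```rust
use std::fs::File;
use std::process::{Child, Command, Stdio};

// Sketch: spawn a detached child with stdout/stderr redirected to a log
// file, returning the PID and log path for use with the shell tool.
fn spawn_background(name: &str, cmd: &str, args: &[&str]) -> std::io::Result<(u32, String)> {
    std::fs::create_dir_all("logs/background_processes")?;
    let log_path = format!("logs/background_processes/{name}.log");
    let log = File::create(&log_path)?;
    let child: Child = Command::new(cmd)
        .args(args)
        .stdout(Stdio::from(log.try_clone()?))
        .stderr(Stdio::from(log))
        .spawn()?;
    Ok((child.id(), log_path))
}
```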
When the LLM outputs text containing tool call patterns (e.g., reading
log files, showing examples, or discussing tool calls), the parser's
has_unexecuted_tool_call() would detect these as real tool calls and
trigger auto-continue, leading to repeated empty responses.
The fix: mark the parser buffer as consumed when content is displayed.
This prevents tool-call-like patterns in displayed text from triggering
false positives later. The fix is safe because:
1. Only runs when no tool was detected (inside 'if !tool_executed')
2. Legitimate tool calls are detected first by process_chunk()
3. Matches existing pattern of calling mark_tool_calls_consumed()
after tool execution
The auto-continue logic was adding User continue prompts without first
adding an Assistant message when the LLM returned an empty response.
This caused consecutive User messages in the conversation history,
which confused the LLM and caused it to return more empty responses.
The fix ensures an Assistant message is always added before the continue
prompt, using '[empty response]' as a placeholder when the LLM returned
nothing substantive. This maintains proper User/Assistant alternation.
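The invariant can be sketched like this (types and the continue-prompt text are illustrative; the '[empty response]' placeholder comes from the commit):

```rust
#[derive(Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

// Sketch: always push an Assistant turn before the User continue prompt,
// substituting a placeholder when the LLM returned nothing substantive,
// so the history keeps strict User/Assistant alternation.
fn push_continue_prompt(history: &mut Vec<(Role, String)>, llm_text: &str) {
    if llm_text.trim().is_empty() {
        history.push((Role::Assistant, "[empty response]".to_string()));
    } else {
        history.push((Role::Assistant, llm_text.to_string()));
    }
    history.push((Role::User, "Please continue.".to_string()));
}
```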
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:
- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction
All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.
Added a quality-of-life feature that displays:
- Tokens used in the current turn (from LLM response, not estimated)
- Current context window usage percentage
These are displayed dimmed after the timing info:
⏱️ 1.2s | 💭 0.3s 1234tk | 45% ctx
The token count comes directly from the LLM's usage response data,
not from any estimation. If no usage data is available from the LLM,
only the context percentage is shown.
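The formatting logic might look like this sketch (field names and the exact fallback layout are assumptions; the sample line matches the one above):

```rust
// Sketch: build the dimmed status line; when the LLM reports no usage
// data, the token count is omitted and only the context percentage shows.
fn format_status(turn_s: f64, think_s: f64, tokens: Option<u32>, ctx_pct: u8) -> String {
    match tokens {
        Some(tk) => format!("⏱️ {turn_s:.1}s | 💭 {think_s:.1}s {tk}tk | {ctx_pct}% ctx"),
        None => format!("⏱️ {turn_s:.1}s | 💭 {think_s:.1}s | {ctx_pct}% ctx"),
    }
}
```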
Added 13 tests to verify that duplicate detection only catches
IMMEDIATELY SEQUENTIAL duplicates:
- test_find_complete_json_object_end_* - Tests for JSON parsing helper
- test_same_tool_with_text_between_not_duplicate - Key test ensuring
tool calls separated by text are NOT duplicates
- test_different_tools_back_to_back_not_duplicate
- test_same_tool_different_args_not_duplicate
- test_identical_tool_calls_back_to_back_are_duplicates
- test_has_text_after_tool_call - Tests text detection logic
- test_tool_call_with_newlines_between
- test_tool_call_with_whitespace_text_between
- test_tool_call_in_middle_of_text
- test_multiple_different_tool_calls_with_text
Also made find_complete_json_object_end public for testing.
1. Set tool_executed=true when a tool call is detected, even if skipped
as a duplicate. This prevents the raw JSON from being printed to screen
when a tool call is detected but not executed.
2. Remove session-level duplicate detection entirely. All tools should be
allowed to be called multiple times in a session.
3. Fix sequential duplicate detection to only catch IMMEDIATELY sequential
duplicates:
- DUP IN CHUNK: Now only checks if the PREVIOUS tool call in the chunk
is the same (not any tool call in the chunk)
- DUP IN MSG: Now only checks if the LAST tool call in the previous
message matches AND there's no text after it. If there's any
non-whitespace text between tool calls, they're not considered
duplicates.
This allows legitimate re-use of tools while still catching cases where
the LLM stutters and outputs the same tool call twice in a row.
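The "immediately sequential" rule can be sketched over a stream of text and tool-call events (the event model here is illustrative, not the parser's actual representation):

```rust
enum Event {
    Text(String),
    Call(String),
}

// Sketch: a call is dropped as a stutter only when the identical call
// precedes it with nothing but whitespace in between; any non-whitespace
// text breaks the chain and the repeat is kept as a legitimate re-use.
fn dedup_immediate(events: &[Event]) -> Vec<String> {
    let mut kept = Vec::new();
    let mut last_call: Option<&str> = None;
    for e in events {
        match e {
            Event::Text(t) => {
                if !t.trim().is_empty() {
                    last_call = None; // text between calls: not a duplicate
                }
            }
            Event::Call(c) => {
                if last_call == Some(c.as_str()) {
                    continue; // identical, immediately sequential: skip
                }
                kept.push(c.clone());
                last_call = Some(c.as_str());
            }
        }
    }
    kept
}
```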
The bug was in the chunk.finished block inside stream_completion_with_tools.
When no tool was executed in the CURRENT iteration (!tool_executed), the code
would return early without checking whether tools had been executed in
PREVIOUS iterations (any_tool_executed) while final_output was never called.
This caused the agent to terminate prematurely after executing tools like
todo_read when the LLM responded with text instead of calling final_output.
The fix adds a check: if any_tool_executed && !final_output_called, we break
to let the outer loop's auto-continue logic prompt the LLM to continue.
Also fixed missing debug! import in g3-console/src/main.rs.
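The fixed decision can be sketched as a pure function over the three flags (the Action enum is purely illustrative; the flag names come from the commit):

```rust
#[derive(Debug, PartialEq)]
enum Action {
    KeepStreaming,
    BreakToAutoContinue,
    ReturnToCaller,
}

// Sketch of the chunk.finished decision after the fix.
fn on_finished(tool_executed: bool, any_tool_executed: bool, final_output_called: bool) -> Action {
    if tool_executed {
        // A tool ran this iteration; keep going as before.
        return Action::KeepStreaming;
    }
    if any_tool_executed && !final_output_called {
        // The added check: tools ran in earlier iterations but the turn
        // never concluded, so break and let auto-continue prompt the LLM.
        return Action::BreakToAutoContinue;
    }
    Action::ReturnToCaller
}
```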
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.
Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
The bug: When the LLM emitted multiple tool calls in one response (e.g.,
str_replace followed by shell), only the first tool was executed. The
remaining tools were lost because mark_tool_calls_consumed() was called
BEFORE processing, marking ALL tools as consumed even when only ONE was
being processed.
This caused has_unexecuted_tool_call() to return false after executing
the first tool, so the parser was reset and the remaining tool calls
were discarded. The auto-continue logic never triggered because it
thought all tools had been handled.
The fix: Remove the premature mark_tool_calls_consumed() call. The
existing logic at line 4696-4699 already handles marking tools as
consumed AFTER execution, and correctly checks for remaining unexecuted
tools before deciding whether to reset the parser.
- Add last_consumed_position tracking to StreamingToolParser to prevent
re-detecting already-executed tool calls
- Add mark_tool_calls_consumed() method to mark tool calls as processed
- Add find_first_tool_call_start() for forward scanning of tool patterns
- Replace try_parse_json_tool_call_from_buffer() with
try_parse_all_json_tool_calls_from_buffer() to find ALL tool calls
- Update has_incomplete_tool_call() and has_unexecuted_tool_call() to
only check unconsumed portion of buffer
- Fix tool execution loop to not reset parser when unexecuted tools remain
- Simplify should_auto_continue logic (remove redundant condition)
- Add comprehensive tests for auto-continue condition logic
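The consumed-position idea can be sketched as follows; the struct, field, and method names come from this commit, but the detection pattern and internals are assumptions:

```rust
// Sketch: detection only scans the unconsumed tail of the buffer, so
// already-executed tool calls are never re-detected.
struct StreamingToolParser {
    buffer: String,
    last_consumed_position: usize,
}

impl StreamingToolParser {
    fn new() -> Self {
        Self { buffer: String::new(), last_consumed_position: 0 }
    }

    fn push_chunk(&mut self, chunk: &str) {
        self.buffer.push_str(chunk);
    }

    // Only the portion after the consumed marker counts as unexecuted.
    fn has_unexecuted_tool_call(&self) -> bool {
        self.buffer[self.last_consumed_position..].contains("{\"tool\":")
    }

    // After executing tools, advance the marker past everything seen so far.
    fn mark_tool_calls_consumed(&mut self) {
        self.last_consumed_position = self.buffer.len();
    }
}
```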