Adds a new CLI flag that allows users to force a new session when running
in agent mode, bypassing the automatic detection and resumption of
incomplete sessions.
Usage: g3 --agent my-agent --new-session
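The interaction between the flag and session detection can be sketched roughly as below. This is a minimal illustration, not the actual g3 code: SessionChoice, choose_session, and has_new_session_flag are hypothetical names, and the real CLI presumably uses a proper argument parser.

```rust
/// What agent-mode startup decides to do about earlier sessions.
#[derive(Debug, PartialEq)]
enum SessionChoice {
    Resume(String),
    StartNew,
}

/// Hypothetical helper: `incomplete` is the id of a detected incomplete
/// session (if any); `new_session` mirrors the --new-session flag.
fn choose_session(new_session: bool, incomplete: Option<String>) -> SessionChoice {
    match incomplete {
        // Resume only when a session was found AND the user did not opt out.
        Some(id) if !new_session => SessionChoice::Resume(id),
        _ => SessionChoice::StartNew,
    }
}

/// Naive flag scan over raw arguments (assumption: stand-in for the real parser).
fn has_new_session_flag(args: &[&str]) -> bool {
    args.iter().any(|a| *a == "--new-session")
}
```

With the flag set, a detected incomplete session is ignored and a fresh session starts; without it, the previous behavior is unchanged.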
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
  Blackbox CLI tests: help/version flags, argument validation,
  conflicting modes, flock mode requirements
- crates/g3-core/tests/tool_execution_test.rs (20 tests)
  Tool call structure tests and unified diff application:
  read_file, write_file, str_replace, shell, background_process,
  todo, final_output, code_search, take_screenshot
- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
  Round-trip serialization tests for Message, MessageRole,
  CacheControl, and Tool types. Covers Unicode, special chars,
  and edge cases.
All tests follow blackbox/integration-first principles and document
what they protect and what they intentionally do not assert.
- Extract check_duplicate_in_previous_message() helper to reduce nesting
  from 6+ levels to 2 levels in stream_completion_with_tools
- Create do_thin_context() and do_thin_context_all() helpers to centralize
  context thinning with event tracking
- Use provider_config::parse_provider_ref() in additional call sites
- All 295 tests pass
This continues the refactoring to eliminate code-path aliasing and
reduce cyclomatic complexity in the Agent implementation.
The blanket *.json ignore is not canonical for Rust projects.
JSON files that need ignoring are already covered by:
- .g3/ for session logs
- logs/ for error logs
- .build for Swift build artifacts
The Euler agent must now update AGENTS.md after generating artifacts:
- Add/update 'Dependency Analysis Artifacts' section
- Table listing each file in analysis/deps/ with one-line descriptions
- No findings, metrics, or recommendations in AGENTS.md
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and am/pm. When users type or paste these paths,
they use regular spaces, causing file-not-found errors.
Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
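A rough sketch of the fallback idea follows. It is not the code in this commit: unicode_space_candidates and resolve_with_fallback are hypothetical names, the existence check is passed in as a closure so the logic can be exercised without touching the filesystem, and only a space directly before "AM"/"PM" is swapped (an assumption about which variants are worth trying).

```rust
use std::path::{Path, PathBuf};

/// U+202F NARROW NO-BREAK SPACE, as used by macOS in screenshot names.
const NNBSP: char = '\u{202F}';

/// Candidate spellings of `path`: the original first, then (if different)
/// a variant where a regular space directly before "AM"/"PM" (any case)
/// is replaced by U+202F.
fn unicode_space_candidates(path: &str) -> Vec<String> {
    let chars: Vec<char> = path.chars().collect();
    let mut variant = String::with_capacity(path.len() + 2);
    for (i, &c) in chars.iter().enumerate() {
        let next_two: String = chars.iter().skip(i + 1).take(2).collect();
        let before_meridiem =
            next_two.eq_ignore_ascii_case("am") || next_two.eq_ignore_ascii_case("pm");
        variant.push(if c == ' ' && before_meridiem { NNBSP } else { c });
    }
    let mut out = vec![path.to_string()];
    if variant != path {
        out.push(variant);
    }
    out
}

/// Returns the first candidate for which `exists` reports true.
fn resolve_with_fallback(path: &str, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    unicode_space_candidates(path)
        .into_iter()
        .map(PathBuf::from)
        .find(|p| exists(p.as_path()))
}
```

In real use the closure would be something like `|p| p.exists()`. Because the original spelling is always tried first, paths that already resolve are unaffected.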
- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes
Fixes an issue where the LLM would call final_output after completing the
first phase, causing the agent to stop prematurely instead of continuing
with the remaining phases.
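The gate can be pictured roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and incomplete items are assumed to be unchecked markdown checkboxes ("- [ ]"), which may not match the real TODO format.

```rust
/// Counts unchecked markdown checkboxes. The "- [ ]" format is an assumption.
fn incomplete_todo_count(todo: &str) -> usize {
    todo.lines()
        .filter(|line| line.trim_start().starts_with("- [ ]"))
        .count()
}

/// Ok(()) lets final_output proceed; Err carries a message sent back to the LLM.
fn check_final_output(autonomous: bool, todo: &str) -> Result<(), String> {
    // Interactive and chat modes are never blocked.
    if !autonomous {
        return Ok(());
    }
    match incomplete_todo_count(todo) {
        0 => Ok(()),
        n => Err(format!(
            "{n} TODO item(s) are still incomplete; continue working before calling final_output."
        )),
    }
}
```

Returning the rejection as an error string keeps the check separate from how the prompt back to the LLM is delivered.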
- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
- Move tool_executed = true after duplicate check to prevent auto-continue
  from triggering when only duplicate tools were detected
- Reset parser state when duplicate detected to clear any partial/polluted
  state from LLM stuttering or example tool calls in markdown blocks
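The ordering change can be illustrated with a simplified loop. Everything here is hypothetical (ParserState, process_calls, tool calls modeled as plain strings); it only shows that the flag is set after the duplicate check and that the parser is reset on a duplicate.

```rust
/// Stand-in for the streaming parser's internal state.
#[derive(Default)]
struct ParserState {
    buffer: String,
}

impl ParserState {
    fn reset(&mut self) {
        self.buffer.clear();
    }
}

/// Runs the non-duplicate calls and reports whether any call actually ran.
/// `previous_call` is the last call from the previous message, if any.
fn process_calls(
    calls: &[&str],
    previous_call: Option<&str>,
    parser: &mut ParserState,
    executed: &mut Vec<String>,
) -> bool {
    let mut tool_executed = false;
    let mut previous = previous_call;
    for &call in calls {
        if previous == Some(call) {
            // Duplicate: drop partial/polluted parser state and skip it.
            parser.reset();
            continue;
        }
        executed.push(call.to_string());
        // Set only after the duplicate check, so duplicates alone
        // never make auto-continue believe a tool ran.
        tool_executed = true;
        previous = Some(call);
    }
    tool_executed
}
```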
- Print └─ before images to break out of tool output box
- Print ┌─ after images to resume tool output box
- Remove │ prefix from image preview and info lines
- Info line uses single space prefix, dimmed text
- Only include error messages in tool result (success info printed via imgcat)
- Remove │ prefix before image preview, use single space instead
- Keep info line on its own line with │ prefix
- Keep blank line spacing between images
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
TODO lists are now stored in .g3/sessions/<session_id>/todo.g3.md instead
of the workspace root. This prevents different g3 sessions from accidentally
picking up or overwriting each other's TODOs.
Changes:
- Add get_session_todo_path() function in paths.rs
- Update todo_read/todo_write handlers to use session-specific paths
- Remove TODO loading at Agent initialization (sessions start fresh)
- Update prompts to reflect session-scoped behavior
Fallback behavior preserved for planner mode (G3_TODO_PATH env var).
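As a rough sketch of the path layout (not the actual get_session_todo_path() in paths.rs): the function names below are hypothetical, the .g3 directory is assumed to sit under the workspace root, and the override parameter stands in for the G3_TODO_PATH environment variable, whose exact precedence is also an assumption.

```rust
use std::path::{Path, PathBuf};

/// <workspace>/.g3/sessions/<session_id>/todo.g3.md
fn session_todo_path(workspace: &Path, session_id: &str) -> PathBuf {
    workspace
        .join(".g3")
        .join("sessions")
        .join(session_id)
        .join("todo.g3.md")
}

/// `override_path` models a value read from G3_TODO_PATH (planner mode).
/// A non-empty override wins; otherwise the session-scoped path is used.
fn resolve_todo_path(workspace: &Path, session_id: &str, override_path: Option<&str>) -> PathBuf {
    match override_path {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => session_todo_path(workspace, session_id),
    }
}
```

A caller could pass `std::env::var("G3_TODO_PATH").ok().as_deref()` as the override. Two sessions with different ids never resolve to the same file unless the override is set.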
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.
Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait
Usage: use the shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>
Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
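The core mechanics (named tracking, log capture, cleanup on drop) can be sketched with the standard library alone. This is an illustration, not the contents of background_process.rs: BackgroundProcesses and its methods are hypothetical names, and the one-log-file-per-name layout is an assumption.

```rust
use std::collections::HashMap;
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};
use std::process::{Child, Command, Stdio};

/// Tracks spawned children by name and kills them when dropped.
struct BackgroundProcesses {
    log_dir: PathBuf,
    children: HashMap<String, Child>,
}

impl BackgroundProcesses {
    fn new(log_dir: &Path) -> Self {
        Self { log_dir: log_dir.to_path_buf(), children: HashMap::new() }
    }

    /// Starts `program` with `args`, sending stdout and stderr to
    /// <log_dir>/<name>.log. Returns the PID and the log file path.
    fn start(&mut self, name: &str, program: &str, args: &[&str]) -> io::Result<(u32, PathBuf)> {
        fs::create_dir_all(&self.log_dir)?;
        let log_path = self.log_dir.join(format!("{name}.log"));
        let log = File::create(&log_path)?;
        let child = Command::new(program)
            .args(args)
            .stdin(Stdio::null())
            // Both streams share one file handle, so output stays interleaved.
            .stdout(Stdio::from(log.try_clone()?))
            .stderr(Stdio::from(log))
            .spawn()?;
        let pid = child.id();
        self.children.insert(name.to_string(), child);
        Ok((pid, log_path))
    }
}

impl Drop for BackgroundProcesses {
    fn drop(&mut self) {
        for child in self.children.values_mut() {
            // Best effort: the process may already have exited.
            let _ = child.kill();
            let _ = child.wait();
        }
    }
}
```

A caller would construct it with something like `BackgroundProcesses::new(Path::new("logs/background_processes"))` and hand the returned PID and log path to the shell tool for tail, ps, and kill.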
When the LLM outputs text containing tool call patterns (e.g., reading
log files, showing examples, or discussing tool calls), the parser's
has_unexecuted_tool_call() would detect these as real tool calls and
trigger auto-continue, leading to repeated empty responses.
The fix: mark the parser buffer as consumed when content is displayed.
This prevents tool-call-like patterns in displayed text from triggering
false positives later. The fix is safe because:
1. Only runs when no tool was detected (inside 'if !tool_executed')
2. Legitimate tool calls are detected first by process_chunk()
3. Matches existing pattern of calling mark_tool_calls_consumed()
after tool execution
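A toy model of the consumed-offset idea is shown below. It is not the real StreamingToolParser: SketchParser and on_content_displayed are hypothetical, and a tool call is assumed to be recognizable by the marker `{"tool":`, which is a simplification.

```rust
/// Toy parser: text before `consumed` has already been handled.
struct SketchParser {
    buffer: String,
    consumed: usize, // byte offset into `buffer`
}

impl SketchParser {
    fn new() -> Self {
        Self { buffer: String::new(), consumed: 0 }
    }

    fn push(&mut self, chunk: &str) {
        self.buffer.push_str(chunk);
    }

    /// Only the unconsumed tail is scanned (assumed marker: `{"tool":`).
    fn has_unexecuted_tool_call(&self) -> bool {
        self.buffer[self.consumed..].contains("{\"tool\":")
    }

    fn mark_consumed(&mut self) {
        // buffer.len() is always a valid char boundary.
        self.consumed = self.buffer.len();
    }
}

/// Called once content has been shown to the user.
fn on_content_displayed(parser: &mut SketchParser, tool_executed: bool) {
    if !tool_executed {
        // Displayed text can no longer count as a pending tool call.
        parser.mark_consumed();
    }
}
```

Text pushed after the mark is still scanned, so a genuine tool call arriving later is not hidden.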
The auto-continue logic was adding User continue prompts without first
adding an Assistant message when the LLM returned an empty response.
This caused consecutive User messages in the conversation history,
which confused the LLM and caused it to return more empty responses.
The fix ensures an Assistant message is always added before the continue
prompt, using '[empty response]' as a placeholder when the LLM returned
nothing substantive. This maintains proper User/Assistant alternation.
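The alternation guarantee can be sketched like this. Role, Message, and push_continue_prompt are hypothetical stand-ins for the real types; only the '[empty response]' placeholder comes from this change.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

/// Always records an Assistant turn before the User continue prompt.
fn push_continue_prompt(history: &mut Vec<Message>, assistant_text: &str, prompt: &str) {
    let content = if assistant_text.trim().is_empty() {
        // Placeholder keeps User/Assistant alternation intact.
        "[empty response]".to_string()
    } else {
        assistant_text.to_string()
    };
    history.push(Message { role: Role::Assistant, content });
    history.push(Message { role: Role::User, content: prompt.to_string() });
}
```

Repeated empty responses then produce User, Assistant, User, Assistant, and so on, never two User messages in a row.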
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:
- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction
All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.
Added a quality-of-life feature that displays:
- Tokens used in the current turn (from LLM response, not estimated)
- Current context window usage percentage
These are displayed dimmed after the timing info:
⏱️ 1.2s | 💭 0.3s 1234tk | 45% ctx
The token count is read from the usage data in the LLM's response rather
than estimated. If the response carries no usage data, only the context
percentage is shown.
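The formatting of the token/context part might look roughly like this. usage_suffix is a hypothetical name, and truncating the percentage toward zero is an assumption; the timing prefix and dimming are left out.

```rust
/// Builds the "1234tk | 45% ctx" part of the status line.
/// `turn_tokens` is None when the LLM response carried no usage data.
fn usage_suffix(turn_tokens: Option<u32>, used: u32, window: u32) -> String {
    // Integer percentage, truncated; guards against a zero-sized window.
    let pct = if window == 0 {
        0
    } else {
        u64::from(used) * 100 / u64::from(window)
    };
    match turn_tokens {
        Some(tk) => format!("{tk}tk | {pct}% ctx"),
        None => format!("{pct}% ctx"),
    }
}
```

For example, `usage_suffix(Some(1234), 45_000, 100_000)` yields "1234tk | 45% ctx", and passing None for the token count yields "45% ctx".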
Added 13 tests to verify that duplicate detection only catches
IMMEDIATELY SEQUENTIAL duplicates:
- test_find_complete_json_object_end_* - Tests for JSON parsing helper
- test_same_tool_with_text_between_not_duplicate - Key test ensuring
tool calls separated by text are NOT duplicates
- test_different_tools_back_to_back_not_duplicate
- test_same_tool_different_args_not_duplicate
- test_identical_tool_calls_back_to_back_are_duplicates
- test_has_text_after_tool_call - Tests text detection logic
- test_tool_call_with_newlines_between
- test_tool_call_with_whitespace_text_between
- test_tool_call_in_middle_of_text
- test_multiple_different_tool_calls_with_text
Also made find_complete_json_object_end public for testing.
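The rule these tests pin down can be sketched over a simplified stream of segments. Segment and sequential_duplicates are hypothetical, and treating whitespace-only text as not separating two calls is an assumption, not something the test names confirm.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Segment {
    Text(String),
    ToolCall { tool: String, args: String },
}

/// Returns indices of tool calls that repeat the call immediately before
/// them, with no real text in between.
fn sequential_duplicates(segments: &[Segment]) -> Vec<usize> {
    let mut dups = Vec::new();
    let mut prev_call: Option<&Segment> = None;
    for (i, seg) in segments.iter().enumerate() {
        match seg {
            // Assumption: whitespace-only text does not break adjacency.
            Segment::Text(t) if t.trim().is_empty() => {}
            // Any real text between calls means the next call is not a duplicate.
            Segment::Text(_) => prev_call = None,
            Segment::ToolCall { .. } => {
                if prev_call == Some(seg) {
                    dups.push(i);
                }
                prev_call = Some(seg);
            }
        }
    }
    dups
}
```

Same tool with different arguments, different tools back to back, and identical calls separated by text all come out as non-duplicates.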