- Add last_consumed_position tracking to StreamingToolParser to prevent
re-detecting already-executed tool calls
- Add mark_tool_calls_consumed() method to mark tool calls as processed
- Add find_first_tool_call_start() for forward scanning of tool patterns
- Replace try_parse_json_tool_call_from_buffer() with
try_parse_all_json_tool_calls_from_buffer() to find ALL tool calls
- Update has_incomplete_tool_call() and has_unexecuted_tool_call() to
only check unconsumed portion of buffer
- Fix tool execution loop to not reset parser when unexecuted tools remain
- Simplify should_auto_continue logic (remove redundant condition)
- Add comprehensive tests for auto-continue condition logic
Properly separates UI display concern from core library:
- fixed_filter_json module now lives in g3-cli (UI layer)
- UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods
- g3-core delegates filtering to UI layer via trait methods
- Different UiWriter implementations can choose their own filtering behavior
- ConsoleUiWriter filters JSON tool calls for clean terminal display
- MachineUiWriter/NullUiWriter use default pass-through
Benefits:
- Proper separation of concerns
- Core stays clean without display-specific logic
- Testability - filter can be tested independently in g3-cli
- Add final_output_called flag to track if LLM properly completed
- Auto-continue with prompt if tools executed but final_output missing
- Remove unused last_action_was_tool and any_text_response variables
- Simplifies previous complex incomplete response detection logic
Chrome headless has too many issues:
- Session creation hangs when Chrome is already running
- Cloudflare and other bot protection blocks headless browsers
- Version mismatch issues between Chrome and ChromeDriver
Safari is more reliable for web automation on macOS.
Chrome headless is still available via --chrome-headless flag.
- Add unique user-data-dir per process to avoid profile conflicts
- Add 30-second timeout to connection attempts to prevent indefinite hangs
- Fix borrow checker issue with ClientBuilder
The session creation was hanging because ChromeDriver was trying to
use the same profile as the running Chrome browser. Using a unique
temp directory (/tmp/g3-chrome-{pid}) isolates the headless session.
- Add setup script (scripts/setup-chrome-for-testing.sh) that downloads
matching Chrome and ChromeDriver versions from Google's CDN
- Add chrome_binary config option to specify custom Chrome binary path
- Update ChromeDriver to support custom binary via with_port_headless_and_binary()
- Update README with Chrome for Testing setup instructions
- Update config.example.toml with chrome_binary documentation
Chrome for Testing is Google's dedicated browser for automated testing
that guarantees version compatibility with ChromeDriver, avoiding the
common 'version mismatch' errors when Chrome auto-updates.
- Add --safari flag to CLI for explicitly choosing Safari
- Update --chrome-headless flag description to indicate it's the default
- Update README to reflect Chrome headless as default
- Remove broken link to non-existent docs/webdriver-setup.md
- Add Safari flag handling in all webdriver config locations
The config already had ChromeHeadless as the default, this commit
updates the CLI and documentation to match.
Add file.flush() call in append_entry() to ensure planner history
entries are written to disk before git commits execute. While the
file handle drop should flush, explicit flush simplifies reasoning
about the ordering invariant.
Extend code comments in stage_and_commit() to document that the
write_git_commit-before-git::commit ordering has regressed multiple
times and must be preserved in any refactoring.
Requirements: completed_requirements_2025-12-11_10-05-08.md
Ensure planner writes GIT COMMIT entry before invoking git commit.
Keep history entry even when git commit fails, matching summary text.
Document invariant in code comment above write_git_commit call.
Add lightweight test to assert history write precedes git::commit using
test doubles instead of a real git repository.
Investigate git history to find regression and its prior fix, and
record a short root-cause summary outside the codebase.
Reference completed_requirements_2025-12-10_16-55-05.md for details.
Reference completed_todo_2025-12-10_16-55-05.md for task tracking.
Resolve two critical issues in planner mode that persisted through
multiple fix attempts:
1. Remove excessive whitespace between tool call displays by replacing
direct println!() calls with ui_writer methods and eliminating
redundant newlines in agent response streaming.
2. Ensure all log files (errors, sessions, tool calls, context dumps)
are written to <workspace>/logs instead of codepath by properly
initializing G3_WORKSPACE_PATH from --workspace argument.
Improve planner mode user experience with better error reporting,
cleaner tool output, and consistent log file placement.
- Propagate and display classified LLM errors to users with
appropriate icons and context
- Display tool calls on single lines with truncated arguments
- Show LLM text responses without overwriting via UiWriter
- Ensure all logs write to workspace/logs directory consistently
- Set G3_WORKSPACE_PATH early in planning mode initialization
- Display coach feedback content (up to 25 lines) instead of just length
- Write GIT COMMIT entry to history before actual commit for better a...
- Implement single-line status updates during LLM processing with too...
- Display non-tool LLM text responses in planner UI
- Redirect all logs to <workspace>/logs directory instead of codepath
- Preserve TODO file in planner mode for history (prevent deletion)
Completed files:
- completed_requirements_2025-12-09_16-16-51.md
- completed_todo_2025-12-09_16-16-51.md
When the CW is full, max_tokens is often passed at 0 or tiny. The LLM will fail. For Anthropic with thining, there is also the thinking budget.
This can happen during summary attempts, in that case
first try thinnify, skinnify etc..