- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes
Fixes issue where LLM would call final_output after completing first phase,
causing agent to stop prematurely instead of continuing with remaining phases.
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.
Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait
Usage: Use shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>
Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
Added 13 tests to verify that duplicate detection only catches
IMMEDIATELY SEQUENTIAL duplicates:
- test_find_complete_json_object_end_* - Tests for JSON parsing helper
- test_same_tool_with_text_between_not_duplicate - Key test ensuring
tool calls separated by text are NOT duplicates
- test_different_tools_back_to_back_not_duplicate
- test_same_tool_different_args_not_duplicate
- test_identical_tool_calls_back_to_back_are_duplicates
- test_has_text_after_tool_call - Tests text detection logic
- test_tool_call_with_newlines_between
- test_tool_call_with_whitespace_text_between
- test_tool_call_in_middle_of_text
- test_multiple_different_tool_calls_with_text
Also made find_complete_json_object_end public for testing.
- Add last_consumed_position tracking to StreamingToolParser to prevent
re-detecting already-executed tool calls
- Add mark_tool_calls_consumed() method to mark tool calls as processed
- Add find_first_tool_call_start() for forward scanning of tool patterns
- Replace try_parse_json_tool_call_from_buffer() with
try_parse_all_json_tool_calls_from_buffer() to find ALL tool calls
- Update has_incomplete_tool_call() and has_unexecuted_tool_call() to
only check unconsumed portion of buffer
- Fix tool execution loop to not reset parser when unexecuted tools remain
- Simplify should_auto_continue logic (remove redundant condition)
- Add comprehensive tests for auto-continue condition logic
Properly separates UI display concern from core library:
- fixed_filter_json module now lives in g3-cli (UI layer)
- UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods
- g3-core delegates filtering to UI layer via trait methods
- Different UiWriter implementations can choose their own filtering behavior
- ConsoleUiWriter filters JSON tool calls for clean terminal display
- MachineUiWriter/NullUiWriter use default pass-through
Benefits:
- Proper separation of concerns
- Core stays clean without display-specific logic
- Testability - filter can be tested independently in g3-cli
- Add final_output_called flag to track if LLM properly completed
- Auto-continue with prompt if tools executed but final_output missing
- Remove unused last_action_was_tool and any_text_response variables
- Simplifies previous complex incomplete response detection logic
Improve planner mode user experience with better error reporting,
cleaner tool output, and consistent log file placement.
- Propagate and display classified LLM errors to users with
appropriate icons and context
- Display tool calls on single lines with truncated arguments
- Show LLM text responses without overwriting via UiWriter
- Ensure all logs write to workspace/logs directory consistently
- Set G3_WORKSPACE_PATH early in planning mode initialization
When the CW is full, max_tokens is often passed at 0 or tiny. The LLM will fail. For Anthropic with thining, there is also the thinking budget.
This can happen during summary attempts, in that case
first try thinnify, skinnify etc..
Writes the current context window to logs/current_context_window (uses a symlink to a session ID).
This PR was unfortunately generated by a different LLM and did a ton of superficial reformating, it's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's ok.