Commit Graph

389 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
76bfb77f84 further fowler fixes and session fixes 2026-01-03 15:47:04 +11:00
Dhanji R. Prasanna
65867e7f96 refactor tools out of lib.rs 2026-01-03 15:06:34 +11:00
Dhanji R. Prasanna
595ad6ad21 agent mode resumption 2026-01-03 14:50:08 +11:00
Dhanji R. Prasanna
016efc1db6 Prevent agent mode from stopping after first TODO phase
- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes

Fixes issue where LLM would call final_output after completing first phase,
causing agent to stop prematurely instead of continuing with remaining phases.
2025-12-27 12:35:31 +11:00
Dhanji R. Prasanna
8d071d5eed fix: fowler agent now respects --workspace flag and reads project docs
- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
2025-12-26 15:24:20 +11:00
Dhanji R. Prasanna
4c25e43ee4 refactoring 2025-12-26 15:16:12 +11:00
Dhanji R. Prasanna
7e59e181f7 context line ui 2025-12-26 12:58:13 +11:00
Dhanji R. Prasanna
666be4ff40 Fix duplicate tool call handling: move tool_executed flag and reset parser
- Move tool_executed = true after duplicate check to prevent auto-continue
  from triggering when only duplicate tools were detected
- Reset parser state when duplicate detected to clear any partial/polluted
  state from LLM stuttering or example tool calls in markdown blocks
2025-12-26 11:55:57 +11:00
Dhanji R. Prasanna
46611d9e13 Improve read_image output formatting
- Add newline after └─ before first image preview
- Show only filename (not full path) in info line
2025-12-26 11:36:10 +11:00
Dhanji R. Prasanna
2a4dad2842 Update read_image output with box drawing characters
- Print └─ before images to break out of tool output box
- Print ┌─ after images to resume tool output box
- Remove │ prefix from image preview and info lines
- Info line uses single space prefix, dimmed text
- Only include error messages in tool result (success info printed via imgcat)
2025-12-26 11:29:33 +11:00
Dhanji R. Prasanna
e688d3b29f Simplify read_image imgcat output formatting
- Remove │ prefix before image preview, use single space instead
- Keep info line on its own line with │ prefix
- Keep blank line spacing between images
2025-12-26 11:24:13 +11:00
Dhanji R. Prasanna
3601cc0547 Enhance read_image tool with magic byte detection and multi-image support
- Fix media type detection using magic bytes instead of file extension
  - Correctly identifies JPEG files with .png extension (and vice versa)
  - Supports PNG, JPEG, GIF, and WebP formats

- Add multi-image support with file_paths array parameter
  - Load multiple images in a single tool call
  - All images queued for LLM analysis

- Enhanced CLI output:
  - Inline image preview via iTerm2 imgcat protocol (height=5)
  - Dimmed info line showing: path | dimensions | media type | file size
  - Proper │ prefix alignment with tool output boxing
  - Human-readable file sizes (bytes, KB, MB)

- Add image dimension extraction from file headers
  - PNG, JPEG, GIF, WebP dimension parsing

- Add comprehensive tests for magic byte detection and dimensions
2025-12-26 11:19:37 +11:00
Dhanji R. Prasanna
3ece02ff31 fix: resolve compiler warnings across crates
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
2025-12-25 18:47:22 +11:00
Dhanji R. Prasanna
258f9878ff style: use ◉ symbol for token count in timing footer
Changes '227tk | 48% ctx' to '227 ◉ | 48%' for a cleaner look.
2025-12-25 18:40:17 +11:00
Dhanji R. Prasanna
d09c80180e fix: remove redundant TODO list header that breaks boxing effect 2025-12-25 18:34:51 +11:00
Dhanji R. Prasanna
64f27c0abc feat: move TODO lists to session-scoped directories
TODO lists are now stored in .g3/sessions/<session_id>/todo.g3.md instead
of the workspace root. This prevents different g3 sessions from accidentally
picking up or overwriting each other's TODOs.

Changes:
- Add get_session_todo_path() function in paths.rs
- Update todo_read/todo_write handlers to use session-specific paths
- Remove TODO loading at Agent initialization (sessions start fresh)
- Update prompts to reflect session-scoped behavior

Fallback behavior preserved for planner mode (G3_TODO_PATH env var).
2025-12-25 18:33:03 +11:00
Dhanji R. Prasanna
d9c58576a1 feat: add background_process tool for launching long-running processes
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.

Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait

Usage: Use shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>

Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
2025-12-25 18:23:10 +11:00
Dhanji R. Prasanna
9ff5ba6098 Fix auto-continue false positives from tool-call-like content
When the LLM outputs text containing tool call patterns (e.g., reading
log files, showing examples, or discussing tool calls), the parser's
has_unexecuted_tool_call() would detect these as real tool calls and
trigger auto-continue, leading to repeated empty responses.

The fix: mark the parser buffer as consumed when content is displayed.
This prevents tool-call-like patterns in displayed text from triggering
false positives later. The fix is safe because:

1. Only runs when no tool was detected (inside 'if !tool_executed')
2. Legitimate tool calls are detected first by process_chunk()
3. Matches existing pattern of calling mark_tool_calls_consumed()
   after tool execution
2025-12-25 17:55:13 +11:00
Dhanji R. Prasanna
f9d0c33461 Revert "Fix auto-continue bug: ensure assistant message before continue prompt"
This reverts commit fe96969adb.
2025-12-24 15:52:23 +11:00
Dhanji R. Prasanna
fe96969adb Fix auto-continue bug: ensure assistant message before continue prompt
The auto-continue logic was adding User continue prompts without first
adding an Assistant message when the LLM returned an empty response.
This caused consecutive User messages in the conversation history,
which confused the LLM and caused it to return more empty responses.

The fix ensures an Assistant message is always added before the continue
prompt, using '[empty response]' as a placeholder when the LLM returned
nothing substantive. This maintains proper User/Assistant alternation.
2025-12-24 15:50:30 +11:00
Dhanji R. Prasanna
cd64ebbf87 Add tokens consumed and context percentage to per-tool timing footer
The per-tool timing line now shows:
- Tokens delta (tokens added to context by this tool call)
- Context window usage percentage

Example: └─ ️ 1ms  523tk | 49% ctx

Changes:
- Updated UiWriter trait print_tool_timing signature
- Track tokens before/after adding tool messages to calculate delta
- Updated ConsoleUiWriter, MachineUiWriter, PlannerUiWriter, and test mocks
2025-12-24 15:44:19 +11:00
Dhanji R. Prasanna
fd22ce9890 refactor(g3-core): extract 4 modules from monolithic lib.rs
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:

- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction

All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.
2025-12-24 14:32:39 +11:00
Dhanji R. Prasanna
382b905441 duplicate output fix 2025-12-23 17:20:23 +11:00
Dhanji R. Prasanna
ed246ce434 consolidate .g3/session -> .g3/sessions/* 2025-12-23 16:22:12 +11:00
Dhanji R. Prasanna
743d622468 Add token usage and context % to timing footer
Added a quality-of-life feature that displays:
- Tokens used in the current turn (from LLM response, not estimated)
- Current context window usage percentage

These are displayed dimmed after the timing info:
  ⏱️ 1.2s | 💭 0.3s  1234tk | 45% ctx

The token count comes directly from the LLM's usage response data,
not from any estimation. If no usage data is available from the LLM,
only the context percentage is shown.
2025-12-22 17:22:54 +11:00
Dhanji R. Prasanna
10e2fe9b94 Add tests for duplicate detection logic
Added 13 tests to verify that duplicate detection only catches
IMMEDIATELY SEQUENTIAL duplicates:

- test_find_complete_json_object_end_* - Tests for JSON parsing helper
- test_same_tool_with_text_between_not_duplicate - Key test ensuring
  tool calls separated by text are NOT duplicates
- test_different_tools_back_to_back_not_duplicate
- test_same_tool_different_args_not_duplicate
- test_identical_tool_calls_back_to_back_are_duplicates
- test_has_text_after_tool_call - Tests text detection logic
- test_tool_call_with_newlines_between
- test_tool_call_with_whitespace_text_between
- test_tool_call_in_middle_of_text
- test_multiple_different_tool_calls_with_text

Also made find_complete_json_object_end public for testing.
2025-12-22 17:11:05 +11:00
Dhanji R. Prasanna
c7204c6699 Fix tool call detection and duplicate handling issues
1. Set tool_executed=true when a tool call is detected, even if skipped
   as a duplicate. This prevents the raw JSON from being printed to screen
   when a tool call is detected but not executed.

2. Remove session-level duplicate detection entirely. All tools should be
   allowed to be called multiple times in a session.

3. Fix sequential duplicate detection to only catch IMMEDIATELY sequential
   duplicates:

   - DUP IN CHUNK: Now only checks if the PREVIOUS tool call in the chunk
     is the same (not any tool call in the chunk)

   - DUP IN MSG: Now only checks if the LAST tool call in the previous
     message matches AND there's no text after it. If there's any
     non-whitespace text between tool calls, they're not considered
     duplicates.

This allows legitimate re-use of tools while still catching cases where
the LLM stutters and outputs the same tool call twice in a row.
2025-12-22 17:03:07 +11:00
Dhanji R. Prasanna
da91459e09 Fix auto-continue bug: don't return early when tools executed but final_output not called
The bug was in the chunk.finished block inside stream_completion_with_tools.
When no tool was executed in the CURRENT iteration (!tool_executed), the code
would return early without checking if tools were executed in PREVIOUS iterations
(any_tool_executed) and final_output was never called.

This caused the agent to terminate prematurely after executing tools like
todo_read when the LLM responded with text instead of calling final_output.

The fix adds a check: if any_tool_executed && !final_output_called, we break
to let the outer loop's auto-continue logic prompt the LLM to continue.

Also fixed missing debug! import in g3-console/src/main.rs.
2025-12-22 16:45:17 +11:00
Dhanji R. Prasanna
923def0ab2 Convert all INFO logs to DEBUG to reduce CLI noise
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.

Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
2025-12-22 16:27:35 +11:00
Dhanji R. Prasanna
58cbf3431a Fix auto-continue bug: don't mark tool calls consumed prematurely
The bug: When the LLM emitted multiple tool calls in one response (e.g.,
str_replace followed by shell), only the first tool was executed. The
remaining tools were lost because mark_tool_calls_consumed() was called
BEFORE processing, marking ALL tools as consumed even when only ONE was
being processed.

This caused has_unexecuted_tool_call() to return false after executing
the first tool, so the parser was reset and the remaining tool calls
were discarded. The auto-continue logic never triggered because it
thought all tools had been handled.

The fix: Remove the premature mark_tool_calls_consumed() call. The
existing logic at line 4696-4699 already handles marking tools as
consumed AFTER execution, and correctly checks for remaining unexecuted
tools before deciding whether to reset the parser.
2025-12-22 16:24:11 +11:00
Dhanji R. Prasanna
3a07a02b02 Add comprehensive tests for StreamingToolParser
Tests cover:
- Multiple tool calls in one response (single chunk and across chunks)
- Tool call followed by text (before, after, and both)
- Incomplete tool calls at various truncation points
- Parser reset behavior (buffer, incomplete state, unexecuted state)
- Buffer management and edge cases (streaming accumulation, empty chunks)
- JSON edge cases (escaped quotes, backslashes, nested braces)
- Tool call pattern variations (spacing, newlines)
- mark_tool_calls_consumed() functionality
- Duplicate tool call detection
- Multiple tool calls returned on stream finish
- has_message_like_keys validation
2025-12-22 16:10:34 +11:00
Dhanji R. Prasanna
8070147a0c Fix multiple tool call handling and improve auto-continue logic
- Add last_consumed_position tracking to StreamingToolParser to prevent
  re-detecting already-executed tool calls
- Add mark_tool_calls_consumed() method to mark tool calls as processed
- Add find_first_tool_call_start() for forward scanning of tool patterns
- Replace try_parse_json_tool_call_from_buffer() with
  try_parse_all_json_tool_calls_from_buffer() to find ALL tool calls
- Update has_incomplete_tool_call() and has_unexecuted_tool_call() to
  only check unconsumed portion of buffer
- Fix tool execution loop to not reset parser when unexecuted tools remain
- Simplify should_auto_continue logic (remove redundant condition)
- Add comprehensive tests for auto-continue condition logic
2025-12-22 16:08:57 +11:00
Dhanji R. Prasanna
a755301cf9 attempt 2 2025-12-22 15:33:23 +11:00
Dhanji R. Prasanna
0e4febc3fb attempted fix of autocontinue 2025-12-22 15:01:27 +11:00
Dhanji R. Prasanna
38fcaaf449 Add edge case tests for filter_json_tool_calls
- test_brace_inside_json_string_value: braces inside JSON strings
- test_multiple_braces_in_string: multiple braces in string values
- test_escaped_quotes_with_braces: escaped quotes with braces
- test_brace_in_string_across_chunks: streaming with braces in strings
- test_complex_nested_with_string_braces: nested JSON with string braces
- test_str_replace_with_diff_content: real-world str_replace case
- test_tool_call_after_other_content: tool call after other output
- test_tool_call_with_nested_tool_pattern_in_string: nested patterns

All 27 tests pass.
2025-12-22 13:30:57 +11:00
Dhanji R. Prasanna
3bc254962c clean up filter_json a bit (more to come) 2025-12-22 12:03:09 +11:00
Dhanji R. Prasanna
01a5284d6d Move fixed_filter_json from g3-core to g3-cli
Properly separates UI display concern from core library:
- fixed_filter_json module now lives in g3-cli (UI layer)
- UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods
- g3-core delegates filtering to UI layer via trait methods
- Different UiWriter implementations can choose their own filtering behavior
- ConsoleUiWriter filters JSON tool calls for clean terminal display
- MachineUiWriter/NullUiWriter use default pass-through

Benefits:
- Proper separation of concerns
- Core stays clean without display-specific logic
- Testability - filter can be tested independently in g3-cli
2025-12-22 10:32:21 +11:00
Dhanji R. Prasanna
fbf31e5f68 Fix continuation errors: auto-continue when final_output not called
- Add final_output_called flag to track if LLM properly completed
- Auto-continue with prompt if tools executed but final_output missing
- Remove unused last_action_was_tool and any_text_response variables
- Simplifies previous complex incomplete response detection logic
2025-12-20 15:32:12 +11:00
Dhanji R. Prasanna
ba8bd371fc fix randomly ending iteration 2025-12-19 16:40:01 +11:00
Dhanji R. Prasanna
e771382bd0 agent mode + fowler bot 2025-12-19 16:14:03 +11:00
Dhanji R. Prasanna
b4f6da6bf2 duplicate tool call bugfix 2025-12-19 15:24:03 +11:00
Dhanji R. Prasanna
faa6512b1f Revert to Safari as default WebDriver browser
Chrome headless has too many issues:
- Session creation hangs when Chrome is already running
- Cloudflare and other bot protection blocks headless browsers
- Version mismatch issues between Chrome and ChromeDriver

Safari is more reliable for web automation on macOS.
Chrome headless is still available via --chrome-headless flag.
2025-12-16 12:36:18 +11:00
Dhanji R. Prasanna
bbe57b4764 Fix ChromeDriver session hanging when Chrome is already running
- Add unique user-data-dir per process to avoid profile conflicts
- Add 30-second timeout to connection attempts to prevent indefinite hangs
- Fix borrow checker issue with ClientBuilder

The session creation was hanging because ChromeDriver was trying to
use the same profile as the running Chrome browser. Using a unique
temp directory (/tmp/g3-chrome-{pid}) isolates the headless session.
2025-12-15 17:36:34 +11:00
Dhanji R. Prasanna
81cba42c8d Add Chrome for Testing support for reliable WebDriver automation
- Add setup script (scripts/setup-chrome-for-testing.sh) that downloads
  matching Chrome and ChromeDriver versions from Google's CDN
- Add chrome_binary config option to specify custom Chrome binary path
- Update ChromeDriver to support custom binary via with_port_headless_and_binary()
- Update README with Chrome for Testing setup instructions
- Update config.example.toml with chrome_binary documentation

Chrome for Testing is Google's dedicated browser for automated testing
that guarantees version compatibility with ChromeDriver, avoiding the
common 'version mismatch' errors when Chrome auto-updates.
2025-12-15 17:02:30 +11:00
Dhanji R. Prasanna
d142cdfffe Improve ChromeDriver connection reliability with retry loop
- Replace simple 1.5s sleep with retry loop (10 attempts, 200ms apart)
- Better error reporting showing number of attempts
- More robust handling of ChromeDriver startup timing
2025-12-15 16:57:15 +11:00
Dhanji R. Prasanna
3d1b86d24b Make Chrome headless the default WebDriver browser
- Add --safari flag to CLI for explicitly choosing Safari
- Update --chrome-headless flag description to indicate it's the default
- Update README to reflect Chrome headless as default
- Remove broken link to non-existent docs/webdriver-setup.md
- Add Safari flag handling in all webdriver config locations

The config already had ChromeHeadless as the default, this commit
updates the CLI and documentation to match.
2025-12-15 16:51:42 +11:00
Dhanji R. Prasanna
d32bd9be03 Enable webdriver by default 2025-12-15 15:31:04 +11:00
Jochen
7b47495881 Document retry config location and verify planning mode logic
Add documentation for retry configuration in planning mode:
- Document retry settings in .g3.toml under [agent] section
- Note RetryConfig implementation in g3-core/src/retry.rs
- Clarify hardcoded vs config-based retry values

Verify existing retry loop and coach feedback parsing:
- Confirm execute_with_retry() handles recoverable errors
- Document feedback extraction source priority order
- Provide manual verification steps for testing
2025-12-11 14:56:27 +11:00
Jochen
1a13fc5345 Add explicit flush to append_entry and strengthen commit ordering docs
Add file.flush() call in append_entry() to ensure planner history
entries are written to disk before git commits execute. While the
file handle drop should flush, explicit flush simplifies reasoning
about the ordering invariant.

Extend code comments in stage_and_commit() to document that the
write_git_commit-before-git::commit ordering has regressed multiple
times and must be preserved in any refactoring.

Requirements: completed_requirements_2025-12-11_10-05-08.md
2025-12-11 10:05:39 +11:00
Jochen
b3ac7746b9 Preserve planner history ordering and add regression guardrails
Ensure planner writes GIT COMMIT entry before invoking git commit.
Keep history entry even when git commit fails, matching summary text.
Document invariant in code comment above write_git_commit call.
Add lightweight test to assert history write precedes git::commit using
test doubles instead of a real git repository.
Investigate git history to find regression and its prior fix, and
record a short root-cause summary outside the codebase.
Reference completed_requirements_2025-12-10_16-55-05.md for details.
Reference completed_todo_2025-12-10_16-55-05.md for task tracking.
2025-12-10 16:55:24 +11:00