alex/g3 - g3 - Millerson GIT hosting

Author	SHA1	Message	Date
Dhanji R. Prasanna	c7204c6699	Fix tool call detection and duplicate handling issues 1. Set tool_executed=true when a tool call is detected, even if skipped as a duplicate. This prevents the raw JSON from being printed to screen when a tool call is detected but not executed. 2. Remove session-level duplicate detection entirely. All tools should be allowed to be called multiple times in a session. 3. Fix sequential duplicate detection to only catch IMMEDIATELY sequential duplicates: - DUP IN CHUNK: Now only checks if the PREVIOUS tool call in the chunk is the same (not any tool call in the chunk) - DUP IN MSG: Now only checks if the LAST tool call in the previous message matches AND there's no text after it. If there's any non-whitespace text between tool calls, they're not considered duplicates. This allows legitimate re-use of tools while still catching cases where the LLM stutters and outputs the same tool call twice in a row.	2025-12-22 17:03:07 +11:00
Dhanji R. Prasanna	da91459e09	Fix auto-continue bug: don't return early when tools executed but final_output not called The bug was in the chunk.finished block inside stream_completion_with_tools. When no tool was executed in the CURRENT iteration (!tool_executed), the code would return early without checking if tools were executed in PREVIOUS iterations (any_tool_executed) and final_output was never called. This caused the agent to terminate prematurely after executing tools like todo_read when the LLM responded with text instead of calling final_output. The fix adds a check: if any_tool_executed && !final_output_called, we break to let the outer loop's auto-continue logic prompt the LLM to continue. Also fixed missing debug! import in g3-console/src/main.rs.	2025-12-22 16:45:17 +11:00
Dhanji R. Prasanna	923def0ab2	Convert all INFO logs to DEBUG to reduce CLI noise Converted ~77 info! macro calls to debug! across the codebase to prevent log messages from interrupting the CLI experience during normal operation. Users can still see these logs by setting RUST_LOG=debug if needed. Affected crates: - g3-cli - g3-computer-control - g3-console - g3-core - g3-ensembles - g3-execution - g3-providers	2025-12-22 16:27:35 +11:00
Dhanji R. Prasanna	58cbf3431a	Fix auto-continue bug: don't mark tool calls consumed prematurely The bug: When the LLM emitted multiple tool calls in one response (e.g., str_replace followed by shell), only the first tool was executed. The remaining tools were lost because mark_tool_calls_consumed() was called BEFORE processing, marking ALL tools as consumed even when only ONE was being processed. This caused has_unexecuted_tool_call() to return false after executing the first tool, so the parser was reset and the remaining tool calls were discarded. The auto-continue logic never triggered because it thought all tools had been handled. The fix: Remove the premature mark_tool_calls_consumed() call. The existing logic at line 4696-4699 already handles marking tools as consumed AFTER execution, and correctly checks for remaining unexecuted tools before deciding whether to reset the parser.	2025-12-22 16:24:11 +11:00
Dhanji R. Prasanna	3a07a02b02	Add comprehensive tests for StreamingToolParser Tests cover: - Multiple tool calls in one response (single chunk and across chunks) - Tool call followed by text (before, after, and both) - Incomplete tool calls at various truncation points - Parser reset behavior (buffer, incomplete state, unexecuted state) - Buffer management and edge cases (streaming accumulation, empty chunks) - JSON edge cases (escaped quotes, backslashes, nested braces) - Tool call pattern variations (spacing, newlines) - mark_tool_calls_consumed() functionality - Duplicate tool call detection - Multiple tool calls returned on stream finish - has_message_like_keys validation	2025-12-22 16:10:34 +11:00
Dhanji R. Prasanna	8070147a0c	Fix multiple tool call handling and improve auto-continue logic - Add last_consumed_position tracking to StreamingToolParser to prevent re-detecting already-executed tool calls - Add mark_tool_calls_consumed() method to mark tool calls as processed - Add find_first_tool_call_start() for forward scanning of tool patterns - Replace try_parse_json_tool_call_from_buffer() with try_parse_all_json_tool_calls_from_buffer() to find ALL tool calls - Update has_incomplete_tool_call() and has_unexecuted_tool_call() to only check unconsumed portion of buffer - Fix tool execution loop to not reset parser when unexecuted tools remain - Simplify should_auto_continue logic (remove redundant condition) - Add comprehensive tests for auto-continue condition logic	2025-12-22 16:08:57 +11:00
Dhanji R. Prasanna	a755301cf9	attempt 2	2025-12-22 15:33:23 +11:00
Dhanji R. Prasanna	0e4febc3fb	attempted fix of autocontinue	2025-12-22 15:01:27 +11:00
Dhanji R. Prasanna	38fcaaf449	Add edge case tests for filter_json_tool_calls - test_brace_inside_json_string_value: braces inside JSON strings - test_multiple_braces_in_string: multiple braces in string values - test_escaped_quotes_with_braces: escaped quotes with braces - test_brace_in_string_across_chunks: streaming with braces in strings - test_complex_nested_with_string_braces: nested JSON with string braces - test_str_replace_with_diff_content: real-world str_replace case - test_tool_call_after_other_content: tool call after other output - test_tool_call_with_nested_tool_pattern_in_string: nested patterns All 27 tests pass.	2025-12-22 13:30:57 +11:00
Dhanji R. Prasanna	3bc254962c	clean up filter_json a bit (more to come)	2025-12-22 12:03:09 +11:00
Dhanji R. Prasanna	87d9b39ae4	update gitignore	2025-12-22 11:50:01 +11:00
Dhanji R. Prasanna	01a5284d6d	Move fixed_filter_json from g3-core to g3-cli Properly separates UI display concern from core library: - fixed_filter_json module now lives in g3-cli (UI layer) - UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods - g3-core delegates filtering to UI layer via trait methods - Different UiWriter implementations can choose their own filtering behavior - ConsoleUiWriter filters JSON tool calls for clean terminal display - MachineUiWriter/NullUiWriter use default pass-through Benefits: - Proper separation of concerns - Core stays clean without display-specific logic - Testability - filter can be tested independently in g3-cli	2025-12-22 10:32:21 +11:00
Dhanji R. Prasanna	fbf31e5f68	Fix continuation errors: auto-continue when final_output not called - Add final_output_called flag to track if LLM properly completed - Auto-continue with prompt if tools executed but final_output missing - Remove unused last_action_was_tool and any_text_response variables - Simplifies previous complex incomplete response detection logic	2025-12-20 15:32:12 +11:00
Dhanji R. Prasanna	ba8bd371fc	fix randomly ending iteration	2025-12-19 16:40:01 +11:00
Dhanji R. Prasanna	e771382bd0	agent mode + fowler bot	2025-12-19 16:14:03 +11:00
Dhanji R. Prasanna	b4f6da6bf2	duplicate tool call bugfix	2025-12-19 15:24:03 +11:00
Dhanji R. Prasanna	faa6512b1f	Revert to Safari as default WebDriver browser Chrome headless has too many issues: - Session creation hangs when Chrome is already running - Cloudflare and other bot protection blocks headless browsers - Version mismatch issues between Chrome and ChromeDriver Safari is more reliable for web automation on macOS. Chrome headless is still available via --chrome-headless flag.	2025-12-16 12:36:18 +11:00
Dhanji R. Prasanna	bbe57b4764	Fix ChromeDriver session hanging when Chrome is already running - Add unique user-data-dir per process to avoid profile conflicts - Add 30-second timeout to connection attempts to prevent indefinite hangs - Fix borrow checker issue with ClientBuilder The session creation was hanging because ChromeDriver was trying to use the same profile as the running Chrome browser. Using a unique temp directory (/tmp/g3-chrome-{pid}) isolates the headless session.	2025-12-15 17:36:34 +11:00
Dhanji R. Prasanna	81cba42c8d	Add Chrome for Testing support for reliable WebDriver automation - Add setup script (scripts/setup-chrome-for-testing.sh) that downloads matching Chrome and ChromeDriver versions from Google's CDN - Add chrome_binary config option to specify custom Chrome binary path - Update ChromeDriver to support custom binary via with_port_headless_and_binary() - Update README with Chrome for Testing setup instructions - Update config.example.toml with chrome_binary documentation Chrome for Testing is Google's dedicated browser for automated testing that guarantees version compatibility with ChromeDriver, avoiding the common 'version mismatch' errors when Chrome auto-updates.	2025-12-15 17:02:30 +11:00
Dhanji R. Prasanna	d142cdfffe	Improve ChromeDriver connection reliability with retry loop - Replace simple 1.5s sleep with retry loop (10 attempts, 200ms apart) - Better error reporting showing number of attempts - More robust handling of ChromeDriver startup timing	2025-12-15 16:57:15 +11:00
Dhanji R. Prasanna	3d1b86d24b	Make Chrome headless the default WebDriver browser - Add --safari flag to CLI for explicitly choosing Safari - Update --chrome-headless flag description to indicate it's the default - Update README to reflect Chrome headless as default - Remove broken link to non-existent docs/webdriver-setup.md - Add Safari flag handling in all webdriver config locations The config already had ChromeHeadless as the default, this commit updates the CLI and documentation to match.	2025-12-15 16:51:42 +11:00
Dhanji R. Prasanna	d32bd9be03	Enable webdriver by default	2025-12-15 15:31:04 +11:00
Jochen	4aa5bf75ce	Merge pull request #42 from dhanji/jochen-planner Add planning mode	2025-12-11 16:07:26 +11:00
Jochen	46fd6ed121	Merge pull request #41 from dhanji/jochen-fix-max_tokens Fix bugs where insufficient max_tokens were passed to LLM	2025-12-11 16:02:04 +11:00
Jochen	68fbc54812	Update README.md	2025-12-11 15:01:43 +11:00
Jochen	7b47495881	Document retry config location and verify planning mode logic Add documentation for retry configuration in planning mode: - Document retry settings in .g3.toml under [agent] section - Note RetryConfig implementation in g3-core/src/retry.rs - Clarify hardcoded vs config-based retry values Verify existing retry loop and coach feedback parsing: - Confirm execute_with_retry() handles recoverable errors - Document feedback extraction source priority order - Provide manual verification steps for testing	2025-12-11 14:56:27 +11:00
Jochen	1a13fc5345	Add explicit flush to append_entry and strengthen commit ordering docs Add file.flush() call in append_entry() to ensure planner history entries are written to disk before git commits execute. While the file handle drop should flush, explicit flush simplifies reasoning about the ordering invariant. Extend code comments in stage_and_commit() to document that the write_git_commit-before-git::commit ordering has regressed multiple times and must be preserved in any refactoring. Requirements: completed_requirements_2025-12-11_10-05-08.md	2025-12-11 10:05:39 +11:00
Jochen	b3ac7746b9	Preserve planner history ordering and add regression guardrails Ensure planner writes GIT COMMIT entry before invoking git commit. Keep history entry even when git commit fails, matching summary text. Document invariant in code comment above write_git_commit call. Add lightweight test to assert history write precedes git::commit using test doubles instead of a real git repository. Investigate git history to find regression and its prior fix, and record a short root-cause summary outside the codebase. Reference completed_requirements_2025-12-10_16-55-05.md for details. Reference completed_todo_2025-12-10_16-55-05.md for task tracking.	2025-12-10 16:55:24 +11:00
Jochen	5f3a2a4203	remove debug statements	2025-12-10 16:26:59 +11:00
Jochen	87bceba54f	Fix planner UI whitespace and workspace logs directory Resolve two critical issues in planner mode that persisted through multiple fix attempts: 1. Remove excessive whitespace between tool call displays by replacing direct println!() calls with ui_writer methods and eliminating redundant newlines in agent response streaming. 2. Ensure all log files (errors, sessions, tool calls, context dumps) are written to <workspace>/logs instead of codepath by properly initializing G3_WORKSPACE_PATH from --workspace argument.	2025-12-10 16:18:49 +11:00
Jochen	a03a432963	another attempt :/	2025-12-10 11:29:10 +11:00
Jochen	75aa2d983e	Refine planner mode UI and error handling Improve planner mode user experience with better error reporting, cleaner tool output, and consistent log file placement. - Propagate and display classified LLM errors to users with appropriate icons and context - Display tool calls on single lines with truncated arguments - Show LLM text responses without overwriting via UiWriter - Ensure all logs write to workspace/logs directory consistently - Set G3_WORKSPACE_PATH early in planning mode initialization	2025-12-09 22:44:00 +11:00
Jochen	a9dbe5f7d3	some manual fixes after rebase	2025-12-09 17:11:19 +11:00
Jochen	633da0d8a6	Refine planner mode UI, logging, and history tracking - Display coach feedback content (up to 25 lines) instead of just length - Write GIT COMMIT entry to history before actual commit for better a... - Implement single-line status updates during LLM processing with too... - Display non-tool LLM text responses in planner UI - Redirect all logs to <workspace>/logs directory instead of codepath - Preserve TODO file in planner mode for history (prevent deletion) Completed files: - completed_requirements_2025-12-09_16-16-51.md - completed_todo_2025-12-09_16-16-51.md	2025-12-09 17:03:53 +11:00
Jochen	ff8b3e7c7b	Implement planning mode	2025-12-09 17:03:53 +11:00
Jochen	4aa84e2144	disable thinking if there is no token budget	2025-12-09 16:45:28 +11:00
Jochen	2283d9ddbf	small fix to provider name check	2025-12-09 14:43:35 +11:00
Jochen	fb2cf6f898	fix for thinking budget and hardcoded max token on summary	2025-12-09 12:41:52 +11:00
Jochen	696c441a47	validate max_tokens for call, also fallbacks for summary When the context window is full, max_tokens is often passed as 0 or a tiny value, and the LLM call will fail. For Anthropic with thinking, there is also the thinking budget to account for. This can happen during summary attempts; in that case first try thinnify, skinnify, etc.	2025-12-09 10:15:32 +11:00
Dhanji R. Prasanna	48e6d594bc	tweak todo tool output	2025-12-08 11:05:01 +11:00
Dhanji R. Prasanna	678403da35	add a force thinnify cmd	2025-12-05 15:32:13 +11:00
Jochen	0970e4f356	Merge pull request #40 from dhanji/jochen-fix-coach-feedback now coach feedback works again	2025-12-03 10:55:15 +11:00
Jochen	758a313de0	Merge pull request #39 from dhanji/jochen-sonnet-thinking Fix temperature param + add thinking for anthropic	2025-12-03 10:54:34 +11:00
Jochen	0327a6dfdf	make sure coach feedback is extracted.	2025-12-02 22:00:58 +11:00
Jochen	928f2bfa9d	actually record coach feedback and use it	2025-12-02 21:23:50 +11:00
Jochen	21af6ba574	fix temperature for summary request too.	2025-12-02 21:20:16 +11:00
Jochen	ae16243f49	Fix temperature param + add thinking for anthropic The temperature param was not passed to the llm. Now support anthropic models in 'thinking' mode.	2025-12-02 17:24:55 +11:00
Dhanji R. Prasanna	9ee0468b87	test for system message	2025-12-02 14:45:12 +11:00
Dhanji R. Prasanna	d9ad244197	add markdown format only to final_output and fix todo duplication	2025-12-02 14:26:22 +11:00
Dhanji R. Prasanna	a6537e4dba	todo_write outputs entire list	2025-12-02 13:48:05 +11:00
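
The sequential duplicate rule described in commit c7204c6699 can be illustrated with a minimal sketch. This is not the code from the repository: the `ToolCall` struct and the function names `is_dup_in_chunk` and `is_dup_in_msg` are hypothetical, chosen only to mirror the "DUP IN CHUNK" and "DUP IN MSG" labels in the commit message, and treating two calls as equal when name and raw arguments match is an assumption.

```rust
/// Hypothetical representation of a parsed tool call: the tool name plus
/// its raw JSON arguments. The real g3 types may look different.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub args: String,
}

/// "DUP IN CHUNK": only the immediately preceding tool call in the same
/// chunk is compared, not every tool call seen in the chunk.
pub fn is_dup_in_chunk(previous_in_chunk: Option<&ToolCall>, current: &ToolCall) -> bool {
    previous_in_chunk == Some(current)
}

/// "DUP IN MSG": the last tool call of the previous message must match,
/// and nothing but whitespace may follow it in that message.
pub fn is_dup_in_msg(
    last_in_prev_msg: Option<&ToolCall>,
    text_after_last_call: &str,
    current: &ToolCall,
) -> bool {
    last_in_prev_msg == Some(current) && text_after_last_call.trim().is_empty()
}
```

Under these assumptions, a repeated `shell` call separated from its predecessor by any non-whitespace text is treated as legitimate re-use, while the same call emitted twice in a row is flagged as a stutter.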

604 Commits