Commit Graph

496 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
38fcaaf449 Add edge case tests for filter_json_tool_calls
- test_brace_inside_json_string_value: braces inside JSON strings
- test_multiple_braces_in_string: multiple braces in string values
- test_escaped_quotes_with_braces: escaped quotes with braces
- test_brace_in_string_across_chunks: streaming with braces in strings
- test_complex_nested_with_string_braces: nested JSON with string braces
- test_str_replace_with_diff_content: real-world str_replace case
- test_tool_call_after_other_content: tool call after other output
- test_tool_call_with_nested_tool_pattern_in_string: nested patterns

All 27 tests pass.
2025-12-22 13:30:57 +11:00
Dhanji R. Prasanna
3bc254962c clean up filter_json a bit (more to come) 2025-12-22 12:03:09 +11:00
Dhanji R. Prasanna
87d9b39ae4 update gitignore 2025-12-22 11:50:01 +11:00
Dhanji R. Prasanna
01a5284d6d Move fixed_filter_json from g3-core to g3-cli
Properly separates UI display concern from core library:
- fixed_filter_json module now lives in g3-cli (UI layer)
- UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods
- g3-core delegates filtering to UI layer via trait methods
- Different UiWriter implementations can choose their own filtering behavior
- ConsoleUiWriter filters JSON tool calls for clean terminal display
- MachineUiWriter/NullUiWriter use default pass-through

Benefits:
- Proper separation of concerns
- Core stays clean without display-specific logic
- Testability - filter can be tested independently in g3-cli
2025-12-22 10:32:21 +11:00
Dhanji R. Prasanna
fbf31e5f68 Fix continuation errors: auto-continue when final_output not called
- Add final_output_called flag to track if LLM properly completed
- Auto-continue with prompt if tools executed but final_output missing
- Remove unused last_action_was_tool and any_text_response variables
- Simplifies previous complex incomplete response detection logic
2025-12-20 15:32:12 +11:00
Dhanji R. Prasanna
ba8bd371fc fix randomly ending iteration 2025-12-19 16:40:01 +11:00
Dhanji R. Prasanna
e771382bd0 agent mode + fowler bot 2025-12-19 16:14:03 +11:00
Dhanji R. Prasanna
b4f6da6bf2 duplicate tool call bugfix 2025-12-19 15:24:03 +11:00
Dhanji R. Prasanna
faa6512b1f Revert to Safari as default WebDriver browser
Chrome headless has too many issues:
- Session creation hangs when Chrome is already running
- Cloudflare and other bot protection blocks headless browsers
- Version mismatch issues between Chrome and ChromeDriver

Safari is more reliable for web automation on macOS.
Chrome headless is still available via --chrome-headless flag.
2025-12-16 12:36:18 +11:00
Dhanji R. Prasanna
bbe57b4764 Fix ChromeDriver session hanging when Chrome is already running
- Add unique user-data-dir per process to avoid profile conflicts
- Add 30-second timeout to connection attempts to prevent indefinite hangs
- Fix borrow checker issue with ClientBuilder

The session creation was hanging because ChromeDriver was trying to
use the same profile as the running Chrome browser. Using a unique
temp directory (/tmp/g3-chrome-{pid}) isolates the headless session.
2025-12-15 17:36:34 +11:00
Dhanji R. Prasanna
81cba42c8d Add Chrome for Testing support for reliable WebDriver automation
- Add setup script (scripts/setup-chrome-for-testing.sh) that downloads
  matching Chrome and ChromeDriver versions from Google's CDN
- Add chrome_binary config option to specify custom Chrome binary path
- Update ChromeDriver to support custom binary via with_port_headless_and_binary()
- Update README with Chrome for Testing setup instructions
- Update config.example.toml with chrome_binary documentation

Chrome for Testing is Google's dedicated browser for automated testing
that guarantees version compatibility with ChromeDriver, avoiding the
common 'version mismatch' errors when Chrome auto-updates.
2025-12-15 17:02:30 +11:00
Dhanji R. Prasanna
d142cdfffe Improve ChromeDriver connection reliability with retry loop
- Replace simple 1.5s sleep with retry loop (10 attempts, 200ms apart)
- Better error reporting showing number of attempts
- More robust handling of ChromeDriver startup timing
2025-12-15 16:57:15 +11:00
Dhanji R. Prasanna
3d1b86d24b Make Chrome headless the default WebDriver browser
- Add --safari flag to CLI for explicitly choosing Safari
- Update --chrome-headless flag description to indicate it's the default
- Update README to reflect Chrome headless as default
- Remove broken link to non-existent docs/webdriver-setup.md
- Add Safari flag handling in all webdriver config locations

The config already had ChromeHeadless as the default, this commit
updates the CLI and documentation to match.
2025-12-15 16:51:42 +11:00
Dhanji R. Prasanna
d32bd9be03 Enable webdriver by default 2025-12-15 15:31:04 +11:00
Jochen
4aa5bf75ce Merge pull request #42 from dhanji/jochen-planner
Add planning mode
2025-12-11 16:07:26 +11:00
Jochen
46fd6ed121 Merge pull request #41 from dhanji/jochen-fix-max_tokens
Fix bugs where insufficient max_tokens were passed to LLM
2025-12-11 16:02:04 +11:00
Jochen
68fbc54812 Update README.md 2025-12-11 15:01:43 +11:00
Jochen
7b47495881 Document retry config location and verify planning mode logic
Add documentation for retry configuration in planning mode:
- Document retry settings in .g3.toml under [agent] section
- Note RetryConfig implementation in g3-core/src/retry.rs
- Clarify hardcoded vs config-based retry values

Verify existing retry loop and coach feedback parsing:
- Confirm execute_with_retry() handles recoverable errors
- Document feedback extraction source priority order
- Provide manual verification steps for testing
2025-12-11 14:56:27 +11:00
Jochen
1a13fc5345 Add explicit flush to append_entry and strengthen commit ordering docs
Add file.flush() call in append_entry() to ensure planner history
entries are written to disk before git commits execute. While the
file handle drop should flush, explicit flush simplifies reasoning
about the ordering invariant.

Extend code comments in stage_and_commit() to document that the
write_git_commit-before-git::commit ordering has regressed multiple
times and must be preserved in any refactoring.

Requirements: completed_requirements_2025-12-11_10-05-08.md
2025-12-11 10:05:39 +11:00
Jochen
b3ac7746b9 Preserve planner history ordering and add regression guardrails
Ensure planner writes GIT COMMIT entry before invoking git commit.
Keep history entry even when git commit fails, matching summary text.
Document invariant in code comment above write_git_commit call.
Add lightweight test to assert history write precedes git::commit using
test doubles instead of a real git repository.
Investigate git history to find regression and its prior fix, and
record a short root-cause summary outside the codebase.
Reference completed_requirements_2025-12-10_16-55-05.md for details.
Reference completed_todo_2025-12-10_16-55-05.md for task tracking.
2025-12-10 16:55:24 +11:00
Jochen
5f3a2a4203 remove debug statements 2025-12-10 16:26:59 +11:00
Jochen
87bceba54f Fix planner UI whitespace and workspace logs directory
Resolve two critical issues in planner mode that persisted through
multiple fix attempts:

1. Remove excessive whitespace between tool call displays by replacing
   direct println!() calls with ui_writer methods and eliminating
   redundant newlines in agent response streaming.

2. Ensure all log files (errors, sessions, tool calls, context dumps)
   are written to <workspace>/logs instead of codepath by properly
   initializing G3_WORKSPACE_PATH from --workspace argument.
2025-12-10 16:18:49 +11:00
Jochen
a03a432963 another attempt :/ 2025-12-10 11:29:10 +11:00
Jochen
75aa2d983e Refine planner mode UI and error handling
Improve planner mode user experience with better error reporting,
cleaner tool output, and consistent log file placement.

- Propagate and display classified LLM errors to users with
  appropriate icons and context
- Display tool calls on single lines with truncated arguments
- Show LLM text responses without overwriting via UiWriter
- Ensure all logs write to workspace/logs directory consistently
- Set G3_WORKSPACE_PATH early in planning mode initialization
2025-12-09 22:44:00 +11:00
Jochen
a9dbe5f7d3 some manual fixes after rebase 2025-12-09 17:11:19 +11:00
Jochen
633da0d8a6 Refine planner mode UI, logging, and history tracking
- Display coach feedback content (up to 25 lines) instead of just length
- Write GIT COMMIT entry to history before actual commit for better a...
- Implement single-line status updates during LLM processing with too...
- Display non-tool LLM text responses in planner UI
- Redirect all logs to <workspace>/logs directory instead of codepath
- Preserve TODO file in planner mode for history (prevent deletion)

Completed files:
- completed_requirements_2025-12-09_16-16-51.md
- completed_todo_2025-12-09_16-16-51.md
2025-12-09 17:03:53 +11:00
Jochen
ff8b3e7c7b Implement planning mode 2025-12-09 17:03:53 +11:00
Jochen
4aa84e2144 disable thinking if there is no token budget 2025-12-09 16:45:28 +11:00
Jochen
2283d9ddbf small fix to provider name check 2025-12-09 14:43:35 +11:00
Jochen
fb2cf6f898 fix for thinking budget and hardcoded max token on summary 2025-12-09 12:41:52 +11:00
Jochen
696c441a47 validate max_tokens for call, also fallbacks for summary
When the CW is full, max_tokens is often passed at 0 or tiny. The LLM will fail. For Anthropic with thining, there is also the thinking budget.
This can happen during summary attempts, in that case
first try thinnify, skinnify etc..
2025-12-09 10:15:32 +11:00
Dhanji R. Prasanna
48e6d594bc tweak todo tool output 2025-12-08 11:05:01 +11:00
Dhanji R. Prasanna
678403da35 add a force thinnify cmd 2025-12-05 15:32:13 +11:00
Jochen
0970e4f356 Merge pull request #40 from dhanji/jochen-fix-coach-feedback
now coach feedback works again
2025-12-03 10:55:15 +11:00
Jochen
758a313de0 Merge pull request #39 from dhanji/jochen-sonnet-thinking
Fix temperature param + add thinking for anthropic
2025-12-03 10:54:34 +11:00
Jochen
0327a6dfdf make sure coach feedback is extracted. 2025-12-02 22:00:58 +11:00
Jochen
928f2bfa9d actually record coach feedback and use it 2025-12-02 21:23:50 +11:00
Jochen
21af6ba574 fix temperature for summary request too. 2025-12-02 21:20:16 +11:00
Jochen
ae16243f49 Fix temperature param + add thinking for anthropic
The temperature param was not passed to the llm.
Now support anthropic models in 'thinking' mode.
2025-12-02 17:24:55 +11:00
Dhanji R. Prasanna
9ee0468b87 test for system message 2025-12-02 14:45:12 +11:00
Dhanji R. Prasanna
d9ad244197 add markdown format only to final_output and fix todo duplication 2025-12-02 14:26:22 +11:00
Dhanji R. Prasanna
a6537e4dba todo_write outputs entire list 2025-12-02 13:48:05 +11:00
Dhanji R. Prasanna
df3f25f2f0 test for resume unfinished todos 2025-12-02 11:07:13 +11:00
Dhanji R. Prasanna
f8f989d4c6 resume unfinished TODOs 2025-12-02 11:06:58 +11:00
Dhanji R. Prasanna
0e4c935a70 clean up TODO output 2025-12-02 06:48:58 +11:00
Dhanji R. Prasanna
1b4ea93ba4 token counting bugfix 2025-12-01 14:52:10 +11:00
Dhanji R. Prasanna
4496eee046 fix compaction to restore system message 2025-12-01 14:38:21 +11:00
Dhanji R. Prasanna
8928fb92be append instead of replace system msg 2025-11-29 16:13:00 +11:00
Dhanji R. Prasanna
81fd2ab92f unused var 2025-11-29 15:44:30 +11:00
Jochen
af7fb8f7f1 Merge pull request #38 from dhanji/jochen-debug-with-ids
dumps context window for monitoring sizes, also add message id for internal debugging
2025-11-28 16:43:26 +11:00