Commit Graph

403 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
3d1b86d24b Make Chrome headless the default WebDriver browser
- Add --safari flag to CLI for explicitly choosing Safari
- Update --chrome-headless flag description to indicate it's the default
- Update README to reflect Chrome headless as default
- Remove broken link to non-existent docs/webdriver-setup.md
- Add Safari flag handling in all webdriver config locations

The config already had ChromeHeadless as the default, this commit
updates the CLI and documentation to match.
2025-12-15 16:51:42 +11:00
Jochen
7b47495881 Document retry config location and verify planning mode logic
Add documentation for retry configuration in planning mode:
- Document retry settings in .g3.toml under [agent] section
- Note RetryConfig implementation in g3-core/src/retry.rs
- Clarify hardcoded vs config-based retry values

Verify existing retry loop and coach feedback parsing:
- Confirm execute_with_retry() handles recoverable errors
- Document feedback extraction source priority order
- Provide manual verification steps for testing
2025-12-11 14:56:27 +11:00
Jochen
5f3a2a4203 remove debug statements 2025-12-10 16:26:59 +11:00
Jochen
87bceba54f Fix planner UI whitespace and workspace logs directory
Resolve two critical issues in planner mode that persisted through
multiple fix attempts:

1. Remove excessive whitespace between tool call displays by replacing
   direct println!() calls with ui_writer methods and eliminating
   redundant newlines in agent response streaming.

2. Ensure all log files (errors, sessions, tool calls, context dumps)
   are written to <workspace>/logs instead of codepath by properly
   initializing G3_WORKSPACE_PATH from --workspace argument.
2025-12-10 16:18:49 +11:00
Jochen
a03a432963 another attempt :/ 2025-12-10 11:29:10 +11:00
Jochen
75aa2d983e Refine planner mode UI and error handling
Improve planner mode user experience with better error reporting,
cleaner tool output, and consistent log file placement.

- Propagate and display classified LLM errors to users with
  appropriate icons and context
- Display tool calls on single lines with truncated arguments
- Show LLM text responses without overwriting via UiWriter
- Ensure all logs write to workspace/logs directory consistently
- Set G3_WORKSPACE_PATH early in planning mode initialization
2025-12-09 22:44:00 +11:00
Jochen
a9dbe5f7d3 some manual fixes after rebase 2025-12-09 17:11:19 +11:00
Jochen
633da0d8a6 Refine planner mode UI, logging, and history tracking
- Display coach feedback content (up to 25 lines) instead of just length
- Write GIT COMMIT entry to history before actual commit for better a...
- Implement single-line status updates during LLM processing with too...
- Display non-tool LLM text responses in planner UI
- Redirect all logs to <workspace>/logs directory instead of codepath
- Preserve TODO file in planner mode for history (prevent deletion)

Completed files:
- completed_requirements_2025-12-09_16-16-51.md
- completed_todo_2025-12-09_16-16-51.md
2025-12-09 17:03:53 +11:00
Jochen
ff8b3e7c7b Implement planning mode 2025-12-09 17:03:53 +11:00
Jochen
4aa84e2144 disable thinking if there is no token budget 2025-12-09 16:45:28 +11:00
Jochen
2283d9ddbf small fix to provider name check 2025-12-09 14:43:35 +11:00
Jochen
fb2cf6f898 fix for thinking budget and hardcoded max token on summary 2025-12-09 12:41:52 +11:00
Jochen
696c441a47 validate max_tokens for call, also fallbacks for summary
When the CW is full, max_tokens is often passed at 0 or tiny. The LLM will fail. For Anthropic with thining, there is also the thinking budget.
This can happen during summary attempts, in that case
first try thinnify, skinnify etc..
2025-12-09 10:15:32 +11:00
Dhanji R. Prasanna
48e6d594bc tweak todo tool output 2025-12-08 11:05:01 +11:00
Dhanji R. Prasanna
678403da35 add a force thinnify cmd 2025-12-05 15:32:13 +11:00
Jochen
928f2bfa9d actually record coach feedback and use it 2025-12-02 21:23:50 +11:00
Jochen
21af6ba574 fix temperature for summary request too. 2025-12-02 21:20:16 +11:00
Jochen
ae16243f49 Fix temperature param + add thinking for anthropic
The temperature param was not passed to the llm.
Now support anthropic models in 'thinking' mode.
2025-12-02 17:24:55 +11:00
Dhanji R. Prasanna
9ee0468b87 test for system message 2025-12-02 14:45:12 +11:00
Dhanji R. Prasanna
d9ad244197 add markdown format only to final_output and fix todo duplication 2025-12-02 14:26:22 +11:00
Dhanji R. Prasanna
a6537e4dba todo_write outputs entire list 2025-12-02 13:48:05 +11:00
Dhanji R. Prasanna
df3f25f2f0 test for resume unfinished todos 2025-12-02 11:07:13 +11:00
Dhanji R. Prasanna
f8f989d4c6 resume unfinished TODOs 2025-12-02 11:06:58 +11:00
Dhanji R. Prasanna
1b4ea93ba4 token counting bugfix 2025-12-01 14:52:10 +11:00
Dhanji R. Prasanna
4496eee046 fix compaction to restore system message 2025-12-01 14:38:21 +11:00
Dhanji R. Prasanna
81fd2ab92f unused var 2025-11-29 15:44:30 +11:00
Jochen
dcfd681b05 add summary context window 2025-11-28 16:33:31 +11:00
Jochen
52f78653b4 add context window monitor
Writes the current context window to logs/current_context_window (uses a symlink to a session ID).

This PR was unfortunately generated by a different LLM and did a ton of superficial reformating, it's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's ok.
2025-11-27 21:00:02 +11:00
Jochen
7e1ce36a4b Merge pull request #35 from dhanji/jochen_write_existing_file
remove check for whether a file exists in the workspace
2025-11-27 13:44:45 +11:00
Jochen
99125fc39e completely remove the skipping first player logic 2025-11-27 13:21:40 +11:00
Jochen
5170744099 add cache_control to user messages 2025-11-27 13:12:42 +11:00
Jochen
c58aa80932 explain what file was found in workspace 2025-11-26 21:43:59 +11:00
Jochen
c837308148 never add more than 4 cache controls
Anthropic API throws errors otherwise.
2025-11-26 18:38:30 +11:00
Jochen
1e1702001c Add logging for discovery 2025-11-26 10:41:35 +11:00
Jochen
bd29addefa reorder system prompt 2025-11-26 10:26:52 +11:00
Jochen
ad198a8501 add code exploration fast start
This tries to short-circuit multiple round-trips to llm for reading code.
It's a precursor to trying to context engineer tailored to specific tasks.
In initial experiments, it's only marginally faster than regular mode, and burns more tokens.
2025-11-25 22:51:32 +11:00
Jochen
a96a15d1fc add code coverage command 2025-11-21 14:38:58 +11:00
Jochen
a097c3abef first cut 2025-11-21 13:56:36 +11:00
Jochen
551a577ee1 changed user choice for TODO stale check
user can ignore, mark stale or quit.
2025-11-21 12:35:14 +11:00
Jochen
84718223bc remove minor comment 2025-11-21 12:26:41 +11:00
Jochen
28a83d2dcf check for stale TODOs
on by default, can be disabled
2025-11-21 12:09:01 +11:00
Jochen
9f0d5add1e remove redundant SYSTEM_NATIVE_TOOL_CALLS_MULTIPLE 2025-11-21 11:04:14 +11:00
Jochen
be6c6bfca4 fix ref to system prompt 2025-11-21 10:49:39 +11:00
Jochen
94a41c5c34 don't write warning to console 2025-11-21 10:49:27 +11:00
Jochen
09dbad2d68 allow multiple tool calls, log warnings if there are duplicate calls.
controlled via a flag to the agent config:
allow_multiple_tool_calls = true
2025-11-21 10:49:15 +11:00
Jochen
ffbf410b17 log tool calls 2025-11-21 10:49:02 +11:00
Dhanji Prasanna
14c8d066c9 ensure system prompt is always added first 2025-11-20 08:45:03 +11:00
Jochen
b6e226df67 Merge pull request #23 from dhanji/jochen-add-code-instructions
system prompt now includes code style guide
2025-11-19 16:25:20 +11:00
Jochen
1069664e16 fix bad max_tokens and context_window logic
for non-databricks code
2025-11-19 13:51:16 +11:00
Jochen
3f21bdc7b2 fix tests 2025-11-19 12:42:37 +11:00