Commit Graph

79 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
10e2fe9b94 Add tests for duplicate detection logic
Added 13 tests to verify that duplicate detection only catches
IMMEDIATELY SEQUENTIAL duplicates:

- test_find_complete_json_object_end_* - Tests for JSON parsing helper
- test_same_tool_with_text_between_not_duplicate - Key test ensuring
  tool calls separated by text are NOT duplicates
- test_different_tools_back_to_back_not_duplicate
- test_same_tool_different_args_not_duplicate
- test_identical_tool_calls_back_to_back_are_duplicates
- test_has_text_after_tool_call - Tests text detection logic
- test_tool_call_with_newlines_between
- test_tool_call_with_whitespace_text_between
- test_tool_call_in_middle_of_text
- test_multiple_different_tool_calls_with_text

Also made find_complete_json_object_end public for testing.
2025-12-22 17:11:05 +11:00
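
The rule this commit tests (only immediately sequential duplicates count) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the name find_complete_json_object_end comes from the commit message, but its signature here is a guess, is_immediate_duplicate is a hypothetical helper, and treating whitespace-only text between two calls as "nothing in between" is an assumption.

```rust
/// Hypothetical signature: returns the byte index of the `}` that closes the
/// JSON object starting at `start`, or None if the object is not complete yet.
pub fn find_complete_json_object_end(text: &str, start: usize) -> Option<usize> {
    let bytes = text.as_bytes();
    if bytes.get(start) != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, &b) in bytes.iter().enumerate().skip(start) {
        if in_string {
            // Inside a string literal, braces are ignored and escapes are honored.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}

/// Hypothetical helper: two tool calls are duplicates only when they are
/// identical and nothing but whitespace (an assumption) separates them.
pub fn is_immediate_duplicate(prev: &str, text_between: &str, next: &str) -> bool {
    prev == next && text_between.trim().is_empty()
}
```

Scanning bytes rather than chars is safe here because the delimiters of interest are all ASCII and never occur inside a multi-byte UTF-8 sequence.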
Dhanji R. Prasanna
3a07a02b02 Add comprehensive tests for StreamingToolParser
Tests cover:
- Multiple tool calls in one response (single chunk and across chunks)
- Tool call followed by text (before, after, and both)
- Incomplete tool calls at various truncation points
- Parser reset behavior (buffer, incomplete state, unexecuted state)
- Buffer management and edge cases (streaming accumulation, empty chunks)
- JSON edge cases (escaped quotes, backslashes, nested braces)
- Tool call pattern variations (spacing, newlines)
- mark_tool_calls_consumed() functionality
- Duplicate tool call detection
- Multiple tool calls returned on stream finish
- has_message_like_keys validation
2025-12-22 16:10:34 +11:00
Dhanji R. Prasanna
8070147a0c Fix multiple tool call handling and improve auto-continue logic
- Add last_consumed_position tracking to StreamingToolParser to prevent
  re-detecting already-executed tool calls
- Add mark_tool_calls_consumed() method to mark tool calls as processed
- Add find_first_tool_call_start() for forward scanning of tool patterns
- Replace try_parse_json_tool_call_from_buffer() with
  try_parse_all_json_tool_calls_from_buffer() to find ALL tool calls
- Update has_incomplete_tool_call() and has_unexecuted_tool_call() to
  only check unconsumed portion of buffer
- Fix tool execution loop to not reset parser when unexecuted tools remain
- Simplify should_auto_continue logic (remove redundant condition)
- Add comprehensive tests for auto-continue condition logic
2025-12-22 16:08:57 +11:00
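
The consumed-position idea in this commit might look something like the sketch below. ParserSketch is a hypothetical, heavily simplified stand-in for StreamingToolParser; the `{"tool"` marker and all method signatures are assumptions made for illustration only.

```rust
/// Assumed marker for the start of a JSON tool call (illustrative only).
const TOOL_MARKER: &str = "{\"tool\"";

/// Hypothetical, simplified stand-in for the parser described above.
pub struct ParserSketch {
    buffer: String,
    /// Byte offset up to which tool calls have already been executed.
    last_consumed_position: usize,
}

impl ParserSketch {
    pub fn new() -> Self {
        ParserSketch { buffer: String::new(), last_consumed_position: 0 }
    }

    /// Append a streamed chunk to the buffer.
    pub fn push(&mut self, chunk: &str) {
        self.buffer.push_str(chunk);
    }

    /// Only the part of the buffer after the consumed position is inspected.
    fn unconsumed(&self) -> &str {
        &self.buffer[self.last_consumed_position..]
    }

    pub fn has_unexecuted_tool_call(&self) -> bool {
        self.unconsumed().contains(TOOL_MARKER)
    }

    /// Mark everything seen so far as processed so it is not detected again.
    pub fn mark_tool_calls_consumed(&mut self) {
        self.last_consumed_position = self.buffer.len();
    }
}
```

Because the buffer is never cleared in this sketch, already-executed calls stay available for logging while the offset keeps them from being picked up a second time.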
Dhanji R. Prasanna
a755301cf9 attempt 2 2025-12-22 15:33:23 +11:00
Dhanji R. Prasanna
0e4febc3fb attempted fix of autocontinue 2025-12-22 15:01:27 +11:00
Dhanji R. Prasanna
01a5284d6d Move fixed_filter_json from g3-core to g3-cli
Properly separates UI display concern from core library:
- fixed_filter_json module now lives in g3-cli (UI layer)
- UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods
- g3-core delegates filtering to UI layer via trait methods
- Different UiWriter implementations can choose their own filtering behavior
- ConsoleUiWriter filters JSON tool calls for clean terminal display
- MachineUiWriter/NullUiWriter use default pass-through

Benefits:
- Proper separation of concerns
- Core stays clean without display-specific logic
- Testability - filter can be tested independently in g3-cli
2025-12-22 10:32:21 +11:00
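
A rough sketch of the delegation described in this commit, assuming the trait methods take and return plain strings. The type and method names come from the commit message; the signatures and the toy line-based filter are assumptions, not the actual g3 code.

```rust
/// Sketch of the trait: default methods pass text through unchanged, so
/// implementations that do not care about filtering need no extra code.
pub trait UiWriter {
    fn filter_json_tool_calls(&mut self, chunk: &str) -> String {
        chunk.to_string()
    }
    fn reset_json_filter(&mut self) {}
}

/// Uses the default pass-through behavior.
pub struct NullUiWriter;
impl UiWriter for NullUiWriter {}

/// Overrides the filter for terminal display.
pub struct ConsoleUiWriter;
impl UiWriter for ConsoleUiWriter {
    fn filter_json_tool_calls(&mut self, chunk: &str) -> String {
        // Toy filter (assumption): drop whole lines that start with `{"tool"`.
        chunk
            .lines()
            .filter(|line| !line.trim_start().starts_with("{\"tool\""))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

With this shape, the core library only ever calls the trait methods and never needs to know which filtering strategy, if any, is in effect.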
Dhanji R. Prasanna
fbf31e5f68 Fix continuation errors: auto-continue when final_output not called
- Add final_output_called flag to track if LLM properly completed
- Auto-continue with prompt if tools executed but final_output missing
- Remove unused last_action_was_tool and any_text_response variables
- Simplifies previous complex incomplete response detection logic
2025-12-20 15:32:12 +11:00
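
The condition described in this commit can be expressed as a small predicate. This is only a sketch: the parameter names mirror the flags mentioned in the message, but the function itself and its signature are hypothetical.

```rust
/// Hypothetical predicate: continue automatically when tools ran during the
/// turn but the model never called final_output to finish properly.
pub fn should_auto_continue(tools_executed: bool, final_output_called: bool) -> bool {
    tools_executed && !final_output_called
}
```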
Jochen
75aa2d983e Refine planner mode UI and error handling
Improve planner mode user experience with better error reporting,
cleaner tool output, and consistent log file placement.

- Propagate and display classified LLM errors to users with
  appropriate icons and context
- Display tool calls on single lines with truncated arguments
- Show LLM text responses without overwriting via UiWriter
- Ensure all logs write to workspace/logs directory consistently
- Set G3_WORKSPACE_PATH early in planning mode initialization
2025-12-09 22:44:00 +11:00
Jochen
ff8b3e7c7b Implement planning mode 2025-12-09 17:03:53 +11:00
Jochen
696c441a47 validate max_tokens for call, also fallbacks for summary
When the context window (CW) is full, max_tokens is often passed as 0 or a tiny value, and the LLM call fails. For Anthropic with thinking enabled, the thinking budget must also be accounted for.
This can happen during summary attempts; in that case,
first try thinnify, skinnify, etc.
2025-12-09 10:15:32 +11:00
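
One way such a pre-call check could look is sketched below. The function name, its parameters, and the min_output threshold are all hypothetical; the only rule assumed is that max_tokens has to cover the thinking budget plus some minimum of visible output.

```rust
/// Hypothetical check: reject a request whose max_tokens cannot cover the
/// thinking budget (if any) plus a minimum amount of visible output.
pub fn validate_max_tokens(
    max_tokens: u32,
    thinking_budget: Option<u32>,
    min_output: u32,
) -> Result<u32, String> {
    let required = thinking_budget.unwrap_or(0).saturating_add(min_output);
    if max_tokens < required {
        Err(format!(
            "max_tokens {} is below the required minimum {}",
            max_tokens, required
        ))
    } else {
        Ok(max_tokens)
    }
}
```

A caller could treat the Err case as the signal to fall back to thinning the context before retrying, rather than sending a request that is bound to fail.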
Dhanji R. Prasanna
9ee0468b87 test for system message 2025-12-02 14:45:12 +11:00
Dhanji R. Prasanna
d9ad244197 add markdown format only to final_output and fix todo duplication 2025-12-02 14:26:22 +11:00
Dhanji R. Prasanna
df3f25f2f0 test for resume unfinished todos 2025-12-02 11:07:13 +11:00
Dhanji R. Prasanna
1b4ea93ba4 token counting bugfix 2025-12-01 14:52:10 +11:00
Dhanji R. Prasanna
4496eee046 fix compaction to restore system message 2025-12-01 14:38:21 +11:00
Jochen
52f78653b4 add context window monitor
Writes the current context window to logs/current_context_window (uses a symlink to a session ID).

This PR was unfortunately generated by a different LLM, which did a ton of superficial reformatting. It's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's ok.
2025-11-27 21:00:02 +11:00
Jochen
a097c3abef first cut 2025-11-21 13:56:36 +11:00
Jochen
551a577ee1 changed user choice for TODO stale check
user can ignore, mark stale or quit.
2025-11-21 12:35:14 +11:00
Jochen
28a83d2dcf check for stale TODOs
on by default, can be disabled
2025-11-21 12:09:01 +11:00
Jochen
3f21bdc7b2 fix tests 2025-11-19 12:42:37 +11:00
Dhanji R. Prasanna
8eda691cb1 todo persistence 2025-11-06 15:24:57 +11:00
Dhanji R. Prasanna
4327c839a9 added scheme and kotlin to code_search 2025-11-05 14:17:15 +11:00
Dhanji R. Prasanna
26e26cf367 test fixes 2025-11-05 14:11:59 +11:00
Dhanji R. Prasanna
fa38439a06 adding more languages to tree-sitter (java, go, cpp,..) 2025-11-05 14:07:50 +11:00
Dhanji R. Prasanna
f25a3d5e06 tree-sitter replaces ast-grep 2025-11-05 13:56:23 +11:00
Dhanji Prasanna
e1e732150a coach rigor +++ 2025-10-24 10:15:42 +11:00
Dhanji Prasanna
3afad3d61f progressive context thinning 2025-10-20 15:29:44 +11:00
Dhanji Prasanna
4a819e8f27 context window counting bug 2025-10-10 14:40:10 +11:00
Dhanji Prasanna
260c949576 token counting fixes 2025-10-09 12:11:21 +11:00