Commit Graph

13 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
afc5bc8574 Readability improvements across streaming_parser, input_formatter, commands
- streaming_parser.rs: Reduced ~70 lines by removing redundant comments,
  consolidating doc comments, using slice syntax for TOOL_CALL_PATTERNS
- input_formatter.rs: Lazy regex compilation via once_cell (performance),
  cleaner function structure, reduced comment noise
- commands.rs: Extracted format_research_task_summary() and
  format_research_report_header() helpers, reduced ~40 lines of duplication
- pending_research.rs: Fixed 2 unused variable warnings in tests

All changes are behavior-preserving. 446 tests pass.

Agent: carmack
2026-01-30 14:48:08 +11:00
Dhanji R. Prasanna
fa3c9203e0 Fix streaming parser bug: detect abandoned tool call fragments
When the LLM 'stutters' and emits incomplete tool call fragments like:
  {"tool": "shell", "args": {...}}
  {"tool":
  {"tool": "shell", "args": {...}}

The parser would get stuck waiting for the incomplete fragment to complete,
causing the entire response to be lost (no tool executed, no text displayed).

This was observed in butler session butler_c6ab59af2e4f991c where the user's
'send!' command produced no response.

Fix: Enhanced is_json_invalidated() to detect when a new tool call pattern
({"tool"}) appears after a newline while parsing an incomplete JSON fragment.
This indicates the previous fragment was abandoned and should be invalidated.

Safety:
- Tool patterns inside JSON strings (e.g., writing example code) are not
  affected because the check only runs outside strings
- Added tests for the stuttering pattern and the file-writing edge case
2026-01-30 14:00:18 +11:00
Dhanji R. Prasanna
4321503e89 Refactor streaming_parser.rs and context_window.rs for readability
streaming_parser.rs (879 → 806 lines, -8%):
- Extract CodeFenceTracker struct for cleaner fence state management
- Consolidate pattern matching into module-level functions
- Rename functions for clarity (find_json_object_end, parse_all_json_tool_calls)
- Add clear section headers with // === separators
- Simplify try_parse_json_tool_call state machine

context_window.rs (889 → 843 lines, -5%):
- Eliminate duplication: reset_with_summary now delegates to reset_with_summary_and_stub
- Extract PreservedMessages struct for cleaner message preservation
- Add ThinResult::no_changes() helper to reduce boilerplate
- Simplify should_compact() and should_thin() with early returns
- Add clear section headers for navigation

All 44 tests pass. Behavior unchanged.

Agent: carmack
2026-01-20 16:17:38 +05:30
Dhanji R. Prasanna
1604ed613a Add integration tests proving tool results are never parsed as tool calls
Adds 3 new tests to json_parsing_stress_test.rs:
- test_tool_result_with_json_not_parsed: Full agent integration test proving
  that JSON in tool results (sent TO the LLM) is never parsed by the
  streaming parser (which only sees LLM output)
- test_parser_only_processes_completion_chunks: Documents that StreamingToolParser
  only accepts CompletionChunk, not Message objects
- test_architectural_separation_documented: Documents the data flow showing
  tool results flow TO the LLM while the parser only sees FROM the LLM

This proves the architectural guarantee: there is no code path where
tool result content could be parsed as a tool call, because:
1. Tool results are Message objects added to context_window
2. The streaming parser only processes CompletionChunk from provider.stream_completion()
3. These are completely separate data types flowing in opposite directions

Total: 41 JSON parsing stress tests now pass.
2026-01-19 16:21:36 +05:30
Dhanji R. Prasanna
a84fead03b refactor: improve readability of streaming parser and JSON filter
Agent: carmack

Changes:
- streaming_parser.rs: Unified find_first/last_tool_call_start into single
  find_tool_call_start with SearchDirection enum, reducing duplication.
  Simplified is_json_invalidated from 45 to 20 lines with clearer logic.
  Fixed redundant !escape_next check in find_complete_json_object_end.

- filter_json.rs: Simplified check_tool_pattern from 40 to 24 lines.
  Replaced repetitive prefix checks with loop over ["t", "to", "too", "tool"].
  Reduced trailing return statements with direct expression returns.

- ui_writer_impl.rs: Added ansi module for duration color constants.
  Simplified duration_color function by removing redundant comments.

- language_prompts.rs: Fixed test assertions to match actual prompt content
  ("obvious, readable Racket" instead of "RACKET-SPECIFIC GUIDANCE").

All 174+ tests pass. No behavior changes.
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
0ae1a13cdb feat: real-time tool call streaming indicator with blinking UI
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()

Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- Unescaped newlines in strings or prose after JSON invalidates detection

User sees ' ● tool_name |' immediately when tool call starts streaming,
with blinking indicator while args are received.
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
d68f059acf fix: detect invalidated JSON tool calls to prevent parser poisoning
When partial JSON tool call patterns appear in LLM output (e.g., from
quoting file content), the parser would incorrectly report them as
"incomplete tool calls", triggering auto-continue loops.

Fix: Added is_json_invalidated() to detect when partial JSON has been
invalidated by subsequent content that cannot be valid JSON:
- Unescaped newline inside a string (invalid JSON)
- Newline followed by prose text outside a string

The check is only applied to incomplete JSON - complete tool calls
with trailing text are still correctly detected.

Added 6 new tests covering:
- Tool results with partial JSON patterns
- LLM quoting file content inline vs on own line
- Comment prefixes (// # -- etc) with partial patterns
- Real incomplete tool calls (should still be detected)
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
999ac6fe66 fix: prevent parser poisoning from inline tool-call JSON patterns
The streaming parser was incorrectly detecting tool call patterns that
appeared inline in prose (e.g., when explaining the format), causing
g3 to return control mid-task.

Fix: Modified find_first_tool_call_start() and find_last_tool_call_start()
to only recognize patterns that appear on their own line (at start of
buffer or after newline with only whitespace before the pattern).

Changes:
- Added is_on_own_line() helper to check line-boundary conditions
- Updated detection methods to skip inline patterns
- Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed)
- Rewrote tests for new behavior
- Added streaming_repro tests that use process_chunk() to verify the exact bug scenario

28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
4c36cc058c fix: prevent parser poisoning from inline tool-call JSON patterns
When the streaming parser encountered fragments of JSON that looked like
partial tool calls (e.g., {"tool":) embedded in inline text (like code
examples or prose), it would incorrectly enter JSON parsing mode and
poison the parser state, causing control to be returned to the user
mid-task.

This fix:
- Adds sanitize_inline_tool_patterns() to detect tool-call patterns that
  are NOT on their own line and replace the opening brace with a Unicode
  homoglyph (fullwidth left curly bracket U+FF5B)
- Integrates sanitization into process_chunk() before text is buffered
- Updates system prompts to instruct LLMs to use homoglyphs when showing
  example tool call JSON in prose
- Adds comprehensive tests for the sanitization logic

Real tool calls from LLMs always appear on their own line, so those are
left untouched. Only inline patterns (with non-whitespace before them)
are sanitized.
2026-01-13 10:58:41 +05:30
Dhanji R. Prasanna
e301075666 Fix panic on multi-byte chars in filter_json buffer truncation
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.

Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.

- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
2026-01-09 15:20:57 +11:00
Dhanji R. Prasanna
67be0f20c7 fix: remove allow_multiple_tool_calls config and simplify tool execution flow
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.

Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection

Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
2026-01-09 13:28:07 +11:00
Dhanji R. Prasanna
1980e62511 Improve code readability in g3-core
- streaming_parser.rs: Rename has_message_like_keys to args_contain_prose_fragments
  with improved documentation explaining the heuristic for detecting malformed
  tool calls where LLM prose leaked into JSON keys

- context_window.rs: Simplify build_thin_result_message using early return
  pattern and match expression for cleaner control flow

Agent: carmack
2026-01-07 11:16:42 +11:00
Dhanji R. Prasanna
fd22ce9890 refactor(g3-core): extract 4 modules from monolithic lib.rs
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:

- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction

All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.
2025-12-24 14:32:39 +11:00