Adds 3 new tests to json_parsing_stress_test.rs:
- test_tool_result_with_json_not_parsed: Full agent integration test proving
that JSON in tool results (sent TO the LLM) is never parsed by the
streaming parser (which only sees LLM output)
- test_parser_only_processes_completion_chunks: Documents that StreamingToolParser
only accepts CompletionChunk, not Message objects
- test_architectural_separation_documented: Documents the data flow showing
tool results flow TO the LLM while the parser only sees FROM the LLM
This proves the architectural guarantee: there is no code path where
tool result content could be parsed as a tool call, because:
1. Tool results are Message objects added to context_window
2. The streaming parser only processes CompletionChunk from provider.stream_completion()
3. These are completely separate data types flowing in opposite directions
Total: 41 JSON parsing stress tests now pass.
Agent: carmack
Changes:
- streaming_parser.rs: Unified find_first/last_tool_call_start into single
find_tool_call_start with SearchDirection enum, reducing duplication.
Simplified is_json_invalidated from 45 to 20 lines with clearer logic.
Fixed redundant !escape_next check in find_complete_json_object_end.
- filter_json.rs: Simplified check_tool_pattern from 40 to 24 lines.
Replaced repetitive prefix checks with loop over ["t", "to", "too", "tool"].
Reduced trailing return statements with direct expression returns.
- ui_writer_impl.rs: Added ansi module for duration color constants.
Simplified duration_color function by removing redundant comments.
- language_prompts.rs: Fixed test assertions to match actual prompt content
("obvious, readable Racket" instead of "RACKET-SPECIFIC GUIDANCE").
All 174+ tests pass. No behavior changes.
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()
Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- Unescaped newlines in strings or prose after JSON invalidates detection
User sees ' ● tool_name |' immediately when tool call starts streaming,
with blinking indicator while args are received.
When partial JSON tool call patterns appear in LLM output (e.g., from
quoting file content), the parser would incorrectly report them as
"incomplete tool calls", triggering auto-continue loops.
Fix: Added is_json_invalidated() to detect when partial JSON has been
invalidated by subsequent content that cannot be valid JSON:
- Unescaped newline inside a string (invalid JSON)
- Newline followed by prose text outside a string
The check is only applied to incomplete JSON - complete tool calls
with trailing text are still correctly detected.
Added 6 new tests covering:
- Tool results with partial JSON patterns
- LLM quoting file content inline vs on own line
- Comment prefixes (// # -- etc) with partial patterns
- Real incomplete tool calls (should still be detected)
The streaming parser was incorrectly detecting tool call patterns that
appeared inline in prose (e.g., when explaining the format), causing
g3 to return control mid-task.
Fix: Modified find_first_tool_call_start() and find_last_tool_call_start()
to only recognize patterns that appear on their own line (at start of
buffer or after newline with only whitespace before the pattern).
Changes:
- Added is_on_own_line() helper to check line-boundary conditions
- Updated detection methods to skip inline patterns
- Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed)
- Rewrote tests for new behavior
- Added streaming_repro tests that use process_chunk() to verify the exact bug scenario
28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases
When the streaming parser encountered fragments of JSON that looked like
partial tool calls (e.g., {"tool":) embedded in inline text (like code
examples or prose), it would incorrectly enter JSON parsing mode and
poison the parser state, causing control to be returned to the user
mid-task.
This fix:
- Adds sanitize_inline_tool_patterns() to detect tool-call patterns that
are NOT on their own line and replace the opening brace with a Unicode
homoglyph (fullwidth left curly bracket U+FF5B)
- Integrates sanitization into process_chunk() before text is buffered
- Updates system prompts to instruct LLMs to use homoglyphs when showing
example tool call JSON in prose
- Adds comprehensive tests for the sanitization logic
Real tool calls from LLMs always appear on their own line, so those are
left untouched. Only inline patterns (with non-whitespace before them)
are sanitized.
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.
Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.
- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.
Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection
Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
- streaming_parser.rs: Rename has_message_like_keys to args_contain_prose_fragments
with improved documentation explaining the heuristic for detecting malformed
tool calls where LLM prose leaked into JSON keys
- context_window.rs: Simplify build_thin_result_message using early return
pattern and match expression for cleaner control flow
Agent: carmack
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:
- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction
All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.