alex/g3 - g3 - Millerson GIT hosting

alex/g3

Author	SHA1	Message	Date
Dhanji R. Prasanna	afc5bc8574	Readability improvements across streaming_parser, input_formatter, commands - streaming_parser.rs: Reduced ~70 lines by removing redundant comments, consolidating doc comments, using slice syntax for TOOL_CALL_PATTERNS - input_formatter.rs: Lazy regex compilation via once_cell (performance), cleaner function structure, reduced comment noise - commands.rs: Extracted format_research_task_summary() and format_research_report_header() helpers, reduced ~40 lines of duplication - pending_research.rs: Fixed 2 unused variable warnings in tests All changes are behavior-preserving. 446 tests pass. Agent: carmack	2026-01-30 14:48:08 +11:00
Dhanji R. Prasanna	fa3c9203e0	Fix streaming parser bug: detect abandoned tool call fragments When the LLM 'stutters' and emits incomplete tool call fragments like: {"tool": "shell", "args": {...}} {"tool": {"tool": "shell", "args": {...}} The parser would get stuck waiting for the incomplete fragment to complete, causing the entire response to be lost (no tool executed, no text displayed). This was observed in butler session butler_c6ab59af2e4f991c where the user's 'send!' command produced no response. Fix: Enhanced is_json_invalidated() to detect when a new tool call pattern ({"tool"}) appears after a newline while parsing an incomplete JSON fragment. This indicates the previous fragment was abandoned and should be invalidated. Safety: - Tool patterns inside JSON strings (e.g., writing example code) are not affected because the check only runs outside strings - Added tests for the stuttering pattern and the file-writing edge case	2026-01-30 14:00:18 +11:00
Dhanji R. Prasanna	4321503e89	Refactor streaming_parser.rs and context_window.rs for readability streaming_parser.rs (879 → 806 lines, -8%): - Extract CodeFenceTracker struct for cleaner fence state management - Consolidate pattern matching into module-level functions - Rename functions for clarity (find_json_object_end, parse_all_json_tool_calls) - Add clear section headers with // === separators - Simplify try_parse_json_tool_call state machine context_window.rs (889 → 843 lines, -5%): - Eliminate duplication: reset_with_summary now delegates to reset_with_summary_and_stub - Extract PreservedMessages struct for cleaner message preservation - Add ThinResult::no_changes() helper to reduce boilerplate - Simplify should_compact() and should_thin() with early returns - Add clear section headers for navigation All 44 tests pass. Behavior unchanged. Agent: carmack	2026-01-20 16:17:38 +05:30
Dhanji R. Prasanna	1604ed613a	Add integration tests proving tool results are never parsed as tool calls Adds 3 new tests to json_parsing_stress_test.rs: - test_tool_result_with_json_not_parsed: Full agent integration test proving that JSON in tool results (sent TO the LLM) is never parsed by the streaming parser (which only sees LLM output) - test_parser_only_processes_completion_chunks: Documents that StreamingToolParser only accepts CompletionChunk, not Message objects - test_architectural_separation_documented: Documents the data flow showing tool results flow TO the LLM while the parser only sees FROM the LLM This proves the architectural guarantee: there is no code path where tool result content could be parsed as a tool call, because: 1. Tool results are Message objects added to context_window 2. The streaming parser only processes CompletionChunk from provider.stream_completion() 3. These are completely separate data types flowing in opposite directions Total: 41 JSON parsing stress tests now pass.	2026-01-19 16:21:36 +05:30
Dhanji R. Prasanna	a84fead03b	refactor: improve readability of streaming parser and JSON filter Agent: carmack Changes: - streaming_parser.rs: Unified find_first/last_tool_call_start into single find_tool_call_start with SearchDirection enum, reducing duplication. Simplified is_json_invalidated from 45 to 20 lines with clearer logic. Fixed redundant !escape_next check in find_complete_json_object_end. - filter_json.rs: Simplified check_tool_pattern from 40 to 24 lines. Replaced repetitive prefix checks with loop over ["t", "to", "too", "tool"]. Reduced trailing return statements with direct expression returns. - ui_writer_impl.rs: Added ansi module for duration color constants. Simplified duration_color function by removing redundant comments. - language_prompts.rs: Fixed test assertions to match actual prompt content ("obvious, readable Racket" instead of "RACKET-SPECIFIC GUIDANCE"). All 174+ tests pass. No behavior changes.	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	0ae1a13cdb	feat: real-time tool call streaming indicator with blinking UI - Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback - New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active() - Refactor ConsoleUiWriter state to use atomics in ParsingHintState - Add tool_call_streaming field to CompletionChunk for provider hints - Anthropic provider sends streaming hints when tool name detected - New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active() Parser improvements: - Add is_json_invalidated() to detect false positive tool patterns - Fix tool result poisoning when file contents contain partial JSON - Unescaped newlines in strings or prose after JSON invalidates detection User sees ' ● tool_name \|' immediately when tool call starts streaming, with blinking indicator while args are received.	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	d68f059acf	fix: detect invalidated JSON tool calls to prevent parser poisoning When partial JSON tool call patterns appear in LLM output (e.g., from quoting file content), the parser would incorrectly report them as "incomplete tool calls", triggering auto-continue loops. Fix: Added is_json_invalidated() to detect when partial JSON has been invalidated by subsequent content that cannot be valid JSON: - Unescaped newline inside a string (invalid JSON) - Newline followed by prose text outside a string The check is only applied to incomplete JSON - complete tool calls with trailing text are still correctly detected. Added 6 new tests covering: - Tool results with partial JSON patterns - LLM quoting file content inline vs on own line - Comment prefixes (// # -- etc) with partial patterns - Real incomplete tool calls (should still be detected)	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	999ac6fe66	fix: prevent parser poisoning from inline tool-call JSON patterns The streaming parser was incorrectly detecting tool call patterns that appeared inline in prose (e.g., when explaining the format), causing g3 to return control mid-task. Fix: Modified find_first_tool_call_start() and find_last_tool_call_start() to only recognize patterns that appear on their own line (at start of buffer or after newline with only whitespace before the pattern). Changes: - Added is_on_own_line() helper to check line-boundary conditions - Updated detection methods to skip inline patterns - Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed) - Rewrote tests for new behavior - Added streaming_repro tests that use process_chunk() to verify the exact bug scenario 28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	4c36cc058c	fix: prevent parser poisoning from inline tool-call JSON patterns When the streaming parser encountered fragments of JSON that looked like partial tool calls (e.g., {"tool":) embedded in inline text (like code examples or prose), it would incorrectly enter JSON parsing mode and poison the parser state, causing control to be returned to the user mid-task. This fix: - Adds sanitize_inline_tool_patterns() to detect tool-call patterns that are NOT on their own line and replace the opening brace with a Unicode homoglyph (fullwidth left curly bracket U+FF5B) - Integrates sanitization into process_chunk() before text is buffered - Updates system prompts to instruct LLMs to use homoglyphs when showing example tool call JSON in prose - Adds comprehensive tests for the sanitization logic Real tool calls from LLMs always appear on their own line, so those are left untouched. Only inline patterns (with non-whitespace before them) are sanitized.	2026-01-13 10:58:41 +05:30
Dhanji R. Prasanna	e301075666	Fix panic on multi-byte chars in filter_json buffer truncation The buffer truncation code was slicing at a raw byte offset which could land in the middle of a multi-byte character (like emojis), causing a panic. Fixed by using char_indices() to find valid character boundaries. Also added stop_reason field to CompletionChunk initializers in tests to complete the stop_reason feature addition. - Fix byte boundary panic in filter_json.rs line 327 - Add test for multi-byte character handling - Update test files with missing stop_reason field	2026-01-09 15:20:57 +11:00
Dhanji R. Prasanna	67be0f20c7	fix: remove allow_multiple_tool_calls config and simplify tool execution flow This fixes a bug where the agent would stop responding abruptly without calling final_output. The root cause was the allow_multiple_tool_calls config option (default: false) which caused the agent to break out of the streaming loop mid-stream after executing the first tool, losing any subsequent content. Changes: - Remove allow_multiple_tool_calls config option entirely - Always process all tool calls without breaking mid-stream - Simplify system prompt generation (no longer needs boolean param) - Let the stream complete fully before continuing to next iteration - Change find_last_tool_call_start to find_first_tool_call_start - Remove parser.reset() call on duplicate detection Benefits: - Simpler logic with less conditional branching - No lost content after tool calls - Consistent behavior for all users - Reduced config complexity	2026-01-09 13:28:07 +11:00
Dhanji R. Prasanna	1980e62511	Improve code readability in g3-core - streaming_parser.rs: Rename has_message_like_keys to args_contain_prose_fragments with improved documentation explaining the heuristic for detecting malformed tool calls where LLM prose leaked into JSON keys - context_window.rs: Simplify build_thin_result_message using early return pattern and match expression for cleaner control flow Agent: carmack	2026-01-07 11:16:42 +11:00
Dhanji R. Prasanna	fd22ce9890	refactor(g3-core): extract 4 modules from monolithic lib.rs Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting: - paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.) - streaming_parser.rs: StreamingToolParser for LLM response parsing - utils.rs: Diff parsing and shell escaping utilities - webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction All public APIs preserved via re-exports for backward compatibility. Added 13 new unit tests across extracted modules. All 225 tests pass.	2025-12-24 14:32:39 +11:00

13 Commits