Commit Graph

814 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
3001df3b1a style(cli): simplify project prompt format
Change from: butler |[finances]>
Change to:   butler | finances>
2026-01-22 08:15:18 +05:30
Dhanji R. Prasanna
af8b849311 fix(read_image): use correct media type when resize fails to reduce size
When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.

This caused Anthropic API errors like:
  'Image does not match the provided media type image/jpeg'

Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
2026-01-22 07:58:05 +05:30
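The fallback rule described in af8b849311 can be sketched roughly as follows. This is a minimal illustration, not the code in read_image: `PreparedImage` and `prepare_image` are hypothetical names, and the resize step is assumed to produce JPEG bytes (or nothing, if no resize was attempted).

```rust
/// Outcome of preparing an image for upload (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct PreparedImage {
    pub bytes: Vec<u8>,
    pub media_type: String,
    pub was_resized: bool,
}

/// Choose between the original bytes and the resized bytes.
///
/// `resized_jpeg` is assumed to be the output of a resize step that always
/// re-encodes to JPEG; `None` means no resize was attempted.
pub fn prepare_image(
    original: Vec<u8>,
    original_media_type: &str,
    resized_jpeg: Option<Vec<u8>>,
) -> PreparedImage {
    match resized_jpeg {
        // Only use the resized bytes when they are actually smaller.
        Some(resized) if resized.len() < original.len() => PreparedImage {
            bytes: resized,
            media_type: "image/jpeg".to_string(),
            was_resized: true,
        },
        // Fallback: keep the original bytes AND the original media type,
        // so the declared type always matches the payload.
        _ => PreparedImage {
            bytes: original,
            media_type: original_media_type.to_string(),
            was_resized: false,
        },
    }
}
```

The key point is that `was_resized` and `media_type` are derived from the bytes actually sent, not from whether the original dimensions exceeded the limit.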
Dhanji R. Prasanna
022f5c70a6 feat(cli): show active project name in interactive prompt
When a project is loaded via /project, the prompt now shows:
  agent_name |[project_name]>

where the |[project_name]> part is displayed in blue.

Examples:
- Default: g3>
- With project: g3 |[myapp]>
- Agent mode: butler>
- Agent + project: butler |[myapp]>

The prompt automatically resets when /unproject is called.

Added build_prompt() function with 7 unit tests covering all prompt states.
2026-01-22 07:24:00 +05:30
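The four prompt states listed in 022f5c70a6 could be produced by a function along these lines. The signature is an assumption (the commit only names build_prompt()), and the blue coloring of the project part is omitted to keep the sketch terminal-independent.

```rust
/// Build the interactive prompt in the 022f5c70a6 format.
///
/// Assumed signature: `agent_name` is `None` in default mode (prompt name
/// "g3"), and `project_name` is `None` when no project is loaded.
pub fn build_prompt(agent_name: Option<&str>, project_name: Option<&str>) -> String {
    let name = agent_name.unwrap_or("g3");
    match project_name {
        // With a project loaded: "name |[project]>"
        Some(project) => format!("{} |[{}]>", name, project),
        // No project: "name>"
        None => format!("{}>", name),
    }
}
```

Under these assumptions, `build_prompt(Some("butler"), Some("myapp"))` yields `butler |[myapp]>`. Commit 3001df3b1a later simplifies the project form to `butler | myapp>`.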
Dhanji R. Prasanna
9325a43ff3 feat(cli): shorten file paths in tool output display
Add three-level path shortening hierarchy for cleaner CLI output:
1. Project path -> <project_name>/... (when project loaded via /project)
2. Workspace path -> ./... (relative to current working directory)
3. Home path -> ~/... (fallback for paths under home directory)

Changes:
- Add shorten_path() and shorten_paths_in_command() functions in display.rs
- Add project_path/project_name fields to ConsoleUiWriter
- Add set_workspace_path(), set_project_path(), clear_project() to UiWriter trait
- Add ui_writer() getter to Agent struct
- Wire up project path setting in /project and /unproject commands
- Set workspace path when creating agents in all CLI modes

Before: ● read_file | /Users/dhanji/icloud/butler/projects/appa_estate/status.md
After:  ● read_file | appa_estate/status.md (with project loaded)
        ● read_file | ./src/main.rs (workspace-relative)
        ● read_file | ~/Documents/file.txt (home-relative)
2026-01-21 21:27:16 +05:30
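The three-level hierarchy from 9325a43ff3 can be sketched as an ordered series of prefix checks. The parameter list below is an assumption made for illustration; the real shorten_path() in display.rs may take its project, workspace, and home paths from elsewhere.

```rust
use std::path::Path;

/// Shorten `path` for display, trying project, then workspace, then home.
///
/// `project` is an assumed `(project_path, project_name)` pair; all three
/// roots are optional so callers can pass only what is known.
pub fn shorten_path(
    path: &str,
    project: Option<(&str, &str)>,
    workspace: Option<&str>,
    home: Option<&str>,
) -> String {
    let p = Path::new(path);
    // 1. Project path -> <project_name>/...
    if let Some((project_path, project_name)) = project {
        if let Ok(rest) = p.strip_prefix(project_path) {
            return format!("{}/{}", project_name, rest.display());
        }
    }
    // 2. Workspace path -> ./...
    if let Some(ws) = workspace {
        if let Ok(rest) = p.strip_prefix(ws) {
            return format!("./{}", rest.display());
        }
    }
    // 3. Home path -> ~/...
    if let Some(home_dir) = home {
        if let Ok(rest) = p.strip_prefix(home_dir) {
            return format!("~/{}", rest.display());
        }
    }
    // No known root matched: show the path unchanged.
    path.to_string()
}
```

`Path::strip_prefix` compares whole path components, so a workspace of `/work/g3` does not accidentally match `/work/g3-old/file.rs`. The order of the checks matters because a project usually lives under the home directory as well.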
Dhanji R. Prasanna
0f7961d3c6 Remove libVisionBridge.dylib from install script
The VisionBridge library is no longer needed.
2026-01-21 15:27:14 +05:30
Dhanji R. Prasanna
d7d32db4a4 Fix tab completion in agent+chat mode
Remove duplicate logging initialization in agent_mode.rs. Logging is already
initialized in run() before agent mode is dispatched. The duplicate
tracing_subscriber::fmt::layer() was interfering with rustyline's terminal
state, breaking tab completion.
2026-01-21 15:24:27 +05:30
Dhanji R. Prasanna
581de4845c Add /project and /unproject to tab completion
2026-01-21 14:58:23 +05:30
Dhanji R. Prasanna
feb7c3e40d Add /project and /unproject commands for project-specific context
- Add Project struct in crates/g3-cli/src/project.rs with file loading logic
- Load brief.md, contacts.yaml, status.md from project path
- Load projects.md from workspace root for cross-project context
- Project content appended to system message (survives compaction/dehydration)
- /project <path> loads project and auto-submits prompt asking about state
- /unproject clears project content and resets context
- Add set_project_content(), clear_project_content(), has_project_content() to Agent
- Add new_for_test_with_readme() for testing with custom README content
- Add 6 unit tests for Project struct
- Add 9 integration tests for project context behavior
2026-01-21 14:53:30 +05:30
Dhanji R. Prasanna
a34a3b08e9 Rename Project Memory to Workspace Memory
Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.

Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md

11 files changed, ~36 occurrences updated.
2026-01-21 14:08:42 +05:30
Dhanji R. Prasanna
6a5ce11e7b Consolidate redundant assistant message test files
Deleted 4 redundant test files (~956 lines):
- assistant_message_dedup_test.rs (416 lines, 12 tests)
- consecutive_assistant_message_test.rs (248 lines, 6 tests)
- missing_assistant_message_test.rs (100 lines, 4 tests)
- early_return_path_test.rs (192 lines, 5 tests) - whitebox test

Created consolidated assistant_message_test.rs (369 lines, 14 tests):
- Helper function tests for consecutive message detection
- ContextWindow unit tests for normal and tool execution flows
- Bug demonstration tests documenting what bugs looked like
- Invariant tests for user/assistant alternation
- Missing assistant message fallback logic tests

The early_return_path_test was removed because it:
- Referenced specific line numbers in production code (brittle)
- Reimplemented internal logic (whitebox anti-pattern)
- Duplicated coverage from mock_provider_integration_test.rs

All 729 g3-core tests pass.
2026-01-21 10:27:07 +05:30
Dhanji R. Prasanna
c5d549c211 Readability pass: remove verbose comments and clean up tests
- completion.rs: Remove redundant comments, clean up test output (println! -> let _)
- g3_status.rs: Condense doc comments, rename from_str() to parse()
- streaming.rs: Remove obvious doc comments that duplicate function names
- simple_output.rs, ui_writer_impl.rs: Update Status::parse() calls

All changes are behavior-preserving. 132 lines removed; the code is more scannable.

Agent: carmack
2026-01-21 07:13:20 +05:30
Dhanji R. Prasanna
c4ce853cc6 Fix streaming markdown tests for Dracula heading colors
Update test assertions to match new heading color scheme:
- H1: bold pink (\x1b[1;95m) instead of bold magenta
- H2: purple/magenta (\x1b[35m) - unchanged
- H3: cyan (\x1b[36m) instead of magenta
2026-01-21 07:01:53 +05:30
Dhanji R. Prasanna
9397687949 Remove unused mouse control and macax accessibility code
Removed dead code that was never used by any g3 tool:

- macax/ module (accessibility control via AXApplication, AXElement)
- move_mouse() and click_at() methods from ComputerController trait
- macax_demo.rs and test_type_text.rs examples

The ComputerController trait now only has take_screenshot(),
which is the only method actually used by the screenshot tool.
2026-01-21 06:54:31 +05:30
Dhanji R. Prasanna
a89cad955a Remove VisionBridge OCR (unused)
VisionBridge was a Swift library for Apple Vision OCR that was built
on every compile but never actually used by any g3 tool.

Removed:
- vision-bridge/ Swift package directory
- src/ocr/ module (vision.rs, tesseract.rs, mod.rs)
- OCR methods from ComputerController trait
- OCR-related code from platform implementations
- TextLocation type (no longer needed)
- test_vision.rs example

Simplified:
- build.rs (now empty, no Swift compilation)
- MacOSController (no longer holds OCR engine)
- LinuxController and WindowsController (stub implementations)

Build time improvement: No more 'Building VisionBridge Swift package...'
messages on every compile.
2026-01-21 06:42:01 +05:30
Dhanji R. Prasanna
38b0019ad4 Fix compile warnings and tweak error message format
Warnings fixed:
- Remove unused 'warn' import from retry.rs
- Prefix unused 'output' param with underscore
- Prefix unused 'rel_start' with underscore
- Add #[allow(dead_code)] to G3Status::info()

Message format tweaked per feedback:
- 'g3: model overloaded [error]' (no attempt info)
- 'g3: retrying in 2.2s (1/3) ... [done]' (attempt info moved here)
- Handle empty error message in Status::Error to show just '[error]'
2026-01-20 22:49:55 +05:30
Dhanji R. Prasanna
60578e310c Clean up error and retry messages for recoverable errors
Before:
   Error: Anthropic API error: AnthropicError { error_type: "overloaded_error", ... }
  ⚠️  Model busy detected (attempt 2/3). Retrying in 2.2s...
  [ERROR logs dumped to terminal]

After:
  g3: model overloaded [error: attempt 1/3]
  g3: retrying in 2.2s ... [done]

Changes:
- Use G3Status formatting for clean, consistent output
- Downgrade ERROR logs to debug for recoverable errors
- Apply same treatment to all recoverable error types:
  rate limited, server error, network error, timeout,
  model overloaded, token limit, context length exceeded
- Update both g3-cli (task_execution.rs) and g3-core (retry.rs)
2026-01-20 22:40:09 +05:30
Dhanji R. Prasanna
53e1ea9766 Strikethrough completed TODO items in todo_read/todo_write output
Completed items (- [x]) now display with strikethrough text:
  ■ ~~Write tests~~

Incomplete items remain unchanged:
  □ Implement feature
2026-01-20 22:24:13 +05:30
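The display rule in 53e1ea9766 amounts to a per-line transformation of Markdown checkbox items. The sketch below assumes the strikethrough is emitted as literal `~~` markers, as shown in the commit message; `format_todo_line` is a hypothetical name.

```rust
/// Render one TODO line for display.
///
/// Completed items (`- [x]`) get a filled box and `~~` strikethrough markers;
/// incomplete items (`- [ ]`) get an empty box; other lines pass through.
pub fn format_todo_line(line: &str) -> String {
    let trimmed = line.trim_start();
    if let Some(text) = trimmed.strip_prefix("- [x] ") {
        format!("■ ~~{}~~", text)
    } else if let Some(text) = trimmed.strip_prefix("- [ ] ") {
        format!("□ {}", text)
    } else {
        line.to_string()
    }
}
```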
Dhanji R. Prasanna
3e9d8b2c8d Distinguish heading levels with Dracula color scheme
Headings now have distinct visual hierarchy:
- # H1  → Bold pink (most prominent)
- ## H2 → Purple/magenta
- ### H3 → Cyan
- #### H4 → White
- ##### H5 → Dim
- ###### H6 → Dim

Previously H2-H6 were all identical magenta.
2026-01-20 22:19:41 +05:30
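The heading hierarchy in 3e9d8b2c8d maps naturally onto a level-to-escape-code table. The H1 to H3 codes below are the ones quoted in c4ce853cc6; the codes for H4 (white) and H5/H6 (dim) are not stated in the log, so the standard SGR values used here are assumptions, as is the function name.

```rust
/// Return the ANSI SGR prefix for a Markdown heading level (1..=6).
pub fn heading_style(level: u8) -> &'static str {
    match level {
        1 => "\x1b[1;95m", // bold pink
        2 => "\x1b[35m",   // purple/magenta
        3 => "\x1b[36m",   // cyan
        4 => "\x1b[37m",   // white (assumed code)
        _ => "\x1b[2m",    // dim for H5 and H6 (assumed code)
    }
}

/// Wrap heading text in its style and reset attributes afterwards.
pub fn render_heading(level: u8, text: &str) -> String {
    format!("{}{}\x1b[0m", heading_style(level), text)
}
```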
Dhanji R. Prasanna
d7f22679a9 Remove '📋 Task: ' prefix from ACD stub
The first user message in dehydrated context stubs is now shown
without any prefix, consistent with the removal of 'Task: ' prefix
from user messages.
2026-01-20 21:57:12 +05:30
Dhanji R. Prasanna
07c0bf1e39 Remove 'Task: ' prefix from user messages
The prefix was causing duplication when users typed 'Task: ...' themselves,
resulting in '📋 Task: Task: ...' in context dumps.

User messages are now stored as-is without any prefix.
2026-01-20 21:53:28 +05:30
Dhanji R. Prasanna
2eb9f2e67c Add template processing to agent prompt files
Agent prompt files (both workspace agents/<name>.md and embedded)
now support template variables like {{today}}.

This allows agent definitions to include dynamic content:
  # My Agent
  Today is {{today}}. Your mission is...
2026-01-20 21:45:15 +05:30
Dhanji R. Prasanna
58afbe5764 Merge sessions/single/b1aa4d5a
2026-01-20 21:44:12 +05:30
Dhanji R. Prasanna
9eb8931fab Change /dump output to use g3 status formatting
Replace '📄 Context dumped to: <filename>' with 'g3: context dumped to <filename> [done]'
where g3: is bold green, filename is cyan, and [done] is bold green.

Add G3Status::complete_with_path() method for status messages with highlighted paths.
2026-01-20 21:43:48 +05:30
Dhanji R. Prasanna
a882ac8893 Add template processing to one-shot and agent modes
Template variables like {{today}} are now processed in:
- One-shot mode: g3 "task with {{today}}"
- Agent mode: g3 --agent carmack "task with {{today}}"

This completes template support across all prompt entry points:
- --include-prompt files
- /run command
- One-shot task argument
- Agent mode task argument
2026-01-20 21:39:43 +05:30
Dhanji R. Prasanna
6e8dc2e866 Add template processing to /run command
Apply the same {{var}} template variable injection to prompts
loaded via the /run command in interactive mode.
2026-01-20 21:36:48 +05:30
Dhanji R. Prasanna
1a1f149206 Add template variable injection for --include-prompt
Supports {{var}} syntax for variable substitution in included prompt files.

Currently supported variables:
- {{today}}: Current date in ISO format (YYYY-MM-DD)

Unknown variables trigger a warning and are left unchanged.

- Add template.rs module with process_template() function
- Integrate template processing into read_include_prompt()
- Add comprehensive tests for template processing
2026-01-20 21:34:15 +05:30
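The substitution behavior described in 1a1f149206 can be sketched as a single pass over the input. This is not the template.rs implementation: the signature is an assumption, the date is passed in as a string so the example stays dependency-free, and unknown names are returned to the caller instead of being logged as warnings.

```rust
/// Replace `{{today}}` with `today` and leave unknown `{{var}}` tags unchanged.
///
/// Returns the processed text plus the list of unknown variable names,
/// which a caller could turn into warnings.
pub fn process_template(input: &str, today: &str) -> (String, Vec<String>) {
    let mut out = String::with_capacity(input.len());
    let mut unknown = Vec::new();
    let mut rest = input;

    while let Some(start) = rest.find("{{") {
        // Copy everything before the opening braces verbatim.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let name = &after_open[..end];
                if name == "today" {
                    out.push_str(today);
                } else {
                    // Unknown variable: keep the original "{{name}}" text.
                    out.push_str(&rest[start..start + 2 + end + 2]);
                    unknown.push(name.to_string());
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // Unterminated "{{": copy the remainder as-is and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    (out, unknown)
}
```

For example, processing `Today is {{today}}.` with a date of `2026-01-20` gives `Today is 2026-01-20.` and an empty unknown list.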
Dhanji R. Prasanna
9a0a2a2726 Make dehydration stub more compact
Change from multi-line verbose format to single-line compact format:

Before:
   DEHYDRATED CONTEXT (fragment_id: 188c7ac71613)
     • 8 messages (4 user, 4 assistant)
     • 3 tool calls (shell ×3)
     • ~299 tokens saved

     To restore this history, call: rehydrate(fragment_id: "188c7ac71613")

After:
   DEHYDRATED CONTEXT: 3 tool calls (shell x3), 8 total msgs. To restore, call: rehydrate(fragment_id: "188c7ac71613")

- Combine all info into single line
- Remove tokens saved (not essential for rehydration decision)
- Use ASCII 'x' instead of '×' for simplicity
- Add 'no tool calls' case for fragments without tools
- Update related tests
2026-01-20 21:26:42 +05:30
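The single-line stub format introduced in 9a0a2a2726 can be reproduced with a small formatter. `format_stub` and its parameters are hypothetical, and the exact wording of the 'no tool calls' case is an assumption since the log only mentions that the case exists.

```rust
/// Format a compact dehydration stub.
///
/// `tool_calls` holds (tool name, call count) pairs, for example [("shell", 3)].
pub fn format_stub(fragment_id: &str, total_msgs: usize, tool_calls: &[(&str, usize)]) -> String {
    let total_calls: usize = tool_calls.iter().map(|(_, n)| n).sum();
    let tools = if total_calls == 0 {
        // Assumed wording for fragments without tools.
        "no tool calls".to_string()
    } else {
        // ASCII 'x' instead of '×', as in the commit.
        let breakdown: Vec<String> = tool_calls
            .iter()
            .map(|(name, n)| format!("{} x{}", name, n))
            .collect();
        format!("{} tool calls ({})", total_calls, breakdown.join(", "))
    };
    format!(
        "DEHYDRATED CONTEXT: {}, {} total msgs. To restore, call: rehydrate(fragment_id: \"{}\")",
        tools, total_msgs, fragment_id
    )
}
```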
Dhanji R. Prasanna
4321503e89 Refactor streaming_parser.rs and context_window.rs for readability
streaming_parser.rs (879 → 806 lines, -8%):
- Extract CodeFenceTracker struct for cleaner fence state management
- Consolidate pattern matching into module-level functions
- Rename functions for clarity (find_json_object_end, parse_all_json_tool_calls)
- Add clear section headers with // === separators
- Simplify try_parse_json_tool_call state machine

context_window.rs (889 → 843 lines, -5%):
- Eliminate duplication: reset_with_summary now delegates to reset_with_summary_and_stub
- Extract PreservedMessages struct for cleaner message preservation
- Add ThinResult::no_changes() helper to reduce boilerplate
- Simplify should_compact() and should_thin() with early returns
- Add clear section headers for navigation

All 44 tests pass. Behavior unchanged.

Agent: carmack
2026-01-20 16:17:38 +05:30
Dhanji R. Prasanna
1f5eff15e5 Updating memory for streaming structs
2026-01-20 15:47:43 +05:30
Dhanji R. Prasanna
168cfff2ed refactor(g3-core): extract tool output formatting to streaming.rs
Centralize tool output formatting logic that was duplicated/scattered in
stream_completion_with_tools(). This eliminates code-path aliasing where
tool type checks were done in multiple places.

Changes:
- Add ToolOutputFormat enum (SelfHandled, Compact, Regular)
- Add format_tool_result_summary() for centralized formatting decisions
- Add is_compact_tool() and is_self_handled_tool() helper functions
- Move parse_diff_stats() from lib.rs to streaming.rs
- Simplify tool execution display logic in lib.rs using new helpers

Net effect: -86 lines in lib.rs, +112 lines in streaming.rs
The streaming.rs additions are reusable, well-named functions.

All 585+ workspace tests pass.

Agent: fowler
2026-01-20 15:45:35 +05:30
Dhanji R. Prasanna
9abb3735d2 refactor(g3-core): use StreamingState and IterationState structs in stream_completion_with_tools
Consolidate scattered state variables in the 834-line stream_completion_with_tools()
function to use the existing StreamingState and IterationState structs from
streaming.rs. This eliminates code-path aliasing where state was tracked in
multiple places and makes the streaming loop easier to reason about.

Changes:
- Add assistant_message_added field to StreamingState
- Add stream_stop_reason field to IterationState
- Replace 8 inline state variables with StreamingState::new()
- Replace 7 iteration-local variables with IterationState::new()
- All 585 workspace tests pass

This is a pure refactor with no behavior changes. The state structs were already
defined in streaming.rs but not used in the main streaming loop.

Agent: fowler
2026-01-20 15:05:23 +05:30
Dhanji R. Prasanna
dec22f5e58 refactor(g3-cli): extract commands module and fix test organization
- Extract handle_command() from interactive.rs to new commands.rs module
  (320 lines, 15 match arms for /help, /compact, /thinnify, etc.)
- Fix orphaned tests in completion.rs that were outside mod tests block
- Add #[allow(dead_code)] to with_include_prompt_filename() (used in tests)
- interactive.rs reduced from 595 to 290 lines

Agent: fowler
2026-01-20 14:30:50 +05:30
Dhanji R. Prasanna
710c54105b refactor(cli): extract display utilities to eliminate code duplication
Created display.rs module with shared display functions:
- format_workspace_path() / print_workspace_path()
- LoadedContent struct for tracking loaded project files
- print_loaded_status() for status line display
- print_project_heading() for README heading

Updated interactive.rs and agent_mode.rs to use the new module,
eliminating duplicated workspace path formatting and loaded items
status line logic.

Results:
- interactive.rs: 641 → 595 lines (-46)
- agent_mode.rs: 312 → 288 lines (-24)
- New display.rs: 197 lines with 5 unit tests

Agent: fowler
2026-01-20 14:22:46 +05:30
Dhanji R. Prasanna
ecea49d328 Fix --acd flag not being passed to agent mode
The --acd flag was being checked AFTER the agent mode early return,
so it was never applied when running with --agent.

Fix: Pass acd_enabled parameter to run_agent_mode() and call
agent.set_acd_enabled(true) when the flag is set.
2026-01-20 14:12:40 +05:30
Dhanji R. Prasanna
1ec01bb4e3 Limit /resume completion to 8 most recent sessions
Always shows at most 8 sessions in tab completion, sorted by newest first.
This applies whether the user types /resume <TAB> or /resume abc<TAB>.

Implementation:
- list_sessions() returns all sessions sorted by mtime (newest first)
- Completion filters by prefix, then takes first 8 matches
2026-01-20 13:52:28 +05:30
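The filter-then-limit order described in 1ec01bb4e3 matters: limiting first would hide older sessions that match the typed prefix. A minimal sketch, assuming the session IDs arrive already sorted newest first (as list_sessions() is said to return them) and using a hypothetical function name:

```rust
/// Maximum number of sessions offered by /resume completion.
const MAX_RESUME_COMPLETIONS: usize = 8;

/// Filter session IDs by prefix, then keep the first 8 matches.
///
/// `sessions_newest_first` must already be sorted by mtime, newest first.
pub fn resume_candidates(sessions_newest_first: &[String], prefix: &str) -> Vec<String> {
    sessions_newest_first
        .iter()
        .filter(|id| id.starts_with(prefix))
        .take(MAX_RESUME_COMPLETIONS)
        .cloned()
        .collect()
}
```

An empty prefix matches every session, so `/resume <TAB>` and `/resume abc<TAB>` go through the same code path.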
Dhanji R. Prasanna
02ceb6e64c Add /resume <session-id> tab completion
Phase 2 of tab completion: semantic completion for session IDs.

Features:
- /resume <TAB> lists all available sessions from .g3/sessions/
- /resume abc<TAB> filters to sessions starting with 'abc'
- Gracefully returns empty if .g3/sessions/ doesn't exist

Implementation:
- Added list_sessions() helper method to G3Helper
- Added Case 4 in complete() for /resume command
- Updated module docs to reflect new capability

Tests:
- test_resume_completion_lists_sessions - verifies listing and filtering
- test_resume_completion_graceful_no_panic - verifies no crash without sessions dir
2026-01-20 13:04:05 +05:30
Dhanji R. Prasanna
8acbdd7ad4 Add tests for bare quote and non-path quoted text edge cases
Verifies that tab completion correctly ignores:
- Bare quotes: "<TAB> - no path prefix, no completion
- Quoted non-paths: "hello world<TAB> - not a path, no completion
- Quoted text without path prefix: "foo<TAB> - no completion

Also fixes test placement (moved tests inside mod tests block)
2026-01-20 11:44:29 +05:30
Dhanji R. Prasanna
58b1a51e2d Fix tab completion for quoted paths and backslash escapes
Edge cases now handled:
1. Unclosed quotes: "~/My <TAB> - completes paths inside quotes
2. Backslash escapes: ~/My\ <TAB> - unescapes before completing
3. Closed quotes: "~/My Files/"<TAB> - works correctly

Key changes:
- extract_word() now tracks backslash escapes (prev_was_backslash)
- is_path_prefix() strips leading quotes before checking
- Added strip_quotes() and unescape_path() helper methods
- complete() now:
  - Strips quotes and unescapes paths before calling FilenameCompleter
  - Re-wraps completions in quotes or escapes as appropriate
  - Preserves user's quoting style (double vs single quotes)
  - Uses backslash escapes if user was already using them

Tests added:
- test_actual_completion_with_quotes - verifies all three edge cases
2026-01-20 11:41:32 +05:30
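The two helpers named in 58b1a51e2d, strip_quotes() and unescape_path(), might look roughly like this. The commit describes them as methods; the free-function signatures and return types below are assumptions made for illustration.

```rust
/// Strip a leading quote (and a matching trailing quote, if present).
///
/// Returns the inner text and the quote character that was used, so the
/// caller can re-wrap completions in the user's quoting style.
pub fn strip_quotes(word: &str) -> (&str, Option<char>) {
    match word.chars().next() {
        Some(q @ ('"' | '\'')) => {
            let inner = &word[1..];
            // The closing quote is optional: the user may still be typing.
            let inner = inner.strip_suffix(q).unwrap_or(inner);
            (inner, Some(q))
        }
        _ => (word, None),
    }
}

/// Remove backslash escapes, so `~/My\ Files` becomes `~/My Files`.
pub fn unescape_path(word: &str) -> String {
    let mut out = String::with_capacity(word.len());
    let mut chars = word.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some(escaped) => out.push(escaped),
                // A trailing lone backslash is kept as-is.
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

The cleaned path would then be handed to the filename completer, and the results re-quoted or re-escaped to match what the user typed.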
Dhanji R. Prasanna
96cc18b83c Extend tab completion to path-like prefixes anywhere in line
Path completion now works for:
- ./<TAB> - current directory
- ../<TAB> - parent directory
- ~/<TAB> - home directory
- /<TAB> (not at start of line) - root directory

Command completion (/<TAB>) only triggers at the start of the line.
If no command matches, falls through to path completion (e.g., /etc).

Quote-aware word extraction handles paths with spaces:
- "~/My Files/<TAB>" works correctly

Added tests for:
- Path prefix detection
- Word extraction with quotes
- Command vs path disambiguation
2026-01-20 11:19:13 +05:30
Dhanji R. Prasanna
dd3db0227d Add tab completion for commands and file paths
Implement tab completion in interactive mode using rustyline:

- Command completion: /<TAB> shows all commands, /com<TAB> -> /compact
- File path completion: /run <TAB> completes file/directory paths
- Supports tilde expansion for home directory

Architecture is extensible for future semantic completions:
- /resume <TAB> -> session IDs (Phase 2)
- /rehydrate <TAB> -> fragment IDs (Phase 2)

New module: completion.rs with G3Helper struct implementing
rustyline's Completer trait.
2026-01-20 10:57:33 +05:30
Dhanji R. Prasanna
4db2150386 Change /run status message from 'running' to 'loading'
2026-01-20 10:34:06 +05:30
Dhanji R. Prasanna
6873f980a1 Use G3Status for /run command output
Change from custom emoji format to consistent g3: status message:
'g3: running <path> ... [done]'
2026-01-20 10:27:26 +05:30
Dhanji R. Prasanna
f24ea333f1 Add /run command to execute prompts from files
New interactive command: /run <file-path>
- Reads the specified file and executes its content as a prompt
- Supports tilde expansion for home directory paths
- Behaves exactly like pasting the file content into the g3> prompt
- Shows helpful error messages for missing files or empty content
2026-01-20 10:23:24 +05:30
Dhanji R. Prasanna
10bce7f66f Remove ANSI formatting codes from g3-core
Move terminal formatting responsibility to g3-cli layer:

- format_str_replace_summary(): Remove ANSI codes, add colorize_str_replace_summary()
  helper in CLI to apply green/red colors for insertions/deletions
- format_timing_footer(): Remove dimming ANSI codes (now plain text)
- str_replace tool result: Remove ANSI codes from success message

Remaining acceptable ANSI usage in g3-core:
- iTerm2 inline image protocol (terminal-specific escape sequence)
- Image metadata dimming (direct print, would need larger refactor)
- Terminal beep for stale TODO warning (audio, not visual)
- ANSI stripping utility in research.rs (not output)

This continues the separation of concerns: g3-core handles logic,
g3-cli handles all terminal formatting.
2026-01-20 10:00:37 +05:30
Dhanji R. Prasanna
182f5f98fe Centralize g3 status message formatting
Extract a new g3_status module in g3-cli that provides consistent formatting
for all 'g3:' prefixed system status messages.

Key changes:
- Add G3Status struct with methods for progress, done, failed, error, etc.
- Add Status enum with Done, Failed, Error, Resolved, Insufficient, NoChanges
- Add ThinResult struct in g3-core for semantic thinning data
- Update UiWriter trait with print_thin_result() method
- Refactor context thinning to return ThinResult instead of formatted strings
- Update all callers to use the new centralized formatting
- Session resume/decline messages now use G3Status
- Compaction status messages now use G3Status

This maintains clean separation of concerns: g3-core emits semantic data,
g3-cli handles all terminal formatting and colors.
2026-01-20 09:50:55 +05:30
Dhanji R. Prasanna
7bd72a4a51 Add tests for tool-specific timeout durations
Adds 8 unit tests verifying:
- Research tool has 20-minute timeout
- All other tools (shell, read_file, write_file, str_replace, code_search,
  webdriver_*, etc.) have standard 8-minute timeout
- Comprehensive test_only_research_has_extended_timeout covers 19 tools

This ensures future changes don't accidentally affect other tool timeouts.
2026-01-19 21:58:16 +05:30
Dhanji R. Prasanna
4b7be3f9ee Increase research tool timeout to 20 minutes
The research tool often runs past 8 minutes due to web browsing and
analysis. Increased its timeout to 20 minutes while keeping other
tools at 8 minutes.

Changes:
- Tool timeout is now tool-specific (20 min for research, 8 min for others)
- Timeout error message now shows the correct duration for each tool
2026-01-19 21:51:08 +05:30
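The tool-specific timeout from 4b7be3f9ee reduces to a lookup keyed on the tool name. A minimal sketch, assuming the research tool is registered under the name "research" and using hypothetical function names:

```rust
use std::time::Duration;

/// Return the execution timeout for a tool.
///
/// Only the research tool gets the extended 20-minute limit; every other
/// tool keeps the standard 8 minutes.
pub fn tool_timeout(tool_name: &str) -> Duration {
    match tool_name {
        "research" => Duration::from_secs(20 * 60),
        _ => Duration::from_secs(8 * 60),
    }
}

/// Build a timeout error message that reports the tool's own limit.
pub fn timeout_message(tool_name: &str) -> String {
    let minutes = tool_timeout(tool_name).as_secs() / 60;
    format!("{} timed out after {} minutes", tool_name, minutes)
}
```

Deriving the error message from the same lookup keeps the reported duration in step with the limit actually applied.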
Dhanji R. Prasanna
f4cce22db3 Add test documenting LLM duplicate text behavior
Adds test_llm_repeats_text_before_each_tool_call() which documents the
scenario where the LLM re-outputs the same preamble text before each
tool call in a multi-tool response.

Analysis showed this is LLM behavior, not a g3 bug:
- Each assistant message is correctly stored with different tool calls
- The duplicate display is the LLM choosing to repeat context
- Storage is correct, display accurately reflects LLM output

Decision: Accept as LLM behavior (Option B). Future LLM improvements
may resolve this naturally without g3 code changes.
2026-01-19 18:44:01 +05:30
Dhanji R. Prasanna
6ff21a7d47 Fix JSON filter to preserve code fence and indented content
Two cosmetic bugs fixed:
1. JSON inside code fences was being filtered - now tracks fence state
   and passes through all content inside ``` ... ``` blocks
2. Indented JSON was being filtered - now recognizes that real tool
   calls are never indented, so indented JSON is always documentation

Changes:
- Added in_code_fence and fence_buffer fields to FilterState
- Added track_code_fence() to detect ``` markers (with/without language)
- Added pass_through_char() for content inside code fences
- Modified '{' handling to only filter when no leading whitespace
- Added 4 new unit tests for code fence and indentation cases
- Updated 3 stress tests to expect new (correct) behavior

All 16 filter_json unit tests and 59 stress tests pass.
2026-01-19 17:00:43 +05:30
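The two rules from 6ff21a7d47 (skip everything inside code fences, and treat indented JSON as documentation) can be illustrated with a simplified line-based scan. The real filter works character by character on a stream with FilterState; the function below is a hypothetical stand-in that only shows the decision logic.

```rust
/// Return the lines that could be real tool calls.
///
/// A line qualifies only if it is outside any code fence and starts with
/// '{' in column 0 (no leading whitespace).
pub fn tool_call_candidates(text: &str) -> Vec<&str> {
    // Build the three-backtick fence marker programmatically.
    let fence = "`".repeat(3);
    let mut in_code_fence = false;
    let mut candidates = Vec::new();

    for line in text.lines() {
        // A fence marker (with or without a language tag) toggles the state.
        if line.trim_start().starts_with(fence.as_str()) {
            in_code_fence = !in_code_fence;
            continue;
        }
        // Indented JSON and fenced JSON are passed through, never filtered.
        if !in_code_fence && line.starts_with('{') {
            candidates.push(line);
        }
    }
    candidates
}
```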
Dhanji R. Prasanna
1604ed613a Add integration tests proving tool results are never parsed as tool calls
Adds 3 new tests to json_parsing_stress_test.rs:
- test_tool_result_with_json_not_parsed: Full agent integration test proving
  that JSON in tool results (sent TO the LLM) is never parsed by the
  streaming parser (which only sees LLM output)
- test_parser_only_processes_completion_chunks: Documents that StreamingToolParser
  only accepts CompletionChunk, not Message objects
- test_architectural_separation_documented: Documents the data flow showing
  tool results flow TO the LLM while the parser only sees FROM the LLM

This proves the architectural guarantee: there is no code path where
tool result content could be parsed as a tool call, because:
1. Tool results are Message objects added to context_window
2. The streaming parser only processes CompletionChunk from provider.stream_completion()
3. These are completely separate data types flowing in opposite directions

Total: 41 JSON parsing stress tests now pass.
2026-01-19 16:21:36 +05:30