Commit Graph

4 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
2a4cd1f4d6 fix: strip duplicate tool call JSON from assistant messages when LLM stutters
When the LLM emits identical JSON tool calls as text content (JSON
fallback mode), the raw duplicate JSON was being stored in the assistant
message in conversation history. This confused the model on subsequent
turns, causing it to stall or repeat itself.

Root cause: raw_content_for_log used get_text_content() which returns
the full parser buffer including all duplicate tool call JSONs.

Fix: Added get_text_before_tool_calls() to StreamingToolParser that
returns only the text before the first JSON tool call. Changed
raw_content_for_log to use this method so the assistant message only
contains the preamble text + the single executed tool call.

Added 5 integration tests covering stuttered duplicates, triple
stutter, cross-turn dedup, and different-args boundary case.

Added MockResponse helpers for simulating LLM stutter patterns.
2026-02-10 19:53:11 +11:00
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
2026-01-27 11:32:45 +11:00
Dhanji R. Prasanna
2043a83e7d Add comprehensive MockProvider integration tests
Added 6 new integration tests for stream_completion_with_tools:
- test_text_before_tool_call_preserved: text before native tool call is saved
- test_native_tool_call_execution: native tool calls execute correctly
- test_duplicate_tool_calls_skipped: sequential duplicates are detected
- test_json_fallback_tool_calling: JSON tool calls work without native support
- test_text_after_tool_execution_preserved: follow-up text is saved
- test_multiple_tool_calls_executed: multiple tool calls in sequence work

Also added MockResponse helper methods:
- text_then_native_tool(): text followed by native tool call
- duplicate_native_tool_calls(): same tool call twice (for dedup testing)

Fixed text_with_json_tool() to ensure "tool" key comes before "args"
(serde_json alphabetizes keys, breaking pattern detection).

Total: 18 integration tests covering historical bugs and core behaviors.
2026-01-19 14:44:30 +05:30
Dhanji R. Prasanna
292a3aa48d Add MockProvider for integration testing
Adds a configurable mock LLM provider that can simulate various behaviors:
- Text-only responses (single or multi-chunk streaming)
- Native tool calls
- JSON tool calls in text
- Truncated responses (max_tokens)
- Multi-turn conversations

Features:
- Builder pattern for easy test setup
- Request tracking for verification
- Preset scenarios for common patterns
- Full LLMProvider trait implementation

Also adds integration tests that use MockProvider to test the
stream_completion_with_tools code path, including:
- test_butler_bug_scenario: reproduces the exact bug where text-only
  responses were not saved to context, causing consecutive user messages

This enables testing complex streaming behaviors without real API calls.
2026-01-19 13:59:31 +05:30