fix: strip duplicate tool call JSON from assistant messages when LLM stutters
When the LLM emits identical JSON tool calls as text content (JSON fallback mode), the raw duplicate JSON was being stored in the assistant message in conversation history. This confused the model on subsequent turns, causing it to stall or repeat itself.

Root cause: raw_content_for_log used get_text_content(), which returns the full parser buffer including every duplicate tool call JSON.

Fix: added get_text_before_tool_calls() to StreamingToolParser, which returns only the text before the first JSON tool call, and changed raw_content_for_log to use it so the assistant message contains only the preamble text plus the single executed tool call.

Added 5 integration tests covering stuttered duplicates, triple stutter, cross-turn dedup, and a different-args boundary case, plus MockResponse helpers for simulating LLM stutter patterns.
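The new accessor is, in essence, a prefix view over the parser's accumulated text. A minimal sketch of the shape get_text_before_tool_calls() could take on StreamingToolParser, assuming the parser keeps its accumulated text in a buffer field and records the offset where the first tool call JSON began (both field names are hypothetical, not taken from the diff):

impl StreamingToolParser {
    /// Text accumulated before the first JSON tool call.
    /// Stuttered duplicate tool-call JSON after that point is excluded,
    /// so only the preamble text reaches the stored assistant message.
    pub fn get_text_before_tool_calls(&self) -> &str {
        match self.first_tool_call_offset {
            Some(offset) => &self.buffer[..offset], // preamble only
            None => self.buffer.as_str(),           // no tool call parsed: full text is safe
        }
    }
}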
@@ -2368,7 +2368,10 @@ Skip if nothing new. Be brief."#;
                 let already_displayed_chars = iter.current_response.chars().count();
                 let text_content = iter.parser.get_text_content();
                 let clean_content = streaming::clean_llm_tokens(&text_content);
-                let raw_content_for_log = clean_content.clone();
+                // Use only the text before tool calls for the log message.
+                // This prevents duplicate tool call JSON from being stored
+                // in the assistant message when the LLM stutters.
+                let raw_content_for_log = streaming::clean_llm_tokens(iter.parser.get_text_before_tool_calls());
                 let filtered_content =
                     self.ui_writer.filter_json_tool_calls(&clean_content);
                 let final_display_content = filtered_content.trim();
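For reference, one of the stuttered-duplicate cases mentioned in the commit message could be exercised roughly along these lines; StreamingToolParser::new() and feed() are assumed names for the parser's constructor and streaming-input method, not confirmed by the diff:

#[test]
fn stuttered_tool_call_json_is_excluded_from_log_text() {
    let mut parser = StreamingToolParser::new();
    // Preamble text, then the same tool call emitted twice (a stutter).
    parser.feed("Let me read that file.\n");
    parser.feed(r#"{"tool": "read_file", "args": {"path": "src/app.rs"}}"#);
    parser.feed(r#"{"tool": "read_file", "args": {"path": "src/app.rs"}}"#);

    // The full buffer still holds both duplicate tool-call JSONs...
    assert!(parser.get_text_content().matches("read_file").count() >= 2);
    // ...but the text used for the assistant message stops at the first one.
    assert_eq!(parser.get_text_before_tool_calls(), "Let me read that file.\n");
}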