fix: strip duplicate tool call JSON from assistant messages when LLM stutters

When the LLM emitted identical JSON tool calls as text content (JSON
fallback mode), the raw duplicate JSON was stored in the assistant
message in conversation history. This confused the model on subsequent
turns, causing it to stall or repeat itself.

Root cause: raw_content_for_log used get_text_content(), which returns
the full parser buffer, including all duplicate tool call JSON.

Fix: Added get_text_before_tool_calls() to StreamingToolParser that
returns only the text before the first JSON tool call. Changed
raw_content_for_log to use this method so the assistant message only
contains the preamble text + the single executed tool call.

Added 5 integration tests covering stuttered duplicates, triple
stutter, cross-turn dedup, and different-args boundary case.

Added MockResponse helpers for simulating LLM stutter patterns.
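
The fix described above can be illustrated with a minimal standalone sketch. It is not the code from this commit: `first_tool_call_start` and `code_fence_ranges` are hypothetical simplifications of the parser's helpers, and the `{"tool"` marker is an assumed tool call shape chosen only for the example.

```rust
/// Hypothetical stand-in for the parser's tool call detection: returns the
/// byte offset of the first `{"tool"` marker, if any. The real detection
/// logic is not shown in this commit and is likely more thorough.
fn first_tool_call_start(text: &str) -> Option<usize> {
    text.find("{\"tool\"")
}

/// Simplified assumption: collect the byte ranges covered by ``` fences.
/// An unclosed fence is treated as running to the end of the text.
fn code_fence_ranges(text: &str) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut search_from = 0;
    while let Some(open_rel) = text[search_from..].find("```") {
        let open = search_from + open_rel;
        let after_open = open + 3;
        match text[after_open..].find("```") {
            Some(close_rel) => {
                let close = after_open + close_rel + 3;
                ranges.push((open, close));
                search_from = close;
            }
            None => {
                ranges.push((open, text.len()));
                break;
            }
        }
    }
    ranges
}

/// Return only the preamble before the first tool call that sits outside a
/// code fence, or the whole text if no such tool call is found.
fn text_before_tool_calls(text: &str) -> &str {
    let fences = code_fence_ranges(text);
    if let Some(pos) = first_tool_call_start(text) {
        let in_fence = fences
            .iter()
            .any(|&(start, end)| pos >= start && pos < end);
        if !in_fence {
            return text[..pos].trim_end();
        }
    }
    text
}
```

Under these assumptions, passing the text `Let me run that.` followed by three copies of the same tool call JSON yields just `Let me run that.`, so however many times the model stutters, none of the duplicates reach the stored assistant message.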
Dhanji R. Prasanna
2026-02-10 19:53:11 +11:00
parent f9625f1a2d
commit 2a4cd1f4d6
4 changed files with 302 additions and 2 deletions

@@ -2368,7 +2368,10 @@ Skip if nothing new. Be brief."#;
 let already_displayed_chars = iter.current_response.chars().count();
 let text_content = iter.parser.get_text_content();
 let clean_content = streaming::clean_llm_tokens(&text_content);
-let raw_content_for_log = clean_content.clone();
+// Use only the text before tool calls for the log message.
+// This prevents duplicate tool call JSON from being stored
+// in the assistant message when the LLM stutters.
+let raw_content_for_log = streaming::clean_llm_tokens(iter.parser.get_text_before_tool_calls());
 let filtered_content =
     self.ui_writer.filter_json_tool_calls(&clean_content);
 let final_display_content = filtered_content.trim();

@@ -481,6 +481,20 @@ impl StreamingToolParser {
         &self.text_buffer
     }
 
+    /// Get only the text content before the first JSON tool call.
+    /// Returns the full buffer if no tool calls are found.
+    /// This is used to save the "preamble" text (e.g. "Let me run that.")
+    /// without including raw duplicate tool call JSON in the assistant message.
+    pub fn get_text_before_tool_calls(&self) -> &str {
+        let fence_ranges = find_code_fence_ranges(&self.text_buffer);
+        if let Some(pos) = find_first_tool_call_start(&self.text_buffer) {
+            if !is_position_in_fence_ranges(pos, &fence_ranges) {
+                return self.text_buffer[..pos].trim_end();
+            }
+        }
+        &self.text_buffer
+    }
+
     pub fn get_content_before_position(&self, pos: usize) -> String {
         if pos <= self.text_buffer.len() {
             self.text_buffer[..pos].to_string()