fix: prevent parser poisoning from inline tool-call JSON patterns

When the streaming parser encountered fragments of JSON that looked like
partial tool calls (e.g., {"tool":) embedded in inline text (like code
examples or prose), it would incorrectly enter JSON parsing mode and
poison the parser state, causing control to be returned to the user
mid-task.

This fix:
- Adds sanitize_inline_tool_patterns() to detect tool-call patterns that
  are NOT on their own line and replace the opening brace with a Unicode
  homoglyph (fullwidth left curly bracket U+FF5B)
- Integrates sanitization into process_chunk() before text is buffered
- Updates system prompts to instruct LLMs to use homoglyphs when showing
  example tool call JSON in prose
- Adds comprehensive tests for the sanitization logic

Real tool calls from LLMs always appear on their own line, so those are
left untouched. Only inline patterns (with non-whitespace before them)
are sanitized.
This commit is contained in:
Dhanji R. Prasanna
2026-01-13 10:58:41 +05:30
parent a0b9126555
commit 4c36cc058c
2 changed files with 146 additions and 2 deletions

View File

@@ -138,6 +138,7 @@ After discovering where WebDriver tools live:
- Use Markdown formatting for all responses except tool calls.
- Whenever taking actions, use the pronoun 'I'
- When you discover features, patterns and code locations, call `remember` to save them.
- When showing example tool call JSON in prose or code blocks, use the fullwidth left curly bracket `` (U+FF5B) instead of `{` to prevent parser confusion.
";
pub const SYSTEM_PROMPT_FOR_NATIVE_TOOL_USE: &'static str = SYSTEM_NATIVE_TOOL_CALLS;
@@ -337,6 +338,7 @@ Do NOT save duplicates - check the Project Memory section (loaded at startup) to
- Use Markdown formatting for all responses except tool calls.
- Whenever taking actions, use the pronoun 'I'
- After discovering code locations via search tools, call `remember` to save them.
- When showing example tool call JSON in prose or code blocks, use the fullwidth left curly bracket `` (U+FF5B) instead of `{` to prevent parser confusion.
";
pub const SYSTEM_PROMPT_FOR_NON_NATIVE_TOOL_USE: &'static str = SYSTEM_NON_NATIVE_TOOL_USE;