Add characterization tests for stream_completion_with_tools

Add 32 black-box characterization tests to lock down the behavior of the
stream_completion_with_tools function (1067 lines) before refactoring.

Tests cover key behaviors through stable boundaries:
- StreamingToolParser: tool call detection, incomplete detection, text accumulation
- Auto-continue logic: autonomous mode decisions, priority ordering
- Duplicate detection: sequential duplicates, cross-message duplicates
- Context window: token tracking, compaction threshold, history preservation
- Tool execution: read_file, shell, write_file, todo tools through Agent
- Streaming utilities: LLM token cleaning, duration formatting, truncation
- Parser sanitization: inline tool pattern handling, homoglyph replacement

These tests intentionally do NOT assert:
- Internal parser state or implementation details
- Specific timing values
- UI output formatting
- Provider-specific behavior

Agent: hopper