- Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens (8192) instead of provider-specific defaults
- Update fallback_default_max_tokens from 8192 to 32000
- Set provider-specific max_tokens defaults:
- Anthropic: 32000
- OpenAI: 32000 (was 16000)
- Databricks: 32000 (was 50000; now matches Anthropic as a pass-through)
- Embedded: 2048
- Context window lengths unchanged:
- OpenAI: 400,000
- Anthropic: 200,000
- Databricks (Claude): 200,000
This fixes the 'LLM response was cut off due to max_tokens limit' error
in agent mode that occurred because 8192 was being used instead of 32000.
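A minimal sketch of the intended resolution order; the Provider enum and the signature of resolve_max_tokens() here are assumptions for illustration, not the actual code:

    enum Provider {
        Anthropic,
        OpenAi,
        Databricks,
        Embedded,
    }

    const FALLBACK_DEFAULT_MAX_TOKENS: u32 = 32000;

    fn resolve_max_tokens(provider: Option<Provider>, configured: Option<u32>) -> u32 {
        // Explicit configuration always wins; otherwise use the
        // provider-specific default rather than the global fallback.
        configured.unwrap_or_else(|| match provider {
            Some(Provider::Anthropic | Provider::OpenAi | Provider::Databricks) => 32000,
            Some(Provider::Embedded) => 2048,
            // Only an unknown provider falls back to the raised default.
            None => FALLBACK_DEFAULT_MAX_TOKENS,
        })
    }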
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()
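Roughly how the pieces fit together; the trait stub and handle_hint() below are hypothetical stand-ins for the real UiWriter and dispatch code:

    enum ToolParsingHint {
        Detected, // tool name seen in the stream
        Active,   // arguments still arriving
        Complete, // full tool call parsed
    }

    trait UiWriter {
        fn print_tool_streaming_hint(&mut self, tool_name: &str);
        fn print_tool_streaming_active(&mut self);
    }

    fn handle_hint(ui: &mut dyn UiWriter, hint: ToolParsingHint, tool_name: &str) {
        match hint {
            // Show " ● tool_name |" as soon as the name is detected.
            ToolParsingHint::Detected => ui.print_tool_streaming_hint(tool_name),
            // Keep the indicator blinking while arguments stream in.
            ToolParsingHint::Active => ui.print_tool_streaming_active(),
            // The normal tool-execution path takes over from here.
            ToolParsingHint::Complete => {}
        }
    }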
Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- An unescaped newline inside a string, or prose after the JSON, invalidates detection (see the sketch below)
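A rough sketch of the invalidation idea, covering only the unescaped-newline signal; the real checks in the parser may differ:

    fn is_json_invalidated(buf: &str) -> bool {
        let mut in_string = false;
        let mut escaped = false;
        for ch in buf.chars() {
            if escaped {
                escaped = false;
                continue;
            }
            match ch {
                '\\' if in_string => escaped = true,
                '"' => in_string = !in_string,
                // A raw, unescaped newline inside a JSON string is
                // illegal, so this cannot be a real tool call.
                '\n' if in_string => return true,
                _ => {}
            }
        }
        false
    }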
The user sees ' ● tool_name |' immediately when a tool call starts streaming,
with a blinking indicator while the arguments are received.
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.
Also added the stop_reason field to CompletionChunk initializers in tests,
completing the stop_reason feature addition.
- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
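The shape of the fix, sketched; truncate_to_boundary() is a hypothetical name for illustration:

    fn truncate_to_boundary(s: &str, max_bytes: usize) -> &str {
        if s.len() <= max_bytes {
            return s;
        }
        // Walk char start offsets and keep the last one that fits, so
        // the slice never lands inside a multi-byte character.
        let end = s
            .char_indices()
            .map(|(i, _)| i)
            .take_while(|&i| i <= max_bytes)
            .last()
            .unwrap_or(0);
        &s[..end]
    }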
Agent: carmack
openai.rs:
- Use make_text_chunk() for streaming text content
- Use make_final_chunk() for final completion chunk
- Simplify tool_calls conversion logic
embedded.rs:
- Use make_text_chunk() for all 4 streaming text chunks
- Use make_final_chunk() for final completion chunk
- Remove unused CompletionChunk import
Net reduction: 35 lines.
All tests pass. Behavior unchanged.
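What the shared helpers replace, roughly; the CompletionChunk shape and helper signatures below are assumptions based on this change, not the actual definitions (other fields omitted):

    struct CompletionChunk {
        text: Option<String>,
        stop_reason: Option<String>,
    }

    fn make_text_chunk(text: &str) -> CompletionChunk {
        // One streaming text delta, with no stop reason yet.
        CompletionChunk {
            text: Some(text.to_string()),
            stop_reason: None,
        }
    }

    fn make_final_chunk(stop_reason: &str) -> CompletionChunk {
        // The terminating chunk carries only the stop reason.
        CompletionChunk {
            text: None,
            stop_reason: Some(stop_reason.to_string()),
        }
    }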
Writes the current context window to logs/current_context_window (a symlink to the session-specific log file, keyed by session ID).
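A minimal sketch of the scheme, assuming Unix and a hypothetical per-session file name:

    use std::fs;
    use std::os::unix::fs::symlink;

    fn write_context_window(session_id: &str, contents: &str) -> std::io::Result<()> {
        // Hypothetical naming; the real session file path may differ.
        let session_path = format!("logs/context_window_{session_id}");
        fs::write(&session_path, contents)?;
        // Repoint logs/current_context_window at this session's file.
        let link = "logs/current_context_window";
        let _ = fs::remove_file(link); // fine if the link doesn't exist yet
        symlink(session_path, link)
    }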
This PR was unfortunately generated by a different LLM and includes a ton of superficial reformatting. The actual change is fairly small and benign, but I don't want to roll back everything. Hope that's ok.
This tries to short-circuit multiple round-trips to the LLM when reading code.
It's a precursor to context engineering tailored to specific tasks.
In initial experiments it's only marginally faster than regular mode and burns more tokens.