- Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens
(8192) instead of provider-specific defaults
- Update fallback_default_max_tokens from 8192 to 32000
- Set provider-specific max_tokens defaults:
- Anthropic: 32000
- OpenAI: 32000 (was 16000)
- Databricks: 32000 (was 50000, now matches Anthropic as passthru)
- Embedded: 2048
- Context window lengths unchanged:
- OpenAI: 400,000
- Anthropic: 200,000
- Databricks (Claude): 200,000
This fixes the 'LLM response was cut off due to max_tokens limit' error
in agent mode that occurred because 8192 was being used instead of 32000.
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.
Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection
Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
Writes the current context window to logs/current_context_window (uses a symlink to a session ID).
This PR was unfortunately generated by a different LLM and did a ton of superficial reformating, it's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's ok.