Anthropic max_tokens Error Fix - Test Plan
Changes Made
1. Fixed Context Window Size Detection
- Problem: The code used a hardcoded 200k-token limit for Anthropic instead of the configured `max_tokens`
- Fix: Modified `determine_context_length()` to check the configured `max_tokens` first, falling back to the defaults only when it is unset
- Files: `crates/g3-core/src/lib.rs`, lines 923-945 and 967-985
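Below is a minimal sketch of the lookup order described above. It assumes a simplified `ProviderConfig` struct, which is hypothetical and not g3's actual type. Only the 200k Anthropic default comes from this plan. The fallback for other providers is a placeholder, and the real `determine_context_length()` in `crates/g3-core/src/lib.rs` may take different arguments.

```rust
/// Hypothetical, simplified provider settings (not g3's real config type).
struct ProviderConfig {
    provider: String,
    /// `max_tokens` from g3.toml, if the user set one.
    max_tokens: Option<u32>,
}

/// Built-in fallback used only when no limit is configured.
fn default_context_length(provider: &str) -> u32 {
    match provider {
        "anthropic" => 200_000,
        // Placeholder value for this sketch, not g3's real default.
        _ => 8_192,
    }
}

/// Prefer the configured `max_tokens`; use the provider default only when it is unset.
fn determine_context_length(config: &ProviderConfig) -> u32 {
    config
        .max_tokens
        .unwrap_or_else(|| default_context_length(&config.provider))
}
```

With the Test 1 configuration (`max_tokens = 50000`), this returns 50,000 rather than 200,000.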
2. Added Thinning Before Summarization
- Problem: The code attempted summarization even when the context window was nearly full
- Fix: Added logic to try thinning first when context usage is between 80% and 90%
- Files: `crates/g3-core/src/lib.rs`, lines 2415-2439
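The thinning-first flow can be sketched as follows. The `thin` closure is a hypothetical stand-in for g3's thinning step. Two rules are assumptions for illustration. First, summarization is skipped once thinning brings usage back under 80%. Second, usage at or above 90% goes straight to summarization. The plan itself only specifies the 80-90% window.

```rust
/// What a single context-management pass ended up doing.
#[derive(Debug, PartialEq)]
enum Outcome {
    Untouched,
    /// Thinning alone brought usage back under 80%, so summarization was skipped.
    Thinned,
    ThinnedThenSummarized,
    Summarized,
}

fn usage_pct(used: u32, limit: u32) -> f64 {
    used as f64 / limit as f64 * 100.0
}

/// `thin` is a hypothetical stand-in for the thinning step: it takes the current
/// token count and returns the count after thinning.
fn manage_context<F: Fn(u32) -> u32>(used: u32, limit: u32, thin: F) -> Outcome {
    let pct = usage_pct(used, limit);
    if pct < 80.0 {
        return Outcome::Untouched;
    }
    if pct < 90.0 {
        // 80-90%: try thinning before summarizing.
        let after = thin(used);
        if usage_pct(after, limit) < 80.0 {
            return Outcome::Thinned;
        }
        // Assumption: still at or above 80% after thinning, so summarize as well.
        return Outcome::ThinnedThenSummarized;
    }
    // Assumption: at 90% or above, go straight to summarization.
    Outcome::Summarized
}
```

For example, with a 50,000-token limit and 42,500 tokens used (85%), a thinning pass that drops usage to 30,000 returns `Outcome::Thinned`. This is the "summarization is skipped" case from Test 2.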
3. Added Capacity Checks Before Summarization
- Problem: Nothing checked that enough tokens remained for the summarization request
- Fix: Added capacity checks for all provider types, with helpful error messages
- Files: `crates/g3-core/src/lib.rs`, lines 2480-2520
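A provider-agnostic capacity check might look like the sketch below. It depends only on the context length, so it works the same way for every provider. The parameters `summary_tokens` (room reserved for the summary output) and `buffer` are hypothetical. The plan does not say how g3 sizes them.

```rust
/// Returns true when enough of the context window is left for a summarization request.
/// `summary_tokens` and `buffer` are hypothetical knobs for this sketch.
fn has_summary_capacity(used: u32, context_length: u32, summary_tokens: u32, buffer: u32) -> bool {
    // saturating_sub: if usage already exceeds the limit, remaining is 0 instead of wrapping.
    let remaining = context_length.saturating_sub(used);
    remaining >= summary_tokens.saturating_add(buffer)
}
```

When this returns false, the caller would report an error instead of sending a request that the API is bound to reject. At 98% of a 50,000-token window, only 1,000 tokens remain. That is too little for, say, a 2,000-token summary plus a 1,250-token buffer.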
4. Improved Error Messages
- Problem: Summarization failures produced generic errors
- Fix: Specific error messages that suggest the `/thinnify` and `/compact` commands
- Files: Multiple locations in the summarization logic
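One way to carry the actionable suggestions is a dedicated error type whose `Display` text names the commands. Both `SummarizationError` and its message wording are hypothetical. Only the `/thinnify` and `/compact` command names come from this plan.

```rust
use std::fmt;

/// Hypothetical error returned when summarization cannot proceed.
#[derive(Debug)]
enum SummarizationError {
    InsufficientCapacity { used: u32, limit: u32 },
}

impl fmt::Display for SummarizationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SummarizationError::InsufficientCapacity { used, limit } => write!(
                f,
                "Context window nearly full ({used}/{limit} tokens); not enough room to summarize. \
                 Try running /thinnify or /compact."
            ),
        }
    }
}

impl std::error::Error for SummarizationError {}
```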
5. Dynamic Buffer Calculation
- Problem: Fixed 5k buffer regardless of model size
- Fix: Proportional buffer: 2.5% of the model's context limit, clamped between 1k and 10k tokens
- Files: `crates/g3-core/src/lib.rs`, line 2487
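The buffer rule is easy to check by hand. Since 2.5% is 1/40 of the limit:

- A 200k-token model gets 5,000 tokens, the same as the old fixed buffer.
- A 50k limit gets 1,250 tokens.
- Very small limits hit the 1k floor, and very large ones hit the 10k ceiling.

Here is a sketch that assumes integer token counts. The function name is hypothetical.

```rust
/// Safety buffer proportional to the context limit:
/// 2.5% of the limit, clamped to the range [1_000, 10_000] tokens.
fn dynamic_buffer(context_length: u32) -> u32 {
    // 2.5% == 1/40; integer division rounds down.
    (context_length / 40).clamp(1_000, 10_000)
}
```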
Test Cases
Test 1: Configured max_tokens Respected
```toml
# In g3.toml
[providers.anthropic]
api_key = "your-key"
model = "claude-3-5-sonnet-20241022"
max_tokens = 50000 # Should use this instead of 200k default
```
Test 2: Thinning Before Summarization
- Fill context to 85% capacity
- Verify thinning is attempted before summarization
- Check that summarization is skipped if thinning resolves the issue
Test 3: Capacity Error Handling
- Fill context to 98% capacity
- Verify that a helpful error message is shown instead of the raw API error
- Check that the `/thinnify` and `/compact` commands are suggested
Test 4: Provider-Specific Handling
- Test with different providers (anthropic, databricks, embedded)
- Verify each uses appropriate capacity checks and buffers
Expected Behavior
- No `max_tokens` API errors from Anthropic when the context window is full
- Automatic thinning when approaching capacity (80-90%)
- Clear error messages with actionable suggestions when at capacity
- Respect configured limits instead of hardcoded defaults
- Graceful degradation with helpful user guidance
Manual Testing Commands
```bash
# Set a small max_tokens in g3.toml (see Test 1) to trigger the issue quickly
g3 --chat
# Then paste large amounts of text to fill the context window
# Verify that thinning and error handling work correctly
```