g3/test_anthropic_fix.md
Jochen af20c93c61 respect context length for anthropic
use the context length as per the config, rather than just hard-coded values.
2025-11-06 15:07:46 +11:00


Anthropic max_tokens Error Fix - Test Plan

Changes Made

1. Fixed Context Window Size Detection

  • Problem: Code used hardcoded 200k limit for Anthropic instead of configured max_tokens
  • Fix: Modified determine_context_length() to check the configured max_tokens before falling back to defaults
  • Files: crates/g3-core/src/lib.rs lines 923-945, 967-985
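
The lookup order described above can be sketched as follows. This is a minimal illustration, not the code in lib.rs: the `ProviderConfig` struct, the `ANTHROPIC_DEFAULT_CONTEXT` constant, and the function signature are assumptions made for the example. Only the function name and the 200k default come from this document.

```rust
/// Hypothetical provider settings; only `max_tokens` matters for this sketch.
#[derive(Debug, Clone, Default)]
pub struct ProviderConfig {
    pub max_tokens: Option<u32>,
}

/// Assumed name for the 200k fallback mentioned in the test plan.
pub const ANTHROPIC_DEFAULT_CONTEXT: u32 = 200_000;

/// Prefer the configured `max_tokens`; fall back to the provider default
/// only when nothing usable is configured.
pub fn determine_context_length(config: &ProviderConfig, provider_default: u32) -> u32 {
    match config.max_tokens {
        // A configured, non-zero value always wins over the default.
        Some(configured) if configured > 0 => configured,
        // Missing or zero: use the provider's default limit.
        _ => provider_default,
    }
}
```

With `max_tokens = 50000` configured, this sketch returns 50,000 rather than 200,000, which is the behavior Test 1 checks.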

2. Added Thinning Before Summarization

  • Problem: Code attempted summarization even when context window was nearly full
  • Fix: Added logic to try thinning first when context usage is between 80% and 90%
  • Files: crates/g3-core/src/lib.rs lines 2415-2439
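
One way to picture the thinning-first decision is a small function that maps context usage to an action. The sketch below is hypothetical: the `ContextAction` enum, the function name, and the treatment of the exact 80% and 90% boundaries are assumptions. The test plan only states that thinning is tried first in the 80-90% band.

```rust
/// Hypothetical set of actions the agent can take for a given usage level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContextAction {
    /// Usage is comfortable; nothing to do.
    Proceed,
    /// Usage is in the 80-90% band; try thinning before anything else.
    Thin,
    /// Usage is at or above 90%; go to summarization (after capacity checks).
    Summarize,
}

/// Pick an action from token usage, using integer math to avoid float rounding.
pub fn choose_context_action(used_tokens: u64, context_length: u64) -> ContextAction {
    let scaled = used_tokens * 100;
    if scaled >= 90 * context_length {
        ContextAction::Summarize
    } else if scaled >= 80 * context_length {
        ContextAction::Thin
    } else {
        ContextAction::Proceed
    }
}
```

If thinning brings usage back under 80%, a second call returns `Proceed` and summarization is skipped, which matches the expectation in Test 2.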

3. Added Capacity Checks Before Summarization

  • Problem: No validation that sufficient tokens remained for summarization
  • Fix: Added capacity checks for all provider types with helpful error messages
  • Files: crates/g3-core/src/lib.rs lines 2480-2520
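
A capacity check of this kind might look like the sketch below. The function name, its parameters, and the idea of a minimum summary budget are assumptions for illustration; the actual checks in lib.rs may differ per provider.

```rust
/// Returns how many tokens remain for a summarization request, or `None`
/// when the remaining space (after reserving `buffer`) is below
/// `min_summary_tokens`. All parameter names are hypothetical.
pub fn summary_capacity(
    context_length: u32,
    used_tokens: u32,
    buffer: u32,
    min_summary_tokens: u32,
) -> Option<u32> {
    // Saturating subtraction keeps the result at 0 instead of underflowing
    // when usage plus buffer already exceeds the context length.
    let available = context_length
        .saturating_sub(used_tokens)
        .saturating_sub(buffer);

    if available >= min_summary_tokens {
        Some(available)
    } else {
        None
    }
}
```

Returning `None` gives the caller a chance to show a helpful message locally instead of sending a request that the provider would reject.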

4. Improved Error Messages

  • Problem: Generic errors when summarization failed
  • Fix: Specific error messages suggesting /thinnify and /compact commands
  • Files: Multiple locations in summarization logic
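
As an illustration of a specific, actionable error, the sketch below defines a hypothetical `ContextFullError` type. The type name, its fields, and the message wording are assumptions; only the /thinnify and /compact suggestions come from this test plan.

```rust
use std::fmt;

/// Hypothetical error raised when there is no room left to summarize.
#[derive(Debug)]
pub struct ContextFullError {
    pub used_tokens: u32,
    pub context_length: u32,
}

impl fmt::Display for ContextFullError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // State what went wrong, then point at the commands the user can run.
        write!(
            f,
            "Not enough context left to summarize ({} of {} tokens used). \
             Try /thinnify or /compact to free up space.",
            self.used_tokens, self.context_length
        )
    }
}

impl std::error::Error for ContextFullError {}
```

Implementing `std::error::Error` lets the type flow through `Box<dyn Error>` or similar error-handling paths without extra conversion code.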

5. Dynamic Buffer Calculation

  • Problem: Fixed 5k buffer regardless of model size
  • Fix: Proportional buffer (2.5% of model limit, min 1k, max 10k)
  • Files: crates/g3-core/src/lib.rs line 2487
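
The proportional buffer can be expressed in a few lines. The function name below is an assumption, and "1k" and "10k" are taken to mean 1,000 and 10,000 tokens. The percentages and bounds are the ones listed above.

```rust
/// Buffer reserved before summarization: 2.5% of the model limit,
/// clamped to the range 1,000..=10,000 tokens.
pub fn summary_buffer(model_limit: u32) -> u32 {
    // 2.5% is exactly 1/40, so integer division avoids float rounding.
    (model_limit / 40).clamp(1_000, 10_000)
}
```

For a 200,000-token model this yields 5,000, the same as the old fixed buffer. A 50,000-token limit yields 1,250, while very small or very large limits are held at the 1,000 and 10,000 bounds.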

Test Cases

Test 1: Configured max_tokens Respected

# In g3.toml
[providers.anthropic]
api_key = "your-key"
model = "claude-3-5-sonnet-20241022"
max_tokens = 50000  # Should use this instead of 200k default

Test 2: Thinning Before Summarization

  • Fill context to 85% capacity
  • Verify thinning is attempted before summarization
  • Check that summarization is skipped if thinning resolves the issue

Test 3: Capacity Error Handling

  • Fill context to 98% capacity
  • Verify helpful error message is shown instead of API error
  • Check that /thinnify and /compact commands are suggested

Test 4: Provider-Specific Handling

  • Test with different providers (anthropic, databricks, embedded)
  • Verify each uses appropriate capacity checks and buffers

Expected Behavior

  1. No more max_tokens API errors from Anthropic when context window is full
  2. Automatic thinning when approaching capacity (80-90%)
  3. Clear error messages with actionable suggestions when at capacity
  4. Respect configured limits instead of hardcoded defaults
  5. Graceful degradation with helpful user guidance

Manual Testing Commands

# Test with small max_tokens to trigger the issue quickly
g3 --chat
# Then paste large amounts of text to fill context window
# Verify thinning and error handling work correctly