alex/g3


Jochen af20c93c61 respect context length for anthropic

use the context length as per the config, rather than just hard-coded values.

2025-11-06 15:07:46 +11:00

2.6 KiB


Anthropic max_tokens Error Fix - Test Plan

Changes Made

1. Fixed Context Window Size Detection

Problem: Code used a hardcoded 200k-token limit for Anthropic instead of the configured max_tokens
Fix: Modified determine_context_length() to check the configured max_tokens before falling back to defaults
Files: crates/g3-core/src/lib.rs lines 923-945, 967-985
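
As a rough illustration of the config-first lookup described above, the following minimal sketch may help. The ProviderConfig struct, its field layout, the constant name, and the function signature are assumptions made for this example and are not taken from crates/g3-core/src/lib.rs; only the function name and the 200k default come from this plan.

```rust
/// Fallback context size used when the config does not set max_tokens.
/// The 200k figure is the hardcoded value this plan mentions.
const ANTHROPIC_DEFAULT_CONTEXT: u32 = 200_000;

/// Hypothetical stand-in for the [providers.anthropic] section of g3.toml.
pub struct ProviderConfig {
    pub max_tokens: Option<u32>,
}

/// Prefer the configured max_tokens; fall back to the default only
/// when no value is configured.
pub fn determine_context_length(config: &ProviderConfig) -> u32 {
    config.max_tokens.unwrap_or(ANTHROPIC_DEFAULT_CONTEXT)
}
```

With max_tokens = 50000 configured, this sketch returns 50000; with the key absent it returns 200000.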

2. Added Thinning Before Summarization

Problem: Code attempted summarization even when context window was nearly full
Fix: Added logic to try thinning first when context usage is between 80% and 90%
Files: crates/g3-core/src/lib.rs lines 2415-2439
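
The threshold logic could look something like the sketch below. The ContextAction enum and choose_action function are hypothetical names for this example. The plan only states that thinning is tried first in the 80-90% band; treating usage below 80% as "do nothing" and usage at or above 90% as "go straight to summarization" is an assumption, as is the exact handling of the boundary values.

```rust
/// Hypothetical set of steps the agent might take first for a given usage level.
#[derive(Debug, PartialEq)]
pub enum ContextAction {
    Continue,
    Thin,
    Summarize,
}

/// Pick the first step to try based on how full the context window is.
/// Cross-multiplication in u64 avoids both division by zero and overflow.
pub fn choose_action(used_tokens: u32, context_limit: u32) -> ContextAction {
    let used = u64::from(used_tokens) * 100;
    let limit = u64::from(context_limit);
    if used >= limit * 90 {
        // Assumed: at or above 90% usage, summarization is attempted directly.
        ContextAction::Summarize
    } else if used >= limit * 80 {
        // From the plan: in the 80-90% band, thinning is tried first.
        ContextAction::Thin
    } else {
        // Assumed: below 80% usage, nothing needs to happen yet.
        ContextAction::Continue
    }
}
```

In a real flow, summarization would presumably still run after thinning if usage stays high; this sketch only covers the initial choice.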

3. Added Capacity Checks Before Summarization

Problem: No validation that sufficient tokens remained for summarization
Fix: Added capacity checks for all provider types with helpful error messages
Files: crates/g3-core/src/lib.rs lines 2480-2520

4. Improved Error Messages

Problem: Generic errors when summarization failed
Fix: Specific error messages suggesting /thinnify and /compact commands
Files: Multiple locations in summarization logic
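
Changes 3 and 4 together amount to a pre-flight check that fails with an actionable message rather than letting the API reject the request. A minimal sketch follows, assuming a hypothetical check_summary_capacity function; the error wording is illustrative and is not the message g3 actually prints.

```rust
/// Return the number of tokens available for a summary, or an error
/// message pointing the user at the manual recovery commands.
pub fn check_summary_capacity(
    used_tokens: u32,
    context_limit: u32,
    buffer: u32,
) -> Result<u32, String> {
    // saturating_sub keeps the result at 0 if usage already exceeds the limit.
    let remaining = context_limit.saturating_sub(used_tokens);
    if remaining <= buffer {
        // Illustrative wording only; the real message may differ.
        return Err(format!(
            "Context window too full to summarize ({used_tokens} of {context_limit} tokens used). \
             Try /thinnify or /compact to free up space."
        ));
    }
    Ok(remaining - buffer)
}
```

At 98% usage of a 200k window with a 5k buffer, only 4k tokens remain, so this sketch returns the error instead of sending a request that would fail.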

5. Dynamic Buffer Calculation

Problem: Fixed 5k buffer regardless of model size
Fix: Proportional buffer (2.5% of model limit, min 1k, max 10k)
Files: crates/g3-core/src/lib.rs line 2487
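
The proportional buffer can be sketched as a clamp. The function name summary_buffer is hypothetical, and reading "1k" and "10k" as 1,000 and 10,000 tokens (rather than 1,024 and 10,240) is an assumption.

```rust
/// Buffer reserved before summarization: 2.5% of the model limit,
/// clamped to the range 1,000..=10,000 tokens.
pub fn summary_buffer(model_limit: u32) -> u32 {
    // 2.5% is exactly 1/40, so integer division avoids floating point.
    (model_limit / 40).clamp(1_000, 10_000)
}
```

Under these assumptions a 200,000-token limit yields 5,000 (matching the old fixed buffer), a 50,000-token limit yields 1,250, an 8,000-token limit is raised to the 1,000 minimum, and a 1,000,000-token limit is capped at 10,000.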

Test Cases

Test 1: Configured max_tokens Respected

# In g3.toml
[providers.anthropic]
api_key = "your-key"
model = "claude-3-5-sonnet-20241022"
max_tokens = 50000  # Should use this instead of 200k default

Test 2: Thinning Before Summarization

Fill context to 85% capacity
Verify thinning is attempted before summarization
Check that summarization is skipped if thinning resolves the issue

Test 3: Capacity Error Handling

Fill context to 98% capacity
Verify helpful error message is shown instead of API error
Check that /thinnify and /compact commands are suggested

Test 4: Provider-Specific Handling

Test with different providers (anthropic, databricks, embedded)
Verify each uses appropriate capacity checks and buffers

Expected Behavior

No more max_tokens API errors from Anthropic when context window is full
Automatic thinning when approaching capacity (80-90%)
Clear error messages with actionable suggestions when at capacity
Respect configured limits instead of hardcoded defaults
Graceful degradation with helpful user guidance

Manual Testing Commands

# Set a small max_tokens in g3.toml (as in Test 1) to trigger the issue quickly
g3 --chat
# Then paste large amounts of text to fill context window
# Verify thinning and error handling work correctly
