Add 1% safety buffer to context window to prevent API token limit errors
Our token estimation heuristic (chars/3 * 1.1 for code, chars/4 * 1.1 for text) slightly undercounts over long sessions with hundreds of tool calls. This accumulated drift of ~89 tokens caused Anthropic API 400 errors:

    prompt is too long: 200089 tokens > 200000 maximum

Fix: ContextWindow::new() now applies a 1% buffer, setting total_tokens to 99% of the provider-reported limit. For a 200k window this gives 198k, a 2000-token safety margin that absorbs estimation drift. All percentage calculations, compaction thresholds, and thinning triggers operate against the buffered limit, so compaction fires earlier and we never send a request the API will reject.
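The heuristic and buffer described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `ContextWindow` name comes from the commit message, but the field names, `estimate_tokens` helper, and method bodies are assumptions.

```rust
/// Hypothetical sketch of the buffered context window; field and
/// method names beyond `ContextWindow::new` are assumptions.
struct ContextWindow {
    total_tokens: usize, // buffered limit, not the raw provider limit
    used_tokens: usize,
}

impl ContextWindow {
    fn new(provider_limit: usize) -> Self {
        // Reserve 1% as a safety margin against estimation drift,
        // e.g. 200_000 -> 198_000 (a 2_000-token buffer).
        let total_tokens = provider_limit * 99 / 100;
        Self { total_tokens, used_tokens: 0 }
    }

    fn remaining_tokens(&self) -> usize {
        self.total_tokens.saturating_sub(self.used_tokens)
    }

    fn percentage_used(&self) -> f64 {
        self.used_tokens as f64 / self.total_tokens as f64 * 100.0
    }
}

/// The estimation heuristic from the commit message: chars/3 * 1.1
/// for code, chars/4 * 1.1 for text (helper name is hypothetical).
fn estimate_tokens(text: &str, is_code: bool) -> usize {
    let divisor = if is_code { 3.0 } else { 4.0 };
    (text.len() as f64 / divisor * 1.1).ceil() as usize
}

fn main() {
    let window = ContextWindow::new(200_000);
    println!("{}", window.total_tokens);       // 198000
    println!("{}", window.remaining_tokens()); // 198000
    println!("{}", estimate_tokens("let x = 1;", true));
}
```

Because every threshold compares against the buffered `total_tokens`, small per-message undercounts have to accumulate past the full buffer before a request could exceed the real provider limit.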
@@ -94,7 +94,8 @@ fn test_percentage_based_on_used_tokens() {
     // Initially 0%
     assert_eq!(window.percentage_used(), 0.0);
-    assert_eq!(window.remaining_tokens(), 1000);
+    // After 1% buffer: total_tokens = 990
+    assert_eq!(window.remaining_tokens(), 990);

     // Add messages to increase used_tokens
     // A message with ~100 chars should be roughly 25-30 tokens
@@ -107,7 +108,7 @@ fn test_percentage_based_on_used_tokens() {
     assert!(percentage < 100.0, "percentage should be < 100");

     // remaining_tokens should decrease
-    assert!(window.remaining_tokens() < 1000, "remaining tokens should decrease");
+    assert!(window.remaining_tokens() < 990, "remaining tokens should decrease");
 }

 /// Test that the 80% compaction threshold works correctly.