Add 1% safety buffer to context window to prevent API token limit errors

Our token estimation heuristic (chars/3 * 1.1 for code, chars/4 * 1.1 for text)
slightly undercounts actual usage over long sessions with hundreds of tool
calls. The accumulated drift (~89 tokens in the observed failure) caused
Anthropic API 400 errors:
  'prompt is too long: 200089 tokens > 200000 maximum'
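
The heuristic above can be sketched as follows; the function name and
signature are illustrative, not the actual implementation:

```rust
/// Rough token estimate: code averages ~3 chars/token, prose ~4, with a
/// 10% pad on top. (Hypothetical sketch of the heuristic described above.)
fn estimate_tokens(text: &str, is_code: bool) -> usize {
    let chars = text.chars().count() as f64;
    let chars_per_token = if is_code { 3.0 } else { 4.0 };
    ((chars / chars_per_token) * 1.1).round() as usize
}

fn main() {
    let code = "x".repeat(120);
    let prose = "y".repeat(120);
    println!("{}", estimate_tokens(&code, true)); // 120/3 * 1.1 = 44
    println!("{}", estimate_tokens(&prose, false)); // 120/4 * 1.1 = 33
}
```

Per-message rounding error like this is exactly what compounds into the
~89-token drift across hundreds of tool calls.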

Fix: ContextWindow::new() now applies a 1% buffer, setting total_tokens to 99%
of the provider-reported limit. For a 200k window this gives 198k, providing a
2000-token safety margin that absorbs estimation drift.
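
A minimal sketch of the buffering, assuming a simple `ContextWindow` shape
(field names beyond `total_tokens` are illustrative). Integer arithmetic
avoids float truncation (e.g. `200_000.0 * 0.99` can land just below 198000):

```rust
/// Context window that reserves a 1% buffer below the provider's limit.
struct ContextWindow {
    total_tokens: usize, // 99% of the provider-reported limit
    used_tokens: usize,
}

impl ContextWindow {
    fn new(provider_limit: usize) -> Self {
        // Keep 99% of the reported limit as the working budget.
        // Integer math: 200_000 * 99 / 100 = 198_000 exactly.
        ContextWindow {
            total_tokens: provider_limit * 99 / 100,
            used_tokens: 0,
        }
    }

    fn remaining_tokens(&self) -> usize {
        self.total_tokens.saturating_sub(self.used_tokens)
    }
}

fn main() {
    let w = ContextWindow::new(200_000);
    println!("{}", w.total_tokens); // 198000
    println!("{}", w.remaining_tokens()); // 198000
}
```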

All percentage calculations, compaction thresholds, and thinning triggers
operate against the buffered limit, so compaction fires earlier and we never
send a request the API will reject.
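
Because thresholds are fractions of the buffered total, they shift down
proportionally. A sketch (the 80% figure comes from the test below; the
helper name is assumed):

```rust
/// Compaction fires at 80% of the *buffered* limit, so with a 200k raw
/// window buffered to 198k it triggers at 158,400 used tokens instead
/// of 160,000. (Illustrative helper, not the actual implementation.)
fn should_compact(used_tokens: usize, buffered_total: usize) -> bool {
    (used_tokens as f64 / buffered_total as f64) >= 0.80
}

fn main() {
    println!("{}", should_compact(158_400, 198_000)); // true
    println!("{}", should_compact(158_399, 198_000)); // false
}
```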
Author: Dhanji R. Prasanna
Date: 2026-02-13 15:46:53 +11:00
parent a7e0b0ef9e
commit 0410efd41b
5 changed files with 203 additions and 11 deletions

@@ -94,7 +94,8 @@ fn test_percentage_based_on_used_tokens() {
     // Initially 0%
     assert_eq!(window.percentage_used(), 0.0);
-    assert_eq!(window.remaining_tokens(), 1000);
+    // After 1% buffer: total_tokens = 990
+    assert_eq!(window.remaining_tokens(), 990);

     // Add messages to increase used_tokens
     // A message with ~100 chars should be roughly 25-30 tokens
@@ -107,7 +108,7 @@ fn test_percentage_based_on_used_tokens() {
     assert!(percentage < 100.0, "percentage should be < 100");

     // remaining_tokens should decrease
-    assert!(window.remaining_tokens() < 1000, "remaining tokens should decrease");
+    assert!(window.remaining_tokens() < 990, "remaining tokens should decrease");
 }

 /// Test that the 80% compaction threshold works correctly.