Commit Graph

8 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
1b26de6cd2 Fix context window progress bar showing wrong token counts
Calibrate used_tokens from API prompt_tokens (ground truth) to fix
progress bar drift in interactive mode. Three issues fixed:

1. update_usage_from_response() only updated cumulative_tokens, never
   calibrated used_tokens. Now snaps used_tokens to prompt_tokens when
   available (falls back to heuristic when prompt_tokens is 0).

2. Moved calibration call inline during streaming (when usage chunk
   arrives) instead of after the loop. Text-only responses — the most
   common case in interactive mode — take an early return path that
   bypassed the post-loop usage update entirely.

3. Removed mock Usage with hardcoded prompt_tokens=100 from
   execute_single_task() which corrupted calibration.
2026-03-18 15:31:20 +11:00
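The calibration described in this commit can be sketched roughly as below. This is a hypothetical illustration, not the repository's actual code: the names `ContextWindow`, `used_tokens`, `cumulative_tokens`, and `update_usage_from_response` come from the commit message, but the struct layout and method body are assumptions.

```rust
// Hypothetical sketch of the calibration fix; field and method names mirror
// the commit message, but the bodies are illustrative assumptions.
#[derive(Default)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
}

struct ContextWindow {
    used_tokens: u64,       // live estimate shown in the progress bar
    cumulative_tokens: u64, // running total across the session
}

impl ContextWindow {
    /// Snap `used_tokens` to the API-reported `prompt_tokens` (ground truth)
    /// when available; fall back to the heuristic estimate when the API
    /// reports 0. Called inline when the streaming usage chunk arrives, so
    /// text-only responses that return early still get calibrated.
    fn update_usage_from_response(&mut self, usage: &Usage, heuristic_estimate: u64) {
        self.cumulative_tokens += usage.prompt_tokens + usage.completion_tokens;
        self.used_tokens = if usage.prompt_tokens > 0 {
            usage.prompt_tokens
        } else {
            heuristic_estimate
        };
    }
}
```

Snapping to `prompt_tokens` on every response prevents the heuristic's per-call error from accumulating into visible progress-bar drift.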
Dhanji R. Prasanna
0410efd41b Add 1% safety buffer to context window to prevent API token limit errors
Our token estimation heuristic (chars/3 * 1.1 for code, chars/4 * 1.1 for text)
slightly undercounts over long sessions with hundreds of tool calls. This
accumulated drift of ~89 tokens caused Anthropic API 400 errors:
  'prompt is too long: 200089 tokens > 200000 maximum'

Fix: ContextWindow::new() now applies a 1% buffer, setting total_tokens to 99%
of the provider-reported limit. For a 200k window this gives 198k, providing a
2000-token safety margin that absorbs estimation drift.

All percentage calculations, compaction thresholds, and thinning triggers
operate against the buffered limit, so compaction fires earlier and we never
send a request the API will reject.
2026-02-13 15:46:53 +11:00
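The buffer arithmetic in this commit is simple enough to sketch. A minimal illustration, assuming integer token counts; `ContextWindow::new` and `total_tokens` are named in the commit message, but this body is a guess at the shape, not the real implementation:

```rust
// Illustrative sketch of the 1% safety buffer described above.
struct ContextWindow {
    // Buffered limit; all percentage, compaction, and thinning thresholds
    // are computed against this value, not the provider's raw limit.
    total_tokens: u64,
}

impl ContextWindow {
    fn new(provider_limit: u64) -> Self {
        // Keep 99% of the provider-reported limit, reserving 1% to absorb
        // heuristic estimation drift (e.g. the ~89-token overshoot that
        // triggered the 400 error).
        Self { total_tokens: provider_limit * 99 / 100 }
    }
}
```

For the 200k Anthropic window this yields 198,000 usable tokens, a 2,000-token margin.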
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
2026-01-27 11:32:45 +11:00
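The cumulative accounting behind the `/stats` output might look like the following. This is a hedged sketch: `CacheStats` is named in the commit message, but the fields, the `record` helper, and the efficiency formula are assumptions (in particular, this sketch treats `input_tokens` as the total prompt size including cached reads, which is provider-specific).

```rust
// Hypothetical sketch of cumulative prompt-cache tracking across API calls.
#[derive(Default)]
struct CacheStats {
    api_calls: u64,
    cache_hits: u64,            // calls where any tokens were read from cache
    input_tokens: u64,          // total input tokens (assumed to include cache reads)
    cache_read_tokens: u64,     // e.g. Anthropic cache_read_input_tokens
    cache_creation_tokens: u64, // e.g. Anthropic cache_creation_input_tokens
}

impl CacheStats {
    fn record(&mut self, input: u64, cache_read: u64, cache_creation: u64) {
        self.api_calls += 1;
        if cache_read > 0 {
            self.cache_hits += 1;
        }
        self.input_tokens += input;
        self.cache_read_tokens += cache_read;
        self.cache_creation_tokens += cache_creation;
    }

    /// Cache efficiency: percentage of input tokens served from cache.
    fn efficiency_pct(&self) -> f64 {
        if self.input_tokens == 0 {
            return 0.0;
        }
        self.cache_read_tokens as f64 / self.input_tokens as f64 * 100.0
    }
}
```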
Dhanji R. Prasanna
5bfaee8dd5 use consistent naming for compaction 2026-01-08 12:54:03 +11:00
Dhanji R. Prasanna
1b4ea93ba4 token counting bugfix 2025-12-01 14:52:10 +11:00
Jochen
52f78653b4 add context window monitor
Writes the current context window to logs/current_context_window (uses a symlink to a session ID).

This PR was unfortunately generated by a different LLM and did a ton of superficial reformatting. It's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's ok.
2025-11-27 21:00:02 +11:00
Dhanji Prasanna
4a819e8f27 context window counting bug 2025-10-10 14:40:10 +11:00
Dhanji Prasanna
260c949576 token counting fixes 2025-10-09 12:11:21 +11:00