Calibrate used_tokens from API prompt_tokens (ground truth) to fix
progress bar drift in interactive mode. Three issues fixed:
1. update_usage_from_response() only updated cumulative_tokens and never
   calibrated used_tokens. It now snaps used_tokens to prompt_tokens when
   available and falls back to the heuristic when prompt_tokens is 0
   (see the sketch after this list).
2. Moved the calibration call inline during streaming (when the usage chunk
   arrives) instead of after the loop. Text-only responses, the most common
   case in interactive mode, take an early return path that bypassed the
   post-loop usage update entirely.
3. Removed the mock Usage with a hardcoded prompt_tokens=100 from
   execute_single_task(), which was corrupting calibration.
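A minimal sketch of the calibration path in Rust. The field names (used_tokens, cumulative_tokens, prompt_tokens) follow the description above, but the struct shape and method signature are assumptions, not the actual implementation:

```rust
// Sketch only: the real ContextWindow has more fields, and the update is
// driven from the streaming loop (see item 2 above).
struct ContextWindow {
    used_tokens: u64,       // drives the progress bar
    cumulative_tokens: u64, // lifetime total across API calls
}

impl ContextWindow {
    /// Calibrate local bookkeeping against provider-reported usage.
    fn update_usage_from_response(&mut self, prompt_tokens: u64, completion_tokens: u64) {
        self.cumulative_tokens += prompt_tokens + completion_tokens;
        if prompt_tokens > 0 {
            // Ground truth from the API: snap instead of accumulating drift.
            self.used_tokens = prompt_tokens;
        }
        // prompt_tokens == 0 means the provider sent no usage data, so the
        // existing heuristic estimate is left in place.
    }
}
```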
Our token estimation heuristic (chars/3 * 1.1 for code, chars/4 * 1.1 for text)
slightly undercounts over long sessions with hundreds of tool calls. This
accumulated drift of ~89 tokens caused Anthropic API 400 errors:
'prompt is too long: 200089 tokens > 200000 maximum'
Fix: ContextWindow::new() now applies a 1% buffer, setting total_tokens to 99%
of the provider-reported limit. For a 200k window this gives 198k, providing a
2000-token safety margin that absorbs estimation drift.
All percentage calculations, compaction thresholds, and thinning triggers
operate against the buffered limit, so compaction fires earlier and we never
send a request the API will reject.
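A sketch of how the heuristic and the buffered constructor fit together. The constants (the 1.1 multiplier, divisors of 3 and 4, and the 1% buffer) come from the description above; the function signatures and the usage_ratio() helper are assumptions:

```rust
struct ContextWindow { used_tokens: u64, cumulative_tokens: u64, total_tokens: u64 }

/// Rough estimate used when the API returns no usage data:
/// chars/3 * 1.1 for code, chars/4 * 1.1 for prose.
fn estimate_tokens(text: &str, is_code: bool) -> u64 {
    let chars = text.chars().count() as f64;
    let chars_per_token = if is_code { 3.0 } else { 4.0 };
    (chars / chars_per_token * 1.1).ceil() as u64
}

impl ContextWindow {
    fn new(provider_limit: u64) -> Self {
        // Keep 1% in reserve: 200_000 -> 198_000, a 2_000-token margin
        // that absorbs the heuristic's undercounting.
        let total_tokens = provider_limit * 99 / 100;
        Self { used_tokens: 0, cumulative_tokens: 0, total_tokens }
    }

    /// Compaction, thinning, and percentage display all use this ratio,
    /// so they react before the real provider limit is reached.
    fn usage_ratio(&self) -> f64 {
        self.used_tokens as f64 / self.total_tokens as f64
    }
}
```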
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls (sketched after this list)
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
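A sketch of the cumulative tracking and the derived percentages shown in /stats. The Usage fields match the list above; the CacheStats fields, record() helper, and the definition of a "hit" (any tokens read from cache) are assumptions:

```rust
#[derive(Default)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
    // Populated from Anthropic's cache_creation_input_tokens /
    // cache_read_input_tokens and OpenAI's prompt_tokens_details.cached_tokens.
    cache_creation_tokens: u64,
    cache_read_tokens: u64,
}

#[derive(Default)]
struct CacheStats {
    api_calls: u64,
    cache_hits: u64,
    input_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

impl CacheStats {
    /// Accumulate per-response usage into the session-wide totals.
    fn record(&mut self, usage: &Usage) {
        self.api_calls += 1;
        // Assumption: a call counts as a hit when anything was read from cache.
        if usage.cache_read_tokens > 0 {
            self.cache_hits += 1;
        }
        self.input_tokens += usage.prompt_tokens;
        self.cache_read_tokens += usage.cache_read_tokens;
        self.cache_creation_tokens += usage.cache_creation_tokens;
    }

    /// Percentage of API calls that read anything from the cache.
    fn hit_rate(&self) -> f64 {
        if self.api_calls == 0 { return 0.0; }
        self.cache_hits as f64 / self.api_calls as f64 * 100.0
    }

    /// Percentage of input tokens served from the cache.
    fn efficiency(&self) -> f64 {
        if self.input_tokens == 0 { return 0.0; }
        self.cache_read_tokens as f64 / self.input_tokens as f64 * 100.0
    }
}
```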
Writes the current context window to logs/current_context_window (a symlink that points at the per-session file named by session ID).
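One way the file-plus-symlink arrangement might look; the per-session filename and the helper below are illustrative, not the actual implementation:

```rust
use std::fs;
use std::path::Path;

fn write_context_window_log(session_id: &str, contents: &str) -> std::io::Result<()> {
    let dir = Path::new("logs");
    fs::create_dir_all(dir)?;

    // Write the session-specific file, then point the stable name at it.
    let session_file = dir.join(format!("context_window_{session_id}"));
    fs::write(&session_file, contents)?;

    let link = dir.join("current_context_window");
    let _ = fs::remove_file(&link); // replace any stale symlink
    #[cfg(unix)]
    std::os::unix::fs::symlink(session_file.file_name().unwrap(), &link)?;
    Ok(())
}
```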
This PR was unfortunately generated by a different LLM, which did a ton of superficial reformatting. The actual change is fairly small and benign, but I don't want to roll back everything. Hope that's ok.