- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
- API call count and cache hit count
- Hit rate percentage
- Total input tokens and cache read/creation tokens
- Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
Extract a new g3_status module in g3-cli that provides consistent formatting
for all 'g3:' prefixed system status messages.
Key changes:
- Add G3Status struct with methods for progress, done, failed, error, etc.
- Add Status enum with Done, Failed, Error, Resolved, Insufficient, NoChanges
- Add ThinResult struct in g3-core for semantic thinning data
- Update UiWriter trait with print_thin_result() method
- Refactor context thinning to return ThinResult instead of formatted strings
- Update all callers to use the new centralized formatting
- Session resume/decline messages now use G3Status
- Compaction status messages now use G3Status
This maintains clean separation of concerns: g3-core emits semantic data,
g3-cli handles all terminal formatting and colors.
Adds tests to verify that:
- All streaming chunks are processed before control returns to caller
- Both tool calls in a multi-tool-call stream are executed
- The finished signal properly terminates stream processing
Also adds Agent::new_for_test() to allow injecting mock providers.