Agent: fowler
Eliminate code-path aliasing and near-duplicates across recent commits:
1. Deduplicate find_json_object_end: Three near-identical copies in
streaming_parser.rs, context_window.rs, and acd.rs are consolidated into
a single canonical implementation in utils.rs; all callers now route
through it. The utils.rs version keeps the most defensive variant
(with the found_start guard); a sketch follows this list. (-84 lines)
2. Deduplicate provider constructors: AnthropicProvider::new() and
GeminiProvider::new() now delegate to their respective new_with_name()
methods instead of duplicating the full constructor body.
(OpenAI already delegated.) (-28 lines)
3. Inline convert_cache_control: Removed identity function that just
cloned CacheControl. Call sites now use .map(|cc| cc.clone())
directly. (-4 lines)
Net: -65 lines, 0 behavior changes, all 683 library tests pass.
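For reference, a minimal sketch of the consolidated helper from item 1,
assuming a `&str -> Option<usize>` signature that returns the offset just
past the closing brace; the actual utils.rs code may differ in details:

```rust
/// Returns the byte offset one past the `}` closing the first balanced
/// JSON object in `input`, or `None` if no such object is present.
pub fn find_json_object_end(input: &str) -> Option<usize> {
    let mut depth = 0usize;
    let mut found_start = false; // defensive guard: ignore `}` before any `{`
    let mut in_string = false;
    let mut escaped = false;
    for (i, ch) in input.char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else {
                match ch {
                    '\\' => escaped = true,
                    '"' => in_string = false,
                    _ => {}
                }
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => {
                depth += 1;
                found_start = true;
            }
            '}' if found_start => {
                depth = depth.saturating_sub(1);
                if depth == 0 {
                    return Some(i + ch.len_utf8());
                }
            }
            _ => {}
        }
    }
    None
}
```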
read_image tool results placed images as top-level Image content blocks
alongside ToolResult blocks in user messages. The Anthropic API rejects
this combination, reporting orphaned tool_use IDs even though the
tool_result was present — the malformed message structure prevented
the API from recognizing it as a valid tool result.
Added ToolResultContent enum (Text | Blocks) with custom serde so that
when images are attached to a tool result, they are nested inside the
tool_result content array as structured blocks, matching the Anthropic
API's expected format for multi-modal tool results.
Regular tool results (no images) continue to use simple string content.
Regular user messages (not tool results) continue to use top-level
Image blocks.
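A minimal sketch of how such an enum can serialize both shapes; the repo
uses custom serde, and the `ContentBlock`/`ImageSource` names here are
illustrative stand-ins:

```rust
use serde::Serialize;

#[derive(Serialize)]
#[serde(untagged)]
enum ToolResultContent {
    /// Plain tool output serializes as a bare JSON string.
    Text(String),
    /// Multi-modal output serializes as a content array of typed blocks.
    Blocks(Vec<ContentBlock>),
}

#[derive(Serialize)]
#[serde(tag = "type", rename_all = "snake_case")]
enum ContentBlock {
    Text { text: String },
    Image { source: ImageSource },
}

#[derive(Serialize)]
struct ImageSource {
    #[serde(rename = "type")]
    kind: String,       // e.g. "base64"
    media_type: String, // e.g. "image/png"
    data: String,       // base64 payload
}
```

With `#[serde(untagged)]`, `Text` serializes as a bare JSON string while
`Blocks` serializes as a content array, matching the two forms the
Anthropic API accepts for tool_result content.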
4 new tests covering image nesting, string fallback, regular user
messages, and orphan detection with structured content.
After context compaction, the preserved last assistant message retained
its structured tool_calls field, but the corresponding tool_result was
summarized away. This created orphaned tool_use blocks that violated
the Anthropic API constraint: 'Each tool_use block must have a
corresponding tool_result block in the next message', causing 400 errors.
Primary fix: clear tool_calls from the preserved assistant message in
extract_preserved_messages(). The tool call was already executed and
its result is captured in the summary.
Defense-in-depth: added strip_orphaned_tool_use() post-processing in
Anthropic convert_messages() to detect and strip any orphaned tool_use
blocks before they reach the API.
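An illustrative sketch of that pass, using simplified stand-ins for the
provider's message and block types:

```rust
use std::collections::HashSet;

// Simplified stand-ins for the provider's converted message types.
enum Block {
    Text(String),
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
}

struct Msg {
    content: Vec<Block>,
}

// A tool_use block is orphaned when the next message carries no
// tool_result with a matching ID; strip it before the request is built.
fn strip_orphaned_tool_use(messages: &mut [Msg]) {
    for i in 0..messages.len() {
        // Collect the tool_result IDs answered in the *next* message.
        let answered: HashSet<String> = match messages.get(i + 1) {
            Some(next) => next
                .content
                .iter()
                .filter_map(|b| match b {
                    Block::ToolResult { tool_use_id } => Some(tool_use_id.clone()),
                    _ => None,
                })
                .collect(),
            None => HashSet::new(),
        };
        // Drop any tool_use block with no corresponding tool_result.
        messages[i].content.retain(|b| match b {
            Block::ToolUse { id } => answered.contains(id),
            _ => true,
        });
    }
}
```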
Added 7 tests: 3 unit tests for compaction stripping, 3 unit tests for
Anthropic orphan detection, 1 integration test reproducing the exact
bug scenario from the h3 session.
The agent would stop mid-task because native tool calls were stored as
inline JSON text in Message.content. When sent back to the Anthropic API
via convert_messages(), they went as plain text instead of structured
tool_use/tool_result blocks. The model would occasionally get confused
and emit text describing what it wanted to do instead of invoking the
tool mechanism.
Changes:
- Add MessageToolCall struct and tool_calls/tool_result_id fields to Message
- Add id field to core ToolCall struct to preserve provider tool call IDs
- Update Anthropic convert_messages() to emit tool_use and tool_result blocks (sketched after this list)
- Add ToolResult variant to AnthropicContent enum
- Store tool calls structurally in tool message construction (not inline JSON)
- Fix add_message() to preserve empty-content messages with tool_calls
- Fix check_duplicate_in_previous_message() to check structured tool_calls
- Generate valid IDs for JSON fallback tool calls (Anthropic pattern requirement)
- Update planner create_tool_message() to use structured tool calls
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
- API call count and cache hit count
- Hit rate percentage
- Total input tokens and cache read/creation tokens
- Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
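Two sketches of the changes above. First, the direction of the
convert_messages() fix: tool calls become structured tool_use blocks
paired with tool_result blocks instead of inline JSON text. The
functions below are simplified stand-ins, not the real signatures:

```rust
use serde_json::{json, Value};

// Stand-in for the new struct; the id preserves the provider's tool call ID.
struct MessageToolCall {
    id: String,
    name: String,
    arguments: Value,
}

// Build assistant content as structured blocks the API can pair with
// tool_result blocks in the next user message.
fn assistant_content(text: &str, tool_calls: &[MessageToolCall]) -> Vec<Value> {
    let mut blocks = Vec::new();
    if !text.is_empty() {
        blocks.push(json!({ "type": "text", "text": text }));
    }
    for call in tool_calls {
        blocks.push(json!({
            "type": "tool_use",
            "id": call.id.clone(),
            "name": call.name.clone(),
            "input": call.arguments.clone(),
        }));
    }
    blocks
}

// The matching tool_result block, keyed by the preserved tool_use ID.
fn tool_result_block(tool_use_id: &str, content: &str) -> Value {
    json!({
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
    })
}
```

Second, a plausible shape for the cumulative accumulator behind the
"Prompt Cache Statistics" section; only the tracked quantities come from
the list above, the field and method names are assumptions:

```rust
#[derive(Default)]
struct CacheStats {
    api_calls: u64,
    cache_hits: u64,
    input_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

impl CacheStats {
    /// Hit rate percentage across API calls.
    fn hit_rate(&self) -> f64 {
        if self.api_calls == 0 {
            return 0.0;
        }
        100.0 * self.cache_hits as f64 / self.api_calls as f64
    }

    /// Cache efficiency: % of input tokens served from the cache.
    fn efficiency(&self) -> f64 {
        if self.input_tokens == 0 {
            return 0.0;
        }
        100.0 * self.cache_read_tokens as f64 / self.input_tokens as f64
    }
}
```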
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback (sketched below)
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()
Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- Unescaped newlines inside strings, or prose following the JSON, invalidate detection
The user sees ' ● tool_name |' immediately when a tool call starts
streaming, with a blinking indicator while arguments are received.
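A sketch of the hint states and the atomic UI flag. ToolParsingHint and
ParsingHintState are named in the changes above; the u8 encoding is an
assumption for illustration:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

#[derive(Clone, Copy)]
enum ToolParsingHint {
    Detected, // tool-call pattern spotted in the stream
    Active,   // tool name known, arguments still streaming
    Complete, // full call parsed; hint can be cleared
}

// An atomic lets the shared UiWriter flip hint state from the streaming
// path without taking a lock.
struct ParsingHintState {
    state: AtomicU8, // 0 = idle, 1 = detected, 2 = active, 3 = complete
}

impl ParsingHintState {
    fn set(&self, hint: ToolParsingHint) {
        let v = match hint {
            ToolParsingHint::Detected => 1,
            ToolParsingHint::Active => 2,
            ToolParsingHint::Complete => 3,
        };
        self.state.store(v, Ordering::Relaxed);
    }
}
```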
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.
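A sketch of the boundary-safe truncation, with an illustrative helper name:

```rust
// Slicing `&s[..limit]` panics if `limit` falls inside a multi-byte
// character (e.g. an emoji), so walk char_indices() back to the last
// valid character boundary at or before the limit.
fn truncate_at_char_boundary(s: &str, limit: usize) -> &str {
    if s.len() <= limit {
        return s;
    }
    let end = s
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= limit)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```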
Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.
- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.
Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
Writes the current context window to logs/current_context_window (a symlink pointing to the session-ID-named log file).
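A sketch of the symlink update, assuming Unix and an illustrative
per-session file naming scheme (the real paths may differ):

```rust
use std::fs;
use std::os::unix::fs::symlink;

fn update_context_window_link(session_id: &str) -> std::io::Result<()> {
    let target = format!("logs/context_window_{session_id}");
    let link = "logs/current_context_window";
    // Remove any stale link first; symlink() fails if the path exists.
    let _ = fs::remove_file(link);
    symlink(target, link)
}
```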
This PR was unfortunately generated by a different LLM and did a ton of superficial reformatting. The actual change is fairly small and benign, but I don't want to roll back everything. Hope that's ok.
This tries to short-circuit multiple round-trips to the LLM when reading code.
It's a precursor to context engineering tailored to specific tasks.
In initial experiments it's only marginally faster than regular mode and burns more tokens.