fix: store tool calls structurally for proper API roundtripping

The agent would stop mid-task because native tool calls were stored as inline JSON text in Message.content. When sent back to the Anthropic API via convert_messages(), they went as plain text instead of structured tool_use/tool_result blocks. The model would occasionally get confused and emit text describing what it wanted to do instead of invoking the tool mechanism.

Changes:
- Add MessageToolCall struct and tool_calls/tool_result_id fields to Message
- Add id field to core ToolCall struct to preserve provider tool call IDs
- Update Anthropic convert_messages() to emit tool_use and tool_result blocks
- Add ToolResult variant to AnthropicContent enum
- Store tool calls structurally in tool message construction (not inline JSON)
- Fix add_message() to preserve empty-content messages with tool_calls
- Fix check_duplicate_in_previous_message() to check structured tool_calls
- Generate valid IDs for JSON fallback tool calls (Anthropic pattern requirement)
- Update planner create_tool_message() to use structured tool calls
2026-02-11 08:48:07 +11:00
parent 2a4cd1f4d6
commit d3f0112f46
15 changed files with 355 additions and 53 deletions
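
For orientation, here is a minimal, std-only sketch of the idea behind this commit. It is not the code in the diff. `Role`, `Block`, `to_blocks()`, and `should_keep()` are hypothetical names invented for illustration, and `input` is held as a raw JSON string only to avoid a JSON dependency (the real field type is not shown here). Only the `MessageToolCall` fields (id, name, input) and the `tool_calls` / `tool_result_id` fields on `Message` come from the commit message.

```rust
// Illustrative sketch only: every name except MessageToolCall, tool_calls,
// and tool_result_id is an assumption made for this example.

#[derive(Debug, Clone, PartialEq)]
pub struct MessageToolCall {
    pub id: String,    // provider-assigned tool call ID
    pub name: String,  // tool name
    pub input: String, // raw JSON arguments (assumed representation)
}

#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: Role,
    pub content: String,
    pub tool_calls: Vec<MessageToolCall>,
    pub tool_result_id: Option<String>,
}

// Hypothetical stand-in for provider content blocks.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String },
}

// Convert one message into structured blocks instead of flattening
// tool calls into plain text.
pub fn to_blocks(msg: &Message) -> Vec<Block> {
    // A user message carrying tool_result_id becomes a tool_result block.
    if let (Role::User, Some(id)) = (&msg.role, &msg.tool_result_id) {
        return vec![Block::ToolResult {
            tool_use_id: id.clone(),
            content: msg.content.clone(),
        }];
    }

    let mut blocks = Vec::new();
    if !msg.content.is_empty() {
        blocks.push(Block::Text(msg.content.clone()));
    }
    // Assistant messages emit one tool_use block per structured tool call.
    if msg.role == Role::Assistant {
        for tc in &msg.tool_calls {
            blocks.push(Block::ToolUse {
                id: tc.id.clone(),
                name: tc.name.clone(),
                input: tc.input.clone(),
            });
        }
    }
    blocks
}

// A message with empty content must still be kept if it carries tool calls.
pub fn should_keep(msg: &Message) -> bool {
    !msg.content.trim().is_empty() || !msg.tool_calls.is_empty()
}
```

In this sketch, an assistant message with empty `content` and one entry in `tool_calls` is kept by `should_keep()` and converts to a single `Block::ToolUse`. The follow-up user message whose `tool_result_id` matches that call's `id` converts to a single `Block::ToolResult`.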
--- a/analysis/memory.md
+++ b/analysis/memory.md
@@ -1,5 +1,5 @@
 # Workspace Memory
-> Updated: 2026-02-07T05:28:12Z | Size: 26.3k chars
+> Updated: 2026-02-10T21:08:38Z | Size: 27.8k chars

 ### Remember Tool Wiring
 - `crates/g3-core/src/tools/memory.rs` [0..5000] - `execute_remember()`, `get_memory_path()`, `merge_memory()`
@@ -429,4 +429,17 @@ Makes tool output responsive to terminal width - no line wrapping, with 4-char r
 - **Bug**: The `_ =>` catch-all in when condition evaluation did naive string `contains` check. For `Matches` (regex like `^Re: `), it checked if fact values literally contained the regex pattern string — which never matched. Result: when conditions with `matches` rule always evaluated as not-met → vacuous pass → violations slipped through.
 - **Fix**: Replaced hand-rolled when evaluation with synthetic `CompiledPredicate` delegation to `evaluate_predicate_datalog()`, which handles all 12 rule types correctly.
 - **Tests**: `test_execute_rules_when_matches_condition_met`, `test_execute_rules_when_matches_condition_met_but_predicate_fails`, `test_execute_rules_when_matches_condition_not_met`
- **Note**: The `invariants.rs` path was NOT affected — it already delegated to `evaluate_predicate()` which handles all rules.
+- **Note**: The `invariants.rs` path was NOT affected — it already delegated to `evaluate_predicate()` which handles all rules.
+
+### Structured Tool Call Messages (2026-02-11)
+- `crates/g3-providers/src/lib.rs` [102..106] - `MessageToolCall` struct (id, name, input)
+- `crates/g3-providers/src/lib.rs` [124..131] - `Message.tool_calls: Vec<MessageToolCall>`, `Message.tool_result_id: Option<String>`
+- `crates/g3-providers/src/anthropic.rs` [284..340] - `convert_messages()` emits `tool_use` blocks for assistant messages with `tool_calls`, `tool_result` blocks for user messages with `tool_result_id`
+- `crates/g3-providers/src/anthropic.rs` [935..941] - `AnthropicContent::ToolResult` variant added
+- `crates/g3-core/src/lib.rs` [82..88] - `ToolCall.id` field added (from native providers)
+- `crates/g3-core/src/lib.rs` [2530..2545] - Tool messages store structured `tool_calls` instead of inline JSON text
+- `crates/g3-core/src/lib.rs` [1385..1400] - `check_duplicate_in_previous_message()` checks structured `tool_calls` field
+- `crates/g3-core/src/context_window.rs` [107..109] - `add_message_with_tokens()` preserves messages with `tool_calls` even if content is empty
+- `crates/g3-core/src/streaming_parser.rs` [339] - `process_chunk()` preserves tool call `id` from provider
+
+**Bug fixed**: Agent would stop mid-task because native tool calls were stored as inline JSON text in `Message.content`. When sent back to Anthropic API via `convert_messages()`, they went as plain text instead of structured `tool_use`/`tool_result` blocks. The model would occasionally get confused and emit text describing what it wanted to do instead of invoking the tool mechanism.