alex/g3 - g3 - Millerson GIT hosting

alex/g3

Author	SHA1	Message	Date
Dhanji R. Prasanna	1ad74baaa5	Readability refactor: extract mega-functions into focused helpers Agent: carmack 4 files refactored, net -250 lines, all tests passing (417 + 71). datalog.rs: - Extract 7 predicate evaluation helpers from evaluate_predicate_datalog() (~200-line match → 12-line dispatch table) - Extract rule_body_for_predicate() from format_datalog_program() (~75-line match → 2-line call) invariants.rs: - Extract 7 per-rule helpers from evaluate_predicate() (~230-line match → 12-line dispatch table) envelope.rs: - Simplify summary construction in verify_envelope() - Eliminate redundant clone in stamp_envelope() anthropic.rs: - Introduce StreamState struct with 6 handler methods - parse_streaming_response: ~290 lines → ~90 lines - Max nesting depth reduced from 8 to 4 levels	2026-02-13 16:21:38 +11:00
Dhanji R. Prasanna	a7e0b0ef9e	Refactor: deduplicate JSON parsing, provider constructors, and identity function Agent: fowler Eliminate code-path aliasing and near-duplicates across recent commits: 1. Deduplicate find_json_object_end: Three near-identical copies in streaming_parser.rs, context_window.rs, and acd.rs consolidated into a single canonical implementation in utils.rs. All callers now route through the canonical version. The utils.rs version uses the most defensive variant (with found_start guard). (-84 lines) 2. Deduplicate provider constructors: AnthropicProvider::new() and GeminiProvider::new() now delegate to their respective new_with_name() methods instead of duplicating the full constructor body. (OpenAI already delegated.) (-28 lines) 3. Inline convert_cache_control: Removed identity function that just cloned CacheControl. Call sites now use .map(\|cc\| cc.clone()) directly. (-4 lines) Net: -65 lines, 0 behavior changes, all 683 library tests pass.	2026-02-13 12:37:09 +11:00
Dhanji R. Prasanna	fcb839e5fd	fix: nest images inside tool_result content for Anthropic API compliance read_image tool results placed images as top-level Image content blocks alongside ToolResult blocks in user messages. The Anthropic API rejects this combination, reporting orphaned tool_use IDs even though the tool_result was present — the malformed message structure prevented the API from recognizing it as a valid tool result. Added ToolResultContent enum (Text \| Blocks) with custom serde so that when images are attached to a tool result, they are nested inside the tool_result content array as structured blocks, matching the Anthropic API's expected format for multi-modal tool results. Regular tool results (no images) continue to use simple string content. Regular user messages (not tool results) continue to use top-level Image blocks. 4 new tests covering image nesting, string fallback, regular user messages, and orphan detection with structured content.	2026-02-13 10:50:52 +11:00
Dhanji R. Prasanna	d61be719c2	fix: strip orphaned tool_calls from preserved assistant message during compaction After context compaction, the preserved last assistant message retained its structured tool_calls field, but the corresponding tool_result was summarized away. This created orphaned tool_use blocks that violated the Anthropic API constraint: 'Each tool_use block must have a corresponding tool_result block in the next message', causing 400 errors. Primary fix: clear tool_calls from the preserved assistant message in extract_preserved_messages(). The tool call was already executed and its result is captured in the summary. Defense-in-depth: added strip_orphaned_tool_use() post-processing in Anthropic convert_messages() to detect and strip any orphaned tool_use blocks before they reach the API. Added 7 tests: 3 unit tests for compaction stripping, 3 unit tests for Anthropic orphan detection, 1 integration test reproducing the exact bug scenario from the h3 session.	2026-02-11 15:22:03 +11:00
Dhanji R. Prasanna	d3f0112f46	fix: store tool calls structurally for proper API roundtripping The agent would stop mid-task because native tool calls were stored as inline JSON text in Message.content. When sent back to the Anthropic API via convert_messages(), they went as plain text instead of structured tool_use/tool_result blocks. The model would occasionally get confused and emit text describing what it wanted to do instead of invoking the tool mechanism. Changes: - Add MessageToolCall struct and tool_calls/tool_result_id fields to Message - Add id field to core ToolCall struct to preserve provider tool call IDs - Update Anthropic convert_messages() to emit tool_use and tool_result blocks - Add ToolResult variant to AnthropicContent enum - Store tool calls structurally in tool message construction (not inline JSON) - Fix add_message() to preserve empty-content messages with tool_calls - Fix check_duplicate_in_previous_message() to check structured tool_calls - Generate valid IDs for JSON fallback tool calls (Anthropic pattern requirement) - Update planner create_tool_message() to use structured tool calls	2026-02-11 08:48:07 +11:00
Dhanji R. Prasanna	2a4cd1f4d6	fix: strip duplicate tool call JSON from assistant messages when LLM stutters When the LLM emits identical JSON tool calls as text content (JSON fallback mode), the raw duplicate JSON was being stored in the assistant message in conversation history. This confused the model on subsequent turns, causing it to stall or repeat itself. Root cause: raw_content_for_log used get_text_content() which returns the full parser buffer including all duplicate tool call JSONs. Fix: Added get_text_before_tool_calls() to StreamingToolParser that returns only the text before the first JSON tool call. Changed raw_content_for_log to use this method so the assistant message only contains the preamble text + the single executed tool call. Added 5 integration tests covering stuttered duplicates, triple stutter, cross-turn dedup, and different-args boundary case. Added MockResponse helpers for simulating LLM stutter patterns.	2026-02-10 19:53:11 +11:00
Dhanji R. Prasanna	a93ce932a3	refactor: Clean up Cargo dependencies - remove unused, update outdated - Remove unused const_format from g3-planner (never imported) - Remove unused thiserror from workspace and 5 crates (declared but never used) - Update termimad 0.31 -> 0.34 in studio (consistency with g3-cli) - Update indicatif 0.17 -> 0.18 in g3-cli - Update ratatui 0.29 -> 0.30 in g3-cli - Update walkdir 2.4 -> 2.5 in g3-core - Update image 0.24 -> 0.25 in g3-computer-control (macOS + Linux) - Update config 0.14 -> 0.15 in workspace Blocked: reqwest 0.11 -> 0.12/0.13 requires breaking API changes to bytes_stream() used in 4 providers - needs separate migration effort. All tests pass. No behavior changes. Agent: fowler	2026-02-06 14:22:59 +11:00
Dhanji R. Prasanna	7fc9eb0778	Fix doc-test failure in GLM adapter Use quadruple backticks for outer code fence to properly escape the nested code fence example showing JSON format.	2026-01-30 14:53:04 +11:00
Dhanji R. Prasanna	3003bdebaa	refactor: fix flaky test and remove dead code in recent commits Fixes issues in the last 11 commits: 1. pending_research.rs: Fix flaky test_generate_id_uniqueness - Replaced random u16 suffix with atomic counter for guaranteed uniqueness - The timestamp+random approach could collide when generating IDs rapidly - Now uses static AtomicU32 counter that increments monotonically 2. embedded/adapters/glm.rs: Remove unused in_code_fence field - Field was written but never read (dead code) - Removed from struct definition, constructor, and reset() 3. embedded/adapters/glm.rs: Fix orphaned tests - Two tests (test_strip_code_fences, test_code_fenced_tool_call) were outside the #[cfg(test)] mod tests block - Moved closing brace to include them in the test module All 446 library tests pass. Agent: fowler	2026-01-30 14:28:43 +11:00
Dhanji R. Prasanna	8191a5e8e6	feat(embedded): add GLM tool format adapter for code fence stripping GLM-4 models wrap tool calls in markdown code fences and inline backticks, which prevents the streaming parser from detecting them. This adapter: - Strips ```json and ``` code fence markers during streaming - Strips inline backticks from tool call JSON - Handles chunked streaming correctly (buffers potential fence lines) - Transforms GLM native format (<\|assistant\|>tool_name) to g3 JSON format Also refactors embedded provider into module structure: - embedded/mod.rs - module exports - embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs) - embedded/adapters/mod.rs - ToolFormatAdapter trait - embedded/adapters/glm.rs - GLM-specific adapter Includes 22 unit tests covering edge cases like nested JSON in strings, chunk boundary handling, and false pattern detection. Updates README to show GLM-4 9B now works (⭐⭐) for agentic tasks.	2026-01-29 12:52:09 +11:00
Dhanji R. Prasanna	d4941dc95a	refactor(providers): improve readability of embedded.rs and gemini.rs embedded.rs (937→789 lines, -16%): - Extract duplicated inference setup into prepare_context() helper - Extract stop sequence handling into find_stop_sequence() and truncate_at_stop_sequence() - Add InferenceParams struct to consolidate request parameter extraction - Add clear section markers for code organization - Tests now use module-level format functions directly (no duplication) gemini.rs: - Extract common request building into build_request() method - Reduces duplication between complete() and stream() methods All 399 unit tests pass. Behavior unchanged. Agent: carmack	2026-01-29 11:39:46 +11:00
Dhanji R. Prasanna	21f8d5a1aa	Add integration tests for CacheStats and Gemini serialization Agent: hopper Added two new integration test files: 1. cache_stats_integration_test.rs (g3-core) - Tests CacheStats accumulation through streaming completion flow - Verifies cache hit detection (cache_read_tokens > 0) - Tests multi-request accumulation of cache statistics - Verifies cache efficiency and hit rate calculations - Uses MockProvider to simulate provider usage data 2. gemini_serialization_test.rs (g3-providers) - Tests Gemini API message format conversion - Verifies system messages become system_instruction - Verifies assistant role maps to "model" (Gemini terminology) - Tests tool conversion to function_declarations format - Characterizes multi-system-message behavior (last wins) Both test files follow blackbox/integration testing principles: - Test observable behavior through stable surfaces - Do not assert internal implementation details - Include documentation of what is/is not asserted	2026-01-29 11:28:52 +11:00
Dhanji R. Prasanna	f6717b4435	Add Gemini 3 model context window detection	2026-01-29 10:20:56 +11:00
Dhanji R. Prasanna	735e9c9312	Add Google Gemini provider support - Add GeminiProvider with streaming and native tool calling - Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models - Model-specific context window detection (1M-2M tokens) - Message conversion: assistant -> model role mapping - System messages extracted to system_instruction field - Tool schema conversion with functionCall/functionResponse parts - SSE streaming with JSON array buffer parsing - 8 unit tests for conversion and parsing logic - Register provider in g3-core and validate in g3-cli	2026-01-29 10:11:42 +11:00
Dhanji R. Prasanna	58fe74334d	Auto-detect context window size from GGUF for embedded providers - Add context_window_size() method to LLMProvider trait - Implement for EmbeddedProvider to return the auto-detected context length - Update Agent to query provider directly instead of using hardcoded defaults - Removes need for model-specific context length mappings	2026-01-28 11:16:14 +11:00
Dhanji R. Prasanna	e32c302023	Fix embedded provider initialization and logging - Use global OnceLock for llama.cpp backend to prevent BackendAlreadyInitialized error - Suppress verbose llama.cpp stderr logging during model loading - Fix provider validation to accept "embedded.name" format (extract type before dot)	2026-01-28 10:33:10 +11:00
Dhanji R. Prasanna	ba6e1f9896	Remove unused code to eliminate build warnings - Remove unused SYSTEM_PROMPT_FOR_NATIVE_TOOL_USE and SYSTEM_PROMPT_FOR_NON_NATIVE_TOOL_USE constants - Remove unused gpu_layers field from EmbeddedProvider struct - Remove unused clean_stop_sequences method from EmbeddedProvider	2026-01-28 10:01:44 +11:00
Dhanji R. Prasanna	a902be1562	Refactor system prompts to eliminate duplication; upgrade embedded provider - Refactor prompts.rs: extract shared sections (intro, TODO, workspace memory, web research, response guidelines) used by both native and non-native prompts - Fix typo in native prompt: "save them.." -> "save them." - Fix non-native prompt: add missing closing braces in JSON examples, add IMPORTANT steps section, align with native prompt quality - Add 9 unit tests to verify both prompts contain required sections - Upgrade llama-cpp-2 dependency and refactor embedded provider - Update config.example.toml with embedded model examples - Update workspace memory	2026-01-28 09:56:39 +11:00
Dhanji R. Prasanna	5b4079e861	Add prompt cache statistics tracking to /stats command - Extend Usage struct with cache_creation_tokens and cache_read_tokens fields - Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens - Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching - Add CacheStats struct to Agent for cumulative tracking across API calls - Add "Prompt Cache Statistics" section to /stats output showing: - API call count and cache hit count - Hit rate percentage - Total input tokens and cache read/creation tokens - Cache efficiency (% of input served from cache) - Update all provider implementations and test files	2026-01-27 11:32:45 +11:00
Dhanji R. Prasanna	2043a83e7d	Add comprehensive MockProvider integration tests Added 6 new integration tests for stream_completion_with_tools: - test_text_before_tool_call_preserved: text before native tool call is saved - test_native_tool_call_execution: native tool calls execute correctly - test_duplicate_tool_calls_skipped: sequential duplicates are detected - test_json_fallback_tool_calling: JSON tool calls work without native support - test_text_after_tool_execution_preserved: follow-up text is saved - test_multiple_tool_calls_executed: multiple tool calls in sequence work Also added MockResponse helper methods: - text_then_native_tool(): text followed by native tool call - duplicate_native_tool_calls(): same tool call twice (for dedup testing) Fixed text_with_json_tool() to ensure "tool" key comes before "args" (serde_json alphabetizes keys, breaking pattern detection). Total: 18 integration tests covering historical bugs and core behaviors.	2026-01-19 14:44:30 +05:30
Dhanji R. Prasanna	292a3aa48d	Add MockProvider for integration testing Adds a configurable mock LLM provider that can simulate various behaviors: - Text-only responses (single or multi-chunk streaming) - Native tool calls - JSON tool calls in text - Truncated responses (max_tokens) - Multi-turn conversations Features: - Builder pattern for easy test setup - Request tracking for verification - Preset scenarios for common patterns - Full LLMProvider trait implementation Also adds integration tests that use MockProvider to test the stream_completion_with_tools code path, including: - test_butler_bug_scenario: reproduces the exact bug where text-only responses were not saved to context, causing consecutive user messages This enables testing complex streaming behaviors without real API calls.	2026-01-19 13:59:31 +05:30
Dhanji R. Prasanna	01cb4f6691	fix: use consistent max_tokens defaults across providers - Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens (8192) instead of provider-specific defaults - Update fallback_default_max_tokens from 8192 to 32000 - Set provider-specific max_tokens defaults: - Anthropic: 32000 - OpenAI: 32000 (was 16000) - Databricks: 32000 (was 50000, now matches Anthropic as passthru) - Embedded: 2048 - Context window lengths unchanged: - OpenAI: 400,000 - Anthropic: 200,000 - Databricks (Claude): 200,000 This fixes the 'LLM response was cut off due to max_tokens limit' error in agent mode that occurred because 8192 was being used instead of 32000.	2026-01-16 07:05:57 +05:30
Dhanji R. Prasanna	0ae1a13cdb	feat: real-time tool call streaming indicator with blinking UI - Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback - New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active() - Refactor ConsoleUiWriter state to use atomics in ParsingHintState - Add tool_call_streaming field to CompletionChunk for provider hints - Anthropic provider sends streaming hints when tool name detected - New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active() Parser improvements: - Add is_json_invalidated() to detect false positive tool patterns - Fix tool result poisoning when file contents contain partial JSON - Unescaped newlines in strings or prose after JSON invalidates detection User sees ' ● tool_name \|' immediately when tool call starts streaming, with blinking indicator while args are received.	2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna	f4562cd4c9	config: default agent settings and provider override	2026-01-14 20:14:33 +05:30
Dhanji R. Prasanna	f415dbb84b	Fix ACD turn summary loss and add /dump command ACD (Aggressive Context Dehydration) fixes: - Fixed dehydrate_context() to extract turn summary from context window instead of using the passed-in final_response (which contained only the timing footer, not the actual LLM response) - Removed final_response parameter from dehydrate_context() since it now self-extracts the last assistant message as the summary - This ensures the actual turn summary is preserved after dehydration, not just the timing footer New /dump command: - Added /dump command to dump entire context window to tmp/ for debugging - Shows message index, role, kind, content length, and full content - Available in both console and machine modes UTF-8 safety: - Fixed truncate_to_word_boundary() to use character indices instead of byte indices, preventing panics on multi-byte UTF-8 characters - Added UTF-8 string slicing guidance to AGENTS.md Agent: g3	2026-01-12 05:13:02 +05:30
Dhanji R. Prasanna	e301075666	Fix panic on multi-byte chars in filter_json buffer truncation The buffer truncation code was slicing at a raw byte offset which could land in the middle of a multi-byte character (like emojis), causing a panic. Fixed by using char_indices() to find valid character boundaries. Also added stop_reason field to CompletionChunk initializers in tests to complete the stop_reason feature addition. - Fix byte boundary panic in filter_json.rs line 327 - Add test for multi-byte character handling - Update test files with missing stop_reason field	2026-01-09 15:20:57 +11:00
Dhanji R. Prasanna	3776ed847e	refactor: use shared streaming helpers in openai and embedded providers Agent: carmack openai.rs: - Use make_text_chunk() for streaming text content - Use make_final_chunk() for final completion chunk - Simplify tool_calls conversion logic embedded.rs: - Use make_text_chunk() for all 4 streaming text chunks - Use make_final_chunk() for final completion chunk - Remove unused CompletionChunk import Net reduction: 35 lines removed All tests pass. Behavior unchanged.	2026-01-07 13:01:03 +11:00
Dhanji R. Prasanna	2bf475960c	refactor: extract shared streaming utilities module Agent: carmack Create crates/g3-providers/src/streaming.rs with shared helpers: - decode_utf8_streaming(): Handle incomplete UTF-8 sequences in SSE streams - is_incomplete_json_error(): Detect incomplete vs malformed JSON - make_final_chunk(): Create finished completion chunks - make_text_chunk(): Create text content chunks - make_tool_chunk(): Create tool call chunks Refactor anthropic.rs: - Use shared decode_utf8_streaming (removes 15 lines of inline UTF-8 handling) - Use make_final_chunk, make_text_chunk, make_tool_chunk helpers - Reduces verbose CompletionChunk constructions throughout Refactor databricks.rs: - Remove local copies of streaming helpers (now uses shared module) - Reduces duplication between providers Net reduction: 118 lines removed, 16 lines added (including new module) All tests pass. Behavior unchanged.	2026-01-07 12:48:07 +11:00
Dhanji R. Prasanna	bb63050779	refactor: improve readability of streaming and file ops code Agent: carmack databricks.rs: - Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple - Add decode_utf8_streaming() helper for cleaner UTF-8 handling - Add is_incomplete_json_error() helper for JSON parse error detection - Add make_final_chunk() helper to reduce duplication - Add finalize_tool_calls() to convert accumulators to final format - Refactor parse_streaming_response from ~270 lines to ~100 lines - Reduce nesting depth from 8+ levels to 4 levels - Use early returns and let-else for cleaner control flow file_ops.rs: - Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table - Use match expression instead of nested if-else - Reduce extract_path_and_content from 44 lines to 20 lines All tests pass. Behavior unchanged.	2026-01-07 12:39:05 +11:00
Dhanji R. Prasanna	5d20da2609	Add 54 integration tests for CLI, tools, and message serialization New test files: - crates/g3-cli/tests/cli_integration_test.rs (14 tests) Blackbox CLI tests: help/version flags, argument validation, conflicting modes, flock mode requirements - crates/g3-core/tests/tool_execution_test.rs (20 tests) Tool call structure tests and unified diff application: read_file, write_file, str_replace, shell, background_process, todo, final_output, code_search, take_screenshot - crates/g3-providers/tests/message_serialization_test.rs (20 tests) Round-trip serialization tests for Message, MessageRole, CacheControl, and Tool types. Covers Unicode, special chars, and edge cases. All tests follow blackbox/integration-first principles with documentation of what they protect and intentionally do not assert.	2026-01-07 09:23:34 +11:00
Dhanji R. Prasanna	3601cc0547	Enhance read_image tool with magic byte detection and multi-image support - Fix media type detection using magic bytes instead of file extension - Correctly identifies JPEG files with .png extension (and vice versa) - Supports PNG, JPEG, GIF, and WebP formats - Add multi-image support with file_paths array parameter - Load multiple images in a single tool call - All images queued for LLM analysis - Enhanced CLI output: - Inline image preview via iTerm2 imgcat protocol (height=5) - Dimmed info line showing: path \| dimensions \| media type \| file size - Proper │ prefix alignment with tool output boxing - Human-readable file sizes (bytes, KB, MB) - Add image dimension extraction from file headers - PNG, JPEG, GIF, WebP dimension parsing - Add comprehensive tests for magic byte detection and dimensions	2025-12-26 11:19:37 +11:00
Dhanji R. Prasanna	3ece02ff31	fix: resolve compiler warnings across crates - Remove unused assignment to final_output_called (returns immediately after) - Mark cache_config field as #[allow(dead_code)] (reserved for future use) - Mark print_status_line method as #[allow(dead_code)] (reserved for future use)	2025-12-25 18:47:22 +11:00
Dhanji R. Prasanna	923def0ab2	Convert all INFO logs to DEBUG to reduce CLI noise Converted ~77 info! macro calls to debug! across the codebase to prevent log messages from interrupting the CLI experience during normal operation. Users can still see these logs by setting RUST_LOG=debug if needed. Affected crates: - g3-cli - g3-computer-control - g3-console - g3-core - g3-ensembles - g3-execution - g3-providers	2025-12-22 16:27:35 +11:00
Dhanji R. Prasanna	b4f6da6bf2	duplicate tool call bugfix	2025-12-19 15:24:03 +11:00
Jochen	ff8b3e7c7b	Implement planning mode	2025-12-09 17:03:53 +11:00
Jochen	4aa84e2144	disable thinking if there is no token budget	2025-12-09 16:45:28 +11:00
Jochen	fb2cf6f898	fix for thinking budget and hardcoded max token on summary	2025-12-09 12:41:52 +11:00
Jochen	ae16243f49	Fix temperature param + add thinking for anthropic The temperature param was not passed to the llm. Now support anthropic models in 'thinking' mode.	2025-12-02 17:24:55 +11:00
Dhanji R. Prasanna	8928fb92be	append instead of replace system msg	2025-11-29 16:13:00 +11:00
Jochen	52f78653b4	add context window monitor Writes the current context window to logs/current_context_window (uses a symlink to a session ID). This PR was unfortunately generated by a different LLM and did a ton of superficial reformating, it's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's ok.	2025-11-27 21:00:02 +11:00
Jochen	93dc4acf86	generate internal id (debugging only) NOT set to provider... Anthropic will reject a message with id	2025-11-27 18:30:42 +11:00
Jochen	ad198a8501	add code exploration fast start This tries to short-circuit multiple round-trips to llm for reading code. It's a precursor to trying to context engineer tailored to specific tasks. In initial experiments, it's only marginally faster than regular mode, and burns more tokens.	2025-11-25 22:51:32 +11:00
Jochen	9bffd8b1bf	cache_control removed from databricks	2025-11-19 12:15:49 +11:00
Jochen	bfee8040e9	regression tests added	2025-11-19 11:32:14 +11:00
Jochen	a150ba6a55	adds ttl to cache control	2025-11-18 23:23:49 +11:00
Jochen	296bf5a449	adds cache_control	2025-11-18 22:38:52 +11:00
Michael Neale	81cd956c20	allow openai to be used to name named compatible providers	2025-11-10 16:12:33 +11:00
Dhanji R. Prasanna	cef234d91a	more color	2025-11-06 13:51:58 +11:00
Dhanji R. Prasanna	d78732df14	colors	2025-11-06 13:41:06 +11:00
Dhanji R. Prasanna	d007e8f471	improve code_search nudge and increase anthropic tmieout	2025-11-05 15:05:29 +11:00

1 2

70 Commits