Fixes issues in the last 11 commits:
1. pending_research.rs: Fix flaky test_generate_id_uniqueness
- Replaced random u16 suffix with atomic counter for guaranteed uniqueness
- The timestamp+random approach could collide when generating IDs rapidly
- Now uses a static AtomicU32 counter that increments monotonically (see the sketch after this list)
2. embedded/adapters/glm.rs: Remove unused in_code_fence field
- Field was written but never read (dead code)
- Removed from struct definition, constructor, and reset()
3. embedded/adapters/glm.rs: Fix orphaned tests
- Two tests (test_strip_code_fences, test_code_fenced_tool_call) were
outside the #[cfg(test)] mod tests block
- Moved closing brace to include them in the test module
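A minimal sketch of the counter-based ID scheme, assuming a generate_id() helper in pending_research.rs (names and ID format are illustrative, not the actual implementation):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-wide monotonic counter; guarantees uniqueness even when many
// IDs are generated within the same millisecond.
static ID_COUNTER: AtomicU32 = AtomicU32::new(0);

/// Hypothetical ID generator: timestamp prefix for rough ordering,
/// atomic counter suffix for uniqueness.
fn generate_id() -> String {
    let millis = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_millis();
    let seq = ID_COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("pr-{millis}-{seq}")
}
```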
All 446 library tests pass.
Agent: fowler
GLM-4 models wrap tool calls in markdown code fences and inline backticks,
which prevents the streaming parser from detecting them. This adapter:
- Strips ```json and ``` code fence markers during streaming (see the sketch after this list)
- Strips inline backticks from tool call JSON
- Handles chunked streaming correctly (buffers potential fence lines)
- Transforms GLM native format (<|assistant|>tool_name) to g3 JSON format
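A rough illustration of the fence handling, assuming the adapter works on complete buffered lines (the real adapter also handles chunk boundaries and the native-format transform; these helper names are invented for the sketch):

```rust
const FENCE: &str = "```";

/// Drop lines that are only markdown fence markers so the downstream
/// tool-call parser sees bare JSON. The opening fence may carry a
/// language tag, e.g. "json".
fn strip_fence_line(line: &str) -> Option<&str> {
    let trimmed = line.trim();
    if trimmed.starts_with(FENCE) {
        return None;
    }
    Some(line)
}

/// Remove inline backticks that GLM sometimes wraps around tool-call JSON.
fn strip_inline_backticks(text: &str) -> String {
    text.replace('`', "")
}
```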
Also refactors embedded provider into module structure:
- embedded/mod.rs - module exports
- embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs)
- embedded/adapters/mod.rs - ToolFormatAdapter trait
- embedded/adapters/glm.rs - GLM-specific adapter
Includes 22 unit tests covering edge cases like nested JSON in strings,
chunk boundary handling, and false pattern detection.
Updates README to show GLM-4 9B now works (⭐⭐) for agentic tasks.
embedded.rs (937→789 lines, -16%):
- Extract duplicated inference setup into prepare_context() helper
- Extract stop sequence handling into find_stop_sequence() and truncate_at_stop_sequence() (sketched after this list)
- Add InferenceParams struct to consolidate request parameter extraction
- Add clear section markers for code organization
- Tests now use module-level format functions directly (no duplication)
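A possible shape for the stop-sequence helpers; the signatures are guesses based on the names above, not the actual code:

```rust
/// Byte offset of the earliest stop sequence occurring in `text`, if any.
fn find_stop_sequence(text: &str, stop_sequences: &[String]) -> Option<usize> {
    stop_sequences
        .iter()
        .filter_map(|s| text.find(s.as_str()))
        .min()
}

/// Cut `text` off at the earliest stop sequence; leaves it untouched
/// when no stop sequence is present.
fn truncate_at_stop_sequence(text: &mut String, stop_sequences: &[String]) {
    if let Some(pos) = find_stop_sequence(text, stop_sequences) {
        text.truncate(pos);
    }
}
```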
gemini.rs:
- Extract common request building into build_request() method
- Reduces duplication between complete() and stream() methods
All 399 unit tests pass. Behavior unchanged.
Agent: carmack
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping (see the sketch below)
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
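An approximate sketch of the role mapping and system-instruction extraction; the Message type and helper are simplified stand-ins, while the JSON shape follows the public Gemini API (contents/parts, system_instruction):

```rust
use serde_json::{json, Value};

/// Simplified message type used only for this sketch.
struct Message {
    role: String, // "system" | "user" | "assistant"
    content: String,
}

/// Build a Gemini-style request body: system messages are pulled out into
/// system_instruction, and the assistant role is mapped to "model".
fn to_gemini_body(messages: &[Message]) -> Value {
    let mut system_parts = Vec::new();
    let mut contents = Vec::new();
    for m in messages {
        match m.role.as_str() {
            "system" => system_parts.push(json!({ "text": m.content })),
            role => {
                let mapped = if role == "assistant" { "model" } else { "user" };
                contents.push(json!({
                    "role": mapped,
                    "parts": [{ "text": m.content }]
                }));
            }
        }
    }
    let mut body = json!({ "contents": contents });
    if !system_parts.is_empty() {
        body["system_instruction"] = json!({ "parts": system_parts });
    }
    body
}
```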
- Add context_window_size() method to LLMProvider trait (see the sketch below)
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
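A simplified sketch of the trait extension (the real LLMProvider trait has many more methods; the field name is an assumption):

```rust
/// Only the new method is shown; the real trait is much larger.
trait LLMProvider {
    /// Context window, in tokens, of the currently configured model.
    fn context_window_size(&self) -> usize;
}

struct EmbeddedProvider {
    /// Auto-detected at model load time.
    detected_context_length: usize,
}

impl LLMProvider for EmbeddedProvider {
    fn context_window_size(&self) -> usize {
        self.detected_context_length
    }
}
```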
- Use a global OnceLock for the llama.cpp backend to prevent the BackendAlreadyInitialized error (see the sketch below)
- Suppress verbose llama.cpp stderr logging during model loading
- Fix provider validation to accept "embedded.name" format (extract type before dot)
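The OnceLock pattern in question, shown with a placeholder Backend type standing in for the llama.cpp backend handle from the bindings crate:

```rust
use std::sync::OnceLock;

/// Placeholder for the real backend handle type.
struct Backend;

impl Backend {
    fn init() -> Backend {
        // In the real code this is the bindings' one-time initialization,
        // which fails with BackendAlreadyInitialized if called twice.
        Backend
    }
}

static BACKEND: OnceLock<Backend> = OnceLock::new();

/// All model loads go through this accessor, so the backend is initialized
/// exactly once per process no matter how many providers are constructed.
fn backend() -> &'static Backend {
    BACKEND.get_or_init(Backend::init)
}
```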
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls (see the sketch after this list)
- Add "Prompt Cache Statistics" section to /stats output showing:
- API call count and cache hit count
- Hit rate percentage
- Total input tokens and cache read/creation tokens
- Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
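A plausible shape for the cumulative tracking and the derived percentages shown by /stats (field names are assumptions):

```rust
/// Cumulative prompt-cache statistics across API calls (sketch).
#[derive(Default)]
struct CacheStats {
    api_calls: u64,
    cache_hits: u64,
    input_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

impl CacheStats {
    /// Hit rate: percentage of API calls that read anything from the cache.
    fn hit_rate(&self) -> f64 {
        if self.api_calls == 0 {
            return 0.0;
        }
        self.cache_hits as f64 / self.api_calls as f64 * 100.0
    }

    /// Cache efficiency: percentage of input tokens served from the cache.
    fn efficiency(&self) -> f64 {
        if self.input_tokens == 0 {
            return 0.0;
        }
        self.cache_read_tokens as f64 / self.input_tokens as f64 * 100.0
    }
}
```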
Added 6 new integration tests for stream_completion_with_tools:
- test_text_before_tool_call_preserved: text before native tool call is saved
- test_native_tool_call_execution: native tool calls execute correctly
- test_duplicate_tool_calls_skipped: sequential duplicates are detected
- test_json_fallback_tool_calling: JSON tool calls work without native support
- test_text_after_tool_execution_preserved: follow-up text is saved
- test_multiple_tool_calls_executed: multiple tool calls in sequence work
Also added MockResponse helper methods:
- text_then_native_tool(): text followed by native tool call
- duplicate_native_tool_calls(): same tool call twice (for dedup testing)
Fixed text_with_json_tool() to ensure the "tool" key comes before "args"
(serde_json alphabetizes keys, breaking pattern detection; see the sketch below).
Total: 18 integration tests covering historical bugs and core behaviors.
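A sketch of the ordering fix: building the embedded tool-call JSON as a raw string keeps "tool" ahead of "args", whereas serde_json's default map type sorts keys alphabetically (the helper's exact shape is illustrative):

```rust
/// Embed a JSON tool call in text with "tool" before "args", which the
/// streaming pattern detector relies on. `args` is assumed to already be
/// valid JSON.
fn text_with_json_tool(text: &str, tool: &str, args: &str) -> String {
    format!(r#"{text} {{"tool": "{tool}", "args": {args}}}"#)
}
```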
Adds a configurable mock LLM provider that can simulate various behaviors:
- Text-only responses (single or multi-chunk streaming)
- Native tool calls
- JSON tool calls in text
- Truncated responses (max_tokens)
- Multi-turn conversations
Features:
- Builder pattern for easy test setup (see the sketch after this list)
- Request tracking for verification
- Preset scenarios for common patterns
- Full LLMProvider trait implementation
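A minimal sketch of the builder-style setup (type and method names are illustrative; the real MockProvider implements the full LLMProvider trait and supports all the behaviors listed above):

```rust
/// Responses the mock will play back, in order.
enum MockResponse {
    Text(String),
    NativeToolCall { name: String, args: String },
}

/// Builder used by tests to script the mock's behavior.
#[derive(Default)]
struct MockProviderBuilder {
    responses: Vec<MockResponse>,
}

impl MockProviderBuilder {
    fn text(mut self, s: &str) -> Self {
        self.responses.push(MockResponse::Text(s.to_string()));
        self
    }

    fn native_tool_call(mut self, name: &str, args: &str) -> Self {
        self.responses.push(MockResponse::NativeToolCall {
            name: name.to_string(),
            args: args.to_string(),
        });
        self
    }

    fn build(self) -> MockProvider {
        MockProvider { responses: self.responses, requests: Vec::new() }
    }
}

struct MockProvider {
    responses: Vec<MockResponse>,
    /// Requests recorded for later verification in tests.
    requests: Vec<String>,
}
```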
Also adds integration tests that use MockProvider to test the
stream_completion_with_tools code path, including:
- test_butler_bug_scenario: reproduces the exact bug where text-only
responses were not saved to context, causing consecutive user messages
This enables testing complex streaming behaviors without real API calls.
- Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens
(8192) instead of provider-specific defaults
- Update fallback_default_max_tokens from 8192 to 32000
- Set provider-specific max_tokens defaults:
- Anthropic: 32000
- OpenAI: 32000 (was 16000)
- Databricks: 32000 (was 50000, now matches Anthropic as passthrough)
- Embedded: 2048
- Context window lengths unchanged:
- OpenAI: 400,000
- Anthropic: 200,000
- Databricks (Claude): 200,000
This fixes the 'LLM response was cut off due to max_tokens limit' error
in agent mode, which occurred because 8192 was being used instead of 32000;
the resolution order is sketched below.
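The constants mirror the values listed above; the function shape itself is an assumption:

```rust
/// Global fallback, raised from 8192 to 32000.
const FALLBACK_DEFAULT_MAX_TOKENS: u32 = 32_000;

/// Resolve max_tokens: an explicit request value wins, otherwise the
/// provider-specific default, otherwise the global fallback.
fn resolve_max_tokens(requested: Option<u32>, provider: &str) -> u32 {
    requested.unwrap_or_else(|| provider_default_max_tokens(provider))
}

fn provider_default_max_tokens(provider: &str) -> u32 {
    match provider {
        "anthropic" | "openai" | "databricks" => 32_000,
        "embedded" => 2_048,
        _ => FALLBACK_DEFAULT_MAX_TOKENS,
    }
}
```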
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback (sketched below)
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()
Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- Unescaped newlines inside strings, or prose following the JSON, now invalidate detection
The user sees ' ● tool_name |' immediately when a tool call starts streaming,
with a blinking indicator while the args are received.
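The hint states described above, in sketch form (the Active payload is an assumption):

```rust
/// UI feedback states emitted while a tool call is being parsed from the
/// stream (sketch).
enum ToolParsingHint {
    /// A tool-call pattern was spotted but the name isn't complete yet.
    Detected,
    /// The tool name is known and arguments are still streaming; the UI
    /// shows " ● tool_name |" with a blinking indicator.
    Active { tool_name: String },
    /// The full tool call has been parsed.
    Complete,
}
```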
ACD (Aggressive Context Dehydration) fixes:
- Fixed dehydrate_context() to extract turn summary from context window
instead of using the passed-in final_response (which contained only
the timing footer, not the actual LLM response)
- Removed final_response parameter from dehydrate_context() since it
now self-extracts the last assistant message as the summary
- This ensures the actual turn summary is preserved after dehydration,
not just the timing footer
New /dump command:
- Added /dump command to dump entire context window to tmp/ for debugging
- Shows message index, role, kind, content length, and full content
- Available in both console and machine modes
UTF-8 safety:
- Fixed truncate_to_word_boundary() to use character indices instead of
byte indices, preventing panics on multi-byte UTF-8 characters
- Added UTF-8 string slicing guidance to AGENTS.md
Agent: g3
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.
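A boundary-safe truncation sketch of the kind of fix described (not the exact code in filter_json.rs):

```rust
/// Truncate `s` to at most `max_bytes` without splitting a multi-byte
/// character: walk char_indices() and stop at the last character start
/// that does not exceed the limit.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = 0;
    for (idx, _) in s.char_indices() {
        if idx > max_bytes {
            break;
        }
        end = idx;
    }
    &s[..end]
}
```

Slicing at `end` is always safe because char_indices() only yields character start offsets, which are valid boundaries.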
Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.
- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
Agent: carmack
openai.rs:
- Use make_text_chunk() for streaming text content
- Use make_final_chunk() for final completion chunk
- Simplify tool_calls conversion logic
embedded.rs:
- Use make_text_chunk() for all 4 streaming text chunks
- Use make_final_chunk() for final completion chunk
- Remove unused CompletionChunk import
Net reduction: 35 lines removed
All tests pass. Behavior unchanged.
Agent: carmack
databricks.rs:
- Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple (see the sketch after this list)
- Add decode_utf8_streaming() helper for cleaner UTF-8 handling
- Add is_incomplete_json_error() helper for JSON parse error detection
- Add make_final_chunk() helper to reduce duplication
- Add finalize_tool_calls() to convert accumulators to final format
- Refactor parse_streaming_response from ~270 lines to ~100 lines
- Reduce nesting depth from 8+ levels to 4 levels
- Use early returns and let-else for cleaner control flow
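A rough outline of the accumulator that replaces the tuple (field names assume OpenAI-style streaming deltas carrying id/name/argument fragments):

```rust
/// Accumulates one tool call across streaming deltas; replaces the old
/// opaque (String, String, String) tuple. Sketch only.
#[derive(Default)]
struct ToolCallAccumulator {
    id: String,
    name: String,
    /// JSON argument fragments concatenated as they arrive.
    arguments: String,
}

impl ToolCallAccumulator {
    fn push_delta(&mut self, id: Option<&str>, name: Option<&str>, args: Option<&str>) {
        if let Some(id) = id {
            self.id.push_str(id);
        }
        if let Some(name) = name {
            self.name.push_str(name);
        }
        if let Some(args) = args {
            self.arguments.push_str(args);
        }
    }
}
```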
file_ops.rs:
- Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table (see the sketch after this list)
- Use match expression instead of nested if-else
- Reduce extract_path_and_content from 44 lines to 20 lines
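The declarative table idea, sketched with illustrative key names (the real table in file_ops.rs may use different keys):

```rust
use serde_json::Value;

/// (path key, content key) pairs used by different tool argument shapes;
/// a single lookup replaces the old chain of if-let blocks.
const PATH_CONTENT_KEYS: &[(&str, &str)] = &[
    ("path", "content"),
    ("file_path", "content"),
    ("filename", "text"),
];

fn extract_path_and_content(args: &Value) -> Option<(String, String)> {
    PATH_CONTENT_KEYS.iter().find_map(|(path_key, content_key)| {
        let path = args.get(*path_key)?.as_str()?;
        let content = args.get(*content_key)?.as_str()?;
        Some((path.to_string(), content.to_string()))
    })
}
```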
All tests pass. Behavior unchanged.
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.
Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
Writes the current context window to logs/current_context_window (a symlink keyed to the session ID).
This PR was unfortunately generated by a different LLM and did a ton of superficial reformatting. It's actually a fairly small and benign change, but I don't want to roll back everything. Hope that's OK.
This tries to short-circuit multiple round-trips to the LLM when reading code.
It's a precursor to context engineering tailored to specific tasks.
In initial experiments it's only marginally faster than regular mode, and it burns more tokens.