Commit Graph

732 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
51f12769d5 Merge sessions/hopper/297c7be9 2026-01-30 14:30:53 +11:00
Dhanji R. Prasanna
58bbfde6f4 test: add integration tests for streaming parser stuttering bug fix
Add characterization tests for the streaming parser stuttering bug fix (fa3c920).
These tests verify that when an LLM "stutters" and emits incomplete tool call
fragments followed by complete tool calls, the parser:

1. Does not get stuck waiting for the incomplete fragment to complete
2. Successfully parses complete tool calls that appear after the fragment

Tests cover:
- The exact pattern from butler session butler_c6ab59af2e4f991c
- Edge cases that should NOT trigger invalidation (nested JSON, patterns in strings)
- Recovery behavior after reset
- Multiple complete tool calls
- Boundary conditions (chunk boundaries, minimal patterns)

Agent: hopper
2026-01-30 14:30:27 +11:00
Dhanji R. Prasanna
3003bdebaa refactor: fix flaky test and remove dead code in recent commits
Fixes issues in the last 11 commits:

1. pending_research.rs: Fix flaky test_generate_id_uniqueness
   - Replaced random u16 suffix with atomic counter for guaranteed uniqueness
   - The timestamp+random approach could collide when generating IDs rapidly
   - Now uses static AtomicU32 counter that increments monotonically

2. embedded/adapters/glm.rs: Remove unused in_code_fence field
   - Field was written but never read (dead code)
   - Removed from struct definition, constructor, and reset()

3. embedded/adapters/glm.rs: Fix orphaned tests
   - Two tests (test_strip_code_fences, test_code_fenced_tool_call) were
     outside the #[cfg(test)] mod tests block
   - Moved closing brace to include them in the test module

All 446 library tests pass.

Agent: fowler
2026-01-30 14:28:43 +11:00
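The atomic-counter scheme described in that commit can be sketched as follows (names and the prefix format are illustrative, not the actual pending_research.rs code):

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU32, Ordering};

// Monotonic counter: unlike a timestamp + random u16 suffix, rapid
// back-to-back calls can never produce the same suffix.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn generate_id(prefix: &str) -> String {
    // fetch_add returns the previous value, so every call sees a unique number
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("{}_{}", prefix, n)
}

fn main() {
    let ids: HashSet<String> = (0..10_000).map(|_| generate_id("research")).collect();
    // Guaranteed uniqueness regardless of how fast IDs are generated
    assert_eq!(ids.len(), 10_000);
}
```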
Dhanji R. Prasanna
6bb07ce4f5 Merge sessions/interactive/3c2a09df 2026-01-30 14:20:12 +11:00
Dhanji R. Prasanna
f1a5241777 Add /research <id> and /research latest commands
Allow users to view research reports directly from the CLI:

- /research - List all research tasks (unchanged)
- /research <id> - View the full report for a specific research task
- /research latest - View the most recent completed research report

Report display includes query, status, elapsed time, and full content.
2026-01-30 14:06:28 +11:00
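A minimal sketch of the subcommand dispatch this commit describes (the function and the returned action names are hypothetical, not the real CLI code):

```rust
// Route the /research argument to one of the three behaviors listed above:
// no argument lists tasks, "latest" shows the newest completed report,
// anything else is treated as a research id.
fn dispatch_research(arg: Option<&str>) -> &'static str {
    match arg {
        None => "list",
        Some("latest") => "show-latest",
        Some(_) => "show-by-id",
    }
}

fn main() {
    assert_eq!(dispatch_research(None), "list");
    assert_eq!(dispatch_research(Some("latest")), "show-latest");
    assert_eq!(dispatch_research(Some("abc123")), "show-by-id");
}
```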
Dhanji R. Prasanna
fa3c9203e0 Fix streaming parser bug: detect abandoned tool call fragments
When the LLM 'stutters' and emits incomplete tool call fragments like:
  {"tool": "shell", "args": {...}}
  {"tool":
  {"tool": "shell", "args": {...}}

The parser would get stuck waiting for the incomplete fragment to complete,
causing the entire response to be lost (no tool executed, no text displayed).

This was observed in butler session butler_c6ab59af2e4f991c where the user's
'send!' command produced no response.

Fix: Enhanced is_json_invalidated() to detect when a new tool call pattern
({"tool"}) appears after a newline while parsing an incomplete JSON fragment.
This indicates the previous fragment was abandoned and should be invalidated.

Safety:
- Tool patterns inside JSON strings (e.g., writing example code) are not
  affected because the check only runs outside strings
- Added tests for the stuttering pattern and the file-writing edge case
2026-01-30 14:00:18 +11:00
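The abandonment check can be sketched roughly as below. This is a simplification of the real `is_json_invalidated()` (ASCII-only, no recovery logic): while an incomplete fragment is buffered, a fresh `{"tool"` after a newline, outside any JSON string, marks the fragment as abandoned.

```rust
// Returns true when the buffered fragment was abandoned: a new `{"tool"`
// pattern appears after a newline while we are not inside a JSON string.
// ASCII-oriented sketch; the real parser also tracks brace depth.
fn fragment_abandoned(buffer: &str) -> bool {
    let mut in_string = false;
    let mut escaped = false;
    let bytes = buffer.as_bytes();
    // Start at 1: the fragment's own opening brace at index 0 never counts
    for i in 1..bytes.len() {
        let c = bytes[i] as char;
        if escaped { escaped = false; continue; }
        match c {
            '\\' if in_string => escaped = true,
            '"' => in_string = !in_string,
            '{' if !in_string && bytes[i - 1] == b'\n' => {
                // A restart of the tool-call pattern signals abandonment
                if buffer[i..].starts_with("{\"tool\"") {
                    return true;
                }
            }
            _ => {}
        }
    }
    false
}

fn main() {
    // Stutter: incomplete fragment followed by a fresh, complete tool call
    assert!(fragment_abandoned("{\"tool\":\n{\"tool\": \"shell\"}"));
    // The same pattern inside a JSON string must not trigger invalidation
    assert!(!fragment_abandoned("{\"content\": \"x\n{\\\"tool\\\" y\"}"));
}
```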
Dhanji R. Prasanna
f93d05f444 Add real-time research completion notifications
When background research completes, g3 now immediately prints a status
message instead of waiting for the next user interaction:

- Added ResearchCompletionNotification and broadcast channel to
  PendingResearchManager for push-based notifications
- Added spawn_research_notification_handler() in interactive mode that
  listens for completions in a background task
- When idle (at prompt): clears line, prints status, reprints prompt
- When busy (processing): prints status inline (interleaving is fine)
- Added G3Status::research_complete() for consistent formatting
- Added enable_research_notifications() method to Agent

Output format: "g3: 1 research report ... [done]"
2026-01-30 13:35:35 +11:00
Dhanji R. Prasanna
5428504777 Fix input formatting bugs: newline, line wrapping, and TTY check
Fixes three bugs in the input formatter introduced in 4e16942:

1. Bug 2 & 3 (missing newline, line duplication):
   - Changed print! to println! to add trailing newline
   - Calculate visual lines based on terminal width instead of
     logical line count, fixing duplication for wrapped lines

2. Bug 1 (^M on non-interactive prompts):
   - Added TTY check to skip formatting when stdout is not a terminal
   - Prevents terminal state corruption for stdin prompts
2026-01-30 13:28:31 +11:00
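The wrapped-line fix reduces to counting terminal rows per logical line: ceil(cells / width), one cell per character assumed (the real formatter would also need to handle wide glyphs). A sketch:

```rust
// Number of terminal rows the input occupies at the given width.
// Each logical line contributes ceil(len / term_width) rows, minimum 1.
fn visual_lines(input: &str, term_width: usize) -> usize {
    input
        .lines()
        .map(|line| {
            let cells = line.chars().count();
            if cells == 0 { 1 } else { (cells + term_width - 1) / term_width }
        })
        .sum::<usize>()
        .max(1)
}

fn main() {
    assert_eq!(visual_lines("hello", 80), 1);
    // A 100-char logical line wraps onto two rows of an 80-column terminal,
    // which is where the logical-line count went wrong
    assert_eq!(visual_lines(&"x".repeat(100), 80), 2);
    assert_eq!(visual_lines("a\nb", 80), 2);
}
```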
Dhanji R. Prasanna
b252ff443d Merge sessions/interactive/9681cb67 2026-01-30 13:01:00 +11:00
Dhanji R. Prasanna
5ab1598e03 feat: async research tool - runs in background, returns immediately
The research tool now spawns the scout agent in a background tokio task
and returns immediately with a research_id placeholder. This allows the
agent to continue working while research runs (30-120 seconds).

Key changes:
- New PendingResearchManager for tracking async research tasks
- research tool returns immediately with placeholder containing research_id
- research_status tool to check progress of pending research
- Auto-injection of completed research at natural break points:
  - Start of each tool iteration (before LLM call)
  - Before prompting user in interactive mode
- /research CLI command to list all research tasks
- Updated system prompt to explain async behavior

The agent can:
- Continue with other work while research runs
- Check status with research_status tool
- Yield turn to user if results are critical before continuing
2026-01-30 13:00:02 +11:00
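The spawn-and-poll shape of this design can be sketched with std threads standing in for tokio tasks (all names here are illustrative, not the real PendingResearchManager API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Background work is registered as None, then filled in by the worker;
// callers get the id back immediately and poll for completion.
#[derive(Clone, Default)]
struct PendingResearch {
    results: Arc<Mutex<HashMap<u32, Option<String>>>>,
}

impl PendingResearch {
    fn spawn(&self, id: u32, query: String) -> u32 {
        self.results.lock().unwrap().insert(id, None);
        let results = Arc::clone(&self.results);
        thread::spawn(move || {
            // Stand-in for the 30-120 second scout agent run
            let report = format!("report for: {}", query);
            results.lock().unwrap().insert(id, Some(report));
        });
        id // returned immediately; the caller keeps working
    }

    // None = unknown id, Some(None) = pending, Some(Some(_)) = complete
    fn status(&self, id: u32) -> Option<Option<String>> {
        self.results.lock().unwrap().get(&id).cloned()
    }
}

fn main() {
    let mgr = PendingResearch::default();
    let id = mgr.spawn(1, "rust OnceLock".to_string());
    // Poll until the background task completes (research_status in the real tool)
    while mgr.status(id) == Some(None) {
        thread::yield_now();
    }
    assert_eq!(mgr.status(id).flatten().as_deref(), Some("report for: rust OnceLock"));
}
```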
Dhanji R. Prasanna
4e1694248f Add input formatting for interactive CLI
When users type prompts in interactive mode, the input is now
reformatted in place with enhanced highlighting:

- ALL CAPS words (2+ chars) become bold green (e.g., FIX, BUG, HTTP2)
- Quoted text ("..." or '...') becomes cyan
- Standard markdown formatting is also supported

New module: input_formatter.rs with 10 unit tests
Integrated into interactive.rs for both single-line and multiline input
2026-01-30 12:03:36 +11:00
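The ALL-CAPS rule from this commit can be sketched as below; the function name and exact ANSI codes are assumptions, not the real input_formatter.rs API:

```rust
// Wrap words of 2+ uppercase/digit characters (at least one letter) in
// ANSI bold green, leaving everything else untouched.
fn highlight_caps(input: &str) -> String {
    input
        .split(' ')
        .map(|word| {
            let is_caps = word.len() >= 2
                && word.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
                && word.chars().any(|c| c.is_ascii_uppercase());
            if is_caps {
                format!("\x1b[1;32m{}\x1b[0m", word) // bold green
            } else {
                word.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    assert_eq!(highlight_caps("please FIX this"), "please \x1b[1;32mFIX\x1b[0m this");
    // Digits are allowed so tokens like HTTP2 still qualify
    assert!(highlight_caps("use HTTP2 now").contains("\x1b[1;32mHTTP2\x1b[0m"));
    // Single letters and lowercase words are untouched
    assert_eq!(highlight_caps("a bug"), "a bug");
}
```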
Dhanji R. Prasanna
2e21502357 Fix --project flag not working in agent mode
- Add CommonFlags struct to group flags that apply across all modes
- Refactor run_agent_mode() to accept CommonFlags instead of individual params
- Add project loading logic for agent chat mode
- Add integration tests for --project with agent mode

This refactor prevents future bugs where new flags work in one mode
but are forgotten in another.
2026-01-30 11:28:48 +11:00
Dhanji R. Prasanna
51d22b3282 gemini model perf 2026-01-30 10:09:46 +11:00
Dhanji R. Prasanna
8191a5e8e6 feat(embedded): add GLM tool format adapter for code fence stripping
GLM-4 models wrap tool calls in markdown code fences and inline backticks,
which prevents the streaming parser from detecting them. This adapter:

- Strips ```json and ``` code fence markers during streaming
- Strips inline backticks from tool call JSON
- Handles chunked streaming correctly (buffers potential fence lines)
- Transforms GLM native format (<|assistant|>tool_name) to g3 JSON format

Also refactors embedded provider into module structure:
- embedded/mod.rs - module exports
- embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs)
- embedded/adapters/mod.rs - ToolFormatAdapter trait
- embedded/adapters/glm.rs - GLM-specific adapter

Includes 22 unit tests covering edge cases like nested JSON in strings,
chunk boundary handling, and false pattern detection.

Updates README to show GLM-4 9B now works for agentic tasks.
2026-01-29 12:52:09 +11:00
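The fence-stripping idea reduces to dropping lines that are pure code-fence markers. A minimal sketch (the real adapter also buffers chunks across boundaries and strips inline backticks):

```rust
// Remove lines that are nothing but a code-fence marker (``` or ```json),
// leaving the tool-call JSON for the streaming parser to detect.
fn strip_code_fences(text: &str) -> String {
    text.lines()
        .filter(|line| !line.trim().starts_with("```"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let wrapped = "```json\n{\"tool\": \"shell\", \"args\": {}}\n```";
    assert_eq!(strip_code_fences(wrapped), "{\"tool\": \"shell\", \"args\": {}}");
    // Text without fences passes through unchanged
    assert_eq!(strip_code_fences("plain text"), "plain text");
}
```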
Dhanji R. Prasanna
457ba35f80 docs: Fix documentation accuracy and add missing Gemini provider
Corrections made:
- docs/architecture.md: Fix crate count from 9 to 8 (actual count)
- docs/tools.md: Fix code_search supported languages (kotlin -> haskell, scheme, racket)
- docs/CODE_SEARCH.md: Add missing Haskell and Scheme to supported languages list
- docs/providers.md: Add complete Gemini provider documentation section
- docs/configuration.md: Add Gemini configuration section

The Gemini provider (crates/g3-providers/src/gemini.rs) was fully implemented
but not documented. The code_search tool actually supports haskell and scheme
(via tree-sitter) but documentation incorrectly listed kotlin.

Agent: lamport
2026-01-29 12:06:53 +11:00
Dhanji R. Prasanna
f9e0b94cc1 tiny tweak 2026-01-29 12:02:11 +11:00
Dhanji R. Prasanna
853237e62e Update dependency analysis artifacts
Generated comprehensive static dependency analysis for g3 workspace:

- graph.json: 108 nodes (9 crates, 99 files), 186 edges
- graph.summary.md: Overview with metrics, entrypoints, fan-in/fan-out rankings
- sccs.md: No cycles detected (DAG structure confirmed)
- layers.observed.md: 4-layer crate hierarchy identified
- hotspots.md: ui_writer.rs (15 fan-in), agent_mode.rs (13 fan-out) as key nodes
- limitations.md: Documents extraction methodology and caveats

Updated AGENTS.md with artifact documentation table.

Agent: euler
2026-01-29 11:46:39 +11:00
Dhanji R. Prasanna
cba7d31996 Merge sessions/carmack/ee92b215 2026-01-29 11:40:48 +11:00
Dhanji R. Prasanna
d4941dc95a refactor(providers): improve readability of embedded.rs and gemini.rs
embedded.rs (937→789 lines, -16%):
- Extract duplicated inference setup into prepare_context() helper
- Extract stop sequence handling into find_stop_sequence() and truncate_at_stop_sequence()
- Add InferenceParams struct to consolidate request parameter extraction
- Add clear section markers for code organization
- Tests now use module-level format functions directly (no duplication)

gemini.rs:
- Extract common request building into build_request() method
- Reduces duplication between complete() and stream() methods

All 399 unit tests pass. Behavior unchanged.

Agent: carmack
2026-01-29 11:39:46 +11:00
Dhanji R. Prasanna
cb3c523edf Compact workspace memory: -7.5% size, all concepts preserved
Transformations applied:
- Fixed incorrect line numbers in Streaming Utilities (IterationState 65→166, StreamingState 17→16)
- Updated file sizes with verified byte counts (context_window.rs, streaming.rs, compaction.rs, acd.rs)
- Tightened verbose descriptions throughout
- Removed redundant "Format" column from Chat Template table
- Shortened download command (python3 -m huggingface_hub... → huggingface-cli)
- Collapsed "Known issues" log-style narrative in Embedded Provider
- Removed filler words and redundant explanations

Metrics: 224→212 lines (-5%), 12581→11630 chars (-7.5%)
All 26 semantic entries preserved.

Agent: huffman
2026-01-29 11:38:53 +11:00
Dhanji R. Prasanna
1bff9b0025 huffman tweak to cover more ground 2026-01-29 11:36:09 +11:00
Dhanji R. Prasanna
653c5f72ac Compact workspace memory: 402→224 lines (-44%), 22k→12.6k chars (-43%)
Merged duplicate entries:
- Context Window & Compaction + Context Compaction → unified section
- Streaming Markdown Formatter + Code Blocks → single entry
- CLI Argument Parsing + CLI Entry Points + CLI Module Structure → CLI Module Structure
- Auto-Memory Feature + Tool Call Tracking + Auto-Memory Reminder Format → Auto-Memory System
- Agent Mode folded into CLI Module Structure

Tightened verbose sections:
- UTF-8 pattern: removed 10-line code example, kept pattern + danger zones
- ACD Fragment Storage: replaced 15-line JSON with inline field list
- GLM-4 downloads: replaced 12-line bash with table + single download template

Entry count: 37 → 26 (-30%)
All char ranges, function names, and gotchas preserved.

Agent: huffman
2026-01-29 11:34:17 +11:00
Dhanji R. Prasanna
bd4473b75f model performance tweaks to readme 2026-01-29 11:31:29 +11:00
Dhanji R. Prasanna
1bff9d5dcc tiny tweaks to huffman 2026-01-29 11:31:17 +11:00
Dhanji R. Prasanna
7cf9c3b7bb Merge sessions/hopper/8e287188 2026-01-29 11:30:54 +11:00
Dhanji R. Prasanna
21f8d5a1aa Add integration tests for CacheStats and Gemini serialization
Agent: hopper

Added two new integration test files:

1. cache_stats_integration_test.rs (g3-core)
   - Tests CacheStats accumulation through streaming completion flow
   - Verifies cache hit detection (cache_read_tokens > 0)
   - Tests multi-request accumulation of cache statistics
   - Verifies cache efficiency and hit rate calculations
   - Uses MockProvider to simulate provider usage data

2. gemini_serialization_test.rs (g3-providers)
   - Tests Gemini API message format conversion
   - Verifies system messages become system_instruction
   - Verifies assistant role maps to "model" (Gemini terminology)
   - Tests tool conversion to function_declarations format
   - Characterizes multi-system-message behavior (last wins)

Both test files follow blackbox/integration testing principles:
- Test observable behavior through stable surfaces
- Do not assert internal implementation details
- Include documentation of what is/is not asserted
2026-01-29 11:28:52 +11:00
Dhanji R. Prasanna
570a824780 Rename archivist agent to huffman
Named after David Huffman, inventor of Huffman coding -
compression that preserves information with fewer bits.

Fits the agent's purpose: compact memory, preserve semantics.
2026-01-29 11:22:59 +11:00
Dhanji R. Prasanna
627dd45966 Add archivist to built-in agents list in README 2026-01-29 11:20:23 +11:00
Dhanji R. Prasanna
b45ff37b68 Add archivist agent for memory compaction and signal optimization
New agent that maintains workspace memory quality:
- Deduplicates entries within memory
- Tightens verbose phrasing to terse declarations
- Collapses log-style narratives to current-state facts
- Removes AGENTS.md ↔ Memory duplication
- Ports code locations from AGENTS.md to Memory

Goal: increase signal, reduce noise, preserve all semantic information.

Agent: archivist
2026-01-29 11:19:47 +11:00
Dhanji R. Prasanna
56f558dc1b Fix compiler warnings in test files
Eliminate unused variable and import warnings across test files:
- streaming_parser_test.rs: prefix unused `tools` with underscore
- webdriver_session.rs: remove unused `use super::*` import
- mock_provider_integration_test.rs: prefix unused `result` and `task_result`
- test_preflight_max_tokens.rs: prefix unused `proposed_max`
- todo_staleness_test.rs: add #[allow(dead_code)] for test helper methods
- json_parsing_stress_test.rs: prefix unused `tools`
- read_file_token_limit_test.rs: add #[allow(dead_code)] for unused helper
- background_process_demo_test.rs: remove unused PathBuf import
- test_session_continuation.rs: prefix unused `temp_dir` in 7 tests

All tests pass. No behavior changes.

Agent: fowler
2026-01-29 11:15:10 +11:00
Dhanji R. Prasanna
5c1e0630b5 Merge sessions/interactive/664ee473 2026-01-29 11:14:28 +11:00
Dhanji R. Prasanna
9a998e201a Tighten AGENTS.md: remove redundant content covered by Memory
Removed sections that duplicate Workspace Memory:
- Recommended Entry Points (Memory has precise file/line locations)
- For Debugging paths (Memory has session/error log details)
- Dependency Analysis Artifacts (reference info, not actionable)

Kept essential guardrails:
- Critical Invariants (MUST/MUST NOT rules)
- Dangerous Code Paths (risk warnings, not locations)
- Do/Don't coding standards
- Common Incorrect Assumptions

Reduction: 125 lines → 69 lines (~45% smaller, ~650 tokens saved)
2026-01-29 11:13:25 +11:00
Dhanji R. Prasanna
7bfb9efa19 Remove automatic README loading from context window
README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.

Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names

The LLM can still read README.md on-demand via read_file when needed.
2026-01-29 11:07:41 +11:00
Dhanji R. Prasanna
5ea43d7b39 Add --project CLI flag for loading projects at startup
Adds a new --project <PATH> flag that loads project files (brief.md,
contacts.yaml, status.md) at startup, similar to the /project command
but WITHOUT auto-executing the project status prompt.

Changes:
- Add --project flag to cli_args.rs
- Add load_and_validate_project() helper in project.rs (shared by both
  --project flag and /project command)
- Modify run_interactive() to accept optional initial_project parameter
- Wire up --project in lib.rs to load project before interactive mode
- Refactor /project command to use shared helper (reduces duplication)
- Add 4 new tests for load_and_validate_project()
2026-01-29 11:06:08 +11:00
Dhanji R. Prasanna
05d253ee2a docs: add embedded model performance comparison for agentic tasks
Added a new section documenting local LLM performance on complex agentic
tasks (comic book repacking test case). Includes:

- Cloud model baseline (Claude Opus 4.5, Sonnet 4.5, Claude 4 family)
- Local model ratings (Qwen3-32B, Qwen3-14B, GLM-4 9B, Qwen3-4B)
- Key findings about MoE vs dense models
- Configuration example for embedded providers
2026-01-29 10:33:53 +11:00
Dhanji R. Prasanna
f6717b4435 Add Gemini 3 model context window detection 2026-01-29 10:20:56 +11:00
Dhanji R. Prasanna
735e9c9312 Add Google Gemini provider support
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
2026-01-29 10:11:42 +11:00
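The role mapping this commit describes can be sketched as follows; the struct and function names are illustrative, and only the behaviors stated above (assistant -> "model", system messages lifted into system_instruction, last system message winning) are drawn from the source:

```rust
// Minimal message conversion: Gemini has no "assistant" role, and system
// prompts travel in a separate system_instruction field.
struct Converted {
    system_instruction: Option<String>,
    contents: Vec<(String, String)>, // (role, text)
}

fn convert(messages: &[(&str, &str)]) -> Converted {
    let mut system_instruction = None;
    let mut contents = Vec::new();
    for (role, text) in messages {
        match *role {
            // Last system message wins, matching the characterized behavior
            "system" => system_instruction = Some(text.to_string()),
            "assistant" => contents.push(("model".to_string(), text.to_string())),
            other => contents.push((other.to_string(), text.to_string())),
        }
    }
    Converted { system_instruction, contents }
}

fn main() {
    let out = convert(&[("system", "be terse"), ("user", "hi"), ("assistant", "hello")]);
    assert_eq!(out.system_instruction.as_deref(), Some("be terse"));
    assert_eq!(out.contents[0].0, "user");
    assert_eq!(out.contents[1].0, "model");
}
```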
Dhanji R. Prasanna
fe33568ee0 Fix embedded provider max_tokens default (2048 -> 8192)
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
2026-01-28 13:58:14 +11:00
Dhanji R. Prasanna
58fe74334d Auto-detect context window size from GGUF for embedded providers
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
2026-01-28 11:16:14 +11:00
Dhanji R. Prasanna
55dba121b7 Add GLM-4 to context length defaults (32k)
GLM-4 models support 32k context but were falling back to the
conservative 4096 default, causing context overflow on startup.
2026-01-28 10:46:36 +11:00
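The default-table shape of this fix can be sketched as a name-based fallback; only GLM-4's 32k and the 4096 fallback come from the commit, everything else here is illustrative:

```rust
// Fallback context length when the model file does not declare one.
fn default_context_length(model: &str) -> usize {
    let m = model.to_ascii_lowercase();
    if m.contains("glm-4") {
        32_768 // GLM-4 supports 32k; the old 4096 fallback overflowed at startup
    } else {
        4_096 // conservative default for unknown models
    }
}

fn main() {
    assert_eq!(default_context_length("GLM-4-9B"), 32_768);
    assert_eq!(default_context_length("unknown-model"), 4_096);
}
```

Note the next commit (58fe74334d) supersedes this table by auto-detecting the context window from the GGUF metadata instead.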
Dhanji R. Prasanna
e32c302023 Fix embedded provider initialization and logging
- Use global OnceLock for llama.cpp backend to prevent BackendAlreadyInitialized error
- Suppress verbose llama.cpp stderr logging during model loading
- Fix provider validation to accept "embedded.name" format (extract type before dot)
2026-01-28 10:33:10 +11:00
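The OnceLock pattern from the first bullet can be sketched with a stand-in type (`Backend` here is hypothetical, not the llama.cpp binding):

```rust
use std::sync::OnceLock;

// OnceLock guarantees the expensive initialization runs exactly once, so
// constructing a second provider can never re-initialize the backend.
struct Backend {
    init_count: u32,
}

static BACKEND: OnceLock<Backend> = OnceLock::new();

fn backend() -> &'static Backend {
    BACKEND.get_or_init(|| Backend { init_count: 1 })
}

fn main() {
    // Every call observes the single shared instance
    let a = backend() as *const Backend;
    let b = backend() as *const Backend;
    assert_eq!(a, b);
    assert_eq!(backend().init_count, 1);
}
```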
Dhanji R. Prasanna
ba6e1f9896 Remove unused code to eliminate build warnings
- Remove unused SYSTEM_PROMPT_FOR_NATIVE_TOOL_USE and SYSTEM_PROMPT_FOR_NON_NATIVE_TOOL_USE constants
- Remove unused gpu_layers field from EmbeddedProvider struct
- Remove unused clean_stop_sequences method from EmbeddedProvider
2026-01-28 10:01:44 +11:00
Dhanji R. Prasanna
a902be1562 Refactor system prompts to eliminate duplication; upgrade embedded provider
- Refactor prompts.rs: extract shared sections (intro, TODO, workspace memory,
  web research, response guidelines) used by both native and non-native prompts
- Fix typo in native prompt: "save them.." -> "save them."
- Fix non-native prompt: add missing closing braces in JSON examples,
  add IMPORTANT steps section, align with native prompt quality
- Add 9 unit tests to verify both prompts contain required sections
- Upgrade llama-cpp-2 dependency and refactor embedded provider
- Update config.example.toml with embedded model examples
- Update workspace memory
2026-01-28 09:56:39 +11:00
Dhanji R. Prasanna
585684a86e Fix dead_code warning in studio crate
- Add #[allow(dead_code)] to GitWorktree::list() method
2026-01-27 13:09:56 +11:00
Dhanji R. Prasanna
755acabd47 Highlight command argument completions in cyan
- /run path completions shown in cyan
- /resume session ID completions shown in cyan
- /project name completions shown in cyan
2026-01-27 12:45:37 +11:00
Dhanji R. Prasanna
8389b0d652 Add TAB autocompletion for /project command
- Complete project names from ~/projects/ directory
- Display shows project name, replacement uses ~/projects/<name> path
- Projects sorted alphabetically
- Added test for project completion
2026-01-27 12:43:24 +11:00
Dhanji R. Prasanna
cdb8b0f5eb refactor(g3-core): consolidate Agent construction into single canonical path
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.

Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.

After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()

Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`

Net reduction: ~43 lines (-109/+66)
All 190 tests pass.

Agent: fowler
2026-01-27 12:01:12 +11:00
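The refactor's shape can be sketched with a toy Agent (field names taken from the commit, everything else illustrative): every public constructor funnels through one private build step, so defaults cannot drift between paths.

```rust
struct Agent {
    auto_compact: bool,
    is_autonomous: bool,
    quiet: bool,
}

impl Agent {
    // Single source of truth for field wiring
    fn build_agent(auto_compact: bool, is_autonomous: bool, quiet: bool) -> Self {
        Agent { auto_compact, is_autonomous, quiet }
    }

    fn new_for_test() -> Self {
        Self::build_agent(false, false, true)
    }

    fn new_with_mode(is_autonomous: bool) -> Self {
        Self::build_agent(true, is_autonomous, false)
    }
}

fn main() {
    let t = Agent::new_for_test();
    assert!(t.quiet && !t.auto_compact);
    let a = Agent::new_with_mode(true);
    assert!(a.auto_compact && a.is_autonomous && !a.quiet);
}
```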
Dhanji R. Prasanna
ffea6b5fac Tighten fowler prompt 2026-01-27 11:54:21 +11:00
Dhanji R. Prasanna
dfa0e4bfa2 refactor(g3-core): add section markers to lib.rs for better organization
Added clear section comments to organize the 3000-line lib.rs into
logical groupings:

- CONSTRUCTION METHODS (~line 159)
- CONFIGURATION & PROVIDER RESOLUTION (~line 444)
- TASK EXECUTION (~line 782)
- SESSION MANAGEMENT (~line 1069)
- CONTEXT WINDOW OPERATIONS (~line 1148)
- STREAMING & LLM INTERACTION (~line 1563)
- TOOL EXECUTION (~line 2825)

This improves code navigation and provides clear boundaries for
future extraction into separate modules.

No behavioral changes - all 191 tests pass.

Agent: fowler
2026-01-27 11:46:17 +11:00
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
2026-01-27 11:32:45 +11:00
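A sketch of the cumulative accounting consistent with the /stats fields listed above (method names and the hit definition are assumptions):

```rust
#[derive(Default)]
struct CacheStats {
    api_calls: u32,
    cache_hits: u32,
    input_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

impl CacheStats {
    fn record(&mut self, input: u64, read: u64, creation: u64) {
        self.api_calls += 1;
        if read > 0 {
            self.cache_hits += 1; // a hit is any call that read from cache
        }
        self.input_tokens += input;
        self.cache_read_tokens += read;
        self.cache_creation_tokens += creation;
    }

    fn hit_rate(&self) -> f64 {
        if self.api_calls == 0 { 0.0 } else { self.cache_hits as f64 / self.api_calls as f64 }
    }

    // Share of all input tokens that were served from cache
    fn efficiency(&self) -> f64 {
        if self.input_tokens == 0 { 0.0 } else { self.cache_read_tokens as f64 / self.input_tokens as f64 }
    }
}

fn main() {
    let mut s = CacheStats::default();
    s.record(1000, 0, 800);   // first call warms the cache
    s.record(1000, 800, 0);   // second call reads the cached prefix
    assert_eq!(s.api_calls, 2);
    assert_eq!(s.cache_hits, 1);
    assert!((s.hit_rate() - 0.5).abs() < 1e-9);
    assert!((s.efficiency() - 0.4).abs() < 1e-9);
}
```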