Commit Graph

732 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
51f12769d5 Merge sessions/hopper/297c7be9 2026-01-30 14:30:53 +11:00
Dhanji R. Prasanna
58bbfde6f4 test: add integration tests for streaming parser stuttering bug fix
Add characterization tests for the streaming parser stuttering bug fix (fa3c920).
These tests verify that when an LLM "stutters" and emits incomplete tool call
fragments followed by complete tool calls, the parser:

1. Does not get stuck waiting for the incomplete fragment to complete
2. Successfully parses complete tool calls that appear after the fragment

Tests cover:
- The exact pattern from butler session butler_c6ab59af2e4f991c
- Edge cases that should NOT trigger invalidation (nested JSON, patterns in strings)
- Recovery behavior after reset
- Multiple complete tool calls
- Boundary conditions (chunk boundaries, minimal patterns)

Agent: hopper
2026-01-30 14:30:27 +11:00
Dhanji R. Prasanna
3003bdebaa refactor: fix flaky test and remove dead code in recent commits
Fixes issues in the last 11 commits:

1. pending_research.rs: Fix flaky test_generate_id_uniqueness
   - Replaced random u16 suffix with atomic counter for guaranteed uniqueness
   - The timestamp+random approach could collide when generating IDs rapidly
   - Now uses static AtomicU32 counter that increments monotonically

2. embedded/adapters/glm.rs: Remove unused in_code_fence field
   - Field was written but never read (dead code)
   - Removed from struct definition, constructor, and reset()

3. embedded/adapters/glm.rs: Fix orphaned tests
   - Two tests (test_strip_code_fences, test_code_fenced_tool_call) were
     outside the #[cfg(test)] mod tests block
   - Moved closing brace to include them in the test module

All 446 library tests pass.

Agent: fowler
2026-01-30 14:28:43 +11:00
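The atomic-counter scheme described in that commit can be sketched as follows (names and the prefix format are illustrative, not the actual pending_research.rs code):

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU32, Ordering};

// Monotonic counter: unlike a timestamp + random u16 suffix, rapid
// back-to-back calls can never produce the same suffix.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn generate_id(prefix: &str) -> String {
    // fetch_add returns the previous value, so every call sees a unique number
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("{}_{}", prefix, n)
}

fn main() {
    let ids: HashSet<String> = (0..10_000).map(|_| generate_id("research")).collect();
    // Guaranteed uniqueness regardless of how fast IDs are generated
    assert_eq!(ids.len(), 10_000);
}
```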
Dhanji R. Prasanna
6bb07ce4f5 Merge sessions/interactive/3c2a09df 2026-01-30 14:20:12 +11:00
Dhanji R. Prasanna
f1a5241777 Add /research <id> and /research latest commands
Allow users to view research reports directly from the CLI:

- /research - List all research tasks (unchanged)
- /research <id> - View the full report for a specific research task
- /research latest - View the most recent completed research report

Report display includes query, status, elapsed time, and full content.
2026-01-30 14:06:28 +11:00
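A minimal sketch of the subcommand dispatch this commit describes (the function and the returned action names are hypothetical, not the real CLI code):

```rust
// Route the /research argument to one of the three behaviors listed above:
// no argument lists tasks, "latest" shows the newest completed report,
// anything else is treated as a research id.
fn dispatch_research(arg: Option<&str>) -> &'static str {
    match arg {
        None => "list",
        Some("latest") => "show-latest",
        Some(_) => "show-by-id",
    }
}

fn main() {
    assert_eq!(dispatch_research(None), "list");
    assert_eq!(dispatch_research(Some("latest")), "show-latest");
    assert_eq!(dispatch_research(Some("abc123")), "show-by-id");
}
```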
Dhanji R. Prasanna
fa3c9203e0 Fix streaming parser bug: detect abandoned tool call fragments
When the LLM 'stutters' and emits incomplete tool call fragments like:
  {"tool": "shell", "args": {...}}
  {"tool":
  {"tool": "shell", "args": {...}}

The parser would get stuck waiting for the incomplete fragment to complete,
causing the entire response to be lost (no tool executed, no text displayed).

This was observed in butler session butler_c6ab59af2e4f991c where the user's
'send!' command produced no response.

Fix: Enhanced is_json_invalidated() to detect when a new tool call pattern
({"tool"}) appears after a newline while parsing an incomplete JSON fragment.
This indicates the previous fragment was abandoned and should be invalidated.

Safety:
- Tool patterns inside JSON strings (e.g., writing example code) are not
  affected because the check only runs outside strings
- Added tests for the stuttering pattern and the file-writing edge case
2026-01-30 14:00:18 +11:00
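The abandonment check can be sketched roughly as below. This is a simplification of the real `is_json_invalidated()` (ASCII-only, no recovery logic): while an incomplete fragment is buffered, a fresh `{"tool"` after a newline, outside any JSON string, marks the fragment as abandoned.

```rust
// Returns true when the buffered fragment was abandoned: a new `{"tool"`
// pattern appears after a newline while we are not inside a JSON string.
// ASCII-oriented sketch; the real parser also tracks brace depth.
fn fragment_abandoned(buffer: &str) -> bool {
    let mut in_string = false;
    let mut escaped = false;
    let bytes = buffer.as_bytes();
    // Start at 1: the fragment's own opening brace at index 0 never counts
    for i in 1..bytes.len() {
        let c = bytes[i] as char;
        if escaped { escaped = false; continue; }
        match c {
            '\\' if in_string => escaped = true,
            '"' => in_string = !in_string,
            '{' if !in_string && bytes[i - 1] == b'\n' => {
                // A restart of the tool-call pattern signals abandonment
                if buffer[i..].starts_with("{\"tool\"") {
                    return true;
                }
            }
            _ => {}
        }
    }
    false
}

fn main() {
    // Stutter: incomplete fragment followed by a fresh, complete tool call
    assert!(fragment_abandoned("{\"tool\":\n{\"tool\": \"shell\"}"));
    // The same pattern inside a JSON string must not trigger invalidation
    assert!(!fragment_abandoned("{\"content\": \"x\n{\\\"tool\\\" y\"}"));
}
```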
Dhanji R. Prasanna
f93d05f444 Add real-time research completion notifications
When background research completes, g3 now immediately prints a status
message instead of waiting for the next user interaction:

- Added ResearchCompletionNotification and broadcast channel to
  PendingResearchManager for push-based notifications
- Added spawn_research_notification_handler() in interactive mode that
  listens for completions in a background task
- When idle (at prompt): clears line, prints status, reprints prompt
- When busy (processing): prints status inline (interleaving is fine)
- Added G3Status::research_complete() for consistent formatting
- Added enable_research_notifications() method to Agent

Output format: "g3: 1 research report ... [done]"
2026-01-30 13:35:35 +11:00
Dhanji R. Prasanna
5428504777 Fix input formatting bugs: newline, line wrapping, and TTY check
Fixes three bugs in the input formatter introduced in 4e16942:

1. Bug 2 & 3 (missing newline, line duplication):
   - Changed print! to println! to add trailing newline
   - Calculate visual lines based on terminal width instead of
     logical line count, fixing duplication for wrapped lines

2. Bug 1 (^M on non-interactive prompts):
   - Added TTY check to skip formatting when stdout is not a terminal
   - Prevents terminal state corruption for stdin prompts
2026-01-30 13:28:31 +11:00
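The wrapped-line fix reduces to counting terminal rows per logical line: ceil(cells / width), one cell per character assumed (the real formatter would also need to handle wide glyphs). A sketch:

```rust
// Number of terminal rows the input occupies at the given width.
// Each logical line contributes ceil(len / term_width) rows, minimum 1.
fn visual_lines(input: &str, term_width: usize) -> usize {
    input
        .lines()
        .map(|line| {
            let cells = line.chars().count();
            if cells == 0 { 1 } else { (cells + term_width - 1) / term_width }
        })
        .sum::<usize>()
        .max(1)
}

fn main() {
    assert_eq!(visual_lines("hello", 80), 1);
    // A 100-char logical line wraps onto two rows of an 80-column terminal,
    // which is where the logical-line count went wrong
    assert_eq!(visual_lines(&"x".repeat(100), 80), 2);
    assert_eq!(visual_lines("a\nb", 80), 2);
}
```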
Dhanji R. Prasanna
b252ff443d Merge sessions/interactive/9681cb67 2026-01-30 13:01:00 +11:00
Dhanji R. Prasanna
5ab1598e03 feat: async research tool - runs in background, returns immediately
The research tool now spawns the scout agent in a background tokio task
and returns immediately with a research_id placeholder. This allows the
agent to continue working while research runs (30-120 seconds).

Key changes:
- New PendingResearchManager for tracking async research tasks
- research tool returns immediately with placeholder containing research_id
- research_status tool to check progress of pending research
- Auto-injection of completed research at natural break points:
  - Start of each tool iteration (before LLM call)
  - Before prompting user in interactive mode
- /research CLI command to list all research tasks
- Updated system prompt to explain async behavior

The agent can:
- Continue with other work while research runs
- Check status with research_status tool
- Yield turn to user if results are critical before continuing
2026-01-30 13:00:02 +11:00
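The spawn-and-poll shape of this design can be sketched with std threads standing in for tokio tasks (all names here are illustrative, not the real PendingResearchManager API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Background work is registered as None, then filled in by the worker;
// callers get the id back immediately and poll for completion.
#[derive(Clone, Default)]
struct PendingResearch {
    results: Arc<Mutex<HashMap<u32, Option<String>>>>,
}

impl PendingResearch {
    fn spawn(&self, id: u32, query: String) -> u32 {
        self.results.lock().unwrap().insert(id, None);
        let results = Arc::clone(&self.results);
        thread::spawn(move || {
            // Stand-in for the 30-120 second scout agent run
            let report = format!("report for: {}", query);
            results.lock().unwrap().insert(id, Some(report));
        });
        id // returned immediately; the caller keeps working
    }

    // None = unknown id, Some(None) = pending, Some(Some(_)) = complete
    fn status(&self, id: u32) -> Option<Option<String>> {
        self.results.lock().unwrap().get(&id).cloned()
    }
}

fn main() {
    let mgr = PendingResearch::default();
    let id = mgr.spawn(1, "rust OnceLock".to_string());
    // Poll until the background task completes (research_status in the real tool)
    while mgr.status(id) == Some(None) {
        thread::yield_now();
    }
    assert_eq!(mgr.status(id).flatten().as_deref(), Some("report for: rust OnceLock"));
}
```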
Dhanji R. Prasanna
4e1694248f Add input formatting for interactive CLI
When users type prompts in interactive mode, the input is now
reformatted in place with enhanced highlighting:

- ALL CAPS words (2+ chars) become bold green (e.g., FIX, BUG, HTTP2)
- Quoted text ("..." or '...') becomes cyan
- Standard markdown formatting is also supported

New module: input_formatter.rs with 10 unit tests
Integrated into interactive.rs for both single-line and multiline input
2026-01-30 12:03:36 +11:00
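The ALL-CAPS rule from this commit can be sketched as below; the function name and exact ANSI codes are assumptions, not the real input_formatter.rs API:

```rust
// Wrap words of 2+ uppercase/digit characters (at least one letter) in
// ANSI bold green, leaving everything else untouched.
fn highlight_caps(input: &str) -> String {
    input
        .split(' ')
        .map(|word| {
            let is_caps = word.len() >= 2
                && word.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
                && word.chars().any(|c| c.is_ascii_uppercase());
            if is_caps {
                format!("\x1b[1;32m{}\x1b[0m", word) // bold green
            } else {
                word.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    assert_eq!(highlight_caps("please FIX this"), "please \x1b[1;32mFIX\x1b[0m this");
    // Digits are allowed so tokens like HTTP2 still qualify
    assert!(highlight_caps("use HTTP2 now").contains("\x1b[1;32mHTTP2\x1b[0m"));
    // Single letters and lowercase words are untouched
    assert_eq!(highlight_caps("a bug"), "a bug");
}
```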
Dhanji R. Prasanna
2e21502357 Fix --project flag not working in agent mode
- Add CommonFlags struct to group flags that apply across all modes
- Refactor run_agent_mode() to accept CommonFlags instead of individual params
- Add project loading logic for agent chat mode
- Add integration tests for --project with agent mode

This refactor prevents future bugs where new flags work in one mode
but are forgotten in another.
2026-01-30 11:28:48 +11:00
Dhanji R. Prasanna
51d22b3282 gemini model perf 2026-01-30 10:09:46 +11:00
Dhanji R. Prasanna
8191a5e8e6 feat(embedded): add GLM tool format adapter for code fence stripping
GLM-4 models wrap tool calls in markdown code fences and inline backticks,
which prevents the streaming parser from detecting them. This adapter:

- Strips ```json and ``` code fence markers during streaming
- Strips inline backticks from tool call JSON
- Handles chunked streaming correctly (buffers potential fence lines)
- Transforms GLM native format (<|assistant|>tool_name) to g3 JSON format

Also refactors embedded provider into module structure:
- embedded/mod.rs - module exports
- embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs)
- embedded/adapters/mod.rs - ToolFormatAdapter trait
- embedded/adapters/glm.rs - GLM-specific adapter

Includes 22 unit tests covering edge cases like nested JSON in strings,
chunk boundary handling, and false pattern detection.

Updates README to show GLM-4 9B now works for agentic tasks.
2026-01-29 12:52:09 +11:00
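The fence-stripping idea reduces to dropping lines that are pure code-fence markers. A minimal sketch (the real adapter also buffers chunks across boundaries and strips inline backticks):

```rust
// Remove lines that are nothing but a code-fence marker (``` or ```json),
// leaving the tool-call JSON for the streaming parser to detect.
fn strip_code_fences(text: &str) -> String {
    text.lines()
        .filter(|line| !line.trim().starts_with("```"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let wrapped = "```json\n{\"tool\": \"shell\", \"args\": {}}\n```";
    assert_eq!(strip_code_fences(wrapped), "{\"tool\": \"shell\", \"args\": {}}");
    // Text without fences passes through unchanged
    assert_eq!(strip_code_fences("plain text"), "plain text");
}
```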
Dhanji R. Prasanna
457ba35f80 docs: Fix documentation accuracy and add missing Gemini provider
Corrections made:
- docs/architecture.md: Fix crate count from 9 to 8 (actual count)
- docs/tools.md: Fix code_search supported languages (kotlin -> haskell, scheme, racket)
- docs/CODE_SEARCH.md: Add missing Haskell and Scheme to supported languages list
- docs/providers.md: Add complete Gemini provider documentation section
- docs/configuration.md: Add Gemini configuration section

The Gemini provider (crates/g3-providers/src/gemini.rs) was fully implemented
but not documented. The code_search tool actually supports haskell and scheme
(via tree-sitter) but documentation incorrectly listed kotlin.

Agent: lamport
2026-01-29 12:06:53 +11:00
Dhanji R. Prasanna
f9e0b94cc1 tiny tweak 2026-01-29 12:02:11 +11:00
Dhanji R. Prasanna
853237e62e Update dependency analysis artifacts
Generated comprehensive static dependency analysis for g3 workspace:

- graph.json: 108 nodes (9 crates, 99 files), 186 edges
- graph.summary.md: Overview with metrics, entrypoints, fan-in/fan-out rankings
- sccs.md: No cycles detected (DAG structure confirmed)
- layers.observed.md: 4-layer crate hierarchy identified
- hotspots.md: ui_writer.rs (15 fan-in), agent_mode.rs (13 fan-out) as key nodes
- limitations.md: Documents extraction methodology and caveats

Updated AGENTS.md with artifact documentation table.

Agent: euler
2026-01-29 11:46:39 +11:00
Dhanji R. Prasanna
cba7d31996 Merge sessions/carmack/ee92b215 2026-01-29 11:40:48 +11:00
Dhanji R. Prasanna
d4941dc95a refactor(providers): improve readability of embedded.rs and gemini.rs
embedded.rs (937→789 lines, -16%):
- Extract duplicated inference setup into prepare_context() helper
- Extract stop sequence handling into find_stop_sequence() and truncate_at_stop_sequence()
- Add InferenceParams struct to consolidate request parameter extraction
- Add clear section markers for code organization
- Tests now use module-level format functions directly (no duplication)

gemini.rs:
- Extract common request building into build_request() method
- Reduces duplication between complete() and stream() methods

All 399 unit tests pass. Behavior unchanged.

Agent: carmack
2026-01-29 11:39:46 +11:00
Dhanji R. Prasanna
cb3c523edf Compact workspace memory: -7.5% size, all concepts preserved
Transformations applied:
- Fixed incorrect line numbers in Streaming Utilities (IterationState 65→166, StreamingState 17→16)
- Updated file sizes with verified byte counts (context_window.rs, streaming.rs, compaction.rs, acd.rs)
- Tightened verbose descriptions throughout
- Removed redundant "Format" column from Chat Template table
- Shortened download command (python3 -m huggingface_hub... → huggingface-cli)
- Collapsed "Known issues" log-style narrative in Embedded Provider
- Removed filler words and redundant explanations

Metrics: 224→212 lines (-5%), 12581→11630 chars (-7.5%)
All 26 semantic entries preserved.

Agent: huffman
2026-01-29 11:38:53 +11:00
Dhanji R. Prasanna
1bff9b0025 huffman tweak to cover more ground 2026-01-29 11:36:09 +11:00
Dhanji R. Prasanna
653c5f72ac Compact workspace memory: 402→224 lines (-44%), 22k→12.6k chars (-43%)
Merged duplicate entries:
- Context Window & Compaction + Context Compaction → unified section
- Streaming Markdown Formatter + Code Blocks → single entry
- CLI Argument Parsing + CLI Entry Points + CLI Module Structure → CLI Module Structure
- Auto-Memory Feature + Tool Call Tracking + Auto-Memory Reminder Format → Auto-Memory System
- Agent Mode folded into CLI Module Structure

Tightened verbose sections:
- UTF-8 pattern: removed 10-line code example, kept pattern + danger zones
- ACD Fragment Storage: replaced 15-line JSON with inline field list
- GLM-4 downloads: replaced 12-line bash with table + single download template

Entry count: 37 → 26 (-30%)
All char ranges, function names, and gotchas preserved.

Agent: huffman
2026-01-29 11:34:17 +11:00
Dhanji R. Prasanna
bd4473b75f model performance tweaks to readme 2026-01-29 11:31:29 +11:00
Dhanji R. Prasanna
1bff9d5dcc tiny tweaks to huffman 2026-01-29 11:31:17 +11:00
Dhanji R. Prasanna
7cf9c3b7bb Merge sessions/hopper/8e287188 2026-01-29 11:30:54 +11:00
Dhanji R. Prasanna
21f8d5a1aa Add integration tests for CacheStats and Gemini serialization
Agent: hopper

Added two new integration test files:

1. cache_stats_integration_test.rs (g3-core)
   - Tests CacheStats accumulation through streaming completion flow
   - Verifies cache hit detection (cache_read_tokens > 0)
   - Tests multi-request accumulation of cache statistics
   - Verifies cache efficiency and hit rate calculations
   - Uses MockProvider to simulate provider usage data

2. gemini_serialization_test.rs (g3-providers)
   - Tests Gemini API message format conversion
   - Verifies system messages become system_instruction
   - Verifies assistant role maps to "model" (Gemini terminology)
   - Tests tool conversion to function_declarations format
   - Characterizes multi-system-message behavior (last wins)

Both test files follow blackbox/integration testing principles:
- Test observable behavior through stable surfaces
- Do not assert internal implementation details
- Include documentation of what is/is not asserted
2026-01-29 11:28:52 +11:00
Dhanji R. Prasanna
570a824780 Rename archivist agent to huffman
Named after David Huffman, inventor of Huffman coding -
compression that preserves information with fewer bits.

Fits the agent's purpose: compact memory, preserve semantics.
2026-01-29 11:22:59 +11:00
Dhanji R. Prasanna
627dd45966 Add archivist to built-in agents list in README 2026-01-29 11:20:23 +11:00
Dhanji R. Prasanna
b45ff37b68 Add archivist agent for memory compaction and signal optimization
New agent that maintains workspace memory quality:
- Deduplicates entries within memory
- Tightens verbose phrasing to terse declarations
- Collapses log-style narratives to current-state facts
- Removes AGENTS.md ↔ Memory duplication
- Ports code locations from AGENTS.md to Memory

Goal: increase signal, reduce noise, preserve all semantic information.

Agent: archivist
2026-01-29 11:19:47 +11:00
Dhanji R. Prasanna
56f558dc1b Fix compiler warnings in test files
Eliminate unused variable and import warnings across test files:
- streaming_parser_test.rs: prefix unused `tools` with underscore
- webdriver_session.rs: remove unused `use super::*` import
- mock_provider_integration_test.rs: prefix unused `result` and `task_result`
- test_preflight_max_tokens.rs: prefix unused `proposed_max`
- todo_staleness_test.rs: add #[allow(dead_code)] for test helper methods
- json_parsing_stress_test.rs: prefix unused `tools`
- read_file_token_limit_test.rs: add #[allow(dead_code)] for unused helper
- background_process_demo_test.rs: remove unused PathBuf import
- test_session_continuation.rs: prefix unused `temp_dir` in 7 tests

All tests pass. No behavior changes.

Agent: fowler
2026-01-29 11:15:10 +11:00
Dhanji R. Prasanna
5c1e0630b5 Merge sessions/interactive/664ee473 2026-01-29 11:14:28 +11:00
Dhanji R. Prasanna
9a998e201a Tighten AGENTS.md: remove redundant content covered by Memory
Removed sections that duplicate Workspace Memory:
- Recommended Entry Points (Memory has precise file/line locations)
- For Debugging paths (Memory has session/error log details)
- Dependency Analysis Artifacts (reference info, not actionable)

Kept essential guardrails:
- Critical Invariants (MUST/MUST NOT rules)
- Dangerous Code Paths (risk warnings, not locations)
- Do/Don't coding standards
- Common Incorrect Assumptions

Reduction: 125 lines → 69 lines (~45% smaller, ~650 tokens saved)
2026-01-29 11:13:25 +11:00
Dhanji R. Prasanna
7bfb9efa19 Remove automatic README loading from context window
README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.

Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names

The LLM can still read README.md on-demand via read_file when needed.
2026-01-29 11:07:41 +11:00
Dhanji R. Prasanna
5ea43d7b39 Add --project CLI flag for loading projects at startup
Adds a new --project <PATH> flag that loads project files (brief.md,
contacts.yaml, status.md) at startup, similar to the /project command
but WITHOUT auto-executing the project status prompt.

Changes:
- Add --project flag to cli_args.rs
- Add load_and_validate_project() helper in project.rs (shared by both
  --project flag and /project command)
- Modify run_interactive() to accept optional initial_project parameter
- Wire up --project in lib.rs to load project before interactive mode
- Refactor /project command to use shared helper (reduces duplication)
- Add 4 new tests for load_and_validate_project()
2026-01-29 11:06:08 +11:00
Dhanji R. Prasanna
05d253ee2a docs: add embedded model performance comparison for agentic tasks
Added a new section documenting local LLM performance on complex agentic
tasks (comic book repacking test case). Includes:

- Cloud model baseline (Claude Opus 4.5, Sonnet 4.5, Claude 4 family)
- Local model ratings (Qwen3-32B, Qwen3-14B, GLM-4 9B, Qwen3-4B)
- Key findings about MoE vs dense models
- Configuration example for embedded providers
2026-01-29 10:33:53 +11:00
Dhanji R. Prasanna
f6717b4435 Add Gemini 3 model context window detection 2026-01-29 10:20:56 +11:00
Dhanji R. Prasanna
735e9c9312 Add Google Gemini provider support
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
2026-01-29 10:11:42 +11:00
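The role mapping this commit describes can be sketched as follows; the struct and function names are illustrative, and only the behaviors stated above (assistant -> "model", system messages lifted into system_instruction, last system message winning) are drawn from the source:

```rust
// Minimal message conversion: Gemini has no "assistant" role, and system
// prompts travel in a separate system_instruction field.
struct Converted {
    system_instruction: Option<String>,
    contents: Vec<(String, String)>, // (role, text)
}

fn convert(messages: &[(&str, &str)]) -> Converted {
    let mut system_instruction = None;
    let mut contents = Vec::new();
    for (role, text) in messages {
        match *role {
            // Last system message wins, matching the characterized behavior
            "system" => system_instruction = Some(text.to_string()),
            "assistant" => contents.push(("model".to_string(), text.to_string())),
            other => contents.push((other.to_string(), text.to_string())),
        }
    }
    Converted { system_instruction, contents }
}

fn main() {
    let out = convert(&[("system", "be terse"), ("user", "hi"), ("assistant", "hello")]);
    assert_eq!(out.system_instruction.as_deref(), Some("be terse"));
    assert_eq!(out.contents[0].0, "user");
    assert_eq!(out.contents[1].0, "model");
}
```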
Dhanji R. Prasanna
fe33568ee0 Fix embedded provider max_tokens default (2048 -> 8192)
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
2026-01-28 13:58:14 +11:00
Dhanji R. Prasanna
58fe74334d Auto-detect context window size from GGUF for embedded providers
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
2026-01-28 11:16:14 +11:00
Dhanji R. Prasanna
55dba121b7 Add GLM-4 to context length defaults (32k)
GLM-4 models support 32k context but were falling back to the
conservative 4096 default, causing context overflow on startup.
2026-01-28 10:46:36 +11:00
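The default-table shape of this fix can be sketched as a name-based fallback; only GLM-4's 32k and the 4096 fallback come from the commit, everything else here is illustrative:

```rust
// Fallback context length when the model file does not declare one.
fn default_context_length(model: &str) -> usize {
    let m = model.to_ascii_lowercase();
    if m.contains("glm-4") {
        32_768 // GLM-4 supports 32k; the old 4096 fallback overflowed at startup
    } else {
        4_096 // conservative default for unknown models
    }
}

fn main() {
    assert_eq!(default_context_length("GLM-4-9B"), 32_768);
    assert_eq!(default_context_length("unknown-model"), 4_096);
}
```

Note the next commit (58fe74334d) supersedes this table by auto-detecting the context window from the GGUF metadata instead.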
Dhanji R. Prasanna
e32c302023 Fix embedded provider initialization and logging
- Use global OnceLock for llama.cpp backend to prevent BackendAlreadyInitialized error
- Suppress verbose llama.cpp stderr logging during model loading
- Fix provider validation to accept "embedded.name" format (extract type before dot)
2026-01-28 10:33:10 +11:00
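The OnceLock pattern from the first bullet can be sketched with a stand-in type (`Backend` here is hypothetical, not the llama.cpp binding):

```rust
use std::sync::OnceLock;

// OnceLock guarantees the expensive initialization runs exactly once, so
// constructing a second provider can never re-initialize the backend.
struct Backend {
    init_count: u32,
}

static BACKEND: OnceLock<Backend> = OnceLock::new();

fn backend() -> &'static Backend {
    BACKEND.get_or_init(|| Backend { init_count: 1 })
}

fn main() {
    // Every call observes the single shared instance
    let a = backend() as *const Backend;
    let b = backend() as *const Backend;
    assert_eq!(a, b);
    assert_eq!(backend().init_count, 1);
}
```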
Dhanji R. Prasanna
ba6e1f9896 Remove unused code to eliminate build warnings
- Remove unused SYSTEM_PROMPT_FOR_NATIVE_TOOL_USE and SYSTEM_PROMPT_FOR_NON_NATIVE_TOOL_USE constants
- Remove unused gpu_layers field from EmbeddedProvider struct
- Remove unused clean_stop_sequences method from EmbeddedProvider
2026-01-28 10:01:44 +11:00
Dhanji R. Prasanna
a902be1562 Refactor system prompts to eliminate duplication; upgrade embedded provider
- Refactor prompts.rs: extract shared sections (intro, TODO, workspace memory,
  web research, response guidelines) used by both native and non-native prompts
- Fix typo in native prompt: "save them.." -> "save them."
- Fix non-native prompt: add missing closing braces in JSON examples,
  add IMPORTANT steps section, align with native prompt quality
- Add 9 unit tests to verify both prompts contain required sections
- Upgrade llama-cpp-2 dependency and refactor embedded provider
- Update config.example.toml with embedded model examples
- Update workspace memory
2026-01-28 09:56:39 +11:00
Dhanji R. Prasanna
585684a86e Fix dead_code warning in studio crate
- Add #[allow(dead_code)] to GitWorktree::list() method
2026-01-27 13:09:56 +11:00
Dhanji R. Prasanna
755acabd47 Highlight command argument completions in cyan
- /run path completions shown in cyan
- /resume session ID completions shown in cyan
- /project name completions shown in cyan
2026-01-27 12:45:37 +11:00
Dhanji R. Prasanna
8389b0d652 Add TAB autocompletion for /project command
- Complete project names from ~/projects/ directory
- Display shows project name, replacement uses ~/projects/<name> path
- Projects sorted alphabetically
- Added test for project completion
2026-01-27 12:43:24 +11:00
Dhanji R. Prasanna
cdb8b0f5eb refactor(g3-core): consolidate Agent construction into single canonical path
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.

Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.

After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()

Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`

Net reduction: ~43 lines (-109/+66)
All 190 tests pass.

Agent: fowler
2026-01-27 12:01:12 +11:00
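The refactor's shape can be sketched with a toy Agent (field names taken from the commit, everything else illustrative): every public constructor funnels through one private build step, so defaults cannot drift between paths.

```rust
struct Agent {
    auto_compact: bool,
    is_autonomous: bool,
    quiet: bool,
}

impl Agent {
    // Single source of truth for field wiring
    fn build_agent(auto_compact: bool, is_autonomous: bool, quiet: bool) -> Self {
        Agent { auto_compact, is_autonomous, quiet }
    }

    fn new_for_test() -> Self {
        Self::build_agent(false, false, true)
    }

    fn new_with_mode(is_autonomous: bool) -> Self {
        Self::build_agent(true, is_autonomous, false)
    }
}

fn main() {
    let t = Agent::new_for_test();
    assert!(t.quiet && !t.auto_compact);
    let a = Agent::new_with_mode(true);
    assert!(a.auto_compact && a.is_autonomous && !a.quiet);
}
```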
Dhanji R. Prasanna
ffea6b5fac Tighten fowler prompt 2026-01-27 11:54:21 +11:00
Dhanji R. Prasanna
dfa0e4bfa2 refactor(g3-core): add section markers to lib.rs for better organization
Added clear section comments to organize the 3000-line lib.rs into
logical groupings:

- CONSTRUCTION METHODS (~line 159)
- CONFIGURATION & PROVIDER RESOLUTION (~line 444)
- TASK EXECUTION (~line 782)
- SESSION MANAGEMENT (~line 1069)
- CONTEXT WINDOW OPERATIONS (~line 1148)
- STREAMING & LLM INTERACTION (~line 1563)
- TOOL EXECUTION (~line 2825)

This improves code navigation and provides clear boundaries for
future extraction into separate modules.

No behavioral changes - all 191 tests pass.

Agent: fowler
2026-01-27 11:46:17 +11:00
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
2026-01-27 11:32:45 +11:00
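A sketch of the cumulative accounting consistent with the /stats fields listed above (method names and the hit definition are assumptions):

```rust
#[derive(Default)]
struct CacheStats {
    api_calls: u32,
    cache_hits: u32,
    input_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

impl CacheStats {
    fn record(&mut self, input: u64, read: u64, creation: u64) {
        self.api_calls += 1;
        if read > 0 {
            self.cache_hits += 1; // a hit is any call that read from cache
        }
        self.input_tokens += input;
        self.cache_read_tokens += read;
        self.cache_creation_tokens += creation;
    }

    fn hit_rate(&self) -> f64 {
        if self.api_calls == 0 { 0.0 } else { self.cache_hits as f64 / self.api_calls as f64 }
    }

    // Share of all input tokens that were served from cache
    fn efficiency(&self) -> f64 {
        if self.input_tokens == 0 { 0.0 } else { self.cache_read_tokens as f64 / self.input_tokens as f64 }
    }
}

fn main() {
    let mut s = CacheStats::default();
    s.record(1000, 0, 800);   // first call warms the cache
    s.record(1000, 800, 0);   // second call reads the cached prefix
    assert_eq!(s.api_calls, 2);
    assert_eq!(s.cache_hits, 1);
    assert!((s.hit_rate() - 0.5).abs() < 1e-9);
    assert!((s.efficiency() - 0.4).abs() < 1e-9);
}
```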