Commit Graph

725 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
5428504777 Fix input formatting bugs: newline, line wrapping, and TTY check
Fixes three bugs in the input formatter introduced in 4e16942:

1. Bug 2 & 3 (missing newline, line duplication):
   - Changed print! to println! to add trailing newline
   - Calculate visual lines based on terminal width instead of
     logical line count, fixing duplication for wrapped lines

2. Bug 1 (^M on non-interactive prompts):
   - Added TTY check to skip formatting when stdout is not a terminal
   - Prevents terminal state corruption for stdin prompts
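The visual-line calculation described above can be sketched as a pure function (hypothetical names; the actual helpers in input_formatter.rs may be shaped differently):

```rust
/// Number of terminal rows a logical line occupies when wrapped at `width`.
/// An empty line still occupies one row.
fn visual_lines(line: &str, width: usize) -> usize {
    if width == 0 {
        return 1;
    }
    let chars = line.chars().count();
    if chars == 0 { 1 } else { (chars + width - 1) / width }
}

/// Total rows for a multi-line input: the count to clear before reprinting
/// the reformatted prompt, so wrapped lines are not duplicated.
fn total_visual_lines(input: &str, width: usize) -> usize {
    input
        .lines()
        .map(|l| visual_lines(l, width))
        .sum::<usize>()
        .max(1)
}
```

Counting logical lines instead of visual rows under-clears whenever a line wraps, which is the duplication bug described above.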
2026-01-30 13:28:31 +11:00
Dhanji R. Prasanna
b252ff443d Merge sessions/interactive/9681cb67 2026-01-30 13:01:00 +11:00
Dhanji R. Prasanna
5ab1598e03 feat: async research tool - runs in background, returns immediately
The research tool now spawns the scout agent in a background tokio task
and returns immediately with a research_id placeholder. This allows the
agent to continue working while research runs (30-120 seconds).

Key changes:
- New PendingResearchManager for tracking async research tasks
- research tool returns immediately with placeholder containing research_id
- research_status tool to check progress of pending research
- Auto-injection of completed research at natural break points:
  - Start of each tool iteration (before LLM call)
  - Before prompting user in interactive mode
- /research CLI command to list all research tasks
- Updated system prompt to explain async behavior

The agent can:
- Continue with other work while research runs
- Check status with research_status tool
- Yield turn to user if results are critical before continuing
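The spawn-and-poll pattern can be sketched with std threads (the real PendingResearchManager uses tokio tasks; names and signatures here are assumptions):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Debug, PartialEq)]
enum ResearchStatus {
    Pending,
    Complete(String),
}

/// Tracks background research tasks by id; a std-thread sketch of the
/// PendingResearchManager idea.
#[derive(Clone, Default)]
struct PendingResearchManager {
    tasks: Arc<Mutex<HashMap<u64, ResearchStatus>>>,
    next_id: Arc<Mutex<u64>>,
}

impl PendingResearchManager {
    /// Spawn work in the background and return a research_id immediately.
    fn spawn<F>(&self, work: F) -> u64
    where
        F: FnOnce() -> String + Send + 'static,
    {
        let mut id_guard = self.next_id.lock().unwrap();
        *id_guard += 1;
        let id = *id_guard;
        drop(id_guard);
        self.tasks.lock().unwrap().insert(id, ResearchStatus::Pending);
        let tasks = Arc::clone(&self.tasks);
        thread::spawn(move || {
            let result = work();
            tasks.lock().unwrap().insert(id, ResearchStatus::Complete(result));
        });
        id
    }

    /// The research_status check: poll without blocking.
    fn status(&self, id: u64) -> Option<ResearchStatus> {
        self.tasks.lock().unwrap().get(&id).cloned()
    }
}
```

The caller gets the id back right away and injects the completed result later, at a natural break point.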
2026-01-30 13:00:02 +11:00
Dhanji R. Prasanna
4e1694248f Add input formatting for interactive CLI
When users type prompts in interactive mode, the input is now
reformatted in place with enhanced highlighting:

- ALL CAPS words (2+ chars) become bold green (e.g., FIX, BUG, HTTP2)
- Quoted text ("..." or ...) becomes cyan
- Standard markdown formatting is also supported

New module: input_formatter.rs with 10 unit tests
Integrated into interactive.rs for both single-line and multiline input
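The ALL-CAPS rule can be sketched as a word-level pass emitting ANSI escapes (a simplified sketch with a hypothetical name; quoting and markdown handling are omitted):

```rust
/// Wrap ALL-CAPS words of 2+ characters (uppercase letters and digits,
/// e.g. FIX, HTTP2) in ANSI bold green, leaving other words untouched.
fn highlight_caps(input: &str) -> String {
    input
        .split(' ')
        .map(|word| {
            let is_caps = word.len() >= 2
                && word
                    .chars()
                    .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
                && word.chars().any(|c| c.is_ascii_uppercase());
            if is_caps {
                // \x1b[1;32m = bold green, \x1b[0m = reset
                format!("\x1b[1;32m{word}\x1b[0m")
            } else {
                word.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```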
2026-01-30 12:03:36 +11:00
Dhanji R. Prasanna
2e21502357 Fix --project flag not working in agent mode
- Add CommonFlags struct to group flags that apply across all modes
- Refactor run_agent_mode() to accept CommonFlags instead of individual params
- Add project loading logic for agent chat mode
- Add integration tests for --project with agent mode

This refactor prevents future bugs where new flags work in one mode
but are forgotten in another.
2026-01-30 11:28:48 +11:00
Dhanji R. Prasanna
51d22b3282 gemini model perf 2026-01-30 10:09:46 +11:00
Dhanji R. Prasanna
8191a5e8e6 feat(embedded): add GLM tool format adapter for code fence stripping
GLM-4 models wrap tool calls in markdown code fences and inline backticks,
which prevents the streaming parser from detecting them. This adapter:

- Strips ```json and ``` code fence markers during streaming
- Strips inline backticks from tool call JSON
- Handles chunked streaming correctly (buffers potential fence lines)
- Transforms GLM native format (<|assistant|>tool_name) to g3 JSON format

Also refactors embedded provider into module structure:
- embedded/mod.rs - module exports
- embedded/provider.rs - main EmbeddedProvider (moved from embedded.rs)
- embedded/adapters/mod.rs - ToolFormatAdapter trait
- embedded/adapters/glm.rs - GLM-specific adapter

Includes 22 unit tests covering edge cases like nested JSON in strings,
chunk boundary handling, and false pattern detection.

Updates README to show GLM-4 9B now works for agentic tasks.
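The fence stripping can be sketched as a line-based filter (a simplification; the real adapter works on streaming chunks, buffers potential fence lines across chunk boundaries, and also strips inline backticks):

```rust
/// Strip markdown code fence lines (a "```json" opener or bare "```"
/// closer) from model output so the tool-call parser sees raw JSON.
fn strip_code_fences(text: &str) -> String {
    // Build the fence marker at runtime to avoid a literal fence here.
    let fence: String = "`".repeat(3);
    text.lines()
        .filter(|line| !line.trim_start().starts_with(&fence))
        .collect::<Vec<_>>()
        .join("\n")
}
```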
2026-01-29 12:52:09 +11:00
Dhanji R. Prasanna
457ba35f80 docs: Fix documentation accuracy and add missing Gemini provider
Corrections made:
- docs/architecture.md: Fix crate count from 9 to 8 (actual count)
- docs/tools.md: Fix code_search supported languages (kotlin -> haskell, scheme, racket)
- docs/CODE_SEARCH.md: Add missing Haskell and Scheme to supported languages list
- docs/providers.md: Add complete Gemini provider documentation section
- docs/configuration.md: Add Gemini configuration section

The Gemini provider (crates/g3-providers/src/gemini.rs) was fully implemented
but not documented. The code_search tool actually supports haskell and scheme
(via tree-sitter) but documentation incorrectly listed kotlin.

Agent: lamport
2026-01-29 12:06:53 +11:00
Dhanji R. Prasanna
f9e0b94cc1 tiny tweak 2026-01-29 12:02:11 +11:00
Dhanji R. Prasanna
853237e62e Update dependency analysis artifacts
Generated comprehensive static dependency analysis for g3 workspace:

- graph.json: 108 nodes (9 crates, 99 files), 186 edges
- graph.summary.md: Overview with metrics, entrypoints, fan-in/fan-out rankings
- sccs.md: No cycles detected (DAG structure confirmed)
- layers.observed.md: 4-layer crate hierarchy identified
- hotspots.md: ui_writer.rs (15 fan-in), agent_mode.rs (13 fan-out) as key nodes
- limitations.md: Documents extraction methodology and caveats

Updated AGENTS.md with artifact documentation table.

Agent: euler
2026-01-29 11:46:39 +11:00
Dhanji R. Prasanna
cba7d31996 Merge sessions/carmack/ee92b215 2026-01-29 11:40:48 +11:00
Dhanji R. Prasanna
d4941dc95a refactor(providers): improve readability of embedded.rs and gemini.rs
embedded.rs (937→789 lines, -16%):
- Extract duplicated inference setup into prepare_context() helper
- Extract stop sequence handling into find_stop_sequence() and truncate_at_stop_sequence()
- Add InferenceParams struct to consolidate request parameter extraction
- Add clear section markers for code organization
- Tests now use module-level format functions directly (no duplication)

gemini.rs:
- Extract common request building into build_request() method
- Reduces duplication between complete() and stream() methods

All 399 unit tests pass. Behavior unchanged.
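The stop-sequence helpers named above might look like this (signatures are assumptions; only the function names come from the commit):

```rust
/// Byte offset of the earliest occurrence of any stop sequence in `text`.
fn find_stop_sequence(text: &str, stops: &[&str]) -> Option<usize> {
    stops.iter().filter_map(|s| text.find(s)).min()
}

/// Truncate generated text at the first stop sequence, if any.
fn truncate_at_stop_sequence(text: &str, stops: &[&str]) -> String {
    match find_stop_sequence(text, stops) {
        Some(idx) => text[..idx].to_string(),
        None => text.to_string(),
    }
}
```

Taking the minimum over all matches ensures the earliest stop wins when several sequences appear.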

Agent: carmack
2026-01-29 11:39:46 +11:00
Dhanji R. Prasanna
cb3c523edf Compact workspace memory: -7.5% size, all concepts preserved
Transformations applied:
- Fixed incorrect line numbers in Streaming Utilities (IterationState 65→166, StreamingState 17→16)
- Updated file sizes with verified byte counts (context_window.rs, streaming.rs, compaction.rs, acd.rs)
- Tightened verbose descriptions throughout
- Removed redundant "Format" column from Chat Template table
- Shortened download command (python3 -m huggingface_hub... → huggingface-cli)
- Collapsed "Known issues" log-style narrative in Embedded Provider
- Removed filler words and redundant explanations

Metrics: 224→212 lines (-5%), 12581→11630 chars (-7.5%)
All 26 semantic entries preserved.

Agent: huffman
2026-01-29 11:38:53 +11:00
Dhanji R. Prasanna
1bff9b0025 huffman tweak to cover more ground 2026-01-29 11:36:09 +11:00
Dhanji R. Prasanna
653c5f72ac Compact workspace memory: 402→224 lines (-44%), 22k→12.6k chars (-43%)
Merged duplicate entries:
- Context Window & Compaction + Context Compaction → unified section
- Streaming Markdown Formatter + Code Blocks → single entry
- CLI Argument Parsing + CLI Entry Points + CLI Module Structure → CLI Module Structure
- Auto-Memory Feature + Tool Call Tracking + Auto-Memory Reminder Format → Auto-Memory System
- Agent Mode folded into CLI Module Structure

Tightened verbose sections:
- UTF-8 pattern: removed 10-line code example, kept pattern + danger zones
- ACD Fragment Storage: replaced 15-line JSON with inline field list
- GLM-4 downloads: replaced 12-line bash with table + single download template

Entry count: 37 → 26 (-30%)
All char ranges, function names, and gotchas preserved.

Agent: huffman
2026-01-29 11:34:17 +11:00
Dhanji R. Prasanna
bd4473b75f model performance tweaks to readme 2026-01-29 11:31:29 +11:00
Dhanji R. Prasanna
1bff9d5dcc tiny tweaks to huffman 2026-01-29 11:31:17 +11:00
Dhanji R. Prasanna
7cf9c3b7bb Merge sessions/hopper/8e287188 2026-01-29 11:30:54 +11:00
Dhanji R. Prasanna
21f8d5a1aa Add integration tests for CacheStats and Gemini serialization
Agent: hopper

Added two new integration test files:

1. cache_stats_integration_test.rs (g3-core)
   - Tests CacheStats accumulation through streaming completion flow
   - Verifies cache hit detection (cache_read_tokens > 0)
   - Tests multi-request accumulation of cache statistics
   - Verifies cache efficiency and hit rate calculations
   - Uses MockProvider to simulate provider usage data

2. gemini_serialization_test.rs (g3-providers)
   - Tests Gemini API message format conversion
   - Verifies system messages become system_instruction
   - Verifies assistant role maps to "model" (Gemini terminology)
   - Tests tool conversion to function_declarations format
   - Characterizes multi-system-message behavior (last wins)

Both test files follow blackbox/integration testing principles:
- Test observable behavior through stable surfaces
- Do not assert internal implementation details
- Include documentation of what is/is not asserted
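The role conversion those tests characterize can be sketched as (types and field layout are assumptions; the behaviors — "model" role, system_instruction extraction, last system message wins — are from the tests above):

```rust
enum Role {
    System,
    User,
    Assistant,
}

struct GeminiRequest {
    system_instruction: Option<String>,
    contents: Vec<(String, String)>, // (gemini role, text)
}

/// Map internal chat roles to Gemini's format: Gemini has no "assistant"
/// role (it uses "model"), and system messages are lifted out into the
/// request-level system_instruction field.
fn to_gemini(messages: &[(Role, &str)]) -> GeminiRequest {
    let mut system_instruction = None;
    let mut contents = Vec::new();
    for (role, text) in messages {
        match role {
            // Last system message wins, matching the characterized behavior.
            Role::System => system_instruction = Some(text.to_string()),
            Role::User => contents.push(("user".to_string(), text.to_string())),
            Role::Assistant => contents.push(("model".to_string(), text.to_string())),
        }
    }
    GeminiRequest { system_instruction, contents }
}
```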
2026-01-29 11:28:52 +11:00
Dhanji R. Prasanna
570a824780 Rename archivist agent to huffman
Named after David Huffman, inventor of Huffman coding -
compression that preserves information with fewer bits.

Fits the agent's purpose: compact memory, preserve semantics.
2026-01-29 11:22:59 +11:00
Dhanji R. Prasanna
627dd45966 Add archivist to built-in agents list in README 2026-01-29 11:20:23 +11:00
Dhanji R. Prasanna
b45ff37b68 Add archivist agent for memory compaction and signal optimization
New agent that maintains workspace memory quality:
- Deduplicates entries within memory
- Tightens verbose phrasing to terse declarations
- Collapses log-style narratives to current-state facts
- Removes AGENTS.md ↔ Memory duplication
- Ports code locations from AGENTS.md to Memory

Goal: increase signal, reduce noise, preserve all semantic information.

Agent: archivist
2026-01-29 11:19:47 +11:00
Dhanji R. Prasanna
56f558dc1b Fix compiler warnings in test files
Eliminate unused variable and import warnings across test files:
- streaming_parser_test.rs: prefix unused `tools` with underscore
- webdriver_session.rs: remove unused `use super::*` import
- mock_provider_integration_test.rs: prefix unused `result` and `task_result`
- test_preflight_max_tokens.rs: prefix unused `proposed_max`
- todo_staleness_test.rs: add #[allow(dead_code)] for test helper methods
- json_parsing_stress_test.rs: prefix unused `tools`
- read_file_token_limit_test.rs: add #[allow(dead_code)] for unused helper
- background_process_demo_test.rs: remove unused PathBuf import
- test_session_continuation.rs: prefix unused `temp_dir` in 7 tests

All tests pass. No behavior changes.

Agent: fowler
2026-01-29 11:15:10 +11:00
Dhanji R. Prasanna
5c1e0630b5 Merge sessions/interactive/664ee473 2026-01-29 11:14:28 +11:00
Dhanji R. Prasanna
9a998e201a Tighten AGENTS.md: remove redundant content covered by Memory
Removed sections that duplicate Workspace Memory:
- Recommended Entry Points (Memory has precise file/line locations)
- For Debugging paths (Memory has session/error log details)
- Dependency Analysis Artifacts (reference info, not actionable)

Kept essential guardrails:
- Critical Invariants (MUST/MUST NOT rules)
- Dangerous Code Paths (risk warnings, not locations)
- Do/Don't coding standards
- Common Incorrect Assumptions

Reduction: 125 lines → 69 lines (~45% smaller, ~650 tokens saved)
2026-01-29 11:13:25 +11:00
Dhanji R. Prasanna
7bfb9efa19 Remove automatic README loading from context window
README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.

Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names

The LLM can still read README.md on-demand via read_file when needed.
2026-01-29 11:07:41 +11:00
Dhanji R. Prasanna
5ea43d7b39 Add --project CLI flag for loading projects at startup
Adds a new --project <PATH> flag that loads project files (brief.md,
contacts.yaml, status.md) at startup, similar to the /project command
but WITHOUT auto-executing the project status prompt.

Changes:
- Add --project flag to cli_args.rs
- Add load_and_validate_project() helper in project.rs (shared by both
  --project flag and /project command)
- Modify run_interactive() to accept optional initial_project parameter
- Wire up --project in lib.rs to load project before interactive mode
- Refactor /project command to use shared helper (reduces duplication)
- Add 4 new tests for load_and_validate_project()
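A minimal sketch of what a shared load_and_validate_project() helper might check (only the function name is from the commit; the validation rules and signature here are assumptions, with brief.md treated as the required file):

```rust
use std::path::Path;

/// Validate a project directory before loading it: the path must exist,
/// be a directory, and contain a brief.md. Returns the brief contents.
fn load_and_validate_project(dir: &Path) -> Result<String, String> {
    if !dir.is_dir() {
        return Err(format!("not a directory: {}", dir.display()));
    }
    let brief = dir.join("brief.md");
    if !brief.is_file() {
        return Err("missing brief.md".to_string());
    }
    std::fs::read_to_string(&brief).map_err(|e| e.to_string())
}
```

Sharing one validator between the flag and the /project command is what keeps the two paths from drifting.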
2026-01-29 11:06:08 +11:00
Dhanji R. Prasanna
05d253ee2a docs: add embedded model performance comparison for agentic tasks
Added a new section documenting local LLM performance on complex agentic
tasks (comic book repacking test case). Includes:

- Cloud model baseline (Claude Opus 4.5, Sonnet 4.5, Claude 4 family)
- Local model ratings (Qwen3-32B, Qwen3-14B, GLM-4 9B, Qwen3-4B)
- Key findings about MoE vs dense models
- Configuration example for embedded providers
2026-01-29 10:33:53 +11:00
Dhanji R. Prasanna
f6717b4435 Add Gemini 3 model context window detection 2026-01-29 10:20:56 +11:00
Dhanji R. Prasanna
735e9c9312 Add Google Gemini provider support
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
2026-01-29 10:11:42 +11:00
Dhanji R. Prasanna
fe33568ee0 Fix embedded provider max_tokens default (2048 -> 8192)
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
2026-01-28 13:58:14 +11:00
Dhanji R. Prasanna
58fe74334d Auto-detect context window size from GGUF for embedded providers
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
2026-01-28 11:16:14 +11:00
Dhanji R. Prasanna
55dba121b7 Add GLM-4 to context length defaults (32k)
GLM-4 models support 32k context but were falling back to the
conservative 4096 default, causing context overflow on startup.
2026-01-28 10:46:36 +11:00
Dhanji R. Prasanna
e32c302023 Fix embedded provider initialization and logging
- Use global OnceLock for llama.cpp backend to prevent BackendAlreadyInitialized error
- Suppress verbose llama.cpp stderr logging during model loading
- Fix provider validation to accept "embedded.name" format (extract type before dot)
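The global OnceLock pattern can be sketched with a counter standing in for the llama.cpp backend handle (the real code initializes the llama.cpp backend here; the names below are illustrative):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::OnceLock;

// Process-wide backend handle: initialized exactly once, so constructing a
// second provider cannot trip a BackendAlreadyInitialized error.
static BACKEND: OnceLock<u32> = OnceLock::new();
static INIT_CALLS: AtomicU32 = AtomicU32::new(0);

fn backend() -> &'static u32 {
    BACKEND.get_or_init(|| {
        // Real code would perform the llama.cpp backend init here.
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        42
    })
}
```

Every provider calls `backend()`; only the first call runs the initializer, and later calls get the cached handle.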
2026-01-28 10:33:10 +11:00
Dhanji R. Prasanna
ba6e1f9896 Remove unused code to eliminate build warnings
- Remove unused SYSTEM_PROMPT_FOR_NATIVE_TOOL_USE and SYSTEM_PROMPT_FOR_NON_NATIVE_TOOL_USE constants
- Remove unused gpu_layers field from EmbeddedProvider struct
- Remove unused clean_stop_sequences method from EmbeddedProvider
2026-01-28 10:01:44 +11:00
Dhanji R. Prasanna
a902be1562 Refactor system prompts to eliminate duplication; upgrade embedded provider
- Refactor prompts.rs: extract shared sections (intro, TODO, workspace memory,
  web research, response guidelines) used by both native and non-native prompts
- Fix typo in native prompt: "save them.." -> "save them."
- Fix non-native prompt: add missing closing braces in JSON examples,
  add IMPORTANT steps section, align with native prompt quality
- Add 9 unit tests to verify both prompts contain required sections
- Upgrade llama-cpp-2 dependency and refactor embedded provider
- Update config.example.toml with embedded model examples
- Update workspace memory
2026-01-28 09:56:39 +11:00
Dhanji R. Prasanna
585684a86e Fix dead_code warning in studio crate
- Add #[allow(dead_code)] to GitWorktree::list() method
2026-01-27 13:09:56 +11:00
Dhanji R. Prasanna
755acabd47 Highlight command argument completions in cyan
- /run path completions shown in cyan
- /resume session ID completions shown in cyan
- /project name completions shown in cyan
2026-01-27 12:45:37 +11:00
Dhanji R. Prasanna
8389b0d652 Add TAB autocompletion for /project command
- Complete project names from ~/projects/ directory
- Display shows project name, replacement uses ~/projects/<name> path
- Projects sorted alphabetically
- Added test for project completion
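The matching logic reduces to a small pure function (a hypothetical name; the real completer reads names from ~/projects/ and replaces the argument with the full path):

```rust
/// Complete a /project argument against known project names:
/// prefix match, results sorted alphabetically.
fn complete_project(prefix: &str, names: &[&str]) -> Vec<String> {
    let mut matches: Vec<String> = names
        .iter()
        .filter(|n| n.starts_with(prefix))
        .map(|n| n.to_string())
        .collect();
    matches.sort();
    matches
}
```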
2026-01-27 12:43:24 +11:00
Dhanji R. Prasanna
cdb8b0f5eb refactor(g3-core): consolidate Agent construction into single canonical path
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.

Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.

After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()

Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`

Net reduction: ~43 lines (-109/+66)
All 190 tests pass.

Agent: fowler
2026-01-27 12:01:12 +11:00
Dhanji R. Prasanna
ffea6b5fac Tighten fowler prompt 2026-01-27 11:54:21 +11:00
Dhanji R. Prasanna
dfa0e4bfa2 refactor(g3-core): add section markers to lib.rs for better organization
Added clear section comments to organize the 3000-line lib.rs into
logical groupings:

- CONSTRUCTION METHODS (~line 159)
- CONFIGURATION & PROVIDER RESOLUTION (~line 444)
- TASK EXECUTION (~line 782)
- SESSION MANAGEMENT (~line 1069)
- CONTEXT WINDOW OPERATIONS (~line 1148)
- STREAMING & LLM INTERACTION (~line 1563)
- TOOL EXECUTION (~line 2825)

This improves code navigation and provides clear boundaries for
future extraction into separate modules.

No behavioral changes - all 191 tests pass.

Agent: fowler
2026-01-27 11:46:17 +11:00
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
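The derived numbers in the /stats output follow from a handful of counters; a sketch of the CacheStats idea (the struct name is from the commit, but field and method names here are assumptions):

```rust
/// Cumulative prompt-cache statistics accumulated across API calls.
#[derive(Default)]
struct CacheStats {
    api_calls: u64,
    cache_hits: u64,
    input_tokens: u64,
    cache_read_tokens: u64,
}

impl CacheStats {
    /// Record one API call's usage numbers.
    fn record(&mut self, input_tokens: u64, cache_read_tokens: u64) {
        self.api_calls += 1;
        if cache_read_tokens > 0 {
            self.cache_hits += 1;
        }
        self.input_tokens += input_tokens;
        self.cache_read_tokens += cache_read_tokens;
    }

    /// Fraction of API calls that read anything from the cache.
    fn hit_rate(&self) -> f64 {
        if self.api_calls == 0 {
            0.0
        } else {
            self.cache_hits as f64 / self.api_calls as f64
        }
    }

    /// Cache efficiency: fraction of input tokens served from the cache.
    fn efficiency(&self) -> f64 {
        if self.input_tokens == 0 {
            0.0
        } else {
            self.cache_read_tokens as f64 / self.input_tokens as f64
        }
    }
}
```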
2026-01-27 11:32:45 +11:00
Dhanji R. Prasanna
96899230a4 Tweak hopper to encourage mocks and stubbing 2026-01-27 10:44:48 +11:00
Dhanji R. Prasanna
2e84f1ece0 test: fix ACD test race condition and add read_image characterization test
- Fix test_rehydrate_success race condition by using UUID for unique session IDs
- Add #[serial] attribute to prevent parallel execution conflicts
- Improve cleanup to remove entire session directory tree
- Add characterization test for resize_image_to_dimensions fallback behavior
  (documents fix from commit af8b849 for media type preservation)

Agent: hopper
2026-01-26 16:19:53 +11:00
Dhanji R. Prasanna
726e2d71f5 test: add integration test for project content surviving compaction
Add test_project_content_survives_compaction() to verify that project
content loaded via /project command persists through context compaction.

This is a CHARACTERIZATION test that validates:
- Project content appended to README message survives compaction
- The README message (containing project content) is preserved as message[1]
- PROJECT INSTRUCTIONS, ACTIVE PROJECT markers, Brief and Status sections
  all survive the compaction process

Agent: hopper
2026-01-26 16:09:17 +11:00
Dhanji R. Prasanna
d6a986ce0f refactor(cli): extract execute_user_input() to eliminate duplication
Both multiline and single-line input paths in interactive.rs had identical
code for:
- Template processing (process_template)
- Task execution (execute_task_with_retry)
- Auto-memory reminder with error handling

Extracted to a single execute_user_input() helper function that handles
all three steps. This eliminates code-path aliasing where the two paths
could drift over time.

File reduced from 401 to 393 lines (-2%).
All 106 g3-cli tests pass.

Agent: fowler
2026-01-26 15:59:55 +11:00
Dhanji R. Prasanna
57f04a77aa Add template expansion to interactive prompts
Apply {{today}} and other template variables to user input in:
- Interactive mode (single and multiline)
- Accumulative mode requirements
2026-01-26 15:43:39 +11:00
Dhanji R. Prasanna
7806897f00 Expand {{today}} to include day of week: YYYY-MM-DD (Monday) 2026-01-26 15:29:47 +11:00
Dhanji R. Prasanna
9de8e8cc76 Fix compaction bug: use User role for summary to maintain alternation
The previous implementation added the summary as a System message, which
caused "Conversation must start with a user message" errors because the
first non-system message after compaction was Assistant (the preserved
last assistant message).

Fix: Change summary from System to User message, creating valid alternation:
[System Prompt] -> [Summary as USER] -> [Last Assistant] -> [Latest User]

This also prevents system message bloat across multiple compactions since
the summary is now part of the conversation flow and gets replaced on
each compaction.

Added test_second_compaction_no_bloat to verify no accumulation.
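The shape of the compacted conversation can be sketched structurally (a sketch only; the real compaction also generates the summary via the LLM, and the types here are illustrative):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Msg {
    System(String),
    User(String),
    Assistant(String),
}

/// Compact a history into [system prompt] -> [summary as USER] ->
/// [last assistant] -> [latest user], keeping alternation valid.
fn compact(history: &[Msg], summary: &str) -> Vec<Msg> {
    let mut out = Vec::new();
    if let Some(sys @ Msg::System(_)) = history.first() {
        out.push(sys.clone());
    }
    // The summary is a User message, not System: the first non-system
    // message is a user turn, and re-compaction replaces this message
    // instead of accumulating system bloat.
    out.push(Msg::User(summary.to_string()));
    if let Some(a) = history.iter().rev().find(|m| matches!(m, Msg::Assistant(_))) {
        out.push(a.clone());
    }
    if let Some(u) = history.iter().rev().find(|m| matches!(m, Msg::User(_))) {
        out.push(u.clone());
    }
    out
}
```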
2026-01-26 15:24:04 +11:00