803 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
56f558dc1b Fix compiler warnings in test files
Eliminate unused variable and import warnings across test files:
- streaming_parser_test.rs: prefix unused `tools` with underscore
- webdriver_session.rs: remove unused `use super::*` import
- mock_provider_integration_test.rs: prefix unused `result` and `task_result`
- test_preflight_max_tokens.rs: prefix unused `proposed_max`
- todo_staleness_test.rs: add #[allow(dead_code)] for test helper methods
- json_parsing_stress_test.rs: prefix unused `tools`
- read_file_token_limit_test.rs: add #[allow(dead_code)] for unused helper
- background_process_demo_test.rs: remove unused PathBuf import
- test_session_continuation.rs: prefix unused `temp_dir` in 7 tests

All tests pass. No behavior changes.

Agent: fowler
2026-01-29 11:15:10 +11:00
Dhanji R. Prasanna
5c1e0630b5 Merge sessions/interactive/664ee473 2026-01-29 11:14:28 +11:00
Dhanji R. Prasanna
9a998e201a Tighten AGENTS.md: remove redundant content covered by Memory
Removed sections that duplicate Workspace Memory:
- Recommended Entry Points (Memory has precise file/line locations)
- For Debugging paths (Memory has session/error log details)
- Dependency Analysis Artifacts (reference info, not actionable)

Kept essential guardrails:
- Critical Invariants (MUST/MUST NOT rules)
- Dangerous Code Paths (risk warnings, not locations)
- Do/Don't coding standards
- Common Incorrect Assumptions

Reduction: 125 lines → 69 lines (~45% smaller, ~650 tokens saved)
2026-01-29 11:13:25 +11:00
Dhanji R. Prasanna
7bfb9efa19 Remove automatic README loading from context window
README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.

Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names

The LLM can still read README.md on-demand via read_file when needed.
2026-01-29 11:07:41 +11:00
Dhanji R. Prasanna
5ea43d7b39 Add --project CLI flag for loading projects at startup
Adds a new --project <PATH> flag that loads project files (brief.md,
contacts.yaml, status.md) at startup, similar to the /project command
but WITHOUT auto-executing the project status prompt.

Changes:
- Add --project flag to cli_args.rs
- Add load_and_validate_project() helper in project.rs (shared by both
  --project flag and /project command)
- Modify run_interactive() to accept optional initial_project parameter
- Wire up --project in lib.rs to load project before interactive mode
- Refactor /project command to use shared helper (reduces duplication)
- Add 4 new tests for load_and_validate_project()
2026-01-29 11:06:08 +11:00
Dhanji R. Prasanna
05d253ee2a docs: add embedded model performance comparison for agentic tasks
Added a new section documenting local LLM performance on complex agentic
tasks (comic book repacking test case). Includes:

- Cloud model baseline (Claude Opus 4.5, Sonnet 4.5, Claude 4 family)
- Local model ratings (Qwen3-32B, Qwen3-14B, GLM-4 9B, Qwen3-4B)
- Key findings about MoE vs dense models
- Configuration example for embedded providers
2026-01-29 10:33:53 +11:00
Dhanji R. Prasanna
f6717b4435 Add Gemini 3 model context window detection 2026-01-29 10:20:56 +11:00
Dhanji R. Prasanna
735e9c9312 Add Google Gemini provider support
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
2026-01-29 10:11:42 +11:00
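The message conversion listed in 735e9c9312 (assistant -> model role mapping, system messages moved to system_instruction) might look roughly like the sketch below. The function names, the tuple-based message shape, and the choice to join multiple system messages with a blank line are assumptions for illustration, not the actual GeminiProvider code.

```rust
/// Maps a generic chat role to the role name used in Gemini `contents`.
/// Returns None for roles that have no place in `contents` (hypothetical helper).
fn to_gemini_role(role: &str) -> Option<&'static str> {
    match role {
        "user" => Some("user"),
        "assistant" => Some("model"),
        _ => None,
    }
}

/// Splits (role, text) pairs into an optional system instruction and the
/// remaining conversation turns with Gemini role names.
/// Assumption: several system messages are joined with a blank line, and
/// unknown roles are dropped silently.
fn convert_messages(messages: &[(&str, &str)]) -> (Option<String>, Vec<(&'static str, String)>) {
    let mut system_parts: Vec<&str> = Vec::new();
    let mut contents = Vec::new();
    for &(role, text) in messages {
        if role == "system" {
            system_parts.push(text);
        } else if let Some(gemini_role) = to_gemini_role(role) {
            contents.push((gemini_role, text.to_string()));
        }
    }
    let system_instruction = if system_parts.is_empty() {
        None
    } else {
        Some(system_parts.join("\n\n"))
    };
    (system_instruction, contents)
}
```

Keeping system text out of the turn list reflects that Gemini only accepts "user" and "model" as roles in the conversation itself.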
Dhanji R. Prasanna
fe33568ee0 Fix embedded provider max_tokens default (2048 -> 8192)
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
2026-01-28 13:58:14 +11:00
Dhanji R. Prasanna
58fe74334d Auto-detect context window size from GGUF for embedded providers
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
2026-01-28 11:16:14 +11:00
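A minimal sketch of the idea in 58fe74334d: the provider reports its own context length and the agent falls back to a default only when the provider has nothing to report. The Option<u32> return type, the field name, RemoteProvider, and resolve_context_window() are assumptions; the commit only names the trait method and EmbeddedProvider.

```rust
/// Simplified stand-in for the provider trait (the real trait has more methods).
trait LLMProvider {
    /// Context length known to the provider, if any.
    /// Assumption: providers that cannot detect it return None.
    fn context_window_size(&self) -> Option<u32> {
        None
    }
}

/// Hypothetical embedded provider holding the length read from GGUF metadata.
struct EmbeddedProvider {
    detected_context_length: u32,
}

impl LLMProvider for EmbeddedProvider {
    fn context_window_size(&self) -> Option<u32> {
        Some(self.detected_context_length)
    }
}

/// Hypothetical provider that relies on the trait's default (no detection).
struct RemoteProvider;

impl LLMProvider for RemoteProvider {}

/// Asks the provider first and uses `fallback` only when it reports nothing.
fn resolve_context_window(provider: &dyn LLMProvider, fallback: u32) -> u32 {
    provider.context_window_size().unwrap_or(fallback)
}
```

With this shape, a per-model table such as the GLM-4 entry added in 55dba121b7 is no longer needed for embedded models.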
Dhanji R. Prasanna
55dba121b7 Add GLM-4 to context length defaults (32k)
GLM-4 models support 32k context but were falling back to the
conservative 4096 default, causing context overflow on startup.
2026-01-28 10:46:36 +11:00
Dhanji R. Prasanna
e32c302023 Fix embedded provider initialization and logging
- Use global OnceLock for llama.cpp backend to prevent BackendAlreadyInitialized error
- Suppress verbose llama.cpp stderr logging during model loading
- Fix provider validation to accept "embedded.name" format (extract type before dot)
2026-01-28 10:33:10 +11:00
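Two of the fixes in e32c302023 can be sketched with the standard library alone: a process-wide OnceLock so the backend is initialized at most once, and extracting the provider type from an "embedded.name" style reference. Backend, INIT_COUNT, backend(), and provider_type() are hypothetical names; the real code wraps the llama.cpp backend rather than an empty struct.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the llama.cpp backend handle.
struct Backend;

/// Counts how often the initializer actually runs (for demonstration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Process-wide slot; get_or_init runs the closure at most once.
static BACKEND: OnceLock<Backend> = OnceLock::new();

/// Returns the shared backend, initializing it on first use.
fn backend() -> &'static Backend {
    BACKEND.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        Backend
    })
}

/// Returns the part of a provider reference before the first dot,
/// so "embedded.glm4" (hypothetical name) validates as "embedded".
fn provider_type(provider_ref: &str) -> &str {
    provider_ref.split('.').next().unwrap_or(provider_ref)
}
```

Every provider instance that calls backend() gets the same handle, which is what avoids a second initialization attempt.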
Dhanji R. Prasanna
ba6e1f9896 Remove unused code to eliminate build warnings
- Remove unused SYSTEM_PROMPT_FOR_NATIVE_TOOL_USE and SYSTEM_PROMPT_FOR_NON_NATIVE_TOOL_USE constants
- Remove unused gpu_layers field from EmbeddedProvider struct
- Remove unused clean_stop_sequences method from EmbeddedProvider
2026-01-28 10:01:44 +11:00
Dhanji R. Prasanna
a902be1562 Refactor system prompts to eliminate duplication; upgrade embedded provider
- Refactor prompts.rs: extract shared sections (intro, TODO, workspace memory,
  web research, response guidelines) used by both native and non-native prompts
- Fix typo in native prompt: "save them.." -> "save them."
- Fix non-native prompt: add missing closing braces in JSON examples,
  add IMPORTANT steps section, align with native prompt quality
- Add 9 unit tests to verify both prompts contain required sections
- Upgrade llama-cpp-2 dependency and refactor embedded provider
- Update config.example.toml with embedded model examples
- Update workspace memory
2026-01-28 09:56:39 +11:00
Dhanji R. Prasanna
585684a86e Fix dead_code warning in studio crate
- Add #[allow(dead_code)] to GitWorktree::list() method
2026-01-27 13:09:56 +11:00
Dhanji R. Prasanna
755acabd47 Highlight command argument completions in cyan
- /run path completions shown in cyan
- /resume session ID completions shown in cyan
- /project name completions shown in cyan
2026-01-27 12:45:37 +11:00
Dhanji R. Prasanna
8389b0d652 Add TAB autocompletion for /project command
- Complete project names from ~/projects/ directory
- Display shows project name, replacement uses ~/projects/<name> path
- Projects sorted alphabetically
- Added test for project completion
2026-01-27 12:43:24 +11:00
Dhanji R. Prasanna
cdb8b0f5eb refactor(g3-core): consolidate Agent construction into single canonical path
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.

Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.

After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()

Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`

Net reduction: ~43 lines (-109/+66)
All 190 tests pass.

Agent: fowler
2026-01-27 12:01:12 +11:00
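The delegation chain described in cdb8b0f5eb could be sketched as follows. The constructor names come from the commit message; the reduced field set, the parameter lists, and the default values chosen for each constructor are assumptions for illustration.

```rust
/// Reduced stand-in for the real Agent, keeping only a few of the fields
/// the commit message mentions.
#[derive(Debug, PartialEq)]
struct Agent {
    readme: Option<String>,
    auto_compact: bool,
    is_autonomous: bool,
    quiet: bool,
}

impl Agent {
    /// Single place where an Agent value is assembled.
    fn build_agent(readme: Option<String>, auto_compact: bool, is_autonomous: bool, quiet: bool) -> Self {
        Self { readme, auto_compact, is_autonomous, quiet }
    }

    /// Delegates to the README-aware test constructor.
    fn new_for_test() -> Self {
        Self::new_for_test_with_readme(None)
    }

    /// Assumed test defaults: no auto-compaction, not autonomous, quiet output.
    fn new_for_test_with_readme(readme: Option<String>) -> Self {
        Self::build_agent(readme, false, false, true)
    }

    /// Assumed production defaults: auto-compaction on, normal output.
    fn new_with_mode_and_readme(is_autonomous: bool, readme: Option<String>) -> Self {
        Self::build_agent(readme, true, is_autonomous, false)
    }
}
```

Because only build_agent() contains a struct literal, adding a field forces one edit instead of three.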
Dhanji R. Prasanna
ffea6b5fac Tighten fowler prompt 2026-01-27 11:54:21 +11:00
Dhanji R. Prasanna
dfa0e4bfa2 refactor(g3-core): add section markers to lib.rs for better organization
Added clear section comments to organize the 3000-line lib.rs into
logical groupings:

- CONSTRUCTION METHODS (~line 159)
- CONFIGURATION & PROVIDER RESOLUTION (~line 444)
- TASK EXECUTION (~line 782)
- SESSION MANAGEMENT (~line 1069)
- CONTEXT WINDOW OPERATIONS (~line 1148)
- STREAMING & LLM INTERACTION (~line 1563)
- TOOL EXECUTION (~line 2825)

This improves code navigation and provides clear boundaries for
future extraction into separate modules.

No behavioral changes - all 191 tests pass.

Agent: fowler
2026-01-27 11:46:17 +11:00
Dhanji R. Prasanna
5b4079e861 Add prompt cache statistics tracking to /stats command
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
  - API call count and cache hit count
  - Hit rate percentage
  - Total input tokens and cache read/creation tokens
  - Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
2026-01-27 11:32:45 +11:00
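The statistics listed in 5b4079e861 can be approximated by a small accumulator like the one below. The field and method names are hypothetical, and two definitions are assumptions: a call counts as a cache hit when it reports any cache read tokens, and total input tokens are taken to include the cached ones.

```rust
/// Cumulative prompt-cache counters across API calls (hypothetical layout).
#[derive(Debug, Default)]
struct CacheStats {
    api_calls: u64,
    cache_hits: u64,
    total_input_tokens: u64,
    cache_read_tokens: u64,
    cache_creation_tokens: u64,
}

impl CacheStats {
    /// Records the usage reported by one API call.
    fn record(&mut self, input_tokens: u64, cache_read: u64, cache_creation: u64) {
        self.api_calls += 1;
        if cache_read > 0 {
            self.cache_hits += 1;
        }
        self.total_input_tokens += input_tokens;
        self.cache_read_tokens += cache_read;
        self.cache_creation_tokens += cache_creation;
    }

    /// Share of API calls that read anything from the cache, in percent.
    fn hit_rate_percent(&self) -> f64 {
        if self.api_calls == 0 {
            return 0.0;
        }
        100.0 * self.cache_hits as f64 / self.api_calls as f64
    }

    /// Share of input tokens served from the cache, in percent.
    fn efficiency_percent(&self) -> f64 {
        if self.total_input_tokens == 0 {
            return 0.0;
        }
        100.0 * self.cache_read_tokens as f64 / self.total_input_tokens as f64
    }
}
```

For example, one call that writes 800 tokens to the cache followed by one call that reads those 800 tokens back, each with 1,000 input tokens, gives a 50% hit rate and 40% efficiency under these definitions.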
Dhanji R. Prasanna
96899230a4 Tweak hopper to encourage mocks and stubbing 2026-01-27 10:44:48 +11:00
Dhanji R. Prasanna
2e84f1ece0 test: fix ACD test race condition and add read_image characterization test
- Fix test_rehydrate_success race condition by using UUID for unique session IDs
- Add #[serial] attribute to prevent parallel execution conflicts
- Improve cleanup to remove entire session directory tree
- Add characterization test for resize_image_to_dimensions fallback behavior
  (documents fix from commit af8b849 for media type preservation)

Agent: hopper
2026-01-26 16:19:53 +11:00
Dhanji R. Prasanna
726e2d71f5 test: add integration test for project content surviving compaction
Add test_project_content_survives_compaction() to verify that project
content loaded via /project command persists through context compaction.

This is a CHARACTERIZATION test that validates:
- Project content appended to README message survives compaction
- The README message (containing project content) is preserved as message[1]
- PROJECT INSTRUCTIONS, ACTIVE PROJECT markers, Brief and Status sections
  all survive the compaction process

Agent: hopper
2026-01-26 16:09:17 +11:00
Dhanji R. Prasanna
d6a986ce0f refactor(cli): extract execute_user_input() to eliminate duplication
Both multiline and single-line input paths in interactive.rs had identical
code for:
- Template processing (process_template)
- Task execution (execute_task_with_retry)
- Auto-memory reminder with error handling

Extracted to a single execute_user_input() helper function that handles
all three steps. This eliminates code-path aliasing where the two paths
could drift over time.

File reduced from 401 to 393 lines (-2%).
All 106 g3-cli tests pass.

Agent: fowler
2026-01-26 15:59:55 +11:00
Dhanji R. Prasanna
57f04a77aa Add template expansion to interactive prompts
Apply {{today}} and other template variables to user input in:
- Interactive mode (single and multiline)
- Accumulative mode requirements
2026-01-26 15:43:39 +11:00
Dhanji R. Prasanna
7806897f00 Expand {{today}} to include day of week: YYYY-MM-DD (Monday) 2026-01-26 15:29:47 +11:00
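A dependency-free sketch of the expanded {{today}} format from 7806897f00. The real implementation presumably uses a date library and the current local date; here the date is passed in explicitly, and weekday_name() and expand_today() are hypothetical names. The weekday is computed with Sakamoto's algorithm for the Gregorian calendar.

```rust
/// Returns the English weekday name for a Gregorian date
/// (Sakamoto's algorithm; month is 1-12, day is 1-31).
fn weekday_name(year: i32, month: u32, day: u32) -> &'static str {
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    const NAMES: [&str; 7] = [
        "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
    ];
    // January and February count as months of the previous year.
    let y = if month < 3 { year - 1 } else { year };
    let index = (y + y / 4 - y / 100 + y / 400 + OFFSETS[(month - 1) as usize] + day as i32)
        .rem_euclid(7);
    NAMES[index as usize]
}

/// Replaces every {{today}} in `input` with "YYYY-MM-DD (Weekday)".
fn expand_today(input: &str, year: i32, month: u32, day: u32) -> String {
    let today = format!("{:04}-{:02}-{:02} ({})", year, month, day, weekday_name(year, month, day));
    input.replace("{{today}}", &today)
}
```

Called with the commit's own date, expand_today("Plan for {{today}}", 2026, 1, 26) yields "Plan for 2026-01-26 (Monday)".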
Dhanji R. Prasanna
9de8e8cc76 Fix compaction bug: use User role for summary to maintain alternation
The previous implementation added the summary as a System message, which
caused "Conversation must start with a user message" errors because the
first non-system message after compaction was Assistant (the preserved
last assistant message).

Fix: Change summary from System to User message, creating valid alternation:
[System Prompt] -> [Summary as USER] -> [Last Assistant] -> [Latest User]

This also prevents system message bloat across multiple compactions since
the summary is now part of the conversation flow and gets replaced on
each compaction.

Added test_second_compaction_no_bloat to verify no accumulation.
2026-01-26 15:24:04 +11:00
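The ordering problem fixed in 9de8e8cc76 can be illustrated with a small model of the message list. Role, Message, rebuild_after_compaction(), and starts_with_user() are simplified, hypothetical stand-ins; the sketch only shows why the summary's role decides whether the first non-system message is a user message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

/// Rebuilds the history after compaction:
/// [System Prompt] -> [Summary] -> [Last Assistant] -> [Latest User].
/// `summary_role` is a parameter only so both variants can be compared.
fn rebuild_after_compaction(
    system_prompt: &str,
    summary: &str,
    summary_role: Role,
    last_assistant: Option<&str>,
    latest_user: Option<&str>,
) -> Vec<Message> {
    let mut out = vec![
        Message { role: Role::System, content: system_prompt.to_string() },
        Message { role: summary_role, content: summary.to_string() },
    ];
    if let Some(text) = last_assistant {
        out.push(Message { role: Role::Assistant, content: text.to_string() });
    }
    if let Some(text) = latest_user {
        out.push(Message { role: Role::User, content: text.to_string() });
    }
    out
}

/// True when the first non-system message (if any) has the User role.
fn starts_with_user(messages: &[Message]) -> bool {
    messages
        .iter()
        .find(|m| m.role != Role::System)
        .map_or(true, |m| m.role == Role::User)
}
```

With Role::System for the summary, the first non-system message is the preserved assistant message and the check fails; with Role::User it passes.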
Dhanji R. Prasanna
712eca1904 install.sh: explicit build targets and auto-fix PATH
- Build g3 and studio explicitly with -p flags
- Detect shell and rc file (zsh/bash/fish)
- Auto-add PATH to rc file with user confirmation
- Handle case where PATH is in rc but not loaded
2026-01-26 12:26:40 +11:00
Dhanji R. Prasanna
83f68dae17 style: convert CLI status messages to G3Status format
Convert remaining emoji status messages in g3-cli to use the
consistent G3Status formatting system:

- accumulative.rs: 'autonomous run ... [done]'
- commands.rs /clear: 'clearing session ... [done]'
- commands.rs /readme: 'reloading README ... [done/failed/error]'
- commands.rs /unproject: 'unloading project ... [done]'

This provides a consistent 'g3: action ... [status]' format across
all CLI status messages.
2026-01-23 10:08:22 +05:30
Dhanji R. Prasanna
155db74aac style: use G3Status formatting for agent mode completion message
Change agent mode completion from ' Agent mode completed' to
'g3: <agent-name> session ... [done]' for consistency with other
g3 status messages.
2026-01-23 10:04:05 +05:30
Dhanji R. Prasanna
5d0d532b47 feat: preserve last assistant message during compaction
When context window compaction occurs, the last assistant message is now
preserved in addition to the system prompt, README, and summary. This
improves continuity after compaction by keeping the LLM's most recent
response, which often contains important context about what was just
done or what comes next.

New message order after compaction:
[System Prompt] -> [README/AGENTS.md] -> [ACD Stub?] -> [Summary] -> [Last Assistant] -> [Latest User?]

Changes:
- Add last_assistant_message field to PreservedMessages struct
- Modify extract_preserved_messages() to find last assistant message
- Modify reset_with_summary_and_stub() to include last assistant message
- Add comprehensive integration tests using MockProvider

Tests cover edge cases:
- No assistant message exists
- Tool-call-only assistant messages (still preserved)
- Multiple assistant messages (only last one preserved)
- No trailing user message
2026-01-23 09:54:03 +05:30
Dhanji R. Prasanna
dfdc21c3cf Use G3Status formatting for /project loading message
Changed from 'Project loaded: ✓ file1  ✓ file2' to
'g3: loading <project-name> .. ✓ file1  ✓ file2 .. [done]'

- Add G3Status::loading_project() for consistent status formatting
- Update /project command to use new formatting
- Remove unused crossterm imports from commands.rs
2026-01-22 21:03:46 +05:30
Dhanji R. Prasanna
a488a6aa99 feat(cli): colorize project name in prompt via rustyline Highlighter
Implement highlight_prompt() in G3Helper to colorize the project portion
of the prompt in blue. This uses rustyline's proper mechanism for ANSI
codes in prompts, which correctly handles cursor positioning.

Prompt 'butler | finances> ' now shows '| finances>' in blue.
2026-01-22 10:48:17 +05:30
Dhanji R. Prasanna
067c69723b fix(cli): use plain text prompt without ANSI colors
ANSI color codes in rustyline prompts cause various issues:
- \x01...\x02 markers break cursor movement
- Separate prefix printing causes gaps or disappearing text

Simplified to plain text prompt: 'butler | finances> '
This ensures reliable cursor positioning and tab completion.
2026-01-22 10:27:27 +05:30
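The plain-text prompt that 067c69723b settles on could be produced by a function like the one below. build_prompt() is named in the surrounding commits, but this signature and the trailing space after '>' in the no-project case are assumptions.

```rust
/// Builds a plain-text prompt with no ANSI codes, so the line editor can
/// measure its width correctly.
/// Assumption: "g3" is the name shown when no agent is active.
fn build_prompt(agent_name: Option<&str>, project_name: Option<&str>) -> String {
    let name = agent_name.unwrap_or("g3");
    match project_name {
        Some(project) => format!("{} | {}> ", name, project),
        None => format!("{}> ", name),
    }
}
```

Colorization is left to a separate mechanism, as in a488a6aa99, which uses rustyline's Highlighter rather than embedding escape codes in the prompt string.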
Dhanji R. Prasanna
cb1f99c41c Revert "fix(cli): use '> ' as readline prompt when project active"
This reverts commit 4d9399f737.
2026-01-22 10:24:21 +05:30
Dhanji R. Prasanna
4d9399f737 fix(cli): use '> ' as readline prompt when project active
Previously used empty string as readline prompt after printing colored
prefix, which caused cursor positioning issues (large gap between
project name and cursor).

Now the prefix contains 'butler | finances' (colored) and readline
gets '> ' as its prompt, so cursor appears immediately after '> '.
2026-01-22 10:18:15 +05:30
Dhanji R. Prasanna
28dd60d4fc fix(cli): separate colored prefix from readline prompt
Rustyline's \x01...\x02 markers for ANSI codes didn't work correctly,
causing cursor positioning issues and breaking line editing.

New approach: build_prompt() returns (prefix, prompt) tuple where:
- prefix: colored text printed before readline (contains ANSI codes)
- prompt: plain text passed to readline (no ANSI codes)

This ensures rustyline correctly calculates line length while still
showing the colored project name.
2026-01-22 09:59:52 +05:30
Dhanji R. Prasanna
be35fa2a7f fix(cli): wrap ANSI codes in prompt for rustyline compatibility
Rustyline needs ANSI escape codes wrapped in \x01...\x02 markers
to correctly calculate visible prompt length. Without this, tab
completion breaks because rustyline miscalculates cursor position.
2026-01-22 08:30:30 +05:30
Dhanji R. Prasanna
3001df3b1a style(cli): simplify project prompt format
Change from: butler |[finances]>
Change to:   butler | finances>
2026-01-22 08:15:18 +05:30
Dhanji R. Prasanna
af8b849311 fix(read_image): use correct media type when resize fails to reduce size
When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.

This caused Anthropic API errors like:
  'Image does not match the provided media type image/jpeg'

Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
2026-01-22 07:58:05 +05:30
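The fallback logic fixed in af8b849311 comes down to keeping the media type and the bytes in sync. The sketch below uses a hypothetical choose_image_payload() helper and assumes the resized output is always JPEG and is used only when strictly smaller than the original.

```rust
/// Picks the bytes to send and the media type that matches them.
/// Returns (bytes, media_type, was_resized).
/// Assumption: `resized_jpeg` is None when no resize was attempted.
fn choose_image_payload<'a>(
    original: &'a [u8],
    original_media_type: &'a str,
    resized_jpeg: Option<&'a [u8]>,
) -> (&'a [u8], &'a str, bool) {
    match resized_jpeg {
        // The resize helped: send the JPEG and label it as JPEG.
        Some(resized) if resized.len() < original.len() => (resized, "image/jpeg", true),
        // The resize did not help (or never ran): send the original bytes
        // with the original media type, and report was_resized = false.
        _ => (original, original_media_type, false),
    }
}
```

The bug was the equivalent of returning the original bytes together with "image/jpeg", which the API rejects for PNG data.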
Dhanji R. Prasanna
022f5c70a6 feat(cli): show active project name in interactive prompt
When a project is loaded via /project, the prompt now shows:
  agent_name |[project_name]>

where the |[project_name]> part is displayed in blue.

Examples:
- Default: g3>
- With project: g3 |[myapp]>
- Agent mode: butler>
- Agent + project: butler |[myapp]>

The prompt automatically resets when /unproject is called.

Added build_prompt() function with 7 unit tests covering all prompt states.
2026-01-22 07:24:00 +05:30
Dhanji R. Prasanna
9325a43ff3 feat(cli): shorten file paths in tool output display
Add three-level path shortening hierarchy for cleaner CLI output:
1. Project path -> <project_name>/... (when project loaded via /project)
2. Workspace path -> ./... (relative to current working directory)
3. Home path -> ~/... (fallback for paths under home directory)

Changes:
- Add shorten_path() and shorten_paths_in_command() functions in display.rs
- Add project_path/project_name fields to ConsoleUiWriter
- Add set_workspace_path(), set_project_path(), clear_project() to UiWriter trait
- Add ui_writer() getter to Agent struct
- Wire up project path setting in /project and /unproject commands
- Set workspace path when creating agents in all CLI modes

Before: ● read_file | /Users/dhanji/icloud/butler/projects/appa_estate/status.md
After:  ● read_file | appa_estate/status.md (with project loaded)
        ● read_file | ./src/main.rs (workspace-relative)
        ● read_file | ~/Documents/file.txt (home-relative)
2026-01-21 21:27:16 +05:30
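The three-level hierarchy from 9325a43ff3 might be sketched like this. shorten_path() is named in the commit, but the signature here is an assumption, and the sketch assumes Unix-style absolute paths.

```rust
use std::path::Path;

/// Shortens `path` for display, trying project, then workspace, then home.
/// `project` is (project_path, project_name); all three roots are optional.
fn shorten_path(
    path: &str,
    project: Option<(&str, &str)>,
    workspace: Option<&str>,
    home: Option<&str>,
) -> String {
    let full = Path::new(path);
    // 1. Project path -> <project_name>/...
    if let Some((project_path, project_name)) = project {
        if let Ok(rest) = full.strip_prefix(project_path) {
            return format!("{}/{}", project_name, rest.display());
        }
    }
    // 2. Workspace path -> ./...
    if let Some(root) = workspace {
        if let Ok(rest) = full.strip_prefix(root) {
            return format!("./{}", rest.display());
        }
    }
    // 3. Home path -> ~/...
    if let Some(root) = home {
        if let Ok(rest) = full.strip_prefix(root) {
            return format!("~/{}", rest.display());
        }
    }
    // No known root matched: show the path unchanged.
    path.to_string()
}
```

Path::strip_prefix matches whole path components, so a workspace of /work/g3 does not accidentally shorten /work/g3-old/file.rs.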
Dhanji R. Prasanna
0f7961d3c6 Remove libVisionBridge.dylib from install script
The VisionBridge library is no longer needed.
2026-01-21 15:27:14 +05:30
Dhanji R. Prasanna
d7d32db4a4 Fix tab completion in agent+chat mode
Remove duplicate logging initialization in agent_mode.rs. Logging is already
initialized in run() before agent mode is dispatched. The duplicate
tracing_subscriber::fmt::layer() was interfering with rustyline's terminal
state, breaking tab completion.
2026-01-21 15:24:27 +05:30
Dhanji R. Prasanna
581de4845c Add /project and /unproject to tab completion 2026-01-21 14:58:23 +05:30
Dhanji R. Prasanna
feb7c3e40d Add /project and /unproject commands for project-specific context
- Add Project struct in crates/g3-cli/src/project.rs with file loading logic
- Load brief.md, contacts.yaml, status.md from project path
- Load projects.md from workspace root for cross-project context
- Project content appended to system message (survives compaction/dehydration)
- /project <path> loads project and auto-submits prompt asking about state
- /unproject clears project content and resets context
- Add set_project_content(), clear_project_content(), has_project_content() to Agent
- Add new_for_test_with_readme() for testing with custom README content
- Add 6 unit tests for Project struct
- Add 9 integration tests for project context behavior
2026-01-21 14:53:30 +05:30
Dhanji R. Prasanna
a34a3b08e9 Rename Project Memory to Workspace Memory
Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.

Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md

11 files changed, ~36 occurrences updated.
2026-01-21 14:08:42 +05:30
Dhanji R. Prasanna
6a5ce11e7b Consolidate redundant assistant message test files
Deleted 4 redundant test files (~956 lines):
- assistant_message_dedup_test.rs (416 lines, 12 tests)
- consecutive_assistant_message_test.rs (248 lines, 6 tests)
- missing_assistant_message_test.rs (100 lines, 4 tests)
- early_return_path_test.rs (192 lines, 5 tests) - whitebox test

Created consolidated assistant_message_test.rs (369 lines, 14 tests):
- Helper function tests for consecutive message detection
- ContextWindow unit tests for normal and tool execution flows
- Bug demonstration tests documenting what bugs looked like
- Invariant tests for user/assistant alternation
- Missing assistant message fallback logic tests

The early_return_path_test was removed because it:
- Referenced specific line numbers in production code (brittle)
- Reimplemented internal logic (whitebox anti-pattern)
- Duplicated coverage from mock_provider_integration_test.rs

All 729 g3-core tests pass.
2026-01-21 10:27:07 +05:30
Dhanji R. Prasanna
c5d549c211 Readability pass: remove verbose comments and clean up tests
- completion.rs: Remove redundant comments, clean up test output (println! -> let _)
- g3_status.rs: Condense doc comments, rename from_str() to parse()
- streaming.rs: Remove obvious doc comments that duplicate function names
- simple_output.rs, ui_writer_impl.rs: Update Status::parse() calls

All changes are behavior-preserving. 132 lines removed, code is more scannable.

Agent: carmack
2026-01-21 07:13:20 +05:30