embedded.rs (937→789 lines, -16%):
- Extract duplicated inference setup into prepare_context() helper
- Extract stop sequence handling into find_stop_sequence() and truncate_at_stop_sequence()
- Add InferenceParams struct to consolidate request parameter extraction
- Add clear section markers for code organization
- Tests now use module-level format functions directly (no duplication)
gemini.rs:
- Extract common request building into build_request() method
- Reduces duplication between complete() and stream() methods
All 399 unit tests pass. Behavior unchanged.
Agent: carmack
Agent: hopper
Added two new integration test files:
1. cache_stats_integration_test.rs (g3-core)
- Tests CacheStats accumulation through streaming completion flow
- Verifies cache hit detection (cache_read_tokens > 0)
- Tests multi-request accumulation of cache statistics
- Verifies cache efficiency and hit rate calculations
- Uses MockProvider to simulate provider usage data
2. gemini_serialization_test.rs (g3-providers)
- Tests Gemini API message format conversion
- Verifies system messages become system_instruction
- Verifies assistant role maps to "model" (Gemini terminology)
- Tests tool conversion to function_declarations format
- Characterizes multi-system-message behavior (last wins)
Both test files follow blackbox/integration testing principles:
- Test observable behavior through stable surfaces
- Do not assert internal implementation details
- Include documentation of what is/is not asserted
README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.
Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names
The LLM can still read README.md on-demand via read_file when needed.
Adds a new --project <PATH> flag that loads project files (brief.md,
contacts.yaml, status.md) at startup, similar to the /project command
but WITHOUT auto-executing the project status prompt.
Changes:
- Add --project flag to cli_args.rs
- Add load_and_validate_project() helper in project.rs (shared by both
--project flag and /project command)
- Modify run_interactive() to accept optional initial_project parameter
- Wire up --project in lib.rs to load project before interactive mode
- Refactor /project command to use shared helper (reduces duplication)
- Add 4 new tests for load_and_validate_project()
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
- Use global OnceLock for llama.cpp backend to prevent BackendAlreadyInitialized error
- Suppress verbose llama.cpp stderr logging during model loading
- Fix provider validation to accept "embedded.name" format (extract type before dot)
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.
Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.
After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()
Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`
Net reduction: ~43 lines (-109/+66)
All 190 tests pass.
Agent: fowler
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
- API call count and cache hit count
- Hit rate percentage
- Total input tokens and cache read/creation tokens
- Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
- Fix test_rehydrate_success race condition by using UUID for unique session IDs
- Add #[serial] attribute to prevent parallel execution conflicts
- Improve cleanup to remove entire session directory tree
- Add characterization test for resize_image_to_dimensions fallback behavior
(documents fix from commit af8b849 for media type preservation)
Agent: hopper
Add test_project_content_survives_compaction() to verify that project
content loaded via /project command persists through context compaction.
This is a CHARACTERIZATION test that validates:
- Project content appended to README message survives compaction
- The README message (containing project content) is preserved as message[1]
- PROJECT INSTRUCTIONS, ACTIVE PROJECT markers, Brief and Status sections
all survive the compaction process
Agent: hopper
Both multiline and single-line input paths in interactive.rs had identical
code for:
- Template processing (process_template)
- Task execution (execute_task_with_retry)
- Auto-memory reminder with error handling
Extracted to a single execute_user_input() helper function that handles
all three steps. This eliminates code-path aliasing where the two paths
could drift over time.
File reduced from 401 to 393 lines (-2%).
All 106 g3-cli tests pass.
Agent: fowler
The previous implementation added the summary as a System message, which
caused "Conversation must start with a user message" errors because the
first non-system message after compaction was Assistant (the preserved
last assistant message).
Fix: Change summary from System to User message, creating valid alternation:
[System Prompt] -> [Summary as USER] -> [Last Assistant] -> [Latest User]
This also prevents system message bloat across multiple compactions since
the summary is now part of the conversation flow and gets replaced on
each compaction.
Added test_second_compaction_no_bloat to verify no accumulation.
Convert remaining ✅ emoji status messages in g3-cli to use the
consistent G3Status formatting system:
- accumulative.rs: 'autonomous run ... [done]'
- commands.rs /clear: 'clearing session ... [done]'
- commands.rs /readme: 'reloading README ... [done/failed/error]'
- commands.rs /unproject: 'unloading project ... [done]'
This provides a consistent 'g3: action ... [status]' format across
all CLI status messages.
When context window compaction occurs, the last assistant message is now
preserved in addition to the system prompt, README, and summary. This
improves continuity after compaction by keeping the LLM's most recent
response, which often contains important context about what was just
done or what comes next.
New message order after compaction:
[System Prompt] -> [README/AGENTS.md] -> [ACD Stub?] -> [Summary] -> [Last Assistant] -> [Latest User?]
Changes:
- Add last_assistant_message field to PreservedMessages struct
- Modify extract_preserved_messages() to find last assistant message
- Modify reset_with_summary_and_stub() to include last assistant message
- Add comprehensive integration tests using MockProvider
Tests cover edge cases:
- No assistant message exists
- Tool-call-only assistant messages (still preserved)
- Multiple assistant messages (only last one preserved)
- No trailing user message
Implement highlight_prompt() in G3Helper to colorize the project portion
of the prompt in blue. This uses rustyline's proper mechanism for ANSI
codes in prompts, which correctly handles cursor positioning.
Prompt 'butler | finances> ' now shows '| finances>' in blue.
ANSI color codes in rustyline prompts cause various issues:
- \x01...\x02 markers break cursor movement
- Separate prefix printing causes gaps or disappearing text
Simplified to plain text prompt: 'butler | finances> '
This ensures reliable cursor positioning and tab completion.
Previously used empty string as readline prompt after printing colored
prefix, which caused cursor positioning issues (large gap between
project name and cursor).
Now the prefix contains 'butler | finances' (colored) and readline
gets '> ' as its prompt, so cursor appears immediately after '> '.
Rustyline's \x01...\x02 markers for ANSI codes didn't work correctly,
causing cursor positioning issues and breaking line editing.
New approach: build_prompt() returns (prefix, prompt) tuple where:
- prefix: colored text printed before readline (contains ANSI codes)
- prompt: plain text passed to readline (no ANSI codes)
This ensures rustyline correctly calculates line length while still
showing the colored project name.
When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.
This caused Anthropic API errors like:
'Image does not match the provided media type image/jpeg'
Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
When a project is loaded via /project, the prompt now shows:
agent_name |[project_name]>
where the |[project_name]> part is displayed in blue.
Examples:
- Default: g3>
- With project: g3 |[myapp]>
- Agent mode: butler>
- Agent + project: butler |[myapp]>
The prompt automatically resets when /unproject is called.
Added build_prompt() function with 7 unit tests covering all prompt states.
Remove duplicate logging initialization in agent_mode.rs. Logging is already
initialized in run() before agent mode is dispatched. The duplicate
tracing_subscriber::fmt::layer() was interfering with rustyline's terminal
state, breaking tab completion.
Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.
Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md
11 files changed, ~36 occurrences updated.
Update test assertions to match new heading color scheme:
- H1: bold pink (\x1b[1;95m) instead of bold magenta
- H2: purple/magenta (\x1b[35m) - unchanged
- H3: cyan (\x1b[36m) instead of magenta
Removed dead code that was never used by any g3 tool:
- macax/ module (accessibility control via AXApplication, AXElement)
- move_mouse() and click_at() methods from ComputerController trait
- macax_demo.rs and test_type_text.rs examples
The ComputerController trait now only has take_screenshot(),
which is the only method actually used by the screenshot tool.
VisionBridge was a Swift library for Apple Vision OCR that was built
every compile but never actually used by any g3 tool.
Removed:
- vision-bridge/ Swift package directory
- src/ocr/ module (vision.rs, tesseract.rs, mod.rs)
- OCR methods from ComputerController trait
- OCR-related code from platform implementations
- TextLocation type (no longer needed)
- test_vision.rs example
Simplified:
- build.rs (now empty, no Swift compilation)
- MacOSController (no longer holds OCR engine)
- LinuxController and WindowsController (stub implementations)
Build time improvement: No more 'Building VisionBridge Swift package...'
messages on every compile.
Warnings fixed:
- Remove unused 'warn' import from retry.rs
- Prefix unused 'output' param with underscore
- Prefix unused 'rel_start' with underscore
- Add #[allow(dead_code)] to G3Status::info()
Message format tweaked per feedback:
- 'g3: model overloaded [error]' (no attempt info)
- 'g3: retrying in 2.2s (1/3) ... [done]' (attempt info moved here)
- Handle empty error message in Status::Error to show just '[error]'