README.md is no longer auto-loaded into the LLM context at startup.
This saves ~4,600 tokens per session while AGENTS.md and memory.md
still provide all critical information for code tasks.
Changes:
- Delete read_project_readme() function
- Remove readme_content parameter from combine_project_content()
- Rename extract_readme_heading() -> extract_project_heading()
- Rename Agent constructors: *_with_readme_* -> *_with_project_context_*
- Update context preservation to only check for Agent Configuration
- Remove has_readme field from LoadedContent
- Update all tests to use new markers and function names
The LLM can still read README.md on-demand via read_file when needed.
Adds a new --project <PATH> flag that loads project files (brief.md,
contacts.yaml, status.md) at startup, similar to the /project command
but WITHOUT auto-executing the project status prompt.
Changes:
- Add --project flag to cli_args.rs
- Add load_and_validate_project() helper in project.rs (shared by both
--project flag and /project command)
- Modify run_interactive() to accept optional initial_project parameter
- Wire up --project in lib.rs to load project before interactive mode
- Refactor /project command to use shared helper (reduces duplication)
- Add 4 new tests for load_and_validate_project()
Added a new section documenting local LLM performance on complex agentic
tasks (comic book repacking test case). Includes:
- Cloud model baseline (Claude Opus 4.5, Sonnet 4.5, Claude 4 family)
- Local model ratings (Qwen3-32B, Qwen3-14B, GLM-4 9B, Qwen3-4B)
- Key findings about MoE vs dense models
- Configuration example for embedded providers
- Add GeminiProvider with streaming and native tool calling
- Support gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro/flash models
- Model-specific context window detection (1M-2M tokens)
- Message conversion: assistant -> model role mapping
- System messages extracted to system_instruction field
- Tool schema conversion with functionCall/functionResponse parts
- SSE streaming with JSON array buffer parsing
- 8 unit tests for conversion and parsing logic
- Register provider in g3-core and validate in g3-cli
The resolve_max_tokens() function was returning 2048 for embedded providers,
which caused responses to be truncated prematurely. Increased to 8192 to
allow the provider's own effective_max_tokens() calculation to work properly.
- Add context_window_size() method to LLMProvider trait
- Implement for EmbeddedProvider to return the auto-detected context length
- Update Agent to query provider directly instead of using hardcoded defaults
- Removes need for model-specific context length mappings
- Use global OnceLock for llama.cpp backend to prevent BackendAlreadyInitialized error
- Suppress verbose llama.cpp stderr logging during model loading
- Fix provider validation to accept "embedded.name" format (extract type before dot)
Eliminate code-path aliasing in Agent construction methods by introducing
a single `build_agent()` helper that all constructors delegate to.
Before: 3 nearly-identical `Ok(Self { ... })` blocks (~30 lines each)
with subtle differences in auto_compact, is_autonomous, quiet, and
computer_controller fields - prone to drift over time.
After: Single canonical `build_agent()` method that constructs Agent
with all fields. All public constructors delegate to this single path:
- new_for_test() -> new_for_test_with_readme() -> build_agent()
- new_with_mode_and_readme() -> build_agent()
Changes:
- Add `build_agent()` private helper method (single source of truth)
- Simplify `new_for_test()` to delegate to `new_for_test_with_readme()`
- Update `new_for_test_with_readme()` to use `build_agent()`
- Update `new_with_mode_and_readme()` to use `build_agent()`
Net reduction: ~43 lines (-109/+66)
All 190 tests pass.
Agent: fowler
- Extend Usage struct with cache_creation_tokens and cache_read_tokens fields
- Parse Anthropic cache_creation_input_tokens and cache_read_input_tokens
- Parse OpenAI prompt_tokens_details.cached_tokens for automatic prefix caching
- Add CacheStats struct to Agent for cumulative tracking across API calls
- Add "Prompt Cache Statistics" section to /stats output showing:
- API call count and cache hit count
- Hit rate percentage
- Total input tokens and cache read/creation tokens
- Cache efficiency (% of input served from cache)
- Update all provider implementations and test files
- Fix test_rehydrate_success race condition by using UUID for unique session IDs
- Add #[serial] attribute to prevent parallel execution conflicts
- Improve cleanup to remove entire session directory tree
- Add characterization test for resize_image_to_dimensions fallback behavior
(documents fix from commit af8b849 for media type preservation)
Agent: hopper
Add test_project_content_survives_compaction() to verify that project
content loaded via /project command persists through context compaction.
This is a CHARACTERIZATION test that validates:
- Project content appended to README message survives compaction
- The README message (containing project content) is preserved as message[1]
- PROJECT INSTRUCTIONS, ACTIVE PROJECT markers, Brief and Status sections
all survive the compaction process
Agent: hopper
Both multiline and single-line input paths in interactive.rs had identical
code for:
- Template processing (process_template)
- Task execution (execute_task_with_retry)
- Auto-memory reminder with error handling
Extracted to a single execute_user_input() helper function that handles
all three steps. This eliminates code-path aliasing where the two paths
could drift over time.
File reduced from 401 to 393 lines (-2%).
All 106 g3-cli tests pass.
Agent: fowler
The previous implementation added the summary as a System message, which
caused "Conversation must start with a user message" errors because the
first non-system message after compaction was Assistant (the preserved
last assistant message).
Fix: Change summary from System to User message, creating valid alternation:
[System Prompt] -> [Summary as USER] -> [Last Assistant] -> [Latest User]
This also prevents system message bloat across multiple compactions since
the summary is now part of the conversation flow and gets replaced on
each compaction.
Added test_second_compaction_no_bloat to verify no accumulation.
- Build g3 and studio explicitly with -p flags
- Detect shell and rc file (zsh/bash/fish)
- Auto-add PATH to rc file with user confirmation
- Handle case where PATH is in rc but not loaded
Convert remaining ✅ emoji status messages in g3-cli to use the
consistent G3Status formatting system:
- accumulative.rs: 'autonomous run ... [done]'
- commands.rs /clear: 'clearing session ... [done]'
- commands.rs /readme: 'reloading README ... [done/failed/error]'
- commands.rs /unproject: 'unloading project ... [done]'
This provides a consistent 'g3: action ... [status]' format across
all CLI status messages.
When context window compaction occurs, the last assistant message is now
preserved in addition to the system prompt, README, and summary. This
improves continuity after compaction by keeping the LLM's most recent
response, which often contains important context about what was just
done or what comes next.
New message order after compaction:
[System Prompt] -> [README/AGENTS.md] -> [ACD Stub?] -> [Summary] -> [Last Assistant] -> [Latest User?]
Changes:
- Add last_assistant_message field to PreservedMessages struct
- Modify extract_preserved_messages() to find last assistant message
- Modify reset_with_summary_and_stub() to include last assistant message
- Add comprehensive integration tests using MockProvider
Tests cover edge cases:
- No assistant message exists
- Tool-call-only assistant messages (still preserved)
- Multiple assistant messages (only last one preserved)
- No trailing user message
Implement highlight_prompt() in G3Helper to colorize the project portion
of the prompt in blue. This uses rustyline's proper mechanism for ANSI
codes in prompts, which correctly handles cursor positioning.
Prompt 'butler | finances> ' now shows '| finances>' in blue.
ANSI color codes in rustyline prompts cause various issues:
- \x01...\x02 markers break cursor movement
- Separate prefix printing causes gaps or disappearing text
Simplified to plain text prompt: 'butler | finances> '
This ensures reliable cursor positioning and tab completion.
Previously used empty string as readline prompt after printing colored
prefix, which caused cursor positioning issues (large gap between
project name and cursor).
Now the prefix contains 'butler | finances' (colored) and readline
gets '> ' as its prompt, so cursor appears immediately after '> '.
Rustyline's \x01...\x02 markers for ANSI codes didn't work correctly,
causing cursor positioning issues and breaking line editing.
New approach: build_prompt() returns (prefix, prompt) tuple where:
- prefix: colored text printed before readline (contains ANSI codes)
- prompt: plain text passed to readline (no ANSI codes)
This ensures rustyline correctly calculates line length while still
showing the colored project name.
When resize_image_to_dimensions() returns a larger file than the original,
we fall back to using the original bytes. Previously, was_resized was set
to true if the original dimensions exceeded MAX_IMAGE_DIMENSION, which
caused final_media_type to be set to 'image/jpeg' even though we were
using the original PNG bytes.
This caused Anthropic API errors like:
'Image does not match the provided media type image/jpeg'
Fix: Set was_resized=false when falling back to original bytes, so the
original media type (detected from magic bytes) is preserved.
When a project is loaded via /project, the prompt now shows:
agent_name |[project_name]>
where the |[project_name]> part is displayed in blue.
Examples:
- Default: g3>
- With project: g3 |[myapp]>
- Agent mode: butler>
- Agent + project: butler |[myapp]>
The prompt automatically resets when /unproject is called.
Added build_prompt() function with 7 unit tests covering all prompt states.
Remove duplicate logging initialization in agent_mode.rs. Logging is already
initialized in run() before agent mode is dispatched. The duplicate
tracing_subscriber::fmt::layer() was interfering with rustyline's terminal
state, breaking tab completion.
Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.
Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md
11 files changed, ~36 occurrences updated.