Rename all references from "Project Memory" to "Workspace Memory" to avoid
future conflation if a "project" concept is introduced later.
Changes:
- Rename read_project_memory() -> read_workspace_memory()
- Update all prompts, tool descriptions, and comments
- Update header parsing in memory.rs to use "# Workspace Memory"
- Update display detection for "=== Workspace Memory ==="
- Update documentation and analysis/memory.md
11 files changed, ~36 occurrences updated.
Update test assertions to match new heading color scheme:
- H1: bold pink (\x1b[1;95m) instead of bold magenta
- H2: purple/magenta (\x1b[35m) - unchanged
- H3: cyan (\x1b[36m) instead of magenta
Removed dead code that was never used by any g3 tool:
- macax/ module (accessibility control via AXApplication, AXElement)
- move_mouse() and click_at() methods from ComputerController trait
- macax_demo.rs and test_type_text.rs examples
The ComputerController trait now only has take_screenshot(),
which is the only method actually used by the screenshot tool.
VisionBridge was a Swift library for Apple Vision OCR that was built on
every compile but never used by any g3 tool.
Removed:
- vision-bridge/ Swift package directory
- src/ocr/ module (vision.rs, tesseract.rs, mod.rs)
- OCR methods from ComputerController trait
- OCR-related code from platform implementations
- TextLocation type (no longer needed)
- test_vision.rs example
Simplified:
- build.rs (now empty, no Swift compilation)
- MacOSController (no longer holds OCR engine)
- LinuxController and WindowsController (stub implementations)
Build time improvement: No more 'Building VisionBridge Swift package...'
messages on every compile.
Warnings fixed:
- Remove unused 'warn' import from retry.rs
- Prefix unused 'output' param with underscore
- Prefix unused 'rel_start' with underscore
- Add #[allow(dead_code)] to G3Status::info()
Message format tweaked per feedback:
- 'g3: model overloaded [error]' (no attempt info)
- 'g3: retrying in 2.2s (1/3) ... [done]' (attempt info moved here)
- Handle empty error message in Status::Error to show just '[error]'
The prefix was causing duplication when users typed 'Task: ...' themselves,
resulting in '📋 Task: Task: ...' in context dumps.
User messages are now stored as-is without any prefix.
Agent prompt files (both workspace agents/<name>.md and embedded)
now support template variables like {{today}}.
This allows agent definitions to include dynamic content:
    # My Agent
    Today is {{today}}. Your mission is...
Replace '📄 Context dumped to: <filename>' with 'g3: context dumped to <filename> [done]'
where g3: is bold green, filename is cyan, and [done] is bold green.
Add G3Status::complete_with_path() method for status messages with highlighted paths.
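A minimal sketch of how such a status line could be assembled, assuming
standard ANSI escape codes for the colors named above. The free function
format_complete_with_path() and its signature are hypothetical stand-ins for
G3Status::complete_with_path(), whose real signature is not shown here.

```rust
/// Hypothetical helper: builds "g3: <message> <path> [done]" with
/// bold green "g3:" and "[done]" and a cyan path.
fn format_complete_with_path(message: &str, path: &str) -> String {
    const BOLD_GREEN: &str = "\x1b[1;32m";
    const CYAN: &str = "\x1b[36m";
    const RESET: &str = "\x1b[0m";
    format!("{BOLD_GREEN}g3:{RESET} {message} {CYAN}{path}{RESET} {BOLD_GREEN}[done]{RESET}")
}
```

Calling format_complete_with_path("context dumped to", "<filename>") yields the
colored equivalent of 'g3: context dumped to <filename> [done]'.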
Supports {{var}} syntax for variable substitution in included prompt files.
Currently supported variables:
- {{today}}: Current date in ISO format (YYYY-MM-DD)
Unknown variables trigger a warning and are left unchanged.
- Add template.rs module with process_template() function
- Integrate template processing into read_include_prompt()
- Add comprehensive tests for template processing
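A minimal sketch of the substitution described above, not the actual
template.rs code. It assumes the current date is passed in as a preformatted
YYYY-MM-DD string and that warnings are returned to the caller rather than
logged; both choices, and the signature itself, are illustrative assumptions.

```rust
/// Hypothetical sketch: replaces `{{today}}` with `today` and leaves any
/// other `{{name}}` placeholder unchanged, recording a warning for it.
fn process_template(input: &str, today: &str) -> (String, Vec<String>) {
    let mut out = String::with_capacity(input.len());
    let mut warnings = Vec::new();
    let mut rest = input;

    while let Some(start) = rest.find("{{") {
        // Copy everything before the opening braces verbatim.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let name = &after_open[..end];
                match name {
                    "today" => out.push_str(today),
                    _ => {
                        warnings.push(format!("unknown template variable: {name}"));
                        // Keep the whole placeholder, braces included.
                        out.push_str(&rest[start..start + 2 + end + 2]);
                    }
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // No closing braces: keep the remainder as-is and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    (out, warnings)
}
```

With today = "2025-01-15", the input "Today is {{today}}. Hi {{user}}." becomes
"Today is 2025-01-15. Hi {{user}}." plus one warning for {{user}}.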
Change from multi-line verbose format to single-line compact format:
Before:
    ⚡ DEHYDRATED CONTEXT (fragment_id: 188c7ac71613)
    • 8 messages (4 user, 4 assistant)
    • 3 tool calls (shell ×3)
    • ~299 tokens saved
    To restore this history, call: rehydrate(fragment_id: "188c7ac71613")
After:
    ⚡ DEHYDRATED CONTEXT: 3 tool calls (shell x3), 8 total msgs. To restore, call: rehydrate(fragment_id: "188c7ac71613")
- Combine all info into single line
- Remove tokens saved (not essential for rehydration decision)
- Use ASCII 'x' instead of '×' for simplicity
- Add 'no tool calls' case for fragments without tools
- Update related tests
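A rough sketch of a formatter that produces the compact line shown above. The
function name, its parameters, and the exact wording of the 'no tool calls'
case are assumptions for illustration; only the with-tools output is taken
from the "After" example.

```rust
/// Hypothetical sketch of the single-line stub.
/// `tool_calls` holds (tool name, count) pairs in display order.
/// Pluralization is not handled; the source only shows the plural form.
fn format_dehydrated_stub(
    fragment_id: &str,
    total_msgs: usize,
    tool_calls: &[(&str, usize)],
) -> String {
    let total_calls: usize = tool_calls.iter().map(|(_, n)| n).sum();
    let tools = if total_calls == 0 {
        // Assumed wording for fragments without tools.
        "no tool calls".to_string()
    } else {
        let breakdown: Vec<String> = tool_calls
            .iter()
            .map(|(name, n)| format!("{name} x{n}"))
            .collect();
        format!("{total_calls} tool calls ({})", breakdown.join(", "))
    };
    format!(
        "⚡ DEHYDRATED CONTEXT: {tools}, {total_msgs} total msgs. \
         To restore, call: rehydrate(fragment_id: \"{fragment_id}\")"
    )
}
```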
Centralize tool output formatting logic that was duplicated/scattered in
stream_completion_with_tools(). This eliminates code-path aliasing where
tool type checks were done in multiple places.
Changes:
- Add ToolOutputFormat enum (SelfHandled, Compact, Regular)
- Add format_tool_result_summary() for centralized formatting decisions
- Add is_compact_tool() and is_self_handled_tool() helper functions
- Move parse_diff_stats() from lib.rs to streaming.rs
- Simplify tool execution display logic in lib.rs using new helpers
Net effect: -86 lines in lib.rs, +112 lines in streaming.rs
The streaming.rs additions are reusable, well-named functions.
All 585+ workspace tests pass.
Agent: fowler
Consolidate scattered state variables in the 834-line stream_completion_with_tools()
function to use the existing StreamingState and IterationState structs from
streaming.rs. This eliminates code-path aliasing where state was tracked in
multiple places and makes the streaming loop easier to reason about.
Changes:
- Add assistant_message_added field to StreamingState
- Add stream_stop_reason field to IterationState
- Replace 8 inline state variables with StreamingState::new()
- Replace 7 iteration-local variables with IterationState::new()
- All 585 workspace tests pass
This is a pure refactor with no behavior changes. The state structs were already
defined in streaming.rs but not used in the main streaming loop.
Agent: fowler
- Extract handle_command() from interactive.rs to new commands.rs module
(320 lines, 15 match arms for /help, /compact, /thinnify, etc.)
- Fix orphaned tests in completion.rs that were outside mod tests block
- Add #[allow(dead_code)] to with_include_prompt_filename() (used in tests)
- interactive.rs reduced from 595 to 290 lines
Agent: fowler
Created display.rs module with shared display functions:
- format_workspace_path() / print_workspace_path()
- LoadedContent struct for tracking loaded project files
- print_loaded_status() for status line display
- print_project_heading() for README heading
Updated interactive.rs and agent_mode.rs to use the new module,
eliminating duplicated workspace path formatting and loaded items
status line logic.
Results:
- interactive.rs: 641 → 595 lines (-46)
- agent_mode.rs: 312 → 288 lines (-24)
- New display.rs: 197 lines with 5 unit tests
Agent: fowler
The --acd flag was being checked AFTER the agent mode early return,
so it was never applied when running with --agent.
Fix: Pass acd_enabled parameter to run_agent_mode() and call
agent.set_acd_enabled(true) when the flag is set.
Tab completion for /resume now shows at most 8 sessions, sorted newest first.
This applies whether the user types /resume <TAB> or /resume abc<TAB>.
Implementation:
- list_sessions() returns all sessions sorted by mtime (newest first)
- Completion filters by prefix, then takes first 8 matches
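The filter-then-limit step can be sketched as follows. This assumes the
session IDs are already sorted newest first (as list_sessions() is described
to return them); complete_sessions() and MAX_SESSION_COMPLETIONS are
hypothetical names.

```rust
/// Hypothetical cap on the number of session completions shown.
const MAX_SESSION_COMPLETIONS: usize = 8;

/// Keeps sessions starting with `prefix`, preserving the input order
/// (assumed newest first), and returns at most eight of them.
fn complete_sessions<'a>(sessions: &'a [String], prefix: &str) -> Vec<&'a str> {
    sessions
        .iter()
        .map(String::as_str)
        .filter(|id| id.starts_with(prefix))
        .take(MAX_SESSION_COMPLETIONS)
        .collect()
}
```

Filtering before truncating means an older session is still reachable by
typing enough of its prefix, even when more than 8 sessions exist.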
Phase 2 of tab completion: semantic completion for session IDs.
Features:
- /resume <TAB> lists all available sessions from .g3/sessions/
- /resume abc<TAB> filters to sessions starting with 'abc'
- Gracefully returns empty if .g3/sessions/ doesn't exist
Implementation:
- Added list_sessions() helper method to G3Helper
- Added Case 4 in complete() for /resume command
- Updated module docs to reflect new capability
Tests:
- test_resume_completion_lists_sessions - verifies listing and filtering
- test_resume_completion_graceful_no_panic - verifies no crash without sessions dir
Verifies that tab completion correctly ignores:
- Bare quotes: "<TAB> - no path prefix, no completion
- Quoted non-paths: "hello world<TAB> - not a path, no completion
- Quoted text without path prefix: "foo<TAB> - no completion
Also fixes test placement (moved tests inside mod tests block)
Edge cases now handled:
1. Unclosed quotes: "~/My <TAB> - completes paths inside quotes
2. Backslash escapes: ~/My\ <TAB> - unescapes before completing
3. Closed quotes: "~/My Files/"<TAB> - works correctly
Key changes:
- extract_word() now tracks backslash escapes (prev_was_backslash)
- is_path_prefix() strips leading quotes before checking
- Added strip_quotes() and unescape_path() helper methods
- complete() now:
  - Strips quotes and unescapes paths before calling FilenameCompleter
  - Re-wraps completions in quotes or escapes as appropriate
  - Preserves user's quoting style (double vs single quotes)
  - Uses backslash escapes if user was already using them
Tests added:
- test_actual_completion_with_quotes - verifies all three edge cases
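A minimal sketch of what the two helpers might look like. The names
strip_quotes() and unescape_path() come from the change list above, but the
signatures and exact behavior shown here are assumptions.

```rust
/// Hypothetical sketch: drops a leading quote and, if present, the
/// matching trailing quote. Unquoted input is returned unchanged.
fn strip_quotes(word: &str) -> &str {
    match word.chars().next() {
        Some(q) if q == '"' || q == '\'' => {
            let inner = &word[1..];
            // The closing quote is optional: the user may still be typing.
            inner.strip_suffix(q).unwrap_or(inner)
        }
        _ => word,
    }
}

/// Hypothetical sketch: turns `My\ Files` into `My Files` by dropping
/// each backslash and keeping the character it escapes.
fn unescape_path(word: &str) -> String {
    let mut out = String::with_capacity(word.len());
    let mut prev_was_backslash = false;
    for c in word.chars() {
        if prev_was_backslash {
            out.push(c);
            prev_was_backslash = false;
        } else if c == '\\' {
            prev_was_backslash = true;
        } else {
            out.push(c);
        }
    }
    if prev_was_backslash {
        // Keep a trailing lone backslash rather than silently dropping it.
        out.push('\\');
    }
    out
}
```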
Path completion now works for:
- ./<TAB> - current directory
- ../<TAB> - parent directory
- ~/<TAB> - home directory
- /<TAB> (not at start of line) - root directory
Command completion (/<TAB>) only triggers at the start of the line.
If no command matches, falls through to path completion (e.g., /etc).
Quote-aware word extraction handles paths with spaces:
- "~/My Files/<TAB>" works correctly
Added tests for:
- Path prefix detection
- Word extraction with quotes
- Command vs path disambiguation
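The command-versus-path decision described above can be sketched like this.
CompletionKind, classify(), and the command list passed in are hypothetical;
is_path_prefix() is named after the helper mentioned in a later change but
its body here is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum CompletionKind {
    Command,
    Path,
    Nothing,
}

/// Hypothetical sketch: true for words that start with ./, ../, ~/ or /,
/// ignoring any leading quote characters.
fn is_path_prefix(word: &str) -> bool {
    let word = word.trim_start_matches(|c: char| c == '"' || c == '\'');
    ["./", "../", "~/", "/"].iter().any(|p| word.starts_with(*p))
}

/// Commands win only at the start of the line and only if some command
/// matches; otherwise the word falls through to path completion.
fn classify(word: &str, at_line_start: bool, commands: &[&str]) -> CompletionKind {
    if at_line_start && word.starts_with('/') && commands.iter().any(|c| c.starts_with(word)) {
        return CompletionKind::Command;
    }
    if is_path_prefix(word) {
        CompletionKind::Path
    } else {
        CompletionKind::Nothing
    }
}
```

Under these assumptions, "/he" at the start of the line completes as a
command, while "/etc" at the same position falls through to path completion.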
New interactive command: /run <file-path>
- Reads the specified file and executes its content as a prompt
- Supports tilde expansion for home directory paths
- Behaves exactly like pasting the file content into the g3> prompt
- Shows helpful error messages for missing files or empty content
Extract a new g3_status module in g3-cli that provides consistent formatting
for all 'g3:' prefixed system status messages.
Key changes:
- Add G3Status struct with methods for progress, done, failed, error, etc.
- Add Status enum with Done, Failed, Error, Resolved, Insufficient, NoChanges
- Add ThinResult struct in g3-core for semantic thinning data
- Update UiWriter trait with print_thin_result() method
- Refactor context thinning to return ThinResult instead of formatted strings
- Update all callers to use the new centralized formatting
- Session resume/decline messages now use G3Status
- Compaction status messages now use G3Status
This maintains clean separation of concerns: g3-core emits semantic data,
g3-cli handles all terminal formatting and colors.
Adds 8 unit tests verifying:
- The research tool has a 20-minute timeout
- All other tools (shell, read_file, write_file, str_replace, code_search,
  webdriver_*, etc.) have the standard 8-minute timeout
- A comprehensive test, test_only_research_has_extended_timeout, covers 19 tools
This ensures future changes don't accidentally affect other tool timeouts.
The research tool often runs past 8 minutes due to web browsing and
analysis. Increased its timeout to 20 minutes while keeping other
tools at 8 minutes.
Changes:
- Tool timeout is now tool-specific (20 min for research, 8 min for others)
- Timeout error message now shows the correct duration for each tool
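A minimal sketch of a per-tool timeout lookup, assuming the research tool is
identified by the name "research". The function names and the error message
wording are hypothetical; only the 20-minute and 8-minute durations come from
the description above.

```rust
use std::time::Duration;

/// Hypothetical sketch: 20 minutes for research, 8 minutes for everything else.
fn tool_timeout(tool_name: &str) -> Duration {
    match tool_name {
        "research" => Duration::from_secs(20 * 60),
        _ => Duration::from_secs(8 * 60),
    }
}

/// Derives the reported duration from the same lookup, so the message
/// cannot drift from the timeout that was actually applied.
fn timeout_message(tool_name: &str) -> String {
    let minutes = tool_timeout(tool_name).as_secs() / 60;
    format!("Tool '{tool_name}' timed out after {minutes} minutes")
}
```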
Adds test_llm_repeats_text_before_each_tool_call() which documents the
scenario where the LLM re-outputs the same preamble text before each
tool call in a multi-tool response.
Analysis showed this is LLM behavior, not a g3 bug:
- Each assistant message is correctly stored with different tool calls
- The duplicate display is the LLM choosing to repeat context
- Storage is correct, display accurately reflects LLM output
Decision: Accept as LLM behavior (Option B). Future LLM improvements
may resolve this naturally without g3 code changes.
Two cosmetic bugs fixed:
1. JSON inside code fences was being filtered - now tracks fence state
   and passes through all content inside ``` ... ``` blocks
2. Indented JSON was being filtered - now recognizes that real tool
   calls are never indented, so indented JSON is always documentation
Changes:
- Added in_code_fence and fence_buffer fields to FilterState
- Added track_code_fence() to detect ``` markers (with/without language)
- Added pass_through_char() for content inside code fences
- Modified '{' handling to only filter when no leading whitespace
- Added 4 new unit tests for code fence and indentation cases
- Updated 3 stress tests to expect new (correct) behavior
All 16 filter_json unit tests and 59 stress tests pass.
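The two rules can be illustrated with a simplified, line-based sketch. The
filter described above works on a character stream with FilterState;
FenceTracker and is_filter_candidate() below are hypothetical names that only
demonstrate the decision, not that implementation.

```rust
/// Hypothetical line-based stand-in for the fence tracking in FilterState.
struct FenceTracker {
    in_code_fence: bool,
}

impl FenceTracker {
    fn new() -> Self {
        FenceTracker { in_code_fence: false }
    }

    /// Returns true only if `line` could be a tool call worth filtering:
    /// it must be outside any code fence and start with '{' in column 0.
    fn is_filter_candidate(&mut self, line: &str) -> bool {
        if line.trim_start().starts_with("```") {
            // A fence marker (with or without a language tag) toggles state.
            self.in_code_fence = !self.in_code_fence;
            return false;
        }
        !self.in_code_fence && line.starts_with('{')
    }
}
```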
Adds 3 new tests to json_parsing_stress_test.rs:
- test_tool_result_with_json_not_parsed: Full agent integration test proving
  that JSON in tool results (sent TO the LLM) is never parsed by the
  streaming parser (which only sees LLM output)
- test_parser_only_processes_completion_chunks: Documents that StreamingToolParser
  only accepts CompletionChunk, not Message objects
- test_architectural_separation_documented: Documents the data flow showing
  tool results flow TO the LLM while the parser only sees FROM the LLM
This proves the architectural guarantee: there is no code path where
tool result content could be parsed as a tool call, because:
1. Tool results are Message objects added to context_window
2. The streaming parser only processes CompletionChunk from provider.stream_completion()
3. These are completely separate data types flowing in opposite directions
Total: 41 JSON parsing stress tests now pass.
Added 6 new integration tests for stream_completion_with_tools:
- test_text_before_tool_call_preserved: text before native tool call is saved
- test_native_tool_call_execution: native tool calls execute correctly
- test_duplicate_tool_calls_skipped: sequential duplicates are detected
- test_json_fallback_tool_calling: JSON tool calls work without native support
- test_text_after_tool_execution_preserved: follow-up text is saved
- test_multiple_tool_calls_executed: multiple tool calls in sequence work
Also added MockResponse helper methods:
- text_then_native_tool(): text followed by native tool call
- duplicate_native_tool_calls(): same tool call twice (for dedup testing)
Fixed text_with_json_tool() to ensure "tool" key comes before "args"
(serde_json alphabetizes keys, breaking pattern detection).
Total: 18 integration tests covering historical bugs and core behaviors.
The bug was caused by mark_tool_calls_consumed() being called after
displaying each chunk, which advanced last_consumed_position to the
end of the current buffer. When the next chunk arrived with JSON,
the unchecked_buffer started at position 0 of the slice, causing
is_on_own_line() to return true (position 0 is always "on its own line").
Removed the problematic mark_tool_calls_consumed() call from the
"no tool executed" branch. The remaining call after actual tool
execution is correct and necessary.
Added integration test that verifies inline JSON in prose is not
detected as a tool call.
Adds a configurable mock LLM provider that can simulate various behaviors:
- Text-only responses (single or multi-chunk streaming)
- Native tool calls
- JSON tool calls in text
- Truncated responses (max_tokens)
- Multi-turn conversations
Features:
- Builder pattern for easy test setup
- Request tracking for verification
- Preset scenarios for common patterns
- Full LLMProvider trait implementation
Also adds integration tests that use MockProvider to test the
stream_completion_with_tools code path, including:
- test_butler_bug_scenario: reproduces the exact bug where text-only
responses were not saved to context, causing consecutive user messages
This enables testing complex streaming behaviors without real API calls.
Bug: When the LLM responded with text only (no tool calls), the assistant
message was sometimes not saved to the context window. This left consecutive
user messages in the context, so the LLM lost track of its previous responses.
Root causes found and fixed:
1. Early return path (line ~2535): When stream finishes with no tools executed
   in previous iterations (any_tool_executed=false), the code returned early
   without saving the assistant message. Fixed by adding save before return.
2. Post-loop path (line ~2657): When raw_clean was empty but current_response
   had content, no message was saved. Fixed by falling back to current_response.
Both paths now properly save the assistant message before returning.
The assistant_message_added flag prevents any duplication.
Added tests:
- missing_assistant_message_test.rs: verifies the fallback logic
- assistant_message_dedup_test.rs: verifies no duplicate messages
- consecutive_assistant_message_test.rs: verifies alternation invariant
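The fallback from root cause 2 can be sketched as a small selection function.
The variable names raw_clean and current_response come from the description
above; the function itself, and treating whitespace-only text as empty, are
assumptions for illustration.

```rust
/// Hypothetical sketch: picks the text to save as the assistant message.
/// Prefers the cleaned text, falls back to the raw response, and returns
/// None only when both are empty (whitespace counts as empty here).
fn assistant_text_to_save<'a>(raw_clean: &'a str, current_response: &'a str) -> Option<&'a str> {
    if !raw_clean.trim().is_empty() {
        Some(raw_clean)
    } else if !current_response.trim().is_empty() {
        Some(current_response)
    } else {
        None
    }
}
```

A caller would still consult a flag such as assistant_message_added before
saving, so that the two fixed paths cannot both store the same message.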
Output is now a single line:
Session number to resume (Enter to cancel): 1 ... resuming scout_88871653e8e5f4f7 [done]
- Session ID displayed in cyan
- [done] displayed in bold green
- [error: ...] displayed in bold red on failure
- Added print_inline() to SimpleOutput for inline prompts
The Anthropic API was rejecting requests with multiple high-resolution images
(~2000x3000 pixels each) even though individual file sizes were under limits.
Root cause: The code only checked per-image file size (3.75MB), not dimensions.
The Claude documentation recommends images ≤1568px on the longest edge, and the
API has a 32MB total request limit.
Changes:
- Add MAX_IMAGE_DIMENSION (1568px) and MAX_TOTAL_IMAGE_PAYLOAD (20MB) constants
- Trigger resize when dimensions > 1568px (not just file size > 3.75MB)
- Add new resize_image_to_dimensions() for dimension-constrained resizing
- Track cumulative payload size across multiple images
- Warn if total payload exceeds recommended limit
Test results with Walking Dead comic images:
- WD_0001_0001.jpg: 800KB 1987x3057 → 321KB 1019x1568
- WD_0001_1064.png: 150KB 1988x3057 → 143KB 1020x1568
- WD_0002_0001.jpg: 1023KB 1988x3056 → 292KB 1020x1568
- Total payload: ~2.5MB → ~1MB base64
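A minimal sketch of the dimension calculation, assuming aspect-ratio-preserving
scaling with round-to-nearest. fit_within_max_dimension() is a hypothetical
name; the real resize_image_to_dimensions() may round differently, although
this version happens to reproduce the dimensions reported above.

```rust
const MAX_IMAGE_DIMENSION: u32 = 1568;

/// Hypothetical sketch: returns the target size so that the longest edge
/// is at most MAX_IMAGE_DIMENSION, preserving the aspect ratio.
fn fit_within_max_dimension(width: u32, height: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest <= MAX_IMAGE_DIMENSION {
        // Already small enough: no resize needed on dimension grounds.
        return (width, height);
    }
    // Round to nearest, using 64-bit arithmetic to avoid overflow.
    let scale = |side: u32| -> u32 {
        let scaled =
            (side as u64 * MAX_IMAGE_DIMENSION as u64 + longest as u64 / 2) / longest as u64;
        scaled.max(1) as u32
    };
    (scale(width), scale(height))
}
```

For example, a 1987x3057 input maps to 1019x1568, and 1988x3057 maps to
1020x1568, matching the first two test images.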