The streaming parser was incorrectly detecting tool call patterns that
appeared inline in prose (e.g., when explaining the format), causing
g3 to return control mid-task.
Fix: Modified find_first_tool_call_start() and find_last_tool_call_start()
to only recognize patterns that appear on their own line (at start of
buffer or after newline with only whitespace before the pattern).
Changes:
- Added is_on_own_line() helper to check line-boundary conditions
- Updated detection methods to skip inline patterns
- Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed)
- Rewrote tests for new behavior
- Added streaming_repro tests that use process_chunk() to verify the exact bug scenario
28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases
LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily,
wasting tokens and cluttering CLI display. Added clear guidance in the
shell tool description that commands already execute in the working directory.
- Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names)
- Align | character across all compact tools (pad to 11 chars for str_replace)
- Make code_search a compact tool with summary display
- Show language and search name in code_search output (e.g., rust:"find structs")
- Add format_code_search_summary() to extract match/file counts from JSON response
Remove dead code - constructor variants that had no callers:
- new_with_readme()
- new_autonomous_with_readme()
- new_with_quiet()
These were thin wrappers around new_with_mode_and_readme() that were
never used externally. All 5 remaining constructors have verified callers.
Results:
- lib.rs reduced from 2817 to 2797 lines (-20 lines)
- Eliminated code-path aliasing: 8 constructors → 5 constructors
- All g3-core tests pass
- Full workspace compiles cleanly
Agent: fowler
- Updated memory reminder prompt with per-symbol char ranges
- Added two few-shot examples: Session Continuation (feature) + UTF-8 Safe Slicing (pattern)
- Updated system prompt Memory Format section to match
- Format: file -> nested symbols with [start..end] ranges and descriptions
- Enables direct read_file navigation to specific functions
Extract the get_stats() function (158 lines) from lib.rs to a new stats.rs module.
Changes:
- Create stats.rs with AgentStatsSnapshot struct for capturing agent state
- Replace inline formatting logic with delegation to snapshot.format()
- Add unit tests for stats formatting (empty and populated states)
- Reduce lib.rs from 2961 to 2818 lines (-143 lines)
The new module improves:
- Testability: Stats formatting can now be unit tested in isolation
- Separation of concerns: Formatting logic is decoupled from Agent struct
- Readability: lib.rs is more focused on core agent behavior
All 271 workspace tests pass.
Agent: fowler
- Remove unused total_lines variable in file_ops.rs
- Fix UTF-8 boundary panic in utils.rs when generating diff error preview
The code was slicing at byte index 200 which could land inside a
multi-byte character (e.g., box-drawing chars like ─). Now uses
character-based slicing with chars().take() instead.
Improve readability of stream_completion_with_tools (~1000 line function):
- Add deduplicate_tool_calls() helper with closure for previous-message check
- Add should_auto_continue() with AutoContinueReason enum for clearer control flow
- Replace inline deduplication loop with helper call (-19 lines)
- Replace complex auto-continue conditional with match on reason enum (-13 lines)
- Add section comments for major phases (State Init, Pre-loop, Main Loop, Auto-Continue, Post-Loop)
- Add comprehensive tests for new helpers
Net reduction: 82 deletions, behavior unchanged (172+ tests pass)
Agent: carmack
When the streaming parser encountered fragments of JSON that looked like
partial tool calls (e.g., {"tool":) embedded in inline text (like code
examples or prose), it would incorrectly enter JSON parsing mode and
poison the parser state, causing control to be returned to the user
mid-task.
This fix:
- Adds sanitize_inline_tool_patterns() to detect tool-call patterns that
are NOT on their own line and replace the opening brace with a Unicode
homoglyph (fullwidth left curly bracket U+FF5B)
- Integrates sanitization into process_chunk() before text is buffered
- Updates system prompts to instruct LLMs to use homoglyphs when showing
example tool call JSON in prose
- Adds comprehensive tests for the sanitization logic
Real tool calls from LLMs always appear on their own line, so those are
left untouched. Only inline patterns (with non-whitespace before them)
are sanitized.
Agent: hopper
Adds 56 new integration tests covering the observable end-of-turn
behaviors in the streaming module:
- Timing footer formatting (5 tests): verifies user-facing timing display
with various durations, token counts, and context percentages
- Tool call duplicate detection (6 tests): ensures identical sequential
tool calls are detected while different tools/args are not
- Empty response detection (9 tests): validates detection of empty,
whitespace-only, and timing-only responses that trigger auto-continue
- Connection error classification (5 tests): verifies EOF, connection,
chunk, and body errors are correctly identified for graceful recovery
- Tool output summary formatting (17 tests): covers read_file, write_file,
str_replace, remember, screenshot, coverage, and rehydrate summaries
- Duration formatting (4 tests): milliseconds, seconds, minutes, zero
- Text truncation (4 tests): short/long strings, multiline, flag behavior
- LLM token cleaning (3 tests): removal of stop tokens like <|im_end|>
- Edge cases (4 tests): empty inputs, unicode handling, large numbers
All tests are blackbox/characterization style - they test observable
outputs through stable public interfaces without encoding internal
implementation details. Tests remain stable under refactoring that
preserves behavior.
The truncate_for_display() function now takes only the first line
of input before truncating. This prevents multi-line error messages
(like str_replace failures) from breaking the compact single-line
format.
Added tests for multi-line input handling.
The write_file compact display was showing 1 line because it was
counting lines in the success message, not the actual written content.
Now parses the tool result (e.g. '✅ wrote 150 lines | 4.2k chars')
to extract and display the correct counts.
Added format_write_file_result() to parse the tool output.
When compact tools (read_file, write_file, str_replace, etc.) failed,
they would fall through to the non-compact output path, causing:
- Missing or incorrect headers
- Stray footers with wrong formatting
- State leakage (is_shell_compact) between tool calls
Now failed compact tools display in the same single-line format as
successful ones, just with a truncated error message instead of the
success summary:
● read_file | path/to/file.txt | ❌ Failed to read file... | 123 ◉ 0ms
This keeps the UI consistent and avoids the "stray footer" bug.
This change removes the legacy logs/ directory and consolidates all
session data, error logs, and discovery files under the .g3/ directory.
New directory structure:
- .g3/sessions/<session_id>/session.json - session logs
- .g3/errors/ - error logs (was logs/errors/)
- .g3/background_processes/ - background process logs
- .g3/discovery/ - planner discovery files (was workspace/logs/)
Changes:
- paths.rs: Remove get_logs_dir()/logs_dir(), add get_errors_dir(),
get_background_processes_dir(), get_discovery_dir()
- session.rs: Anonymous sessions now use .g3/sessions/anonymous_<ts>/
- error_handling.rs: Errors now saved to .g3/errors/
- project.rs: Remove logs_dir() and ensure_logs_dir() methods
- feedback_extraction.rs: Remove logs_dir field and fallback logic
- planner: Use .g3/ for workspace data and .g3/discovery/ for reports
- flock.rs: Look for session metrics in .g3/sessions/
- coach_feedback.rs: Remove fallback to logs/ path
- Update all tests to use new paths
- Update README.md and .gitignore
Implement compact display format for read_file, write_file, str_replace, and shell:
- read_file/write_file/str_replace: Single line with dimmed summary and timing
Format: ● tool_name | path [range] | summary | tokens ◉ time
- shell: Two-line format with command header and dimmed output
Format: ● shell | command
└─ output (N lines) | tokens ◉ time
Changes:
- Add print_tool_compact() method to UiWriter trait
- Add is_shell_compact state tracking in ConsoleUiWriter
- Add format_write_file_summary() and format_str_replace_summary() helpers
- Fix duplicate response output by checking if response is empty before printing
- Add finish_streaming_markdown() call before return to flush markdown buffer
Agent: hopper
Added 4 new test files with blackbox/characterization-style integration tests:
- compaction_behavior_test.rs (14 tests): Token cap calculation, thinking mode
disable logic, summary message building, CompactionResult behavior
- retry_behavior_test.rs (17 tests): RetryConfig presets and customization,
RetryResult state handling, retry_operation behavior with simulated errors
- tool_execution_roundtrip_test.rs (16 tests): End-to-end tool execution through
Agent interface for read_file, write_file, shell, str_replace, and TODO tools
- error_classification_test.rs (25 tests): Recoverable vs non-recoverable error
classification, retry delay calculation, edge cases and priority handling
All tests follow integration-first philosophy:
- Test through stable public interfaces
- Assert observable behavior, not implementation details
- Use characterization style to document current behavior
- Enable refactoring by not encoding internal structure
Project memory is now stored at analysis/memory.md instead of .g3/memory.md.
This change enables:
- Shared memory across git worktrees (studio agent sessions)
- Version-controlled memory that persists across clones
- Memory changes tracked in git history and reviewable in PRs
Changes:
- crates/g3-core/src/tools/memory.rs: Update get_memory_path() to use analysis/
- crates/g3-cli/src/project_files.rs: Update read_project_memory() path
- crates/g3-core/src/prompts.rs: Update documentation references (2 occurrences)
- analysis/memory.md: Add memory file (copied from .g3/memory.md)
Added first_user_message field to Fragment struct that captures the
full first user message (task) from the dehydrated conversation.
This is now displayed at the top of the stub with a 📋 Task: prefix.
Removed the Topics section from the stub since the full task provides
better context for forensics and debugging.
Agent: g3
ACD (Aggressive Context Dehydration) fixes:
- Fixed dehydrate_context() to extract turn summary from context window
instead of using the passed-in final_response (which contained only
the timing footer, not the actual LLM response)
- Removed final_response parameter from dehydrate_context() since it
now self-extracts the last assistant message as the summary
- This ensures the actual turn summary is preserved after dehydration,
not just the timing footer
New /dump command:
- Added /dump command to dump entire context window to tmp/ for debugging
- Shows message index, role, kind, content length, and full content
- Available in both console and machine modes
UTF-8 safety:
- Fixed truncate_to_word_boundary() to use character indices instead of
byte indices, preventing panics on multi-byte UTF-8 characters
- Added UTF-8 string slicing guidance to AGENTS.md
Agent: g3
When read_file is called with an end position beyond the file length,
instead of returning an error that forces a retry, now clamps to the
actual file length and returns the content with an informative message.
This eliminates wasteful retry cycles where the LLM had to make a
second request with the corrected end position.
Change read_file output format so the "🔍 N lines read" appears as
the last line after the file content, not before it. This keeps the
output cleaner with just one metadata line at the end.
1. str_replace: Show insertion/deletion counts with colors
"✅ +N insertions | -M deletions" (green/red)
2. write_file: Compact format with human-readable sizes
"✅ wrote N lines | Xk chars"
3. read_file: Cleaner format
"🔍 N lines read" instead of "📄 File content (N lines)"
4. webdriver_quit: Show correct driver name (safaridriver vs chromedriver)
5. read_file: When start position exceeds file length, read last 100 chars
with explanation instead of failing
6. shell: Remove redundant "Command failed:" prefix from error messages
Collapsed nested if statements that check related conditions into
single conditions using &&. This improves readability by making
the logical relationship between conditions explicit.
Files changed:
- feedback_extraction.rs: 3 instances of tool_use/final_output checks
- tools/todo.rs: 1 instance of todo completion check
Agent: fowler
The function had two branches that both returned line.to_string():
- when !should_truncate
- when line.chars().count() <= max_width
Merged into a single condition. Also updated format! to use
inline variable syntax per clippy suggestion.
Agent: fowler
Adds a new --auto-memory CLI flag that automatically sends a reminder
to the LLM after each turn where tools were called, prompting it to
call the remember tool if it discovered any key code locations.
Changes:
- Add auto_memory field and set_auto_memory() method to Agent
- Add tool_calls_this_turn tracking in execute_tool_in_dir()
- Add send_auto_memory_reminder() that sends reminder after tool use
- Add --auto-memory CLI flag and wire it up in console/machine modes
- Call send_auto_memory_reminder() in single-shot and interactive modes
- Add visible status messages for auto-memory actions
Fixes bug where tool calls were not being tracked when execute_tool_in_dir
was called directly with working_dir=None.