The JSON filter only suppresses tool calls at line boundaries. When
"Memory checkpoint: " was printed without a trailing newline, the LLM
response `{"tool": "remember", ...}` appeared on the same line and
leaked through to the UI.
Fix:
- Add trailing newline to "Memory checkpoint:" message
- Reset JSON filter state before streaming the response
Added test: test_tool_call_not_at_line_start_passes_through
Documents the filter behavior and references the fix location.
- Shell outputs > 8KB are truncated to first 500 chars
- Full output saved to .g3/sessions/<session_id>/tools/shell_stdout_<id>.txt
- LLM can use read_file with start/end to paginate through large outputs
- read_file now uses seek() for O(1) random access instead of reading entire file
- UTF-8 safe: reads extra bytes at boundaries to find valid char positions
- Falls back to lossy conversion for binary files (no panics)
Files changed:
- paths.rs: get_tools_output_dir(), generate_short_id()
- shell.rs: truncate_large_output() integration
- file_ops.rs: seek-based read_file_range() helper
- New test: read_file_utf8_test.rs
- Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens
(8192) instead of provider-specific defaults
- Update fallback_default_max_tokens from 8192 to 32000
- Set provider-specific max_tokens defaults:
- Anthropic: 32000
- OpenAI: 32000 (was 16000)
- Databricks: 32000 (was 50000, now matches Anthropic as passthru)
- Embedded: 2048
- Context window lengths unchanged:
- OpenAI: 400,000
- Anthropic: 200,000
- Databricks (Claude): 200,000
This fixes the 'LLM response was cut off due to max_tokens limit' error
in agent mode that occurred because 8192 was being used instead of 32000.
Agent: carmack
Changes:
- streaming_parser.rs: Unified find_first/last_tool_call_start into single
find_tool_call_start with SearchDirection enum, reducing duplication.
Simplified is_json_invalidated from 45 to 20 lines with clearer logic.
Fixed redundant !escape_next check in find_complete_json_object_end.
- filter_json.rs: Simplified check_tool_pattern from 40 to 24 lines.
Replaced repetitive prefix checks with loop over ["t", "to", "too", "tool"].
Reduced trailing return statements with direct expression returns.
- ui_writer_impl.rs: Added ansi module for duration color constants.
Simplified duration_color function by removing redundant comments.
- language_prompts.rs: Fixed test assertions to match actual prompt content
("obvious, readable Racket" instead of "RACKET-SPECIFIC GUIDANCE").
All 174+ tests pass. No behavior changes.
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()
Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- Unescaped newlines in strings or prose after JSON invalidates detection
User sees ' ● tool_name |' immediately when tool call starts streaming,
with blinking indicator while args are received.
When partial JSON tool call patterns appear in LLM output (e.g., from
quoting file content), the parser would incorrectly report them as
"incomplete tool calls", triggering auto-continue loops.
Fix: Added is_json_invalidated() to detect when partial JSON has been
invalidated by subsequent content that cannot be valid JSON:
- Unescaped newline inside a string (invalid JSON)
- Newline followed by prose text outside a string
The check is only applied to incomplete JSON - complete tool calls
with trailing text are still correctly detected.
Added 6 new tests covering:
- Tool results with partial JSON patterns
- LLM quoting file content inline vs on own line
- Comment prefixes (// # -- etc) with partial patterns
- Real incomplete tool calls (should still be detected)
The streaming parser was incorrectly detecting tool call patterns that
appeared inline in prose (e.g., when explaining the format), causing
g3 to return control mid-task.
Fix: Modified find_first_tool_call_start() and find_last_tool_call_start()
to only recognize patterns that appear on their own line (at start of
buffer or after newline with only whitespace before the pattern).
Changes:
- Added is_on_own_line() helper to check line-boundary conditions
- Updated detection methods to skip inline patterns
- Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed)
- Rewrote tests for new behavior
- Added streaming_repro tests that use process_chunk() to verify the exact bug scenario
28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases
LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily,
wasting tokens and cluttering CLI display. Added clear guidance in the
shell tool description that commands already execute in the working directory.
- Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names)
- Align | character across all compact tools (pad to 11 chars for str_replace)
- Make code_search a compact tool with summary display
- Show language and search name in code_search output (e.g., rust:"find structs")
- Add format_code_search_summary() to extract match/file counts from JSON response
Remove dead code - constructor variants that had no callers:
- new_with_readme()
- new_autonomous_with_readme()
- new_with_quiet()
These were thin wrappers around new_with_mode_and_readme() that were
never used externally. All 5 remaining constructors have verified callers.
Results:
- lib.rs reduced from 2817 to 2797 lines (-20 lines)
- Eliminated code-path aliasing: 8 constructors → 5 constructors
- All g3-core tests pass
- Full workspace compiles cleanly
Agent: fowler
- Updated memory reminder prompt with per-symbol char ranges
- Added two few-shot examples: Session Continuation (feature) + UTF-8 Safe Slicing (pattern)
- Updated system prompt Memory Format section to match
- Format: file -> nested symbols with [start..end] ranges and descriptions
- Enables direct read_file navigation to specific functions
Extract the get_stats() function (158 lines) from lib.rs to a new stats.rs module.
Changes:
- Create stats.rs with AgentStatsSnapshot struct for capturing agent state
- Replace inline formatting logic with delegation to snapshot.format()
- Add unit tests for stats formatting (empty and populated states)
- Reduce lib.rs from 2961 to 2818 lines (-143 lines)
The new module improves:
- Testability: Stats formatting can now be unit tested in isolation
- Separation of concerns: Formatting logic is decoupled from Agent struct
- Readability: lib.rs is more focused on core agent behavior
All 271 workspace tests pass.
Agent: fowler
- Remove unused total_lines variable in file_ops.rs
- Fix UTF-8 boundary panic in utils.rs when generating diff error preview
The code was slicing at byte index 200 which could land inside a
multi-byte character (e.g., box-drawing chars like ─). Now uses
character-based slicing with chars().take() instead.
Improve readability of stream_completion_with_tools (~1000 line function):
- Add deduplicate_tool_calls() helper with closure for previous-message check
- Add should_auto_continue() with AutoContinueReason enum for clearer control flow
- Replace inline deduplication loop with helper call (-19 lines)
- Replace complex auto-continue conditional with match on reason enum (-13 lines)
- Add section comments for major phases (State Init, Pre-loop, Main Loop, Auto-Continue, Post-Loop)
- Add comprehensive tests for new helpers
Net reduction: 82 deletions, behavior unchanged (172+ tests pass)
Agent: carmack
When the streaming parser encountered fragments of JSON that looked like
partial tool calls (e.g., {"tool":) embedded in inline text (like code
examples or prose), it would incorrectly enter JSON parsing mode and
poison the parser state, causing control to be returned to the user
mid-task.
This fix:
- Adds sanitize_inline_tool_patterns() to detect tool-call patterns that
are NOT on their own line and replace the opening brace with a Unicode
homoglyph (fullwidth left curly bracket U+FF5B)
- Integrates sanitization into process_chunk() before text is buffered
- Updates system prompts to instruct LLMs to use homoglyphs when showing
example tool call JSON in prose
- Adds comprehensive tests for the sanitization logic
Real tool calls from LLMs always appear on their own line, so those are
left untouched. Only inline patterns (with non-whitespace before them)
are sanitized.
Agent: hopper
Adds 56 new integration tests covering the observable end-of-turn
behaviors in the streaming module:
- Timing footer formatting (5 tests): verifies user-facing timing display
with various durations, token counts, and context percentages
- Tool call duplicate detection (6 tests): ensures identical sequential
tool calls are detected while different tools/args are not
- Empty response detection (9 tests): validates detection of empty,
whitespace-only, and timing-only responses that trigger auto-continue
- Connection error classification (5 tests): verifies EOF, connection,
chunk, and body errors are correctly identified for graceful recovery
- Tool output summary formatting (17 tests): covers read_file, write_file,
str_replace, remember, screenshot, coverage, and rehydrate summaries
- Duration formatting (4 tests): milliseconds, seconds, minutes, zero
- Text truncation (4 tests): short/long strings, multiline, flag behavior
- LLM token cleaning (3 tests): removal of stop tokens like <|im_end|>
- Edge cases (4 tests): empty inputs, unicode handling, large numbers
All tests are blackbox/characterization style - they test observable
outputs through stable public interfaces without encoding internal
implementation details. Tests remain stable under refactoring that
preserves behavior.
The truncate_for_display() function now takes only the first line
of input before truncating. This prevents multi-line error messages
(like str_replace failures) from breaking the compact single-line
format.
Added tests for multi-line input handling.
The write_file compact display was showing 1 line because it was
counting lines in the success message, not the actual written content.
Now parses the tool result (e.g. '✅ wrote 150 lines | 4.2k chars')
to extract and display the correct counts.
Added format_write_file_result() to parse the tool output.
When compact tools (read_file, write_file, str_replace, etc.) failed,
they would fall through to the non-compact output path, causing:
- Missing or incorrect headers
- Stray footers with wrong formatting
- State leakage (is_shell_compact) between tool calls
Now failed compact tools display in the same single-line format as
successful ones, just with a truncated error message instead of the
success summary:
● read_file | path/to/file.txt | ❌ Failed to read file... | 123 ◉ 0ms
This keeps the UI consistent and avoids the "stray footer" bug.
This change removes the legacy logs/ directory and consolidates all
session data, error logs, and discovery files under the .g3/ directory.
New directory structure:
- .g3/sessions/<session_id>/session.json - session logs
- .g3/errors/ - error logs (was logs/errors/)
- .g3/background_processes/ - background process logs
- .g3/discovery/ - planner discovery files (was workspace/logs/)
Changes:
- paths.rs: Remove get_logs_dir()/logs_dir(), add get_errors_dir(),
get_background_processes_dir(), get_discovery_dir()
- session.rs: Anonymous sessions now use .g3/sessions/anonymous_<ts>/
- error_handling.rs: Errors now saved to .g3/errors/
- project.rs: Remove logs_dir() and ensure_logs_dir() methods
- feedback_extraction.rs: Remove logs_dir field and fallback logic
- planner: Use .g3/ for workspace data and .g3/discovery/ for reports
- flock.rs: Look for session metrics in .g3/sessions/
- coach_feedback.rs: Remove fallback to logs/ path
- Update all tests to use new paths
- Update README.md and .gitignore
Implement compact display format for read_file, write_file, str_replace, and shell:
- read_file/write_file/str_replace: Single line with dimmed summary and timing
Format: ● tool_name | path [range] | summary | tokens ◉ time
- shell: Two-line format with command header and dimmed output
Format: ● shell | command
└─ output (N lines) | tokens ◉ time
Changes:
- Add print_tool_compact() method to UiWriter trait
- Add is_shell_compact state tracking in ConsoleUiWriter
- Add format_write_file_summary() and format_str_replace_summary() helpers
- Fix duplicate response output by checking if response is empty before printing
- Add finish_streaming_markdown() call before return to flush markdown buffer
Agent: hopper
Added 4 new test files with blackbox/characterization-style integration tests:
- compaction_behavior_test.rs (14 tests): Token cap calculation, thinking mode
disable logic, summary message building, CompactionResult behavior
- retry_behavior_test.rs (17 tests): RetryConfig presets and customization,
RetryResult state handling, retry_operation behavior with simulated errors
- tool_execution_roundtrip_test.rs (16 tests): End-to-end tool execution through
Agent interface for read_file, write_file, shell, str_replace, and TODO tools
- error_classification_test.rs (25 tests): Recoverable vs non-recoverable error
classification, retry delay calculation, edge cases and priority handling
All tests follow integration-first philosophy:
- Test through stable public interfaces
- Assert observable behavior, not implementation details
- Use characterization style to document current behavior
- Enable refactoring by not encoding internal structure
Project memory is now stored at analysis/memory.md instead of .g3/memory.md.
This change enables:
- Shared memory across git worktrees (studio agent sessions)
- Version-controlled memory that persists across clones
- Memory changes tracked in git history and reviewable in PRs
Changes:
- crates/g3-core/src/tools/memory.rs: Update get_memory_path() to use analysis/
- crates/g3-cli/src/project_files.rs: Update read_project_memory() path
- crates/g3-core/src/prompts.rs: Update documentation references (2 occurrences)
- analysis/memory.md: Add memory file (copied from .g3/memory.md)
Added first_user_message field to Fragment struct that captures the
full first user message (task) from the dehydrated conversation.
This is now displayed at the top of the stub with a 📋 Task: prefix.
Removed the Topics section from the stub since the full task provides
better context for forensics and debugging.
Agent: g3
ACD (Aggressive Context Dehydration) fixes:
- Fixed dehydrate_context() to extract turn summary from context window
instead of using the passed-in final_response (which contained only
the timing footer, not the actual LLM response)
- Removed final_response parameter from dehydrate_context() since it
now self-extracts the last assistant message as the summary
- This ensures the actual turn summary is preserved after dehydration,
not just the timing footer
New /dump command:
- Added /dump command to dump entire context window to tmp/ for debugging
- Shows message index, role, kind, content length, and full content
- Available in both console and machine modes
UTF-8 safety:
- Fixed truncate_to_word_boundary() to use character indices instead of
byte indices, preventing panics on multi-byte UTF-8 characters
- Added UTF-8 string slicing guidance to AGENTS.md
Agent: g3
When read_file is called with an end position beyond the file length,
instead of returning an error that forces a retry, now clamps to the
actual file length and returns the content with an informative message.
This eliminates wasteful retry cycles where the LLM had to make a
second request with the corrected end position.
Change read_file output format so the "🔍 N lines read" appears as
the last line after the file content, not before it. This keeps the
output cleaner with just one metadata line at the end.
1. str_replace: Show insertion/deletion counts with colors
"✅ +N insertions | -M deletions" (green/red)
2. write_file: Compact format with human-readable sizes
"✅ wrote N lines | Xk chars"
3. read_file: Cleaner format
"🔍 N lines read" instead of "📄 File content (N lines)"
4. webdriver_quit: Show correct driver name (safaridriver vs chromedriver)
5. read_file: When start position exceeds file length, read last 100 chars
with explanation instead of failing
6. shell: Remove redundant "Command failed:" prefix from error messages