final_output removal:
- Remove final_output from tool definitions and dispatch
- Update system prompts to request summaries as regular text
- Remove final_output_called field from StreamingState
- Update auto_continue tests to remove final_output_called parameter
- Remove final_output test from tool_execution_test.rs
- Update planner and flock prompts to not reference final_output
- Keep backwards-compat code in feedback_extraction.rs and task_result.rs
Scout report handback:
- Change from file-based to delimiter-based report extraction
- Scout outputs report between ---SCOUT_REPORT_START/END--- markers
- Research tool extracts content between markers, strips ANSI codes
- Add comprehensive tests for extraction and ANSI stripping
657 tests pass.
The research tool now streams the underlying scout agent's output
to the CLI in real-time for visual indication of progress. This
output is displayed but not added to the conversation context.
Simplified the main system prompt's web research section to just direct
users to the research tool. Moved the detailed WebDriver usage instructions
to scout.md where they belong, since the scout agent is the one that
actually uses WebDriver for research.
Main prompt now simply says: use the research tool for web research.
Scout agent now has the full WebDriver best practices documentation.
New tool that spawns a scout agent to perform web research and return
a structured research brief. The scout agent uses webdriver to browse
the web and returns a decision-ready report.
Changes:
- Added 'research' tool definition (12 core tools total)
- Added research tool dispatch in tool_dispatch.rs
- Created tools/research.rs implementation:
- Spawns 'g3 --agent scout <query>' as subprocess
- Captures stdout and extracts last line (report file path)
- Reads and returns the report file contents
- Added exclude_research flag to ToolConfig
- Scout agent (agent_name == 'scout') does NOT have access to research
tool to prevent infinite recursion
- Updated system prompts to describe when to use research tool
- Added scout.md agent prompt with research brief output contract
The research tool is preferred for complex research tasks (APIs, SDKs,
libraries, approaches, bugs). WebDriver can still be used directly for
simple lookups or fine-grained control.
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.
Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.
- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
When the LLM executes a tool and then outputs text (e.g., analysis after
reading images), the text was being displayed during streaming but never
saved to the context window. This caused:
1. The response to appear truncated in the session log
2. Loss of context for subsequent turns
3. The LLM losing track of what it had already said
The fix saves current_response to the context window before breaking
out of the streaming loop for auto-continue after tool execution.
Reproduction scenario:
- User asks LLM to read images and analyze them
- LLM calls read_image tool
- Tool executes successfully
- LLM outputs analysis text ("Now I can see the results...")
- Text was displayed but lost from session log
Now the text is properly persisted to the context window.
- Remove final_output from tool definitions, dispatch, and misc tools
- Update system prompts to request summaries as regular markdown text
- Remove print_final_output from UiWriter trait and all implementations
- Remove final_output handling from agent core logic
- Rename final_output_summary → summary in session continuation
- Delete final_output test files
- Update tool count tests (12→11, 27→26)
This allows LLM summaries to stream through the markdown formatter
for a more natural, responsive user experience instead of buffering
everything into a tool call.
When shell commands output very long lines (e.g., JSON content from
tail -c 10000), the lines would wrap in the terminal. The cursor-up
escape code (\x1b[1A) only moves up one visual line, not the entire
wrapped content, causing the display to fill with uncleared text.
This fix truncates lines to 120 characters in update_tool_output_line()
before displaying them, preventing the wrapping issue.
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.
Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection
Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
Bug 1: Inline code after list bullets not detected
- After emitting a list bullet, at_line_start was not set to false
- This caused the next backtick to be treated as a potential code fence
- Fixed by setting at_line_start = false after emitting bullet
Bug 2: Code block closing on indented backticks
- Code blocks containing indented ``` (4+ spaces) were closing prematurely
- The .trim() check was too permissive
- Fixed by only allowing closing fence with <= 3 spaces indent (CommonMark spec)
Added tests for both edge cases.
Replace the separate syntax_highlight module with the streaming markdown
formatter for final_output rendering. This:
- Removes special buffered rendering logic for final_output
- Uses the same StreamingMarkdownFormatter used for agent responses
- Removes the spinner animation (content renders immediately)
- Deletes the now-unused syntax_highlight.rs module
- Updates test to use the streaming formatter
Benefits:
- Consistent rendering across all markdown output
- Less code to maintain (removed ~250 lines)
- Same syntax highlighting via syntect (already in streaming formatter)
Agent: carmack
Add get_session() helper function that:
- Checks if webdriver is enabled
- Acquires the session read lock
- Returns the cloned session or an error message
Refactored 12 webdriver tool functions to use this helper:
- execute_webdriver_navigate
- execute_webdriver_get_url
- execute_webdriver_get_title
- execute_webdriver_find_element
- execute_webdriver_find_elements
- execute_webdriver_click
- execute_webdriver_send_keys
- execute_webdriver_execute_script
- execute_webdriver_get_page_source
- execute_webdriver_screenshot
- execute_webdriver_back
- execute_webdriver_forward
- execute_webdriver_refresh
Each function previously had ~10 lines of identical boilerplate.
Now reduced to 4 lines using the helper.
Net reduction: 68 lines (678 -> 610)
All tests pass. Behavior unchanged.
Agent: carmack
openai.rs:
- Use make_text_chunk() for streaming text content
- Use make_final_chunk() for final completion chunk
- Simplify tool_calls conversion logic
embedded.rs:
- Use make_text_chunk() for all 4 streaming text chunks
- Use make_final_chunk() for final completion chunk
- Remove unused CompletionChunk import
Net reduction: 35 lines removed
All tests pass. Behavior unchanged.
Agent: carmack
databricks.rs:
- Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple
- Add decode_utf8_streaming() helper for cleaner UTF-8 handling
- Add is_incomplete_json_error() helper for JSON parse error detection
- Add make_final_chunk() helper to reduce duplication
- Add finalize_tool_calls() to convert accumulators to final format
- Refactor parse_streaming_response from ~270 lines to ~100 lines
- Reduce nesting depth from 8+ levels to 4 levels
- Use early returns and let-else for cleaner control flow
file_ops.rs:
- Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table
- Use match expression instead of nested if-else
- Reduce extract_path_and_content from 44 lines to 20 lines
All tests pass. Behavior unchanged.
- Add set_agent_mode() to UiWriter trait for visual mode differentiation
- ConsoleUiWriter uses royal blue (ANSI 256 color 69) for tool names in agent mode
- Fix extract_readme_heading() to search only README section of combined content
(was incorrectly showing AGENTS.md heading instead of README heading)
- streaming_parser.rs: Rename has_message_like_keys to args_contain_prose_fragments
with improved documentation explaining the heuristic for detecting malformed
tool calls where LLM prose leaked into JSON keys
- context_window.rs: Simplify build_thin_result_message using early return
pattern and match expression for cleaner control flow
Agent: carmack
The g3-console crate was not referenced by any other crate in the
workspace and appears to be an abandoned web console implementation.
Removed:
- crates/g3-console/ (entire directory)
- Workspace member entry in Cargo.toml
Agent: fowler
These files were not referenced anywhere in the codebase and appear
to be leftover from a previous TUI implementation that was abandoned.
Removed:
- crates/g3-cli/src/retro_tui.rs (62KB)
- crates/g3-cli/src/tui.rs (6KB)
Agent: fowler
Auto-continue was incorrectly triggering when the LLM asked questions
in interactive/chat mode. Now auto-continue only activates when
is_autonomous is true, allowing proper back-and-forth conversation
in interactive mode.
Agent: fowler
Adds a new CLI flag that allows users to force a new session when running
in agent mode, bypassing the automatic detection and resumption of
incomplete sessions.
Usage: g3 --agent my-agent --new-session
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
Blackbox CLI tests: help/version flags, argument validation,
conflicting modes, flock mode requirements
- crates/g3-core/tests/tool_execution_test.rs (20 tests)
Tool call structure tests and unified diff application:
read_file, write_file, str_replace, shell, background_process,
todo, final_output, code_search, take_screenshot
- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
Round-trip serialization tests for Message, MessageRole,
CacheControl, and Tool types. Covers Unicode, special chars,
and edge cases.
All tests follow blackbox/integration-first principles with
documentation of what they protect and intentionally do not assert.
- Extract check_duplicate_in_previous_message() helper to reduce nesting
from 6+ levels to 2 levels in stream_completion_with_tools
- Create do_thin_context() and do_thin_context_all() helpers to centralize
context thinning with event tracking
- Use provider_config::parse_provider_ref() in additional call sites
- All 295 tests pass
This continues the refactoring to eliminate code-path aliasing and
reduce cyclomatic complexity in the Agent implementation.
The blanket *.json ignore is not canonical for Rust projects.
JSON files that need ignoring are already covered by:
- .g3/ for session logs
- logs/ for error logs
- .build for Swift build artifacts
The Euler agent must now update AGENTS.md after generating artifacts:
- Add/update 'Dependency Analysis Artifacts' section
- Table listing each file in analysis/deps/ with one-line descriptions
- No findings, metrics, or recommendations in AGENTS.md