Commit Graph

353 Commits

Dhanji R. Prasanna
6f50d01ab6 Add comprehensive end-of-turn behavior tests for g3-core
Agent: hopper

Adds 56 new integration tests covering the observable end-of-turn
behaviors in the streaming module:

- Timing footer formatting (5 tests): verifies user-facing timing display
  with various durations, token counts, and context percentages

- Tool call duplicate detection (6 tests): ensures identical sequential
  tool calls are detected while different tools/args are not

- Empty response detection (9 tests): validates detection of empty,
  whitespace-only, and timing-only responses that trigger auto-continue

- Connection error classification (5 tests): verifies EOF, connection,
  chunk, and body errors are correctly identified for graceful recovery

- Tool output summary formatting (17 tests): covers read_file, write_file,
  str_replace, remember, screenshot, coverage, and rehydrate summaries

- Duration formatting (4 tests): milliseconds, seconds, minutes, zero

- Text truncation (4 tests): short/long strings, multiline, flag behavior

- LLM token cleaning (3 tests): removal of stop tokens like <|im_end|>

- Edge cases (4 tests): empty inputs, unicode handling, large numbers

All tests are blackbox/characterization style: they test observable
outputs through stable public interfaces without encoding internal
implementation details, so they remain stable under refactoring that
preserves behavior.
2026-01-12 21:17:32 +05:30
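The duration-formatting tests above cover milliseconds, seconds, minutes and zero. The sketch below is a minimal formatter of that kind. The name mirrors streaming::format_duration, but the thresholds and the exact output strings are assumptions, not g3's actual format.

```rust
use std::time::Duration;

/// Hypothetical stand-in for streaming::format_duration.
/// Assumed format: "<N>ms" under 1s, "<S.s>s" under 1 min, "<M>m <S>s" otherwise.
fn format_duration(d: Duration) -> String {
    let ms = d.as_millis();
    if ms < 1_000 {
        format!("{ms}ms")
    } else if ms < 60_000 {
        format!("{:.1}s", d.as_secs_f64())
    } else {
        let secs = d.as_secs();
        format!("{}m {}s", secs / 60, secs % 60)
    }
}
```

A characterization test would pin these exact strings, so a refactor that changed rounding or units would fail immediately.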
Dhanji R. Prasanna
d164c97ad2 Fix multi-line error messages in compact tool output
The truncate_for_display() function now takes only the first line
of input before truncating. This prevents multi-line error messages
(like str_replace failures) from breaking the compact single-line
format.

Added tests for multi-line input handling.
2026-01-12 20:55:05 +05:30
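A minimal sketch of the first-line-then-truncate behavior described above. The function name comes from this commit. The character limit parameter and the "..." suffix are assumptions about its signature and output.

```rust
/// Hypothetical sketch of truncate_for_display(): keep only the first line,
/// then cap it at `max_chars` characters, appending "..." if anything was cut.
fn truncate_for_display(input: &str, max_chars: usize) -> String {
    // Multi-line errors (e.g. str_replace failures) collapse to their first line.
    let first_line = input.lines().next().unwrap_or("");
    if first_line.chars().count() <= max_chars {
        return first_line.to_string();
    }
    // Count characters, not bytes, so multi-byte text is never split.
    let kept: String = first_line.chars().take(max_chars).collect();
    format!("{kept}...")
}
```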
Dhanji R. Prasanna
1b051aad94 Fix write_file compact summary to show actual line/char counts
The write_file compact display was showing 1 line because it was
counting lines in the success message, not the actual written content.

Now parses the tool result (e.g. ' wrote 150 lines | 4.2k chars')
to extract and display the correct counts.

Added format_write_file_result() to parse the tool output.
2026-01-12 20:32:54 +05:30
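The following sketches how a result string like ' wrote 150 lines | 4.2k chars' can be parsed back into counts. The commit names format_write_file_result(), but its real signature is not shown. The parse_write_file_counts() helper here is hypothetical. It tolerates a leading icon or space before "wrote".

```rust
/// Hypothetical parser for a write_file result such as " wrote 150 lines | 4.2k chars".
/// Returns (line count, size label), or None if the text does not match.
fn parse_write_file_counts(result: &str) -> Option<(usize, String)> {
    let (lines_part, chars_part) = result.split_once('|')?;
    // Locate "wrote " anywhere, so a leading icon or space is tolerated.
    let idx = lines_part.find("wrote ")?;
    let count_part = lines_part[idx + "wrote ".len()..].trim();
    let count_str = count_part
        .strip_suffix(" lines")
        .or_else(|| count_part.strip_suffix(" line"))?;
    let lines = count_str.trim().parse::<usize>().ok()?;
    let size = chars_part.trim().strip_suffix(" chars")?.trim().to_string();
    Some((lines, size))
}
```

Returning None on any mismatch lets the caller fall back to a generic summary instead of displaying a wrong count.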
Dhanji R. Prasanna
6f3530544d Fix compact tool failure display to use single-line format
When compact tools (read_file, write_file, str_replace, etc.) failed,
they would fall through to the non-compact output path, causing:
- Missing or incorrect headers
- Stray footers with wrong formatting
- State leakage (is_shell_compact) between tool calls

Now failed compact tools display in the same single-line format as
successful ones, just with a truncated error message instead of the
success summary:

  ● read_file | path/to/file.txt |  Failed to read file... | 123 ◉ 0ms

This keeps the UI consistent and avoids the "stray footer" bug.
2026-01-12 20:02:08 +05:30
Dhanji R. Prasanna
78516722df Remove accidentally committed legacy logs/ directories
2026-01-12 18:20:20 +05:30
Dhanji R. Prasanna
c2aa80647a Remove legacy logs/ directory, consolidate all data under .g3/
This change removes the legacy logs/ directory and consolidates all
session data, error logs, and discovery files under the .g3/ directory.

New directory structure:
- .g3/sessions/<session_id>/session.json - session logs
- .g3/errors/ - error logs (was logs/errors/)
- .g3/background_processes/ - background process logs
- .g3/discovery/ - planner discovery files (was workspace/logs/)

Changes:
- paths.rs: Remove get_logs_dir()/logs_dir(), add get_errors_dir(),
  get_background_processes_dir(), get_discovery_dir()
- session.rs: Anonymous sessions now use .g3/sessions/anonymous_<ts>/
- error_handling.rs: Errors now saved to .g3/errors/
- project.rs: Remove logs_dir() and ensure_logs_dir() methods
- feedback_extraction.rs: Remove logs_dir field and fallback logic
- planner: Use .g3/ for workspace data and .g3/discovery/ for reports
- flock.rs: Look for session metrics in .g3/sessions/
- coach_feedback.rs: Remove fallback to logs/ path
- Update all tests to use new paths
- Update README.md and .gitignore
2026-01-12 18:20:08 +05:30
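The helper names get_errors_dir(), get_background_processes_dir() and get_discovery_dir() come from this commit, but their real signatures are not shown. The sketch below assumes each one takes the workspace root and joins a subdirectory under .g3/. session_log_path() is a hypothetical addition for the sessions layout.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helpers mirroring the .g3/ layout above; the real functions
/// in paths.rs may resolve the workspace root differently.
fn g3_dir(workspace: &Path) -> PathBuf {
    workspace.join(".g3")
}

fn get_errors_dir(workspace: &Path) -> PathBuf {
    g3_dir(workspace).join("errors")
}

fn get_background_processes_dir(workspace: &Path) -> PathBuf {
    g3_dir(workspace).join("background_processes")
}

fn get_discovery_dir(workspace: &Path) -> PathBuf {
    g3_dir(workspace).join("discovery")
}

/// .g3/sessions/<session_id>/session.json
fn session_log_path(workspace: &Path, session_id: &str) -> PathBuf {
    g3_dir(workspace).join("sessions").join(session_id).join("session.json")
}
```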
Dhanji R. Prasanna
43a5d27149 Add compact format for remember, take_screenshot, code_coverage, rehydrate
Extend compact single-line output to additional tools:
- remember: shows '📝 memory updated (size)'
- take_screenshot: shows '📸 path'
- code_coverage: shows '📊 report generated'
- rehydrate: shows '🔄 restored fragment_id'

Tools without file_path argument use simplified format:
  ● tool_name | summary | tokens ◉ time
2026-01-12 14:45:50 +05:30
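A small sketch of the per-tool summaries and the simplified no-path line described above. compact_summary() and format_pathless_line() are hypothetical names. The `detail` argument and the "ms" time unit are assumptions.

```rust
/// Hypothetical mapping from tool name to its compact summary text,
/// following the examples listed above. `detail` is a size, path, or id.
fn compact_summary(tool: &str, detail: &str) -> Option<String> {
    match tool {
        "remember" => Some(format!("📝 memory updated ({detail})")),
        "take_screenshot" => Some(format!("📸 {detail}")),
        "code_coverage" => Some("📊 report generated".to_string()),
        "rehydrate" => Some(format!("🔄 restored {detail}")),
        _ => None,
    }
}

/// Simplified single-line format for tools without a file_path argument:
/// ● tool_name | summary | tokens ◉ time
fn format_pathless_line(tool: &str, summary: &str, tokens: u32, elapsed_ms: u64) -> String {
    format!("● {tool} | {summary} | {tokens} ◉ {elapsed_ms}ms")
}
```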
Dhanji R. Prasanna
2c411c058a Compact single-line tool output for file operations and shell
Implement compact display format for read_file, write_file, str_replace, and shell:

- read_file/write_file/str_replace: Single line with dimmed summary and timing
  Format: ● tool_name | path [range] | summary | tokens ◉ time

- shell: Two-line format with command header and dimmed output
  Format: ● shell | command
          └─ output (N lines) | tokens ◉ time

Changes:
- Add print_tool_compact() method to UiWriter trait
- Add is_shell_compact state tracking in ConsoleUiWriter
- Add format_write_file_summary() and format_str_replace_summary() helpers
- Fix duplicate response output by checking if response is empty before printing
- Add finish_streaming_markdown() call before return to flush markdown buffer
2026-01-12 14:37:47 +05:30
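The two-line shell format can be sketched as a single string with an indented continuation line. format_shell_compact() is a hypothetical name. The two-space indent, the singular/plural wording and the omission of dimming (ANSI styling) are assumptions.

```rust
/// Hypothetical renderer for the two-line shell format: a command header,
/// then an indented line with the output line count, tokens, and timing.
fn format_shell_compact(command: &str, output: &str, tokens: u32, elapsed_ms: u64) -> String {
    let line_count = output.lines().count();
    let noun = if line_count == 1 { "line" } else { "lines" };
    format!("● shell | {command}\n  └─ output ({line_count} {noun}) | {tokens} ◉ {elapsed_ms}ms")
}
```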
Dhanji R. Prasanna
5dfabaf19a Add 72 integration tests for compaction, retry, tool execution, and error classification
Agent: hopper

Added 4 new test files with blackbox/characterization-style integration tests:

- compaction_behavior_test.rs (14 tests): Token cap calculation, thinking mode
  disable logic, summary message building, CompactionResult behavior

- retry_behavior_test.rs (17 tests): RetryConfig presets and customization,
  RetryResult state handling, retry_operation behavior with simulated errors

- tool_execution_roundtrip_test.rs (16 tests): End-to-end tool execution through
  Agent interface for read_file, write_file, shell, str_replace, and TODO tools

- error_classification_test.rs (25 tests): Recoverable vs non-recoverable error
  classification, retry delay calculation, edge cases and priority handling

All tests follow integration-first philosophy:
- Test through stable public interfaces
- Assert observable behavior, not implementation details
- Use characterization style to document current behavior
- Enable refactoring by not encoding internal structure
2026-01-12 11:40:19 +05:30
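The retry tests exercise delay calculation and retry_operation against simulated errors. The sketch below shows one common way to structure that: capped exponential backoff with an injected sleep so tests never wait. RetryConfig's fields, the doubling policy and this retry_operation signature are all assumptions, not g3's API.

```rust
use std::time::Duration;

/// Hypothetical retry settings; field names and the doubling policy are
/// assumptions, not g3's RetryConfig.
struct RetryConfig {
    max_retries: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl RetryConfig {
    /// Exponential backoff: base * 2^attempt, capped at max_delay.
    fn delay_for_attempt(&self, attempt: u32) -> Duration {
        let factor = 2u32.saturating_pow(attempt);
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}

/// Run `op` until it succeeds, the error is not recoverable, or retries run
/// out. `sleep` is injected so tests can record delays instead of waiting.
fn retry_operation<T, E>(
    config: &RetryConfig,
    mut op: impl FnMut() -> Result<T, E>,
    is_recoverable: impl Fn(&E) -> bool,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < config.max_retries && is_recoverable(&err) => {
                sleep(config.delay_for_attempt(attempt));
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}
```

Injecting `sleep` keeps the behavior observable (which delays were requested) without making the test suite slow.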
Dhanji R. Prasanna
d508ddd508 Move project memory from .g3/ to analysis/ for version control
Project memory is now stored at analysis/memory.md instead of .g3/memory.md.
This change enables:
- Shared memory across git worktrees (studio agent sessions)
- Version-controlled memory that persists across clones
- Memory changes tracked in git history and reviewable in PRs

Changes:
- crates/g3-core/src/tools/memory.rs: Update get_memory_path() to use analysis/
- crates/g3-cli/src/project_files.rs: Update read_project_memory() path
- crates/g3-core/src/prompts.rs: Update documentation references (2 occurrences)
- analysis/memory.md: Add memory file (copied from .g3/memory.md)
2026-01-12 10:20:33 +05:30
Dhanji R. Prasanna
8df044ac13 refactor(g3-core): reduce lib.rs complexity by extracting utilities
- Extract truncate_to_word_boundary() to utils.rs with tests
- Consolidate duplicate detection: use streaming::are_tool_calls_duplicate()
  instead of inline closures (eliminates code-path aliasing)
- Remove unused regex import
- Remove wrapper methods format_duration/format_timing_footer that just
  delegated to streaming module - call streaming::* directly

Reduces lib.rs from 2945 to 2897 lines (-48 lines, -1.6%)
All 159+ g3-core tests pass.

Agent: fowler
2026-01-12 09:47:47 +05:30
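This commit routes duplicate detection through streaming::are_tool_calls_duplicate(). The sketch below shows what such a predicate and a sequential scan could look like. The ToolCall shape, and the assumption that arguments are compared as a canonical JSON string, are hypothetical.

```rust
/// Hypothetical tool-call record; args are assumed to be a canonical
/// (e.g. key-sorted) JSON string so equal arguments compare equal.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    tool: String,
    args: String,
}

/// Two calls are duplicates when both the tool and its arguments match.
fn are_tool_calls_duplicate(prev: &ToolCall, next: &ToolCall) -> bool {
    prev.tool == next.tool && prev.args == next.args
}

/// Indices of calls that exactly repeat the call immediately before them.
fn sequential_duplicates(calls: &[ToolCall]) -> Vec<usize> {
    calls
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| are_tool_calls_duplicate(&pair[0], &pair[1]))
        .map(|(i, _)| i + 1)
        .collect()
}
```

Having one named predicate means the streaming loop and any other caller cannot drift apart in what they consider a duplicate.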
Dhanji R. Prasanna
02799a8e69 refactor(g3-core): extract streaming helpers and simplify cache control logic
Readability improvements to g3-core/src/lib.rs:

- Extract format_tool_arg_value() to streaming.rs for tool argument display
- Extract format_read_file_summary() to streaming.rs for file read summaries
- Add format_tool_output_summary() helper for consistent output formatting
- Add get_provider_cache_control() helper to eliminate duplicated cache lookup
- Simplify cache control logic in execute_single_task and stream_completion_with_tools
- Add unit tests for all new streaming helpers

Results:
- lib.rs: 2979 → 2945 lines (34 lines saved)
- streaming.rs: 305 → 379 lines (74 lines added as reusable, tested helpers)
- All 155+ tests pass

Agent: carmack
2026-01-12 07:21:40 +05:30
Dhanji R. Prasanna
f10374c925 Remove machine mode entirely from g3
- Delete machine_ui_writer.rs
- Remove --machine CLI flag from cli_args.rs
- Remove run_machine_mode(), run_interactive_machine(), run_autonomous_machine() functions
- Remove handle_machine_command() function
- Simplify OutputMode enum to just use SimpleOutput directly
- Simplify SimpleOutput struct (remove machine_mode field)
- Remove machine_mode parameter from setup_workspace_directory()
- Remove test_machine_option_accepted test
- Disable ACD by default in agent_mode (requires --acd flag)
- Change 'memory checkpoint' message formatting
- Remove dehydration status message
2026-01-12 06:01:31 +05:30
Dhanji R. Prasanna
14cc28d9ba Include full task in ACD dehydration stub for forensics
Added first_user_message field to Fragment struct that captures the
full first user message (task) from the dehydrated conversation.
This is now displayed at the top of the stub with a 📋 Task: prefix.

Removed the Topics section from the stub since the full task provides
better context for forensics and debugging.

Agent: g3
2026-01-12 05:17:45 +05:30
Dhanji R. Prasanna
f415dbb84b Fix ACD turn summary loss and add /dump command
ACD (Aggressive Context Dehydration) fixes:
- Fixed dehydrate_context() to extract turn summary from context window
  instead of using the passed-in final_response (which contained only
  the timing footer, not the actual LLM response)
- Removed final_response parameter from dehydrate_context() since it
  now self-extracts the last assistant message as the summary
- This ensures the actual turn summary is preserved after dehydration,
  not just the timing footer

New /dump command:
- Added /dump command to dump entire context window to tmp/ for debugging
- Shows message index, role, kind, content length, and full content
- Available in both console and machine modes

UTF-8 safety:
- Fixed truncate_to_word_boundary() to use character indices instead of
  byte indices, preventing panics on multi-byte UTF-8 characters
- Added UTF-8 string slicing guidance to AGENTS.md

Agent: g3
2026-01-12 05:13:02 +05:30
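The UTF-8 fix can be illustrated by truncating on character indices rather than byte offsets. truncate_to_word_boundary() is named in this commit. The "..." suffix and the rule for backing up to the last space are assumptions about its behavior.

```rust
/// Truncate `s` to at most `max_chars` characters, preferring to stop at the
/// last space so words are not split. Slicing only happens at offsets from
/// char_indices() or at an ASCII space, so a multi-byte character can never
/// be cut in half. The "..." suffix is an assumption.
fn truncate_to_word_boundary(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    // Byte offset of the first character past the limit (a valid boundary).
    let cut = s
        .char_indices()
        .nth(max_chars)
        .map(|(i, _)| i)
        .unwrap_or(s.len());
    let head = &s[..cut];
    // If the cut already falls on a space, the head ends on a whole word.
    let kept = if s[cut..].starts_with(' ') {
        head
    } else {
        match head.rfind(' ') {
            Some(space) if space > 0 => &head[..space],
            _ => head,
        }
    };
    format!("{}...", kept.trim_end())
}
```

A naive `&s[..max_chars]` would panic on the emoji case in the checks, because that byte offset lands inside a four-byte character.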
Dhanji R. Prasanna
ac17b95b24 fix(read_file): clamp end position instead of erroring when it exceeds file length
When read_file is called with an end position beyond the file length,
instead of returning an error that forces a retry, now clamps to the
actual file length and returns the content with an informative message.

This eliminates wasteful retry cycles where the LLM had to make a
second request with the corrected end position.
2026-01-12 05:11:09 +05:30
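A minimal sketch of clamping instead of erroring. ReadRange and resolve_read_range() are hypothetical. Positions are abstract offsets, since the commit does not say whether read_file counts bytes or characters. The note wording is invented.

```rust
/// Outcome of resolving a requested read range against the real file length.
#[derive(Debug, PartialEq)]
struct ReadRange {
    start: usize,
    end: usize,
    note: Option<String>,
}

/// Hypothetical sketch: instead of erroring when `end` is past the file,
/// clamp it and attach an informative note for the LLM.
fn resolve_read_range(file_len: usize, start: usize, end: Option<usize>) -> ReadRange {
    let requested_end = end.unwrap_or(file_len);
    let (end, note) = if requested_end > file_len {
        (
            file_len,
            Some(format!(
                "end {requested_end} exceeds file length {file_len}; clamped to {file_len}"
            )),
        )
    } else {
        (requested_end, None)
    };
    // Guard so start never exceeds the resolved end.
    ReadRange { start: start.min(end), end, note }
}
```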
Dhanji R. Prasanna
da63e79a13 Move read_file metadata to end of output
Change read_file output format so the "🔍 N lines read" appears as
the last line after the file content, not before it. This keeps the
output cleaner with just one metadata line at the end.
2026-01-11 19:56:23 +05:30
Dhanji R. Prasanna
ed1c31dd70 Improve tool output formatting
1. str_replace: Show insertion/deletion counts with colors
   " +N insertions | -M deletions" (green/red)

2. write_file: Compact format with human-readable sizes
   " wrote N lines | Xk chars"

3. read_file: Cleaner format
   "🔍 N lines read" instead of "📄 File content (N lines)"

4. webdriver_quit: Show correct driver name (safaridriver vs chromedriver)

5. read_file: When start position exceeds file length, read last 100 chars
   with explanation instead of failing

6. shell: Remove redundant "Command failed:" prefix from error messages
2026-01-11 19:52:00 +05:30
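The following sketches the human-readable size in item 2 ("wrote N lines | Xk chars"). human_size() and write_file_summary() are hypothetical. Decimal units (1,000 rather than 1,024) and the "M" tier are assumptions based on the "4.2k chars" example.

```rust
/// Hypothetical human-readable size: plain below 1,000, then "k" and "M"
/// with one decimal place. Decimal units are an assumption.
fn human_size(n: usize) -> String {
    if n < 1_000 {
        n.to_string()
    } else if n < 1_000_000 {
        format!("{:.1}k", n as f64 / 1_000.0)
    } else {
        format!("{:.1}M", n as f64 / 1_000_000.0)
    }
}

/// Compact write_file summary in the "wrote N lines | Xk chars" shape.
fn write_file_summary(content: &str) -> String {
    format!(
        "wrote {} lines | {} chars",
        content.lines().count(),
        human_size(content.chars().count())
    )
}
```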
Dhanji R. Prasanna
7c960875ef Add hint to re-read memory from disk in system prompt
Added note that agents can use read_file .g3/memory.md to refresh
project memory if needed (e.g., after another agent updates it).
2026-01-11 19:40:02 +05:30
Dhanji R. Prasanna
bb25c7881a Change agent mode header text
From: 🤖 Running as agent: fowler
To: >> agent mode | fowler
2026-01-11 17:24:26 +05:30
Dhanji R. Prasanna
4962f439f3 Simplify agent mode working directory display
Change from: 📁 Working directory: "/Users/dhanji/src/g3"
To: -> ~/src/g3

Replaces home directory with ~ for cleaner output.
2026-01-11 17:20:26 +05:30
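Replacing the home directory with ~ can be sketched with Path::strip_prefix. Here tildify() is a hypothetical name, and the home path is passed in explicitly because this commit does not show how g3 looks it up.

```rust
use std::path::Path;

/// Replace a leading home directory with "~" for display.
fn tildify(path: &Path, home: &Path) -> String {
    match path.strip_prefix(home) {
        Ok(rest) if rest.as_os_str().is_empty() => "~".to_string(),
        Ok(rest) => format!("~/{}", rest.display()),
        Err(_) => path.display().to_string(),
    }
}
```

Path::strip_prefix compares whole components, so a sibling such as /Users/dhanji2 is not mistaken for a path under /Users/dhanji.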
Dhanji R. Prasanna
f83ae7fd39 Add status line showing loaded context in agent mode
Shows checkmarks for README, AGENTS.md, and Memory if loaded,
or dots if not found. Displayed below the working directory line.
2026-01-11 17:13:32 +05:30
Dhanji R. Prasanna
9509e51708 style: simplify auto-memory checkpoint message
2026-01-11 16:51:09 +05:30
Dhanji R. Prasanna
83c9b5d434 Add integration blackbox tests for g3-core
Adds 18 new integration tests covering:

- Background process lifecycle (start, check running, kill, list)
- Unified diff edge cases (multi-hunk, additions-only, deletions-only,
  CRLF normalization, range constraints, error handling)
- Error classification boundaries (rate limit, server error, timeout,
  network error, context length exceeded, model busy, non-recoverable)

These tests follow blackbox/integration-first principles:
- Test through stable public interfaces
- Do not encode internal implementation details
- Focus on observable behavior
- Enable refactoring without test breakage

Agent: hopper
2026-01-11 16:32:59 +05:30
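The error-classification boundaries listed above can be pictured as a substring classifier plus a recoverability check. Everything here is hypothetical: the matched substrings, the check order, and the choice to treat context-length errors as not fixable by a plain retry.

```rust
/// Coarse error classes matching the boundaries listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorClass {
    RateLimit,
    ServerError,
    Timeout,
    Network,
    ModelBusy,
    ContextLengthExceeded,
    NonRecoverable,
}

/// Hypothetical substring-based classifier; the real patterns in g3 may differ.
/// More specific patterns are checked first so they win over generic ones.
fn classify_error(message: &str) -> ErrorClass {
    let m = message.to_lowercase();
    let has = |needle: &str| m.contains(needle);
    if has("context length") || has("context_length_exceeded") {
        ErrorClass::ContextLengthExceeded
    } else if has("429") || has("rate limit") {
        ErrorClass::RateLimit
    } else if has("overloaded") || has("model busy") {
        ErrorClass::ModelBusy
    } else if has("timed out") || has("timeout") {
        ErrorClass::Timeout
    } else if has("500") || has("502") || has("503") || has("server error") {
        ErrorClass::ServerError
    } else if has("connection") || has("network") {
        ErrorClass::Network
    } else {
        ErrorClass::NonRecoverable
    }
}

/// Assumption: context-length errors are not retried as-is in this sketch.
fn is_recoverable(class: ErrorClass) -> bool {
    matches!(
        class,
        ErrorClass::RateLimit
            | ErrorClass::ServerError
            | ErrorClass::Timeout
            | ErrorClass::Network
            | ErrorClass::ModelBusy
    )
}
```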
Dhanji R. Prasanna
874be7b459 refactor(core): collapse nested if statements per clippy
Collapsed nested if statements that check related conditions into
single conditions using &&. This improves readability by making
the logical relationship between conditions explicit.

Files changed:
- feedback_extraction.rs: 3 instances of tool_use/final_output checks
- tools/todo.rs: 1 instance of todo completion check

Agent: fowler
2026-01-11 16:21:33 +05:30
Dhanji R. Prasanna
1c3de60bb9 refactor(core): simplify truncate_line() by merging identical branches
The function had two branches that both returned line.to_string():
- when !should_truncate
- when line.chars().count() <= max_width

Merged into a single condition. Also updated format! to use
inline variable syntax per clippy suggestion.

Agent: fowler
2026-01-11 16:18:48 +05:30
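After merging the two identical branches, truncate_line() has a single early return. The parameter names follow this commit message. The "..." suffix, and reserving three characters of the width for it, are assumptions.

```rust
/// Sketch of the simplified truncate_line(): both "no truncation" cases share
/// one early return. The "..." suffix and its width accounting are assumptions.
fn truncate_line(line: &str, max_width: usize, should_truncate: bool) -> String {
    if !should_truncate || line.chars().count() <= max_width {
        return line.to_string();
    }
    let kept: String = line.chars().take(max_width.saturating_sub(3)).collect();
    format!("{kept}...")
}
```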
Dhanji R. Prasanna
280ae1fcbb feat: add --auto-memory flag to prompt LLM to save discoveries
Adds a new --auto-memory CLI flag that automatically sends a reminder
to the LLM after each turn where tools were called, prompting it to
call the remember tool if it discovered any key code locations.

Changes:
- Add auto_memory field and set_auto_memory() method to Agent
- Add tool_calls_this_turn tracking in execute_tool_in_dir()
- Add send_auto_memory_reminder() that sends reminder after tool use
- Add --auto-memory CLI flag and wire it up in console/machine modes
- Call send_auto_memory_reminder() in single-shot and interactive modes
- Add visible status messages for auto-memory actions

Fixes bug where tool calls were not being tracked when execute_tool_in_dir
was called directly with working_dir=None.
2026-01-11 08:00:51 +08:00
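The per-turn tracking can be sketched as a counter that is bumped on every tool call and reset at the end of the turn. The fields auto_memory and tool_calls_this_turn are named in this commit. This standalone struct, the counter type and the reminder wording are hypothetical.

```rust
/// Hypothetical per-turn tracker; in g3 this state lives on Agent.
struct AutoMemory {
    enabled: bool,
    tool_calls_this_turn: usize,
}

impl AutoMemory {
    /// Called for every tool execution, whether or not a working_dir is set.
    fn record_tool_call(&mut self) {
        self.tool_calls_this_turn += 1;
    }

    /// At end of turn: return a reminder only if the flag is on and at least
    /// one tool ran, then reset the counter for the next turn.
    fn end_turn(&mut self) -> Option<&'static str> {
        let used_tools = self.tool_calls_this_turn > 0;
        self.tool_calls_this_turn = 0;
        if self.enabled && used_tools {
            Some("If you discovered key code locations this turn, call the remember tool.")
        } else {
            None
        }
    }
}
```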
Dhanji R. Prasanna
bf53b81af3 remember tool prompt tweak
2026-01-11 07:22:43 +08:00
Dhanji R. Prasanna
e731bc8217 Make remember tool instructions more imperative in system prompts
- Change 'call remember' to 'you MUST call remember' in native prompt
- Change 'IF you discovered' to 'ALWAYS...when you discovered'
- Add explicit list of trigger tools (code_search, rg, grep, find, read_file)
- Add reminder to Response Guidelines section
- Add remember tool and Project Memory section to non-native prompt
- Remove redundant console output from remember tool
- Fix test compilation errors (missing summary parameter, temporary borrow)
2026-01-11 06:49:45 +08:00
Dhanji R. Prasanna
1090e30d6c Simplify system prompt: remove coding style and parallel tool call sections
- Remove IMPORTANT FOR CODING section (~1,500 chars of coding guidelines)
- Remove <use_parallel_tool_calls> block (~500 chars)
- Remove unused const_format dependency from g3-core
- Simplify get_system_prompt_for_native() to just return base prompt
- Response Guidelines now cleanly ends the static prompt

Prompt reduced from ~8,500 to ~6,500 characters.
2026-01-11 06:35:18 +08:00
Dhanji R. Prasanna
33c1aba86e Show human-readable descriptions in /resume session list
- Add description field to SessionContinuation struct
- Extract first user message (truncated to ~60 chars at word boundary)
- Display as quoted text instead of session ID hash
- Fall back to session ID if no description available

Example: [2 hours ago] 'when I call /resume it only shows me 2 sessions...'
2026-01-11 06:22:20 +08:00
Dhanji R. Prasanna
3fcef587e8 Fix /resume to show all sessions and use human-readable timestamps
- Change run_autonomous to return Agent instead of () so session
  continuation is properly saved in accumulative mode
- Update format_session_time to show relative times ("2 hours ago",
  "yesterday") for recent sessions and dates for older ones
- Handle Ctrl+C cancellation gracefully with informative message
2026-01-11 06:13:27 +08:00
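The relative timestamps for /resume can be sketched from elapsed seconds. This is not g3's format_session_time. The "just now" wording and the one-week cutoff are assumptions. Older sessions return None so the caller can show a calendar date, as the commit describes.

```rust
/// Hypothetical relative-time formatter for the /resume list.
/// Returns None for sessions older than a week (assumed cutoff),
/// where the caller would show a date instead.
fn format_relative_time(elapsed_secs: u64) -> Option<String> {
    const MINUTE: u64 = 60;
    const HOUR: u64 = 60 * MINUTE;
    const DAY: u64 = 24 * HOUR;

    fn ago(n: u64, unit: &str) -> String {
        if n == 1 {
            format!("1 {unit} ago")
        } else {
            format!("{n} {unit}s ago")
        }
    }

    match elapsed_secs {
        s if s < MINUTE => Some("just now".to_string()),
        s if s < HOUR => Some(ago(s / MINUTE, "minute")),
        s if s < DAY => Some(ago(s / HOUR, "hour")),
        s if s < 2 * DAY => Some("yesterday".to_string()),
        s if s < 7 * DAY => Some(ago(s / DAY, "day")),
        _ => None,
    }
}
```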
Dhanji R. Prasanna
8926775acb Add session continuation symlink fix and /resume command
Fix session detection:
- Add save_session_continuation() calls at all session exit points
- Sessions now properly create .g3/session symlink for resume detection
- Fixes issue where g3 wasn't offering to resume previous sessions

Add /resume command:
- New list_sessions_for_directory() to scan available sessions
- New switch_to_session() method to safely switch between sessions
- Shows numbered list with timestamps, context %, and TODO status
- Saves current session before switching (can be resumed later)
- Restores full context if <80% used, otherwise uses summary
- Machine mode supports /resume and /resume <number>

Documentation:
- Add /clear and /resume to CONTROL_COMMANDS.md
- Update /help output with new commands
2026-01-11 05:30:58 +08:00
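The 80% restore rule from this commit can be written as an integer comparison. Measuring usage in tokens against the context window, and the names Restore and choose_restore(), are assumptions.

```rust
/// What to restore when switching sessions.
#[derive(Debug, PartialEq)]
enum Restore {
    FullContext,
    Summary,
}

/// Restore the full context when under 80% of the window was used,
/// otherwise fall back to the summary.
fn choose_restore(used_tokens: u64, window_tokens: u64) -> Restore {
    // Integer form of used / window < 0.8, avoiding float rounding.
    if used_tokens * 100 < window_tokens * 80 {
        Restore::FullContext
    } else {
        Restore::Summary
    }
}
```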
Dhanji R. Prasanna
86709834e2 Improve research tool error reporting for scout agent failures
When the scout agent fails (e.g., context window exhaustion), now:
- Captures both stdout and stderr from the scout process
- Detects context window exhaustion errors with specific patterns
- Provides detailed, actionable error messages to the user
- Shows suggestions for how to work around the issue
- Includes technical details (exit code, error output) for debugging

Handles two failure modes:
1. Scout agent exits with non-zero status
2. Scout agent exits successfully but doesn't produce valid report markers

Both cases now surface clear error messages instead of cryptic failures.
2026-01-10 20:50:43 +11:00
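The two failure modes can be modeled as an enum returned after inspecting the finished scout process. The exhaustion patterns are hypothetical, since the commit does not list them. The full marker strings are inferred from the "---SCOUT_REPORT_START/END---" shorthand in the scout handback commit.

```rust
/// The two failure modes this commit distinguishes.
#[derive(Debug, PartialEq)]
enum ScoutFailure {
    NonZeroExit { code: i32, context_exhausted: bool },
    MissingReport,
}

/// Hypothetical patterns; the exact strings g3 matches are not listed.
fn looks_like_context_exhaustion(output: &str) -> bool {
    let lower = output.to_lowercase();
    ["context window", "context length", "too many tokens"]
        .iter()
        .any(|p| lower.contains(*p))
}

/// Inspect a finished scout run, considering both stdout and stderr.
fn classify_scout_run(exit_code: i32, stdout: &str, stderr: &str) -> Option<ScoutFailure> {
    if exit_code != 0 {
        let combined = format!("{stdout}\n{stderr}");
        return Some(ScoutFailure::NonZeroExit {
            code: exit_code,
            context_exhausted: looks_like_context_exhaustion(&combined),
        });
    }
    let has_markers = stdout.contains("---SCOUT_REPORT_START---")
        && stdout.contains("---SCOUT_REPORT_END---");
    if has_markers {
        None
    } else {
        Some(ScoutFailure::MissingReport)
    }
}
```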
Dhanji R. Prasanna
60aeb67c56 Add stealth mode for Chrome headless to evade bot detection
Implements comprehensive anti-detection measures:
- Override navigator.webdriver to return undefined
- Inject fake chrome.runtime, chrome.loadTimes, chrome.csi objects
- Add realistic plugins and mimeTypes arrays
- Patch permissions API to hide automation
- Set realistic navigator properties (languages, hardwareConcurrency, deviceMemory)
- Remove ChromeDriver-specific window properties (cdc_*)
- Patch Function.prototype.toString to hide modifications
- Add Chrome flags: --disable-blink-features=AutomationControlled
- Set realistic user-agent without HeadlessChrome identifier
- Exclude 'enable-automation' switch

Tested against bot detection sites:
- bot.sannysoft.com: All major tests pass
- Search engines: Works with DuckDuckGo, Yahoo, Brave, Startpage
- Still detected by: Google reCAPTCHA, Cloudflare Turnstile, Bing
2026-01-10 20:34:14 +11:00
Dhanji R. Prasanna
6be0a03c4c Fix timing footer being saved to context window
The timing footer (e.g., ⏱️ 19.4s | 💭 4.7s) was being saved to the
conversation history as a separate assistant message. This happened
because stream_completion_with_tools returns the timing footer in
TaskResult.response for display, but the caller was also saving it
to context.

Fix: Strip the timing footer (identified by \n\n⏱️) before saving
to context window. The timing footer remains display-only.

Also includes:
- Research tool blank line fix: only add visual separator for research
  tool output, not all tools
- Research tool webdriver propagation: pass parent's webdriver browser
  choice (Safari vs Chrome headless) to scout subprocess
2026-01-10 15:55:59 +11:00
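Stripping the footer before saving to context can be sketched as cutting at the last "\n\n⏱️" marker. The function name is hypothetical. Whether g3 emits the U+FE0F variation selector after U+23F1 is an assumption.

```rust
/// Marker the commit describes: a blank line followed by the ⏱️ emoji
/// (U+23F1 plus U+FE0F; the variation selector is an assumption).
const TIMING_FOOTER_MARKER: &str = "\n\n\u{23F1}\u{FE0F}";

/// Return only the part of `response` that should be saved to the context
/// window; the timing footer stays display-only.
fn strip_timing_footer(response: &str) -> &str {
    match response.rfind(TIMING_FOOTER_MARKER) {
        Some(idx) => &response[..idx],
        None => response,
    }
}
```

Using rfind targets the trailing footer even if the marker text happened to appear earlier in the response.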
Dhanji R. Prasanna
68c9135913 Fix research tool UI: remove duplicate header, add footer spacing, remove spinner, widen command display
- Remove duplicate tool header (lib.rs already prints it)
- Add newline before timing footer for visual separation
- Remove spinner animation (incompatible with update_tool_output_line)
- Change shell command format to " > `cmd` ..." with 60 char width
2026-01-10 15:20:40 +11:00
Dhanji R. Prasanna
0aa1287ca6 Remove final_output tool and improve scout report handback
final_output removal:
- Remove final_output from tool definitions and dispatch
- Update system prompts to request summaries as regular text
- Remove final_output_called field from StreamingState
- Update auto_continue tests to remove final_output_called parameter
- Remove final_output test from tool_execution_test.rs
- Update planner and flock prompts to not reference final_output
- Keep backwards-compat code in feedback_extraction.rs and task_result.rs

Scout report handback:
- Change from file-based to delimiter-based report extraction
- Scout outputs report between ---SCOUT_REPORT_START/END--- markers
- Research tool extracts content between markers, strips ANSI codes
- Add comprehensive tests for extraction and ANSI stripping

657 tests pass.
2026-01-10 13:43:04 +11:00
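Delimiter-based handback can be sketched as stripping ANSI CSI sequences and then slicing between the markers. The full marker strings are inferred from the "---SCOUT_REPORT_START/END---" shorthand above. This strip_ansi() handles only ESC [ ... final-byte sequences, a simplification of what a complete ANSI stripper would cover.

```rust
const START: &str = "---SCOUT_REPORT_START---";
const END: &str = "---SCOUT_REPORT_END---";

/// Remove CSI escape sequences (ESC '[' params... final byte in '@'..='~').
fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter/intermediate bytes up to and including the final byte.
            while let Some(&n) = chars.peek() {
                chars.next();
                if ('@'..='~').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Extract the report between the markers, or None if either is missing.
fn extract_scout_report(stdout: &str) -> Option<String> {
    let clean = strip_ansi(stdout);
    let start = clean.find(START)? + START.len();
    let end = clean[start..].find(END)? + start;
    Some(clean[start..end].trim().to_string())
}
```

Stripping escapes before searching matters because colored output can wrap the markers themselves.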
Dhanji R. Prasanna
cab2fb187a Stream scout agent output to CLI during research
The research tool now streams the underlying scout agent's output
to the CLI in real-time for visual indication of progress. This
output is displayed but not added to the conversation context.
2026-01-09 20:39:53 +11:00
Dhanji R. Prasanna
22d1ac8096 Move WebDriver instructions from main prompt to scout agent
Simplified the main system prompt's web research section to just direct
users to the research tool. Moved the detailed WebDriver usage instructions
to scout.md where they belong, since the scout agent is the one that
actually uses WebDriver for research.

Main prompt now simply says: use the research tool for web research.
Scout agent now has the full WebDriver best practices documentation.
2026-01-09 16:01:47 +11:00
Dhanji R. Prasanna
33e5705fc3 Add research tool for web-based research via scout agent
New tool that spawns a scout agent to perform web research and return
a structured research brief. The scout agent uses webdriver to browse
the web and returns a decision-ready report.

Changes:
- Added 'research' tool definition (12 core tools total)
- Added research tool dispatch in tool_dispatch.rs
- Created tools/research.rs implementation:
  - Spawns 'g3 --agent scout <query>' as subprocess
  - Captures stdout and extracts last line (report file path)
  - Reads and returns the report file contents
- Added exclude_research flag to ToolConfig
- Scout agent (agent_name == 'scout') does NOT have access to research
  tool to prevent infinite recursion
- Updated system prompts to describe when to use research tool
- Added scout.md agent prompt with research brief output contract

The research tool is preferred for complex research tasks (APIs, SDKs,
libraries, approaches, bugs). WebDriver can still be used directly for
simple lookups or fine-grained control.
2026-01-09 15:59:19 +11:00
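The sketch below shows how the subprocess and the recursion guard might be wired. The command line `g3 --agent scout <query>` comes from this commit. scout_command() and tools_for_agent() are hypothetical helpers standing in for the exclude_research flag on ToolConfig. The command is only built here, never run.

```rust
use std::process::Command;

/// Build (but do not run) the subprocess: `g3 --agent scout <query>`.
fn scout_command(query: &str) -> Command {
    let mut cmd = Command::new("g3");
    cmd.arg("--agent").arg("scout").arg(query);
    cmd
}

/// Hypothetical tool filter: the scout agent never sees `research`,
/// which prevents it from spawning itself recursively.
fn tools_for_agent<'a>(agent_name: Option<&str>, all_tools: &[&'a str]) -> Vec<&'a str> {
    let exclude_research = agent_name == Some("scout");
    all_tools
        .iter()
        .copied()
        .filter(|t| !(exclude_research && *t == "research"))
        .collect()
}
```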
Dhanji R. Prasanna
de50726eeb Prefer ripgrep over grep in system prompts
Added guidance to use rg (ripgrep) instead of grep in shell commands.
Ripgrep is faster, has better defaults, and respects .gitignore.
2026-01-09 15:28:04 +11:00
Dhanji R. Prasanna
e301075666 Fix panic on multi-byte chars in filter_json buffer truncation
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.

Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.

- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
2026-01-09 15:20:57 +11:00
Dhanji R. Prasanna
c470964628 Fix: Save LLM text response to context after tool execution
When the LLM executes a tool and then outputs text (e.g., analysis after
reading images), the text was being displayed during streaming but never
saved to the context window. This caused:

1. The response to appear truncated in the session log
2. Loss of context for subsequent turns
3. The LLM losing track of what it had already said

The fix saves current_response to the context window before breaking
out of the streaming loop for auto-continue after tool execution.

Reproduction scenario:
- User asks LLM to read images and analyze them
- LLM calls read_image tool
- Tool executes successfully
- LLM outputs analysis text ("Now I can see the results...")
- Text was displayed but lost from session log

Now the text is properly persisted to the context window.
2026-01-09 15:04:43 +11:00
Dhanji R. Prasanna
777191b3cb Remove final_output tool - let summaries stream naturally
- Remove final_output from tool definitions, dispatch, and misc tools
- Update system prompts to request summaries as regular markdown text
- Remove print_final_output from UiWriter trait and all implementations
- Remove final_output handling from agent core logic
- Rename final_output_summary → summary in session continuation
- Delete final_output test files
- Update tool count tests (12→11, 27→26)

This allows LLM summaries to stream through the markdown formatter
for a more natural, responsive user experience instead of buffering
everything into a tool call.
2026-01-09 14:57:24 +11:00
Dhanji R. Prasanna
bebf04c7bd Tighten system prompt
2026-01-09 14:11:19 +11:00
Dhanji R. Prasanna
67be0f20c7 fix: remove allow_multiple_tool_calls config and simplify tool execution flow
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.

Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection

Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
2026-01-09 13:28:07 +11:00
Dhanji R. Prasanna
347513b04c Add comprehensive stress tests for streaming markdown formatter
Add 10 stress tests covering:
- Nested formatting (bold in italic, italic in bold)
- Empty/minimal content edge cases
- Escape sequences and special characters
- Lists with complex inline formatting
- Links with various content types
- Tables with formatting in cells
- Code blocks (should not format contents)
- Mixed block elements (headers, quotes, rules)
- Nested lists (3+ levels, mixed types)
- Pathological/adversarial inputs (unbalanced delimiters, unicode, long lines)

All 45 tests pass.
2026-01-08 20:27:28 +11:00
Dhanji R. Prasanna
381b852869 refactor(g3-core): Extract streaming utilities into dedicated module
Extract reusable utilities from the massive stream_completion_with_tools
function into a new streaming.rs module for improved readability:

- format_duration, format_timing_footer: timing display helpers
- clean_llm_tokens: consolidates 4 duplicate token-cleaning call sites
- log_stream_error: extracts 70+ lines of error logging
- is_empty_response, is_connection_error: predicate helpers
- truncate_for_display, truncate_line: string truncation utilities
- StreamingState, IterationState: state structs for future refactoring

Results:
- lib.rs reduced from 2978 to 2840 lines (138 lines, ~5%)
- New streaming.rs: 309 lines with 5 unit tests
- All 98+ tests pass

Agent: carmack
2026-01-08 13:20:11 +11:00
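Two of the extracted helpers, clean_llm_tokens and is_empty_response, can be sketched as follows. Only <|im_end|> is named in g3's tests, so the other stop tokens are assumptions. Treating a response that starts with the ⏱ emoji as timing-only is also an assumption.

```rust
/// Stop tokens to strip; entries other than <|im_end|> are assumptions.
const STOP_TOKENS: &[&str] = &["<|im_end|>", "<|im_start|>", "<|endoftext|>"];

/// One helper in place of several ad-hoc cleanup call sites.
fn clean_llm_tokens(text: &str) -> String {
    STOP_TOKENS
        .iter()
        .fold(text.to_string(), |acc, token| acc.replace(*token, ""))
}

/// Empty, whitespace-only, or timing-only responses count as empty.
fn is_empty_response(text: &str) -> bool {
    let cleaned = clean_llm_tokens(text);
    let t = cleaned.trim();
    // Assumed: a response that is just a timing footer starts with ⏱ (U+23F1).
    t.is_empty() || t.starts_with('\u{23F1}')
}
```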
Dhanji R. Prasanna
267ef00848 refactor: extract session helper in webdriver.rs to reduce boilerplate
Agent: carmack

Add get_session() helper function that:
- Checks if webdriver is enabled
- Acquires the session read lock
- Returns the cloned session or an error message

Refactored 13 webdriver tool functions to use this helper:
- execute_webdriver_navigate
- execute_webdriver_get_url
- execute_webdriver_get_title
- execute_webdriver_find_element
- execute_webdriver_find_elements
- execute_webdriver_click
- execute_webdriver_send_keys
- execute_webdriver_execute_script
- execute_webdriver_get_page_source
- execute_webdriver_screenshot
- execute_webdriver_back
- execute_webdriver_forward
- execute_webdriver_refresh

Each function previously had ~10 lines of identical boilerplate.
Now reduced to 4 lines using the helper.

Net reduction: 68 lines (678 -> 610)
All tests pass. Behavior unchanged.
2026-01-08 13:05:44 +11:00
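The get_session() helper described above can be sketched with a std RwLock. The Session and WebDriverState types are hypothetical stand-ins, the error messages are invented, and the real lock type is not shown in this commit.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the real WebDriver session type (hypothetical).
#[derive(Clone, Debug, PartialEq)]
struct Session {
    id: String,
}

/// Hypothetical shared state; std's RwLock keeps the sketch self-contained.
struct WebDriverState {
    enabled: bool,
    session: Arc<RwLock<Option<Session>>>,
}

/// The guard each tool used to repeat: enabled check, read lock,
/// clone of the session, or an error message.
fn get_session(state: &WebDriverState) -> Result<Session, String> {
    if !state.enabled {
        return Err("WebDriver is not enabled".to_string());
    }
    let guard = state
        .session
        .read()
        .map_err(|_| "WebDriver session lock poisoned".to_string())?;
    let session = (*guard).clone();
    session.ok_or_else(|| "No active WebDriver session".to_string())
}

/// Example tool body after the refactor: a few lines instead of ~10.
fn execute_webdriver_get_url(state: &WebDriverState) -> String {
    match get_session(state) {
        Ok(session) => format!("would query current URL via session {}", session.id),
        Err(msg) => msg,
    }
}
```

Cloning the session out of the lock means the read guard is released before any slow browser call is made.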