Commit Graph

199 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
4962f439f3 Simplify agent mode working directory display
Change from: 📁 Working directory: "/Users/dhanji/src/g3"
To: -> ~/src/g3

Replaces home directory with ~ for cleaner output.
2026-01-11 17:20:26 +05:30
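The home-directory shortening described in this commit can be sketched as below. This is a minimal illustration, not the code from the commit: the function name `shorten_home` is hypothetical, and the home directory is passed in as a parameter (an assumption made here to keep the function pure and easy to test).

```rust
use std::path::Path;

/// Replace a leading home-directory prefix with `~` for display.
/// `shorten_home` is a hypothetical name; `home` is supplied by the caller.
fn shorten_home(path: &Path, home: &Path) -> String {
    match path.strip_prefix(home) {
        // The path is exactly the home directory.
        Ok(rest) if rest.as_os_str().is_empty() => "~".to_string(),
        // The path is inside the home directory.
        Ok(rest) => format!("~/{}", rest.display()),
        // The path is elsewhere: show it unchanged.
        Err(_) => path.display().to_string(),
    }
}
```

`Path::strip_prefix` compares whole path components, so a sibling directory such as `/Users/dhanji2` is not mistaken for a path under `/Users/dhanji`.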
Dhanji R. Prasanna
f83ae7fd39 Add status line showing loaded context in agent mode
Shows checkmarks for README, AGENTS.md, and Memory if loaded,
or dots if not found. Displayed below the working directory line.
2026-01-11 17:13:32 +05:30
Dhanji R. Prasanna
2b87a89617 Revert "Add fancy ASCII art header for agent mode"
This reverts commit 08747595a1.
2026-01-11 17:12:32 +05:30
Dhanji R. Prasanna
08747595a1 Add fancy ASCII art header for agent mode
The agent mode header now shows:
- Agent name in uppercase with box art
- Working directory (truncated if too long)
- Status indicators for README, AGENTS.md, and Memory loading
- Task preview if provided

Also exports truncate_for_display and adds truncate_path_for_display
helper functions in project_files module.
2026-01-11 17:11:14 +05:30
Dhanji R. Prasanna
2fbdac7aa9 Fix extra newlines before tool calls in JSON filter
The JSON tool call filter was outputting newlines immediately as they
were encountered. When the LLM output contained multiple newlines before
a tool call, each newline was output before the tool call JSON was
detected and suppressed, leaving orphaned blank lines in the output.

Changes:
- Add pending_newlines field to FilterState to buffer newlines at line start
- First newline after content is output immediately, subsequent ones buffered
- When tool call confirmed, pending_newlines cleared (suppressing extra blanks)
- When not a tool call, pending_newlines output with the buffer
- Add flush_json_tool_filter() to flush pending content at end of streaming
- Update tests to reflect new behavior
- Add tests for newline suppression behavior
2026-01-11 17:04:27 +05:30
Dhanji R. Prasanna
cf3727f50d refactor(g3-cli): Extract focused modules from lib.rs for improved readability
Extract three cohesive modules from the monolithic lib.rs (3188 -> 2785 lines):

- metrics.rs (147 lines): Turn metrics tracking and histogram generation
  - TurnMetrics struct
  - format_elapsed_time() for human-readable durations
  - generate_turn_histogram() for performance visualization
  - Added unit tests for core functions

- project_files.rs (181 lines): Project file reading utilities
  - read_agents_config() for AGENTS.md loading
  - read_project_readme() for README detection
  - read_project_memory() for .g3/memory.md
  - extract_readme_heading() for display
  - Added unit tests

- coach_feedback.rs (129 lines): Coach feedback extraction from session logs
  - extract_from_logs() main entry point
  - Helper functions for log parsing and text extraction

All modules have clear single responsibilities, improved documentation,
and maintain identical behavior to the original inline functions.

Agent: carmack
2026-01-11 16:41:41 +05:30
Dhanji R. Prasanna
9c71d12561 style: change agent mode tool color from royal blue to light gray
2026-01-11 16:26:20 +05:30
Dhanji R. Prasanna
74a18794a0 fix: load AGENTS.md and memory in agent mode
Agent mode was only loading README.md but not AGENTS.md or project
memory (.g3/memory.md). This meant agents were missing important
context that normal mode had access to.

Now agent mode uses the same read_agents_config(), read_project_readme(),
and read_project_memory() functions as normal mode, combining all three
into the agent context.
2026-01-11 16:15:58 +05:30
Dhanji R. Prasanna
1d884251cb refactor(cli): remove duplicate agent mode check in run()
The same if-let block checking for agent mode was duplicated,
causing dead code on the second check. Removed the duplicate.

Agent: fowler
2026-01-11 16:14:50 +05:30
Dhanji R. Prasanna
cfd5d69cce refactor: auto-enable auto-memory in agent mode
Simplify auto-memory by always enabling it in agent mode instead of
requiring the --auto-memory flag. This makes sense because:
- Agent mode is non-interactive, so blocking is acceptable
- Agents benefit from automatically saving discoveries to memory
- Reduces flag complexity for users

The --auto-memory flag still works for other modes if desired.
2026-01-11 15:56:27 +05:30
Dhanji R. Prasanna
1575cafc4b fix: add --auto-memory support to agent mode
The --auto-memory flag was not being passed to run_agent_mode() and
send_auto_memory_reminder() was not being called after agent task
execution.

Changes:
- Pass auto_memory parameter to run_agent_mode()
- Add auto_memory parameter to run_agent_mode() function signature
- Call agent.set_auto_memory(true) when flag is enabled
- Call send_auto_memory_reminder() after execute_task() in agent mode
2026-01-11 08:03:46 +08:00
Dhanji R. Prasanna
280ae1fcbb feat: add --auto-memory flag to prompt LLM to save discoveries
Adds a new --auto-memory CLI flag that automatically sends a reminder
to the LLM after each turn where tools were called, prompting it to
call the remember tool if it discovered any key code locations.

Changes:
- Add auto_memory field and set_auto_memory() method to Agent
- Add tool_calls_this_turn tracking in execute_tool_in_dir()
- Add send_auto_memory_reminder() that sends reminder after tool use
- Add --auto-memory CLI flag and wire it up in console/machine modes
- Call send_auto_memory_reminder() in single-shot and interactive modes
- Add visible status messages for auto-memory actions

Fixes bug where tool calls were not being tracked when execute_tool_in_dir
was called directly with working_dir=None.
2026-01-11 08:00:51 +08:00
Dhanji R. Prasanna
39918cf281 fix: process bold/italic/code formatting inside markdown headers
The format_header() function was not calling format_inline_content()
to process inline formatting like **bold**, *italic*, and `code`
within headers. This caused raw markdown markers to appear in output.

Added 4 tests to verify the fix:
- test_bold_inside_header
- test_italic_inside_header
- test_code_inside_header
- test_mixed_formatting_inside_header
2026-01-11 08:00:34 +08:00
Dhanji R. Prasanna
fc9a2f835a Fix streaming markdown code fence detection bug
The code fence (```) was not being properly detected during streaming,
causing it to be rendered as inline code instead of a code block.

Root cause: When buffering a code fence after seeing ```, the code
was returning early for ALL characters including newlines. This meant
handle_newline() was never called and block_state was never set to
BlockState::CodeBlock.

Fixes:
- Don't return early for newlines when buffering code fence, allow them
  to fall through to handle_newline()
- Support indented code fences (up to 3 spaces per CommonMark spec) by
  using trim_start() when checking for ``` at line start
2026-01-11 07:42:02 +08:00
Dhanji R. Prasanna
e731bc8217 Make remember tool instructions more imperative in system prompts
- Change 'call remember' to 'you MUST call remember' in native prompt
- Change 'IF you discovered' to 'ALWAYS...when you discovered'
- Add explicit list of trigger tools (code_search, rg, grep, find, read_file)
- Add reminder to Response Guidelines section
- Add remember tool and Project Memory section to non-native prompt
- Remove redundant console output from remember tool
- Fix test compilation errors (missing summary parameter, temporary borrow)
2026-01-11 06:49:45 +08:00
Dhanji R. Prasanna
33c1aba86e Show human-readable descriptions in /resume session list
- Add description field to SessionContinuation struct
- Extract first user message (truncated to ~60 chars at word boundary)
- Display as quoted text instead of session ID hash
- Fall back to session ID if no description available

Example: [2 hours ago] 'when I call /resume it only shows me 2 sessions...'
2026-01-11 06:22:20 +08:00
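Truncating a description at a word boundary, as this commit describes, might look like the sketch below. The name `truncate_at_word` and the `...` suffix are assumptions for illustration; the commit only states that the first user message is cut to roughly 60 characters at a word boundary.

```rust
/// Truncate `text` to at most `max_chars` characters, preferring to cut at the
/// last whitespace before the limit, and append "..." when anything was cut.
/// Hypothetical helper; not taken from the commit itself.
fn truncate_at_word(text: &str, max_chars: usize) -> String {
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    // Byte index of the first character past the limit; always a char boundary.
    let cut = text
        .char_indices()
        .nth(max_chars)
        .map(|(i, _)| i)
        .unwrap_or(text.len());
    let head = &text[..cut];
    // Back up to the last whitespace so a word is not split in half.
    let head = match head.rfind(char::is_whitespace) {
        Some(i) if i > 0 => &head[..i],
        _ => head,
    };
    format!("{}...", head.trim_end())
}
```

Counting characters rather than bytes keeps the limit meaningful for non-ASCII messages and avoids slicing inside a multi-byte character.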
Dhanji R. Prasanna
3fcef587e8 Fix /resume to show all sessions and use human-readable timestamps
- Change run_autonomous to return Agent instead of () so session
  continuation is properly saved in accumulative mode
- Update format_session_time to show relative times ("2 hours ago",
  "yesterday") for recent sessions and dates for older ones
- Handle Ctrl+C cancellation gracefully with informative message
2026-01-11 06:13:27 +08:00
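A rough sketch of relative timestamps of the kind this commit mentions is shown below. It is not the project's `format_session_time`: the names `relative_time` and `plural` are hypothetical, the input is a plain elapsed-seconds count, "yesterday" is approximated as 24 to 48 hours ago, and older sessions fall back to a day count because rendering calendar dates would need a date library.

```rust
/// Render an elapsed duration in seconds as a short relative phrase.
/// Hypothetical helper; thresholds are assumptions for illustration.
fn relative_time(elapsed_secs: u64) -> String {
    const MINUTE: u64 = 60;
    const HOUR: u64 = 60 * MINUTE;
    const DAY: u64 = 24 * HOUR;
    match elapsed_secs {
        s if s < MINUTE => "just now".to_string(),
        s if s < HOUR => plural(s / MINUTE, "minute"),
        s if s < DAY => plural(s / HOUR, "hour"),
        // Approximation: anything between one and two days old.
        s if s < 2 * DAY => "yesterday".to_string(),
        s => plural(s / DAY, "day"),
    }
}

/// Format "1 hour ago" or "2 hours ago".
fn plural(n: u64, unit: &str) -> String {
    if n == 1 {
        format!("1 {unit} ago")
    } else {
        format!("{n} {unit}s ago")
    }
}
```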
Dhanji R. Prasanna
8926775acb Add session continuation symlink fix and /resume command
Fix session detection:
- Add save_session_continuation() calls at all session exit points
- Sessions now properly create .g3/session symlink for resume detection
- Fixes issue where g3 wasn't offering to resume previous sessions

Add /resume command:
- New list_sessions_for_directory() to scan available sessions
- New switch_to_session() method to safely switch between sessions
- Shows numbered list with timestamps, context %, and TODO status
- Saves current session before switching (can be resumed later)
- Restores full context if <80% used, otherwise uses summary
- Machine mode supports /resume and /resume <number>

Documentation:
- Add /clear and /resume to CONTROL_COMMANDS.md
- Update /help output with new commands
2026-01-11 05:30:58 +08:00
Dhanji R. Prasanna
9bef7753bf Add Chrome headless diagnostic tool
Runs automatically when --chrome-headless flag is used, checking:
- ChromeDriver installation and PATH
- Chrome/Chromium installation
- Chrome and ChromeDriver version compatibility
- config.toml chrome_binary setting
- Chrome for Testing installation
- ChromeDriver executable permissions (macOS quarantine)

Displays a detailed report with:
- Summary of detected versions and paths
- Pass/warning/error status for each check
- Specific fix suggestions for any issues found

Users can then ask g3 to help fix any detected issues.
2026-01-10 20:44:23 +11:00
Dhanji R. Prasanna
ea582766ba chrome-headless flag
2026-01-10 16:14:14 +11:00
Dhanji R. Prasanna
6be0a03c4c Fix timing footer being saved to context window
The timing footer (e.g., ⏱️ 19.4s | 💭 4.7s) was being saved to the
conversation history as a separate assistant message. This happened
because stream_completion_with_tools returns the timing footer in
TaskResult.response for display, but the caller was also saving it
to context.

Fix: Strip the timing footer (identified by \n\n⏱️) before saving
to context window. The timing footer remains display-only.

Also includes:
- Research tool blank line fix: only add visual separator for research
  tool output, not all tools
- Research tool webdriver propagation: pass parent's webdriver browser
  choice (Safari vs Chrome headless) to scout subprocess
2026-01-10 15:55:59 +11:00
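The footer-stripping step described in this commit can be illustrated as follows. This is a sketch under stated assumptions: the function name `strip_timing_footer` and the constant are hypothetical, and the marker is matched on two newlines plus the stopwatch code point (U+23F1), so any emoji variation selector after it is removed with the rest of the footer.

```rust
/// Two newlines followed by the stopwatch emoji mark the start of the footer.
/// Hypothetical constant; the commit identifies the footer by this prefix.
const TIMING_FOOTER_MARKER: &str = "\n\n\u{23F1}";

/// Return `response` without a trailing timing footer, if one is present.
fn strip_timing_footer(response: &str) -> &str {
    match response.rfind(TIMING_FOOTER_MARKER) {
        // Everything from the marker onward is display-only.
        Some(idx) => &response[..idx],
        None => response,
    }
}
```

The caller would display the full response but store only the stripped text in the conversation history.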
Dhanji R. Prasanna
68c9135913 Fix research tool UI: remove duplicate header, add footer spacing, remove spinner, widen command display
- Remove duplicate tool header (lib.rs already prints it)
- Add newline before timing footer for visual separation
- Remove spinner animation (incompatible with update_tool_output_line)
- Change shell command format to " > `cmd` ..." with 60 char width
2026-01-10 15:20:40 +11:00
Dhanji R. Prasanna
0aa1287ca6 Remove final_output tool and improve scout report handback
final_output removal:
- Remove final_output from tool definitions and dispatch
- Update system prompts to request summaries as regular text
- Remove final_output_called field from StreamingState
- Update auto_continue tests to remove final_output_called parameter
- Remove final_output test from tool_execution_test.rs
- Update planner and flock prompts to not reference final_output
- Keep backwards-compat code in feedback_extraction.rs and task_result.rs

Scout report handback:
- Change from file-based to delimiter-based report extraction
- Scout outputs report between ---SCOUT_REPORT_START/END--- markers
- Research tool extracts content between markers, strips ANSI codes
- Add comprehensive tests for extraction and ANSI stripping

657 tests pass.
2026-01-10 13:43:04 +11:00
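Delimiter-based extraction of the scout report could be sketched like this. The exact marker strings are an assumption (the commit abbreviates them as `---SCOUT_REPORT_START/END---`), the function names are hypothetical, and the ANSI stripping here handles only CSI sequences such as colors.

```rust
// Assumed full marker strings; the commit gives them in abbreviated form.
const START: &str = "---SCOUT_REPORT_START---";
const END: &str = "---SCOUT_REPORT_END---";

/// Return the text between the markers with ANSI CSI sequences removed,
/// or `None` if either marker is missing.
fn extract_report(output: &str) -> Option<String> {
    let start = output.find(START)? + START.len();
    let end = start + output[start..].find(END)?;
    Some(strip_ansi(output[start..end].trim()))
}

/// Remove CSI escape sequences (ESC, '[', parameters, final byte).
fn strip_ansi(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes up to and including the final byte.
            for n in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Compared with a file-based handback, this keeps the report in the subprocess output stream, so nothing needs to be written to or cleaned up from disk.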
Dhanji R. Prasanna
c88ffa2431 Remove final_output tool, improve scout agent
- Remove final_output tool to allow LLM responses to stream naturally
- Update system prompts to request summaries instead of tool calls
- Rename final_output_summary to summary in session continuation
- Update tool count tests (12→11 core tools, 27→26 total)
- Delete obsolete final_output tests

Scout agent improvements:
- Simplify WebDriver usage instructions
- Prefer DuckDuckGo/Brave/Bing over Google
- Support passing task directly to agent mode
- Suppress completion message for scout (needs clean output for research tool)
2026-01-09 20:30:00 +11:00
Dhanji R. Prasanna
e301075666 Fix panic on multi-byte chars in filter_json buffer truncation
The buffer truncation code was slicing at a raw byte offset which could
land in the middle of a multi-byte character (like emojis), causing a
panic. Fixed by using char_indices() to find valid character boundaries.

Also added stop_reason field to CompletionChunk initializers in tests
to complete the stop_reason feature addition.

- Fix byte boundary panic in filter_json.rs line 327
- Add test for multi-byte character handling
- Update test files with missing stop_reason field
2026-01-09 15:20:57 +11:00
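The class of bug fixed here can be shown with a small sketch. It assumes truncation keeps the head of the buffer and uses a hypothetical function name; the commit itself only states that `char_indices()` is used to find a valid boundary.

```rust
/// Truncate `s` to at most `max_bytes` bytes without splitting a character.
/// Slicing with `&s[..max_bytes]` directly would panic if `max_bytes` falls
/// inside a multi-byte character such as an emoji.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Largest character start offset that does not exceed the byte limit.
    let end = s
        .char_indices()
        .map(|(i, _)| i)
        .take_while(|&i| i <= max_bytes)
        .last()
        .unwrap_or(0);
    &s[..end]
}
```

For example, with `"ab🙂cd"` and a limit of 4 bytes, the emoji occupies bytes 2 to 5, so the function backs up to offset 2 and returns `"ab"` instead of panicking.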
Dhanji R. Prasanna
777191b3cb Remove final_output tool - let summaries stream naturally
- Remove final_output from tool definitions, dispatch, and misc tools
- Update system prompts to request summaries as regular markdown text
- Remove print_final_output from UiWriter trait and all implementations
- Remove final_output handling from agent core logic
- Rename final_output_summary → summary in session continuation
- Delete final_output test files
- Update tool count tests (12→11, 27→26)

This allows LLM summaries to stream through the markdown formatter
for a more natural, responsive user experience instead of buffering
everything into a tool call.
2026-01-09 14:57:24 +11:00
Dhanji R. Prasanna
d96d8c1d90 Rewrite JSON tool call filter with clean state machine
Fixes bug where JSON tool calls were printed as text due to chunking issues.

Changes:
- Complete rewrite of filter_json.rs with 3-state machine:
  - Streaming: normal pass-through, watches for newline + whitespace + {
  - Buffering: confirms/denies tool pattern with ~20 char buffer
  - Suppressing: string-aware brace counting until balanced
- Character-by-character processing eliminates chunk boundary issues
- Proper handling of } inside JSON strings (was causing premature exit)
- Detects truncated JSON followed by complete JSON (LLM retry case)
- Removed regex dependency, simpler pattern matching
- Added 59 stress tests covering malformed JSON, partial patterns,
  streaming edge cases, adversarial inputs, and real-world patterns

All 86 filter_json tests pass.
2026-01-09 14:05:11 +11:00
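The string-aware brace counting used by the Suppressing state can be illustrated in isolation. This is a simplified sketch, not the filter from the commit: `find_object_end` is a hypothetical name, and it scans a complete string rather than a character stream.

```rust
/// Return the byte index just past the `}` that closes the first JSON object
/// in `input`, or `None` if the braces never balance.
/// Braces inside string literals are ignored, including after escaped quotes.
fn find_object_end(input: &str) -> Option<usize> {
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in input.char_indices() {
        if in_string {
            if escaped {
                escaped = false; // this character was escaped; take it literally
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                // A stray closing brace before any opening brace yields None.
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            _ => {}
        }
    }
    None
}
```

Tracking `in_string` and `escaped` is what prevents a `}` inside a string value, such as a shell command argument, from ending suppression early.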
Dhanji R. Prasanna
49b27b0cbc fix: truncate long lines in streaming tool output to prevent terminal wrapping
When shell commands output very long lines (e.g., JSON content from
tail -c 10000), the lines would wrap in the terminal. The cursor-up
escape code (\x1b[1A) only moves up one visual line, not the entire
wrapped content, causing the display to fill with uncleared text.

This fix truncates lines to 120 characters in update_tool_output_line()
before displaying them, preventing the wrapping issue.
2026-01-09 13:35:58 +11:00
Dhanji R. Prasanna
67be0f20c7 fix: remove allow_multiple_tool_calls config and simplify tool execution flow
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.

Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection

Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
2026-01-09 13:28:07 +11:00
Dhanji R. Prasanna
a72d5a650a Fix two markdown formatting bugs
Bug 1: Inline code after list bullets not detected
- After emitting a list bullet, at_line_start was not set to false
- This caused the next backtick to be treated as a potential code fence
- Fixed by setting at_line_start = false after emitting bullet

Bug 2: Code block closing on indented backticks
- Code blocks containing indented ``` (4+ spaces) were closing prematurely
- The .trim() check was too permissive
- Fixed by only allowing closing fence with <= 3 spaces indent (CommonMark spec)

Added tests for both edge cases.
2026-01-08 20:50:26 +11:00
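The indentation rule from Bug 2 can be sketched as a standalone check. The function name is hypothetical, and the sketch assumes backtick fences only, with tabs and fence-length matching left out.

```rust
/// Return true if `line` can close a backtick code fence: at most three
/// leading spaces, then three or more backticks, then only whitespace.
fn is_closing_fence(line: &str) -> bool {
    // Leading spaces are single bytes, so the count is also a byte offset.
    let indent = line.chars().take_while(|c| *c == ' ').count();
    if indent > 3 {
        // Four or more spaces means the backticks are code block content.
        return false;
    }
    let rest = line[indent..].trim_end();
    rest.len() >= 3 && rest.chars().all(|c| c == '`')
}
```

A plain `.trim()` before the comparison would accept any amount of indentation, which is the permissive behavior the commit removes.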
Dhanji R. Prasanna
19a804e0be Add syntax highlighting for Racket, Elisp, and Scheme
Add language alias mapping in highlight_code() to map:
- racket, rkt -> lisp
- elisp, emacs-lisp -> lisp
- scheme -> lisp
- common-lisp, cl -> lisp
- shell, sh, zsh, dockerfile -> bash

Syntect's built-in Lisp syntax handles all Lisp-family languages well.
Added test to verify the aliases work correctly.
2026-01-08 20:35:34 +11:00
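The alias table listed in this commit maps naturally onto a `match`. The sketch below uses a hypothetical function name and returns the tag unchanged when no alias applies; how `highlight_code()` actually stores the mapping is not stated in the commit.

```rust
/// Map a code-fence language tag to the syntax name used for highlighting.
/// Unknown tags are returned unchanged.
fn canonical_language(tag: &str) -> &str {
    match tag {
        // Lisp-family languages share the built-in Lisp syntax.
        "racket" | "rkt" | "elisp" | "emacs-lisp" | "scheme" | "common-lisp" | "cl" => "lisp",
        // Shell variants and Dockerfiles are highlighted as bash.
        "shell" | "sh" | "zsh" | "dockerfile" => "bash",
        other => other,
    }
}
```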
Dhanji R. Prasanna
df706308ca Unify final_output rendering with streaming markdown formatter
Replace the separate syntax_highlight module with the streaming markdown
formatter for final_output rendering. This:

- Removes special buffered rendering logic for final_output
- Uses the same StreamingMarkdownFormatter used for agent responses
- Removes the spinner animation (content renders immediately)
- Deletes the now-unused syntax_highlight.rs module
- Updates test to use the streaming formatter

Benefits:
- Consistent rendering across all markdown output
- Less code to maintain (removed ~250 lines)
- Same syntax highlighting via syntect (already in streaming formatter)
2026-01-08 20:30:44 +11:00
Dhanji R. Prasanna
347513b04c Add comprehensive stress tests for streaming markdown formatter
Add 10 stress tests covering:
- Nested formatting (bold in italic, italic in bold)
- Empty/minimal content edge cases
- Escape sequences and special characters
- Lists with complex inline formatting
- Links with various content types
- Tables with formatting in cells
- Code blocks (should not format contents)
- Mixed block elements (headers, quotes, rules)
- Nested lists (3+ levels, mixed types)
- Pathological/adversarial inputs (unbalanced delimiters, unicode, long lines)

All 45 tests pass.
2026-01-08 20:27:28 +11:00
Dhanji R. Prasanna
5bfaee8dd5 use consistent naming for compaction
2026-01-08 12:54:03 +11:00
Dhanji R. Prasanna
4e7aca50fa feat: royal blue tool names in agent mode + fix README heading display
- Add set_agent_mode() to UiWriter trait for visual mode differentiation
- ConsoleUiWriter uses royal blue (ANSI 256 color 69) for tool names in agent mode
- Fix extract_readme_heading() to search only README section of combined content
  (was incorrectly showing AGENTS.md heading instead of README heading)
2026-01-07 11:37:51 +11:00
Dhanji R. Prasanna
1056b4193b chore(g3-cli): remove orphaned retro_tui and tui modules
These files were not referenced anywhere in the codebase and appear
to be leftover from a previous TUI implementation that was abandoned.

Removed:
- crates/g3-cli/src/retro_tui.rs (62KB)
- crates/g3-cli/src/tui.rs (6KB)

Agent: fowler
2026-01-07 10:39:42 +11:00
Dhanji R. Prasanna
c4ae85de72 Add --new-session flag to skip session resumption in agent mode
Adds a new CLI flag that allows users to force a new session when running
in agent mode, bypassing the automatic detection and resumption of
incomplete sessions.

Usage: g3 --agent my-agent --new-session
2026-01-07 09:59:15 +11:00
Dhanji R. Prasanna
5d20da2609 Add 54 integration tests for CLI, tools, and message serialization
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
  Blackbox CLI tests: help/version flags, argument validation,
  conflicting modes, flock mode requirements

- crates/g3-core/tests/tool_execution_test.rs (20 tests)
  Tool call structure tests and unified diff application:
  read_file, write_file, str_replace, shell, background_process,
  todo, final_output, code_search, take_screenshot

- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
  Round-trip serialization tests for Message, MessageRole,
  CacheControl, and Tool types. Covers Unicode, special chars,
  and edge cases.

All tests follow blackbox/integration-first principles with
documentation of what they protect and intentionally do not assert.
2026-01-07 09:23:34 +11:00
Dhanji R. Prasanna
386176899e Remove vision tools (except take_screenshot) and macax tools
Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool.
take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total
2026-01-03 17:38:25 +11:00
Dhanji R. Prasanna
76bfb77f84 further fowler fixes and session fixes
2026-01-03 15:47:04 +11:00
Dhanji R. Prasanna
595ad6ad21 agent mode resumption
2026-01-03 14:50:08 +11:00
Dhanji R. Prasanna
8d071d5eed fix: fowler agent now respects --workspace flag and reads project docs
- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
2025-12-26 15:24:20 +11:00
Dhanji R. Prasanna
7e59e181f7 context line ui
2025-12-26 12:58:13 +11:00
Dhanji R. Prasanna
258f9878ff style: use ◉ symbol for token count in timing footer
Changes '227tk | 48% ctx' to '227 ◉ | 48%' for a cleaner look.
2025-12-25 18:40:17 +11:00
Dhanji R. Prasanna
cd64ebbf87 Add tokens consumed and context percentage to per-tool timing footer
The per-tool timing line now shows:
- Tokens delta (tokens added to context by this tool call)
- Context window usage percentage

Example: └─ 1ms 523tk | 49% ctx

Changes:
- Updated UiWriter trait print_tool_timing signature
- Track tokens before/after adding tool messages to calculate delta
- Updated ConsoleUiWriter, MachineUiWriter, PlannerUiWriter, and test mocks
2025-12-24 15:44:19 +11:00
Dhanji R. Prasanna
923def0ab2 Convert all INFO logs to DEBUG to reduce CLI noise
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.

Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
2025-12-22 16:27:35 +11:00
Dhanji R. Prasanna
38fcaaf449 Add edge case tests for filter_json_tool_calls
- test_brace_inside_json_string_value: braces inside JSON strings
- test_multiple_braces_in_string: multiple braces in string values
- test_escaped_quotes_with_braces: escaped quotes with braces
- test_brace_in_string_across_chunks: streaming with braces in strings
- test_complex_nested_with_string_braces: nested JSON with string braces
- test_str_replace_with_diff_content: real-world str_replace case
- test_tool_call_after_other_content: tool call after other output
- test_tool_call_with_nested_tool_pattern_in_string: nested patterns

All 27 tests pass.
2025-12-22 13:30:57 +11:00
Dhanji R. Prasanna
3bc254962c clean up filter_json a bit (more to come)
2025-12-22 12:03:09 +11:00
Dhanji R. Prasanna
01a5284d6d Move fixed_filter_json from g3-core to g3-cli
Properly separates UI display concern from core library:
- fixed_filter_json module now lives in g3-cli (UI layer)
- UiWriter trait gains filter_json_tool_calls() and reset_json_filter() methods
- g3-core delegates filtering to UI layer via trait methods
- Different UiWriter implementations can choose their own filtering behavior
- ConsoleUiWriter filters JSON tool calls for clean terminal display
- MachineUiWriter/NullUiWriter use default pass-through

Benefits:
- Proper separation of concerns
- Core stays clean without display-specific logic
- Testability - filter can be tested independently in g3-cli
2025-12-22 10:32:21 +11:00
Dhanji R. Prasanna
fbf31e5f68 Fix continuation errors: auto-continue when final_output not called
- Add final_output_called flag to track if LLM properly completed
- Auto-continue with prompt if tools executed but final_output missing
- Remove unused last_action_was_tool and any_text_response variables
- Simplifies previous complex incomplete response detection logic
2025-12-20 15:32:12 +11:00