Commit Graph

455 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
48036d01e3 fix(g3-core): disable auto-continue in interactive mode
Auto-continue was incorrectly triggering when the LLM asked questions
in interactive/chat mode. Now auto-continue only activates when
is_autonomous is true, allowing proper back-and-forth conversation
in interactive mode.

Agent: fowler
2026-01-07 10:37:30 +11:00
Dhanji R. Prasanna
a553764e93 docs(agents): add git authorship rule to all agent prompts
Ensure agents never override git author/email and instead put their
identity in the commit message body.

Agent: fowler
2026-01-07 10:27:44 +11:00
Dhanji R. Prasanna
b73dfacb7a refactor(g3-core): extract provider_registration and session modules
Extract two focused modules from the monolithic lib.rs (3372 lines):

1. provider_registration.rs (233 lines)
   - Consolidates duplicated provider registration patterns
   - Single determine_providers_to_register() function for mode-based selection
   - Unified register_providers() async function for all provider types
   - Includes unit tests for registration logic

2. session.rs (394 lines)
   - Session ID generation (generate_session_id)
   - Context window persistence (save_context_window, write_context_window_summary)
   - Error logging (log_error_to_session)
   - Utility functions (format_token_count, token_indicator)
   - Session restoration helper (restore_from_session_log)
   - Includes comprehensive unit tests

Also fixes:
- Removed redundant tool_executed assignment that triggered unused warning
- Removed unused Message import in session.rs

Results:
- lib.rs reduced from 3372 to 2976 lines (-396 lines, -11.7%)
- All tests pass, no warnings
- Behavior preserved (pure mechanical extraction)

Agent: fowler
2026-01-07 10:20:28 +11:00
Dhanji R. Prasanna
c4ae85de72 Add --new-session flag to skip session resumption in agent mode
Adds a new CLI flag that allows users to force a new session when running
in agent mode, bypassing the automatic detection and resumption of
incomplete sessions.

Usage: g3 --agent my-agent --new-session
2026-01-07 09:59:15 +11:00
Dhanji R. Prasanna
f0bd7959b1 chore(analysis): update dependency analysis artifacts
Authored by: Structural Analysis Agent (Euler)

Updated all dependency analysis artifacts with fresh extraction:
- graph.json: Canonical dependency graph with 10 crates, 139 files, 16 crate edges, 72 file edges
- graph.summary.md: Overview with fan-in/fan-out rankings and crate inventory
- sccs.md: SCC analysis confirming no cycles at crate or file level (clean DAG)
- layers.observed.md: 5-layer architecture diagram derived from dependencies
- hotspots.md: Coupling hotspots (g3-config highest fan-in, g3-cli highest fan-out)
- limitations.md: Documented extraction limitations (conditional compilation, macros, etc.)

Key findings:
- All 10 workspace crates form a directed acyclic graph
- g3-core/src/ui_writer.rs has highest file-level fan-in (10 dependents)
- g3-console is standalone with no workspace dependencies
- Clean layered architecture with no violations detected
2026-01-07 09:36:52 +11:00
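The acyclicity and fan-in findings reported in f0bd7959b1 can be illustrated with a small sketch. This is not the Euler agent's extraction code: the function names `fan_in` and `is_dag` are hypothetical, and an edge `(a, b)` is assumed to mean "a depends on b". Cycle detection here uses Kahn's algorithm, which is one common choice and may differ from what the analysis actually runs.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Counts dependents per node. An edge `(a, b)` is assumed to mean "a depends on b",
/// so each edge adds one to the fan-in of `b`.
pub fn fan_in<'a>(edges: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for &(from, to) in edges {
        counts.entry(from).or_insert(0);
        *counts.entry(to).or_insert(0) += 1;
    }
    counts
}

/// Returns true when the graph has no cycles (Kahn's algorithm):
/// repeatedly remove nodes with no incoming edges; a cycle leaves nodes behind.
pub fn is_dag(edges: &[(&str, &str)]) -> bool {
    let mut in_degree: BTreeMap<&str, usize> = BTreeMap::new();
    let mut successors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        in_degree.entry(from).or_insert(0);
        *in_degree.entry(to).or_insert(0) += 1;
        successors.entry(from).or_default().push(to);
    }

    // Start from every node that nothing points at.
    let mut ready: VecDeque<&str> = in_degree
        .iter()
        .filter(|(_, degree)| **degree == 0)
        .map(|(node, _)| *node)
        .collect();

    let mut visited = 0;
    while let Some(node) = ready.pop_front() {
        visited += 1;
        if let Some(list) = successors.get(node) {
            for &next in list {
                if let Some(degree) = in_degree.get_mut(next) {
                    *degree -= 1;
                    if *degree == 0 {
                        ready.push_back(next);
                    }
                }
            }
        }
    }
    visited == in_degree.len()
}
```

Fed a hypothetical edge list such as `[("g3-cli", "g3-core"), ("g3-core", "g3-config"), ("g3-cli", "g3-config")]`, `is_dag` returns true and `fan_in` ranks `g3-config` highest with two dependents. The edges are made up for illustration and are not taken from graph.json.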
Dhanji R. Prasanna
ff08a622eb ask all agents to commit their work
2026-01-07 09:31:02 +11:00
Dhanji R. Prasanna
5d20da2609 Add 54 integration tests for CLI, tools, and message serialization
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
  Blackbox CLI tests: help/version flags, argument validation,
  conflicting modes, flock mode requirements

- crates/g3-core/tests/tool_execution_test.rs (20 tests)
  Tool call structure tests and unified diff application:
  read_file, write_file, str_replace, shell, background_process,
  todo, final_output, code_search, take_screenshot

- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
  Round-trip serialization tests for Message, MessageRole,
  CacheControl, and Tool types. Covers Unicode, special chars,
  and edge cases.

All tests follow blackbox/integration-first principles with
documentation of what they protect and intentionally do not assert.
2026-01-07 09:23:34 +11:00
Dhanji R. Prasanna
9cb6282719 update lamport
2026-01-07 09:07:29 +11:00
Dhanji R. Prasanna
311b3bd75a added hopper testing agent and updated fowler to use euler
2026-01-07 09:06:46 +11:00
Dhanji R. Prasanna
e2445a5d22 refactor(g3-core): extract duplicate detection helper and consolidate thinning
- Extract check_duplicate_in_previous_message() helper to reduce nesting
  from 6+ levels to 2 levels in stream_completion_with_tools
- Create do_thin_context() and do_thin_context_all() helpers to centralize
  context thinning with event tracking
- Use provider_config::parse_provider_ref() in additional call sites
- All 295 tests pass

This continues the refactoring to eliminate code-path aliasing and
reduce cyclomatic complexity in the Agent implementation.
2026-01-07 08:45:51 +11:00
Dhanji R. Prasanna
a87928661d Remove overly broad *.json from .gitignore
The blanket *.json ignore is not canonical for Rust projects.
JSON files that need ignoring are already covered by:
- .g3/ for session logs
- logs/ for error logs
- .build for Swift build artifacts
2026-01-06 13:54:27 +11:00
Dhanji R. Prasanna
2d8e733820 Add dependency graph JSON data
Add exception to .gitignore for analysis/deps/graph.json
2026-01-06 13:24:01 +11:00
Dhanji R. Prasanna
6d6aed563d Add structural dependency analysis artifacts
- graph.json: Canonical dependency graph (10 crates, 16 edges, 76 files)
- graph.summary.md: One-page overview with fan-in/fan-out rankings
- sccs.md: Strongly Connected Components analysis (no cycles)
- layers.observed.md: 5-layer architecture diagram
- hotspots.md: Coupling hotspots (g3-config, g3-cli)
- limitations.md: Extraction limitations and validity conditions
2026-01-06 13:23:24 +11:00
Dhanji R. Prasanna
764d1bf67e Add ./tmp/ to .gitignore
2026-01-06 12:50:14 +11:00
Dhanji R. Prasanna
2592fee5d5 Generalize lamport.md examples to be language-agnostic
- Changed Rust-specific examples to generic ones:
  - 'Tool calls must be valid JSON' → 'API responses must be valid JSON'
  - 'Never block the async runtime' → 'Never block the event loop'
  - 'Crate/module' → 'Module/package'
  - 'run cargo test' → 'basic commands'
2026-01-06 12:49:00 +11:00
Dhanji R. Prasanna
e2fffaab94 Slim down AGENTS.md and update lamport.md for machine-specific output
AGENTS.md changes:
- Removed redundant sections that duplicated README.md:
  - System Overview (crate table)
  - File Structure Quick Reference
  - Testing Strategy
  - Pointers to Documentation
  - Architecture Decisions
- Kept unique machine-specific sections:
  - Critical Invariants (merged Performance Constraints)
  - Recommended Entry Points
  - Dangerous/Subtle Code Paths
  - Do's and Don'ts for Automated Changes
  - Common Incorrect Assumptions
  - Dependency Analysis Artifacts
- Reduced from ~220 lines to ~116 lines

lamport.md changes:
- Rewrote AGENTS.md section with explicit instructions
- Added REQUIRED sections list (5 sections only)
- Added DO NOT include list to prevent README duplication
- AGENTS.md now points to README for architecture/usage
2026-01-06 12:46:40 +11:00
Dhanji R. Prasanna
6d2cab93f5 Extend euler.md to require AGENTS.md updates
The Euler agent must now update AGENTS.md after generating artifacts:
- Add/update 'Dependency Analysis Artifacts' section
- Table listing each file in analysis/deps/ with one-line descriptions
- No findings, metrics, or recommendations in AGENTS.md
2026-01-06 12:35:12 +11:00
Dhanji R. Prasanna
9132c441f1 Remove Key findings section from dependency analysis docs
2026-01-06 12:33:48 +11:00
Dhanji R. Prasanna
d695f10604 Document dependency analysis artifacts in AGENTS.md
Added section explaining the analysis/deps/ directory contents:
- graph.json: Raw dependency graph data
- graph.summary.md: Overview metrics and rankings
- sccs.md: Cycle detection results
- layers.observed.md: Layer diagrams
- hotspots.md: Coupling hotspots
- limitations.md: Analysis limitations

Includes key findings from the Euler agent's static analysis.
2026-01-06 12:31:17 +11:00
Dhanji R. Prasanna
386176899e Remove vision tools (except take_screenshot) and macax tools
Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool.
take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total
2026-01-03 17:38:25 +11:00
Dhanji R. Prasanna
29e263ac49 Fix Unicode space handling in macOS screenshot filenames
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and am/pm. When users type or paste these paths,
they use regular spaces, causing file-not-found errors.

Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
2026-01-03 17:17:08 +11:00
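The U+202F fallback described in 29e263ac49 might look roughly like the sketch below. It is not the commit's `resolve_path_with_unicode_fallback()`. The names `with_narrow_space_before_meridiem` and `resolve_with_fallback` are hypothetical, the existence check is passed in as a closure so the logic can be exercised without touching the filesystem, and only a space directly before AM/PM/am/pm is rewritten, which is an assumption about where the variant appears.

```rust
/// U+202F NARROW NO-BREAK SPACE, used by macOS between the time and AM/PM
/// in screenshot filenames.
const NARROW_NBSP: char = '\u{202F}';

/// Rewrites a regular space that directly precedes AM/PM/am/pm into U+202F.
/// Assumption: this is the only position where the narrow space occurs.
pub fn with_narrow_space_before_meridiem(path: &str) -> String {
    let mut out = String::with_capacity(path.len() + 2);
    let mut rest = path;
    while let Some(ch) = rest.chars().next() {
        let tail = &rest[ch.len_utf8()..];
        let before_meridiem = ["AM", "PM", "am", "pm"]
            .iter()
            .any(|m| tail.starts_with(*m));
        out.push(if ch == ' ' && before_meridiem { NARROW_NBSP } else { ch });
        rest = tail;
    }
    out
}

/// Tries the path as typed first, then the U+202F variant.
/// `exists` stands in for a real filesystem check such as `Path::exists`.
pub fn resolve_with_fallback(path: &str, exists: impl Fn(&str) -> bool) -> Option<String> {
    if exists(path) {
        return Some(path.to_string());
    }
    let variant = with_narrow_space_before_meridiem(path);
    if variant != path && exists(&variant) {
        return Some(variant);
    }
    None
}
```

In real use the closure would be something like `|p| std::path::Path::new(p).exists()`. Trying the typed path first means files that genuinely contain regular spaces resolve unchanged.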
Dhanji R. Prasanna
f7e2f38fe9 lamport run
2026-01-03 16:48:30 +11:00
Dhanji R. Prasanna
f4a1bf5e93 fix agent-mode session resumption bug
2026-01-03 16:44:58 +11:00
Dhanji R. Prasanna
76bfb77f84 further fowler fixes and session fixes
2026-01-03 15:47:04 +11:00
Dhanji R. Prasanna
65867e7f96 refactor tools out of lib.rs
2026-01-03 15:06:34 +11:00
Dhanji R. Prasanna
595ad6ad21 agent mode resumption
2026-01-03 14:50:08 +11:00
Dhanji R. Prasanna
016efc1db6 Prevent agent mode from stopping after first TODO phase
- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes

Fixes issue where LLM would call final_output after completing first phase,
causing agent to stop prematurely instead of continuing with remaining phases.
2025-12-27 12:35:31 +11:00
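The TODO completion check from 016efc1db6 can be sketched as follows. The function names are hypothetical, and the sketch assumes TODO items are Markdown checkboxes (`- [ ]` for open, `- [x]` for done), which the commit message does not state.

```rust
/// Returns true if any line is an unchecked Markdown checkbox.
/// Assumption: TODO items are written as "- [ ]" / "- [x]" (or "* [ ]").
pub fn has_incomplete_todos(todo_markdown: &str) -> bool {
    todo_markdown.lines().any(|line| {
        let item = line.trim_start();
        item.starts_with("- [ ]") || item.starts_with("* [ ]")
    })
}

/// Decides whether a final_output call may end the run.
/// Only autonomous mode is gated; interactive and chat modes always pass.
pub fn check_final_output(is_autonomous: bool, todo_markdown: &str) -> Result<(), String> {
    if is_autonomous && has_incomplete_todos(todo_markdown) {
        return Err(
            "TODO list still has incomplete items; continue with the remaining phases"
                .to_string(),
        );
    }
    Ok(())
}
```

Returning the rejection as an `Err(String)` lets the caller feed the message back to the LLM as the tool result, which is how the commit describes prompting it to continue.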
Dhanji R. Prasanna
8d071d5eed fix: fowler agent now respects --workspace flag and reads project docs
- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
2025-12-26 15:24:20 +11:00
Dhanji R. Prasanna
4c25e43ee4 refactoring
2025-12-26 15:16:12 +11:00
Dhanji R. Prasanna
7e59e181f7 context line ui
2025-12-26 12:58:13 +11:00
Dhanji R. Prasanna
666be4ff40 Fix duplicate tool call handling: move tool_executed flag and reset parser
- Move tool_executed = true after duplicate check to prevent auto-continue
  from triggering when only duplicate tools were detected
- Reset parser state when duplicate detected to clear any partial/polluted
  state from LLM stuttering or example tool calls in markdown blocks
2025-12-26 11:55:57 +11:00
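The ordering fix in 666be4ff40 can be illustrated with a minimal sketch. The types `ToolCall` and `TurnTracker` and their fields are hypothetical stand-ins, not the actual g3-core structures. The point is only the order of operations: the duplicate check runs first, and `tool_executed` is set only for calls that pass it.

```rust
/// Hypothetical representation of a parsed tool call.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolCall {
    pub tool: String,
    pub args: String,
}

/// Hypothetical per-conversation state for sequential duplicate detection.
#[derive(Default)]
pub struct TurnTracker {
    last_call: Option<ToolCall>,
    pub tool_executed: bool,
    pub parser_buffer: String,
}

impl TurnTracker {
    /// Starts a new turn; the previous call is kept so duplicates can be detected.
    pub fn start_turn(&mut self) {
        self.tool_executed = false;
    }

    /// Returns true if the call should be executed.
    pub fn on_tool_call(&mut self, call: ToolCall) -> bool {
        if self.last_call.as_ref() == Some(&call) {
            // Duplicate: drop any partial parser state and leave
            // `tool_executed` untouched so auto-continue is not triggered.
            self.parser_buffer.clear();
            return false;
        }
        // Only a non-duplicate call counts as executed.
        self.tool_executed = true;
        self.last_call = Some(call);
        true
    }
}
```

Comparing only against the immediately preceding call matches the sequential-only duplicate prevention mentioned elsewhere in this log; whether the real check compares arguments byte for byte is an assumption.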
Dhanji R. Prasanna
46611d9e13 Improve read_image output formatting
- Add newline after └─ before first image preview
- Show only filename (not full path) in info line
2025-12-26 11:36:10 +11:00
Dhanji R. Prasanna
2a4dad2842 Update read_image output with box drawing characters
- Print └─ before images to break out of tool output box
- Print ┌─ after images to resume tool output box
- Remove │ prefix from image preview and info lines
- Info line uses single space prefix, dimmed text
- Only include error messages in tool result (success info printed via imgcat)
2025-12-26 11:29:33 +11:00
Dhanji R. Prasanna
e688d3b29f Simplify read_image imgcat output formatting
- Remove │ prefix before image preview, use single space instead
- Keep info line on its own line with │ prefix
- Keep blank line spacing between images
2025-12-26 11:24:13 +11:00
Dhanji R. Prasanna
3601cc0547 Enhance read_image tool with magic byte detection and multi-image support
- Fix media type detection using magic bytes instead of file extension
  - Correctly identifies JPEG files with .png extension (and vice versa)
  - Supports PNG, JPEG, GIF, and WebP formats

- Add multi-image support with file_paths array parameter
  - Load multiple images in a single tool call
  - All images queued for LLM analysis

- Enhanced CLI output:
  - Inline image preview via iTerm2 imgcat protocol (height=5)
  - Dimmed info line showing: path | dimensions | media type | file size
  - Proper │ prefix alignment with tool output boxing
  - Human-readable file sizes (bytes, KB, MB)

- Add image dimension extraction from file headers
  - PNG, JPEG, GIF, WebP dimension parsing

- Add comprehensive tests for magic byte detection and dimensions
2025-12-26 11:19:37 +11:00
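Magic-byte detection as described in 3601cc0547 can be sketched as below. The signatures checked are the standard ones for PNG, JPEG, GIF and WebP; the function names, the 1024-based size units, and the one-decimal formatting are assumptions rather than the commit's actual code. Only PNG dimension parsing is shown.

```rust
/// Detects the media type from leading bytes, ignoring the file extension.
pub fn detect_media_type(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some("image/png")
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        // WebP is a RIFF container: "RIFF" <4-byte size> "WEBP".
        Some("image/webp")
    } else {
        None
    }
}

/// Reads PNG width and height from the IHDR chunk, which follows the
/// 8-byte signature: big-endian u32 values at offsets 16..20 and 20..24.
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    if detect_media_type(bytes) != Some("image/png") || bytes.len() < 24 {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}

/// Formats a byte count as bytes, KB, or MB (1024-based; an assumption).
pub fn human_size(bytes: u64) -> String {
    const KB: u64 = 1024;
    const MB: u64 = 1024 * 1024;
    if bytes >= MB {
        format!("{:.1} MB", bytes as f64 / MB as f64)
    } else if bytes >= KB {
        format!("{:.1} KB", bytes as f64 / KB as f64)
    } else {
        format!("{} bytes", bytes)
    }
}
```

Because detection looks only at content, a JPEG saved with a .png extension is still reported as `image/jpeg`, which is the mislabeling case the commit calls out.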
Dhanji R. Prasanna
3ece02ff31 fix: resolve compiler warnings across crates
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
2025-12-25 18:47:22 +11:00
Dhanji R. Prasanna
258f9878ff style: use ◉ symbol for token count in timing footer
Changes '227tk | 48% ctx' to '227 ◉ | 48%' for a cleaner look.
2025-12-25 18:40:17 +11:00
Dhanji R. Prasanna
d09c80180e fix: remove redundant TODO list header that breaks boxing effect
2025-12-25 18:34:51 +11:00
Dhanji R. Prasanna
64f27c0abc feat: move TODO lists to session-scoped directories
TODO lists are now stored in .g3/sessions/<session_id>/todo.g3.md instead
of the workspace root. This prevents different g3 sessions from accidentally
picking up or overwriting each other's TODOs.

Changes:
- Add get_session_todo_path() function in paths.rs
- Update todo_read/todo_write handlers to use session-specific paths
- Remove TODO loading at Agent initialization (sessions start fresh)
- Update prompts to reflect session-scoped behavior

Fallback behavior preserved for planner mode (G3_TODO_PATH env var).
2025-12-25 18:33:03 +11:00
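The session-scoped layout from 64f27c0abc can be sketched as a small path helper. This is not the commit's `get_session_todo_path()`: the name `session_todo_path` and its signature are hypothetical, and treating a non-empty G3_TODO_PATH value as a full override is an assumption about how the planner-mode fallback works.

```rust
use std::path::{Path, PathBuf};

/// Builds `<workspace>/.g3/sessions/<session_id>/todo.g3.md`.
/// `override_path` stands in for the G3_TODO_PATH environment variable;
/// a non-empty value is assumed to replace the session-scoped path entirely.
pub fn session_todo_path(
    workspace: &Path,
    session_id: &str,
    override_path: Option<&str>,
) -> PathBuf {
    match override_path {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => workspace
            .join(".g3")
            .join("sessions")
            .join(session_id)
            .join("todo.g3.md"),
    }
}
```

A caller would typically pass `std::env::var("G3_TODO_PATH").ok().as_deref()` as the override. Keeping the environment lookup outside the helper makes the path logic easy to test.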
Dhanji R. Prasanna
d9c58576a1 feat: add background_process tool for launching long-running processes
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.

Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait

Usage: Use shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>

Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
2025-12-25 18:23:10 +11:00
Dhanji R. Prasanna
9ff5ba6098 Fix auto-continue false positives from tool-call-like content
When the LLM outputs text containing tool call patterns (e.g., reading
log files, showing examples, or discussing tool calls), the parser's
has_unexecuted_tool_call() would detect these as real tool calls and
trigger auto-continue, leading to repeated empty responses.

The fix: mark the parser buffer as consumed when content is displayed.
This prevents tool-call-like patterns in displayed text from triggering
false positives later. The fix is safe because:

1. Only runs when no tool was detected (inside 'if !tool_executed')
2. Legitimate tool calls are detected first by process_chunk()
3. Matches existing pattern of calling mark_tool_calls_consumed()
   after tool execution
2025-12-25 17:55:13 +11:00
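The consumed-buffer idea in 9ff5ba6098 can be sketched with a toy scanner. `ToolCallScanner` is hypothetical and far simpler than the real streaming parser, and the `{"tool"` marker it searches for is an assumed pattern, not the actual tool-call format.

```rust
/// Toy stand-in for a streaming parser that remembers how much of its
/// buffer has already been handled (displayed or executed).
#[derive(Default)]
pub struct ToolCallScanner {
    buffer: String,
    consumed: usize,
}

impl ToolCallScanner {
    /// Appends a streamed chunk to the buffer.
    pub fn push(&mut self, chunk: &str) {
        self.buffer.push_str(chunk);
    }

    /// Looks for a tool-call-like pattern only in the unconsumed tail.
    /// Assumption: tool calls are recognizable by the marker `{"tool"`.
    pub fn has_unexecuted_tool_call(&self) -> bool {
        self.buffer[self.consumed..].contains("{\"tool\"")
    }

    /// Marks everything seen so far as handled, so text that was merely
    /// displayed cannot be mistaken for a pending tool call later.
    pub fn mark_consumed(&mut self) {
        self.consumed = self.buffer.len();
    }
}
```

Calling `mark_consumed()` after displaying plain text mirrors what the commit describes for `mark_tool_calls_consumed()` after tool execution: in both cases the scanned region is excluded from later checks.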
Dhanji R. Prasanna
f9d0c33461 Revert "Fix auto-continue bug: ensure assistant message before continue prompt"
This reverts commit fe96969adb.
2025-12-24 15:52:23 +11:00
Dhanji R. Prasanna
fe96969adb Fix auto-continue bug: ensure assistant message before continue prompt
The auto-continue logic was adding User continue prompts without first
adding an Assistant message when the LLM returned an empty response.
This caused consecutive User messages in the conversation history,
which confused the LLM and caused it to return more empty responses.

The fix ensures an Assistant message is always added before the continue
prompt, using '[empty response]' as a placeholder when the LLM returned
nothing substantive. This maintains proper User/Assistant alternation.
2025-12-24 15:50:30 +11:00
Dhanji R. Prasanna
cd64ebbf87 Add tokens consumed and context percentage to per-tool timing footer
The per-tool timing line now shows:
- Tokens delta (tokens added to context by this tool call)
- Context window usage percentage

Example: └─ 1ms  523tk | 49% ctx

Changes:
- Updated UiWriter trait print_tool_timing signature
- Track tokens before/after adding tool messages to calculate delta
- Updated ConsoleUiWriter, MachineUiWriter, PlannerUiWriter, and test mocks
2025-12-24 15:44:19 +11:00
Dhanji R. Prasanna
fd22ce9890 refactor(g3-core): extract 4 modules from monolithic lib.rs
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:

- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction

All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.
2025-12-24 14:32:39 +11:00
Dhanji R. Prasanna
382b905441 duplicate output fix
2025-12-23 17:20:23 +11:00
Dhanji R. Prasanna
ed246ce434 consolidate .g3/session -> .g3/sessions/*
2025-12-23 16:22:12 +11:00
Dhanji R. Prasanna
0b023b610f Update README with recent improvements
- Added section on Tool Call Duplicate Detection explaining the
  sequential-only duplicate prevention logic
- Added section on Timing Footer showing token usage and context %
- Updated Logging note to mention INFO->DEBUG conversion for cleaner CLI
2025-12-22 17:32:39 +11:00
Dhanji R. Prasanna
743d622468 Add token usage and context % to timing footer
Added a quality-of-life feature that displays:
- Tokens used in the current turn (from LLM response, not estimated)
- Current context window usage percentage

These are displayed dimmed after the timing info:
  ⏱️ 1.2s | 💭 0.3s  1234tk | 45% ctx

The token count comes directly from the LLM's usage response data,
not from any estimation. If no usage data is available from the LLM,
only the context percentage is shown.
2025-12-22 17:22:54 +11:00
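The footer rule described in 743d622468 (show tokens only when the LLM reported usage, always show the context percentage) can be sketched as below. The function names are hypothetical, and truncating the percentage rather than rounding it is an assumption.

```rust
/// Context window usage as a whole percentage (truncated; an assumption).
pub fn context_percent(used_tokens: u32, window_size: u32) -> u32 {
    if window_size == 0 {
        return 0;
    }
    ((used_tokens as u64 * 100) / window_size as u64) as u32
}

/// Builds the dimmed suffix shown after the timing info.
/// `turn_tokens` is `None` when the LLM response carried no usage data,
/// in which case only the context percentage is shown.
pub fn usage_suffix(turn_tokens: Option<u32>, used_tokens: u32, window_size: u32) -> String {
    let pct = context_percent(used_tokens, window_size);
    match turn_tokens {
        Some(tokens) => format!("{}tk | {}% ctx", tokens, pct),
        None => format!("{}% ctx", pct),
    }
}
```

Modeling the token count as an `Option` keeps the "reported, not estimated" rule explicit: there is no code path that invents a number when usage data is missing.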
Dhanji R. Prasanna
720ad8cad7 Merge branch 'dhanji/fix-auto-continue': Fix auto-continue and duplicate detection bugs
2025-12-22 17:12:24 +11:00