Commit Graph

401 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
f0bd7959b1 chore(analysis): update dependency analysis artifacts
Authored by: Structural Analysis Agent (Euler)

Updated all dependency analysis artifacts with fresh extraction:
- graph.json: Canonical dependency graph with 10 crates, 139 files, 16 crate edges, 72 file edges
- graph.summary.md: Overview with fan-in/fan-out rankings and crate inventory
- sccs.md: SCC analysis confirming no cycles at crate or file level (clean DAG)
- layers.observed.md: 5-layer architecture diagram derived from dependencies
- hotspots.md: Coupling hotspots (g3-config highest fan-in, g3-cli highest fan-out)
- limitations.md: Documented extraction limitations (conditional compilation, macros, etc.)

Key findings:
- All 10 workspace crates form a directed acyclic graph
- g3-core/src/ui_writer.rs has highest file-level fan-in (10 dependents)
- g3-console is standalone with no workspace dependencies
- Clean layered architecture with no violations detected
2026-01-07 09:36:52 +11:00
Dhanji R. Prasanna
ff08a622eb ask all agents to commit their work 2026-01-07 09:31:02 +11:00
Dhanji R. Prasanna
5d20da2609 Add 54 integration tests for CLI, tools, and message serialization
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
  Blackbox CLI tests: help/version flags, argument validation,
  conflicting modes, flock mode requirements

- crates/g3-core/tests/tool_execution_test.rs (20 tests)
  Tool call structure tests and unified diff application:
  read_file, write_file, str_replace, shell, background_process,
  todo, final_output, code_search, take_screenshot

- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
  Round-trip serialization tests for Message, MessageRole,
  CacheControl, and Tool types. Covers Unicode, special chars,
  and edge cases.

All tests follow blackbox/integration-first principles with
documentation of what they protect and intentionally do not assert.
2026-01-07 09:23:34 +11:00
Dhanji R. Prasanna
9cb6282719 update lamport 2026-01-07 09:07:29 +11:00
Dhanji R. Prasanna
311b3bd75a added hopper testing agent and updated fowler to use euler 2026-01-07 09:06:46 +11:00
Dhanji R. Prasanna
e2445a5d22 refactor(g3-core): extract duplicate detection helper and consolidate thinning
- Extract check_duplicate_in_previous_message() helper to reduce nesting
  from 6+ levels to 2 levels in stream_completion_with_tools
- Create do_thin_context() and do_thin_context_all() helpers to centralize
  context thinning with event tracking
- Use provider_config::parse_provider_ref() in additional call sites
- All 295 tests pass

This continues the refactoring to eliminate code-path aliasing and
reduce cyclomatic complexity in the Agent implementation.
2026-01-07 08:45:51 +11:00
Dhanji R. Prasanna
a87928661d Remove overly broad *.json from .gitignore
The blanket *.json ignore is not canonical for Rust projects.
JSON files that need ignoring are already covered by:
- .g3/ for session logs
- logs/ for error logs
- .build for Swift build artifacts
2026-01-06 13:54:27 +11:00
Dhanji R. Prasanna
2d8e733820 Add dependency graph JSON data
Add exception to .gitignore for analysis/deps/graph.json
2026-01-06 13:24:01 +11:00
Dhanji R. Prasanna
6d6aed563d Add structural dependency analysis artifacts
- graph.json: Canonical dependency graph (10 crates, 16 edges, 76 files)
- graph.summary.md: One-page overview with fan-in/fan-out rankings
- sccs.md: Strongly Connected Components analysis (no cycles)
- layers.observed.md: 5-layer architecture diagram
- hotspots.md: Coupling hotspots (g3-config, g3-cli)
- limitations.md: Extraction limitations and validity conditions
2026-01-06 13:23:24 +11:00
Dhanji R. Prasanna
764d1bf67e Add ./tmp/ to .gitignore 2026-01-06 12:50:14 +11:00
Dhanji R. Prasanna
2592fee5d5 Generalize lamport.md examples to be language-agnostic
- Changed Rust-specific examples to generic ones:
  - 'Tool calls must be valid JSON' → 'API responses must be valid JSON'
  - 'Never block the async runtime' → 'Never block the event loop'
  - 'Crate/module' → 'Module/package'
  - 'run cargo test' → 'basic commands'
2026-01-06 12:49:00 +11:00
Dhanji R. Prasanna
e2fffaab94 Slim down AGENTS.md and update lamport.md for machine-specific output
AGENTS.md changes:
- Removed redundant sections that duplicated README.md:
  - System Overview (crate table)
  - File Structure Quick Reference
  - Testing Strategy
  - Pointers to Documentation
  - Architecture Decisions
- Kept unique machine-specific sections:
  - Critical Invariants (merged Performance Constraints)
  - Recommended Entry Points
  - Dangerous/Subtle Code Paths
  - Do's and Don'ts for Automated Changes
  - Common Incorrect Assumptions
  - Dependency Analysis Artifacts
- Reduced from ~220 lines to ~116 lines

lamport.md changes:
- Rewrote AGENTS.md section with explicit instructions
- Added REQUIRED sections list (5 sections only)
- Added DO NOT include list to prevent README duplication
- AGENTS.md now points to README for architecture/usage
2026-01-06 12:46:40 +11:00
Dhanji R. Prasanna
6d2cab93f5 Extend euler.md to require AGENTS.md updates
The Euler agent must now update AGENTS.md after generating artifacts:
- Add/update 'Dependency Analysis Artifacts' section
- Table listing each file in analysis/deps/ with one-line descriptions
- No findings, metrics, or recommendations in AGENTS.md
2026-01-06 12:35:12 +11:00
Dhanji R. Prasanna
9132c441f1 Remove Key findings section from dependency analysis docs 2026-01-06 12:33:48 +11:00
Dhanji R. Prasanna
d695f10604 Document dependency analysis artifacts in AGENTS.md
Added section explaining the analysis/deps/ directory contents:
- graph.json: Raw dependency graph data
- graph.summary.md: Overview metrics and rankings
- sccs.md: Cycle detection results
- layers.observed.md: Layer diagrams
- hotspots.md: Coupling hotspots
- limitations.md: Analysis limitations

Includes key findings from the Euler agent's static analysis.
2026-01-06 12:31:17 +11:00
Dhanji R. Prasanna
386176899e Remove vision tools (except take_screenshot) and macax tools
Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool.
take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total
2026-01-03 17:38:25 +11:00
Dhanji R. Prasanna
29e263ac49 Fix Unicode space handling in macOS screenshot filenames
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and am/pm. When users type or paste these paths,
they use regular spaces, causing file-not-found errors.

Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
2026-01-03 17:17:08 +11:00
Dhanji R. Prasanna
f7e2f38fe9 lamport run 2026-01-03 16:48:30 +11:00
Dhanji R. Prasanna
f4a1bf5e93 fix agent-mode session resumption bug 2026-01-03 16:44:58 +11:00
Dhanji R. Prasanna
76bfb77f84 further fowler fixes and session fixes 2026-01-03 15:47:04 +11:00
Dhanji R. Prasanna
65867e7f96 refactor tools out of lib.rs 2026-01-03 15:06:34 +11:00
Dhanji R. Prasanna
595ad6ad21 agent mode resumption 2026-01-03 14:50:08 +11:00
Dhanji R. Prasanna
016efc1db6 Prevent agent mode from stopping after first TODO phase
- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes

Fixes issue where LLM would call final_output after completing first phase,
causing agent to stop prematurely instead of continuing with remaining phases.
2025-12-27 12:35:31 +11:00
Dhanji R. Prasanna
8d071d5eed fix: fowler agent now respects --workspace flag and reads project docs
- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
2025-12-26 15:24:20 +11:00
Dhanji R. Prasanna
4c25e43ee4 refactoring 2025-12-26 15:16:12 +11:00
Dhanji R. Prasanna
7e59e181f7 context line ui 2025-12-26 12:58:13 +11:00
Dhanji R. Prasanna
666be4ff40 Fix duplicate tool call handling: move tool_executed flag and reset parser
- Move tool_executed = true after duplicate check to prevent auto-continue
  from triggering when only duplicate tools were detected
- Reset parser state when duplicate detected to clear any partial/polluted
  state from LLM stuttering or example tool calls in markdown blocks
2025-12-26 11:55:57 +11:00
Dhanji R. Prasanna
46611d9e13 Improve read_image output formatting
- Add newline after └─ before first image preview
- Show only filename (not full path) in info line
2025-12-26 11:36:10 +11:00
Dhanji R. Prasanna
2a4dad2842 Update read_image output with box drawing characters
- Print └─ before images to break out of tool output box
- Print ┌─ after images to resume tool output box
- Remove │ prefix from image preview and info lines
- Info line uses single space prefix, dimmed text
- Only include error messages in tool result (success info printed via imgcat)
2025-12-26 11:29:33 +11:00
Dhanji R. Prasanna
e688d3b29f Simplify read_image imgcat output formatting
- Remove │ prefix before image preview, use single space instead
- Keep info line on its own line with │ prefix
- Keep blank line spacing between images
2025-12-26 11:24:13 +11:00
Dhanji R. Prasanna
3601cc0547 Enhance read_image tool with magic byte detection and multi-image support
- Fix media type detection using magic bytes instead of file extension
  - Correctly identifies JPEG files with .png extension (and vice versa)
  - Supports PNG, JPEG, GIF, and WebP formats

- Add multi-image support with file_paths array parameter
  - Load multiple images in a single tool call
  - All images queued for LLM analysis

- Enhanced CLI output:
  - Inline image preview via iTerm2 imgcat protocol (height=5)
  - Dimmed info line showing: path | dimensions | media type | file size
  - Proper │ prefix alignment with tool output boxing
  - Human-readable file sizes (bytes, KB, MB)

- Add image dimension extraction from file headers
  - PNG, JPEG, GIF, WebP dimension parsing

- Add comprehensive tests for magic byte detection and dimensions
2025-12-26 11:19:37 +11:00
Dhanji R. Prasanna
3ece02ff31 fix: resolve compiler warnings across crates
- Remove unused assignment to final_output_called (returns immediately after)
- Mark cache_config field as #[allow(dead_code)] (reserved for future use)
- Mark print_status_line method as #[allow(dead_code)] (reserved for future use)
2025-12-25 18:47:22 +11:00
Dhanji R. Prasanna
258f9878ff style: use ◉ symbol for token count in timing footer
Changes '227tk | 48% ctx' to '227 ◉ | 48%' for a cleaner look.
2025-12-25 18:40:17 +11:00
Dhanji R. Prasanna
d09c80180e fix: remove redundant TODO list header that breaks boxing effect 2025-12-25 18:34:51 +11:00
Dhanji R. Prasanna
64f27c0abc feat: move TODO lists to session-scoped directories
TODO lists are now stored in .g3/sessions/<session_id>/todo.g3.md instead
of the workspace root. This prevents different g3 sessions from accidentally
picking up or overwriting each other's TODOs.

Changes:
- Add get_session_todo_path() function in paths.rs
- Update todo_read/todo_write handlers to use session-specific paths
- Remove TODO loading at Agent initialization (sessions start fresh)
- Update prompts to reflect session-scoped behavior

Fallback behavior preserved for planner mode (G3_TODO_PATH env var).
2025-12-25 18:33:03 +11:00
Dhanji R. Prasanna
d9c58576a1 feat: add background_process tool for launching long-running processes
Adds a new tool that allows launching processes (like game servers) in the
background while g3 continues to operate. The process runs independently
with stdout/stderr captured to a log file.

Features:
- Named process tracking for easy reference
- Automatic log capture to logs/background_processes/
- Returns PID and log file path for use with shell tool
- Automatic cleanup on agent shutdown via Drop trait

Usage: Use shell tool to interact with the process:
- Read logs: tail -100 <logfile>
- Check status: ps -p <pid>
- Stop process: kill <pid>

Files:
- New: crates/g3-core/src/background_process.rs
- New: crates/g3-core/tests/background_process_demo_test.rs
- Modified: crates/g3-core/src/lib.rs (tool definition + handler)
- Modified: crates/g3-core/src/prompts.rs (documentation)
2025-12-25 18:23:10 +11:00
Dhanji R. Prasanna
9ff5ba6098 Fix auto-continue false positives from tool-call-like content
When the LLM outputs text containing tool call patterns (e.g., reading
log files, showing examples, or discussing tool calls), the parser's
has_unexecuted_tool_call() would detect these as real tool calls and
trigger auto-continue, leading to repeated empty responses.

The fix: mark the parser buffer as consumed when content is displayed.
This prevents tool-call-like patterns in displayed text from triggering
false positives later. The fix is safe because:

1. Only runs when no tool was detected (inside 'if !tool_executed')
2. Legitimate tool calls are detected first by process_chunk()
3. Matches existing pattern of calling mark_tool_calls_consumed()
   after tool execution
2025-12-25 17:55:13 +11:00
Dhanji R. Prasanna
f9d0c33461 Revert "Fix auto-continue bug: ensure assistant message before continue prompt"
This reverts commit fe96969adb.
2025-12-24 15:52:23 +11:00
Dhanji R. Prasanna
fe96969adb Fix auto-continue bug: ensure assistant message before continue prompt
The auto-continue logic was adding User continue prompts without first
adding an Assistant message when the LLM returned an empty response.
This caused consecutive User messages in the conversation history,
which confused the LLM and caused it to return more empty responses.

The fix ensures an Assistant message is always added before the continue
prompt, using '[empty response]' as a placeholder when the LLM returned
nothing substantive. This maintains proper User/Assistant alternation.
2025-12-24 15:50:30 +11:00
Dhanji R. Prasanna
cd64ebbf87 Add tokens consumed and context percentage to per-tool timing footer
The per-tool timing line now shows:
- Tokens delta (tokens added to context by this tool call)
- Context window usage percentage

Example: └─ ️ 1ms  523tk | 49% ctx

Changes:
- Updated UiWriter trait print_tool_timing signature
- Track tokens before/after adding tool messages to calculate delta
- Updated ConsoleUiWriter, MachineUiWriter, PlannerUiWriter, and test mocks
2025-12-24 15:44:19 +11:00
Dhanji R. Prasanna
fd22ce9890 refactor(g3-core): extract 4 modules from monolithic lib.rs
Reduce lib.rs from 7481 to 6557 lines (-12.4%) by extracting:

- paths.rs: Session/workspace path utilities (get_todo_path, get_logs_dir, etc.)
- streaming_parser.rs: StreamingToolParser for LLM response parsing
- utils.rs: Diff parsing and shell escaping utilities
- webdriver_session.rs: Unified Safari/Chrome WebDriver abstraction

All public APIs preserved via re-exports for backward compatibility.
Added 13 new unit tests across extracted modules.
All 225 tests pass.
2025-12-24 14:32:39 +11:00
Dhanji R. Prasanna
382b905441 duplicate output fix 2025-12-23 17:20:23 +11:00
Dhanji R. Prasanna
ed246ce434 consolidate .g3/session -> .g3/sessions/* 2025-12-23 16:22:12 +11:00
Dhanji R. Prasanna
0b023b610f Update README with recent improvements
- Added section on Tool Call Duplicate Detection explaining the
  sequential-only duplicate prevention logic
- Added section on Timing Footer showing token usage and context %
- Updated Logging note to mention INFO->DEBUG conversion for cleaner CLI
2025-12-22 17:32:39 +11:00
Dhanji R. Prasanna
743d622468 Add token usage and context % to timing footer
Added a quality-of-life feature that displays:
- Tokens used in the current turn (from LLM response, not estimated)
- Current context window usage percentage

These are displayed dimmed after the timing info:
  ⏱️ 1.2s | 💭 0.3s  1234tk | 45% ctx

The token count comes directly from the LLM's usage response data,
not from any estimation. If no usage data is available from the LLM,
only the context percentage is shown.
2025-12-22 17:22:54 +11:00
Dhanji R. Prasanna
720ad8cad7 Merge branch 'dhanji/fix-auto-continue': Fix auto-continue and duplicate detection bugs 2025-12-22 17:12:24 +11:00
Dhanji R. Prasanna
10e2fe9b94 Add tests for duplicate detection logic
Added 13 tests to verify that duplicate detection only catches
IMMEDIATELY SEQUENTIAL duplicates:

- test_find_complete_json_object_end_* - Tests for JSON parsing helper
- test_same_tool_with_text_between_not_duplicate - Key test ensuring
  tool calls separated by text are NOT duplicates
- test_different_tools_back_to_back_not_duplicate
- test_same_tool_different_args_not_duplicate
- test_identical_tool_calls_back_to_back_are_duplicates
- test_has_text_after_tool_call - Tests text detection logic
- test_tool_call_with_newlines_between
- test_tool_call_with_whitespace_text_between
- test_tool_call_in_middle_of_text
- test_multiple_different_tool_calls_with_text

Also made find_complete_json_object_end public for testing.
2025-12-22 17:11:05 +11:00
Dhanji R. Prasanna
c7204c6699 Fix tool call detection and duplicate handling issues
1. Set tool_executed=true when a tool call is detected, even if skipped
   as a duplicate. This prevents the raw JSON from being printed to screen
   when a tool call is detected but not executed.

2. Remove session-level duplicate detection entirely. All tools should be
   allowed to be called multiple times in a session.

3. Fix sequential duplicate detection to only catch IMMEDIATELY sequential
   duplicates:

   - DUP IN CHUNK: Now only checks if the PREVIOUS tool call in the chunk
     is the same (not any tool call in the chunk)

   - DUP IN MSG: Now only checks if the LAST tool call in the previous
     message matches AND there's no text after it. If there's any
     non-whitespace text between tool calls, they're not considered
     duplicates.

This allows legitimate re-use of tools while still catching cases where
the LLM stutters and outputs the same tool call twice in a row.
2025-12-22 17:03:07 +11:00
Dhanji R. Prasanna
da91459e09 Fix auto-continue bug: don't return early when tools executed but final_output not called
The bug was in the chunk.finished block inside stream_completion_with_tools.
When no tool was executed in the CURRENT iteration (!tool_executed), the code
would return early without checking if tools were executed in PREVIOUS iterations
(any_tool_executed) and final_output was never called.

This caused the agent to terminate prematurely after executing tools like
todo_read when the LLM responded with text instead of calling final_output.

The fix adds a check: if any_tool_executed && !final_output_called, we break
to let the outer loop's auto-continue logic prompt the LLM to continue.

Also fixed missing debug! import in g3-console/src/main.rs.
2025-12-22 16:45:17 +11:00
Dhanji R. Prasanna
923def0ab2 Convert all INFO logs to DEBUG to reduce CLI noise
Converted ~77 info! macro calls to debug! across the codebase to prevent
log messages from interrupting the CLI experience during normal operation.
Users can still see these logs by setting RUST_LOG=debug if needed.

Affected crates:
- g3-cli
- g3-computer-control
- g3-console
- g3-core
- g3-ensembles
- g3-execution
- g3-providers
2025-12-22 16:27:35 +11:00