Commit Graph

427 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
bebf04c7bd Tighten system prompt 2026-01-09 14:11:19 +11:00
Dhanji R. Prasanna
d96d8c1d90 Rewrite JSON tool call filter with clean state machine
Fixes bug where JSON tool calls were printed as text due to chunking issues.

Changes:
- Complete rewrite of filter_json.rs with 3-state machine:
  - Streaming: normal pass-through, watches for newline + whitespace + {
  - Buffering: confirms/denies tool pattern with ~20 char buffer
  - Suppressing: string-aware brace counting until balanced
- Character-by-character processing eliminates chunk boundary issues
- Proper handling of } inside JSON strings (was causing premature exit)
- Detects truncated JSON followed by complete JSON (LLM retry case)
- Removed regex dependency, simpler pattern matching
- Added 59 stress tests covering malformed JSON, partial patterns,
  streaming edge cases, adversarial inputs, and real-world patterns

All 86 filter_json tests pass.
2026-01-09 14:05:11 +11:00
Dhanji R. Prasanna
49b27b0cbc fix: truncate long lines in streaming tool output to prevent terminal wrapping
When shell commands output very long lines (e.g., JSON content from
tail -c 10000), the lines would wrap in the terminal. The cursor-up
escape code (\x1b[1A) only moves up one visual line, not the entire
wrapped content, causing the display to fill with uncleared text.

This fix truncates lines to 120 characters in update_tool_output_line()
before displaying them, preventing the wrapping issue.
2026-01-09 13:35:58 +11:00
Dhanji R. Prasanna
67be0f20c7 fix: remove allow_multiple_tool_calls config and simplify tool execution flow
This fixes a bug where the agent would stop responding abruptly without
calling final_output. The root cause was the allow_multiple_tool_calls
config option (default: false) which caused the agent to break out of
the streaming loop mid-stream after executing the first tool, losing
any subsequent content.

Changes:
- Remove allow_multiple_tool_calls config option entirely
- Always process all tool calls without breaking mid-stream
- Simplify system prompt generation (no longer needs boolean param)
- Let the stream complete fully before continuing to next iteration
- Change find_last_tool_call_start to find_first_tool_call_start
- Remove parser.reset() call on duplicate detection

Benefits:
- Simpler logic with less conditional branching
- No lost content after tool calls
- Consistent behavior for all users
- Reduced config complexity
2026-01-09 13:28:07 +11:00
Dhanji R. Prasanna
a72d5a650a Fix two markdown formatting bugs
Bug 1: Inline code after list bullets not detected
- After emitting a list bullet, at_line_start was not set to false
- This caused the next backtick to be treated as a potential code fence
- Fixed by setting at_line_start = false after emitting bullet

Bug 2: Code block closing on indented backticks
- Code blocks containing indented ``` (4+ spaces) were closing prematurely
- The .trim() check was too permissive
- Fixed by only allowing closing fence with <= 3 spaces indent (CommonMark spec)

Added tests for both edge cases.
2026-01-08 20:50:26 +11:00
Dhanji R. Prasanna
19a804e0be Add syntax highlighting for Racket, Elisp, and Scheme
Add language alias mapping in highlight_code() to map:
- racket, rkt -> lisp
- elisp, emacs-lisp -> lisp
- scheme -> lisp
- common-lisp, cl -> lisp
- shell, sh, zsh, dockerfile -> bash

Syntect's built-in Lisp syntax handles all Lisp-family languages well.
Added test to verify the aliases work correctly.
2026-01-08 20:35:34 +11:00
Dhanji R. Prasanna
df706308ca Unify final_output rendering with streaming markdown formatter
Replace the separate syntax_highlight module with the streaming markdown
formatter for final_output rendering. This:

- Removes special buffered rendering logic for final_output
- Uses the same StreamingMarkdownFormatter used for agent responses
- Removes the spinner animation (content renders immediately)
- Deletes the now-unused syntax_highlight.rs module
- Updates test to use the streaming formatter

Benefits:
- Consistent rendering across all markdown output
- Less code to maintain (removed ~250 lines)
- Same syntax highlighting via syntect (already in streaming formatter)
2026-01-08 20:30:44 +11:00
Dhanji R. Prasanna
347513b04c Add comprehensive stress tests for streaming markdown formatter
Add 10 stress tests covering:
- Nested formatting (bold in italic, italic in bold)
- Empty/minimal content edge cases
- Escape sequences and special characters
- Lists with complex inline formatting
- Links with various content types
- Tables with formatting in cells
- Code blocks (should not format contents)
- Mixed block elements (headers, quotes, rules)
- Nested lists (3+ levels, mixed types)
- Pathological/adversarial inputs (unbalanced delimiters, unicode, long lines)

All 45 tests pass.
2026-01-08 20:27:28 +11:00
Dhanji R. Prasanna
fadfaee040 update gitingore 2026-01-08 13:50:03 +11:00
Dhanji R. Prasanna
381b852869 refactor(g3-core): Extract streaming utilities into dedicated module
Extract reusable utilities from the massive stream_completion_with_tools
function into a new streaming.rs module for improved readability:

- format_duration, format_timing_footer: timing display helpers
- clean_llm_tokens: consolidates 4 duplicate token-cleaning call sites
- log_stream_error: extracts 70+ lines of error logging
- is_empty_response, is_connection_error: predicate helpers
- truncate_for_display, truncate_line: string truncation utilities
- StreamingState, IterationState: state structs for future refactoring

Results:
- lib.rs reduced from 2978 to 2840 lines (138 lines, ~5%)
- New streaming.rs: 309 lines with 5 unit tests
- All 98+ tests pass

Agent: carmack
2026-01-08 13:20:11 +11:00
Dhanji R. Prasanna
267ef00848 refactor: extract session helper in webdriver.rs to reduce boilerplate
Agent: carmack

Add get_session() helper function that:
- Checks if webdriver is enabled
- Acquires the session read lock
- Returns the cloned session or an error message

Refactored 12 webdriver tool functions to use this helper:
- execute_webdriver_navigate
- execute_webdriver_get_url
- execute_webdriver_get_title
- execute_webdriver_find_element
- execute_webdriver_find_elements
- execute_webdriver_click
- execute_webdriver_send_keys
- execute_webdriver_execute_script
- execute_webdriver_get_page_source
- execute_webdriver_screenshot
- execute_webdriver_back
- execute_webdriver_forward
- execute_webdriver_refresh

Each function previously had ~10 lines of identical boilerplate.
Now reduced to 4 lines using the helper.

Net reduction: 68 lines (678 -> 610)
All tests pass. Behavior unchanged.
2026-01-08 13:05:44 +11:00
Dhanji R. Prasanna
5bfaee8dd5 use consistent naming for compaction 2026-01-08 12:54:03 +11:00
Dhanji R. Prasanna
3776ed847e refactor: use shared streaming helpers in openai and embedded providers
Agent: carmack

openai.rs:
- Use make_text_chunk() for streaming text content
- Use make_final_chunk() for final completion chunk
- Simplify tool_calls conversion logic

embedded.rs:
- Use make_text_chunk() for all 4 streaming text chunks
- Use make_final_chunk() for final completion chunk
- Remove unused CompletionChunk import

Net reduction: 35 lines removed
All tests pass. Behavior unchanged.
2026-01-07 13:01:03 +11:00
Dhanji R. Prasanna
2bf475960c refactor: extract shared streaming utilities module
Agent: carmack

Create crates/g3-providers/src/streaming.rs with shared helpers:
- decode_utf8_streaming(): Handle incomplete UTF-8 sequences in SSE streams
- is_incomplete_json_error(): Detect incomplete vs malformed JSON
- make_final_chunk(): Create finished completion chunks
- make_text_chunk(): Create text content chunks
- make_tool_chunk(): Create tool call chunks

Refactor anthropic.rs:
- Use shared decode_utf8_streaming (removes 15 lines of inline UTF-8 handling)
- Use make_final_chunk, make_text_chunk, make_tool_chunk helpers
- Reduces verbose CompletionChunk constructions throughout

Refactor databricks.rs:
- Remove local copies of streaming helpers (now uses shared module)
- Reduces duplication between providers

Net reduction: 118 lines removed, 16 lines added (including new module)
All tests pass. Behavior unchanged.
2026-01-07 12:48:07 +11:00
Dhanji R. Prasanna
bb63050779 refactor: improve readability of streaming and file ops code
Agent: carmack

databricks.rs:
- Extract ToolCallAccumulator struct to replace opaque (String, String, String) tuple
- Add decode_utf8_streaming() helper for cleaner UTF-8 handling
- Add is_incomplete_json_error() helper for JSON parse error detection
- Add make_final_chunk() helper to reduce duplication
- Add finalize_tool_calls() to convert accumulators to final format
- Refactor parse_streaming_response from ~270 lines to ~100 lines
- Reduce nesting depth from 8+ levels to 4 levels
- Use early returns and let-else for cleaner control flow

file_ops.rs:
- Replace repetitive if-let chains with declarative PATH_CONTENT_KEYS table
- Use match expression instead of nested if-else
- Reduce extract_path_and_content from 44 lines to 20 lines

All tests pass. Behavior unchanged.
2026-01-07 12:39:05 +11:00
Dhanji R. Prasanna
532ed132f7 Few shot prompts for carmack 2026-01-07 12:33:11 +11:00
Dhanji R. Prasanna
4e7aca50fa feat: royal blue tool names in agent mode + fix README heading display
- Add set_agent_mode() to UiWriter trait for visual mode differentiation
- ConsoleUiWriter uses royal blue (ANSI 256 color 69) for tool names in agent mode
- Fix extract_readme_heading() to search only README section of combined content
  (was incorrectly showing AGENTS.md heading instead of README heading)
2026-01-07 11:37:51 +11:00
Dhanji R. Prasanna
189fdec006 Carmack agent 2026-01-07 11:18:27 +11:00
Dhanji R. Prasanna
1980e62511 Improve code readability in g3-core
- streaming_parser.rs: Rename has_message_like_keys to args_contain_prose_fragments
  with improved documentation explaining the heuristic for detecting malformed
  tool calls where LLM prose leaked into JSON keys

- context_window.rs: Simplify build_thin_result_message using early return
  pattern and match expression for cleaner control flow

Agent: carmack
2026-01-07 11:16:42 +11:00
Dhanji R. Prasanna
2e9535974d removed testing craft 2026-01-07 10:46:37 +11:00
Dhanji R. Prasanna
775bcd10a5 chore: remove g3-console crate entirely
The g3-console crate was not referenced by any other crate in the
workspace and appears to be an abandoned web console implementation.

Removed:
- crates/g3-console/ (entire directory)
- Workspace member entry in Cargo.toml

Agent: fowler
2026-01-07 10:41:46 +11:00
Dhanji R. Prasanna
1056b4193b chore(g3-cli): remove orphaned retro_tui and tui modules
These files were not referenced anywhere in the codebase and appear
to be leftover from a previous TUI implementation that was abandoned.

Removed:
- crates/g3-cli/src/retro_tui.rs (62KB)
- crates/g3-cli/src/tui.rs (6KB)

Agent: fowler
2026-01-07 10:39:42 +11:00
Dhanji R. Prasanna
48036d01e3 fix(g3-core): disable auto-continue in interactive mode
Auto-continue was incorrectly triggering when the LLM asked questions
in interactive/chat mode. Now auto-continue only activates when
is_autonomous is true, allowing proper back-and-forth conversation
in interactive mode.

Agent: fowler
2026-01-07 10:37:30 +11:00
Dhanji R. Prasanna
a553764e93 docs(agents): add git authorship rule to all agent prompts
Ensure agents never override git author/email and instead put their
identity in the commit message body.

Agent: fowler
2026-01-07 10:27:44 +11:00
Dhanji R. Prasanna
b73dfacb7a refactor(g3-core): extract provider_registration and session modules
Extract two focused modules from the monolithic lib.rs (3372 lines):

1. provider_registration.rs (233 lines)
   - Consolidates duplicated provider registration patterns
   - Single determine_providers_to_register() function for mode-based selection
   - Unified register_providers() async function for all provider types
   - Includes unit tests for registration logic

2. session.rs (394 lines)
   - Session ID generation (generate_session_id)
   - Context window persistence (save_context_window, write_context_window_summary)
   - Error logging (log_error_to_session)
   - Utility functions (format_token_count, token_indicator)
   - Session restoration helper (restore_from_session_log)
   - Includes comprehensive unit tests

Also fixes:
- Removed redundant tool_executed assignment that triggered unused warning
- Removed unused Message import in session.rs

Results:
- lib.rs reduced from 3372 to 2976 lines (-396 lines, -11.7%)
- All tests pass, no warnings
- Behavior preserved (pure mechanical extraction)

Agent: fowler
2026-01-07 10:20:28 +11:00
Dhanji R. Prasanna
c4ae85de72 Add --new-session flag to skip session resumption in agent mode
Adds a new CLI flag that allows users to force a new session when running
in agent mode, bypassing the automatic detection and resumption of
incomplete sessions.

Usage: g3 --agent my-agent --new-session
2026-01-07 09:59:15 +11:00
Dhanji R. Prasanna
f0bd7959b1 chore(analysis): update dependency analysis artifacts
Authored by: Structural Analysis Agent (Euler)

Updated all dependency analysis artifacts with fresh extraction:
- graph.json: Canonical dependency graph with 10 crates, 139 files, 16 crate edges, 72 file edges
- graph.summary.md: Overview with fan-in/fan-out rankings and crate inventory
- sccs.md: SCC analysis confirming no cycles at crate or file level (clean DAG)
- layers.observed.md: 5-layer architecture diagram derived from dependencies
- hotspots.md: Coupling hotspots (g3-config highest fan-in, g3-cli highest fan-out)
- limitations.md: Documented extraction limitations (conditional compilation, macros, etc.)

Key findings:
- All 10 workspace crates form a directed acyclic graph
- g3-core/src/ui_writer.rs has highest file-level fan-in (10 dependents)
- g3-console is standalone with no workspace dependencies
- Clean layered architecture with no violations detected
2026-01-07 09:36:52 +11:00
Dhanji R. Prasanna
ff08a622eb ask all agents to commit their work 2026-01-07 09:31:02 +11:00
Dhanji R. Prasanna
5d20da2609 Add 54 integration tests for CLI, tools, and message serialization
New test files:
- crates/g3-cli/tests/cli_integration_test.rs (14 tests)
  Blackbox CLI tests: help/version flags, argument validation,
  conflicting modes, flock mode requirements

- crates/g3-core/tests/tool_execution_test.rs (20 tests)
  Tool call structure tests and unified diff application:
  read_file, write_file, str_replace, shell, background_process,
  todo, final_output, code_search, take_screenshot

- crates/g3-providers/tests/message_serialization_test.rs (20 tests)
  Round-trip serialization tests for Message, MessageRole,
  CacheControl, and Tool types. Covers Unicode, special chars,
  and edge cases.

All tests follow blackbox/integration-first principles with
documentation of what they protect and intentionally do not assert.
2026-01-07 09:23:34 +11:00
Dhanji R. Prasanna
9cb6282719 update lamport 2026-01-07 09:07:29 +11:00
Dhanji R. Prasanna
311b3bd75a added hopper testing agent and updated fowler to use euler 2026-01-07 09:06:46 +11:00
Dhanji R. Prasanna
e2445a5d22 refactor(g3-core): extract duplicate detection helper and consolidate thinning
- Extract check_duplicate_in_previous_message() helper to reduce nesting
  from 6+ levels to 2 levels in stream_completion_with_tools
- Create do_thin_context() and do_thin_context_all() helpers to centralize
  context thinning with event tracking
- Use provider_config::parse_provider_ref() in additional call sites
- All 295 tests pass

This continues the refactoring to eliminate code-path aliasing and
reduce cyclomatic complexity in the Agent implementation.
2026-01-07 08:45:51 +11:00
Dhanji R. Prasanna
a87928661d Remove overly broad *.json from .gitignore
The blanket *.json ignore is not canonical for Rust projects.
JSON files that need ignoring are already covered by:
- .g3/ for session logs
- logs/ for error logs
- .build for Swift build artifacts
2026-01-06 13:54:27 +11:00
Dhanji R. Prasanna
2d8e733820 Add dependency graph JSON data
Add exception to .gitignore for analysis/deps/graph.json
2026-01-06 13:24:01 +11:00
Dhanji R. Prasanna
6d6aed563d Add structural dependency analysis artifacts
- graph.json: Canonical dependency graph (10 crates, 16 edges, 76 files)
- graph.summary.md: One-page overview with fan-in/fan-out rankings
- sccs.md: Strongly Connected Components analysis (no cycles)
- layers.observed.md: 5-layer architecture diagram
- hotspots.md: Coupling hotspots (g3-config, g3-cli)
- limitations.md: Extraction limitations and validity conditions
2026-01-06 13:23:24 +11:00
Dhanji R. Prasanna
764d1bf67e Add ./tmp/ to .gitignore 2026-01-06 12:50:14 +11:00
Dhanji R. Prasanna
2592fee5d5 Generalize lamport.md examples to be language-agnostic
- Changed Rust-specific examples to generic ones:
  - 'Tool calls must be valid JSON' → 'API responses must be valid JSON'
  - 'Never block the async runtime' → 'Never block the event loop'
  - 'Crate/module' → 'Module/package'
  - 'run cargo test' → 'basic commands'
2026-01-06 12:49:00 +11:00
Dhanji R. Prasanna
e2fffaab94 Slim down AGENTS.md and update lamport.md for machine-specific output
AGENTS.md changes:
- Removed redundant sections that duplicated README.md:
  - System Overview (crate table)
  - File Structure Quick Reference
  - Testing Strategy
  - Pointers to Documentation
  - Architecture Decisions
- Kept unique machine-specific sections:
  - Critical Invariants (merged Performance Constraints)
  - Recommended Entry Points
  - Dangerous/Subtle Code Paths
  - Do's and Don'ts for Automated Changes
  - Common Incorrect Assumptions
  - Dependency Analysis Artifacts
- Reduced from ~220 lines to ~116 lines

lamport.md changes:
- Rewrote AGENTS.md section with explicit instructions
- Added REQUIRED sections list (5 sections only)
- Added DO NOT include list to prevent README duplication
- AGENTS.md now points to README for architecture/usage
2026-01-06 12:46:40 +11:00
Dhanji R. Prasanna
6d2cab93f5 Extend euler.md to require AGENTS.md updates
The Euler agent must now update AGENTS.md after generating artifacts:
- Add/update 'Dependency Analysis Artifacts' section
- Table listing each file in analysis/deps/ with one-line descriptions
- No findings, metrics, or recommendations in AGENTS.md
2026-01-06 12:35:12 +11:00
Dhanji R. Prasanna
9132c441f1 Remove Key findings section from dependency analysis docs 2026-01-06 12:33:48 +11:00
Dhanji R. Prasanna
d695f10604 Document dependency analysis artifacts in AGENTS.md
Added section explaining the analysis/deps/ directory contents:
- graph.json: Raw dependency graph data
- graph.summary.md: Overview metrics and rankings
- sccs.md: Cycle detection results
- layers.observed.md: Layer diagrams
- hotspots.md: Coupling hotspots
- limitations.md: Analysis limitations

Includes key findings from the Euler agent's static analysis.
2026-01-06 12:31:17 +11:00
Dhanji R. Prasanna
386176899e Remove vision tools (except take_screenshot) and macax tools
Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool.
take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total
2026-01-03 17:38:25 +11:00
Dhanji R. Prasanna
29e263ac49 Fix Unicode space handling in macOS screenshot filenames
macOS uses U+202F (Narrow No-Break Space) in screenshot filenames
between the time and am/pm. When users type or paste these paths,
they use regular spaces, causing file-not-found errors.

Changes:
- Add resolve_path_with_unicode_fallback() to try U+202F variants
- Add resolve_paths_in_shell_command() for shell command paths
- Apply fix to read_file, read_image, and shell tools
- Fix read_image prompt docs: file_path -> file_paths (array)
- Add 6 unit tests for Unicode space normalization
2026-01-03 17:17:08 +11:00
Dhanji R. Prasanna
f7e2f38fe9 lamport run 2026-01-03 16:48:30 +11:00
Dhanji R. Prasanna
f4a1bf5e93 fix agent-mode session resumption bug 2026-01-03 16:44:58 +11:00
Dhanji R. Prasanna
76bfb77f84 further fowler fixes and session fixes 2026-01-03 15:47:04 +11:00
Dhanji R. Prasanna
65867e7f96 refactor tools out of lib.rs 2026-01-03 15:06:34 +11:00
Dhanji R. Prasanna
595ad6ad21 agent mode resumption 2026-01-03 14:50:08 +11:00
Dhanji R. Prasanna
016efc1db6 Prevent agent mode from stopping after first TODO phase
- Add TODO completion check to final_output tool in autonomous mode only
- When incomplete TODO items exist, reject final_output and prompt LLM to continue
- Non-autonomous modes (interactive, chat) are unaffected
- Add 6 tests verifying behavior in both autonomous and non-autonomous modes

Fixes issue where LLM would call final_output after completing first phase,
causing agent to stop prematurely instead of continuing with remaining phases.
2025-12-27 12:35:31 +11:00
Dhanji R. Prasanna
8d071d5eed fix: fowler agent now respects --workspace flag and reads project docs
- Fixed run_agent_mode to call std::env::set_current_dir with workspace_dir
- Updated fowler.md to read README.md and AGENTS.md as part of Triage & Understanding step
2025-12-26 15:24:20 +11:00