Commit Graph

607 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
04e3c69b0a Add --accept flag to studio run command
Automatically accept the session after g3 completes successfully,
but only if there are commits on the branch.

Changes:
- Add --accept flag to Run command (stripped, not passed to g3)
- Add has_commits_on_branch() helper using git rev-list --count
- Auto-accept triggers merge to main and cleanup when:
  1. g3 exits successfully (exit code 0)
  2. Branch has commits ahead of main
- Show warning if --accept set but no commits exist

Usage: studio run --agent carmack --accept
2026-01-15 06:43:35 +05:30
Dhanji R. Prasanna
5d8dbc43f8 Add agent-specific language prompt injection
When running in agent mode (e.g., --agent carmack) in a workspace with
detected languages, inject agent+language-specific prompts from
prompts/langs/<agent>.<lang>.md at the end of the system prompt.

Changes:
- Add AGENT_LANGUAGE_PROMPTS static array for compile-time embedding
- Add get_agent_language_prompt() to look up specific agent+lang combos
- Add get_agent_language_prompts_for_workspace_with_langs() that returns
  both content and matched languages for display
- Update agent_mode.rs to inject prompts and show which languages loaded
- Display format: '✓ carmack: racket language guidance'
- Add tests for new functionality

Uses the same detect_languages() mechanism as regular language prompts
to avoid code-path aliasing.
2026-01-15 06:43:29 +05:30
Dhanji R. Prasanna
eefc067aae Add carmack.racket.md agent-specific language prompt
Racket-specific guidance for the carmack agent including:
- Idiomatic Racket patterns (match, for/*, cond)
- Module organization with explicit provide lists
- Contracts and type boundaries
- Data modeling with structs
- Error handling best practices
- IO, paths, and portability
- Performance considerations
- Macro guidelines
- Testing with rackunit
2026-01-15 06:43:20 +05:30
Connor Justice
fa29a64e51 Simplify logging initialization comment
Removed unnecessary comment about logging initialization.
2026-01-14 17:53:04 -05:00
Connor Justice
505225c0bd fix: prevent panic when tracing subscriber already initialized
Use try_init() instead of init() for tracing subscriber setup to
gracefully handle cases where a global subscriber is already set.

This fixes a panic in the scout agent subprocess when spawned by the
research tool, where a dependency may have already initialized tracing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 15:33:22 -05:00
Connor Justice
6532442d32 fix: initialize logging before planning mode check
Move initialize_logging() call to run immediately after CLI parsing,
before any mode checks. This ensures the --verbose flag works correctly
in planning mode, which previously bypassed logging initialization.

Previously, planning mode would return early before initialize_logging()
was called, causing verbose output to be silently ignored.
2026-01-14 14:33:44 -05:00
Dhanji R. Prasanna
afec65fd50 Add language-specific prompt injection for toolchain guidance
- Add language_prompts module that auto-detects programming languages in workspace
- Scan for language files with depth limit (2) to inject relevant toolchain prompts
- Add prompts/langs/ directory for language-specific markdown files
- Include Racket/raco toolchain guidance as first language prompt
- Update combine_project_content() to accept language_content parameter
- Integrate language detection into main CLI flow and agent mode
- Update project memory with new feature documentation
2026-01-14 21:00:52 +05:30
Dhanji R. Prasanna
716d598bd8 remove openai specific config example 2026-01-14 20:24:53 +05:30
Dhanji R. Prasanna
affa878992 Add minimal OpenAI example config 2026-01-14 20:21:38 +05:30
Dhanji R. Prasanna
f4562cd4c9 config: default agent settings and provider override 2026-01-14 20:14:33 +05:30
Dhanji R. Prasanna
38828c7757 Clean up tool output formatting
- Shell: " Command executed successfully" → "️ ran successfully"
- Write file: Remove ✏️ emoji, use plain "wrote N lines | M chars"
2026-01-14 19:42:54 +05:30
Dhanji R. Prasanna
9ef064a041 Add guidance to shell tool description to avoid unnecessary cd prefixes
LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily,
wasting tokens and cluttering CLI display. Added clear guidance in the
shell tool description that commands already execute in the working directory.
2026-01-14 19:00:53 +05:30
Dhanji R. Prasanna
03143ec7f8 Agent Mode Enhancements
• Agent prompts are now embedded within the g3 binary
• README.md - Added new "Agent Mode" section documenting:
  • All 7 built-in agents with their focus areas
  • Usage examples (--list-agents, --agent <name>)
  • How to create custom workspace agents

Behavior
1. Workspace agents take priority - If agents/<name>.md exists in the workspace, it's used
2. Embedded fallback - If no workspace agent exists, the embedded version is used
3. Portability - g3 binary now works on any repo without needing the agents/ directory
4. Discoverability - g3 --list-agents shows all available agents and their source
2026-01-14 16:27:03 +05:30
Dhanji R. Prasanna
5104bd53b6 refactor(g3-core): improve stream_completion_with_tools readability
Extract and simplify the streaming completion function:

- Extract ensure_context_capacity() helper for pre-loop context management
  (thinning + compaction logic now in dedicated async method)
- Simplify compact_summary generation block: flatten nested if/match,
  remove redundant comments, reorder branches for clarity
- Remove dead code: unused _last_error variable and modified_tool_call
- Streamline duplicate detection block: reduce verbose logging
- Clean up text content display block: remove redundant comments,
  tighten variable declarations
- Remove redundant is_todo_tool redefinition inside block expression

Net reduction: 79 lines (-187/+108)
Behavior unchanged, all unit tests pass.

Agent: carmack
2026-01-14 15:11:53 +05:30
Dhanji R. Prasanna
996dc357b4 Skip session resume prompt when --new-session flag is passed
When users explicitly pass --new-session, they want a fresh session.
Previously g3 would still prompt to resume an existing session.
Now the resume check is skipped entirely when the flag is set.
2026-01-14 08:54:35 +05:30
Dhanji R. Prasanna
dea0e6b1ca Compact tool output improvements
- Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names)
- Align | character across all compact tools (pad to 11 chars for str_replace)
- Make code_search a compact tool with summary display
- Show language and search name in code_search output (e.g., rust:"find structs")
- Add format_code_search_summary() to extract match/file counts from JSON response
2026-01-14 08:12:50 +05:30
Dhanji R. Prasanna
bd25d7dace Merge sessions/fowler/786b20b5 2026-01-14 04:28:06 +05:30
Dhanji R. Prasanna
7d17b436f9 refactor(g3-core): remove 3 unused Agent constructor variants
Remove dead code - constructor variants that had no callers:
- new_with_readme()
- new_autonomous_with_readme()
- new_with_quiet()

These were thin wrappers around new_with_mode_and_readme() that were
never used externally. All 5 remaining constructors have verified callers.

Results:
- lib.rs reduced from 2817 to 2797 lines (-20 lines)
- Eliminated code-path aliasing: 8 constructors → 5 constructors
- All g3-core tests pass
- Full workspace compiles cleanly

Agent: fowler
2026-01-14 04:26:42 +05:30
Dhanji R. Prasanna
21eb4f2d30 Only show Chrome diagnostics when there are issues
Silence the diagnostic report when all checks pass to reduce noise.
2026-01-14 04:25:13 +05:30
Dhanji R. Prasanna
a1dfd9c0b6 Enhanced auto-memory with rich few-shot format
- Updated memory reminder prompt with per-symbol char ranges
- Added two few-shot examples: Session Continuation (feature) + UTF-8 Safe Slicing (pattern)
- Updated system prompt Memory Format section to match
- Format: file -> nested symbols with [start..end] ranges and descriptions
- Enables direct read_file navigation to specific functions
2026-01-13 21:49:48 +05:30
Dhanji R. Prasanna
3a47ebe668 better racket example support 2026-01-13 21:16:14 +05:30
Dhanji R. Prasanna
c2f96d7048 Make WebDriver and Chrome headless enabled by default
- webdriver flag now defaults to true (tools always available)
- chrome_headless flag now defaults to true (Chrome is default browser)
- Use --safari flag to override and use Safari instead
- Updated README documentation to reflect new defaults
2026-01-13 21:14:52 +05:30
Dhanji R. Prasanna
151b8c4658 Add Racket tree-sitter support, remove Kotlin
- Add tree-sitter-racket dependency (v0.24)
- Initialize Racket parser in code search
- Add .rkt, .rktl, .rktd file extensions
- Add test_racket_search test
- Remove Kotlin from supported languages (was disabled)
- Clean up duplicate test files

Supported languages: Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Racket
2026-01-13 18:44:59 +05:30
Dhanji R. Prasanna
5e45e110e2 refactor(g3-core): extract finalize_streaming_turn() to unify return paths
Extract a single canonical helper function for completing streaming turns,
eliminating 3 nearly-identical return paths in stream_completion_with_tools().

Changes:
- Add finalize_streaming_turn() helper that handles:
  - Finishing streaming markdown
  - Saving context window
  - Adding timing footer (when requested)
  - Dehydrating context (when ACD enabled)
  - Building TaskResult
- Replace 3 duplicated return blocks with calls to the helper
- Remove unused mut on full_response variable

Results:
- Function reduced from 1067 to 999 lines (-68 lines)
- Eliminated code-path aliasing: 3 paths → 1 canonical path
- All 32 characterization tests pass
- Full g3-core test suite passes

Agent: fowler
2026-01-13 16:52:48 +05:30
Dhanji R. Prasanna
333a85ed1e Merge sessions/hopper/e2a0ad02 2026-01-13 16:27:17 +05:30
Dhanji R. Prasanna
b89d55a9ff Add characterization tests for stream_completion_with_tools
Add 32 blackbox characterization tests to lock down the behavior of the
stream_completion_with_tools function (1067 lines) before refactoring.

Tests cover key behaviors through stable boundaries:
- StreamingToolParser: tool call detection, incomplete detection, text accumulation
- Auto-continue logic: autonomous mode decisions, priority ordering
- Duplicate detection: sequential duplicates, cross-message duplicates
- Context window: token tracking, compaction threshold, history preservation
- Tool execution: read_file, shell, write_file, todo tools through Agent
- Streaming utilities: LLM token cleaning, duration formatting, truncation
- Parser sanitization: inline tool pattern handling, homoglyph replacement

These tests intentionally do NOT assert:
- Internal parser state or implementation details
- Specific timing values
- UI output formatting
- Provider-specific behavior

Agent: hopper
2026-01-13 16:25:33 +05:30
Dhanji R. Prasanna
bd756307f1 fowler doesnt need to explicity read README/AGENTS 2026-01-13 16:16:27 +05:30
Dhanji R. Prasanna
47e3a88cf6 refactor(g3-core): extract stats formatting to dedicated module
Extract the get_stats() function (158 lines) from lib.rs to a new stats.rs module.

Changes:
- Create stats.rs with AgentStatsSnapshot struct for capturing agent state
- Replace inline formatting logic with delegation to snapshot.format()
- Add unit tests for stats formatting (empty and populated states)
- Reduce lib.rs from 2961 to 2818 lines (-143 lines)

The new module improves:
- Testability: Stats formatting can now be unit tested in isolation
- Separation of concerns: Formatting logic is decoupled from Agent struct
- Readability: lib.rs is more focused on core agent behavior

All 271 workspace tests pass.

Agent: fowler
2026-01-13 16:11:53 +05:30
Dhanji R. Prasanna
562c4199f8 docs: Add Studio documentation and UTF-8 safety invariants
README.md:
- Add Studio section documenting the multi-agent workspace manager
- Document usage: run, list, status, accept, discard commands
- Explain worktree-based isolation and workflow

AGENTS.md:
- Add UTF-8 safe string slicing as critical invariant (#8)
- Add MUST NOT for byte-index slicing on multi-byte text (#5)
- Document parser sanitization as dangerous/subtle code path
  (prevents parser poisoning from inline tool-call JSON patterns)

Agent: lamport
2026-01-13 15:31:01 +05:30
Dhanji R. Prasanna
9a3b03a41f Remove flock mode (superseded by studio)
Flock mode has been superseded by the studio multi-agent workspace manager.

Changes:
- Remove g3-ensembles crate entirely
- Remove --project, --flock-workspace, --segments, --flock-max-turns CLI flags
- Remove run_flock_mode() from autonomous.rs
- Remove flock-related tests from cli_integration_test.rs
- Update README.md, docs/architecture.md, analysis/memory.md
- Delete docs/FLOCK_MODE.md
2026-01-13 15:01:12 +05:30
Dhanji R. Prasanna
82c0165765 Fix unused variable warning and UTF-8 panic in string slicing
- Remove unused total_lines variable in file_ops.rs
- Fix UTF-8 boundary panic in utils.rs when generating diff error preview
  The code was slicing at byte index 200 which could land inside a
  multi-byte character (e.g., box-drawing chars like ─). Now uses
  character-based slicing with chars().take() instead.
2026-01-13 14:52:52 +05:30
Dhanji R. Prasanna
c65d082c5d Make --agent optional in Studio for one-shot mode
Studio can now run g3 without specifying an agent:

  # Agent mode (existing)
  studio run --agent carmack "fix the bug"

  # One-shot mode (new)
  studio run "fix the bug"

When no agent is specified, sessions are created under the 'single'
directory in .worktrees/sessions/single/<session-id>/

This makes Studio a complete replacement for Flock mode.
2026-01-13 14:42:20 +05:30
Dhanji R. Prasanna
f6b84d864a Rename G3 -> g3 in docs and comments
Standardize project name to lowercase 'g3' throughout documentation,
comments, and configuration files. Environment variables (G3_*) are
unchanged as they follow the uppercase convention.
2026-01-13 14:36:33 +05:30
Dhanji R. Prasanna
389ed6a554 Compact project info display in interactive mode
Before:
  🤖 AGENTS.md configuration loaded
  📚 detected: G3 - AI Coding Agent
  🧠 Project memory loaded
  workspace: /Users/dhanji/src/g3

After:
  >> G3 - AI Coding Agent
     ✓ README | ✓ AGENTS.md | ✓ Memory
  -> ~/src/g3
2026-01-13 14:32:24 +05:30
Dhanji R. Prasanna
af3aa840db Compress session continuation UI prompt 2026-01-13 14:29:54 +05:30
Dhanji R. Prasanna
118935d2da Remove unused variable total_lines in file_ops.rs 2026-01-13 14:25:17 +05:30
Dhanji R. Prasanna
a09967eb27 refactor(streaming): Extract deduplication and auto-continue logic into helpers
Improve readability of stream_completion_with_tools (~1000 line function):

- Add deduplicate_tool_calls() helper with closure for previous-message check
- Add should_auto_continue() with AutoContinueReason enum for clearer control flow
- Replace inline deduplication loop with helper call (-19 lines)
- Replace complex auto-continue conditional with match on reason enum (-13 lines)
- Add section comments for major phases (State Init, Pre-loop, Main Loop, Auto-Continue, Post-Loop)
- Add comprehensive tests for new helpers

Net reduction: 82 deletions, behavior unchanged (172+ tests pass)

Agent: carmack
2026-01-13 11:44:06 +05:30
Dhanji R. Prasanna
dc45987e8d Add characterization tests for UTF-8 truncation and parser sanitization
Agent: hopper

Adds 32 new integration tests covering recent commits:

## UTF-8 Safe Truncation Tests (14 tests)
Covers commit f30f145 (Fix UTF-8 panics):
- Topic extraction with emoji, CJK, and multi-byte characters
- Truncation at character boundaries (not byte boundaries)
- Edge cases: exactly 50 chars, 51 chars, 2-byte/3-byte/4-byte UTF-8
- Stub generation with multi-byte topics
- Combining characters and diacritics

## Parser Sanitization Tests (18 tests)
Covers commit 4c36cc0 (Prevent parser poisoning):
- Code block contexts (inline code, after fences, prose)
- Line boundary edge cases (empty lines, whitespace, indentation)
- Unicode handling (emoji, bullets, CJK before patterns)
- Multiple patterns on same line
- Negative cases (similar but different patterns, partial patterns)
- Real-world scenarios from the original bug report

All tests are blackbox/characterization style - they test observable
outputs through stable public interfaces without encoding internal
implementation details.
2026-01-13 11:22:46 +05:30
Dhanji R. Prasanna
8dcb7a3dba feat: add compact styled output for TODO tools
TODO tools (todo_read, todo_write) now display with a cleaner, more
compact format:

- Styled header: " ● todo_read" or " ● todo_write"
- Tree-style prefixes for content lines (│ and └)
- Checkbox conversion: "- [ ]" → □, "- [x]" → ■
- Dimmed content for visual distinction
- No timing footer (cleaner output)

Changes:
- Add print_todo_compact() method to UiWriter trait
- Implement print_todo_compact() in ConsoleUiWriter
- Update todo.rs to call print_todo_compact() instead of line-by-line output
- Skip tool header, output header, and timing for TODO tools in agent streaming
2026-01-13 10:58:55 +05:30
Dhanji R. Prasanna
4c36cc058c fix: prevent parser poisoning from inline tool-call JSON patterns
When the streaming parser encountered fragments of JSON that looked like
partial tool calls (e.g., {"tool":) embedded in inline text (like code
examples or prose), it would incorrectly enter JSON parsing mode and
poison the parser state, causing control to be returned to the user
mid-task.

This fix:
- Adds sanitize_inline_tool_patterns() to detect tool-call patterns that
  are NOT on their own line and replace the opening brace with a Unicode
  homoglyph (fullwidth left curly bracket U+FF5B)
- Integrates sanitization into process_chunk() before text is buffered
- Updates system prompts to instruct LLMs to use homoglyphs when showing
  example tool call JSON in prose
- Adds comprehensive tests for the sanitization logic

Real tool calls from LLMs always appear on their own line, so those are
left untouched. Only inline patterns (with non-whitespace before them)
are sanitized.
2026-01-13 10:58:41 +05:30
Dhanji R. Prasanna
a0b9126555 Revert "refactor(g3-core): extract streaming logic to agent_streaming.rs"
This reverts commit a2e51cf075.
2026-01-13 07:59:18 +05:30
Dhanji R. Prasanna
6907fa36c0 UI: Add newline before auto-memory skip message 2026-01-13 07:03:42 +05:30
Dhanji R. Prasanna
08e6a1dca0 Merge sessions/fowler/f8c3f2e5 2026-01-13 06:58:01 +05:30
Dhanji R. Prasanna
98eea09dc8 UI: Show consecutive read_file calls as continuation lines
When the LLM reads the same file multiple times in sequence (scrolling
through a large file), instead of showing each as a separate line:

  ● read_file | path [0..2000] | 50 lines | 100 ◉ 5ms
  ● read_file | path [2000..4000] | 50 lines | 100 ◉ 5ms
  ● read_file | path [4000..6000] | 50 lines | 100 ◉ 5ms

Now shows a cleaner continuation format:

  ● read_file | path [0..2000] | 50 lines | 100 ◉ 5ms
     └─ reading further [2000..4000] | 50 lines | 100 ◉ 5ms
     └─ reading further [4000..6000] | 50 lines | 100 ◉ 5ms

This makes it visually clear that the agent is scrolling through
a single file rather than reading multiple different files.

Implementation:
- Added last_read_file_path field to ConsoleUiWriter
- Detect when consecutive read_file calls target the same file
- Print continuation format for subsequent reads
- Reset tracking when:
  - A different tool is executed (shell, write_file, etc.)
  - A different file is read
  - Text is output between tool calls
2026-01-13 06:25:28 +05:30
Dhanji R. Prasanna
a2e51cf075 refactor(g3-core): extract streaming logic to agent_streaming.rs
Reduce lib.rs complexity by extracting the streaming completion logic:

- Extract stream_completion_with_tools (~1080 lines) to agent_streaming.rs
- Extract stream_with_retry helper method
- Extract parse_diff_stats helper function
- Add handle_pre_stream_compaction helper for cleaner pre-stream logic
- Add format_tool_output helper for tool output formatting
- Remove 3 unused constructor variants:
  - new_with_readme
  - new_autonomous_with_readme
  - new_with_quiet

Results:
- lib.rs reduced from 2974 to 1791 lines (40% reduction)
- Streaming logic cleanly separated into dedicated module
- All tests pass, no behavior changes

Agent: fowler
2026-01-13 06:14:56 +05:30
Dhanji R. Prasanna
5c9404e292 Refactor: improve readability in CLI modules
- project_files.rs: Fix UTF-8 safety in truncate_for_display (use char
  boundaries instead of byte slicing), add test for multi-byte chars
- task_execution.rs: Extract recoverable_error_name() helper, use shared
  calculate_retry_delay() from error_handling.rs to eliminate duplication
- ui_writer_impl.rs: Extract duration_color() helper for timing display,
  add clear_tool_state() to consolidate repeated mutex clearing patterns

Agent: carmack
2026-01-13 05:58:54 +05:30
Dhanji R. Prasanna
f30f145c85 Fix UTF-8 panics and inconsistent retry logic
- Fix 7 UTF-8 byte slicing panics that crash on multi-byte characters:
  - acd.rs: extract_topic_from_text() [..50] slice
  - streaming.rs: log_stream_error() [..500] slice
  - tools/acd.rs: rehydrate message truncation [..2000] slice
  - history.rs: git commit message truncation [..69] slice
  - planner.rs: commit summary/description truncation [..69] slices
  - llm.rs: requirements summary line truncation [..117] slice

- All now use chars().count() and chars().take(N).collect() for
  UTF-8 safe truncation

- Fix inconsistent retry logic in task_execution.rs:
  - Previously only retried on Timeout errors
  - Now retries on ALL recoverable errors (rate limits, network,
    server errors, model busy, token limits, context length)
  - Added error-specific base delays (rate limit: 5s, server: 2s, etc.)
  - Added exponential backoff with ±20% jitter
  - Consistent with autonomous mode retry behavior
2026-01-13 05:49:45 +05:30
Dhanji R. Prasanna
6f50d01ab6 Add comprehensive end-of-turn behavior tests for g3-core
Agent: hopper

Adds 56 new integration tests covering the observable end-of-turn
behaviors in the streaming module:

- Timing footer formatting (5 tests): verifies user-facing timing display
  with various durations, token counts, and context percentages

- Tool call duplicate detection (6 tests): ensures identical sequential
  tool calls are detected while different tools/args are not

- Empty response detection (9 tests): validates detection of empty,
  whitespace-only, and timing-only responses that trigger auto-continue

- Connection error classification (5 tests): verifies EOF, connection,
  chunk, and body errors are correctly identified for graceful recovery

- Tool output summary formatting (17 tests): covers read_file, write_file,
  str_replace, remember, screenshot, coverage, and rehydrate summaries

- Duration formatting (4 tests): milliseconds, seconds, minutes, zero

- Text truncation (4 tests): short/long strings, multiline, flag behavior

- LLM token cleaning (3 tests): removal of stop tokens like <|im_end|>

- Edge cases (4 tests): empty inputs, unicode handling, large numbers

All tests are blackbox/characterization style - they test observable
outputs through stable public interfaces without encoding internal
implementation details. Tests remain stable under refactoring that
preserves behavior.
2026-01-12 21:17:32 +05:30
Dhanji R. Prasanna
d164c97ad2 Fix multi-line error messages in compact tool output
The truncate_for_display() function now takes only the first line
of input before truncating. This prevents multi-line error messages
(like str_replace failures) from breaking the compact single-line
format.

Added tests for multi-line input handling.
2026-01-12 20:55:05 +05:30
Dhanji R. Prasanna
81ea149369 Fix confusing documentation references
1. architecture.md: Fixed diagram to show 'studio' instead of 'g3-console'
   (the crate was renamed during development)

2. analysis/memory.md: Removed reference to non-existent machine_ui_writer.rs

3. theme.rs: Clarified that 'retro' is a theme option (the default theme),
   not a separate TUI mode. No --retro CLI flag exists.
2026-01-12 20:49:37 +05:30