Commit Graph

724 Commits

Author SHA1 Message Date
Dhanji R. Prasanna
01cb4f6691 fix: use consistent max_tokens defaults across providers
- Fix aliasing issue where resolve_max_tokens() used fallback_default_max_tokens
  (8192) instead of provider-specific defaults
- Update fallback_default_max_tokens from 8192 to 32000
- Set provider-specific max_tokens defaults:
  - Anthropic: 32000
  - OpenAI: 32000 (was 16000)
  - Databricks: 32000 (was 50000, now matches Anthropic as passthrough)
  - Embedded: 2048
- Context window lengths unchanged:
  - OpenAI: 400,000
  - Anthropic: 200,000
  - Databricks (Claude): 200,000

This fixes the 'LLM response was cut off due to max_tokens limit' error
in agent mode that occurred because 8192 was being used instead of 32000.
2026-01-16 07:05:57 +05:30
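A minimal Rust sketch of the resolution order this commit describes: an explicit config value wins, otherwise the provider's own default applies, and the generic fallback is used only for providers that have none. The `Provider` enum (including its `Other` variant), the `FALLBACK_DEFAULT_MAX_TOKENS` constant name, the `provider_default_max_tokens()` helper and the `resolve_max_tokens()` signature are assumptions for illustration; only the function name and the numbers come from the commit.

```rust
/// Hypothetical provider enum; variants mirror the providers named in the commit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    OpenAi,
    Databricks,
    Embedded,
    Other,
}

/// Generic fallback, used only when a provider has no specific default.
pub const FALLBACK_DEFAULT_MAX_TOKENS: u32 = 32_000;

/// Provider-specific defaults taken from the commit message.
pub fn provider_default_max_tokens(provider: Provider) -> u32 {
    match provider {
        Provider::Anthropic | Provider::OpenAi | Provider::Databricks => 32_000,
        Provider::Embedded => 2_048,
        Provider::Other => FALLBACK_DEFAULT_MAX_TOKENS,
    }
}

/// An explicit config value wins; otherwise ask the provider table instead of
/// jumping straight to the fallback (the aliasing bug being fixed).
pub fn resolve_max_tokens(configured: Option<u32>, provider: Provider) -> u32 {
    configured.unwrap_or_else(|| provider_default_max_tokens(provider))
}
```

Routing the `None` case through the provider table, rather than directly to the fallback constant, is what keeps a stale generic value (8192 in the original bug) from shadowing the per-provider defaults.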
Dhanji R. Prasanna
65e0217c68 Add unit tests for studio session management
New tests:
- test_new_session_has_short_id
- test_new_interactive_session
- test_branch_name_format
- test_session_save_and_load
- test_session_mark_complete
- test_session_mark_paused
- test_list_empty_sessions
- test_backwards_compatibility_no_session_type

Added tempfile as dev dependency for temp directory tests.
2026-01-16 06:52:23 +05:30
Dhanji R. Prasanna
78f9207d27 Add interactive mode to studio
New commands:
- studio cli (alias: c) - Start a new interactive g3 session in an isolated worktree
- studio resume <id> (alias: r) - Resume a paused interactive session
- Bare 'studio' now defaults to 'studio cli'

Session changes:
- Added SessionStatus::Paused for sessions that can be resumed
- Added SessionType enum (OneShot, Interactive) for future use
- Interactive sessions use inherited stdio for direct TTY access
- Sessions are marked as Paused when user exits g3

Workflow:
1. studio        # creates worktree, runs g3 interactively
2. (work in g3, exit when done)
3. studio resume <id>  # continue working
4. studio accept <id>  # merge to main when finished
2026-01-16 06:48:24 +05:30
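The command surface above can be sketched as a small hand-rolled parser. studio's real CLI is built on clap (see 25d35529e7), so this only illustrates the `c`/`r` aliases and the bare-`studio` default; `StudioCommand` and `parse_studio_args()` are hypothetical names.

```rust
/// Hypothetical command enum covering the commands in the workflow above.
#[derive(Debug, PartialEq, Eq)]
pub enum StudioCommand {
    Cli,
    Resume(String),
    Accept(String),
}

/// Parse the arguments that follow `studio`.
pub fn parse_studio_args(args: &[&str]) -> Result<StudioCommand, String> {
    match args {
        // Bare `studio` behaves like `studio cli`.
        [] => Ok(StudioCommand::Cli),
        ["cli" | "c"] => Ok(StudioCommand::Cli),
        ["resume" | "r", id] => Ok(StudioCommand::Resume(id.to_string())),
        ["accept", id] => Ok(StudioCommand::Accept(id.to_string())),
        other => Err(format!("unrecognized arguments: {other:?}")),
    }
}
```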
Dhanji R. Prasanna
637884f84b Fix duplicate todo_read display in agent mode
The print_todo_compact() function was missing the call to clear the
streaming hint line before printing the final tool output. This caused
the tool name to appear twice when the hint line wasn't cleared:

  ● todo_read     ● todo_read   | empty

Added the missing handle_hint(ToolParsingHint::Complete) call to match
the behavior of print_tool_compact().
2026-01-16 06:38:11 +05:30
Dhanji R. Prasanna
25d35529e7 Fix --accept flag being passed through to g3 in studio run
When --accept was passed after positional args (e.g., 'studio run --agent
carmack task --accept'), clap's trailing_var_arg captured it as part of
g3_args instead of parsing it as the studio flag. This caused g3 to error
with 'unexpected argument --accept'.

- Extract filter_accept_flag() helper to detect and remove --accept from
  trailing args
- Set auto_accept=true if --accept found in either position
- Add 5 unit tests for the filtering logic
2026-01-15 21:05:13 +05:30
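A sketch of the filtering step, assuming `filter_accept_flag()` takes the trailing args by value and returns them together with a found flag (the real signature is not shown in the commit). Every `--accept` is stripped so none reaches g3.

```rust
/// Remove every `--accept` from the args captured by clap's trailing_var_arg
/// and report whether at least one was present.
pub fn filter_accept_flag(mut g3_args: Vec<String>) -> (Vec<String>, bool) {
    let before = g3_args.len();
    g3_args.retain(|arg| arg != "--accept");
    let found = g3_args.len() != before;
    (g3_args, found)
}
```

The caller would then set `auto_accept = accept_flag || found`, which covers both positions the commit mentions.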
Dhanji R. Prasanna
a84fead03b refactor: improve readability of streaming parser and JSON filter
Agent: carmack

Changes:
- streaming_parser.rs: Unified find_first/last_tool_call_start into single
  find_tool_call_start with SearchDirection enum, reducing duplication.
  Simplified is_json_invalidated from 45 to 20 lines with clearer logic.
  Fixed redundant !escape_next check in find_complete_json_object_end.

- filter_json.rs: Simplified check_tool_pattern from 40 to 24 lines.
  Replaced repetitive prefix checks with loop over ["t", "to", "too", "tool"].
  Reduced trailing return statements with direct expression returns.

- ui_writer_impl.rs: Added ansi module for duration color constants.
  Simplified duration_color function by removing redundant comments.

- language_prompts.rs: Fixed test assertions to match actual prompt content
  ("obvious, readable Racket" instead of "RACKET-SPECIFIC GUIDANCE").

All 174+ tests pass. No behavior changes.
2026-01-15 13:49:29 +05:30
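The unified search can be sketched by combining `SearchDirection` with the line-boundary rule introduced in 999ac6fe66 (`is_on_own_line()`). The single `{"tool"` marker and both signatures are assumptions; the real parser may recognize more patterns and take different arguments.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SearchDirection {
    First,
    Last,
}

/// Assumed tool-call marker, for illustration only.
const TOOL_CALL_PATTERN: &str = r#"{"tool""#;

/// True if `pos` is at the start of the buffer, or preceded on its line
/// only by whitespace.
pub fn is_on_own_line(buffer: &str, pos: usize) -> bool {
    let line_start = buffer[..pos].rfind('\n').map_or(0, |i| i + 1);
    buffer[line_start..pos].chars().all(char::is_whitespace)
}

/// Find the first or last tool-call pattern that sits on its own line,
/// skipping inline mentions in prose.
pub fn find_tool_call_start(buffer: &str, direction: SearchDirection) -> Option<usize> {
    let mut candidates = buffer
        .match_indices(TOOL_CALL_PATTERN)
        .map(|(pos, _)| pos)
        .filter(|&pos| is_on_own_line(buffer, pos));
    match direction {
        SearchDirection::First => candidates.next(),
        SearchDirection::Last => candidates.last(),
    }
}
```

One function with a direction parameter keeps the line-boundary filter in a single place, so the first- and last-match searches cannot drift apart.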
Dhanji R. Prasanna
0ae1a13cdb feat: real-time tool call streaming indicator with blinking UI
- Add ToolParsingHint enum (Detected/Active/Complete) for UI feedback
- New UiWriter methods: print_tool_streaming_hint(), print_tool_streaming_active()
- Refactor ConsoleUiWriter state to use atomics in ParsingHintState
- Add tool_call_streaming field to CompletionChunk for provider hints
- Anthropic provider sends streaming hints when tool name detected
- New streaming helpers: make_tool_streaming_hint(), make_tool_streaming_active()

Parser improvements:
- Add is_json_invalidated() to detect false positive tool patterns
- Fix tool result poisoning when file contents contain partial JSON
- Unescaped newlines in strings or prose after JSON invalidates detection

User sees ' ● tool_name |' immediately when tool call starts streaming,
with blinking indicator while args are received.
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
d68f059acf fix: detect invalidated JSON tool calls to prevent parser poisoning
When partial JSON tool call patterns appear in LLM output (e.g., from
quoting file content), the parser would incorrectly report them as
"incomplete tool calls", triggering auto-continue loops.

Fix: Added is_json_invalidated() to detect when partial JSON has been
invalidated by subsequent content that cannot be valid JSON:
- Unescaped newline inside a string (invalid JSON)
- Newline followed by prose text outside a string

The check is only applied to incomplete JSON - complete tool calls
with trailing text are still correctly detected.

Added 6 new tests covering:
- Tool results with partial JSON patterns
- LLM quoting file content inline vs on own line
- Comment prefixes (// # -- etc) with partial patterns
- Real incomplete tool calls (should still be detected)
2026-01-15 13:49:29 +05:30
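A sketch of the two invalidation rules, under stated assumptions: the input is the text from the start of a partial JSON object, and a following line counts as prose if it begins with a letter but not with a JSON literal (`true`, `false`, `null`). That prose heuristic and the `starts_with_prose()` helper are guesses, not the parser's actual rule.

```rust
/// Returns true once the partial JSON can no longer become valid JSON.
pub fn is_json_invalidated(partial: &str) -> bool {
    let mut in_string = false;
    let mut escape_next = false;
    for (i, ch) in partial.char_indices() {
        if in_string {
            if escape_next {
                escape_next = false;
            } else if ch == '\\' {
                escape_next = true;
            } else if ch == '"' {
                in_string = false;
            } else if ch == '\n' {
                // Rule 1: raw newlines are not allowed inside JSON strings.
                return true;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '\n' => {
                // Rule 2: a newline outside a string followed by prose.
                // '\n' is one byte, so i + 1 is a valid char boundary.
                if starts_with_prose(partial[i + 1..].trim_start()) {
                    return true;
                }
            }
            _ => {}
        }
    }
    false
}

/// Hypothetical heuristic: a line starting with a letter, other than a
/// JSON literal, is treated as prose.
fn starts_with_prose(line: &str) -> bool {
    let starts_alpha = matches!(line.chars().next(), Some(c) if c.is_alphabetic());
    starts_alpha && !["true", "false", "null"].iter().any(|lit| line.starts_with(*lit))
}
```

As the commit notes, this check only makes sense for incomplete JSON; a complete tool call followed by text is handled before it is reached.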
Dhanji R. Prasanna
999ac6fe66 fix: prevent parser poisoning from inline tool-call JSON patterns
The streaming parser was incorrectly detecting tool call patterns that
appeared inline in prose (e.g., when explaining the format), causing
g3 to return control mid-task.

Fix: Modified find_first_tool_call_start() and find_last_tool_call_start()
to only recognize patterns that appear on their own line (at start of
buffer or after newline with only whitespace before the pattern).

Changes:
- Added is_on_own_line() helper to check line-boundary conditions
- Updated detection methods to skip inline patterns
- Removed sanitize_inline_tool_patterns() and LBRACE_HOMOGLYPH (no longer needed)
- Rewrote tests for new behavior
- Added streaming_repro tests that use process_chunk() to verify the exact bug scenario

28 tests covering: streaming repro, line boundaries, Unicode, code contexts, edge cases
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
616e0898c7 Add performance deep cuts and parameterize guidance
Performance:
- Beware list-ref in a loop (O(n²) trap)
- Consolidated performance section with data structure selection rationale
- for/fold for single-pass result building

Parameters and dynamic scope:
- Good uses: ports, logging, config, test fixtures
- Bad uses: hidden global state, implicit argument passing
- Document when functions read from parameters

Also simplified Continuations section (parameterize now has its own section).
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
52cd19a015 Refine carmack.racket.md with deeper Racket idioms
Major improvements:
- Iteration idioms: for/fold example, for*/list, in-naturals for indices
- Data structure mutability: when to use mutable hash/vector/box
- let/let*/define style: use let* when order matters
- Contracts section: when to use define/contract, ->i, boundary focus
- Naming: -ref/-set/-update suffixes for custom types
- Size heuristics: semantic ('one abstraction per module') not numeric
- Module hygiene: explicit provides only, contract-out when correctness matters

Removed:
- Packages/tooling section (covered in base racket.md injection)

Now 119 lines of actionable, non-obvious Racket guidance.
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
e222b9affc Add non-obvious Racket style guide recommendations
From docs.racket-lang.org/style, added only the non-obvious tips:
- Prefer define over let/let* (reduces indentation)
- Put provide before require (interface at top)
- Use racket/base for libraries (faster loading)
- Naming: prefix functions with data type (board-free-spaces)
- Use in-list/in-vector explicitly in for loops (performance)
- Use module+ test submodules with raco test
- Size limits: ~500 lines/module, ~66 lines/function

Skipped basic conventions LLMs already know (predicate suffixes, etc.).
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
5ad9fb3718 Improve carmack.racket.md with code examples and Racket-specific guidance
Changes:
- Add concrete code examples for match/cond and contract-out
- Add Phase separation section (for-syntax vs runtime)
- Add Continuations section (call/ec over call/cc, parameterize)
- Add Concurrency section (places, threads, channels, sync)
- Add Gotchas section (eq?/equal?/eqv?, null?/empty?, string=?)
- Tighten Packages/tooling (raco pkg install --auto, info.rkt)

Removed generic advice:
- 'Don't swallow exceptions' (obvious)
- 'Add docstrings/comments' (obvious)
- 'Include runnable examples' (obvious)
- 'Optimize the bottleneck only' (obvious)
- Entire 'Output expectations' section (meta, not Racket-specific)
- Oddly specific 'file/sha1, file-watch' reference
2026-01-15 13:49:29 +05:30
Dhanji R. Prasanna
65807eea99 Add carmack.rust.md agent-specific language prompt
Rust-specific readability guidance for the carmack agent including:
- let...else example for shallow control flow
- Async: don't block the runtime (tokio::fs, spawn_blocking, Send)
- Visibility: prefer pub(crate), private fields with accessors
- Generics: impl Trait over explicit params, avoid complex where clauses
- Improved iterator guidance: if you need a comment, use a loop
- UTF-8 string slicing warnings
- Ownership/lifetime pragmatism
- Anti-patterns: no macros/typestate/proc-macros unless already in repo

Also adds Rust detection to LANGUAGE_PROMPTS (empty base prompt,
agent-specific prompts handle the guidance).
2026-01-15 13:49:29 +05:30
Jochen
6d1aa62ba7 Merge pull request #63 from cjustice/fix/tracing-subscriber-panic
Fix tracing subscriber panic in scout agent
2026-01-15 12:54:31 +11:00
Jochen
0bca05a1ba Merge pull request #62 from cjustice/fix/planning-verbose-flag
Fix: Initialize logging before planning mode check
2026-01-15 12:51:11 +11:00
Dhanji R. Prasanna
85ea8fe69c Update project memory with agent-specific language prompts
Document the new agent+language prompt injection feature including:
- AGENT_LANGUAGE_PROMPTS static array location
- get_agent_language_prompt() and get_agent_language_prompts_for_workspace_with_langs()
- File naming pattern: prompts/langs/<agent>.<lang>.md
- Instructions for adding new agent+lang prompts
2026-01-15 06:43:42 +05:30
Dhanji R. Prasanna
04e3c69b0a Add --accept flag to studio run command
Automatically accept the session after g3 completes successfully,
but only if there are commits on the branch.

Changes:
- Add --accept flag to Run command (stripped, not passed to g3)
- Add has_commits_on_branch() helper using git rev-list --count
- Auto-accept triggers merge to main and cleanup when:
  1. g3 exits successfully (exit code 0)
  2. Branch has commits ahead of main
- Show warning if --accept set but no commits exist

Usage: studio run --agent carmack --accept
2026-01-15 06:43:35 +05:30
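A sketch of the auto-accept gate. `has_commits_on_branch()` shells out to `git rev-list --count <base>..<branch>` as the commit says; the `repo`/`base`/`branch` parameters and the `parse_commit_count()` and `should_auto_accept()` helpers are hypothetical, split out so the decision logic can be tested without a repository.

```rust
use std::path::Path;
use std::process::Command;

/// True if `branch` has commits that `base` lacks, using
/// `git rev-list --count base..branch`.
pub fn has_commits_on_branch(repo: &Path, base: &str, branch: &str) -> std::io::Result<bool> {
    let range = format!("{base}..{branch}");
    let output = Command::new("git")
        .arg("-C")
        .arg(repo)
        .args(["rev-list", "--count", range.as_str()])
        .output()?;
    if !output.status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(parse_commit_count(&String::from_utf8_lossy(&output.stdout)) > 0)
}

/// `git rev-list --count` prints a single integer followed by a newline.
pub fn parse_commit_count(stdout: &str) -> u64 {
    stdout.trim().parse().unwrap_or(0)
}

/// Merge only when --accept was given, g3 exited 0, and there is something to merge.
pub fn should_auto_accept(accept_flag: bool, g3_succeeded: bool, has_commits: bool) -> bool {
    accept_flag && g3_succeeded && has_commits
}
```

When `accept_flag` is set but `has_commits` is false, the caller would print the warning instead of merging.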
Dhanji R. Prasanna
5d8dbc43f8 Add agent-specific language prompt injection
When running in agent mode (e.g., --agent carmack) in a workspace with
detected languages, inject agent+language-specific prompts from
prompts/langs/<agent>.<lang>.md at the end of the system prompt.

Changes:
- Add AGENT_LANGUAGE_PROMPTS static array for compile-time embedding
- Add get_agent_language_prompt() to look up specific agent+lang combos
- Add get_agent_language_prompts_for_workspace_with_langs() that returns
  both content and matched languages for display
- Update agent_mode.rs to inject prompts and show which languages loaded
- Display format: '✓ carmack: racket language guidance'
- Add tests for new functionality

Uses the same detect_languages() mechanism as regular language prompts
to avoid code-path aliasing.
2026-01-15 06:43:29 +05:30
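The lookup can be sketched as a static table keyed by `<agent>.<lang>`. In g3 this table is `AGENT_LANGUAGE_PROMPTS`, filled at compile time (presumably via `include_str!`); the inline placeholder strings, the tuple layout, and `agent_prompts_for_langs()` (a simplified stand-in for `get_agent_language_prompts_for_workspace_with_langs()`) are assumptions.

```rust
/// (file stem, contents); real contents would come from prompts/langs/<agent>.<lang>.md.
static AGENT_LANGUAGE_PROMPTS: &[(&str, &str)] = &[
    ("carmack.racket", "Racket guidance placeholder"),
    ("carmack.rust", "Rust guidance placeholder"),
];

/// Look up the prompt for one agent+language combination.
pub fn get_agent_language_prompt(agent: &str, lang: &str) -> Option<&'static str> {
    let key = format!("{agent}.{lang}");
    AGENT_LANGUAGE_PROMPTS
        .iter()
        .find(|(name, _)| *name == key)
        .map(|(_, content)| *content)
}

/// Concatenate prompts for every detected language and report which matched,
/// so the caller can print e.g. "✓ carmack: racket language guidance".
pub fn agent_prompts_for_langs(agent: &str, langs: &[&str]) -> (String, Vec<String>) {
    let mut content = String::new();
    let mut matched = Vec::new();
    for &lang in langs {
        if let Some(prompt) = get_agent_language_prompt(agent, lang) {
            content.push_str(prompt);
            content.push('\n');
            matched.push(lang.to_string());
        }
    }
    (content, matched)
}
```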
Dhanji R. Prasanna
eefc067aae Add carmack.racket.md agent-specific language prompt
Racket-specific guidance for the carmack agent including:
- Idiomatic Racket patterns (match, for/*, cond)
- Module organization with explicit provide lists
- Contracts and type boundaries
- Data modeling with structs
- Error handling best practices
- IO, paths, and portability
- Performance considerations
- Macro guidelines
- Testing with rackunit
2026-01-15 06:43:20 +05:30
Connor Justice
fa29a64e51 Simplify logging initialization comment
Removed unnecessary comment about logging initialization.
2026-01-14 17:53:04 -05:00
Connor Justice
505225c0bd fix: prevent panic when tracing subscriber already initialized
Use try_init() instead of init() for tracing subscriber setup to
gracefully handle cases where a global subscriber is already set.

This fixes a panic in the scout agent subprocess when spawned by the
research tool, where a dependency may have already initialized tracing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 15:33:22 -05:00
Connor Justice
6532442d32 fix: initialize logging before planning mode check
Move initialize_logging() call to run immediately after CLI parsing,
before any mode checks. This ensures the --verbose flag works correctly
in planning mode, which previously bypassed logging initialization.

Previously, planning mode would return early before initialize_logging()
was called, causing verbose output to be silently ignored.
2026-01-14 14:33:44 -05:00
Dhanji R. Prasanna
afec65fd50 Add language-specific prompt injection for toolchain guidance
- Add language_prompts module that auto-detects programming languages in workspace
- Scan for language files with depth limit (2) to inject relevant toolchain prompts
- Add prompts/langs/ directory for language-specific markdown files
- Include Racket/raco toolchain guidance as first language prompt
- Update combine_project_content() to accept language_content parameter
- Integrate language detection into main CLI flow and agent mode
- Update project memory with new feature documentation
2026-01-14 21:00:52 +05:30
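A sketch of depth-limited detection using only `std::fs`. The extension map, skipping hidden directories such as `.git`, and counting depth as directory levels below the workspace root are assumptions; only the limit of 2 comes from the commit.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::path::Path;

/// Hypothetical extension map; g3's real table may differ.
fn language_for_extension(ext: &str) -> Option<&'static str> {
    match ext {
        "rkt" | "rktl" | "rktd" => Some("racket"),
        "rs" => Some("rust"),
        "py" => Some("python"),
        _ => None,
    }
}

/// Collect languages from files at most `max_depth` directory levels below `dir`.
pub fn detect_languages(dir: &Path, max_depth: usize) -> BTreeSet<&'static str> {
    let mut found = BTreeSet::new();
    scan(dir, 0, max_depth, &mut found);
    found
}

fn scan(dir: &Path, depth: usize, max_depth: usize, found: &mut BTreeSet<&'static str>) {
    let Ok(entries) = fs::read_dir(dir) else { return };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            // Assumption: hidden directories (.git, .worktrees) are not scanned.
            let hidden = path
                .file_name()
                .and_then(|n| n.to_str())
                .map_or(false, |n| n.starts_with('.'));
            if depth < max_depth && !hidden {
                scan(&path, depth + 1, max_depth, found);
            }
        } else if let Some(lang) = path
            .extension()
            .and_then(|e| e.to_str())
            .and_then(language_for_extension)
        {
            found.insert(lang);
        }
    }
}
```

The depth limit keeps startup cheap in large repositories at the cost of missing languages that only appear deep in the tree.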
Dhanji R. Prasanna
716d598bd8 remove openai specific config example 2026-01-14 20:24:53 +05:30
Dhanji R. Prasanna
affa878992 Add minimal OpenAI example config 2026-01-14 20:21:38 +05:30
Dhanji R. Prasanna
f4562cd4c9 config: default agent settings and provider override 2026-01-14 20:14:33 +05:30
Dhanji R. Prasanna
38828c7757 Clean up tool output formatting
- Shell: "Command executed successfully" → "ran successfully"
- Write file: Remove ✏️ emoji, use plain "wrote N lines | M chars"
2026-01-14 19:42:54 +05:30
Dhanji R. Prasanna
9ef064a041 Add guidance to shell tool description to avoid unnecessary cd prefixes
LLMs were prefixing shell commands with `cd <workspace> &&` unnecessarily,
wasting tokens and cluttering CLI display. Added clear guidance in the
shell tool description that commands already execute in the working directory.
2026-01-14 19:00:53 +05:30
Dhanji R. Prasanna
03143ec7f8 Agent Mode Enhancements
• Agent prompts are now embedded within the g3 binary
• README.md - Added new "Agent Mode" section documenting:
  • All 7 built-in agents with their focus areas
  • Usage examples (--list-agents, --agent <name>)
  • How to create custom workspace agents

Behavior
1. Workspace agents take priority - If agents/<name>.md exists in the workspace, it's used
2. Embedded fallback - If no workspace agent exists, the embedded version is used
3. Portability - g3 binary now works on any repo without needing the agents/ directory
4. Discoverability - g3 --list-agents shows all available agents and their source
2026-01-14 16:27:03 +05:30
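The lookup order above can be sketched as: try `agents/<name>.md` in the workspace, then fall back to a copy compiled into the binary. `EMBEDDED_AGENTS`, `AgentSource` and `load_agent_prompt()` are hypothetical, with inline strings standing in for the embedded prompts; the real embedding mechanism is not shown in the commit.

```rust
use std::fs;
use std::path::Path;

/// Where a prompt came from, for --list-agents style reporting.
#[derive(Debug, PartialEq, Eq)]
pub enum AgentSource {
    Workspace,
    Embedded,
}

/// Stand-in for agent prompts compiled into the binary.
static EMBEDDED_AGENTS: &[(&str, &str)] = &[
    ("carmack", "Embedded carmack prompt"),
    ("fowler", "Embedded fowler prompt"),
];

pub fn load_agent_prompt(workspace: &Path, name: &str) -> Option<(String, AgentSource)> {
    // 1. A workspace file agents/<name>.md takes priority.
    let path = workspace.join("agents").join(format!("{name}.md"));
    if let Ok(contents) = fs::read_to_string(&path) {
        return Some((contents, AgentSource::Workspace));
    }
    // 2. Otherwise use the embedded copy, so no agents/ directory is required.
    EMBEDDED_AGENTS
        .iter()
        .find(|(agent, _)| *agent == name)
        .map(|(_, prompt)| (prompt.to_string(), AgentSource::Embedded))
}
```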
Dhanji R. Prasanna
5104bd53b6 refactor(g3-core): improve stream_completion_with_tools readability
Extract and simplify the streaming completion function:

- Extract ensure_context_capacity() helper for pre-loop context management
  (thinning + compaction logic now in dedicated async method)
- Simplify compact_summary generation block: flatten nested if/match,
  remove redundant comments, reorder branches for clarity
- Remove dead code: unused _last_error variable and modified_tool_call
- Streamline duplicate detection block: reduce verbose logging
- Clean up text content display block: remove redundant comments,
  tighten variable declarations
- Remove redundant is_todo_tool redefinition inside block expression

Net reduction: 79 lines (-187/+108)
Behavior unchanged, all unit tests pass.

Agent: carmack
2026-01-14 15:11:53 +05:30
Dhanji R. Prasanna
996dc357b4 Skip session resume prompt when --new-session flag is passed
When users explicitly pass --new-session, they want a fresh session.
Previously g3 would still prompt to resume an existing session.
Now the resume check is skipped entirely when the flag is set.
2026-01-14 08:54:35 +05:30
Dhanji R. Prasanna
dea0e6b1ca Compact tool output improvements
- Rename take_screenshot -> screenshot, code_coverage -> coverage (shorter names)
- Align | character across all compact tools (pad to 11 chars for str_replace)
- Make code_search a compact tool with summary display
- Show language and search name in code_search output (e.g., rust:"find structs")
- Add format_code_search_summary() to extract match/file counts from JSON response
2026-01-14 08:12:50 +05:30
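Padding each tool name to the width of the longest one (`str_replace`, 11 characters) is enough to line up the `|` separators. `format_compact_line()` and `TOOL_NAME_WIDTH` are hypothetical; the `●` prefix and spacing mirror the sample output shown in 637884f84b.

```rust
/// Width of the longest compact tool name ("str_replace" is 11 chars).
const TOOL_NAME_WIDTH: usize = 11;

/// Left-align the tool name in a fixed-width column before the separator.
pub fn format_compact_line(tool: &str, summary: &str) -> String {
    format!("● {:<width$} | {}", tool, summary, width = TOOL_NAME_WIDTH)
}
```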
Dhanji R. Prasanna
bd25d7dace Merge sessions/fowler/786b20b5 2026-01-14 04:28:06 +05:30
Dhanji R. Prasanna
7d17b436f9 refactor(g3-core): remove 3 unused Agent constructor variants
Remove dead code - constructor variants that had no callers:
- new_with_readme()
- new_autonomous_with_readme()
- new_with_quiet()

These were thin wrappers around new_with_mode_and_readme() that were
never used externally. All 5 remaining constructors have verified callers.

Results:
- lib.rs reduced from 2817 to 2797 lines (-20 lines)
- Eliminated code-path aliasing: 8 constructors → 5 constructors
- All g3-core tests pass
- Full workspace compiles cleanly

Agent: fowler
2026-01-14 04:26:42 +05:30
Dhanji R. Prasanna
21eb4f2d30 Only show Chrome diagnostics when there are issues
Silence the diagnostic report when all checks pass to reduce noise.
2026-01-14 04:25:13 +05:30
Dhanji R. Prasanna
a1dfd9c0b6 Enhanced auto-memory with rich few-shot format
- Updated memory reminder prompt with per-symbol char ranges
- Added two few-shot examples: Session Continuation (feature) + UTF-8 Safe Slicing (pattern)
- Updated system prompt Memory Format section to match
- Format: file -> nested symbols with [start..end] ranges and descriptions
- Enables direct read_file navigation to specific functions
2026-01-13 21:49:48 +05:30
Dhanji R. Prasanna
3a47ebe668 better racket example support 2026-01-13 21:16:14 +05:30
Dhanji R. Prasanna
c2f96d7048 Make WebDriver and Chrome headless enabled by default
- webdriver flag now defaults to true (tools always available)
- chrome_headless flag now defaults to true (Chrome is default browser)
- Use --safari flag to override and use Safari instead
- Updated README documentation to reflect new defaults
2026-01-13 21:14:52 +05:30
Dhanji R. Prasanna
151b8c4658 Add Racket tree-sitter support, remove Kotlin
- Add tree-sitter-racket dependency (v0.24)
- Initialize Racket parser in code search
- Add .rkt, .rktl, .rktd file extensions
- Add test_racket_search test
- Remove Kotlin from supported languages (was disabled)
- Clean up duplicate test files

Supported languages: Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Racket
2026-01-13 18:44:59 +05:30
Dhanji R. Prasanna
5e45e110e2 refactor(g3-core): extract finalize_streaming_turn() to unify return paths
Extract a single canonical helper function for completing streaming turns,
eliminating 3 nearly-identical return paths in stream_completion_with_tools().

Changes:
- Add finalize_streaming_turn() helper that handles:
  - Finishing streaming markdown
  - Saving context window
  - Adding timing footer (when requested)
  - Dehydrating context (when ACD enabled)
  - Building TaskResult
- Replace 3 duplicated return blocks with calls to the helper
- Remove unused mut on full_response variable

Results:
- Function reduced from 1067 to 999 lines (-68 lines)
- Eliminated code-path aliasing: 3 paths → 1 canonical path
- All 32 characterization tests pass
- Full g3-core test suite passes

Agent: fowler
2026-01-13 16:52:48 +05:30
Dhanji R. Prasanna
333a85ed1e Merge sessions/hopper/e2a0ad02 2026-01-13 16:27:17 +05:30
Dhanji R. Prasanna
b89d55a9ff Add characterization tests for stream_completion_with_tools
Add 32 blackbox characterization tests to lock down the behavior of the
stream_completion_with_tools function (1067 lines) before refactoring.

Tests cover key behaviors through stable boundaries:
- StreamingToolParser: tool call detection, incomplete detection, text accumulation
- Auto-continue logic: autonomous mode decisions, priority ordering
- Duplicate detection: sequential duplicates, cross-message duplicates
- Context window: token tracking, compaction threshold, history preservation
- Tool execution: read_file, shell, write_file, todo tools through Agent
- Streaming utilities: LLM token cleaning, duration formatting, truncation
- Parser sanitization: inline tool pattern handling, homoglyph replacement

These tests intentionally do NOT assert:
- Internal parser state or implementation details
- Specific timing values
- UI output formatting
- Provider-specific behavior

Agent: hopper
2026-01-13 16:25:33 +05:30
Dhanji R. Prasanna
bd756307f1 fowler doesn't need to explicitly read README/AGENTS 2026-01-13 16:16:27 +05:30
Dhanji R. Prasanna
47e3a88cf6 refactor(g3-core): extract stats formatting to dedicated module
Extract the get_stats() function (158 lines) from lib.rs to a new stats.rs module.

Changes:
- Create stats.rs with AgentStatsSnapshot struct for capturing agent state
- Replace inline formatting logic with delegation to snapshot.format()
- Add unit tests for stats formatting (empty and populated states)
- Reduce lib.rs from 2961 to 2818 lines (-143 lines)

The new module improves:
- Testability: Stats formatting can now be unit tested in isolation
- Separation of concerns: Formatting logic is decoupled from Agent struct
- Readability: lib.rs is more focused on core agent behavior

All 271 workspace tests pass.

Agent: fowler
2026-01-13 16:11:53 +05:30
Dhanji R. Prasanna
562c4199f8 docs: Add Studio documentation and UTF-8 safety invariants
README.md:
- Add Studio section documenting the multi-agent workspace manager
- Document usage: run, list, status, accept, discard commands
- Explain worktree-based isolation and workflow

AGENTS.md:
- Add UTF-8 safe string slicing as critical invariant (#8)
- Add MUST NOT for byte-index slicing on multi-byte text (#5)
- Document parser sanitization as dangerous/subtle code path
  (prevents parser poisoning from inline tool-call JSON patterns)

Agent: lamport
2026-01-13 15:31:01 +05:30
Dhanji R. Prasanna
9a3b03a41f Remove flock mode (superseded by studio)
Flock mode has been superseded by the studio multi-agent workspace manager.

Changes:
- Remove g3-ensembles crate entirely
- Remove --project, --flock-workspace, --segments, --flock-max-turns CLI flags
- Remove run_flock_mode() from autonomous.rs
- Remove flock-related tests from cli_integration_test.rs
- Update README.md, docs/architecture.md, analysis/memory.md
- Delete docs/FLOCK_MODE.md
2026-01-13 15:01:12 +05:30
Dhanji R. Prasanna
82c0165765 Fix unused variable warning and UTF-8 panic in string slicing
- Remove unused total_lines variable in file_ops.rs
- Fix UTF-8 boundary panic in utils.rs when generating diff error preview
  The code was slicing at byte index 200 which could land inside a
  multi-byte character (e.g., box-drawing chars like ─). Now uses
  character-based slicing with chars().take() instead.
2026-01-13 14:52:52 +05:30
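The fix pattern can be sketched in two forms: the `chars().take()` approach the commit describes, and a borrowing variant that slices at a `char_indices()` boundary. Both function names are hypothetical; the 200-character limit comes from the commit.

```rust
/// Take at most `max_chars` characters; never splits a multi-byte character.
pub fn preview(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

/// Borrowing variant: find the byte offset of the `max_chars`-th character
/// and slice there, which is always a valid char boundary.
pub fn preview_slice(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text,
    }
}
```

`&text[..200]` would panic on a string of `─` characters, since each is 3 bytes and byte 200 falls inside one; both versions above count characters instead of bytes.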
Dhanji R. Prasanna
c65d082c5d Make --agent optional in Studio for one-shot mode
Studio can now run g3 without specifying an agent:

  # Agent mode (existing)
  studio run --agent carmack "fix the bug"

  # One-shot mode (new)
  studio run "fix the bug"

When no agent is specified, sessions are created under the 'single'
directory in .worktrees/sessions/single/<session-id>/

This makes Studio a complete replacement for Flock mode.
2026-01-13 14:42:20 +05:30
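A sketch of the session directory layout. The `single` fallback comes from the commit; placing agent sessions under `.worktrees/sessions/<agent>/<session-id>/` is an assumption, as is the `session_worktree_dir()` name.

```rust
use std::path::{Path, PathBuf};

/// Worktree directory for a session: keyed by agent name, or "single"
/// for one-shot runs without --agent.
pub fn session_worktree_dir(repo_root: &Path, agent: Option<&str>, session_id: &str) -> PathBuf {
    repo_root
        .join(".worktrees")
        .join("sessions")
        .join(agent.unwrap_or("single"))
        .join(session_id)
}
```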
Dhanji R. Prasanna
f6b84d864a Rename G3 -> g3 in docs and comments
Standardize project name to lowercase 'g3' throughout documentation,
comments, and configuration files. Environment variables (G3_*) are
unchanged as they follow the uppercase convention.
2026-01-13 14:36:33 +05:30