diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..7779dfa
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,205 @@
+# AGENTS.md - Machine Instructions for G3
+
+**Last updated**: January 2025
+**Purpose**: Enable AI agents to work safely and effectively with this codebase
+
+## System Overview
+
+G3 is an AI coding agent built in Rust. It uses LLM providers to execute tasks through a tool-based interface. The codebase is organized as a Cargo workspace with 9 crates.
+
+### Quick Reference
+
+| Crate | Purpose | Stability |
+|-------|---------|----------|
+| `g3-core` | Agent engine, tools, context management | Stable |
+| `g3-providers` | LLM provider abstractions | Stable |
+| `g3-cli` | Command-line interface | Stable |
+| `g3-config` | Configuration management | Stable |
+| `g3-execution` | Code execution | Stable |
+| `g3-computer-control` | Computer automation | Experimental |
+| `g3-planner` | Planning mode | Stable |
+| `g3-ensembles` | Multi-agent (flock) mode | Experimental |
+| `g3-console` | Web monitoring console | Experimental |
+
+## Critical Invariants
+
+### MUST Hold
+
+1. **Tool calls must be valid JSON** - The streaming parser expects well-formed tool calls
+2. **Context window limits must be respected** - Exceeding limits causes API errors
+3. **Provider trait implementations must be Send + Sync** - Required by the async runtime
+4. **Session IDs must be unique** - Used for log file paths and TODO scoping
+5. **File paths in tools support tilde expansion** - `~` expands to the home directory
+
+### MUST NOT Do
+
+1. **Never block the async runtime** - Use `tokio::task::spawn_blocking` for CPU-intensive or blocking work
+2. **Never store secrets in logs** - API keys are redacted in error logs
+3. **Never modify files outside the working directory without explicit permission**
+4. **Never assume tool results fit in context** - Large results are thinned automatically
+
+## Recommended Entry Points
+
+### For Understanding the System
+
+1. `src/main.rs` - Entry point (trivial)
+2. `crates/g3-cli/src/lib.rs` - CLI logic and execution modes
+3. `crates/g3-core/src/lib.rs` - Agent struct and orchestration
+4. `crates/g3-providers/src/lib.rs` - Provider trait definition
+
+### For Adding Features
+
+1. **New tool**: `crates/g3-core/src/tool_definitions.rs` → `crates/g3-core/src/tools/`
+2. **New provider**: `crates/g3-providers/src/` → implement `LLMProvider` trait
+3. **New CLI mode**: `crates/g3-cli/src/lib.rs`
+4. **New config option**: `crates/g3-config/src/lib.rs`
+
+### For Debugging
+
+1. Session logs: `.g3/sessions//session.json`
+2. Error logs: `logs/errors/`
+3. Context state: Use the `/stats` command in interactive mode
+
+## Dangerous/Subtle Code Paths
+
+### Context Window Management (`g3-core/src/context_window.rs`)
+
+- **Thinning**: Automatically replaces large tool results with file references
+- **Summarization**: Compresses conversation history at 80% capacity
+- **Token estimation**: Uses character-based heuristics, not exact tokenization
+- **Risk**: Incorrect token estimates can cause context overflow
+
+### Streaming Parser (`g3-core/src/streaming_parser.rs`)
+
+- Parses LLM responses in real time for tool calls
+- Must handle partial JSON across chunk boundaries
+- **Risk**: Malformed responses can cause parsing failures
+
+### Tool Dispatch (`g3-core/src/tool_dispatch.rs`)
+
+- Routes tool calls to implementations
+- Handles both native and JSON-based tool calling
+- **Risk**: Missing dispatch cases cause silent failures
+
+### Retry Logic (`g3-core/src/retry.rs`)
+
+- Exponential backoff with jitter
+- Different configs for interactive vs autonomous mode
+- **Risk**: Aggressive retries can hit rate limits harder
+
+## Performance Constraints
+
+1. **Streaming is preferred** - Non-streaming requests block the UI
+2. **Tool results are size-limited** - Large outputs are truncated or thinned
+3. **Concurrent tool calls** - Enabled by the `allow_multiple_tool_calls` config option
+4. **Background processes** - Long-running commands use the `background_process` tool
+
+## Testing Strategy
+
+### Test Locations
+
+- Unit tests: `crates/*/tests/`
+- Integration tests: `crates/*/tests/`
+- Test fixtures: `examples/test_code/`
+
+### Running Tests
+
+```bash
+# All tests
+cargo test
+
+# Specific crate
+cargo test -p g3-core
+
+# With output
+cargo test -- --nocapture
+```
+
+### Test Considerations
+
+- Provider tests may require API keys
+- Computer control tests require OS permissions
+- WebDriver tests require browser setup
+
+## Do's and Don'ts for Automated Changes
+
+### Do
+
+- ✅ Run `cargo check` after modifications
+- ✅ Run `cargo test` before committing
+- ✅ Update tool definitions when adding tools
+- ✅ Add tests for new functionality
+- ✅ Use existing patterns for similar features
+- ✅ Keep functions under 80 lines
+- ✅ Update documentation for user-facing changes
+
+### Don't
+
+- ❌ Modify `Cargo.toml` dependencies without justification
+- ❌ Add blocking code in async contexts
+- ❌ Store sensitive data in plain text
+- ❌ Ignore error handling
+- ❌ Create deeply nested conditionals (>6 levels)
+- ❌ Add external dependencies for simple tasks
+
+## Common Incorrect Assumptions
+
+1. **"All providers support tool calling"** - Embedded models use JSON fallback
+2. **"Context window is unlimited"** - Each provider has limits (4k-200k tokens)
+3. **"Tool results are always small"** - File reads can return megabytes
+4. **"Sessions persist across runs"** - Sessions are ephemeral by default
+5. **"All platforms are equal"** - macOS has more features (Vision, Accessibility)
+
+## Architecture Decisions
+
+See `DESIGN.md` for the original design rationale.
+
+Key decisions:
+- **Rust for performance and safety** - Async runtime, memory safety
+- **Workspace structure** - Separation of concerns, independent compilation
+- **Provider abstraction** - Swap providers without code changes
+- **Tool-first philosophy** - Agent acts through tools, not just advice
+- **Session-scoped state** - TODO lists, logs tied to sessions
+
+## File Structure Quick Reference
+
+```
+g3/
+├── src/main.rs                     # Entry point
+├── crates/
+│   ├── g3-cli/src/
+│   │   ├── lib.rs                  # CLI logic (~112k chars)
+│   │   └── retro_tui.rs            # Retro TUI mode
+│   ├── g3-core/src/
+│   │   ├── lib.rs                  # Agent struct (~3400 lines)
+│   │   ├── context_window.rs       # Context management
+│   │   ├── tool_definitions.rs     # Tool schemas
+│   │   ├── tool_dispatch.rs        # Tool routing
+│   │   ├── tools/                  # Tool implementations
+│   │   ├── streaming_parser.rs     # Response parsing
+│   │   └── retry.rs                # Retry logic
+│   ├── g3-providers/src/
+│   │   ├── lib.rs                  # Provider trait
+│   │   ├── anthropic.rs            # Anthropic Claude
+│   │   ├── databricks.rs           # Databricks
+│   │   ├── openai.rs               # OpenAI
+│   │   └── embedded.rs             # Local models
+│   ├── g3-config/src/lib.rs        # Configuration
+│   ├── g3-planner/src/             # Planning mode
+│   ├── g3-ensembles/src/           # Flock mode
+│   └── g3-computer-control/src/    # Automation
+├── agents/                         # Agent personas
+├── docs/                           # Documentation
+└── logs/                           # Session logs
+```
+
+## Pointers to Documentation
+
+- [Architecture](docs/architecture.md) - System design and data flow
+- [Configuration](docs/configuration.md) - Config file format and options
+- [Tools Reference](docs/tools.md) - All available tools
+- [Providers Guide](docs/providers.md) - LLM provider setup
+- [Control Commands](docs/CONTROL_COMMANDS.md) - Interactive commands
+- [Code Search](docs/CODE_SEARCH.md) - Tree-sitter search guide
+- [Flock Mode](docs/FLOCK_MODE.md) - Multi-agent development
+- [macOS Accessibility](docs/macax-tools.md) - macOS automation
diff --git a/README.md b/README.md
index a42f6fd..4f61b51 100644
--- a/README.md
+++ b/README.md
@@ -338,6 +338,28 @@ G3 automatically saves session logs for each interaction in the `logs/` director
 
 The `logs/` directory is created automatically on first use and is excluded from version control.
 
+## Documentation Map
+
+Detailed documentation is available in the `docs/` directory:
+
+| Document | Description |
+|----------|-------------|
+| [Architecture](docs/architecture.md) | System design, crate responsibilities, data flow |
+| [Configuration](docs/configuration.md) | Config file format, provider setup, all options |
+| [Tools Reference](docs/tools.md) | Complete reference for all available tools |
+| [Providers Guide](docs/providers.md) | LLM provider setup and selection guide |
+| [Control Commands](docs/CONTROL_COMMANDS.md) | Interactive `/` commands for context management |
+| [Code Search](docs/CODE_SEARCH.md) | Tree-sitter code search query patterns |
+| [Flock Mode](docs/FLOCK_MODE.md) | Parallel multi-agent development |
+| [macOS Accessibility](docs/macax-tools.md) | macOS Accessibility API automation |
+
+For AI agents working with this codebase, see [AGENTS.md](AGENTS.md).
+
+Additional resources:
+- `DESIGN.md` - Original design document and rationale
+- `config.example.toml` - Complete configuration example
+- `config.coach-player.example.toml` - Multi-role configuration example
+
 ## License
 
 MIT License - see LICENSE file for details
diff --git a/docs/CODE_SEARCH.md b/docs/CODE_SEARCH.md
new file mode 100644
index 0000000..5f38a60
--- /dev/null
+++ b/docs/CODE_SEARCH.md
@@ -0,0 +1,430 @@
+# G3 Code Search Guide
+
+**Last updated**: January 2025
+**Source of truth**: `crates/g3-core/src/code_search/`, `crates/g3-core/src/tool_definitions.rs`
+
+## Purpose
+
+G3 includes a syntax-aware code search tool powered by tree-sitter. Unlike text-based search (grep), it understands code structure and finds actual functions, classes, methods, and other constructs, ignoring matches in comments and strings.
+
+## Why Use Code Search?
+
+| Feature | grep/ripgrep | code_search |
+|---------|--------------|-------------|
+| Finds text in comments | ✅ | ❌ |
+| Finds text in strings | ✅ | ❌ |
+| Understands code structure | ❌ | ✅ |
+| Finds function definitions | Regex needed | Native |
+| Finds class hierarchies | ❌ | ✅ |
+| Language-aware | ❌ | ✅ |
+
+**Use code_search when**:
+- Finding function/method definitions
+- Finding class/struct declarations
+- Searching for specific code constructs
+- You need accurate results without false positives
+
+**Use grep when**:
+- Searching non-code files (logs, markdown)
+- Running simple string searches
+- Searching comments or documentation
+- Matching text patterns with regular expressions
+
+## Supported Languages
+
+- Rust
+- Python
+- JavaScript
+- TypeScript
+- Go
+- Java
+- C
+- C++
+- Kotlin
+
+## Basic Usage
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "my_search",
+    "query": "(function_item name: (identifier) @name)",
+    "language": "rust"
+  }]
+}}
+```
+
+### Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `searches` | array | Yes | Array of search objects (max 20) |
+| `max_concurrency` | integer | No | Number of searches run in parallel (default: 4) |
+| `max_matches_per_search` | integer | No | Maximum matches returned per search (default: 500) |
+
+### Search Object
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `name` | string | Yes | Label for this search |
+| `query` | string | Yes | Tree-sitter query (S-expression) |
+| `language` | string | Yes | Programming language |
+| `paths` | array | No | Paths to search (default: current directory) |
+| `context_lines` | integer | No | Lines of context around each match (0-20, default: 0) |
+
+## Query Syntax
+
+Tree-sitter queries use S-expression syntax. The basic pattern is:
+
+```
+(node_type field: (child_type) @capture_name)
+```
+
+- `node_type`: The AST node to match
+- `field`: Optional field name
+- `child_type`: Type of child node
+- `@capture_name`: Name for the captured node
+
+## Common Query Patterns
+
+### Rust
+
+```lisp
+;; All functions
+(function_item name: (identifier) @name)
+
+;; Async functions
+(function_item (function_modifiers "async") name: (identifier) @name)
+
+;; Structs
+(struct_item name: (type_identifier) @name)
+
+;; Enums
+(enum_item name: (type_identifier) @name)
+
+;; Impl blocks
+(impl_item type: (type_identifier) @name)
+
+;; Trait definitions
+(trait_item name: (type_identifier) @name)
+
+;; Macros
+(macro_definition name: (identifier) @name)
+
+;; Constants
+(const_item name: (identifier) @name)
+
+;; Static variables
+(static_item name: (identifier) @name)
+
+;; Type aliases
+(type_item name: (type_identifier) @name)
+
+;; Modules
+(mod_item name: (identifier) @name)
+```
+
+### Python
+
+```lisp
+;; Functions
+(function_definition name: (identifier) @name)
+
+;; Async functions
+(function_definition "async" name: (identifier) @name) @fn
+
+;; Classes
+(class_definition name: (identifier) @name)
+
+;; Methods (functions inside classes)
+(class_definition
+  body: (block
+    (function_definition name: (identifier) @name)))
+
+;; Decorators
+(decorator) @decorator
+
+;; Imports
+(import_statement) @import
+(import_from_statement) @import
+```
+
+### JavaScript / TypeScript
+
+```lisp
+;; Function declarations
+(function_declaration name: (identifier) @name)
+
+;; Arrow functions assigned to variables
+(variable_declarator
+  name: (identifier) @name
+  value: (arrow_function))
+
+;; Classes (wildcard name: JS uses identifier, TS uses type_identifier)
+(class_declaration name: (_) @name)
+
+;; Methods
+(method_definition name: (property_identifier) @name)
+
+;; Exports
+(export_statement) @export
+
+;; Imports
+(import_statement) @import
+```
+
+### Go
+
+```lisp
+;; Functions
+(function_declaration name: (identifier) @name)
+
+;; Methods
+(method_declaration name: (field_identifier) @name)
+
+;; Structs
+(type_declaration
+  (type_spec name: (type_identifier) @name
+    type: (struct_type)))
+
+;; Interfaces
+(type_declaration
+  (type_spec name: (type_identifier) @name
+    type: (interface_type)))
+```
+
+### Java
+
+```lisp
+;; Classes
+(class_declaration name: (identifier) @name)
+
+;; Interfaces
+(interface_declaration name: (identifier) @name)
+
+;; Methods
+(method_declaration name: (identifier) @name)
+
+;; Constructors
+(constructor_declaration name: (identifier) @name)
+
+;; Fields
+(field_declaration
+  declarator: (variable_declarator name: (identifier) @name))
+```
+
+### C / C++
+
+```lisp
+;; Functions
+(function_definition
+  declarator: (function_declarator
+    declarator: (identifier) @name))
+
+;; Structs (C)
+(struct_specifier name: (type_identifier) @name)
+
+;; Classes (C++)
+(class_specifier name: (type_identifier) @name)
+
+;; Namespaces (C++)
+(namespace_definition name: (namespace_identifier) @name)
+```
+
+## Advanced Queries
+
+### Wildcards
+
+Use `(_)` to match any named node (a bare `_` also matches anonymous nodes):
+
+```lisp
+;; Any function with any name
+(function_item name: (_) @name)
+```
+
+### Alternatives
+
+Match multiple patterns:
+
+```lisp
+;; Functions or impl blocks
+[(function_item) (impl_item)] @item
+```
+
+### Predicates
+
+Filter matches:
+
+```lisp
+;; Functions starting with "test_"
+(function_item name: (identifier) @name
+  (#match? @name "^test_"))
+
+;; Functions NOT starting with "_"
+(function_item name: (identifier) @name
+  (#not-match? @name "^_"))
+```
+
+### Nested Matches
+
+```lisp
+;; Methods inside impl blocks
+(impl_item
+  body: (declaration_list
+    (function_item name: (identifier) @method_name)))
+```
+
+## Batch Searches
+
+Run multiple searches in parallel:
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [
+    {
+      "name": "functions",
+      "query": "(function_item name: (identifier) @name)",
+      "language": "rust"
+    },
+    {
+      "name": "structs",
+      "query": "(struct_item name: (type_identifier) @name)",
+      "language": "rust"
+    },
+    {
+      "name": "tests",
+      "query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
+      "language": "rust",
+      "paths": ["tests/"]
+    }
+  ],
+  "max_concurrency": 4
+}}
+```
+
+## Context Lines
+
+Include surrounding code:
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "functions",
+    "query": "(function_item name: (identifier) @name)",
+    "language": "rust",
+    "context_lines": 3
+  }]
+}}
+```
+
+This shows 3 lines before and after each match.
+
+## Path Filtering
+
+Search specific directories or files:
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "core_functions",
+    "query": "(function_item name: (identifier) @name)",
+    "language": "rust",
+    "paths": ["src/core", "src/lib.rs"]
+  }]
+}}
+```
+
+## Output Format
+
+Results include:
+- File path
+- Line number
+- Matched code
+- Context (if requested)
+
+```
+=== functions (15 matches) ===
+
+src/lib.rs:42
+  fn process_request(req: Request) -> Response {
+
+src/lib.rs:78
+  fn handle_error(err: Error) -> Result<()> {
+
+src/utils.rs:15
+  fn format_output(data: &str) -> String {
+```
+
+## Tips
+
+### Finding the Right Query
+
+1. **Start simple**: Begin with basic node types
+2. **Use an AST explorer**: Inspect your language's syntax tree (for example, in the tree-sitter playground)
+3. **Iterate**: Refine queries based on results
+
+### Performance
+
+- **Limit paths**: Search specific directories when possible
+- **Use concurrency**: Batch related searches
+- **Set `max_matches_per_search`**: Prevent overwhelming output
+
+### Debugging Queries
+
+If a query returns no results:
+1. Check that the language name is spelled correctly and in lowercase
+2. Verify the node type names for your language
+3. Start with a simpler query, then add constraints
+4. Check that matching files exist in the search paths
+
+## Examples by Task
+
+### Find all public functions in Rust
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "public_fns",
+    "query": "(function_item (visibility_modifier) name: (identifier) @name)",
+    "language": "rust"
+  }]
+}}
+```
+
+### Find all test functions
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "tests",
+    "query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
+    "language": "rust",
+    "paths": ["tests/"]
+  }]
+}}
+```
+
+### Find all API endpoints (Python Flask)
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "routes",
+    "query": "(decorated_definition (decorator) @dec (function_definition name: (identifier) @name) (#match? @dec \"route\"))",
+    "language": "python"
+  }]
+}}
+```
+
+### Find all React components
+
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "components",
+    "query": "(function_declaration name: (identifier) @name (#match? @name \"^[A-Z]\"))",
+    "language": "javascript",
+    "paths": ["src/components/"]
+  }]
+}}
+```
diff --git a/docs/CONTROL_COMMANDS.md b/docs/CONTROL_COMMANDS.md
new file mode 100644
index 0000000..2a92a97
--- /dev/null
+++ b/docs/CONTROL_COMMANDS.md
@@ -0,0 +1,224 @@
+# G3 Control Commands
+
+**Last updated**: January 2025
+**Source of truth**: `crates/g3-cli/src/lib.rs`
+
+## Purpose
+
+Control commands are special commands you can use during an interactive G3 session to manage context, refresh documentation, and view statistics. They start with `/` and are processed by the CLI, not sent to the LLM.
+
+## Available Commands
+
+| Command | Description |
+|---------|-------------|
+| `/compact` | Manually trigger conversation summarization |
+| `/thinnify` | Replace large tool results with file references (first third) |
+| `/skinnify` | Full context thinning (entire context window) |
+| `/readme` | Reload README.md and AGENTS.md from disk |
+| `/stats` | Show detailed context and performance statistics |
+| `/help` | Display all available control commands |
+
+---
+
+## /compact
+
+Manually trigger conversation summarization to reduce context size.
+
+**When to use**:
+- Context usage is getting high (70%+)
+- You want to start a new phase of work
+- The conversation has accumulated irrelevant history
+
+**What it does**:
+1. Sends the conversation history to the LLM for summarization
+2. Replaces the detailed history with a concise summary
+3. Preserves key decisions and context
+4. Significantly reduces token usage
+
+**Example**:
+```
+g3> /compact
+📝 Compacting conversation history...
+✅ Reduced context from 45,000 to 8,000 tokens (82% reduction)
+```
+
+**Notes**:
+- Summarization itself consumes tokens, so there is a small cost
+- Some detail is lost; use it before major context shifts
+- Auto-triggered at 80% context usage if `auto_compact = true`
+
+---
+
+## /thinnify
+
+Replace large tool results with file references to save context space.
+
+**When to use**:
+- Large file contents are consuming context
+- Tool outputs are taking up space
+- You want to preserve conversation structure but reduce size
+
+**What it does**:
+1. Scans the first third of the context for large tool results
+2. Saves that content to `.g3/sessions//thinned/`
+3. Replaces the inline content with a file reference
+4. Preserves the ability to re-read the content if needed
+
+**Example**:
+```
+g3> /thinnify
+🔧 Thinning context window...
+✅ Thinned 3 large tool results, saved 12,000 characters
+```
+
+**Notes**:
+- Only processes the oldest third of the context
+- Recent tool results are preserved inline
+- Auto-triggered at 50%, 60%, 70%, 80% thresholds
+
+---
+
+## /skinnify
+
+Full context thinning - processes the entire context window.
+
+**When to use**:
+- Context is critically full
+- `/thinnify` wasn't enough
+- You need maximum space recovery
+
+**What it does**:
+- Same as `/thinnify`, but processes the entire context
+- More aggressive space recovery
+- May thin recent tool results too
+
+**Example**:
+```
+g3> /skinnify
+🔧 Full context thinning...
+✅ Thinned 8 tool results, saved 35,000 characters
+```
+
+**Notes**:
+- Use sparingly; it may thin content you still need inline
+- Consider `/compact` first for better context preservation
+
+---
+
+## /readme
+
+Reload README.md and AGENTS.md from disk without restarting.
+
+**When to use**:
+- You've updated project documentation
+- AGENTS.md has new instructions
+- README.md has changed
+
+**What it does**:
+1. Re-reads README.md from the workspace root
+2. Re-reads AGENTS.md from the workspace root
+3. Updates the agent's system context
+4. New instructions take effect immediately
+
+**Example**:
+```
+g3> /readme
+📖 Reloading documentation...
+✅ Loaded README.md (5,234 chars)
+✅ Loaded AGENTS.md (2,100 chars)
+```
+
+**Notes**:
+- Useful during iterative documentation updates
+- Changes apply to subsequent messages
+- Earlier messages in the context still reflect the old documentation
+
+---
+
+## /stats
+
+Show detailed context and performance statistics.
+
+**What it shows**:
+- Current context usage (tokens and percentage)
+- Session duration
+- Token usage breakdown
+- Tool call metrics
+- Thinning and summarization events
+- First-token latency statistics
+
+**Example**:
+```
+g3> /stats
+📊 Session Statistics
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Context Usage: 45,230 / 200,000 tokens (22.6%)
+Session Duration: 1h 23m 45s
+Total Tokens Used: 125,430
+Tool Calls: 47 (45 successful, 2 failed)
+Thinning Events: 3 (saved 28,000 chars)
+Summarizations: 1 (saved 35,000 chars)
+Avg First Token: 1.2s
+```
+
+---
+
+## /help
+
+Display all available control commands with brief descriptions.
+
+**Example**:
+```
+g3> /help
+📚 Available Commands:
+  /compact   - Summarize conversation to reduce context
+  /thinnify  - Replace large tool results with file refs
+  /skinnify  - Full context thinning (entire window)
+  /readme    - Reload README.md and AGENTS.md
+  /stats     - Show context and performance statistics
+  /help      - Show this help message
+```
+
+---
+
+## Context Management Strategy
+
+G3 manages context automatically, but manual intervention can help:
+
+### Proactive Management
+
+1. **Check stats regularly**: Use `/stats` to monitor usage
+2. **Thin early**: Use `/thinnify` before hitting thresholds
+3. **Compact at transitions**: Use `/compact` when switching tasks
+
+### Reactive Management
+
+When context usage gets high:
+
+1. **50-70%**: Consider `/thinnify`
+2. **70-80%**: Use `/compact`
+3. **80-90%**: Use `/skinnify`, then `/compact`
+4. **90%+**: Auto-summarization triggers
+
+### Best Practices
+
+- **Long sessions**: Compact periodically to maintain quality
+- **Large files**: Thin after reading large files or codebases
+- **Documentation updates**: Use `/readme` instead of restarting
+- **Before complex tasks**: Ensure adequate context space
+
+---
+
+## Automatic Context Management
+
+G3 performs the following context management automatically:
+
+| Threshold | Action |
+|-----------|--------|
+| 50% | Thin oldest third of context |
+| 60% | Thin oldest third of context |
+| 70% | Thin oldest third of context |
+| 80% | Auto-summarization (if `auto_compact = true`) |
+| 90% | Aggressive thinning before tool calls |
+
+Manual commands give you finer control over when and how this happens.
diff --git a/docs/FLOCK_MODE.md b/docs/FLOCK_MODE.md
new file mode 100644
index 0000000..86e6882
--- /dev/null
+++ b/docs/FLOCK_MODE.md
@@ -0,0 +1,397 @@
+# G3 Flock Mode Guide
+
+**Last updated**: January 2025
+**Source of truth**: `crates/g3-ensembles/src/flock.rs`
+
+## Purpose
+
+Flock mode enables parallel multi-agent development by spawning multiple G3 agent instances that work on different parts of a project simultaneously. This is useful for large projects with modular architectures where independent components can be developed in parallel.
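The coordinator's job can be pictured as a fan-out/fan-in: start one worker per agent, then wait for all of them. The Rust sketch below is a minimal model of that idea under stated assumptions, not the `g3-ensembles` implementation. `AgentSpec`, `find_overlap`, `run_agent`, and `run_flock` are hypothetical names; each "agent" is a plain thread that returns a string instead of running an LLM session; and overlapping working directories are rejected before anything starts, mirroring the conflict-prevention advice in this guide.

```rust
use std::path::Path;
use std::thread;

/// Hypothetical description of one flock member (not G3's real type).
struct AgentSpec {
    name: &'static str,
    working_dir: &'static str,
}

/// Returns the first pair of agents whose working directories overlap,
/// i.e. one directory is equal to or nested inside the other.
fn find_overlap(agents: &[AgentSpec]) -> Option<(&'static str, &'static str)> {
    for (i, a) in agents.iter().enumerate() {
        for b in &agents[i + 1..] {
            let (pa, pb) = (Path::new(a.working_dir), Path::new(b.working_dir));
            // Path::starts_with compares whole components, so "src/api"
            // is not treated as a prefix of "src/apiv2".
            if pa.starts_with(pb) || pb.starts_with(pa) {
                return Some((a.name, b.name));
            }
        }
    }
    None
}

/// Placeholder for an agent's work; a real agent would run an LLM
/// session restricted to `spec.working_dir`.
fn run_agent(spec: &AgentSpec) -> String {
    format!("{}: done in {}", spec.name, spec.working_dir)
}

/// Fan out one scoped thread per agent, then fan in by joining every handle.
fn run_flock(agents: &[AgentSpec]) -> Result<Vec<String>, String> {
    if let Some((a, b)) = find_overlap(agents) {
        return Err(format!("working directories of {a} and {b} overlap"));
    }
    let results: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = agents
            .iter()
            .map(|spec| s.spawn(move || run_agent(spec)))
            .collect();
        // Joining in spawn order keeps results in manifest order.
        handles
            .into_iter()
            .map(|h| h.join().expect("agent thread panicked"))
            .collect()
    });
    Ok(results)
}

fn main() {
    let agents = [
        AgentSpec { name: "api-agent", working_dir: "src/api" },
        AgentSpec { name: "db-agent", working_dir: "src/db" },
        AgentSpec { name: "test-agent", working_dir: "tests" },
    ];
    for line in run_flock(&agents).unwrap() {
        println!("{line}");
    }

    let overlapping = [
        AgentSpec { name: "agent1", working_dir: "src" },
        AgentSpec { name: "agent2", working_dir: "src/utils" },
    ];
    println!("{:?}", run_flock(&overlapping));
}
```

Run as a binary, it prints one completion line per agent in manifest order, followed by an `Err(...)` for the `src` / `src/utils` pair. Because each agent works only inside its own directory, the threads share no mutable state, which is what makes this simple fan-out safe.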
+ +## Overview + +In Flock mode: +- Multiple agent instances run concurrently +- Each agent works on a specific module or component +- Agents operate independently but share the same codebase +- Progress is tracked and coordinated centrally + +``` +┌─────────────────────────────────────────────────────────┐ +│ Flock Coordinator │ +│ │ +│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ +│ │ Agent 1 │ │ Agent 2 │ │ Agent 3 │ │ Agent N │ │ +│ │ Module A│ │ Module B│ │ Module C│ │ Module N│ │ +│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ +│ │ │ │ │ │ +│ ▼ ▼ ▼ ▼ │ +│ ┌─────────────────────────────────────────────────┐ │ +│ │ Shared Codebase │ │ +│ └─────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +## When to Use Flock Mode + +**Good candidates**: +- Microservices architectures +- Projects with independent modules +- Large refactoring across multiple files +- Parallel feature development +- Test suite expansion + +**Not recommended for**: +- Tightly coupled code +- Sequential dependencies +- Small projects +- Single-file changes + +## Configuration + +Flock mode is configured through a YAML manifest file: + +```yaml +# flock.yaml +name: "my-project-flock" +description: "Parallel development of project modules" + +# Global settings +settings: + max_agents: 4 + timeout_minutes: 60 + provider: "anthropic.default" + +# Agent definitions +agents: + - name: "api-agent" + description: "Develops the REST API layer" + working_dir: "src/api" + requirements: | + Implement REST endpoints for user management: + - GET /users + - POST /users + - GET /users/{id} + - PUT /users/{id} + - DELETE /users/{id} + + - name: "db-agent" + description: "Develops the database layer" + working_dir: "src/db" + requirements: | + Implement database models and queries: + - User model with CRUD operations + - Connection pooling + - Migration support + + - name: "test-agent" + description: "Writes integration tests" + 
working_dir: "tests" + requirements: | + Write integration tests for: + - API endpoints + - Database operations + - Error handling +``` + +## Usage + +### Starting a Flock + +```bash +# Start flock with manifest +g3 --flock flock.yaml + +# Start with specific agents only +g3 --flock flock.yaml --agents api-agent,db-agent + +# Start with custom timeout +g3 --flock flock.yaml --timeout 120 +``` + +### Monitoring Progress + +Flock mode provides real-time status updates: + +``` +🐦 Flock Status: my-project-flock +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + + api-agent [████████░░] 80% Implementing DELETE endpoint + db-agent [██████████] 100% ✅ Complete + test-agent [██████░░░░] 60% Writing error handling tests + +Elapsed: 15m 32s | Tokens: 45,230 | Errors: 0 +``` + +### Stopping a Flock + +```bash +# Graceful stop (wait for current tasks) +Ctrl+C + +# Force stop all agents +Ctrl+C Ctrl+C +``` + +## Agent Communication + +Agents in a flock operate independently but can: + +1. **Read shared files**: All agents can read the entire codebase +2. **Write to their area**: Each agent writes to its designated working directory +3. **Signal completion**: Agents report when their tasks are done +4. 
**Report errors**: Failures are logged and can trigger coordinator action + +### Conflict Prevention + +To prevent conflicts: +- Assign non-overlapping working directories +- Use clear module boundaries +- Define explicit interfaces between modules +- Run integration after all agents complete + +## Status Tracking + +Flock status is tracked in `.g3/flock/`: + +``` +.g3/flock/ +├── status.json # Overall flock status +├── api-agent/ +│ ├── session.json # Agent session log +│ └── todo.g3.md # Agent's TODO list +├── db-agent/ +│ ├── session.json +│ └── todo.g3.md +└── test-agent/ + ├── session.json + └── todo.g3.md +``` + +### Status File Format + +```json +{ + "flock_name": "my-project-flock", + "started_at": "2025-01-03T10:00:00Z", + "status": "running", + "agents": [ + { + "name": "api-agent", + "status": "running", + "progress": 80, + "current_task": "Implementing DELETE endpoint", + "tokens_used": 15000, + "errors": 0 + } + ] +} +``` + +## Best Practices + +### 1. Define Clear Boundaries + +```yaml +# Good: Clear module separation +agents: + - name: "frontend" + working_dir: "src/frontend" + - name: "backend" + working_dir: "src/backend" + +# Bad: Overlapping directories +agents: + - name: "agent1" + working_dir: "src" + - name: "agent2" + working_dir: "src/utils" # Overlaps with agent1! +``` + +### 2. Specify Interfaces First + +Define shared interfaces before parallel development: + +```yaml +agents: + - name: "interface-agent" + priority: 1 # Runs first + requirements: | + Define shared interfaces in src/interfaces/: + - UserService trait + - DatabaseConnection trait + - Error types + + - name: "impl-agent" + priority: 2 # Runs after interfaces + depends_on: ["interface-agent"] + requirements: | + Implement UserService trait... +``` + +### 3. Use Appropriate Granularity + +- **Too few agents**: Doesn't leverage parallelism +- **Too many agents**: Coordination overhead, potential conflicts +- **Sweet spot**: 2-6 agents for most projects + +### 4. 
Include a Test Agent + +Always include an agent for testing: + +```yaml +agents: + - name: "test-agent" + working_dir: "tests" + requirements: | + Write tests for all new functionality. + Run tests after other agents complete. +``` + +### 5. Plan for Integration + +After flock completion: + +```bash +# Run all tests +cargo test + +# Check for conflicts +git status + +# Review changes +git diff +``` + +## Error Handling + +### Agent Failures + +If an agent fails: +1. Error is logged to agent's session +2. Coordinator is notified +3. Other agents continue (by default) +4. Failed agent can be restarted + +### Restart Failed Agent + +```bash +# Restart specific agent +g3 --flock flock.yaml --restart api-agent + +# Restart all failed agents +g3 --flock flock.yaml --restart-failed +``` + +### Conflict Resolution + +If agents modify the same file: +1. Last write wins (by default) +2. Conflicts are logged +3. Manual resolution may be needed + +## Resource Management + +### Token Usage + +Each agent has its own token budget: + +```yaml +settings: + max_tokens_per_agent: 100000 + total_token_budget: 500000 +``` + +### Concurrency + +Limit concurrent agents based on: +- API rate limits +- System resources +- Provider capacity + +```yaml +settings: + max_concurrent_agents: 3 # Run at most 3 at once +``` + +## Example: Microservices Project + +```yaml +name: "microservices-flock" + +settings: + max_agents: 5 + provider: "anthropic.default" + +agents: + - name: "user-service" + working_dir: "services/user" + requirements: | + Implement user service: + - User registration + - Authentication + - Profile management + + - name: "order-service" + working_dir: "services/order" + requirements: | + Implement order service: + - Order creation + - Order status tracking + - Payment integration + + - name: "inventory-service" + working_dir: "services/inventory" + requirements: | + Implement inventory service: + - Stock management + - Availability checking + - Reorder alerts + + - name: 
"gateway" + working_dir: "services/gateway" + requirements: | + Implement API gateway: + - Request routing + - Authentication middleware + - Rate limiting + + - name: "integration-tests" + working_dir: "tests/integration" + depends_on: ["user-service", "order-service", "inventory-service", "gateway"] + requirements: | + Write integration tests for: + - End-to-end order flow + - Service communication + - Error scenarios +``` + +## Limitations + +- **No real-time coordination**: Agents don't communicate during execution +- **File conflicts**: Possible if boundaries aren't clear +- **Resource intensive**: Multiple LLM calls in parallel +- **Debugging complexity**: Multiple logs to review + +## Troubleshooting + +### Agents Not Starting + +1. Check manifest syntax (YAML) +2. Verify working directories exist +3. Check provider configuration +4. Review logs in `.g3/flock/` + +### Slow Progress + +1. Reduce the number of concurrent agents +2. Check for rate limiting +3. Simplify requirements +4. Use a faster provider + +### Inconsistent Results + +1. Define clearer interfaces +2. Add more specific requirements +3. Use a lower temperature +4. Add validation steps diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..9598717 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,363 @@ +# G3 Architecture + +**Last updated**: January 2025 +**Source of truth**: Crate structure in `crates/`, `Cargo.toml`, `DESIGN.md` + +## Purpose + +This document describes the internal architecture of G3, a modular AI coding agent built in Rust. It is intended for developers who want to understand, extend, or maintain the codebase. + +## High-Level Overview + +G3 follows a **tool-first philosophy**: instead of just providing advice, it actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
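Concretely, the tool-first approach is a loop. The agent sends the conversation to the model, runs any tool calls the model emits, appends the results, and repeats until the model answers without calling a tool. The Rust sketch below shows that cycle under simplifying assumptions. `Message`, `ToolCall`, `ModelTurn`, `run_task`, and the `max_iterations` cap are hypothetical names, not g3-core's actual types. The model and the tool executor are passed in as closures in place of a real provider and tool dispatcher.

```rust
/// A message in a (greatly simplified) conversation history.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    ToolResult { tool: String, output: String },
}

/// A tool invocation requested by the model.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    tool: String,
    args: String, // raw JSON arguments, left unparsed in this sketch
}

/// What the model returns for one turn: text plus zero or more tool calls.
struct ModelTurn {
    text: String,
    tool_calls: Vec<ToolCall>,
}

/// Runs the tool-first loop until the model replies without tool calls,
/// or until `max_iterations` turns have been taken (a hypothetical safety cap).
fn run_task<M, T>(task: &str, mut model: M, mut run_tool: T, max_iterations: usize) -> Vec<Message>
where
    M: FnMut(&[Message]) -> ModelTurn,
    T: FnMut(&ToolCall) -> String,
{
    let mut context = vec![Message::User(task.to_string())];
    for _ in 0..max_iterations {
        // Send the whole history to the model and record its reply.
        let turn = model(&context[..]);
        context.push(Message::Assistant(turn.text));
        if turn.tool_calls.is_empty() {
            break; // no tool calls: the model considers the task complete
        }
        // Execute each requested tool and feed its result back into the context.
        for call in &turn.tool_calls {
            let output = run_tool(call);
            context.push(Message::ToolResult { tool: call.tool.clone(), output });
        }
    }
    context
}
```

Because the model is a closure, a scripted model is enough to exercise the loop. The real agent additionally streams responses, parses tool calls out of the stream, and manages the context window at each step.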
+ +``` +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ g3-cli │ │ g3-core │ │ g3-providers │ +│ │ │ │ │ │ +│ • CLI parsing │◄──►│ • Agent engine │◄──►│ • Anthropic │ +│ • Interactive │ │ • Context mgmt │ │ • Databricks │ +│ • Retro TUI │ │ • Tool system │ │ • OpenAI │ +│ • Autonomous │ │ • Streaming │ │ • Embedded │ +│ mode │ │ • Task exec │ │ (llama.cpp) │ +│ │ │ • TODO mgmt │ │ • OAuth flow │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ + │ │ │ + └───────────────────────┼───────────────────────┘ + │ + ┌───────────────────────┼───────────────────────┐ + │ │ │ +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ g3-execution │ │ g3-config │ │ g3-planner │ +│ │ │ │ │ │ +│ • Code exec │ │ • TOML config │ │ • Requirements │ +│ • Shell cmds │ │ • Env overrides │ │ • Git ops │ +│ • Streaming │ │ • Provider │ │ • Planning │ +│ • Error hdlg │ │ settings │ │ workflow │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ + │ │ │ + │ ┌─────────────────┐ │ + │ │ g3-computer- │ │ + └─────────────►│ control │◄─────────────┘ + │ • Mouse/kbd │ + │ • Screenshots │ + │ • OCR/Vision │ + │ • WebDriver │ + │ • macOS Ax API │ + └─────────────────┘ + │ + ┌───────────────────────┼───────────────────────┐ + │ │ │ +┌─────────────────┐ ┌─────────────────┐ +│ g3-ensembles │ │ g3-console │ +│ │ │ │ +│ • Flock mode │ │ • Web console │ +│ • Multi-agent │ │ • Process mgmt │ +│ • Parallel dev │ │ • Log viewing │ +└─────────────────┘ └─────────────────┘ +``` + +## Workspace Structure + +G3 is organized as a Rust workspace with 9 crates: + +``` +g3/ +├── src/main.rs # Entry point (delegates to g3-cli) +├── crates/ +│ ├── g3-cli/ # Command-line interface and TUI +│ ├── g3-core/ # Core agent engine and tools +│ ├── g3-providers/ # LLM provider abstractions +│ ├── g3-config/ # Configuration management +│ ├── g3-execution/ # Code execution engine +│ ├── g3-computer-control/ # Computer automation +│ ├── g3-planner/ # Planning mode workflow +│ ├── 
g3-ensembles/ # Multi-agent (flock) mode +│ └── g3-console/ # Web monitoring console +├── agents/ # Agent persona definitions +├── logs/ # Session logs (auto-created) +└── g3-plan/ # Planning artifacts +``` + +## Crate Responsibilities + +### g3-core (Central Hub) + +**Location**: `crates/g3-core/` +**Purpose**: Core agent engine, tool system, and orchestration logic + +Key modules: +- `lib.rs` - Main `Agent` struct and orchestration (~3400 lines) +- `context_window.rs` - Token tracking and context management +- `streaming_parser.rs` - Real-time LLM response parsing +- `tool_definitions.rs` - JSON schema definitions for all tools +- `tool_dispatch.rs` - Routes tool calls to implementations +- `tools/` - Tool implementations (file ops, shell, vision, webdriver, etc.) +- `error_handling.rs` - Error classification and recovery +- `retry.rs` - Retry logic with exponential backoff +- `prompts.rs` - System prompt generation +- `code_search/` - Tree-sitter based code search + +**Key types**: +- `Agent` - Main agent struct, generic over UI output +- `ContextWindow` - Manages conversation history and token limits +- `StreamingToolParser` - Parses streaming LLM responses for tool calls +- `ToolCall` - Represents a tool invocation + +### g3-providers (LLM Abstraction) + +**Location**: `crates/g3-providers/` +**Purpose**: Unified interface for multiple LLM backends + +Key modules: +- `lib.rs` - `LLMProvider` trait and `ProviderRegistry` +- `anthropic.rs` - Anthropic Claude API (~51k chars) +- `databricks.rs` - Databricks Foundation Models (~58k chars) +- `openai.rs` - OpenAI and compatible APIs (~18k chars) +- `embedded.rs` - Local models via llama.cpp (~34k chars) +- `oauth.rs` - OAuth authentication flow + +**Key traits**: +```rust +#[async_trait] +pub trait LLMProvider: Send + Sync { + async fn complete(&self, request: CompletionRequest) -> Result; + async fn stream(&self, request: CompletionRequest) -> Result; + fn name(&self) -> &str; + fn model(&self) -> &str; + fn 
has_native_tool_calling(&self) -> bool; + fn supports_cache_control(&self) -> bool; + fn max_tokens(&self) -> u32; + fn temperature(&self) -> f32; +} +``` + +### g3-cli (User Interface) + +**Location**: `crates/g3-cli/` +**Purpose**: Command-line interface, TUI, and execution modes + +Key modules: +- `lib.rs` - Main CLI logic and execution modes (~112k chars) +- `retro_tui.rs` - Full-screen retro terminal UI (~63k chars) +- `filter_json.rs` - JSON tool call filtering for display +- `ui_writer_impl.rs` - Console output implementation +- `theme.rs` - Color themes for retro mode + +**Execution modes**: +1. **Single-shot**: `g3 "task description"` - Execute one task and exit +2. **Interactive**: `g3` - REPL-style conversation (default) +3. **Autonomous**: `g3 --autonomous` - Coach-player feedback loop +4. **Accumulative**: Default interactive mode with autonomous runs +5. **Planning**: `g3 --planning` - Requirements-driven development +6. **Retro TUI**: `g3 --retro` - Full-screen terminal interface + +### g3-config (Configuration) + +**Location**: `crates/g3-config/` +**Purpose**: TOML-based configuration management + +Key structures: +- `Config` - Root configuration +- `ProvidersConfig` - Provider settings with named configs +- `AgentConfig` - Agent behavior settings +- `WebDriverConfig` - Browser automation settings +- `MacAxConfig` - macOS Accessibility API settings + +**Configuration hierarchy** (highest priority last): +1. Default configuration +2. `~/.config/g3/config.toml` +3. `./g3.toml` +4. Environment variables (`G3_*`) +5. 
CLI arguments + +### g3-execution (Code Execution) + +**Location**: `crates/g3-execution/` +**Purpose**: Safe execution of shell commands and scripts + +Features: +- Streaming output capture +- Exit code tracking +- Async execution via Tokio +- Error handling and formatting + +### g3-computer-control (Automation) + +**Location**: `crates/g3-computer-control/` +**Purpose**: Cross-platform computer control and automation + +Key modules: +- `platform/` - Platform-specific implementations (macOS, Linux, Windows) +- `webdriver/` - Safari and Chrome WebDriver integration +- `ocr/` - Text extraction (Tesseract, Apple Vision) +- `macax/` - macOS Accessibility API controller + +**Platform support**: +- **macOS**: Core Graphics, Cocoa, screencapture, Vision framework +- **Linux**: X11/Xtest for input +- **Windows**: Win32 APIs + +### g3-planner (Planning Mode) + +**Location**: `crates/g3-planner/` +**Purpose**: Requirements-driven development workflow + +Key modules: +- `planner.rs` - Main planning state machine (~40k chars) +- `state.rs` - Planning state management +- `git.rs` - Git operations +- `code_explore.rs` - Codebase exploration +- `llm.rs` - LLM interactions for planning +- `history.rs` - Planning history tracking + +**Workflow**: +1. Write requirements in `/g3-plan/new_requirements.md` +2. The LLM refines the requirements +3. The requirements file is renamed to `current_requirements.md` +4. The coach/player loop implements the requirements +5. Files are archived with timestamps +6. Changes are committed to Git with an LLM-generated message + +### g3-ensembles (Multi-Agent) + +**Location**: `crates/g3-ensembles/` +**Purpose**: Parallel multi-agent development (Flock mode) + +Key modules: +- `flock.rs` - Flock orchestration (~43k chars) +- `status.rs` - Agent status tracking + +Flock mode enables parallel development by spawning multiple agent instances working on different parts of a project.
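Flock manifests can declare `depends_on`; a typical case is an integration-test agent that waits for every service agent. One way to turn those declarations into a start order is Kahn's topological sort, sketched below. The sketch assumes that manifest entries reduce to `(name, depends_on)` pairs and that agents becoming ready at the same time keep their manifest order. `start_order` is a hypothetical function, not code taken from `flock.rs`. It rejects unknown dependencies and cycles rather than guessing.

```rust
use std::collections::{HashMap, VecDeque};

/// Orders flock agents so each one appears after everything it lists in
/// `depends_on`. Agents that become ready together keep manifest order.
fn start_order(agents: &[(&str, Vec<&str>)]) -> Result<Vec<String>, String> {
    // How many dependencies each agent is still waiting on.
    let mut waiting: HashMap<&str, usize> = HashMap::new();
    // Reverse edges: dependency -> agents that depend on it.
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();

    for (name, _) in agents {
        waiting.insert(*name, 0);
    }
    for (name, deps) in agents {
        for dep in deps {
            if !waiting.contains_key(dep) {
                return Err(format!("agent '{}' depends on unknown agent '{}'", name, dep));
            }
            *waiting.get_mut(name).unwrap() += 1;
            dependents.entry(*dep).or_default().push(*name);
        }
    }

    // Agents with no dependencies can start immediately.
    let mut ready: VecDeque<&str> = agents
        .iter()
        .map(|(name, _)| *name)
        .filter(|name| waiting[name] == 0)
        .collect();

    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name.to_string());
        // Finishing `name` releases one dependency for each of its dependents.
        if let Some(children) = dependents.get(name) {
            for child in children {
                let count = waiting.get_mut(child).unwrap();
                *count -= 1;
                if *count == 0 {
                    ready.push_back(*child);
                }
            }
        }
    }

    // Anything never released is part of a dependency cycle.
    if order.len() != agents.len() {
        return Err("dependency cycle detected among flock agents".to_string());
    }
    Ok(order)
}
```

A scheduler could still run independent agents in parallel. The sort only guarantees that no agent is ordered ahead of its dependencies.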
+ +### g3-console (Web Console) + +**Location**: `crates/g3-console/` +**Purpose**: Web-based monitoring and control + +Key modules: +- `main.rs` - Axum web server +- `api/` - REST API endpoints +- `process/` - Process detection and control +- `logs.rs` - Log parsing and streaming + +## Data Flow + +### Request Flow + +``` +User Input + │ + ▼ +┌─────────────┐ +│ g3-cli │ Parse input, determine mode +└─────────────┘ + │ + ▼ +┌─────────────┐ +│ g3-core │ Add to context window +│ Agent │ Build completion request +└─────────────┘ + │ + ▼ +┌─────────────┐ +│ g3-providers│ Send to LLM provider +│ Registry │ Stream response +└─────────────┘ + │ + ▼ +┌─────────────┐ +│ g3-core │ Parse streaming response +│ Parser │ Detect tool calls +└─────────────┘ + │ + ▼ +┌─────────────┐ +│ g3-core │ Execute tools +│ Tools │ Return results +└─────────────┘ + │ + ▼ +┌─────────────┐ +│ g3-core │ Add results to context +│ Agent │ Continue or complete +└─────────────┘ +``` + +### Context Window Management + +The `ContextWindow` struct manages conversation history with intelligent token tracking: + +1. **Token Tracking**: Monitors usage as percentage of provider's context limit +2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references +3. **Auto-Summarization**: At 80% capacity, triggers conversation summarization +4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens) + +## Error Handling + +G3 implements comprehensive error handling: + +1. **Error Classification**: Distinguishes recoverable vs non-recoverable errors +2. **Automatic Retry**: Exponential backoff with jitter for: + - Rate limits (HTTP 429) + - Network errors + - Server errors (HTTP 5xx) + - Timeouts +3. **Error Logging**: Detailed logs saved to `logs/errors/` +4. 
**Graceful Degradation**: Continues when possible, fails gracefully when not + +## Session Management + +Sessions are tracked in `.g3/sessions//`: +- `session.json` - Full conversation history and metadata +- `todo.g3.md` - Session-scoped TODO list +- Context summaries and thinned content + +Legacy logs are stored in `logs/g3_session_*.json`. + +## Extension Points + +### Adding a New Tool + +1. Add tool definition in `g3-core/src/tool_definitions.rs` +2. Implement handler in `g3-core/src/tools/` +3. Add dispatch case in `g3-core/src/tool_dispatch.rs` +4. Update system prompt if needed in `g3-core/src/prompts.rs` + +### Adding a New Provider + +1. Implement `LLMProvider` trait in `g3-providers/src/` +2. Add configuration struct in `g3-config/src/lib.rs` +3. Register provider in `g3-core/src/lib.rs` (in `new_with_mode_and_readme`) +4. Update documentation + +### Adding a New Execution Mode + +1. Add CLI arguments in `g3-cli/src/lib.rs` +2. Implement mode logic in the CLI +3. May require new agent methods in `g3-core` + +## Key Files for Understanding + +Start reading here: + +1. `src/main.rs` - Entry point (trivial, delegates to g3-cli) +2. `crates/g3-cli/src/lib.rs` - CLI and execution modes +3. `crates/g3-core/src/lib.rs` - Agent implementation +4. `crates/g3-providers/src/lib.rs` - Provider trait and registry +5. `crates/g3-core/src/tool_definitions.rs` - Available tools +6. `crates/g3-config/src/lib.rs` - Configuration structures +7. 
`DESIGN.md` - Original design document + +## Dependencies + +Key external dependencies: + +- **tokio**: Async runtime +- **reqwest**: HTTP client for API calls +- **serde/serde_json**: Serialization +- **clap**: CLI argument parsing +- **tree-sitter**: Syntax-aware code search +- **llama_cpp**: Local model inference (with Metal acceleration) +- **fantoccini**: WebDriver client +- **axum**: Web framework (for g3-console) diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..1cf9328 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,385 @@ +# G3 Configuration Guide + +**Last updated**: January 2025 +**Source of truth**: `crates/g3-config/src/lib.rs`, `config.example.toml` + +## Purpose + +This document explains how to configure G3, including provider setup, agent behavior, and optional features like WebDriver and computer control. + +## Configuration File Location + +G3 looks for configuration files in this order: + +1. Path specified via `--config` CLI argument +2. `./g3.toml` (current directory) +3. `~/.config/g3/config.toml` (user config) +4. `~/.g3.toml` (legacy location) + +If no configuration file exists, G3 creates a default one at `~/.config/g3/config.toml` on first run. + +## Configuration Format + +G3 uses TOML format. 
The configuration is organized into sections: + + ```toml +[providers] # LLM provider settings +[agent] # Agent behavior settings +[computer_control] # Mouse/keyboard automation +[webdriver] # Browser automation +[macax] # macOS Accessibility API +``` + +## Provider Configuration + +### Provider Reference Format + +Providers are referenced by provider type and configuration name, joined by a dot: + +Examples: +- `anthropic.default` +- `databricks.production` +- `openai.gpt4` +- `embedded.local` + +### Basic Provider Setup + +```toml +[providers] +# Default provider used for all operations +default_provider = "anthropic.default" + +# Optional: Different providers for different roles +# planner = "anthropic.planner" # Planning mode +# coach = "anthropic.default" # Code reviewer in autonomous mode +# player = "anthropic.default" # Code implementer in autonomous mode +``` + +### Anthropic Configuration + +```toml +[providers.anthropic.default] +api_key = "sk-ant-..." # Required: Your Anthropic API key +model = "claude-sonnet-4-5" # Model to use +max_tokens = 64000 # Max output tokens per request +temperature = 0.3 # Sampling temperature (0.0-1.0) +# cache_config = "ephemeral" # Optional: Enable prompt caching +# enable_1m_context = true # Optional: Enable 1M context (extra cost) +# thinking_budget_tokens = 10000 # Optional: Extended thinking mode +``` + +**Available Anthropic models**: +- `claude-sonnet-4-5` (recommended) +- `claude-opus-4-5` +- `claude-3-5-sonnet-20241022` +- `claude-3-opus-20240229` + +### Databricks Configuration + +```toml +[providers.databricks.default] +host = "https://your-workspace.cloud.databricks.com" # Required +model = "databricks-claude-sonnet-4" # Model endpoint +max_tokens = 4096 +temperature = 0.1 +use_oauth = true # Use OAuth (recommended) +# token = "dapi..."
# Or use personal access token +``` + +**OAuth vs Token Authentication**: +- **OAuth** (`use_oauth = true`): Opens browser for authentication, tokens refresh automatically +- **Token** (`token = "..."`, `use_oauth = false`): Uses personal access token directly + +### OpenAI Configuration + +```toml +[providers.openai.default] +api_key = "sk-..." # Required: Your OpenAI API key +model = "gpt-4-turbo" # Model to use +max_tokens = 4096 +temperature = 0.1 +# base_url = "https://api.openai.com/v1" # Optional: Custom endpoint +``` + +### OpenAI-Compatible Providers + +For services with OpenAI-compatible APIs (OpenRouter, Groq, Together, etc.): + +```toml +[providers.openai_compatible.openrouter] +api_key = "sk-or-..." # Provider's API key +model = "anthropic/claude-3.5-sonnet" +base_url = "https://openrouter.ai/api/v1" +max_tokens = 4096 +temperature = 0.1 + +[providers.openai_compatible.groq] +api_key = "gsk_..." +model = "llama-3.3-70b-versatile" +base_url = "https://api.groq.com/openai/v1" +max_tokens = 4096 +temperature = 0.1 +``` + +Reference these as `openrouter.default` or `groq.default` in `default_provider`. 
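The `base_url` values in the OpenAI and OpenAI-compatible examples stop at the API version prefix (for example `.../v1`). OpenAI-style services serve chat completions at the `chat/completions` path under that prefix. A client therefore typically joins the two, as in this sketch. `chat_completions_url` and `DEFAULT_OPENAI_BASE_URL` are hypothetical names, and this guide does not say how g3's OpenAI provider actually builds its request URL.

```rust
/// Default taken from the commented-out `base_url` in the
/// `[providers.openai.default]` example.
const DEFAULT_OPENAI_BASE_URL: &str = "https://api.openai.com/v1";

/// Joins a configured base URL (or the default) with the chat completions
/// path, tolerating a trailing slash on the configured value.
fn chat_completions_url(base_url: Option<&str>) -> String {
    let base = base_url.unwrap_or(DEFAULT_OPENAI_BASE_URL);
    format!("{}/chat/completions", base.trim_end_matches('/'))
}
```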
+ +### Embedded (Local) Models + +```toml +[providers.embedded.default] +model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf" +model_type = "qwen" # Model architecture +context_length = 32768 # Context window size +max_tokens = 2048 # Max output tokens +temperature = 0.1 +gpu_layers = 32 # Layers to offload to GPU (Metal/CUDA) +threads = 8 # CPU threads for inference +``` + +**Supported model types**: `qwen`, `codellama`, `llama`, `mistral` + +**Hardware requirements**: +- 4-16GB RAM depending on model size +- Optional GPU acceleration (Metal on macOS, CUDA on Linux) + +## Agent Configuration + +```toml +[agent] +# Context and token settings +fallback_default_max_tokens = 8192 # Default max tokens if provider doesn't specify +# max_context_length = 200000 # Override context window size for all providers + +# Behavior settings +enable_streaming = true # Stream responses in real-time +allow_multiple_tool_calls = true # Allow multiple tools per response +timeout_seconds = 60 # Request timeout +auto_compact = true # Auto-compact context at 90% + +# Retry settings +max_retry_attempts = 3 # Retries for interactive mode +autonomous_max_retry_attempts = 6 # Retries for autonomous mode + +# TODO management +check_todo_staleness = true # Warn about stale TODO items +``` + +### Retry Behavior + +G3 automatically retries on recoverable errors: +- Rate limits (HTTP 429) +- Network errors +- Server errors (HTTP 5xx) +- Timeouts + +- **Interactive mode** uses `max_retry_attempts` (default: 3) +- **Autonomous mode** uses `autonomous_max_retry_attempts` (default: 6) with longer delays + +## Computer Control Configuration + +```toml +[computer_control] +enabled = false # Set to true to enable +require_confirmation = true # Require confirmation before actions +max_actions_per_second = 5 # Rate limit for safety +``` + +**Required OS permissions**: +- **macOS**: System Preferences → Security & Privacy → Accessibility +- **Linux**: X11 or Wayland access +- **Windows**: Run as
administrator (first time) + +## WebDriver Configuration + +```toml +[webdriver] +enabled = false # Set to true to enable +browser = "safari" # "safari" or "chrome-headless" +safari_port = 4444 # Safari WebDriver port +chrome_port = 9515 # ChromeDriver port +# chrome_binary = "/path/to/chrome" # Optional: Custom Chrome path +``` + +### Safari Setup (macOS) + +```bash +# Enable Safari remote automation (one-time setup) +safaridriver --enable + +# Or via Safari UI: +# Safari → Preferences → Advanced → Show Develop menu +# Develop → Allow Remote Automation +``` + +### Chrome Setup + +**Option 1: Chrome for Testing (Recommended)** +```bash +./scripts/setup-chrome-for-testing.sh +``` + +Then configure: +```toml +[webdriver] +chrome_binary = "/Users/yourname/.chrome-for-testing/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing" +``` + +**Option 2: System Chrome** +```bash +# macOS +brew install chromedriver + +# Linux +apt install chromium-chromedriver +``` + +## macOS Accessibility API Configuration + +```toml +[macax] +enabled = false # Set to true to enable +``` + +**Required permissions**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app + +See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage. + +## Multi-Role Configuration + +For autonomous mode with different models for coach and player: + +```toml +[providers] +default_provider = "anthropic.default" +coach = "anthropic.coach" # Code reviewer +player = "anthropic.player" # Code implementer + +[providers.anthropic.coach] +api_key = "sk-ant-..." +model = "claude-sonnet-4-5" +max_tokens = 32000 +temperature = 0.1 # Lower for consistent reviews + +[providers.anthropic.player] +api_key = "sk-ant-..." +model = "claude-sonnet-4-5" +max_tokens = 64000 +temperature = 0.3 # Higher for creative implementations +``` + +See `config.coach-player.example.toml` for a complete example. 
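The multi-role example routes `coach` and `player` to their own named provider configs, while `default_provider` is described as the provider "used for all operations". The minimal sketch below shows that lookup under two assumptions this guide does not spell out. First, an unset role falls back to `default_provider`. Second, a reference splits at its first dot into provider type and configuration name. `ProviderRoles`, `Role`, and `split_provider_ref` are hypothetical names, not g3-config's API.

```rust
/// Hypothetical mirror of the role keys in the `[providers]` table.
struct ProviderRoles<'a> {
    default_provider: &'a str,
    planner: Option<&'a str>,
    coach: Option<&'a str>,
    player: Option<&'a str>,
}

#[derive(Clone, Copy)]
enum Role {
    Planner,
    Coach,
    Player,
}

impl<'a> ProviderRoles<'a> {
    /// Returns the role-specific provider if configured, otherwise the
    /// default provider (assumed fallback behavior).
    fn for_role(&self, role: Role) -> &'a str {
        let specific = match role {
            Role::Planner => self.planner,
            Role::Coach => self.coach,
            Role::Player => self.player,
        };
        specific.unwrap_or(self.default_provider)
    }
}

/// Splits a reference such as "anthropic.coach" into ("anthropic", "coach"),
/// rejecting references with a missing or empty half.
fn split_provider_ref(reference: &str) -> Option<(&str, &str)> {
    let (provider_type, config_name) = reference.split_once('.')?;
    if provider_type.is_empty() || config_name.is_empty() {
        return None;
    }
    Some((provider_type, config_name))
}
```

With the configuration above, the coach resolves to `anthropic.coach`. Planning mode, which has no `planner` entry, would fall back to `anthropic.default` under the assumed rule.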
+ +## Environment Variables + +Environment variables override configuration file settings: + +| Variable | Description | +|----------|-------------| +| `G3_WORKSPACE_PATH` | Override workspace directory | +| `ANTHROPIC_API_KEY` | Anthropic API key | +| `OPENAI_API_KEY` | OpenAI API key | +| `DATABRICKS_HOST` | Databricks workspace URL | +| `DATABRICKS_TOKEN` | Databricks personal access token | + +## CLI Overrides + +CLI arguments have the highest priority: + +```bash +# Override provider +g3 --provider anthropic.default + +# Override model +g3 --model claude-opus-4-5 + +# Enable features +g3 --webdriver # Enable WebDriver (Safari) +g3 --chrome-headless # Enable WebDriver (Chrome headless) +g3 --macax # Enable macOS Accessibility API + +# Specify config file +g3 --config /path/to/config.toml +``` + +## Complete Example Configuration + +```toml +# ~/.config/g3/config.toml + +[providers] +default_provider = "anthropic.default" + +[providers.anthropic.default] +api_key = "sk-ant-api03-..." +model = "claude-sonnet-4-5" +max_tokens = 64000 +temperature = 0.3 + +[providers.databricks.work] +host = "https://mycompany.cloud.databricks.com" +model = "databricks-claude-sonnet-4" +max_tokens = 4096 +temperature = 0.1 +use_oauth = true + +[agent] +fallback_default_max_tokens = 8192 +enable_streaming = true +allow_multiple_tool_calls = true +timeout_seconds = 60 +max_retry_attempts = 3 +autonomous_max_retry_attempts = 6 + +[computer_control] +enabled = false +require_confirmation = true +max_actions_per_second = 5 + +[webdriver] +enabled = true +browser = "safari" +safari_port = 4444 + +[macax] +enabled = false +``` + +## Troubleshooting + +### "Old config format" error + +If you see this error, your config uses a deprecated format. Update to the new named provider format: + +**Old format** (deprecated): +```toml +[providers.anthropic] +api_key = "..." +``` + +**New format**: +```toml +[providers.anthropic.default] +api_key = "..." 
+``` + +### Provider not found + +Ensure your `default_provider` matches a configured provider: +```toml +default_provider = "anthropic.default" # Must match [providers.anthropic.default] +``` + +### OAuth issues + +For Databricks OAuth: +1. Ensure `use_oauth = true` +2. Remove any `token` setting +3. A browser window will open for authentication +4. Tokens are cached in `~/.databricks/oauth-tokens.json` + +### Context window errors + +If you see context overflow errors: +1. Check `max_context_length` in `[agent]` +2. Use `/compact` command to manually summarize +3. Use `/thinnify` to replace large tool results with file references diff --git a/docs/macax-tools.md b/docs/macax-tools.md new file mode 100644 index 0000000..58d9659 --- /dev/null +++ b/docs/macax-tools.md @@ -0,0 +1,472 @@ +# macOS Accessibility Tools Guide + +**Last updated**: January 2025 +**Source of truth**: `crates/g3-computer-control/src/macax/` + +## Purpose + +G3 includes tools for controlling macOS applications via the Accessibility API. This enables automation of native macOS apps, including those you're building with G3. + +## Overview + +The macOS Accessibility API provides programmatic access to UI elements in any application. G3 exposes this through the `macax_*` tools, allowing you to: + +- List and activate applications +- Inspect UI element hierarchies +- Find elements by role, title, or identifier +- Click buttons and interact with controls +- Read and set values in text fields +- Simulate keyboard input + +## Setup + +### 1. Enable in Configuration + +```toml +# ~/.config/g3/config.toml +[macax] +enabled = true +``` + +Or use the CLI flag: + +```bash +g3 --macax +``` + +### 2. Grant Accessibility Permissions + +1. Open **System Preferences** → **Security & Privacy** → **Privacy** +2. Select **Accessibility** in the left sidebar +3. Click the lock icon and authenticate +4. Add your terminal application (Terminal, iTerm2, etc.) +5. 
Restart your terminal + +**Note**: If using VS Code's integrated terminal, add VS Code to the list. + +### 3. Verify Setup + +```json +{"tool": "macax_list_apps", "args": {}} +``` + +This should return a list of running applications. + +## Available Tools + +### macax_list_apps + +List all running applications. + +**Parameters**: None + +**Example**: +```json +{"tool": "macax_list_apps", "args": {}} +``` + +**Returns**: +``` +Running Applications: +- Safari (com.apple.Safari) +- Finder (com.apple.finder) +- Terminal (com.apple.Terminal) +- MyApp (com.example.myapp) +``` + +--- + +### macax_get_frontmost_app + +Get the currently active (frontmost) application. + +**Parameters**: None + +**Example**: +```json +{"tool": "macax_get_frontmost_app", "args": {}} +``` + +**Returns**: +``` +Frontmost Application: Safari (com.apple.Safari) +``` + +--- + +### macax_activate_app + +Bring an application to the front. + +**Parameters**: +- `app_name` (string, required): Application name + +**Example**: +```json +{"tool": "macax_activate_app", "args": {"app_name": "Safari"}} +``` + +--- + +### macax_get_ui_tree + +Get the UI element hierarchy of an application. + +**Parameters**: +- `app_name` (string, required): Application name +- `max_depth` (integer, optional): Maximum tree depth (default: 5) + +**Example**: +```json +{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}} +``` + +**Returns**: +``` +UI Tree for Calculator: +└── AXApplication "Calculator" + └── AXWindow "Calculator" + ├── AXGroup + │ ├── AXButton "1" [id: digit_1] + │ ├── AXButton "2" [id: digit_2] + │ ├── AXButton "+" [id: add] + │ └── AXButton "=" [id: equals] + └── AXStaticText "0" [id: display] +``` + +**Notes**: +- Use lower `max_depth` for complex apps to avoid overwhelming output +- Elements show role, title, and accessibility identifier (if set) + +--- + +### macax_find_elements + +Find UI elements matching criteria. 
+ +**Parameters**: +- `app_name` (string, required): Application name +- `role` (string, optional): Element role (e.g., "button", "textField") +- `title` (string, optional): Element title/label +- `identifier` (string, optional): Accessibility identifier + +**Example**: +```json +{"tool": "macax_find_elements", "args": { + "app_name": "Safari", + "role": "button" +}} +``` + +**Returns**: +``` +Found 5 elements: +1. AXButton "Back" [id: BackButton] +2. AXButton "Forward" [id: ForwardButton] +3. AXButton "Reload" [id: ReloadButton] +4. AXButton "Share" [id: ShareButton] +5. AXButton "New Tab" [id: NewTabButton] +``` + +--- + +### macax_click + +Click a UI element. + +**Parameters**: +- `app_name` (string, required): Application name +- `identifier` (string, optional): Accessibility identifier +- `title` (string, optional): Element title +- `role` (string, optional): Element role + +At least one of `identifier`, `title`, or `role` must be provided. + +**Examples**: + +```json +// Click by identifier (most reliable) +{"tool": "macax_click", "args": { + "app_name": "Calculator", + "identifier": "digit_5" +}} + +// Click by title +{"tool": "macax_click", "args": { + "app_name": "Calculator", + "title": "5" +}} + +// Click by role and title +{"tool": "macax_click", "args": { + "app_name": "Safari", + "role": "button", + "title": "Reload" +}} +``` + +--- + +### macax_set_value + +Set the value of a UI element (text fields, sliders, etc.). + +**Parameters**: +- `app_name` (string, required): Application name +- `identifier` (string, optional): Accessibility identifier +- `title` (string, optional): Element title +- `value` (string, required): Value to set + +**Example**: +```json +{"tool": "macax_set_value", "args": { + "app_name": "TextEdit", + "role": "textArea", + "value": "Hello, World!" +}} +``` + +--- + +### macax_get_value + +Get the current value of a UI element. 
+
+**Parameters**:
+- `app_name` (string, required): Application name
+- `identifier` (string, optional): Accessibility identifier
+- `title` (string, optional): Element title
+
+**Example**:
+```json
+{"tool": "macax_get_value", "args": {
+  "app_name": "Calculator",
+  "identifier": "display"
+}}
+```
+
+**Returns**:
+```
+Value: 42
+```
+
+---
+
+### macax_press_key
+
+Simulate a key press.
+
+**Parameters**:
+- `key` (string, required): Key to press
+- `modifiers` (array, optional): Modifier keys
+
+**Supported modifiers**: `command`, `shift`, `option`, `control`
+
+**Examples**:
+
+```json
+// Simple key press
+{"tool": "macax_press_key", "args": {"key": "a"}}
+
+// With modifiers (Cmd+S)
+{"tool": "macax_press_key", "args": {
+  "key": "s",
+  "modifiers": ["command"]
+}}
+
+// Multiple modifiers (Cmd+Shift+N)
+{"tool": "macax_press_key", "args": {
+  "key": "n",
+  "modifiers": ["command", "shift"]
+}}
+
+// Special keys
+{"tool": "macax_press_key", "args": {"key": "return"}}
+{"tool": "macax_press_key", "args": {"key": "escape"}}
+{"tool": "macax_press_key", "args": {"key": "tab"}}
+{"tool": "macax_press_key", "args": {"key": "delete"}}
+```
+
+**Special key names**:
+- `return`, `enter`
+- `escape`, `esc`
+- `tab`
+- `delete`, `backspace`
+- `space`
+- `up`, `down`, `left`, `right`
+- `home`, `end`, `pageup`, `pagedown`
+- `f1` through `f12`
+
+## Common Roles
+
+| Role | Description |
+|------|-------------|
+| `button` | Clickable button |
+| `textField` | Single-line text input |
+| `textArea` | Multi-line text input |
+| `checkbox` | Checkbox control |
+| `radioButton` | Radio button |
+| `popUpButton` | Dropdown/popup menu |
+| `slider` | Slider control |
+| `table` | Table view |
+| `list` | List view |
+| `outline` | Outline/tree view |
+| `group` | Container group |
+| `window` | Application window |
+| `sheet` | Modal sheet |
+| `dialog` | Dialog window |
+| `staticText` | Non-editable text |
+| `image` | Image element |
+| `scrollArea` | Scrollable container |
+| `toolbar` | Toolbar |
+| `menuBar` | Menu bar |
+| `menu` | Menu |
+| `menuItem` | Menu item |
+
+## Best Practices
+
+### 1. Use Accessibility Identifiers
+
+When building apps you'll automate with G3, add accessibility identifiers:
+
+**SwiftUI**:
+```swift
+Button("Submit") { ... }
+    .accessibilityIdentifier("submit_button")
+```
+
+**UIKit**:
+```swift
+button.accessibilityIdentifier = "submit_button"
+```
+
+**AppKit**:
+```swift
+button.setAccessibilityIdentifier("submit_button")
+```
+
+Identifiers are more reliable than titles (which may be localized).
+
+### 2. Inspect Before Automating
+
+Always inspect the UI tree first:
+
+```json
+{"tool": "macax_get_ui_tree", "args": {"app_name": "MyApp", "max_depth": 4}}
+```
+
+This helps you understand:
+- Element hierarchy
+- Available identifiers
+- Correct role names
+
+### 3. Activate App First
+
+Some actions require the app to be frontmost:
+
+```json
+{"tool": "macax_activate_app", "args": {"app_name": "MyApp"}}
+{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "button1"}}
+```
+
+### 4. Handle Timing
+
+UI updates may take time. If an element isn't found:
+1. Wait briefly
+2. Retry the operation
+3. Check whether the app state has changed
+
+### 5. Prefer Identifiers Over Titles
+
+```json
+// Good: Uses identifier
+{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "save_btn"}}
+
+// Less reliable: Uses title (may be localized)
+{"tool": "macax_click", "args": {"app_name": "MyApp", "title": "Save"}}
+```
+
+## Example: Automating Calculator
+
+```json
+// 1. Activate Calculator
+{"tool": "macax_activate_app", "args": {"app_name": "Calculator"}}
+
+// 2. Inspect UI
+{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
+
+// 3. Click "5"
+{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "5"}}
+
+// 4. Click "+"
+{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "+"}}
+
+// 5. Click "3"
+{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "3"}}
+
+// 6. Click "="
+{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "="}}
+
+// 7. Read result
+{"tool": "macax_get_value", "args": {"app_name": "Calculator", "role": "staticText"}}
+```
+
+## Troubleshooting
+
+### "Accessibility permission denied"
+
+1. Open System Settings → Privacy & Security → Accessibility (on macOS 12 and earlier: System Preferences → Security & Privacy → Accessibility)
+2. Ensure your terminal app is listed and checked
+3. Restart the terminal after granting permission
+
+### "Application not found"
+
+1. Use the exact app name (case-sensitive)
+2. Run `macax_list_apps` to see available apps
+3. Make sure the app is running
+
+### "Element not found"
+
+1. Inspect the UI tree to verify the element exists
+2. Check identifier/title spelling
+3. The element may be in a different window or sheet
+4. The app state may have changed
+
+### "Cannot perform action"
+
+1. The element may be disabled
+2. The app may need to be frontmost
+3. The element may not support the action
+4. Check that the element's role supports the operation
+
+### Slow Performance
+
+1. Reduce `max_depth` in `macax_get_ui_tree`
+2. Use specific identifiers instead of searching
+3. Expect slower inspection in complex apps, which have large UI trees
+
+## Comparison with Other Tools
+
+| Feature | macax | Vision Tools | WebDriver |
+|---------|-------|--------------|----------|
+| Native apps | ✅ | ✅ (via OCR) | ❌ |
+| Web browsers | ✅ | ✅ | ✅ |
+| Electron apps | ✅ | ✅ | Partial |
+| Reliability | High | Medium | High |
+| Setup | Permissions | None | Driver |
+| Speed | Fast | Slower | Medium |
+
+**Use macax when**:
+- Automating native macOS apps
+- You control the app and can add identifiers
+- You need reliable, fast automation
+
+**Use Vision tools when**:
+- The app doesn't expose accessibility information
+- You need to find text visually
+- A cross-platform approach is needed
+
+**Use WebDriver when**:
+- Automating web content
+- You need JavaScript execution
+- Testing web applications
diff --git a/docs/providers.md b/docs/providers.md
new file mode 100644
index 0000000..5e0886e
--- /dev/null
+++ b/docs/providers.md
@@ -0,0 +1,408 @@
+# G3 LLM Providers Guide
+
+**Last updated**: January 2025
+**Source of truth**: `crates/g3-providers/src/`
+
+## Purpose
+
+This document describes the LLM providers supported by G3, their capabilities, and how to choose between them.
+
+## Provider Overview
+
+| Provider | Type | Tool Calling | Cache Control | Context Window | Best For |
+|----------|------|--------------|---------------|----------------|----------|
+| **Anthropic** | Cloud | Native | Yes | 200k (1M optional) | General use, complex tasks |
+| **Databricks** | Cloud | Native | Yes (Claude models) | Varies | Enterprise, existing Databricks users |
+| **OpenAI** | Cloud | Native | No | 128k | GPT model preference |
+| **OpenAI-Compatible** | Cloud | Native | No | Varies | OpenRouter, Groq, Together, etc. |
+| **Embedded** | Local | JSON fallback | No | 4k-32k | Privacy, offline, cost savings |
+
+## Anthropic
+
+**Location**: `crates/g3-providers/src/anthropic.rs`
+
+### Features
+
+- **Native tool calling**: Full support for structured tool calls
+- **Prompt caching**: Reduce costs with ephemeral caching
+- **Extended context**: Optional 1M token context (additional cost)
+- **Extended thinking**: Budget tokens for complex reasoning
+- **Streaming**: Real-time response streaming
+
+### Configuration
+
+```toml
+[providers.anthropic.default]
+api_key = "sk-ant-api03-..." # Required
+model = "claude-sonnet-4-5" # Model name
+max_tokens = 64000 # Max output tokens
+temperature = 0.3 # 0.0-1.0
+cache_config = "ephemeral" # Optional: Enable caching
+enable_1m_context = true # Optional: 1M context
+thinking_budget_tokens = 10000 # Optional: Extended thinking
+```
+
+### Available Models
+
+| Model | Context | Best For |
+|-------|---------|----------|
+| `claude-sonnet-4-5` | 200k | Balanced performance/cost |
+| `claude-opus-4-5` | 200k | Complex reasoning |
+| `claude-3-5-sonnet-20241022` | 200k | Previous generation |
+| `claude-3-opus-20240229` | 200k | Previous generation |
+
+### Prompt Caching
+
+Enable caching to reduce costs for repeated context:
+
+```toml
+cache_config = "ephemeral" # Short-lived cache (Anthropic's default TTL is 5 minutes, refreshed on each hit)
+```
+
+Caching is applied to:
+- System prompts
+- README/AGENTS.md content
+- Large tool results
+
+### Extended Thinking
+
+For complex tasks requiring step-by-step reasoning:
+
+```toml
+thinking_budget_tokens = 10000 # Tokens for internal reasoning
+```
+
+The model can spend up to this many tokens on internal reasoning before it writes its response.
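The sketch below shows one way the Anthropic settings above could be modeled and checked in Rust. `AnthropicSettings`, `context_window`, and `validate` are hypothetical names, not G3's config types. The 200k and 1M windows come from the tables above. The rule that a thinking budget must be at least 1,024 tokens and smaller than `max_tokens` is an Anthropic API constraint, not something this guide states, so treat that check as an assumption about what G3 ought to enforce.

```rust
// Hypothetical mirror of the `[providers.anthropic.default]` TOML table.
// Field names follow the documented keys; this is not G3's actual config type.
#[derive(Debug, Clone)]
pub struct AnthropicSettings {
    pub model: String,
    pub max_tokens: u32,
    pub temperature: f32,
    pub enable_1m_context: bool,
    pub thinking_budget_tokens: Option<u32>,
}

impl AnthropicSettings {
    /// Context window implied by the docs: 200k by default, 1M when enabled.
    pub fn context_window(&self) -> u32 {
        if self.enable_1m_context { 1_000_000 } else { 200_000 }
    }

    /// Checks the documented 0.0-1.0 temperature range and the Anthropic API
    /// rule that a thinking budget is >= 1024 and strictly below `max_tokens`.
    pub fn validate(&self) -> Result<(), String> {
        if !(0.0..=1.0).contains(&self.temperature) {
            return Err(format!("temperature {} is outside 0.0-1.0", self.temperature));
        }
        if let Some(budget) = self.thinking_budget_tokens {
            if budget < 1024 {
                return Err(format!("thinking budget {budget} is below 1024"));
            }
            if budget >= self.max_tokens {
                return Err(format!(
                    "thinking budget {budget} must be less than max_tokens {}",
                    self.max_tokens
                ));
            }
        }
        Ok(())
    }
}

fn main() {
    // Values taken from the example configuration above.
    let settings = AnthropicSettings {
        model: "claude-sonnet-4-5".to_string(),
        max_tokens: 64_000,
        temperature: 0.3,
        enable_1m_context: true,
        thinking_budget_tokens: Some(10_000),
    };
    println!("{} window: {} tokens", settings.model, settings.context_window());
    println!("valid: {:?}", settings.validate());
}
```

Returning `Result<(), String>` keeps the sketch dependency-free; a real config loader would more likely use a dedicated error type.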
+
+---
+
+## Databricks
+
+**Location**: `crates/g3-providers/src/databricks.rs`
+
+### Features
+
+- **Foundation Model APIs**: Access to various models
+- **OAuth authentication**: Secure browser-based auth
+- **Token authentication**: Personal access tokens
+- **Enterprise integration**: Works with existing Databricks setup
+
+### Configuration
+
+```toml
+[providers.databricks.default]
+host = "https://your-workspace.cloud.databricks.com"
+model = "databricks-claude-sonnet-4"
+max_tokens = 4096
+temperature = 0.1
+use_oauth = true # Recommended
+# token = "dapi..." # Alternative: PAT
+```
+
+### Authentication
+
+**OAuth (Recommended)**:
+1. Set `use_oauth = true`
+2. On first run, a browser window opens for authentication
+3. Tokens are cached in `~/.databricks/oauth-tokens.json`
+4. Tokens refresh automatically
+
+**Personal Access Token**:
+1. Generate a token in your Databricks workspace
+2. Set `token = "dapi..."` and `use_oauth = false`
+
+### Available Models
+
+Models depend on your Databricks workspace configuration:
+- `databricks-claude-sonnet-4` (Claude via Databricks)
+- `databricks-meta-llama-3-1-70b-instruct`
+- `databricks-dbrx-instruct`
+- Custom fine-tuned models
+
+---
+
+## OpenAI
+
+**Location**: `crates/g3-providers/src/openai.rs`
+
+### Features
+
+- **Native tool calling**: Full support
+- **Custom endpoints**: Override base URL
+- **Streaming**: Real-time responses
+
+### Configuration
+
+```toml
+[providers.openai.default]
+api_key = "sk-..." # Required
+model = "gpt-4-turbo" # Model name
+max_tokens = 4096
+temperature = 0.1
+# base_url = "https://api.openai.com/v1" # Optional
+```
+
+### Available Models
+
+| Model | Context | Notes |
+|-------|---------|-------|
+| `gpt-4-turbo` | 128k | GPT-4 Turbo |
+| `gpt-4o` | 128k | Multimodal; faster and cheaper than GPT-4 Turbo |
+| `gpt-4` | 8k | Original GPT-4 |
+| `gpt-3.5-turbo` | 16k | Faster, cheaper |
+
+---
+
+## OpenAI-Compatible Providers
+
+**Location**: `crates/g3-providers/src/openai.rs` (reuses OpenAI implementation)
+
+For services that implement the OpenAI API format.
+
+### Configuration
+
+```toml
+# OpenRouter
+[providers.openai_compatible.openrouter]
+api_key = "sk-or-..."
+model = "anthropic/claude-3.5-sonnet"
+base_url = "https://openrouter.ai/api/v1"
+max_tokens = 4096
+temperature = 0.1
+
+# Groq
+[providers.openai_compatible.groq]
+api_key = "gsk_..."
+model = "llama-3.3-70b-versatile"
+base_url = "https://api.groq.com/openai/v1"
+max_tokens = 4096
+temperature = 0.1
+
+# Together
+[providers.openai_compatible.together]
+api_key = "..."
+model = "meta-llama/Llama-3-70b-chat-hf"
+base_url = "https://api.together.xyz/v1"
+max_tokens = 4096
+temperature = 0.1
+```
+
+### Supported Services
+
+- **OpenRouter**: Access to many models through one API
+- **Groq**: Fast inference for Llama models
+- **Together**: Open-source model hosting
+- **Anyscale**: Scalable model serving
+- **Local servers**: Ollama, vLLM, text-generation-inference
+
+---
+
+## Embedded (Local Models)
+
+**Location**: `crates/g3-providers/src/embedded.rs`
+
+### Features
+
+- **Completely local**: No data leaves your machine
+- **Offline capable**: Works without internet
+- **GPU acceleration**: Metal (macOS), CUDA (Linux)
+- **No API costs**: Free after model download
+
+### Configuration
+
+```toml
+[providers.embedded.default]
+model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q4_k_m.gguf"
+model_type = "qwen" # Model architecture
+context_length = 32768 # Context window
+max_tokens = 2048 # Max output
+temperature = 0.1
+gpu_layers = 32 # GPU offload (0 = CPU only)
+threads = 8 # CPU threads
+```
+
+### Supported Model Types
+
+| Type | Models | Notes |
+|------|--------|-------|
+| `qwen` | Qwen 2.5 series | Good coding ability |
+| `codellama` | Code Llama | Specialized for code |
+| `llama` | Llama 2/3 | General purpose |
+| `mistral` | Mistral/Mixtral | Efficient |
+
+### Model Download
+
+Download GGUF models from Hugging Face:
+
+```bash
+mkdir -p ~/.cache/g3/models
+cd ~/.cache/g3/models
+
+# Example: Qwen 2.5 7B
+wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
+```
+
+### Hardware Requirements
+
+| Model Size | RAM Required | GPU VRAM | Notes |
+|------------|--------------|----------|-------|
+| 7B Q4 | 6GB | 4GB | Good for most tasks |
+| 7B Q8 | 10GB | 8GB | Better quality |
+| 13B Q4 | 10GB | 8GB | More capable |
+| 70B Q4 | 48GB | 40GB | Requires high-end hardware |
+
+### GPU Acceleration
+
+**macOS (Metal)**:
+```toml
+gpu_layers = 32 # Offload layers to GPU
+```
+
+**Linux (CUDA)**:
+Requires the CUDA toolkit to be installed.
+
+**CPU Only**:
+```toml
+gpu_layers = 0
+threads = 8 # Use more threads
+```
+
+### Tool Calling
+
+Embedded models don't have native tool calling, so G3 uses a JSON fallback:
+1. The system prompt includes tool definitions as JSON
+2. The model outputs tool calls as JSON in its response
+3. G3 parses the JSON and executes the tools
+
+This works but is less reliable than native tool calling.
+
+---
+
+## Provider Selection Guide
+
+### By Use Case
+
+| Use Case | Recommended Provider |
+|----------|---------------------|
+| General coding tasks | Anthropic (Claude Sonnet) |
+| Complex reasoning | Anthropic (Claude Opus) |
+| Enterprise/compliance | Databricks |
+| Cost-sensitive | Embedded or Groq |
+| Privacy-critical | Embedded |
+| Offline development | Embedded |
+| Fast iteration | Groq (Llama) |
+| Model variety | OpenRouter |
+
+### By Priority
+
+**Quality first**: Anthropic Claude Opus/Sonnet
+- Best reasoning and coding ability
+- Native tool calling
+- Prompt caching for efficiency
+
+**Cost first**: Embedded or OpenAI-compatible
+- Embedded: Free after download
+- Groq: Very cheap, fast
+- OpenRouter: Pay-per-use, many options
+
+**Privacy first**: Embedded
+- Data never leaves your machine
+- No API calls
+- Full control
+
+**Speed first**: Groq or Embedded with GPU
+- Groq: Extremely fast inference
+- Embedded with Metal/CUDA: Low latency
+
+---
+
+## Provider Trait
+
+All providers implement the `LLMProvider` trait:
+
+```rust
+#[async_trait]
+pub trait LLMProvider: Send + Sync {
+    /// Generate a completion
+    async fn complete(&self, request: CompletionRequest) -> Result;
+
+    /// Stream a completion
+    async fn stream(&self, request: CompletionRequest) -> Result;
+
+    /// Provider name (e.g., "anthropic.default")
+    fn name(&self) -> &str;
+
+    /// Model name (e.g., "claude-sonnet-4-5")
+    fn model(&self) -> &str;
+
+    /// Whether provider supports native tool calling
+    fn has_native_tool_calling(&self) -> bool;
+
+    /// Whether provider supports cache control
+    fn supports_cache_control(&self) -> bool;
+
+    /// Configured max tokens
+    fn max_tokens(&self) -> u32;
+
+    /// Configured temperature
+    fn temperature(&self) -> f32;
+}
+```
+
+---
+
+## Adding a New Provider
+
+1. Create `crates/g3-providers/src/newprovider.rs`
+2. Implement the `LLMProvider` trait
+3. Add a configuration struct to `crates/g3-config/src/lib.rs`
+4. Register in `crates/g3-core/src/lib.rs` (`new_with_mode_and_readme`)
+5. Export from `crates/g3-providers/src/lib.rs`
+6. Update documentation
+
+---
+
+## Troubleshooting
+
+### Authentication Errors
+
+**Anthropic**: Verify that the API key starts with `sk-ant-`
+
+**Databricks OAuth**:
+- Delete `~/.databricks/oauth-tokens.json` and re-authenticate
+- Ensure the workspace URL is correct
+
+**OpenAI**: Verify the API key and check billing status
+
+### Rate Limits
+
+G3 automatically retries on rate limits with exponential backoff.
+
+To reduce rate limit issues:
+- Use prompt caching (Anthropic)
+- Reduce `max_tokens`
+- Use a provider with higher limits
+
+### Context Window Errors
+
+If you see "context too long" errors:
+1. Use `/compact` to summarize the conversation
+2. Use `/thinnify` to replace large tool results
+3. Increase `max_context_length` in config
+4. Switch to a provider with a larger context window
+
+### Embedded Model Issues
+
+**Model not loading**:
+- Verify `model_path` is correct
+- Check file permissions
+- Ensure enough RAM
+
+**Slow inference**:
+- Increase `gpu_layers` for GPU offload
+- Reduce `context_length`
+- Use a lower-bit quantization (e.g., Q4 instead of Q8)
+
+**Poor tool calling**:
+- Embedded models use the JSON fallback
+- Consider a cloud provider for complex tool use
diff --git a/docs/tools.md b/docs/tools.md
new file mode 100644
index 0000000..6700ca0
--- /dev/null
+++ b/docs/tools.md
@@ -0,0 +1,538 @@
+# G3 Tools Reference
+
+**Last updated**: January 2025
+**Source of truth**: `crates/g3-core/src/tool_definitions.rs`, `crates/g3-core/src/tools/`
+
+## Purpose
+
+This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.
+
+## Tool Categories
+
+| Category | Tools | Enabled By |
+|----------|-------|------------|
+| **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
+| **Images** | read_image, take_screenshot, extract_text | Always |
+| **Task Management** | todo_read, todo_write | Always |
+| **Code Intelligence** | code_search, code_coverage | Always |
+| **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` |
+| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
+| **macOS Accessibility** | macax_* (9 tools) | `--macax` |
+| **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |
+
+---
+
+## Core Tools
+
+### shell
+
+Execute shell commands.
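As a rough illustration of what the shell tool does with a command (run it in the working directory, capture stdout and stderr, report the exit code), here is a minimal std-only Rust sketch. It assumes commands are passed to `sh -c`, which this reference does not specify. It also waits for the process to finish instead of streaming output in real time as G3 does. `ShellOutput` and `run_shell` are hypothetical names.

```rust
use std::path::Path;
use std::process::Command;

/// Captured result of one shell invocation (hypothetical type, not G3's own).
#[derive(Debug)]
pub struct ShellOutput {
    pub stdout: String,
    pub stderr: String,
    pub exit_code: Option<i32>, // None if the process was terminated by a signal
}

/// Runs `command` through `sh -c` in `working_dir` and captures both streams.
/// Unlike G3's tool, this blocks until the command exits instead of streaming.
pub fn run_shell(command: &str, working_dir: &Path) -> std::io::Result<ShellOutput> {
    let output = Command::new("sh")
        .arg("-c")
        .arg(command)
        .current_dir(working_dir)
        .output()?;
    Ok(ShellOutput {
        // Lossy decoding keeps the sketch simple when output is not valid UTF-8.
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
        exit_code: output.status.code(),
    })
}

fn main() -> std::io::Result<()> {
    let result = run_shell("echo hello; echo oops >&2; exit 3", Path::new("."))?;
    print!("stdout: {}", result.stdout);
    print!("stderr: {}", result.stderr);
    println!("exit code: {:?}", result.exit_code);
    Ok(())
}
```

A non-zero exit code is returned as data rather than as an `Err`, matching the documented behavior of reporting the exit code back to the agent.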
+
+**Parameters**:
+- `command` (string, required): The shell command to execute
+
+**Example**:
+```json
+{"tool": "shell", "args": {"command": "ls -la"}}
+```
+
+**Notes**:
+- Commands run in the current working directory
+- Output is streamed in real-time
+- Both stdout and stderr are captured
+- Exit code is reported
+
+---
+
+### background_process
+
+Launch a long-running process in the background.
+
+**Parameters**:
+- `name` (string, required): Unique name for the process (e.g., "game_server")
+- `command` (string, required): Shell command to execute
+- `working_dir` (string, optional): Working directory
+
+**Example**:
+```json
+{"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}
+```
+
+**Returns**: PID and log file path
+
+**Notes**:
+- Process runs independently of the agent
+- Logs are captured to a file
+- Use `shell` to read logs (`tail`), check status (`ps`), or stop (`kill`)
+
+---
+
+### read_file
+
+Read file contents with optional character range.
+
+**Parameters**:
+- `file_path` (string, required): Path to the file
+- `start` (integer, optional): Starting character position (0-indexed, inclusive)
+- `end` (integer, optional): Ending character position (0-indexed, exclusive)
+
+**Example**:
+```json
+{"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}
+```
+
+**Notes**:
+- For image files (png, jpg, gif, etc.), automatically extracts text using OCR
+- Supports tilde expansion (`~`)
+- Reports file size and line count
+
+---
+
+### read_image
+
+Read image files for visual analysis by the LLM.
+
+**Parameters**:
+- `file_paths` (array of strings, required): Paths to image files
+
+**Example**:
+```json
+{"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}
+```
+
+**Supported formats**: PNG, JPEG, GIF, WebP
+
+**Notes**:
+- Images are sent to the LLM for visual analysis
+- Use for inspecting sprites, UI screenshots, diagrams, etc.
+- Unlike `extract_text`, which only performs OCR
+
+---
+
+### write_file
+
+Create or overwrite a file.
+
+**Parameters**:
+- `file_path` (string, required): Path to the file
+- `content` (string, required): Content to write
+
+**Example**:
+```json
+{"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}
+```
+
+**Notes**:
+- Creates parent directories if needed
+- Overwrites existing files
+- Reports bytes written
+
+---
+
+### str_replace
+
+Apply a unified diff to a file.
+
+**Parameters**:
+- `file_path` (string, required): Path to the file
+- `diff` (string, required): Unified diff with context lines
+- `start` (integer, optional): Starting character position to constrain search
+- `end` (integer, optional): Ending character position to constrain search
+
+**Example**:
+```json
+{"tool": "str_replace", "args": {
+  "file_path": "src/main.rs",
+  "diff": "@@ -10,3 +10,4 @@\n fn main() {\n     println!(\"Hello\");\n+    println!(\"World\");\n }"
+}}
+```
+
+**Notes**:
+- Supports multiple hunks
+- Context lines help locate the correct position
+- Use `start`/`end` to disambiguate when multiple matches exist
+- `---/+++` headers are optional for minimal diffs
+
+---
+
+### final_output
+
+Signal task completion with a summary.
+
+**Parameters**:
+- `summary` (string, required): Markdown summary of what was accomplished
+
+**Example**:
+```json
+{"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}
+```
+
+**Notes**:
+- Ends the current task
+- Summary is displayed to the user
+- In autonomous mode, triggers coach review
+
+---
+
+## Image & Screenshot Tools
+
+### take_screenshot
+
+Capture a screenshot of an application window.
+
+**Parameters**:
+- `path` (string, required): Filename for the screenshot
+- `window_id` (string, required): Application name (e.g., "Safari", "Terminal")
+- `region` (object, optional): `{x, y, width, height}` to capture a region
+
+**Example**:
+```json
+{"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}
+```
+
+**Notes**:
+- Use `list_windows` first to identify available windows
+- Relative paths save to `~/tmp` or `$TMPDIR`
+- Uses the native `screencapture` utility on macOS
+
+---
+
+### extract_text
+
+Extract text from an image using OCR.
+
+**Parameters**:
+- `path` (string, optional): Path to image file
+
+**Example**:
+```json
+{"tool": "extract_text", "args": {"path": "screenshot.png"}}
+```
+
+**Notes**:
+- Uses Tesseract OCR or the Apple Vision framework
+- For window-based OCR, use `vision_find_text` instead
+
+---
+
+## Task Management Tools
+
+### todo_read
+
+Read the current TODO list.
+
+**Parameters**: None
+
+**Example**:
+```json
+{"tool": "todo_read", "args": {}}
+```
+
+**Notes**:
+- TODO lists are session-scoped
+- Stored in `.g3/sessions//todo.g3.md`
+- Call at the start of multi-step tasks to check for existing plans
+
+---
+
+### todo_write
+
+Create or update the TODO list.
+
+**Parameters**:
+- `content` (string, required): TODO list content in markdown checkbox format
+
+**Example**:
+```json
+{"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n  - [ ] Write tests\n  - [ ] Update docs\n- [x] Setup project"}}
+```
+
+**Notes**:
+- Replaces the entire file content
+- Always call `todo_read` first to preserve existing content
+- Use `- [ ]` for incomplete, `- [x]` for complete
+- Supports nested tasks with indentation
+
+---
+
+## Code Intelligence Tools
+
+### code_search
+
+Syntax-aware code search using tree-sitter.
+
+**Parameters**:
+- `searches` (array, required): Array of search objects:
+  - `name` (string): Label for this search
+  - `query` (string): Tree-sitter query in S-expression format
+  - `language` (string): Programming language
+  - `paths` (array, optional): Paths to search
+  - `context_lines` (integer, optional): Lines of context (0-20)
+- `max_concurrency` (integer, optional): Parallel searches (default: 4)
+- `max_matches_per_search` (integer, optional): Max matches (default: 500)
+
+**Supported languages**: rust, python, javascript, typescript, go, java, c, cpp, kotlin
+
+**Example**:
+```json
+{"tool": "code_search", "args": {
+  "searches": [{
+    "name": "functions",
+    "query": "(function_item name: (identifier) @name)",
+    "language": "rust",
+    "context_lines": 2
+  }]
+}}
+```
+
+See [Code Search Guide](CODE_SEARCH.md) for detailed query patterns.
+
+---
+
+### code_coverage
+
+Generate a code coverage report using `cargo llvm-cov`.
+
+**Parameters**: None
+
+**Example**:
+```json
+{"tool": "code_coverage", "args": {}}
+```
+
+**Notes**:
+- Runs all tests with coverage instrumentation
+- Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
+- Returns a summary of coverage statistics
+
+---
+
+## WebDriver Tools
+
+Enabled with `--webdriver` (Safari) or `--chrome-headless` (Chrome).
+
+### webdriver_start
+
+Start a browser session.
+
+**Example**:
+```json
+{"tool": "webdriver_start", "args": {}}
+```
+
+### webdriver_navigate
+
+Navigate to a URL.
+
+**Parameters**:
+- `url` (string, required): URL with protocol (e.g., `https://`)
+
+### webdriver_get_url / webdriver_get_title
+
+Get the current URL or page title.
+
+### webdriver_find_element / webdriver_find_elements
+
+Find element(s) by CSS selector.
+
+**Parameters**:
+- `selector` (string, required): CSS selector
+
+### webdriver_click
+
+Click an element.
+
+**Parameters**:
+- `selector` (string, required): CSS selector
+
+### webdriver_send_keys
+
+Type text into an input.
+
+**Parameters**:
+- `selector` (string, required): CSS selector
+- `text` (string, required): Text to type
+- `clear_first` (boolean, optional): Clear before typing (default: true)
+
+### webdriver_execute_script
+
+Execute JavaScript.
+
+**Parameters**:
+- `script` (string, required): JavaScript code (use `return` to return values)
+
+### webdriver_get_page_source
+
+Get rendered HTML.
+
+**Parameters**:
+- `max_length` (integer, optional): Max chars to return (default: 10000, 0 for no limit)
+- `save_to_file` (string, optional): Save to file instead of returning inline
+
+### webdriver_screenshot
+
+Take a browser screenshot.
+
+**Parameters**:
+- `path` (string, required): Save path
+
+### webdriver_back / webdriver_forward / webdriver_refresh
+
+Navigation controls.
+
+### webdriver_quit
+
+Close the browser and end the session.
+
+---
+
+## Vision Tools (macOS)
+
+These tools use Apple's Vision framework for text recognition.
+
+### vision_find_text
+
+Find text in an application window.
+
+**Parameters**:
+- `app_name` (string, required): Application name
+- `text` (string, required): Text to search for
+
+**Returns**: Bounding box coordinates and confidence score
+
+### vision_click_text
+
+Find and click on text.
+
+**Parameters**:
+- `app_name` (string, required): Application name
+- `text` (string, required): Text to click
+
+### vision_click_near_text
+
+Click near a text label (useful for form fields).
+
+**Parameters**:
+- `app_name` (string, required): Application name
+- `text` (string, required): Label text to find
+- `direction` (string, optional): "right", "below", "left", "above" (default: "right")
+- `distance` (integer, optional): Pixels from text (default: 50)
+
+---
+
+## macOS Accessibility Tools
+
+Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).
+
+### macax_list_apps
+
+List running applications.
+
+### macax_get_frontmost_app
+
+Get the frontmost application.
+
+### macax_activate_app
+
+Bring an application to the front.
+
+**Parameters**:
+- `app_name` (string, required): Application name
+
+### macax_get_ui_tree
+
+Get the UI element hierarchy.
+
+**Parameters**:
+- `app_name` (string, required): Application name
+- `max_depth` (integer, optional): Tree depth limit
+
+### macax_find_elements
+
+Find UI elements by criteria.
+
+**Parameters**:
+- `app_name` (string, required): Application name
+- `role` (string, optional): Element role (button, textField, etc.)
+- `title` (string, optional): Element title
+- `identifier` (string, optional): Accessibility identifier
+
+### macax_click
+
+Click a UI element.
+
+**Parameters**:
+- `app_name` (string, required): Application name
+- `identifier`, `title`, or `role`: Element selector
+
+### macax_set_value / macax_get_value
+
+Set or get an element's value.
+
+### macax_press_key
+
+Simulate a key press.
+
+**Parameters**:
+- `key` (string, required): Key to press
+- `modifiers` (array, optional): Any of "command", "shift", "option", "control"
+
+---
+
+## Computer Control Tools
+
+Enabled with `computer_control.enabled = true` in config.
+
+### mouse_click
+
+Click at coordinates.
+
+**Parameters**:
+- `x` (integer, required): X coordinate
+- `y` (integer, required): Y coordinate
+- `button` (string, optional): "left", "right", "middle"
+
+### type_text
+
+Type text at the current cursor position.
+
+**Parameters**:
+- `text` (string, required): Text to type
+
+### find_element
+
+Find a UI element by text, role, or attributes.
+
+### list_windows
+
+List all open windows with IDs and titles.
+
+---
+
+## Tool Execution Notes
+
+### Duplicate Detection
+
+G3 prevents accidental duplicate tool calls:
+- Only identical calls made back-to-back are blocked
+- Text between tool calls resets detection
+- Tools can be reused throughout a session
+
+### Error Handling
+
+Tool errors are reported back to the agent, which can:
+- Retry with different parameters
+- Try an alternative approach
+- Report the issue to the user
+
+### Working Directory
+
+Tools execute in:
+1. The directory specified by `--codebase-fast-start`, if provided
+2. The current working directory otherwise
+
+### File Paths
+
+- Tilde expansion (`~`) is supported
+- Relative paths resolve against the working directory
+- Screenshots default to `~/tmp` or `$TMPDIR`
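The first two path rules can be sketched in a few lines of std-only Rust. This is a minimal sketch, not G3's implementation: `resolve_tool_path` is a hypothetical helper, only `~` and `~/...` are expanded (no `~user` forms), and `home` is passed in rather than read from `$HOME` so the behavior is easy to test. The screenshot default directory is not modeled.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: expands a leading `~` to `home`, then joins
/// relative paths onto `working_dir`. Absolute paths pass through unchanged.
pub fn resolve_tool_path(input: &str, home: &Path, working_dir: &Path) -> PathBuf {
    let expanded = if input == "~" {
        home.to_path_buf()
    } else if let Some(rest) = input.strip_prefix("~/") {
        home.join(rest)
    } else {
        PathBuf::from(input)
    };
    if expanded.is_absolute() {
        expanded
    } else {
        working_dir.join(expanded)
    }
}

fn main() {
    let home = Path::new("/home/alice");
    let cwd = Path::new("/work/project"); // e.g. the --codebase-fast-start directory
    for input in ["~/notes.md", "src/main.rs", "/etc/hosts", "~"] {
        println!("{input} -> {}", resolve_tool_path(input, home, cwd).display());
    }
}
```

Expanding the tilde before the absolute/relative check matters: `~/notes.md` is relative as written, so checking first would wrongly join it onto the working directory.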