lamport run

This commit is contained in:
Dhanji R. Prasanna
2026-01-03 16:48:30 +11:00
parent f4a1bf5e93
commit f7e2f38fe9
10 changed files with 3444 additions and 0 deletions

docs/CODE_SEARCH.md Normal file

@@ -0,0 +1,430 @@
# G3 Code Search Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-core/src/code_search/`, `crates/g3-core/src/tool_definitions.rs`
## Purpose
G3 includes a syntax-aware code search tool powered by tree-sitter. Unlike text-based search (grep), it understands code structure and finds actual functions, classes, methods, and other constructs—ignoring matches in comments and strings.
## Why Use Code Search?
| Feature | grep/ripgrep | code_search |
|---------|--------------|-------------|
| Finds text in comments | ✅ | ❌ |
| Finds text in strings | ✅ | ❌ |
| Understands code structure | ❌ | ✅ |
| Finds function definitions | Regex needed | Native |
| Finds class hierarchies | ❌ | ✅ |
| Language-aware | ❌ | ✅ |
**Use code_search when**:
- Finding function/method definitions
- Finding class/struct declarations
- Searching for specific code constructs
- You need accurate results without false positives
**Use grep when**:
- Searching non-code files (logs, markdown)
- Simple string searches
- Searching comments or documentation
- Matching text patterns with regex
## Supported Languages
- Rust
- Python
- JavaScript
- TypeScript
- Go
- Java
- C
- C++
- Kotlin
## Basic Usage
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "my_search",
"query": "(function_item name: (identifier) @name)",
"language": "rust"
}]
}}
```
### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `searches` | array | Yes | Array of search objects (max 20) |
| `max_concurrency` | integer | No | Parallel searches (default: 4) |
| `max_matches_per_search` | integer | No | Max matches (default: 500) |
### Search Object
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Label for this search |
| `query` | string | Yes | Tree-sitter query (S-expression) |
| `language` | string | Yes | Programming language |
| `paths` | array | No | Paths to search (default: current dir) |
| `context_lines` | integer | No | Lines of context (0-20, default: 0) |
## Query Syntax
Tree-sitter queries use S-expression syntax. The basic pattern is:
```
(node_type field: (child_type) @capture_name)
```
- `node_type`: The AST node to match
- `field`: Optional field name
- `child_type`: Type of child node
- `@capture_name`: Name for the captured node
## Common Query Patterns
### Rust
```lisp
;; All functions
(function_item name: (identifier) @name)
;; Functions with modifiers (matches async, but also const/unsafe/extern)
(function_item (function_modifiers) name: (identifier) @name)
;; Structs
(struct_item name: (type_identifier) @name)
;; Enums
(enum_item name: (type_identifier) @name)
;; Impl blocks
(impl_item type: (type_identifier) @name)
;; Trait definitions
(trait_item name: (type_identifier) @name)
;; Macros
(macro_definition name: (identifier) @name)
;; Constants
(const_item name: (identifier) @name)
;; Static variables
(static_item name: (identifier) @name)
;; Type aliases
(type_item name: (type_identifier) @name)
;; Modules
(mod_item name: (identifier) @name)
```
### Python
```lisp
;; Functions
(function_definition name: (identifier) @name)
;; Async functions (the "async" keyword is an anonymous token)
(function_definition "async" name: (identifier) @name)
;; Classes
(class_definition name: (identifier) @name)
;; Methods (functions inside classes)
(class_definition
body: (block
(function_definition name: (identifier) @name)))
;; Decorators
(decorator) @decorator
;; Imports
(import_statement) @import
(import_from_statement) @import
```
### JavaScript / TypeScript
```lisp
;; Function declarations
(function_declaration name: (identifier) @name)
;; Arrow functions assigned to variables
(variable_declarator
name: (identifier) @name
value: (arrow_function))
;; Classes
(class_declaration name: (identifier) @name)
;; Methods
(method_definition name: (property_identifier) @name)
;; Exports
(export_statement) @export
;; Imports
(import_statement) @import
```
### Go
```lisp
;; Functions
(function_declaration name: (identifier) @name)
;; Methods
(method_declaration name: (field_identifier) @name)
;; Structs
(type_declaration
(type_spec name: (type_identifier) @name
type: (struct_type)))
;; Interfaces
(type_declaration
(type_spec name: (type_identifier) @name
type: (interface_type)))
```
### Java
```lisp
;; Classes
(class_declaration name: (identifier) @name)
;; Interfaces
(interface_declaration name: (identifier) @name)
;; Methods
(method_declaration name: (identifier) @name)
;; Constructors
(constructor_declaration name: (identifier) @name)
;; Fields
(field_declaration
declarator: (variable_declarator name: (identifier) @name))
```
### C / C++
```lisp
;; Functions
(function_definition
declarator: (function_declarator
declarator: (identifier) @name))
;; Structs (C)
(struct_specifier name: (type_identifier) @name)
;; Classes (C++)
(class_specifier name: (type_identifier) @name)
;; Namespaces (C++)
(namespace_definition name: (identifier) @name)
```
## Advanced Queries
### Wildcards
Use `_` to match any node:
```lisp
;; Any function with any name
(function_item name: (_) @name)
```
### Alternatives
Match multiple patterns:
```lisp
;; Function items or impl blocks
[(function_item) (impl_item)] @item
```
### Predicates
Filter matches:
```lisp
;; Functions starting with "test_"
(function_item name: (identifier) @name
(#match? @name "^test_"))
;; Functions NOT starting with "_"
(function_item name: (identifier) @name
(#not-match? @name "^_"))
```
### Nested Matches
```lisp
;; Methods inside impl blocks
(impl_item
body: (declaration_list
(function_item name: (identifier) @method_name)))
```
## Batch Searches
Run multiple searches in parallel:
```json
{"tool": "code_search", "args": {
"searches": [
{
"name": "functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust"
},
{
"name": "structs",
"query": "(struct_item name: (type_identifier) @name)",
"language": "rust"
},
{
"name": "tests",
"query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
"language": "rust",
"paths": ["tests/"]
}
],
"max_concurrency": 4
}}
```
## Context Lines
Include surrounding code:
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust",
"context_lines": 3
}]
}}
```
This shows 3 lines before and after each match.
## Path Filtering
Search specific directories:
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "core_functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust",
"paths": ["src/core", "src/lib.rs"]
}]
}}
```
## Output Format
Results include:
- File path
- Line number
- Matched code
- Context (if requested)
```
=== functions (15 matches) ===
src/lib.rs:42
fn process_request(req: Request) -> Response {
src/lib.rs:78
fn handle_error(err: Error) -> Result<()> {
src/utils.rs:15
fn format_output(data: &str) -> String {
```
## Tips
### Finding the Right Query
1. **Start simple**: Begin with basic node types
2. **Use AST explorer**: Understand your language's AST
3. **Iterate**: Refine queries based on results
### Performance
- **Limit paths**: Search specific directories when possible
- **Use concurrency**: Batch related searches
- **Set max_matches**: Prevent overwhelming output
### Debugging Queries
If a query returns no results:
1. Check language spelling (lowercase)
2. Verify node type names for your language
3. Start with simpler query, add constraints
4. Check if files exist in search paths
## Examples by Task
### Find all public functions in Rust
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "public_fns",
"query": "(function_item (visibility_modifier) name: (identifier) @name)",
"language": "rust"
}]
}}
```
### Find all test functions
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "tests",
"query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
"language": "rust",
"paths": ["tests/"]
}]
}}
```
### Find all API endpoints (Python Flask)
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "routes",
"query": "(decorated_definition (decorator) @dec (function_definition name: (identifier) @name))",
"language": "python"
}]
}}
```
### Find all React components
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "components",
"query": "(function_declaration name: (identifier) @name (#match? @name \"^[A-Z]\"))",
"language": "javascript",
"paths": ["src/components/"]
}]
}}
```

docs/CONTROL_COMMANDS.md Normal file

@@ -0,0 +1,224 @@
# G3 Control Commands
**Last updated**: January 2025
**Source of truth**: `crates/g3-cli/src/lib.rs`
## Purpose
Control commands are special commands you can use during an interactive G3 session to manage context, refresh documentation, and view statistics. They start with `/` and are processed by the CLI, not sent to the LLM.
## Available Commands
| Command | Description |
|---------|-------------|
| `/compact` | Manually trigger conversation summarization |
| `/thinnify` | Replace large tool results with file references (first third of context) |
| `/skinnify` | Full context thinning (entire context window) |
| `/readme` | Reload README.md and AGENTS.md from disk |
| `/stats` | Show detailed context and performance statistics |
| `/help` | Display all available control commands |
---
## /compact
Manually trigger conversation summarization to reduce context size.
**When to use**:
- Context usage is getting high (70%+)
- You want to start a new phase of work
- Conversation has accumulated irrelevant history
**What it does**:
1. Sends conversation history to LLM for summarization
2. Replaces detailed history with concise summary
3. Preserves key decisions and context
4. Significantly reduces token usage
**Example**:
```
g3> /compact
📝 Compacting conversation history...
✅ Reduced context from 45,000 to 8,000 tokens (82% reduction)
```
**Notes**:
- Summarization uses tokens, so there's a small cost
- Some detail is lost; use before major context shifts
- Auto-triggered at 80% context usage if `auto_compact = true`
---
## /thinnify
Replace large tool results with file references to save context space.
**When to use**:
- Large file contents are consuming context
- Tool outputs are taking up space
- You want to preserve conversation structure but reduce size
**What it does**:
1. Scans the first third of context for large tool results
2. Saves content to `.g3/sessions/<session>/thinned/`
3. Replaces inline content with file reference
4. Preserves the ability to re-read if needed
**Example**:
```
g3> /thinnify
🔧 Thinning context window...
✅ Thinned 3 large tool results, saved 12,000 characters
```
**Notes**:
- Only processes the first third of context (older content)
- Recent tool results are preserved inline
- Auto-triggered at 50%, 60%, 70%, 80% thresholds
---
## /skinnify
Full context thinning - processes the entire context window.
**When to use**:
- Context is critically full
- `/thinnify` wasn't enough
- You need maximum space recovery
**What it does**:
- Same as `/thinnify` but processes entire context
- More aggressive space recovery
- May thin recent tool results too
**Example**:
```
g3> /skinnify
🔧 Full context thinning...
✅ Thinned 8 tool results, saved 35,000 characters
```
**Notes**:
- Use sparingly; may thin content you still need inline
- Consider `/compact` first for better context preservation
---
## /readme
Reload README.md and AGENTS.md from disk without restarting.
**When to use**:
- You've updated project documentation
- AGENTS.md has new instructions
- README.md has changed
**What it does**:
1. Re-reads README.md from workspace root
2. Re-reads AGENTS.md from workspace root
3. Updates the agent's system context
4. New instructions take effect immediately
**Example**:
```
g3> /readme
📖 Reloading documentation...
✅ Loaded README.md (5,234 chars)
✅ Loaded AGENTS.md (2,100 chars)
```
**Notes**:
- Useful during iterative documentation updates
- Changes apply to subsequent messages
- Previous context retains old documentation
---
## /stats
Show detailed context and performance statistics.
**What it shows**:
- Current context usage (tokens and percentage)
- Session duration
- Token usage breakdown
- Tool call metrics
- Thinning and summarization events
- First-token latency statistics
**Example**:
```
g3> /stats
📊 Session Statistics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Context Usage: 45,230 / 200,000 tokens (22.6%)
Session Duration: 1h 23m 45s
Total Tokens Used: 125,430
Tool Calls: 47 (45 successful, 2 failed)
Thinning Events: 3 (saved 28,000 chars)
Summarizations: 1 (saved 35,000 chars)
Avg First Token: 1.2s
```
---
## /help
Display all available control commands with brief descriptions.
**Example**:
```
g3> /help
📚 Available Commands:
/compact - Summarize conversation to reduce context
/thinnify - Replace large tool results with file refs
/skinnify - Full context thinning (entire window)
/readme - Reload README.md and AGENTS.md
/stats - Show context and performance statistics
/help - Show this help message
```
---
## Context Management Strategy
G3 automatically manages context, but manual intervention can help:
### Proactive Management
1. **Check stats regularly**: Use `/stats` to monitor usage
2. **Thin early**: Use `/thinnify` before hitting thresholds
3. **Compact at transitions**: Use `/compact` when switching tasks
### Reactive Management
When context gets high:
1. **50-70%**: Consider `/thinnify`
2. **70-80%**: Use `/compact`
3. **80-90%**: Use `/skinnify` then `/compact`
4. **90%+**: Auto-summarization triggers
### Best Practices
- **Long sessions**: Compact periodically to maintain quality
- **Large files**: Thin after reading large codebases
- **Documentation updates**: Use `/readme` instead of restarting
- **Before complex tasks**: Ensure adequate context space
---
## Automatic Context Management
G3 performs automatic context management:
| Threshold | Action |
|-----------|--------|
| 50% | Thin oldest third of context |
| 60% | Thin oldest third of context |
| 70% | Thin oldest third of context |
| 80% | Auto-summarization (if `auto_compact = true`) |
| 90% | Aggressive thinning before tool calls |
Manual commands give you finer control over when and how this happens.
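The threshold table above can be expressed as a small decision function. This is an illustrative sketch only: the percentages and actions come from the table, while the function name and return labels are made up, and the behaviour at 80% with `auto_compact` disabled is an assumption.

```python
def context_action(usage_pct: float, auto_compact: bool = True) -> str:
    """Map context usage to the automatic action G3 would take.

    Hypothetical helper mirroring the threshold table; not g3-core's API.
    """
    if usage_pct >= 90:
        return "aggressive-thin"          # aggressive thinning before tool calls
    if usage_pct >= 80:
        # Assumption: without auto_compact, thinning continues as at lower tiers.
        return "summarize" if auto_compact else "thin-oldest-third"
    if usage_pct >= 50:
        return "thin-oldest-third"        # 50/60/70% tiers thin the oldest third
    return "none"

print(context_action(55))                        # thin-oldest-third
print(context_action(85))                        # summarize
print(context_action(85, auto_compact=False))    # thin-oldest-third
print(context_action(95))                        # aggressive-thin
```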

docs/FLOCK_MODE.md Normal file

@@ -0,0 +1,397 @@
# G3 Flock Mode Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-ensembles/src/flock.rs`
## Purpose
Flock mode enables parallel multi-agent development by spawning multiple G3 agent instances that work on different parts of a project simultaneously. This is useful for large projects with modular architectures where independent components can be developed in parallel.
## Overview
In Flock mode:
- Multiple agent instances run concurrently
- Each agent works on a specific module or component
- Agents operate independently but share the same codebase
- Progress is tracked and coordinated centrally
```
┌─────────────────────────────────────────────────────────┐
│ Flock Coordinator │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Agent 1 │ │ Agent 2 │ │ Agent 3 │ │ Agent N │ │
│ │ Module A│ │ Module B│ │ Module C│ │ Module N│ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Shared Codebase │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
## When to Use Flock Mode
**Good candidates**:
- Microservices architectures
- Projects with independent modules
- Large refactoring across multiple files
- Parallel feature development
- Test suite expansion
**Not recommended for**:
- Tightly coupled code
- Sequential dependencies
- Small projects
- Single-file changes
## Configuration
Flock mode is configured through a YAML manifest file:
```yaml
# flock.yaml
name: "my-project-flock"
description: "Parallel development of project modules"
# Global settings
settings:
max_agents: 4
timeout_minutes: 60
provider: "anthropic.default"
# Agent definitions
agents:
- name: "api-agent"
description: "Develops the REST API layer"
working_dir: "src/api"
requirements: |
Implement REST endpoints for user management:
- GET /users
- POST /users
- GET /users/{id}
- PUT /users/{id}
- DELETE /users/{id}
- name: "db-agent"
description: "Develops the database layer"
working_dir: "src/db"
requirements: |
Implement database models and queries:
- User model with CRUD operations
- Connection pooling
- Migration support
- name: "test-agent"
description: "Writes integration tests"
working_dir: "tests"
requirements: |
Write integration tests for:
- API endpoints
- Database operations
- Error handling
```
## Usage
### Starting a Flock
```bash
# Start flock with manifest
g3 --flock flock.yaml
# Start with specific agents only
g3 --flock flock.yaml --agents api-agent,db-agent
# Start with custom timeout
g3 --flock flock.yaml --timeout 120
```
### Monitoring Progress
Flock mode provides real-time status updates:
```
🐦 Flock Status: my-project-flock
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
api-agent [████████░░] 80% Implementing DELETE endpoint
db-agent [██████████] 100% ✅ Complete
test-agent [██████░░░░] 60% Writing error handling tests
Elapsed: 15m 32s | Tokens: 45,230 | Errors: 0
```
### Stopping a Flock
```bash
# Graceful stop (wait for current tasks)
Ctrl+C
# Force stop all agents
Ctrl+C Ctrl+C
```
## Agent Communication
Agents in a flock operate independently but can:
1. **Read shared files**: All agents can read the entire codebase
2. **Write to their area**: Each agent writes to its designated working directory
3. **Signal completion**: Agents report when their tasks are done
4. **Report errors**: Failures are logged and can trigger coordinator action
### Conflict Prevention
To prevent conflicts:
- Assign non-overlapping working directories
- Use clear module boundaries
- Define explicit interfaces between modules
- Run integration after all agents complete
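A pre-flight check for overlapping working directories might look like the following. This is a hypothetical helper for manifest authors; flock itself does not ship this function.

```python
from pathlib import PurePosixPath

def overlapping(dir_a: str, dir_b: str) -> bool:
    """True if the two directories are equal or one contains the other."""
    a, b = PurePosixPath(dir_a), PurePosixPath(dir_b)
    return a == b or a in b.parents or b in a.parents

def check_manifest(agents: list[dict]) -> list[tuple[str, str]]:
    """Return every pair of agent names whose working_dir values overlap."""
    conflicts = []
    for i, x in enumerate(agents):
        for y in agents[i + 1:]:
            if overlapping(x["working_dir"], y["working_dir"]):
                conflicts.append((x["name"], y["name"]))
    return conflicts

agents = [
    {"name": "agent1", "working_dir": "src"},
    {"name": "agent2", "working_dir": "src/utils"},  # nested under agent1
    {"name": "tests",  "working_dir": "tests"},
]
print(check_manifest(agents))  # [('agent1', 'agent2')]
```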
## Status Tracking
Flock status is tracked in `.g3/flock/`:
```
.g3/flock/
├── status.json # Overall flock status
├── api-agent/
│ ├── session.json # Agent session log
│ └── todo.g3.md # Agent's TODO list
├── db-agent/
│ ├── session.json
│ └── todo.g3.md
└── test-agent/
├── session.json
└── todo.g3.md
```
### Status File Format
```json
{
"flock_name": "my-project-flock",
"started_at": "2025-01-03T10:00:00Z",
"status": "running",
"agents": [
{
"name": "api-agent",
"status": "running",
"progress": 80,
"current_task": "Implementing DELETE endpoint",
"tokens_used": 15000,
"errors": 0
}
]
}
```
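A monitoring script could roll this status file up into a single flock-level summary. A sketch: the field names follow the example above, but the helper itself and its output shape are hypothetical.

```python
import json

# A trimmed version of the status.json example above, as a string.
status_json = '''
{
  "flock_name": "my-project-flock",
  "status": "running",
  "agents": [
    {"name": "api-agent", "status": "running",  "progress": 80,  "errors": 0},
    {"name": "db-agent",  "status": "complete", "progress": 100, "errors": 0}
  ]
}
'''

def summarize(doc: str) -> dict:
    """Aggregate per-agent progress into one summary (hypothetical helper)."""
    status = json.loads(doc)
    agents = status["agents"]
    return {
        "flock": status["flock_name"],
        "overall_progress": sum(a["progress"] for a in agents) // len(agents),
        "errors": sum(a["errors"] for a in agents),
        "done": all(a["status"] == "complete" for a in agents),
    }

print(summarize(status_json))
# {'flock': 'my-project-flock', 'overall_progress': 90, 'errors': 0, 'done': False}
```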
## Best Practices
### 1. Define Clear Boundaries
```yaml
# Good: Clear module separation
agents:
- name: "frontend"
working_dir: "src/frontend"
- name: "backend"
working_dir: "src/backend"
# Bad: Overlapping directories
agents:
- name: "agent1"
working_dir: "src"
- name: "agent2"
working_dir: "src/utils" # Overlaps with agent1!
```
### 2. Specify Interfaces First
Define shared interfaces before parallel development:
```yaml
agents:
- name: "interface-agent"
priority: 1 # Runs first
requirements: |
Define shared interfaces in src/interfaces/:
- UserService trait
- DatabaseConnection trait
- Error types
- name: "impl-agent"
priority: 2 # Runs after interfaces
depends_on: ["interface-agent"]
requirements: |
Implement UserService trait...
```
### 3. Use Appropriate Granularity
- **Too few agents**: Doesn't leverage parallelism
- **Too many agents**: Coordination overhead, potential conflicts
- **Sweet spot**: 2-6 agents for most projects
### 4. Include a Test Agent
Always include an agent for testing:
```yaml
agents:
- name: "test-agent"
working_dir: "tests"
requirements: |
Write tests for all new functionality.
Run tests after other agents complete.
```
### 5. Plan for Integration
After flock completion:
```bash
# Run all tests
cargo test
# Check for conflicts
git status
# Review changes
git diff
```
## Error Handling
### Agent Failures
If an agent fails:
1. Error is logged to agent's session
2. Coordinator is notified
3. Other agents continue (by default)
4. Failed agent can be restarted
### Restart Failed Agent
```bash
# Restart specific agent
g3 --flock flock.yaml --restart api-agent
# Restart all failed agents
g3 --flock flock.yaml --restart-failed
```
### Conflict Resolution
If agents modify the same file:
1. Last write wins (by default)
2. Conflicts are logged
3. Manual resolution may be needed
## Resource Management
### Token Usage
Each agent has its own token budget:
```yaml
settings:
max_tokens_per_agent: 100000
total_token_budget: 500000
```
### Concurrency
Limit concurrent agents based on:
- API rate limits
- System resources
- Provider capacity
```yaml
settings:
max_concurrent_agents: 3 # Run at most 3 at once
```
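The concurrency cap amounts to a semaphore gating agent launches. The sketch below is illustrative only: G3's coordinator is Rust/Tokio, and `run_agent` here is a stand-in for spawning a real agent.

```python
import asyncio

async def run_agent(name: str, sem: asyncio.Semaphore) -> str:
    """Stand-in for launching one flock agent (hypothetical)."""
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for the real agent run
        return f"{name}: done"

async def run_flock(names: list[str], max_concurrent: int = 3) -> list[str]:
    # Mirrors max_concurrent_agents: at most 3 agents hold the semaphore at once.
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(run_agent(n, sem) for n in names))

results = asyncio.run(
    run_flock(["user-service", "order-service", "gateway", "tests"])
)
print(results)
```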
## Example: Microservices Project
```yaml
name: "microservices-flock"
settings:
max_agents: 5
provider: "anthropic.default"
agents:
- name: "user-service"
working_dir: "services/user"
requirements: |
Implement user service:
- User registration
- Authentication
- Profile management
- name: "order-service"
working_dir: "services/order"
requirements: |
Implement order service:
- Order creation
- Order status tracking
- Payment integration
- name: "inventory-service"
working_dir: "services/inventory"
requirements: |
Implement inventory service:
- Stock management
- Availability checking
- Reorder alerts
- name: "gateway"
working_dir: "services/gateway"
requirements: |
Implement API gateway:
- Request routing
- Authentication middleware
- Rate limiting
- name: "integration-tests"
working_dir: "tests/integration"
depends_on: ["user-service", "order-service", "inventory-service", "gateway"]
requirements: |
Write integration tests for:
- End-to-end order flow
- Service communication
- Error scenarios
```
## Limitations
- **No real-time coordination**: Agents don't communicate during execution
- **File conflicts**: Possible if boundaries aren't clear
- **Resource intensive**: Multiple LLM calls in parallel
- **Debugging complexity**: Multiple logs to review
## Troubleshooting
### Agents Not Starting
1. Check manifest syntax (YAML)
2. Verify working directories exist
3. Check provider configuration
4. Review logs in `.g3/flock/`
### Slow Progress
1. Reduce number of concurrent agents
2. Check for rate limiting
3. Simplify requirements
4. Use faster provider
### Inconsistent Results
1. Define clearer interfaces
2. Add more specific requirements
3. Use lower temperature
4. Add validation steps

docs/architecture.md Normal file

@@ -0,0 +1,363 @@
# G3 Architecture
**Last updated**: January 2025
**Source of truth**: Crate structure in `crates/`, `Cargo.toml`, `DESIGN.md`
## Purpose
This document describes the internal architecture of G3, a modular AI coding agent built in Rust. It is intended for developers who want to understand, extend, or maintain the codebase.
## High-Level Overview
G3 follows a **tool-first philosophy**: instead of just providing advice, it actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ g3-cli │ │ g3-core │ │ g3-providers │
│ │ │ │ │ │
│ • CLI parsing │◄──►│ • Agent engine │◄──►│ • Anthropic │
│ • Interactive │ │ • Context mgmt │ │ • Databricks │
│ • Retro TUI │ │ • Tool system │ │ • OpenAI │
│ • Autonomous │ │ • Streaming │ │ • Embedded │
│ mode │ │ • Task exec │ │ (llama.cpp) │
│ │ │ • TODO mgmt │ │ • OAuth flow │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ g3-execution │ │ g3-config │ │ g3-planner │
│ │ │ │ │ │
│ • Code exec │ │ • TOML config │ │ • Requirements │
│ • Shell cmds │ │ • Env overrides │ │ • Git ops │
│ • Streaming │ │ • Provider │ │ • Planning │
│ • Error hdlg │ │ settings │ │ workflow │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ ┌─────────────────┐ │
│ │ g3-computer- │ │
└─────────────►│ control │◄─────────────┘
│ • Mouse/kbd │
│ • Screenshots │
│ • OCR/Vision │
│ • WebDriver │
│ • macOS Ax API │
└─────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────────────────┐ ┌─────────────────┐
│ g3-ensembles │ │ g3-console │
│ │ │ │
│ • Flock mode │ │ • Web console │
│ • Multi-agent │ │ • Process mgmt │
│ • Parallel dev │ │ • Log viewing │
└─────────────────┘ └─────────────────┘
```
## Workspace Structure
G3 is organized as a Rust workspace with 9 crates:
```
g3/
├── src/main.rs # Entry point (delegates to g3-cli)
├── crates/
│ ├── g3-cli/ # Command-line interface and TUI
│ ├── g3-core/ # Core agent engine and tools
│ ├── g3-providers/ # LLM provider abstractions
│ ├── g3-config/ # Configuration management
│ ├── g3-execution/ # Code execution engine
│ ├── g3-computer-control/ # Computer automation
│ ├── g3-planner/ # Planning mode workflow
│ ├── g3-ensembles/ # Multi-agent (flock) mode
│ └── g3-console/ # Web monitoring console
├── agents/ # Agent persona definitions
├── logs/ # Session logs (auto-created)
└── g3-plan/ # Planning artifacts
```
## Crate Responsibilities
### g3-core (Central Hub)
**Location**: `crates/g3-core/`
**Purpose**: Core agent engine, tool system, and orchestration logic
Key modules:
- `lib.rs` - Main `Agent` struct and orchestration (~3400 lines)
- `context_window.rs` - Token tracking and context management
- `streaming_parser.rs` - Real-time LLM response parsing
- `tool_definitions.rs` - JSON schema definitions for all tools
- `tool_dispatch.rs` - Routes tool calls to implementations
- `tools/` - Tool implementations (file ops, shell, vision, webdriver, etc.)
- `error_handling.rs` - Error classification and recovery
- `retry.rs` - Retry logic with exponential backoff
- `prompts.rs` - System prompt generation
- `code_search/` - Tree-sitter based code search
**Key types**:
- `Agent<W: UiWriter>` - Main agent struct, generic over UI output
- `ContextWindow` - Manages conversation history and token limits
- `StreamingToolParser` - Parses streaming LLM responses for tool calls
- `ToolCall` - Represents a tool invocation
### g3-providers (LLM Abstraction)
**Location**: `crates/g3-providers/`
**Purpose**: Unified interface for multiple LLM backends
Key modules:
- `lib.rs` - `LLMProvider` trait and `ProviderRegistry`
- `anthropic.rs` - Anthropic Claude API (~51k chars)
- `databricks.rs` - Databricks Foundation Models (~58k chars)
- `openai.rs` - OpenAI and compatible APIs (~18k chars)
- `embedded.rs` - Local models via llama.cpp (~34k chars)
- `oauth.rs` - OAuth authentication flow
**Key traits**:
```rust
#[async_trait]
pub trait LLMProvider: Send + Sync {
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;
async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;
fn name(&self) -> &str;
fn model(&self) -> &str;
fn has_native_tool_calling(&self) -> bool;
fn supports_cache_control(&self) -> bool;
fn max_tokens(&self) -> u32;
fn temperature(&self) -> f32;
}
```
### g3-cli (User Interface)
**Location**: `crates/g3-cli/`
**Purpose**: Command-line interface, TUI, and execution modes
Key modules:
- `lib.rs` - Main CLI logic and execution modes (~112k chars)
- `retro_tui.rs` - Full-screen retro terminal UI (~63k chars)
- `filter_json.rs` - JSON tool call filtering for display
- `ui_writer_impl.rs` - Console output implementation
- `theme.rs` - Color themes for retro mode
**Execution modes**:
1. **Single-shot**: `g3 "task description"` - Execute one task and exit
2. **Interactive**: `g3` - REPL-style conversation (default)
3. **Autonomous**: `g3 --autonomous` - Coach-player feedback loop
4. **Accumulative**: Interactive mode that carries conversation context across autonomous runs
5. **Planning**: `g3 --planning` - Requirements-driven development
6. **Retro TUI**: `g3 --retro` - Full-screen terminal interface
### g3-config (Configuration)
**Location**: `crates/g3-config/`
**Purpose**: TOML-based configuration management
Key structures:
- `Config` - Root configuration
- `ProvidersConfig` - Provider settings with named configs
- `AgentConfig` - Agent behavior settings
- `WebDriverConfig` - Browser automation settings
- `MacAxConfig` - macOS Accessibility API settings
**Configuration hierarchy** (highest priority last):
1. Default configuration
2. `~/.config/g3/config.toml`
3. `./g3.toml`
4. Environment variables (`G3_*`)
5. CLI arguments
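Layered precedence amounts to a left-to-right merge in which later layers win. A minimal sketch, assuming TOML tables map to nested dicts; the `merge` helper and sample keys are illustrative, not G3's actual loader.

```python
def merge(*layers: dict) -> dict:
    """Merge config layers left to right; later layers take priority."""
    out: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = merge(out[key], value)  # deep-merge nested tables
            else:
                out[key] = value
    return out

defaults  = {"agent": {"auto_compact": True}, "provider": "anthropic.default"}
user_toml = {"provider": "openai.default"}    # e.g. ./g3.toml
env_vars  = {"agent": {"auto_compact": False}}  # e.g. G3_* overrides

print(merge(defaults, user_toml, env_vars))
# {'agent': {'auto_compact': False}, 'provider': 'openai.default'}
```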
### g3-execution (Code Execution)
**Location**: `crates/g3-execution/`
**Purpose**: Safe execution of shell commands and scripts
Features:
- Streaming output capture
- Exit code tracking
- Async execution via Tokio
- Error handling and formatting
### g3-computer-control (Automation)
**Location**: `crates/g3-computer-control/`
**Purpose**: Cross-platform computer control and automation
Key modules:
- `platform/` - Platform-specific implementations (macOS, Linux, Windows)
- `webdriver/` - Safari and Chrome WebDriver integration
- `ocr/` - Text extraction (Tesseract, Apple Vision)
- `macax/` - macOS Accessibility API controller
**Platform support**:
- **macOS**: Core Graphics, Cocoa, screencapture, Vision framework
- **Linux**: X11/Xtest for input
- **Windows**: Win32 APIs
### g3-planner (Planning Mode)
**Location**: `crates/g3-planner/`
**Purpose**: Requirements-driven development workflow
Key modules:
- `planner.rs` - Main planning state machine (~40k chars)
- `state.rs` - Planning state management
- `git.rs` - Git operations
- `code_explore.rs` - Codebase exploration
- `llm.rs` - LLM interactions for planning
- `history.rs` - Planning history tracking
**Workflow**:
1. Write requirements in `<codepath>/g3-plan/new_requirements.md`
2. LLM refines requirements
3. Requirements renamed to `current_requirements.md`
4. Coach/player loop implements
5. Files archived with timestamps
6. Git commit with LLM-generated message
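The file lifecycle in steps 1-5 can be sketched with plain path operations. Illustrative only: the rename sequence follows the workflow above, but the archive naming scheme is an assumption; the real logic lives in `planner.rs`.

```python
from datetime import datetime
from pathlib import Path
import tempfile

plan = Path(tempfile.mkdtemp()) / "g3-plan"
plan.mkdir()
(plan / "new_requirements.md").write_text("Add user login.")

# Step 3: refined requirements become the current set.
(plan / "new_requirements.md").rename(plan / "current_requirements.md")

# Step 5: archive with a timestamp once the coach/player loop finishes.
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
archived = plan / f"requirements-{stamp}.md"
(plan / "current_requirements.md").rename(archived)

print(archived.read_text())  # Add user login.
```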
### g3-ensembles (Multi-Agent)
**Location**: `crates/g3-ensembles/`
**Purpose**: Parallel multi-agent development (Flock mode)
Key modules:
- `flock.rs` - Flock orchestration (~43k chars)
- `status.rs` - Agent status tracking
Flock mode enables parallel development by spawning multiple agent instances working on different parts of a project.
### g3-console (Web Console)
**Location**: `crates/g3-console/`
**Purpose**: Web-based monitoring and control
Key modules:
- `main.rs` - Axum web server
- `api/` - REST API endpoints
- `process/` - Process detection and control
- `logs.rs` - Log parsing and streaming
## Data Flow
### Request Flow
```
User Input
┌─────────────┐
│ g3-cli │ Parse input, determine mode
└─────────────┘
┌─────────────┐
│ g3-core │ Add to context window
│ Agent │ Build completion request
└─────────────┘
┌─────────────┐
│ g3-providers│ Send to LLM provider
│ Registry │ Stream response
└─────────────┘
┌─────────────┐
│ g3-core │ Parse streaming response
│ Parser │ Detect tool calls
└─────────────┘
┌─────────────┐
│ g3-core │ Execute tools
│ Tools │ Return results
└─────────────┘
┌─────────────┐
│ g3-core │ Add results to context
│ Agent │ Continue or complete
└─────────────┘
```
### Context Window Management
The `ContextWindow` struct manages conversation history with intelligent token tracking:
1. **Token Tracking**: Monitors usage as percentage of provider's context limit
2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references
3. **Auto-Summarization**: At 80% capacity, triggers conversation summarization
4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens)
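Conceptually, the thinning pass swaps oversized tool results for file references. A minimal sketch, assuming a flat message-list representation; the function, size limit, and reference format are illustrative, not `context_window.rs` internals.

```python
def thin(messages: list[dict], limit: int = 2000) -> tuple[list[dict], int]:
    """Replace oversized tool results with a file reference; return the
    thinned messages and the number of characters saved (hypothetical)."""
    saved = 0
    out = []
    for i, msg in enumerate(messages):
        content = msg.get("content", "")
        if msg.get("role") == "tool" and len(content) > limit:
            ref = f"[thinned: saved to .g3/sessions/s1/thinned/{i}.txt]"
            saved += len(content) - len(ref)
            msg = {**msg, "content": ref}  # copy; original message untouched
        out.append(msg)
    return out, saved

msgs = [
    {"role": "user", "content": "read the file"},
    {"role": "tool", "content": "x" * 5000},  # a large tool result
]
thinned, saved = thin(msgs)
print(saved)  # 5000 minus the length of the reference string
```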
## Error Handling
G3 implements comprehensive error handling:
1. **Error Classification**: Distinguishes recoverable vs non-recoverable errors
2. **Automatic Retry**: Exponential backoff with jitter for:
- Rate limits (HTTP 429)
- Network errors
- Server errors (HTTP 5xx)
- Timeouts
3. **Error Logging**: Detailed logs saved to `logs/errors/`
4. **Graceful Degradation**: Continues when possible, fails gracefully when not
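The backoff-with-jitter retry delay can be sketched as follows. The constants are illustrative assumptions, not G3's actual values, and the random draw is passed in so the function stays deterministic:

```rust
/// Delay before retry `attempt` (0-based): capped exponential with full jitter.
/// `rand01` is a uniform random number in [0, 1), supplied by the caller.
fn retry_delay_ms(attempt: u32, rand01: f64) -> u64 {
    let base_ms: u64 = 1_000; // illustrative: 1s initial delay
    let cap_ms: u64 = 60_000; // illustrative: never sleep more than 60s
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
    // Full jitter: sleep a uniform fraction of the exponential delay,
    // which spreads concurrent retries out and avoids retry storms.
    (exp as f64 * rand01) as u64
}

fn main() {
    for attempt in 0..4 {
        println!("attempt {attempt}: up to {} ms", retry_delay_ms(attempt, 0.999));
    }
}
```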
## Session Management
Sessions are tracked in `.g3/sessions/<session_id>/`:
- `session.json` - Full conversation history and metadata
- `todo.g3.md` - Session-scoped TODO list
- Context summaries and thinned content
Legacy logs are stored in `logs/g3_session_*.json`.
## Extension Points
### Adding a New Tool
1. Add tool definition in `g3-core/src/tool_definitions.rs`
2. Implement handler in `g3-core/src/tools/`
3. Add dispatch case in `g3-core/src/tool_dispatch.rs`
4. Update system prompt if needed in `g3-core/src/prompts.rs`
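In miniature, the three code touch-points look like this. All names here (`word_count`, the simplified `ToolDefinition`) are hypothetical stand-ins; the real types in `g3-core` differ:

```rust
// 1. tool_definitions.rs — describe the tool to the LLM
struct ToolDefinition {
    name: &'static str,
    description: &'static str,
}

fn word_count_definition() -> ToolDefinition {
    ToolDefinition { name: "word_count", description: "Count words in a string" }
}

// 2. tools/ — implement the handler
fn handle_word_count(input: &str) -> String {
    input.split_whitespace().count().to_string()
}

// 3. tool_dispatch.rs — route the call by tool name
fn dispatch(tool: &str, input: &str) -> Option<String> {
    match tool {
        "word_count" => Some(handle_word_count(input)),
        _ => None,
    }
}

fn main() {
    let def = word_count_definition();
    println!("{}: {:?}", def.description, dispatch(def.name, "count these words"));
}
```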
### Adding a New Provider
1. Implement `LLMProvider` trait in `g3-providers/src/`
2. Add configuration struct in `g3-config/src/lib.rs`
3. Register provider in `g3-core/src/lib.rs` (in `new_with_mode_and_readme`)
4. Update documentation
### Adding a New Execution Mode
1. Add CLI arguments in `g3-cli/src/lib.rs`
2. Implement mode logic in the CLI
3. May require new agent methods in `g3-core`
## Key Files for Understanding
Start reading here:
1. `src/main.rs` - Entry point (trivial, delegates to g3-cli)
2. `crates/g3-cli/src/lib.rs` - CLI and execution modes
3. `crates/g3-core/src/lib.rs` - Agent implementation
4. `crates/g3-providers/src/lib.rs` - Provider trait and registry
5. `crates/g3-core/src/tool_definitions.rs` - Available tools
6. `crates/g3-config/src/lib.rs` - Configuration structures
7. `DESIGN.md` - Original design document
## Dependencies
Key external dependencies:
- **tokio**: Async runtime
- **reqwest**: HTTP client for API calls
- **serde/serde_json**: Serialization
- **clap**: CLI argument parsing
- **tree-sitter**: Syntax-aware code search
- **llama_cpp**: Local model inference (with Metal acceleration)
- **fantoccini**: WebDriver client
- **axum**: Web framework (for g3-console)

385
docs/configuration.md Normal file
View File

@@ -0,0 +1,385 @@
# G3 Configuration Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-config/src/lib.rs`, `config.example.toml`
## Purpose
This document explains how to configure G3, including provider setup, agent behavior, and optional features like WebDriver and computer control.
## Configuration File Location
G3 looks for configuration files in this order:
1. Path specified via `--config` CLI argument
2. `./g3.toml` (current directory)
3. `~/.config/g3/config.toml` (user config)
4. `~/.g3.toml` (legacy location)
If no configuration file exists, G3 creates a default one at `~/.config/g3/config.toml` on first run.
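The lookup order amounts to a first-match scan with a fallback. A minimal sketch (the `exists` closure stands in for a filesystem check so the logic is testable; this is not the actual `g3-config` code):

```rust
use std::path::PathBuf;

/// Resolve the config path in the documented order; `home` is the user's home dir.
fn resolve_config(cli_arg: Option<&str>, home: &str, exists: impl Fn(&str) -> bool) -> PathBuf {
    if let Some(path) = cli_arg {
        return PathBuf::from(path); // 1. --config always wins
    }
    let candidates = [
        "./g3.toml".to_string(),                  // 2. current directory
        format!("{home}/.config/g3/config.toml"), // 3. user config
        format!("{home}/.g3.toml"),               // 4. legacy location
    ];
    for candidate in &candidates {
        if exists(candidate) {
            return PathBuf::from(candidate);
        }
    }
    // Nothing found: G3 creates a default at the user-config location.
    PathBuf::from(format!("{home}/.config/g3/config.toml"))
}

fn main() {
    let path = resolve_config(None, "/home/me", |p| p == "./g3.toml");
    println!("{}", path.display());
}
```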
## Configuration Format
G3 uses TOML format. The configuration is organized into sections:
```toml
[providers] # LLM provider settings
[agent] # Agent behavior settings
[computer_control] # Mouse/keyboard automation
[webdriver] # Browser automation
[macax] # macOS Accessibility API
```
## Provider Configuration
### Provider Reference Format
Providers are referenced using the format: `<provider_type>.<config_name>`
Examples:
- `anthropic.default`
- `databricks.production`
- `openai.gpt4`
- `embedded.local`
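Parsing such a reference is a single split on the first dot. A sketch (the real validation in G3 may be stricter):

```rust
/// Split a provider reference like "anthropic.default" into its
/// (provider_type, config_name) halves.
fn parse_provider_ref(reference: &str) -> Option<(&str, &str)> {
    let (provider_type, config_name) = reference.split_once('.')?;
    if provider_type.is_empty() || config_name.is_empty() {
        return None; // both halves are required
    }
    Some((provider_type, config_name))
}

fn main() {
    println!("{:?}", parse_provider_ref("databricks.production"));
}
```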
### Basic Provider Setup
```toml
[providers]
# Default provider used for all operations
default_provider = "anthropic.default"
# Optional: Different providers for different roles
# planner = "anthropic.planner" # Planning mode
# coach = "anthropic.default" # Code reviewer in autonomous mode
# player = "anthropic.default" # Code implementer in autonomous mode
```
### Anthropic Configuration
```toml
[providers.anthropic.default]
api_key = "sk-ant-..." # Required: Your Anthropic API key
model = "claude-sonnet-4-5" # Model to use
max_tokens = 64000 # Max output tokens per request
temperature = 0.3 # Sampling temperature (0.0-1.0)
# cache_config = "ephemeral" # Optional: Enable prompt caching
# enable_1m_context = true # Optional: Enable 1M context (extra cost)
# thinking_budget_tokens = 10000 # Optional: Extended thinking mode
```
**Available Anthropic models**:
- `claude-sonnet-4-5` (recommended)
- `claude-opus-4-5`
- `claude-3-5-sonnet-20241022`
- `claude-3-opus-20240229`
### Databricks Configuration
```toml
[providers.databricks.default]
host = "https://your-workspace.cloud.databricks.com" # Required
model = "databricks-claude-sonnet-4" # Model endpoint
max_tokens = 4096
temperature = 0.1
use_oauth = true # Use OAuth (recommended)
# token = "dapi..." # Or use personal access token
```
**OAuth vs Token Authentication**:
- **OAuth** (`use_oauth = true`): Opens browser for authentication, tokens refresh automatically
- **Token** (`token = "..."`, `use_oauth = false`): Uses personal access token directly
### OpenAI Configuration
```toml
[providers.openai.default]
api_key = "sk-..." # Required: Your OpenAI API key
model = "gpt-4-turbo" # Model to use
max_tokens = 4096
temperature = 0.1
# base_url = "https://api.openai.com/v1" # Optional: Custom endpoint
```
### OpenAI-Compatible Providers
For services with OpenAI-compatible APIs (OpenRouter, Groq, Together, etc.):
```toml
[providers.openai_compatible.openrouter]
api_key = "sk-or-..." # Provider's API key
model = "anthropic/claude-3.5-sonnet"
base_url = "https://openrouter.ai/api/v1"
max_tokens = 4096
temperature = 0.1
[providers.openai_compatible.groq]
api_key = "gsk_..."
model = "llama-3.3-70b-versatile"
base_url = "https://api.groq.com/openai/v1"
max_tokens = 4096
temperature = 0.1
```
Reference these as `openai_compatible.openrouter` or `openai_compatible.groq` in `default_provider`, following the `<provider_type>.<config_name>` format.
### Embedded (Local) Models
```toml
[providers.embedded.default]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen" # Model architecture
context_length = 32768 # Context window size
max_tokens = 2048 # Max output tokens
temperature = 0.1
gpu_layers = 32 # Layers to offload to GPU (Metal/CUDA)
threads = 8 # CPU threads for inference
```
**Supported model types**: `qwen`, `codellama`, `llama`, `mistral`
**Hardware requirements**:
- 4-16GB RAM depending on model size
- Optional GPU acceleration (Metal on macOS, CUDA on Linux)
## Agent Configuration
```toml
[agent]
# Context and token settings
fallback_default_max_tokens = 8192 # Default max tokens if provider doesn't specify
# max_context_length = 200000 # Override context window size for all providers
# Behavior settings
enable_streaming = true # Stream responses in real-time
allow_multiple_tool_calls = true # Allow multiple tools per response
timeout_seconds = 60 # Request timeout
auto_compact = true # Auto-compact context at 90%
# Retry settings
max_retry_attempts = 3 # Retries for interactive mode
autonomous_max_retry_attempts = 6 # Retries for autonomous mode
# TODO management
check_todo_staleness = true # Warn about stale TODO items
```
### Retry Behavior
G3 automatically retries on recoverable errors:
- Rate limits (HTTP 429)
- Network errors
- Server errors (HTTP 5xx)
- Timeouts
**Interactive mode** uses `max_retry_attempts` (default: 3)
**Autonomous mode** uses `autonomous_max_retry_attempts` (default: 6) with longer delays
## Computer Control Configuration
```toml
[computer_control]
enabled = false # Set to true to enable
require_confirmation = true # Require confirmation before actions
max_actions_per_second = 5 # Rate limit for safety
```
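The `max_actions_per_second` limit is equivalent to enforcing a minimum gap between actions. A sketch of that check (hypothetical helper, not G3's implementation):

```rust
/// Allow an action only if at least 1000 / max_per_sec milliseconds
/// have elapsed since the previous action.
fn action_allowed(last_action_ms: u64, now_ms: u64, max_per_sec: u64) -> bool {
    let min_gap_ms = 1_000 / max_per_sec;
    now_ms.saturating_sub(last_action_ms) >= min_gap_ms
}

fn main() {
    // With max_actions_per_second = 5 the minimum gap is 200 ms.
    println!("{}", action_allowed(0, 200, 5));
}
```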
**Required OS permissions**:
- **macOS**: System Preferences → Security & Privacy → Accessibility
- **Linux**: X11 or Wayland access
- **Windows**: Run as administrator (first time)
## WebDriver Configuration
```toml
[webdriver]
enabled = false # Set to true to enable
browser = "safari" # "safari" or "chrome-headless"
safari_port = 4444 # Safari WebDriver port
chrome_port = 9515 # ChromeDriver port
# chrome_binary = "/path/to/chrome" # Optional: Custom Chrome path
```
### Safari Setup (macOS)
```bash
# Enable Safari remote automation (one-time setup)
safaridriver --enable
# Or via Safari UI:
# Safari → Preferences → Advanced → Show Develop menu
# Develop → Allow Remote Automation
```
### Chrome Setup
**Option 1: Chrome for Testing (Recommended)**
```bash
./scripts/setup-chrome-for-testing.sh
```
Then configure:
```toml
[webdriver]
chrome_binary = "/Users/yourname/.chrome-for-testing/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
```
**Option 2: System Chrome**
```bash
# macOS
brew install chromedriver
# Linux
apt install chromium-chromedriver
```
## macOS Accessibility API Configuration
```toml
[macax]
enabled = false # Set to true to enable
```
**Required permissions**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app
See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage.
## Multi-Role Configuration
For autonomous mode with different models for coach and player:
```toml
[providers]
default_provider = "anthropic.default"
coach = "anthropic.coach" # Code reviewer
player = "anthropic.player" # Code implementer
[providers.anthropic.coach]
api_key = "sk-ant-..."
model = "claude-sonnet-4-5"
max_tokens = 32000
temperature = 0.1 # Lower for consistent reviews
[providers.anthropic.player]
api_key = "sk-ant-..."
model = "claude-sonnet-4-5"
max_tokens = 64000
temperature = 0.3 # Higher for creative implementations
```
See `config.coach-player.example.toml` for a complete example.
## Environment Variables
Environment variables override configuration file settings:
| Variable | Description |
|----------|-------------|
| `G3_WORKSPACE_PATH` | Override workspace directory |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `DATABRICKS_HOST` | Databricks workspace URL |
| `DATABRICKS_TOKEN` | Databricks personal access token |
## CLI Overrides
CLI arguments have the highest priority:
```bash
# Override provider
g3 --provider anthropic.default
# Override model
g3 --model claude-opus-4-5
# Enable features
g3 --webdriver # Enable WebDriver (Safari)
g3 --chrome-headless # Enable WebDriver (Chrome headless)
g3 --macax # Enable macOS Accessibility API
# Specify config file
g3 --config /path/to/config.toml
```
## Complete Example Configuration
```toml
# ~/.config/g3/config.toml
[providers]
default_provider = "anthropic.default"
[providers.anthropic.default]
api_key = "sk-ant-api03-..."
model = "claude-sonnet-4-5"
max_tokens = 64000
temperature = 0.3
[providers.databricks.work]
host = "https://mycompany.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true
[agent]
fallback_default_max_tokens = 8192
enable_streaming = true
allow_multiple_tool_calls = true
timeout_seconds = 60
max_retry_attempts = 3
autonomous_max_retry_attempts = 6
[computer_control]
enabled = false
require_confirmation = true
max_actions_per_second = 5
[webdriver]
enabled = true
browser = "safari"
safari_port = 4444
[macax]
enabled = false
```
## Troubleshooting
### "Old config format" error
If you see this error, your config uses a deprecated format. Update to the new named provider format:
**Old format** (deprecated):
```toml
[providers.anthropic]
api_key = "..."
```
**New format**:
```toml
[providers.anthropic.default]
api_key = "..."
```
### Provider not found
Ensure your `default_provider` matches a configured provider:
```toml
default_provider = "anthropic.default" # Must match [providers.anthropic.default]
```
### OAuth issues
For Databricks OAuth:
1. Ensure `use_oauth = true`
2. Remove any `token` setting
3. A browser window will open for authentication
4. Tokens are cached in `~/.databricks/oauth-tokens.json`
### Context window errors
If you see context overflow errors:
1. Check `max_context_length` in `[agent]`
2. Use `/compact` command to manually summarize
3. Use `/thinnify` to replace large tool results with file references

472
docs/macax-tools.md Normal file
View File

@@ -0,0 +1,472 @@
# macOS Accessibility Tools Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-computer-control/src/macax/`
## Purpose
G3 includes tools for controlling macOS applications via the Accessibility API. This enables automation of native macOS apps, including those you're building with G3.
## Overview
The macOS Accessibility API provides programmatic access to UI elements in any application. G3 exposes this through the `macax_*` tools, allowing you to:
- List and activate applications
- Inspect UI element hierarchies
- Find elements by role, title, or identifier
- Click buttons and interact with controls
- Read and set values in text fields
- Simulate keyboard input
## Setup
### 1. Enable in Configuration
```toml
# ~/.config/g3/config.toml
[macax]
enabled = true
```
Or use the CLI flag:
```bash
g3 --macax
```
### 2. Grant Accessibility Permissions
1. Open **System Preferences** → **Security & Privacy** → **Privacy**
2. Select **Accessibility** in the left sidebar
3. Click the lock icon and authenticate
4. Add your terminal application (Terminal, iTerm2, etc.)
5. Restart your terminal
**Note**: If using VS Code's integrated terminal, add VS Code to the list.
### 3. Verify Setup
```json
{"tool": "macax_list_apps", "args": {}}
```
This should return a list of running applications.
## Available Tools
### macax_list_apps
List all running applications.
**Parameters**: None
**Example**:
```json
{"tool": "macax_list_apps", "args": {}}
```
**Returns**:
```
Running Applications:
- Safari (com.apple.Safari)
- Finder (com.apple.finder)
- Terminal (com.apple.Terminal)
- MyApp (com.example.myapp)
```
---
### macax_get_frontmost_app
Get the currently active (frontmost) application.
**Parameters**: None
**Example**:
```json
{"tool": "macax_get_frontmost_app", "args": {}}
```
**Returns**:
```
Frontmost Application: Safari (com.apple.Safari)
```
---
### macax_activate_app
Bring an application to the front.
**Parameters**:
- `app_name` (string, required): Application name
**Example**:
```json
{"tool": "macax_activate_app", "args": {"app_name": "Safari"}}
```
---
### macax_get_ui_tree
Get the UI element hierarchy of an application.
**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Maximum tree depth (default: 5)
**Example**:
```json
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
```
**Returns**:
```
UI Tree for Calculator:
└── AXApplication "Calculator"
    └── AXWindow "Calculator"
        ├── AXGroup
        │   ├── AXButton "1" [id: digit_1]
        │   ├── AXButton "2" [id: digit_2]
        │   ├── AXButton "+" [id: add]
        │   └── AXButton "=" [id: equals]
        └── AXStaticText "0" [id: display]
```
**Notes**:
- Use lower `max_depth` for complex apps to avoid overwhelming output
- Elements show role, title, and accessibility identifier (if set)
---
### macax_find_elements
Find UI elements matching criteria.
**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (e.g., "button", "textField")
- `title` (string, optional): Element title/label
- `identifier` (string, optional): Accessibility identifier
**Example**:
```json
{"tool": "macax_find_elements", "args": {
"app_name": "Safari",
"role": "button"
}}
```
**Returns**:
```
Found 5 elements:
1. AXButton "Back" [id: BackButton]
2. AXButton "Forward" [id: ForwardButton]
3. AXButton "Reload" [id: ReloadButton]
4. AXButton "Share" [id: ShareButton]
5. AXButton "New Tab" [id: NewTabButton]
```
---
### macax_click
Click a UI element.
**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `role` (string, optional): Element role
At least one of `identifier`, `title`, or `role` must be provided.
**Examples**:
```json
// Click by identifier (most reliable)
{"tool": "macax_click", "args": {
"app_name": "Calculator",
"identifier": "digit_5"
}}
// Click by title
{"tool": "macax_click", "args": {
"app_name": "Calculator",
"title": "5"
}}
// Click by role and title
{"tool": "macax_click", "args": {
"app_name": "Safari",
"role": "button",
"title": "Reload"
}}
```
---
### macax_set_value
Set the value of a UI element (text fields, sliders, etc.).
**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `role` (string, optional): Element role
- `value` (string, required): Value to set
**Example**:
```json
{"tool": "macax_set_value", "args": {
"app_name": "TextEdit",
"role": "textArea",
"value": "Hello, World!"
}}
```
---
### macax_get_value
Get the current value of a UI element.
**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `role` (string, optional): Element role
**Example**:
```json
{"tool": "macax_get_value", "args": {
"app_name": "Calculator",
"identifier": "display"
}}
```
**Returns**:
```
Value: 42
```
---
### macax_press_key
Simulate a key press.
**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): Modifier keys
**Supported modifiers**: `command`, `shift`, `option`, `control`
**Examples**:
```json
// Simple key press
{"tool": "macax_press_key", "args": {"key": "a"}}
// With modifiers (Cmd+S)
{"tool": "macax_press_key", "args": {
"key": "s",
"modifiers": ["command"]
}}
// Multiple modifiers (Cmd+Shift+N)
{"tool": "macax_press_key", "args": {
"key": "n",
"modifiers": ["command", "shift"]
}}
// Special keys
{"tool": "macax_press_key", "args": {"key": "return"}}
{"tool": "macax_press_key", "args": {"key": "escape"}}
{"tool": "macax_press_key", "args": {"key": "tab"}}
{"tool": "macax_press_key", "args": {"key": "delete"}}
```
**Special key names**:
- `return`, `enter`
- `escape`, `esc`
- `tab`
- `delete`, `backspace`
- `space`
- `up`, `down`, `left`, `right`
- `home`, `end`, `pageup`, `pagedown`
- `f1` through `f12`
## Common Roles
| Role | Description |
|------|-------------|
| `button` | Clickable button |
| `textField` | Single-line text input |
| `textArea` | Multi-line text input |
| `checkbox` | Checkbox control |
| `radioButton` | Radio button |
| `popUpButton` | Dropdown/popup menu |
| `slider` | Slider control |
| `table` | Table view |
| `list` | List view |
| `outline` | Outline/tree view |
| `group` | Container group |
| `window` | Application window |
| `sheet` | Modal sheet |
| `dialog` | Dialog window |
| `staticText` | Non-editable text |
| `image` | Image element |
| `scrollArea` | Scrollable container |
| `toolbar` | Toolbar |
| `menuBar` | Menu bar |
| `menu` | Menu |
| `menuItem` | Menu item |
## Best Practices
### 1. Use Accessibility Identifiers
When building apps you'll automate with G3, add accessibility identifiers:
**SwiftUI**:
```swift
Button("Submit") { ... }
.accessibilityIdentifier("submit_button")
```
**UIKit**:
```swift
button.accessibilityIdentifier = "submit_button"
```
**AppKit**:
```swift
button.setAccessibilityIdentifier("submit_button")
```
Identifiers are more reliable than titles (which may be localized).
### 2. Inspect Before Automating
Always inspect the UI tree first:
```json
{"tool": "macax_get_ui_tree", "args": {"app_name": "MyApp", "max_depth": 4}}
```
This helps you understand:
- Element hierarchy
- Available identifiers
- Correct role names
### 3. Activate App First
Some actions require the app to be frontmost:
```json
{"tool": "macax_activate_app", "args": {"app_name": "MyApp"}}
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "button1"}}
```
### 4. Handle Timing
UI updates may take time. If an element isn't found:
1. Wait briefly
2. Retry the operation
3. Check if the app state changed
### 5. Prefer Identifiers Over Titles
```json
// Good: Uses identifier
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "save_btn"}}
// Less reliable: Uses title (may be localized)
{"tool": "macax_click", "args": {"app_name": "MyApp", "title": "Save"}}
```
## Example: Automating Calculator
```json
// 1. Activate Calculator
{"tool": "macax_activate_app", "args": {"app_name": "Calculator"}}
// 2. Inspect UI
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
// 3. Click "5"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "5"}}
// 4. Click "+"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "+"}}
// 5. Click "3"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "3"}}
// 6. Click "="
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "="}}
// 7. Read result
{"tool": "macax_get_value", "args": {"app_name": "Calculator", "role": "staticText"}}
```
## Troubleshooting
### "Accessibility permission denied"
1. Check System Preferences → Security & Privacy → Accessibility
2. Ensure your terminal app is listed and checked
3. Restart the terminal after granting permission
### "Application not found"
1. Use exact app name (case-sensitive)
2. Run `macax_list_apps` to see available apps
3. App must be running
### "Element not found"
1. Inspect UI tree to verify element exists
2. Check identifier/title spelling
3. Element may be in a different window or sheet
4. App state may have changed
### "Cannot perform action"
1. Element may be disabled
2. App may need to be frontmost
3. Element may not support the action
4. Check element role supports the operation
### Slow Performance
1. Reduce `max_depth` in `macax_get_ui_tree`
2. Use specific identifiers instead of searching
3. Complex apps have large UI trees
## Comparison with Other Tools
| Feature | macax | Vision Tools | WebDriver |
|---------|-------|--------------|----------|
| Native apps | ✅ | ✅ (via OCR) | ❌ |
| Web browsers | ✅ | ✅ | ✅ |
| Electron apps | ✅ | ✅ | Partial |
| Reliability | High | Medium | High |
| Setup | Permissions | None | Driver |
| Speed | Fast | Slower | Medium |
**Use macax when**:
- Automating native macOS apps
- You control the app and can add identifiers
- Need reliable, fast automation
**Use Vision tools when**:
- App doesn't expose accessibility
- Need to find text visually
- Cross-platform approach needed
**Use WebDriver when**:
- Automating web content
- Need JavaScript execution
- Testing web applications

408
docs/providers.md Normal file
View File

@@ -0,0 +1,408 @@
# G3 LLM Providers Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-providers/src/`
## Purpose
This document describes the LLM providers supported by G3, their capabilities, and how to choose between them.
## Provider Overview
| Provider | Type | Tool Calling | Cache Control | Context Window | Best For |
|----------|------|--------------|---------------|----------------|----------|
| **Anthropic** | Cloud | Native | Yes | 200k (1M optional) | General use, complex tasks |
| **Databricks** | Cloud | Native | Yes (Claude models) | Varies | Enterprise, existing Databricks users |
| **OpenAI** | Cloud | Native | No | 128k | GPT model preference |
| **OpenAI-Compatible** | Cloud | Native | No | Varies | OpenRouter, Groq, Together, etc. |
| **Embedded** | Local | JSON fallback | No | 4k-32k | Privacy, offline, cost savings |
## Anthropic
**Location**: `crates/g3-providers/src/anthropic.rs`
### Features
- **Native tool calling**: Full support for structured tool calls
- **Prompt caching**: Reduce costs with ephemeral caching
- **Extended context**: Optional 1M token context (additional cost)
- **Extended thinking**: Budget tokens for complex reasoning
- **Streaming**: Real-time response streaming
### Configuration
```toml
[providers.anthropic.default]
api_key = "sk-ant-api03-..." # Required
model = "claude-sonnet-4-5" # Model name
max_tokens = 64000 # Max output tokens
temperature = 0.3 # 0.0-1.0
cache_config = "ephemeral" # Optional: Enable caching
enable_1m_context = true # Optional: 1M context
thinking_budget_tokens = 10000 # Optional: Extended thinking
```
### Available Models
| Model | Context | Best For |
|-------|---------|----------|
| `claude-sonnet-4-5` | 200k | Balanced performance/cost |
| `claude-opus-4-5` | 200k | Complex reasoning |
| `claude-3-5-sonnet-20241022` | 200k | Previous generation |
| `claude-3-opus-20240229` | 200k | Previous generation |
### Prompt Caching
Enable caching to reduce costs for repeated context:
```toml
cache_config = "ephemeral" # Cache for session duration
```
Caching is applied to:
- System prompts
- README/AGENTS.md content
- Large tool results
### Extended Thinking
For complex tasks requiring step-by-step reasoning:
```toml
thinking_budget_tokens = 10000 # Tokens for internal reasoning
```
The model uses these tokens for planning before responding.
---
## Databricks
**Location**: `crates/g3-providers/src/databricks.rs`
### Features
- **Foundation Model APIs**: Access to various models
- **OAuth authentication**: Secure browser-based auth
- **Token authentication**: Personal access tokens
- **Enterprise integration**: Works with existing Databricks setup
### Configuration
```toml
[providers.databricks.default]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true # Recommended
# token = "dapi..." # Alternative: PAT
```
### Authentication
**OAuth (Recommended)**:
1. Set `use_oauth = true`
2. On first run, browser opens for authentication
3. Tokens are cached in `~/.databricks/oauth-tokens.json`
4. Tokens refresh automatically
**Personal Access Token**:
1. Generate token in Databricks workspace
2. Set `token = "dapi..."` and `use_oauth = false`
### Available Models
Models depend on your Databricks workspace configuration:
- `databricks-claude-sonnet-4` (Claude via Databricks)
- `databricks-meta-llama-3-1-70b-instruct`
- `databricks-dbrx-instruct`
- Custom fine-tuned models
---
## OpenAI
**Location**: `crates/g3-providers/src/openai.rs`
### Features
- **Native tool calling**: Full support
- **Custom endpoints**: Override base URL
- **Streaming**: Real-time responses
### Configuration
```toml
[providers.openai.default]
api_key = "sk-..." # Required
model = "gpt-4-turbo" # Model name
max_tokens = 4096
temperature = 0.1
# base_url = "https://api.openai.com/v1" # Optional
```
### Available Models
| Model | Context | Notes |
|-------|---------|-------|
| `gpt-4-turbo` | 128k | Latest GPT-4 |
| `gpt-4o` | 128k | Optimized GPT-4 |
| `gpt-4` | 8k | Original GPT-4 |
| `gpt-3.5-turbo` | 16k | Faster, cheaper |
---
## OpenAI-Compatible Providers
**Location**: `crates/g3-providers/src/openai.rs` (reuses OpenAI implementation)
For services that implement the OpenAI API format.
### Configuration
```toml
# OpenRouter
[providers.openai_compatible.openrouter]
api_key = "sk-or-..."
model = "anthropic/claude-3.5-sonnet"
base_url = "https://openrouter.ai/api/v1"
max_tokens = 4096
temperature = 0.1
# Groq
[providers.openai_compatible.groq]
api_key = "gsk_..."
model = "llama-3.3-70b-versatile"
base_url = "https://api.groq.com/openai/v1"
max_tokens = 4096
temperature = 0.1
# Together
[providers.openai_compatible.together]
api_key = "..."
model = "meta-llama/Llama-3-70b-chat-hf"
base_url = "https://api.together.xyz/v1"
max_tokens = 4096
temperature = 0.1
```
### Supported Services
- **OpenRouter**: Access to many models through one API
- **Groq**: Fast inference for Llama models
- **Together**: Open-source model hosting
- **Anyscale**: Scalable model serving
- **Local servers**: Ollama, vLLM, text-generation-inference
---
## Embedded (Local Models)
**Location**: `crates/g3-providers/src/embedded.rs`
### Features
- **Completely local**: No data leaves your machine
- **Offline capable**: Works without internet
- **GPU acceleration**: Metal (macOS), CUDA (Linux)
- **No API costs**: Free after model download
### Configuration
```toml
[providers.embedded.default]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen" # Model architecture
context_length = 32768 # Context window
max_tokens = 2048 # Max output
temperature = 0.1
gpu_layers = 32 # GPU offload (0 = CPU only)
threads = 8 # CPU threads
```
### Supported Model Types
| Type | Models | Notes |
|------|--------|-------|
| `qwen` | Qwen 2.5 series | Good coding ability |
| `codellama` | Code Llama | Specialized for code |
| `llama` | Llama 2/3 | General purpose |
| `mistral` | Mistral/Mixtral | Efficient |
### Model Download
Download GGUF models from Hugging Face:
```bash
mkdir -p ~/.cache/g3/models
cd ~/.cache/g3/models
# Example: Qwen 2.5 7B
wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
```
### Hardware Requirements
| Model Size | RAM Required | GPU VRAM | Notes |
|------------|--------------|----------|-------|
| 7B Q4 | 6GB | 4GB | Good for most tasks |
| 7B Q8 | 10GB | 8GB | Better quality |
| 13B Q4 | 10GB | 8GB | More capable |
| 70B Q4 | 48GB | 40GB | Requires high-end hardware |
### GPU Acceleration
**macOS (Metal)**:
```toml
gpu_layers = 32 # Offload layers to GPU
```
**Linux (CUDA)**:
Requires CUDA toolkit installed.
**CPU Only**:
```toml
gpu_layers = 0
threads = 8 # Use more threads
```
### Tool Calling
Embedded models don't have native tool calling. G3 uses JSON fallback:
1. System prompt includes tool definitions as JSON
2. Model outputs tool calls as JSON in response
3. G3 parses JSON and executes tools
This works but is less reliable than native tool calling.
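The parsing step boils down to pulling the first balanced `{...}` span out of the model's free-form output before handing it to a JSON parser. A simplified sketch (it ignores braces inside string literals, which a real parser must handle):

```rust
/// Extract the first balanced `{...}` block from model output.
fn extract_json_block(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    for (i, c) in text[start..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced: no complete JSON object found
}

fn main() {
    let output = "I'll list the files: {\"tool\": \"shell\", \"args\": {\"command\": \"ls\"}} done";
    println!("{:?}", extract_json_block(output));
}
```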
---
## Provider Selection Guide
### By Use Case
| Use Case | Recommended Provider |
|----------|---------------------|
| General coding tasks | Anthropic (Claude Sonnet) |
| Complex reasoning | Anthropic (Claude Opus) |
| Enterprise/compliance | Databricks |
| Cost-sensitive | Embedded or Groq |
| Privacy-critical | Embedded |
| Offline development | Embedded |
| Fast iteration | Groq (Llama) |
| Model variety | OpenRouter |
### By Priority
**Quality first**: Anthropic Claude Opus/Sonnet
- Best reasoning and coding ability
- Native tool calling
- Prompt caching for efficiency
**Cost first**: Embedded or OpenAI-compatible
- Embedded: Free after download
- Groq: Very cheap, fast
- OpenRouter: Pay-per-use, many options
**Privacy first**: Embedded
- Data never leaves your machine
- No API calls
- Full control
**Speed first**: Groq or Embedded with GPU
- Groq: Extremely fast inference
- Embedded with Metal/CUDA: Low latency
---
## Provider Trait
All providers implement the `LLMProvider` trait:
```rust
#[async_trait]
pub trait LLMProvider: Send + Sync {
/// Generate a completion
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;
/// Stream a completion
async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;
/// Provider name (e.g., "anthropic.default")
fn name(&self) -> &str;
/// Model name (e.g., "claude-sonnet-4-5")
fn model(&self) -> &str;
/// Whether provider supports native tool calling
fn has_native_tool_calling(&self) -> bool;
/// Whether provider supports cache control
fn supports_cache_control(&self) -> bool;
/// Configured max tokens
fn max_tokens(&self) -> u32;
/// Configured temperature
fn temperature(&self) -> f32;
}
```
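The registry that holds these providers can be pictured as a map keyed by the `name()` strings. A toy sketch (the real `ProviderRegistry` stores trait objects; a model-name string stands in here to keep the example self-contained):

```rust
use std::collections::HashMap;

/// Toy registry: maps "type.name" references to a model label,
/// standing in for `Box<dyn LLMProvider>`.
struct Registry {
    default_provider: String,
    providers: HashMap<String, String>,
}

impl Registry {
    /// Look up an explicit reference, falling back to the configured default.
    fn resolve(&self, reference: Option<&str>) -> Option<&str> {
        let key = reference.unwrap_or(&self.default_provider);
        self.providers.get(key).map(String::as_str)
    }
}

fn demo_registry() -> Registry {
    let mut providers = HashMap::new();
    providers.insert("anthropic.default".to_string(), "claude-sonnet-4-5".to_string());
    Registry { default_provider: "anthropic.default".to_string(), providers }
}

fn main() {
    println!("{:?}", demo_registry().resolve(None));
}
```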
---
## Adding a New Provider
1. Create `crates/g3-providers/src/newprovider.rs`
2. Implement `LLMProvider` trait
3. Add configuration struct to `crates/g3-config/src/lib.rs`
4. Register in `crates/g3-core/src/lib.rs` (`new_with_mode_and_readme`)
5. Export from `crates/g3-providers/src/lib.rs`
6. Update documentation
---
## Troubleshooting
### Authentication Errors
**Anthropic**: Verify API key starts with `sk-ant-`
**Databricks OAuth**:
- Delete `~/.databricks/oauth-tokens.json` and re-authenticate
- Ensure workspace URL is correct
**OpenAI**: Verify API key and check billing status
### Rate Limits
G3 automatically retries on rate limits with exponential backoff.
To reduce rate limit issues:
- Use prompt caching (Anthropic)
- Reduce `max_tokens`
- Use a provider with higher limits
### Context Window Errors
If you see "context too long" errors:
1. Use `/compact` to summarize conversation
2. Use `/thinnify` to replace large tool results
3. Increase `max_context_length` in config
4. Switch to a provider with larger context
### Embedded Model Issues
**Model not loading**:
- Verify `model_path` is correct
- Check file permissions
- Ensure enough RAM
**Slow inference**:
- Increase `gpu_layers` for GPU offload
- Reduce `context_length`
- Use a smaller quantization (Q4 vs Q8)
**Poor tool calling**:
- Embedded models use JSON fallback
- Consider cloud provider for complex tool use

---

**File**: `docs/tools.md` (new file, 538 lines)
# G3 Tools Reference
**Last updated**: January 2025
**Source of truth**: `crates/g3-core/src/tool_definitions.rs`, `crates/g3-core/src/tools/`
## Purpose
This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.
## Tool Categories
| Category | Tools | Enabled By |
|----------|-------|------------|
| **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
| **Images** | read_image, take_screenshot, extract_text | Always |
| **Task Management** | todo_read, todo_write | Always |
| **Code Intelligence** | code_search, code_coverage | Always |
| **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` |
| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
| **macOS Accessibility** | macax_* (9 tools) | `--macax` |
| **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |
---
## Core Tools
### shell
Execute shell commands.
**Parameters**:
- `command` (string, required): The shell command to execute
**Example**:
```json
{"tool": "shell", "args": {"command": "ls -la"}}
```
**Notes**:
- Commands run in the current working directory
- Output is streamed in real-time
- Both stdout and stderr are captured
- Exit code is reported
---
### background_process
Launch a long-running process in the background.
**Parameters**:
- `name` (string, required): Unique name for the process (e.g., "game_server")
- `command` (string, required): Shell command to execute
- `working_dir` (string, optional): Working directory
**Example**:
```json
{"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}
```
**Returns**: PID and log file path
**Notes**:
- Process runs independently of the agent
- Logs are captured to a file
- Use `shell` to read logs (`tail`), check status (`ps`), or stop (`kill`)
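**Example** (illustrative — the actual log path is reported by `background_process` when it starts):
```json
{"tool": "shell", "args": {"command": "tail -n 50 /path/to/dev_server.log"}}
```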
---
### read_file
Read file contents with optional character range.
**Parameters**:
- `file_path` (string, required): Path to the file
- `start` (integer, optional): Starting character position (0-indexed, inclusive)
- `end` (integer, optional): Ending character position (0-indexed, exclusive)
**Example**:
```json
{"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}
```
**Notes**:
- Image files (png, jpg, gif, etc.) are automatically run through OCR and the extracted text is returned
- Supports tilde expansion (`~`)
- Reports file size and line count
---
### read_image
Read image files for visual analysis by the LLM.
**Parameters**:
- `file_paths` (array of strings, required): Paths to image files
**Example**:
```json
{"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}
```
**Supported formats**: PNG, JPEG, GIF, WebP
**Notes**:
- Images are sent to the LLM for visual analysis
- Use for inspecting sprites, UI screenshots, diagrams, etc.
- Different from `extract_text` which only does OCR
---
### write_file
Create or overwrite a file.
**Parameters**:
- `file_path` (string, required): Path to the file
- `content` (string, required): Content to write
**Example**:
```json
{"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}
```
**Notes**:
- Creates parent directories if needed
- Overwrites existing files
- Reports bytes written
---
### str_replace
Apply a unified diff to a file.
**Parameters**:
- `file_path` (string, required): Path to the file
- `diff` (string, required): Unified diff with context lines
- `start` (integer, optional): Starting character position to constrain search
- `end` (integer, optional): Ending character position to constrain search
**Example**:
```json
{"tool": "str_replace", "args": {
"file_path": "src/main.rs",
"diff": "@@ -10,3 +10,4 @@\n fn main() {\n println!(\"Hello\");\n+ println!(\"World\");\n }"
}}
```
**Notes**:
- Supports multiple hunks
- Context lines help locate the correct position
- Use `start`/`end` to disambiguate when multiple matches exist
- `---/+++` headers are optional for minimal diffs
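**Example with a constrained search window** (positions and diff content are illustrative):
```json
{"tool": "str_replace", "args": {
  "file_path": "src/main.rs",
  "diff": "@@ -42,2 +42,2 @@\n-    let x = 1;\n+    let x = 2;\n     println!(\"{x}\");",
  "start": 500,
  "end": 1500
}}
```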
---
### final_output
Signal task completion with a summary.
**Parameters**:
- `summary` (string, required): Markdown summary of what was accomplished
**Example**:
```json
{"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}
```
**Notes**:
- Ends the current task
- Summary is displayed to the user
- In autonomous mode, triggers coach review
---
## Image & Screenshot Tools
### take_screenshot
Capture a screenshot of an application window.
**Parameters**:
- `path` (string, required): Filename for the screenshot
- `window_id` (string, required): Application name (e.g., "Safari", "Terminal")
- `region` (object, optional): `{x, y, width, height}` to capture a region
**Example**:
```json
{"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}
```
**Notes**:
- Use `list_windows` first to identify available windows
- Relative paths save to `~/tmp` or `$TMPDIR`
- Uses native screencapture on macOS
---
### extract_text
Extract text from an image using OCR.
**Parameters**:
- `path` (string, optional): Path to image file
**Example**:
```json
{"tool": "extract_text", "args": {"path": "screenshot.png"}}
```
**Notes**:
- Uses Tesseract OCR or Apple Vision framework
- For window-based OCR, use `vision_find_text` instead
---
## Task Management Tools
### todo_read
Read the current TODO list.
**Parameters**: None
**Example**:
```json
{"tool": "todo_read", "args": {}}
```
**Notes**:
- TODO lists are session-scoped
- Stored in `.g3/sessions/<session_id>/todo.g3.md`
- Call at start of multi-step tasks to check for existing plans
---
### todo_write
Create or update the TODO list.
**Parameters**:
- `content` (string, required): TODO list content in markdown checkbox format
**Example**:
```json
{"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n - [ ] Write tests\n - [ ] Update docs\n- [x] Setup project"}}
```
**Notes**:
- Replaces entire file content
- Always call `todo_read` first to preserve existing content
- Use `- [ ]` for incomplete, `- [x]` for complete
- Supports nested tasks with indentation
---
## Code Intelligence Tools
### code_search
Syntax-aware code search using tree-sitter.
**Parameters**:
- `searches` (array, required): Array of search objects:
- `name` (string): Label for this search
- `query` (string): Tree-sitter query in S-expression format
- `language` (string): Programming language
- `paths` (array, optional): Paths to search
- `context_lines` (integer, optional): Lines of context (0-20)
- `max_concurrency` (integer, optional): Parallel searches (default: 4)
- `max_matches_per_search` (integer, optional): Max matches (default: 500)
**Supported languages**: rust, python, javascript, typescript, go, java, c, cpp, kotlin
**Example**:
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust",
"context_lines": 2
}]
}}
```
See [Code Search Guide](CODE_SEARCH.md) for detailed query patterns.
---
### code_coverage
Generate code coverage report using cargo llvm-cov.
**Parameters**: None
**Example**:
```json
{"tool": "code_coverage", "args": {}}
```
**Notes**:
- Runs all tests with coverage instrumentation
- Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
- Returns coverage statistics summary
---
## WebDriver Tools
Enabled with `--webdriver` (Safari) or `--chrome-headless` (Chrome).
### webdriver_start
Start a browser session.
**Example**:
```json
{"tool": "webdriver_start", "args": {}}
```
### webdriver_navigate
Navigate to a URL.
**Parameters**:
- `url` (string, required): URL with protocol (e.g., `https://`)
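**Example**:
```json
{"tool": "webdriver_navigate", "args": {"url": "https://example.com"}}
```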
### webdriver_get_url / webdriver_get_title
Get current URL or page title.
### webdriver_find_element / webdriver_find_elements
Find element(s) by CSS selector.
**Parameters**:
- `selector` (string, required): CSS selector
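**Example** (selector is illustrative):
```json
{"tool": "webdriver_find_elements", "args": {"selector": "a.nav-link"}}
```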
### webdriver_click
Click an element.
**Parameters**:
- `selector` (string, required): CSS selector
### webdriver_send_keys
Type text into an input.
**Parameters**:
- `selector` (string, required): CSS selector
- `text` (string, required): Text to type
- `clear_first` (boolean, optional): Clear before typing (default: true)
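**Example** (selector and text are illustrative):
```json
{"tool": "webdriver_send_keys", "args": {"selector": "input#search", "text": "g3 agent", "clear_first": true}}
```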
### webdriver_execute_script
Execute JavaScript.
**Parameters**:
- `script` (string, required): JavaScript code (use `return` to return values)
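**Example**:
```json
{"tool": "webdriver_execute_script", "args": {"script": "return document.title;"}}
```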
### webdriver_get_page_source
Get rendered HTML.
**Parameters**:
- `max_length` (integer, optional): Max chars to return (default: 10000, 0 for no limit)
- `save_to_file` (string, optional): Save to file instead of returning inline
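**Example** (saving the full source to a file instead of returning it inline):
```json
{"tool": "webdriver_get_page_source", "args": {"max_length": 0, "save_to_file": "page.html"}}
```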
### webdriver_screenshot
Take browser screenshot.
**Parameters**:
- `path` (string, required): Save path
### webdriver_back / webdriver_forward / webdriver_refresh
Navigation controls.
### webdriver_quit
Close browser and end session.
---
## Vision Tools (macOS)
Use Apple Vision framework for text recognition.
### vision_find_text
Find text in an application window.
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to search for
**Returns**: Bounding box coordinates and confidence score
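**Example** (app and text are illustrative):
```json
{"tool": "vision_find_text", "args": {"app_name": "Safari", "text": "Sign In"}}
```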
### vision_click_text
Find and click on text.
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to click
### vision_click_near_text
Click near a text label (useful for form fields).
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Label text to find
- `direction` (string, optional): "right", "below", "left", "above" (default: "right")
- `distance` (integer, optional): Pixels from text (default: 50)
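**Example** (clicking the field to the right of a form label; values are illustrative):
```json
{"tool": "vision_click_near_text", "args": {"app_name": "Safari", "text": "Username", "direction": "right", "distance": 60}}
```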
---
## macOS Accessibility Tools
Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).
### macax_list_apps
List running applications.
### macax_get_frontmost_app
Get the frontmost application.
### macax_activate_app
Bring an application to front.
**Parameters**:
- `app_name` (string, required): Application name
### macax_get_ui_tree
Get UI element hierarchy.
**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Tree depth limit
### macax_find_elements
Find UI elements by criteria.
**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (button, textField, etc.)
- `title` (string, optional): Element title
- `identifier` (string, optional): Accessibility identifier
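**Example** (finding all buttons in a hypothetical app):
```json
{"tool": "macax_find_elements", "args": {"app_name": "Notes", "role": "button"}}
```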
### macax_click
Click a UI element.
**Parameters**:
- `app_name` (string, required): Application name
- `identifier`, `title`, or `role` (string, at least one required): Element selector
### macax_set_value / macax_get_value
Set or get element value.
### macax_press_key
Simulate key press.
**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): ["command", "shift", "option", "control"]
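**Example** (simulating Cmd+S):
```json
{"tool": "macax_press_key", "args": {"key": "s", "modifiers": ["command"]}}
```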
---
## Computer Control Tools
Enabled with `computer_control.enabled = true` in config.
### mouse_click
Click at coordinates.
**Parameters**:
- `x` (integer, required): X coordinate
- `y` (integer, required): Y coordinate
- `button` (string, optional): "left", "right", "middle"
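**Example** (coordinates are illustrative):
```json
{"tool": "mouse_click", "args": {"x": 640, "y": 360, "button": "left"}}
```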
### type_text
Type text at cursor.
**Parameters**:
- `text` (string, required): Text to type
### find_element
Find UI element by text, role, or attributes.
### list_windows
List all open windows with IDs and titles.
---
## Tool Execution Notes
### Duplicate Detection
G3 prevents accidental duplicate tool calls:
- Only immediately sequential identical calls are blocked
- Text between tool calls resets detection
- Tools can be reused throughout a session
### Error Handling
Tool errors are reported back to the agent, which can:
- Retry with different parameters
- Try an alternative approach
- Report the issue to the user
### Working Directory
Tools execute in:
1. Directory specified by `--codebase-fast-start` if provided
2. Current working directory otherwise
### File Paths
- Tilde expansion (`~`) is supported
- Relative paths are relative to working directory
- Screenshots default to `~/tmp` or `$TMPDIR`