lamport run

This commit is contained in:
Dhanji R. Prasanna
2026-01-03 16:48:30 +11:00
parent f4a1bf5e93
commit f7e2f38fe9
10 changed files with 3444 additions and 0 deletions

docs/CODE_SEARCH.md Normal file

@@ -0,0 +1,430 @@
# G3 Code Search Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-core/src/code_search/`, `crates/g3-core/src/tool_definitions.rs`
## Purpose
G3 includes a syntax-aware code search tool powered by tree-sitter. Unlike text-based search (grep), it understands code structure and finds actual functions, classes, methods, and other constructs—ignoring matches in comments and strings.
## Why Use Code Search?
| Feature | grep/ripgrep | code_search |
|---------|--------------|-------------|
| Finds text in comments | ✅ | ❌ |
| Finds text in strings | ✅ | ❌ |
| Understands code structure | ❌ | ✅ |
| Finds function definitions | Regex needed | Native |
| Finds class hierarchies | ❌ | ✅ |
| Language-aware | ❌ | ✅ |
**Use code_search when**:
- Finding function/method definitions
- Finding class/struct declarations
- Searching for specific code constructs
- You need accurate results without false positives
**Use grep when**:
- Searching non-code files (logs, markdown)
- Simple string searches
- Searching comments or documentation
- Matching text patterns with regex
## Supported Languages
- Rust
- Python
- JavaScript
- TypeScript
- Go
- Java
- C
- C++
- Kotlin
## Basic Usage
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "my_search",
"query": "(function_item name: (identifier) @name)",
"language": "rust"
}]
}}
```
### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `searches` | array | Yes | Array of search objects (max 20) |
| `max_concurrency` | integer | No | Parallel searches (default: 4) |
| `max_matches_per_search` | integer | No | Max matches (default: 500) |
### Search Object
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Label for this search |
| `query` | string | Yes | Tree-sitter query (S-expression) |
| `language` | string | Yes | Programming language |
| `paths` | array | No | Paths to search (default: current dir) |
| `context_lines` | integer | No | Lines of context (0-20, default: 0) |
## Query Syntax
Tree-sitter queries use S-expression syntax. The basic pattern is:
```
(node_type field: (child_type) @capture_name)
```
- `node_type`: The AST node to match
- `field`: Optional field name
- `child_type`: Type of child node
- `@capture_name`: Name for the captured node
## Common Query Patterns
### Rust
```lisp
;; All functions
(function_item name: (identifier) @name)
;; Functions with modifiers (matches async, but also const/unsafe/extern)
(function_item (function_modifiers) name: (identifier) @name)
;; Structs
(struct_item name: (type_identifier) @name)
;; Enums
(enum_item name: (type_identifier) @name)
;; Impl blocks
(impl_item type: (type_identifier) @name)
;; Trait definitions
(trait_item name: (type_identifier) @name)
;; Macros
(macro_definition name: (identifier) @name)
;; Constants
(const_item name: (identifier) @name)
;; Static variables
(static_item name: (identifier) @name)
;; Type aliases
(type_item name: (type_identifier) @name)
;; Modules
(mod_item name: (identifier) @name)
```
### Python
```lisp
;; Functions
(function_definition name: (identifier) @name)
;; Async functions (the "async" keyword is an anonymous token)
(function_definition "async" name: (identifier) @name)
;; Classes
(class_definition name: (identifier) @name)
;; Methods (functions inside classes)
(class_definition
body: (block
(function_definition name: (identifier) @name)))
;; Decorators
(decorator) @decorator
;; Imports
(import_statement) @import
(import_from_statement) @import
```
### JavaScript / TypeScript
```lisp
;; Function declarations
(function_declaration name: (identifier) @name)
;; Arrow functions assigned to variables
(variable_declarator
name: (identifier) @name
value: (arrow_function))
;; Classes
(class_declaration name: (identifier) @name)
;; Methods
(method_definition name: (property_identifier) @name)
;; Exports
(export_statement) @export
;; Imports
(import_statement) @import
```
### Go
```lisp
;; Functions
(function_declaration name: (identifier) @name)
;; Methods
(method_declaration name: (field_identifier) @name)
;; Structs
(type_declaration
(type_spec name: (type_identifier) @name
type: (struct_type)))
;; Interfaces
(type_declaration
(type_spec name: (type_identifier) @name
type: (interface_type)))
```
### Java
```lisp
;; Classes
(class_declaration name: (identifier) @name)
;; Interfaces
(interface_declaration name: (identifier) @name)
;; Methods
(method_declaration name: (identifier) @name)
;; Constructors
(constructor_declaration name: (identifier) @name)
;; Fields
(field_declaration
declarator: (variable_declarator name: (identifier) @name))
```
### C / C++
```lisp
;; Functions
(function_definition
declarator: (function_declarator
declarator: (identifier) @name))
;; Structs (C)
(struct_specifier name: (type_identifier) @name)
;; Classes (C++)
(class_specifier name: (type_identifier) @name)
;; Namespaces (C++)
(namespace_definition name: (identifier) @name)
```
## Advanced Queries
### Wildcards
Use `_` to match any node:
```lisp
;; Any function with any name
(function_item name: (_) @name)
```
### Alternatives
Match multiple patterns:
```lisp
;; Function items or impl blocks
[(function_item) (impl_item)] @item
```
### Predicates
Filter matches:
```lisp
;; Functions starting with "test_"
(function_item name: (identifier) @name
(#match? @name "^test_"))
;; Functions NOT starting with "_"
(function_item name: (identifier) @name
(#not-match? @name "^_"))
```
### Nested Matches
```lisp
;; Methods inside impl blocks
(impl_item
body: (declaration_list
(function_item name: (identifier) @method_name)))
```
## Batch Searches
Run multiple searches in parallel:
```json
{"tool": "code_search", "args": {
"searches": [
{
"name": "functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust"
},
{
"name": "structs",
"query": "(struct_item name: (type_identifier) @name)",
"language": "rust"
},
{
"name": "tests",
"query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
"language": "rust",
"paths": ["tests/"]
}
],
"max_concurrency": 4
}}
```
## Context Lines
Include surrounding code:
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust",
"context_lines": 3
}]
}}
```
This shows 3 lines before and after each match.
## Path Filtering
Search specific directories:
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "core_functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust",
"paths": ["src/core", "src/lib.rs"]
}]
}}
```
## Output Format
Results include:
- File path
- Line number
- Matched code
- Context (if requested)
```
=== functions (15 matches) ===
src/lib.rs:42
fn process_request(req: Request) -> Response {
src/lib.rs:78
fn handle_error(err: Error) -> Result<()> {
src/utils.rs:15
fn format_output(data: &str) -> String {
```
## Tips
### Finding the Right Query
1. **Start simple**: Begin with basic node types
2. **Use AST explorer**: Understand your language's AST
3. **Iterate**: Refine queries based on results
### Performance
- **Limit paths**: Search specific directories when possible
- **Use concurrency**: Batch related searches
- **Set max_matches**: Prevent overwhelming output
### Debugging Queries
If a query returns no results:
1. Check language spelling (lowercase)
2. Verify node type names for your language
3. Start with simpler query, add constraints
4. Check if files exist in search paths
## Examples by Task
### Find all public functions in Rust
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "public_fns",
"query": "(function_item (visibility_modifier) name: (identifier) @name)",
"language": "rust"
}]
}}
```
### Find all test functions
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "tests",
"query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
"language": "rust",
"paths": ["tests/"]
}]
}}
```
### Find all API endpoints (Python Flask)
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "routes",
"query": "(decorated_definition (decorator) @dec (function_definition name: (identifier) @name))",
"language": "python"
}]
}}
```
### Find all React components
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "components",
"query": "(function_declaration name: (identifier) @name (#match? @name \"^[A-Z]\"))",
"language": "javascript",
"paths": ["src/components/"]
}]
}}
```

docs/CONTROL_COMMANDS.md Normal file

@@ -0,0 +1,224 @@
# G3 Control Commands
**Last updated**: January 2025
**Source of truth**: `crates/g3-cli/src/lib.rs`
## Purpose
Control commands are special commands you can use during an interactive G3 session to manage context, refresh documentation, and view statistics. They start with `/` and are processed by the CLI, not sent to the LLM.
## Available Commands
| Command | Description |
|---------|-------------|
| `/compact` | Manually trigger conversation summarization |
| `/thinnify` | Replace large tool results with file references (first third of context) |
| `/skinnify` | Full context thinning (entire context window) |
| `/readme` | Reload README.md and AGENTS.md from disk |
| `/stats` | Show detailed context and performance statistics |
| `/help` | Display all available control commands |
---
## /compact
Manually trigger conversation summarization to reduce context size.
**When to use**:
- Context usage is getting high (70%+)
- You want to start a new phase of work
- Conversation has accumulated irrelevant history
**What it does**:
1. Sends conversation history to LLM for summarization
2. Replaces detailed history with concise summary
3. Preserves key decisions and context
4. Significantly reduces token usage
**Example**:
```
g3> /compact
📝 Compacting conversation history...
✅ Reduced context from 45,000 to 8,000 tokens (82% reduction)
```
**Notes**:
- Summarization uses tokens, so there's a small cost
- Some detail is lost; use before major context shifts
- Auto-triggered at 80% context usage if `auto_compact = true`
---
## /thinnify
Replace large tool results with file references to save context space.
**When to use**:
- Large file contents are consuming context
- Tool outputs are taking up space
- You want to preserve conversation structure but reduce size
**What it does**:
1. Scans the first third of context for large tool results
2. Saves content to `.g3/sessions/<session>/thinned/`
3. Replaces inline content with file reference
4. Preserves the ability to re-read if needed
**Example**:
```
g3> /thinnify
🔧 Thinning context window...
✅ Thinned 3 large tool results, saved 12,000 characters
```
**Notes**:
- Only processes the first third of context (older content)
- Recent tool results are preserved inline
- Auto-triggered at 50%, 60%, 70%, 80% thresholds
---
## /skinnify
Full context thinning - processes the entire context window.
**When to use**:
- Context is critically full
- `/thinnify` wasn't enough
- You need maximum space recovery
**What it does**:
- Same as `/thinnify` but processes entire context
- More aggressive space recovery
- May thin recent tool results too
**Example**:
```
g3> /skinnify
🔧 Full context thinning...
✅ Thinned 8 tool results, saved 35,000 characters
```
**Notes**:
- Use sparingly; may thin content you still need inline
- Consider `/compact` first for better context preservation
---
## /readme
Reload README.md and AGENTS.md from disk without restarting.
**When to use**:
- You've updated project documentation
- AGENTS.md has new instructions
- README.md has changed
**What it does**:
1. Re-reads README.md from workspace root
2. Re-reads AGENTS.md from workspace root
3. Updates the agent's system context
4. New instructions take effect immediately
**Example**:
```
g3> /readme
📖 Reloading documentation...
✅ Loaded README.md (5,234 chars)
✅ Loaded AGENTS.md (2,100 chars)
```
**Notes**:
- Useful during iterative documentation updates
- Changes apply to subsequent messages
- Previous context retains old documentation
---
## /stats
Show detailed context and performance statistics.
**What it shows**:
- Current context usage (tokens and percentage)
- Session duration
- Token usage breakdown
- Tool call metrics
- Thinning and summarization events
- First-token latency statistics
**Example**:
```
g3> /stats
📊 Session Statistics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Context Usage: 45,230 / 200,000 tokens (22.6%)
Session Duration: 1h 23m 45s
Total Tokens Used: 125,430
Tool Calls: 47 (45 successful, 2 failed)
Thinning Events: 3 (saved 28,000 chars)
Summarizations: 1 (saved 35,000 chars)
Avg First Token: 1.2s
```
---
## /help
Display all available control commands with brief descriptions.
**Example**:
```
g3> /help
📚 Available Commands:
/compact - Summarize conversation to reduce context
/thinnify - Replace large tool results with file refs
/skinnify - Full context thinning (entire window)
/readme - Reload README.md and AGENTS.md
/stats - Show context and performance statistics
/help - Show this help message
```
---
## Context Management Strategy
G3 automatically manages context, but manual intervention can help:
### Proactive Management
1. **Check stats regularly**: Use `/stats` to monitor usage
2. **Thin early**: Use `/thinnify` before hitting thresholds
3. **Compact at transitions**: Use `/compact` when switching tasks
### Reactive Management
When context gets high:
1. **50-70%**: Consider `/thinnify`
2. **70-80%**: Use `/compact`
3. **80-90%**: Use `/skinnify` then `/compact`
4. **90%+**: Auto-summarization triggers
### Best Practices
- **Long sessions**: Compact periodically to maintain quality
- **Large files**: Thin after reading large codebases
- **Documentation updates**: Use `/readme` instead of restarting
- **Before complex tasks**: Ensure adequate context space
---
## Automatic Context Management
G3 performs automatic context management:
| Threshold | Action |
|-----------|--------|
| 50% | Thin oldest third of context |
| 60% | Thin oldest third of context |
| 70% | Thin oldest third of context |
| 80% | Auto-summarization (if `auto_compact = true`) |
| 90% | Aggressive thinning before tool calls |
Manual commands give you finer control over when and how this happens.
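The threshold table above can be expressed as a small decision function. This is an illustrative sketch only: the percentages and actions come from the table, while the function name and return labels are made up, and the behaviour at 80% with `auto_compact` disabled is an assumption.

```python
def context_action(usage_pct: float, auto_compact: bool = True) -> str:
    """Map context usage to the automatic action G3 would take.

    Hypothetical helper mirroring the threshold table; not g3-core's API.
    """
    if usage_pct >= 90:
        return "aggressive-thin"          # aggressive thinning before tool calls
    if usage_pct >= 80:
        # Assumption: without auto_compact, thinning continues as at lower tiers.
        return "summarize" if auto_compact else "thin-oldest-third"
    if usage_pct >= 50:
        return "thin-oldest-third"        # 50/60/70% tiers thin the oldest third
    return "none"

print(context_action(55))                        # thin-oldest-third
print(context_action(85))                        # summarize
print(context_action(85, auto_compact=False))    # thin-oldest-third
print(context_action(95))                        # aggressive-thin
```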

docs/FLOCK_MODE.md Normal file

@@ -0,0 +1,397 @@
# G3 Flock Mode Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-ensembles/src/flock.rs`
## Purpose
Flock mode enables parallel multi-agent development by spawning multiple G3 agent instances that work on different parts of a project simultaneously. This is useful for large projects with modular architectures where independent components can be developed in parallel.
## Overview
In Flock mode:
- Multiple agent instances run concurrently
- Each agent works on a specific module or component
- Agents operate independently but share the same codebase
- Progress is tracked and coordinated centrally
```
┌─────────────────────────────────────────────────────────┐
│ Flock Coordinator │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Agent 1 │ │ Agent 2 │ │ Agent 3 │ │ Agent N │ │
│ │ Module A│ │ Module B│ │ Module C│ │ Module N│ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Shared Codebase │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
## When to Use Flock Mode
**Good candidates**:
- Microservices architectures
- Projects with independent modules
- Large refactoring across multiple files
- Parallel feature development
- Test suite expansion
**Not recommended for**:
- Tightly coupled code
- Sequential dependencies
- Small projects
- Single-file changes
## Configuration
Flock mode is configured through a YAML manifest file:
```yaml
# flock.yaml
name: "my-project-flock"
description: "Parallel development of project modules"
# Global settings
settings:
max_agents: 4
timeout_minutes: 60
provider: "anthropic.default"
# Agent definitions
agents:
- name: "api-agent"
description: "Develops the REST API layer"
working_dir: "src/api"
requirements: |
Implement REST endpoints for user management:
- GET /users
- POST /users
- GET /users/{id}
- PUT /users/{id}
- DELETE /users/{id}
- name: "db-agent"
description: "Develops the database layer"
working_dir: "src/db"
requirements: |
Implement database models and queries:
- User model with CRUD operations
- Connection pooling
- Migration support
- name: "test-agent"
description: "Writes integration tests"
working_dir: "tests"
requirements: |
Write integration tests for:
- API endpoints
- Database operations
- Error handling
```
## Usage
### Starting a Flock
```bash
# Start flock with manifest
g3 --flock flock.yaml
# Start with specific agents only
g3 --flock flock.yaml --agents api-agent,db-agent
# Start with custom timeout
g3 --flock flock.yaml --timeout 120
```
### Monitoring Progress
Flock mode provides real-time status updates:
```
🐦 Flock Status: my-project-flock
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
api-agent [████████░░] 80% Implementing DELETE endpoint
db-agent [██████████] 100% ✅ Complete
test-agent [██████░░░░] 60% Writing error handling tests
Elapsed: 15m 32s | Tokens: 45,230 | Errors: 0
```
### Stopping a Flock
```bash
# Graceful stop (wait for current tasks)
Ctrl+C
# Force stop all agents
Ctrl+C Ctrl+C
```
## Agent Communication
Agents in a flock operate independently but can:
1. **Read shared files**: All agents can read the entire codebase
2. **Write to their area**: Each agent writes to its designated working directory
3. **Signal completion**: Agents report when their tasks are done
4. **Report errors**: Failures are logged and can trigger coordinator action
### Conflict Prevention
To prevent conflicts:
- Assign non-overlapping working directories
- Use clear module boundaries
- Define explicit interfaces between modules
- Run integration after all agents complete
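A pre-flight check for overlapping working directories might look like the following. This is a hypothetical helper for manifest authors; flock itself does not ship this function.

```python
from pathlib import PurePosixPath

def overlapping(dir_a: str, dir_b: str) -> bool:
    """True if the two directories are equal or one contains the other."""
    a, b = PurePosixPath(dir_a), PurePosixPath(dir_b)
    return a == b or a in b.parents or b in a.parents

def check_manifest(agents: list[dict]) -> list[tuple[str, str]]:
    """Return every pair of agent names whose working_dir values overlap."""
    conflicts = []
    for i, x in enumerate(agents):
        for y in agents[i + 1:]:
            if overlapping(x["working_dir"], y["working_dir"]):
                conflicts.append((x["name"], y["name"]))
    return conflicts

agents = [
    {"name": "agent1", "working_dir": "src"},
    {"name": "agent2", "working_dir": "src/utils"},  # nested under agent1
    {"name": "tests",  "working_dir": "tests"},
]
print(check_manifest(agents))  # [('agent1', 'agent2')]
```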
## Status Tracking
Flock status is tracked in `.g3/flock/`:
```
.g3/flock/
├── status.json # Overall flock status
├── api-agent/
│ ├── session.json # Agent session log
│ └── todo.g3.md # Agent's TODO list
├── db-agent/
│ ├── session.json
│ └── todo.g3.md
└── test-agent/
├── session.json
└── todo.g3.md
```
### Status File Format
```json
{
"flock_name": "my-project-flock",
"started_at": "2025-01-03T10:00:00Z",
"status": "running",
"agents": [
{
"name": "api-agent",
"status": "running",
"progress": 80,
"current_task": "Implementing DELETE endpoint",
"tokens_used": 15000,
"errors": 0
}
]
}
```
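A monitoring script could roll this status file up into a single flock-level summary. A sketch: the field names follow the example above, but the helper itself and its output shape are hypothetical.

```python
import json

# A trimmed version of the status.json example above, as a string.
status_json = '''
{
  "flock_name": "my-project-flock",
  "status": "running",
  "agents": [
    {"name": "api-agent", "status": "running",  "progress": 80,  "errors": 0},
    {"name": "db-agent",  "status": "complete", "progress": 100, "errors": 0}
  ]
}
'''

def summarize(doc: str) -> dict:
    """Aggregate per-agent progress into one summary (hypothetical helper)."""
    status = json.loads(doc)
    agents = status["agents"]
    return {
        "flock": status["flock_name"],
        "overall_progress": sum(a["progress"] for a in agents) // len(agents),
        "errors": sum(a["errors"] for a in agents),
        "done": all(a["status"] == "complete" for a in agents),
    }

print(summarize(status_json))
# {'flock': 'my-project-flock', 'overall_progress': 90, 'errors': 0, 'done': False}
```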
## Best Practices
### 1. Define Clear Boundaries
```yaml
# Good: Clear module separation
agents:
- name: "frontend"
working_dir: "src/frontend"
- name: "backend"
working_dir: "src/backend"
# Bad: Overlapping directories
agents:
- name: "agent1"
working_dir: "src"
- name: "agent2"
working_dir: "src/utils" # Overlaps with agent1!
```
### 2. Specify Interfaces First
Define shared interfaces before parallel development:
```yaml
agents:
- name: "interface-agent"
priority: 1 # Runs first
requirements: |
Define shared interfaces in src/interfaces/:
- UserService trait
- DatabaseConnection trait
- Error types
- name: "impl-agent"
priority: 2 # Runs after interfaces
depends_on: ["interface-agent"]
requirements: |
Implement UserService trait...
```
### 3. Use Appropriate Granularity
- **Too few agents**: Doesn't leverage parallelism
- **Too many agents**: Coordination overhead, potential conflicts
- **Sweet spot**: 2-6 agents for most projects
### 4. Include a Test Agent
Always include an agent for testing:
```yaml
agents:
- name: "test-agent"
working_dir: "tests"
requirements: |
Write tests for all new functionality.
Run tests after other agents complete.
```
### 5. Plan for Integration
After flock completion:
```bash
# Run all tests
cargo test
# Check for conflicts
git status
# Review changes
git diff
```
## Error Handling
### Agent Failures
If an agent fails:
1. Error is logged to agent's session
2. Coordinator is notified
3. Other agents continue (by default)
4. Failed agent can be restarted
### Restart Failed Agent
```bash
# Restart specific agent
g3 --flock flock.yaml --restart api-agent
# Restart all failed agents
g3 --flock flock.yaml --restart-failed
```
### Conflict Resolution
If agents modify the same file:
1. Last write wins (by default)
2. Conflicts are logged
3. Manual resolution may be needed
## Resource Management
### Token Usage
Each agent has its own token budget:
```yaml
settings:
max_tokens_per_agent: 100000
total_token_budget: 500000
```
### Concurrency
Limit concurrent agents based on:
- API rate limits
- System resources
- Provider capacity
```yaml
settings:
max_concurrent_agents: 3 # Run at most 3 at once
```
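The concurrency cap amounts to a semaphore gating agent launches. The sketch below is illustrative only: G3's coordinator is Rust/Tokio, and `run_agent` here is a stand-in for spawning a real agent.

```python
import asyncio

async def run_agent(name: str, sem: asyncio.Semaphore) -> str:
    """Stand-in for launching one flock agent (hypothetical)."""
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for the real agent run
        return f"{name}: done"

async def run_flock(names: list[str], max_concurrent: int = 3) -> list[str]:
    # Mirrors max_concurrent_agents: at most 3 agents hold the semaphore at once.
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(run_agent(n, sem) for n in names))

results = asyncio.run(
    run_flock(["user-service", "order-service", "gateway", "tests"])
)
print(results)
```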
## Example: Microservices Project
```yaml
name: "microservices-flock"
settings:
max_agents: 5
provider: "anthropic.default"
agents:
- name: "user-service"
working_dir: "services/user"
requirements: |
Implement user service:
- User registration
- Authentication
- Profile management
- name: "order-service"
working_dir: "services/order"
requirements: |
Implement order service:
- Order creation
- Order status tracking
- Payment integration
- name: "inventory-service"
working_dir: "services/inventory"
requirements: |
Implement inventory service:
- Stock management
- Availability checking
- Reorder alerts
- name: "gateway"
working_dir: "services/gateway"
requirements: |
Implement API gateway:
- Request routing
- Authentication middleware
- Rate limiting
- name: "integration-tests"
working_dir: "tests/integration"
depends_on: ["user-service", "order-service", "inventory-service", "gateway"]
requirements: |
Write integration tests for:
- End-to-end order flow
- Service communication
- Error scenarios
```
## Limitations
- **No real-time coordination**: Agents don't communicate during execution
- **File conflicts**: Possible if boundaries aren't clear
- **Resource intensive**: Multiple LLM calls in parallel
- **Debugging complexity**: Multiple logs to review
## Troubleshooting
### Agents Not Starting
1. Check manifest syntax (YAML)
2. Verify working directories exist
3. Check provider configuration
4. Review logs in `.g3/flock/`
### Slow Progress
1. Reduce number of concurrent agents
2. Check for rate limiting
3. Simplify requirements
4. Use faster provider
### Inconsistent Results
1. Define clearer interfaces
2. Add more specific requirements
3. Use lower temperature
4. Add validation steps

docs/architecture.md Normal file

@@ -0,0 +1,363 @@
# G3 Architecture
**Last updated**: January 2025
**Source of truth**: Crate structure in `crates/`, `Cargo.toml`, `DESIGN.md`
## Purpose
This document describes the internal architecture of G3, a modular AI coding agent built in Rust. It is intended for developers who want to understand, extend, or maintain the codebase.
## High-Level Overview
G3 follows a **tool-first philosophy**: instead of just providing advice, it actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ g3-cli │ │ g3-core │ │ g3-providers │
│ │ │ │ │ │
│ • CLI parsing │◄──►│ • Agent engine │◄──►│ • Anthropic │
│ • Interactive │ │ • Context mgmt │ │ • Databricks │
│ • Retro TUI │ │ • Tool system │ │ • OpenAI │
│ • Autonomous │ │ • Streaming │ │ • Embedded │
│ mode │ │ • Task exec │ │ (llama.cpp) │
│ │ │ • TODO mgmt │ │ • OAuth flow │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ g3-execution │ │ g3-config │ │ g3-planner │
│ │ │ │ │ │
│ • Code exec │ │ • TOML config │ │ • Requirements │
│ • Shell cmds │ │ • Env overrides │ │ • Git ops │
│ • Streaming │ │ • Provider │ │ • Planning │
│ • Error hdlg │ │ settings │ │ workflow │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ ┌─────────────────┐ │
│ │ g3-computer- │ │
└─────────────►│ control │◄─────────────┘
│ • Mouse/kbd │
│ • Screenshots │
│ • OCR/Vision │
│ • WebDriver │
│ • macOS Ax API │
└─────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────────────────┐ ┌─────────────────┐
│ g3-ensembles │ │ g3-console │
│ │ │ │
│ • Flock mode │ │ • Web console │
│ • Multi-agent │ │ • Process mgmt │
│ • Parallel dev │ │ • Log viewing │
└─────────────────┘ └─────────────────┘
```
## Workspace Structure
G3 is organized as a Rust workspace with 9 crates:
```
g3/
├── src/main.rs # Entry point (delegates to g3-cli)
├── crates/
│ ├── g3-cli/ # Command-line interface and TUI
│ ├── g3-core/ # Core agent engine and tools
│ ├── g3-providers/ # LLM provider abstractions
│ ├── g3-config/ # Configuration management
│ ├── g3-execution/ # Code execution engine
│ ├── g3-computer-control/ # Computer automation
│ ├── g3-planner/ # Planning mode workflow
│ ├── g3-ensembles/ # Multi-agent (flock) mode
│ └── g3-console/ # Web monitoring console
├── agents/ # Agent persona definitions
├── logs/ # Session logs (auto-created)
└── g3-plan/ # Planning artifacts
```
## Crate Responsibilities
### g3-core (Central Hub)
**Location**: `crates/g3-core/`
**Purpose**: Core agent engine, tool system, and orchestration logic
Key modules:
- `lib.rs` - Main `Agent` struct and orchestration (~3400 lines)
- `context_window.rs` - Token tracking and context management
- `streaming_parser.rs` - Real-time LLM response parsing
- `tool_definitions.rs` - JSON schema definitions for all tools
- `tool_dispatch.rs` - Routes tool calls to implementations
- `tools/` - Tool implementations (file ops, shell, vision, webdriver, etc.)
- `error_handling.rs` - Error classification and recovery
- `retry.rs` - Retry logic with exponential backoff
- `prompts.rs` - System prompt generation
- `code_search/` - Tree-sitter based code search
**Key types**:
- `Agent<W: UiWriter>` - Main agent struct, generic over UI output
- `ContextWindow` - Manages conversation history and token limits
- `StreamingToolParser` - Parses streaming LLM responses for tool calls
- `ToolCall` - Represents a tool invocation
### g3-providers (LLM Abstraction)
**Location**: `crates/g3-providers/`
**Purpose**: Unified interface for multiple LLM backends
Key modules:
- `lib.rs` - `LLMProvider` trait and `ProviderRegistry`
- `anthropic.rs` - Anthropic Claude API (~51k chars)
- `databricks.rs` - Databricks Foundation Models (~58k chars)
- `openai.rs` - OpenAI and compatible APIs (~18k chars)
- `embedded.rs` - Local models via llama.cpp (~34k chars)
- `oauth.rs` - OAuth authentication flow
**Key traits**:
```rust
#[async_trait]
pub trait LLMProvider: Send + Sync {
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;
async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;
fn name(&self) -> &str;
fn model(&self) -> &str;
fn has_native_tool_calling(&self) -> bool;
fn supports_cache_control(&self) -> bool;
fn max_tokens(&self) -> u32;
fn temperature(&self) -> f32;
}
```
### g3-cli (User Interface)
**Location**: `crates/g3-cli/`
**Purpose**: Command-line interface, TUI, and execution modes
Key modules:
- `lib.rs` - Main CLI logic and execution modes (~112k chars)
- `retro_tui.rs` - Full-screen retro terminal UI (~63k chars)
- `filter_json.rs` - JSON tool call filtering for display
- `ui_writer_impl.rs` - Console output implementation
- `theme.rs` - Color themes for retro mode
**Execution modes**:
1. **Single-shot**: `g3 "task description"` - Execute one task and exit
2. **Interactive**: `g3` - REPL-style conversation (default)
3. **Autonomous**: `g3 --autonomous` - Coach-player feedback loop
4. **Accumulative**: Interactive mode that carries conversation context across autonomous runs
5. **Planning**: `g3 --planning` - Requirements-driven development
6. **Retro TUI**: `g3 --retro` - Full-screen terminal interface
### g3-config (Configuration)
**Location**: `crates/g3-config/`
**Purpose**: TOML-based configuration management
Key structures:
- `Config` - Root configuration
- `ProvidersConfig` - Provider settings with named configs
- `AgentConfig` - Agent behavior settings
- `WebDriverConfig` - Browser automation settings
- `MacAxConfig` - macOS Accessibility API settings
**Configuration hierarchy** (highest priority last):
1. Default configuration
2. `~/.config/g3/config.toml`
3. `./g3.toml`
4. Environment variables (`G3_*`)
5. CLI arguments
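Layered precedence amounts to a left-to-right merge in which later layers win. A minimal sketch, assuming TOML tables map to nested dicts; the `merge` helper and sample keys are illustrative, not G3's actual loader.

```python
def merge(*layers: dict) -> dict:
    """Merge config layers left to right; later layers take priority."""
    out: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = merge(out[key], value)  # deep-merge nested tables
            else:
                out[key] = value
    return out

defaults  = {"agent": {"auto_compact": True}, "provider": "anthropic.default"}
user_toml = {"provider": "openai.default"}    # e.g. ./g3.toml
env_vars  = {"agent": {"auto_compact": False}}  # e.g. G3_* overrides

print(merge(defaults, user_toml, env_vars))
# {'agent': {'auto_compact': False}, 'provider': 'openai.default'}
```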
### g3-execution (Code Execution)
**Location**: `crates/g3-execution/`
**Purpose**: Safe execution of shell commands and scripts
Features:
- Streaming output capture
- Exit code tracking
- Async execution via Tokio
- Error handling and formatting
### g3-computer-control (Automation)
**Location**: `crates/g3-computer-control/`
**Purpose**: Cross-platform computer control and automation
Key modules:
- `platform/` - Platform-specific implementations (macOS, Linux, Windows)
- `webdriver/` - Safari and Chrome WebDriver integration
- `ocr/` - Text extraction (Tesseract, Apple Vision)
- `macax/` - macOS Accessibility API controller
**Platform support**:
- **macOS**: Core Graphics, Cocoa, screencapture, Vision framework
- **Linux**: X11/Xtest for input
- **Windows**: Win32 APIs
### g3-planner (Planning Mode)
**Location**: `crates/g3-planner/`
**Purpose**: Requirements-driven development workflow
Key modules:
- `planner.rs` - Main planning state machine (~40k chars)
- `state.rs` - Planning state management
- `git.rs` - Git operations
- `code_explore.rs` - Codebase exploration
- `llm.rs` - LLM interactions for planning
- `history.rs` - Planning history tracking
**Workflow**:
1. Write requirements in `<codepath>/g3-plan/new_requirements.md`
2. LLM refines requirements
3. Requirements renamed to `current_requirements.md`
4. Coach/player loop implements
5. Files archived with timestamps
6. Git commit with LLM-generated message
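The file lifecycle in steps 1-5 can be sketched with plain path operations. Illustrative only: the rename sequence follows the workflow above, but the archive naming scheme is an assumption; the real logic lives in `planner.rs`.

```python
from datetime import datetime
from pathlib import Path
import tempfile

plan = Path(tempfile.mkdtemp()) / "g3-plan"
plan.mkdir()
(plan / "new_requirements.md").write_text("Add user login.")

# Step 3: refined requirements become the current set.
(plan / "new_requirements.md").rename(plan / "current_requirements.md")

# Step 5: archive with a timestamp once the coach/player loop finishes.
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
archived = plan / f"requirements-{stamp}.md"
(plan / "current_requirements.md").rename(archived)

print(archived.read_text())  # Add user login.
```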
### g3-ensembles (Multi-Agent)
**Location**: `crates/g3-ensembles/`
**Purpose**: Parallel multi-agent development (Flock mode)
Key modules:
- `flock.rs` - Flock orchestration (~43k chars)
- `status.rs` - Agent status tracking
Flock mode enables parallel development by spawning multiple agent instances working on different parts of a project.
### g3-console (Web Console)
**Location**: `crates/g3-console/`
**Purpose**: Web-based monitoring and control
Key modules:
- `main.rs` - Axum web server
- `api/` - REST API endpoints
- `process/` - Process detection and control
- `logs.rs` - Log parsing and streaming
## Data Flow
### Request Flow
```
User Input
┌─────────────┐
│ g3-cli │ Parse input, determine mode
└─────────────┘
┌─────────────┐
│ g3-core │ Add to context window
│ Agent │ Build completion request
└─────────────┘
┌─────────────┐
│ g3-providers│ Send to LLM provider
│ Registry │ Stream response
└─────────────┘
┌─────────────┐
│ g3-core │ Parse streaming response
│ Parser │ Detect tool calls
└─────────────┘
┌─────────────┐
│ g3-core │ Execute tools
│ Tools │ Return results
└─────────────┘
┌─────────────┐
│ g3-core │ Add results to context
│ Agent │ Continue or complete
└─────────────┘
```
### Context Window Management
The `ContextWindow` struct manages conversation history with intelligent token tracking:
1. **Token Tracking**: Monitors usage as percentage of provider's context limit
2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references
3. **Auto-Summarization**: At 80% capacity, triggers conversation summarization
4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens)
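Conceptually, the thinning pass swaps oversized tool results for file references. A minimal sketch, assuming a flat message-list representation; the function, size limit, and reference format are illustrative, not `context_window.rs` internals.

```python
def thin(messages: list[dict], limit: int = 2000) -> tuple[list[dict], int]:
    """Replace oversized tool results with a file reference; return the
    thinned messages and the number of characters saved (hypothetical)."""
    saved = 0
    out = []
    for i, msg in enumerate(messages):
        content = msg.get("content", "")
        if msg.get("role") == "tool" and len(content) > limit:
            ref = f"[thinned: saved to .g3/sessions/s1/thinned/{i}.txt]"
            saved += len(content) - len(ref)
            msg = {**msg, "content": ref}  # copy; original message untouched
        out.append(msg)
    return out, saved

msgs = [
    {"role": "user", "content": "read the file"},
    {"role": "tool", "content": "x" * 5000},  # a large tool result
]
thinned, saved = thin(msgs)
print(saved)  # 5000 minus the length of the reference string
```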
## Error Handling
G3 implements comprehensive error handling:
1. **Error Classification**: Distinguishes recoverable vs non-recoverable errors
2. **Automatic Retry**: Exponential backoff with jitter for:
- Rate limits (HTTP 429)
- Network errors
- Server errors (HTTP 5xx)
- Timeouts
3. **Error Logging**: Detailed logs saved to `logs/errors/`
4. **Graceful Degradation**: Continues when possible, fails gracefully when not
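The backoff-with-jitter retry delay can be sketched as follows. The constants are illustrative assumptions, not G3's actual values, and the random draw is passed in so the function stays deterministic:

```rust
/// Delay before retry `attempt` (0-based): capped exponential with full jitter.
/// `rand01` is a uniform random number in [0, 1), supplied by the caller.
fn retry_delay_ms(attempt: u32, rand01: f64) -> u64 {
    let base_ms: u64 = 1_000; // illustrative: 1s initial delay
    let cap_ms: u64 = 60_000; // illustrative: never sleep more than 60s
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
    // Full jitter: sleep a uniform fraction of the exponential delay,
    // which spreads concurrent retries out and avoids retry storms.
    (exp as f64 * rand01) as u64
}

fn main() {
    for attempt in 0..4 {
        println!("attempt {attempt}: up to {} ms", retry_delay_ms(attempt, 0.999));
    }
}
```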
## Session Management
Sessions are tracked in `.g3/sessions/<session_id>/`:
- `session.json` - Full conversation history and metadata
- `todo.g3.md` - Session-scoped TODO list
- Context summaries and thinned content
Legacy logs are stored in `logs/g3_session_*.json`.
## Extension Points
### Adding a New Tool
1. Add tool definition in `g3-core/src/tool_definitions.rs`
2. Implement handler in `g3-core/src/tools/`
3. Add dispatch case in `g3-core/src/tool_dispatch.rs`
4. Update system prompt if needed in `g3-core/src/prompts.rs`
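In miniature, the three code touch-points look like this. All names here (`word_count`, the simplified `ToolDefinition`) are hypothetical stand-ins; the real types in `g3-core` differ:

```rust
// 1. tool_definitions.rs — describe the tool to the LLM
struct ToolDefinition {
    name: &'static str,
    description: &'static str,
}

fn word_count_definition() -> ToolDefinition {
    ToolDefinition { name: "word_count", description: "Count words in a string" }
}

// 2. tools/ — implement the handler
fn handle_word_count(input: &str) -> String {
    input.split_whitespace().count().to_string()
}

// 3. tool_dispatch.rs — route the call by tool name
fn dispatch(tool: &str, input: &str) -> Option<String> {
    match tool {
        "word_count" => Some(handle_word_count(input)),
        _ => None,
    }
}

fn main() {
    let def = word_count_definition();
    println!("{}: {:?}", def.description, dispatch(def.name, "count these words"));
}
```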
### Adding a New Provider
1. Implement `LLMProvider` trait in `g3-providers/src/`
2. Add configuration struct in `g3-config/src/lib.rs`
3. Register provider in `g3-core/src/lib.rs` (in `new_with_mode_and_readme`)
4. Update documentation
### Adding a New Execution Mode
1. Add CLI arguments in `g3-cli/src/lib.rs`
2. Implement mode logic in the CLI
3. May require new agent methods in `g3-core`
## Key Files for Understanding
Start reading here:
1. `src/main.rs` - Entry point (trivial, delegates to g3-cli)
2. `crates/g3-cli/src/lib.rs` - CLI and execution modes
3. `crates/g3-core/src/lib.rs` - Agent implementation
4. `crates/g3-providers/src/lib.rs` - Provider trait and registry
5. `crates/g3-core/src/tool_definitions.rs` - Available tools
6. `crates/g3-config/src/lib.rs` - Configuration structures
7. `DESIGN.md` - Original design document
## Dependencies
Key external dependencies:
- **tokio**: Async runtime
- **reqwest**: HTTP client for API calls
- **serde/serde_json**: Serialization
- **clap**: CLI argument parsing
- **tree-sitter**: Syntax-aware code search
- **llama_cpp**: Local model inference (with Metal acceleration)
- **fantoccini**: WebDriver client
- **axum**: Web framework (for g3-console)

385
docs/configuration.md Normal file
View File

@@ -0,0 +1,385 @@
# G3 Configuration Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-config/src/lib.rs`, `config.example.toml`
## Purpose
This document explains how to configure G3, including provider setup, agent behavior, and optional features like WebDriver and computer control.
## Configuration File Location
G3 looks for configuration files in this order:
1. Path specified via `--config` CLI argument
2. `./g3.toml` (current directory)
3. `~/.config/g3/config.toml` (user config)
4. `~/.g3.toml` (legacy location)
If no configuration file exists, G3 creates a default one at `~/.config/g3/config.toml` on first run.
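The lookup order amounts to a first-match scan with a fallback. A minimal sketch (the `exists` closure stands in for a filesystem check so the logic is testable; this is not the actual `g3-config` code):

```rust
use std::path::PathBuf;

/// Resolve the config path in the documented order; `home` is the user's home dir.
fn resolve_config(cli_arg: Option<&str>, home: &str, exists: impl Fn(&str) -> bool) -> PathBuf {
    if let Some(path) = cli_arg {
        return PathBuf::from(path); // 1. --config always wins
    }
    let candidates = [
        "./g3.toml".to_string(),                  // 2. current directory
        format!("{home}/.config/g3/config.toml"), // 3. user config
        format!("{home}/.g3.toml"),               // 4. legacy location
    ];
    for candidate in &candidates {
        if exists(candidate) {
            return PathBuf::from(candidate);
        }
    }
    // Nothing found: G3 creates a default at the user-config location.
    PathBuf::from(format!("{home}/.config/g3/config.toml"))
}

fn main() {
    let path = resolve_config(None, "/home/me", |p| p == "./g3.toml");
    println!("{}", path.display());
}
```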
## Configuration Format
G3 uses TOML format. The configuration is organized into sections:
```toml
[providers] # LLM provider settings
[agent] # Agent behavior settings
[computer_control] # Mouse/keyboard automation
[webdriver] # Browser automation
[macax] # macOS Accessibility API
```
## Provider Configuration
### Provider Reference Format
Providers are referenced using the format: `<provider_type>.<config_name>`
Examples:
- `anthropic.default`
- `databricks.production`
- `openai.gpt4`
- `embedded.local`
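Parsing such a reference is a single split on the first dot. A sketch (the real validation in G3 may be stricter):

```rust
/// Split a provider reference like "anthropic.default" into its
/// (provider_type, config_name) halves.
fn parse_provider_ref(reference: &str) -> Option<(&str, &str)> {
    let (provider_type, config_name) = reference.split_once('.')?;
    if provider_type.is_empty() || config_name.is_empty() {
        return None; // both halves are required
    }
    Some((provider_type, config_name))
}

fn main() {
    println!("{:?}", parse_provider_ref("databricks.production"));
}
```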
### Basic Provider Setup
```toml
[providers]
# Default provider used for all operations
default_provider = "anthropic.default"
# Optional: Different providers for different roles
# planner = "anthropic.planner" # Planning mode
# coach = "anthropic.default" # Code reviewer in autonomous mode
# player = "anthropic.default" # Code implementer in autonomous mode
```
### Anthropic Configuration
```toml
[providers.anthropic.default]
api_key = "sk-ant-..." # Required: Your Anthropic API key
model = "claude-sonnet-4-5" # Model to use
max_tokens = 64000 # Max output tokens per request
temperature = 0.3 # Sampling temperature (0.0-1.0)
# cache_config = "ephemeral" # Optional: Enable prompt caching
# enable_1m_context = true # Optional: Enable 1M context (extra cost)
# thinking_budget_tokens = 10000 # Optional: Extended thinking mode
```
**Available Anthropic models**:
- `claude-sonnet-4-5` (recommended)
- `claude-opus-4-5`
- `claude-3-5-sonnet-20241022`
- `claude-3-opus-20240229`
### Databricks Configuration
```toml
[providers.databricks.default]
host = "https://your-workspace.cloud.databricks.com" # Required
model = "databricks-claude-sonnet-4" # Model endpoint
max_tokens = 4096
temperature = 0.1
use_oauth = true # Use OAuth (recommended)
# token = "dapi..." # Or use personal access token
```
**OAuth vs Token Authentication**:
- **OAuth** (`use_oauth = true`): Opens browser for authentication, tokens refresh automatically
- **Token** (`token = "..."`, `use_oauth = false`): Uses personal access token directly
### OpenAI Configuration
```toml
[providers.openai.default]
api_key = "sk-..." # Required: Your OpenAI API key
model = "gpt-4-turbo" # Model to use
max_tokens = 4096
temperature = 0.1
# base_url = "https://api.openai.com/v1" # Optional: Custom endpoint
```
### OpenAI-Compatible Providers
For services with OpenAI-compatible APIs (OpenRouter, Groq, Together, etc.):
```toml
[providers.openai_compatible.openrouter]
api_key = "sk-or-..." # Provider's API key
model = "anthropic/claude-3.5-sonnet"
base_url = "https://openrouter.ai/api/v1"
max_tokens = 4096
temperature = 0.1
[providers.openai_compatible.groq]
api_key = "gsk_..."
model = "llama-3.3-70b-versatile"
base_url = "https://api.groq.com/openai/v1"
max_tokens = 4096
temperature = 0.1
```
Reference these as `openai_compatible.openrouter` or `openai_compatible.groq` in `default_provider`, following the `<provider_type>.<config_name>` format.
### Embedded (Local) Models
```toml
[providers.embedded.default]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen" # Model architecture
context_length = 32768 # Context window size
max_tokens = 2048 # Max output tokens
temperature = 0.1
gpu_layers = 32 # Layers to offload to GPU (Metal/CUDA)
threads = 8 # CPU threads for inference
```
**Supported model types**: `qwen`, `codellama`, `llama`, `mistral`
**Hardware requirements**:
- 4-16GB RAM depending on model size
- Optional GPU acceleration (Metal on macOS, CUDA on Linux)
## Agent Configuration
```toml
[agent]
# Context and token settings
fallback_default_max_tokens = 8192 # Default max tokens if provider doesn't specify
# max_context_length = 200000 # Override context window size for all providers
# Behavior settings
enable_streaming = true # Stream responses in real-time
allow_multiple_tool_calls = true # Allow multiple tools per response
timeout_seconds = 60 # Request timeout
auto_compact = true # Auto-compact context at 90%
# Retry settings
max_retry_attempts = 3 # Retries for interactive mode
autonomous_max_retry_attempts = 6 # Retries for autonomous mode
# TODO management
check_todo_staleness = true # Warn about stale TODO items
```
### Retry Behavior
G3 automatically retries on recoverable errors:
- Rate limits (HTTP 429)
- Network errors
- Server errors (HTTP 5xx)
- Timeouts
**Interactive mode** uses `max_retry_attempts` (default: 3)
**Autonomous mode** uses `autonomous_max_retry_attempts` (default: 6) with longer delays
## Computer Control Configuration
```toml
[computer_control]
enabled = false # Set to true to enable
require_confirmation = true # Require confirmation before actions
max_actions_per_second = 5 # Rate limit for safety
```
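The `max_actions_per_second` limit is equivalent to enforcing a minimum gap between actions. A sketch of that check (hypothetical helper, not G3's implementation):

```rust
/// Allow an action only if at least 1000 / max_per_sec milliseconds
/// have elapsed since the previous action.
fn action_allowed(last_action_ms: u64, now_ms: u64, max_per_sec: u64) -> bool {
    let min_gap_ms = 1_000 / max_per_sec;
    now_ms.saturating_sub(last_action_ms) >= min_gap_ms
}

fn main() {
    // With max_actions_per_second = 5 the minimum gap is 200 ms.
    println!("{}", action_allowed(0, 200, 5));
}
```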
**Required OS permissions**:
- **macOS**: System Preferences → Security & Privacy → Accessibility
- **Linux**: X11 or Wayland access
- **Windows**: Run as administrator (first time)
## WebDriver Configuration
```toml
[webdriver]
enabled = false # Set to true to enable
browser = "safari" # "safari" or "chrome-headless"
safari_port = 4444 # Safari WebDriver port
chrome_port = 9515 # ChromeDriver port
# chrome_binary = "/path/to/chrome" # Optional: Custom Chrome path
```
### Safari Setup (macOS)
```bash
# Enable Safari remote automation (one-time setup)
safaridriver --enable
# Or via Safari UI:
# Safari → Preferences → Advanced → Show Develop menu
# Develop → Allow Remote Automation
```
### Chrome Setup
**Option 1: Chrome for Testing (Recommended)**
```bash
./scripts/setup-chrome-for-testing.sh
```
Then configure:
```toml
[webdriver]
chrome_binary = "/Users/yourname/.chrome-for-testing/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
```
**Option 2: System Chrome**
```bash
# macOS
brew install chromedriver
# Linux
apt install chromium-chromedriver
```
## macOS Accessibility API Configuration
```toml
[macax]
enabled = false # Set to true to enable
```
**Required permissions**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app
See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage.
## Multi-Role Configuration
For autonomous mode with different models for coach and player:
```toml
[providers]
default_provider = "anthropic.default"
coach = "anthropic.coach" # Code reviewer
player = "anthropic.player" # Code implementer
[providers.anthropic.coach]
api_key = "sk-ant-..."
model = "claude-sonnet-4-5"
max_tokens = 32000
temperature = 0.1 # Lower for consistent reviews
[providers.anthropic.player]
api_key = "sk-ant-..."
model = "claude-sonnet-4-5"
max_tokens = 64000
temperature = 0.3 # Higher for creative implementations
```
See `config.coach-player.example.toml` for a complete example.
## Environment Variables
Environment variables override configuration file settings:
| Variable | Description |
|----------|-------------|
| `G3_WORKSPACE_PATH` | Override workspace directory |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `DATABRICKS_HOST` | Databricks workspace URL |
| `DATABRICKS_TOKEN` | Databricks personal access token |
## CLI Overrides
CLI arguments have the highest priority:
```bash
# Override provider
g3 --provider anthropic.default
# Override model
g3 --model claude-opus-4-5
# Enable features
g3 --webdriver # Enable WebDriver (Safari)
g3 --chrome-headless # Enable WebDriver (Chrome headless)
g3 --macax # Enable macOS Accessibility API
# Specify config file
g3 --config /path/to/config.toml
```
## Complete Example Configuration
```toml
# ~/.config/g3/config.toml
[providers]
default_provider = "anthropic.default"
[providers.anthropic.default]
api_key = "sk-ant-api03-..."
model = "claude-sonnet-4-5"
max_tokens = 64000
temperature = 0.3
[providers.databricks.work]
host = "https://mycompany.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true
[agent]
fallback_default_max_tokens = 8192
enable_streaming = true
allow_multiple_tool_calls = true
timeout_seconds = 60
max_retry_attempts = 3
autonomous_max_retry_attempts = 6
[computer_control]
enabled = false
require_confirmation = true
max_actions_per_second = 5
[webdriver]
enabled = true
browser = "safari"
safari_port = 4444
[macax]
enabled = false
```
## Troubleshooting
### "Old config format" error
If you see this error, your config uses a deprecated format. Update to the new named provider format:
**Old format** (deprecated):
```toml
[providers.anthropic]
api_key = "..."
```
**New format**:
```toml
[providers.anthropic.default]
api_key = "..."
```
### Provider not found
Ensure your `default_provider` matches a configured provider:
```toml
default_provider = "anthropic.default" # Must match [providers.anthropic.default]
```
### OAuth issues
For Databricks OAuth:
1. Ensure `use_oauth = true`
2. Remove any `token` setting
3. A browser window will open for authentication
4. Tokens are cached in `~/.databricks/oauth-tokens.json`
### Context window errors
If you see context overflow errors:
1. Check `max_context_length` in `[agent]`
2. Use `/compact` command to manually summarize
3. Use `/thinnify` to replace large tool results with file references

472
docs/macax-tools.md Normal file
View File

@@ -0,0 +1,472 @@
# macOS Accessibility Tools Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-computer-control/src/macax/`
## Purpose
G3 includes tools for controlling macOS applications via the Accessibility API. This enables automation of native macOS apps, including those you're building with G3.
## Overview
The macOS Accessibility API provides programmatic access to UI elements in any application. G3 exposes this through the `macax_*` tools, allowing you to:
- List and activate applications
- Inspect UI element hierarchies
- Find elements by role, title, or identifier
- Click buttons and interact with controls
- Read and set values in text fields
- Simulate keyboard input
## Setup
### 1. Enable in Configuration
```toml
# ~/.config/g3/config.toml
[macax]
enabled = true
```
Or use the CLI flag:
```bash
g3 --macax
```
### 2. Grant Accessibility Permissions
1. Open **System Preferences** → **Security & Privacy** → **Privacy**
2. Select **Accessibility** in the left sidebar
3. Click the lock icon and authenticate
4. Add your terminal application (Terminal, iTerm2, etc.)
5. Restart your terminal
**Note**: If using VS Code's integrated terminal, add VS Code to the list.
### 3. Verify Setup
```json
{"tool": "macax_list_apps", "args": {}}
```
This should return a list of running applications.
## Available Tools
### macax_list_apps
List all running applications.
**Parameters**: None
**Example**:
```json
{"tool": "macax_list_apps", "args": {}}
```
**Returns**:
```
Running Applications:
- Safari (com.apple.Safari)
- Finder (com.apple.finder)
- Terminal (com.apple.Terminal)
- MyApp (com.example.myapp)
```
---
### macax_get_frontmost_app
Get the currently active (frontmost) application.
**Parameters**: None
**Example**:
```json
{"tool": "macax_get_frontmost_app", "args": {}}
```
**Returns**:
```
Frontmost Application: Safari (com.apple.Safari)
```
---
### macax_activate_app
Bring an application to the front.
**Parameters**:
- `app_name` (string, required): Application name
**Example**:
```json
{"tool": "macax_activate_app", "args": {"app_name": "Safari"}}
```
---
### macax_get_ui_tree
Get the UI element hierarchy of an application.
**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Maximum tree depth (default: 5)
**Example**:
```json
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
```
**Returns**:
```
UI Tree for Calculator:
└── AXApplication "Calculator"
    └── AXWindow "Calculator"
        ├── AXGroup
        │   ├── AXButton "1" [id: digit_1]
        │   ├── AXButton "2" [id: digit_2]
        │   ├── AXButton "+" [id: add]
        │   └── AXButton "=" [id: equals]
        └── AXStaticText "0" [id: display]
```
**Notes**:
- Use lower `max_depth` for complex apps to avoid overwhelming output
- Elements show role, title, and accessibility identifier (if set)
---
### macax_find_elements
Find UI elements matching criteria.
**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (e.g., "button", "textField")
- `title` (string, optional): Element title/label
- `identifier` (string, optional): Accessibility identifier
**Example**:
```json
{"tool": "macax_find_elements", "args": {
"app_name": "Safari",
"role": "button"
}}
```
**Returns**:
```
Found 5 elements:
1. AXButton "Back" [id: BackButton]
2. AXButton "Forward" [id: ForwardButton]
3. AXButton "Reload" [id: ReloadButton]
4. AXButton "Share" [id: ShareButton]
5. AXButton "New Tab" [id: NewTabButton]
```
---
### macax_click
Click a UI element.
**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `role` (string, optional): Element role
At least one of `identifier`, `title`, or `role` must be provided.
**Examples**:
```json
// Click by identifier (most reliable)
{"tool": "macax_click", "args": {
"app_name": "Calculator",
"identifier": "digit_5"
}}
// Click by title
{"tool": "macax_click", "args": {
"app_name": "Calculator",
"title": "5"
}}
// Click by role and title
{"tool": "macax_click", "args": {
"app_name": "Safari",
"role": "button",
"title": "Reload"
}}
```
---
### macax_set_value
Set the value of a UI element (text fields, sliders, etc.).
**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `role` (string, optional): Element role
- `value` (string, required): Value to set
**Example**:
```json
{"tool": "macax_set_value", "args": {
"app_name": "TextEdit",
"role": "textArea",
"value": "Hello, World!"
}}
```
---
### macax_get_value
Get the current value of a UI element.
**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `role` (string, optional): Element role
**Example**:
```json
{"tool": "macax_get_value", "args": {
"app_name": "Calculator",
"identifier": "display"
}}
```
**Returns**:
```
Value: 42
```
---
### macax_press_key
Simulate a key press.
**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): Modifier keys
**Supported modifiers**: `command`, `shift`, `option`, `control`
**Examples**:
```json
// Simple key press
{"tool": "macax_press_key", "args": {"key": "a"}}
// With modifiers (Cmd+S)
{"tool": "macax_press_key", "args": {
"key": "s",
"modifiers": ["command"]
}}
// Multiple modifiers (Cmd+Shift+N)
{"tool": "macax_press_key", "args": {
"key": "n",
"modifiers": ["command", "shift"]
}}
// Special keys
{"tool": "macax_press_key", "args": {"key": "return"}}
{"tool": "macax_press_key", "args": {"key": "escape"}}
{"tool": "macax_press_key", "args": {"key": "tab"}}
{"tool": "macax_press_key", "args": {"key": "delete"}}
```
**Special key names**:
- `return`, `enter`
- `escape`, `esc`
- `tab`
- `delete`, `backspace`
- `space`
- `up`, `down`, `left`, `right`
- `home`, `end`, `pageup`, `pagedown`
- `f1` through `f12`
## Common Roles
| Role | Description |
|------|-------------|
| `button` | Clickable button |
| `textField` | Single-line text input |
| `textArea` | Multi-line text input |
| `checkbox` | Checkbox control |
| `radioButton` | Radio button |
| `popUpButton` | Dropdown/popup menu |
| `slider` | Slider control |
| `table` | Table view |
| `list` | List view |
| `outline` | Outline/tree view |
| `group` | Container group |
| `window` | Application window |
| `sheet` | Modal sheet |
| `dialog` | Dialog window |
| `staticText` | Non-editable text |
| `image` | Image element |
| `scrollArea` | Scrollable container |
| `toolbar` | Toolbar |
| `menuBar` | Menu bar |
| `menu` | Menu |
| `menuItem` | Menu item |
## Best Practices
### 1. Use Accessibility Identifiers
When building apps you'll automate with G3, add accessibility identifiers:
**SwiftUI**:
```swift
Button("Submit") { ... }
.accessibilityIdentifier("submit_button")
```
**UIKit**:
```swift
button.accessibilityIdentifier = "submit_button"
```
**AppKit**:
```swift
button.setAccessibilityIdentifier("submit_button")
```
Identifiers are more reliable than titles (which may be localized).
### 2. Inspect Before Automating
Always inspect the UI tree first:
```json
{"tool": "macax_get_ui_tree", "args": {"app_name": "MyApp", "max_depth": 4}}
```
This helps you understand:
- Element hierarchy
- Available identifiers
- Correct role names
### 3. Activate App First
Some actions require the app to be frontmost:
```json
{"tool": "macax_activate_app", "args": {"app_name": "MyApp"}}
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "button1"}}
```
### 4. Handle Timing
UI updates may take time. If an element isn't found:
1. Wait briefly
2. Retry the operation
3. Check if the app state changed
### 5. Prefer Identifiers Over Titles
```json
// Good: Uses identifier
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "save_btn"}}
// Less reliable: Uses title (may be localized)
{"tool": "macax_click", "args": {"app_name": "MyApp", "title": "Save"}}
```
## Example: Automating Calculator
```json
// 1. Activate Calculator
{"tool": "macax_activate_app", "args": {"app_name": "Calculator"}}
// 2. Inspect UI
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
// 3. Click "5"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "5"}}
// 4. Click "+"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "+"}}
// 5. Click "3"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "3"}}
// 6. Click "="
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "="}}
// 7. Read result
{"tool": "macax_get_value", "args": {"app_name": "Calculator", "role": "staticText"}}
```
## Troubleshooting
### "Accessibility permission denied"
1. Check System Preferences → Security & Privacy → Accessibility
2. Ensure your terminal app is listed and checked
3. Restart the terminal after granting permission
### "Application not found"
1. Use exact app name (case-sensitive)
2. Run `macax_list_apps` to see available apps
3. App must be running
### "Element not found"
1. Inspect UI tree to verify element exists
2. Check identifier/title spelling
3. Element may be in a different window or sheet
4. App state may have changed
### "Cannot perform action"
1. Element may be disabled
2. App may need to be frontmost
3. Element may not support the action
4. Check element role supports the operation
### Slow Performance
1. Reduce `max_depth` in `macax_get_ui_tree`
2. Use specific identifiers instead of searching
3. Complex apps have large UI trees
## Comparison with Other Tools
| Feature | macax | Vision Tools | WebDriver |
|---------|-------|--------------|----------|
| Native apps | ✅ | ✅ (via OCR) | ❌ |
| Web browsers | ✅ | ✅ | ✅ |
| Electron apps | ✅ | ✅ | Partial |
| Reliability | High | Medium | High |
| Setup | Permissions | None | Driver |
| Speed | Fast | Slower | Medium |
**Use macax when**:
- Automating native macOS apps
- You control the app and can add identifiers
- Need reliable, fast automation
**Use Vision tools when**:
- App doesn't expose accessibility
- Need to find text visually
- Cross-platform approach needed
**Use WebDriver when**:
- Automating web content
- Need JavaScript execution
- Testing web applications

408
docs/providers.md Normal file
View File

@@ -0,0 +1,408 @@
# G3 LLM Providers Guide
**Last updated**: January 2025
**Source of truth**: `crates/g3-providers/src/`
## Purpose
This document describes the LLM providers supported by G3, their capabilities, and how to choose between them.
## Provider Overview
| Provider | Type | Tool Calling | Cache Control | Context Window | Best For |
|----------|------|--------------|---------------|----------------|----------|
| **Anthropic** | Cloud | Native | Yes | 200k (1M optional) | General use, complex tasks |
| **Databricks** | Cloud | Native | Yes (Claude models) | Varies | Enterprise, existing Databricks users |
| **OpenAI** | Cloud | Native | No | 128k | GPT model preference |
| **OpenAI-Compatible** | Cloud | Native | No | Varies | OpenRouter, Groq, Together, etc. |
| **Embedded** | Local | JSON fallback | No | 4k-32k | Privacy, offline, cost savings |
## Anthropic
**Location**: `crates/g3-providers/src/anthropic.rs`
### Features
- **Native tool calling**: Full support for structured tool calls
- **Prompt caching**: Reduce costs with ephemeral caching
- **Extended context**: Optional 1M token context (additional cost)
- **Extended thinking**: Budget tokens for complex reasoning
- **Streaming**: Real-time response streaming
### Configuration
```toml
[providers.anthropic.default]
api_key = "sk-ant-api03-..." # Required
model = "claude-sonnet-4-5" # Model name
max_tokens = 64000 # Max output tokens
temperature = 0.3 # 0.0-1.0
cache_config = "ephemeral" # Optional: Enable caching
enable_1m_context = true # Optional: 1M context
thinking_budget_tokens = 10000 # Optional: Extended thinking
```
### Available Models
| Model | Context | Best For |
|-------|---------|----------|
| `claude-sonnet-4-5` | 200k | Balanced performance/cost |
| `claude-opus-4-5` | 200k | Complex reasoning |
| `claude-3-5-sonnet-20241022` | 200k | Previous generation |
| `claude-3-opus-20240229` | 200k | Previous generation |
### Prompt Caching
Enable caching to reduce costs for repeated context:
```toml
cache_config = "ephemeral" # Cache for session duration
```
Caching is applied to:
- System prompts
- README/AGENTS.md content
- Large tool results
### Extended Thinking
For complex tasks requiring step-by-step reasoning:
```toml
thinking_budget_tokens = 10000 # Tokens for internal reasoning
```
The model uses these tokens for planning before responding.
---
## Databricks
**Location**: `crates/g3-providers/src/databricks.rs`
### Features
- **Foundation Model APIs**: Access to various models
- **OAuth authentication**: Secure browser-based auth
- **Token authentication**: Personal access tokens
- **Enterprise integration**: Works with existing Databricks setup
### Configuration
```toml
[providers.databricks.default]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true # Recommended
# token = "dapi..." # Alternative: PAT
```
### Authentication
**OAuth (Recommended)**:
1. Set `use_oauth = true`
2. On first run, browser opens for authentication
3. Tokens are cached in `~/.databricks/oauth-tokens.json`
4. Tokens refresh automatically
**Personal Access Token**:
1. Generate token in Databricks workspace
2. Set `token = "dapi..."` and `use_oauth = false`
### Available Models
Models depend on your Databricks workspace configuration:
- `databricks-claude-sonnet-4` (Claude via Databricks)
- `databricks-meta-llama-3-1-70b-instruct`
- `databricks-dbrx-instruct`
- Custom fine-tuned models
---
## OpenAI
**Location**: `crates/g3-providers/src/openai.rs`
### Features
- **Native tool calling**: Full support
- **Custom endpoints**: Override base URL
- **Streaming**: Real-time responses
### Configuration
```toml
[providers.openai.default]
api_key = "sk-..." # Required
model = "gpt-4-turbo" # Model name
max_tokens = 4096
temperature = 0.1
# base_url = "https://api.openai.com/v1" # Optional
```
### Available Models
| Model | Context | Notes |
|-------|---------|-------|
| `gpt-4-turbo` | 128k | Latest GPT-4 |
| `gpt-4o` | 128k | Optimized GPT-4 |
| `gpt-4` | 8k | Original GPT-4 |
| `gpt-3.5-turbo` | 16k | Faster, cheaper |
---
## OpenAI-Compatible Providers
**Location**: `crates/g3-providers/src/openai.rs` (reuses OpenAI implementation)
For services that implement the OpenAI API format.
### Configuration
```toml
# OpenRouter
[providers.openai_compatible.openrouter]
api_key = "sk-or-..."
model = "anthropic/claude-3.5-sonnet"
base_url = "https://openrouter.ai/api/v1"
max_tokens = 4096
temperature = 0.1
# Groq
[providers.openai_compatible.groq]
api_key = "gsk_..."
model = "llama-3.3-70b-versatile"
base_url = "https://api.groq.com/openai/v1"
max_tokens = 4096
temperature = 0.1
# Together
[providers.openai_compatible.together]
api_key = "..."
model = "meta-llama/Llama-3-70b-chat-hf"
base_url = "https://api.together.xyz/v1"
max_tokens = 4096
temperature = 0.1
```
### Supported Services
- **OpenRouter**: Access to many models through one API
- **Groq**: Fast inference for Llama models
- **Together**: Open-source model hosting
- **Anyscale**: Scalable model serving
- **Local servers**: Ollama, vLLM, text-generation-inference
---
## Embedded (Local Models)
**Location**: `crates/g3-providers/src/embedded.rs`
### Features
- **Completely local**: No data leaves your machine
- **Offline capable**: Works without internet
- **GPU acceleration**: Metal (macOS), CUDA (Linux)
- **No API costs**: Free after model download
### Configuration
```toml
[providers.embedded.default]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen" # Model architecture
context_length = 32768 # Context window
max_tokens = 2048 # Max output
temperature = 0.1
gpu_layers = 32 # GPU offload (0 = CPU only)
threads = 8 # CPU threads
```
### Supported Model Types
| Type | Models | Notes |
|------|--------|-------|
| `qwen` | Qwen 2.5 series | Good coding ability |
| `codellama` | Code Llama | Specialized for code |
| `llama` | Llama 2/3 | General purpose |
| `mistral` | Mistral/Mixtral | Efficient |
### Model Download
Download GGUF models from Hugging Face:
```bash
mkdir -p ~/.cache/g3/models
cd ~/.cache/g3/models
# Example: Qwen 2.5 7B
wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
```
### Hardware Requirements
| Model Size | RAM Required | GPU VRAM | Notes |
|------------|--------------|----------|-------|
| 7B Q4 | 6GB | 4GB | Good for most tasks |
| 7B Q8 | 10GB | 8GB | Better quality |
| 13B Q4 | 10GB | 8GB | More capable |
| 70B Q4 | 48GB | 40GB | Requires high-end hardware |
### GPU Acceleration
**macOS (Metal)**:
```toml
gpu_layers = 32 # Offload layers to GPU
```
**Linux (CUDA)**:
Requires CUDA toolkit installed.
**CPU Only**:
```toml
gpu_layers = 0
threads = 8 # Use more threads
```
### Tool Calling
Embedded models don't have native tool calling. G3 uses JSON fallback:
1. System prompt includes tool definitions as JSON
2. Model outputs tool calls as JSON in response
3. G3 parses JSON and executes tools
This works but is less reliable than native tool calling.
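The parsing step boils down to pulling the first balanced `{...}` span out of the model's free-form output before handing it to a JSON parser. A simplified sketch (it ignores braces inside string literals, which a real parser must handle):

```rust
/// Extract the first balanced `{...}` block from model output.
fn extract_json_block(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    for (i, c) in text[start..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced: no complete JSON object found
}

fn main() {
    let output = "I'll list the files: {\"tool\": \"shell\", \"args\": {\"command\": \"ls\"}} done";
    println!("{:?}", extract_json_block(output));
}
```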
---
## Provider Selection Guide
### By Use Case
| Use Case | Recommended Provider |
|----------|---------------------|
| General coding tasks | Anthropic (Claude Sonnet) |
| Complex reasoning | Anthropic (Claude Opus) |
| Enterprise/compliance | Databricks |
| Cost-sensitive | Embedded or Groq |
| Privacy-critical | Embedded |
| Offline development | Embedded |
| Fast iteration | Groq (Llama) |
| Model variety | OpenRouter |
### By Priority
**Quality first**: Anthropic Claude Opus/Sonnet
- Best reasoning and coding ability
- Native tool calling
- Prompt caching for efficiency
**Cost first**: Embedded or OpenAI-compatible
- Embedded: Free after download
- Groq: Very cheap, fast
- OpenRouter: Pay-per-use, many options
**Privacy first**: Embedded
- Data never leaves your machine
- No API calls
- Full control
**Speed first**: Groq or Embedded with GPU
- Groq: Extremely fast inference
- Embedded with Metal/CUDA: Low latency
---
## Provider Trait
All providers implement the `LLMProvider` trait:
```rust
#[async_trait]
pub trait LLMProvider: Send + Sync {
/// Generate a completion
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;
/// Stream a completion
async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;
/// Provider name (e.g., "anthropic.default")
fn name(&self) -> &str;
/// Model name (e.g., "claude-sonnet-4-5")
fn model(&self) -> &str;
/// Whether provider supports native tool calling
fn has_native_tool_calling(&self) -> bool;
/// Whether provider supports cache control
fn supports_cache_control(&self) -> bool;
/// Configured max tokens
fn max_tokens(&self) -> u32;
/// Configured temperature
fn temperature(&self) -> f32;
}
```
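The registry that holds these providers can be pictured as a map keyed by the `name()` strings. A toy sketch (the real `ProviderRegistry` stores trait objects; a model-name string stands in here to keep the example self-contained):

```rust
use std::collections::HashMap;

/// Toy registry: maps "type.name" references to a model label,
/// standing in for `Box<dyn LLMProvider>`.
struct Registry {
    default_provider: String,
    providers: HashMap<String, String>,
}

impl Registry {
    /// Look up an explicit reference, falling back to the configured default.
    fn resolve(&self, reference: Option<&str>) -> Option<&str> {
        let key = reference.unwrap_or(&self.default_provider);
        self.providers.get(key).map(String::as_str)
    }
}

fn demo_registry() -> Registry {
    let mut providers = HashMap::new();
    providers.insert("anthropic.default".to_string(), "claude-sonnet-4-5".to_string());
    Registry { default_provider: "anthropic.default".to_string(), providers }
}

fn main() {
    println!("{:?}", demo_registry().resolve(None));
}
```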
---
## Adding a New Provider
1. Create `crates/g3-providers/src/newprovider.rs`
2. Implement `LLMProvider` trait
3. Add configuration struct to `crates/g3-config/src/lib.rs`
4. Register in `crates/g3-core/src/lib.rs` (`new_with_mode_and_readme`)
5. Export from `crates/g3-providers/src/lib.rs`
6. Update documentation
---
## Troubleshooting
### Authentication Errors
**Anthropic**: Verify API key starts with `sk-ant-`
**Databricks OAuth**:
- Delete `~/.databricks/oauth-tokens.json` and re-authenticate
- Ensure workspace URL is correct
**OpenAI**: Verify API key and check billing status
### Rate Limits
G3 automatically retries on rate limits with exponential backoff.
To reduce rate limit issues:
- Use prompt caching (Anthropic)
- Reduce `max_tokens`
- Use a provider with higher limits
### Context Window Errors
If you see "context too long" errors:
1. Use `/compact` to summarize conversation
2. Use `/thinnify` to replace large tool results
3. Increase `max_context_length` in config
4. Switch to a provider with larger context
### Embedded Model Issues
**Model not loading**:
- Verify `model_path` is correct
- Check file permissions
- Ensure enough RAM
**Slow inference**:
- Increase `gpu_layers` for GPU offload
- Reduce `context_length`
- Use a smaller quantization (Q4 vs Q8)
**Poor tool calling**:
- Embedded models use JSON fallback
- Consider cloud provider for complex tool use

---

**File**: `docs/tools.md` (new file, 538 lines)
# G3 Tools Reference
**Last updated**: January 2025
**Source of truth**: `crates/g3-core/src/tool_definitions.rs`, `crates/g3-core/src/tools/`
## Purpose
This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.
## Tool Categories
| Category | Tools | Enabled By |
|----------|-------|------------|
| **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
| **Images** | read_image, take_screenshot, extract_text | Always |
| **Task Management** | todo_read, todo_write | Always |
| **Code Intelligence** | code_search, code_coverage | Always |
| **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` |
| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
| **macOS Accessibility** | macax_* (9 tools) | `--macax` |
| **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |
---
## Core Tools
### shell
Execute shell commands.
**Parameters**:
- `command` (string, required): The shell command to execute
**Example**:
```json
{"tool": "shell", "args": {"command": "ls -la"}}
```
**Notes**:
- Commands run in the current working directory
- Output is streamed in real-time
- Both stdout and stderr are captured
- Exit code is reported
---
### background_process
Launch a long-running process in the background.
**Parameters**:
- `name` (string, required): Unique name for the process (e.g., "game_server")
- `command` (string, required): Shell command to execute
- `working_dir` (string, optional): Working directory
**Example**:
```json
{"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}
```
**Returns**: PID and log file path
**Notes**:
- Process runs independently of the agent
- Logs are captured to a file
- Use `shell` to read logs (`tail`), check status (`ps`), or stop (`kill`)
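**Example** (illustrative — the actual log path is reported by `background_process` when it starts):
```json
{"tool": "shell", "args": {"command": "tail -n 50 /path/to/dev_server.log"}}
```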
---
### read_file
Read file contents with optional character range.
**Parameters**:
- `file_path` (string, required): Path to the file
- `start` (integer, optional): Starting character position (0-indexed, inclusive)
- `end` (integer, optional): Ending character position (0-indexed, exclusive)
**Example**:
```json
{"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}
```
**Notes**:
- Image files (png, jpg, gif, etc.) are automatically run through OCR and the extracted text is returned
- Supports tilde expansion (`~`)
- Reports file size and line count
---
### read_image
Read image files for visual analysis by the LLM.
**Parameters**:
- `file_paths` (array of strings, required): Paths to image files
**Example**:
```json
{"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}
```
**Supported formats**: PNG, JPEG, GIF, WebP
**Notes**:
- Images are sent to the LLM for visual analysis
- Use for inspecting sprites, UI screenshots, diagrams, etc.
- Different from `extract_text` which only does OCR
---
### write_file
Create or overwrite a file.
**Parameters**:
- `file_path` (string, required): Path to the file
- `content` (string, required): Content to write
**Example**:
```json
{"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}
```
**Notes**:
- Creates parent directories if needed
- Overwrites existing files
- Reports bytes written
---
### str_replace
Apply a unified diff to a file.
**Parameters**:
- `file_path` (string, required): Path to the file
- `diff` (string, required): Unified diff with context lines
- `start` (integer, optional): Starting character position to constrain search
- `end` (integer, optional): Ending character position to constrain search
**Example**:
```json
{"tool": "str_replace", "args": {
"file_path": "src/main.rs",
"diff": "@@ -10,3 +10,4 @@\n fn main() {\n println!(\"Hello\");\n+ println!(\"World\");\n }"
}}
```
**Notes**:
- Supports multiple hunks
- Context lines help locate the correct position
- Use `start`/`end` to disambiguate when multiple matches exist
- `---/+++` headers are optional for minimal diffs
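**Example with a constrained search window** (positions and diff content are illustrative):
```json
{"tool": "str_replace", "args": {
  "file_path": "src/main.rs",
  "diff": "@@ -42,2 +42,2 @@\n-    let x = 1;\n+    let x = 2;\n     println!(\"{x}\");",
  "start": 500,
  "end": 1500
}}
```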
---
### final_output
Signal task completion with a summary.
**Parameters**:
- `summary` (string, required): Markdown summary of what was accomplished
**Example**:
```json
{"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}
```
**Notes**:
- Ends the current task
- Summary is displayed to the user
- In autonomous mode, triggers coach review
---
## Image & Screenshot Tools
### take_screenshot
Capture a screenshot of an application window.
**Parameters**:
- `path` (string, required): Filename for the screenshot
- `window_id` (string, required): Application name (e.g., "Safari", "Terminal")
- `region` (object, optional): `{x, y, width, height}` to capture a region
**Example**:
```json
{"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}
```
**Notes**:
- Use `list_windows` first to identify available windows
- Relative paths save to `~/tmp` or `$TMPDIR`
- Uses native screencapture on macOS
---
### extract_text
Extract text from an image using OCR.
**Parameters**:
- `path` (string, optional): Path to image file
**Example**:
```json
{"tool": "extract_text", "args": {"path": "screenshot.png"}}
```
**Notes**:
- Uses Tesseract OCR or Apple Vision framework
- For window-based OCR, use `vision_find_text` instead
---
## Task Management Tools
### todo_read
Read the current TODO list.
**Parameters**: None
**Example**:
```json
{"tool": "todo_read", "args": {}}
```
**Notes**:
- TODO lists are session-scoped
- Stored in `.g3/sessions/<session_id>/todo.g3.md`
- Call at start of multi-step tasks to check for existing plans
---
### todo_write
Create or update the TODO list.
**Parameters**:
- `content` (string, required): TODO list content in markdown checkbox format
**Example**:
```json
{"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n - [ ] Write tests\n - [ ] Update docs\n- [x] Setup project"}}
```
**Notes**:
- Replaces entire file content
- Always call `todo_read` first to preserve existing content
- Use `- [ ]` for incomplete, `- [x]` for complete
- Supports nested tasks with indentation
---
## Code Intelligence Tools
### code_search
Syntax-aware code search using tree-sitter.
**Parameters**:
- `searches` (array, required): Array of search objects:
- `name` (string): Label for this search
- `query` (string): Tree-sitter query in S-expression format
- `language` (string): Programming language
- `paths` (array, optional): Paths to search
- `context_lines` (integer, optional): Lines of context (0-20)
- `max_concurrency` (integer, optional): Parallel searches (default: 4)
- `max_matches_per_search` (integer, optional): Max matches (default: 500)
**Supported languages**: rust, python, javascript, typescript, go, java, c, cpp, kotlin
**Example**:
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust",
"context_lines": 2
}]
}}
```
See [Code Search Guide](CODE_SEARCH.md) for detailed query patterns.
---
### code_coverage
Generate code coverage report using cargo llvm-cov.
**Parameters**: None
**Example**:
```json
{"tool": "code_coverage", "args": {}}
```
**Notes**:
- Runs all tests with coverage instrumentation
- Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
- Returns coverage statistics summary
---
## WebDriver Tools
Enabled with `--webdriver` (Safari) or `--chrome-headless` (Chrome).
### webdriver_start
Start a browser session.
**Example**:
```json
{"tool": "webdriver_start", "args": {}}
```
### webdriver_navigate
Navigate to a URL.
**Parameters**:
- `url` (string, required): URL with protocol (e.g., `https://`)
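**Example**:
```json
{"tool": "webdriver_navigate", "args": {"url": "https://example.com"}}
```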
### webdriver_get_url / webdriver_get_title
Get current URL or page title.
### webdriver_find_element / webdriver_find_elements
Find element(s) by CSS selector.
**Parameters**:
- `selector` (string, required): CSS selector
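**Example** (selector is illustrative):
```json
{"tool": "webdriver_find_elements", "args": {"selector": "a.nav-link"}}
```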
### webdriver_click
Click an element.
**Parameters**:
- `selector` (string, required): CSS selector
### webdriver_send_keys
Type text into an input.
**Parameters**:
- `selector` (string, required): CSS selector
- `text` (string, required): Text to type
- `clear_first` (boolean, optional): Clear before typing (default: true)
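**Example** (selector and text are illustrative):
```json
{"tool": "webdriver_send_keys", "args": {"selector": "input#search", "text": "g3 agent", "clear_first": true}}
```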
### webdriver_execute_script
Execute JavaScript.
**Parameters**:
- `script` (string, required): JavaScript code (use `return` to return values)
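**Example**:
```json
{"tool": "webdriver_execute_script", "args": {"script": "return document.title;"}}
```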
### webdriver_get_page_source
Get rendered HTML.
**Parameters**:
- `max_length` (integer, optional): Max chars to return (default: 10000, 0 for no limit)
- `save_to_file` (string, optional): Save to file instead of returning inline
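**Example** (saving the full source to a file instead of returning it inline):
```json
{"tool": "webdriver_get_page_source", "args": {"max_length": 0, "save_to_file": "page.html"}}
```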
### webdriver_screenshot
Take browser screenshot.
**Parameters**:
- `path` (string, required): Save path
### webdriver_back / webdriver_forward / webdriver_refresh
Navigation controls.
### webdriver_quit
Close browser and end session.
---
## Vision Tools (macOS)
Use Apple Vision framework for text recognition.
### vision_find_text
Find text in an application window.
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to search for
**Returns**: Bounding box coordinates and confidence score
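**Example** (app and text are illustrative):
```json
{"tool": "vision_find_text", "args": {"app_name": "Safari", "text": "Sign In"}}
```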
### vision_click_text
Find and click on text.
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to click
### vision_click_near_text
Click near a text label (useful for form fields).
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Label text to find
- `direction` (string, optional): "right", "below", "left", "above" (default: "right")
- `distance` (integer, optional): Pixels from text (default: 50)
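**Example** (clicking the field to the right of a form label; values are illustrative):
```json
{"tool": "vision_click_near_text", "args": {"app_name": "Safari", "text": "Username", "direction": "right", "distance": 60}}
```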
---
## macOS Accessibility Tools
Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).
### macax_list_apps
List running applications.
### macax_get_frontmost_app
Get the frontmost application.
### macax_activate_app
Bring an application to front.
**Parameters**:
- `app_name` (string, required): Application name
### macax_get_ui_tree
Get UI element hierarchy.
**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Tree depth limit
### macax_find_elements
Find UI elements by criteria.
**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (button, textField, etc.)
- `title` (string, optional): Element title
- `identifier` (string, optional): Accessibility identifier
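**Example** (finding all buttons in a hypothetical app):
```json
{"tool": "macax_find_elements", "args": {"app_name": "Notes", "role": "button"}}
```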
### macax_click
Click a UI element.
**Parameters**:
- `app_name` (string, required): Application name
- `identifier`, `title`, or `role` (string, at least one required): Element selector
### macax_set_value / macax_get_value
Set or get element value.
### macax_press_key
Simulate key press.
**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): ["command", "shift", "option", "control"]
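**Example** (simulating Cmd+S):
```json
{"tool": "macax_press_key", "args": {"key": "s", "modifiers": ["command"]}}
```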
---
## Computer Control Tools
Enabled with `computer_control.enabled = true` in config.
### mouse_click
Click at coordinates.
**Parameters**:
- `x` (integer, required): X coordinate
- `y` (integer, required): Y coordinate
- `button` (string, optional): "left", "right", "middle"
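**Example** (coordinates are illustrative):
```json
{"tool": "mouse_click", "args": {"x": 640, "y": 360, "button": "left"}}
```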
### type_text
Type text at cursor.
**Parameters**:
- `text` (string, required): Text to type
### find_element
Find UI element by text, role, or attributes.
### list_windows
List all open windows with IDs and titles.
---
## Tool Execution Notes
### Duplicate Detection
G3 prevents accidental duplicate tool calls:
- Only immediately sequential identical calls are blocked
- Text between tool calls resets detection
- Tools can be reused throughout a session
### Error Handling
Tool errors are reported back to the agent, which can:
- Retry with different parameters
- Try an alternative approach
- Report the issue to the user
### Working Directory
Tools execute in:
1. Directory specified by `--codebase-fast-start` if provided
2. Current working directory otherwise
### File Paths
- Tilde expansion (`~`) is supported
- Relative paths are relative to working directory
- Screenshots default to `~/tmp` or `$TMPDIR`