lamport run

2026-01-03 16:48:30 +11:00
parent f4a1bf5e93
commit f7e2f38fe9
10 changed files with 3444 additions and 0 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,205 @@
 # AGENTS.md - Machine Instructions for G3
 **Last updated**: January 2025  
 **Purpose**: Enable AI agents to work safely and effectively with this codebase
 ## System Overview
 G3 is an AI coding agent built in Rust. It uses LLM providers to execute tasks through a tool-based interface. The codebase is organized as a Cargo workspace with 9 crates.
 ### Quick Reference
 | Crate | Purpose | Stability |
 |-------|---------|----------|
 | `g3-core` | Agent engine, tools, context management | Stable |
 | `g3-providers` | LLM provider abstractions | Stable |
 | `g3-cli` | Command-line interface | Stable |
 | `g3-config` | Configuration management | Stable |
 | `g3-execution` | Code execution | Stable |
 | `g3-computer-control` | Computer automation | Experimental |
 | `g3-planner` | Planning mode | Stable |
 | `g3-ensembles` | Multi-agent (flock) mode | Experimental |
 | `g3-console` | Web monitoring console | Experimental |
 ## Critical Invariants
 ### MUST Hold
 1. **Tool calls must be valid JSON** - The streaming parser expects well-formed tool calls
 2. **Context window limits must be respected** - Exceeding limits causes API errors
 3. **Provider trait implementations must be Send + Sync** - Required for async runtime
 4. **Session IDs must be unique** - Used for log file paths and TODO scoping
 5. **File paths in tools support tilde expansion** - `~` expands to home directory
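 Invariant 5 can be illustrated with a minimal sketch. The name `expand_tilde` and the explicit `home` parameter are assumptions made for this example; G3's actual helper may resolve the home directory differently.
 ```rust
 use std::path::PathBuf;
 // Hypothetical helper: expands a leading `~` or `~/` using the given home directory.
 // Forms such as `~user/x` are returned unchanged in this sketch.
 pub fn expand_tilde(path: &str, home: &str) -> PathBuf {
     if path == "~" {
         return PathBuf::from(home);
     }
     match path.strip_prefix("~/") {
         Some(rest) => PathBuf::from(home).join(rest),
         None => PathBuf::from(path),
     }
 }
 ```
 Passing the home directory in as a parameter keeps the sketch free of environment lookups and easy to test.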
 ### MUST NOT Do
 1. **Never block the async runtime** - Use `tokio::task::spawn_blocking` for CPU-intensive or blocking work
 2. **Never store secrets in logs** - API keys are redacted in error logs
 3. **Never modify files outside working directory without explicit permission**
 4. **Never assume tool results fit in context** - Large results are thinned automatically
 ## Recommended Entry Points
 ### For Understanding the System
 1. `src/main.rs` - Entry point (trivial)
 2. `crates/g3-cli/src/lib.rs` - CLI logic and execution modes
 3. `crates/g3-core/src/lib.rs` - Agent struct and orchestration
 4. `crates/g3-providers/src/lib.rs` - Provider trait definition
 ### For Adding Features
 1. **New tool**: `crates/g3-core/src/tool_definitions.rs` → `crates/g3-core/src/tools/`
 2. **New provider**: `crates/g3-providers/src/` → implement `LLMProvider` trait
 3. **New CLI mode**: `crates/g3-cli/src/lib.rs`
 4. **New config option**: `crates/g3-config/src/lib.rs`
 ### For Debugging
 1. Session logs: `.g3/sessions/<session_id>/session.json`
 2. Error logs: `logs/errors/`
 3. Context state: Use `/stats` command in interactive mode
 ## Dangerous/Subtle Code Paths
 ### Context Window Management (`g3-core/src/context_window.rs`)
 - **Thinning**: Automatically replaces large tool results with file references
 - **Summarization**: Compresses conversation history at 80% capacity
 - **Token estimation**: Uses character-based heuristics, not exact tokenization
 - **Risk**: Incorrect token estimates can cause context overflow
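 The estimation risk above can be sketched as follows. The ratio of 4 characters per token and the `margin_percent` parameter are assumptions for illustration; the heuristic in `context_window.rs` may use different values.
 ```rust
 // Assumed heuristic: roughly 4 characters per token. The real ratio may differ.
 const ASSUMED_CHARS_PER_TOKEN: usize = 4;
 // Estimates tokens from the character count, rounding up.
 pub fn estimate_tokens(text: &str) -> usize {
     let chars = text.chars().count();
     (chars + ASSUMED_CHARS_PER_TOKEN - 1) / ASSUMED_CHARS_PER_TOKEN
 }
 // Returns true once usage reaches the 80% summarization threshold, lowered by
 // `margin_percent` to absorb estimation error.
 pub fn should_summarize(used_tokens: usize, window_tokens: usize, margin_percent: usize) -> bool {
     let threshold_percent = 80usize.saturating_sub(margin_percent);
     used_tokens * 100 >= window_tokens * threshold_percent
 }
 ```
 A non-zero margin triggers summarization slightly early, which is one way to tolerate an estimate that runs low.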
 ### Streaming Parser (`g3-core/src/streaming_parser.rs`)
 - Parses LLM responses in real-time for tool calls
 - Must handle partial JSON across chunk boundaries
 - **Risk**: Malformed responses can cause parsing failures
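 Handling partial JSON across chunk boundaries can be sketched with a small accumulator. `JsonChunkBuffer` and `find_object_end` are hypothetical names, and this is not the parser in `streaming_parser.rs`; it only shows the buffering idea.
 ```rust
 // Hypothetical accumulator: buffers streamed text and yields complete top-level JSON objects.
 #[derive(Default)]
 pub struct JsonChunkBuffer {
     buf: String,
 }
 impl JsonChunkBuffer {
     // Appends a chunk and returns every complete `{...}` object now available.
     // Text that precedes an object is discarded together with it.
     pub fn push(&mut self, chunk: &str) -> Vec<String> {
         self.buf.push_str(chunk);
         let mut complete = Vec::new();
         while let Some((start, end)) = find_object_end(&self.buf) {
             complete.push(self.buf[start..end].to_string());
             self.buf.drain(..end);
         }
         complete
     }
 }
 // Finds the byte range of the first balanced object, tracking strings and escapes
 // so that braces inside string values are ignored.
 fn find_object_end(s: &str) -> Option<(usize, usize)> {
     let start = s.find('{')?;
     let (mut depth, mut in_string, mut escaped) = (0usize, false, false);
     for (i, c) in s[start..].char_indices() {
         if in_string {
             if escaped {
                 escaped = false;
             } else if c == '\\' {
                 escaped = true;
             } else if c == '"' {
                 in_string = false;
             }
             continue;
         }
         match c {
             '"' => in_string = true,
             '{' => depth += 1,
             '}' => {
                 depth -= 1;
                 if depth == 0 {
                     return Some((start, start + i + 1));
                 }
             }
             _ => {}
         }
     }
     None
 }
 ```
 The sketch does not validate the JSON it extracts, and a stray `{` in ordinary prose would stall it, so a real parser needs additional recovery logic.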
 ### Tool Dispatch (`g3-core/src/tool_dispatch.rs`)
 - Routes tool calls to implementations
 - Handles both native and JSON-based tool calling
 - **Risk**: Missing dispatch cases cause silent failures
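 One way to avoid silent failures is a catch-all arm that returns an explicit error. The tool names and the `DispatchError` type below are hypothetical and are not taken from `tool_dispatch.rs`.
 ```rust
 // Hypothetical error type, used only for illustration.
 #[derive(Debug, PartialEq)]
 pub enum DispatchError {
     UnknownTool(String),
 }
 // Routes a tool name to a handler. The final arm reports the unknown name
 // instead of silently doing nothing when a dispatch case is missing.
 pub fn dispatch(tool: &str, arg: &str) -> Result<String, DispatchError> {
     match tool {
         "echo" => Ok(arg.to_string()),
         "char_count" => Ok(arg.chars().count().to_string()),
         other => Err(DispatchError::UnknownTool(other.to_string())),
     }
 }
 ```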
 ### Retry Logic (`g3-core/src/retry.rs`)
 - Exponential backoff with jitter
 - Different configs for interactive vs autonomous mode
 - **Risk**: Aggressive retries can hit rate limits harder
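 Exponential backoff with jitter can be sketched as below. The function name, the cap, and the choice to keep the delay between 50% and 100% of the capped value are assumptions; `retry.rs` may use a different jitter scheme.
 ```rust
 use std::time::Duration;
 // Computes the delay before retry `attempt` (0-based): base * 2^attempt, capped at `max`,
 // then placed between 50% and 100% of that value according to `jitter_percent` (0-100).
 // The caller draws `jitter_percent` from a random source; it is a parameter here so the
 // sketch stays deterministic and dependency-free.
 pub fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter_percent: u32) -> Duration {
     let factor = 2u32.saturating_pow(attempt);
     let capped = base.saturating_mul(factor).min(max);
     let half = capped / 2;
     half + half * jitter_percent.min(100) / 100
 }
 ```
 Capping the delay and spreading retries with jitter is what keeps many clients from retrying in lockstep against a rate-limited API.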
 ## Performance Constraints
 1. **Streaming is preferred** - Non-streaming requests block UI
 2. **Tool results are size-limited** - Large outputs are truncated or thinned
 3. **Concurrent tool calls** - Enabled by `allow_multiple_tool_calls` config
 4. **Background processes** - Long-running commands use `background_process` tool
 ## Testing Strategy
 ### Test Locations
 - Unit tests: `crates/*/tests/`
 - Integration tests: `crates/*/tests/`
 - Test fixtures: `examples/test_code/`
 ### Running Tests
 ```bash
 # All tests
 cargo test
 # Specific crate
 cargo test -p g3-core
 # With output
 cargo test -- --nocapture
 ```
 ### Test Considerations
 - Provider tests may require API keys
 - Computer control tests require OS permissions
 - WebDriver tests require browser setup
 ## Do's and Don'ts for Automated Changes
 ### Do
 - ✅ Run `cargo check` after modifications
 - ✅ Run `cargo test` before committing
 - ✅ Update tool definitions when adding tools
 - ✅ Add tests for new functionality
 - ✅ Use existing patterns for similar features
 - ✅ Keep functions under 80 lines
 - ✅ Update documentation for user-facing changes
 ### Don't
 - ❌ Modify `Cargo.toml` dependencies without justification
 - ❌ Add blocking code in async contexts
 - ❌ Store sensitive data in plain text
 - ❌ Ignore error handling
 - ❌ Create deeply nested conditionals (>6 levels)
 - ❌ Add external dependencies for simple tasks
 ## Common Incorrect Assumptions
 1. **"All providers support tool calling"** - Embedded models use JSON fallback
 2. **"Context window is unlimited"** - Each provider has limits (4k-200k tokens)
 3. **"Tool results are always small"** - File reads can return megabytes
 4. **"Sessions persist across runs"** - Sessions are ephemeral by default
 5. **"All platforms are equal"** - macOS has more features (Vision, Accessibility)
 ## Architecture Decisions
 See `DESIGN.md` for original design rationale.
 Key decisions:
 - **Rust for performance and safety** - Async runtime, memory safety
 - **Workspace structure** - Separation of concerns, independent compilation
 - **Provider abstraction** - Swap providers without code changes
 - **Tool-first philosophy** - Agent acts through tools, not just advice
 - **Session-scoped state** - TODO lists, logs tied to sessions
 ## File Structure Quick Reference
 ```
 g3/
 ├── src/main.rs                    # Entry point
 ├── crates/
 │   ├── g3-cli/src/
 │   │   ├── lib.rs                 # CLI logic (~112k chars)
 │   │   └── retro_tui.rs           # Retro TUI mode
 │   ├── g3-core/src/
 │   │   ├── lib.rs                 # Agent struct (~3400 lines)
 │   │   ├── context_window.rs      # Context management
 │   │   ├── tool_definitions.rs    # Tool schemas
 │   │   ├── tool_dispatch.rs       # Tool routing
 │   │   ├── tools/                 # Tool implementations
 │   │   ├── streaming_parser.rs    # Response parsing
 │   │   └── retry.rs               # Retry logic
 │   ├── g3-providers/src/
 │   │   ├── lib.rs                 # Provider trait
 │   │   ├── anthropic.rs           # Anthropic Claude
 │   │   ├── databricks.rs          # Databricks
 │   │   ├── openai.rs              # OpenAI
 │   │   └── embedded.rs            # Local models
 │   ├── g3-config/src/lib.rs       # Configuration
 │   ├── g3-planner/src/            # Planning mode
 │   ├── g3-ensembles/src/          # Flock mode
 │   └── g3-computer-control/src/   # Automation
 ├── agents/                         # Agent personas
 ├── docs/                           # Documentation
 └── logs/                           # Session logs
 ```
 ## Pointers to Documentation
 - [Architecture](docs/architecture.md) - System design and data flow
 - [Configuration](docs/configuration.md) - Config file format and options
 - [Tools Reference](docs/tools.md) - All available tools
 - [Providers Guide](docs/providers.md) - LLM provider setup
 - [Control Commands](docs/CONTROL_COMMANDS.md) - Interactive commands
 - [Code Search](docs/CODE_SEARCH.md) - Tree-sitter search guide
 - [Flock Mode](docs/FLOCK_MODE.md) - Multi-agent development
 - [macOS Accessibility](docs/macax-tools.md) - macOS automation
--- a/README.md
+++ b/README.md
@@ -338,6 +338,28 @@ G3 automatically saves session logs for each interaction in the `logs/` director
 The `logs/` directory is created automatically on first use and is excluded from version control.
 ## Documentation Map
 Detailed documentation is available in the `docs/` directory:
 | Document | Description |
 |----------|-------------|
 | [Architecture](docs/architecture.md) | System design, crate responsibilities, data flow |
 | [Configuration](docs/configuration.md) | Config file format, provider setup, all options |
 | [Tools Reference](docs/tools.md) | Complete reference for all available tools |
 | [Providers Guide](docs/providers.md) | LLM provider setup and selection guide |
 | [Control Commands](docs/CONTROL_COMMANDS.md) | Interactive `/` commands for context management |
 | [Code Search](docs/CODE_SEARCH.md) | Tree-sitter code search query patterns |
 | [Flock Mode](docs/FLOCK_MODE.md) | Parallel multi-agent development |
 | [macOS Accessibility](docs/macax-tools.md) | macOS Accessibility API automation |
 For AI agents working with this codebase, see [AGENTS.md](AGENTS.md).
 Additional resources:
 - `DESIGN.md` - Original design document and rationale
 - `config.example.toml` - Complete configuration example
 - `config.coach-player.example.toml` - Multi-role configuration example
 ## License
 MIT License - see LICENSE file for details
--- a/docs/CODE_SEARCH.md
+++ b/docs/CODE_SEARCH.md
@@ -0,0 +1,430 @@
 # G3 Code Search Guide
 **Last updated**: January 2025  
 **Source of truth**: `crates/g3-core/src/code_search/`, `crates/g3-core/src/tool_definitions.rs`
 ## Purpose
 G3 includes a syntax-aware code search tool powered by tree-sitter. Unlike text-based search (grep), it understands code structure and finds actual functions, classes, methods, and other constructs—ignoring matches in comments and strings.
 ## Why Use Code Search?
 | Feature | grep/ripgrep | code_search |
 |---------|--------------|-------------|
 | Finds text in comments | ✅ | ❌ |
 | Finds text in strings | ✅ | ❌ |
 | Understands code structure | ❌ | ✅ |
 | Finds function definitions | Regex needed | Native |
 | Finds class hierarchies | ❌ | ✅ |
 | Language-aware | ❌ | ✅ |
 **Use code_search when**:
 - Finding function/method definitions
 - Finding class/struct declarations
 - Searching for specific code constructs
 - Need accurate results without false positives
 **Use grep when**:
 - Searching non-code files (logs, markdown)
 - Simple string searches
 - Searching comments or documentation
 - Regex for text patterns
 ## Supported Languages
 - Rust
 - Python
 - JavaScript
 - TypeScript
 - Go
 - Java
 - C
 - C++
 - Kotlin
 ## Basic Usage
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "my_search",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust"
  }]
 }}
 ```
 ### Parameters
 | Parameter | Type | Required | Description |
 |-----------|------|----------|-------------|
 | `searches` | array | Yes | Array of search objects (max 20) |
 | `max_concurrency` | integer | No | Parallel searches (default: 4) |
 | `max_matches_per_search` | integer | No | Max matches (default: 500) |
 ### Search Object
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
 | `name` | string | Yes | Label for this search |
 | `query` | string | Yes | Tree-sitter query (S-expression) |
 | `language` | string | Yes | Programming language |
 | `paths` | array | No | Paths to search (default: current dir) |
 | `context_lines` | integer | No | Lines of context (0-20, default: 0) |
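 The limits and defaults in the two tables above can be captured in a small request type. The structs and the `validate` method are hypothetical and only mirror the documented fields; they are not the types used in `crates/g3-core/src/code_search/`.
 ```rust
 // Hypothetical types mirroring the tables above; only the limits and defaults come from the docs.
 pub struct Search {
     pub name: String,
     pub query: String,
     pub language: String,
     pub paths: Vec<String>, // empty: search the current directory
     pub context_lines: u32, // documented range: 0-20
 }
 impl Search {
     pub fn new(name: &str, query: &str, language: &str) -> Self {
         Search {
             name: name.to_string(),
             query: query.to_string(),
             language: language.to_string(),
             paths: Vec::new(),
             context_lines: 0,
         }
     }
 }
 pub struct CodeSearchRequest {
     pub searches: Vec<Search>,       // documented maximum: 20
     pub max_concurrency: u32,        // documented default: 4
     pub max_matches_per_search: u32, // documented default: 500
 }
 impl CodeSearchRequest {
     pub fn new(searches: Vec<Search>) -> Self {
         CodeSearchRequest { searches, max_concurrency: 4, max_matches_per_search: 500 }
     }
     // Checks the documented limits before the request is sent.
     pub fn validate(&self) -> Result<(), String> {
         if self.searches.len() > 20 {
             return Err(format!("too many searches: {} (max 20)", self.searches.len()));
         }
         for s in &self.searches {
             if s.context_lines > 20 {
                 return Err(format!("search '{}': context_lines must be 0-20", s.name));
             }
         }
         Ok(())
     }
 }
 ```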
 ## Query Syntax
 Tree-sitter queries use S-expression syntax. The basic pattern is:
 ```
 (node_type field: (child_type) @capture_name)
 ```
 - `node_type`: The AST node to match
 - `field`: Optional field name
 - `child_type`: Type of child node
 - `@capture_name`: Name for the captured node
 ## Common Query Patterns
 ### Rust
 ```lisp
 ;; All functions
 (function_item name: (identifier) @name)
 ;; Functions with modifiers (async, const, unsafe, extern)
 (function_item (function_modifiers) name: (identifier) @name)
 ;; Structs
 (struct_item name: (type_identifier) @name)
 ;; Enums
 (enum_item name: (type_identifier) @name)
 ;; Impl blocks
 (impl_item type: (type_identifier) @name)
 ;; Trait definitions
 (trait_item name: (type_identifier) @name)
 ;; Macros
 (macro_definition name: (identifier) @name)
 ;; Constants
 (const_item name: (identifier) @name)
 ;; Static variables
 (static_item name: (identifier) @name)
 ;; Type aliases
 (type_item name: (type_identifier) @name)
 ;; Modules
 (mod_item name: (identifier) @name)
 ```
 ### Python
 ```lisp
 ;; Functions
 (function_definition name: (identifier) @name)
 ;; Async functions (matches the anonymous "async" keyword token)
 (function_definition "async" name: (identifier) @name)
 ;; Classes
 (class_definition name: (identifier) @name)
 ;; Methods (functions inside classes)
 (class_definition
  body: (block
    (function_definition name: (identifier) @name)))
 ;; Decorators
 (decorator) @decorator
 ;; Imports
 (import_statement) @import
 (import_from_statement) @import
 ```
 ### JavaScript / TypeScript
 ```lisp
 ;; Function declarations
 (function_declaration name: (identifier) @name)
 ;; Arrow functions assigned to variables
 (variable_declarator
  name: (identifier) @name
  value: (arrow_function))
 ;; Classes
 (class_declaration name: (identifier) @name) ; TypeScript grammar: use (type_identifier)
 ;; Methods
 (method_definition name: (property_identifier) @name)
 ;; Exports
 (export_statement) @export
 ;; Imports
 (import_statement) @import
 ```
 ### Go
 ```lisp
 ;; Functions
 (function_declaration name: (identifier) @name)
 ;; Methods
 (method_declaration name: (field_identifier) @name)
 ;; Structs
 (type_declaration
  (type_spec name: (type_identifier) @name
    type: (struct_type)))
 ;; Interfaces
 (type_declaration
  (type_spec name: (type_identifier) @name
    type: (interface_type)))
 ```
 ### Java
 ```lisp
 ;; Classes
 (class_declaration name: (identifier) @name)
 ;; Interfaces
 (interface_declaration name: (identifier) @name)
 ;; Methods
 (method_declaration name: (identifier) @name)
 ;; Constructors
 (constructor_declaration name: (identifier) @name)
 ;; Fields
 (field_declaration
  declarator: (variable_declarator name: (identifier) @name))
 ```
 ### C / C++
 ```lisp
 ;; Functions
 (function_definition
  declarator: (function_declarator
    declarator: (identifier) @name))
 ;; Structs (C)
 (struct_specifier name: (type_identifier) @name)
 ;; Classes (C++)
 (class_specifier name: (type_identifier) @name)
 ;; Namespaces (C++)
 (namespace_definition name: (identifier) @name)
 ```
 ## Advanced Queries
 ### Wildcards
 Use `_` to match any node:
 ```lisp
 ;; Any function with any name
 (function_item name: (_) @name)
 ```
 ### Alternatives
 Match multiple patterns:
 ```lisp
 ;; Functions or impl blocks
 [(function_item) (impl_item)] @item
 ```
 ### Predicates
 Filter matches:
 ```lisp
 ;; Functions starting with "test_"
 (function_item name: (identifier) @name
  (#match? @name "^test_"))
 ;; Functions NOT starting with "_"
 (function_item name: (identifier) @name
  (#not-match? @name "^_"))
 ```
 ### Nested Matches
 ```lisp
 ;; Methods inside impl blocks
 (impl_item
  body: (declaration_list
    (function_item name: (identifier) @method_name)))
 ```
 ## Batch Searches
 Run multiple searches in parallel:
 ```json
 {"tool": "code_search", "args": {
  "searches": [
    {
      "name": "functions",
      "query": "(function_item name: (identifier) @name)",
      "language": "rust"
    },
    {
      "name": "structs",
      "query": "(struct_item name: (type_identifier) @name)",
      "language": "rust"
    },
    {
      "name": "tests",
      "query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
      "language": "rust",
      "paths": ["tests/"]
    }
  ],
  "max_concurrency": 4
 }}
 ```
 ## Context Lines
 Include surrounding code:
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "functions",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust",
    "context_lines": 3
  }]
 }}
 ```
 This shows 3 lines before and after each match.
 ## Path Filtering
 Search specific directories:
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "core_functions",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust",
    "paths": ["src/core", "src/lib.rs"]
  }]
 }}
 ```
 ## Output Format
 Results include:
 - File path
 - Line number
 - Matched code
 - Context (if requested)
 ```
 === functions (15 matches) ===
 src/lib.rs:42
  fn process_request(req: Request) -> Response {
 src/lib.rs:78
  fn handle_error(err: Error) -> Result<()> {
 src/utils.rs:15
  fn format_output(data: &str) -> String {
 ```
 ## Tips
 ### Finding the Right Query
 1. **Start simple**: Begin with basic node types
 2. **Use AST explorer**: Understand your language's AST
 3. **Iterate**: Refine queries based on results
 ### Performance
 - **Limit paths**: Search specific directories when possible
 - **Use concurrency**: Batch related searches
 - **Set `max_matches_per_search`**: Prevent overwhelming output
 ### Debugging Queries
 If a query returns no results:
 1. Check language spelling (lowercase)
 2. Verify node type names for your language
 3. Start with simpler query, add constraints
 4. Check if files exist in search paths
 ## Examples by Task
 ### Find all public functions in Rust
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "public_fns",
    "query": "(function_item (visibility_modifier) name: (identifier) @name)",
    "language": "rust"
  }]
 }}
 ```
 ### Find all test functions
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "tests",
    "query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
    "language": "rust",
    "paths": ["tests/"]
  }]
 }}
 ```
 ### Find all API endpoints (Python Flask)
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "routes",
    "query": "(decorated_definition (decorator) @dec (function_definition name: (identifier) @name) (#match? @dec \"route\"))",
    "language": "python"
  }]
 }}
 ```
 ### Find all React components
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "components",
    "query": "(function_declaration name: (identifier) @name (#match? @name \"^[A-Z]\"))",
    "language": "javascript",
    "paths": ["src/components/"]
  }]
 }}
 ```
--- a/docs/CONTROL_COMMANDS.md
+++ b/docs/CONTROL_COMMANDS.md
@@ -0,0 +1,224 @@
 # G3 Control Commands
 **Last updated**: January 2025  
 **Source of truth**: `crates/g3-cli/src/lib.rs`
 ## Purpose
 Control commands are special commands you can use during an interactive G3 session to manage context, refresh documentation, and view statistics. They start with `/` and are processed by the CLI, not sent to the LLM.
 ## Available Commands
 | Command | Description |
 |---------|-------------|
 | `/compact` | Manually trigger conversation summarization |
 | `/thinnify` | Replace large tool results with file references (first third) |
 | `/skinnify` | Full context thinning (entire context window) |
 | `/readme` | Reload README.md and AGENTS.md from disk |
 | `/stats` | Show detailed context and performance statistics |
 | `/help` | Display all available control commands |
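 The split between control commands and LLM input can be sketched as a small classifier. The enum and function names are hypothetical, and treating any other `/`-prefixed input as an unknown command is an assumption about behavior that this document does not specify.
 ```rust
 // Control commands from the table above. The types here are an illustrative sketch.
 #[derive(Debug, PartialEq)]
 pub enum ControlCommand {
     Compact,
     Thinnify,
     Skinnify,
     Readme,
     Stats,
     Help,
 }
 // What the CLI does with one line of user input.
 #[derive(Debug, PartialEq)]
 pub enum Input<'a> {
     Command(ControlCommand),
     UnknownCommand(&'a str), // assumption: reported to the user, never sent to the LLM
     Prompt(&'a str),         // forwarded to the LLM
 }
 pub fn classify(line: &str) -> Input<'_> {
     let trimmed = line.trim();
     if !trimmed.starts_with('/') {
         return Input::Prompt(trimmed);
     }
     match trimmed {
         "/compact" => Input::Command(ControlCommand::Compact),
         "/thinnify" => Input::Command(ControlCommand::Thinnify),
         "/skinnify" => Input::Command(ControlCommand::Skinnify),
         "/readme" => Input::Command(ControlCommand::Readme),
         "/stats" => Input::Command(ControlCommand::Stats),
         "/help" => Input::Command(ControlCommand::Help),
         other => Input::UnknownCommand(other),
     }
 }
 ```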
 ---
 ## /compact
 Manually trigger conversation summarization to reduce context size.
 **When to use**:
 - Context usage is getting high (70%+)
 - You want to start a new phase of work
 - Conversation has accumulated irrelevant history
 **What it does**:
 1. Sends conversation history to LLM for summarization
 2. Replaces detailed history with concise summary
 3. Preserves key decisions and context
 4. Significantly reduces token usage
 **Example**:
 ```
 g3> /compact
 📝 Compacting conversation history...
 ✅ Reduced context from 45,000 to 8,000 tokens (82% reduction)
 ```
 **Notes**:
 - Summarization uses tokens, so there's a small cost
 - Some detail is lost; use before major context shifts
 - Auto-triggered at 80% context usage if `auto_compact = true`
 ---
 ## /thinnify
 Replace large tool results with file references to save context space.
 **When to use**:
 - Large file contents are consuming context
 - Tool outputs are taking up space
 - You want to preserve conversation structure but reduce size
 **What it does**:
 1. Scans the first third of context for large tool results
 2. Saves content to `.g3/sessions/<session>/thinned/`
 3. Replaces inline content with file reference
 4. Preserves the ability to re-read if needed
 **Example**:
 ```
 g3> /thinnify
 🔧 Thinning context window...
 ✅ Thinned 3 large tool results, saved 12,000 characters
 ```
 **Notes**:
 - Only processes the first third of context (older content)
 - Recent tool results are preserved inline
 - Auto-triggered at 50%, 60%, 70%, 80% thresholds
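 The first-third rule can be sketched as follows. The `Message` type, the `min_chars` threshold, and the `save` callback are assumptions for illustration; the actual thinning code and its size cutoff are not shown in this document.
 ```rust
 // Hypothetical message type: only tool results are candidates for thinning.
 #[derive(Debug, Clone, PartialEq)]
 pub enum Message {
     Text(String),
     ToolResult(String),
     FileRef(String),
 }
 // Replaces tool results longer than `min_chars` in the oldest third of `history`
 // with file references and returns the number of characters moved out of the context.
 // Persisting the content is delegated to `save`, which returns the path to reference.
 pub fn thin_first_third<F>(history: &mut [Message], min_chars: usize, mut save: F) -> usize
 where
     F: FnMut(usize, &str) -> String,
 {
     let cutoff = history.len() / 3;
     let mut moved = 0;
     for (i, msg) in history.iter_mut().take(cutoff).enumerate() {
         if let Message::ToolResult(content) = msg {
             let len = content.chars().count();
             if len > min_chars {
                 let path = save(i, content);
                 moved += len;
                 *msg = Message::FileRef(path);
             }
         }
     }
     moved
 }
 ```
 Under the same assumptions, `/skinnify` would correspond to using the full history length as the cutoff.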
 ---
 ## /skinnify
 Full context thinning - processes the entire context window.
 **When to use**:
 - Context is critically full
 - `/thinnify` wasn't enough
 - You need maximum space recovery
 **What it does**:
 - Same as `/thinnify` but processes entire context
 - More aggressive space recovery
 - May thin recent tool results too
 **Example**:
 ```
 g3> /skinnify
 🔧 Full context thinning...
 ✅ Thinned 8 tool results, saved 35,000 characters
 ```
 **Notes**:
 - Use sparingly; may thin content you still need inline
 - Consider `/compact` first for better context preservation
 ---
 ## /readme
 Reload README.md and AGENTS.md from disk without restarting.
 **When to use**:
 - You've updated project documentation
 - AGENTS.md has new instructions
 - README.md has changed
 **What it does**:
 1. Re-reads README.md from workspace root
 2. Re-reads AGENTS.md from workspace root
 3. Updates the agent's system context
 4. New instructions take effect immediately
 **Example**:
 ```
 g3> /readme
 📖 Reloading documentation...
 ✅ Loaded README.md (5,234 chars)
 ✅ Loaded AGENTS.md (2,100 chars)
 ```
 **Notes**:
 - Useful during iterative documentation updates
 - Changes apply to subsequent messages
 - Previous context retains old documentation
 ---
 ## /stats
 Show detailed context and performance statistics.
 **What it shows**:
 - Current context usage (tokens and percentage)
 - Session duration
 - Token usage breakdown
 - Tool call metrics
 - Thinning and summarization events
 - First-token latency statistics
 **Example**:
 ```
 g3> /stats
 📊 Session Statistics
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Context Usage:     45,230 / 200,000 tokens (22.6%)
 Session Duration:  1h 23m 45s
 Total Tokens Used: 125,430
 Tool Calls:        47 (45 successful, 2 failed)
 Thinning Events:   3 (saved 28,000 chars)
 Summarizations:    1 (saved 35,000 chars)
 Avg First Token:   1.2s
 ```
 ---
 ## /help
 Display all available control commands with brief descriptions.
 **Example**:
 ```
 g3> /help
 📚 Available Commands:
  /compact   - Summarize conversation to reduce context
  /thinnify  - Replace large tool results with file refs
  /skinnify  - Full context thinning (entire window)
  /readme    - Reload README.md and AGENTS.md
  /stats     - Show context and performance statistics
  /help      - Show this help message
 ```
 ---
 ## Context Management Strategy
 G3 automatically manages context, but manual intervention can help:
 ### Proactive Management
 1. **Check stats regularly**: Use `/stats` to monitor usage
 2. **Thin early**: Use `/thinnify` before hitting thresholds
 3. **Compact at transitions**: Use `/compact` when switching tasks
 ### Reactive Management
 When context gets high:
 1. **50-70%**: Consider `/thinnify`
 2. **70-80%**: Use `/compact`
 3. **80-90%**: Use `/skinnify` then `/compact`
 4. **90%+**: Aggressive automatic thinning runs before tool calls
 ### Best Practices
 - **Long sessions**: Compact periodically to maintain quality
 - **Large files**: Thin after reading large codebases
 - **Documentation updates**: Use `/readme` instead of restarting
 - **Before complex tasks**: Ensure adequate context space
 ---
 ## Automatic Context Management
 G3 performs automatic context management:
 | Threshold | Action |
 |-----------|--------|
 | 50% | Thin oldest third of context |
 | 60% | Thin oldest third of context |
 | 70% | Thin oldest third of context |
 | 80% | Auto-summarization (if `auto_compact = true`) |
 | 90% | Aggressive thinning before tool calls |
 Manual commands give you finer control over when and how this happens.
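 The threshold table can be read as a mapping from usage to action, sketched below. The names are hypothetical, and two points are assumptions: that nothing happens at 80% when `auto_compact` is disabled, and that usage is evaluated as a level rather than as a one-time threshold crossing.
 ```rust
 #[derive(Debug, PartialEq)]
 pub enum AutoAction {
     NoAction,
     ThinOldestThird,
     Summarize,
     AggressiveThin,
 }
 // Maps context usage (in percent) to the automatic action listed in the table.
 // `auto_compact` mirrors the config flag named there.
 pub fn action_for(usage_percent: u32, auto_compact: bool) -> AutoAction {
     match usage_percent {
         p if p >= 90 => AutoAction::AggressiveThin,
         p if p >= 80 => {
             if auto_compact {
                 AutoAction::Summarize
             } else {
                 AutoAction::NoAction // assumption: behavior without auto_compact is not documented
             }
         }
         p if p >= 50 => AutoAction::ThinOldestThird,
         _ => AutoAction::NoAction,
     }
 }
 ```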
--- a/docs/FLOCK_MODE.md
+++ b/docs/FLOCK_MODE.md
@@ -0,0 +1,397 @@
 # G3 Flock Mode Guide
 **Last updated**: January 2025  
 **Source of truth**: `crates/g3-ensembles/src/flock.rs`
 ## Purpose
 Flock mode enables parallel multi-agent development by spawning multiple G3 agent instances that work on different parts of a project simultaneously. This is useful for large projects with modular architectures where independent components can be developed in parallel.
 ## Overview
 In Flock mode:
 - Multiple agent instances run concurrently
 - Each agent works on a specific module or component
 - Agents operate independently but share the same codebase
 - Progress is tracked and coordinated centrally
 ```
 ┌─────────────────────────────────────────────────────────┐
 │                    Flock Coordinator                     │
 │                                                         │
 │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐   │
 │  │ Agent 1 │  │ Agent 2 │  │ Agent 3 │  │ Agent N │   │
 │  │ Module A│  │ Module B│  │ Module C│  │ Module N│   │
 │  └─────────┘  └─────────┘  └─────────┘  └─────────┘   │
 │       │            │            │            │         │
 │       ▼            ▼            ▼            ▼         │
 │  ┌─────────────────────────────────────────────────┐   │
 │  │              Shared Codebase                     │   │
 │  └─────────────────────────────────────────────────┘   │
 └─────────────────────────────────────────────────────────┘
 ```
 ## When to Use Flock Mode
 **Good candidates**:
 - Microservices architectures
 - Projects with independent modules
 - Large refactoring across multiple files
 - Parallel feature development
 - Test suite expansion
 **Not recommended for**:
 - Tightly coupled code
 - Sequential dependencies
 - Small projects
 - Single-file changes
 ## Configuration
 Flock mode is configured through a YAML manifest file:
 ```yaml
 # flock.yaml
 name: "my-project-flock"
 description: "Parallel development of project modules"
 # Global settings
 settings:
  max_agents: 4
  timeout_minutes: 60
  provider: "anthropic.default"
 # Agent definitions
 agents:
  - name: "api-agent"
    description: "Develops the REST API layer"
    working_dir: "src/api"
    requirements: |
      Implement REST endpoints for user management:
      - GET /users
      - POST /users
      - GET /users/{id}
      - PUT /users/{id}
      - DELETE /users/{id}
  - name: "db-agent"
    description: "Develops the database layer"
    working_dir: "src/db"
    requirements: |
      Implement database models and queries:
      - User model with CRUD operations
      - Connection pooling
      - Migration support
  - name: "test-agent"
    description: "Writes integration tests"
    working_dir: "tests"
    requirements: |
      Write integration tests for:
      - API endpoints
      - Database operations
      - Error handling
 ```
 ## Usage
 ### Starting a Flock
 ```bash
 # Start flock with manifest
 g3 --flock flock.yaml
 # Start with specific agents only
 g3 --flock flock.yaml --agents api-agent,db-agent
 # Start with custom timeout
 g3 --flock flock.yaml --timeout 120
 ```
 ### Monitoring Progress
 Flock mode provides real-time status updates:
 ```
 🐦 Flock Status: my-project-flock
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  api-agent     [████████░░] 80%  Implementing DELETE endpoint
  db-agent      [██████████] 100% ✅ Complete
  test-agent    [██████░░░░] 60%  Writing error handling tests
 Elapsed: 15m 32s | Tokens: 45,230 | Errors: 0
 ```
 ### Stopping a Flock
 ```bash
 # Graceful stop (waits for current tasks): press Ctrl+C once
 # Force stop all agents: press Ctrl+C twice
 ```
 ## Agent Communication
 Agents in a flock operate independently but can:
 1. **Read shared files**: All agents can read the entire codebase
 2. **Write to their area**: Each agent writes to its designated working directory
 3. **Signal completion**: Agents report when their tasks are done
 4. **Report errors**: Failures are logged and can trigger coordinator action
 ### Conflict Prevention
 To prevent conflicts:
 - Assign non-overlapping working directories
 - Use clear module boundaries
 - Define explicit interfaces between modules
 - Run integration after all agents complete
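 Overlapping working directories can be detected before a flock starts. The function below is a hypothetical pre-flight check, not part of `flock.rs`; it compares paths lexically and does not resolve symlinks or `..` segments.
 ```rust
 use std::path::Path;
 // Returns every pair of agents whose working directories overlap,
 // meaning one directory equals the other or is nested inside it.
 // Each input tuple is (agent name, working_dir).
 pub fn overlapping_dirs<'a>(agents: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
     let mut conflicts = Vec::new();
     for (i, (name_a, dir_a)) in agents.iter().enumerate() {
         for (name_b, dir_b) in &agents[i + 1..] {
             let (a, b) = (Path::new(dir_a), Path::new(dir_b));
             // Path::starts_with compares whole components, so `src/api2` is not under `src/api`.
             if a.starts_with(b) || b.starts_with(a) {
                 conflicts.push((*name_a, *name_b));
             }
         }
     }
     conflicts
 }
 ```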
 ## Status Tracking
 Flock status is tracked in `.g3/flock/`:
 ```
 .g3/flock/
 ├── status.json           # Overall flock status
 ├── api-agent/
 │   ├── session.json      # Agent session log
 │   └── todo.g3.md        # Agent's TODO list
 ├── db-agent/
 │   ├── session.json
 │   └── todo.g3.md
 └── test-agent/
    ├── session.json
    └── todo.g3.md
 ```
 ### Status File Format
 ```json
 {
  "flock_name": "my-project-flock",
  "started_at": "2025-01-03T10:00:00Z",
  "status": "running",
  "agents": [
    {
      "name": "api-agent",
      "status": "running",
      "progress": 80,
      "current_task": "Implementing DELETE endpoint",
      "tokens_used": 15000,
      "errors": 0
    }
  ]
 }
 ```
 ## Best Practices
 ### 1. Define Clear Boundaries
 ```yaml
 # Good: Clear module separation
 agents:
  - name: "frontend"
    working_dir: "src/frontend"
  - name: "backend"
    working_dir: "src/backend"
 # Bad: Overlapping directories
 agents:
  - name: "agent1"
    working_dir: "src"
  - name: "agent2"
    working_dir: "src/utils"  # Overlaps with agent1!
 ```
 ### 2. Specify Interfaces First
 Define shared interfaces before parallel development:
 ```yaml
 agents:
  - name: "interface-agent"
    priority: 1  # Runs first
    requirements: |
      Define shared interfaces in src/interfaces/:
      - UserService trait
      - DatabaseConnection trait
      - Error types
  - name: "impl-agent"
    priority: 2  # Runs after interfaces
    depends_on: ["interface-agent"]
    requirements: |
      Implement UserService trait...
 ```
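 How `depends_on` might translate into a start order can be sketched with a topological sort (Kahn's algorithm). The function is hypothetical and ignores the `priority` field; the coordinator's real scheduling logic is not described here.
 ```rust
 use std::collections::{BTreeMap, BTreeSet};
 // Orders agents so that each one appears after everything in its `depends_on` list.
 // Returns Err with the agents that could not be scheduled when there is a cycle
 // or a dependency on an agent that is not defined.
 pub fn start_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<String>, Vec<String>> {
     let mut remaining: BTreeMap<&'a str, BTreeSet<&'a str>> = deps
         .iter()
         .map(|(name, d)| (*name, d.iter().copied().collect()))
         .collect();
     let mut order = Vec::new();
     loop {
         // Agents whose dependencies have all been scheduled already.
         let ready: Vec<&'a str> = remaining
             .iter()
             .filter(|(_, d)| d.is_empty())
             .map(|(name, _)| *name)
             .collect();
         if ready.is_empty() {
             break;
         }
         for name in &ready {
             remaining.remove(*name);
             order.push(name.to_string());
         }
         for d in remaining.values_mut() {
             for name in &ready {
                 d.remove(*name);
             }
         }
     }
     if remaining.is_empty() {
         Ok(order)
     } else {
         Err(remaining.keys().map(|k| k.to_string()).collect())
     }
 }
 ```
 Agents that become ready in the same round have no dependencies on each other, so under this reading they are the ones that could run in parallel.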
 ### 3. Use Appropriate Granularity
 - **Too few agents**: Doesn't leverage parallelism
 - **Too many agents**: Coordination overhead, potential conflicts
 - **Sweet spot**: 2-6 agents for most projects
 ### 4. Include a Test Agent
 Always include an agent for testing:
 ```yaml
 agents:
  - name: "test-agent"
    working_dir: "tests"
    requirements: |
      Write tests for all new functionality.
      Run tests after other agents complete.
 ```
 ### 5. Plan for Integration
 After flock completion:
 ```bash
 # Run all tests
 cargo test
 # Check for conflicts
 git status
 # Review changes
 git diff
 ```
 ## Error Handling
 ### Agent Failures
 If an agent fails:
 1. Error is logged to agent's session
 2. Coordinator is notified
 3. Other agents continue (by default)
 4. Failed agent can be restarted
 ### Restart Failed Agent
 ```bash
 # Restart specific agent
 g3 --flock flock.yaml --restart api-agent
 # Restart all failed agents
 g3 --flock flock.yaml --restart-failed
 ```
 ### Conflict Resolution
 If agents modify the same file:
 1. Last write wins (by default)
 2. Conflicts are logged
 3. Manual resolution may be needed
 ## Resource Management
 ### Token Usage
 Each agent has its own token budget:
 ```yaml
 settings:
  max_tokens_per_agent: 100000
  total_token_budget: 500000
 ```
 ### Concurrency
 Limit concurrent agents based on:
 - API rate limits
 - System resources
 - Provider capacity
 ```yaml
 settings:
  max_concurrent_agents: 3  # Run at most 3 at once
 ```
 ## Example: Microservices Project
 ```yaml
 name: "microservices-flock"
 settings:
  max_agents: 5
  provider: "anthropic.default"
 agents:
  - name: "user-service"
    working_dir: "services/user"
    requirements: |
      Implement user service:
      - User registration
      - Authentication
      - Profile management
  - name: "order-service"
    working_dir: "services/order"
    requirements: |
      Implement order service:
      - Order creation
      - Order status tracking
      - Payment integration
  - name: "inventory-service"
    working_dir: "services/inventory"
    requirements: |
      Implement inventory service:
      - Stock management
      - Availability checking
      - Reorder alerts
  - name: "gateway"
    working_dir: "services/gateway"
    requirements: |
      Implement API gateway:
      - Request routing
      - Authentication middleware
      - Rate limiting
  - name: "integration-tests"
    working_dir: "tests/integration"
    depends_on: ["user-service", "order-service", "inventory-service", "gateway"]
    requirements: |
      Write integration tests for:
      - End-to-end order flow
      - Service communication
      - Error scenarios
 ```
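 The `depends_on` field in this manifest implies a start order: `integration-tests` can only begin once the four services are done. As a rough illustration, the Rust sketch below derives such an order with Kahn's algorithm. `AgentSpec` and `start_order` are hypothetical names for this example, and how G3 actually resolves dependencies is not stated here.
 ```rust
 use std::collections::{BTreeMap, VecDeque};
 /// Hypothetical, simplified view of one manifest entry (not G3's real type).
 pub struct AgentSpec {
     pub name: &'static str,
     pub depends_on: Vec<&'static str>,
 }
 /// Returns agent names so that every agent appears after its dependencies,
 /// or `None` if there is a cycle or a dependency on an unknown agent.
 pub fn start_order(agents: &[AgentSpec]) -> Option<Vec<&'static str>> {
     // Number of unfinished dependencies per agent.
     let mut pending: BTreeMap<&'static str, usize> = BTreeMap::new();
     // For each agent, the agents that are waiting on it.
     let mut dependents: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
     for a in agents {
         pending.insert(a.name, a.depends_on.len());
     }
     for a in agents {
         for &dep in &a.depends_on {
             if !pending.contains_key(dep) {
                 return None; // unknown dependency
             }
             dependents.entry(dep).or_default().push(a.name);
         }
     }
     // Agents without dependencies can start right away.
     let mut ready: VecDeque<&'static str> = agents
         .iter()
         .filter(|a| a.depends_on.is_empty())
         .map(|a| a.name)
         .collect();
     let mut order = Vec::new();
     while let Some(name) = ready.pop_front() {
         order.push(name);
         if let Some(waiting) = dependents.get(name) {
             for &w in waiting {
                 let left = pending.get_mut(w).expect("agent was registered above");
                 *left -= 1;
                 if *left == 0 {
                     ready.push_back(w);
                 }
             }
         }
     }
     // If some agents were never released, the graph has a cycle.
     if order.len() == agents.len() { Some(order) } else { None }
 }
 ```
 For the manifest above, the four services come first (they are independent and could run in parallel) and `integration-tests` comes last.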
 ## Limitations
 - **No real-time coordination**: Agents don't communicate during execution
 - **File conflicts**: Possible if boundaries aren't clear
 - **Resource intensive**: Multiple LLM calls in parallel
 - **Debugging complexity**: Multiple logs to review
 ## Troubleshooting
 ### Agents Not Starting
 1. Check manifest syntax (YAML)
 2. Verify working directories exist
 3. Check provider configuration
 4. Review logs in `.g3/flock/`
 ### Slow Progress
 1. Reduce the number of concurrent agents
 2. Check for rate limiting
 3. Simplify requirements
 4. Use a faster provider
 ### Inconsistent Results
 1. Define clearer interfaces
 2. Add more specific requirements
 3. Use a lower temperature
 4. Add validation steps
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,363 @@
 # G3 Architecture
 **Last updated**: January 2025  
 **Source of truth**: Crate structure in `crates/`, `Cargo.toml`, `DESIGN.md`
 ## Purpose
 This document describes the internal architecture of G3, a modular AI coding agent built in Rust. It is intended for developers who want to understand, extend, or maintain the codebase.
 ## High-Level Overview
 G3 follows a **tool-first philosophy**: instead of just providing advice, it actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
 ```
 ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
 │   g3-cli        │    │   g3-core       │    │ g3-providers    │
 │                 │    │                 │    │                 │
 │ • CLI parsing   │◄──►│ • Agent engine  │◄──►│ • Anthropic     │
 │ • Interactive   │    │ • Context mgmt  │    │ • Databricks    │
 │ • Retro TUI     │    │ • Tool system   │    │ • OpenAI        │
 │ • Autonomous    │    │ • Streaming     │    │ • Embedded      │
 │   mode          │    │ • Task exec     │    │   (llama.cpp)   │
 │                 │    │ • TODO mgmt     │    │ • OAuth flow    │
 └─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
 ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
 │ g3-execution    │    │   g3-config     │    │  g3-planner     │
 │                 │    │                 │    │                 │
 │ • Code exec     │    │ • TOML config   │    │ • Requirements  │
 │ • Shell cmds    │    │ • Env overrides │    │ • Git ops       │
 │ • Streaming     │    │ • Provider      │    │ • Planning      │
 │ • Error hdlg    │    │   settings      │    │   workflow      │
 └─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │              ┌─────────────────┐              │
         │              │ g3-computer-    │              │
         └─────────────►│   control       │◄─────────────┘
                        │ • Mouse/kbd     │
                        │ • Screenshots   │
                        │ • OCR/Vision    │
                        │ • WebDriver     │
                        │ • macOS Ax API  │
                        └─────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
 ┌─────────────────┐    ┌─────────────────┐
 │ g3-ensembles    │    │   g3-console    │
 │                 │    │                 │
 │ • Flock mode    │    │ • Web console   │
 │ • Multi-agent   │    │ • Process mgmt  │
 │ • Parallel dev  │    │ • Log viewing   │
 └─────────────────┘    └─────────────────┘
 ```
 ## Workspace Structure
 G3 is organized as a Rust workspace with 9 crates:
 ```
 g3/
 ├── src/main.rs                   # Entry point (delegates to g3-cli)
 ├── crates/
 │   ├── g3-cli/                   # Command-line interface and TUI
 │   ├── g3-core/                  # Core agent engine and tools
 │   ├── g3-providers/             # LLM provider abstractions
 │   ├── g3-config/                # Configuration management
 │   ├── g3-execution/             # Code execution engine
 │   ├── g3-computer-control/      # Computer automation
 │   ├── g3-planner/               # Planning mode workflow
 │   ├── g3-ensembles/             # Multi-agent (flock) mode
 │   └── g3-console/               # Web monitoring console
 ├── agents/                       # Agent persona definitions
 ├── logs/                         # Session logs (auto-created)
 └── g3-plan/                      # Planning artifacts
 ```
 ## Crate Responsibilities
 ### g3-core (Central Hub)
 **Location**: `crates/g3-core/`  
 **Purpose**: Core agent engine, tool system, and orchestration logic
 Key modules:
 - `lib.rs` - Main `Agent` struct and orchestration (~3400 lines)
 - `context_window.rs` - Token tracking and context management
 - `streaming_parser.rs` - Real-time LLM response parsing
 - `tool_definitions.rs` - JSON schema definitions for all tools
 - `tool_dispatch.rs` - Routes tool calls to implementations
 - `tools/` - Tool implementations (file ops, shell, vision, webdriver, etc.)
 - `error_handling.rs` - Error classification and recovery
 - `retry.rs` - Retry logic with exponential backoff
 - `prompts.rs` - System prompt generation
 - `code_search/` - Tree-sitter based code search
 **Key types**:
 - `Agent<W: UiWriter>` - Main agent struct, generic over UI output
 - `ContextWindow` - Manages conversation history and token limits
 - `StreamingToolParser` - Parses streaming LLM responses for tool calls
 - `ToolCall` - Represents a tool invocation
 ### g3-providers (LLM Abstraction)
 **Location**: `crates/g3-providers/`  
 **Purpose**: Unified interface for multiple LLM backends
 Key modules:
 - `lib.rs` - `LLMProvider` trait and `ProviderRegistry`
 - `anthropic.rs` - Anthropic Claude API (~51k chars)
 - `databricks.rs` - Databricks Foundation Models (~58k chars)
 - `openai.rs` - OpenAI and compatible APIs (~18k chars)
 - `embedded.rs` - Local models via llama.cpp (~34k chars)
 - `oauth.rs` - OAuth authentication flow
 **Key traits**:
 ```rust
 #[async_trait]
 pub trait LLMProvider: Send + Sync {
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;
    async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;
    fn name(&self) -> &str;
    fn model(&self) -> &str;
    fn has_native_tool_calling(&self) -> bool;
    fn supports_cache_control(&self) -> bool;
    fn max_tokens(&self) -> u32;
    fn temperature(&self) -> f32;
 }
 ```
 ### g3-cli (User Interface)
 **Location**: `crates/g3-cli/`  
 **Purpose**: Command-line interface, TUI, and execution modes
 Key modules:
 - `lib.rs` - Main CLI logic and execution modes (~112k chars)
 - `retro_tui.rs` - Full-screen retro terminal UI (~63k chars)
 - `filter_json.rs` - JSON tool call filtering for display
 - `ui_writer_impl.rs` - Console output implementation
 - `theme.rs` - Color themes for retro mode
 **Execution modes**:
 1. **Single-shot**: `g3 "task description"` - Execute one task and exit
 2. **Interactive**: `g3` - REPL-style conversation (default)
 3. **Autonomous**: `g3 --autonomous` - Coach-player feedback loop
 4. **Accumulative**: Default interactive mode with autonomous runs
 5. **Planning**: `g3 --planning` - Requirements-driven development
 6. **Retro TUI**: `g3 --retro` - Full-screen terminal interface
 ### g3-config (Configuration)
 **Location**: `crates/g3-config/`  
 **Purpose**: TOML-based configuration management
 Key structures:
 - `Config` - Root configuration
 - `ProvidersConfig` - Provider settings with named configs
 - `AgentConfig` - Agent behavior settings
 - `WebDriverConfig` - Browser automation settings
 - `MacAxConfig` - macOS Accessibility API settings
 **Configuration hierarchy** (highest priority last):
 1. Default configuration
 2. `~/.config/g3/config.toml`
 3. `./g3.toml`
 4. Environment variables (`G3_*`)
 5. CLI arguments
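 The hierarchy above amounts to overlaying partial settings, lowest priority first, so that later sources win. The Rust sketch below shows that merge on a made-up two-field `Layer` struct. The struct, its fields, and `resolve` are illustrative assumptions and do not mirror the real `Config` type in `g3-config`.
 ```rust
 /// Hypothetical subset of settings; each source fills in only what it sets.
 #[derive(Clone, Debug, Default, PartialEq)]
 pub struct Layer {
     pub default_provider: Option<String>,
     pub timeout_seconds: Option<u64>,
 }
 impl Layer {
     /// Overlay `higher` on top of `self`: any value set in `higher` wins.
     pub fn overlay(self, higher: Layer) -> Layer {
         Layer {
             default_provider: higher.default_provider.or(self.default_provider),
             timeout_seconds: higher.timeout_seconds.or(self.timeout_seconds),
         }
     }
 }
 /// Fold layers given lowest priority first (defaults, user file,
 /// project file, environment, CLI), matching the list above.
 pub fn resolve(layers: Vec<Layer>) -> Layer {
     layers.into_iter().fold(Layer::default(), Layer::overlay)
 }
 ```
 With this shape, a CLI layer that sets only `default_provider` overrides the provider but leaves a `timeout_seconds` value from a config file untouched.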
 ### g3-execution (Code Execution)
 **Location**: `crates/g3-execution/`  
 **Purpose**: Safe execution of shell commands and scripts
 Features:
 - Streaming output capture
 - Exit code tracking
 - Async execution via Tokio
 - Error handling and formatting
 ### g3-computer-control (Automation)
 **Location**: `crates/g3-computer-control/`  
 **Purpose**: Cross-platform computer control and automation
 Key modules:
 - `platform/` - Platform-specific implementations (macOS, Linux, Windows)
 - `webdriver/` - Safari and Chrome WebDriver integration
 - `ocr/` - Text extraction (Tesseract, Apple Vision)
 - `macax/` - macOS Accessibility API controller
 **Platform support**:
 - **macOS**: Core Graphics, Cocoa, screencapture, Vision framework
 - **Linux**: X11/Xtest for input
 - **Windows**: Win32 APIs
 ### g3-planner (Planning Mode)
 **Location**: `crates/g3-planner/`  
 **Purpose**: Requirements-driven development workflow
 Key modules:
 - `planner.rs` - Main planning state machine (~40k chars)
 - `state.rs` - Planning state management
 - `git.rs` - Git operations
 - `code_explore.rs` - Codebase exploration
 - `llm.rs` - LLM interactions for planning
 - `history.rs` - Planning history tracking
 **Workflow**:
 1. Write requirements in `<codepath>/g3-plan/new_requirements.md`
 2. LLM refines requirements
 3. Requirements renamed to `current_requirements.md`
 4. Coach/player loop implements
 5. Files archived with timestamps
 6. Git commit with LLM-generated message
 ### g3-ensembles (Multi-Agent)
 **Location**: `crates/g3-ensembles/`  
 **Purpose**: Parallel multi-agent development (Flock mode)
 Key modules:
 - `flock.rs` - Flock orchestration (~43k chars)
 - `status.rs` - Agent status tracking
 Flock mode enables parallel development by spawning multiple agent instances working on different parts of a project.
 ### g3-console (Web Console)
 **Location**: `crates/g3-console/`  
 **Purpose**: Web-based monitoring and control
 Key modules:
 - `main.rs` - Axum web server
 - `api/` - REST API endpoints
 - `process/` - Process detection and control
 - `logs.rs` - Log parsing and streaming
 ## Data Flow
 ### Request Flow
 ```
 User Input
    │
    ▼
 ┌─────────────┐
 │  g3-cli     │  Parse input, determine mode
 └─────────────┘
    │
    ▼
 ┌─────────────┐
 │  g3-core    │  Add to context window
 │  Agent      │  Build completion request
 └─────────────┘
    │
    ▼
 ┌─────────────┐
 │ g3-providers│  Send to LLM provider
 │ Registry    │  Stream response
 └─────────────┘
    │
    ▼
 ┌─────────────┐
 │  g3-core    │  Parse streaming response
 │  Parser     │  Detect tool calls
 └─────────────┘
    │
    ▼
 ┌─────────────┐
 │  g3-core    │  Execute tools
 │  Tools      │  Return results
 └─────────────┘
    │
    ▼
 ┌─────────────┐
 │  g3-core    │  Add results to context
 │  Agent      │  Continue or complete
 └─────────────┘
 ```
 ### Context Window Management
 The `ContextWindow` struct manages conversation history and keeps token usage within the provider's context limit:
 1. **Token Tracking**: Monitors usage as percentage of provider's context limit
 2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references
 3. **Auto-Summarization**: At 80% capacity, triggers conversation summarization
 4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens)
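 To make the thresholds concrete, here is a small Rust sketch of the percentage checks described above. The function names are hypothetical, and the rule that a threshold counts as "crossed" when usage moves from below it to at or above it is an assumption of this example, not a statement about `context_window.rs`.
 ```rust
 /// Thinning thresholds and summarization threshold, as listed above.
 const THIN_THRESHOLDS: [u32; 4] = [50, 60, 70, 80];
 const SUMMARIZE_AT: u32 = 80;
 /// Token usage as a whole-number percentage of the context limit (0..=100).
 pub fn usage_pct(used_tokens: u64, limit: u64) -> u32 {
     if limit == 0 {
         return 100; // treat a zero limit as already full
     }
     (used_tokens.saturating_mul(100) / limit).min(100) as u32
 }
 /// Highest thinning threshold crossed between two measurements, if any.
 pub fn crossed_thin_threshold(prev_pct: u32, now_pct: u32) -> Option<u32> {
     THIN_THRESHOLDS
         .iter()
         .copied()
         .rev()
         .find(|&t| prev_pct < t && now_pct >= t)
 }
 /// Whether usage has reached the summarization threshold.
 pub fn should_summarize(now_pct: u32) -> bool {
     now_pct >= SUMMARIZE_AT
 }
 ```
 For example, moving from 45% to 52% crosses the 50% threshold once, while moving from 52% to 58% crosses none.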
 ## Error Handling
 G3 implements comprehensive error handling:
 1. **Error Classification**: Distinguishes recoverable vs non-recoverable errors
 2. **Automatic Retry**: Exponential backoff with jitter for:
   - Rate limits (HTTP 429)
   - Network errors
   - Server errors (HTTP 5xx)
   - Timeouts
 3. **Error Logging**: Detailed logs saved to `logs/errors/`
 4. **Graceful Degradation**: Continues when possible, fails gracefully when not
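 As an illustration of the retry policy in item 2, the Rust sketch below classifies HTTP statuses and computes an exponential backoff delay with jitter. The function names, the cap, and the choice of adding up to 50% jitter are assumptions for this example and are not taken from `retry.rs`. The jitter fraction is passed in by the caller so the example needs no random-number crate.
 ```rust
 use std::time::Duration;
 /// HTTP statuses treated as retryable in the list above: 429 and 5xx.
 pub fn is_retryable_status(status: u16) -> bool {
     status == 429 || (500..=599).contains(&status)
 }
 /// Delay before retry number `attempt` (0-based).
 /// The base delay doubles per attempt and is capped at `cap`; then up to
 /// 50% extra is added, scaled by `jitter_fraction` in [0.0, 1.0].
 pub fn backoff_delay(
     attempt: u32,
     base: Duration,
     cap: Duration,
     jitter_fraction: f64,
 ) -> Duration {
     let factor = 2u32.saturating_pow(attempt);
     let exponential = base.saturating_mul(factor).min(cap);
     // Guard against NaN or infinite input before scaling.
     let fraction = if jitter_fraction.is_finite() {
         jitter_fraction.clamp(0.0, 1.0)
     } else {
         0.0
     };
     exponential.saturating_add(exponential.mul_f64(fraction * 0.5))
 }
 ```
 With a 1 second base and a 30 second cap, attempts 0, 1, 2, 3 wait roughly 1, 2, 4, and 8 seconds before jitter, and later attempts stay at the cap.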
 ## Session Management
 Sessions are tracked in `.g3/sessions/<session_id>/`:
 - `session.json` - Full conversation history and metadata
 - `todo.g3.md` - Session-scoped TODO list
 - Context summaries and thinned content
 Legacy logs are stored in `logs/g3_session_*.json`.
 ## Extension Points
 ### Adding a New Tool
 1. Add tool definition in `g3-core/src/tool_definitions.rs`
 2. Implement handler in `g3-core/src/tools/`
 3. Add dispatch case in `g3-core/src/tool_dispatch.rs`
 4. Update system prompt if needed in `g3-core/src/prompts.rs`
 ### Adding a New Provider
 1. Implement `LLMProvider` trait in `g3-providers/src/`
 2. Add configuration struct in `g3-config/src/lib.rs`
 3. Register provider in `g3-core/src/lib.rs` (in `new_with_mode_and_readme`)
 4. Update documentation
 ### Adding a New Execution Mode
 1. Add CLI arguments in `g3-cli/src/lib.rs`
 2. Implement mode logic in the CLI
 3. May require new agent methods in `g3-core`
 ## Key Files for Understanding
 Start reading here:
 1. `src/main.rs` - Entry point (trivial, delegates to g3-cli)
 2. `crates/g3-cli/src/lib.rs` - CLI and execution modes
 3. `crates/g3-core/src/lib.rs` - Agent implementation
 4. `crates/g3-providers/src/lib.rs` - Provider trait and registry
 5. `crates/g3-core/src/tool_definitions.rs` - Available tools
 6. `crates/g3-config/src/lib.rs` - Configuration structures
 7. `DESIGN.md` - Original design document
 ## Dependencies
 Key external dependencies:
 - **tokio**: Async runtime
 - **reqwest**: HTTP client for API calls
 - **serde/serde_json**: Serialization
 - **clap**: CLI argument parsing
 - **tree-sitter**: Syntax-aware code search
 - **llama_cpp**: Local model inference (with Metal acceleration)
 - **fantoccini**: WebDriver client
 - **axum**: Web framework (for g3-console)
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -0,0 +1,385 @@
 # G3 Configuration Guide
 **Last updated**: January 2025  
 **Source of truth**: `crates/g3-config/src/lib.rs`, `config.example.toml`
 ## Purpose
 This document explains how to configure G3, including provider setup, agent behavior, and optional features like WebDriver and computer control.
 ## Configuration File Location
 G3 looks for configuration files in this order:
 1. Path specified via `--config` CLI argument
 2. `./g3.toml` (current directory)
 3. `~/.config/g3/config.toml` (user config)
 4. `~/.g3.toml` (legacy location)
 If no configuration file exists, G3 creates a default one at `~/.config/g3/config.toml` on first run.
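 The search order can be sketched as an ordered list of candidate paths where the first existing file wins. The Rust snippet below is an illustration only. `config_candidates` and `first_existing` are hypothetical helpers, and the real lookup lives in `crates/g3-config/src/lib.rs`.
 ```rust
 use std::path::{Path, PathBuf};
 /// Candidate config paths in the order listed above.
 /// `cli_path` is the value of `--config`, if one was given.
 pub fn config_candidates(cli_path: Option<&Path>, cwd: &Path, home: &Path) -> Vec<PathBuf> {
     let mut candidates = Vec::new();
     if let Some(path) = cli_path {
         candidates.push(path.to_path_buf());
     }
     candidates.push(cwd.join("g3.toml"));
     candidates.push(home.join(".config").join("g3").join("config.toml"));
     candidates.push(home.join(".g3.toml")); // legacy location
     candidates
 }
 /// First candidate that exists as a regular file, if any.
 pub fn first_existing(candidates: &[PathBuf]) -> Option<&PathBuf> {
     candidates.iter().find(|p| p.is_file())
 }
 ```
 If `first_existing` returns `None`, that corresponds to the case where G3 writes a default file on first run.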
 ## Configuration Format
 G3 uses TOML format. The configuration is organized into sections:
 ```toml
 [providers]           # LLM provider settings
 [agent]               # Agent behavior settings
 [computer_control]    # Mouse/keyboard automation
 [webdriver]           # Browser automation
 [macax]               # macOS Accessibility API
 ```
 ## Provider Configuration
 ### Provider Reference Format
 Providers are referenced using the format: `<provider_type>.<config_name>`
 Examples:
 - `anthropic.default`
 - `databricks.production`
 - `openai.gpt4`
 - `embedded.local`
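 A reference in this format splits into two parts at the dot. The Rust sketch below shows one way to parse it. `parse_provider_ref` is a hypothetical helper, and splitting at the first dot is an assumption of this example rather than documented G3 behavior.
 ```rust
 /// Split "<provider_type>.<config_name>" into its two parts.
 /// Returns `None` if the dot is missing or either part is empty.
 pub fn parse_provider_ref(reference: &str) -> Option<(&str, &str)> {
     let (provider_type, config_name) = reference.split_once('.')?;
     if provider_type.is_empty() || config_name.is_empty() {
         return None;
     }
     Some((provider_type, config_name))
 }
 ```
 Under these assumptions, `anthropic.default` parses to the pair `("anthropic", "default")`, while a bare `anthropic` is rejected.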
 ### Basic Provider Setup
 ```toml
 [providers]
 # Default provider used for all operations
 default_provider = "anthropic.default"
 # Optional: Different providers for different roles
 # planner = "anthropic.planner"   # Planning mode
 # coach = "anthropic.default"     # Code reviewer in autonomous mode
 # player = "anthropic.default"    # Code implementer in autonomous mode
 ```
 ### Anthropic Configuration
 ```toml
 [providers.anthropic.default]
 api_key = "sk-ant-..."           # Required: Your Anthropic API key
 model = "claude-sonnet-4-5"      # Model to use
 max_tokens = 64000               # Max output tokens per request
 temperature = 0.3                # Sampling temperature (0.0-1.0)
 # cache_config = "ephemeral"     # Optional: Enable prompt caching
 # enable_1m_context = true        # Optional: Enable 1M context (extra cost)
 # thinking_budget_tokens = 10000  # Optional: Extended thinking mode
 ```
 **Available Anthropic models**:
 - `claude-sonnet-4-5` (recommended)
 - `claude-opus-4-5`
 - `claude-3-5-sonnet-20241022`
 - `claude-3-opus-20240229`
 ### Databricks Configuration
 ```toml
 [providers.databricks.default]
 host = "https://your-workspace.cloud.databricks.com"  # Required
 model = "databricks-claude-sonnet-4"                   # Model endpoint
 max_tokens = 4096
 temperature = 0.1
 use_oauth = true                 # Use OAuth (recommended)
 # token = "dapi..."              # Or use personal access token
 ```
 **OAuth vs Token Authentication**:
 - **OAuth** (`use_oauth = true`): Opens browser for authentication, tokens refresh automatically
 - **Token** (`token = "..."`, `use_oauth = false`): Uses personal access token directly
 ### OpenAI Configuration
 ```toml
 [providers.openai.default]
 api_key = "sk-..."               # Required: Your OpenAI API key
 model = "gpt-4-turbo"            # Model to use
 max_tokens = 4096
 temperature = 0.1
 # base_url = "https://api.openai.com/v1"  # Optional: Custom endpoint
 ```
 ### OpenAI-Compatible Providers
 For services with OpenAI-compatible APIs (OpenRouter, Groq, Together, etc.):
 ```toml
 [providers.openai_compatible.openrouter]
 api_key = "sk-or-..."            # Provider's API key
 model = "anthropic/claude-3.5-sonnet"
 base_url = "https://openrouter.ai/api/v1"
 max_tokens = 4096
 temperature = 0.1
 [providers.openai_compatible.groq]
 api_key = "gsk_..."
 model = "llama-3.3-70b-versatile"
 base_url = "https://api.groq.com/openai/v1"
 max_tokens = 4096
 temperature = 0.1
 ```
 Reference these as `openrouter.default` or `groq.default` in `default_provider`.
 ### Embedded (Local) Models
 ```toml
 [providers.embedded.default]
 model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
 model_type = "qwen"              # Model architecture
 context_length = 32768           # Context window size
 max_tokens = 2048                # Max output tokens
 temperature = 0.1
 gpu_layers = 32                  # Layers to offload to GPU (Metal/CUDA)
 threads = 8                      # CPU threads for inference
 ```
 **Supported model types**: `qwen`, `codellama`, `llama`, `mistral`
 **Hardware requirements**:
 - 4-16GB RAM depending on model size
 - Optional GPU acceleration (Metal on macOS, CUDA on Linux)
 ## Agent Configuration
 ```toml
 [agent]
 # Context and token settings
 fallback_default_max_tokens = 8192   # Default max tokens if provider doesn't specify
 # max_context_length = 200000        # Override context window size for all providers
 # Behavior settings
 enable_streaming = true              # Stream responses in real-time
 allow_multiple_tool_calls = true     # Allow multiple tools per response
 timeout_seconds = 60                 # Request timeout
 auto_compact = true                  # Auto-compact context at 90%
 # Retry settings
 max_retry_attempts = 3               # Retries for interactive mode
 autonomous_max_retry_attempts = 6    # Retries for autonomous mode
 # TODO management
 check_todo_staleness = true          # Warn about stale TODO items
 ```
 ### Retry Behavior
 G3 automatically retries on recoverable errors:
 - Rate limits (HTTP 429)
 - Network errors
 - Server errors (HTTP 5xx)
 - Timeouts
 **Interactive mode** uses `max_retry_attempts` (default: 3)  
 **Autonomous mode** uses `autonomous_max_retry_attempts` (default: 6) with longer delays
 ## Computer Control Configuration
 ```toml
 [computer_control]
 enabled = false              # Set to true to enable
 require_confirmation = true  # Require confirmation before actions
 max_actions_per_second = 5   # Rate limit for safety
 ```
 **Required OS permissions**:
 - **macOS**: System Preferences → Security & Privacy → Privacy → Accessibility (on macOS 13 and later: System Settings → Privacy & Security → Accessibility)
 - **Linux**: X11 or Wayland access
 - **Windows**: Run as administrator (first time)
 ## WebDriver Configuration
 ```toml
 [webdriver]
 enabled = false              # Set to true to enable
 browser = "safari"           # "safari" or "chrome-headless"
 safari_port = 4444           # Safari WebDriver port
 chrome_port = 9515           # ChromeDriver port
 # chrome_binary = "/path/to/chrome"  # Optional: Custom Chrome path
 ```
 ### Safari Setup (macOS)
 ```bash
 # Enable Safari remote automation (one-time setup)
 safaridriver --enable
 # Or via Safari UI:
 # Safari → Preferences → Advanced → Show Develop menu
 # Develop → Allow Remote Automation
 ```
 ### Chrome Setup
 **Option 1: Chrome for Testing (Recommended)**
 ```bash
 ./scripts/setup-chrome-for-testing.sh
 ```
 Then configure:
 ```toml
 [webdriver]
 chrome_binary = "/Users/yourname/.chrome-for-testing/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
 ```
 **Option 2: System Chrome**
 ```bash
 # macOS
 brew install chromedriver
 # Linux
 apt install chromium-chromedriver
 ```
 ## macOS Accessibility API Configuration
 ```toml
 [macax]
 enabled = false              # Set to true to enable
 ```
 **Required permissions**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app (on macOS 13 and later: System Settings → Privacy & Security → Accessibility)
 See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage.
 ## Multi-Role Configuration
 For autonomous mode with different models for coach and player:
 ```toml
 [providers]
 default_provider = "anthropic.default"
 coach = "anthropic.coach"    # Code reviewer
 player = "anthropic.player"  # Code implementer
 [providers.anthropic.coach]
 api_key = "sk-ant-..."
 model = "claude-sonnet-4-5"
 max_tokens = 32000
 temperature = 0.1            # Lower for consistent reviews
 [providers.anthropic.player]
 api_key = "sk-ant-..."
 model = "claude-sonnet-4-5"
 max_tokens = 64000
 temperature = 0.3            # Higher for creative implementations
 ```
 See `config.coach-player.example.toml` for a complete example.
 ## Environment Variables
 Environment variables override configuration file settings:
 | Variable | Description |
 |----------|-------------|
 | `G3_WORKSPACE_PATH` | Override workspace directory |
 | `ANTHROPIC_API_KEY` | Anthropic API key |
 | `OPENAI_API_KEY` | OpenAI API key |
 | `DATABRICKS_HOST` | Databricks workspace URL |
 | `DATABRICKS_TOKEN` | Databricks personal access token |
 ## CLI Overrides
 CLI arguments have the highest priority:
 ```bash
 # Override provider
 g3 --provider anthropic.default
 # Override model
 g3 --model claude-opus-4-5
 # Enable features
 g3 --webdriver           # Enable WebDriver (Safari)
 g3 --chrome-headless     # Enable WebDriver (Chrome headless)
 g3 --macax               # Enable macOS Accessibility API
 # Specify config file
 g3 --config /path/to/config.toml
 ```
 ## Complete Example Configuration
 ```toml
 # ~/.config/g3/config.toml
 [providers]
 default_provider = "anthropic.default"
 [providers.anthropic.default]
 api_key = "sk-ant-api03-..."
 model = "claude-sonnet-4-5"
 max_tokens = 64000
 temperature = 0.3
 [providers.databricks.work]
 host = "https://mycompany.cloud.databricks.com"
 model = "databricks-claude-sonnet-4"
 max_tokens = 4096
 temperature = 0.1
 use_oauth = true
 [agent]
 fallback_default_max_tokens = 8192
 enable_streaming = true
 allow_multiple_tool_calls = true
 timeout_seconds = 60
 max_retry_attempts = 3
 autonomous_max_retry_attempts = 6
 [computer_control]
 enabled = false
 require_confirmation = true
 max_actions_per_second = 5
 [webdriver]
 enabled = true
 browser = "safari"
 safari_port = 4444
 [macax]
 enabled = false
 ```
 ## Troubleshooting
 ### "Old config format" error
 If you see this error, your config uses a deprecated format. Update to the new named provider format:
 **Old format** (deprecated):
 ```toml
 [providers.anthropic]
 api_key = "..."
 ```
 **New format**:
 ```toml
 [providers.anthropic.default]
 api_key = "..."
 ```
 ### Provider not found
 Ensure your `default_provider` matches a configured provider:
 ```toml
 default_provider = "anthropic.default"  # Must match [providers.anthropic.default]
 ```
 ### OAuth issues
 For Databricks OAuth:
 1. Ensure `use_oauth = true`
 2. Remove any `token` setting
 3. A browser window will open for authentication
 4. Tokens are cached in `~/.databricks/oauth-tokens.json`
 ### Context window errors
 If you see context overflow errors:
 1. Check `max_context_length` in `[agent]`
 2. Use `/compact` command to manually summarize
 3. Use `/thinnify` to replace large tool results with file references
--- a/docs/macax-tools.md
+++ b/docs/macax-tools.md
@@ -0,0 +1,472 @@
 # macOS Accessibility Tools Guide
 **Last updated**: January 2025  
 **Source of truth**: `crates/g3-computer-control/src/macax/`
 ## Purpose
 G3 includes tools for controlling macOS applications via the Accessibility API. This enables automation of native macOS apps, including those you're building with G3.
 ## Overview
 The macOS Accessibility API provides programmatic access to UI elements in any application. G3 exposes this through the `macax_*` tools, allowing you to:
 - List and activate applications
 - Inspect UI element hierarchies
 - Find elements by role, title, or identifier
 - Click buttons and interact with controls
 - Read and set values in text fields
 - Simulate keyboard input
 ## Setup
 ### 1. Enable in Configuration
 ```toml
 # ~/.config/g3/config.toml
 [macax]
 enabled = true
 ```
 Or use the CLI flag:
 ```bash
 g3 --macax
 ```
 ### 2. Grant Accessibility Permissions
 1. Open **System Preferences** → **Security & Privacy** → **Privacy** (on macOS 13 and later: **System Settings** → **Privacy & Security**)
 2. Select **Accessibility** in the left sidebar
 3. Click the lock icon and authenticate
 4. Add your terminal application (Terminal, iTerm2, etc.)
 5. Restart your terminal
 **Note**: If using VS Code's integrated terminal, add VS Code to the list.
 ### 3. Verify Setup
 ```json
 {"tool": "macax_list_apps", "args": {}}
 ```
 This should return a list of running applications.
 ## Available Tools
 ### macax_list_apps
 List all running applications.
 **Parameters**: None
 **Example**:
 ```json
 {"tool": "macax_list_apps", "args": {}}
 ```
 **Returns**:
 ```
 Running Applications:
 - Safari (com.apple.Safari)
 - Finder (com.apple.finder)
 - Terminal (com.apple.Terminal)
 - MyApp (com.example.myapp)
 ```
 ---
 ### macax_get_frontmost_app
 Get the currently active (frontmost) application.
 **Parameters**: None
 **Example**:
 ```json
 {"tool": "macax_get_frontmost_app", "args": {}}
 ```
 **Returns**:
 ```
 Frontmost Application: Safari (com.apple.Safari)
 ```
 ---
 ### macax_activate_app
 Bring an application to the front.
 **Parameters**:
 - `app_name` (string, required): Application name
 **Example**:
 ```json
 {"tool": "macax_activate_app", "args": {"app_name": "Safari"}}
 ```
 ---
 ### macax_get_ui_tree
 Get the UI element hierarchy of an application.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `max_depth` (integer, optional): Maximum tree depth (default: 5)
 **Example**:
 ```json
 {"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
 ```
 **Returns**:
 ```
 UI Tree for Calculator:
 └── AXApplication "Calculator"
    └── AXWindow "Calculator"
        ├── AXGroup
        │   ├── AXButton "1" [id: digit_1]
        │   ├── AXButton "2" [id: digit_2]
        │   ├── AXButton "+" [id: add]
        │   └── AXButton "=" [id: equals]
        └── AXStaticText "0" [id: display]
 ```
 **Notes**:
 - Use a lower `max_depth` for complex apps to avoid overwhelming output
 - Elements show role, title, and accessibility identifier (if set)
 ---
 ### macax_find_elements
 Find UI elements matching criteria.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `role` (string, optional): Element role (e.g., "button", "textField")
 - `title` (string, optional): Element title/label
 - `identifier` (string, optional): Accessibility identifier
 **Example**:
 ```json
 {"tool": "macax_find_elements", "args": {
  "app_name": "Safari",
  "role": "button"
 }}
 ```
 **Returns**:
 ```
 Found 5 elements:
 1. AXButton "Back" [id: BackButton]
 2. AXButton "Forward" [id: ForwardButton]
 3. AXButton "Reload" [id: ReloadButton]
 4. AXButton "Share" [id: ShareButton]
 5. AXButton "New Tab" [id: NewTabButton]
 ```
 ---
 ### macax_click
 Click a UI element.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `identifier` (string, optional): Accessibility identifier
 - `title` (string, optional): Element title
 - `role` (string, optional): Element role
 At least one of `identifier`, `title`, or `role` must be provided.
 **Examples**:
 ```json
 // Click by identifier (most reliable)
 {"tool": "macax_click", "args": {
  "app_name": "Calculator",
  "identifier": "digit_5"
 }}
 // Click by title
 {"tool": "macax_click", "args": {
  "app_name": "Calculator",
  "title": "5"
 }}
 // Click by role and title
 {"tool": "macax_click", "args": {
  "app_name": "Safari",
  "role": "button",
  "title": "Reload"
 }}
 ```
 ---
 ### macax_set_value
 Set the value of a UI element (text fields, sliders, etc.).
 **Parameters**:
 - `app_name` (string, required): Application name
 - `identifier` (string, optional): Accessibility identifier
 - `title` (string, optional): Element title
 - `value` (string, required): Value to set
 **Example**:
 ```json
 {"tool": "macax_set_value", "args": {
  "app_name": "TextEdit",
  "role": "textArea",
  "value": "Hello, World!"
 }}
 ```
 ---
 ### macax_get_value
 Get the current value of a UI element.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `identifier` (string, optional): Accessibility identifier
 - `title` (string, optional): Element title
 **Example**:
 ```json
 {"tool": "macax_get_value", "args": {
  "app_name": "Calculator",
  "identifier": "display"
 }}
 ```
 **Returns**:
 ```
 Value: 42
 ```
 ---
 ### macax_press_key
 Simulate a key press.
 **Parameters**:
 - `key` (string, required): Key to press
 - `modifiers` (array, optional): Modifier keys
 **Supported modifiers**: `command`, `shift`, `option`, `control`
 **Examples**:
 ```json
 // Simple key press
 {"tool": "macax_press_key", "args": {"key": "a"}}
 // With modifiers (Cmd+S)
 {"tool": "macax_press_key", "args": {
  "key": "s",
  "modifiers": ["command"]
 }}
 // Multiple modifiers (Cmd+Shift+N)
 {"tool": "macax_press_key", "args": {
  "key": "n",
  "modifiers": ["command", "shift"]
 }}
 // Special keys
 {"tool": "macax_press_key", "args": {"key": "return"}}
 {"tool": "macax_press_key", "args": {"key": "escape"}}
 {"tool": "macax_press_key", "args": {"key": "tab"}}
 {"tool": "macax_press_key", "args": {"key": "delete"}}
 ```
 **Special key names**:
 - `return`, `enter`
 - `escape`, `esc`
 - `tab`
 - `delete`, `backspace`
 - `space`
 - `up`, `down`, `left`, `right`
 - `home`, `end`, `pageup`, `pagedown`
 - `f1` through `f12`
 ## Common Roles
 | Role | Description |
 |------|-------------|
 | `button` | Clickable button |
 | `textField` | Single-line text input |
 | `textArea` | Multi-line text input |
 | `checkbox` | Checkbox control |
 | `radioButton` | Radio button |
 | `popUpButton` | Dropdown/popup menu |
 | `slider` | Slider control |
 | `table` | Table view |
 | `list` | List view |
 | `outline` | Outline/tree view |
 | `group` | Container group |
 | `window` | Application window |
 | `sheet` | Modal sheet |
 | `dialog` | Dialog window |
 | `staticText` | Non-editable text |
 | `image` | Image element |
 | `scrollArea` | Scrollable container |
 | `toolbar` | Toolbar |
 | `menuBar` | Menu bar |
 | `menu` | Menu |
 | `menuItem` | Menu item |
 ## Best Practices
 ### 1. Use Accessibility Identifiers
 When building apps you'll automate with G3, add accessibility identifiers:
 **SwiftUI**:
 ```swift
 Button("Submit") { ... }
    .accessibilityIdentifier("submit_button")
 ```
 **UIKit**:
 ```swift
 button.accessibilityIdentifier = "submit_button"
 ```
 **AppKit**:
 ```swift
 button.setAccessibilityIdentifier("submit_button")
 ```
 Identifiers are more reliable than titles (which may be localized).
 ### 2. Inspect Before Automating
 Always inspect the UI tree first:
 ```json
 {"tool": "macax_get_ui_tree", "args": {"app_name": "MyApp", "max_depth": 4}}
 ```
 This helps you understand:
 - Element hierarchy
 - Available identifiers
 - Correct role names
 ### 3. Activate App First
 Some actions require the app to be frontmost:
 ```json
 {"tool": "macax_activate_app", "args": {"app_name": "MyApp"}}
 {"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "button1"}}
 ```
 ### 4. Handle Timing
 UI updates may take time. If an element isn't found:
 1. Wait briefly
 2. Retry the operation
 3. Check if the app state changed
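 The retry steps above can be sketched as a small helper. This is a minimal illustration rather than G3's own implementation: `retry_with_delay` and its parameters are hypothetical names, and the closure stands in for whatever tool call reported "Element not found".
 ```rust
 use std::thread::sleep;
 use std::time::Duration;

 /// Run `op` at least once and at most `attempts` times (hypothetical helper).
 /// Sleeps for `delay` after each failed try and returns the last result.
 fn retry_with_delay<T, E>(
     attempts: u32,
     delay: Duration,
     mut op: impl FnMut() -> Result<T, E>,
 ) -> Result<T, E> {
     let mut last = op();
     for _ in 1..attempts {
         if last.is_ok() {
             break;
         }
         sleep(delay); // 1. wait briefly
         last = op(); // 2. retry the operation
     }
     last
 }
 ```
 If the final result is still an error, re-inspect the UI tree to check whether the app state changed. Inside an async runtime, a non-blocking sleep such as `tokio::time::sleep` would replace `std::thread::sleep`.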
 ### 5. Prefer Identifiers Over Titles
 ```json
 // Good: Uses identifier
 {"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "save_btn"}}
 // Less reliable: Uses title (may be localized)
 {"tool": "macax_click", "args": {"app_name": "MyApp", "title": "Save"}}
 ```
 ## Example: Automating Calculator
 ```json
 // 1. Activate Calculator
 {"tool": "macax_activate_app", "args": {"app_name": "Calculator"}}
 // 2. Inspect UI
 {"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
 // 3. Click "5"
 {"tool": "macax_click", "args": {"app_name": "Calculator", "title": "5"}}
 // 4. Click "+"
 {"tool": "macax_click", "args": {"app_name": "Calculator", "title": "+"}}
 // 5. Click "3"
 {"tool": "macax_click", "args": {"app_name": "Calculator", "title": "3"}}
 // 6. Click "="
 {"tool": "macax_click", "args": {"app_name": "Calculator", "title": "="}}
 // 7. Read result
 {"tool": "macax_get_value", "args": {"app_name": "Calculator", "role": "staticText"}}
 ```
 ## Troubleshooting
 ### "Accessibility permission denied"
 1. Check System Settings → Privacy & Security → Accessibility (System Preferences → Security & Privacy → Accessibility on macOS 12 and earlier)
 2. Ensure your terminal app is listed and checked
 3. Restart the terminal after granting permission
 ### "Application not found"
 1. Use exact app name (case-sensitive)
 2. Run `macax_list_apps` to see available apps
 3. App must be running
 ### "Element not found"
 1. Inspect UI tree to verify element exists
 2. Check identifier/title spelling
 3. Element may be in a different window or sheet
 4. App state may have changed
 ### "Cannot perform action"
 1. Element may be disabled
 2. App may need to be frontmost
 3. Element may not support the action
 4. Check element role supports the operation
 ### Slow Performance
 1. Reduce `max_depth` in `macax_get_ui_tree`
 2. Use specific identifiers instead of searching
 3. Expect slower responses from complex apps, which have large UI trees
 ## Comparison with Other Tools
 | Feature | macax | Vision Tools | WebDriver |
 |---------|-------|--------------|----------|
 | Native apps | ✅ | ✅ (via OCR) | ❌ |
 | Web browsers | ✅ | ✅ | ✅ |
 | Electron apps | ✅ | ✅ | Partial |
 | Reliability | High | Medium | High |
 | Setup | Permissions | None | Driver |
 | Speed | Fast | Slower | Medium |
 **Use macax when**:
 - Automating native macOS apps
 - You control the app and can add identifiers
 - Need reliable, fast automation
 **Use Vision tools when**:
 - App doesn't expose accessibility
 - Need to find text visually
 - Cross-platform approach needed
 **Use WebDriver when**:
 - Automating web content
 - Need JavaScript execution
 - Testing web applications
--- a/docs/providers.md
+++ b/docs/providers.md
@@ -0,0 +1,408 @@
 # G3 LLM Providers Guide
 **Last updated**: January 2025  
 **Source of truth**: `crates/g3-providers/src/`
 ## Purpose
 This document describes the LLM providers supported by G3, their capabilities, and how to choose between them.
 ## Provider Overview
 | Provider | Type | Tool Calling | Cache Control | Context Window | Best For |
 |----------|------|--------------|---------------|----------------|----------|
 | **Anthropic** | Cloud | Native | Yes | 200k (1M optional) | General use, complex tasks |
 | **Databricks** | Cloud | Native | Yes (Claude models) | Varies | Enterprise, existing Databricks users |
 | **OpenAI** | Cloud | Native | No | 128k | GPT model preference |
 | **OpenAI-Compatible** | Cloud | Native | No | Varies | OpenRouter, Groq, Together, etc. |
 | **Embedded** | Local | JSON fallback | No | 4k-32k | Privacy, offline, cost savings |
 ## Anthropic
 **Location**: `crates/g3-providers/src/anthropic.rs`
 ### Features
 - **Native tool calling**: Full support for structured tool calls
 - **Prompt caching**: Reduce costs with ephemeral caching
 - **Extended context**: Optional 1M token context (additional cost)
 - **Extended thinking**: Budget tokens for complex reasoning
 - **Streaming**: Real-time response streaming
 ### Configuration
 ```toml
 [providers.anthropic.default]
 api_key = "sk-ant-api03-..."     # Required
 model = "claude-sonnet-4-5"      # Model name
 max_tokens = 64000               # Max output tokens
 temperature = 0.3                # 0.0-1.0
 cache_config = "ephemeral"       # Optional: Enable caching
 enable_1m_context = true          # Optional: 1M context
 thinking_budget_tokens = 10000    # Optional: Extended thinking
 ```
 ### Available Models
 | Model | Context | Best For |
 |-------|---------|----------|
 | `claude-sonnet-4-5` | 200k | Balanced performance/cost |
 | `claude-opus-4-5` | 200k | Complex reasoning |
 | `claude-3-5-sonnet-20241022` | 200k | Previous generation |
 | `claude-3-opus-20240229` | 200k | Previous generation |
 ### Prompt Caching
 Enable caching to reduce costs for repeated context:
 ```toml
 cache_config = "ephemeral"  # Short-lived cache (Anthropic default TTL: 5 minutes, refreshed on each cache hit)
 ```
 Caching is applied to:
 - System prompts
 - README/AGENTS.md content
 - Large tool results
 ### Extended Thinking
 For complex tasks requiring step-by-step reasoning:
 ```toml
 thinking_budget_tokens = 10000  # Tokens for internal reasoning
 ```
 The model uses these tokens for planning before responding.
 ---
 ## Databricks
 **Location**: `crates/g3-providers/src/databricks.rs`
 ### Features
 - **Foundation Model APIs**: Access to various models
 - **OAuth authentication**: Secure browser-based auth
 - **Token authentication**: Personal access tokens
 - **Enterprise integration**: Works with existing Databricks setup
 ### Configuration
 ```toml
 [providers.databricks.default]
 host = "https://your-workspace.cloud.databricks.com"
 model = "databricks-claude-sonnet-4"
 max_tokens = 4096
 temperature = 0.1
 use_oauth = true              # Recommended
 # token = "dapi..."           # Alternative: PAT
 ```
 ### Authentication
 **OAuth (Recommended)**:
 1. Set `use_oauth = true`
 2. On first run, browser opens for authentication
 3. Tokens are cached in `~/.databricks/oauth-tokens.json`
 4. Tokens refresh automatically
 **Personal Access Token**:
 1. Generate token in Databricks workspace
 2. Set `token = "dapi..."` and `use_oauth = false`
 ### Available Models
 Models depend on your Databricks workspace configuration:
 - `databricks-claude-sonnet-4` (Claude via Databricks)
 - `databricks-meta-llama-3-1-70b-instruct`
 - `databricks-dbrx-instruct`
 - Custom fine-tuned models
 ---
 ## OpenAI
 **Location**: `crates/g3-providers/src/openai.rs`
 ### Features
 - **Native tool calling**: Full support
 - **Custom endpoints**: Override base URL
 - **Streaming**: Real-time responses
 ### Configuration
 ```toml
 [providers.openai.default]
 api_key = "sk-..."               # Required
 model = "gpt-4-turbo"            # Model name
 max_tokens = 4096
 temperature = 0.1
 # base_url = "https://api.openai.com/v1"  # Optional
 ```
 ### Available Models
 | Model | Context | Notes |
 |-------|---------|-------|
 | `gpt-4-turbo` | 128k | GPT-4 Turbo |
 | `gpt-4o` | 128k | Multimodal ("omni") GPT-4-class model |
 | `gpt-4` | 8k | Original GPT-4 |
 | `gpt-3.5-turbo` | 16k | Faster, cheaper |
 ---
 ## OpenAI-Compatible Providers
 **Location**: `crates/g3-providers/src/openai.rs` (reuses OpenAI implementation)
 For services that implement the OpenAI API format.
 ### Configuration
 ```toml
 # OpenRouter
 [providers.openai_compatible.openrouter]
 api_key = "sk-or-..."
 model = "anthropic/claude-3.5-sonnet"
 base_url = "https://openrouter.ai/api/v1"
 max_tokens = 4096
 temperature = 0.1
 # Groq
 [providers.openai_compatible.groq]
 api_key = "gsk_..."
 model = "llama-3.3-70b-versatile"
 base_url = "https://api.groq.com/openai/v1"
 max_tokens = 4096
 temperature = 0.1
 # Together
 [providers.openai_compatible.together]
 api_key = "..."
 model = "meta-llama/Llama-3-70b-chat-hf"
 base_url = "https://api.together.xyz/v1"
 max_tokens = 4096
 temperature = 0.1
 ```
 ### Supported Services
 - **OpenRouter**: Access to many models through one API
 - **Groq**: Fast inference for Llama models
 - **Together**: Open-source model hosting
 - **Anyscale**: Scalable model serving
 - **Local servers**: Ollama, vLLM, text-generation-inference
 ---
 ## Embedded (Local Models)
 **Location**: `crates/g3-providers/src/embedded.rs`
 ### Features
 - **Completely local**: No data leaves your machine
 - **Offline capable**: Works without internet
 - **GPU acceleration**: Metal (macOS), CUDA (Linux)
 - **No API costs**: Free after model download
 ### Configuration
 ```toml
 [providers.embedded.default]
 model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q4_k_m.gguf"
 model_type = "qwen"              # Model architecture
 context_length = 32768           # Context window
 max_tokens = 2048                # Max output
 temperature = 0.1
 gpu_layers = 32                  # GPU offload (0 = CPU only)
 threads = 8                      # CPU threads
 ```
 ### Supported Model Types
 | Type | Models | Notes |
 |------|--------|-------|
 | `qwen` | Qwen 2.5 series | Good coding ability |
 | `codellama` | Code Llama | Specialized for code |
 | `llama` | Llama 2/3 | General purpose |
 | `mistral` | Mistral/Mixtral | Efficient |
 ### Model Download
 Download GGUF models from Hugging Face:
 ```bash
 mkdir -p ~/.cache/g3/models
 cd ~/.cache/g3/models
 # Example: Qwen 2.5 7B
 wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
 ```
 ### Hardware Requirements
 | Model Size | RAM Required | GPU VRAM | Notes |
 |------------|--------------|----------|-------|
 | 7B Q4 | 6GB | 4GB | Good for most tasks |
 | 7B Q8 | 10GB | 8GB | Better quality |
 | 13B Q4 | 10GB | 8GB | More capable |
 | 70B Q4 | 48GB | 40GB | Requires high-end hardware |
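 As a rough cross-check on the table above, the weights alone occupy about `parameters × bits per weight / 8` bytes. The sketch below applies that arithmetic. `weight_gb` is a hypothetical helper, and treating Q4 as 4 bits and Q8 as 8 bits per weight is an approximation (real GGUF quantizations use slightly more). The KV cache and runtime overhead come on top, which is why the table lists larger figures.
 ```rust
 /// Approximate size of the model weights in GB (10^9 bytes).
 /// `params_billion` is the parameter count in billions.
 /// Assumption: every weight is stored with exactly `bits_per_weight` bits.
 fn weight_gb(params_billion: f64, bits_per_weight: f64) -> f64 {
     params_billion * bits_per_weight / 8.0
 }
 ```
 For example, `weight_gb(7.0, 4.0)` gives 3.5 GB for the weights of a 7B Q4 model, below the 6GB of RAM listed once context and overhead are included.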
 ### GPU Acceleration
 **macOS (Metal)**:
 ```toml
 gpu_layers = 32  # Offload layers to GPU
 ```
 **Linux (CUDA)**:
 Requires the CUDA toolkit to be installed.
 **CPU Only**:
 ```toml
 gpu_layers = 0
 threads = 8  # Use more threads
 ```
 ### Tool Calling
 Embedded models don't have native tool calling. G3 uses JSON fallback:
 1. System prompt includes tool definitions as JSON
 2. Model outputs tool calls as JSON in response
 3. G3 parses JSON and executes tools
 This works but is less reliable than native tool calling, because the model can emit malformed or incomplete JSON.
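 Step 3 depends on locating the JSON inside free-form model output. The sketch below shows one way to do that with the standard library only; it is not G3's parser, and `extract_json_object` is a hypothetical name. It returns the first balanced `{...}` span and ignores braces that appear inside JSON strings.
 ```rust
 /// Return the first balanced `{...}` span in `text` (hypothetical helper).
 /// Braces inside double-quoted strings are skipped, including after `\"`.
 fn extract_json_object(text: &str) -> Option<&str> {
     let start = text.find('{')?;
     let mut depth = 0usize;
     let mut in_string = false;
     let mut escaped = false;
     for (offset, ch) in text[start..].char_indices() {
         if in_string {
             if escaped {
                 escaped = false;
             } else if ch == '\\' {
                 escaped = true;
             } else if ch == '"' {
                 in_string = false;
             }
             continue;
         }
         match ch {
             '"' => in_string = true,
             '{' => depth += 1,
             '}' => {
                 depth -= 1;
                 if depth == 0 {
                     let end = start + offset + ch.len_utf8();
                     return Some(&text[start..end]);
                 }
             }
             _ => {}
         }
     }
     None // unbalanced braces: the output was truncated or malformed
 }
 ```
 The extracted span would still need to be validated with a JSON parser before the tool runs. A known limitation of this sketch is that a stray `{` in the surrounding prose is treated as the start of the object.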
 ---
 ## Provider Selection Guide
 ### By Use Case
 | Use Case | Recommended Provider |
 |----------|---------------------|
 | General coding tasks | Anthropic (Claude Sonnet) |
 | Complex reasoning | Anthropic (Claude Opus) |
 | Enterprise/compliance | Databricks |
 | Cost-sensitive | Embedded or Groq |
 | Privacy-critical | Embedded |
 | Offline development | Embedded |
 | Fast iteration | Groq (Llama) |
 | Model variety | OpenRouter |
 ### By Priority
 **Quality first**: Anthropic Claude Opus/Sonnet
 - Best reasoning and coding ability
 - Native tool calling
 - Prompt caching for efficiency
 **Cost first**: Embedded or OpenAI-compatible
 - Embedded: Free after download
 - Groq: Very cheap, fast
 - OpenRouter: Pay-per-use, many options
 **Privacy first**: Embedded
 - Data never leaves your machine
 - No API calls
 - Full control
 **Speed first**: Groq or Embedded with GPU
 - Groq: Extremely fast inference
 - Embedded with Metal/CUDA: Low latency
 ---
 ## Provider Trait
 All providers implement the `LLMProvider` trait:
 ```rust
 #[async_trait]
 pub trait LLMProvider: Send + Sync {
    /// Generate a completion
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;
    /// Stream a completion
    async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;
    /// Provider name (e.g., "anthropic.default")
    fn name(&self) -> &str;
    /// Model name (e.g., "claude-sonnet-4-5")
    fn model(&self) -> &str;
    /// Whether provider supports native tool calling
    fn has_native_tool_calling(&self) -> bool;
    /// Whether provider supports cache control
    fn supports_cache_control(&self) -> bool;
    /// Configured max tokens
    fn max_tokens(&self) -> u32;
    /// Configured temperature
    fn temperature(&self) -> f32;
 }
 ```
 ---
 ## Adding a New Provider
 1. Create `crates/g3-providers/src/newprovider.rs`
 2. Implement `LLMProvider` trait
 3. Add configuration struct to `crates/g3-config/src/lib.rs`
 4. Register in `crates/g3-core/src/lib.rs` (`new_with_mode_and_readme`)
 5. Export from `crates/g3-providers/src/lib.rs`
 6. Update documentation
 ---
 ## Troubleshooting
 ### Authentication Errors
 **Anthropic**: Verify that the API key starts with `sk-ant-`
 **Databricks OAuth**: 
 - Delete `~/.databricks/oauth-tokens.json` and re-authenticate
 - Ensure workspace URL is correct
 **OpenAI**: Verify API key and check billing status
 ### Rate Limits
 G3 automatically retries on rate limits with exponential backoff.
 To reduce rate limit issues:
 - Use prompt caching (Anthropic)
 - Reduce `max_tokens`
 - Use a provider with higher limits
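 For readers unfamiliar with exponential backoff, the delay schedule can be sketched as follows. The base delay, the cap, and the name `backoff_delay` are illustrative assumptions; the values G3 actually uses are not documented here.
 ```rust
 use std::time::Duration;

 /// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
 /// Hypothetical helper; real implementations often add random jitter as well.
 fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
     let factor = 2u32.saturating_pow(attempt);
     // Fall back to the cap if the multiplication overflows.
     base.checked_mul(factor).map_or(max, |d| d.min(max))
 }
 ```
 With a 500 ms base and a 30 s cap, the delays would be 500 ms, 1 s, 2 s, 4 s, and so on until the cap is reached.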
 ### Context Window Errors
 If you see "context too long" errors:
 1. Use `/compact` to summarize conversation
 2. Use `/thinnify` to replace large tool results
 3. Increase `max_context_length` in config
 4. Switch to a provider with larger context
 ### Embedded Model Issues
 **Model not loading**:
 - Verify `model_path` is correct
 - Check file permissions
 - Ensure enough RAM
 **Slow inference**:
 - Increase `gpu_layers` for GPU offload
 - Reduce `context_length`
 - Use a smaller quantization (e.g., Q4 instead of Q8)
 **Poor tool calling**:
 - Embedded models use JSON fallback
 - Consider cloud provider for complex tool use
--- a/docs/tools.md
+++ b/docs/tools.md
@@ -0,0 +1,538 @@
 # G3 Tools Reference
 **Last updated**: January 2025  
 **Source of truth**: `crates/g3-core/src/tool_definitions.rs`, `crates/g3-core/src/tools/`
 ## Purpose
 This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.
 ## Tool Categories
 | Category | Tools | Enabled By |
 |----------|-------|------------|
 | **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
 | **Images** | read_image, take_screenshot, extract_text | Always |
 | **Task Management** | todo_read, todo_write | Always |
 | **Code Intelligence** | code_search, code_coverage | Always |
 | **WebDriver** | webdriver_* (15 tools) | `--webdriver` or `--chrome-headless` |
 | **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
 | **macOS Accessibility** | macax_* (9 tools) | `--macax` |
 | **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |
 ---
 ## Core Tools
 ### shell
 Execute shell commands.
 **Parameters**:
 - `command` (string, required): The shell command to execute
 **Example**:
 ```json
 {"tool": "shell", "args": {"command": "ls -la"}}
 ```
 **Notes**:
 - Commands run in the current working directory
 - Output is streamed in real-time
 - Both stdout and stderr are captured
 - Exit code is reported
 ---
 ### background_process
 Launch a long-running process in the background.
 **Parameters**:
 - `name` (string, required): Unique name for the process (e.g., "game_server")
 - `command` (string, required): Shell command to execute
 - `working_dir` (string, optional): Working directory
 **Example**:
 ```json
 {"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}
 ```
 **Returns**: PID and log file path
 **Notes**:
 - Process runs independently of the agent
 - Logs are captured to a file
 - Use `shell` to read logs (`tail`), check status (`ps`), or stop (`kill`)
 ---
 ### read_file
 Read file contents with optional character range.
 **Parameters**:
 - `file_path` (string, required): Path to the file
 - `start` (integer, optional): Starting character position (0-indexed, inclusive)
 - `end` (integer, optional): Ending character position (0-indexed, exclusive)
 **Example**:
 ```json
 {"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}
 ```
 **Notes**:
 - For image files (png, jpg, gif, etc.), automatically extracts text using OCR
 - Supports tilde expansion (`~`)
 - Reports file size and line count
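 The `start`/`end` semantics (0-indexed, `start` inclusive, `end` exclusive) can be illustrated with a short sketch. `char_range` is a hypothetical helper, and counting positions as Unicode scalar values rather than bytes is an assumption about how the tool measures characters.
 ```rust
 /// Return the characters of `content` in the half-open range [start, end).
 /// Out-of-range positions are clamped to the content length (an assumption).
 fn char_range(content: &str, start: usize, end: Option<usize>) -> String {
     let total = content.chars().count();
     let end = end.unwrap_or(total).min(total);
     let start = start.min(end);
     content.chars().skip(start).take(end - start).collect()
 }
 ```
 Because `end` is exclusive, consecutive reads such as `0..1000` and `1000..2000` cover the file without overlap.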
 ---
 ### read_image
 Read image files for visual analysis by the LLM.
 **Parameters**:
 - `file_paths` (array of strings, required): Paths to image files
 **Example**:
 ```json
 {"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}
 ```
 **Supported formats**: PNG, JPEG, GIF, WebP
 **Notes**:
 - Images are sent to the LLM for visual analysis
 - Use for inspecting sprites, UI screenshots, diagrams, etc.
 - Different from `extract_text` which only does OCR
 ---
 ### write_file
 Create or overwrite a file.
 **Parameters**:
 - `file_path` (string, required): Path to the file
 - `content` (string, required): Content to write
 **Example**:
 ```json
 {"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}
 ```
 **Notes**:
 - Creates parent directories if needed
 - Overwrites existing files
 - Reports bytes written
 ---
 ### str_replace
 Apply a unified diff to a file.
 **Parameters**:
 - `file_path` (string, required): Path to the file
 - `diff` (string, required): Unified diff with context lines
 - `start` (integer, optional): Starting character position to constrain search
 - `end` (integer, optional): Ending character position to constrain search
 **Example**:
 ```json
 {"tool": "str_replace", "args": {
  "file_path": "src/main.rs",
  "diff": "@@ -10,3 +10,4 @@\n fn main() {\n     println!(\"Hello\");\n+    println!(\"World\");\n }"
 }}
 ```
 **Notes**:
 - Supports multiple hunks
 - Context lines help locate the correct position
 - Use `start`/`end` to disambiguate when multiple matches exist
 - `---/+++` headers are optional for minimal diffs
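 Each hunk in the `diff` argument begins with a header such as `@@ -10,3 +10,4 @@`, giving the start line and length of the old and new ranges. The sketch below parses that header using the standard unified diff convention; `parse_hunk_header` is a hypothetical name and does not describe how G3 itself reads hunks.
 ```rust
 /// Parse `@@ -old_start,old_len +new_start,new_len @@` (hypothetical helper).
 /// In the unified diff format an omitted length defaults to 1.
 fn parse_hunk_header(line: &str) -> Option<(usize, usize, usize, usize)> {
     let body = line.strip_prefix("@@ -")?;
     let (ranges, _section) = body.split_once(" @@")?;
     let (old, new) = ranges.split_once(" +")?;
     let parse = |range: &str| -> Option<(usize, usize)> {
         match range.split_once(',') {
             Some((start, len)) => Some((start.parse().ok()?, len.parse().ok()?)),
             None => Some((range.parse().ok()?, 1)),
         }
     };
     let (old_start, old_len) = parse(old)?;
     let (new_start, new_len) = parse(new)?;
     Some((old_start, old_len, new_start, new_len))
 }
 ```
 For the example above, the header describes 3 old lines starting at line 10 becoming 4 new lines, which matches the single added `println!` line.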
 ---
 ### final_output
 Signal task completion with a summary.
 **Parameters**:
 - `summary` (string, required): Markdown summary of what was accomplished
 **Example**:
 ```json
 {"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}
 ```
 **Notes**:
 - Ends the current task
 - Summary is displayed to the user
 - In autonomous mode, triggers coach review
 ---
 ## Image & Screenshot Tools
 ### take_screenshot
 Capture a screenshot of an application window.
 **Parameters**:
 - `path` (string, required): Filename for the screenshot
 - `window_id` (string, required): Application name (e.g., "Safari", "Terminal")
 - `region` (object, optional): `{x, y, width, height}` to capture a region
 **Example**:
 ```json
 {"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}
 ```
 **Notes**:
 - Use `list_windows` first to identify available windows
 - Relative paths are saved under `~/tmp` or `$TMPDIR`
 - Uses native screencapture on macOS
 ---
 ### extract_text
 Extract text from an image using OCR.
 **Parameters**:
 - `path` (string, optional): Path to image file
 **Example**:
 ```json
 {"tool": "extract_text", "args": {"path": "screenshot.png"}}
 ```
 **Notes**:
 - Uses Tesseract OCR or Apple Vision framework
 - For window-based OCR, use `vision_find_text` instead
 ---
 ## Task Management Tools
 ### todo_read
 Read the current TODO list.
 **Parameters**: None
 **Example**:
 ```json
 {"tool": "todo_read", "args": {}}
 ```
 **Notes**:
 - TODO lists are session-scoped
 - Stored in `.g3/sessions/<session_id>/todo.g3.md`
 - Call at start of multi-step tasks to check for existing plans
 ---
 ### todo_write
 Create or update the TODO list.
 **Parameters**:
 - `content` (string, required): TODO list content in markdown checkbox format
 **Example**:
 ```json
 {"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n  - [ ] Write tests\n  - [ ] Update docs\n- [x] Setup project"}}
 ```
 **Notes**:
 - Replaces entire file content
 - Always call `todo_read` first to preserve existing content
 - Use `- [ ]` for incomplete, `- [x]` for complete
 - Supports nested tasks with indentation
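 The checkbox format is simple enough to inspect programmatically. The following sketch counts incomplete and complete items, including nested ones; `count_todos` is a hypothetical helper, not part of G3.
 ```rust
 /// Count (incomplete, complete) items in a markdown checkbox list.
 fn count_todos(content: &str) -> (usize, usize) {
     let mut open = 0;
     let mut done = 0;
     for line in content.lines() {
         let item = line.trim_start(); // nested tasks are indented
         if item.starts_with("- [ ]") {
             open += 1;
         } else if item.starts_with("- [x]") {
             done += 1;
         }
     }
     (open, done)
 }
 ```
 Applied to the example content above, this yields three incomplete items and one complete item.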
 ---
 ## Code Intelligence Tools
 ### code_search
 Syntax-aware code search using tree-sitter.
 **Parameters**:
 - `searches` (array, required): Array of search objects:
  - `name` (string): Label for this search
  - `query` (string): Tree-sitter query in S-expression format
  - `language` (string): Programming language
  - `paths` (array, optional): Paths to search
  - `context_lines` (integer, optional): Lines of context (0-20)
 - `max_concurrency` (integer, optional): Parallel searches (default: 4)
 - `max_matches_per_search` (integer, optional): Max matches (default: 500)
 **Supported languages**: rust, python, javascript, typescript, go, java, c, cpp, kotlin
 **Example**:
 ```json
 {"tool": "code_search", "args": {
  "searches": [{
    "name": "functions",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust",
    "context_lines": 2
  }]
 }}
 ```
 See [Code Search Guide](CODE_SEARCH.md) for detailed query patterns.
 ---
 ### code_coverage
 Generate code coverage report using cargo llvm-cov.
 **Parameters**: None
 **Example**:
 ```json
 {"tool": "code_coverage", "args": {}}
 ```
 **Notes**:
 - Runs all tests with coverage instrumentation
 - Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
 - Returns coverage statistics summary
 ---
 ## WebDriver Tools
 Enabled with `--webdriver` (Safari) or `--chrome-headless` (Chrome).
 ### webdriver_start
 Start a browser session.
 **Example**:
 ```json
 {"tool": "webdriver_start", "args": {}}
 ```
 ### webdriver_navigate
 Navigate to a URL.
 **Parameters**:
 - `url` (string, required): URL with protocol (e.g., `https://`)
 ### webdriver_get_url / webdriver_get_title
 Get current URL or page title.
 ### webdriver_find_element / webdriver_find_elements
 Find element(s) by CSS selector.
 **Parameters**:
 - `selector` (string, required): CSS selector
 ### webdriver_click
 Click an element.
 **Parameters**:
 - `selector` (string, required): CSS selector
 ### webdriver_send_keys
 Type text into an input.
 **Parameters**:
 - `selector` (string, required): CSS selector
 - `text` (string, required): Text to type
 - `clear_first` (boolean, optional): Clear before typing (default: true)
 ### webdriver_execute_script
 Execute JavaScript.
 **Parameters**:
 - `script` (string, required): JavaScript code (use `return` to return values)
 ### webdriver_get_page_source
 Get rendered HTML.
 **Parameters**:
 - `max_length` (integer, optional): Max chars to return (default: 10000, 0 for no limit)
 - `save_to_file` (string, optional): Save to file instead of returning inline
 ### webdriver_screenshot
 Take browser screenshot.
 **Parameters**:
 - `path` (string, required): Save path
 ### webdriver_back / webdriver_forward / webdriver_refresh
 Navigation controls.
 ### webdriver_quit
 Close browser and end session.
 ---
 ## Vision Tools (macOS)
 Use Apple Vision framework for text recognition.
 ### vision_find_text
 Find text in an application window.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `text` (string, required): Text to search for
 **Returns**: Bounding box coordinates and confidence score
 ### vision_click_text
 Find and click on text.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `text` (string, required): Text to click
 ### vision_click_near_text
 Click near a text label (useful for form fields).
 **Parameters**:
 - `app_name` (string, required): Application name
 - `text` (string, required): Label text to find
 - `direction` (string, optional): "right", "below", "left", "above" (default: "right")
 - `distance` (integer, optional): Pixels from text (default: 50)
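 One plausible reading of `direction` and `distance` is an offset from the edge of the text's bounding box, centered on the other axis. The sketch below encodes that interpretation. The `BBox` layout (top-left origin, pixel units) and `click_point` are assumptions made for illustration and may differ from what the tool does.
 ```rust
 /// Bounding box of the recognized text in screen pixels (hypothetical layout,
 /// origin at the top-left corner).
 #[derive(Clone, Copy)]
 struct BBox {
     x: i32,
     y: i32,
     width: i32,
     height: i32,
 }

 /// Compute a click position `distance` pixels away from the box edge.
 /// Returns `None` for an unknown direction.
 fn click_point(b: BBox, direction: &str, distance: i32) -> Option<(i32, i32)> {
     let cx = b.x + b.width / 2; // horizontal center of the label
     let cy = b.y + b.height / 2; // vertical center of the label
     match direction {
         "right" => Some((b.x + b.width + distance, cy)),
         "left" => Some((b.x - distance, cy)),
         "below" => Some((cx, b.y + b.height + distance)),
         "above" => Some((cx, b.y - distance)),
         _ => None,
     }
 }
 ```
 Under this reading, the defaults (`"right"`, 50) target a form field that sits just to the right of its label.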
 ---
 ## macOS Accessibility Tools
 Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).
 ### macax_list_apps
 List running applications.
 ### macax_get_frontmost_app
 Get the frontmost application.
 ### macax_activate_app
 Bring an application to front.
 **Parameters**:
 - `app_name` (string, required): Application name
 ### macax_get_ui_tree
 Get UI element hierarchy.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `max_depth` (integer, optional): Tree depth limit
 ### macax_find_elements
 Find UI elements by criteria.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `role` (string, optional): Element role (button, textField, etc.)
 - `title` (string, optional): Element title
 - `identifier` (string, optional): Accessibility identifier
 ### macax_click
 Click a UI element.
 **Parameters**:
 - `app_name` (string, required): Application name
 - `identifier`, `title`, or `role` (string): Element selector; supply at least one
 ### macax_set_value / macax_get_value
 Set or get element value.
 ### macax_press_key
 Simulate key press.
 **Parameters**:
 - `key` (string, required): Key to press
 - `modifiers` (array, optional): ["command", "shift", "option", "control"]
 ---
 ## Computer Control Tools
 Enabled with `computer_control.enabled = true` in config.
 ### mouse_click
 Click at coordinates.
 **Parameters**:
 - `x` (integer, required): X coordinate
 - `y` (integer, required): Y coordinate
 - `button` (string, optional): "left", "right", "middle"
 ### type_text
 Type text at cursor.
 **Parameters**:
 - `text` (string, required): Text to type
 ### find_element
 Find UI element by text, role, or attributes.
 ### list_windows
 List all open windows with IDs and titles.
 ---
 ## Tool Execution Notes
 ### Duplicate Detection
 G3 prevents accidental duplicate tool calls:
 - Only immediately sequential identical calls are blocked
 - Text between tool calls resets detection
 - Tools can be reused throughout a session
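 The three rules above can be modeled with a small state machine. This is a sketch of the described behavior only: `DuplicateDetector` is a hypothetical type, and comparing calls by tool name plus raw argument string is an assumption.
 ```rust
 /// Remembers the most recent tool call so an immediate repeat can be rejected.
 #[derive(Default)]
 struct DuplicateDetector {
     last_call: Option<(String, String)>, // (tool name, raw JSON args)
 }

 impl DuplicateDetector {
     /// Returns true if the call should run, false if it repeats the previous call.
     fn allow(&mut self, tool: &str, args: &str) -> bool {
         let call = (tool.to_string(), args.to_string());
         if self.last_call.as_ref() == Some(&call) {
             return false;
         }
         self.last_call = Some(call);
         true
     }

     /// Text between tool calls resets detection.
     fn on_text(&mut self) {
         self.last_call = None;
     }
 }
 ```
 Only the previous call is remembered, so the same tool and arguments can be used again later in the session once any other call or text has intervened.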
 ### Error Handling
 Tool errors are reported back to the agent, which can:
 - Retry with different parameters
 - Try an alternative approach
 - Report the issue to the user
 ### Working Directory
 Tools execute in:
 1. Directory specified by `--codebase-fast-start` if provided
 2. Current working directory otherwise
 ### File Paths
 - Tilde expansion (`~`) is supported
 - Relative paths are relative to working directory
 - Screenshots default to `~/tmp` or `$TMPDIR`
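 Tilde expansion as described above can be sketched with the standard library. `expand_tilde` is a hypothetical helper that takes the home directory as a parameter so it stays testable. Handling only a bare `~` and the `~/` prefix is a simplification, and G3's own expansion may behave differently.
 ```rust
 use std::path::{Path, PathBuf};

 /// Expand a leading `~` or `~/` using `home`; other paths are returned unchanged.
 fn expand_tilde(path: &str, home: &Path) -> PathBuf {
     if path == "~" {
         home.to_path_buf()
     } else if let Some(rest) = path.strip_prefix("~/") {
         home.join(rest)
     } else {
         // Includes `~user/...`, which this sketch does not handle.
         PathBuf::from(path)
     }
 }
 ```
 On Unix-like systems the home directory would typically come from the `HOME` environment variable, for example via `std::env::var_os("HOME")`.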