Compare commits

..

21 Commits

Author SHA1 Message Date
Michael Neale
af6d37a8e2 Add --interactive-requirements flag for AI-enhanced requirements mode
- Adds new --interactive-requirements CLI flag for autonomous mode
- Prompts user for brief requirements input
- Uses AI to enhance and structure requirements into proper markdown
- Shows enhanced requirements and allows user to approve/edit/cancel
- Saves to requirements.md and proceeds with autonomous mode if approved
- Includes test script for manual verification
2025-10-22 14:58:35 +11:00
Dhanji R. Prasanna
c1c6680e03 Merge pull request #7 from jochenx/jochen-add-openai-and-multi-providers
coach/player provider split + add OpenAI
2025-10-22 13:46:16 +11:00
Jochen
f2d8e744bb fix panic in CLI parser 2025-10-22 13:20:45 +11:00
Jochen
010a43d203 coach/player provider split + add OpenAI
Allows coach and player LLM providers to be separately specified.
Also adds OpenAI provider
2025-10-21 16:59:13 +11:00
Dhanji Prasanna
758e255af8 dont run safaridriver --enable each time 2025-10-21 16:00:58 +11:00
Dhanji Prasanna
393826ae02 webdriver tools 2025-10-21 14:34:41 +11:00
Dhanji Prasanna
3afad3d61f progressive context thinning 2025-10-20 15:29:44 +11:00
Dhanji Prasanna
2488cc54d5 docs: update README and DESIGN to reflect current project state
- Add g3-computer-control crate to architecture documentation
- Document all 13 tools including computer control and TODO management
- Add context thinning feature documentation (50-80% thresholds)
- Update tool ecosystem section with complete tool list
- Remove broken link to non-existent COMPUTER_CONTROL.md
- Update workspace count from 5 to 6 crates
- Add platform-specific implementation details for computer control
- Document OCR support via Tesseract
- Clarify setup instructions for computer control features
2025-10-20 15:03:22 +11:00
Dhanji Prasanna
2ad0c9a3fd todo list formatting 2025-10-20 14:27:53 +11:00
Dhanji Prasanna
2008a81193 fix to pass feedback to player (broken by todo system) 2025-10-20 14:12:08 +11:00
Dhanji Prasanna
776f5034b8 TODO tools 2025-10-20 10:50:53 +11:00
Dhanji Prasanna
92bece957b colorizing tool calls 2025-10-18 16:09:30 +11:00
Dhanji Prasanna
767299ff4e minor 2025-10-18 16:03:58 +11:00
Dhanji Prasanna
9d35449be8 ~ expansion for read_file and str_replace 2025-10-18 16:01:15 +11:00
Dhanji Prasanna
da652bf287 computer control tools 2025-10-18 14:16:50 +11:00
Dhanji Prasanna
a566171203 small turn completing bug 2025-10-18 13:25:23 +11:00
Dhanji Prasanna
347c9e1e00 colorize timing based on duration 2025-10-17 13:54:21 +11:00
Dhanji Prasanna
aa7eda0331 fix wall clock timing 2025-10-17 10:36:21 +11:00
Dhanji Prasanna
e42c76f3b9 Tune coach pickiness down 2025-10-17 10:28:08 +11:00
Dhanji Prasanna
dd211fab1c panic fix 2025-10-17 09:50:01 +11:00
Dhanji R. Prasanna
bcece38473 Merge pull request #5 from dhanji/micn/agent-tweaks
load AGENTS.md if there
2025-10-16 15:06:14 +11:00
38 changed files with 5268 additions and 432 deletions

Cargo.lock (generated): 1203 lines changed. File diff suppressed because it is too large.

View File

@@ -4,7 +4,8 @@ members = [
 "crates/g3-core",
 "crates/g3-providers",
 "crates/g3-config",
-"crates/g3-execution"
+"crates/g3-execution",
+"crates/g3-computer-control"
 ]
 resolver = "2"

View File

@@ -29,7 +29,8 @@ g3/
 │ ├── g3-core/ # Core agent engine, tools, and streaming logic
 │ ├── g3-providers/ # LLM provider abstractions and implementations
 │ ├── g3-config/ # Configuration management
-│ └── g3-execution/ # Code execution engine
+│ ├── g3-execution/ # Code execution engine
+│ └── g3-computer-control/ # Computer control and automation
 ├── logs/ # Session logs (auto-created)
 ├── README.md # Project documentation
 └── DESIGN.md # This design document
@@ -48,6 +49,7 @@ g3/
│ • Retro TUI │ │ • Tool system │ │ • Embedded │
│ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
│ mode │ │ • Task exec │ │ • OAuth flow │
│ │ │ • TODO mgmt │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
@@ -59,7 +61,18 @@ g3/
│ • Shell cmds │ │ • Env overrides │
│ • Streaming │ │ • Provider │
│ • Error hdlg │ │ settings │
└─────────────────┘ └─────────────────┘
└─────────────────┘ │ • Computer │
│ │ control cfg │
│ └─────────────────┘
│ │
┌─────────────────┐ │
│ g3-computer- │◄────────────┘
│ control │
│ • Mouse/kbd │
│ • Screenshots │
│ • OCR/Tesseract │
│ • Windows/UI │
└─────────────────┘
```
## Core Components
@@ -79,6 +92,7 @@ g3/
 - **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Session Management**: Automatic session logging with detailed conversation history and token usage
 - **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors
+- **TODO Management**: In-memory TODO list with read/write tools for task tracking
 **Available Tools:**
- `shell`: Execute shell commands with streaming output
@@ -86,7 +100,15 @@
 - `write_file`: Create or overwrite files with content
 - `str_replace`: Apply unified diffs to files with precise editing
 - `final_output`: Signal task completion with detailed summaries
-- **Project Management**: Workspace handling, requirements.md processing for autonomous mode
+- `todo_read`: Read the entire TODO list content
+- `todo_write`: Write or overwrite the entire TODO list
+- `mouse_click`: Click the mouse at specific coordinates
+- `type_text`: Type text at the current cursor position
+- `find_element`: Find UI elements by text, role, or attributes
+- `take_screenshot`: Capture screenshots of screen, region, or window
+- `extract_text`: Extract text from images or screen regions using OCR
+- `find_text_on_screen`: Find text visually on screen and return coordinates
+- `list_windows`: List all open windows with IDs and titles
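As a rough illustration of the 13-tool surface listed above, a name-to-handler dispatch could look like the sketch below. The grouping and the `dispatch` function are hypothetical; g3-core's actual routing is internal to the crate.

```rust
// Hypothetical dispatcher over the tool names listed in the diff above.
// Only the tool name strings come from the document; the grouping labels
// and this function's shape are illustrative.
fn dispatch(tool: &str) -> &'static str {
    match tool {
        "shell" | "read_file" | "write_file" | "str_replace" => "file/shell ops",
        "todo_read" | "todo_write" => "TODO management",
        "mouse_click" | "type_text" | "find_element" | "take_screenshot"
        | "extract_text" | "find_text_on_screen" | "list_windows" => "computer control",
        "final_output" => "completion",
        _ => "unknown tool",
    }
}

fn main() {
    // Each of the 13 tools falls into exactly one group.
    assert_eq!(dispatch("todo_write"), "TODO management");
    assert_eq!(dispatch("list_windows"), "computer control");
    assert_eq!(dispatch("shell"), "file/shell ops");
    println!("ok");
}
```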
### 2. g3-providers: LLM Provider Abstraction
@@ -172,6 +194,26 @@ g3/
 - **Validation**: Configuration validation with helpful error messages
 - **Flexible Paths**: Support for shell expansion (`~`, environment variables)
+### 6. g3-computer-control: Computer Control & Automation
+**Primary Responsibilities:**
+- Cross-platform computer control and automation
+- Mouse and keyboard input simulation
+- Window management and screenshot capture
+- OCR text extraction from images and screen regions
+**Platform Support:**
+- **macOS**: Core Graphics, Cocoa, screencapture integration
+- **Linux**: X11/Xtest for input, X11 for window management
+- **Windows**: Win32 APIs for input and window control
+**Key Features:**
+- **OCR Integration**: Tesseract-based text extraction from images
+- **Window Management**: List, identify, and capture specific application windows
+- **UI Automation**: Find elements, simulate clicks, type text
+- **Screenshot Capture**: Full screen, regions, or specific windows
+- **Accessibility**: Requires OS-level permissions for automation
## Advanced Features
### Context Window Management
@@ -180,6 +222,7 @@ G3 implements sophisticated context window management:
 - **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
 - **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
+- **Context Thinning**: Progressive thinning at 50%, 60%, 70%, 80% thresholds - replaces large tool results with file references
 - **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
 - **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
 - **Cumulative Tracking**: Monitors total token usage across entire sessions
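The threshold logic behind progressive thinning can be sketched as follows. The 50/60/70/80% figures are from the design doc above; the function name and counting scheme are illustrative, not g3's actual API.

```rust
/// Progressive context thinning (sketch): count how many of the
/// documented thresholds (50%, 60%, 70%, 80%) current usage has
/// crossed; each crossed threshold would trigger another round of
/// replacing large tool results with file references.
/// Illustrative only - the real g3-core logic is internal.
fn thinning_rounds(used_tokens: u64, capacity: u64) -> usize {
    let pct = used_tokens * 100 / capacity;
    [50u64, 60, 70, 80].iter().filter(|&&t| pct >= t).count()
}

fn main() {
    assert_eq!(thinning_rounds(40, 100), 0); // below all thresholds
    assert_eq!(thinning_rounds(65, 100), 2); // crossed 50% and 60%
    assert_eq!(thinning_rounds(85, 100), 4); // at 80%+: summarization also kicks in
    println!("ok");
}
```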
@@ -354,20 +397,23 @@ This design document reflects the current state of G3 as a mature, production-re
 ### Fully Implemented
 - **Core Agent Engine**: Complete with streaming, tool execution, and context management
 - **Provider System**: Anthropic, Databricks, and Embedded providers with OAuth support
-- **Tool System**: All 5 core tools (shell, read_file, write_file, str_replace, final_output)
+- **Tool System**: 13 tools including file ops, shell, TODO management, and computer control
 - **CLI Interface**: Interactive mode, single-shot mode, retro TUI
 - **Autonomous Mode**: Coach-player feedback loop with requirements.md processing
 - **Configuration**: TOML-based config with environment overrides
 - **Error Handling**: Comprehensive retry logic and error classification
 - **Session Logging**: Automatic session tracking and JSON logs
-- **Context Management**: Auto-summarization at 80% capacity
+- **Context Management**: Context thinning (50-80%) and auto-summarization at 80% capacity
+- **Computer Control**: Cross-platform automation with OCR support
+- **TODO Management**: In-memory TODO list with read/write tools
 ### Architecture Highlights
-- **Workspace**: 5 crates with clear separation of concerns
+- **Workspace**: 6 crates with clear separation of concerns
 - **Dependencies**: Modern Rust ecosystem (Tokio, Clap, Serde, etc.)
 - **Streaming**: Real-time response processing with tool call detection
 - **Cross-Platform**: Works on macOS, Linux, and Windows
-- **GPU Support**: Metal acceleration for local models on macOS
+- **GPU Support**: Metal acceleration for local models on macOS, CUDA on Linux
+- **OCR Support**: Tesseract integration for text extraction from images
 ### Key Files
 - `src/main.rs`: main entry point delegating to g3-cli
@@ -376,3 +422,5 @@ This design document reflects the current state of G3 as a mature, production-re
 - `crates/g3-providers/src/lib.rs`: provider trait and registry
 - `crates/g3-config/src/lib.rs`: configuration management
 - `crates/g3-execution/src/lib.rs`: code execution engine
+- `crates/g3-computer-control/src/lib.rs`: computer control and automation
+- `crates/g3-computer-control/src/platform/`: platform-specific implementations

View File

@@ -11,8 +11,8 @@ G3 follows a modular architecture organized as a Rust workspace with multiple cr
 #### **g3-core**
 The heart of the agent system, containing:
 - **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
-- **Context Window Management**: Intelligent tracking of token usage with auto-summarization capabilities when approaching context limits (~80% capacity)
-- **Tool System**: Built-in tools for file operations (read, write, edit), shell command execution, and structured output generation
+- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity
+- **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
 - **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Task Execution**: Support for single and iterative task execution with automatic retry logic
@@ -40,6 +40,13 @@ Task execution framework:
 - Error handling and retry mechanisms
 - Progress tracking and reporting
+#### **g3-computer-control**
+Computer control capabilities:
+- Mouse and keyboard automation
+- UI element inspection and interaction
+- Screenshot capture and window management
+- OCR text extraction via Tesseract
 #### **g3-cli**
 Command-line interface:
 - Interactive terminal interface
@@ -61,13 +68,21 @@ G3 includes robust error handling with automatic retry logic:
 ### Intelligent Context Management
 - Automatic context window monitoring with percentage-based tracking
 - Smart auto-summarization when approaching token limits
+- **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
 - Conversation history preservation through summaries
-- Dynamic token allocation for different providers
+- Dynamic token allocation for different providers (4k to 200k+ tokens)
 ### Tool Ecosystem
 - **File Operations**: Read, write, and edit files with line-range precision
 - **Shell Integration**: Execute system commands with output capture
 - **Code Generation**: Structured code generation with syntax awareness
+- **TODO Management**: Read and write TODO lists with markdown checkbox format
+- **Computer Control** (Experimental): Automate desktop applications
+  - Mouse and keyboard control
+  - UI element inspection
+  - Screenshot capture and window management
+  - OCR text extraction from images and screen regions
+  - Window listing and identification
 - **Final Output**: Formatted result presentation
### Provider Flexibility
@@ -102,6 +117,7 @@ G3 is designed for:
 - API integration and testing
 - Documentation generation
 - Complex multi-step workflows
+- Desktop application automation and testing
## Getting Started
@@ -116,6 +132,41 @@ cargo run
 g3 "implement a function to calculate fibonacci numbers"
 ```
+## WebDriver Browser Automation
+G3 includes WebDriver support for browser automation tasks using Safari.
+**One-Time Setup** (macOS only):
+Safari Remote Automation must be enabled before using WebDriver tools. Run this once:
+```bash
+# Option 1: Use the provided script
+./scripts/enable-safari-automation.sh
+# Option 2: Enable manually
+safaridriver --enable # Requires password
+# Option 3: Enable via Safari UI
+# Safari → Preferences → Advanced → Show Develop menu
+# Then: Develop → Allow Remote Automation
+```
+**For detailed setup instructions and troubleshooting**, see [WebDriver Setup Guide](docs/webdriver-setup.md).
+**Usage**: Run G3 with the `--webdriver` flag to enable browser automation tools.
+## Computer Control (Experimental)
+G3 can interact with your computer's GUI for automation tasks:
+**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`
+**Setup**: Enable in config with `computer_control.enabled = true` and grant OS accessibility permissions:
+- **macOS**: System Preferences → Security & Privacy → Accessibility
+- **Linux**: Ensure X11 or Wayland access
+- **Windows**: Run as administrator (first time only)
 ## Session Logs
 G3 automatically saves session logs for each interaction in the `logs/` directory. These logs contain:

View File

@@ -0,0 +1,24 @@
+[providers]
+default_provider = "databricks"
+# Specify different providers for coach and player in autonomous mode
+coach = "databricks" # Provider for coach (code reviewer) - can be more powerful/expensive
+player = "anthropic" # Provider for player (code implementer) - can be faster/cheaper
+[providers.databricks]
+host = "https://your-workspace.cloud.databricks.com"
+# token = "your-databricks-token" # Optional - will use OAuth if not provided
+model = "databricks-claude-sonnet-4"
+max_tokens = 4096
+temperature = 0.1
+use_oauth = true
+[providers.anthropic]
+api_key = "your-anthropic-api-key"
+model = "claude-3-haiku-20240307" # Using a faster model for player
+max_tokens = 4096
+temperature = 0.3 # Slightly higher temperature for more creative implementations
+[agent]
+max_context_length = 8192
+enable_streaming = true
+timeout_seconds = 60

View File

@@ -1,5 +1,10 @@
 [providers]
 default_provider = "databricks"
+# Optional: Specify different providers for coach and player in autonomous mode
+# If not specified, will use default_provider for both
+# coach = "databricks" # Provider for coach (code reviewer)
+# player = "anthropic" # Provider for player (code implementer)
+# Note: Make sure the specified providers are configured below
 [providers.databricks]
 host = "https://your-workspace.cloud.databricks.com"
@@ -13,3 +18,8 @@ use_oauth = true
 max_context_length = 8192
 enable_streaming = true
 timeout_seconds = 60
+[computer_control]
+enabled = false # Set to true to enable computer control (requires OS permissions)
+require_confirmation = true
+max_actions_per_second = 5
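One plausible way to honor the `max_actions_per_second = 5` setting above is a minimum-interval limiter between simulated input actions. This is a sketch under that assumption; the `ActionLimiter` type is hypothetical and g3's actual enforcement mechanism may differ.

```rust
use std::time::{Duration, Instant};

/// Minimal rate limiter: enforce at least 1/N seconds between actions,
/// matching the spirit of `max_actions_per_second = N` in the config.
/// Hypothetical helper - not g3-computer-control's real API.
struct ActionLimiter {
    min_interval: Duration,
    last: Option<Instant>,
}

impl ActionLimiter {
    fn new(max_per_second: u32) -> Self {
        Self {
            // Panics if max_per_second is 0; a real config would validate that.
            min_interval: Duration::from_secs(1) / max_per_second,
            last: None,
        }
    }

    /// Sleep just long enough to respect the configured rate.
    fn wait(&mut self) {
        if let Some(prev) = self.last {
            let elapsed = prev.elapsed();
            if elapsed < self.min_interval {
                std::thread::sleep(self.min_interval - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}

fn main() {
    let mut limiter = ActionLimiter::new(5); // 5 actions/sec => 200ms spacing
    let start = Instant::now();
    for _ in 0..3 {
        limiter.wait(); // each call would precede a mouse/keyboard action
    }
    // 3 actions at 5/sec require at least ~400ms after the first action
    assert!(start.elapsed() >= Duration::from_millis(400));
    println!("ok");
}
```

Pairing a limiter like this with `require_confirmation = true` keeps automated input both throttled and gated behind user approval.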

View File

@@ -1,7 +1,5 @@
use anyhow::Result;
use std::time::{Duration, Instant};
/// Extract coach feedback by reading from the coach agent's specific log file
/// Uses the coach agent's session ID to find the exact log file
#[derive(Debug, Clone)]
struct TurnMetrics {
@@ -21,7 +19,7 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
// Find max values for scaling
let max_tokens = turn_metrics.iter().map(|t| t.tokens_used).max().unwrap_or(1);
let max_time_ms = turn_metrics.iter()
.map(|t| t.wall_clock_time.as_millis() as u32)
.map(|t| t.wall_clock_time.as_millis().min(u32::MAX as u128) as u32)
.max()
.unwrap_or(1);
@@ -35,7 +33,7 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
histogram.push_str(&format!(" {} = Wall Clock Time (max: {:.1}s)\n\n", TIME_CHAR, max_time_ms as f64 / 1000.0));
for metrics in turn_metrics {
let turn_time_ms = metrics.wall_clock_time.as_millis() as u32;
let turn_time_ms = metrics.wall_clock_time.as_millis().min(u32::MAX as u128) as u32;
// Calculate bar lengths (proportional to max values)
let token_bar_len = if max_tokens > 0 {
@@ -99,12 +97,19 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
histogram
}
fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_agent: &g3_core::Agent<ConsoleUiWriter>, output: &SimpleOutput) -> Result<String> {
/// Extract coach feedback by reading from the coach agent's specific log file
/// Uses the coach agent's session ID to find the exact log file
fn extract_coach_feedback_from_logs(
coach_result: &g3_core::TaskResult,
coach_agent: &g3_core::Agent<ConsoleUiWriter>,
output: &SimpleOutput,
) -> Result<String> {
// CORRECT APPROACH: Get the session ID from the current coach agent
// and read its specific log file directly
// Get the coach agent's session ID
let session_id = coach_agent.get_session_id()
let session_id = coach_agent
.get_session_id()
.ok_or_else(|| anyhow::anyhow!("Coach agent has no session ID"))?;
// Construct the log file path for this specific coach session
@@ -122,7 +127,10 @@ fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_a
if let Some(last_message) = messages.last() {
if let Some(content) = last_message.get("content") {
if let Some(content_str) = content.as_str() {
output.print(&format!("✅ Extracted coach feedback from session: {}", session_id));
output.print(&format!(
"✅ Extracted coach feedback from session: {}",
session_id
));
return Ok(content_str.to_string());
}
}
@@ -134,7 +142,18 @@ fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_a
}
}
Err(anyhow::anyhow!("Could not extract feedback from coach session: {}", session_id))
// If we couldn't extract from logs, panic with detailed error
panic!(
"CRITICAL: Could not extract coach feedback from session: {}\n\
Log file path: {:?}\n\
Log file exists: {}\n\
This indicates the coach did not call any tool or the log is corrupted.\n\
Coach result response length: {} chars",
session_id,
log_file_path,
log_file_path.exists(),
coach_result.response.len()
);
}
use clap::Parser;
@@ -197,6 +216,10 @@ pub struct Cli {
#[arg(long, value_name = "TEXT")]
pub requirements: Option<String>,
/// Interactive mode: prompt for requirements and save to requirements.md before starting autonomous mode
#[arg(long)]
pub interactive_requirements: bool,
/// Use retro terminal UI (inspired by 80s sci-fi)
#[arg(long)]
pub retro: bool,
@@ -284,6 +307,112 @@ pub async fn run() -> Result<()> {
// Create project model
let project = if cli.autonomous {
// Handle interactive requirements mode with AI enhancement
if cli.interactive_requirements {
println!("\n📝 Interactive Requirements Mode");
println!("================================\n");
println!("Describe what you want to build (can be brief):");
println!("Press Ctrl+D (Unix) or Ctrl+Z (Windows) when done.\n");
use std::io::{self, Read, Write};
let mut requirements_input = String::new();
io::stdin().read_to_string(&mut requirements_input)?;
if requirements_input.trim().is_empty() {
anyhow::bail!("No requirements provided. Exiting.");
}
println!("\n🤖 Enhancing your requirements with AI...\n");
// Create a temporary agent to enhance the requirements
let temp_config = Config::load_with_overrides(
cli.config.as_deref(),
cli.provider.clone(),
cli.model.clone(),
)?;
let ui_writer = ConsoleUiWriter::new();
let mut temp_agent = Agent::new_with_readme_and_quiet(
temp_config,
ui_writer,
None,
true, // quiet mode
).await?;
// Craft the enhancement prompt
let enhancement_prompt = format!(
r#"You are a requirements analyst. Take this brief user input and expand it into a structured requirements document.
USER INPUT:
{}
Create a professional requirements document with:
1. A clear project title (# heading)
2. An overview section explaining what will be built
3. Organized requirements (functional, technical, quality)
4. Acceptance criteria
5. Any technical constraints or preferences mentioned
Format as proper markdown. Be specific and actionable. If the user's input is vague, make reasonable assumptions but keep it focused on what they described.
Output ONLY the markdown content, no explanations or meta-commentary."#,
requirements_input.trim()
);
// Execute enhancement task
let result = temp_agent
.execute_task_with_timing(&enhancement_prompt, None, false, false, false, false)
.await?;
let enhanced_requirements = result.response.trim().to_string();
// Show the enhanced requirements
println!("\n📋 Enhanced Requirements Document:");
println!("{}\n", "=".repeat(60));
println!("{}", enhanced_requirements);
println!("{}\n", "=".repeat(60));
// Ask for confirmation
println!("\n❓ Is this requirements document acceptable?");
println!(" [y] Yes, proceed with autonomous mode");
println!(" [e] Edit and save manually");
println!(" [n] No, cancel\n");
print!("Your choice (y/e/n): ");
io::stdout().flush()?;
let mut choice = String::new();
io::stdin().read_line(&mut choice)?;
let choice = choice.trim().to_lowercase();
let requirements_path = workspace_dir.join("requirements.md");
match choice.as_str() {
"y" | "yes" => {
// Save enhanced requirements
std::fs::write(&requirements_path, &enhanced_requirements)?;
println!("\n✅ Requirements saved to: {}", requirements_path.display());
println!("🚀 Starting autonomous mode...\n");
}
"e" | "edit" => {
// Save enhanced requirements for manual editing
std::fs::write(&requirements_path, &enhanced_requirements)?;
println!("\n✅ Requirements saved to: {}", requirements_path.display());
println!("📝 Please edit the file and run: g3 --autonomous");
println!(" Exiting for now.\n");
return Ok(());
}
"n" | "no" => {
println!("\n❌ Cancelled. No files were saved.\n");
return Ok(());
}
_ => {
println!("\n❌ Invalid choice. Cancelled.\n");
return Ok(());
}
}
}
if let Some(requirements_text) = cli.requirements {
// Use requirements text override
Project::new_autonomous_with_requirements(workspace_dir.clone(), requirements_text)?
@@ -316,7 +445,8 @@ pub async fn run() -> Result<()> {
if !valid_providers.contains(&provider.as_str()) {
return Err(anyhow::anyhow!(
"Invalid provider '{}'. Valid options: {:?}",
provider, valid_providers
provider,
valid_providers
));
}
}
@@ -335,9 +465,21 @@ pub async fn run() -> Result<()> {
};
let mut agent = if cli.autonomous {
Agent::new_autonomous_with_readme_and_quiet(config.clone(), ui_writer, combined_content.clone(), cli.quiet).await?
Agent::new_autonomous_with_readme_and_quiet(
config.clone(),
ui_writer,
combined_content.clone(),
cli.quiet,
)
.await?
} else {
Agent::new_with_readme_and_quiet(config.clone(), ui_writer, combined_content.clone(), cli.quiet).await?
Agent::new_with_readme_and_quiet(
config.clone(),
ui_writer,
combined_content.clone(),
cli.quiet,
)
.await?
};
// Execute task, autonomous mode, or start interactive mode
@@ -1119,7 +1261,10 @@ async fn run_autonomous(
output.print("❌ Error: requirements.md not found in workspace directory");
output.print(" Please either:");
output.print(" 1. Create a requirements.md file with your project requirements at:");
output.print(&format!(" {}/requirements.md", project.workspace().display()));
output.print(&format!(
" {}/requirements.md",
project.workspace().display()
));
output.print(" 2. Or use the --requirements flag to provide requirements text directly:");
output.print(" g3 --autonomous --requirements \"Your requirements here\"");
output.print("");
@@ -1254,11 +1399,17 @@ async fn run_autonomous(
// If there's no coach feedback on subsequent turns, this is an error
if coach_feedback.is_empty() {
if turn > 1 {
return Err(anyhow::anyhow!("Player mode error: No coach feedback received on turn {}", turn));
return Err(anyhow::anyhow!(
"Player mode error: No coach feedback received on turn {}",
turn
));
}
output.print("📋 Player starting initial implementation (no prior coach feedback)");
} else {
output.print(&format!("📋 Player received coach feedback ({} chars):", coach_feedback.len()));
output.print(&format!(
"📋 Player received coach feedback ({} chars):",
coach_feedback.len()
));
output.print(&format!("{}", coach_feedback));
}
output.print(""); // Empty line for readability
@@ -1356,7 +1507,7 @@ async fn run_autonomous(
));
// Record turn metrics before incrementing
let turn_duration = turn_start_time.elapsed();
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
turn_metrics.push(TurnMetrics {
turn_number: turn,
tokens_used: turn_tokens,
@@ -1382,9 +1533,15 @@ async fn run_autonomous(
// Create a new agent instance for coach mode to ensure fresh context
// Use the same config with overrides that was passed to the player agent
let config = agent.get_config().clone();
let base_config = agent.get_config().clone();
let coach_config = base_config.for_coach()?;
// Reset filter suppression state before creating coach agent
g3_core::fixed_filter_json::reset_fixed_json_tool_state();
let ui_writer = ConsoleUiWriter::new();
let mut coach_agent = Agent::new_autonomous_with_readme_and_quiet(config, ui_writer, None, quiet).await?;
let mut coach_agent =
Agent::new_autonomous_with_readme_and_quiet(coach_config, ui_writer, None, quiet).await?;
// Ensure coach agent is also in the workspace directory
project.enter_workspace()?;
@@ -1414,13 +1571,13 @@ CRITICAL INSTRUCTIONS:
3. Focus ONLY on what needs to be fixed or improved
4. Do NOT include your analysis process, file contents, or compilation output in the summary
If the implementation correctly meets all requirements and compiles without errors:
If the implementation generally meets all requirements and compiles without errors:
- Call final_output with summary: 'IMPLEMENTATION_APPROVED'
If improvements are needed:
- Call final_output with a brief summary listing ONLY the specific issues to fix
Remember: Be thorough in your review but concise in your feedback. APPROVE if the implementation works and generally fits the requirements.",
Remember: Be clear in your review and concise in your feedback. APPROVE if the implementation works and generally fits the requirements. Don't be picky.",
requirements
);
@@ -1511,7 +1668,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
// Record turn metrics before incrementing
let turn_duration = turn_start_time.elapsed();
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
turn_metrics.push(TurnMetrics {
turn_number: turn,
tokens_used: turn_tokens,
@@ -1531,7 +1688,8 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
let coach_result = coach_result_opt.unwrap();
// Extract the complete coach feedback from final_output
let coach_feedback_text = extract_coach_feedback_from_logs(&coach_result, &coach_agent, &output)?;
let coach_feedback_text =
extract_coach_feedback_from_logs(&coach_result, &coach_agent, &output)?;
// Log the size of the feedback for debugging
info!(
@@ -1546,7 +1704,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
// Record turn metrics before incrementing
let turn_duration = turn_start_time.elapsed();
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
turn_metrics.push(TurnMetrics {
turn_number: turn,
tokens_used: turn_tokens,
@@ -1577,7 +1735,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
coach_feedback = coach_feedback_text;
// Record turn metrics before incrementing
let turn_duration = turn_start_time.elapsed();
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
turn_metrics.push(TurnMetrics {
turn_number: turn,
tokens_used: turn_tokens,

View File

@@ -10,6 +10,7 @@ pub struct ConsoleUiWriter {
current_tool_args: Mutex<Vec<(String, String)>>,
current_output_line: Mutex<Option<String>>,
output_line_printed: Mutex<bool>,
in_todo_tool: Mutex<bool>,
}
impl ConsoleUiWriter {
@@ -19,6 +20,60 @@ impl ConsoleUiWriter {
current_tool_args: Mutex::new(Vec::new()),
current_output_line: Mutex::new(None),
output_line_printed: Mutex::new(false),
in_todo_tool: Mutex::new(false),
}
}
fn print_todo_line(&self, line: &str) {
// Transform and print todo list lines elegantly
let trimmed = line.trim();
// Skip the "📝 TODO list:" prefix line
if trimmed.starts_with("📝 TODO list:") || trimmed == "📝 TODO list is empty" {
return;
}
// Handle empty lines
if trimmed.is_empty() {
println!();
return;
}
// Detect indentation level
let indent_count = line.chars().take_while(|c| c.is_whitespace()).count();
let indent = " ".repeat(indent_count / 2); // Convert spaces to visual indent
// Format based on line type
if trimmed.starts_with("- [ ]") {
// Incomplete task
let task = trimmed.strip_prefix("- [ ]").unwrap_or(trimmed).trim();
println!("{}{}", indent, task);
} else if trimmed.starts_with("- [x]") || trimmed.starts_with("- [X]") {
// Completed task
let task = trimmed.strip_prefix("- [x]")
.or_else(|| trimmed.strip_prefix("- [X]"))
.unwrap_or(trimmed)
.trim();
println!("{}\x1b[2m☑ {}\x1b[0m", indent, task);
} else if trimmed.starts_with("- ") {
// Regular bullet point
let item = trimmed.strip_prefix("- ").unwrap_or(trimmed).trim();
println!("{}{}", indent, item);
} else if trimmed.starts_with("# ") {
// Heading
let heading = trimmed.strip_prefix("# ").unwrap_or(trimmed).trim();
println!("\n\x1b[1m{}\x1b[0m", heading);
} else if trimmed.starts_with("## ") {
// Subheading
let subheading = trimmed.strip_prefix("## ").unwrap_or(trimmed).trim();
println!("\n\x1b[1m{}\x1b[0m", subheading);
} else if trimmed.starts_with("**") && trimmed.ends_with("**") {
// Bold text (section marker)
let text = trimmed.trim_start_matches("**").trim_end_matches("**");
println!("{}\x1b[1m{}\x1b[0m", indent, text);
} else {
// Regular text or note
println!("{}{}", indent, trimmed);
}
}
}
@@ -53,6 +108,15 @@ impl UiWriter for ConsoleUiWriter {
// Store the tool name and clear args for collection
*self.current_tool_name.lock().unwrap() = Some(tool_name.to_string());
self.current_tool_args.lock().unwrap().clear();
// Check if this is a todo tool call
let is_todo = tool_name == "todo_read" || tool_name == "todo_write";
*self.in_todo_tool.lock().unwrap() = is_todo;
// For todo tools, we'll skip the normal header and print a custom one later
if is_todo {
return;
}
}
fn print_tool_arg(&self, key: &str, value: &str) {
@@ -75,6 +139,12 @@ impl UiWriter for ConsoleUiWriter {
}
fn print_tool_output_header(&self) {
// Skip normal header for todo tools
if *self.in_todo_tool.lock().unwrap() {
println!(); // Just add a newline
return;
}
println!();
// Now print the tool header with the most important arg in bold green
if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
@@ -115,8 +185,8 @@ impl UiWriter for ConsoleUiWriter {
String::new()
};
// Print with bold green formatting using ANSI escape codes
println!("┌─\x1b[1;32m {} | {}{}\x1b[0m", tool_name, display_value, header_suffix);
// Print with bold green tool name, purple (non-bold) for pipe and args
println!("┌─\x1b[1;32m {}\x1b[0m\x1b[35m | {}{}\x1b[0m", tool_name, display_value, header_suffix);
} else {
// Print with bold green formatting using ANSI escape codes
println!("┌─\x1b[1;32m {}\x1b[0m", tool_name);
@@ -144,10 +214,21 @@ impl UiWriter for ConsoleUiWriter {
}
fn print_tool_output_line(&self, line: &str) {
// Special handling for todo tools
if *self.in_todo_tool.lock().unwrap() {
self.print_todo_line(line);
return;
}
println!("\x1b[2m{}\x1b[0m", line);
}
fn print_tool_output_summary(&self, count: usize) {
// Skip for todo tools
if *self.in_todo_tool.lock().unwrap() {
return;
}
println!(
"\x1b[2m({} line{})\x1b[0m",
count,
@@ -156,7 +237,55 @@ impl UiWriter for ConsoleUiWriter {
}
fn print_tool_timing(&self, duration_str: &str) {
println!("└─ ⚡️ {}", duration_str);
// For todo tools, just print a simple completion message
if *self.in_todo_tool.lock().unwrap() {
println!();
*self.in_todo_tool.lock().unwrap() = false;
return;
}
// Parse the duration string to determine color
// Format is like "1.5s", "500ms", "2m 30.0s"
let color_code = if duration_str.ends_with("ms") {
// Milliseconds - use default color (< 1s)
""
} else if duration_str.contains('m') {
// Contains minutes
// Extract minutes value
if let Some(m_pos) = duration_str.find('m') {
if let Ok(minutes) = duration_str[..m_pos].trim().parse::<u32>() {
if minutes >= 5 {
"\x1b[31m" // Red for >= 5 minutes
} else {
"\x1b[38;5;208m" // Orange for >= 1 minute but < 5 minutes
}
} else {
"" // Default color if parsing fails
}
} else {
"" // Default color if 'm' not found (shouldn't happen)
}
} else if duration_str.ends_with('s') {
// Seconds only
if let Some(s_value) = duration_str.strip_suffix('s') {
if let Ok(seconds) = s_value.trim().parse::<f64>() {
if seconds >= 1.0 {
"\x1b[33m" // Yellow for >= 1 second
} else {
"" // Default color for < 1 second
}
} else {
"" // Default color if parsing fails
}
} else {
"" // Default color
}
} else {
// Unrecognized format - use default color
""
};
println!("└─ ⚡️ {}{}\x1b[0m", color_code, duration_str);
println!();
// Clear the stored tool info
*self.current_tool_name.lock().unwrap() = None;
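The color thresholds in print_tool_timing can be extracted into a pure function, which makes them easy to unit-test. A sketch under the same thresholds (the name `duration_color` is hypothetical; the real code inlines this logic):

```rust
/// Pick an ANSI color for a duration string like "500ms", "1.5s", or
/// "2m 30.0s": default under 1s, yellow >= 1s, orange >= 1m, red >= 5m.
/// Hypothetical helper mirroring the inline logic in print_tool_timing.
fn duration_color(duration_str: &str) -> &'static str {
    if duration_str.ends_with("ms") {
        // Milliseconds - default color (< 1s)
        ""
    } else if duration_str.contains('m') {
        // Contains minutes - parse the value before 'm'
        duration_str
            .find('m')
            .and_then(|m_pos| duration_str[..m_pos].trim().parse::<u32>().ok())
            .map(|minutes| if minutes >= 5 { "\x1b[31m" } else { "\x1b[38;5;208m" })
            .unwrap_or("")
    } else if let Some(s_value) = duration_str.strip_suffix('s') {
        // Seconds only
        match s_value.trim().parse::<f64>() {
            Ok(seconds) if seconds >= 1.0 => "\x1b[33m", // Yellow for >= 1s
            _ => "",
        }
    } else {
        // Unrecognized format - default color
        ""
    }
}

fn main() {
    assert_eq!(duration_color("500ms"), "");
    assert_eq!(duration_color("0.4s"), "");
    assert_eq!(duration_color("1.5s"), "\x1b[33m");
    assert_eq!(duration_color("2m 30.0s"), "\x1b[38;5;208m");
    assert_eq!(duration_color("6m 1.0s"), "\x1b[31m");
    println!("ok");
}
```

The `ends_with("ms")` check must come first, since a millisecond string would otherwise fall into the minutes branch via its `m`.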


@@ -0,0 +1,46 @@
[package]
name = "g3-computer-control"
version = "0.1.0"
edition = "2021"
[dependencies]
# Workspace dependencies
tokio = { workspace = true }
anyhow = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
shellexpand = "3.1"
# Async trait support
async-trait = "0.1"
# WebDriver support
fantoccini = "0.21"
# OCR dependencies
tesseract = "0.14"
# macOS dependencies
[target.'cfg(target_os = "macos")'.dependencies]
core-graphics = "0.23"
core-foundation = "0.9"
cocoa = "0.25"
objc = "0.2"
image = "0.24"
# Linux dependencies
[target.'cfg(target_os = "linux")'.dependencies]
x11 = { version = "2.21", features = ["xlib", "xtest"] }
image = "0.24"
# Windows dependencies
[target.'cfg(target_os = "windows")'.dependencies]
windows = { version = "0.52", features = [
"Win32_Foundation",
"Win32_UI_WindowsAndMessaging",
"Win32_UI_Input_KeyboardAndMouse",
"Win32_Graphics_Gdi",
] }


@@ -0,0 +1,46 @@
use core_graphics::display::CGDisplay;
fn main() {
let display = CGDisplay::main();
let image = display.image().expect("Failed to capture screen");
println!("CGImage properties:");
println!(" Width: {}", image.width());
println!(" Height: {}", image.height());
println!(" Bits per component: {}", image.bits_per_component());
println!(" Bits per pixel: {}", image.bits_per_pixel());
println!(" Bytes per row: {}", image.bytes_per_row());
let data = image.data();
let expected_size = image.width() * image.height() * 4;
println!(" Data length: {}", data.len());
println!(" Expected (w*h*4): {}", expected_size);
// Check if there's padding in rows
let bytes_per_row = image.bytes_per_row();
let width = image.width();
let expected_bytes_per_row = width * 4;
println!("\nRow alignment:");
println!(" Actual bytes per row: {}", bytes_per_row);
println!(" Expected (width * 4): {}", expected_bytes_per_row);
println!(" Padding per row: {}", bytes_per_row - expected_bytes_per_row);
// Sample some pixels from different locations
println!("\nFirst 3 pixels (raw bytes):");
for i in 0..3 {
let offset = i * 4;
println!(" Pixel {}: [{:3}, {:3}, {:3}, {:3}]",
i, data[offset], data[offset+1], data[offset+2], data[offset+3]);
}
// Check a pixel from the middle
let mid_row = image.height() / 2;
let mid_col = image.width() / 2;
let mid_offset = (mid_row * bytes_per_row + mid_col * 4) as usize;
println!("\nMiddle pixel (row {}, col {}):", mid_row, mid_col);
println!(" Offset: {}", mid_offset);
if mid_offset + 3 < data.len() as usize {
println!(" Bytes: [{:3}, {:3}, {:3}, {:3}]",
data[mid_offset], data[mid_offset+1], data[mid_offset+2], data[mid_offset+3]);
}
}


@@ -0,0 +1,56 @@
use core_graphics::window::{kCGWindowListOptionOnScreenOnly, kCGNullWindowID, CGWindowListCopyWindowInfo};
use core_foundation::dictionary::CFDictionary;
use core_foundation::string::CFString;
use core_foundation::base::TCFType;
fn main() {
println!("Listing all on-screen windows...");
println!("{:<10} {:<25} {}", "Window ID", "Owner", "Title");
println!("{}", "-".repeat(80));
unsafe {
let window_list = CGWindowListCopyWindowInfo(
kCGWindowListOptionOnScreenOnly,
kCGNullWindowID
);
// Wrap the array exactly once: calling wrap_under_create_rule twice on the
// same pointer takes ownership twice and double-releases it on drop.
let array = core_foundation::array::CFArray::<CFDictionary>::wrap_under_create_rule(window_list);
let count = array.len();
for i in 0..count {
let dict = array.get(i).unwrap();
// Get window ID
let window_id_key = CFString::from_static_string("kCGWindowNumber");
let window_id: i64 = if let Some(value) = dict.find(window_id_key.as_concrete_TypeRef()) {
let num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*value as *const _);
num.to_i64().unwrap_or(0)
} else {
0
};
// Get owner name
let owner_key = CFString::from_static_string("kCGWindowOwnerName");
let owner: String = if let Some(value) = dict.find(owner_key.as_concrete_TypeRef()) {
let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
s.to_string()
} else {
"Unknown".to_string()
};
// Get window name/title
let name_key = CFString::from_static_string("kCGWindowName");
let title: String = if let Some(value) = dict.find(name_key.as_concrete_TypeRef()) {
let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
s.to_string()
} else {
"".to_string()
};
// Filter for iTerm or show all
if owner.contains("iTerm") || owner.contains("Terminal") {
println!("{:<10} {:<25} {}", window_id, owner, title);
}
}
}
}


@@ -0,0 +1,64 @@
use g3_computer_control::SafariDriver;
use g3_computer_control::webdriver::WebDriverController;
use anyhow::Result;
#[tokio::main]
async fn main() -> Result<()> {
println!("Safari WebDriver Demo");
println!("=====================\n");
println!("Make sure to:");
println!("1. Enable 'Allow Remote Automation' in Safari's Develop menu");
println!("2. Run: /usr/bin/safaridriver --enable");
println!("3. Start safaridriver in another terminal: safaridriver --port 4444\n");
println!("Connecting to SafariDriver...");
let mut driver = SafariDriver::new().await?;
println!("✅ Connected!\n");
// Navigate to a website
println!("Navigating to example.com...");
driver.navigate("https://example.com").await?;
println!("✅ Navigated\n");
// Get page title
let title = driver.title().await?;
println!("Page title: {}\n", title);
// Get current URL
let url = driver.current_url().await?;
println!("Current URL: {}\n", url);
// Find an element
println!("Finding h1 element...");
let mut h1 = driver.find_element("h1").await?;
let h1_text = h1.text().await?;
println!("H1 text: {}\n", h1_text);
// Find all paragraphs
println!("Finding all paragraphs...");
let paragraphs = driver.find_elements("p").await?;
println!("Found {} paragraphs\n", paragraphs.len());
// Get page source
println!("Getting page source...");
let source = driver.page_source().await?;
println!("Page source length: {} bytes\n", source.len());
// Execute JavaScript
println!("Executing JavaScript...");
let result = driver.execute_script("return document.title", vec![]).await?;
println!("JS result: {:?}\n", result);
// Take a screenshot
println!("Taking screenshot...");
driver.screenshot("/tmp/safari_demo.png").await?;
println!("✅ Screenshot saved to /tmp/safari_demo.png\n");
// Close the browser
println!("Closing browser...");
driver.quit().await?;
println!("✅ Done!");
Ok(())
}


@@ -0,0 +1,21 @@
use g3_computer_control::{create_controller, ComputerController};
#[tokio::main]
async fn main() {
println!("Testing screenshot with permission prompt...");
let controller = create_controller().expect("Failed to create controller");
match controller.take_screenshot("/tmp/test_with_prompt.png", None, None).await {
Ok(_) => {
println!("\n✅ Screenshot saved to /tmp/test_with_prompt.png");
println!("Opening screenshot...");
let _ = std::process::Command::new("open")
.arg("/tmp/test_with_prompt.png")
.spawn();
}
Err(e) => {
println!("❌ Screenshot failed: {}", e);
}
}
}


@@ -0,0 +1,39 @@
use std::process::Command;
fn main() {
let path = "/tmp/rust_screencapture_test.png";
println!("Testing screencapture command from Rust...");
let mut cmd = Command::new("screencapture");
cmd.arg("-x"); // No sound
cmd.arg(path);
println!("Command: {:?}", cmd);
match cmd.output() {
Ok(output) => {
println!("Exit status: {}", output.status);
println!("Stdout: {}", String::from_utf8_lossy(&output.stdout));
println!("Stderr: {}", String::from_utf8_lossy(&output.stderr));
if output.status.success() {
println!("\n✅ Screenshot saved to: {}", path);
// Check file exists and size
if let Ok(metadata) = std::fs::metadata(path) {
println!("File size: {} bytes ({:.1} MB)", metadata.len(), metadata.len() as f64 / 1_000_000.0);
}
// Open it
let _ = Command::new("open").arg(path).spawn();
println!("\nOpened screenshot - please verify it looks correct!");
} else {
println!("\n❌ Screenshot failed!");
}
}
Err(e) => {
println!("❌ Failed to execute screencapture: {}", e);
}
}
}


@@ -0,0 +1,69 @@
use core_graphics::display::CGDisplay;
use image::{ImageBuffer, RgbaImage};
use std::path::Path;
fn main() {
let display = CGDisplay::main();
let image = display.image().expect("Failed to capture screen");
let width = image.width() as u32;
let height = image.height() as u32;
let bytes_per_row = image.bytes_per_row() as usize;
let data = image.data();
println!("Testing screenshot fix...");
println!("Image: {}x{}, bytes_per_row: {}", width, height, bytes_per_row);
println!("Expected bytes per row: {}", width * 4);
println!("Padding per row: {} bytes", bytes_per_row - (width as usize * 4));
// OLD METHOD (broken) - treating data as continuous
println!("\n=== OLD METHOD (BROKEN) ===");
let mut old_rgba = Vec::with_capacity(data.len() as usize);
for chunk in data.chunks_exact(4) {
old_rgba.push(chunk[2]); // R
old_rgba.push(chunk[1]); // G
old_rgba.push(chunk[0]); // B
old_rgba.push(chunk[3]); // A
}
println!("Converted {} pixels", old_rgba.len() / 4);
println!("Expected {} pixels", width * height);
// NEW METHOD (fixed) - handling row padding
println!("\n=== NEW METHOD (FIXED) ===");
let mut new_rgba = Vec::with_capacity((width * height * 4) as usize);
for row in 0..height as usize {
let row_start = row * bytes_per_row;
let row_end = row_start + (width as usize * 4);
for chunk in data[row_start..row_end].chunks_exact(4) {
new_rgba.push(chunk[2]); // R
new_rgba.push(chunk[1]); // G
new_rgba.push(chunk[0]); // B
new_rgba.push(chunk[3]); // A
}
}
println!("Converted {} pixels", new_rgba.len() / 4);
println!("Expected {} pixels", width * height);
// Save a small crop from both methods
let crop_size = 200;
// Old method crop
let old_crop: Vec<u8> = old_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
if let Some(old_img) = ImageBuffer::from_raw(crop_size, crop_size, old_crop) {
let old_img: RgbaImage = old_img;
old_img.save("/tmp/screenshot_old_method.png").unwrap();
println!("\nSaved OLD method crop to: /tmp/screenshot_old_method.png");
}
// New method crop
let new_crop: Vec<u8> = new_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
if let Some(new_img) = ImageBuffer::from_raw(crop_size, crop_size, new_crop) {
let new_img: RgbaImage = new_img;
new_img.save("/tmp/screenshot_new_method.png").unwrap();
println!("Saved NEW method crop to: /tmp/screenshot_new_method.png");
}
println!("\nOpen both images to compare:");
println!(" open /tmp/screenshot_old_method.png /tmp/screenshot_new_method.png");
}
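The "new method" above generalizes to a standalone stride-aware conversion helper, shown here against synthetic data so the padding handling is verifiable without a display (the function name is hypothetical):

```rust
/// Convert a BGRA buffer whose rows may carry trailing padding
/// (bytes_per_row >= width * 4) into tightly packed RGBA, mirroring the
/// fixed row-by-row copy above. Sketch with hypothetical naming.
fn bgra_padded_to_rgba(data: &[u8], width: usize, height: usize, bytes_per_row: usize) -> Vec<u8> {
    assert!(bytes_per_row >= width * 4, "stride smaller than a pixel row");
    let mut rgba = Vec::with_capacity(width * height * 4);
    for row in 0..height {
        let row_start = row * bytes_per_row;
        // Only the first width*4 bytes of each row are pixel data;
        // the rest is alignment padding and must be skipped.
        for px in data[row_start..row_start + width * 4].chunks_exact(4) {
            rgba.extend_from_slice(&[px[2], px[1], px[0], px[3]]); // BGRA -> RGBA
        }
    }
    rgba
}

fn main() {
    // One pixel per row, stride 8: 4 real bytes + 4 padding bytes per row.
    let data = [1, 2, 3, 4, 0, 0, 0, 0, 5, 6, 7, 8, 0, 0, 0, 0];
    let rgba = bgra_padded_to_rgba(&data, 1, 2, 8);
    assert_eq!(rgba, vec![3, 2, 1, 4, 7, 6, 5, 8]);
    println!("ok");
}
```

Treating the buffer as continuous (the old method) would have pulled the padding bytes in as pixels, shearing every row after the first.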


@@ -0,0 +1,45 @@
use g3_computer_control::create_controller;
#[tokio::main]
async fn main() {
println!("Testing window-specific screenshot capture...");
let controller = create_controller().expect("Failed to create controller");
// Test 1: Capture iTerm2 window
println!("\n1. Capturing iTerm2 window...");
match controller.take_screenshot("/tmp/iterm_window.png", None, Some("iTerm2")).await {
Ok(_) => {
println!(" ✅ iTerm2 window captured to /tmp/iterm_window.png");
let _ = std::process::Command::new("open").arg("/tmp/iterm_window.png").spawn();
}
Err(e) => println!(" ❌ Failed: {}", e),
}
// Wait a moment for the image to open
tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
// Test 2: Full screen capture for comparison
println!("\n2. Capturing full screen for comparison...");
match controller.take_screenshot("/tmp/fullscreen.png", None, None).await {
Ok(_) => {
println!(" ✅ Full screen captured to /tmp/fullscreen.png");
let _ = std::process::Command::new("open").arg("/tmp/fullscreen.png").spawn();
}
Err(e) => println!(" ❌ Failed: {}", e),
}
println!("\n=== Comparison ===");
println!("iTerm window: /tmp/iterm_window.png (should show ONLY iTerm window)");
println!("Full screen: /tmp/fullscreen.png (should show entire desktop)");
// Show file sizes
if let Ok(meta1) = std::fs::metadata("/tmp/iterm_window.png") {
if let Ok(meta2) = std::fs::metadata("/tmp/fullscreen.png") {
println!("\nFile sizes:");
println!(" iTerm window: {:.1} MB", meta1.len() as f64 / 1_000_000.0);
println!(" Full screen: {:.1} MB", meta2.len() as f64 / 1_000_000.0);
println!("\nWindow capture should be smaller than full screen.");
}
}
}


@@ -0,0 +1,35 @@
pub mod types;
pub mod platform;
pub mod webdriver;
// Re-export webdriver types for convenience
pub use webdriver::{WebDriverController, WebElement, safari::SafariDriver};
use anyhow::Result;
use async_trait::async_trait;
use types::*;
#[async_trait]
pub trait ComputerController: Send + Sync {
// Screen capture
async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()>;
// OCR operations
async fn extract_text_from_screen(&self, region: Rect) -> Result<String>;
async fn extract_text_from_image(&self, path: &str) -> Result<String>;
}
// Platform-specific constructor
pub fn create_controller() -> Result<Box<dyn ComputerController>> {
#[cfg(target_os = "macos")]
return Ok(Box::new(platform::macos::MacOSController::new()?));
#[cfg(target_os = "linux")]
return Ok(Box::new(platform::linux::LinuxController::new()?));
#[cfg(target_os = "windows")]
return Ok(Box::new(platform::windows::WindowsController::new()?));
#[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
anyhow::bail!("Unsupported platform")
}
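The factory pattern used by create_controller can be sketched platform-neutrally: each `cfg` target contributes one concrete type behind a shared trait object. A minimal runnable sketch (the `Controller` trait and `Noop` struct here are hypothetical stand-ins for the real `ComputerController` implementations):

```rust
// Minimal sketch of the cfg-dispatch factory in lib.rs. `Noop` is a
// hypothetical stand-in for MacOSController/LinuxController/WindowsController.
trait Controller {
    fn platform(&self) -> &'static str;
}

struct Noop;

impl Controller for Noop {
    fn platform(&self) -> &'static str {
        if cfg!(target_os = "macos") {
            "macos"
        } else if cfg!(target_os = "linux") {
            "linux"
        } else if cfg!(target_os = "windows") {
            "windows"
        } else {
            "unsupported"
        }
    }
}

// Mirrors create_controller: succeed with a boxed trait object on known
// platforms, fail otherwise.
fn create_controller() -> Result<Box<dyn Controller>, String> {
    match Noop.platform() {
        "unsupported" => Err("Unsupported platform".into()),
        _ => Ok(Box::new(Noop)),
    }
}

fn main() {
    let c = create_controller().expect("controller");
    assert!(["macos", "linux", "windows"].contains(&c.platform()));
    println!("platform: {}", c.platform());
}
```

Returning `Box<dyn ComputerController>` keeps callers platform-agnostic; only the factory body needs `cfg` attributes.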


@@ -0,0 +1,161 @@
use crate::{ComputerController, types::*};
use anyhow::Result;
use async_trait::async_trait;
use tesseract::Tesseract;
use uuid::Uuid;
pub struct LinuxController {
// Placeholder for X11 connection or other state
}
impl LinuxController {
pub fn new() -> Result<Self> {
// Initialize X11 connection
tracing::warn!("Linux computer control not fully implemented");
Ok(Self {})
}
}
#[async_trait]
impl ComputerController for LinuxController {
async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
anyhow::bail!("Linux implementation not yet available")
}
async fn click(&self, _button: MouseButton) -> Result<()> {
anyhow::bail!("Linux implementation not yet available")
}
async fn double_click(&self, _button: MouseButton) -> Result<()> {
anyhow::bail!("Linux implementation not yet available")
}
async fn type_text(&self, _text: &str) -> Result<()> {
anyhow::bail!("Linux implementation not yet available")
}
async fn press_key(&self, _key: &str) -> Result<()> {
anyhow::bail!("Linux implementation not yet available")
}
async fn list_windows(&self) -> Result<Vec<Window>> {
anyhow::bail!("Linux implementation not yet available")
}
async fn focus_window(&self, _window_id: &str) -> Result<()> {
anyhow::bail!("Linux implementation not yet available")
}
async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
anyhow::bail!("Linux implementation not yet available")
}
async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
anyhow::bail!("Linux implementation not yet available")
}
async fn get_element_text(&self, _element_id: &str) -> Result<String> {
anyhow::bail!("Linux implementation not yet available")
}
async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
anyhow::bail!("Linux implementation not yet available")
}
async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, _window_id: Option<&str>) -> Result<()> {
anyhow::bail!("Linux implementation not yet available")
}
async fn extract_text_from_screen(&self, _region: Rect) -> Result<OCRResult> {
anyhow::bail!("Linux implementation not yet available")
}
async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
// Check if tesseract is available on the system
let tesseract_check = std::process::Command::new("which")
.arg("tesseract")
.output();
if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
To install tesseract:\n \
Ubuntu/Debian: sudo apt-get install tesseract-ocr\n \
RHEL/CentOS: sudo yum install tesseract\n \
Arch Linux: sudo pacman -S tesseract\n\n\
After installation, restart your terminal and try again.");
}
// Initialize Tesseract
let tess = Tesseract::new(None, Some("eng"))
.map_err(|e| {
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
This usually means:\n1. Tesseract is not properly installed\n\
2. Language data files are missing\n\nTo fix:\n \
Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n \
RHEL/CentOS: sudo yum install tesseract-langpack-eng\n \
Arch Linux: sudo pacman -S tesseract-data-eng", e)
})?;
let text = tess.set_image(_path)
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
.get_text()
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
// Get confidence (simplified - would need more complex API calls for per-word confidence)
let confidence = 0.85; // Placeholder
Ok(OCRResult {
text,
confidence,
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
})
}
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
// Check if tesseract is available on the system
let tesseract_check = std::process::Command::new("which")
.arg("tesseract")
.output();
if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
To install tesseract:\n \
Ubuntu/Debian: sudo apt-get install tesseract-ocr\n \
RHEL/CentOS: sudo yum install tesseract\n \
Arch Linux: sudo pacman -S tesseract\n\n\
After installation, restart your terminal and try again.");
}
// Take full screen screenshot
let temp_path = format!("/tmp/g3_ocr_search_{}.png", uuid::Uuid::new_v4());
self.take_screenshot(&temp_path, None, None).await?;
// Use Tesseract to find text with bounding boxes
let tess = Tesseract::new(None, Some("eng"))
.map_err(|e| {
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
This usually means:\n1. Tesseract is not properly installed\n\
2. Language data files are missing\n\nTo fix:\n \
Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n \
RHEL/CentOS: sudo yum install tesseract-langpack-eng\n \
Arch Linux: sudo pacman -S tesseract-data-eng", e)
})?;
let full_text = tess.set_image(temp_path.as_str())
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
.get_text()
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
// Clean up temp file
let _ = std::fs::remove_file(&temp_path);
// Simple text search - full implementation would use get_component_images
// to get bounding boxes for each word
if full_text.contains(_text) {
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
Ok(Some(Point { x: 0, y: 0 }))
} else {
Ok(None)
}
}
}


@@ -0,0 +1,125 @@
use crate::{ComputerController, types::Rect};
use anyhow::Result;
use async_trait::async_trait;
use std::path::Path;
use tesseract::Tesseract;
pub struct MacOSController {
// Empty struct for now
}
impl MacOSController {
pub fn new() -> Result<Self> {
Ok(Self {})
}
}
#[async_trait]
impl ComputerController for MacOSController {
async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
// Determine the temporary directory for screenshots
let temp_dir = std::env::var("TMPDIR")
.or_else(|_| std::env::var("HOME").map(|h| format!("{}/tmp", h)))
.unwrap_or_else(|_| "/tmp".to_string());
// Ensure temp directory exists
std::fs::create_dir_all(&temp_dir)?;
// If path is relative or doesn't specify a directory, use temp_dir
let final_path = if path.starts_with('/') {
path.to_string()
} else {
format!("{}/{}", temp_dir.trim_end_matches('/'), path)
};
let path_obj = Path::new(&final_path);
if let Some(parent) = path_obj.parent() {
std::fs::create_dir_all(parent)?;
}
let mut cmd = std::process::Command::new("screencapture");
// Add flags
cmd.arg("-x"); // No sound
if let Some(region) = region {
// Capture specific region: -R x,y,width,height
cmd.arg("-R");
cmd.arg(format!("{},{},{},{}", region.x, region.y, region.width, region.height));
}
if let Some(app_name) = window_id {
// Capture specific window by app name
// Use AppleScript to get window ID
let script = format!(r#"tell application "{}" to id of window 1"#, app_name);
let output = std::process::Command::new("osascript")
.arg("-e")
.arg(&script)
.output()?;
if output.status.success() {
let window_id_str = String::from_utf8_lossy(&output.stdout).trim().to_string();
cmd.arg(format!("-l{}", window_id_str));
}
}
cmd.arg(&final_path);
let screenshot_result = cmd.output()?;
if !screenshot_result.status.success() {
let stderr = String::from_utf8_lossy(&screenshot_result.stderr);
return Err(anyhow::anyhow!("screencapture failed: {}", stderr));
}
Ok(())
}
async fn extract_text_from_screen(&self, region: Rect) -> Result<String> {
// Take screenshot of region first
let temp_path = format!("/tmp/g3_ocr_{}.png", uuid::Uuid::new_v4());
self.take_screenshot(&temp_path, Some(region), None).await?;
// Extract text from the screenshot
let result = self.extract_text_from_image(&temp_path).await?;
// Clean up temp file
let _ = std::fs::remove_file(&temp_path);
Ok(result)
}
async fn extract_text_from_image(&self, path: &str) -> Result<String> {
// Check if tesseract is available on the system
let tesseract_check = std::process::Command::new("which")
.arg("tesseract")
.output();
if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
To install tesseract:\n macOS: brew install tesseract\n \
Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
sudo yum install tesseract (RHEL/CentOS)\n \
Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
After installation, restart your terminal and try again.");
}
// Initialize Tesseract
let tess = Tesseract::new(None, Some("eng"))
.map_err(|e| {
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
This usually means:\n1. Tesseract is not properly installed\n\
2. Language data files are missing\n\nTo fix:\n \
macOS: brew reinstall tesseract\n \
Linux: sudo apt-get install tesseract-ocr-eng\n \
Windows: Reinstall tesseract and ensure language files are included", e)
})?;
let text = tess.set_image(path)
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", path, e))?
.get_text()
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
Ok(text)
}
}


@@ -0,0 +1,425 @@
use crate::{ComputerController, types::*};
use anyhow::Result;
use async_trait::async_trait;
use core_graphics::display::CGPoint;
use core_graphics::event::{CGEvent, CGEventType, CGMouseButton, CGEventTapLocation};
use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
use std::path::Path;
use tesseract::Tesseract;
// MacOSController doesn't store CGEventSource to avoid Send/Sync issues
// We create it fresh for each operation
pub struct MacOSController {
// Empty struct - event source created per operation
}
impl MacOSController {
pub fn new() -> Result<Self> {
// Test that we can create an event source
let _event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source. Make sure Accessibility permissions are granted."))?;
Ok(Self {})
}
fn key_to_keycode(&self, key: &str) -> Result<u16> {
// Map key names to macOS keycodes
let keycode = match key.to_lowercase().as_str() {
"return" | "enter" => 36,
"tab" => 48,
"space" => 49,
"delete" | "backspace" => 51,
"escape" | "esc" => 53,
"command" | "cmd" => 55,
"shift" => 56,
"capslock" => 57,
"option" | "alt" => 58,
"control" | "ctrl" => 59,
"left" => 123,
"right" => 124,
"down" => 125,
"up" => 126,
_ => anyhow::bail!("Unknown key: {}", key),
};
Ok(keycode)
}
}
#[async_trait]
impl ComputerController for MacOSController {
async fn move_mouse(&self, x: i32, y: i32) -> Result<()> {
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
let point = CGPoint::new(x as f64, y as f64);
let event = CGEvent::new_mouse_event(
event_source,
CGEventType::MouseMoved,
point,
CGMouseButton::Left,
).map_err(|_| anyhow::anyhow!("Failed to create mouse move event"))?;
event.post(CGEventTapLocation::HID);
Ok(())
}
async fn click(&self, button: MouseButton) -> Result<()> {
let (cg_button, down_type, up_type) = match button {
MouseButton::Left => (CGMouseButton::Left, CGEventType::LeftMouseDown, CGEventType::LeftMouseUp),
MouseButton::Right => (CGMouseButton::Right, CGEventType::RightMouseDown, CGEventType::RightMouseUp),
MouseButton::Middle => (CGMouseButton::Center, CGEventType::OtherMouseDown, CGEventType::OtherMouseUp),
};
let point = {
// Get current mouse position
let temp_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
let event = CGEvent::new(temp_source)
.map_err(|_| anyhow::anyhow!("Failed to get mouse position"))?;
event.location()
};
{
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
// Mouse down
let down_event = CGEvent::new_mouse_event(
event_source,
down_type,
point,
cg_button,
).map_err(|_| anyhow::anyhow!("Failed to create mouse down event"))?;
down_event.post(CGEventTapLocation::HID);
} // event_source and down_event dropped here
// Small delay
tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
{
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
let up_event = CGEvent::new_mouse_event(
event_source,
up_type,
point,
cg_button,
).map_err(|_| anyhow::anyhow!("Failed to create mouse up event"))?;
up_event.post(CGEventTapLocation::HID);
} // event_source and up_event dropped here
Ok(())
}
async fn double_click(&self, button: MouseButton) -> Result<()> {
self.click(button).await?;
tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
self.click(button).await?;
Ok(())
}
async fn type_text(&self, text: &str) -> Result<()> {
for ch in text.chars() {
{
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
// Create keyboard event for character
let event = CGEvent::new_keyboard_event(
event_source,
0, // keycode (0 for unicode)
true,
).map_err(|_| anyhow::anyhow!("Failed to create keyboard event"))?;
// Set unicode string
let mut utf16_buf = [0u16; 2];
let utf16_slice = ch.encode_utf16(&mut utf16_buf);
event.set_string_from_utf16_unchecked(utf16_slice);
event.post(CGEventTapLocation::HID);
} // event_source and event dropped here
tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
}
Ok(())
}
async fn press_key(&self, key: &str) -> Result<()> {
let keycode = self.key_to_keycode(key)?;
{
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
// Key down
let down_event = CGEvent::new_keyboard_event(
event_source,
keycode,
true,
).map_err(|_| anyhow::anyhow!("Failed to create key down event"))?;
down_event.post(CGEventTapLocation::HID);
} // event_source and down_event dropped here
tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
{
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
// Key up
let up_event = CGEvent::new_keyboard_event(
event_source,
keycode,
false,
).map_err(|_| anyhow::anyhow!("Failed to create key up event"))?;
up_event.post(CGEventTapLocation::HID);
} // event_source and up_event dropped here
Ok(())
}
async fn list_windows(&self) -> Result<Vec<Window>> {
// Note: Full implementation would use CGWindowListCopyWindowInfo
// For now, return empty list as this requires more complex FFI
tracing::warn!("list_windows not fully implemented on macOS");
Ok(vec![])
}
async fn focus_window(&self, _window_id: &str) -> Result<()> {
// Note: Full implementation would use NSWorkspace to activate application
tracing::warn!("focus_window not fully implemented on macOS");
Ok(())
}
async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
// Note: Full implementation would use Accessibility API
tracing::warn!("get_window_bounds not fully implemented on macOS");
Ok(Rect { x: 0, y: 0, width: 800, height: 600 })
}
async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
// Note: Full implementation would use macOS Accessibility API
tracing::warn!("find_element not fully implemented on macOS");
Ok(None)
}
async fn get_element_text(&self, _element_id: &str) -> Result<String> {
// Note: Full implementation would use Accessibility API
tracing::warn!("get_element_text not fully implemented on macOS");
Ok(String::new())
}
async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
// Note: Full implementation would use Accessibility API
tracing::warn!("get_element_bounds not fully implemented on macOS");
Ok(Rect { x: 0, y: 0, width: 100, height: 30 })
}
async fn take_screenshot(&self, path: &str, _region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
// Use native macOS screencapture command which handles all the format complexities
// Check if we have Screen Recording permission by attempting a test capture
// If we only get wallpaper/menubar but no windows, we need permission
let needs_permission_check = std::env::var("G3_SKIP_PERMISSION_CHECK").is_err();
if needs_permission_check {
// Try to open Screen Recording settings if this is the first screenshot
static PERMISSION_PROMPTED: std::sync::atomic::AtomicBool = std::sync::atomic::AtomicBool::new(false);
if !PERMISSION_PROMPTED.swap(true, std::sync::atomic::Ordering::Relaxed) {
tracing::warn!("\n=== Screen Recording Permission Required ===\n\
macOS requires explicit permission to capture window content.\n\
If screenshots only show wallpaper/menubar (no windows):\n\n\
1. Open System Settings > Privacy & Security > Screen Recording\n\
2. Enable permission for your terminal (iTerm/Terminal) or g3\n\
3. Restart your terminal if needed\n\n\
Opening Screen Recording settings now...\n");
// Try to open the settings (non-blocking)
let _ = std::process::Command::new("open")
.arg("x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture")
.spawn();
}
}
let path_obj = Path::new(path);
if let Some(parent) = path_obj.parent() {
std::fs::create_dir_all(parent)?;
}
let mut cmd = std::process::Command::new("screencapture");
// Add flags
cmd.arg("-x"); // No sound
if let Some(window_id) = window_id {
// Capture specific window by getting its bounds and using region capture
// window_id format: "AppName" or "AppName:WindowTitle"
let app_name = window_id.split(':').next().unwrap_or(window_id);
// Use AppleScript to get window bounds
let script = format!(
r#"tell application "{}" to get bounds of front window"#,
app_name
);
let output = std::process::Command::new("osascript")
.arg("-e")
.arg(&script)
.output()
.map_err(|e| anyhow::anyhow!("Failed to get window bounds: {}", e))?;
if output.status.success() {
let bounds_str = String::from_utf8_lossy(&output.stdout);
let bounds: Vec<i32> = bounds_str
.trim()
.split(',')
.filter_map(|s| s.trim().parse().ok())
.collect();
if bounds.len() == 4 {
let (left, top, right, bottom) = (bounds[0], bounds[1], bounds[2], bounds[3]);
let width = right - left;
let height = bottom - top;
cmd.arg("-R");
cmd.arg(format!("{},{},{},{}", left, top, width, height));
tracing::debug!("Capturing window '{}' at region: {},{} {}x{}", app_name, left, top, width, height);
} else {
tracing::warn!("Failed to parse window bounds, capturing full screen");
}
} else {
tracing::warn!("Failed to get window bounds for '{}', capturing full screen", app_name);
}
} else if let Some(region) = _region {
// Capture specific region: -R x,y,width,height
cmd.arg("-R");
cmd.arg(format!("{},{},{},{}", region.x, region.y, region.width, region.height));
}
cmd.arg(path);
let output = cmd.output()
.map_err(|e| anyhow::anyhow!("Failed to execute screencapture: {}", e))?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
anyhow::bail!("screencapture failed: {}", stderr);
}
tracing::debug!("Screenshot saved using screencapture: {}", path);
Ok(())
}
async fn extract_text_from_screen(&self, region: Rect) -> Result<OCRResult> {
// Take screenshot of region first
let temp_path = format!("/tmp/g3_ocr_{}.png", uuid::Uuid::new_v4());
self.take_screenshot(&temp_path, Some(region), None).await?;
// Extract text from the screenshot
let result = self.extract_text_from_image(&temp_path).await?;
// Clean up temp file
let _ = std::fs::remove_file(&temp_path);
Ok(result)
}
async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
// Check if tesseract is available on the system
let tesseract_available = std::process::Command::new("which")
.arg("tesseract")
.output()
.map(|o| o.status.success())
.unwrap_or(false);
if !tesseract_available {
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
To install tesseract:\n macOS: brew install tesseract\n \
Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
sudo yum install tesseract (RHEL/CentOS)\n \
Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
After installation, restart your terminal and try again.");
}
// Initialize Tesseract
let tess = Tesseract::new(None, Some("eng"))
.map_err(|e| {
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
This usually means:\n1. Tesseract is not properly installed\n\
2. Language data files are missing\n\nTo fix:\n \
macOS: brew reinstall tesseract\n \
Linux: sudo apt-get install tesseract-ocr-eng\n \
Windows: Reinstall tesseract and ensure language files are included", e)
})?;
let text = tess.set_image(_path)
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
.get_text()
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
// Get confidence (simplified - would need more complex API calls for per-word confidence)
let confidence = 0.85; // Placeholder
Ok(OCRResult {
text,
confidence,
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
})
}
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
// Check if tesseract is available on the system
let tesseract_available = std::process::Command::new("which")
.arg("tesseract")
.output()
.map(|o| o.status.success())
.unwrap_or(false);
if !tesseract_available {
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
To install tesseract:\n macOS: brew install tesseract\n \
Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
sudo yum install tesseract (RHEL/CentOS)\n \
Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
After installation, restart your terminal and try again.");
}
// Take full screen screenshot
let temp_path = format!("/tmp/g3_ocr_search_{}.png", uuid::Uuid::new_v4());
self.take_screenshot(&temp_path, None, None).await?;
// Use Tesseract to find text with bounding boxes
let tess = Tesseract::new(None, Some("eng"))
.map_err(|e| {
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
This usually means:\n1. Tesseract is not properly installed\n\
2. Language data files are missing\n\nTo fix:\n \
macOS: brew reinstall tesseract\n \
Linux: sudo apt-get install tesseract-ocr-eng\n \
Windows: Reinstall tesseract and ensure language files are included", e)
})?;
let full_text = tess.set_image(temp_path.as_str())
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
.get_text()
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
// Clean up temp file
let _ = std::fs::remove_file(&temp_path);
// Simple text search - full implementation would use get_component_images
// to get bounding boxes for each word
if full_text.contains(_text) {
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
Ok(Some(Point { x: 0, y: 0 }))
} else {
Ok(None)
}
}
}
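The simplified search above returns `(0, 0)` whenever the text is found. One dependency-light way to recover real coordinates, hinted at by the `get_component_images` comment, is to parse Tesseract's TSV output (`tesseract <image> stdout tsv`), which reports per-word bounding boxes. A hedged sketch — the helper name and the 12-column layout are assumptions about the standard TSV format, not part of this crate:

```rust
/// Locate the centre of a word in Tesseract TSV output.
/// Assumes the standard 12 columns:
/// level page_num block_num par_num line_num word_num left top width height conf text
fn find_word_centre(tsv: &str, needle: &str) -> Option<(i32, i32)> {
    // Skip the header row, then scan word rows for an exact match.
    for line in tsv.lines().skip(1) {
        let cols: Vec<&str> = line.split('\t').collect();
        if cols.len() == 12 && cols[11] == needle {
            let left: i32 = cols[6].parse().ok()?;
            let top: i32 = cols[7].parse().ok()?;
            let width: i32 = cols[8].parse().ok()?;
            let height: i32 = cols[9].parse().ok()?;
            return Some((left + width / 2, top + height / 2));
        }
    }
    None
}
```

The returned centre point could then feed directly into `move_mouse` instead of the placeholder `Point { x: 0, y: 0 }`.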

View File

@@ -0,0 +1,8 @@
#[cfg(target_os = "macos")]
pub mod macos;
#[cfg(target_os = "linux")]
pub mod linux;
#[cfg(target_os = "windows")]
pub mod windows;
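The integration tests later in this diff call a `create_controller()` factory. A minimal sketch of how such `cfg`-based dispatch typically looks — the trait, struct names, and error handling here are stand-ins, not the crate's actual `ComputerController` API:

```rust
// Illustrative platform dispatch; the real factory would return
// Box<dyn ComputerController> from the per-OS modules above.
trait PlatformController {
    fn platform(&self) -> &'static str;
}

struct MacOsController;
struct LinuxController;
struct WindowsController;

impl PlatformController for MacOsController {
    fn platform(&self) -> &'static str { "macos" }
}
impl PlatformController for LinuxController {
    fn platform(&self) -> &'static str { "linux" }
}
impl PlatformController for WindowsController {
    fn platform(&self) -> &'static str { "windows" }
}

fn create_controller() -> Result<Box<dyn PlatformController>, String> {
    // Only the branch matching the compile target is emitted.
    #[cfg(target_os = "macos")]
    return Ok(Box::new(MacOsController));
    #[cfg(target_os = "linux")]
    return Ok(Box::new(LinuxController));
    #[cfg(target_os = "windows")]
    return Ok(Box::new(WindowsController));
    #[allow(unreachable_code)]
    Err("unsupported platform".to_string())
}
```

Keeping the dispatch in one factory means callers never name a platform module directly.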

View File

@@ -0,0 +1,162 @@
use crate::{ComputerController, types::*};
use anyhow::Result;
use async_trait::async_trait;
use tesseract::Tesseract;
use uuid::Uuid;
pub struct WindowsController {
// Placeholder for Windows-specific state
}
impl WindowsController {
pub fn new() -> Result<Self> {
tracing::warn!("Windows computer control not fully implemented");
Ok(Self {})
}
}
#[async_trait]
impl ComputerController for WindowsController {
async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
anyhow::bail!("Windows implementation not yet available")
}
async fn click(&self, _button: MouseButton) -> Result<()> {
anyhow::bail!("Windows implementation not yet available")
}
async fn double_click(&self, _button: MouseButton) -> Result<()> {
anyhow::bail!("Windows implementation not yet available")
}
async fn type_text(&self, _text: &str) -> Result<()> {
anyhow::bail!("Windows implementation not yet available")
}
async fn press_key(&self, _key: &str) -> Result<()> {
anyhow::bail!("Windows implementation not yet available")
}
async fn list_windows(&self) -> Result<Vec<Window>> {
anyhow::bail!("Windows implementation not yet available")
}
async fn focus_window(&self, _window_id: &str) -> Result<()> {
anyhow::bail!("Windows implementation not yet available")
}
async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
anyhow::bail!("Windows implementation not yet available")
}
async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
anyhow::bail!("Windows implementation not yet available")
}
async fn get_element_text(&self, _element_id: &str) -> Result<String> {
anyhow::bail!("Windows implementation not yet available")
}
async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
anyhow::bail!("Windows implementation not yet available")
}
async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, _window_id: Option<&str>) -> Result<()> {
anyhow::bail!("Windows implementation not yet available")
}
async fn extract_text_from_screen(&self, _region: Rect) -> Result<OCRResult> {
anyhow::bail!("Windows implementation not yet available")
}
async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
// Check if tesseract is available on the system
let tesseract_available = std::process::Command::new("where")
.arg("tesseract")
.output()
.map(|o| o.status.success())
.unwrap_or(false);
if !tesseract_available {
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
To install tesseract on Windows:\n \
1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
2. Run the installer and follow the instructions\n \
3. Add tesseract to your PATH environment variable\n \
4. Restart your terminal/command prompt\n\n\
After installation, restart your terminal and try again.");
}
// Initialize Tesseract
let tess = Tesseract::new(None, Some("eng"))
.map_err(|e| {
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
This usually means:\n1. Tesseract is not properly installed\n\
2. Language data files are missing\n\nTo fix:\n \
1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
2. Make sure to select 'Additional language data' during installation\n \
3. Ensure tesseract is in your PATH", e)
})?;
let text = tess.set_image(_path)
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
.get_text()
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
// Get confidence (simplified - would need more complex API calls for per-word confidence)
let confidence = 0.85; // Placeholder
Ok(OCRResult {
text,
confidence,
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
})
}
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
// Check if tesseract is available on the system
let tesseract_available = std::process::Command::new("where")
.arg("tesseract")
.output()
.map(|o| o.status.success())
.unwrap_or(false);
if !tesseract_available {
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
To install tesseract on Windows:\n \
1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
2. Run the installer and follow the instructions\n \
3. Add tesseract to your PATH environment variable\n \
4. Restart your terminal/command prompt\n\n\
After installation, restart your terminal and try again.");
}
// Take full screen screenshot
let temp_path = std::env::temp_dir()
.join(format!("g3_ocr_search_{}.png", uuid::Uuid::new_v4()))
.to_string_lossy()
.into_owned();
self.take_screenshot(&temp_path, None, None).await?;
// Use Tesseract to find text with bounding boxes
let tess = Tesseract::new(None, Some("eng"))
.map_err(|e| {
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
This usually means:\n1. Tesseract is not properly installed\n\
2. Language data files are missing\n\nTo fix:\n \
1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
2. Make sure to select 'Additional language data' during installation\n \
3. Ensure tesseract is in your PATH", e)
})?;
let full_text = tess.set_image(temp_path.as_str())
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
.get_text()
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
// Clean up temp file
let _ = std::fs::remove_file(&temp_path);
// Simple text search - full implementation would use get_component_images
// to get bounding boxes for each word
if full_text.contains(_text) {
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
Ok(Some(Point { x: 0, y: 0 }))
} else {
Ok(None)
}
}
}

View File

@@ -0,0 +1,9 @@
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Rect {
pub x: i32,
pub y: i32,
pub width: i32,
pub height: i32,
}

View File

@@ -0,0 +1,111 @@
pub mod safari;
use anyhow::Result;
use async_trait::async_trait;
use serde_json::Value;
/// WebDriver controller for browser automation
#[async_trait]
pub trait WebDriverController: Send + Sync {
/// Navigate to a URL
async fn navigate(&mut self, url: &str) -> Result<()>;
/// Get the current URL
async fn current_url(&self) -> Result<String>;
/// Get the page title
async fn title(&self) -> Result<String>;
/// Find an element by CSS selector
async fn find_element(&mut self, selector: &str) -> Result<WebElement>;
/// Find multiple elements by CSS selector
async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>>;
/// Execute JavaScript in the browser
async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value>;
/// Get the page source (HTML)
async fn page_source(&self) -> Result<String>;
/// Take a screenshot and save to path
async fn screenshot(&mut self, path: &str) -> Result<()>;
/// Close the current window/tab
async fn close(&mut self) -> Result<()>;
/// Quit the browser session
async fn quit(self) -> Result<()>;
}
/// Represents a web element in the DOM
pub struct WebElement {
pub(crate) inner: fantoccini::elements::Element,
}
impl WebElement {
/// Click the element
pub async fn click(&mut self) -> Result<()> {
self.inner.click().await?;
Ok(())
}
/// Send keys/text to the element
pub async fn send_keys(&mut self, text: &str) -> Result<()> {
self.inner.send_keys(text).await?;
Ok(())
}
/// Clear the element's content (for input fields)
pub async fn clear(&mut self) -> Result<()> {
self.inner.clear().await?;
Ok(())
}
/// Get the element's text content
pub async fn text(&self) -> Result<String> {
Ok(self.inner.text().await?)
}
/// Get an attribute value
pub async fn attr(&self, name: &str) -> Result<Option<String>> {
Ok(self.inner.attr(name).await?)
}
/// Get a property value
pub async fn prop(&self, name: &str) -> Result<Option<String>> {
Ok(self.inner.prop(name).await?)
}
/// Get the element's HTML
pub async fn html(&self, inner: bool) -> Result<String> {
Ok(self.inner.html(inner).await?)
}
/// Check if element is displayed
pub async fn is_displayed(&self) -> Result<bool> {
Ok(self.inner.is_displayed().await?)
}
/// Check if element is enabled
pub async fn is_enabled(&self) -> Result<bool> {
Ok(self.inner.is_enabled().await?)
}
/// Check if element is selected (for checkboxes/radio buttons)
pub async fn is_selected(&self) -> Result<bool> {
Ok(self.inner.is_selected().await?)
}
/// Find a child element by CSS selector
pub async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
let elem = self.inner.find(fantoccini::Locator::Css(selector)).await?;
Ok(WebElement { inner: elem })
}
/// Find multiple child elements by CSS selector
pub async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
let elems = self.inner.find_all(fantoccini::Locator::Css(selector)).await?;
Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
}
}

View File

@@ -0,0 +1,212 @@
use super::{WebDriverController, WebElement};
use anyhow::{Context, Result};
use async_trait::async_trait;
use fantoccini::{Client, ClientBuilder};
use serde_json::Value;
use std::time::Duration;
/// SafariDriver WebDriver controller
pub struct SafariDriver {
client: Client,
}
impl SafariDriver {
/// Create a new SafariDriver instance
///
/// This will connect to SafariDriver running on the default port (4444).
/// Make sure to enable "Allow Remote Automation" in Safari's Develop menu first.
///
/// You can start SafariDriver manually with:
/// ```bash
/// /usr/bin/safaridriver --enable
/// ```
pub async fn new() -> Result<Self> {
Self::with_port(4444).await
}
/// Create a new SafariDriver instance with a custom port
pub async fn with_port(port: u16) -> Result<Self> {
let url = format!("http://localhost:{}", port);
let mut caps = serde_json::Map::new();
caps.insert("browserName".to_string(), Value::String("safari".to_string()));
let client = ClientBuilder::native()
.capabilities(caps)
.connect(&url)
.await
.context("Failed to connect to SafariDriver. Make sure SafariDriver is running and 'Allow Remote Automation' is enabled in Safari's Develop menu.")?;
Ok(Self { client })
}
/// Go back in browser history
pub async fn back(&mut self) -> Result<()> {
self.client.back().await?;
Ok(())
}
/// Go forward in browser history
pub async fn forward(&mut self) -> Result<()> {
self.client.forward().await?;
Ok(())
}
/// Refresh the current page
pub async fn refresh(&mut self) -> Result<()> {
self.client.refresh().await?;
Ok(())
}
/// Get all window handles
pub async fn window_handles(&mut self) -> Result<Vec<String>> {
let handles = self.client.windows().await?;
Ok(handles.into_iter()
.map(|h| h.into())
.collect())
}
/// Switch to a window by handle
pub async fn switch_to_window(&mut self, handle: &str) -> Result<()> {
let window_handle: fantoccini::wd::WindowHandle = handle.to_string().try_into()?;
self.client.switch_to_window(window_handle).await?;
Ok(())
}
/// Get the current window handle
pub async fn current_window_handle(&mut self) -> Result<String> {
Ok(self.client.window().await?.into())
}
/// Close the current window
pub async fn close_window(&mut self) -> Result<()> {
self.client.close_window().await?;
Ok(())
}
/// Create a new window/tab
pub async fn new_window(&mut self, is_tab: bool) -> Result<String> {
let window_type = if is_tab { "tab" } else { "window" };
let response = self.client.new_window(window_type == "tab").await?;
Ok(response.handle.into())
}
/// Get cookies
pub async fn get_cookies(&mut self) -> Result<Vec<fantoccini::cookies::Cookie<'static>>> {
Ok(self.client.get_all_cookies().await?)
}
/// Add a cookie
pub async fn add_cookie(&mut self, cookie: fantoccini::cookies::Cookie<'static>) -> Result<()> {
self.client.add_cookie(cookie).await?;
Ok(())
}
/// Delete all cookies
pub async fn delete_all_cookies(&mut self) -> Result<()> {
self.client.delete_all_cookies().await?;
Ok(())
}
/// Wait for an element to appear (with timeout)
pub async fn wait_for_element(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
let start = std::time::Instant::now();
let poll_interval = Duration::from_millis(100);
loop {
if let Ok(elem) = self.find_element(selector).await {
return Ok(elem);
}
if start.elapsed() >= timeout {
anyhow::bail!("Timeout waiting for element: {}", selector);
}
tokio::time::sleep(poll_interval).await;
}
}
/// Wait for an element to be visible (with timeout)
pub async fn wait_for_visible(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
let start = std::time::Instant::now();
let poll_interval = Duration::from_millis(100);
loop {
if let Ok(elem) = self.find_element(selector).await {
if elem.is_displayed().await.unwrap_or(false) {
return Ok(elem);
}
}
if start.elapsed() >= timeout {
anyhow::bail!("Timeout waiting for element to be visible: {}", selector);
}
tokio::time::sleep(poll_interval).await;
}
}
}
#[async_trait]
impl WebDriverController for SafariDriver {
async fn navigate(&mut self, url: &str) -> Result<()> {
self.client.goto(url).await?;
Ok(())
}
async fn current_url(&self) -> Result<String> {
Ok(self.client.current_url().await?.to_string())
}
async fn title(&self) -> Result<String> {
Ok(self.client.title().await?)
}
async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
let elem = self.client.find(fantoccini::Locator::Css(selector)).await
.context(format!("Failed to find element with selector: {}", selector))?;
Ok(WebElement { inner: elem })
}
async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
let elems = self.client.find_all(fantoccini::Locator::Css(selector)).await?;
Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
}
async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value> {
Ok(self.client.execute(script, args).await?)
}
async fn page_source(&self) -> Result<String> {
Ok(self.client.source().await?)
}
async fn screenshot(&mut self, path: &str) -> Result<()> {
let screenshot_data = self.client.screenshot().await?;
// Expand tilde in path
let expanded_path = shellexpand::tilde(path);
let path_str = expanded_path.as_ref();
// Create parent directories if needed
if let Some(parent) = std::path::Path::new(path_str).parent() {
std::fs::create_dir_all(parent)
.context("Failed to create parent directories for screenshot")?;
}
std::fs::write(path_str, screenshot_data)
.context("Failed to write screenshot to file")?;
Ok(())
}
async fn close(&mut self) -> Result<()> {
self.client.close_window().await?;
Ok(())
}
async fn quit(mut self) -> Result<()> {
self.client.close().await?;
Ok(())
}
}
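`wait_for_element` and `wait_for_visible` above share one poll-until-deadline loop. The pattern generalizes beyond WebDriver; here is a synchronous, dependency-free sketch of the same idea (the helper name is illustrative, not crate API):

```rust
use std::time::{Duration, Instant};

/// Poll `probe` every `poll` interval until it yields a value
/// or `timeout` elapses.
fn wait_until<T>(
    mut probe: impl FnMut() -> Option<T>,
    timeout: Duration,
    poll: Duration,
) -> Result<T, String> {
    let start = Instant::now();
    loop {
        if let Some(value) = probe() {
            return Ok(value);
        }
        if start.elapsed() >= timeout {
            return Err(format!("timed out after {:?}", timeout));
        }
        std::thread::sleep(poll);
    }
}
```

Checking the deadline only after a failed probe, as the Safari helpers also do, guarantees at least one attempt even with a zero timeout.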

View File

@@ -0,0 +1,62 @@
use g3_computer_control::*;
#[tokio::test]
async fn test_mouse_movement() {
let controller = create_controller().expect("Failed to create controller");
// Move mouse to center of screen (assuming 1920x1080)
let result = controller.move_mouse(960, 540).await;
assert!(result.is_ok(), "Failed to move mouse: {:?}", result.err());
}
#[tokio::test]
async fn test_typing() {
let controller = create_controller().expect("Failed to create controller");
// Type some text
let result = controller.type_text("Hello, World!").await;
assert!(result.is_ok(), "Failed to type text: {:?}", result.err());
}
#[tokio::test]
async fn test_screenshot() {
let controller = create_controller().expect("Failed to create controller");
// Take screenshot
let path = "/tmp/test_screenshot.png";
let result = controller.take_screenshot(path, None, None).await;
assert!(result.is_ok(), "Failed to take screenshot: {:?}", result.err());
// Verify file exists
assert!(std::path::Path::new(path).exists(), "Screenshot file was not created");
// Clean up
let _ = std::fs::remove_file(path);
}
#[tokio::test]
async fn test_click() {
let controller = create_controller().expect("Failed to create controller");
// Click at a safe location
let result = controller.click(types::MouseButton::Left).await;
assert!(result.is_ok(), "Failed to click: {:?}", result.err());
}
#[tokio::test]
async fn test_double_click() {
let controller = create_controller().expect("Failed to create controller");
// Double click
let result = controller.double_click(types::MouseButton::Left).await;
assert!(result.is_ok(), "Failed to double click: {:?}", result.err());
}
#[tokio::test]
async fn test_press_key() {
let controller = create_controller().expect("Failed to create controller");
// Press escape key
let result = controller.press_key("escape").await;
assert!(result.is_ok(), "Failed to press key: {:?}", result.err());
}

View File

@@ -12,3 +12,6 @@ thiserror = { workspace = true }
toml = "0.8"
shellexpand = "3.0"
dirs = "5.0"
[dev-dependencies]
tempfile = "3.8"

View File

@@ -6,6 +6,8 @@ use std::path::Path;
pub struct Config {
pub providers: ProvidersConfig,
pub agent: AgentConfig,
pub computer_control: ComputerControlConfig,
pub webdriver: WebDriverConfig,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -15,6 +17,8 @@ pub struct ProvidersConfig {
pub databricks: Option<DatabricksConfig>,
pub embedded: Option<EmbeddedConfig>,
pub default_provider: String,
pub coach: Option<String>, // Provider to use for coach in autonomous mode
pub player: Option<String>, // Provider to use for player in autonomous mode
}
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -62,6 +66,38 @@ pub struct AgentConfig {
pub timeout_seconds: u64,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ComputerControlConfig {
pub enabled: bool,
pub require_confirmation: bool,
pub max_actions_per_second: u32,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WebDriverConfig {
pub enabled: bool,
pub safari_port: u16,
}
impl Default for WebDriverConfig {
fn default() -> Self {
Self {
enabled: false,
safari_port: 4444,
}
}
}
impl Default for ComputerControlConfig {
fn default() -> Self {
Self {
enabled: false, // Disabled by default for safety
require_confirmation: true,
max_actions_per_second: 5,
}
}
}
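The `max_actions_per_second` field above caps the input-event rate. A minimal enforcement sketch — this limiter is illustrative; how the crate actually gates actions is not shown in this diff:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Spaces actions at least 1/max_per_second apart.
struct RateLimiter {
    min_gap: Duration,
    last: Option<Instant>,
}

impl RateLimiter {
    fn new(max_per_second: u32) -> Self {
        Self {
            // max(1) guards against a zero config value.
            min_gap: Duration::from_secs(1) / max_per_second.max(1),
            last: None,
        }
    }

    /// Block until the next action is allowed, then record it.
    fn acquire(&mut self) {
        if let Some(prev) = self.last {
            let elapsed = prev.elapsed();
            if elapsed < self.min_gap {
                sleep(self.min_gap - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}
```

With the default of 5 actions per second, each mouse or keyboard call would wait out a 200ms gap before proceeding.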
impl Default for Config {
fn default() -> Self {
Self {
@@ -78,12 +114,16 @@ impl Default for Config {
}),
embedded: None,
default_provider: "databricks".to_string(),
coach: None, // Will use default_provider if not specified
player: None, // Will use default_provider if not specified
},
agent: AgentConfig {
max_context_length: 8192,
enable_streaming: true,
timeout_seconds: 60,
},
computer_control: ComputerControlConfig::default(),
webdriver: WebDriverConfig::default(),
}
}
}
@@ -188,12 +228,16 @@ impl Config {
threads: Some(8),
}),
default_provider: "embedded".to_string(),
coach: None, // Will use default_provider if not specified
player: None, // Will use default_provider if not specified
},
agent: AgentConfig {
max_context_length: 8192,
enable_streaming: true,
timeout_seconds: 60,
},
computer_control: ComputerControlConfig::default(),
webdriver: WebDriverConfig::default(),
}
}
@@ -262,4 +306,67 @@ impl Config {
Ok(config)
}
/// Get the provider to use for coach mode in autonomous execution
pub fn get_coach_provider(&self) -> &str {
self.providers.coach
.as_deref()
.unwrap_or(&self.providers.default_provider)
}
/// Get the provider to use for player mode in autonomous execution
pub fn get_player_provider(&self) -> &str {
self.providers.player
.as_deref()
.unwrap_or(&self.providers.default_provider)
}
/// Create a copy of the config with a different default provider
pub fn with_provider_override(&self, provider: &str) -> Result<Self> {
// Validate that the provider is configured
match provider {
"anthropic" if self.providers.anthropic.is_none() => {
return Err(anyhow::anyhow!(
"Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
provider, provider
));
}
"databricks" if self.providers.databricks.is_none() => {
return Err(anyhow::anyhow!(
"Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
provider, provider
));
}
"embedded" if self.providers.embedded.is_none() => {
return Err(anyhow::anyhow!(
"Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
provider, provider
));
}
"openai" if self.providers.openai.is_none() => {
return Err(anyhow::anyhow!(
"Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
provider, provider
));
}
_ => {} // Provider is configured or unknown (will be caught later)
}
let mut config = self.clone();
config.providers.default_provider = provider.to_string();
Ok(config)
}
/// Create a copy of the config for coach mode in autonomous execution
pub fn for_coach(&self) -> Result<Self> {
self.with_provider_override(self.get_coach_provider())
}
/// Create a copy of the config for player mode in autonomous execution
pub fn for_player(&self) -> Result<Self> {
self.with_provider_override(self.get_player_provider())
}
}
#[cfg(test)]
mod tests;

View File

@@ -0,0 +1,131 @@
#[cfg(test)]
mod tests {
use crate::Config;
use std::fs;
use tempfile::TempDir;
#[test]
fn test_coach_player_providers() {
// Create a temporary directory for the test config
let temp_dir = TempDir::new().unwrap();
let config_path = temp_dir.path().join("test_config.toml");
// Write a test configuration with coach and player providers
let config_content = r#"
[providers]
default_provider = "databricks"
coach = "anthropic"
player = "embedded"
[providers.databricks]
host = "https://test.databricks.com"
token = "test-token"
model = "test-model"
[providers.anthropic]
api_key = "test-key"
model = "claude-3"
[providers.embedded]
model_path = "test.gguf"
model_type = "llama"
[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
"#;
fs::write(&config_path, config_content).unwrap();
// Load the configuration
let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();
// Test that the providers are correctly identified
assert_eq!(config.providers.default_provider, "databricks");
assert_eq!(config.get_coach_provider(), "anthropic");
assert_eq!(config.get_player_provider(), "embedded");
// Test creating coach config
let coach_config = config.for_coach().unwrap();
assert_eq!(coach_config.providers.default_provider, "anthropic");
// Test creating player config
let player_config = config.for_player().unwrap();
assert_eq!(player_config.providers.default_provider, "embedded");
}
#[test]
fn test_coach_player_fallback_to_default() {
// Create a temporary directory for the test config
let temp_dir = TempDir::new().unwrap();
let config_path = temp_dir.path().join("test_config.toml");
// Write a test configuration WITHOUT coach and player providers
let config_content = r#"
[providers]
default_provider = "databricks"
[providers.databricks]
host = "https://test.databricks.com"
token = "test-token"
model = "test-model"
[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
"#;
fs::write(&config_path, config_content).unwrap();
// Load the configuration
let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();
// Test that coach and player fall back to default provider
assert_eq!(config.get_coach_provider(), "databricks");
assert_eq!(config.get_player_provider(), "databricks");
// Test creating coach config (should use default)
let coach_config = config.for_coach().unwrap();
assert_eq!(coach_config.providers.default_provider, "databricks");
// Test creating player config (should use default)
let player_config = config.for_player().unwrap();
assert_eq!(player_config.providers.default_provider, "databricks");
}
#[test]
fn test_invalid_provider_error() {
// Create a temporary directory for the test config
let temp_dir = TempDir::new().unwrap();
let config_path = temp_dir.path().join("test_config.toml");
// Write a test configuration with an unconfigured provider
let config_content = r#"
[providers]
default_provider = "databricks"
coach = "openai" # OpenAI is not configured
[providers.databricks]
host = "https://test.databricks.com"
token = "test-token"
model = "test-model"
[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
"#;
fs::write(&config_path, config_content).unwrap();
// Load the configuration
let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();
// Test that trying to create a coach config with unconfigured provider fails
let result = config.for_coach();
assert!(result.is_err());
assert!(result.unwrap_err().to_string().contains("not configured"));
}
}

View File

@@ -8,6 +8,7 @@ description = "Core engine for G3 AI coding agent"
g3-providers = { path = "../g3-providers" }
g3-config = { path = "../g3-config" }
g3-execution = { path = "../g3-execution" }
g3-computer-control = { path = "../g3-computer-control" }
tokio = { workspace = true }
reqwest = { workspace = true }
anyhow = { workspace = true }
@@ -23,3 +24,4 @@ futures-util = "0.3"
chrono = { version = "0.4", features = ["serde"] }
rand = "0.8"
regex = "1.0"
shellexpand = "3.1"

File diff suppressed because it is too large

View File

@@ -0,0 +1,36 @@
#[cfg(test)]
mod tilde_expansion_tests {
use std::env;
#[test]
fn test_tilde_expansion() {
// Test that shellexpand works
let path_with_tilde = "~/test.txt";
let expanded = shellexpand::tilde(path_with_tilde);
// Get the actual home directory
let home = env::var("HOME").expect("HOME environment variable not set");
// Verify expansion happened
assert_eq!(expanded.as_ref(), format!("{}/test.txt", home));
assert!(!expanded.contains("~"));
}
#[test]
fn test_tilde_expansion_with_subdirs() {
let path_with_tilde = "~/Documents/test.txt";
let expanded = shellexpand::tilde(path_with_tilde);
let home = env::var("HOME").expect("HOME environment variable not set");
assert_eq!(expanded.as_ref(), format!("{}/Documents/test.txt", home));
}
#[test]
fn test_no_tilde_unchanged() {
let path_without_tilde = "/absolute/path/test.txt";
let expanded = shellexpand::tilde(path_without_tilde);
assert_eq!(expanded.as_ref(), path_without_tilde);
}
}


@@ -0,0 +1,157 @@
use g3_core::ContextWindow;
use g3_providers::{Message, MessageRole};
#[test]
fn test_thinning_thresholds() {
let mut context = ContextWindow::new(10000);
// At 0%, should not thin
assert!(!context.should_thin());
// Simulate reaching 50% usage
context.used_tokens = 5000;
assert!(context.should_thin());
// After thinning at 50%, should not thin again until next threshold
context.last_thinning_percentage = 50;
assert!(!context.should_thin());
// At 60%, should thin again
context.used_tokens = 6000;
assert!(context.should_thin());
// After thinning at 60%, should not thin
context.last_thinning_percentage = 60;
assert!(!context.should_thin());
// At 70%, should thin
context.used_tokens = 7000;
assert!(context.should_thin());
// At 80%, should thin
context.last_thinning_percentage = 70;
context.used_tokens = 8000;
assert!(context.should_thin());
// After 80%, should not thin (compaction takes over)
context.last_thinning_percentage = 80;
context.used_tokens = 8500;
assert!(!context.should_thin());
}
#[test]
fn test_thin_context_basic() {
let mut context = ContextWindow::new(10000);
// Add some messages to the first third
for i in 0..9 {
if i % 2 == 0 {
context.add_message(Message {
role: MessageRole::Assistant,
content: format!("Assistant message {}", i),
});
} else {
// Add tool results with varying sizes
let content = if i == 1 {
// Large tool result (> 1000 chars)
format!("Tool result: {}", "x".repeat(1500))
} else if i == 3 {
// Another large tool result
format!("Tool result: {}", "y".repeat(2000))
} else {
// Small tool result (< 1000 chars)
format!("Tool result: small result {}", i)
};
context.add_message(Message {
role: MessageRole::User,
content,
});
}
}
// Trigger thinning at 50%
context.used_tokens = 5000;
let summary = context.thin_context();
println!("Thinning summary: {}", summary);
// Should have thinned at least 1 large tool result in the first third
assert!(summary.contains("1 tool result"), "Summary was: {}", summary);
assert!(summary.contains("50%"));
// Check that the large tool results were replaced
let first_third_end = context.conversation_history.len() / 3;
for i in 0..first_third_end {
if let Some(msg) = context.conversation_history.get(i) {
if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
if msg.content.len() > 1000 {
panic!("Found un-thinned large tool result at index {}", i);
}
}
}
}
}
#[test]
fn test_thin_context_no_large_results() {
let mut context = ContextWindow::new(10000);
// Add only small messages
for i in 0..9 {
context.add_message(Message {
role: MessageRole::User,
content: format!("Tool result: small {}", i),
});
}
context.used_tokens = 5000;
let summary = context.thin_context();
// Should report no large results found
assert!(summary.contains("no large tool results found"));
}
#[test]
fn test_thin_context_only_affects_first_third() {
let mut context = ContextWindow::new(10000);
// Add 12 messages (first third = 4 messages)
for i in 0..12 {
let content = if i % 2 == 1 {
// All odd indices are large tool results
format!("Tool result: {}", "x".repeat(1500))
} else {
format!("Assistant message {}", i)
};
let role = if i % 2 == 1 {
MessageRole::User
} else {
MessageRole::Assistant
};
context.add_message(Message { role, content });
}
context.used_tokens = 5000;
let summary = context.thin_context();
// First third is 4 messages (indices 0-3), so only indices 1 and 3 should be thinned
// That's 2 tool results
assert!(summary.contains("2 tool results"));
// Check that messages after the first third are NOT thinned
let first_third_end = context.conversation_history.len() / 3;
for i in first_third_end..context.conversation_history.len() {
if let Some(msg) = context.conversation_history.get(i) {
if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
// These should still be large (not thinned)
if i % 2 == 1 {
assert!(msg.content.len() > 1000,
"Message at index {} should not have been thinned", i);
}
}
}
}
}


@@ -156,8 +156,9 @@ impl AnthropicProvider {
.post(ANTHROPIC_API_URL)
.header("x-api-key", &self.api_key)
.header("anthropic-version", ANTHROPIC_VERSION)
// Anthropic beta 1m context window. Enable if needed. It costs extra, so check first.
// .header("anthropic-beta", "context-1m-2025-08-07")
.header("content-type", "application/json");
if streaming {
builder = builder.header("accept", "text/event-stream");
}


@@ -88,10 +88,12 @@ pub mod anthropic;
pub mod databricks;
pub mod embedded;
pub mod oauth;
pub mod openai;
pub use anthropic::AnthropicProvider;
pub use databricks::DatabricksProvider;
pub use embedded::EmbeddedProvider;
pub use openai::OpenAIProvider;
/// Provider registry for managing multiple LLM providers
pub struct ProviderRegistry {


@@ -0,0 +1,495 @@
use anyhow::Result;
use async_trait::async_trait;
use bytes::Bytes;
use futures_util::stream::StreamExt;
use reqwest::Client;
use serde::Deserialize;
use serde_json::json;
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
use tracing::{debug, error};
use crate::{
CompletionChunk, CompletionRequest, CompletionResponse, CompletionStream, LLMProvider,
Message, MessageRole, Tool, ToolCall, Usage,
};
#[derive(Clone)]
pub struct OpenAIProvider {
client: Client,
api_key: String,
model: String,
base_url: String,
max_tokens: Option<u32>,
_temperature: Option<f32>,
}
impl OpenAIProvider {
pub fn new(
api_key: String,
model: Option<String>,
base_url: Option<String>,
max_tokens: Option<u32>,
temperature: Option<f32>,
) -> Result<Self> {
Ok(Self {
client: Client::new(),
api_key,
model: model.unwrap_or_else(|| "gpt-4o".to_string()),
base_url: base_url.unwrap_or_else(|| "https://api.openai.com/v1".to_string()),
max_tokens,
_temperature: temperature,
})
}
fn create_request_body(
&self,
messages: &[Message],
tools: Option<&[Tool]>,
stream: bool,
max_tokens: Option<u32>,
_temperature: Option<f32>,
) -> serde_json::Value {
let mut body = json!({
"model": self.model,
"messages": convert_messages(messages),
"stream": stream,
});
if let Some(max_tokens) = max_tokens.or(self.max_tokens) {
body["max_completion_tokens"] = json!(max_tokens);
}
// Some OpenAI models reject an explicit temperature setting, so don't send one.
// if let Some(temperature) = temperature.or(self.temperature) {
// body["temperature"] = json!(temperature);
// }
if let Some(tools) = tools {
if !tools.is_empty() {
body["tools"] = json!(convert_tools(tools));
}
}
if stream {
body["stream_options"] = json!({
"include_usage": true,
});
}
body
}
async fn parse_streaming_response(
&self,
mut stream: impl futures_util::Stream<Item = reqwest::Result<Bytes>> + Unpin,
tx: mpsc::Sender<Result<CompletionChunk>>,
) -> Option<Usage> {
let mut buffer = String::new();
let mut accumulated_content = String::new();
let mut accumulated_usage: Option<Usage> = None;
let mut current_tool_calls: Vec<OpenAIStreamingToolCall> = Vec::new();
while let Some(chunk_result) = stream.next().await {
match chunk_result {
Ok(chunk) => {
let chunk_str = match std::str::from_utf8(&chunk) {
Ok(s) => s,
Err(e) => {
error!("Failed to parse chunk as UTF-8: {}", e);
continue;
}
};
buffer.push_str(chunk_str);
// Process complete lines
while let Some(line_end) = buffer.find('\n') {
let line = buffer[..line_end].trim().to_string();
buffer.drain(..line_end + 1);
if line.is_empty() {
continue;
}
// Parse Server-Sent Events format
if let Some(data) = line.strip_prefix("data: ") {
if data == "[DONE]" {
debug!("Received stream completion marker");
// Send final chunk with accumulated content and tool calls
if !accumulated_content.is_empty() || !current_tool_calls.is_empty() {
let tool_calls = if current_tool_calls.is_empty() {
None
} else {
Some(
current_tool_calls
.iter()
.filter_map(|tc| tc.to_tool_call())
.collect(),
)
};
let final_chunk = CompletionChunk {
content: accumulated_content.clone(),
finished: true,
tool_calls,
usage: accumulated_usage.clone(),
};
let _ = tx.send(Ok(final_chunk)).await;
}
return accumulated_usage;
}
// Parse the JSON data
match serde_json::from_str::<OpenAIStreamChunk>(data) {
Ok(chunk_data) => {
// Handle content
for choice in &chunk_data.choices {
if let Some(content) = &choice.delta.content {
accumulated_content.push_str(content);
let chunk = CompletionChunk {
content: content.clone(),
finished: false,
tool_calls: None,
usage: None,
};
if tx.send(Ok(chunk)).await.is_err() {
debug!("Receiver dropped, stopping stream");
return accumulated_usage;
}
}
// Handle tool calls
if let Some(delta_tool_calls) = &choice.delta.tool_calls {
for delta_tool_call in delta_tool_calls {
if let Some(index) = delta_tool_call.index {
// Ensure we have enough tool calls in our vector
while current_tool_calls.len() <= index {
current_tool_calls
.push(OpenAIStreamingToolCall::default());
}
let tool_call = &mut current_tool_calls[index];
if let Some(id) = &delta_tool_call.id {
tool_call.id = Some(id.clone());
}
if let Some(function) = &delta_tool_call.function {
if let Some(name) = &function.name {
tool_call.name = Some(name.clone());
}
if let Some(arguments) = &function.arguments {
tool_call.arguments.push_str(arguments);
}
}
}
}
}
}
// Handle usage
if let Some(usage) = chunk_data.usage {
accumulated_usage = Some(Usage {
prompt_tokens: usage.prompt_tokens,
completion_tokens: usage.completion_tokens,
total_tokens: usage.total_tokens,
});
}
}
Err(e) => {
debug!("Failed to parse stream chunk: {} - Data: {}", e, data);
}
}
}
}
}
Err(e) => {
error!("Stream error: {}", e);
let _ = tx.send(Err(anyhow::anyhow!("Stream error: {}", e))).await;
return accumulated_usage;
}
}
}
// Send final chunk if we haven't already
let tool_calls = if current_tool_calls.is_empty() {
None
} else {
Some(
current_tool_calls
.iter()
.filter_map(|tc| tc.to_tool_call())
.collect(),
)
};
let final_chunk = CompletionChunk {
content: String::new(),
finished: true,
tool_calls,
usage: accumulated_usage.clone(),
};
let _ = tx.send(Ok(final_chunk)).await;
accumulated_usage
}
}
#[async_trait]
impl LLMProvider for OpenAIProvider {
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse> {
debug!(
"Processing OpenAI completion request with {} messages",
request.messages.len()
);
let body = self.create_request_body(
&request.messages,
request.tools.as_deref(),
false,
request.max_tokens,
request.temperature,
);
debug!("Sending request to OpenAI API: model={}", self.model);
let response = self
.client
.post(&format!("{}/chat/completions", self.base_url))
.header("Authorization", format!("Bearer {}", self.api_key))
.json(&body)
.send()
.await?;
let status = response.status();
if !status.is_success() {
let error_text = response
.text()
.await
.unwrap_or_else(|_| "Unknown error".to_string());
return Err(anyhow::anyhow!("OpenAI API error {}: {}", status, error_text));
}
let openai_response: OpenAIResponse = response.json().await?;
let content = openai_response
.choices
.first()
.and_then(|choice| choice.message.content.clone())
.unwrap_or_default();
let usage = Usage {
prompt_tokens: openai_response.usage.prompt_tokens,
completion_tokens: openai_response.usage.completion_tokens,
total_tokens: openai_response.usage.total_tokens,
};
debug!(
"OpenAI completion successful: {} tokens generated",
usage.completion_tokens
);
Ok(CompletionResponse {
content,
usage,
model: self.model.clone(),
})
}
async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream> {
debug!(
"Processing OpenAI streaming request with {} messages",
request.messages.len()
);
let body = self.create_request_body(
&request.messages,
request.tools.as_deref(),
true,
request.max_tokens,
request.temperature,
);
debug!("Sending streaming request to OpenAI API: model={}", self.model);
let response = self
.client
.post(&format!("{}/chat/completions", self.base_url))
.header("Authorization", format!("Bearer {}", self.api_key))
.json(&body)
.send()
.await?;
let status = response.status();
if !status.is_success() {
let error_text = response
.text()
.await
.unwrap_or_else(|_| "Unknown error".to_string());
return Err(anyhow::anyhow!("OpenAI API error {}: {}", status, error_text));
}
let stream = response.bytes_stream();
let (tx, rx) = mpsc::channel(100);
// Spawn task to process the stream
let provider = self.clone();
tokio::spawn(async move {
let usage = provider.parse_streaming_response(stream, tx).await;
// Log the final usage if available
if let Some(usage) = usage {
debug!(
"Stream completed with usage - prompt: {}, completion: {}, total: {}",
usage.prompt_tokens, usage.completion_tokens, usage.total_tokens
);
}
});
Ok(ReceiverStream::new(rx))
}
fn name(&self) -> &str {
"openai"
}
fn model(&self) -> &str {
&self.model
}
fn has_native_tool_calling(&self) -> bool {
// OpenAI models support native tool calling
true
}
}
fn convert_messages(messages: &[Message]) -> Vec<serde_json::Value> {
messages
.iter()
.map(|msg| {
json!({
"role": match msg.role {
MessageRole::System => "system",
MessageRole::User => "user",
MessageRole::Assistant => "assistant",
},
"content": msg.content,
})
})
.collect()
}
fn convert_tools(tools: &[Tool]) -> Vec<serde_json::Value> {
tools
.iter()
.map(|tool| {
json!({
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.input_schema,
}
})
})
.collect()
}
// OpenAI API response structures
#[derive(Debug, Deserialize)]
struct OpenAIResponse {
choices: Vec<OpenAIChoice>,
usage: OpenAIUsage,
}
#[derive(Debug, Deserialize)]
struct OpenAIChoice {
message: OpenAIMessage,
}
#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct OpenAIMessage {
content: Option<String>,
#[serde(default)]
tool_calls: Option<Vec<OpenAIToolCall>>,
}
#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct OpenAIToolCall {
id: String,
function: OpenAIFunction,
}
#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct OpenAIFunction {
name: String,
arguments: String,
}
// Streaming tool call accumulator
#[derive(Debug, Default)]
struct OpenAIStreamingToolCall {
id: Option<String>,
name: Option<String>,
arguments: String,
}
impl OpenAIStreamingToolCall {
fn to_tool_call(&self) -> Option<ToolCall> {
let id = self.id.as_ref()?;
let name = self.name.as_ref()?;
let args = serde_json::from_str(&self.arguments).unwrap_or(serde_json::Value::Null);
Some(ToolCall {
id: id.clone(),
tool: name.clone(),
args,
})
}
}
#[derive(Debug, Deserialize)]
struct OpenAIUsage {
prompt_tokens: u32,
completion_tokens: u32,
total_tokens: u32,
}
// Streaming response structures
#[derive(Debug, Deserialize)]
struct OpenAIStreamChunk {
choices: Vec<OpenAIStreamChoice>,
usage: Option<OpenAIUsage>,
}
#[derive(Debug, Deserialize)]
struct OpenAIStreamChoice {
delta: OpenAIDelta,
}
#[derive(Debug, Deserialize)]
struct OpenAIDelta {
content: Option<String>,
#[serde(default)]
tool_calls: Option<Vec<OpenAIDeltaToolCall>>,
}
#[derive(Debug, Deserialize)]
struct OpenAIDeltaToolCall {
index: Option<usize>,
id: Option<String>,
function: Option<OpenAIDeltaFunction>,
}
#[derive(Debug, Deserialize)]
struct OpenAIDeltaFunction {
name: Option<String>,
arguments: Option<String>,
}


@@ -0,0 +1,75 @@
# Coach-Player Provider Configuration
G3 now supports specifying different LLM providers for the coach and player agents when running in autonomous mode. This allows you to optimize for different requirements:
- **Player**: The agent that implements code - might benefit from a faster, more cost-effective model
- **Coach**: The agent that reviews code - might benefit from a more powerful, analytical model
## Configuration
In your `config.toml` file, under the `[providers]` section, you can specify:
```toml
[providers]
default_provider = "databricks" # Used for normal operations
coach = "databricks" # Provider for coach (code reviewer)
player = "anthropic" # Provider for player (code implementer)
```
If `coach` or `player` are not specified, they will default to using the `default_provider`.
## Example Use Cases
### Cost Optimization
Use a cheaper, faster model for initial implementations (player) and a more powerful model for review (coach). Note that each role key selects a provider by name; the model a role uses comes from that provider's own configuration section:
```toml
coach = "anthropic"   # e.g. a Claude Sonnet config for thorough review
player = "anthropic"  # e.g. a Claude Haiku config for quick implementation
```
### Speed vs Quality Trade-off
Use a local embedded model for fast iterations (player) and a cloud model for quality review (coach):
```toml
coach = "databricks" # Cloud model for quality review
player = "embedded" # Local model for fast implementation
```
### Specialized Models
Use different models optimized for different tasks:
```toml
coach = "databricks" # Model fine-tuned for code review
player = "openai" # Model optimized for code generation
```
## Requirements
- Both providers must be properly configured in your config file
- Each provider must have valid credentials
- The models specified for each provider must be accessible
## How It Works
When running in autonomous mode (`g3 --autonomous`), the system will:
1. Use the `player` provider (or default) for the initial implementation
2. Switch to the `coach` provider (or default) for code review
3. Return to the `player` provider for implementing feedback
4. Continue this cycle for the specified number of turns
The providers are logged at startup so you can verify which models are being used:
```
🎮 Player provider: anthropic
👨‍🏫 Coach provider: databricks
Using different providers for player and coach
```
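The fallback rule described above (each role uses its configured provider, else `default_provider`) can be sketched as follows. The struct and field names here are illustrative, not the actual `g3-config` types:

```rust
// Hypothetical mirror of the [providers] role keys: coach and player are
// optional overrides that fall back to default_provider when unset.
struct Providers {
    default_provider: String,
    coach: Option<String>,
    player: Option<String>,
}

impl Providers {
    fn coach_provider(&self) -> &str {
        self.coach.as_deref().unwrap_or(&self.default_provider)
    }
    fn player_provider(&self) -> &str {
        self.player.as_deref().unwrap_or(&self.default_provider)
    }
}

fn main() {
    let p = Providers {
        default_provider: "databricks".into(),
        coach: None,
        player: Some("anthropic".into()),
    };
    assert_eq!(p.coach_provider(), "databricks"); // unset: falls back to default
    assert_eq!(p.player_provider(), "anthropic"); // explicitly overridden
    println!("ok");
}
```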
## Benefits
- **Cost Efficiency**: Use expensive models only where they add the most value
- **Speed Optimization**: Use faster models for iterative development
- **Specialization**: Leverage models that excel at specific tasks
- **Flexibility**: Easy to experiment with different provider combinations

test-ai-requirements.sh (new executable file)

@@ -0,0 +1,39 @@
#!/bin/bash
# Test script for AI-enhanced interactive requirements mode
echo "Testing AI-enhanced interactive requirements mode..."
echo ""
# Create a test workspace
TEST_WORKSPACE="/tmp/g3-test-interactive-$(date +%s)"
mkdir -p "$TEST_WORKSPACE"
echo "Test workspace: $TEST_WORKSPACE"
echo ""
# Create sample brief input
BRIEF_INPUT="build a calculator cli in rust with basic operations"
echo "Brief input:"
echo "---"
echo "$BRIEF_INPUT"
echo "---"
echo ""
echo "This will:"
echo "1. Send brief input to AI"
echo "2. AI generates structured requirements.md"
echo "3. Show enhanced requirements"
echo "4. Prompt for confirmation (y/e/n)"
echo ""
echo "To test manually, run:"
echo "cargo run -- --autonomous --interactive-requirements --workspace $TEST_WORKSPACE"
echo ""
echo "Then type: $BRIEF_INPUT"
echo "Press Ctrl+D"
echo "Review the AI-generated requirements"
echo "Choose 'y' to proceed, 'e' to edit, or 'n' to cancel"
echo ""
echo "Test workspace will be at: $TEST_WORKSPACE"