diff --git a/AGENTS.md b/AGENTS.md
index 7779dfa..9293822 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -202,4 +202,3 @@ g3/
 - [Control Commands](docs/CONTROL_COMMANDS.md) - Interactive commands
 - [Code Search](docs/CODE_SEARCH.md) - Tree-sitter search guide
 - [Flock Mode](docs/FLOCK_MODE.md) - Multi-agent development
-- [macOS Accessibility](docs/macax-tools.md) - macOS automation
diff --git a/DESIGN.md b/DESIGN.md
index 4e25b24..568ca0e 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -106,7 +106,6 @@ g3/
 - `type_text`: Type text at the current cursor position
 - `find_element`: Find UI elements by text, role, or attributes
 - `take_screenshot`: Capture screenshots of screen, region, or window
-- `extract_text`: Extract text from images or screen regions using OCR
 - `find_text_on_screen`: Find text visually on screen and return coordinates
 - `list_windows`: List all open windows with IDs and titles
 
diff --git a/README.md b/README.md
index 4f61b51..0cad9b5 100644
--- a/README.md
+++ b/README.md
@@ -103,10 +103,8 @@ These commands give you fine-grained control over context management, allowing y
 - **TODO Management**: Read and write TODO lists with markdown checkbox format
 - **Computer Control** (Experimental): Automate desktop applications
   - Mouse and keyboard control
-  - macOS Accessibility API for native app automation (via `--macax` flag)
   - UI element inspection
   - Screenshot capture and window management
-  - OCR text extraction from images and screen regions
   - Window listing and identification
 - **Code Search**: Embedded tree-sitter for syntax-aware code search (Rust, Python, JavaScript, TypeScript, Go, Java, C, C++) - see [Code Search Guide](docs/CODE_SEARCH.md)
 - **Final Output**: Formatted result presentation
@@ -305,24 +303,11 @@ chrome_binary = "/Users/yourname/.chrome-for-testing/chrome-mac-arm64/Google Chr
 
 **Note**: If you see "ChromeDriver version doesn't match Chrome version" errors, use Option 1 (Chrome for Testing) which bundles matching versions.
 
-## macOS Accessibility API Tools
-
-G3 includes support for controlling macOS applications via the Accessibility API, allowing you to automate native macOS apps.
-
-**Available Tools**: `macax_list_apps`, `macax_get_frontmost_app`, `macax_activate_app`, `macax_get_ui_tree`, `macax_find_elements`, `macax_click`, `macax_set_value`, `macax_get_value`, `macax_press_key`
-
-**Setup**: Enable with the `--macax` flag or in config with `macax.enabled = true`. Grant accessibility permissions:
-- **macOS**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app
-
-**For detailed documentation**, see [macOS Accessibility Tools Guide](docs/macax-tools.md).
-
-**Note**: This is particularly useful for testing and automating apps you're building with G3, as you can add accessibility identifiers to your UI elements.
-
 ## Computer Control (Experimental)
 
 G3 can interact with your computer's GUI for automation tasks:
 
-**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`
+**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `list_windows`
 
 **Setup**: Enable in config with `computer_control.enabled = true` and grant OS accessibility permissions:
 - **macOS**: System Preferences → Security & Privacy → Accessibility
@@ -351,7 +336,6 @@ Detailed documentation is available in the `docs/` directory:
 | [Control Commands](docs/CONTROL_COMMANDS.md) | Interactive `/` commands for context management |
 | [Code Search](docs/CODE_SEARCH.md) | Tree-sitter code search query patterns |
 | [Flock Mode](docs/FLOCK_MODE.md) | Parallel multi-agent development |
-| [macOS Accessibility](docs/macax-tools.md) | macOS Accessibility API automation |
 
 For AI agents working with this codebase, see [AGENTS.md](AGENTS.md).
 
diff --git a/crates/g3-cli/src/lib.rs b/crates/g3-cli/src/lib.rs index 98abf4a..05d47c5 100644 --- a/crates/g3-cli/src/lib.rs +++ b/crates/g3-cli/src/lib.rs @@ -345,10 +345,6 @@ pub struct Cli { #[arg(long)] pub quiet: bool, - /// Enable macOS Accessibility API tools for native app automation - #[arg(long)] - pub macax: bool, - /// Enable WebDriver browser automation tools #[arg(long)] pub webdriver: bool, @@ -540,11 +536,6 @@ pub async fn run() -> Result<()> { cli.model.clone(), )?; - // Apply macax flag override - if cli.macax { - config.macax.enabled = true; - } - // Apply webdriver flag override if cli.webdriver { config.webdriver.enabled = true; @@ -992,11 +983,6 @@ async fn run_accumulative_mode( cli.model.clone(), )?; - // Apply macax flag override - if cli.macax { - config.macax.enabled = true; - } - // Apply webdriver flag override if cli.webdriver { config.webdriver.enabled = true; @@ -1099,11 +1085,6 @@ async fn run_accumulative_mode( cli.model.clone(), )?; - // Apply macax flag override - if cli.macax { - config.macax.enabled = true; - } - // Apply webdriver flag override if cli.webdriver { config.webdriver.enabled = true; @@ -2604,7 +2585,7 @@ Review the current state of the project and provide a concise critique focusing 2. Whether the project compiles successfully 3. What requirements are missing or incorrect 4. Specific improvements needed to satisfy requirements -5. Use UI tools such as webdriver or macax to test functionality thoroughly +5. Use UI tools such as webdriver to test functionality thoroughly CRITICAL INSTRUCTIONS: 1. 
You MUST use the final_output tool to provide your feedback diff --git a/crates/g3-config/src/lib.rs b/crates/g3-config/src/lib.rs index a3f5910..3c21a2b 100644 --- a/crates/g3-config/src/lib.rs +++ b/crates/g3-config/src/lib.rs @@ -10,7 +10,6 @@ pub struct Config { pub agent: AgentConfig, pub computer_control: ComputerControlConfig, pub webdriver: WebDriverConfig, - pub macax: MacAxConfig, } /// Provider configuration with named configs per provider type @@ -139,17 +138,6 @@ pub struct WebDriverConfig { pub browser: WebDriverBrowser, } -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct MacAxConfig { - pub enabled: bool, -} - -impl Default for MacAxConfig { - fn default() -> Self { - Self { enabled: false } - } -} - impl Default for WebDriverConfig { fn default() -> Self { Self { @@ -212,7 +200,6 @@ impl Default for Config { }, computer_control: ComputerControlConfig::default(), webdriver: WebDriverConfig::default(), - macax: MacAxConfig::default(), } } } diff --git a/crates/g3-config/src/tests.rs b/crates/g3-config/src/tests.rs index 9c4540c..010ce42 100644 --- a/crates/g3-config/src/tests.rs +++ b/crates/g3-config/src/tests.rs @@ -14,9 +14,6 @@ max_actions_per_second = 10 [webdriver] enabled = false safari_port = 4444 - -[macax] -enabled = false "# } diff --git a/crates/g3-core/src/lib.rs b/crates/g3-core/src/lib.rs index ae0494b..d72b095 100644 --- a/crates/g3-core/src/lib.rs +++ b/crates/g3-core/src/lib.rs @@ -108,8 +108,6 @@ pub struct Agent { >, >, webdriver_process: std::sync::Arc>>, - macax_controller: - std::sync::Arc>>, tool_call_count: usize, requirements_sha: Option, /// Working directory for tool execution (set by --codebase-fast-start) @@ -389,9 +387,6 @@ impl Agent { None }; - // Capture macax_enabled before moving config - let macax_enabled = config.macax.enabled; - Ok(Self { providers, context_window, @@ -411,13 +406,6 @@ impl Agent { computer_controller, webdriver_session: std::sync::Arc::new(tokio::sync::RwLock::new(None)), 
webdriver_process: std::sync::Arc::new(tokio::sync::RwLock::new(None)), - macax_controller: { - std::sync::Arc::new(tokio::sync::RwLock::new(if macax_enabled { - Some(g3_computer_control::MacAxController::new()?) - } else { - None - })) - }, tool_call_count: 0, requirements_sha: None, working_dir: None, @@ -921,7 +909,6 @@ impl Agent { Some(tool_definitions::create_tool_definitions( tool_definitions::ToolConfig::new( self.config.webdriver.enabled, - self.config.macax.enabled, self.config.computer_control.enabled, ))) } else { @@ -2674,7 +2661,6 @@ impl Agent { request.tools = Some(tool_definitions::create_tool_definitions( tool_definitions::ToolConfig::new( self.config.webdriver.enabled, - self.config.macax.enabled, self.config.computer_control.enabled, ))); } @@ -3289,7 +3275,6 @@ impl Agent { computer_controller: self.computer_controller.as_ref(), webdriver_session: &self.webdriver_session, webdriver_process: &self.webdriver_process, - macax_controller: &self.macax_controller, background_process_manager: &self.background_process_manager, todo_content: &self.todo_content, pending_images: &mut self.pending_images, diff --git a/crates/g3-core/src/tool_definitions.rs b/crates/g3-core/src/tool_definitions.rs index 5c2c245..5a45074 100644 --- a/crates/g3-core/src/tool_definitions.rs +++ b/crates/g3-core/src/tool_definitions.rs @@ -11,15 +11,13 @@ use serde_json::json; #[derive(Debug, Clone, Copy, Default)] pub struct ToolConfig { pub webdriver: bool, - pub macax: bool, pub computer_control: bool, } impl ToolConfig { - pub fn new(webdriver: bool, macax: bool, computer_control: bool) -> Self { + pub fn new(webdriver: bool, computer_control: bool) -> Self { Self { webdriver, - macax, computer_control, } } @@ -36,14 +34,6 @@ pub fn create_tool_definitions(config: ToolConfig) -> Vec { tools.extend(create_webdriver_tools()); } - if config.macax { - tools.extend(create_macax_tools()); - } - - if config.computer_control { - tools.extend(create_computer_control_tools()); - } - 
tools } @@ -88,7 +78,7 @@ fn create_core_tools() -> Vec { }, Tool { name: "read_file".to_string(), - description: "Read the contents of a file. For image files (png, jpg, jpeg, gif, bmp, tiff, webp), automatically extracts text using OCR. For text files, optionally read a specific character range.".to_string(), + description: "Read the contents of a file. Optionally read a specific character range.".to_string(), input_schema: json!({ "type": "object", "properties": { @@ -208,19 +198,6 @@ fn create_core_tools() -> Vec { "required": ["path", "window_id"] }), }, - Tool { - name: "extract_text".to_string(), - description: "Extract text from an image file using OCR. For extracting text from a specific window, use vision_find_text instead which automatically handles window capture.".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "path": { - "type": "string", - "description": "Path to image file (optional if region is provided)" - }, - } - }), - }, Tool { name: "todo_read".to_string(), description: "Read your current TODO list from todo.g3.md file in the session directory. Shows what tasks are planned and their status. Call this at the start of multi-step tasks to check for existing plans, and during execution to review progress before updating. 
TODO lists are scoped to the current session.".to_string(), @@ -476,174 +453,6 @@ fn create_webdriver_tools() -> Vec { ] } -/// Create macOS Accessibility tools -fn create_macax_tools() -> Vec { - vec![ - Tool { - name: "macax_list_apps".to_string(), - description: "List all running applications that can be controlled via macOS Accessibility API".to_string(), - input_schema: json!({ - "type": "object", - "properties": {}, - "required": [] - }), - }, - Tool { - name: "macax_get_frontmost_app".to_string(), - description: "Get the name of the currently active (frontmost) application".to_string(), - input_schema: json!({ - "type": "object", - "properties": {}, - "required": [] - }), - }, - Tool { - name: "macax_activate_app".to_string(), - description: "Bring an application to the front (activate it)".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "app_name": { - "type": "string", - "description": "Name of the application to activate (e.g., 'Safari', 'TextEdit')" - } - }, - "required": ["app_name"] - }), - }, - Tool { - name: "macax_press_key".to_string(), - description: "Press a keyboard key or shortcut in an application (e.g., Cmd+S to save)".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "app_name": { - "type": "string", - "description": "Name of the application" - }, - "key": { - "type": "string", - "description": "Key to press (e.g., 's', 'return', 'tab')" - }, - "modifiers": { - "type": "array", - "items": { - "type": "string" - }, - "description": "Modifier keys (e.g., ['command', 'shift'])" - } - }, - "required": ["app_name", "key"] - }), - }, - Tool { - name: "macax_type_text".to_string(), - description: "Type arbitrary text into the currently focused element in an application (supports unicode, emojis, etc.)".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "app_name": { - "type": "string", - "description": "Name of the application" - }, - "text": { - "type": "string", - 
"description": "Text to type (can include unicode, emojis, special characters)" - } - }, - "required": ["app_name", "text"] - }), - }, - Tool { - name: "extract_text_with_boxes".to_string(), - description: "Extract all text from an image file with bounding box coordinates for each text element. Returns JSON array with text, position (x, y), size (width, height), and confidence for each detected text. Uses Apple Vision Framework for precise sub-pixel accuracy.".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "path": { - "type": "string", - "description": "Path to image file to extract text from" - }, - "app_name": { - "type": "string", - "description": "Optional: Name of application to screenshot first (e.g., 'Safari', 'Things3'). If provided, takes screenshot of app before extracting text." - } - }, - "required": ["path"] - }), - }, - ] -} - -/// Create computer control / vision-guided tools -fn create_computer_control_tools() -> Vec { - vec![ - Tool { - name: "vision_find_text".to_string(), - description: "Find text in a specific application window and return its location with bounding box coordinates (x, y, width, height) and confidence score. Useful for locating UI elements. 
Uses Apple Vision Framework for precise sub-pixel accuracy.".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "app_name": { - "type": "string", - "description": "Name of the application to search in (e.g., 'Things3', 'Safari', 'TextEdit')" - }, - "text": { - "type": "string", - "description": "The text to search for on screen" - } - }, - "required": ["app_name", "text"] - }), - }, - Tool { - name: "vision_click_text".to_string(), - description: "Find text in a specific application window and click on it (useful for clicking buttons, links, menu items)".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "app_name": { - "type": "string", - "description": "Name of the application (e.g., 'Things3', 'Safari', 'TextEdit')" - }, - "text": { - "type": "string", - "description": "The text to click on (e.g., 'Submit', 'OK', 'Cancel', '+')" - } - }, - "required": ["app_name", "text"] - }), - }, - Tool { - name: "vision_click_near_text".to_string(), - description: "Find text in a specific application window and click near it (useful for clicking text fields next to labels)".to_string(), - input_schema: json!({ - "type": "object", - "properties": { - "app_name": { - "type": "string", - "description": "Name of the application (e.g., 'Things3', 'Safari', 'TextEdit')" - }, - "text": { - "type": "string", - "description": "The label text to find (e.g., 'Name:', 'Email:', 'Task:')" - }, - "direction": { - "type": "string", - "enum": ["right", "below", "left", "above"], - "description": "Direction to click relative to the text (default: right)" - }, - "distance": { - "type": "integer", - "description": "Distance in pixels from the text (default: 50)" - } - }, - "required": ["app_name", "text"] - }), - }, - ] -} - #[cfg(test)] mod tests { use super::*; @@ -652,9 +461,9 @@ mod tests { fn test_core_tools_count() { let tools = create_core_tools(); // Should have the core tools: shell, background_process, read_file, read_image, - // 
write_file, str_replace, final_output, take_screenshot, extract_text, - // todo_read, todo_write, code_coverage, code_search - assert_eq!(tools.len(), 13); + // write_file, str_replace, final_output, take_screenshot, + // todo_read, todo_write, code_coverage, code_search (12 total) + assert_eq!(tools.len(), 12); } #[test] @@ -664,33 +473,19 @@ mod tests { assert_eq!(tools.len(), 15); } - #[test] - fn test_macax_tools_count() { - let tools = create_macax_tools(); - // 6 macax tools - assert_eq!(tools.len(), 6); - } - - #[test] - fn test_computer_control_tools_count() { - let tools = create_computer_control_tools(); - // 3 vision tools - assert_eq!(tools.len(), 3); - } - #[test] fn test_create_tool_definitions_core_only() { let config = ToolConfig::default(); let tools = create_tool_definitions(config); - assert_eq!(tools.len(), 13); + assert_eq!(tools.len(), 12); } #[test] fn test_create_tool_definitions_all_enabled() { - let config = ToolConfig::new(true, true, true); + let config = ToolConfig::new(true, true); let tools = create_tool_definitions(config); - // 13 core + 15 webdriver + 6 macax + 3 computer_control = 37 - assert_eq!(tools.len(), 37); + // 12 core + 15 webdriver = 27 + assert_eq!(tools.len(), 27); } #[test] diff --git a/crates/g3-core/src/tool_dispatch.rs b/crates/g3-core/src/tool_dispatch.rs index bc3ed6c..9024deb 100644 --- a/crates/g3-core/src/tool_dispatch.rs +++ b/crates/g3-core/src/tool_dispatch.rs @@ -7,7 +7,7 @@ use anyhow::Result; use tracing::{debug, warn}; use crate::tools::executor::ToolContext; -use crate::tools::{file_ops, macax, misc, shell, todo, vision, webdriver}; +use crate::tools::{file_ops, misc, shell, todo, webdriver}; use crate::ui_writer::UiWriter; use crate::ToolCall; @@ -43,7 +43,6 @@ pub async fn dispatch_tool( Ok(result) } "take_screenshot" => misc::execute_take_screenshot(tool_call, ctx).await, - "extract_text" => misc::execute_extract_text(tool_call, ctx).await, "code_coverage" => misc::execute_code_coverage(tool_call, 
ctx).await, "code_search" => misc::execute_code_search(tool_call, ctx).await, @@ -64,19 +63,6 @@ pub async fn dispatch_tool( "webdriver_refresh" => webdriver::execute_webdriver_refresh(tool_call, ctx).await, "webdriver_quit" => webdriver::execute_webdriver_quit(tool_call, ctx).await, - // macOS Accessibility tools - "macax_list_apps" => macax::execute_macax_list_apps(tool_call, ctx).await, - "macax_get_frontmost_app" => macax::execute_macax_get_frontmost_app(tool_call, ctx).await, - "macax_activate_app" => macax::execute_macax_activate_app(tool_call, ctx).await, - "macax_press_key" => macax::execute_macax_press_key(tool_call, ctx).await, - "macax_type_text" => macax::execute_macax_type_text(tool_call, ctx).await, - - // Vision tools - "vision_find_text" => vision::execute_vision_find_text(tool_call, ctx).await, - "vision_click_text" => vision::execute_vision_click_text(tool_call, ctx).await, - "vision_click_near_text" => vision::execute_vision_click_near_text(tool_call, ctx).await, - "extract_text_with_boxes" => vision::execute_extract_text_with_boxes(tool_call, ctx).await, - // Unknown tool _ => { warn!("Unknown tool: {}", tool_call.tool); diff --git a/crates/g3-core/src/tools/executor.rs b/crates/g3-core/src/tools/executor.rs index 44b02c7..9733d3b 100644 --- a/crates/g3-core/src/tools/executor.rs +++ b/crates/g3-core/src/tools/executor.rs @@ -20,7 +20,6 @@ pub struct ToolContext<'a, W: UiWriter> { pub computer_controller: Option<&'a Box>, pub webdriver_session: &'a Arc>>>>, pub webdriver_process: &'a Arc>>, - pub macax_controller: &'a Arc>>, pub background_process_manager: &'a Arc, pub todo_content: &'a Arc>, pub pending_images: &'a mut Vec, diff --git a/crates/g3-core/src/tools/file_ops.rs b/crates/g3-core/src/tools/file_ops.rs index 711f030..6ed3e73 100644 --- a/crates/g3-core/src/tools/file_ops.rs +++ b/crates/g3-core/src/tools/file_ops.rs @@ -13,7 +13,7 @@ use super::executor::ToolContext; /// Execute the `read_file` tool. 
pub async fn execute_read_file( tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, + _ctx: &ToolContext<'_, W>, ) -> Result { debug!("Processing read_file tool call"); @@ -28,35 +28,6 @@ pub async fn execute_read_file( let resolved_path = resolve_path_with_unicode_fallback(expanded_path.as_ref()); let path_str = resolved_path.as_ref(); - // Check if this is an image file - let is_image = path_str.to_lowercase().ends_with(".png") - || path_str.to_lowercase().ends_with(".jpg") - || path_str.to_lowercase().ends_with(".jpeg") - || path_str.to_lowercase().ends_with(".gif") - || path_str.to_lowercase().ends_with(".bmp") - || path_str.to_lowercase().ends_with(".tiff") - || path_str.to_lowercase().ends_with(".tif") - || path_str.to_lowercase().ends_with(".webp"); - - // If it's an image file, use OCR via extract_text - if is_image { - if let Some(controller) = ctx.computer_controller { - match controller.extract_text_from_image(path_str).await { - Ok(text) => { - return Ok(format!("📄 Image file (OCR extracted):\n{}", text)); - } - Err(e) => { - return Ok(format!( - "❌ Failed to extract text from image '{}': {}", - path_str, e - )); - } - } - } else { - return Ok("❌ Computer control not enabled. Cannot perform OCR on image files. Set computer_control.enabled = true in config.".to_string()); - } - } - // Extract optional start and end positions let start_char = tool_call .args diff --git a/crates/g3-core/src/tools/macax.rs b/crates/g3-core/src/tools/macax.rs deleted file mode 100644 index c0356c7..0000000 --- a/crates/g3-core/src/tools/macax.rs +++ /dev/null @@ -1,178 +0,0 @@ -//! macOS Accessibility API tools. - -use anyhow::Result; -use tracing::debug; - -use crate::ui_writer::UiWriter; -use crate::ToolCall; - -use super::executor::ToolContext; - -/// Execute the `macax_list_apps` tool. 
-pub async fn execute_macax_list_apps( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing macax_list_apps tool call"); - let _ = tool_call; // unused - - if !ctx.config.macax.enabled { - return Ok( - "❌ macOS Accessibility is not enabled. Use --macax flag to enable.".to_string(), - ); - } - - let controller_guard = ctx.macax_controller.read().await; - let controller = match controller_guard.as_ref() { - Some(c) => c, - None => return Ok("❌ macOS Accessibility controller not initialized.".to_string()), - }; - - match controller.list_applications() { - Ok(apps) => { - let app_list: Vec = apps.iter().map(|a| a.name.clone()).collect(); - Ok(format!("Running applications:\n{}", app_list.join("\n"))) - } - Err(e) => Ok(format!("❌ Failed to list applications: {}", e)), - } -} - -/// Execute the `macax_get_frontmost_app` tool. -pub async fn execute_macax_get_frontmost_app( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing macax_get_frontmost_app tool call"); - let _ = tool_call; // unused - - if !ctx.config.macax.enabled { - return Ok( - "❌ macOS Accessibility is not enabled. Use --macax flag to enable.".to_string(), - ); - } - - let controller_guard = ctx.macax_controller.read().await; - let controller = match controller_guard.as_ref() { - Some(c) => c, - None => return Ok("❌ macOS Accessibility controller not initialized.".to_string()), - }; - - match controller.get_frontmost_app() { - Ok(app) => Ok(format!("Frontmost application: {}", app.name)), - Err(e) => Ok(format!("❌ Failed to get frontmost app: {}", e)), - } -} - -/// Execute the `macax_activate_app` tool. -pub async fn execute_macax_activate_app( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing macax_activate_app tool call"); - - if !ctx.config.macax.enabled { - return Ok( - "❌ macOS Accessibility is not enabled. 
Use --macax flag to enable.".to_string(), - ); - } - - let app_name = match tool_call.args.get("app_name").and_then(|v| v.as_str()) { - Some(n) => n, - None => return Ok("❌ Missing app_name argument".to_string()), - }; - - let controller_guard = ctx.macax_controller.read().await; - let controller = match controller_guard.as_ref() { - Some(c) => c, - None => return Ok("❌ macOS Accessibility controller not initialized.".to_string()), - }; - - match controller.activate_app(app_name) { - Ok(_) => Ok(format!("✅ Activated application: {}", app_name)), - Err(e) => Ok(format!("❌ Failed to activate app: {}", e)), - } -} - -/// Execute the `macax_press_key` tool. -pub async fn execute_macax_press_key( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing macax_press_key tool call"); - - if !ctx.config.macax.enabled { - return Ok( - "❌ macOS Accessibility is not enabled. Use --macax flag to enable.".to_string(), - ); - } - - let app_name = match tool_call.args.get("app_name").and_then(|v| v.as_str()) { - Some(n) => n, - None => return Ok("❌ Missing app_name argument".to_string()), - }; - - let key = match tool_call.args.get("key").and_then(|v| v.as_str()) { - Some(k) => k, - None => return Ok("❌ Missing key argument".to_string()), - }; - - let modifiers_vec: Vec<&str> = tool_call - .args - .get("modifiers") - .and_then(|v| v.as_array()) - .map(|arr| arr.iter().filter_map(|v| v.as_str()).collect()) - .unwrap_or_default(); - - let controller_guard = ctx.macax_controller.read().await; - let controller = match controller_guard.as_ref() { - Some(c) => c, - None => return Ok("❌ macOS Accessibility controller not initialized.".to_string()), - }; - - match controller.press_key(app_name, key, modifiers_vec.clone()) { - Ok(_) => { - let modifier_str = if modifiers_vec.is_empty() { - String::new() - } else { - format!(" with modifiers: {}", modifiers_vec.join("+")) - }; - Ok(format!("✅ Pressed key: {}{}", key, modifier_str)) - } - Err(e) => 
Ok(format!("❌ Failed to press key: {}", e)), - } -} - -/// Execute the `macax_type_text` tool. -pub async fn execute_macax_type_text( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing macax_type_text tool call"); - - if !ctx.config.macax.enabled { - return Ok( - "❌ macOS Accessibility is not enabled. Use --macax flag to enable.".to_string(), - ); - } - - let app_name = match tool_call.args.get("app_name").and_then(|v| v.as_str()) { - Some(n) => n, - None => return Ok("❌ Missing app_name argument".to_string()), - }; - - let text = match tool_call.args.get("text").and_then(|v| v.as_str()) { - Some(t) => t, - None => return Ok("❌ Missing text argument".to_string()), - }; - - let controller_guard = ctx.macax_controller.read().await; - let controller = match controller_guard.as_ref() { - Some(c) => c, - None => return Ok("❌ macOS Accessibility controller not initialized.".to_string()), - }; - - match controller.type_text(app_name, text) { - Ok(_) => Ok(format!("✅ Typed text into {}", app_name)), - Err(e) => Ok(format!("❌ Failed to type text: {}", e)), - } -} diff --git a/crates/g3-core/src/tools/misc.rs b/crates/g3-core/src/tools/misc.rs index 934a8f8..469b8e2 100644 --- a/crates/g3-core/src/tools/misc.rs +++ b/crates/g3-core/src/tools/misc.rs @@ -1,4 +1,4 @@ -//! Miscellaneous tools: final_output, take_screenshot, extract_text, code_coverage, code_search. +//! Miscellaneous tools: final_output, take_screenshot, code_coverage, code_search. use anyhow::Result; use tracing::debug; @@ -118,35 +118,6 @@ pub async fn execute_take_screenshot( } } -/// Execute the `extract_text` tool. -pub async fn execute_extract_text( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing extract_text tool call"); - - let controller = match ctx.computer_controller { - Some(c) => c, - None => { - return Ok( - "❌ Computer control not enabled. Set computer_control.enabled = true in config." 
- .to_string(), - ) - } - }; - - let path = tool_call - .args - .get("path") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing path argument"))?; - - match controller.extract_text_from_image(path).await { - Ok(text) => Ok(format!("✅ Extracted text:\n{}", text)), - Err(e) => Ok(format!("❌ Failed to extract text: {}", e)), - } -} - /// Execute the `code_coverage` tool. pub async fn execute_code_coverage( tool_call: &ToolCall, diff --git a/crates/g3-core/src/tools/mod.rs b/crates/g3-core/src/tools/mod.rs index 4aebf33..46b3e74 100644 --- a/crates/g3-core/src/tools/mod.rs +++ b/crates/g3-core/src/tools/mod.rs @@ -6,17 +6,13 @@ //! - `file_ops` - File reading, writing, and editing //! - `todo` - TODO list management //! - `webdriver` - Browser automation via WebDriver -//! - `macax` - macOS Accessibility API tools -//! - `vision` - Vision-based text finding and clicking //! - `misc` - Other tools (screenshots, code search, etc.) pub mod executor; pub mod file_ops; -pub mod macax; pub mod misc; pub mod shell; pub mod todo; -pub mod vision; pub mod webdriver; pub use executor::ToolExecutor; diff --git a/crates/g3-core/src/tools/vision.rs b/crates/g3-core/src/tools/vision.rs deleted file mode 100644 index 15a3d23..0000000 --- a/crates/g3-core/src/tools/vision.rs +++ /dev/null @@ -1,275 +0,0 @@ -//! Vision-based tools: vision_find_text, vision_click_text, vision_click_near_text, extract_text_with_boxes. - -use anyhow::Result; -use tracing::debug; - -use crate::ui_writer::UiWriter; -use crate::ToolCall; - -use super::executor::ToolContext; - -/// Execute the `vision_find_text` tool. -pub async fn execute_vision_find_text( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing vision_find_text tool call"); - - let controller = match ctx.computer_controller { - Some(c) => c, - None => { - return Ok( - "❌ Computer control not enabled. Set computer_control.enabled = true in config." 
- .to_string(), - ) - } - }; - - let app_name = tool_call - .args - .get("app_name") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing app_name parameter"))?; - - let text = tool_call - .args - .get("text") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing text parameter"))?; - - match controller.find_text_in_app(app_name, text).await { - Ok(Some(location)) => Ok(format!( - "✅ Found '{}' in {} at position ({}, {}) with size {}x{} (confidence: {:.0}%)", - location.text, - app_name, - location.x, - location.y, - location.width, - location.height, - location.confidence * 100.0 - )), - Ok(None) => Ok(format!("❌ Could not find '{}' in {}", text, app_name)), - Err(e) => Ok(format!("❌ Error finding text: {}", e)), - } -} - -/// Execute the `vision_click_text` tool. -pub async fn execute_vision_click_text( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing vision_click_text tool call"); - - let controller = match ctx.computer_controller { - Some(c) => c, - None => { - return Ok( - "❌ Computer control not enabled. Set computer_control.enabled = true in config." 
- .to_string(), - ) - } - }; - - let app_name = tool_call - .args - .get("app_name") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing app_name parameter"))?; - - let text = tool_call - .args - .get("text") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing text parameter"))?; - - match controller.find_text_in_app(app_name, text).await { - Ok(Some(location)) => { - // Click on center of text - // IMPORTANT: location coordinates are in NSScreen space (Y=0 at BOTTOM, increases UPWARD) - // location.x is the LEFT edge of the bounding box - // location.y is the TOP edge of the bounding box (highest Y value in NSScreen space) - // location.width and location.height are already scaled to screen space - // To get center: we need to add half the SCALED width and subtract half the SCALED height - - if location.width == 0 || location.height == 0 { - return Ok(format!( - "❌ Invalid bounding box dimensions: width={}, height={}", - location.width, location.height - )); - } - - debug!( - "[vision_click_text] Location from find_text_in_app: x={}, y={}, width={}, height={}, text='{}'", - location.x, location.y, location.width, location.height, location.text - ); - - // Calculate center using the SCALED dimensions - // X: Use right edge instead of center (Vision OCR bounding box seems offset) - // This gives us: left edge + full width = right edge - // Y: top edge - half of scaled height (subtract because Y increases upward) - let click_x = location.x + location.width; // Right edge - let half_height = location.height / 2; - let click_y = location.y - half_height; - - debug!( - "[vision_click_text] Click position calculation: x={} + {} = {} (right edge), y={} - {} = {}", - location.x, location.width, click_x, location.y, half_height, click_y - ); - - match controller.click_at(click_x, click_y, Some(app_name)) { - Ok(_) => Ok(format!( - "✅ Clicked on '{}' in {} at ({}, {})", - text, app_name, click_x, click_y - )), - Err(e) => Ok(format!("❌ 
Failed to click: {}", e)), - } - } - Ok(None) => Ok(format!("❌ Could not find '{}' in {}", text, app_name)), - Err(e) => Ok(format!("❌ Error finding text: {}", e)), - } -} - -/// Execute the `vision_click_near_text` tool. -pub async fn execute_vision_click_near_text( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing vision_click_near_text tool call"); - - let controller = match ctx.computer_controller { - Some(c) => c, - None => { - return Ok( - "❌ Computer control not enabled. Set computer_control.enabled = true in config." - .to_string(), - ) - } - }; - - let app_name = tool_call - .args - .get("app_name") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing app_name parameter"))?; - - let text = tool_call - .args - .get("text") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing text parameter"))?; - - let direction = tool_call - .args - .get("direction") - .and_then(|v| v.as_str()) - .unwrap_or("right"); - - let distance = tool_call - .args - .get("distance") - .and_then(|v| v.as_i64()) - .unwrap_or(50) as i32; - - match controller.find_text_in_app(app_name, text).await { - Ok(Some(location)) => { - // Calculate click position based on direction - // location.x is LEFT edge, location.y is TOP edge (in NSScreen space) - let (click_x, click_y) = match direction { - "right" => ( - location.x + location.width + distance, - location.y - (location.height / 2), - ), - "below" => ( - location.x + (location.width / 2), - location.y - location.height - distance, - ), - "left" => (location.x - distance, location.y - (location.height / 2)), - "above" => (location.x + (location.width / 2), location.y + distance), - _ => ( - location.x + location.width + distance, - location.y - (location.height / 2), - ), - }; - debug!( - "[vision_click_near_text] Clicking {} of text at ({}, {})", - direction, click_x, click_y - ); - - match controller.click_at(click_x, click_y, Some(app_name)) { - Ok(_) => Ok(format!( 
- "✅ Clicked {} of '{}' in {} at ({}, {})", - direction, text, app_name, click_x, click_y - )), - Err(e) => Ok(format!("❌ Failed to click: {}", e)), - } - } - Ok(None) => Ok(format!("❌ Could not find '{}' in {}", text, app_name)), - Err(e) => Ok(format!("❌ Error finding text: {}", e)), - } -} - -/// Execute the `extract_text_with_boxes` tool. -pub async fn execute_extract_text_with_boxes( - tool_call: &ToolCall, - ctx: &ToolContext<'_, W>, -) -> Result { - debug!("Processing extract_text_with_boxes tool call"); - - if !ctx.config.macax.enabled { - return Ok( - "❌ extract_text_with_boxes requires --macax flag to be enabled".to_string(), - ); - } - - let controller = match ctx.computer_controller { - Some(c) => c, - None => { - return Ok( - "❌ Computer control not enabled. Set computer_control.enabled = true in config." - .to_string(), - ) - } - }; - - let path = tool_call - .args - .get("path") - .and_then(|v| v.as_str()) - .ok_or_else(|| anyhow::anyhow!("Missing path parameter"))?; - - // Optional: take screenshot of app first - let final_path = if let Some(app_name) = tool_call.args.get("app_name").and_then(|v| v.as_str()) - { - let temp_path = format!("/tmp/g3_extract_boxes_{}.png", uuid::Uuid::new_v4()); - match controller - .take_screenshot(&temp_path, None, Some(app_name)) - .await - { - Ok(_) => temp_path, - Err(e) => return Ok(format!("❌ Failed to take screenshot: {}", e)), - } - } else { - path.to_string() - }; - - // Extract text with locations - match controller.extract_text_with_locations(&final_path).await { - Ok(locations) => { - // Clean up temp file if we created one - if final_path != path { - let _ = std::fs::remove_file(&final_path); - } - - // Return as JSON - match serde_json::to_string_pretty(&locations) { - Ok(json) => Ok(format!( - "✅ Extracted {} text elements:\n{}", - locations.len(), - json - )), - Err(e) => Ok(format!("❌ Failed to serialize results: {}", e)), - } - } - Err(e) => Ok(format!("❌ Failed to extract text: {}", e)), - } -} diff 
--git a/docs/architecture.md b/docs/architecture.md index 9598717..58fe54a 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -191,7 +191,6 @@ Key modules: - `platform/` - Platform-specific implementations (macOS, Linux, Windows) - `webdriver/` - Safari and Chrome WebDriver integration - `ocr/` - Text extraction (Tesseract, Apple Vision) -- `macax/` - macOS Accessibility API controller **Platform support**: - **macOS**: Core Graphics, Cocoa, screencapture, Vision framework diff --git a/docs/configuration.md b/docs/configuration.md index 1cf9328..5e85e40 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -27,7 +27,6 @@ G3 uses TOML format. The configuration is organized into sections: [agent] # Agent behavior settings [computer_control] # Mouse/keyboard automation [webdriver] # Browser automation -[macax] # macOS Accessibility API ``` ## Provider Configuration @@ -236,13 +235,11 @@ apt install chromium-chromedriver ## macOS Accessibility API Configuration ```toml -[macax] enabled = false # Set to true to enable ``` **Required permissions**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app -See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage. 
## Multi-Role Configuration @@ -295,7 +292,6 @@ g3 --model claude-opus-4-5 # Enable features g3 --webdriver # Enable WebDriver (Safari) g3 --chrome-headless # Enable WebDriver (Chrome headless) -g3 --macax # Enable macOS Accessibility API # Specify config file g3 --config /path/to/config.toml @@ -340,7 +336,6 @@ enabled = true browser = "safari" safari_port = 4444 -[macax] enabled = false ``` diff --git a/docs/macax-tools.md b/docs/macax-tools.md deleted file mode 100644 index 58d9659..0000000 --- a/docs/macax-tools.md +++ /dev/null @@ -1,472 +0,0 @@ -# macOS Accessibility Tools Guide - -**Last updated**: January 2025 -**Source of truth**: `crates/g3-computer-control/src/macax/` - -## Purpose - -G3 includes tools for controlling macOS applications via the Accessibility API. This enables automation of native macOS apps, including those you're building with G3. - -## Overview - -The macOS Accessibility API provides programmatic access to UI elements in any application. G3 exposes this through the `macax_*` tools, allowing you to: - -- List and activate applications -- Inspect UI element hierarchies -- Find elements by role, title, or identifier -- Click buttons and interact with controls -- Read and set values in text fields -- Simulate keyboard input - -## Setup - -### 1. Enable in Configuration - -```toml -# ~/.config/g3/config.toml -[macax] -enabled = true -``` - -Or use the CLI flag: - -```bash -g3 --macax -``` - -### 2. Grant Accessibility Permissions - -1. Open **System Preferences** → **Security & Privacy** → **Privacy** -2. Select **Accessibility** in the left sidebar -3. Click the lock icon and authenticate -4. Add your terminal application (Terminal, iTerm2, etc.) -5. Restart your terminal - -**Note**: If using VS Code's integrated terminal, add VS Code to the list. - -### 3. Verify Setup - -```json -{"tool": "macax_list_apps", "args": {}} -``` - -This should return a list of running applications. 
- -## Available Tools - -### macax_list_apps - -List all running applications. - -**Parameters**: None - -**Example**: -```json -{"tool": "macax_list_apps", "args": {}} -``` - -**Returns**: -``` -Running Applications: -- Safari (com.apple.Safari) -- Finder (com.apple.finder) -- Terminal (com.apple.Terminal) -- MyApp (com.example.myapp) -``` - ---- - -### macax_get_frontmost_app - -Get the currently active (frontmost) application. - -**Parameters**: None - -**Example**: -```json -{"tool": "macax_get_frontmost_app", "args": {}} -``` - -**Returns**: -``` -Frontmost Application: Safari (com.apple.Safari) -``` - ---- - -### macax_activate_app - -Bring an application to the front. - -**Parameters**: -- `app_name` (string, required): Application name - -**Example**: -```json -{"tool": "macax_activate_app", "args": {"app_name": "Safari"}} -``` - ---- - -### macax_get_ui_tree - -Get the UI element hierarchy of an application. - -**Parameters**: -- `app_name` (string, required): Application name -- `max_depth` (integer, optional): Maximum tree depth (default: 5) - -**Example**: -```json -{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}} -``` - -**Returns**: -``` -UI Tree for Calculator: -└── AXApplication "Calculator" - └── AXWindow "Calculator" - ├── AXGroup - │ ├── AXButton "1" [id: digit_1] - │ ├── AXButton "2" [id: digit_2] - │ ├── AXButton "+" [id: add] - │ └── AXButton "=" [id: equals] - └── AXStaticText "0" [id: display] -``` - -**Notes**: -- Use lower `max_depth` for complex apps to avoid overwhelming output -- Elements show role, title, and accessibility identifier (if set) - ---- - -### macax_find_elements - -Find UI elements matching criteria. 
- -**Parameters**: -- `app_name` (string, required): Application name -- `role` (string, optional): Element role (e.g., "button", "textField") -- `title` (string, optional): Element title/label -- `identifier` (string, optional): Accessibility identifier - -**Example**: -```json -{"tool": "macax_find_elements", "args": { - "app_name": "Safari", - "role": "button" -}} -``` - -**Returns**: -``` -Found 5 elements: -1. AXButton "Back" [id: BackButton] -2. AXButton "Forward" [id: ForwardButton] -3. AXButton "Reload" [id: ReloadButton] -4. AXButton "Share" [id: ShareButton] -5. AXButton "New Tab" [id: NewTabButton] -``` - ---- - -### macax_click - -Click a UI element. - -**Parameters**: -- `app_name` (string, required): Application name -- `identifier` (string, optional): Accessibility identifier -- `title` (string, optional): Element title -- `role` (string, optional): Element role - -At least one of `identifier`, `title`, or `role` must be provided. - -**Examples**: - -```json -// Click by identifier (most reliable) -{"tool": "macax_click", "args": { - "app_name": "Calculator", - "identifier": "digit_5" -}} - -// Click by title -{"tool": "macax_click", "args": { - "app_name": "Calculator", - "title": "5" -}} - -// Click by role and title -{"tool": "macax_click", "args": { - "app_name": "Safari", - "role": "button", - "title": "Reload" -}} -``` - ---- - -### macax_set_value - -Set the value of a UI element (text fields, sliders, etc.). - -**Parameters**: -- `app_name` (string, required): Application name -- `identifier` (string, optional): Accessibility identifier -- `title` (string, optional): Element title -- `value` (string, required): Value to set - -**Example**: -```json -{"tool": "macax_set_value", "args": { - "app_name": "TextEdit", - "role": "textArea", - "value": "Hello, World!" -}} -``` - ---- - -### macax_get_value - -Get the current value of a UI element. 
- -**Parameters**: -- `app_name` (string, required): Application name -- `identifier` (string, optional): Accessibility identifier -- `title` (string, optional): Element title - -**Example**: -```json -{"tool": "macax_get_value", "args": { - "app_name": "Calculator", - "identifier": "display" -}} -``` - -**Returns**: -``` -Value: 42 -``` - ---- - -### macax_press_key - -Simulate a key press. - -**Parameters**: -- `key` (string, required): Key to press -- `modifiers` (array, optional): Modifier keys - -**Supported modifiers**: `command`, `shift`, `option`, `control` - -**Examples**: - -```json -// Simple key press -{"tool": "macax_press_key", "args": {"key": "a"}} - -// With modifiers (Cmd+S) -{"tool": "macax_press_key", "args": { - "key": "s", - "modifiers": ["command"] -}} - -// Multiple modifiers (Cmd+Shift+N) -{"tool": "macax_press_key", "args": { - "key": "n", - "modifiers": ["command", "shift"] -}} - -// Special keys -{"tool": "macax_press_key", "args": {"key": "return"}} -{"tool": "macax_press_key", "args": {"key": "escape"}} -{"tool": "macax_press_key", "args": {"key": "tab"}} -{"tool": "macax_press_key", "args": {"key": "delete"}} -``` - -**Special key names**: -- `return`, `enter` -- `escape`, `esc` -- `tab` -- `delete`, `backspace` -- `space` -- `up`, `down`, `left`, `right` -- `home`, `end`, `pageup`, `pagedown` -- `f1` through `f12` - -## Common Roles - -| Role | Description | -|------|-------------| -| `button` | Clickable button | -| `textField` | Single-line text input | -| `textArea` | Multi-line text input | -| `checkbox` | Checkbox control | -| `radioButton` | Radio button | -| `popUpButton` | Dropdown/popup menu | -| `slider` | Slider control | -| `table` | Table view | -| `list` | List view | -| `outline` | Outline/tree view | -| `group` | Container group | -| `window` | Application window | -| `sheet` | Modal sheet | -| `dialog` | Dialog window | -| `staticText` | Non-editable text | -| `image` | Image element | -| `scrollArea` | Scrollable 
container | -| `toolbar` | Toolbar | -| `menuBar` | Menu bar | -| `menu` | Menu | -| `menuItem` | Menu item | - -## Best Practices - -### 1. Use Accessibility Identifiers - -When building apps you'll automate with G3, add accessibility identifiers: - -**SwiftUI**: -```swift -Button("Submit") { ... } - .accessibilityIdentifier("submit_button") -``` - -**UIKit**: -```swift -button.accessibilityIdentifier = "submit_button" -``` - -**AppKit**: -```swift -button.setAccessibilityIdentifier("submit_button") -``` - -Identifiers are more reliable than titles (which may be localized). - -### 2. Inspect Before Automating - -Always inspect the UI tree first: - -```json -{"tool": "macax_get_ui_tree", "args": {"app_name": "MyApp", "max_depth": 4}} -``` - -This helps you understand: -- Element hierarchy -- Available identifiers -- Correct role names - -### 3. Activate App First - -Some actions require the app to be frontmost: - -```json -{"tool": "macax_activate_app", "args": {"app_name": "MyApp"}} -{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "button1"}} -``` - -### 4. Handle Timing - -UI updates may take time. If an element isn't found: -1. Wait briefly -2. Retry the operation -3. Check if the app state changed - -### 5. Prefer Identifiers Over Titles - -```json -// Good: Uses identifier -{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "save_btn"}} - -// Less reliable: Uses title (may be localized) -{"tool": "macax_click", "args": {"app_name": "MyApp", "title": "Save"}} -``` - -## Example: Automating Calculator - -```json -// 1. Activate Calculator -{"tool": "macax_activate_app", "args": {"app_name": "Calculator"}} - -// 2. Inspect UI -{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}} - -// 3. Click "5" -{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "5"}} - -// 4. Click "+" -{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "+"}} - -// 5. 
Click "3" -{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "3"}} - -// 6. Click "=" -{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "="}} - -// 7. Read result -{"tool": "macax_get_value", "args": {"app_name": "Calculator", "role": "staticText"}} -``` - -## Troubleshooting - -### "Accessibility permission denied" - -1. Check System Preferences → Security & Privacy → Accessibility -2. Ensure your terminal app is listed and checked -3. Restart the terminal after granting permission - -### "Application not found" - -1. Use exact app name (case-sensitive) -2. Run `macax_list_apps` to see available apps -3. App must be running - -### "Element not found" - -1. Inspect UI tree to verify element exists -2. Check identifier/title spelling -3. Element may be in a different window or sheet -4. App state may have changed - -### "Cannot perform action" - -1. Element may be disabled -2. App may need to be frontmost -3. Element may not support the action -4. Check element role supports the operation - -### Slow Performance - -1. Reduce `max_depth` in `macax_get_ui_tree` -2. Use specific identifiers instead of searching -3. 
Complex apps have large UI trees - -## Comparison with Other Tools - -| Feature | macax | Vision Tools | WebDriver | -|---------|-------|--------------|----------| -| Native apps | ✅ | ✅ (via OCR) | ❌ | -| Web browsers | ✅ | ✅ | ✅ | -| Electron apps | ✅ | ✅ | Partial | -| Reliability | High | Medium | High | -| Setup | Permissions | None | Driver | -| Speed | Fast | Slower | Medium | - -**Use macax when**: -- Automating native macOS apps -- You control the app and can add identifiers -- Need reliable, fast automation - -**Use Vision tools when**: -- App doesn't expose accessibility -- Need to find text visually -- Cross-platform approach needed - -**Use WebDriver when**: -- Automating web content -- Need JavaScript execution -- Testing web applications diff --git a/docs/tools.md b/docs/tools.md index 6700ca0..0a87b38 100644 --- a/docs/tools.md +++ b/docs/tools.md @@ -12,12 +12,10 @@ This document describes all tools available to the G3 agent. Tools are the prima | Category | Tools | Enabled By | |----------|-------|------------| | **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always | -| **Images** | read_image, take_screenshot, extract_text | Always | +| **Images** | read_image, take_screenshot | Always | | **Task Management** | todo_read, todo_write | Always | | **Code Intelligence** | code_search, code_coverage | Always | | **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` | -| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) | -| **macOS Accessibility** | macax_* (9 tools) | `--macax` | | **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` | --- @@ -82,7 +80,6 @@ Read file contents with optional character range. 
``` **Notes**: -- For image files (png, jpg, gif, etc.), automatically extracts text using OCR - Supports tilde expansion (`~`) - Reports file size and line count @@ -105,7 +102,6 @@ Read image files for visual analysis by the LLM. **Notes**: - Images are sent to the LLM for visual analysis - Use for inspecting sprites, UI screenshots, diagrams, etc. -- Different from `extract_text` which only does OCR --- @@ -197,23 +193,6 @@ Capture a screenshot of an application window. --- -### extract_text - -Extract text from an image using OCR. - -**Parameters**: -- `path` (string, optional): Path to image file - -**Example**: -```json -{"tool": "extract_text", "args": {"path": "screenshot.png"}} -``` - -**Notes**: -- Uses Tesseract OCR or Apple Vision framework -- For window-based OCR, use `vision_find_text` instead - ---- ## Task Management Tools @@ -386,98 +365,7 @@ Close browser and end session. --- -## Vision Tools (macOS) -Use Apple Vision framework for text recognition. - -### vision_find_text - -Find text in an application window. - -**Parameters**: -- `app_name` (string, required): Application name -- `text` (string, required): Text to search for - -**Returns**: Bounding box coordinates and confidence score - -### vision_click_text - -Find and click on text. - -**Parameters**: -- `app_name` (string, required): Application name -- `text` (string, required): Text to click - -### vision_click_near_text - -Click near a text label (useful for form fields). - -**Parameters**: -- `app_name` (string, required): Application name -- `text` (string, required): Label text to find -- `direction` (string, optional): "right", "below", "left", "above" (default: "right") -- `distance` (integer, optional): Pixels from text (default: 50) - ---- - -## macOS Accessibility Tools - -Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md). - -### macax_list_apps - -List running applications. - -### macax_get_frontmost_app - -Get the frontmost application. 
- -### macax_activate_app - -Bring an application to front. - -**Parameters**: -- `app_name` (string, required): Application name - -### macax_get_ui_tree - -Get UI element hierarchy. - -**Parameters**: -- `app_name` (string, required): Application name -- `max_depth` (integer, optional): Tree depth limit - -### macax_find_elements - -Find UI elements by criteria. - -**Parameters**: -- `app_name` (string, required): Application name -- `role` (string, optional): Element role (button, textField, etc.) -- `title` (string, optional): Element title -- `identifier` (string, optional): Accessibility identifier - -### macax_click - -Click a UI element. - -**Parameters**: -- `app_name` (string, required): Application name -- `identifier` or `title` or `role`: Element selector - -### macax_set_value / macax_get_value - -Set or get element value. - -### macax_press_key - -Simulate key press. - -**Parameters**: -- `key` (string, required): Key to press -- `modifiers` (array, optional): ["command", "shift", "option", "control"] - ---- ## Computer Control Tools