Remove vision tools (except take_screenshot) and macax tools

Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool. take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total

@@ -191,7 +191,6 @@ Key modules:
- `platform/` - Platform-specific implementations (macOS, Linux, Windows)
- `webdriver/` - Safari and Chrome WebDriver integration
- `ocr/` - Text extraction (Tesseract, Apple Vision)
- `macax/` - macOS Accessibility API controller

**Platform support**:
- **macOS**: Core Graphics, Cocoa, screencapture, Vision framework

@@ -27,7 +27,6 @@ G3 uses TOML format. The configuration is organized into sections:
[agent] # Agent behavior settings
[computer_control] # Mouse/keyboard automation
[webdriver] # Browser automation
[macax] # macOS Accessibility API
```

## Provider Configuration
@@ -236,13 +235,11 @@ apt install chromium-chromedriver
## macOS Accessibility API Configuration

```toml
[macax]
enabled = false # Set to true to enable
```

**Required permissions**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app

See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage.

## Multi-Role Configuration

@@ -295,7 +292,6 @@ g3 --model claude-opus-4-5
# Enable features
g3 --webdriver # Enable WebDriver (Safari)
g3 --chrome-headless # Enable WebDriver (Chrome headless)
g3 --macax # Enable macOS Accessibility API

# Specify config file
g3 --config /path/to/config.toml
@@ -340,7 +336,6 @@ enabled = true
browser = "safari"
safari_port = 4444

[macax]
enabled = false
```

@@ -1,472 +0,0 @@
# macOS Accessibility Tools Guide

**Last updated**: January 2025
**Source of truth**: `crates/g3-computer-control/src/macax/`

## Purpose

G3 includes tools for controlling macOS applications via the Accessibility API. This enables automation of native macOS apps, including those you're building with G3.

## Overview

The macOS Accessibility API provides programmatic access to UI elements in any application. G3 exposes this through the `macax_*` tools, allowing you to:

- List and activate applications
- Inspect UI element hierarchies
- Find elements by role, title, or identifier
- Click buttons and interact with controls
- Read and set values in text fields
- Simulate keyboard input

## Setup

### 1. Enable in Configuration

```toml
# ~/.config/g3/config.toml
[macax]
enabled = true
```

Or use the CLI flag:

```bash
g3 --macax
```

### 2. Grant Accessibility Permissions

1. Open **System Preferences** → **Security & Privacy** → **Privacy**
2. Select **Accessibility** in the left sidebar
3. Click the lock icon and authenticate
4. Add your terminal application (Terminal, iTerm2, etc.)
5. Restart your terminal

**Note**: If using VS Code's integrated terminal, add VS Code to the list.

### 3. Verify Setup

```json
{"tool": "macax_list_apps", "args": {}}
```

This should return a list of running applications.

## Available Tools

### macax_list_apps

List all running applications.

**Parameters**: None

**Example**:
```json
{"tool": "macax_list_apps", "args": {}}
```

**Returns**:
```
Running Applications:
- Safari (com.apple.Safari)
- Finder (com.apple.finder)
- Terminal (com.apple.Terminal)
- MyApp (com.example.myapp)
```

---

### macax_get_frontmost_app

Get the currently active (frontmost) application.

**Parameters**: None

**Example**:
```json
{"tool": "macax_get_frontmost_app", "args": {}}
```

**Returns**:
```
Frontmost Application: Safari (com.apple.Safari)
```

---

### macax_activate_app

Bring an application to the front.

**Parameters**:
- `app_name` (string, required): Application name

**Example**:
```json
{"tool": "macax_activate_app", "args": {"app_name": "Safari"}}
```

---

### macax_get_ui_tree

Get the UI element hierarchy of an application.

**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Maximum tree depth (default: 5)

**Example**:
```json
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
```

**Returns**:
```
UI Tree for Calculator:
└── AXApplication "Calculator"
    └── AXWindow "Calculator"
        ├── AXGroup
        │   ├── AXButton "1" [id: digit_1]
        │   ├── AXButton "2" [id: digit_2]
        │   ├── AXButton "+" [id: add]
        │   └── AXButton "=" [id: equals]
        └── AXStaticText "0" [id: display]
```

**Notes**:
- Use lower `max_depth` for complex apps to avoid overwhelming output
- Elements show role, title, and accessibility identifier (if set)
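
How a depth limit prunes the output can be sketched in Python. This is an illustration only, not G3's implementation: the nested-dict node layout and the `render_tree` helper are assumptions made for the example, and depth is counted from 1 at the root.

```python
# Hypothetical sketch: depth-limited rendering of a UI tree.
# The node layout (dicts with role/title/children) is assumed for illustration.

def render_tree(node, max_depth=5, depth=1):
    """Return indented lines for `node`, descending at most `max_depth` levels."""
    title = f' "{node["title"]}"' if node.get("title") else ""
    lines = ["    " * (depth - 1) + node["role"] + title]
    if depth < max_depth:  # stop descending once the limit is reached
        for child in node.get("children", []):
            lines.extend(render_tree(child, max_depth, depth + 1))
    return lines


calculator = {
    "role": "AXApplication", "title": "Calculator",
    "children": [{
        "role": "AXWindow", "title": "Calculator",
        "children": [{
            "role": "AXGroup",
            "children": [{"role": "AXButton", "title": "1"}],
        }],
    }],
}

# With max_depth=3 the AXButton level is cut off.
print("\n".join(render_tree(calculator, max_depth=3)))
```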

---

### macax_find_elements

Find UI elements matching criteria.

**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (e.g., "button", "textField")
- `title` (string, optional): Element title/label
- `identifier` (string, optional): Accessibility identifier

**Example**:
```json
{"tool": "macax_find_elements", "args": {
  "app_name": "Safari",
  "role": "button"
}}
```

**Returns**:
```
Found 5 elements:
1. AXButton "Back" [id: BackButton]
2. AXButton "Forward" [id: ForwardButton]
3. AXButton "Reload" [id: ReloadButton]
4. AXButton "Share" [id: ShareButton]
5. AXButton "New Tab" [id: NewTabButton]
```

---

### macax_click

Click a UI element.

**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `role` (string, optional): Element role

At least one of `identifier`, `title`, or `role` must be provided.

**Examples**:

```json
// Click by identifier (most reliable)
{"tool": "macax_click", "args": {
  "app_name": "Calculator",
  "identifier": "digit_5"
}}

// Click by title
{"tool": "macax_click", "args": {
  "app_name": "Calculator",
  "title": "5"
}}

// Click by role and title
{"tool": "macax_click", "args": {
  "app_name": "Safari",
  "role": "button",
  "title": "Reload"
}}
```
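
The rule that at least one of `identifier`, `title`, or `role` must be provided can be checked before a call is sent. The Python sketch below is a hypothetical client-side helper (`build_click_call` is not part of G3); it only assembles the JSON shape shown in the examples above.

```python
# Hypothetical helper: builds a macax_click call and enforces the
# "at least one selector" rule before anything is sent to the agent.
import json

SELECTORS = ("identifier", "title", "role")


def build_click_call(app_name, **selectors):
    unknown = set(selectors) - set(SELECTORS)
    if unknown:
        raise ValueError(f"unknown selector(s): {sorted(unknown)}")
    # Drop selectors that were passed as None or an empty string.
    chosen = {name: value for name, value in selectors.items() if value}
    if not chosen:
        raise ValueError("provide at least one of identifier, title, or role")
    return {"tool": "macax_click", "args": {"app_name": app_name, **chosen}}


print(json.dumps(build_click_call("Safari", role="button", title="Reload")))
```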

---

### macax_set_value

Set the value of a UI element (text fields, sliders, etc.).

**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title
- `value` (string, required): Value to set

**Example**:
```json
{"tool": "macax_set_value", "args": {
  "app_name": "TextEdit",
  "role": "textArea",
  "value": "Hello, World!"
}}
```

---

### macax_get_value

Get the current value of a UI element.

**Parameters**:
- `app_name` (string, required): Application name
- `identifier` (string, optional): Accessibility identifier
- `title` (string, optional): Element title

**Example**:
```json
{"tool": "macax_get_value", "args": {
  "app_name": "Calculator",
  "identifier": "display"
}}
```

**Returns**:
```
Value: 42
```

---

### macax_press_key

Simulate a key press.

**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): Modifier keys

**Supported modifiers**: `command`, `shift`, `option`, `control`

**Examples**:

```json
// Simple key press
{"tool": "macax_press_key", "args": {"key": "a"}}

// With modifiers (Cmd+S)
{"tool": "macax_press_key", "args": {
  "key": "s",
  "modifiers": ["command"]
}}

// Multiple modifiers (Cmd+Shift+N)
{"tool": "macax_press_key", "args": {
  "key": "n",
  "modifiers": ["command", "shift"]
}}

// Special keys
{"tool": "macax_press_key", "args": {"key": "return"}}
{"tool": "macax_press_key", "args": {"key": "escape"}}
{"tool": "macax_press_key", "args": {"key": "tab"}}
{"tool": "macax_press_key", "args": {"key": "delete"}}
```

**Special key names**:
- `return`, `enter`
- `escape`, `esc`
- `tab`
- `delete`, `backspace`
- `space`
- `up`, `down`, `left`, `right`
- `home`, `end`, `pageup`, `pagedown`
- `f1` through `f12`
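
As a rough illustration of how the `key` and `modifiers` arguments fit together, the Python sketch below turns a shortcut string such as `command+shift+n` into a `macax_press_key` call. The `press_key_call` helper and the `+`-separated shortcut syntax are assumptions for this example, not something G3 provides; only the four modifier names come from the list above.

```python
# Hypothetical helper: parse "command+shift+n" into a macax_press_key call.
# Note: with this toy syntax the "+" key itself cannot be expressed.
SUPPORTED_MODIFIERS = {"command", "shift", "option", "control"}


def press_key_call(shortcut):
    # Everything before the last "+" is a modifier, the last part is the key.
    *modifiers, key = [part.strip().lower() for part in shortcut.split("+")]
    if not key:
        raise ValueError("shortcut must end with a key name")
    bad = [m for m in modifiers if m not in SUPPORTED_MODIFIERS]
    if bad:
        raise ValueError(f"unsupported modifier(s): {bad}")
    args = {"key": key}
    if modifiers:
        args["modifiers"] = modifiers
    return {"tool": "macax_press_key", "args": args}


print(press_key_call("command+shift+n"))
print(press_key_call("return"))
```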

## Common Roles

| Role | Description |
|------|-------------|
| `button` | Clickable button |
| `textField` | Single-line text input |
| `textArea` | Multi-line text input |
| `checkbox` | Checkbox control |
| `radioButton` | Radio button |
| `popUpButton` | Dropdown/popup menu |
| `slider` | Slider control |
| `table` | Table view |
| `list` | List view |
| `outline` | Outline/tree view |
| `group` | Container group |
| `window` | Application window |
| `sheet` | Modal sheet |
| `dialog` | Dialog window |
| `staticText` | Non-editable text |
| `image` | Image element |
| `scrollArea` | Scrollable container |
| `toolbar` | Toolbar |
| `menuBar` | Menu bar |
| `menu` | Menu |
| `menuItem` | Menu item |

## Best Practices

### 1. Use Accessibility Identifiers

When building apps you'll automate with G3, add accessibility identifiers:

**SwiftUI**:
```swift
Button("Submit") { ... }
    .accessibilityIdentifier("submit_button")
```

**UIKit**:
```swift
button.accessibilityIdentifier = "submit_button"
```

**AppKit**:
```swift
button.setAccessibilityIdentifier("submit_button")
```

Identifiers are more reliable than titles (which may be localized).

### 2. Inspect Before Automating

Always inspect the UI tree first:

```json
{"tool": "macax_get_ui_tree", "args": {"app_name": "MyApp", "max_depth": 4}}
```

This helps you understand:
- Element hierarchy
- Available identifiers
- Correct role names

### 3. Activate App First

Some actions require the app to be frontmost:

```json
{"tool": "macax_activate_app", "args": {"app_name": "MyApp"}}
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "button1"}}
```

### 4. Handle Timing

UI updates may take time. If an element isn't found:
1. Wait briefly
2. Retry the operation
3. Check if the app state changed
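
The wait-and-retry steps above can be sketched as a small Python loop. The `retry` helper, its defaults, and the `fake_lookup` stand-in are all hypothetical; a real lookup would issue a tool call such as `macax_find_elements` and return `None` when nothing matches.

```python
# Hypothetical sketch of "wait briefly, then retry" for element lookups.
import time


def retry(action, attempts=3, delay=0.5):
    """Call `action` until it returns a non-None result or attempts run out."""
    for attempt in range(1, attempts + 1):
        result = action()
        if result is not None:
            return result
        if attempt < attempts:
            time.sleep(delay)  # wait briefly before the next try
    return None


calls = []


def fake_lookup():
    # Stand-in for a real element lookup: succeeds on the third call.
    calls.append(1)
    return "save_btn" if len(calls) >= 3 else None


found = retry(fake_lookup, attempts=5, delay=0.01)
print(found, "after", len(calls), "calls")
```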

### 5. Prefer Identifiers Over Titles

```json
// Good: Uses identifier
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "save_btn"}}

// Less reliable: Uses title (may be localized)
{"tool": "macax_click", "args": {"app_name": "MyApp", "title": "Save"}}
```

## Example: Automating Calculator

```json
// 1. Activate Calculator
{"tool": "macax_activate_app", "args": {"app_name": "Calculator"}}

// 2. Inspect UI
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}

// 3. Click "5"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "5"}}

// 4. Click "+"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "+"}}

// 5. Click "3"
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "3"}}

// 6. Click "="
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "="}}

// 7. Read result
{"tool": "macax_get_value", "args": {"app_name": "Calculator", "role": "staticText"}}
```
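
The same sequence can be generated rather than written by hand. The Python sketch below is a hypothetical helper (`calculator_calls` is not part of G3) that emits the activate, click, and read calls in the JSON shape used above; it skips the optional inspection step.

```python
# Hypothetical generator for the Calculator walkthrough above.
import json


def calculator_calls(buttons, app_name="Calculator"):
    """Return tool calls that activate the app, click each button, read the display."""
    calls = [{"tool": "macax_activate_app", "args": {"app_name": app_name}}]
    calls += [
        {"tool": "macax_click", "args": {"app_name": app_name, "title": button}}
        for button in buttons  # a string is iterated one character at a time
    ]
    calls.append(
        {"tool": "macax_get_value", "args": {"app_name": app_name, "role": "staticText"}}
    )
    return calls


for call in calculator_calls("5+3="):
    print(json.dumps(call))
```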

## Troubleshooting

### "Accessibility permission denied"

1. Check System Preferences → Security & Privacy → Accessibility
2. Ensure your terminal app is listed and checked
3. Restart the terminal after granting permission

### "Application not found"

1. Use exact app name (case-sensitive)
2. Run `macax_list_apps` to see available apps
3. App must be running

### "Element not found"

1. Inspect UI tree to verify element exists
2. Check identifier/title spelling
3. Element may be in a different window or sheet
4. App state may have changed

### "Cannot perform action"

1. Element may be disabled
2. App may need to be frontmost
3. Element may not support the action
4. Check element role supports the operation

### Slow Performance

1. Reduce `max_depth` in `macax_get_ui_tree`
2. Use specific identifiers instead of searching
3. Complex apps have large UI trees

## Comparison with Other Tools

| Feature | macax | Vision Tools | WebDriver |
|---------|-------|--------------|----------|
| Native apps | ✅ | ✅ (via OCR) | ❌ |
| Web browsers | ✅ | ✅ | ✅ |
| Electron apps | ✅ | ✅ | Partial |
| Reliability | High | Medium | High |
| Setup | Permissions | None | Driver |
| Speed | Fast | Slower | Medium |

**Use macax when**:
- Automating native macOS apps
- You control the app and can add identifiers
- Need reliable, fast automation

**Use Vision tools when**:
- App doesn't expose accessibility
- Need to find text visually
- Cross-platform approach needed

**Use WebDriver when**:
- Automating web content
- Need JavaScript execution
- Testing web applications
114
docs/tools.md
@@ -12,12 +12,10 @@ This document describes all tools available to the G3 agent. Tools are the prima
| Category | Tools | Enabled By |
|----------|-------|------------|
| **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
| **Images** | read_image, take_screenshot, extract_text | Always |
| **Images** | read_image, take_screenshot | Always |
| **Task Management** | todo_read, todo_write | Always |
| **Code Intelligence** | code_search, code_coverage | Always |
| **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` |
| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
| **macOS Accessibility** | macax_* (9 tools) | `--macax` |
| **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |

---
@@ -82,7 +80,6 @@ Read file contents with optional character range.
```

**Notes**:
- For image files (png, jpg, gif, etc.), automatically extracts text using OCR
- Supports tilde expansion (`~`)
- Reports file size and line count

@@ -105,7 +102,6 @@ Read image files for visual analysis by the LLM.
**Notes**:
- Images are sent to the LLM for visual analysis
- Use for inspecting sprites, UI screenshots, diagrams, etc.
- Different from `extract_text` which only does OCR

---

@@ -197,23 +193,6 @@ Capture a screenshot of an application window.

---

### extract_text

Extract text from an image using OCR.

**Parameters**:
- `path` (string, optional): Path to image file

**Example**:
```json
{"tool": "extract_text", "args": {"path": "screenshot.png"}}
```

**Notes**:
- Uses Tesseract OCR or Apple Vision framework
- For window-based OCR, use `vision_find_text` instead

---

## Task Management Tools

@@ -386,98 +365,7 @@ Close browser and end session.

---

## Vision Tools (macOS)

Use Apple Vision framework for text recognition.

### vision_find_text

Find text in an application window.

**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to search for

**Returns**: Bounding box coordinates and confidence score

### vision_click_text

Find and click on text.

**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to click

### vision_click_near_text

Click near a text label (useful for form fields).

**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Label text to find
- `direction` (string, optional): "right", "below", "left", "above" (default: "right")
- `distance` (integer, optional): Pixels from text (default: 50)
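
One plausible reading of `direction` and `distance` is an offset from the edge of the label's bounding box, centered on the other axis. The Python sketch below works under that assumption; the `(x, y, width, height)` box layout with a top-left origin and the `click_point` helper are illustrative only and may not match what the removed tool actually computed.

```python
# Hypothetical geometry sketch for clicking near a text label.
# Assumes box = (x, y, width, height) in pixels, origin top-left, y grows downward.

def click_point(box, direction="right", distance=50):
    x, y, width, height = box
    center_x, center_y = x + width / 2, y + height / 2
    if direction == "right":
        return (x + width + distance, center_y)
    if direction == "left":
        return (x - distance, center_y)
    if direction == "below":
        return (center_x, y + height + distance)
    if direction == "above":
        return (center_x, y - distance)
    raise ValueError(f"unknown direction: {direction!r}")


label_box = (100, 200, 80, 20)  # e.g. the box reported for a form label
print(click_point(label_box))               # defaults: 50 px to the right
print(click_point(label_box, "below", 10))
```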

---

## macOS Accessibility Tools

Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).

### macax_list_apps

List running applications.

### macax_get_frontmost_app

Get the frontmost application.

### macax_activate_app

Bring an application to front.

**Parameters**:
- `app_name` (string, required): Application name

### macax_get_ui_tree

Get UI element hierarchy.

**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Tree depth limit

### macax_find_elements

Find UI elements by criteria.

**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (button, textField, etc.)
- `title` (string, optional): Element title
- `identifier` (string, optional): Accessibility identifier

### macax_click

Click a UI element.

**Parameters**:
- `app_name` (string, required): Application name
- `identifier` or `title` or `role`: Element selector

### macax_set_value / macax_get_value

Set or get element value.

### macax_press_key

Simulate key press.

**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): ["command", "shift", "option", "control"]

---

## Computer Control Tools
