Remove vision tools (except take_screenshot) and macax tools

Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via the read_image tool.
take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total
Dhanji R. Prasanna
2026-01-03 17:38:25 +11:00
parent 29e263ac49
commit 386176899e
19 changed files with 15 additions and 1408 deletions

@@ -27,7 +27,6 @@ G3 uses TOML format. The configuration is organized into sections:
[agent] # Agent behavior settings
[computer_control] # Mouse/keyboard automation
[webdriver] # Browser automation
[macax] # macOS Accessibility API
```
## Provider Configuration
@@ -236,13 +235,11 @@ apt install chromium-chromedriver
## macOS Accessibility API Configuration
```toml
[macax]
enabled = false # Set to true to enable
```
**Required permissions**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app
See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage.
## Multi-Role Configuration
@@ -295,7 +292,6 @@ g3 --model claude-opus-4-5
# Enable features
g3 --webdriver # Enable WebDriver (Safari)
g3 --chrome-headless # Enable WebDriver (Chrome headless)
g3 --macax # Enable macOS Accessibility API
# Specify config file
g3 --config /path/to/config.toml
@@ -340,7 +336,6 @@ enabled = true
browser = "safari"
safari_port = 4444
[macax]
enabled = false
```