lamport run
This commit is contained in:
538
docs/tools.md
Normal file
538
docs/tools.md
Normal file
@@ -0,0 +1,538 @@
|
||||
# G3 Tools Reference
|
||||
|
||||
**Last updated**: January 2025
|
||||
**Source of truth**: `crates/g3-core/src/tool_definitions.rs`, `crates/g3-core/src/tools/`
|
||||
|
||||
## Purpose
|
||||
|
||||
This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.
|
||||
|
||||
## Tool Categories
|
||||
|
||||
| Category | Tools | Enabled By |
|
||||
|----------|-------|------------|
|
||||
| **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
|
||||
| **Images** | read_image, take_screenshot, extract_text | Always |
|
||||
| **Task Management** | todo_read, todo_write | Always |
|
||||
| **Code Intelligence** | code_search, code_coverage | Always |
|
||||
| **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` |
|
||||
| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
|
||||
| **macOS Accessibility** | macax_* (9 tools) | `--macax` |
|
||||
| **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |
|
||||
|
||||
---
|
||||
|
||||
## Core Tools
|
||||
|
||||
### shell
|
||||
|
||||
Execute shell commands.
|
||||
|
||||
**Parameters**:
|
||||
- `command` (string, required): The shell command to execute
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "shell", "args": {"command": "ls -la"}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Commands run in the current working directory
|
||||
- Output is streamed in real-time
|
||||
- Both stdout and stderr are captured
|
||||
- Exit code is reported
|
||||
|
||||
---
|
||||
|
||||
### background_process
|
||||
|
||||
Launch a long-running process in the background.
|
||||
|
||||
**Parameters**:
|
||||
- `name` (string, required): Unique name for the process (e.g., "game_server")
|
||||
- `command` (string, required): Shell command to execute
|
||||
- `working_dir` (string, optional): Working directory
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}
|
||||
```
|
||||
|
||||
**Returns**: PID and log file path
|
||||
|
||||
**Notes**:
|
||||
- Process runs independently of the agent
|
||||
- Logs are captured to a file
|
||||
- Use `shell` to read logs (`tail`), check status (`ps`), or stop (`kill`)
|
||||
|
||||
---
|
||||
|
||||
### read_file
|
||||
|
||||
Read file contents with optional character range.
|
||||
|
||||
**Parameters**:
|
||||
- `file_path` (string, required): Path to the file
|
||||
- `start` (integer, optional): Starting character position (0-indexed, inclusive)
|
||||
- `end` (integer, optional): Ending character position (0-indexed, exclusive)
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- For image files (png, jpg, gif, etc.), automatically extracts text using OCR
|
||||
- Supports tilde expansion (`~`)
|
||||
- Reports file size and line count
|
||||
|
||||
---
|
||||
|
||||
### read_image
|
||||
|
||||
Read image files for visual analysis by the LLM.
|
||||
|
||||
**Parameters**:
|
||||
- `file_paths` (array of strings, required): Paths to image files
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}
|
||||
```
|
||||
|
||||
**Supported formats**: PNG, JPEG, GIF, WebP
|
||||
|
||||
**Notes**:
|
||||
- Images are sent to the LLM for visual analysis
|
||||
- Use for inspecting sprites, UI screenshots, diagrams, etc.
|
||||
- Different from `extract_text` which only does OCR
|
||||
|
||||
---
|
||||
|
||||
### write_file
|
||||
|
||||
Create or overwrite a file.
|
||||
|
||||
**Parameters**:
|
||||
- `file_path` (string, required): Path to the file
|
||||
- `content` (string, required): Content to write
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Creates parent directories if needed
|
||||
- Overwrites existing files
|
||||
- Reports bytes written
|
||||
|
||||
---
|
||||
|
||||
### str_replace
|
||||
|
||||
Apply a unified diff to a file.
|
||||
|
||||
**Parameters**:
|
||||
- `file_path` (string, required): Path to the file
|
||||
- `diff` (string, required): Unified diff with context lines
|
||||
- `start` (integer, optional): Starting character position to constrain search
|
||||
- `end` (integer, optional): Ending character position to constrain search
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "str_replace", "args": {
|
||||
"file_path": "src/main.rs",
|
||||
"diff": "@@ -10,3 +10,4 @@\n fn main() {\n println!(\"Hello\");\n+ println!(\"World\");\n }"
|
||||
}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Supports multiple hunks
|
||||
- Context lines help locate the correct position
|
||||
- Use `start`/`end` to disambiguate when multiple matches exist
|
||||
- `---/+++` headers are optional for minimal diffs
|
||||
|
||||
---
|
||||
|
||||
### final_output
|
||||
|
||||
Signal task completion with a summary.
|
||||
|
||||
**Parameters**:
|
||||
- `summary` (string, required): Markdown summary of what was accomplished
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Ends the current task
|
||||
- Summary is displayed to the user
|
||||
- In autonomous mode, triggers coach review
|
||||
|
||||
---
|
||||
|
||||
## Image & Screenshot Tools
|
||||
|
||||
### take_screenshot
|
||||
|
||||
Capture a screenshot of an application window.
|
||||
|
||||
**Parameters**:
|
||||
- `path` (string, required): Filename for the screenshot
|
||||
- `window_id` (string, required): Application name (e.g., "Safari", "Terminal")
|
||||
- `region` (object, optional): `{x, y, width, height}` to capture a region
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Use `list_windows` first to identify available windows
|
||||
- Relative paths save to `~/tmp` or `$TMPDIR`
|
||||
- Uses native screencapture on macOS
|
||||
|
||||
---
|
||||
|
||||
### extract_text
|
||||
|
||||
Extract text from an image using OCR.
|
||||
|
||||
**Parameters**:
|
||||
- `path` (string, optional): Path to image file
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "extract_text", "args": {"path": "screenshot.png"}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Uses Tesseract OCR or Apple Vision framework
|
||||
- For window-based OCR, use `vision_find_text` instead
|
||||
|
||||
---
|
||||
|
||||
## Task Management Tools
|
||||
|
||||
### todo_read
|
||||
|
||||
Read the current TODO list.
|
||||
|
||||
**Parameters**: None
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "todo_read", "args": {}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- TODO lists are session-scoped
|
||||
- Stored in `.g3/sessions/<session_id>/todo.g3.md`
|
||||
- Call at start of multi-step tasks to check for existing plans
|
||||
|
||||
---
|
||||
|
||||
### todo_write
|
||||
|
||||
Create or update the TODO list.
|
||||
|
||||
**Parameters**:
|
||||
- `content` (string, required): TODO list content in markdown checkbox format
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n - [ ] Write tests\n - [ ] Update docs\n- [x] Setup project"}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Replaces entire file content
|
||||
- Always call `todo_read` first to preserve existing content
|
||||
- Use `- [ ]` for incomplete, `- [x]` for complete
|
||||
- Supports nested tasks with indentation
|
||||
|
||||
---
|
||||
|
||||
## Code Intelligence Tools
|
||||
|
||||
### code_search
|
||||
|
||||
Syntax-aware code search using tree-sitter.
|
||||
|
||||
**Parameters**:
|
||||
- `searches` (array, required): Array of search objects:
|
||||
- `name` (string): Label for this search
|
||||
- `query` (string): Tree-sitter query in S-expression format
|
||||
- `language` (string): Programming language
|
||||
- `paths` (array, optional): Paths to search
|
||||
- `context_lines` (integer, optional): Lines of context (0-20)
|
||||
- `max_concurrency` (integer, optional): Parallel searches (default: 4)
|
||||
- `max_matches_per_search` (integer, optional): Max matches (default: 500)
|
||||
|
||||
**Supported languages**: rust, python, javascript, typescript, go, java, c, cpp, kotlin
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "code_search", "args": {
|
||||
"searches": [{
|
||||
"name": "functions",
|
||||
"query": "(function_item name: (identifier) @name)",
|
||||
"language": "rust",
|
||||
"context_lines": 2
|
||||
}]
|
||||
}}
|
||||
```
|
||||
|
||||
See [Code Search Guide](CODE_SEARCH.md) for detailed query patterns.
|
||||
|
||||
---
|
||||
|
||||
### code_coverage
|
||||
|
||||
Generate code coverage report using cargo llvm-cov.
|
||||
|
||||
**Parameters**: None
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "code_coverage", "args": {}}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Runs all tests with coverage instrumentation
|
||||
- Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
|
||||
- Returns coverage statistics summary
|
||||
|
||||
---
|
||||
|
||||
## WebDriver Tools
|
||||
|
||||
Enabled with `--webdriver` (Safari) or `--chrome-headless` (Chrome).
|
||||
|
||||
### webdriver_start
|
||||
|
||||
Start a browser session.
|
||||
|
||||
**Example**:
|
||||
```json
|
||||
{"tool": "webdriver_start", "args": {}}
|
||||
```
|
||||
|
||||
### webdriver_navigate
|
||||
|
||||
Navigate to a URL.
|
||||
|
||||
**Parameters**:
|
||||
- `url` (string, required): URL with protocol (e.g., `https://`)
|
||||
|
||||
### webdriver_get_url / webdriver_get_title
|
||||
|
||||
Get current URL or page title.
|
||||
|
||||
### webdriver_find_element / webdriver_find_elements
|
||||
|
||||
Find element(s) by CSS selector.
|
||||
|
||||
**Parameters**:
|
||||
- `selector` (string, required): CSS selector
|
||||
|
||||
### webdriver_click
|
||||
|
||||
Click an element.
|
||||
|
||||
**Parameters**:
|
||||
- `selector` (string, required): CSS selector
|
||||
|
||||
### webdriver_send_keys
|
||||
|
||||
Type text into an input.
|
||||
|
||||
**Parameters**:
|
||||
- `selector` (string, required): CSS selector
|
||||
- `text` (string, required): Text to type
|
||||
- `clear_first` (boolean, optional): Clear before typing (default: true)
|
||||
|
||||
### webdriver_execute_script
|
||||
|
||||
Execute JavaScript.
|
||||
|
||||
**Parameters**:
|
||||
- `script` (string, required): JavaScript code (use `return` to return values)
|
||||
|
||||
### webdriver_get_page_source
|
||||
|
||||
Get rendered HTML.
|
||||
|
||||
**Parameters**:
|
||||
- `max_length` (integer, optional): Max chars to return (default: 10000, 0 for no limit)
|
||||
- `save_to_file` (string, optional): Save to file instead of returning inline
|
||||
|
||||
### webdriver_screenshot
|
||||
|
||||
Take browser screenshot.
|
||||
|
||||
**Parameters**:
|
||||
- `path` (string, required): Save path
|
||||
|
||||
### webdriver_back / webdriver_forward / webdriver_refresh
|
||||
|
||||
Navigation controls.
|
||||
|
||||
### webdriver_quit
|
||||
|
||||
Close browser and end session.
|
||||
|
||||
---
|
||||
|
||||
## Vision Tools (macOS)
|
||||
|
||||
Use Apple Vision framework for text recognition.
|
||||
|
||||
### vision_find_text
|
||||
|
||||
Find text in an application window.
|
||||
|
||||
**Parameters**:
|
||||
- `app_name` (string, required): Application name
|
||||
- `text` (string, required): Text to search for
|
||||
|
||||
**Returns**: Bounding box coordinates and confidence score
|
||||
|
||||
### vision_click_text
|
||||
|
||||
Find and click on text.
|
||||
|
||||
**Parameters**:
|
||||
- `app_name` (string, required): Application name
|
||||
- `text` (string, required): Text to click
|
||||
|
||||
### vision_click_near_text
|
||||
|
||||
Click near a text label (useful for form fields).
|
||||
|
||||
**Parameters**:
|
||||
- `app_name` (string, required): Application name
|
||||
- `text` (string, required): Label text to find
|
||||
- `direction` (string, optional): "right", "below", "left", "above" (default: "right")
|
||||
- `distance` (integer, optional): Pixels from text (default: 50)
|
||||
|
||||
---
|
||||
|
||||
## macOS Accessibility Tools
|
||||
|
||||
Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).
|
||||
|
||||
### macax_list_apps
|
||||
|
||||
List running applications.
|
||||
|
||||
### macax_get_frontmost_app
|
||||
|
||||
Get the frontmost application.
|
||||
|
||||
### macax_activate_app
|
||||
|
||||
Bring an application to front.
|
||||
|
||||
**Parameters**:
|
||||
- `app_name` (string, required): Application name
|
||||
|
||||
### macax_get_ui_tree
|
||||
|
||||
Get UI element hierarchy.
|
||||
|
||||
**Parameters**:
|
||||
- `app_name` (string, required): Application name
|
||||
- `max_depth` (integer, optional): Tree depth limit
|
||||
|
||||
### macax_find_elements
|
||||
|
||||
Find UI elements by criteria.
|
||||
|
||||
**Parameters**:
|
||||
- `app_name` (string, required): Application name
|
||||
- `role` (string, optional): Element role (button, textField, etc.)
|
||||
- `title` (string, optional): Element title
|
||||
- `identifier` (string, optional): Accessibility identifier
|
||||
|
||||
### macax_click
|
||||
|
||||
Click a UI element.
|
||||
|
||||
**Parameters**:
|
||||
- `app_name` (string, required): Application name
|
||||
- `identifier` or `title` or `role`: Element selector
|
||||
|
||||
### macax_set_value / macax_get_value
|
||||
|
||||
Set or get element value.
|
||||
|
||||
### macax_press_key
|
||||
|
||||
Simulate key press.
|
||||
|
||||
**Parameters**:
|
||||
- `key` (string, required): Key to press
|
||||
- `modifiers` (array, optional): ["command", "shift", "option", "control"]
|
||||
|
||||
---
|
||||
|
||||
## Computer Control Tools
|
||||
|
||||
Enabled with `computer_control.enabled = true` in config.
|
||||
|
||||
### mouse_click
|
||||
|
||||
Click at coordinates.
|
||||
|
||||
**Parameters**:
|
||||
- `x` (integer, required): X coordinate
|
||||
- `y` (integer, required): Y coordinate
|
||||
- `button` (string, optional): "left", "right", "middle"
|
||||
|
||||
### type_text
|
||||
|
||||
Type text at cursor.
|
||||
|
||||
**Parameters**:
|
||||
- `text` (string, required): Text to type
|
||||
|
||||
### find_element
|
||||
|
||||
Find UI element by text, role, or attributes.
|
||||
|
||||
### list_windows
|
||||
|
||||
List all open windows with IDs and titles.
|
||||
|
||||
---
|
||||
|
||||
## Tool Execution Notes
|
||||
|
||||
### Duplicate Detection
|
||||
|
||||
G3 prevents accidental duplicate tool calls:
|
||||
- Only immediately sequential identical calls are blocked
|
||||
- Text between tool calls resets detection
|
||||
- Tools can be reused throughout a session
|
||||
|
||||
### Error Handling
|
||||
|
||||
Tool errors are reported back to the agent, which can:
|
||||
- Retry with different parameters
|
||||
- Try an alternative approach
|
||||
- Report the issue to the user
|
||||
|
||||
### Working Directory
|
||||
|
||||
Tools execute in:
|
||||
1. Directory specified by `--codebase-fast-start` if provided
|
||||
2. Current working directory otherwise
|
||||
|
||||
### File Paths
|
||||
|
||||
- Tilde expansion (`~`) is supported
|
||||
- Relative paths are relative to working directory
|
||||
- Screenshots default to `~/tmp` or `$TMPDIR`
|
||||
Reference in New Issue
Block a user