lamport run

This commit is contained in:
Dhanji R. Prasanna
2026-01-03 16:48:30 +11:00
parent f4a1bf5e93
commit f7e2f38fe9
10 changed files with 3444 additions and 0 deletions

538
docs/tools.md Normal file
View File

@@ -0,0 +1,538 @@
# G3 Tools Reference
**Last updated**: January 2025
**Source of truth**: `crates/g3-core/src/tool_definitions.rs`, `crates/g3-core/src/tools/`
## Purpose
This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.
## Tool Categories
| Category | Tools | Enabled By |
|----------|-------|------------|
| **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
| **Images** | read_image, take_screenshot, extract_text | Always |
| **Task Management** | todo_read, todo_write | Always |
| **Code Intelligence** | code_search, code_coverage | Always |
| **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` |
| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
| **macOS Accessibility** | macax_* (9 tools) | `--macax` |
| **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |
---
## Core Tools
### shell
Execute shell commands.
**Parameters**:
- `command` (string, required): The shell command to execute
**Example**:
```json
{"tool": "shell", "args": {"command": "ls -la"}}
```
**Notes**:
- Commands run in the current working directory
- Output is streamed in real-time
- Both stdout and stderr are captured
- Exit code is reported
---
### background_process
Launch a long-running process in the background.
**Parameters**:
- `name` (string, required): Unique name for the process (e.g., "game_server")
- `command` (string, required): Shell command to execute
- `working_dir` (string, optional): Working directory
**Example**:
```json
{"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}
```
**Returns**: PID and log file path
**Notes**:
- Process runs independently of the agent
- Logs are captured to a file
- Use `shell` to read logs (`tail`), check status (`ps`), or stop (`kill`)
---
### read_file
Read file contents with optional character range.
**Parameters**:
- `file_path` (string, required): Path to the file
- `start` (integer, optional): Starting character position (0-indexed, inclusive)
- `end` (integer, optional): Ending character position (0-indexed, exclusive)
**Example**:
```json
{"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}
```
**Notes**:
- For image files (png, jpg, gif, etc.), automatically extracts text using OCR
- Supports tilde expansion (`~`)
- Reports file size and line count
---
### read_image
Read image files for visual analysis by the LLM.
**Parameters**:
- `file_paths` (array of strings, required): Paths to image files
**Example**:
```json
{"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}
```
**Supported formats**: PNG, JPEG, GIF, WebP
**Notes**:
- Images are sent to the LLM for visual analysis
- Use for inspecting sprites, UI screenshots, diagrams, etc.
- Different from `extract_text` which only does OCR
---
### write_file
Create or overwrite a file.
**Parameters**:
- `file_path` (string, required): Path to the file
- `content` (string, required): Content to write
**Example**:
```json
{"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}
```
**Notes**:
- Creates parent directories if needed
- Overwrites existing files
- Reports bytes written
---
### str_replace
Apply a unified diff to a file.
**Parameters**:
- `file_path` (string, required): Path to the file
- `diff` (string, required): Unified diff with context lines
- `start` (integer, optional): Starting character position to constrain search
- `end` (integer, optional): Ending character position to constrain search
**Example**:
```json
{"tool": "str_replace", "args": {
"file_path": "src/main.rs",
"diff": "@@ -10,3 +10,4 @@\n fn main() {\n println!(\"Hello\");\n+ println!(\"World\");\n }"
}}
```
**Notes**:
- Supports multiple hunks
- Context lines help locate the correct position
- Use `start`/`end` to disambiguate when multiple matches exist
- `---/+++` headers are optional for minimal diffs
---
### final_output
Signal task completion with a summary.
**Parameters**:
- `summary` (string, required): Markdown summary of what was accomplished
**Example**:
```json
{"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}
```
**Notes**:
- Ends the current task
- Summary is displayed to the user
- In autonomous mode, triggers coach review
---
## Image & Screenshot Tools
### take_screenshot
Capture a screenshot of an application window.
**Parameters**:
- `path` (string, required): Filename for the screenshot
- `window_id` (string, required): Application name (e.g., "Safari", "Terminal")
- `region` (object, optional): `{x, y, width, height}` to capture a region
**Example**:
```json
{"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}
```
**Notes**:
- Use `list_windows` first to identify available windows
- Relative paths save to `~/tmp` or `$TMPDIR`
- Uses native screencapture on macOS
---
### extract_text
Extract text from an image using OCR.
**Parameters**:
- `path` (string, optional): Path to image file
**Example**:
```json
{"tool": "extract_text", "args": {"path": "screenshot.png"}}
```
**Notes**:
- Uses Tesseract OCR or Apple Vision framework
- For window-based OCR, use `vision_find_text` instead
---
## Task Management Tools
### todo_read
Read the current TODO list.
**Parameters**: None
**Example**:
```json
{"tool": "todo_read", "args": {}}
```
**Notes**:
- TODO lists are session-scoped
- Stored in `.g3/sessions/<session_id>/todo.g3.md`
- Call at start of multi-step tasks to check for existing plans
---
### todo_write
Create or update the TODO list.
**Parameters**:
- `content` (string, required): TODO list content in markdown checkbox format
**Example**:
```json
{"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n - [ ] Write tests\n - [ ] Update docs\n- [x] Setup project"}}
```
**Notes**:
- Replaces entire file content
- Always call `todo_read` first to preserve existing content
- Use `- [ ]` for incomplete, `- [x]` for complete
- Supports nested tasks with indentation
---
## Code Intelligence Tools
### code_search
Syntax-aware code search using tree-sitter.
**Parameters**:
- `searches` (array, required): Array of search objects:
- `name` (string): Label for this search
- `query` (string): Tree-sitter query in S-expression format
- `language` (string): Programming language
- `paths` (array, optional): Paths to search
- `context_lines` (integer, optional): Lines of context (0-20)
- `max_concurrency` (integer, optional): Parallel searches (default: 4)
- `max_matches_per_search` (integer, optional): Max matches (default: 500)
**Supported languages**: rust, python, javascript, typescript, go, java, c, cpp, kotlin
**Example**:
```json
{"tool": "code_search", "args": {
"searches": [{
"name": "functions",
"query": "(function_item name: (identifier) @name)",
"language": "rust",
"context_lines": 2
}]
}}
```
See [Code Search Guide](CODE_SEARCH.md) for detailed query patterns.
---
### code_coverage
Generate code coverage report using cargo llvm-cov.
**Parameters**: None
**Example**:
```json
{"tool": "code_coverage", "args": {}}
```
**Notes**:
- Runs all tests with coverage instrumentation
- Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
- Returns coverage statistics summary
---
## WebDriver Tools
Enabled with `--webdriver` (Safari) or `--chrome-headless` (Chrome).
### webdriver_start
Start a browser session.
**Example**:
```json
{"tool": "webdriver_start", "args": {}}
```
### webdriver_navigate
Navigate to a URL.
**Parameters**:
- `url` (string, required): URL with protocol (e.g., `https://`)
### webdriver_get_url / webdriver_get_title
Get current URL or page title.
### webdriver_find_element / webdriver_find_elements
Find element(s) by CSS selector.
**Parameters**:
- `selector` (string, required): CSS selector
### webdriver_click
Click an element.
**Parameters**:
- `selector` (string, required): CSS selector
### webdriver_send_keys
Type text into an input.
**Parameters**:
- `selector` (string, required): CSS selector
- `text` (string, required): Text to type
- `clear_first` (boolean, optional): Clear before typing (default: true)
### webdriver_execute_script
Execute JavaScript.
**Parameters**:
- `script` (string, required): JavaScript code (use `return` to return values)
### webdriver_get_page_source
Get rendered HTML.
**Parameters**:
- `max_length` (integer, optional): Max chars to return (default: 10000, 0 for no limit)
- `save_to_file` (string, optional): Save to file instead of returning inline
### webdriver_screenshot
Take browser screenshot.
**Parameters**:
- `path` (string, required): Save path
### webdriver_back / webdriver_forward / webdriver_refresh
Navigation controls.
### webdriver_quit
Close browser and end session.
---
## Vision Tools (macOS)
Use Apple Vision framework for text recognition.
### vision_find_text
Find text in an application window.
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to search for
**Returns**: Bounding box coordinates and confidence score
### vision_click_text
Find and click on text.
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to click
### vision_click_near_text
Click near a text label (useful for form fields).
**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Label text to find
- `direction` (string, optional): "right", "below", "left", "above" (default: "right")
- `distance` (integer, optional): Pixels from text (default: 50)
---
## macOS Accessibility Tools
Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).
### macax_list_apps
List running applications.
### macax_get_frontmost_app
Get the frontmost application.
### macax_activate_app
Bring an application to front.
**Parameters**:
- `app_name` (string, required): Application name
### macax_get_ui_tree
Get UI element hierarchy.
**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Tree depth limit
### macax_find_elements
Find UI elements by criteria.
**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (button, textField, etc.)
- `title` (string, optional): Element title
- `identifier` (string, optional): Accessibility identifier
### macax_click
Click a UI element.
**Parameters**:
- `app_name` (string, required): Application name
- `identifier` or `title` or `role`: Element selector
### macax_set_value / macax_get_value
Set or get element value.
### macax_press_key
Simulate key press.
**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): ["command", "shift", "option", "control"]
---
## Computer Control Tools
Enabled with `computer_control.enabled = true` in config.
### mouse_click
Click at coordinates.
**Parameters**:
- `x` (integer, required): X coordinate
- `y` (integer, required): Y coordinate
- `button` (string, optional): "left", "right", "middle"
### type_text
Type text at cursor.
**Parameters**:
- `text` (string, required): Text to type
### find_element
Find UI element by text, role, or attributes.
### list_windows
List all open windows with IDs and titles.
---
## Tool Execution Notes
### Duplicate Detection
G3 prevents accidental duplicate tool calls:
- Only immediately sequential identical calls are blocked
- Text between tool calls resets detection
- Tools can be reused throughout a session
### Error Handling
Tool errors are reported back to the agent, which can:
- Retry with different parameters
- Try an alternative approach
- Report the issue to the user
### Working Directory
Tools execute in:
1. Directory specified by `--codebase-fast-start` if provided
2. Current working directory otherwise
### File Paths
- Tilde expansion (`~`) is supported
- Relative paths are relative to working directory
- Screenshots default to `~/tmp` or `$TMPDIR`