Files
g3/docs/tools.md
Dhanji R. Prasanna f7e2f38fe9 lamport run
2026-01-03 16:48:30 +11:00

12 KiB

G3 Tools Reference

Last updated: January 2025
Source of truth: crates/g3-core/src/tool_definitions.rs, crates/g3-core/src/tools/

Purpose

This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.

Tool Categories

Category Tools Enabled By
Core shell, read_file, write_file, str_replace, final_output, background_process Always
Images read_image, take_screenshot, extract_text Always
Task Management todo_read, todo_write Always
Code Intelligence code_search, code_coverage Always
WebDriver webdriver_* (12 tools) --webdriver or --chrome-headless
Vision vision_find_text, vision_click_text, vision_click_near_text Always (macOS)
macOS Accessibility macax_* (9 tools) --macax
Computer Control mouse_click, type_text, find_element, list_windows computer_control.enabled = true

Core Tools

shell

Execute shell commands.

Parameters:

  • command (string, required): The shell command to execute

Example:

{"tool": "shell", "args": {"command": "ls -la"}}

Notes:

  • Commands run in the current working directory
  • Output is streamed in real-time
  • Both stdout and stderr are captured
  • Exit code is reported

background_process

Launch a long-running process in the background.

Parameters:

  • name (string, required): Unique name for the process (e.g., "game_server")
  • command (string, required): Shell command to execute
  • working_dir (string, optional): Working directory

Example:

{"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}

Returns: PID and log file path

Notes:

  • Process runs independently of the agent
  • Logs are captured to a file
  • Use shell to read logs (tail), check status (ps), or stop (kill)

read_file

Read file contents with optional character range.

Parameters:

  • file_path (string, required): Path to the file
  • start (integer, optional): Starting character position (0-indexed, inclusive)
  • end (integer, optional): Ending character position (0-indexed, exclusive)

Example:

{"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}

Notes:

  • For image files (png, jpg, gif, etc.), automatically extracts text using OCR
  • Supports tilde expansion (~)
  • Reports file size and line count

read_image

Read image files for visual analysis by the LLM.

Parameters:

  • file_paths (array of strings, required): Paths to image files

Example:

{"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}

Supported formats: PNG, JPEG, GIF, WebP

Notes:

  • Images are sent to the LLM for visual analysis
  • Use for inspecting sprites, UI screenshots, diagrams, etc.
  • Different from extract_text which only does OCR

write_file

Create or overwrite a file.

Parameters:

  • file_path (string, required): Path to the file
  • content (string, required): Content to write

Example:

{"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}

Notes:

  • Creates parent directories if needed
  • Overwrites existing files
  • Reports bytes written

str_replace

Apply a unified diff to a file.

Parameters:

  • file_path (string, required): Path to the file
  • diff (string, required): Unified diff with context lines
  • start (integer, optional): Starting character position to constrain search
  • end (integer, optional): Ending character position to constrain search

Example:

{"tool": "str_replace", "args": {
  "file_path": "src/main.rs",
  "diff": "@@ -10,3 +10,4 @@\n fn main() {\n     println!(\"Hello\");\n+    println!(\"World\");\n }"
}}

Notes:

  • Supports multiple hunks
  • Context lines help locate the correct position
  • Use start/end to disambiguate when multiple matches exist
  • ---/+++ headers are optional for minimal diffs

final_output

Signal task completion with a summary.

Parameters:

  • summary (string, required): Markdown summary of what was accomplished

Example:

{"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}

Notes:

  • Ends the current task
  • Summary is displayed to the user
  • In autonomous mode, triggers coach review

Image & Screenshot Tools

take_screenshot

Capture a screenshot of an application window.

Parameters:

  • path (string, required): Filename for the screenshot
  • window_id (string, required): Application name (e.g., "Safari", "Terminal")
  • region (object, optional): {x, y, width, height} to capture a region

Example:

{"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}

Notes:

  • Use list_windows first to identify available windows
  • Relative paths save to ~/tmp or $TMPDIR
  • Uses native screencapture on macOS

extract_text

Extract text from an image using OCR.

Parameters:

  • path (string, optional): Path to image file

Example:

{"tool": "extract_text", "args": {"path": "screenshot.png"}}

Notes:

  • Uses Tesseract OCR or Apple Vision framework
  • For window-based OCR, use vision_find_text instead

Task Management Tools

todo_read

Read the current TODO list.

Parameters: None

Example:

{"tool": "todo_read", "args": {}}

Notes:

  • TODO lists are session-scoped
  • Stored in .g3/sessions/<session_id>/todo.g3.md
  • Call at start of multi-step tasks to check for existing plans

todo_write

Create or update the TODO list.

Parameters:

  • content (string, required): TODO list content in markdown checkbox format

Example:

{"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n  - [ ] Write tests\n  - [ ] Update docs\n- [x] Setup project"}}

Notes:

  • Replaces entire file content
  • Always call todo_read first to preserve existing content
  • Use - [ ] for incomplete, - [x] for complete
  • Supports nested tasks with indentation

Code Intelligence Tools

Syntax-aware code search using tree-sitter.

Parameters:

  • searches (array, required): Array of search objects:
    • name (string): Label for this search
    • query (string): Tree-sitter query in S-expression format
    • language (string): Programming language
    • paths (array, optional): Paths to search
    • context_lines (integer, optional): Lines of context (0-20)
  • max_concurrency (integer, optional): Parallel searches (default: 4)
  • max_matches_per_search (integer, optional): Max matches (default: 500)

Supported languages: rust, python, javascript, typescript, go, java, c, cpp, kotlin

Example:

{"tool": "code_search", "args": {
  "searches": [{
    "name": "functions",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust",
    "context_lines": 2
  }]
}}

See Code Search Guide for detailed query patterns.


code_coverage

Generate code coverage report using cargo llvm-cov.

Parameters: None

Example:

{"tool": "code_coverage", "args": {}}

Notes:

  • Runs all tests with coverage instrumentation
  • Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
  • Returns coverage statistics summary

WebDriver Tools

Enabled with --webdriver (Safari) or --chrome-headless (Chrome).

webdriver_start

Start a browser session.

Example:

{"tool": "webdriver_start", "args": {}}

webdriver_navigate

Navigate to a URL.

Parameters:

  • url (string, required): URL with protocol (e.g., https://)

webdriver_get_url / webdriver_get_title

Get current URL or page title.

webdriver_find_element / webdriver_find_elements

Find element(s) by CSS selector.

Parameters:

  • selector (string, required): CSS selector

webdriver_click

Click an element.

Parameters:

  • selector (string, required): CSS selector

webdriver_send_keys

Type text into an input.

Parameters:

  • selector (string, required): CSS selector
  • text (string, required): Text to type
  • clear_first (boolean, optional): Clear before typing (default: true)

webdriver_execute_script

Execute JavaScript.

Parameters:

  • script (string, required): JavaScript code (use return to return values)

webdriver_get_page_source

Get rendered HTML.

Parameters:

  • max_length (integer, optional): Max chars to return (default: 10000, 0 for no limit)
  • save_to_file (string, optional): Save to file instead of returning inline

webdriver_screenshot

Take browser screenshot.

Parameters:

  • path (string, required): Save path

webdriver_back / webdriver_forward / webdriver_refresh

Navigation controls.

webdriver_quit

Close browser and end session.


Vision Tools (macOS)

Use Apple Vision framework for text recognition.

vision_find_text

Find text in an application window.

Parameters:

  • app_name (string, required): Application name
  • text (string, required): Text to search for

Returns: Bounding box coordinates and confidence score

vision_click_text

Find and click on text.

Parameters:

  • app_name (string, required): Application name
  • text (string, required): Text to click

vision_click_near_text

Click near a text label (useful for form fields).

Parameters:

  • app_name (string, required): Application name
  • text (string, required): Label text to find
  • direction (string, optional): "right", "below", "left", "above" (default: "right")
  • distance (integer, optional): Pixels from text (default: 50)

macOS Accessibility Tools

Enabled with --macax. See macOS Accessibility Tools Guide.

macax_list_apps

List running applications.

macax_get_frontmost_app

Get the frontmost application.

macax_activate_app

Bring an application to front.

Parameters:

  • app_name (string, required): Application name

macax_get_ui_tree

Get UI element hierarchy.

Parameters:

  • app_name (string, required): Application name
  • max_depth (integer, optional): Tree depth limit

macax_find_elements

Find UI elements by criteria.

Parameters:

  • app_name (string, required): Application name
  • role (string, optional): Element role (button, textField, etc.)
  • title (string, optional): Element title
  • identifier (string, optional): Accessibility identifier

macax_click

Click a UI element.

Parameters:

  • app_name (string, required): Application name
  • identifier or title or role: Element selector

macax_set_value / macax_get_value

Set or get element value.

macax_press_key

Simulate key press.

Parameters:

  • key (string, required): Key to press
  • modifiers (array, optional): ["command", "shift", "option", "control"]

Computer Control Tools

Enabled with computer_control.enabled = true in config.

mouse_click

Click at coordinates.

Parameters:

  • x (integer, required): X coordinate
  • y (integer, required): Y coordinate
  • button (string, optional): "left", "right", "middle"

type_text

Type text at cursor.

Parameters:

  • text (string, required): Text to type

find_element

Find UI element by text, role, or attributes.

list_windows

List all open windows with IDs and titles.


Tool Execution Notes

Duplicate Detection

G3 prevents accidental duplicate tool calls:

  • Only immediately sequential identical calls are blocked
  • Text between tool calls resets detection
  • Tools can be reused throughout a session

Error Handling

Tool errors are reported back to the agent, which can:

  • Retry with different parameters
  • Try an alternative approach
  • Report the issue to the user

Working Directory

Tools execute in:

  1. Directory specified by --codebase-fast-start if provided
  2. Current working directory otherwise

File Paths

  • Tilde expansion (~) is supported
  • Relative paths are relative to working directory
  • Screenshots default to ~/tmp or $TMPDIR