lamport run
205
AGENTS.md
Normal file
@@ -0,0 +1,205 @@
# AGENTS.md - Machine Instructions for G3

**Last updated**: January 2025
**Purpose**: Enable AI agents to work safely and effectively with this codebase

## System Overview

G3 is an AI coding agent built in Rust. It uses LLM providers to execute tasks through a tool-based interface. The codebase is organized as a Cargo workspace with 9 crates.

### Quick Reference

| Crate | Purpose | Stability |
|-------|---------|----------|
| `g3-core` | Agent engine, tools, context management | Stable |
| `g3-providers` | LLM provider abstractions | Stable |
| `g3-cli` | Command-line interface | Stable |
| `g3-config` | Configuration management | Stable |
| `g3-execution` | Code execution | Stable |
| `g3-computer-control` | Computer automation | Experimental |
| `g3-planner` | Planning mode | Stable |
| `g3-ensembles` | Multi-agent (flock) mode | Experimental |
| `g3-console` | Web monitoring console | Experimental |

## Critical Invariants

### MUST Hold

1. **Tool calls must be valid JSON** - The streaming parser expects well-formed tool calls
2. **Context window limits must be respected** - Exceeding limits causes API errors
3. **Provider trait implementations must be Send + Sync** - Required for async runtime
4. **Session IDs must be unique** - Used for log file paths and TODO scoping
5. **File paths in tools support tilde expansion** - `~` expands to home directory
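
The tilde-expansion invariant can be illustrated with a minimal sketch. The helper name `expand_tilde` and the choice to pass the home directory as a parameter are assumptions made for illustration; G3's actual implementation may differ (for example, in how it treats `~user` forms).

```rust
use std::path::{Path, PathBuf};

/// Expand a leading `~` to `home`. Only a bare `~` or a `~/` prefix is
/// expanded; `~user` forms and tildes elsewhere are left untouched.
/// Hypothetical helper: `home` is passed in so the function stays pure.
pub fn expand_tilde(path: &str, home: &Path) -> PathBuf {
    if path == "~" {
        return home.to_path_buf();
    }
    match path.strip_prefix("~/") {
        Some(rest) => home.join(rest),
        None => PathBuf::from(path),
    }
}
```

Taking `home` as an argument keeps the sketch testable without reading environment variables.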

### MUST NOT Do

1. **Never block the async runtime** - Use `tokio::task::spawn_blocking` for CPU-intensive or blocking work (`tokio::spawn` alone still runs the work on the async worker threads)
2. **Never store secrets in logs** - API keys are redacted in error logs
3. **Never modify files outside working directory without explicit permission**
4. **Never assume tool results fit in context** - Large results are thinned automatically
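
As a rough illustration of the "never store secrets in logs" rule, the sketch below replaces a known secret with a marker before a message is logged. The function name `redact_secret` and the `[REDACTED]` marker are assumptions; this is not G3's actual redaction code.

```rust
/// Replace every occurrence of `secret` in `message` with a fixed marker.
/// Hypothetical helper: G3's real redaction logic may work differently.
pub fn redact_secret(message: &str, secret: &str) -> String {
    if secret.is_empty() {
        // `str::replace` with an empty pattern would insert the marker
        // between every character, so leave the message unchanged.
        return message.to_string();
    }
    message.replace(secret, "[REDACTED]")
}
```

A call site would run every error string through such a helper before it reaches the log writer.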

## Recommended Entry Points

### For Understanding the System

1. `src/main.rs` - Entry point (trivial)
2. `crates/g3-cli/src/lib.rs` - CLI logic and execution modes
3. `crates/g3-core/src/lib.rs` - Agent struct and orchestration
4. `crates/g3-providers/src/lib.rs` - Provider trait definition

### For Adding Features

1. **New tool**: `crates/g3-core/src/tool_definitions.rs` → `crates/g3-core/src/tools/`
2. **New provider**: `crates/g3-providers/src/` → implement `LLMProvider` trait
3. **New CLI mode**: `crates/g3-cli/src/lib.rs`
4. **New config option**: `crates/g3-config/src/lib.rs`

### For Debugging

1. Session logs: `.g3/sessions/<session_id>/session.json`
2. Error logs: `logs/errors/`
3. Context state: Use `/stats` command in interactive mode

## Dangerous/Subtle Code Paths

### Context Window Management (`g3-core/src/context_window.rs`)

- **Thinning**: Automatically replaces large tool results with file references
- **Summarization**: Compresses conversation history at 80% capacity
- **Token estimation**: Uses character-based heuristics, not exact tokenization
- **Risk**: Incorrect token estimates can cause context overflow
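
To make the estimation risk concrete, here is a minimal sketch of a character-based token estimate combined with the 80% summarization threshold. The ratio of 4 characters per token and all function names are assumptions for illustration; the real heuristic in `context_window.rs` is not shown in this document.

```rust
/// Assumed ratio for illustration; real tokenizers vary by model and content.
const CHARS_PER_TOKEN: usize = 4;

/// Estimate tokens from the character count, rounding up.
pub fn estimate_tokens(text: &str) -> usize {
    let chars = text.chars().count();
    (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
}

/// Percentage of the window in use, given an estimated token count.
pub fn usage_percent(used_tokens: usize, window_tokens: usize) -> f64 {
    if window_tokens == 0 {
        return 100.0; // treat a zero-sized window as already full
    }
    used_tokens as f64 * 100.0 / window_tokens as f64
}

/// True once usage reaches the 80% summarization threshold.
pub fn should_summarize(used_tokens: usize, window_tokens: usize) -> bool {
    usage_percent(used_tokens, window_tokens) >= 80.0
}
```

Because the estimate is only a heuristic, an underestimate can let the real token count exceed the window before the threshold fires.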

### Streaming Parser (`g3-core/src/streaming_parser.rs`)

- Parses LLM responses in real-time for tool calls
- Must handle partial JSON across chunk boundaries
- **Risk**: Malformed responses can cause parsing failures
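
One way to handle JSON that is split across chunk boundaries is to buffer text while tracking brace depth and string state. The sketch below is a hypothetical, simplified accumulator (the type `JsonObjectBuffer` is an assumption, not G3's parser): it returns the first complete top-level object and discards anything after it in the same chunk.

```rust
/// Accumulates streamed text and reports when a complete top-level JSON
/// object has arrived. Hypothetical type, not G3's actual parser.
#[derive(Default)]
pub struct JsonObjectBuffer {
    buf: String,
    depth: usize,
    in_string: bool,
    escaped: bool,
}

impl JsonObjectBuffer {
    /// Feed one chunk; returns the first completed object, if any.
    /// Text before the opening `{` is ignored.
    pub fn push(&mut self, chunk: &str) -> Option<String> {
        for ch in chunk.chars() {
            if self.depth == 0 && ch != '{' {
                continue; // skip prose outside any object
            }
            self.buf.push(ch);
            if self.in_string {
                // Inside a string, braces are data; only track escapes
                // and the closing quote.
                if self.escaped {
                    self.escaped = false;
                } else if ch == '\\' {
                    self.escaped = true;
                } else if ch == '"' {
                    self.in_string = false;
                }
                continue;
            }
            match ch {
                '"' => self.in_string = true,
                '{' => self.depth += 1,
                '}' => {
                    self.depth -= 1;
                    if self.depth == 0 {
                        return Some(std::mem::take(&mut self.buf));
                    }
                }
                _ => {}
            }
        }
        None
    }
}
```

The returned string still needs to be validated by a real JSON parser; the buffer only decides where an object ends.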

### Tool Dispatch (`g3-core/src/tool_dispatch.rs`)

- Routes tool calls to implementations
- Handles both native and JSON-based tool calling
- **Risk**: Missing dispatch cases cause silent failures
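
A common guard against silent dispatch failures is to make the fallback arm return an explicit error. The sketch below assumes a hypothetical `dispatch_tool` function, placeholder tool names, and a `Result<String, String>` return type; none of these are taken from `tool_dispatch.rs`.

```rust
/// Hypothetical dispatcher: tool names and return type are illustrative.
pub fn dispatch_tool(name: &str, args: &str) -> Result<String, String> {
    match name {
        "read_file" => Ok(format!("would read {args}")),
        "shell" => Ok(format!("would run {args}")),
        // Surface unknown tools as errors so a missing case is visible
        // to the caller instead of silently producing no result.
        other => Err(format!("unknown tool: {other}")),
    }
}
```

With this shape, adding a tool definition without a matching dispatch arm produces an error the agent can report.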

### Retry Logic (`g3-core/src/retry.rs`)

- Exponential backoff with jitter
- Different configs for interactive vs autonomous mode
- **Risk**: Aggressive retries can hit rate limits harder
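
Exponential backoff with jitter can be sketched as follows. This is one common variant (the delay is kept between 50% and 100% of the capped exponential value), chosen for illustration; the parameter names and the jitter scheme are assumptions, and `retry.rs` may use different values per mode.

```rust
use std::time::Duration;

/// Delay before retry `attempt` (0-based): `base_ms * 2^attempt`, capped at
/// `max_ms`, then reduced by jitter to between 50% and 100% of that value.
/// `jitter_permille` (0..=1000) is supplied by the caller, e.g. from a
/// random source, so the function itself stays deterministic.
pub fn backoff_delay(attempt: u32, base_ms: u64, max_ms: u64, jitter_permille: u64) -> Duration {
    let exponential = base_ms
        .saturating_mul(2u64.saturating_pow(attempt))
        .min(max_ms);
    let jitter = jitter_permille.min(1000);
    let half = exponential / 2;
    // Widen to u128 so the multiplication cannot overflow.
    let variable = (half as u128 * jitter as u128 / 1000) as u64;
    Duration::from_millis(half + variable)
}
```

The cap matters for the risk noted above: without `max_ms`, late attempts would wait unboundedly; without jitter, many clients would retry at the same instant.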

## Performance Constraints

1. **Streaming is preferred** - Non-streaming requests block UI
2. **Tool results are size-limited** - Large outputs are truncated or thinned
3. **Concurrent tool calls** - Enabled by `allow_multiple_tool_calls` config
4. **Background processes** - Long-running commands use `background_process` tool

## Testing Strategy

### Test Locations

- Unit tests: `crates/*/tests/`
- Integration tests: `crates/*/tests/`
- Test fixtures: `examples/test_code/`

### Running Tests

```bash
# All tests
cargo test

# Specific crate
cargo test -p g3-core

# With output
cargo test -- --nocapture
```

### Test Considerations

- Provider tests may require API keys
- Computer control tests require OS permissions
- WebDriver tests require browser setup

## Do's and Don'ts for Automated Changes

### Do

- ✅ Run `cargo check` after modifications
- ✅ Run `cargo test` before committing
- ✅ Update tool definitions when adding tools
- ✅ Add tests for new functionality
- ✅ Use existing patterns for similar features
- ✅ Keep functions under 80 lines
- ✅ Update documentation for user-facing changes

### Don't

- ❌ Modify `Cargo.toml` dependencies without justification
- ❌ Add blocking code in async contexts
- ❌ Store sensitive data in plain text
- ❌ Ignore error handling
- ❌ Create deeply nested conditionals (>6 levels)
- ❌ Add external dependencies for simple tasks

## Common Incorrect Assumptions

1. **"All providers support tool calling"** - Embedded models use JSON fallback
2. **"Context window is unlimited"** - Each provider has limits (4k-200k tokens)
3. **"Tool results are always small"** - File reads can return megabytes
4. **"Sessions persist across runs"** - Sessions are ephemeral by default
5. **"All platforms are equal"** - macOS has more features (Vision, Accessibility)

## Architecture Decisions

See `DESIGN.md` for original design rationale.

Key decisions:
- **Rust for performance and safety** - Async runtime, memory safety
- **Workspace structure** - Separation of concerns, independent compilation
- **Provider abstraction** - Swap providers without code changes
- **Tool-first philosophy** - Agent acts through tools, not just advice
- **Session-scoped state** - TODO lists, logs tied to sessions

## File Structure Quick Reference

```
g3/
├── src/main.rs                    # Entry point
├── crates/
│   ├── g3-cli/src/
│   │   ├── lib.rs                 # CLI logic (~112k chars)
│   │   └── retro_tui.rs           # Retro TUI mode
│   ├── g3-core/src/
│   │   ├── lib.rs                 # Agent struct (~3400 lines)
│   │   ├── context_window.rs      # Context management
│   │   ├── tool_definitions.rs    # Tool schemas
│   │   ├── tool_dispatch.rs       # Tool routing
│   │   ├── tools/                 # Tool implementations
│   │   ├── streaming_parser.rs    # Response parsing
│   │   └── retry.rs               # Retry logic
│   ├── g3-providers/src/
│   │   ├── lib.rs                 # Provider trait
│   │   ├── anthropic.rs           # Anthropic Claude
│   │   ├── databricks.rs          # Databricks
│   │   ├── openai.rs              # OpenAI
│   │   └── embedded.rs            # Local models
│   ├── g3-config/src/lib.rs       # Configuration
│   ├── g3-planner/src/            # Planning mode
│   ├── g3-ensembles/src/          # Flock mode
│   └── g3-computer-control/src/   # Automation
├── agents/                        # Agent personas
├── docs/                          # Documentation
└── logs/                          # Session logs
```

## Pointers to Documentation

- [Architecture](docs/architecture.md) - System design and data flow
- [Configuration](docs/configuration.md) - Config file format and options
- [Tools Reference](docs/tools.md) - All available tools
- [Providers Guide](docs/providers.md) - LLM provider setup
- [Control Commands](docs/CONTROL_COMMANDS.md) - Interactive commands
- [Code Search](docs/CODE_SEARCH.md) - Tree-sitter search guide
- [Flock Mode](docs/FLOCK_MODE.md) - Multi-agent development
- [macOS Accessibility](docs/macax-tools.md) - macOS automation
22
README.md
@@ -338,6 +338,28 @@ G3 automatically saves session logs for each interaction in the `logs/` director
The `logs/` directory is created automatically on first use and is excluded from version control.

## Documentation Map

Detailed documentation is available in the `docs/` directory:

| Document | Description |
|----------|-------------|
| [Architecture](docs/architecture.md) | System design, crate responsibilities, data flow |
| [Configuration](docs/configuration.md) | Config file format, provider setup, all options |
| [Tools Reference](docs/tools.md) | Complete reference for all available tools |
| [Providers Guide](docs/providers.md) | LLM provider setup and selection guide |
| [Control Commands](docs/CONTROL_COMMANDS.md) | Interactive `/` commands for context management |
| [Code Search](docs/CODE_SEARCH.md) | Tree-sitter code search query patterns |
| [Flock Mode](docs/FLOCK_MODE.md) | Parallel multi-agent development |
| [macOS Accessibility](docs/macax-tools.md) | macOS Accessibility API automation |

For AI agents working with this codebase, see [AGENTS.md](AGENTS.md).

Additional resources:
- `DESIGN.md` - Original design document and rationale
- `config.example.toml` - Complete configuration example
- `config.coach-player.example.toml` - Multi-role configuration example

## License

MIT License - see LICENSE file for details
430
docs/CODE_SEARCH.md
Normal file
@@ -0,0 +1,430 @@
# G3 Code Search Guide

**Last updated**: January 2025
**Source of truth**: `crates/g3-core/src/code_search/`, `crates/g3-core/src/tool_definitions.rs`

## Purpose

G3 includes a syntax-aware code search tool powered by tree-sitter. Unlike text-based search (grep), it understands code structure and finds actual functions, classes, methods, and other constructs, ignoring matches in comments and strings.

## Why Use Code Search?

| Feature | grep/ripgrep | code_search |
|---------|--------------|-------------|
| Finds text in comments | ✅ | ❌ |
| Finds text in strings | ✅ | ❌ |
| Understands code structure | ❌ | ✅ |
| Finds function definitions | Regex needed | Native |
| Finds class hierarchies | ❌ | ✅ |
| Language-aware | ❌ | ✅ |

**Use code_search when**:
- Finding function/method definitions
- Finding class/struct declarations
- Searching for specific code constructs
- You need accurate results without false positives

**Use grep when**:
- Searching non-code files (logs, markdown)
- Simple string searches
- Searching comments or documentation
- Regex for text patterns

## Supported Languages

- Rust
- Python
- JavaScript
- TypeScript
- Go
- Java
- C
- C++
- Kotlin

## Basic Usage

```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "my_search",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust"
  }]
}}
```

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `searches` | array | Yes | Array of search objects (max 20) |
| `max_concurrency` | integer | No | Parallel searches (default: 4) |
| `max_matches_per_search` | integer | No | Max matches (default: 500) |

### Search Object

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Label for this search |
| `query` | string | Yes | Tree-sitter query (S-expression) |
| `language` | string | Yes | Programming language |
| `paths` | array | No | Paths to search (default: current dir) |
| `context_lines` | integer | No | Lines of context (0-20, default: 0) |
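
The limits in the two tables above (at most 20 searches, `context_lines` between 0 and 20, three required string fields) can be expressed as a small validation step. The `Search` struct and `validate_searches` function below are hypothetical and only mirror the documented constraints; they are not G3's actual types.

```rust
/// Limits taken from the parameter tables; the struct layout is assumed.
pub const MAX_SEARCHES: usize = 20;
pub const MAX_CONTEXT_LINES: u32 = 20;

#[derive(Debug, Clone)]
pub struct Search {
    pub name: String,
    pub query: String,
    pub language: String,
    pub context_lines: u32,
}

/// Check a batch of searches against the documented limits.
pub fn validate_searches(searches: &[Search]) -> Result<(), String> {
    if searches.len() > MAX_SEARCHES {
        return Err(format!(
            "at most {} searches allowed, got {}",
            MAX_SEARCHES,
            searches.len()
        ));
    }
    for s in searches {
        if s.name.is_empty() || s.query.is_empty() || s.language.is_empty() {
            return Err(format!("search '{}' is missing a required field", s.name));
        }
        if s.context_lines > MAX_CONTEXT_LINES {
            return Err(format!(
                "search '{}': context_lines must be 0-{}",
                s.name, MAX_CONTEXT_LINES
            ));
        }
    }
    Ok(())
}
```

Validating before running any query gives the caller one clear error instead of a partial batch.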

## Query Syntax

Tree-sitter queries use S-expression syntax. The basic pattern is:

```
(node_type field: (child_type) @capture_name)
```

- `node_type`: The AST node to match
- `field`: Optional field name
- `child_type`: Type of child node
- `@capture_name`: Name for the captured node

## Common Query Patterns

### Rust

```lisp
;; All functions
(function_item name: (identifier) @name)

;; Async functions (a bare (function_modifiers) would also match
;; const, unsafe, and extern functions)
(function_item (function_modifiers "async") name: (identifier) @name)

;; Structs
(struct_item name: (type_identifier) @name)

;; Enums
(enum_item name: (type_identifier) @name)

;; Impl blocks
(impl_item type: (type_identifier) @name)

;; Trait definitions
(trait_item name: (type_identifier) @name)

;; Macros
(macro_definition name: (identifier) @name)

;; Constants
(const_item name: (identifier) @name)

;; Static variables
(static_item name: (identifier) @name)

;; Type aliases
(type_item name: (type_identifier) @name)

;; Modules
(mod_item name: (identifier) @name)
```

### Python

```lisp
;; Functions
(function_definition name: (identifier) @name)

;; Async functions (the "async" token is required; without it the
;; pattern matches every function)
(function_definition "async" name: (identifier) @name) @fn

;; Classes
(class_definition name: (identifier) @name)

;; Methods (functions inside classes)
(class_definition
  body: (block
    (function_definition name: (identifier) @name)))

;; Decorators
(decorator) @decorator

;; Imports
(import_statement) @import
(import_from_statement) @import
```

### JavaScript / TypeScript

```lisp
;; Function declarations
(function_declaration name: (identifier) @name)

;; Arrow functions assigned to variables
(variable_declarator
  name: (identifier) @name
  value: (arrow_function))

;; Classes
(class_declaration name: (identifier) @name)

;; Methods
(method_definition name: (property_identifier) @name)

;; Exports
(export_statement) @export

;; Imports
(import_statement) @import
```

### Go

```lisp
;; Functions
(function_declaration name: (identifier) @name)

;; Methods
(method_declaration name: (field_identifier) @name)

;; Structs
(type_declaration
  (type_spec name: (type_identifier) @name
    type: (struct_type)))

;; Interfaces
(type_declaration
  (type_spec name: (type_identifier) @name
    type: (interface_type)))
```

### Java

```lisp
;; Classes
(class_declaration name: (identifier) @name)

;; Interfaces
(interface_declaration name: (identifier) @name)

;; Methods
(method_declaration name: (identifier) @name)

;; Constructors
(constructor_declaration name: (identifier) @name)

;; Fields
(field_declaration
  declarator: (variable_declarator name: (identifier) @name))
```

### C / C++

```lisp
;; Functions
(function_definition
  declarator: (function_declarator
    declarator: (identifier) @name))

;; Structs (C)
(struct_specifier name: (type_identifier) @name)

;; Classes (C++)
(class_specifier name: (type_identifier) @name)

;; Namespaces (C++)
(namespace_definition name: (identifier) @name)
```

## Advanced Queries

### Wildcards

Use `_` to match any node:

```lisp
;; Any function with any name
(function_item name: (_) @name)
```

### Alternatives

Match multiple patterns:

```lisp
;; Functions or impl blocks
[(function_item) (impl_item)] @item
```

### Predicates

Filter matches:

```lisp
;; Functions starting with "test_"
(function_item name: (identifier) @name
  (#match? @name "^test_"))

;; Functions NOT starting with "_"
(function_item name: (identifier) @name
  (#not-match? @name "^_"))
```

### Nested Matches

```lisp
;; Methods inside impl blocks
(impl_item
  body: (declaration_list
    (function_item name: (identifier) @method_name)))
```

## Batch Searches

Run multiple searches in parallel:

```json
{"tool": "code_search", "args": {
  "searches": [
    {
      "name": "functions",
      "query": "(function_item name: (identifier) @name)",
      "language": "rust"
    },
    {
      "name": "structs",
      "query": "(struct_item name: (type_identifier) @name)",
      "language": "rust"
    },
    {
      "name": "tests",
      "query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
      "language": "rust",
      "paths": ["tests/"]
    }
  ],
  "max_concurrency": 4
}}
```

## Context Lines

Include surrounding code:

```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "functions",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust",
    "context_lines": 3
  }]
}}
```

This shows 3 lines before and after each match.

## Path Filtering

Search specific directories:

```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "core_functions",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust",
    "paths": ["src/core", "src/lib.rs"]
  }]
}}
```

## Output Format

Results include:
- File path
- Line number
- Matched code
- Context (if requested)

```
=== functions (15 matches) ===

src/lib.rs:42
fn process_request(req: Request) -> Response {

src/lib.rs:78
fn handle_error(err: Error) -> Result<()> {

src/utils.rs:15
fn format_output(data: &str) -> String {
```

## Tips

### Finding the Right Query

1. **Start simple**: Begin with basic node types
2. **Use AST explorer**: Understand your language's AST
3. **Iterate**: Refine queries based on results

### Performance

- **Limit paths**: Search specific directories when possible
- **Use concurrency**: Batch related searches
- **Set max_matches**: Prevent overwhelming output

### Debugging Queries

If a query returns no results:
1. Check language spelling (lowercase)
2. Verify node type names for your language
3. Start with a simpler query, then add constraints
4. Check if files exist in search paths

## Examples by Task

### Find all public functions in Rust

```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "public_fns",
    "query": "(function_item (visibility_modifier) name: (identifier) @name)",
    "language": "rust"
  }]
}}
```

### Find all test functions

```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "tests",
    "query": "(function_item name: (identifier) @name (#match? @name \"^test_\"))",
    "language": "rust",
    "paths": ["tests/"]
  }]
}}
```

### Find all API endpoints (Python Flask)

```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "routes",
    "query": "(decorated_definition (decorator) @dec (function_definition name: (identifier) @name))",
    "language": "python"
  }]
}}
```

### Find all React components

```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "components",
    "query": "(function_declaration name: (identifier) @name (#match? @name \"^[A-Z]\"))",
    "language": "javascript",
    "paths": ["src/components/"]
  }]
}}
```
224
docs/CONTROL_COMMANDS.md
Normal file
@@ -0,0 +1,224 @@
# G3 Control Commands

**Last updated**: January 2025
**Source of truth**: `crates/g3-cli/src/lib.rs`

## Purpose

Control commands are special commands you can use during an interactive G3 session to manage context, refresh documentation, and view statistics. They start with `/` and are processed by the CLI, not sent to the LLM.

## Available Commands

| Command | Description |
|---------|-------------|
| `/compact` | Manually trigger conversation summarization |
| `/thinnify` | Replace large tool results with file references (first third) |
| `/skinnify` | Full context thinning (entire context window) |
| `/readme` | Reload README.md and AGENTS.md from disk |
| `/stats` | Show detailed context and performance statistics |
| `/help` | Display all available control commands |
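
The split between CLI-handled commands and LLM input can be sketched as a small parser. The enum, the function name, and the treatment of unknown `/` commands as errors are assumptions for illustration; the real handling lives in `crates/g3-cli/src/lib.rs` and may differ.

```rust
/// The six commands from the table above.
#[derive(Debug, PartialEq)]
pub enum ControlCommand {
    Compact,
    Thinnify,
    Skinnify,
    Readme,
    Stats,
    Help,
}

/// Returns `None` for ordinary input (which would go to the LLM),
/// `Some(Ok(cmd))` for a known command, and `Some(Err(text))` for an
/// unrecognized `/` command. Rejecting unknown commands is an assumption.
pub fn parse_control_command(input: &str) -> Option<Result<ControlCommand, String>> {
    let trimmed = input.trim();
    if !trimmed.starts_with('/') {
        return None;
    }
    let command = match trimmed {
        "/compact" => ControlCommand::Compact,
        "/thinnify" => ControlCommand::Thinnify,
        "/skinnify" => ControlCommand::Skinnify,
        "/readme" => ControlCommand::Readme,
        "/stats" => ControlCommand::Stats,
        "/help" => ControlCommand::Help,
        other => return Some(Err(other.to_string())),
    };
    Some(Ok(command))
}
```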

---

## /compact

Manually trigger conversation summarization to reduce context size.

**When to use**:
- Context usage is getting high (70%+)
- You want to start a new phase of work
- Conversation has accumulated irrelevant history

**What it does**:
1. Sends conversation history to LLM for summarization
2. Replaces detailed history with concise summary
3. Preserves key decisions and context
4. Significantly reduces token usage

**Example**:
```
g3> /compact
📝 Compacting conversation history...
✅ Reduced context from 45,000 to 8,000 tokens (82% reduction)
```

**Notes**:
- Summarization uses tokens, so there's a small cost
- Some detail is lost; use before major context shifts
- Auto-triggered at 80% context usage if `auto_compact = true`

---

## /thinnify

Replace large tool results with file references to save context space.

**When to use**:
- Large file contents are consuming context
- Tool outputs are taking up space
- You want to preserve conversation structure but reduce size

**What it does**:
1. Scans the first third of context for large tool results
2. Saves content to `.g3/sessions/<session>/thinned/`
3. Replaces inline content with file reference
4. Preserves the ability to re-read if needed

**Example**:
```
g3> /thinnify
🔧 Thinning context window...
✅ Thinned 3 large tool results, saved 12,000 characters
```

**Notes**:
- Only processes the first third of context (older content)
- Recent tool results are preserved inline
- Auto-triggered at 50%, 60%, 70%, 80% thresholds
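
The scan-and-replace behavior described for `/thinnify` can be sketched as follows. The `Message` enum, the size threshold parameter, and the `ref_for` callback are all hypothetical; the sketch also skips the step that writes the content to disk and only swaps the in-memory entry for a reference.

```rust
/// Hypothetical message model; G3's real types are not shown in this guide.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    Text(String),
    ToolResult(String),
    FileRef(String),
}

/// Replace tool results longer than `min_chars` in the first third of
/// `messages` with a file reference. Returns the number of characters
/// removed from the inline context. `ref_for` only names the file;
/// persisting the content is left out of this sketch.
pub fn thin_first_third(
    messages: &mut [Message],
    min_chars: usize,
    ref_for: impl Fn(usize) -> String,
) -> usize {
    let limit = messages.len() / 3;
    let mut saved = 0;
    for (i, msg) in messages.iter_mut().take(limit).enumerate() {
        let len = match &*msg {
            Message::ToolResult(body) => body.chars().count(),
            _ => continue, // only tool results are thinned
        };
        if len > min_chars {
            saved += len;
            *msg = Message::FileRef(ref_for(i));
        }
    }
    saved
}
```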

---

## /skinnify

Full context thinning - processes the entire context window.

**When to use**:
- Context is critically full
- `/thinnify` wasn't enough
- You need maximum space recovery

**What it does**:
- Same as `/thinnify` but processes entire context
- More aggressive space recovery
- May thin recent tool results too

**Example**:
```
g3> /skinnify
🔧 Full context thinning...
✅ Thinned 8 tool results, saved 35,000 characters
```

**Notes**:
- Use sparingly; may thin content you still need inline
- Consider `/compact` first for better context preservation

---

## /readme

Reload README.md and AGENTS.md from disk without restarting.

**When to use**:
- You've updated project documentation
- AGENTS.md has new instructions
- README.md has changed

**What it does**:
1. Re-reads README.md from workspace root
2. Re-reads AGENTS.md from workspace root
3. Updates the agent's system context
4. New instructions take effect immediately

**Example**:
```
g3> /readme
📖 Reloading documentation...
✅ Loaded README.md (5,234 chars)
✅ Loaded AGENTS.md (2,100 chars)
```

**Notes**:
- Useful during iterative documentation updates
- Changes apply to subsequent messages
- Previous context retains old documentation

---

## /stats

Show detailed context and performance statistics.

**What it shows**:
- Current context usage (tokens and percentage)
- Session duration
- Token usage breakdown
- Tool call metrics
- Thinning and summarization events
- First-token latency statistics

**Example**:
```
g3> /stats
📊 Session Statistics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Context Usage: 45,230 / 200,000 tokens (22.6%)
Session Duration: 1h 23m 45s
Total Tokens Used: 125,430
Tool Calls: 47 (45 successful, 2 failed)
Thinning Events: 3 (saved 28,000 chars)
Summarizations: 1 (saved 35,000 chars)
Avg First Token: 1.2s
```

---

## /help

Display all available control commands with brief descriptions.

**Example**:
```
g3> /help
📚 Available Commands:
/compact - Summarize conversation to reduce context
/thinnify - Replace large tool results with file refs
/skinnify - Full context thinning (entire window)
/readme - Reload README.md and AGENTS.md
/stats - Show context and performance statistics
/help - Show this help message
```

---

## Context Management Strategy

G3 automatically manages context, but manual intervention can help:

### Proactive Management

1. **Check stats regularly**: Use `/stats` to monitor usage
2. **Thin early**: Use `/thinnify` before hitting thresholds
3. **Compact at transitions**: Use `/compact` when switching tasks

### Reactive Management

When context gets high:

1. **50-70%**: Consider `/thinnify`
2. **70-80%**: Use `/compact`
3. **80-90%**: Use `/skinnify` then `/compact`
4. **90%+**: Aggressive automatic thinning runs before tool calls (auto-summarization already triggers at 80% if `auto_compact = true`)

### Best Practices

- **Long sessions**: Compact periodically to maintain quality
- **Large files**: Thin after reading large codebases
- **Documentation updates**: Use `/readme` instead of restarting
- **Before complex tasks**: Ensure adequate context space
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Automatic Context Management
|
||||||
|
|
||||||
|
G3 performs automatic context management:
|
||||||
|
|
||||||
|
| Threshold | Action |
|-----------|--------|
| 50% | Thin oldest third of context |
| 60% | Thin oldest third of context |
| 70% | Thin oldest third of context |
| 80% | Auto-summarization (if `auto_compact = true`) |
| 90% | Aggressive thinning before tool calls |
|
||||||
|
|
||||||
|
Manual commands give you finer control over when and how this happens.
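
The threshold table above can be read as a simple lookup from context usage to action. The following is a minimal sketch of that reading, not G3's actual implementation: `AutoAction`, `usage_percent`, and `action_for` are hypothetical names, and the behavior between 80% and 90% when `auto_compact` is disabled is an assumption, since the table does not specify it.

```rust
/// Action the threshold table associates with a context usage level.
/// (Hypothetical type for illustration; not taken from the G3 source.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum AutoAction {
    None,
    ThinOldestThird,
    AutoSummarize,
    AggressiveThinning,
}

/// Context usage as a percentage of the window, as shown by `/stats`.
pub fn usage_percent(used_tokens: u64, window_tokens: u64) -> f64 {
    if window_tokens == 0 {
        return 0.0;
    }
    used_tokens as f64 / window_tokens as f64 * 100.0
}

/// Maps a usage percentage to the row of the table that applies.
pub fn action_for(percent: f64, auto_compact: bool) -> AutoAction {
    if percent >= 90.0 {
        AutoAction::AggressiveThinning
    } else if percent >= 80.0 {
        if auto_compact {
            AutoAction::AutoSummarize
        } else {
            // Assumption: with auto_compact off, nothing happens until 90%.
            AutoAction::None
        }
    } else if percent >= 50.0 {
        // The 50%, 60%, and 70% rows all describe the same action.
        AutoAction::ThinOldestThird
    } else {
        AutoAction::None
    }
}
```

With the figures from the `/stats` example, `usage_percent(45_230, 200_000)` is about 22.6, which falls below every threshold, so no automatic action is due.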
|
||||||
397
docs/FLOCK_MODE.md
Normal file
@@ -0,0 +1,397 @@
|
|||||||
|
# G3 Flock Mode Guide
|
||||||
|
|
||||||
|
**Last updated**: January 2025
|
||||||
|
**Source of truth**: `crates/g3-ensembles/src/flock.rs`
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Flock mode enables parallel multi-agent development by spawning multiple G3 agent instances that work on different parts of a project simultaneously. This is useful for large projects with modular architectures where independent components can be developed in parallel.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
In Flock mode:
|
||||||
|
- Multiple agent instances run concurrently
|
||||||
|
- Each agent works on a specific module or component
|
||||||
|
- Agents operate independently but share the same codebase
|
||||||
|
- Progress is tracked and coordinated centrally
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────┐
|
||||||
|
│ Flock Coordinator │
|
||||||
|
│ │
|
||||||
|
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
|
||||||
|
│ │ Agent 1 │ │ Agent 2 │ │ Agent 3 │ │ Agent N │ │
|
||||||
|
│ │ Module A│ │ Module B│ │ Module C│ │ Module N│ │
|
||||||
|
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
|
||||||
|
│ │ │ │ │ │
|
||||||
|
│ ▼ ▼ ▼ ▼ │
|
||||||
|
│ ┌─────────────────────────────────────────────────┐ │
|
||||||
|
│ │ Shared Codebase │ │
|
||||||
|
│ └─────────────────────────────────────────────────┘ │
|
||||||
|
└─────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
## When to Use Flock Mode
|
||||||
|
|
||||||
|
**Good candidates**:
|
||||||
|
- Microservices architectures
|
||||||
|
- Projects with independent modules
|
||||||
|
- Large refactoring across multiple files
|
||||||
|
- Parallel feature development
|
||||||
|
- Test suite expansion
|
||||||
|
|
||||||
|
**Not recommended for**:
|
||||||
|
- Tightly coupled code
|
||||||
|
- Sequential dependencies
|
||||||
|
- Small projects
|
||||||
|
- Single-file changes
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Flock mode is configured through a YAML manifest file:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# flock.yaml
|
||||||
|
name: "my-project-flock"
|
||||||
|
description: "Parallel development of project modules"
|
||||||
|
|
||||||
|
# Global settings
|
||||||
|
settings:
|
||||||
|
max_agents: 4
|
||||||
|
timeout_minutes: 60
|
||||||
|
provider: "anthropic.default"
|
||||||
|
|
||||||
|
# Agent definitions
|
||||||
|
agents:
|
||||||
|
- name: "api-agent"
|
||||||
|
description: "Develops the REST API layer"
|
||||||
|
working_dir: "src/api"
|
||||||
|
requirements: |
|
||||||
|
Implement REST endpoints for user management:
|
||||||
|
- GET /users
|
||||||
|
- POST /users
|
||||||
|
- GET /users/{id}
|
||||||
|
- PUT /users/{id}
|
||||||
|
- DELETE /users/{id}
|
||||||
|
|
||||||
|
- name: "db-agent"
|
||||||
|
description: "Develops the database layer"
|
||||||
|
working_dir: "src/db"
|
||||||
|
requirements: |
|
||||||
|
Implement database models and queries:
|
||||||
|
- User model with CRUD operations
|
||||||
|
- Connection pooling
|
||||||
|
- Migration support
|
||||||
|
|
||||||
|
- name: "test-agent"
|
||||||
|
description: "Writes integration tests"
|
||||||
|
working_dir: "tests"
|
||||||
|
requirements: |
|
||||||
|
Write integration tests for:
|
||||||
|
- API endpoints
|
||||||
|
- Database operations
|
||||||
|
- Error handling
|
||||||
|
```
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
### Starting a Flock
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start flock with manifest
|
||||||
|
g3 --flock flock.yaml
|
||||||
|
|
||||||
|
# Start with specific agents only
|
||||||
|
g3 --flock flock.yaml --agents api-agent,db-agent
|
||||||
|
|
||||||
|
# Start with custom timeout
|
||||||
|
g3 --flock flock.yaml --timeout 120
|
||||||
|
```
|
||||||
|
|
||||||
|
### Monitoring Progress
|
||||||
|
|
||||||
|
Flock mode provides real-time status updates:
|
||||||
|
|
||||||
|
```
|
||||||
|
🐦 Flock Status: my-project-flock
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
|
||||||
|
api-agent [████████░░] 80% Implementing DELETE endpoint
|
||||||
|
db-agent [██████████] 100% ✅ Complete
|
||||||
|
test-agent [██████░░░░] 60% Writing error handling tests
|
||||||
|
|
||||||
|
Elapsed: 15m 32s | Tokens: 45,230 | Errors: 0
|
||||||
|
```
|
||||||
|
|
||||||
|
### Stopping a Flock
|
||||||
|
|
||||||
|
```
|
||||||
|
# Graceful stop (wait for current tasks)
|
||||||
|
Ctrl+C
|
||||||
|
|
||||||
|
# Force stop all agents
|
||||||
|
Ctrl+C Ctrl+C
|
||||||
|
```
|
||||||
|
|
||||||
|
## Agent Communication
|
||||||
|
|
||||||
|
Agents in a flock operate independently but can:
|
||||||
|
|
||||||
|
1. **Read shared files**: All agents can read the entire codebase
|
||||||
|
2. **Write to their area**: Each agent writes to its designated working directory
|
||||||
|
3. **Signal completion**: Agents report when their tasks are done
|
||||||
|
4. **Report errors**: Failures are logged and can trigger coordinator action
|
||||||
|
|
||||||
|
### Conflict Prevention
|
||||||
|
|
||||||
|
To prevent conflicts:
|
||||||
|
- Assign non-overlapping working directories
|
||||||
|
- Use clear module boundaries
|
||||||
|
- Define explicit interfaces between modules
|
||||||
|
- Run integration after all agents complete
|
||||||
|
|
||||||
|
## Status Tracking
|
||||||
|
|
||||||
|
Flock status is tracked in `.g3/flock/`:
|
||||||
|
|
||||||
|
```
|
||||||
|
.g3/flock/
|
||||||
|
├── status.json # Overall flock status
|
||||||
|
├── api-agent/
|
||||||
|
│ ├── session.json # Agent session log
|
||||||
|
│ └── todo.g3.md # Agent's TODO list
|
||||||
|
├── db-agent/
|
||||||
|
│ ├── session.json
|
||||||
|
│ └── todo.g3.md
|
||||||
|
└── test-agent/
|
||||||
|
├── session.json
|
||||||
|
└── todo.g3.md
|
||||||
|
```
|
||||||
|
|
||||||
|
### Status File Format
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"flock_name": "my-project-flock",
|
||||||
|
"started_at": "2025-01-03T10:00:00Z",
|
||||||
|
"status": "running",
|
||||||
|
"agents": [
|
||||||
|
{
|
||||||
|
"name": "api-agent",
|
||||||
|
"status": "running",
|
||||||
|
"progress": 80,
|
||||||
|
"current_task": "Implementing DELETE endpoint",
|
||||||
|
"tokens_used": 15000,
|
||||||
|
"errors": 0
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
### 1. Define Clear Boundaries
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# Good: Clear module separation
|
||||||
|
agents:
|
||||||
|
- name: "frontend"
|
||||||
|
working_dir: "src/frontend"
|
||||||
|
- name: "backend"
|
||||||
|
working_dir: "src/backend"
|
||||||
|
|
||||||
|
# Bad: Overlapping directories
|
||||||
|
agents:
|
||||||
|
- name: "agent1"
|
||||||
|
working_dir: "src"
|
||||||
|
- name: "agent2"
|
||||||
|
working_dir: "src/utils" # Overlaps with agent1!
|
||||||
|
```
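
Overlapping working directories like the "Bad" case above can be caught before a flock starts. The sketch below is one way to check a manifest's `working_dir` values, assuming they have already been read into `(name, working_dir)` pairs; `dirs_overlap` and `find_overlaps` are hypothetical helpers, not part of `g3-ensembles`. The comparison is purely lexical (by path component), so it does not resolve `..` segments or symlinks.

```rust
use std::path::Path;

/// Returns true when one directory equals, or is nested inside, the other.
/// `Path::starts_with` compares whole components, so "src" does not match "srcs".
pub fn dirs_overlap(a: &str, b: &str) -> bool {
    let (a, b) = (Path::new(a), Path::new(b));
    a.starts_with(b) || b.starts_with(a)
}

/// Returns every pair of agent names whose working directories overlap.
/// Input is a list of (agent name, working_dir) pairs.
pub fn find_overlaps<'a>(agents: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    let mut conflicts = Vec::new();
    for (i, (name_a, dir_a)) in agents.iter().enumerate() {
        // Only compare with later entries so each pair is reported once.
        for (name_b, dir_b) in &agents[i + 1..] {
            if dirs_overlap(dir_a, dir_b) {
                conflicts.push((*name_a, *name_b));
            }
        }
    }
    conflicts
}
```

Applied to the two examples above, `frontend`/`backend` produce no conflicts, while `agent1` (`src`) and `agent2` (`src/utils`) are reported as a conflicting pair.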
|
||||||
|
|
||||||
|
### 2. Specify Interfaces First
|
||||||
|
|
||||||
|
Define shared interfaces before parallel development:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
agents:
|
||||||
|
- name: "interface-agent"
|
||||||
|
priority: 1 # Runs first
|
||||||
|
requirements: |
|
||||||
|
Define shared interfaces in src/interfaces/:
|
||||||
|
- UserService trait
|
||||||
|
- DatabaseConnection trait
|
||||||
|
- Error types
|
||||||
|
|
||||||
|
- name: "impl-agent"
|
||||||
|
priority: 2 # Runs after interfaces
|
||||||
|
depends_on: ["interface-agent"]
|
||||||
|
requirements: |
|
||||||
|
Implement UserService trait...
|
||||||
|
```
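
A `depends_on` list implies a start order: an agent can only begin once everything it depends on has finished. How the coordinator actually schedules agents is not described here, so the following is only a sketch of the idea using Kahn's topological sort. `start_order` is a hypothetical function, and ties are broken alphabetically purely to keep the output deterministic.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders agents so each one appears after everything in its `depends_on`.
/// Input is a list of (agent name, depends_on) pairs.
/// Returns `None` if a dependency is unknown or the graph contains a cycle.
pub fn start_order(agents: &[(&str, &[&str])]) -> Option<Vec<String>> {
    // Unmet dependency count per agent, plus reverse edges (dep -> dependents).
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(name, deps) in agents {
        pending.insert(name, deps.len());
    }
    for &(name, deps) in agents {
        for &dep in deps {
            if !pending.contains_key(dep) {
                return None; // depends on an agent missing from the manifest
            }
            dependents.entry(dep).or_default().push(name);
        }
    }
    // Agents without dependencies can start immediately.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(name, _)| *name)
        .collect();
    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name.to_string());
        if let Some(children) = dependents.get(name) {
            for &child in children {
                let n = pending.get_mut(child)?;
                *n -= 1;
                if *n == 0 {
                    ready.push_back(child);
                }
            }
        }
    }
    // Anything left unscheduled is part of a cycle.
    (order.len() == agents.len()).then_some(order)
}
```

For the manifest above this yields `interface-agent` followed by `impl-agent`; a manifest where two agents depend on each other yields `None`.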
|
||||||
|
|
||||||
|
### 3. Use Appropriate Granularity
|
||||||
|
|
||||||
|
- **Too few agents**: Doesn't leverage parallelism
|
||||||
|
- **Too many agents**: Coordination overhead, potential conflicts
|
||||||
|
- **Sweet spot**: 2-6 agents for most projects
|
||||||
|
|
||||||
|
### 4. Include a Test Agent
|
||||||
|
|
||||||
|
Always include an agent for testing:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
agents:
|
||||||
|
- name: "test-agent"
|
||||||
|
working_dir: "tests"
|
||||||
|
requirements: |
|
||||||
|
Write tests for all new functionality.
|
||||||
|
Run tests after other agents complete.
|
||||||
|
```
|
||||||
|
|
||||||
|
### 5. Plan for Integration
|
||||||
|
|
||||||
|
After flock completion:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run all tests
|
||||||
|
cargo test
|
||||||
|
|
||||||
|
# Check for conflicts
|
||||||
|
git status
|
||||||
|
|
||||||
|
# Review changes
|
||||||
|
git diff
|
||||||
|
```
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
### Agent Failures
|
||||||
|
|
||||||
|
If an agent fails:
|
||||||
|
1. The error is logged to the agent's session log
2. The coordinator is notified
3. Other agents continue (by default)
4. The failed agent can be restarted
|
||||||
|
|
||||||
|
### Restart Failed Agent
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Restart specific agent
|
||||||
|
g3 --flock flock.yaml --restart api-agent
|
||||||
|
|
||||||
|
# Restart all failed agents
|
||||||
|
g3 --flock flock.yaml --restart-failed
|
||||||
|
```
|
||||||
|
|
||||||
|
### Conflict Resolution
|
||||||
|
|
||||||
|
If agents modify the same file:
|
||||||
|
1. Last write wins (by default)
|
||||||
|
2. Conflicts are logged
|
||||||
|
3. Manual resolution may be needed
|
||||||
|
|
||||||
|
## Resource Management
|
||||||
|
|
||||||
|
### Token Usage
|
||||||
|
|
||||||
|
Each agent has its own token budget, and the flock as a whole has a total budget:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
settings:
|
||||||
|
max_tokens_per_agent: 100000
|
||||||
|
total_token_budget: 500000
|
||||||
|
```
|
||||||
|
|
||||||
|
### Concurrency
|
||||||
|
|
||||||
|
Limit concurrent agents based on:
|
||||||
|
- API rate limits
|
||||||
|
- System resources
|
||||||
|
- Provider capacity
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
settings:
|
||||||
|
max_concurrent_agents: 3 # Run at most 3 at once
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example: Microservices Project
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
name: "microservices-flock"
|
||||||
|
|
||||||
|
settings:
|
||||||
|
max_agents: 5
|
||||||
|
provider: "anthropic.default"
|
||||||
|
|
||||||
|
agents:
|
||||||
|
- name: "user-service"
|
||||||
|
working_dir: "services/user"
|
||||||
|
requirements: |
|
||||||
|
Implement user service:
|
||||||
|
- User registration
|
||||||
|
- Authentication
|
||||||
|
- Profile management
|
||||||
|
|
||||||
|
- name: "order-service"
|
||||||
|
working_dir: "services/order"
|
||||||
|
requirements: |
|
||||||
|
Implement order service:
|
||||||
|
- Order creation
|
||||||
|
- Order status tracking
|
||||||
|
- Payment integration
|
||||||
|
|
||||||
|
- name: "inventory-service"
|
||||||
|
working_dir: "services/inventory"
|
||||||
|
requirements: |
|
||||||
|
Implement inventory service:
|
||||||
|
- Stock management
|
||||||
|
- Availability checking
|
||||||
|
- Reorder alerts
|
||||||
|
|
||||||
|
- name: "gateway"
|
||||||
|
working_dir: "services/gateway"
|
||||||
|
requirements: |
|
||||||
|
Implement API gateway:
|
||||||
|
- Request routing
|
||||||
|
- Authentication middleware
|
||||||
|
- Rate limiting
|
||||||
|
|
||||||
|
- name: "integration-tests"
|
||||||
|
working_dir: "tests/integration"
|
||||||
|
depends_on: ["user-service", "order-service", "inventory-service", "gateway"]
|
||||||
|
requirements: |
|
||||||
|
Write integration tests for:
|
||||||
|
- End-to-end order flow
|
||||||
|
- Service communication
|
||||||
|
- Error scenarios
|
||||||
|
```
|
||||||
|
|
||||||
|
## Limitations
|
||||||
|
|
||||||
|
- **No real-time coordination**: Agents don't communicate during execution
|
||||||
|
- **File conflicts**: Possible if boundaries aren't clear
|
||||||
|
- **Resource intensive**: Multiple LLM calls in parallel
|
||||||
|
- **Debugging complexity**: Multiple logs to review
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Agents Not Starting
|
||||||
|
|
||||||
|
1. Check manifest syntax (YAML)
|
||||||
|
2. Verify working directories exist
|
||||||
|
3. Check provider configuration
|
||||||
|
4. Review logs in `.g3/flock/`
|
||||||
|
|
||||||
|
### Slow Progress
|
||||||
|
|
||||||
|
1. Reduce the number of concurrent agents
2. Check for rate limiting
3. Simplify requirements
4. Use a faster provider
|
||||||
|
|
||||||
|
### Inconsistent Results
|
||||||
|
|
||||||
|
1. Define clearer interfaces
|
||||||
|
2. Add more specific requirements
|
||||||
|
3. Use lower temperature
|
||||||
|
4. Add validation steps
|
||||||
363
docs/architecture.md
Normal file
@@ -0,0 +1,363 @@
|
|||||||
|
# G3 Architecture
|
||||||
|
|
||||||
|
**Last updated**: January 2025
|
||||||
|
**Source of truth**: Crate structure in `crates/`, `Cargo.toml`, `DESIGN.md`
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
This document describes the internal architecture of G3, a modular AI coding agent built in Rust. It is intended for developers who want to understand, extend, or maintain the codebase.
|
||||||
|
|
||||||
|
## High-Level Overview
|
||||||
|
|
||||||
|
G3 follows a **tool-first philosophy**: instead of just providing advice, it actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||||
|
│ g3-cli │ │ g3-core │ │ g3-providers │
|
||||||
|
│ │ │ │ │ │
|
||||||
|
│ • CLI parsing │◄──►│ • Agent engine │◄──►│ • Anthropic │
|
||||||
|
│ • Interactive │ │ • Context mgmt │ │ • Databricks │
|
||||||
|
│ • Retro TUI │ │ • Tool system │ │ • OpenAI │
|
||||||
|
│ • Autonomous │ │ • Streaming │ │ • Embedded │
|
||||||
|
│ mode │ │ • Task exec │ │ (llama.cpp) │
|
||||||
|
│ │ │ • TODO mgmt │ │ • OAuth flow │
|
||||||
|
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||||
|
│ │ │
|
||||||
|
└───────────────────────┼───────────────────────┘
|
||||||
|
│
|
||||||
|
┌───────────────────────┼───────────────────────┐
|
||||||
|
│ │ │
|
||||||
|
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||||
|
│ g3-execution │ │ g3-config │ │ g3-planner │
|
||||||
|
│ │ │ │ │ │
|
||||||
|
│ • Code exec │ │ • TOML config │ │ • Requirements │
|
||||||
|
│ • Shell cmds │ │ • Env overrides │ │ • Git ops │
|
||||||
|
│ • Streaming │ │ • Provider │ │ • Planning │
|
||||||
|
│ • Error hdlg │ │ settings │ │ workflow │
|
||||||
|
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||||
|
│ │ │
|
||||||
|
│ ┌─────────────────┐ │
|
||||||
|
│ │ g3-computer- │ │
|
||||||
|
└─────────────►│ control │◄─────────────┘
|
||||||
|
│ • Mouse/kbd │
|
||||||
|
│ • Screenshots │
|
||||||
|
│ • OCR/Vision │
|
||||||
|
│ • WebDriver │
|
||||||
|
│ • macOS Ax API │
|
||||||
|
└─────────────────┘
|
||||||
|
│
|
||||||
|
┌───────────────────────┼───────────────────────┐
|
||||||
|
│ │ │
|
||||||
|
┌─────────────────┐ ┌─────────────────┐
|
||||||
|
│ g3-ensembles │ │ g3-console │
|
||||||
|
│ │ │ │
|
||||||
|
│ • Flock mode │ │ • Web console │
|
||||||
|
│ • Multi-agent │ │ • Process mgmt │
|
||||||
|
│ • Parallel dev │ │ • Log viewing │
|
||||||
|
└─────────────────┘ └─────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
## Workspace Structure
|
||||||
|
|
||||||
|
G3 is organized as a Rust workspace with 9 crates:
|
||||||
|
|
||||||
|
```
|
||||||
|
g3/
|
||||||
|
├── src/main.rs # Entry point (delegates to g3-cli)
|
||||||
|
├── crates/
|
||||||
|
│ ├── g3-cli/ # Command-line interface and TUI
|
||||||
|
│ ├── g3-core/ # Core agent engine and tools
|
||||||
|
│ ├── g3-providers/ # LLM provider abstractions
|
||||||
|
│ ├── g3-config/ # Configuration management
|
||||||
|
│ ├── g3-execution/ # Code execution engine
|
||||||
|
│ ├── g3-computer-control/ # Computer automation
|
||||||
|
│ ├── g3-planner/ # Planning mode workflow
|
||||||
|
│ ├── g3-ensembles/ # Multi-agent (flock) mode
|
||||||
|
│ └── g3-console/ # Web monitoring console
|
||||||
|
├── agents/ # Agent persona definitions
|
||||||
|
├── logs/ # Session logs (auto-created)
|
||||||
|
└── g3-plan/ # Planning artifacts
|
||||||
|
```
|
||||||
|
|
||||||
|
## Crate Responsibilities
|
||||||
|
|
||||||
|
### g3-core (Central Hub)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-core/`
|
||||||
|
**Purpose**: Core agent engine, tool system, and orchestration logic
|
||||||
|
|
||||||
|
Key modules:
|
||||||
|
- `lib.rs` - Main `Agent` struct and orchestration (~3400 lines)
|
||||||
|
- `context_window.rs` - Token tracking and context management
|
||||||
|
- `streaming_parser.rs` - Real-time LLM response parsing
|
||||||
|
- `tool_definitions.rs` - JSON schema definitions for all tools
|
||||||
|
- `tool_dispatch.rs` - Routes tool calls to implementations
|
||||||
|
- `tools/` - Tool implementations (file ops, shell, vision, webdriver, etc.)
|
||||||
|
- `error_handling.rs` - Error classification and recovery
|
||||||
|
- `retry.rs` - Retry logic with exponential backoff
|
||||||
|
- `prompts.rs` - System prompt generation
|
||||||
|
- `code_search/` - Tree-sitter based code search
|
||||||
|
|
||||||
|
**Key types**:
|
||||||
|
- `Agent<W: UiWriter>` - Main agent struct, generic over UI output
|
||||||
|
- `ContextWindow` - Manages conversation history and token limits
|
||||||
|
- `StreamingToolParser` - Parses streaming LLM responses for tool calls
|
||||||
|
- `ToolCall` - Represents a tool invocation
|
||||||
|
|
||||||
|
### g3-providers (LLM Abstraction)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-providers/`
|
||||||
|
**Purpose**: Unified interface for multiple LLM backends
|
||||||
|
|
||||||
|
Key modules:
|
||||||
|
- `lib.rs` - `LLMProvider` trait and `ProviderRegistry`
|
||||||
|
- `anthropic.rs` - Anthropic Claude API (~51k chars)
|
||||||
|
- `databricks.rs` - Databricks Foundation Models (~58k chars)
|
||||||
|
- `openai.rs` - OpenAI and compatible APIs (~18k chars)
|
||||||
|
- `embedded.rs` - Local models via llama.cpp (~34k chars)
|
||||||
|
- `oauth.rs` - OAuth authentication flow
|
||||||
|
|
||||||
|
**Key traits**:
|
||||||
|
```rust
|
||||||
|
#[async_trait]
|
||||||
|
pub trait LLMProvider: Send + Sync {
|
||||||
|
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;
|
||||||
|
async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;
|
||||||
|
fn name(&self) -> &str;
|
||||||
|
fn model(&self) -> &str;
|
||||||
|
fn has_native_tool_calling(&self) -> bool;
|
||||||
|
fn supports_cache_control(&self) -> bool;
|
||||||
|
fn max_tokens(&self) -> u32;
|
||||||
|
fn temperature(&self) -> f32;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### g3-cli (User Interface)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-cli/`
|
||||||
|
**Purpose**: Command-line interface, TUI, and execution modes
|
||||||
|
|
||||||
|
Key modules:
|
||||||
|
- `lib.rs` - Main CLI logic and execution modes (~112k chars)
|
||||||
|
- `retro_tui.rs` - Full-screen retro terminal UI (~63k chars)
|
||||||
|
- `filter_json.rs` - JSON tool call filtering for display
|
||||||
|
- `ui_writer_impl.rs` - Console output implementation
|
||||||
|
- `theme.rs` - Color themes for retro mode
|
||||||
|
|
||||||
|
**Execution modes**:
|
||||||
|
1. **Single-shot**: `g3 "task description"` - Execute one task and exit
|
||||||
|
2. **Interactive**: `g3` - REPL-style conversation (default)
|
||||||
|
3. **Autonomous**: `g3 --autonomous` - Coach-player feedback loop
|
||||||
|
4. **Accumulative**: Default interactive mode with autonomous runs
|
||||||
|
5. **Planning**: `g3 --planning` - Requirements-driven development
|
||||||
|
6. **Retro TUI**: `g3 --retro` - Full-screen terminal interface
|
||||||
|
|
||||||
|
### g3-config (Configuration)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-config/`
|
||||||
|
**Purpose**: TOML-based configuration management
|
||||||
|
|
||||||
|
Key structures:
|
||||||
|
- `Config` - Root configuration
|
||||||
|
- `ProvidersConfig` - Provider settings with named configs
|
||||||
|
- `AgentConfig` - Agent behavior settings
|
||||||
|
- `WebDriverConfig` - Browser automation settings
|
||||||
|
- `MacAxConfig` - macOS Accessibility API settings
|
||||||
|
|
||||||
|
**Configuration hierarchy** (highest priority last):
|
||||||
|
1. Default configuration
|
||||||
|
2. `~/.config/g3/config.toml`
|
||||||
|
3. `./g3.toml`
|
||||||
|
4. Environment variables (`G3_*`)
|
||||||
|
5. CLI arguments
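
"Highest priority last" means each later source overrides only the fields it actually sets. The sketch below illustrates that layering with a hypothetical `Layer` struct holding three optional fields; the real `Config` type in `g3-config` has more fields and may merge differently.

```rust
/// One configuration source; `None` means "this source does not set the field".
/// (Hypothetical type for illustration; not the real `Config` struct.)
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Layer {
    pub default_provider: Option<String>,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
}

impl Layer {
    /// Overlays `higher` on top of `self`: fields set in `higher` win.
    pub fn overlay(self, higher: Layer) -> Layer {
        Layer {
            default_provider: higher.default_provider.or(self.default_provider),
            max_tokens: higher.max_tokens.or(self.max_tokens),
            temperature: higher.temperature.or(self.temperature),
        }
    }
}

/// Folds sources given in lowest-to-highest priority order,
/// e.g. defaults, user config, project config, environment, CLI.
pub fn resolve(layers: Vec<Layer>) -> Layer {
    layers.into_iter().fold(Layer::default(), Layer::overlay)
}
```

Because unset fields fall through, a CLI argument that changes only the provider leaves `max_tokens` and `temperature` at whatever the lower-priority sources chose.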
|
||||||
|
|
||||||
|
### g3-execution (Code Execution)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-execution/`
|
||||||
|
**Purpose**: Safe execution of shell commands and scripts
|
||||||
|
|
||||||
|
Features:
|
||||||
|
- Streaming output capture
|
||||||
|
- Exit code tracking
|
||||||
|
- Async execution via Tokio
|
||||||
|
- Error handling and formatting
|
||||||
|
|
||||||
|
### g3-computer-control (Automation)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-computer-control/`
|
||||||
|
**Purpose**: Cross-platform computer control and automation
|
||||||
|
|
||||||
|
Key modules:
|
||||||
|
- `platform/` - Platform-specific implementations (macOS, Linux, Windows)
|
||||||
|
- `webdriver/` - Safari and Chrome WebDriver integration
|
||||||
|
- `ocr/` - Text extraction (Tesseract, Apple Vision)
|
||||||
|
- `macax/` - macOS Accessibility API controller
|
||||||
|
|
||||||
|
**Platform support**:
|
||||||
|
- **macOS**: Core Graphics, Cocoa, screencapture, Vision framework
|
||||||
|
- **Linux**: X11/Xtest for input
|
||||||
|
- **Windows**: Win32 APIs
|
||||||
|
|
||||||
|
### g3-planner (Planning Mode)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-planner/`
|
||||||
|
**Purpose**: Requirements-driven development workflow
|
||||||
|
|
||||||
|
Key modules:
|
||||||
|
- `planner.rs` - Main planning state machine (~40k chars)
|
||||||
|
- `state.rs` - Planning state management
|
||||||
|
- `git.rs` - Git operations
|
||||||
|
- `code_explore.rs` - Codebase exploration
|
||||||
|
- `llm.rs` - LLM interactions for planning
|
||||||
|
- `history.rs` - Planning history tracking
|
||||||
|
|
||||||
|
**Workflow**:
|
||||||
|
1. Write requirements in `<codepath>/g3-plan/new_requirements.md`
|
||||||
|
2. LLM refines requirements
|
||||||
|
3. Requirements renamed to `current_requirements.md`
|
||||||
|
4. Coach/player loop implements
|
||||||
|
5. Files archived with timestamps
|
||||||
|
6. Git commit with LLM-generated message
|
||||||
|
|
||||||
|
### g3-ensembles (Multi-Agent)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-ensembles/`
|
||||||
|
**Purpose**: Parallel multi-agent development (Flock mode)
|
||||||
|
|
||||||
|
Key modules:
|
||||||
|
- `flock.rs` - Flock orchestration (~43k chars)
|
||||||
|
- `status.rs` - Agent status tracking
|
||||||
|
|
||||||
|
Flock mode enables parallel development by spawning multiple agent instances working on different parts of a project.
|
||||||
|
|
||||||
|
### g3-console (Web Console)
|
||||||
|
|
||||||
|
**Location**: `crates/g3-console/`
|
||||||
|
**Purpose**: Web-based monitoring and control
|
||||||
|
|
||||||
|
Key modules:
|
||||||
|
- `main.rs` - Axum web server
|
||||||
|
- `api/` - REST API endpoints
|
||||||
|
- `process/` - Process detection and control
|
||||||
|
- `logs.rs` - Log parsing and streaming
|
||||||
|
|
||||||
|
## Data Flow
|
||||||
|
|
||||||
|
### Request Flow
|
||||||
|
|
||||||
|
```
|
||||||
|
User Input
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────┐
|
||||||
|
│ g3-cli │ Parse input, determine mode
|
||||||
|
└─────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────┐
|
||||||
|
│ g3-core │ Add to context window
|
||||||
|
│ Agent │ Build completion request
|
||||||
|
└─────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────┐
|
||||||
|
│ g3-providers│ Send to LLM provider
|
||||||
|
│ Registry │ Stream response
|
||||||
|
└─────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────┐
|
||||||
|
│ g3-core │ Parse streaming response
|
||||||
|
│ Parser │ Detect tool calls
|
||||||
|
└─────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────┐
|
||||||
|
│ g3-core │ Execute tools
|
||||||
|
│ Tools │ Return results
|
||||||
|
└─────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────┐
|
||||||
|
│ g3-core │ Add results to context
|
||||||
|
│ Agent │ Continue or complete
|
||||||
|
└─────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
### Context Window Management
|
||||||
|
|
||||||
|
The `ContextWindow` struct manages conversation history and tracks token usage:
|
||||||
|
|
||||||
|
1. **Token Tracking**: Monitors usage as percentage of provider's context limit
|
||||||
|
2. **Context Thinning**: At 50%, 60%, 70%, 80% thresholds, replaces large tool results with file references
|
||||||
|
3. **Auto-Summarization**: At 80% capacity, triggers conversation summarization
|
||||||
|
4. **Provider Adaptation**: Adjusts to different model context windows (4k to 200k+ tokens)
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
G3's error handling has four parts:
|
||||||
|
|
||||||
|
1. **Error Classification**: Distinguishes recoverable vs non-recoverable errors
|
||||||
|
2. **Automatic Retry**: Exponential backoff with jitter for:
|
||||||
|
- Rate limits (HTTP 429)
|
||||||
|
- Network errors
|
||||||
|
- Server errors (HTTP 5xx)
|
||||||
|
- Timeouts
|
||||||
|
3. **Error Logging**: Detailed logs saved to `logs/errors/`
|
||||||
|
4. **Graceful Degradation**: Continues when possible, fails gracefully when not
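
Exponential backoff with jitter, as listed above, can be sketched in a few lines. This is an illustration of the technique rather than the contents of `retry.rs`: `backoff_delay` and `is_retryable_status` are hypothetical names, the base, cap, and jitter values are up to the caller, and the random input is passed in so the function stays deterministic and testable.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at
/// `cap`, then scaled by a jitter factor in `[1 - jitter, 1 + jitter]`.
/// `unit_random` is expected to be in `[0.0, 1.0)` and supplied by the caller.
pub fn backoff_delay(
    attempt: u32,
    base: Duration,
    cap: Duration,
    jitter: f64,
    unit_random: f64,
) -> Duration {
    // Saturating arithmetic avoids overflow for large attempt numbers.
    let multiplier = 2u32.saturating_pow(attempt);
    let raw = base.saturating_mul(multiplier).min(cap);
    // Map unit_random from [0, 1) onto [-1, 1) and scale by the jitter fraction.
    let factor = 1.0 + jitter * (2.0 * unit_random - 1.0);
    raw.mul_f64(factor.max(0.0))
}

/// Whether an HTTP status falls in the retryable set listed above
/// (rate limits and server errors).
pub fn is_retryable_status(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}
```

The jitter spreads retries from concurrent requests so that they do not all hit a rate-limited provider at the same instant.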
|
||||||
|
|
||||||
|
## Session Management
|
||||||
|
|
||||||
|
Sessions are tracked in `.g3/sessions/<session_id>/`:
|
||||||
|
- `session.json` - Full conversation history and metadata
|
||||||
|
- `todo.g3.md` - Session-scoped TODO list
|
||||||
|
- Context summaries and thinned content
|
||||||
|
|
||||||
|
Legacy logs are stored in `logs/g3_session_*.json`.
|
||||||
|
|
||||||
|
## Extension Points
|
||||||
|
|
||||||
|
### Adding a New Tool
|
||||||
|
|
||||||
|
1. Add tool definition in `g3-core/src/tool_definitions.rs`
|
||||||
|
2. Implement handler in `g3-core/src/tools/`
|
||||||
|
3. Add dispatch case in `g3-core/src/tool_dispatch.rs`
|
||||||
|
4. Update system prompt if needed in `g3-core/src/prompts.rs`
|
||||||
|
|
||||||
|
### Adding a New Provider
|
||||||
|
|
||||||
|
1. Implement `LLMProvider` trait in `g3-providers/src/`
|
||||||
|
2. Add configuration struct in `g3-config/src/lib.rs`
|
||||||
|
3. Register provider in `g3-core/src/lib.rs` (in `new_with_mode_and_readme`)
|
||||||
|
4. Update documentation
|
||||||
|
|
||||||
|
### Adding a New Execution Mode
|
||||||
|
|
||||||
|
1. Add CLI arguments in `g3-cli/src/lib.rs`
|
||||||
|
2. Implement mode logic in the CLI
|
||||||
|
3. May require new agent methods in `g3-core`
|
||||||
|
|
||||||
|
## Key Files for Understanding
|
||||||
|
|
||||||
|
Start reading here:
|
||||||
|
|
||||||
|
1. `src/main.rs` - Entry point (trivial, delegates to g3-cli)
|
||||||
|
2. `crates/g3-cli/src/lib.rs` - CLI and execution modes
|
||||||
|
3. `crates/g3-core/src/lib.rs` - Agent implementation
|
||||||
|
4. `crates/g3-providers/src/lib.rs` - Provider trait and registry
|
||||||
|
5. `crates/g3-core/src/tool_definitions.rs` - Available tools
|
||||||
|
6. `crates/g3-config/src/lib.rs` - Configuration structures
|
||||||
|
7. `DESIGN.md` - Original design document
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
Key external dependencies:
|
||||||
|
|
||||||
|
- **tokio**: Async runtime
|
||||||
|
- **reqwest**: HTTP client for API calls
|
||||||
|
- **serde/serde_json**: Serialization
|
||||||
|
- **clap**: CLI argument parsing
|
||||||
|
- **tree-sitter**: Syntax-aware code search
|
||||||
|
- **llama_cpp**: Local model inference (with Metal acceleration)
|
||||||
|
- **fantoccini**: WebDriver client
|
||||||
|
- **axum**: Web framework (for g3-console)
|
||||||
385
docs/configuration.md
Normal file
@@ -0,0 +1,385 @@
|
|||||||
|
# G3 Configuration Guide
|
||||||
|
|
||||||
|
**Last updated**: January 2025
|
||||||
|
**Source of truth**: `crates/g3-config/src/lib.rs`, `config.example.toml`
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
This document explains how to configure G3, including provider setup, agent behavior, and optional features like WebDriver and computer control.
|
||||||
|
|
||||||
|
## Configuration File Location
|
||||||
|
|
||||||
|
G3 looks for configuration files in this order:
|
||||||
|
|
||||||
|
1. Path specified via `--config` CLI argument
|
||||||
|
2. `./g3.toml` (current directory)
|
||||||
|
3. `~/.config/g3/config.toml` (user config)
|
||||||
|
4. `~/.g3.toml` (legacy location)
|
||||||
|
|
||||||
|
If no configuration file exists, G3 creates a default one at `~/.config/g3/config.toml` on first run.
|
||||||
|
|
||||||
|
## Configuration Format
|
||||||
|
|
||||||
|
G3 uses TOML format. The configuration is organized into sections:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers] # LLM provider settings
|
||||||
|
[agent] # Agent behavior settings
|
||||||
|
[computer_control] # Mouse/keyboard automation
|
||||||
|
[webdriver] # Browser automation
|
||||||
|
[macax] # macOS Accessibility API
|
||||||
|
```
|
||||||
|
|
||||||
|
## Provider Configuration
|
||||||
|
|
||||||
|
### Provider Reference Format
|
||||||
|
|
||||||
|
Providers are referenced using the format: `<provider_type>.<config_name>`
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
- `anthropic.default`
|
||||||
|
- `databricks.production`
|
||||||
|
- `openai.gpt4`
|
||||||
|
- `embedded.local`
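
Splitting a reference of this form into its two parts is straightforward. The sketch below shows one way to do it; `ProviderRef` and `parse_provider_ref` are hypothetical names, and the real parsing in `g3-config` may validate differently (for example, by checking the type against the known providers).

```rust
/// A provider reference split into its two parts.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub struct ProviderRef<'a> {
    pub provider_type: &'a str,
    pub config_name: &'a str,
}

/// Parses `<provider_type>.<config_name>`, rejecting references where
/// either part is missing. Splits at the first dot only.
pub fn parse_provider_ref(s: &str) -> Option<ProviderRef<'_>> {
    let (provider_type, config_name) = s.split_once('.')?;
    if provider_type.is_empty() || config_name.is_empty() {
        return None;
    }
    Some(ProviderRef { provider_type, config_name })
}
```

For example, `anthropic.default` parses into type `anthropic` and name `default`, while a bare `anthropic` is rejected because it has no config name.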
|
||||||
|
|
||||||
|
### Basic Provider Setup
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers]
|
||||||
|
# Default provider used for all operations
|
||||||
|
default_provider = "anthropic.default"
|
||||||
|
|
||||||
|
# Optional: Different providers for different roles
|
||||||
|
# planner = "anthropic.planner" # Planning mode
|
||||||
|
# coach = "anthropic.default" # Code reviewer in autonomous mode
|
||||||
|
# player = "anthropic.default" # Code implementer in autonomous mode
|
||||||
|
```
|
||||||
|
|
||||||
|
### Anthropic Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers.anthropic.default]
|
||||||
|
api_key = "sk-ant-..." # Required: Your Anthropic API key
|
||||||
|
model = "claude-sonnet-4-5" # Model to use
|
||||||
|
max_tokens = 64000 # Max output tokens per request
|
||||||
|
temperature = 0.3 # Sampling temperature (0.0-1.0)
|
||||||
|
# cache_config = "ephemeral" # Optional: Enable prompt caching
|
||||||
|
# enable_1m_context = true # Optional: Enable 1M context (extra cost)
|
||||||
|
# thinking_budget_tokens = 10000 # Optional: Extended thinking mode
|
||||||
|
```
|
||||||
|
|
||||||
|
**Available Anthropic models**:
|
||||||
|
- `claude-sonnet-4-5` (recommended)
|
||||||
|
- `claude-opus-4-5`
|
||||||
|
- `claude-3-5-sonnet-20241022`
|
||||||
|
- `claude-3-opus-20240229`
|
||||||
|
|
||||||
|
### Databricks Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers.databricks.default]
|
||||||
|
host = "https://your-workspace.cloud.databricks.com" # Required
|
||||||
|
model = "databricks-claude-sonnet-4" # Model endpoint
|
||||||
|
max_tokens = 4096
|
||||||
|
temperature = 0.1
|
||||||
|
use_oauth = true # Use OAuth (recommended)
|
||||||
|
# token = "dapi..." # Or use personal access token
|
||||||
|
```
|
||||||
|
|
||||||
|
**OAuth vs Token Authentication**:
|
||||||
|
- **OAuth** (`use_oauth = true`): Opens browser for authentication, tokens refresh automatically
|
||||||
|
- **Token** (`token = "..."`, `use_oauth = false`): Uses personal access token directly
|
||||||
|
|
||||||
|
### OpenAI Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers.openai.default]
|
||||||
|
api_key = "sk-..." # Required: Your OpenAI API key
|
||||||
|
model = "gpt-4-turbo" # Model to use
|
||||||
|
max_tokens = 4096
|
||||||
|
temperature = 0.1
|
||||||
|
# base_url = "https://api.openai.com/v1" # Optional: Custom endpoint
|
||||||
|
```
|
||||||
|
|
||||||
|
### OpenAI-Compatible Providers
|
||||||
|
|
||||||
|
For services with OpenAI-compatible APIs (OpenRouter, Groq, Together, etc.):
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers.openai_compatible.openrouter]
|
||||||
|
api_key = "sk-or-..." # Provider's API key
|
||||||
|
model = "anthropic/claude-3.5-sonnet"
|
||||||
|
base_url = "https://openrouter.ai/api/v1"
|
||||||
|
max_tokens = 4096
|
||||||
|
temperature = 0.1
|
||||||
|
|
||||||
|
[providers.openai_compatible.groq]
|
||||||
|
api_key = "gsk_..."
|
||||||
|
model = "llama-3.3-70b-versatile"
|
||||||
|
base_url = "https://api.groq.com/openai/v1"
|
||||||
|
max_tokens = 4096
|
||||||
|
temperature = 0.1
|
||||||
|
```
|
||||||
|
|
||||||
|
Reference these as `openrouter.default` or `groq.default` in `default_provider`.
|
||||||
|
|
||||||
|
### Embedded (Local) Models
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers.embedded.default]
|
||||||
|
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
|
||||||
|
model_type = "qwen" # Model architecture
|
||||||
|
context_length = 32768 # Context window size
|
||||||
|
max_tokens = 2048 # Max output tokens
|
||||||
|
temperature = 0.1
|
||||||
|
gpu_layers = 32 # Layers to offload to GPU (Metal/CUDA)
|
||||||
|
threads = 8 # CPU threads for inference
|
||||||
|
```
|
||||||
|
|
||||||
|
**Supported model types**: `qwen`, `codellama`, `llama`, `mistral`
|
||||||
|
|
||||||
|
**Hardware requirements**:
|
||||||
|
- 4-16GB RAM depending on model size
|
||||||
|
- Optional GPU acceleration (Metal on macOS, CUDA on Linux)
|
||||||
|
|
||||||
|
## Agent Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[agent]
|
||||||
|
# Context and token settings
|
||||||
|
fallback_default_max_tokens = 8192 # Default max tokens if provider doesn't specify
|
||||||
|
# max_context_length = 200000 # Override context window size for all providers
|
||||||
|
|
||||||
|
# Behavior settings
|
||||||
|
enable_streaming = true # Stream responses in real-time
|
||||||
|
allow_multiple_tool_calls = true # Allow multiple tools per response
|
||||||
|
timeout_seconds = 60 # Request timeout
|
||||||
|
auto_compact = true # Auto-compact context at 90%
|
||||||
|
|
||||||
|
# Retry settings
|
||||||
|
max_retry_attempts = 3 # Retries for interactive mode
|
||||||
|
autonomous_max_retry_attempts = 6 # Retries for autonomous mode
|
||||||
|
|
||||||
|
# TODO management
|
||||||
|
check_todo_staleness = true # Warn about stale TODO items
|
||||||
|
```
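The `auto_compact` comment above implies a simple threshold check on context usage. The Python sketch below illustrates that idea only; the function name is hypothetical, and the inclusive comparison at exactly 90% is an assumption rather than confirmed G3 behavior.

```python
def should_compact(used_tokens: int, context_length: int, threshold: float = 0.9) -> bool:
    """Return True once context usage reaches the compaction threshold.

    Assumption: the threshold is inclusive; G3 may compare differently.
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    return used_tokens / context_length >= threshold
```

With a 200,000-token window, 190,000 used tokens would trigger compaction under this sketch, while 100,000 would not.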
|
||||||
|
|
||||||
|
### Retry Behavior
|
||||||
|
|
||||||
|
G3 automatically retries on recoverable errors:
|
||||||
|
- Rate limits (HTTP 429)
|
||||||
|
- Network errors
|
||||||
|
- Server errors (HTTP 5xx)
|
||||||
|
- Timeouts
|
||||||
|
|
||||||
|
- **Interactive mode** uses `max_retry_attempts` (default: 3)
- **Autonomous mode** uses `autonomous_max_retry_attempts` (default: 6) with longer delays
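The retry rules above can be sketched in Python as follows. The error classification and the two defaults come from this section. The exponential backoff schedule, the base delay, and all function names are assumptions made for illustration, not G3's actual implementation.

```python
from typing import List, Optional


def is_recoverable(status: Optional[int] = None,
                   timed_out: bool = False,
                   network_error: bool = False) -> bool:
    """Classify an error as retryable: 429, 5xx, timeouts, network errors."""
    if timed_out or network_error:
        return True
    if status is None:
        return False
    return status == 429 or 500 <= status <= 599


def max_attempts(agent_cfg: dict, autonomous: bool) -> int:
    """Pick the retry limit from the [agent] table, using the documented defaults."""
    if autonomous:
        return agent_cfg.get("autonomous_max_retry_attempts", 6)
    return agent_cfg.get("max_retry_attempts", 3)


def backoff_delays(attempts: int, base_seconds: float = 1.0) -> List[float]:
    """Assumed schedule: the delay doubles after each failed attempt."""
    return [base_seconds * (2 ** i) for i in range(attempts)]
```

Under these assumptions, autonomous mode could pass a larger `base_seconds` to get the longer delays mentioned above.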
|
||||||
|
|
||||||
|
## Computer Control Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[computer_control]
|
||||||
|
enabled = false # Set to true to enable
|
||||||
|
require_confirmation = true # Require confirmation before actions
|
||||||
|
max_actions_per_second = 5 # Rate limit for safety
|
||||||
|
```
|
||||||
|
|
||||||
|
**Required OS permissions**:
|
||||||
|
- **macOS**: System Settings → Privacy & Security → Accessibility (System Preferences → Security & Privacy → Privacy → Accessibility on macOS 12 and earlier)
|
||||||
|
- **Linux**: X11 or Wayland access
|
||||||
|
- **Windows**: Run as administrator (first time)
|
||||||
|
|
||||||
|
## WebDriver Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[webdriver]
|
||||||
|
enabled = false # Set to true to enable
|
||||||
|
browser = "safari" # "safari" or "chrome-headless"
|
||||||
|
safari_port = 4444 # Safari WebDriver port
|
||||||
|
chrome_port = 9515 # ChromeDriver port
|
||||||
|
# chrome_binary = "/path/to/chrome" # Optional: Custom Chrome path
|
||||||
|
```
|
||||||
|
|
||||||
|
### Safari Setup (macOS)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Enable Safari remote automation (one-time setup)
|
||||||
|
safaridriver --enable
|
||||||
|
|
||||||
|
# Or via Safari UI:
|
||||||
|
# Safari → Preferences → Advanced → Show Develop menu
|
||||||
|
# Develop → Allow Remote Automation
|
||||||
|
```
|
||||||
|
|
||||||
|
### Chrome Setup
|
||||||
|
|
||||||
|
**Option 1: Chrome for Testing (Recommended)**
|
||||||
|
```bash
|
||||||
|
./scripts/setup-chrome-for-testing.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Then configure:
|
||||||
|
```toml
|
||||||
|
[webdriver]
|
||||||
|
chrome_binary = "/Users/yourname/.chrome-for-testing/chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Option 2: System Chrome**
|
||||||
|
```bash
|
||||||
|
# macOS
|
||||||
|
brew install chromedriver
|
||||||
|
|
||||||
|
# Linux
|
||||||
|
apt install chromium-chromedriver
|
||||||
|
```
|
||||||
|
|
||||||
|
## macOS Accessibility API Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[macax]
|
||||||
|
enabled = false # Set to true to enable
|
||||||
|
```
|
||||||
|
|
||||||
|
**Required permissions**: System Settings → Privacy & Security → Accessibility → Add your terminal app (on macOS 12 and earlier: System Preferences → Security & Privacy → Privacy → Accessibility)
|
||||||
|
|
||||||
|
See [macOS Accessibility Tools Guide](macax-tools.md) for detailed usage.
|
||||||
|
|
||||||
|
## Multi-Role Configuration
|
||||||
|
|
||||||
|
For autonomous mode with different models for coach and player:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers]
|
||||||
|
default_provider = "anthropic.default"
|
||||||
|
coach = "anthropic.coach" # Code reviewer
|
||||||
|
player = "anthropic.player" # Code implementer
|
||||||
|
|
||||||
|
[providers.anthropic.coach]
|
||||||
|
api_key = "sk-ant-..."
|
||||||
|
model = "claude-sonnet-4-5"
|
||||||
|
max_tokens = 32000
|
||||||
|
temperature = 0.1 # Lower for consistent reviews
|
||||||
|
|
||||||
|
[providers.anthropic.player]
|
||||||
|
api_key = "sk-ant-..."
|
||||||
|
model = "claude-sonnet-4-5"
|
||||||
|
max_tokens = 64000
|
||||||
|
temperature = 0.3 # Higher for creative implementations
|
||||||
|
```
|
||||||
|
|
||||||
|
See `config.coach-player.example.toml` for a complete example.
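A short Python sketch of role-based provider selection follows. It assumes that a role without its own entry falls back to `default_provider`, which is implied by the comments in the `[providers]` table but not stated outright. The function name is hypothetical.

```python
def provider_for_role(providers_cfg: dict, role: str) -> str:
    """Return the provider reference for a role such as 'coach' or 'player'.

    Assumption: roles without an explicit entry use default_provider.
    """
    return providers_cfg.get(role) or providers_cfg["default_provider"]


providers_cfg = {
    "default_provider": "anthropic.default",
    "coach": "anthropic.coach",
    "player": "anthropic.player",
}
```

Here `provider_for_role(providers_cfg, "coach")` gives `anthropic.coach`, while an unset role such as `planner` resolves to `anthropic.default`.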
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
Environment variables override configuration file settings:
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
|----------|-------------|
|
||||||
|
| `G3_WORKSPACE_PATH` | Override workspace directory |
|
||||||
|
| `ANTHROPIC_API_KEY` | Anthropic API key |
|
||||||
|
| `OPENAI_API_KEY` | OpenAI API key |
|
||||||
|
| `DATABRICKS_HOST` | Databricks workspace URL |
|
||||||
|
| `DATABRICKS_TOKEN` | Databricks personal access token |
|
||||||
|
|
||||||
|
## CLI Overrides
|
||||||
|
|
||||||
|
CLI arguments have the highest priority:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Override provider
|
||||||
|
g3 --provider anthropic.default
|
||||||
|
|
||||||
|
# Override model
|
||||||
|
g3 --model claude-opus-4-5
|
||||||
|
|
||||||
|
# Enable features
|
||||||
|
g3 --webdriver # Enable WebDriver (Safari)
|
||||||
|
g3 --chrome-headless # Enable WebDriver (Chrome headless)
|
||||||
|
g3 --macax # Enable macOS Accessibility API
|
||||||
|
|
||||||
|
# Specify config file
|
||||||
|
g3 --config /path/to/config.toml
|
||||||
|
```
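Taken together with the environment variable section, the precedence order is CLI argument, then environment variable, then config file. The Python sketch below shows that ordering only; the `resolve_setting` helper and its parameters are hypothetical, and which config key each variable overrides is not specified here.

```python
import os
from typing import Mapping, Optional


def resolve_setting(cli_value: Optional[str],
                    env_name: str,
                    file_value: Optional[str],
                    environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the effective value: CLI first, then environment, then config file."""
    env = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    if env_name in env:
        return env[env_name]
    return file_value
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.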
|
||||||
|
|
||||||
|
## Complete Example Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
# ~/.config/g3/config.toml
|
||||||
|
|
||||||
|
[providers]
|
||||||
|
default_provider = "anthropic.default"
|
||||||
|
|
||||||
|
[providers.anthropic.default]
|
||||||
|
api_key = "sk-ant-api03-..."
|
||||||
|
model = "claude-sonnet-4-5"
|
||||||
|
max_tokens = 64000
|
||||||
|
temperature = 0.3
|
||||||
|
|
||||||
|
[providers.databricks.work]
|
||||||
|
host = "https://mycompany.cloud.databricks.com"
|
||||||
|
model = "databricks-claude-sonnet-4"
|
||||||
|
max_tokens = 4096
|
||||||
|
temperature = 0.1
|
||||||
|
use_oauth = true
|
||||||
|
|
||||||
|
[agent]
|
||||||
|
fallback_default_max_tokens = 8192
|
||||||
|
enable_streaming = true
|
||||||
|
allow_multiple_tool_calls = true
|
||||||
|
timeout_seconds = 60
|
||||||
|
max_retry_attempts = 3
|
||||||
|
autonomous_max_retry_attempts = 6
|
||||||
|
|
||||||
|
[computer_control]
|
||||||
|
enabled = false
|
||||||
|
require_confirmation = true
|
||||||
|
max_actions_per_second = 5
|
||||||
|
|
||||||
|
[webdriver]
|
||||||
|
enabled = true
|
||||||
|
browser = "safari"
|
||||||
|
safari_port = 4444
|
||||||
|
|
||||||
|
[macax]
|
||||||
|
enabled = false
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Old config format" error
|
||||||
|
|
||||||
|
If you see this error, your config uses a deprecated format. Update to the new named provider format:
|
||||||
|
|
||||||
|
**Old format** (deprecated):
|
||||||
|
```toml
|
||||||
|
[providers.anthropic]
|
||||||
|
api_key = "..."
|
||||||
|
```
|
||||||
|
|
||||||
|
**New format**:
|
||||||
|
```toml
|
||||||
|
[providers.anthropic.default]
|
||||||
|
api_key = "..."
|
||||||
|
```
|
||||||
|
|
||||||
|
### Provider not found
|
||||||
|
|
||||||
|
Ensure your `default_provider` matches a configured provider:
|
||||||
|
```toml
|
||||||
|
default_provider = "anthropic.default" # Must match [providers.anthropic.default]
|
||||||
|
```
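To check this before starting G3, a small script can verify that the reference resolves to a configured table. The Python sketch below works on an already-parsed config dictionary. The function name and messages are hypothetical, and the old-format detection is a heuristic, not G3's own check.

```python
from typing import Optional


def check_default_provider(config: dict) -> Optional[str]:
    """Return a problem description, or None when default_provider resolves."""
    providers = config.get("providers", {})
    ref = providers.get("default_provider")
    if not isinstance(ref, str) or "." not in ref:
        return "default_provider must look like '<provider_type>.<config_name>'"
    provider_type, _, config_name = ref.partition(".")
    type_table = providers.get(provider_type)
    if not isinstance(type_table, dict):
        return f"no [providers.{provider_type}.*] tables are configured"
    if isinstance(type_table.get(config_name), dict):
        return None
    # Heuristic (assumption): scalar values directly under the provider type
    # suggest the deprecated un-named layout, e.g. [providers.anthropic].
    if any(not isinstance(value, dict) for value in type_table.values()):
        return f"[providers.{provider_type}] looks like the old format; use [providers.{ref}]"
    return f"[providers.{ref}] is not configured"
```

On Python 3.11 or later, the dictionary can be produced with the standard-library `tomllib.load` on the config file opened in binary mode.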
|
||||||
|
|
||||||
|
### OAuth issues
|
||||||
|
|
||||||
|
For Databricks OAuth:
|
||||||
|
1. Ensure `use_oauth = true`
|
||||||
|
2. Remove any `token` setting
|
||||||
|
3. A browser window will open for authentication
|
||||||
|
4. Tokens are cached in `~/.databricks/oauth-tokens.json`
|
||||||
|
|
||||||
|
### Context window errors
|
||||||
|
|
||||||
|
If you see context overflow errors:
|
||||||
|
1. Check `max_context_length` in `[agent]`
|
||||||
|
2. Use `/compact` command to manually summarize
|
||||||
|
3. Use `/thinnify` to replace large tool results with file references
|
||||||
472
docs/macax-tools.md
Normal file
@@ -0,0 +1,472 @@
|
|||||||
|
# macOS Accessibility Tools Guide
|
||||||
|
|
||||||
|
**Last updated**: January 2025
|
||||||
|
**Source of truth**: `crates/g3-computer-control/src/macax/`
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
G3 includes tools for controlling macOS applications via the Accessibility API. This enables automation of native macOS apps, including those you're building with G3.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The macOS Accessibility API provides programmatic access to UI elements in any application. G3 exposes this through the `macax_*` tools, allowing you to:
|
||||||
|
|
||||||
|
- List and activate applications
|
||||||
|
- Inspect UI element hierarchies
|
||||||
|
- Find elements by role, title, or identifier
|
||||||
|
- Click buttons and interact with controls
|
||||||
|
- Read and set values in text fields
|
||||||
|
- Simulate keyboard input
|
||||||
|
|
||||||
|
## Setup
|
||||||
|
|
||||||
|
### 1. Enable in Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
# ~/.config/g3/config.toml
|
||||||
|
[macax]
|
||||||
|
enabled = true
|
||||||
|
```
|
||||||
|
|
||||||
|
Or use the CLI flag:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
g3 --macax
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Grant Accessibility Permissions
|
||||||
|
|
||||||
|
1. Open **System Settings** → **Privacy & Security** (on macOS 12 and earlier: **System Preferences** → **Security & Privacy** → **Privacy**)
|
||||||
|
2. Select **Accessibility** in the left sidebar
|
||||||
|
3. Click the lock icon and authenticate
|
||||||
|
4. Add your terminal application (Terminal, iTerm2, etc.)
|
||||||
|
5. Restart your terminal
|
||||||
|
|
||||||
|
**Note**: If using VS Code's integrated terminal, add VS Code to the list.
|
||||||
|
|
||||||
|
### 3. Verify Setup
|
||||||
|
|
||||||
|
```json
|
||||||
|
{"tool": "macax_list_apps", "args": {}}
|
||||||
|
```
|
||||||
|
|
||||||
|
This should return a list of running applications.
|
||||||
|
|
||||||
|
## Available Tools
|
||||||
|
|
||||||
|
### macax_list_apps
|
||||||
|
|
||||||
|
List all running applications.
|
||||||
|
|
||||||
|
**Parameters**: None
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```json
|
||||||
|
{"tool": "macax_list_apps", "args": {}}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Returns**:
|
||||||
|
```
|
||||||
|
Running Applications:
|
||||||
|
- Safari (com.apple.Safari)
|
||||||
|
- Finder (com.apple.finder)
|
||||||
|
- Terminal (com.apple.Terminal)
|
||||||
|
- MyApp (com.example.myapp)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_get_frontmost_app
|
||||||
|
|
||||||
|
Get the currently active (frontmost) application.
|
||||||
|
|
||||||
|
**Parameters**: None
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```json
|
||||||
|
{"tool": "macax_get_frontmost_app", "args": {}}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Returns**:
|
||||||
|
```
|
||||||
|
Frontmost Application: Safari (com.apple.Safari)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_activate_app
|
||||||
|
|
||||||
|
Bring an application to the front.
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `app_name` (string, required): Application name
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```json
|
||||||
|
{"tool": "macax_activate_app", "args": {"app_name": "Safari"}}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_get_ui_tree
|
||||||
|
|
||||||
|
Get the UI element hierarchy of an application.
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `app_name` (string, required): Application name
|
||||||
|
- `max_depth` (integer, optional): Maximum tree depth (default: 5)
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```json
|
||||||
|
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Returns**:
|
||||||
|
```
|
||||||
|
UI Tree for Calculator:
|
||||||
|
└── AXApplication "Calculator"
|
||||||
|
└── AXWindow "Calculator"
|
||||||
|
├── AXGroup
|
||||||
|
│ ├── AXButton "1" [id: digit_1]
|
||||||
|
│ ├── AXButton "2" [id: digit_2]
|
||||||
|
│ ├── AXButton "+" [id: add]
|
||||||
|
│ └── AXButton "=" [id: equals]
|
||||||
|
└── AXStaticText "0" [id: display]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Notes**:
|
||||||
|
- Use lower `max_depth` for complex apps to avoid overwhelming output
|
||||||
|
- Elements show role, title, and accessibility identifier (if set)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_find_elements
|
||||||
|
|
||||||
|
Find UI elements matching criteria.
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `app_name` (string, required): Application name
|
||||||
|
- `role` (string, optional): Element role (e.g., "button", "textField")
|
||||||
|
- `title` (string, optional): Element title/label
|
||||||
|
- `identifier` (string, optional): Accessibility identifier
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```json
|
||||||
|
{"tool": "macax_find_elements", "args": {
|
||||||
|
"app_name": "Safari",
|
||||||
|
"role": "button"
|
||||||
|
}}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Returns**:
|
||||||
|
```
|
||||||
|
Found 5 elements:
|
||||||
|
1. AXButton "Back" [id: BackButton]
|
||||||
|
2. AXButton "Forward" [id: ForwardButton]
|
||||||
|
3. AXButton "Reload" [id: ReloadButton]
|
||||||
|
4. AXButton "Share" [id: ShareButton]
|
||||||
|
5. AXButton "New Tab" [id: NewTabButton]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_click
|
||||||
|
|
||||||
|
Click a UI element.
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `app_name` (string, required): Application name
|
||||||
|
- `identifier` (string, optional): Accessibility identifier
|
||||||
|
- `title` (string, optional): Element title
|
||||||
|
- `role` (string, optional): Element role
|
||||||
|
|
||||||
|
At least one of `identifier`, `title`, or `role` must be provided.
|
||||||
|
|
||||||
|
**Examples**:
|
||||||
|
|
||||||
|
```json
|
||||||
|
// Click by identifier (most reliable)
|
||||||
|
{"tool": "macax_click", "args": {
|
||||||
|
"app_name": "Calculator",
|
||||||
|
"identifier": "digit_5"
|
||||||
|
}}
|
||||||
|
|
||||||
|
// Click by title
|
||||||
|
{"tool": "macax_click", "args": {
|
||||||
|
"app_name": "Calculator",
|
||||||
|
"title": "5"
|
||||||
|
}}
|
||||||
|
|
||||||
|
// Click by role and title
|
||||||
|
{"tool": "macax_click", "args": {
|
||||||
|
"app_name": "Safari",
|
||||||
|
"role": "button",
|
||||||
|
"title": "Reload"
|
||||||
|
}}
|
||||||
|
```
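When tool calls are generated programmatically, the "at least one selector" rule can be enforced before the call is sent. The Python sketch below builds the same payload shape as the examples above; the `click_call` helper is hypothetical and is not part of G3.

```python
from typing import Optional


def click_call(app_name: str,
               identifier: Optional[str] = None,
               title: Optional[str] = None,
               role: Optional[str] = None) -> dict:
    """Build a macax_click payload, requiring at least one selector."""
    candidates = (("identifier", identifier), ("title", title), ("role", role))
    selectors = {name: value for name, value in candidates if value is not None}
    if not selectors:
        raise ValueError("provide at least one of identifier, title, or role")
    return {"tool": "macax_click", "args": {"app_name": app_name, **selectors}}
```

Serializing the result with `json.dumps` gives a payload in the same form as the documented examples, without the comments.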
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_set_value
|
||||||
|
|
||||||
|
Set the value of a UI element (text fields, sliders, etc.).
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `app_name` (string, required): Application name
|
||||||
|
- `identifier` (string, optional): Accessibility identifier
|
||||||
|
- `title` (string, optional): Element title
|
||||||
|
- `value` (string, required): Value to set
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```json
|
||||||
|
{"tool": "macax_set_value", "args": {
|
||||||
|
"app_name": "TextEdit",
|
||||||
|
"role": "textArea",
|
||||||
|
"value": "Hello, World!"
|
||||||
|
}}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_get_value
|
||||||
|
|
||||||
|
Get the current value of a UI element.
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `app_name` (string, required): Application name
|
||||||
|
- `identifier` (string, optional): Accessibility identifier
|
||||||
|
- `title` (string, optional): Element title
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```json
|
||||||
|
{"tool": "macax_get_value", "args": {
|
||||||
|
"app_name": "Calculator",
|
||||||
|
"identifier": "display"
|
||||||
|
}}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Returns**:
|
||||||
|
```
|
||||||
|
Value: 42
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### macax_press_key
|
||||||
|
|
||||||
|
Simulate a key press.
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `key` (string, required): Key to press
|
||||||
|
- `modifiers` (array, optional): Modifier keys
|
||||||
|
|
||||||
|
**Supported modifiers**: `command`, `shift`, `option`, `control`
|
||||||
|
|
||||||
|
**Examples**:
|
||||||
|
|
||||||
|
```json
|
||||||
|
// Simple key press
|
||||||
|
{"tool": "macax_press_key", "args": {"key": "a"}}
|
||||||
|
|
||||||
|
// With modifiers (Cmd+S)
|
||||||
|
{"tool": "macax_press_key", "args": {
|
||||||
|
"key": "s",
|
||||||
|
"modifiers": ["command"]
|
||||||
|
}}
|
||||||
|
|
||||||
|
// Multiple modifiers (Cmd+Shift+N)
|
||||||
|
{"tool": "macax_press_key", "args": {
|
||||||
|
"key": "n",
|
||||||
|
"modifiers": ["command", "shift"]
|
||||||
|
}}
|
||||||
|
|
||||||
|
// Special keys
|
||||||
|
{"tool": "macax_press_key", "args": {"key": "return"}}
|
||||||
|
{"tool": "macax_press_key", "args": {"key": "escape"}}
|
||||||
|
{"tool": "macax_press_key", "args": {"key": "tab"}}
|
||||||
|
{"tool": "macax_press_key", "args": {"key": "delete"}}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Special key names**:
|
||||||
|
- `return`, `enter`
|
||||||
|
- `escape`, `esc`
|
||||||
|
- `tab`
|
||||||
|
- `delete`, `backspace`
|
||||||
|
- `space`
|
||||||
|
- `up`, `down`, `left`, `right`
|
||||||
|
- `home`, `end`, `pageup`, `pagedown`
|
||||||
|
- `f1` through `f12`
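A key-press payload can be validated against the supported modifier list before it is sent. This Python sketch is illustrative only: `press_key_call` is a hypothetical helper, and it checks modifiers but not key names.

```python
from typing import Sequence

SUPPORTED_MODIFIERS = ("command", "shift", "option", "control")


def press_key_call(key: str, modifiers: Sequence[str] = ()) -> dict:
    """Build a macax_press_key payload, rejecting unsupported modifiers."""
    unknown = [m for m in modifiers if m not in SUPPORTED_MODIFIERS]
    if unknown:
        raise ValueError(f"unsupported modifiers: {unknown}")
    args = {"key": key}
    if modifiers:
        # Only include the field when modifiers were actually requested.
        args["modifiers"] = list(modifiers)
    return {"tool": "macax_press_key", "args": args}
```

For instance, `press_key_call("n", ["command", "shift"])` produces the Cmd+Shift+N payload shown in the examples above.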
|
||||||
|
|
||||||
|
## Common Roles
|
||||||
|
|
||||||
|
| Role | Description |
|
||||||
|
|------|-------------|
|
||||||
|
| `button` | Clickable button |
|
||||||
|
| `textField` | Single-line text input |
|
||||||
|
| `textArea` | Multi-line text input |
|
||||||
|
| `checkbox` | Checkbox control |
|
||||||
|
| `radioButton` | Radio button |
|
||||||
|
| `popUpButton` | Dropdown/popup menu |
|
||||||
|
| `slider` | Slider control |
|
||||||
|
| `table` | Table view |
|
||||||
|
| `list` | List view |
|
||||||
|
| `outline` | Outline/tree view |
|
||||||
|
| `group` | Container group |
|
||||||
|
| `window` | Application window |
|
||||||
|
| `sheet` | Modal sheet |
|
||||||
|
| `dialog` | Dialog window |
|
||||||
|
| `staticText` | Non-editable text |
|
||||||
|
| `image` | Image element |
|
||||||
|
| `scrollArea` | Scrollable container |
|
||||||
|
| `toolbar` | Toolbar |
|
||||||
|
| `menuBar` | Menu bar |
|
||||||
|
| `menu` | Menu |
|
||||||
|
| `menuItem` | Menu item |
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
### 1. Use Accessibility Identifiers
|
||||||
|
|
||||||
|
When building apps you'll automate with G3, add accessibility identifiers:
|
||||||
|
|
||||||
|
**SwiftUI**:
|
||||||
|
```swift
|
||||||
|
Button("Submit") { ... }
|
||||||
|
.accessibilityIdentifier("submit_button")
|
||||||
|
```
|
||||||
|
|
||||||
|
**UIKit**:
|
||||||
|
```swift
|
||||||
|
button.accessibilityIdentifier = "submit_button"
|
||||||
|
```
|
||||||
|
|
||||||
|
**AppKit**:
|
||||||
|
```swift
|
||||||
|
button.setAccessibilityIdentifier("submit_button")
|
||||||
|
```
|
||||||
|
|
||||||
|
Identifiers are more reliable than titles (which may be localized).
|
||||||
|
|
||||||
|
### 2. Inspect Before Automating
|
||||||
|
|
||||||
|
Always inspect the UI tree first:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{"tool": "macax_get_ui_tree", "args": {"app_name": "MyApp", "max_depth": 4}}
|
||||||
|
```
|
||||||
|
|
||||||
|
This helps you understand:
|
||||||
|
- Element hierarchy
|
||||||
|
- Available identifiers
|
||||||
|
- Correct role names
|
||||||
|
|
||||||
|
### 3. Activate App First
|
||||||
|
|
||||||
|
Some actions require the app to be frontmost:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{"tool": "macax_activate_app", "args": {"app_name": "MyApp"}}
|
||||||
|
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "button1"}}
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. Handle Timing
|
||||||
|
|
||||||
|
UI updates may take time. If an element isn't found:
|
||||||
|
1. Wait briefly
|
||||||
|
2. Retry the operation
|
||||||
|
3. Check if the app state changed
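The wait-and-retry steps above can be sketched as a small polling loop in Python. The `find` argument stands in for whatever issues the `macax_find_elements` call and is an assumption, as are the default attempt count and delay.

```python
import time
from typing import Callable, Optional


def wait_for_element(find: Callable[[], list],
                     attempts: int = 5,
                     delay_seconds: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[list]:
    """Call find() until it returns a non-empty result or attempts run out."""
    for attempt in range(attempts):
        result = find()
        if result:
            return result
        if attempt < attempts - 1:
            # Give the UI time to update before looking again.
            sleep(delay_seconds)
    return None
```

A `None` result after all attempts is the cue to re-inspect the UI tree, since the app state may have changed.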
|
||||||
|
|
||||||
|
### 5. Prefer Identifiers Over Titles
|
||||||
|
|
||||||
|
```json
|
||||||
|
// Good: Uses identifier
|
||||||
|
{"tool": "macax_click", "args": {"app_name": "MyApp", "identifier": "save_btn"}}
|
||||||
|
|
||||||
|
// Less reliable: Uses title (may be localized)
|
||||||
|
{"tool": "macax_click", "args": {"app_name": "MyApp", "title": "Save"}}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example: Automating Calculator
|
||||||
|
|
||||||
|
```json
|
||||||
|
// 1. Activate Calculator
|
||||||
|
{"tool": "macax_activate_app", "args": {"app_name": "Calculator"}}
|
||||||
|
|
||||||
|
// 2. Inspect UI
|
||||||
|
{"tool": "macax_get_ui_tree", "args": {"app_name": "Calculator", "max_depth": 3}}
|
||||||
|
|
||||||
|
// 3. Click "5"
|
||||||
|
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "5"}}
|
||||||
|
|
||||||
|
// 4. Click "+"
|
||||||
|
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "+"}}
|
||||||
|
|
||||||
|
// 5. Click "3"
|
||||||
|
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "3"}}
|
||||||
|
|
||||||
|
// 6. Click "="
|
||||||
|
{"tool": "macax_click", "args": {"app_name": "Calculator", "title": "="}}
|
||||||
|
|
||||||
|
// 7. Read result
|
||||||
|
{"tool": "macax_get_value", "args": {"app_name": "Calculator", "identifier": "display"}}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Accessibility permission denied"
|
||||||
|
|
||||||
|
1. Check System Settings → Privacy & Security → Accessibility (System Preferences → Security & Privacy → Privacy → Accessibility on macOS 12 and earlier)
|
||||||
|
2. Ensure your terminal app is listed and checked
|
||||||
|
3. Restart the terminal after granting permission
|
||||||
|
|
||||||
|
### "Application not found"
|
||||||
|
|
||||||
|
1. Use exact app name (case-sensitive)
|
||||||
|
2. Run `macax_list_apps` to see available apps
|
||||||
|
3. App must be running
|
||||||
|
|
||||||
|
### "Element not found"
|
||||||
|
|
||||||
|
1. Inspect UI tree to verify element exists
|
||||||
|
2. Check identifier/title spelling
|
||||||
|
3. Element may be in a different window or sheet
|
||||||
|
4. App state may have changed
|
||||||
|
|
||||||
|
### "Cannot perform action"
|
||||||
|
|
||||||
|
1. Element may be disabled
|
||||||
|
2. App may need to be frontmost
|
||||||
|
3. Element may not support the action
|
||||||
|
4. Check element role supports the operation
|
||||||
|
|
||||||
|
### Slow Performance
|
||||||
|
|
||||||
|
1. Reduce `max_depth` in `macax_get_ui_tree`
|
||||||
|
2. Use specific identifiers instead of searching
|
||||||
|
3. Complex apps have large UI trees
|
||||||
|
|
||||||
|
## Comparison with Other Tools
|
||||||
|
|
||||||
|
| Feature | macax | Vision Tools | WebDriver |
|
||||||
|
|---------|-------|--------------|----------|
|
||||||
|
| Native apps | ✅ | ✅ (via OCR) | ❌ |
|
||||||
|
| Web browsers | ✅ | ✅ | ✅ |
|
||||||
|
| Electron apps | ✅ | ✅ | Partial |
|
||||||
|
| Reliability | High | Medium | High |
|
||||||
|
| Setup | Permissions | None | Driver |
|
||||||
|
| Speed | Fast | Slower | Medium |
|
||||||
|
|
||||||
|
**Use macax when**:
|
||||||
|
- Automating native macOS apps
|
||||||
|
- You control the app and can add identifiers
|
||||||
|
- Need reliable, fast automation
|
||||||
|
|
||||||
|
**Use Vision tools when**:
|
||||||
|
- App doesn't expose accessibility
|
||||||
|
- Need to find text visually
|
||||||
|
- Cross-platform approach needed
|
||||||
|
|
||||||
|
**Use WebDriver when**:
|
||||||
|
- Automating web content
|
||||||
|
- Need JavaScript execution
|
||||||
|
- Testing web applications
|
||||||
408
docs/providers.md
Normal file
@@ -0,0 +1,408 @@
|
|||||||
|
# G3 LLM Providers Guide
|
||||||
|
|
||||||
|
**Last updated**: January 2025
|
||||||
|
**Source of truth**: `crates/g3-providers/src/`
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
This document describes the LLM providers supported by G3, their capabilities, and how to choose between them.
|
||||||
|
|
||||||
|
## Provider Overview
|
||||||
|
|
||||||
|
| Provider | Type | Tool Calling | Cache Control | Context Window | Best For |
|
||||||
|
|----------|------|--------------|---------------|----------------|----------|
|
||||||
|
| **Anthropic** | Cloud | Native | Yes | 200k (1M optional) | General use, complex tasks |
|
||||||
|
| **Databricks** | Cloud | Native | Yes (Claude models) | Varies | Enterprise, existing Databricks users |
|
||||||
|
| **OpenAI** | Cloud | Native | No | 128k | GPT model preference |
|
||||||
|
| **OpenAI-Compatible** | Cloud | Native | No | Varies | OpenRouter, Groq, Together, etc. |
|
||||||
|
| **Embedded** | Local | JSON fallback | No | 4k-32k | Privacy, offline, cost savings |
|
||||||
|
|
||||||
|
## Anthropic
|
||||||
|
|
||||||
|
**Location**: `crates/g3-providers/src/anthropic.rs`
|
||||||
|
|
||||||
|
### Features
|
||||||
|
|
||||||
|
- **Native tool calling**: Full support for structured tool calls
|
||||||
|
- **Prompt caching**: Reduce costs with ephemeral caching
|
||||||
|
- **Extended context**: Optional 1M token context (additional cost)
|
||||||
|
- **Extended thinking**: Budget tokens for complex reasoning
|
||||||
|
- **Streaming**: Real-time response streaming
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers.anthropic.default]
|
||||||
|
api_key = "sk-ant-api03-..." # Required
|
||||||
|
model = "claude-sonnet-4-5" # Model name
|
||||||
|
max_tokens = 64000 # Max output tokens
|
||||||
|
temperature = 0.3 # 0.0-1.0
|
||||||
|
cache_config = "ephemeral" # Optional: Enable caching
|
||||||
|
enable_1m_context = true # Optional: 1M context
|
||||||
|
thinking_budget_tokens = 10000 # Optional: Extended thinking
|
||||||
|
```
|
||||||
|
|
||||||
|
### Available Models
|
||||||
|
|
||||||
|
| Model | Context | Best For |
|
||||||
|
|-------|---------|----------|
|
||||||
|
| `claude-sonnet-4-5` | 200k | Balanced performance/cost |
|
||||||
|
| `claude-opus-4-5` | 200k | Complex reasoning |
|
||||||
|
| `claude-3-5-sonnet-20241022` | 200k | Previous generation |
|
||||||
|
| `claude-3-opus-20240229` | 200k | Previous generation |
|
||||||
|
|
||||||
|
### Prompt Caching
|
||||||
|
|
||||||
|
Enable caching to reduce costs for repeated context:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
cache_config = "ephemeral"  # Short-lived cache (Anthropic's default TTL is 5 minutes, refreshed on each hit)
|
||||||
|
```
|
||||||
|
|
||||||
|
Caching is applied to:
|
||||||
|
- System prompts
|
||||||
|
- README/AGENTS.md content
|
||||||
|
- Large tool results
|
||||||
|
|
||||||
|
### Extended Thinking
|
||||||
|
|
||||||
|
For complex tasks requiring step-by-step reasoning:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
thinking_budget_tokens = 10000 # Tokens for internal reasoning
|
||||||
|
```
|
||||||
|
|
||||||
|
The model uses these tokens for planning before responding.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Databricks
|
||||||
|
|
||||||
|
**Location**: `crates/g3-providers/src/databricks.rs`
|
||||||
|
|
||||||
|
### Features
|
||||||
|
|
||||||
|
- **Foundation Model APIs**: Access to various models
|
||||||
|
- **OAuth authentication**: Secure browser-based auth
|
||||||
|
- **Token authentication**: Personal access tokens
|
||||||
|
- **Enterprise integration**: Works with existing Databricks setup
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[providers.databricks.default]
|
||||||
|
host = "https://your-workspace.cloud.databricks.com"
|
||||||
|
model = "databricks-claude-sonnet-4"
|
||||||
|
max_tokens = 4096
|
||||||
|
temperature = 0.1
|
||||||
|
use_oauth = true # Recommended
|
||||||
|
# token = "dapi..." # Alternative: PAT
|
||||||
|
```
|
||||||
|
|
||||||
|
### Authentication
|
||||||
|
|
||||||
|
**OAuth (Recommended)**:
|
||||||
|
1. Set `use_oauth = true`
|
||||||
|
2. On first run, browser opens for authentication
|
||||||
|
3. Tokens are cached in `~/.databricks/oauth-tokens.json`
|
||||||
|
4. Tokens refresh automatically
|
||||||
|
|
||||||
|
**Personal Access Token**:
|
||||||
|
1. Generate token in Databricks workspace
|
||||||
|
2. Set `token = "dapi..."` and `use_oauth = false`
|
||||||
|
|
||||||
|
### Available Models
|
||||||
|
|
||||||
|
Models depend on your Databricks workspace configuration:
|
||||||
|
- `databricks-claude-sonnet-4` (Claude via Databricks)
|
||||||
|
- `databricks-meta-llama-3-1-70b-instruct`
|
||||||
|
- `databricks-dbrx-instruct`
|
||||||
|
- Custom fine-tuned models
|
||||||
|
|
||||||
|
---

## OpenAI

**Location**: `crates/g3-providers/src/openai.rs`

### Features

- **Native tool calling**: Full support
- **Custom endpoints**: Override base URL
- **Streaming**: Real-time responses

### Configuration

```toml
[providers.openai.default]
api_key = "sk-..."  # Required
model = "gpt-4-turbo"  # Model name
max_tokens = 4096
temperature = 0.1
# base_url = "https://api.openai.com/v1"  # Optional
```

### Available Models

| Model | Context | Notes |
|-------|---------|-------|
| `gpt-4-turbo` | 128k | Latest GPT-4 |
| `gpt-4o` | 128k | Optimized GPT-4 |
| `gpt-4` | 8k | Original GPT-4 |
| `gpt-3.5-turbo` | 16k | Faster, cheaper |

---

## OpenAI-Compatible Providers

**Location**: `crates/g3-providers/src/openai.rs` (reuses OpenAI implementation)

For services that implement the OpenAI API format.

### Configuration

```toml
# OpenRouter
[providers.openai_compatible.openrouter]
api_key = "sk-or-..."
model = "anthropic/claude-3.5-sonnet"
base_url = "https://openrouter.ai/api/v1"
max_tokens = 4096
temperature = 0.1

# Groq
[providers.openai_compatible.groq]
api_key = "gsk_..."
model = "llama-3.3-70b-versatile"
base_url = "https://api.groq.com/openai/v1"
max_tokens = 4096
temperature = 0.1

# Together
[providers.openai_compatible.together]
api_key = "..."
model = "meta-llama/Llama-3-70b-chat-hf"
base_url = "https://api.together.xyz/v1"
max_tokens = 4096
temperature = 0.1
```

### Supported Services

- **OpenRouter**: Access to many models through one API
- **Groq**: Fast inference for Llama models
- **Together**: Open-source model hosting
- **Anyscale**: Scalable model serving
- **Local servers**: Ollama, vLLM, text-generation-inference

---

## Embedded (Local Models)

**Location**: `crates/g3-providers/src/embedded.rs`

### Features

- **Completely local**: No data leaves your machine
- **Offline capable**: Works without internet
- **GPU acceleration**: Metal (macOS), CUDA (Linux)
- **No API costs**: Free after model download

### Configuration

```toml
[providers.embedded.default]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q4_k_m.gguf"
model_type = "qwen"  # Model architecture
context_length = 32768  # Context window
max_tokens = 2048  # Max output
temperature = 0.1
gpu_layers = 32  # GPU offload (0 = CPU only)
threads = 8  # CPU threads
```

### Supported Model Types

| Type | Models | Notes |
|------|--------|-------|
| `qwen` | Qwen 2.5 series | Good coding ability |
| `codellama` | Code Llama | Specialized for code |
| `llama` | Llama 2/3 | General purpose |
| `mistral` | Mistral/Mixtral | Efficient |

### Model Download

Download GGUF models from Hugging Face:

```bash
mkdir -p ~/.cache/g3/models
cd ~/.cache/g3/models

# Example: Qwen 2.5 7B
wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf
```

### Hardware Requirements

| Model Size | RAM Required | GPU VRAM | Notes |
|------------|--------------|----------|-------|
| 7B Q4 | 6GB | 4GB | Good for most tasks |
| 7B Q8 | 10GB | 8GB | Better quality |
| 13B Q4 | 10GB | 8GB | More capable |
| 70B Q4 | 48GB | 40GB | Requires high-end hardware |
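
As a rough cross-check on the table above, the size of the quantized weights alone can be estimated as parameters times bits per weight divided by eight. The helper below is a hypothetical sketch, not part of G3. It uses the nominal bit width (4 for Q4, 8 for Q8), although real GGUF quantizations store somewhat more than their nominal width.

```rust
/// Rough size of the quantized weights alone, in gigabytes (10^9 bytes).
///
/// `params_billions` is the parameter count in billions and
/// `bits_per_weight` is the nominal quantization width (4 for Q4, 8 for Q8).
/// Hypothetical helper for illustration only; G3 does not expose this.
pub fn approx_weights_gb(params_billions: f64, bits_per_weight: f64) -> f64 {
    // billions of parameters * bits per parameter / 8 bits per byte
    params_billions * bits_per_weight / 8.0
}
```

For a 7B model at Q4 this gives about 3.5 GB. The RAM figures in the table are higher, presumably because they also leave room for the context (KV cache), runtime buffers, and the rest of the system.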

### GPU Acceleration

**macOS (Metal)**:
```toml
gpu_layers = 32  # Offload layers to GPU
```

**Linux (CUDA)**:
Requires the CUDA toolkit to be installed.

**CPU Only**:
```toml
gpu_layers = 0
threads = 8  # Use more threads
```

### Tool Calling

Embedded models don't have native tool calling. G3 uses a JSON fallback:
1. The system prompt includes tool definitions as JSON
2. The model emits tool calls as JSON in its response
3. G3 parses the JSON and executes the tools

This works but is less reliable than native tool calling.
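
The parsing step of the fallback can be illustrated with a small sketch. It is not G3's streaming parser. `extract_first_json_object` is a hypothetical helper that pulls the first balanced `{...}` object out of free-form model text so it can be handed to a JSON parser.

```rust
/// Return the first balanced `{ ... }` object found in `text`, if any.
///
/// Hypothetical helper for illustration; G3's real parser is not shown here.
/// Braces inside JSON string literals are ignored, including escaped quotes.
pub fn extract_first_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    // All delimiters are ASCII, so scanning bytes is safe for UTF-8 input.
    for (offset, byte) in text.as_bytes()[start..].iter().enumerate() {
        if in_string {
            if escaped {
                escaped = false; // the escaped character is consumed
            } else if *byte == b'\\' {
                escaped = true;
            } else if *byte == b'"' {
                in_string = false;
            }
            continue;
        }
        match *byte {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None // the object never closed
}
```

A stray `{` in the prose before the tool call would defeat this sketch, which is one reason the fallback is less reliable than native tool calling.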

---

## Provider Selection Guide

### By Use Case

| Use Case | Recommended Provider |
|----------|---------------------|
| General coding tasks | Anthropic (Claude Sonnet) |
| Complex reasoning | Anthropic (Claude Opus) |
| Enterprise/compliance | Databricks |
| Cost-sensitive | Embedded or Groq |
| Privacy-critical | Embedded |
| Offline development | Embedded |
| Fast iteration | Groq (Llama) |
| Model variety | OpenRouter |

### By Priority

**Quality first**: Anthropic Claude Opus/Sonnet
- Best reasoning and coding ability
- Native tool calling
- Prompt caching for efficiency

**Cost first**: Embedded or OpenAI-compatible
- Embedded: Free after download
- Groq: Very cheap, fast
- OpenRouter: Pay-per-use, many options

**Privacy first**: Embedded
- Data never leaves your machine
- No API calls
- Full control

**Speed first**: Groq or Embedded with GPU
- Groq: Extremely fast inference
- Embedded with Metal/CUDA: Low latency

---

## Provider Trait

All providers implement the `LLMProvider` trait:

```rust
#[async_trait]
pub trait LLMProvider: Send + Sync {
    /// Generate a completion
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse>;

    /// Stream a completion
    async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream>;

    /// Provider name (e.g., "anthropic.default")
    fn name(&self) -> &str;

    /// Model name (e.g., "claude-sonnet-4-5")
    fn model(&self) -> &str;

    /// Whether provider supports native tool calling
    fn has_native_tool_calling(&self) -> bool;

    /// Whether provider supports cache control
    fn supports_cache_control(&self) -> bool;

    /// Configured max tokens
    fn max_tokens(&self) -> u32;

    /// Configured temperature
    fn temperature(&self) -> f32;
}
```

---

## Adding a New Provider

1. Create `crates/g3-providers/src/newprovider.rs`
2. Implement the `LLMProvider` trait
3. Add a configuration struct to `crates/g3-config/src/lib.rs`
4. Register in `crates/g3-core/src/lib.rs` (`new_with_mode_and_readme`)
5. Export from `crates/g3-providers/src/lib.rs`
6. Update documentation
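
To give a feel for step 2, here is a minimal sketch of the accessor half of a provider. `ProviderInfo` is a hypothetical stand-in that mirrors only the synchronous methods of `LLMProvider`. The async `complete` and `stream` methods are left out because they depend on `async_trait` and on request/response types from `g3-providers` that are not shown in this document. `EchoProvider` and its field values are invented for illustration.

```rust
/// Hypothetical stand-in for the synchronous part of `LLMProvider`.
pub trait ProviderInfo: Send + Sync {
    fn name(&self) -> &str;
    fn model(&self) -> &str;
    fn has_native_tool_calling(&self) -> bool;
    fn supports_cache_control(&self) -> bool;
    fn max_tokens(&self) -> u32;
    fn temperature(&self) -> f32;
}

/// Invented example provider; not part of G3.
pub struct EchoProvider {
    name: String,
    model: String,
    max_tokens: u32,
    temperature: f32,
}

impl EchoProvider {
    pub fn new(model: &str, max_tokens: u32, temperature: f32) -> Self {
        Self {
            // Follows the "<provider>.<config name>" shape used in the trait docs.
            name: "echo.default".to_string(),
            model: model.to_string(),
            max_tokens,
            temperature,
        }
    }
}

impl ProviderInfo for EchoProvider {
    fn name(&self) -> &str {
        &self.name
    }
    fn model(&self) -> &str {
        &self.model
    }
    // A provider without native tool calling relies on the JSON fallback.
    fn has_native_tool_calling(&self) -> bool {
        false
    }
    fn supports_cache_control(&self) -> bool {
        false
    }
    fn max_tokens(&self) -> u32 {
        self.max_tokens
    }
    fn temperature(&self) -> f32 {
        self.temperature
    }
}
```

Because the struct holds only owned `String` and numeric fields, it is `Send + Sync` automatically, which satisfies the trait bound without extra work.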

---

## Troubleshooting

### Authentication Errors

**Anthropic**: Verify that the API key starts with `sk-ant-`

**Databricks OAuth**:
- Delete `~/.databricks/oauth-tokens.json` and re-authenticate
- Ensure the workspace URL is correct

**OpenAI**: Verify the API key and check billing status

### Rate Limits

G3 automatically retries on rate limits with exponential backoff.

To reduce rate limit issues:
- Use prompt caching (Anthropic)
- Reduce `max_tokens`
- Use a provider with higher limits
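
The retry behavior mentioned above follows the usual exponential backoff pattern, where the wait doubles after each failed attempt up to a cap. The sketch below shows only the delay calculation. The function name, the base delay, and the cap are assumptions for illustration, since G3's actual values are not documented here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
///
/// Hypothetical helper; G3's real retry parameters are not specified in this document.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing on large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 500 ms base and a 30 s cap, the delays run 0.5 s, 1 s, 2 s, 4 s, and so on until they reach the cap. Production retry loops often add random jitter on top of this, which the sketch omits.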

### Context Window Errors

If you see "context too long" errors:
1. Use `/compact` to summarize the conversation
2. Use `/thinnify` to replace large tool results
3. Increase `max_context_length` in config
4. Switch to a provider with a larger context window

### Embedded Model Issues

**Model not loading**:
- Verify `model_path` is correct
- Check file permissions
- Ensure enough RAM

**Slow inference**:
- Increase `gpu_layers` for GPU offload
- Reduce `context_length`
- Use a smaller quantization (e.g., Q4 instead of Q8)

**Poor tool calling**:
- Embedded models use a JSON fallback
- Consider a cloud provider for complex tool use
538
docs/tools.md
Normal file
@@ -0,0 +1,538 @@

# G3 Tools Reference

**Last updated**: January 2025
**Source of truth**: `crates/g3-core/src/tool_definitions.rs`, `crates/g3-core/src/tools/`

## Purpose

This document describes all tools available to the G3 agent. Tools are the primary mechanism by which G3 interacts with the filesystem, executes commands, and automates tasks.

## Tool Categories

| Category | Tools | Enabled By |
|----------|-------|------------|
| **Core** | shell, read_file, write_file, str_replace, final_output, background_process | Always |
| **Images** | read_image, take_screenshot, extract_text | Always |
| **Task Management** | todo_read, todo_write | Always |
| **Code Intelligence** | code_search, code_coverage | Always |
| **WebDriver** | webdriver_* (12 tools) | `--webdriver` or `--chrome-headless` |
| **Vision** | vision_find_text, vision_click_text, vision_click_near_text | Always (macOS) |
| **macOS Accessibility** | macax_* (9 tools) | `--macax` |
| **Computer Control** | mouse_click, type_text, find_element, list_windows | `computer_control.enabled = true` |

---

## Core Tools

### shell

Execute shell commands.

**Parameters**:
- `command` (string, required): The shell command to execute

**Example**:
```json
{"tool": "shell", "args": {"command": "ls -la"}}
```

**Notes**:
- Commands run in the current working directory
- Output is streamed in real-time
- Both stdout and stderr are captured
- Exit code is reported

---

### background_process

Launch a long-running process in the background.

**Parameters**:
- `name` (string, required): Unique name for the process (e.g., "game_server")
- `command` (string, required): Shell command to execute
- `working_dir` (string, optional): Working directory

**Example**:
```json
{"tool": "background_process", "args": {"name": "dev_server", "command": "npm run dev"}}
```

**Returns**: PID and log file path

**Notes**:
- Process runs independently of the agent
- Logs are captured to a file
- Use `shell` to read logs (`tail`), check status (`ps`), or stop (`kill`)

---

### read_file

Read file contents with an optional character range.

**Parameters**:
- `file_path` (string, required): Path to the file
- `start` (integer, optional): Starting character position (0-indexed, inclusive)
- `end` (integer, optional): Ending character position (0-indexed, exclusive)

**Example**:
```json
{"tool": "read_file", "args": {"file_path": "src/main.rs", "start": 0, "end": 1000}}
```

**Notes**:
- For image files (png, jpg, gif, etc.), automatically extracts text using OCR
- Supports tilde expansion (`~`)
- Reports file size and line count
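
The `start`/`end` semantics (0-indexed, inclusive start, exclusive end) can be sketched as a half-open range over characters. The helper below is hypothetical, and it assumes positions count Unicode scalar values rather than bytes, which this document does not confirm.

```rust
/// Return the characters of `text` in the half-open range `[start, end)`.
///
/// Hypothetical helper for illustration. Assumes positions are counted in
/// `char`s (Unicode scalar values); out-of-range values are clamped.
pub fn char_range(text: &str, start: Option<usize>, end: Option<usize>) -> String {
    let start = start.unwrap_or(0);
    match end {
        // `end` is exclusive, so the slice holds `end - start` characters at most.
        Some(end) => text
            .chars()
            .skip(start)
            .take(end.saturating_sub(start))
            .collect(),
        None => text.chars().skip(start).collect(),
    }
}
```

Under these assumptions, `start: 0, end: 1000` as in the example returns the first 1000 characters, and a follow-up call with `start: 1000` continues without overlap.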

---

### read_image

Read image files for visual analysis by the LLM.

**Parameters**:
- `file_paths` (array of strings, required): Paths to image files

**Example**:
```json
{"tool": "read_image", "args": {"file_paths": ["screenshot.png", "diagram.jpg"]}}
```

**Supported formats**: PNG, JPEG, GIF, WebP

**Notes**:
- Images are sent to the LLM for visual analysis
- Use for inspecting sprites, UI screenshots, diagrams, etc.
- Different from `extract_text` which only does OCR

---

### write_file

Create or overwrite a file.

**Parameters**:
- `file_path` (string, required): Path to the file
- `content` (string, required): Content to write

**Example**:
```json
{"tool": "write_file", "args": {"file_path": "hello.txt", "content": "Hello, world!"}}
```

**Notes**:
- Creates parent directories if needed
- Overwrites existing files
- Reports bytes written

---

### str_replace

Apply a unified diff to a file.

**Parameters**:
- `file_path` (string, required): Path to the file
- `diff` (string, required): Unified diff with context lines
- `start` (integer, optional): Starting character position to constrain search
- `end` (integer, optional): Ending character position to constrain search

**Example**:
```json
{"tool": "str_replace", "args": {
  "file_path": "src/main.rs",
  "diff": "@@ -10,3 +10,4 @@\n fn main() {\n println!(\"Hello\");\n+ println!(\"World\");\n }"
}}
```

**Notes**:
- Supports multiple hunks
- Context lines help locate the correct position
- Use `start`/`end` to disambiguate when multiple matches exist
- `---/+++` headers are optional for minimal diffs
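
Each hunk in a unified diff starts with a header such as `@@ -10,3 +10,4 @@`, giving the start line and line count on the old and new sides. The sketch below parses that header. `HunkHeader` and `parse_hunk_header` are hypothetical names and do not reflect how G3 implements `str_replace` internally.

```rust
/// Line ranges from a unified diff hunk header. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub struct HunkHeader {
    pub old_start: usize,
    pub old_len: usize,
    pub new_start: usize,
    pub new_len: usize,
}

/// Parse a header like `@@ -10,3 +10,4 @@` (optionally followed by a section name).
pub fn parse_hunk_header(line: &str) -> Option<HunkHeader> {
    let rest = line.strip_prefix("@@ -")?;
    let (ranges, _section) = rest.split_once(" @@")?;
    let (old, new) = ranges.split_once(" +")?;
    let (old_start, old_len) = parse_range(old)?;
    let (new_start, new_len) = parse_range(new)?;
    Some(HunkHeader { old_start, old_len, new_start, new_len })
}

/// Parse `start,len`; in the unified format an omitted length means 1.
fn parse_range(range: &str) -> Option<(usize, usize)> {
    match range.split_once(',') {
        Some((start, len)) => Some((start.parse::<usize>().ok()?, len.parse::<usize>().ok()?)),
        None => Some((range.parse::<usize>().ok()?, 1)),
    }
}
```

In the example above, the header says the hunk covers 3 lines starting at line 10 in the old file and 4 lines in the new file, which matches the three context lines plus one added line.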

---

### final_output

Signal task completion with a summary.

**Parameters**:
- `summary` (string, required): Markdown summary of what was accomplished

**Example**:
```json
{"tool": "final_output", "args": {"summary": "## Completed\n\n- Created user authentication module\n- Added unit tests\n- Updated documentation"}}
```

**Notes**:
- Ends the current task
- Summary is displayed to the user
- In autonomous mode, triggers coach review

---

## Image & Screenshot Tools

### take_screenshot

Capture a screenshot of an application window.

**Parameters**:
- `path` (string, required): Filename for the screenshot
- `window_id` (string, required): Application name (e.g., "Safari", "Terminal")
- `region` (object, optional): `{x, y, width, height}` to capture a region

**Example**:
```json
{"tool": "take_screenshot", "args": {"path": "safari.png", "window_id": "Safari"}}
```

**Notes**:
- Use `list_windows` first to identify available windows
- Relative paths save to `~/tmp` or `$TMPDIR`
- Uses the native `screencapture` utility on macOS

---

### extract_text

Extract text from an image using OCR.

**Parameters**:
- `path` (string, optional): Path to image file

**Example**:
```json
{"tool": "extract_text", "args": {"path": "screenshot.png"}}
```

**Notes**:
- Uses Tesseract OCR or Apple Vision framework
- For window-based OCR, use `vision_find_text` instead

---

## Task Management Tools

### todo_read

Read the current TODO list.

**Parameters**: None

**Example**:
```json
{"tool": "todo_read", "args": {}}
```

**Notes**:
- TODO lists are session-scoped
- Stored in `.g3/sessions/<session_id>/todo.g3.md`
- Call at the start of multi-step tasks to check for existing plans

---

### todo_write

Create or update the TODO list.

**Parameters**:
- `content` (string, required): TODO list content in markdown checkbox format

**Example**:
```json
{"tool": "todo_write", "args": {"content": "- [ ] Implement feature\n  - [ ] Write tests\n  - [ ] Update docs\n- [x] Setup project"}}
```

**Notes**:
- Replaces entire file content
- Always call `todo_read` first to preserve existing content
- Use `- [ ]` for incomplete, `- [x]` for complete
- Supports nested tasks with indentation
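
The checkbox format is simple enough to process line by line. As an illustration, the hypothetical helper below counts completed and total items in a TODO list, treating nested items the same as top-level ones. It is a sketch of the format, not G3 code.

```rust
/// Count `(completed, total)` checkbox items in a markdown TODO list.
///
/// Hypothetical helper for illustration. Indentation is ignored, so nested
/// tasks count the same as top-level tasks.
pub fn todo_progress(content: &str) -> (usize, usize) {
    let mut done = 0;
    let mut total = 0;
    for line in content.lines() {
        let item = line.trim_start();
        if item.starts_with("- [x]") || item.starts_with("- [X]") {
            done += 1;
            total += 1;
        } else if item.starts_with("- [ ]") {
            total += 1;
        }
    }
    (done, total)
}
```

Applied to the example content above, this reports one completed item out of four.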

---

## Code Intelligence Tools

### code_search

Syntax-aware code search using tree-sitter.

**Parameters**:
- `searches` (array, required): Array of search objects:
  - `name` (string): Label for this search
  - `query` (string): Tree-sitter query in S-expression format
  - `language` (string): Programming language
  - `paths` (array, optional): Paths to search
  - `context_lines` (integer, optional): Lines of context (0-20)
- `max_concurrency` (integer, optional): Parallel searches (default: 4)
- `max_matches_per_search` (integer, optional): Max matches (default: 500)

**Supported languages**: rust, python, javascript, typescript, go, java, c, cpp, kotlin

**Example**:
```json
{"tool": "code_search", "args": {
  "searches": [{
    "name": "functions",
    "query": "(function_item name: (identifier) @name)",
    "language": "rust",
    "context_lines": 2
  }]
}}
```

See [Code Search Guide](CODE_SEARCH.md) for detailed query patterns.

---

### code_coverage

Generate a code coverage report using `cargo llvm-cov`.

**Parameters**: None

**Example**:
```json
{"tool": "code_coverage", "args": {}}
```

**Notes**:
- Runs all tests with coverage instrumentation
- Auto-installs llvm-tools-preview and cargo-llvm-cov if missing
- Returns coverage statistics summary

---

## WebDriver Tools

Enabled with `--webdriver` (Safari) or `--chrome-headless` (Chrome).

### webdriver_start

Start a browser session.

**Example**:
```json
{"tool": "webdriver_start", "args": {}}
```

### webdriver_navigate

Navigate to a URL.

**Parameters**:
- `url` (string, required): URL with protocol (e.g., `https://`)

### webdriver_get_url / webdriver_get_title

Get the current URL or page title.

### webdriver_find_element / webdriver_find_elements

Find element(s) by CSS selector.

**Parameters**:
- `selector` (string, required): CSS selector

### webdriver_click

Click an element.

**Parameters**:
- `selector` (string, required): CSS selector

### webdriver_send_keys

Type text into an input.

**Parameters**:
- `selector` (string, required): CSS selector
- `text` (string, required): Text to type
- `clear_first` (boolean, optional): Clear before typing (default: true)

### webdriver_execute_script

Execute JavaScript.

**Parameters**:
- `script` (string, required): JavaScript code (use `return` to return values)

### webdriver_get_page_source

Get rendered HTML.

**Parameters**:
- `max_length` (integer, optional): Max chars to return (default: 10000, 0 for no limit)
- `save_to_file` (string, optional): Save to file instead of returning inline

### webdriver_screenshot

Take a browser screenshot.

**Parameters**:
- `path` (string, required): Save path

### webdriver_back / webdriver_forward / webdriver_refresh

Navigation controls.

### webdriver_quit

Close the browser and end the session.

---

## Vision Tools (macOS)

Use the Apple Vision framework for text recognition.

### vision_find_text

Find text in an application window.

**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to search for

**Returns**: Bounding box coordinates and confidence score

### vision_click_text

Find and click on text.

**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Text to click

### vision_click_near_text

Click near a text label (useful for form fields).

**Parameters**:
- `app_name` (string, required): Application name
- `text` (string, required): Label text to find
- `direction` (string, optional): "right", "below", "left", "above" (default: "right")
- `distance` (integer, optional): Pixels from text (default: 50)
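
The `direction` and `distance` parameters can be read as an offset from the label's bounding box. The sketch below shows one plausible interpretation and is not G3's implementation. It assumes a top-left origin with y growing downward, measures `distance` from the nearest edge of the box, and centers the click on the other axis. `Rect` and `click_point_near` are hypothetical names.

```rust
/// Bounding box of recognized text, in screen pixels. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

/// Compute a click point `distance` pixels away from `bounds` in `direction`.
///
/// Assumes a top-left origin (y grows downward) and measures from the box edge.
/// Returns `None` for an unknown direction.
pub fn click_point_near(bounds: Rect, direction: &str, distance: i32) -> Option<(i32, i32)> {
    let center_x = bounds.x + bounds.width / 2;
    let center_y = bounds.y + bounds.height / 2;
    match direction {
        "right" => Some((bounds.x + bounds.width + distance, center_y)),
        "left" => Some((bounds.x - distance, center_y)),
        "below" => Some((center_x, bounds.y + bounds.height + distance)),
        "above" => Some((center_x, bounds.y - distance)),
        _ => None,
    }
}
```

With the defaults ("right", 50), a label followed by an input field on the same row would be clicked 50 pixels past the label's right edge, which is why this tool suits form fields.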

---

## macOS Accessibility Tools

Enabled with `--macax`. See [macOS Accessibility Tools Guide](macax-tools.md).

### macax_list_apps

List running applications.

### macax_get_frontmost_app

Get the frontmost application.

### macax_activate_app

Bring an application to the front.

**Parameters**:
- `app_name` (string, required): Application name

### macax_get_ui_tree

Get the UI element hierarchy.

**Parameters**:
- `app_name` (string, required): Application name
- `max_depth` (integer, optional): Tree depth limit

### macax_find_elements

Find UI elements by criteria.

**Parameters**:
- `app_name` (string, required): Application name
- `role` (string, optional): Element role (button, textField, etc.)
- `title` (string, optional): Element title
- `identifier` (string, optional): Accessibility identifier

### macax_click

Click a UI element.

**Parameters**:
- `app_name` (string, required): Application name
- `identifier` or `title` or `role`: Element selector

### macax_set_value / macax_get_value

Set or get an element's value.

### macax_press_key

Simulate a key press.

**Parameters**:
- `key` (string, required): Key to press
- `modifiers` (array, optional): ["command", "shift", "option", "control"]

---

## Computer Control Tools

Enabled with `computer_control.enabled = true` in config.

### mouse_click

Click at coordinates.

**Parameters**:
- `x` (integer, required): X coordinate
- `y` (integer, required): Y coordinate
- `button` (string, optional): "left", "right", "middle"

### type_text

Type text at the cursor.

**Parameters**:
- `text` (string, required): Text to type

### find_element

Find a UI element by text, role, or attributes.

### list_windows

List all open windows with IDs and titles.

---

## Tool Execution Notes

### Duplicate Detection

G3 prevents accidental duplicate tool calls:
- Only immediately sequential identical calls are blocked
- Text between tool calls resets detection
- Tools can be reused throughout a session
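
The three rules above amount to remembering only the most recent call. The sketch below shows that idea with a hypothetical `DuplicateGuard`. It compares the tool name and the raw argument string, and it assumes that only non-blank text resets detection. G3's actual comparison logic is not described in this document.

```rust
/// Blocks a tool call only when it repeats the immediately preceding call.
/// Hypothetical type for illustration; not G3's implementation.
#[derive(Default)]
pub struct DuplicateGuard {
    last_call: Option<(String, String)>,
}

impl DuplicateGuard {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record model text emitted between tool calls; non-blank text resets detection.
    pub fn on_text(&mut self, text: &str) {
        if !text.trim().is_empty() {
            self.last_call = None;
        }
    }

    /// Returns `true` if the call should run, `false` if it is a sequential duplicate.
    pub fn allow_call(&mut self, tool: &str, args_json: &str) -> bool {
        let current = (tool.to_string(), args_json.to_string());
        if self.last_call.as_ref() == Some(&current) {
            return false;
        }
        self.last_call = Some(current);
        true
    }
}
```

Because only the last call is stored, the same tool and arguments can be used again later in the session once any other call or text has intervened.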

### Error Handling

Tool errors are reported back to the agent, which can:
- Retry with different parameters
- Try an alternative approach
- Report the issue to the user

### Working Directory

Tools execute in:
1. Directory specified by `--codebase-fast-start` if provided
2. Current working directory otherwise

### File Paths

- Tilde expansion (`~`) is supported
- Relative paths are relative to working directory
- Screenshots default to `~/tmp` or `$TMPDIR`
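
Tilde expansion as described above can be sketched in a few lines. The helper below is hypothetical and takes the home directory as a parameter so that it stays deterministic. It handles only `~` and `~/...`, leaving `~user` forms untouched, which is an assumption about scope rather than a statement of what G3 supports.

```rust
use std::path::{Path, PathBuf};

/// Expand a leading `~` or `~/` to `home`; other paths are returned unchanged.
///
/// Hypothetical helper for illustration. `~user/...` forms are not expanded.
pub fn expand_tilde(path: &str, home: &Path) -> PathBuf {
    if path == "~" {
        return home.to_path_buf();
    }
    match path.strip_prefix("~/") {
        Some(rest) => home.join(rest),
        None => PathBuf::from(path),
    }
}
```

Relative paths such as `src/main.rs` pass through unchanged and would then be resolved against the working directory.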