Vision tools removed:
- extract_text (OCR from image files)
- extract_text_with_boxes (OCR with bounding boxes)
- vision_find_text (find text in app windows)
- vision_click_text (find and click on text)
- vision_click_near_text (click near text labels)

macax tools removed:
- macax_list_apps
- macax_get_frontmost_app
- macax_activate_app
- macax_press_key
- macax_type_text

The LLM can now read images directly via read_image tool. take_screenshot is retained for capturing application windows.

Files deleted:
- crates/g3-core/src/tools/vision.rs
- crates/g3-core/src/tools/macax.rs
- docs/macax-tools.md

Updated tool counts: 12 core + 15 webdriver = 27 total

205 lines
7.6 KiB
Markdown
# AGENTS.md - Machine Instructions for G3

**Last updated**: January 2025
**Purpose**: Enable AI agents to work safely and effectively with this codebase

## System Overview

G3 is an AI coding agent built in Rust. It uses LLM providers to execute tasks through a tool-based interface. The codebase is organized as a Cargo workspace with 9 crates.

### Quick Reference

| Crate | Purpose | Stability |
|-------|---------|-----------|
| `g3-core` | Agent engine, tools, context management | Stable |
| `g3-providers` | LLM provider abstractions | Stable |
| `g3-cli` | Command-line interface | Stable |
| `g3-config` | Configuration management | Stable |
| `g3-execution` | Code execution | Stable |
| `g3-computer-control` | Computer automation | Experimental |
| `g3-planner` | Planning mode | Stable |
| `g3-ensembles` | Multi-agent (flock) mode | Experimental |
| `g3-console` | Web monitoring console | Experimental |

## Critical Invariants

### MUST Hold

1. **Tool calls must be valid JSON** - The streaming parser expects well-formed tool calls
2. **Context window limits must be respected** - Exceeding limits causes API errors
3. **Provider trait implementations must be `Send + Sync`** - Required so providers can be shared across tasks on the async runtime
4. **Session IDs must be unique** - Used for log file paths and TODO scoping
5. **File paths in tools support tilde expansion** - `~` expands to the home directory

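
The tilde-expansion invariant can be illustrated with a minimal sketch. The function name `expand_tilde` and the explicit `home` parameter are assumptions made for this example, not G3's actual API; a real implementation would likely read the home directory from the environment.

```rust
use std::path::PathBuf;

/// Expands a leading `~` or `~/` to `home`; other paths are returned unchanged.
/// `home` is passed in explicitly (an assumption of this sketch) to keep it testable.
fn expand_tilde(path: &str, home: &str) -> PathBuf {
    if path == "~" {
        PathBuf::from(home)
    } else if let Some(rest) = path.strip_prefix("~/") {
        PathBuf::from(home).join(rest)
    } else {
        // Includes `~user/...`, which this sketch does not resolve.
        PathBuf::from(path)
    }
}
```

With `home` set to `/home/alice`, the path `~/notes/todo.md` becomes `/home/alice/notes/todo.md`, while absolute paths pass through untouched.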
### MUST NOT Do

1. **Never block the async runtime** - Use `tokio::task::spawn_blocking` for CPU-intensive or blocking work (`tokio::spawn` alone still runs the work on the async worker threads)
2. **Never store secrets in logs** - API keys are redacted in error logs
3. **Never modify files outside the working directory without explicit permission**
4. **Never assume tool results fit in context** - Large results are thinned automatically

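
As a rough illustration of the "no secrets in logs" rule, a message can be scrubbed before it is written. The helper name `redact_secret` and the `[REDACTED]` marker are hypothetical; how G3 actually redacts API keys is not shown in this document.

```rust
/// Replaces every occurrence of `secret` in `message` with a fixed marker.
/// An empty secret is ignored, since replacing "" would alter the whole message.
fn redact_secret(message: &str, secret: &str) -> String {
    if secret.is_empty() {
        return message.to_string();
    }
    message.replace(secret, "[REDACTED]")
}
```

Applying the scrub at the point where the log line is built, rather than at each call site, keeps the rule hard to bypass by accident.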
## Recommended Entry Points

### For Understanding the System

1. `src/main.rs` - Entry point (trivial)
2. `crates/g3-cli/src/lib.rs` - CLI logic and execution modes
3. `crates/g3-core/src/lib.rs` - Agent struct and orchestration
4. `crates/g3-providers/src/lib.rs` - Provider trait definition

### For Adding Features

1. **New tool**: `crates/g3-core/src/tool_definitions.rs` → `crates/g3-core/src/tools/`
2. **New provider**: `crates/g3-providers/src/` → implement `LLMProvider` trait
3. **New CLI mode**: `crates/g3-cli/src/lib.rs`
4. **New config option**: `crates/g3-config/src/lib.rs`

### For Debugging

1. Session logs: `.g3/sessions/<session_id>/session.json`
2. Error logs: `logs/errors/`
3. Context state: Use `/stats` command in interactive mode

## Dangerous/Subtle Code Paths

### Context Window Management (`crates/g3-core/src/context_window.rs`)

- **Thinning**: Automatically replaces large tool results with file references
- **Summarization**: Compresses conversation history at 80% capacity
- **Token estimation**: Uses character-based heuristics, not exact tokenization
- **Risk**: Incorrect token estimates can cause context overflow

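
The interaction between a character-based estimate and the 80% summarization threshold can be sketched as follows. The four-characters-per-token ratio and both function names are assumptions for illustration; the actual heuristic in `context_window.rs` may differ.

```rust
/// Rough token estimate: about one token per four characters, rounded up.
/// The ratio is an assumed heuristic, not an exact tokenizer.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// True once estimated usage reaches 80% of the provider limit.
/// Integer arithmetic avoids floating-point comparison at the boundary.
fn should_summarize(used_tokens: usize, limit_tokens: usize) -> bool {
    used_tokens * 100 >= limit_tokens * 80
}
```

Because the estimate can undercount for code or non-English text, a threshold well below 100% leaves headroom before the provider rejects the request.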
### Streaming Parser (`crates/g3-core/src/streaming_parser.rs`)

- Parses LLM responses in real time for tool calls
- Must handle partial JSON across chunk boundaries
- **Risk**: Malformed responses can cause parsing failures

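
One way to handle JSON that is split across chunks is to buffer text and track brace depth while ignoring braces inside strings. The sketch below shows that technique only; the type `JsonObjectBuffer` is hypothetical and is not the parser in `streaming_parser.rs`.

```rust
/// Buffers streamed text and reports when one complete top-level JSON object has arrived.
#[derive(Default)]
struct JsonObjectBuffer {
    buf: String,
    depth: usize,
    in_string: bool,
    escaped: bool,
    started: bool,
}

impl JsonObjectBuffer {
    /// Feeds one chunk. Returns the object text once its braces balance.
    /// Simplification: characters after the closing brace in the same chunk are dropped.
    fn push(&mut self, chunk: &str) -> Option<String> {
        for ch in chunk.chars() {
            if !self.started {
                if ch != '{' {
                    continue; // skip any text before the object begins
                }
                self.started = true;
            }
            self.buf.push(ch);
            if self.in_string {
                // Inside a string, only an unescaped quote ends it.
                if self.escaped {
                    self.escaped = false;
                } else if ch == '\\' {
                    self.escaped = true;
                } else if ch == '"' {
                    self.in_string = false;
                }
                continue;
            }
            match ch {
                '"' => self.in_string = true,
                '{' => self.depth += 1,
                '}' => {
                    self.depth -= 1;
                    if self.depth == 0 {
                        self.started = false;
                        return Some(std::mem::take(&mut self.buf));
                    }
                }
                _ => {}
            }
        }
        None
    }
}
```

The returned text still needs to be validated by a real JSON parser; balanced braces alone do not guarantee a well-formed tool call.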
### Tool Dispatch (`crates/g3-core/src/tool_dispatch.rs`)

- Routes tool calls to implementations
- Handles both native and JSON-based tool calling
- **Risk**: Missing dispatch cases cause silent failures

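
The silent-failure risk is easier to avoid when the fallback arm of the dispatcher returns an explicit error. The sketch below assumes a simple string-keyed `match`; the tool names `echo` and `count_chars` and the function signature are invented for illustration and do not reflect the real dispatch code.

```rust
/// Routes a tool name to its handler.
/// Unknown names produce an explicit error instead of being ignored.
fn dispatch(tool: &str, input: &str) -> Result<String, String> {
    match tool {
        // Hypothetical tools used only for this example.
        "echo" => Ok(input.to_string()),
        "count_chars" => Ok(input.chars().count().to_string()),
        other => Err(format!("unknown tool: {other}")),
    }
}
```

Returning `Err` for the fallback arm means a tool that was defined but never wired into dispatch shows up as a visible error in the session rather than as a missing result.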
### Retry Logic (`crates/g3-core/src/retry.rs`)

- Exponential backoff with jitter
- Different configs for interactive vs autonomous mode
- **Risk**: Aggressive retries can make rate limiting worse

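
Exponential backoff with jitter can be sketched with the standard library alone. The function name, the `[0.5, 1.0]` jitter range, and the caller-supplied `jitter_unit` value are assumptions of this example; the real `retry.rs` may use different parameters and a random number generator.

```rust
use std::time::Duration;

/// Delay before retry `attempt` (0-based): `base * 2^attempt`, capped at `max`,
/// then scaled by a jitter factor in [0.5, 1.0].
/// `jitter_unit` is a caller-supplied random value in [0.0, 1.0].
fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter_unit: f64) -> Duration {
    // Saturating arithmetic keeps very large attempt counts from overflowing.
    let factor = 2u32.saturating_pow(attempt);
    let capped = base.saturating_mul(factor).min(max);
    let jitter = 0.5 + 0.5 * jitter_unit.clamp(0.0, 1.0);
    capped.mul_f64(jitter)
}
```

The cap bounds the worst-case wait, and the jitter spreads retries from concurrent sessions so they do not hit the provider at the same instant.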
## Performance Constraints

1. **Streaming is preferred** - Non-streaming requests block UI
2. **Tool results are size-limited** - Large outputs are truncated or thinned
3. **Concurrent tool calls** - Enabled by `allow_multiple_tool_calls` config
4. **Background processes** - Long-running commands use `background_process` tool

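
Size-limiting a tool result has one subtle point in Rust: a byte limit must not split a multi-byte UTF-8 character. The helper below is a hypothetical sketch of that detail, not G3's truncation code.

```rust
/// Truncates `output` to at most `max_bytes` bytes without splitting a UTF-8 character.
fn truncate_output(output: &str, max_bytes: usize) -> &str {
    if output.len() <= max_bytes {
        return output;
    }
    // Walk back to the nearest character boundary; index 0 is always a boundary.
    let mut end = max_bytes;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    &output[..end]
}
```

Slicing a `&str` at a non-boundary index panics, so the boundary check is required for correctness rather than being an optimization.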
## Testing Strategy

### Test Locations

- Unit tests: `crates/*/tests/`
- Integration tests: `crates/*/tests/`
- Test fixtures: `examples/test_code/`

### Running Tests

```bash
# All tests
cargo test

# Specific crate
cargo test -p g3-core

# With output
cargo test -- --nocapture
```

### Test Considerations

- Provider tests may require API keys
- Computer control tests require OS permissions
- WebDriver tests require browser setup

## Do's and Don'ts for Automated Changes

### Do

- ✅ Run `cargo check` after modifications
- ✅ Run `cargo test` before committing
- ✅ Update tool definitions when adding tools
- ✅ Add tests for new functionality
- ✅ Use existing patterns for similar features
- ✅ Keep functions under 80 lines
- ✅ Update documentation for user-facing changes

### Don't

- ❌ Modify `Cargo.toml` dependencies without justification
- ❌ Add blocking code in async contexts
- ❌ Store sensitive data in plain text
- ❌ Ignore error handling
- ❌ Create deeply nested conditionals (>6 levels)
- ❌ Add external dependencies for simple tasks

## Common Incorrect Assumptions

1. **"All providers support tool calling"** - Embedded models use JSON fallback
2. **"Context window is unlimited"** - Each provider has limits (4k-200k tokens)
3. **"Tool results are always small"** - File reads can return megabytes
4. **"Sessions persist across runs"** - Sessions are ephemeral by default
5. **"All platforms are equal"** - macOS has more features (Vision, Accessibility)

## Architecture Decisions

See `DESIGN.md` for original design rationale.

Key decisions:
- **Rust for performance and safety** - Async runtime, memory safety
- **Workspace structure** - Separation of concerns, independent compilation
- **Provider abstraction** - Swap providers without code changes
- **Tool-first philosophy** - Agent acts through tools, not just advice
- **Session-scoped state** - TODO lists, logs tied to sessions

## File Structure Quick Reference

```
g3/
├── src/main.rs                    # Entry point
├── crates/
│   ├── g3-cli/src/
│   │   ├── lib.rs                 # CLI logic (~112k chars)
│   │   └── retro_tui.rs           # Retro TUI mode
│   ├── g3-core/src/
│   │   ├── lib.rs                 # Agent struct (~3400 lines)
│   │   ├── context_window.rs      # Context management
│   │   ├── tool_definitions.rs    # Tool schemas
│   │   ├── tool_dispatch.rs       # Tool routing
│   │   ├── tools/                 # Tool implementations
│   │   ├── streaming_parser.rs    # Response parsing
│   │   └── retry.rs               # Retry logic
│   ├── g3-providers/src/
│   │   ├── lib.rs                 # Provider trait
│   │   ├── anthropic.rs           # Anthropic Claude
│   │   ├── databricks.rs          # Databricks
│   │   ├── openai.rs              # OpenAI
│   │   └── embedded.rs            # Local models
│   ├── g3-config/src/lib.rs       # Configuration
│   ├── g3-planner/src/            # Planning mode
│   ├── g3-ensembles/src/          # Flock mode
│   └── g3-computer-control/src/   # Automation
├── agents/                        # Agent personas
├── docs/                          # Documentation
└── logs/                          # Session logs
```

## Pointers to Documentation

- [Architecture](docs/architecture.md) - System design and data flow
- [Configuration](docs/configuration.md) - Config file format and options
- [Tools Reference](docs/tools.md) - All available tools
- [Providers Guide](docs/providers.md) - LLM provider setup
- [Control Commands](docs/CONTROL_COMMANDS.md) - Interactive commands
- [Code Search](docs/CODE_SEARCH.md) - Tree-sitter search guide
- [Flock Mode](docs/FLOCK_MODE.md) - Multi-agent development