# G3 - AI Coding Agent

G3 is a coding AI agent designed to help you complete tasks by writing code and executing commands.

Built in Rust, it provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation and task automation capabilities.

## Architecture Overview

G3 follows a modular architecture organized as a Rust workspace with multiple crates, each responsible for specific functionality:

### Core Components

#### **g3-core**

The heart of the agent system, containing:

- **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity
- **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
- **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
- **Task Execution**: Support for single and iterative task execution with automatic retry logic

#### **g3-providers**

Abstraction layer for LLM providers:

- **Provider Interface**: Common trait-based API for different LLM backends
- **Multiple Provider Support**:
  - Anthropic (Claude models)
  - Databricks (DBRX and other models)
  - Local/embedded models via llama.cpp with Metal acceleration on macOS
- **OAuth Authentication**: Built-in OAuth flow support for secure provider authentication
- **Provider Registry**: Dynamic provider management and selection

#### **g3-config**

Configuration management system:

- Environment-based configuration
- Provider credentials and settings
- Model selection and parameters
- Runtime configuration options

#### **g3-execution**

Task execution framework:

- Task planning and decomposition
- Execution strategies (sequential, parallel)
- Error handling and retry mechanisms
- Progress tracking and reporting

#### **g3-computer-control**

Computer control capabilities:

- Mouse and keyboard automation
- UI element inspection and interaction
- Screenshot capture and window management
- OCR text extraction via Tesseract

#### **g3-cli**

Command-line interface:

- Interactive terminal interface
- Task submission and monitoring
- Configuration management commands
- Session management

### Error Handling & Resilience

G3 includes robust error handling with automatic retry logic:

- **Recoverable Error Detection**: Automatically identifies recoverable errors (rate limits, network issues, server errors, timeouts)
- **Exponential Backoff with Jitter**: Implements intelligent retry delays to avoid overwhelming services
- **Detailed Error Logging**: Captures comprehensive error context including stack traces, request/response data, and session information
- **Error Persistence**: Saves detailed error logs to `logs/errors/` for post-mortem analysis
- **Graceful Degradation**: Non-recoverable errors are logged with full context before terminating

## Key Features

### Intelligent Context Management

- Automatic context window monitoring with percentage-based tracking
- Smart auto-summarization when approaching token limits
- **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
- Conversation history preservation through summaries
- Dynamic token allocation for different providers (4k to 200k+ tokens)

### Interactive Control Commands

G3's interactive CLI includes control commands for manual context management:

- **`/compact`**: Manually trigger summarization to compact conversation history
- **`/thinnify`**: Manually trigger context thinning to replace large tool results with file references
- **`/readme`**: Reload README.md and AGENTS.md from disk without restarting
- **`/stats`**: Show detailed context and performance statistics
- **`/help`**: Display all available control commands

These commands give you fine-grained control over
context management, allowing you to proactively optimize token usage and refresh project documentation. See [Control Commands Documentation](docs/CONTROL_COMMANDS.md) for detailed usage.

### Tool Ecosystem

- **File Operations**: Read, write, and edit files with line-range precision
- **Shell Integration**: Execute system commands with output capture
- **Code Generation**: Structured code generation with syntax awareness
- **TODO Management**: Read and write TODO lists with markdown checkbox format
- **Computer Control** (Experimental): Automate desktop applications
  - Mouse and keyboard control
  - macOS Accessibility API for native app automation (via `--macax` flag)
  - UI element inspection
  - Screenshot capture and window management
  - OCR text extraction from images and screen regions
  - Window listing and identification
- **Code Search**: Embedded tree-sitter for syntax-aware code search (Rust, Python, JavaScript, TypeScript, Go, Java, C, C++) - see [Code Search Guide](docs/CODE_SEARCH.md)
- **Final Output**: Formatted result presentation

### Provider Flexibility

- Support for multiple LLM providers through a unified interface
- Hot-swappable providers without code changes
- Provider-specific optimizations and feature support
- Local model support for offline operation

### Task Automation

- Single-shot task execution for quick operations
- Iterative task mode for complex, multi-step workflows
- Automatic error recovery and retry logic
- Progress tracking and intermediate result handling

## Language & Technology Stack

- **Language**: Rust (2021 edition)
- **Async Runtime**: Tokio for concurrent operations
- **HTTP Client**: Reqwest for API communications
- **Serialization**: Serde for JSON handling
- **CLI Framework**: Clap for command-line parsing
- **Logging**: Tracing for structured logging
- **Local Models**: llama.cpp with Metal acceleration support

## Use Cases

G3 is designed for:

- Automated code generation and refactoring
- File manipulation and project
scaffolding
- System administration tasks
- Data processing and transformation
- API integration and testing
- Documentation generation
- Complex multi-step workflows
- Desktop application automation and testing

## Getting Started

### Default Mode: Accumulative Autonomous

The default interactive mode now uses **accumulative autonomous mode**, which combines the best of interactive and autonomous workflows:

```bash
# Simply run g3 in any directory
g3

# You'll be prompted to describe what you want to build
# Each input you provide:
#   1. Gets added to accumulated requirements
#   2. Automatically triggers autonomous mode (coach-player loop)
#   3. Implements your requirements iteratively

# Example session:
requirement> create a simple web server in Python with Flask
# ... autonomous mode runs and implements it ...
requirement> add a /health endpoint that returns JSON
# ... autonomous mode runs again with both requirements ...
```

### Other Modes

```bash
# Single-shot mode (one task, then exit)
g3 "implement a function to calculate fibonacci numbers"

# Traditional autonomous mode (reads requirements.md)
g3 --autonomous

# Traditional chat mode (simple interactive chat without autonomous runs)
g3 --chat
```

```bash
# Build the project
cargo build --release

# Run from the build directory
./target/release/g3

# Or copy the files to somewhere in your PATH (the second file is needed on macOS only)
cp target/release/g3 ~/.local/bin/
cp target/release/libVisionBridge.dylib ~/.local/bin/  # macOS only

# Execute a task
g3 "implement a function to calculate fibonacci numbers"
```

## WebDriver Browser Automation

G3 includes WebDriver support for browser automation tasks using Safari.

**One-Time Setup** (macOS only): Safari Remote Automation must be enabled before using WebDriver tools.
Run this once:

```bash
# Option 1: Use the provided script
./scripts/enable-safari-automation.sh

# Option 2: Enable manually
safaridriver --enable  # Requires password

# Option 3: Enable via Safari UI
# Safari → Preferences → Advanced → Show Develop menu
# Then: Develop → Allow Remote Automation
```

**For detailed setup instructions and troubleshooting**, see [WebDriver Setup Guide](docs/webdriver-setup.md).

**Usage**: Run G3 with the `--webdriver` flag to enable browser automation tools.

## macOS Accessibility API Tools

G3 includes support for controlling macOS applications via the Accessibility API, allowing you to automate native macOS apps.

**Available Tools**: `macax_list_apps`, `macax_get_frontmost_app`, `macax_activate_app`, `macax_get_ui_tree`, `macax_find_elements`, `macax_click`, `macax_set_value`, `macax_get_value`, `macax_press_key`

**Setup**: Enable with the `--macax` flag or in config with `macax.enabled = true`. Grant accessibility permissions:

- **macOS**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app

**For detailed documentation**, see [macOS Accessibility Tools Guide](docs/macax-tools.md).

**Note**: This is particularly useful for testing and automating apps you're building with G3, as you can add accessibility identifiers to your UI elements.

## Computer Control (Experimental)

G3 can interact with your computer's GUI for automation tasks:

**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`

**Setup**: Enable in config with `computer_control.enabled = true` and grant OS accessibility permissions:

- **macOS**: System Preferences → Security & Privacy → Accessibility
- **Linux**: Ensure X11 or Wayland access
- **Windows**: Run as administrator (first time only)

## Session Logs

G3 automatically saves session logs for each interaction in the `logs/` directory.
These logs contain:

- Complete conversation history
- Token usage statistics
- Timestamps and session status

The `logs/` directory is created automatically on first use and is excluded from version control.

## License

MIT License - see LICENSE file for details

## Contributing

G3 is an open-source project. Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
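
## Appendix: Retry Delay Sketch

The Error Handling & Resilience section mentions recoverable error detection and exponential backoff with jitter. The sketch below illustrates that general technique using only the Rust standard library. It is not taken from G3's source: the `ErrorKind` enum, `is_recoverable`, `backoff_delay`, and the choice of "full jitter" (a delay drawn from zero up to the capped exponential ceiling) are all assumptions made for illustration, and G3's actual types, constants, and jitter strategy may differ.

```rust
use std::time::Duration;

/// Hypothetical error classes, loosely mirroring the recoverable
/// categories listed in this README. G3's real error types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    RateLimit,
    Network,
    Server,
    Timeout,
    /// Stand-in for any error that retrying cannot fix.
    InvalidRequest,
}

/// Returns true for the error classes that are worth retrying.
pub fn is_recoverable(kind: ErrorKind) -> bool {
    matches!(
        kind,
        ErrorKind::RateLimit | ErrorKind::Network | ErrorKind::Server | ErrorKind::Timeout
    )
}

/// Delay before retry number `attempt` (0-based), using "full jitter":
/// the exponential ceiling is min(cap, base * 2^attempt), and the delay
/// is that ceiling scaled by `unit`.
///
/// `unit` is a caller-supplied random sample in [0.0, 1.0]. Passing it in
/// keeps this function deterministic and free of external crates; a real
/// caller would draw it from a random number generator.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, unit: f64) -> Duration {
    // Saturating arithmetic keeps very large attempt counts from overflowing.
    let factor = 2u32.saturating_pow(attempt);
    let ceiling = base.saturating_mul(factor).min(cap);

    // Guard against NaN or out-of-range samples before scaling.
    let unit = if unit.is_finite() { unit.clamp(0.0, 1.0) } else { 0.0 };
    ceiling.mul_f64(unit)
}
```

A retry loop built on this would check `is_recoverable` first, then sleep for `backoff_delay(attempt, base, cap, sample)` before trying again, giving up after a fixed number of attempts. Scaling by a random sample spreads out retries from many clients so they do not all hit a recovering service at the same moment.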