# G3 - AI Coding Agent

G3 is a coding AI agent designed to help you complete tasks by writing code and executing commands.

Built in Rust, it provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation and task automation capabilities.

## Architecture Overview

G3 follows a modular architecture organized as a Rust workspace with multiple crates, each responsible for specific functionality:

### Core Components

#### **g3-core**

The heart of the agent system, containing:

- **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity
- **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
- **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
- **Task Execution**: Support for single and iterative task execution with automatic retry logic

#### **g3-providers**

Abstraction layer for LLM providers:

- **Provider Interface**: Common trait-based API for different LLM backends
- **Multiple Provider Support**:
  - Anthropic (Claude models)
  - Databricks (DBRX and other models)
  - Local/embedded models via llama.cpp with Metal acceleration on macOS
- **OAuth Authentication**: Built-in OAuth flow support for secure provider authentication
- **Provider Registry**: Dynamic provider management and selection

#### **g3-config**

Configuration management system:

- Environment-based configuration
- Provider credentials and settings
- Model selection and parameters
- Runtime configuration options

#### **g3-execution**

Task execution framework:

- Task planning and decomposition
- Execution strategies (sequential, parallel)
- Error handling and retry mechanisms
- Progress tracking and reporting

#### **g3-computer-control**

Computer control capabilities:

- Mouse and keyboard automation
- UI element inspection and interaction
- Screenshot capture and window management
- OCR text extraction via Tesseract

#### **g3-cli**

Command-line interface:

- Interactive terminal interface
- Task submission and monitoring
- Configuration management commands
- Session management

### Error Handling & Resilience

G3 includes robust error handling with automatic retry logic:

- **Recoverable Error Detection**: Automatically identifies recoverable errors (rate limits, network issues, server errors, timeouts)
- **Exponential Backoff with Jitter**: Implements intelligent retry delays to avoid overwhelming services
- **Detailed Error Logging**: Captures comprehensive error context including stack traces, request/response data, and session information
- **Error Persistence**: Saves detailed error logs to `logs/errors/` for post-mortem analysis
- **Graceful Degradation**: Non-recoverable errors are logged with full context before terminating

## Key Features

### Intelligent Context Management

- Automatic context window monitoring with percentage-based tracking
- Smart auto-summarization when approaching token limits
- **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
- Conversation history preservation through summaries
- Dynamic token allocation for different providers (4k to 200k+ tokens)

### Tool Ecosystem

- **File Operations**: Read, write, and edit files with line-range precision
- **Shell Integration**: Execute system commands with output capture
- **Code Generation**: Structured code generation with syntax awareness
- **TODO Management**: Read and write TODO lists with markdown checkbox format
- **Computer Control** (Experimental): Automate desktop applications
  - Mouse and keyboard control
  - UI element inspection
  - Screenshot capture and window management
  - OCR text extraction from images and screen regions
  - Window listing and identification
- **Final Output**: Formatted result presentation

### Provider Flexibility

- Support for multiple LLM providers through a unified interface
- Hot-swappable providers without code changes
- Provider-specific optimizations and feature support
- Local model support for offline operation

### Task Automation

- Single-shot task execution for quick operations
- Iterative task mode for complex, multi-step workflows
- Automatic error recovery and retry logic
- Progress tracking and intermediate result handling

## Language & Technology Stack

- **Language**: Rust (2021 edition)
- **Async Runtime**: Tokio for concurrent operations
- **HTTP Client**: Reqwest for API communications
- **Serialization**: Serde for JSON handling
- **CLI Framework**: Clap for command-line parsing
- **Logging**: Tracing for structured logging
- **Local Models**: llama.cpp with Metal acceleration support

## Use Cases

G3 is designed for:

- Automated code generation and refactoring
- File manipulation and project scaffolding
- System administration tasks
- Data processing and transformation
- API integration and testing
- Documentation generation
- Complex multi-step workflows
- Desktop application automation and testing

## Getting Started

```bash
# Build the project
cargo build --release

# Run G3
cargo run

# Execute a task
g3 "implement a function to calculate fibonacci numbers"
```

## WebDriver Browser Automation

G3 includes WebDriver support for browser automation tasks using Safari.

**One-Time Setup** (macOS only): Safari Remote Automation must be enabled before using WebDriver tools.
Run this once:

```bash
# Option 1: Use the provided script
./scripts/enable-safari-automation.sh

# Option 2: Enable manually
safaridriver --enable  # Requires password

# Option 3: Enable via Safari UI
# Safari → Preferences → Advanced → Show Develop menu
# Then: Develop → Allow Remote Automation
```

**For detailed setup instructions and troubleshooting**, see [WebDriver Setup Guide](docs/webdriver-setup.md).

**Usage**: Run G3 with the `--webdriver` flag to enable browser automation tools.

## Computer Control (Experimental)

G3 can interact with your computer's GUI for automation tasks:

**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`

**Setup**: Enable in config with `computer_control.enabled = true` and grant OS accessibility permissions:

- **macOS**: System Preferences → Security & Privacy → Accessibility
- **Linux**: Ensure X11 or Wayland access
- **Windows**: Run as administrator (first time only)

## Session Logs

G3 automatically saves session logs for each interaction in the `logs/` directory. These logs contain:

- Complete conversation history
- Token usage statistics
- Timestamps and session status

The `logs/` directory is created automatically on first use and is excluded from version control.

## License

MIT License - see LICENSE file for details

## Contributing

G3 is an open-source project. Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
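
## Example Sketch: Exponential Backoff with Jitter

The Error Handling & Resilience section mentions exponential backoff with jitter but does not give a formula. The following is a minimal sketch of one common variant ("equal jitter"), not G3's actual implementation. `RetryPolicy`, its fields, and `delay_for` are hypothetical names chosen for illustration, and the random value is passed in by the caller so the example needs no external crates.

```rust
use std::time::Duration;

/// Hypothetical retry policy; names and values are illustrative,
/// not taken from the G3 codebase.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub base_delay: Duration,
    pub max_delay: Duration,
    pub max_attempts: u32,
}

impl RetryPolicy {
    /// Returns the delay to wait before retry number `attempt` (0-based),
    /// or `None` once the attempt budget is exhausted.
    ///
    /// `jitter` is a caller-supplied random value in [0.0, 1.0].
    pub fn delay_for(&self, attempt: u32, jitter: f64) -> Option<Duration> {
        if attempt >= self.max_attempts {
            return None;
        }
        // Exponential growth: base * 2^attempt, saturating on overflow
        // and capped at `max_delay`.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        let capped = self.base_delay.saturating_mul(factor).min(self.max_delay);

        // Equal jitter: half of the delay is fixed, the other half is
        // scaled by the random value so that clients do not retry in lockstep.
        let jitter = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 0.0 };
        let half = capped / 2;
        Some(half + half.mul_f64(jitter))
    }
}
```

A caller would typically draw `jitter` from a random number generator on each retry, sleep for the returned duration, and stop retrying when `delay_for` returns `None`.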
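
## Example Sketch: Context Threshold Checks

The Intelligent Context Management section lists thinning thresholds of 50%, 60%, 70%, and 80%, with auto-summarization at 80% capacity. As a rough illustration of percentage-based tracking, the sketch below computes usage and reports which threshold was newly crossed. The function names are hypothetical, and how G3 actually sequences thinning and summarization at 80% is not stated here, so the two checks are kept separate.

```rust
/// Thinning thresholds (percent of the context window used), as listed
/// in this README.
const THINNING_THRESHOLDS: [u8; 4] = [50, 60, 70, 80];
/// Usage percentage at which auto-summarization is described as starting.
const SUMMARIZE_AT: u8 = 80;

/// Percentage of the context window in use, clamped to 0..=100.
/// An empty window (`max_tokens == 0`) is treated as full.
pub fn usage_percent(used_tokens: usize, max_tokens: usize) -> u8 {
    if max_tokens == 0 {
        return 100;
    }
    let pct = used_tokens.saturating_mul(100) / max_tokens;
    pct.min(100) as u8
}

/// Returns the highest thinning threshold crossed when usage moves from
/// `prev_pct` to `cur_pct`, or `None` if no new threshold was crossed.
/// (Hypothetical helper: each threshold fires once, on the way up.)
pub fn crossed_thinning_threshold(prev_pct: u8, cur_pct: u8) -> Option<u8> {
    THINNING_THRESHOLDS
        .iter()
        .copied()
        .filter(|&t| prev_pct < t && cur_pct >= t)
        .max()
}

/// Whether usage has reached the auto-summarization level.
pub fn should_summarize(cur_pct: u8) -> bool {
    cur_pct >= SUMMARIZE_AT
}
```

For example, moving from 45% to 52% usage crosses the 50% threshold, while moving from 52% to 58% crosses none.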