# g3 - AI Coding Agent - Design Document

## Overview

g3 is a **modular, composable AI coding agent** built in Rust that helps you complete tasks by writing and executing code. It provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering code generation, file manipulation, and task automation capabilities.

The agent follows a **tool-first philosophy**: instead of just providing advice, g3 actively uses tools to read files, write code, execute commands, and complete tasks autonomously.

## Core Principles

1. **Tool-First Philosophy**: Solve problems by actively using tools rather than just providing advice
2. **Modular Architecture**: Clear separation of concerns across multiple Rust crates
3. **Provider Flexibility**: Support multiple LLM providers through a unified interface
4. **Composability**: Components can be combined in different ways
5. **Performance**: Built in Rust for speed and reliability
6. **Context Intelligence**: Smart context window management with auto-compaction
7. **Error Resilience**: Robust error handling with automatic retry logic

## Project Structure

g3 is organized as a Rust workspace with the following crates:

```
g3/
├── src/main.rs              # Main entry point (delegates to g3-cli)
├── crates/
│   ├── g3-cli/              # Command-line interface, TUI, and retro mode
│   ├── g3-core/             # Core agent engine, tools, and streaming logic
│   ├── g3-providers/        # LLM provider abstractions and implementations
│   ├── g3-config/           # Configuration management
│   ├── g3-execution/        # Code execution engine
│   └── g3-computer-control/ # Computer control and automation
├── logs/                    # Session logs (auto-created)
├── README.md                # Project documentation
└── DESIGN.md                # This design document
```

## Architecture Overview

### High-Level Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    g3-cli       │    │    g3-core      │    │  g3-providers   │
│                 │    │                 │    │                 │
│ • CLI parsing   │◄──►│ • Agent engine  │◄──►│ • Anthropic     │
│ • Interactive   │    │ • Context mgmt  │    │ • Databricks    │
│ • Retro TUI     │    │ • Tool system   │    │ • Embedded      │
│ • Autonomous    │    │ • Streaming     │    │   (llama.cpp)   │
│   mode          │    │ • Task exec     │    │ • OAuth flow    │
│                 │    │ • TODO mgmt     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
┌─────────────────┐    ┌─────────────────┐
│  g3-execution   │    │    g3-config    │
│                 │    │                 │
│ • Code exec     │    │ • TOML config   │
│ • Shell cmds    │    │ • Env overrides │
│ • Streaming     │    │ • Provider      │
│ • Error hdlg    │    │   settings      │
└─────────────────┘    │ • Computer      │
         │             │   control cfg   │
         │             └─────────────────┘
         │                      │
┌─────────────────┐             │
│  g3-computer-   │◄────────────┘
│    control      │
│ • Mouse/kbd     │
│ • Screenshots   │
│ • OCR/Tesseract │
│ • Windows/UI    │
└─────────────────┘
```

## Core Components

### 1. g3-core: Agent Engine

**Primary Responsibilities:**
- Main orchestration logic for handling conversations and task execution
- Context window management with intelligent token tracking
- Built-in tool system for file operations and command execution
- Streaming response parsing with real-time tool call detection
- Error handling with automatic retry logic

**Key Features:**
- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (80% capacity triggers auto-compaction)
- **Tool System**: Built-in tools for file operations (read, write, edit), shell commands, and structured output
- **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
- **Session Management**: Automatic session logging with detailed conversation history and token usage
- **Error Recovery**: Error classification and retry logic for recoverable errors
- **TODO Management**: In-memory TODO list with read/write tools for task tracking

**Available Tools:**
- `shell`: Execute shell commands with streaming output
- `read_file`: Read file contents with optional character range support
- `write_file`: Create or overwrite files with content
- `str_replace`: Apply unified diffs to files with precise editing
- `final_output`: Signal task completion with detailed summaries
- `todo_read`: Read the entire TODO list content
- `todo_write`: Write or overwrite the entire TODO list
- `mouse_click`: Click the mouse at specific coordinates
- `type_text`: Type text at the current cursor position
- `find_element`: Find UI elements by text, role, or attributes
- `take_screenshot`: Capture screenshots of screen, region, or window
- `find_text_on_screen`: Find text visually on screen and return coordinates
- `list_windows`: List all open windows with IDs and titles

### 2. g3-providers: LLM Provider Abstraction

**Primary Responsibilities:**
- Unified interface for multiple LLM providers
- Provider-specific optimizations and feature support
- OAuth authentication flows
- Streaming and non-streaming completion support

**Supported Providers:**
- **Anthropic**: Claude models via API with native tool calling support
- **Databricks**: Foundation Model APIs with OAuth and token-based authentication (default provider)
- **Embedded**: Local models via llama.cpp with GPU acceleration (Metal/CUDA)
- **Provider Registry**: Dynamic provider management and hot-swapping

**Key Features:**
- **Native Tool Calling**: Full support for structured tool calls where available
- **Fallback Parsing**: JSON tool call parsing for providers without native support
- **OAuth Integration**: Built-in OAuth flow for secure provider authentication
- **Context-Aware**: Provider-specific context length and token limit handling
- **Streaming Support**: Real-time response streaming with tool call detection

### 3. g3-cli: Command-Line Interface

**Primary Responsibilities:**
- Command-line argument parsing and validation
- Interactive terminal interface with history support
- Retro-style terminal UI (80s sci-fi inspired)
- Autonomous mode with coach-player feedback loops
- Session management and workspace handling

**Execution Modes:**
- **Single-shot**: Execute one task and exit
- **Interactive**: REPL-style conversation with the agent (default mode)
- **Autonomous**: Coach-player feedback loop for complex projects
- **Retro TUI**: Full-screen terminal interface with real-time updates

**Key Features:**
- **Multi-line Input**: Support for complex, multi-line prompts with backslash continuation
- **Context Progress**: Real-time display of token usage and context window status
- **Error Recovery**: Automatic retry logic for timeout and recoverable errors
- **History Management**: Persistent command history across sessions
- **Theme Support**: Customizable color themes for retro mode
- **Cancellation**: Ctrl+C support for graceful operation cancellation

### 4. g3-execution: Code Execution Engine

**Primary Responsibilities:**
- Safe execution of shell commands and scripts
- Streaming output capture and display
- Multi-language code execution support
- Error handling and result formatting

**Supported Execution:**
- **Bash/Shell**: Direct command execution with streaming output (primary use case)
- **Python**: Script execution via temporary files (legacy support)
- **JavaScript**: Node.js-based execution (legacy support)

**Key Features:**
- **Streaming Output**: Real-time command output display
- **Error Capture**: Comprehensive stderr and stdout handling
- **Exit Code Tracking**: Proper success/failure detection
- **Async Execution**: Non-blocking command execution
- **Output Formatting**: Clean, user-friendly result presentation

### 5. g3-config: Configuration Management

**Primary Responsibilities:**
- TOML-based configuration file management
- Environment variable overrides
- Provider-specific settings and credentials
- CLI argument integration

**Configuration Hierarchy:**
1. Default configuration (Databricks provider with OAuth)
2. Configuration files (`~/.config/g3/config.toml`, `./g3.toml`)
3. Environment variables (`G3_*`)
4. CLI arguments (highest priority)

**Key Features:**
- **Auto-generation**: Creates default configuration files if none exist
- **Provider Overrides**: Runtime provider and model selection
- **Validation**: Configuration validation with helpful error messages
- **Flexible Paths**: Support for shell expansion (`~`, environment variables)

### 6. g3-computer-control: Computer Control & Automation

**Primary Responsibilities:**
- Cross-platform computer control and automation
- Mouse and keyboard input simulation
- Window management and screenshot capture
- OCR text extraction from images and screen regions

**Platform Support:**
- **macOS**: Core Graphics, Cocoa, screencapture integration
- **Linux**: X11/Xtest for input, X11 for window management
- **Windows**: Win32 APIs for input and window control

**Key Features:**
- **OCR Integration**: Tesseract-based text extraction from images
- **Window Management**: List, identify, and capture specific application windows
- **UI Automation**: Find elements, simulate clicks, type text
- **Screenshot Capture**: Full screen, regions, or specific windows
- **Accessibility**: Requires OS-level permissions for automation

## Advanced Features

### Context Window Management

g3 manages the context window as follows:

- **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
- **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
- **Context Thinning**: Progressive thinning at the 50%, 60%, 70%, and 80% thresholds, replacing large tool results with file references
- **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
- **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
- **Cumulative Tracking**: Monitors total token usage across entire sessions

### Error Handling & Resilience

Comprehensive error handling system:

- **Error Classification**: Distinguishes between recoverable and non-recoverable errors
- **Automatic Retry**: Exponential backoff with jitter for rate limits, timeouts, and server errors
- **Detailed Logging**: Comprehensive error context including stack traces and session data
- **Error Persistence**: Saves detailed error logs to `logs/errors/` for analysis
- **Graceful Degradation**: Continues operation when possible, fails gracefully when not

### Session Management

Automatic session tracking and logging:

- **Session IDs**: Generated based on initial prompts for easy identification
- **Complete Logs**: Full conversation history, token usage, and timing data
- **JSON Format**: Structured logs for easy parsing and analysis
- **Automatic Cleanup**: Organized in `logs/` directory with timestamps
- **Status Tracking**: Records session completion status (completed, cancelled, error)

### Autonomous Mode

Advanced autonomous operation with coach-player feedback:

- **Requirements-Driven**: Reads `requirements.md` for project specifications
- **Dual-Agent System**: Separate player (implementation) and coach (review) agents
- **Iterative Improvement**: Multiple rounds of implementation and feedback
- **Progress Tracking**: Detailed reporting of turns, token usage, and final status
- **Workspace Management**: Automatic workspace setup and file organization

## Provider Comparison

| Feature | Anthropic | Databricks (Default) | Embedded |
|---------|-----------|----------------------|----------|
| **Cost** | Pay per token | Pay per token | Free after download |
| **Privacy** | Data sent to API | Data sent to API | Completely local |
| **Performance** | Very fast | Very fast | Depends on hardware |
| **Model Quality** | Excellent | Excellent | Good (varies by model) |
| **Offline Support** | No | No | Yes |
| **Setup Complexity** | API key only | OAuth or token | Model download required |
| **Context Window** | 200k tokens | Varies by model | 4k-32k tokens |
| **Tool Calling** | Native support | Native support | JSON fallback |
| **Hardware Requirements** | None | None | 4-16GB RAM, optional GPU |

## Configuration Examples

### Cloud-First Setup (Anthropic)

```toml
[providers]
default_provider = "anthropic"

[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
max_tokens = 8192
temperature = 0.1
```

### Enterprise Setup (Databricks - Default)

```toml
[providers]
default_provider = "databricks"

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 32000
temperature = 0.1
use_oauth = true
```

### Privacy-First Setup (Local Models)

```toml
[providers]
default_provider = "embedded"

[providers.embedded]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen"
context_length = 32768
max_tokens = 2048
temperature = 0.1
gpu_layers = 32
threads = 8
```

### Hybrid Setup

```toml
[providers]
default_provider = "embedded"

# Local model for most tasks
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
context_length = 16384
gpu_layers = 32

# Cloud fallback for complex tasks
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
```

## Usage Examples

### Single-Shot Mode

```bash
g3 "implement a fibonacci function in Rust"
```

### Interactive Mode

```bash
g3
g3> read the README and suggest improvements
g3> implement the suggestions you made
```

### Autonomous Mode

```bash
g3 --autonomous --max-turns 10
# Reads requirements.md and implements iteratively
```

### Retro TUI Mode

```bash
g3 --retro --theme dracula
# Full-screen terminal interface
```

## Implementation Details

### Planned Features

- **Plugin System**: Custom tool and provider plugins
- **Web Interface**: Browser-based UI for remote access
- **Model Quantization**: Optimized local model deployment
- **Multi-Model Ensemble**: Combine multiple models for better results
- **Advanced Sandboxing**: Enhanced security for code execution
- **Collaborative Mode**: Multi-user sessions and shared workspaces

### Technical Improvements

- **Performance Optimization**: Faster streaming and tool execution
- **Memory Management**: Better handling of large contexts and files
- **Caching System**: Intelligent caching of model responses and computations
- **Monitoring**: Built-in metrics and performance monitoring
- **Testing**: Comprehensive test suite and CI/CD integration

## Development Guidelines

### Code Organization

- **Modular Design**: Each crate has a single, well-defined responsibility
- **Trait-Based**: Use traits for abstraction and testability
- **Error Handling**: Comprehensive error types with context
- **Documentation**: Inline docs and examples for all public APIs
- **Testing**: Unit tests, integration tests, and property-based testing

### Performance Considerations

- **Async-First**: All I/O operations are asynchronous (Tokio runtime)
- **Streaming**: Real-time response processing where possible
- **Memory Efficiency**: Careful memory management for large contexts
- **Caching**: Strategic caching of expensive operations
- **Profiling**: Regular performance profiling and optimization

This design document reflects the current state of g3 as a mature, production-ready AI coding agent with a sophisticated architecture and a comprehensive feature set.

## Current Implementation Status

### Fully Implemented

- ✅ **Core Agent Engine**: Complete with streaming, tool execution, and context management
- ✅ **Provider System**: Anthropic, Databricks, and Embedded providers with OAuth support
- ✅ **Tool System**: 13 tools including file ops, shell, TODO management, and computer control
- ✅ **CLI Interface**: Interactive mode, single-shot mode, retro TUI
- ✅ **Autonomous Mode**: Coach-player feedback loop with requirements.md processing
- ✅ **Configuration**: TOML-based config with environment overrides
- ✅ **Error Handling**: Comprehensive retry logic and error classification
- ✅ **Session Logging**: Automatic session tracking and JSON logs
- ✅ **Context Management**: Context thinning (50-80%) and auto-compaction at 80% capacity
- ✅ **Computer Control**: Cross-platform automation with OCR support
- ✅ **TODO Management**: In-memory TODO list with read/write tools

### Architecture Highlights

- **Workspace**: 6 crates with clear separation of concerns
- **Dependencies**: Modern Rust ecosystem (Tokio, Clap, Serde, etc.)
- **Streaming**: Real-time response processing with tool call detection
- **Cross-Platform**: Works on macOS, Linux, and Windows
- **GPU Support**: Metal acceleration for local models on macOS, CUDA on Linux
- **OCR Support**: Tesseract integration for text extraction from images

### Key Files

- `src/main.rs`: main entry point delegating to g3-cli
- `crates/g3-core/src/lib.rs`: main agent implementation
- `crates/g3-cli/src/lib.rs`: CLI and interaction modes
- `crates/g3-providers/src/lib.rs`: provider trait and registry
- `crates/g3-config/src/lib.rs`: configuration management
- `crates/g3-execution/src/lib.rs`: code execution engine
- `crates/g3-computer-control/src/lib.rs`: computer control and automation
- `crates/g3-computer-control/src/platform/`: platform-specific implementations
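
### Illustrative Sketch: Context Thresholds

The context thresholds described under Context Window Management (thinning at 50%, 60%, 70%, and 80%, auto-compaction at 80%) can be illustrated with a small Rust sketch. This is not g3's actual code: the function and constant names below are hypothetical, and how g3 sequences thinning and compaction when both apply at 80% is not specified here, so the sketch reports the two conditions separately.

```rust
/// Thinning thresholds named in this document, as percentages of the context window.
const THINNING_THRESHOLDS: [u8; 4] = [50, 60, 70, 80];

/// Capacity percentage at which auto-compaction is described as triggering.
const COMPACTION_THRESHOLD: f64 = 80.0;

/// Percentage of the context window in use (hypothetical helper).
/// Returns 0.0 for a zero-sized window to avoid dividing by zero.
fn usage_percent(used_tokens: u32, window_tokens: u32) -> f64 {
    if window_tokens == 0 {
        return 0.0;
    }
    f64::from(used_tokens) / f64::from(window_tokens) * 100.0
}

/// Highest thinning threshold crossed when usage moves from `prev_pct`
/// to `new_pct`, or `None` if no threshold lies in (prev_pct, new_pct].
fn newly_crossed_threshold(prev_pct: f64, new_pct: f64) -> Option<u8> {
    THINNING_THRESHOLDS
        .iter()
        .copied()
        .filter(|&t| prev_pct < f64::from(t) && new_pct >= f64::from(t))
        .max()
}

/// Whether usage has reached the auto-compaction threshold.
fn needs_compaction(pct: f64) -> bool {
    pct >= COMPACTION_THRESHOLD
}
```

For example, a jump from 45% to 72% usage in a single step makes `newly_crossed_threshold(45.0, 72.0)` return `Some(70)`, the highest threshold crossed, while `needs_compaction(72.0)` is still `false`. Tracking the previous percentage is one way to ensure each thinning pass fires once per threshold rather than on every turn.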