G3 General Purpose AI Agent - Design Document
Overview
G3 is a modular, composable AI coding agent built in Rust that helps you complete tasks by writing and executing code. It provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation, file manipulation, and task automation capabilities.
The agent follows a tool-first philosophy: instead of just providing advice, G3 actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
Core Principles
- Tool-First Philosophy: Solve problems by actively using tools rather than just describing solutions
- Modular Architecture: Clear separation of concerns across multiple Rust crates
- Provider Flexibility: Support multiple LLM providers through a unified interface
- Composability: Components can be combined in different ways
- Performance: Built in Rust for speed and reliability
- Context Intelligence: Smart context window management with auto-summarization
- Error Resilience: Robust error handling with automatic retry logic
Project Structure
G3 is organized as a Rust workspace with the following crates:
g3/
├── src/main.rs # Main entry point
├── crates/
│ ├── g3-cli/ # Command-line interface and TUI
│ ├── g3-core/ # Core agent engine and logic
│ ├── g3-providers/ # LLM provider abstractions
│ ├── g3-config/ # Configuration management
│ └── g3-execution/ # Code execution engine
├── logs/ # Session logs (auto-created)
├── README.md # Project documentation
└── DESIGN.md # This design document
Architecture Overview
High-Level Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ g3-cli │ │ g3-core │ │ g3-providers │
│ │ │ │ │ │
│ • CLI parsing │◄──►│ • Agent engine │◄──►│ • Anthropic │
│ • Interactive │ │ • Context mgmt │ │ • Databricks │
│ • Retro TUI │ │ • Tool system │ │ • Embedded │
│ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
│ mode │ │ • Task exec │ │ • OAuth flow │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌─────────────────┐ ┌─────────────────┐
│ g3-execution │ │ g3-config │
│ │ │ │
│ • Code exec │ │ • TOML config │
│ • Shell cmds │ │ • Env overrides │
│ • Streaming │ │ • Provider │
│ • Error hdlg │ │ settings │
└─────────────────┘ └─────────────────┘
Core Components
1. g3-core: Agent Engine
Primary Responsibilities:
- Main orchestration logic for handling conversations and task execution
- Context window management with intelligent token tracking
- Built-in tool system for file operations and command execution
- Streaming response parsing with real-time tool call detection
- Error handling with automatic retry logic
Key Features:
- Context Window Intelligence: Automatic monitoring with percentage-based tracking (~80% capacity triggers auto-summarization)
- Tool System: Built-in tools for file operations (read, write, edit), shell commands, and structured output
- Streaming Parser: Real-time parsing of LLM responses with tool call detection and execution
- Session Management: Automatic session logging with detailed conversation history and token usage
- Error Recovery: Sophisticated error classification and retry logic for recoverable errors
Available Tools:
- shell: Execute shell commands with streaming output
- read_file: Read file contents with optional character-range support
- write_file: Create or overwrite files with content
- str_replace: Apply unified diffs to files for precise editing
- final_output: Signal task completion with a detailed summary
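A minimal sketch of how such a tool set might be represented and dispatched is shown below; the enum and function names are illustrative assumptions, not G3's internals, which add streaming output, diff application, and error reporting on top of this shape.

```rust
// Hypothetical sketch of tool-call dispatch; names are illustrative,
// not G3's actual API.

/// The built-in tools a model response can invoke.
enum ToolCall {
    Shell { command: String },
    ReadFile { path: String },
    WriteFile { path: String, content: String },
    StrReplace { path: String, diff: String },
    FinalOutput { summary: String },
}

/// Route a parsed tool call to a short action description.
fn dispatch(call: &ToolCall) -> String {
    match call {
        ToolCall::Shell { command } => format!("exec: {command}"),
        ToolCall::ReadFile { path } => format!("read: {path}"),
        ToolCall::WriteFile { path, .. } => format!("write: {path}"),
        ToolCall::StrReplace { path, .. } => format!("edit: {path}"),
        ToolCall::FinalOutput { summary } => format!("done: {summary}"),
    }
}

fn main() {
    let call = ToolCall::Shell { command: "ls".into() };
    println!("{}", dispatch(&call));
}
```

Modeling tool calls as an enum lets the streaming parser hand each parsed call to a single exhaustive `match`, so adding a tool is a compile-checked change.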
2. g3-providers: LLM Provider Abstraction
Primary Responsibilities:
- Unified interface for multiple LLM providers
- Provider-specific optimizations and feature support
- OAuth authentication flows
- Streaming and non-streaming completion support
Supported Providers:
- Anthropic: Claude models via API with native tool calling support
- Databricks: Foundation Model APIs with OAuth and token-based authentication
- Embedded: Local models via llama.cpp with GPU acceleration (Metal/CUDA)
- Provider Registry: Dynamic provider management and hot-swapping
Key Features:
- Native Tool Calling: Full support for structured tool calls where available
- Fallback Parsing: JSON tool call parsing for providers without native support
- OAuth Integration: Built-in OAuth flow for secure provider authentication
- Context-Aware: Provider-specific context length and token limit handling
- Streaming Support: Real-time response streaming with tool call detection
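The unified provider interface can be pictured as a trait along these lines; the trait, method names, and the stub backend are assumptions for illustration, not G3's actual `g3-providers` API.

```rust
// Illustrative sketch of a provider abstraction; names are hypothetical,
// not G3's actual API.

/// A chat message sent to or received from a model.
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Unified interface every LLM backend implements.
pub trait Provider {
    /// Provider name (e.g. "anthropic", "databricks", "embedded").
    fn name(&self) -> &str;
    /// Maximum context window in tokens for the active model.
    fn context_length(&self) -> usize;
    /// Whether the backend supports native structured tool calls.
    fn supports_native_tools(&self) -> bool;
    /// Run a (non-streaming) completion over the conversation so far.
    fn complete(&self, messages: &[Message]) -> Result<String, String>;
}

/// A stub backend used here only to exercise the trait.
pub struct EchoProvider;

impl Provider for EchoProvider {
    fn name(&self) -> &str { "echo" }
    fn context_length(&self) -> usize { 4096 }
    fn supports_native_tools(&self) -> bool { false }
    fn complete(&self, messages: &[Message]) -> Result<String, String> {
        messages
            .last()
            .map(|m| format!("echo: {}", m.content))
            .ok_or_else(|| "empty conversation".to_string())
    }
}

fn main() {
    let provider = EchoProvider;
    let msgs = vec![Message { role: "user".into(), content: "hi".into() }];
    println!("{}", provider.complete(&msgs).unwrap());
}
```

Exposing `supports_native_tools` on the trait is what lets the core engine decide between native tool calls and the JSON fallback parser per provider.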
3. g3-cli: Command-Line Interface
Primary Responsibilities:
- Command-line argument parsing and validation
- Interactive terminal interface with history support
- Retro-style terminal UI (80s sci-fi inspired)
- Autonomous mode with coach-player feedback loops
- Session management and workspace handling
Execution Modes:
- Single-shot: Execute one task and exit
- Interactive: REPL-style conversation with the agent
- Autonomous: Coach-player feedback loop for complex projects
- Retro TUI: Full-screen terminal interface with real-time updates
Key Features:
- Multi-line Input: Support for complex, multi-line prompts with backslash continuation
- Context Progress: Real-time display of token usage and context window status
- Error Recovery: Automatic retry logic for timeout and recoverable errors
- History Management: Persistent command history across sessions
- Theme Support: Customizable color themes for retro mode
- Cancellation: Ctrl+C support for graceful operation cancellation
4. g3-execution: Code Execution Engine
Primary Responsibilities:
- Safe execution of shell commands and scripts
- Streaming output capture and display
- Multi-language code execution support
- Error handling and result formatting
Supported Languages:
- Bash/Shell: Direct command execution with streaming output
- Python: Script execution via temporary files
- JavaScript: Node.js-based execution
- Extensible: Framework for adding additional language support
Key Features:
- Streaming Output: Real-time command output display
- Error Capture: Comprehensive stderr and stdout handling
- Exit Code Tracking: Proper success/failure detection
- Async Execution: Non-blocking command execution
- Output Formatting: Clean, user-friendly result presentation
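The exit-code tracking described above can be sketched with `std::process::Command`; this blocking, Unix-only (`sh -c`) version is a simplification of G3's real engine, which streams output asynchronously as it arrives.

```rust
// Minimal sketch of shell execution with exit-code tracking, using
// std::process. Assumes a Unix-like system with `sh` available.
use std::process::Command;

/// Run a shell command and return (exit_code, stdout, stderr).
fn run_shell(cmd: &str) -> std::io::Result<(i32, String, String)> {
    let out = Command::new("sh").arg("-c").arg(cmd).output()?;
    Ok((
        // A None exit code means the process was killed by a signal.
        out.status.code().unwrap_or(-1),
        String::from_utf8_lossy(&out.stdout).into_owned(),
        String::from_utf8_lossy(&out.stderr).into_owned(),
    ))
}

fn main() {
    let (code, stdout, _stderr) = run_shell("echo hello").unwrap();
    println!("exit={code} out={stdout}");
}
```

Capturing stdout and stderr separately, as here, is what enables the comprehensive error capture and clean result presentation listed above.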
5. g3-config: Configuration Management
Primary Responsibilities:
- TOML-based configuration file management
- Environment variable overrides
- Provider-specific settings and credentials
- CLI argument integration
Configuration Hierarchy:
- Default configuration (embedded in code)
- Configuration files (~/.config/g3/config.toml, ./g3.toml)
- Environment variables (G3_*)
- CLI arguments (highest priority)
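The precedence chain reduces to "first set value wins, highest priority first", which can be sketched as a small generic helper (the function name and example values are illustrative):

```rust
// Sketch of the configuration precedence chain:
// CLI > environment variable > config file > built-in default.

/// Pick the highest-priority value that is actually set.
fn resolve<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}

fn main() {
    // The config file sets a model, an env var overrides it,
    // and no CLI flag was given.
    let model = resolve(
        None,                             // --model not passed
        Some("model-from-env".to_string()),  // hypothetical G3_MODEL
        Some("model-from-file".to_string()), // config.toml value
        "default-model".to_string(),
    );
    println!("{model}");
}
```

Because `Option::or` short-circuits on the first `Some`, each layer only applies when every higher-priority layer is unset.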
Key Features:
- Auto-generation: Creates default configuration files if none exist
- Provider Overrides: Runtime provider and model selection
- Validation: Configuration validation with helpful error messages
- Flexible Paths: Support for shell expansion (~, environment variables)
Advanced Features
Context Window Management
G3 implements sophisticated context window management:
- Automatic Monitoring: Tracks token usage with percentage-based thresholds
- Smart Summarization: Auto-triggers at 80% capacity to prevent context overflow
- Conversation Preservation: Maintains conversation continuity through intelligent summaries
- Provider-Specific Limits: Adapts to different model context windows (4k to 200k+ tokens)
- Cumulative Tracking: Monitors total token usage across entire sessions
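The 80% auto-summarization trigger amounts to a simple ratio check against the active model's context length; the sketch below uses assumed names and a hard-coded threshold for illustration.

```rust
// Sketch of the percentage-based summarization trigger described above.
// Threshold and function names are illustrative assumptions.

const SUMMARIZE_THRESHOLD: f64 = 0.80;

/// Fraction of the context window currently consumed.
fn context_usage(used_tokens: usize, context_length: usize) -> f64 {
    used_tokens as f64 / context_length as f64
}

/// True when the conversation should be summarized to free space.
fn should_summarize(used_tokens: usize, context_length: usize) -> bool {
    context_usage(used_tokens, context_length) >= SUMMARIZE_THRESHOLD
}

fn main() {
    // A 200k-token model with 170k tokens used is past the threshold;
    // a 32k local model hits the same trigger at ~25.6k tokens.
    println!("{}", should_summarize(170_000, 200_000));
}
```

Keying the check on a percentage rather than an absolute count is what lets the same logic adapt to context windows anywhere from 4k to 200k+ tokens.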
Error Handling & Resilience
Comprehensive error handling system:
- Error Classification: Distinguishes between recoverable and non-recoverable errors
- Automatic Retry: Exponential backoff with jitter for rate limits, timeouts, and server errors
- Detailed Logging: Comprehensive error context including stack traces and session data
- Error Persistence: Saves detailed error logs to logs/errors/ for analysis
- Graceful Degradation: Continues operation when possible, fails gracefully when not
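The retry schedule can be sketched as a pure delay function: exponential growth from a base delay, a hard cap, and a jitter term supplied by the caller. The base, cap, and signature here are illustrative assumptions, not G3's actual values.

```rust
// Sketch of exponential backoff with jitter for recoverable errors
// (rate limits, timeouts, server errors). Parameters are illustrative.
use std::time::Duration;

/// Delay before retry `attempt` (0-based): base * 2^attempt, capped,
/// plus a caller-supplied jitter fraction in [0.0, 1.0).
fn backoff_delay(attempt: u32, jitter: f64) -> Duration {
    let base_ms = 500u64;
    let cap_ms = 30_000u64;
    // Clamp the shift so the multiplier cannot overflow u64.
    let exp_ms = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
    let jitter_ms = (exp_ms as f64 * jitter) as u64;
    Duration::from_millis(exp_ms + jitter_ms)
}

fn main() {
    // With zero jitter: 500ms, 1s, 2s, 4s, ... capped at 30s.
    for attempt in 0..4 {
        println!("attempt {attempt}: {:?}", backoff_delay(attempt, 0.0));
    }
}
```

In production the jitter fraction would come from a random source; randomizing the delay spreads retries out so simultaneous clients do not hammer a rate-limited API in lockstep.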
Session Management
Automatic session tracking and logging:
- Session IDs: Generated based on initial prompts for easy identification
- Complete Logs: Full conversation history, token usage, and timing data
- JSON Format: Structured logs for easy parsing and analysis
- Automatic Cleanup: Logs organized in the logs/ directory with timestamps
- Status Tracking: Records session completion status (completed, cancelled, error)
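A session record of this kind might look like the struct below; the field names and hand-rolled JSON are assumptions for illustration (a real implementation would use serde and carry the full conversation history).

```rust
// Sketch of a structured session log record; field names are
// illustrative, not G3's actual log schema.

struct SessionLog {
    session_id: String,
    status: String, // "completed" | "cancelled" | "error"
    total_tokens: usize,
}

impl SessionLog {
    /// Hand-rolled JSON line for the sketch; serde would do this
    /// properly (escaping, nesting, etc.) in a real implementation.
    fn to_json(&self) -> String {
        format!(
            "{{\"session_id\":\"{}\",\"status\":\"{}\",\"total_tokens\":{}}}",
            self.session_id, self.status, self.total_tokens
        )
    }
}

fn main() {
    let log = SessionLog {
        session_id: "fix-readme".into(),
        status: "completed".into(),
        total_tokens: 12345,
    };
    println!("{}", log.to_json());
}
```

Emitting one JSON object per session is what makes the logs easy to grep, parse, and aggregate for token-usage analysis.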
Autonomous Mode
Advanced autonomous operation with coach-player feedback:
- Requirements-Driven: Reads requirements.md for project specifications
- Dual-Agent System: Separate player (implementation) and coach (review) agents
- Iterative Improvement: Multiple rounds of implementation and feedback
- Progress Tracking: Detailed reporting of turns, token usage, and final status
- Workspace Management: Automatic workspace setup and file organization
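The coach-player loop reduces to: the player produces an attempt from the latest feedback, the coach either approves or returns new feedback, and the loop stops on approval or when the turn budget runs out. The sketch below models both agents as plain closures; the signature is an assumption, not G3's internals.

```rust
// Sketch of the coach-player feedback loop. Agents are modeled as
// closures; in G3 each would be a full LLM-backed agent.

/// Run up to `max_turns` rounds of implement-then-review.
/// Returns (final_attempt, turns_used, approved).
fn autonomous_loop(
    mut player: impl FnMut(&str) -> String,
    coach: impl Fn(&str) -> Option<String>, // None = approved
    max_turns: usize,
) -> (String, usize, bool) {
    let mut feedback = String::from("start");
    let mut attempt = String::new();
    for turn in 1..=max_turns {
        attempt = player(&feedback);
        match coach(&attempt) {
            None => return (attempt, turn, true),
            Some(f) => feedback = f, // iterate with the coach's notes
        }
    }
    (attempt, max_turns, false)
}

fn main() {
    // Toy agents: the player appends one '!' per turn; the coach
    // approves once the attempt reaches three characters.
    let mut s = String::new();
    let (out, turns, ok) = autonomous_loop(
        move |_| { s.push('!'); s.clone() },
        |a| if a.len() >= 3 { None } else { Some("more".into()) },
        10,
    );
    println!("{out} in {turns} turns, approved={ok}");
}
```

The turn cap mirrors the `--max-turns` CLI flag shown in the usage examples: it bounds cost even when the coach never approves.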
Provider Comparison
| Feature | Anthropic | Databricks | Embedded |
|---|---|---|---|
| Cost | Pay per token | Pay per token | Free after download |
| Privacy | Data sent to API | Data sent to API | Completely local |
| Performance | Very fast | Very fast | Depends on hardware |
| Model Quality | Excellent | Excellent | Good (varies by model) |
| Offline Support | No | No | Yes |
| Setup Complexity | API key only | OAuth or token | Model download required |
| Context Window | 200k tokens | Varies by model | 4k-32k tokens |
| Tool Calling | Native support | Native support | JSON fallback |
| Hardware Requirements | None | None | 4-16GB RAM, optional GPU |
Configuration Examples
Cloud-First Setup (Anthropic)
[providers]
default_provider = "anthropic"
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
max_tokens = 8192
temperature = 0.1
Enterprise Setup (Databricks)
[providers]
default_provider = "databricks"
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 32000
temperature = 0.1
use_oauth = true
Privacy-First Setup (Local Models)
[providers]
default_provider = "embedded"
[providers.embedded]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen"
context_length = 32768
max_tokens = 2048
temperature = 0.1
gpu_layers = 32
threads = 8
Hybrid Setup
[providers]
default_provider = "embedded"
# Local model for most tasks
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
context_length = 16384
gpu_layers = 32
# Cloud fallback for complex tasks
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
Usage Examples
Single-Shot Mode
g3 "implement a fibonacci function in Rust"
Interactive Mode
g3
g3> read the README and suggest improvements
g3> implement the suggestions you made
Autonomous Mode
g3 --autonomous --max-turns 10
# Reads requirements.md and implements iteratively
Retro TUI Mode
g3 --retro --theme dracula
# Full-screen terminal interface
Future Enhancements
Planned Features
- Plugin System: Custom tool and provider plugins
- Web Interface: Browser-based UI for remote access
- Model Quantization: Optimized local model deployment
- Multi-Model Ensemble: Combine multiple models for better results
- Advanced Sandboxing: Enhanced security for code execution
- Collaborative Mode: Multi-user sessions and shared workspaces
Technical Improvements
- Performance Optimization: Faster streaming and tool execution
- Memory Management: Better handling of large contexts and files
- Caching System: Intelligent caching of model responses and computations
- Monitoring: Built-in metrics and performance monitoring
- Testing: Comprehensive test suite and CI/CD integration
Development Guidelines
Code Organization
- Modular Design: Each crate has a single, well-defined responsibility
- Trait-Based: Use traits for abstraction and testability
- Error Handling: Comprehensive error types with context
- Documentation: Inline docs and examples for all public APIs
- Testing: Unit tests, integration tests, and property-based testing
Performance Considerations
- Async-First: All I/O operations are asynchronous
- Streaming: Real-time response processing where possible
- Memory Efficiency: Careful memory management for large contexts
- Caching: Strategic caching of expensive operations
- Profiling: Regular performance profiling and optimization
This design document reflects the current state of G3 as a mature, production-ready AI coding agent with sophisticated architecture and comprehensive feature set.