diff --git a/DESIGN.md b/DESIGN.md
index 45696ce..05c9230 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -1,157 +1,273 @@
 # G3 General Purpose AI Agent - Design Document
 
 ## Overview
-G3 is a **code-first AI agent** that helps you complete tasks by writing and executing code or scripts. Instead of just giving advice, G3 solves problems by generating executable code in the appropriate language.
+
+G3 is a **modular, composable AI coding agent** built in Rust that helps you complete tasks by writing and executing code. It provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering code generation, file manipulation, and task automation capabilities.
+
+The agent follows a **tool-first philosophy**: instead of just providing advice, G3 actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
 
 ## Core Principles
-1. **Code-First Philosophy**: Always try to solve problems with executable code
-2. **Multi-Language Support**: Generate scripts in Python, Bash, JavaScript, Rust, etc.
-3. **Unix Philosophy**: Small, focused tools that do one thing well
+
+1. **Tool-First Philosophy**: Solve problems by actively using tools rather than just describing solutions
+2. **Modular Architecture**: Clear separation of concerns across multiple Rust crates
+3. **Provider Flexibility**: Support multiple LLM providers through a unified interface
 4. **Modularity**: Clear separation of concerns
 5. **Composability**: Components can be combined in different ways
-6. **Performance**: Blazing fast execution
+6. **Performance**: Built in Rust for speed and reliability
+7. **Context Intelligence**: Smart context window management with auto-summarization
+8. **Error Resilience**: Robust error handling with automatic retry logic
 
-## Architecture
+## Project Structure
 
-### High-Level Components
+G3 is organized as a Rust workspace with the following crates:
+
+```
+g3/
+├── src/main.rs        # Main entry point
+├── crates/
+│   ├── g3-cli/        # Command-line interface and TUI
+│   ├── g3-core/       # Core agent engine and logic
+│   ├── g3-providers/  # LLM provider abstractions
+│   ├── g3-config/     # Configuration management
+│   └── g3-execution/  # Code execution engine
+├── logs/              # Session logs (auto-created)
+├── README.md          # Project documentation
+└── DESIGN.md          # This design document
+```
+
+## Architecture Overview
+
+### High-Level Architecture
 
 ```
 ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
-│   CLI Module    │    │   Core Engine   │    │  LLM Providers  │
+│     g3-cli      │    │     g3-core     │    │  g3-providers   │
 │                 │    │                 │    │                 │
-│ - Task commands │◄──►│ - Task          │◄──►│ - OpenAI        │
-│ - Interactive   │    │   interpretation│    │ - Anthropic     │
-│   mode          │    │ - Code          │    │ - Embedded      │
-│ - Code exec     │    │   generation    │    │   (llama.cpp)   │
-│   approval      │    │ - Script        │    │ - Custom APIs   │
-│                 │    │   execution     │    │                 │
+│ • CLI parsing   │◄──►│ • Agent engine  │◄──►│ • Anthropic     │
+│ • Interactive   │    │ • Context mgmt  │    │ • Databricks    │
+│ • Retro TUI     │    │ • Tool system   │    │ • Embedded      │
+│ • Autonomous    │    │ • Streaming     │    │   (llama.cpp)   │
+│   mode          │    │ • Task exec     │    │ • OAuth flow    │
 └─────────────────┘    └─────────────────┘    └─────────────────┘
          │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
-                       ┌─────────────────┐
-                       │   Execution     │
-                       │   Engine        │
-                       │                 │
-                       │ - Python        │
-                       │ - Bash/Shell    │
-                       │ - JavaScript    │
-                       │ - Rust          │
-                       │ - Sandboxing    │
-                       └─────────────────┘
+          ┌─────────────────┐    ┌─────────────────┐
+          │  g3-execution   │    │    g3-config    │
+          │                 │    │                 │
+          │ • Code exec     │    │ • TOML config   │
+          │ • Shell cmds    │    │ • Env overrides │
+          │ • Streaming     │    │ • Provider      │
+          │ • Error hdlg    │    │   settings      │
+          └─────────────────┘    └─────────────────┘
 ```
 
-### Module Breakdown
+## Core Components
 
-#### 1. CLI Module (`g3-cli`)
-- **Responsibility**: User interface and task interpretation
-- **New Features**:
-  - Progress indicators for script execution
+### 1. g3-core: Agent Engine
 
-#### 2. Core Engine (`g3-core`)
-- **Responsibility**: Task interpretation and code generation
-- **New Features**:
-  - Task analysis and decomposition
-  - Language selection based on task type
-  - Code generation with execution context
-  - Script template system
-  - Autonomous execution of generated code
+**Primary Responsibilities:**
+- Main orchestration logic for handling conversations and task execution
+- Context window management with intelligent token tracking
+- Built-in tool system for file operations and command execution
+- Streaming response parsing with real-time tool call detection
+- Error handling with automatic retry logic
 
-#### 3. LLM Providers (`g3-providers`)
-- **Responsibility**: LLM communication and model abstraction
-- **Supported Providers**:
-  - **OpenAI**: GPT-4, GPT-3.5-turbo via API
-  - **Anthropic**: Claude models via API
-  - **Embedded**: Local open-weights models via llama.cpp
-- **Enhanced Prompts**:
-  - Code-first system prompts
-  - Language-specific generation instructions
+**Key Features:**
+- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (~80% capacity triggers auto-summarization)
+- **Tool System**: Built-in tools for file operations (read, write, edit), shell commands, and structured output
+- **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
+- **Session Management**: Automatic session logging with detailed conversation history and token usage
+- **Error Recovery**: Error classification and retry logic for recoverable errors
 
-#### 5. Embedded Provider (`g3-core/providers/embedded`) - NEW
-- **Responsibility**: Local model inference using llama.cpp
-- **Features**:
-  - GGUF model support (Llama, CodeLlama, Mistral, etc.)
-  - GPU acceleration via CUDA/Metal
-  - Configurable context length and generation parameters
-  - Async-compatible inference without blocking
-  - Thread-safe model access
-  - Stop sequence detection
+**Available Tools** (see the interface sketch below):
+- `shell`: Execute shell commands with streaming output
+- `read_file`: Read file contents with optional character range support
+- `write_file`: Create or overwrite files with content
+- `str_replace`: Apply unified diffs to files for precise editing
+- `final_output`: Signal task completion with a detailed summary
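+
+The tool layer can be modeled as a small trait behind a dispatch table. The following is a minimal sketch under assumed names (`Tool`, `ToolRegistry`, a path-only `read_file`); the actual g3-core types and signatures may differ:
+
+```rust
+use std::collections::HashMap;
+
+// Illustrative tool abstraction; not the actual g3-core API.
+trait Tool {
+    fn name(&self) -> &'static str;
+    /// Run the tool with raw argument text, returning output for the model.
+    fn execute(&self, args: &str) -> Result<String, String>;
+}
+
+struct ReadFile;
+
+impl Tool for ReadFile {
+    fn name(&self) -> &'static str {
+        "read_file"
+    }
+
+    fn execute(&self, args: &str) -> Result<String, String> {
+        // Sketch only: treat `args` as a bare path. The real tool takes
+        // structured arguments (e.g. an optional character range).
+        std::fs::read_to_string(args.trim()).map_err(|e| e.to_string())
+    }
+}
+
+struct ToolRegistry {
+    tools: HashMap<&'static str, Box<dyn Tool>>,
+}
+
+impl ToolRegistry {
+    fn new() -> Self {
+        let mut tools: HashMap<&'static str, Box<dyn Tool>> = HashMap::new();
+        let read_file: Box<dyn Tool> = Box::new(ReadFile);
+        tools.insert(read_file.name(), read_file);
+        Self { tools }
+    }
+
+    /// Dispatch a tool call parsed out of a streamed model response.
+    fn dispatch(&self, name: &str, args: &str) -> Result<String, String> {
+        match self.tools.get(name) {
+            Some(tool) => tool.execute(args),
+            None => Err(format!("unknown tool: {name}")),
+        }
+    }
+}
+
+fn main() {
+    let registry = ToolRegistry::new();
+    println!("{:?}", registry.dispatch("read_file", "README.md"));
+}
+```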
 
-#### 4. Execution Engine (`g3-execution`) - NEW
-- **Responsibility**: Safe code execution
-- **Features**:
-  - Multi-language script execution
-  - Sandboxing and security
-  - Resource limits
-  - Output capture and formatting
-  - Error handling and recovery
+### 2. g3-providers: LLM Provider Abstraction
 
-### Task Types and Language Selection
+**Primary Responsibilities:**
+- Unified interface for multiple LLM providers (see the trait sketch below)
+- Provider-specific optimizations and feature support
+- OAuth authentication flows
+- Streaming and non-streaming completion support
 
-| Task Type | Preferred Language | Use Cases |
-|-----------|-------------------|-----------|
-| Data Processing | Python | CSV/JSON analysis, data transformation |
-| File Operations | Bash/Shell | File manipulation, backups, organization |
-| System Admin | Bash/Shell | Process management, system monitoring |
-| Text Processing | Python/Bash | Log analysis, text transformation |
-| Database | Python/SQL | Data migration, queries, reporting |
-| Image/Media | Python | Image processing, format conversion |
-| Development | Rust | Code generation, project setup |
+**Supported Providers:**
+- **Anthropic**: Claude models via API with native tool calling support
+- **Databricks**: Foundation Model APIs with OAuth and token-based authentication
+- **Embedded**: Local models via llama.cpp with GPU acceleration (Metal/CUDA)
+- **Provider Registry**: Dynamic provider management and hot-swapping
 
-## Implementation Plan
+**Key Features:**
+- **Native Tool Calling**: Full support for structured tool calls where available
+- **Fallback Parsing**: JSON tool call parsing for providers without native support
+- **OAuth Integration**: Built-in OAuth flow for secure provider authentication
+- **Context Awareness**: Provider-specific context length and token limit handling
+- **Streaming Support**: Real-time response streaming with tool call detection
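+
+As a rough illustration of the unified interface, here is a deliberately synchronous sketch; the real g3-providers trait is async and streaming, and every name here (`LlmProvider`, `Message`, `EchoProvider`) is an assumption rather than the crate's API:
+
+```rust
+/// Illustrative message/completion types; not the actual g3-providers API.
+#[derive(Debug)]
+struct Message {
+    role: String, // "system", "user", or "assistant"
+    content: String,
+}
+
+#[derive(Debug)]
+struct Completion {
+    text: String,
+    input_tokens: u32,
+    output_tokens: u32,
+}
+
+/// The unified interface each backend (Anthropic, Databricks, embedded)
+/// would implement.
+trait LlmProvider {
+    fn name(&self) -> &'static str;
+    /// Context window size, used for the ~80% auto-summarization check.
+    fn context_length(&self) -> u32;
+    fn complete(&self, messages: &[Message]) -> Result<Completion, String>;
+}
+
+/// A stub backend standing in for a real provider.
+struct EchoProvider;
+
+impl LlmProvider for EchoProvider {
+    fn name(&self) -> &'static str {
+        "echo"
+    }
+
+    fn context_length(&self) -> u32 {
+        8192
+    }
+
+    fn complete(&self, messages: &[Message]) -> Result<Completion, String> {
+        let last = messages.last().ok_or("empty conversation")?;
+        Ok(Completion {
+            text: format!("echo: {}", last.content),
+            input_tokens: 0,
+            output_tokens: 0,
+        })
+    }
+}
+
+fn main() {
+    // A registry would hold `Box<dyn LlmProvider>` values keyed by name.
+    let provider: Box<dyn LlmProvider> = Box::new(EchoProvider);
+    println!("{} ({} token window)", provider.name(), provider.context_length());
+    let msgs = vec![Message { role: "user".into(), content: "hello".into() }];
+    println!("{:?}", provider.complete(&msgs));
+}
+```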
 
-### Phase 1: Core Refactoring ✅
-1. ✅ Update CLI commands for task-oriented interface
-2. ✅ Enhance system prompts for code-first approach
-3. ✅ Add basic code execution capabilities
-4. ✅ Update interactive mode messaging
+### 3. g3-cli: Command-Line Interface
 
-### Phase 2: Enhanced Provider Support ✅
-1. ✅ Implement embedded model provider using llama.cpp
-2. ✅ Add GGUF model support for local inference
-3. ✅ Configure GPU acceleration and performance optimization
-4. ✅ Add comprehensive logging and debugging support
+**Primary Responsibilities:**
+- Command-line argument parsing and validation
+- Interactive terminal interface with history support
+- Retro-style terminal UI (80s sci-fi inspired)
+- Autonomous mode with coach-player feedback loops
+- Session management and workspace handling
 
-### Phase 3: Advanced Features (Future)
-1. Model quantization and optimization
-2. Multi-model ensemble support
-3. Advanced code execution sandboxing
-4. Plugin system for custom providers
-5. Web interface for remote access
+**Execution Modes:**
+- **Single-shot**: Execute one task and exit
+- **Interactive**: REPL-style conversation with the agent
+- **Autonomous**: Coach-player feedback loop for complex projects
+- **Retro TUI**: Full-screen terminal interface with real-time updates
+
+**Key Features:**
+- **Multi-line Input**: Support for complex, multi-line prompts with backslash continuation
+- **Context Progress**: Real-time display of token usage and context window status
+- **Error Recovery**: Automatic retry logic for timeouts and recoverable errors
+- **History Management**: Persistent command history across sessions
+- **Theme Support**: Customizable color themes for retro mode
+- **Cancellation**: Ctrl+C support for graceful operation cancellation
+
+### 4. g3-execution: Code Execution Engine
+
+**Primary Responsibilities:**
+- Safe execution of shell commands and scripts (see the sketch below)
+- Streaming output capture and display
+- Multi-language code execution support
+- Error handling and result formatting
+
+**Supported Languages:**
+- **Bash/Shell**: Direct command execution with streaming output
+- **Python**: Script execution via temporary files
+- **JavaScript**: Node.js-based execution
+- **Extensible**: Framework for adding additional language support
+
+**Key Features:**
+- **Streaming Output**: Real-time command output display
+- **Error Capture**: Comprehensive stderr and stdout handling
+- **Exit Code Tracking**: Proper success/failure detection
+- **Async Execution**: Non-blocking command execution
+- **Output Formatting**: Clean, user-friendly result presentation
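+
+The streaming behavior can be sketched with tokio's process API, reading stdout line by line while the command runs. This is a simplified illustration (stderr multiplexing and result formatting omitted), assuming a `tokio` dependency with the `full` feature; `run_streaming` is a hypothetical name, not the g3-execution API:
+
+```rust
+use std::process::Stdio;
+use tokio::io::{AsyncBufReadExt, BufReader};
+use tokio::process::Command;
+
+/// Stream a shell command's stdout line by line as it runs, then
+/// report the exit code. Sketch only: stderr handling is omitted.
+async fn run_streaming(cmd: &str) -> std::io::Result<i32> {
+    let mut child = Command::new("sh")
+        .arg("-c")
+        .arg(cmd)
+        .stdout(Stdio::piped())
+        .spawn()?;
+
+    let stdout = child.stdout.take().expect("stdout was piped above");
+    let mut lines = BufReader::new(stdout).lines();
+    while let Some(line) = lines.next_line().await? {
+        println!("{line}"); // displayed in real time, not after completion
+    }
+
+    // Exit code tracking for success/failure detection.
+    let status = child.wait().await?;
+    Ok(status.code().unwrap_or(-1))
+}
+
+#[tokio::main]
+async fn main() -> std::io::Result<()> {
+    let code = run_streaming("echo start && sleep 1 && echo done").await?;
+    println!("exit code: {code}");
+    Ok(())
+}
+```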
+
+### 5. g3-config: Configuration Management
+
+**Primary Responsibilities:**
+- TOML-based configuration file management
+- Environment variable overrides
+- Provider-specific settings and credentials
+- CLI argument integration
+
+**Configuration Hierarchy** (lowest to highest priority; see the sketch below):
+1. Default configuration (embedded in code)
+2. Configuration files (`~/.config/g3/config.toml`, `./g3.toml`)
+3. Environment variables (`G3_*`)
+4. CLI arguments (highest priority)
+
+**Key Features:**
+- **Auto-generation**: Creates default configuration files if none exist
+- **Provider Overrides**: Runtime provider and model selection
+- **Validation**: Configuration validation with helpful error messages
+- **Flexible Paths**: Support for shell expansion (`~`, environment variables)
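+
+The hierarchy can be expressed as successive partial overrides, where later layers win. A minimal sketch with illustrative field names; the real g3-config crate parses TOML and validates the merged result:
+
+```rust
+use std::env;
+
+/// Illustrative settings struct; not the actual g3-config types.
+#[derive(Debug, Clone)]
+struct Config {
+    provider: String,
+    model: Option<String>,
+}
+
+impl Config {
+    /// Layer 1: defaults embedded in code.
+    fn defaults() -> Self {
+        Self { provider: "anthropic".into(), model: None }
+    }
+
+    /// Apply a partial override on top of `self`; later layers win.
+    fn layer(mut self, provider: Option<String>, model: Option<String>) -> Self {
+        if let Some(p) = provider {
+            self.provider = p;
+        }
+        if let Some(m) = model {
+            self.model = Some(m);
+        }
+        self
+    }
+}
+
+fn main() {
+    let from_file = Some("embedded".to_string()); // stand-in for a parsed g3.toml
+    let from_env = env::var("G3_PROVIDER").ok(); // e.g. G3_PROVIDER=databricks
+    let from_cli: Option<String> = None; // e.g. a --provider flag, highest priority
+
+    let config = Config::defaults()
+        .layer(from_file, None)
+        .layer(from_env, None)
+        .layer(from_cli, None);
+
+    println!("{config:?}");
+}
+```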
+
+## Advanced Features
+
+### Context Window Management
+
+G3 implements sophisticated context window management:
+
+- **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
+- **Smart Summarization**: Auto-triggers at ~80% capacity to prevent context overflow
+- **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
+- **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
+- **Cumulative Tracking**: Monitors total token usage across entire sessions
+
+### Error Handling & Resilience
+
+A comprehensive error handling system:
+
+- **Error Classification**: Distinguishes between recoverable and non-recoverable errors
+- **Automatic Retry**: Exponential backoff with jitter for rate limits, timeouts, and server errors (sketched below)
+- **Detailed Logging**: Comprehensive error context including stack traces and session data
+- **Error Persistence**: Saves detailed error logs to `logs/errors/` for analysis
+- **Graceful Degradation**: Continues operation when possible, fails gracefully when not
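+
+The retry loop can be sketched as follows; the attempt limit, base delay, and error classification are illustrative rather than g3's actual policy (which is async), and the time-based jitter is a stand-in for a real RNG:
+
+```rust
+use std::thread;
+use std::time::{Duration, SystemTime, UNIX_EPOCH};
+
+/// Retry a fallible call with exponential backoff plus jitter.
+/// Limits and delays are illustrative, not g3's actual values.
+fn with_retries<T, E: std::fmt::Debug>(
+    max_attempts: u32,
+    mut call: impl FnMut() -> Result<T, E>,
+    is_recoverable: impl Fn(&E) -> bool,
+) -> Result<T, E> {
+    let mut delay_ms: u64 = 500; // base delay, doubled each attempt
+    for attempt in 1..=max_attempts {
+        match call() {
+            Ok(value) => return Ok(value),
+            Err(e) if attempt < max_attempts && is_recoverable(&e) => {
+                // Cheap jitter stand-in; a real implementation would use a RNG.
+                let jitter = SystemTime::now()
+                    .duration_since(UNIX_EPOCH)
+                    .unwrap()
+                    .subsec_millis() as u64
+                    % delay_ms;
+                eprintln!("attempt {attempt} failed ({e:?}); retrying in {} ms", delay_ms + jitter);
+                thread::sleep(Duration::from_millis(delay_ms + jitter));
+                delay_ms *= 2; // exponential backoff
+            }
+            // Non-recoverable, or attempts exhausted: surface the error.
+            Err(e) => return Err(e),
+        }
+    }
+    unreachable!("every loop iteration returns or retries")
+}
+
+fn main() {
+    let mut calls = 0;
+    let result = with_retries(
+        4,
+        || {
+            calls += 1;
+            if calls < 3 { Err("429 rate limited") } else { Ok("completion") }
+        },
+        |e| e.contains("429") || e.contains("timeout"),
+    );
+    println!("{result:?} after {calls} calls");
+}
+```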
+model = "claude-3-5-sonnet-20241022" ``` + +## Usage Examples + +### Single-Shot Mode +```bash +g3 "implement a fibonacci function in Rust" +``` + +### Interactive Mode +```bash +g3 +g3> read the README and suggest improvements +g3> implement the suggestions you made +``` + +### Autonomous Mode +```bash +g3 --autonomous --max-turns 10 +# Reads requirements.md and implements iteratively +``` + +### Retro TUI Mode +```bash +g3 --retro --theme dracula +# Full-screen terminal interface +``` + +## Future Enhancements + +### Planned Features +- **Plugin System**: Custom tool and provider plugins +- **Web Interface**: Browser-based UI for remote access +- **Model Quantization**: Optimized local model deployment +- **Multi-Model Ensemble**: Combine multiple models for better results +- **Advanced Sandboxing**: Enhanced security for code execution +- **Collaborative Mode**: Multi-user sessions and shared workspaces + +### Technical Improvements +- **Performance Optimization**: Faster streaming and tool execution +- **Memory Management**: Better handling of large contexts and files +- **Caching System**: Intelligent caching of model responses and computations +- **Monitoring**: Built-in metrics and performance monitoring +- **Testing**: Comprehensive test suite and CI/CD integration + +## Development Guidelines + +### Code Organization +- **Modular Design**: Each crate has a single, well-defined responsibility +- **Trait-Based**: Use traits for abstraction and testability +- **Error Handling**: Comprehensive error types with context +- **Documentation**: Inline docs and examples for all public APIs +- **Testing**: Unit tests, integration tests, and property-based testing + +### Performance Considerations +- **Async-First**: All I/O operations are asynchronous +- **Streaming**: Real-time response processing where possible +- **Memory Efficiency**: Careful memory management for large contexts +- **Caching**: Strategic caching of expensive operations +- **Profiling**: Regular performance profiling and optimization + +This design document reflects the current state of G3 as a mature, production-ready AI coding agent with sophisticated architecture and comprehensive feature set.