# G3 General Purpose AI Agent - Design Document

## Overview

G3 is a **modular, composable AI coding agent** built in Rust that helps you complete tasks by writing and executing code. It provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation, file manipulation, and task automation capabilities.

The agent follows a **tool-first philosophy**: instead of just providing advice, G3 actively uses tools to read files, write code, execute commands, and complete tasks autonomously.

## Core Principles

1. **Tool-First Philosophy**: Solve problems by actively using tools rather than just describing solutions
2. **Modular Architecture**: Clear separation of concerns across multiple Rust crates
3. **Provider Flexibility**: Support multiple LLM providers through a unified interface
4. **Composability**: Components can be combined in different ways
5. **Performance**: Built in Rust for speed and reliability
6. **Context Intelligence**: Smart context window management with auto-summarization
7. **Error Resilience**: Robust error handling with automatic retry logic

## Project Structure

G3 is organized as a Rust workspace with the following crates:

```
g3/
├── src/main.rs        # Main entry point
├── crates/
│   ├── g3-cli/        # Command-line interface and TUI
│   ├── g3-core/       # Core agent engine and logic
│   ├── g3-providers/  # LLM provider abstractions
│   ├── g3-config/     # Configuration management
│   └── g3-execution/  # Code execution engine
├── logs/              # Session logs (auto-created)
├── README.md          # Project documentation
└── DESIGN.md          # This design document
```

## Architecture Overview

### High-Level Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     g3-cli      │    │     g3-core     │    │  g3-providers   │
│                 │    │                 │    │                 │
│ • CLI parsing   │◄──►│ • Agent engine  │◄──►│ • Anthropic     │
│ • Interactive   │    │ • Context mgmt  │    │ • Databricks    │
│ • Retro TUI     │    │ • Tool system   │    │ • Embedded      │
│ • Autonomous    │    │ • Streaming     │    │   (llama.cpp)   │
│   mode          │    │ • Task exec     │    │ • OAuth flow    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
            ┌─────────────────┐    ┌─────────────────┐
            │  g3-execution   │    │    g3-config    │
            │                 │    │                 │
            │ • Code exec     │    │ • TOML config   │
            │ • Shell cmds    │    │ • Env overrides │
            │ • Streaming     │    │ • Provider      │
            │ • Error hdlg    │    │   settings      │
            └─────────────────┘    └─────────────────┘
```

## Core Components

### 1. g3-core: Agent Engine

**Primary Responsibilities:**

- Main orchestration logic for handling conversations and task execution
- Context window management with intelligent token tracking
- Built-in tool system for file operations and command execution
- Streaming response parsing with real-time tool call detection
- Error handling with automatic retry logic

**Key Features:**

- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (~80% capacity triggers auto-summarization)
- **Tool System**: Built-in tools for file operations (read, write, edit), shell commands, and structured output
- **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
- **Session Management**: Automatic session logging with detailed conversation history and token usage
- **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors

**Available Tools:**

- `shell`: Execute shell commands with streaming output
- `read_file`: Read file contents with optional character range support
- `write_file`: Create or overwrite files with content
- `str_replace`: Apply unified diffs to files with precise editing
- `final_output`: Signal task completion with detailed summaries

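The tool system described above can be sketched as a trait plus a registry that dispatches parsed tool calls by name. This is an illustrative shape only — the `Tool` trait, `ToolRegistry`, and their method signatures are assumptions, not the actual g3-core API:

```rust
use std::collections::HashMap;

// Hypothetical shape of a g3-core tool: a name plus a fallible call.
trait Tool {
    fn name(&self) -> &'static str;
    fn call(&self, args: &str) -> Result<String, String>;
}

// `read_file` as a tool: here `args` is simply the path to read.
struct ReadFile;
impl Tool for ReadFile {
    fn name(&self) -> &'static str { "read_file" }
    fn call(&self, args: &str) -> Result<String, String> {
        std::fs::read_to_string(args).map_err(|e| e.to_string())
    }
}

// Registry that routes a parsed tool call to the matching tool.
struct ToolRegistry {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl ToolRegistry {
    fn new() -> Self {
        ToolRegistry { tools: HashMap::new() }
    }
    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name(), tool);
    }
    fn dispatch(&self, name: &str, args: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool.call(args),
            // Unknown tool names are reported back as errors rather than panics,
            // so the model can correct itself on the next turn.
            None => Err(format!("unknown tool: {}", name)),
        }
    }
}

fn main() {
    let mut registry = ToolRegistry::new();
    registry.register(Box::new(ReadFile));
    assert!(registry.dispatch("no_such_tool", "").is_err());
}
```

Returning errors as values (rather than failing the session) is what makes the error-recovery loop above possible: a failed tool call becomes just another observation for the model.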
### 2. g3-providers: LLM Provider Abstraction

**Primary Responsibilities:**

- Unified interface for multiple LLM providers
- Provider-specific optimizations and feature support
- OAuth authentication flows
- Streaming and non-streaming completion support

**Supported Providers:**

- **Anthropic**: Claude models via API with native tool calling support
- **Databricks**: Foundation Model APIs with OAuth and token-based authentication
- **Embedded**: Local models via llama.cpp with GPU acceleration (Metal/CUDA)
- **Provider Registry**: Dynamic provider management and hot-swapping

**Key Features:**

- **Native Tool Calling**: Full support for structured tool calls where available
- **Fallback Parsing**: JSON tool call parsing for providers without native support
- **OAuth Integration**: Built-in OAuth flow for secure provider authentication
- **Context-Aware**: Provider-specific context length and token limit handling
- **Streaming Support**: Real-time response streaming with tool call detection

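A unified provider interface typically reduces to a small trait that the engine consumes as a trait object, which is what makes hot-swapping possible. The names below (`LlmProvider`, `Completion`, `EchoProvider`) are illustrative assumptions, not the real g3-providers API:

```rust
// Minimal completion result: text plus token accounting for context tracking.
struct Completion {
    text: String,
    tokens_used: u32,
}

// Hypothetical unified provider interface.
trait LlmProvider {
    fn name(&self) -> &'static str;
    // Context window size, so the core can budget tokens per provider.
    fn context_length(&self) -> usize;
    fn complete(&self, prompt: &str) -> Result<Completion, String>;
}

// A stub standing in for the Anthropic/Databricks/embedded backends.
struct EchoProvider;
impl LlmProvider for EchoProvider {
    fn name(&self) -> &'static str { "echo" }
    fn context_length(&self) -> usize { 200_000 }
    fn complete(&self, prompt: &str) -> Result<Completion, String> {
        // Rough heuristic: ~4 characters per token.
        Ok(Completion {
            text: prompt.to_string(),
            tokens_used: prompt.len() as u32 / 4,
        })
    }
}

// The engine only sees the trait object, so providers can be swapped at runtime.
fn run(provider: &dyn LlmProvider, prompt: &str) -> Result<String, String> {
    provider.complete(prompt).map(|c| c.text)
}

fn main() {
    let p = EchoProvider;
    assert_eq!(run(&p, "hello").unwrap(), "hello");
}
```

Exposing `context_length()` on the trait is what lets the context-aware features described above adapt to each backend without special-casing.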
### 3. g3-cli: Command-Line Interface

**Primary Responsibilities:**

- Command-line argument parsing and validation
- Interactive terminal interface with history support
- Retro-style terminal UI (80s sci-fi inspired)
- Autonomous mode with coach-player feedback loops
- Session management and workspace handling

**Execution Modes:**

- **Single-shot**: Execute one task and exit
- **Interactive**: REPL-style conversation with the agent
- **Autonomous**: Coach-player feedback loop for complex projects
- **Retro TUI**: Full-screen terminal interface with real-time updates

**Key Features:**

- **Multi-line Input**: Support for complex, multi-line prompts with backslash continuation
- **Context Progress**: Real-time display of token usage and context window status
- **Error Recovery**: Automatic retry logic for timeout and recoverable errors
- **History Management**: Persistent command history across sessions
- **Theme Support**: Customizable color themes for retro mode
- **Cancellation**: Ctrl+C support for graceful operation cancellation

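Selecting among these execution modes comes down to inspecting the flags shown in the usage examples later in this document (`--retro`, `--autonomous`, `--max-turns`). The real g3-cli presumably uses a full argument parser; this std-only sketch, with an assumed default of 10 turns, just illustrates the dispatch logic:

```rust
#[derive(Debug, PartialEq)]
enum Mode {
    SingleShot(String),
    Interactive,
    Autonomous { max_turns: u32 },
    RetroTui,
}

// Map raw CLI arguments to an execution mode.
fn parse_mode(args: &[&str]) -> Mode {
    if args.contains(&"--retro") {
        Mode::RetroTui
    } else if args.contains(&"--autonomous") {
        // --max-turns bounds the coach-player loop; the default is an assumption.
        let max_turns = args
            .iter()
            .position(|a| *a == "--max-turns")
            .and_then(|i| args.get(i + 1))
            .and_then(|v| v.parse().ok())
            .unwrap_or(10);
        Mode::Autonomous { max_turns }
    } else if let Some(task) = args.iter().find(|a| !a.starts_with("--")) {
        // A bare positional argument is treated as a single-shot task prompt.
        Mode::SingleShot(task.to_string())
    } else {
        Mode::Interactive
    }
}

fn main() {
    assert_eq!(parse_mode(&[]), Mode::Interactive);
    assert_eq!(
        parse_mode(&["--autonomous", "--max-turns", "5"]),
        Mode::Autonomous { max_turns: 5 }
    );
}
```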
### 4. g3-execution: Code Execution Engine

**Primary Responsibilities:**

- Safe execution of shell commands and scripts
- Streaming output capture and display
- Multi-language code execution support
- Error handling and result formatting

**Supported Languages:**

- **Bash/Shell**: Direct command execution with streaming output
- **Python**: Script execution via temporary files
- **JavaScript**: Node.js-based execution
- **Extensible**: Framework for adding additional language support

**Key Features:**

- **Streaming Output**: Real-time command output display
- **Error Capture**: Comprehensive stderr and stdout handling
- **Exit Code Tracking**: Proper success/failure detection
- **Async Execution**: Non-blocking command execution
- **Output Formatting**: Clean, user-friendly result presentation

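The error-capture and exit-code-tracking features reduce to capturing stdout, stderr, and the status of a child process. This blocking, std-only sketch (the real engine streams output asynchronously) assumes a POSIX `sh` is available; the `ExecResult` shape is illustrative:

```rust
use std::process::Command;

// Captured output plus success flag, mirroring the features listed above.
struct ExecResult {
    stdout: String,
    stderr: String,
    success: bool,
}

fn run_shell(cmd: &str) -> Result<ExecResult, String> {
    // Buffered capture for simplicity; the real engine streams incrementally.
    let out = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .output()
        .map_err(|e| e.to_string())?;
    Ok(ExecResult {
        stdout: String::from_utf8_lossy(&out.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
        success: out.status.success(),
    })
}

fn main() {
    let r = run_shell("echo hello").unwrap();
    assert!(r.success);
    assert_eq!(r.stdout.trim(), "hello");
}
```

Separating "the command failed" (non-zero exit, captured in `success`) from "the command could not be spawned" (the outer `Result`) is what lets the agent distinguish a task-level failure from an environment-level one.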
### 5. g3-config: Configuration Management

**Primary Responsibilities:**

- TOML-based configuration file management
- Environment variable overrides
- Provider-specific settings and credentials
- CLI argument integration

**Configuration Hierarchy:**

1. Default configuration (embedded in code)
2. Configuration files (`~/.config/g3/config.toml`, `./g3.toml`)
3. Environment variables (`G3_*`)
4. CLI arguments (highest priority)

**Key Features:**

- **Auto-generation**: Creates default configuration files if none exist
- **Provider Overrides**: Runtime provider and model selection
- **Validation**: Configuration validation with helpful error messages
- **Flexible Paths**: Support for shell expansion (`~`, environment variables)

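The four-level hierarchy above is a straightforward chain of overrides, shown here for a single setting (the default provider, with values taken from the configuration examples later in this document). The `resolve` helper is a sketch of the precedence rule, not the real g3-config API:

```rust
// Precedence: defaults < config file < env vars < CLI (highest priority).
fn resolve(
    file_provider: Option<&str>,
    env_provider: Option<&str>,
    cli_provider: Option<&str>,
) -> String {
    // Later layers override earlier ones; the built-in default is the base layer.
    cli_provider
        .or(env_provider)
        .or(file_provider)
        .unwrap_or("embedded")
        .to_string()
}

fn main() {
    // File sets anthropic; env overrides it; CLI wins overall.
    assert_eq!(resolve(Some("anthropic"), None, None), "anthropic");
    assert_eq!(resolve(Some("anthropic"), Some("databricks"), None), "databricks");
    assert_eq!(
        resolve(Some("anthropic"), Some("databricks"), Some("embedded")),
        "embedded"
    );
}
```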
## Advanced Features

### Context Window Management

G3 implements sophisticated context window management:

- **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
- **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
- **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
- **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
- **Cumulative Tracking**: Monitors total token usage across entire sessions

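The 80% trigger reduces to an integer threshold check against the active provider's context window (function names here are illustrative; the threshold value is from this document):

```rust
// Percentage of the context window currently in use.
// Integer math avoids floating-point comparisons.
fn context_percent(used_tokens: usize, context_length: usize) -> usize {
    used_tokens * 100 / context_length
}

// Auto-summarization fires at 80% capacity, per the design above.
fn should_summarize(used_tokens: usize, context_length: usize) -> bool {
    context_percent(used_tokens, context_length) >= 80
}

fn main() {
    // A 200k-token window triggers summarization once 160k tokens are used.
    assert!(!should_summarize(150_000, 200_000));
    assert!(should_summarize(160_000, 200_000));
}
```

Because the limit is a parameter, the same check adapts from a 4k local model to a 200k cloud model without change.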
### Error Handling & Resilience

Comprehensive error handling system:

- **Error Classification**: Distinguishes between recoverable and non-recoverable errors
- **Automatic Retry**: Exponential backoff with jitter for rate limits, timeouts, and server errors
- **Detailed Logging**: Comprehensive error context including stack traces and session data
- **Error Persistence**: Saves detailed error logs to `logs/errors/` for analysis
- **Graceful Degradation**: Continues operation when possible, fails gracefully when not

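Exponential backoff with jitter can be sketched as a pure delay function; the base delay, cap, and jitter range below are assumptions for illustration, not G3's actual tuning:

```rust
use std::time::Duration;

// Delay before retry `attempt` (0-based), for recoverable errors such as
// rate limits, timeouts, and server errors.
fn backoff_delay(attempt: u32, jitter_ms: u64) -> Duration {
    let base_ms: u64 = 500;  // assumed base delay
    let cap_ms: u64 = 30_000; // assumed ceiling
    // 500ms, 1s, 2s, 4s, ... capped; the shift is clamped to avoid overflow.
    let exp_ms = base_ms.saturating_mul(1u64 << attempt.min(16));
    // Jitter desynchronizes concurrent retries (thundering-herd avoidance).
    Duration::from_millis(exp_ms.min(cap_ms) + jitter_ms % 250)
}

fn main() {
    assert_eq!(backoff_delay(0, 0), Duration::from_millis(500));
    assert_eq!(backoff_delay(3, 0), Duration::from_millis(4_000));
    // Deep retries are capped rather than growing unboundedly.
    assert_eq!(backoff_delay(10, 0), Duration::from_millis(30_000));
}
```

Classification matters as much as the delay: only errors tagged recoverable should enter this loop, while authentication or configuration errors fail fast.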
### Session Management

Automatic session tracking and logging:

- **Session IDs**: Generated based on initial prompts for easy identification
- **Complete Logs**: Full conversation history, token usage, and timing data
- **JSON Format**: Structured logs for easy parsing and analysis
- **Automatic Cleanup**: Organized in `logs/` directory with timestamps
- **Status Tracking**: Records session completion status (completed, cancelled, error)

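One way prompt-derived session IDs can work is to slug the first few words of the initial prompt and append a timestamp for uniqueness. The exact format G3 uses is not specified here, so this is purely a hypothetical sketch:

```rust
// Hypothetical session ID: slug of the first four prompt words + timestamp.
fn session_id(prompt: &str, now_secs: u64) -> String {
    let slug: String = prompt
        .split_whitespace()
        .take(4)
        .collect::<Vec<_>>()
        .join("-")
        .chars()
        // Keep the filename shell- and filesystem-safe.
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-')
        .collect::<String>()
        .to_lowercase();
    format!("{}-{}", slug, now_secs)
}

fn main() {
    assert_eq!(session_id("Fix the bug", 0), "fix-the-bug-0");
}
```

Deriving the ID from the prompt keeps `logs/` human-browsable, while the timestamp suffix prevents collisions between sessions that start from the same request.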
### Autonomous Mode

Advanced autonomous operation with coach-player feedback:

- **Requirements-Driven**: Reads `requirements.md` for project specifications
- **Dual-Agent System**: Separate player (implementation) and coach (review) agents
- **Iterative Improvement**: Multiple rounds of implementation and feedback
- **Progress Tracking**: Detailed reporting of turns, token usage, and final status
- **Workspace Management**: Automatic workspace setup and file organization

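The coach-player loop can be shown in miniature: the player proposes work each turn, the coach reviews it, and the loop ends on approval or when the turn budget runs out. The closures here stand in for full agent runs; the function shape is an illustration, not G3's actual interface:

```rust
// Returns (turns consumed, whether the coach approved).
fn autonomous_loop<P, C>(max_turns: u32, mut player: P, coach: C) -> (u32, bool)
where
    P: FnMut(u32) -> String,   // player: produce an implementation attempt
    C: Fn(&str) -> bool,       // coach: review it against requirements
{
    for turn in 1..=max_turns {
        let work = player(turn);
        if coach(&work) {
            return (turn, true); // coach approved; stop early
        }
    }
    (max_turns, false) // budget exhausted without approval
}

fn main() {
    // Toy run: the coach approves the third attempt.
    let (turns, done) =
        autonomous_loop(10, |t| format!("attempt {}", t), |w| w.ends_with('3'));
    assert_eq!((turns, done), (3, true));
}
```

Bounding the loop with `max_turns` (the `--max-turns` flag shown in the usage examples below) is what keeps an unsatisfiable requirement from looping forever.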
## Provider Comparison

| Feature | Anthropic | Databricks | Embedded |
|---------|-----------|------------|----------|
| **Cost** | Pay per token | Pay per token | Free after download |
| **Privacy** | Data sent to API | Data sent to API | Completely local |
| **Performance** | Very fast | Very fast | Depends on hardware |
| **Model Quality** | Excellent | Excellent | Good (varies by model) |
| **Offline Support** | No | No | Yes |
| **Setup Complexity** | API key only | OAuth or token | Model download required |
| **Context Window** | 200k tokens | Varies by model | 4k-32k tokens |
| **Tool Calling** | Native support | Native support | JSON fallback |
| **Hardware Requirements** | None | None | 4-16GB RAM, optional GPU |

## Configuration Examples

### Cloud-First Setup (Anthropic)

```toml
[providers]
default_provider = "anthropic"

[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
max_tokens = 8192
temperature = 0.1
```

### Enterprise Setup (Databricks)

```toml
[providers]
default_provider = "databricks"

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 32000
temperature = 0.1
use_oauth = true
```

### Privacy-First Setup (Local Models)

```toml
[providers]
default_provider = "embedded"

[providers.embedded]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen"
context_length = 32768
max_tokens = 2048
temperature = 0.1
gpu_layers = 32
threads = 8
```

### Hybrid Setup

```toml
[providers]
default_provider = "embedded"

# Local model for most tasks
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
context_length = 16384
gpu_layers = 32

# Cloud fallback for complex tasks
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
```

## Usage Examples

### Single-Shot Mode

```bash
g3 "implement a fibonacci function in Rust"
```

### Interactive Mode

```bash
g3
g3> read the README and suggest improvements
g3> implement the suggestions you made
```

### Autonomous Mode

```bash
g3 --autonomous --max-turns 10
# Reads requirements.md and implements iteratively
```

### Retro TUI Mode

```bash
g3 --retro --theme dracula
# Full-screen terminal interface
```

## Future Enhancements

### Planned Features

- **Plugin System**: Custom tool and provider plugins
- **Web Interface**: Browser-based UI for remote access
- **Model Quantization**: Optimized local model deployment
- **Multi-Model Ensemble**: Combine multiple models for better results
- **Advanced Sandboxing**: Enhanced security for code execution
- **Collaborative Mode**: Multi-user sessions and shared workspaces

### Technical Improvements

- **Performance Optimization**: Faster streaming and tool execution
- **Memory Management**: Better handling of large contexts and files
- **Caching System**: Intelligent caching of model responses and computations
- **Monitoring**: Built-in metrics and performance monitoring
- **Testing**: Comprehensive test suite and CI/CD integration

## Development Guidelines

### Code Organization

- **Modular Design**: Each crate has a single, well-defined responsibility
- **Trait-Based**: Use traits for abstraction and testability
- **Error Handling**: Comprehensive error types with context
- **Documentation**: Inline docs and examples for all public APIs
- **Testing**: Unit tests, integration tests, and property-based testing

### Performance Considerations

- **Async-First**: All I/O operations are asynchronous
- **Streaming**: Real-time response processing where possible
- **Memory Efficiency**: Careful memory management for large contexts
- **Caching**: Strategic caching of expensive operations
- **Profiling**: Regular performance profiling and optimization

This design document reflects the current state of G3 as a mature, production-ready AI coding agent with a sophisticated architecture and a comprehensive feature set.