# G3 General Purpose AI Agent - Design Document
## Overview
G3 is a **modular, composable AI coding agent** built in Rust that helps you complete tasks by writing and executing code. It provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation, file manipulation, and task automation capabilities.
The agent follows a **tool-first philosophy**: instead of just providing advice, G3 actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
## Core Principles
1. **Tool-First Philosophy**: Solve problems by actively using tools rather than just describing solutions
2. **Modular Architecture**: Clear separation of concerns across multiple Rust crates
3. **Provider Flexibility**: Support multiple LLM providers through a unified interface
4. **Composability**: Components can be combined in different ways
5. **Performance**: Built in Rust for speed and reliability
6. **Context Intelligence**: Smart context window management with auto-summarization
7. **Error Resilience**: Robust error handling with automatic retry logic
## Project Structure
G3 is organized as a Rust workspace with the following crates:
```
g3/
├── src/main.rs # Main entry point
├── crates/
│ ├── g3-cli/ # Command-line interface and TUI
│ ├── g3-core/ # Core agent engine and logic
│ ├── g3-providers/ # LLM provider abstractions
│ ├── g3-config/ # Configuration management
│ └── g3-execution/ # Code execution engine
├── logs/ # Session logs (auto-created)
├── README.md # Project documentation
└── DESIGN.md # This design document
```
## Architecture Overview
### High-Level Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     g3-cli      │    │     g3-core     │    │  g3-providers   │
│                 │    │                 │    │                 │
│ • CLI parsing   │◄──►│ • Agent engine  │◄──►│ • Anthropic     │
│ • Interactive   │    │ • Context mgmt  │    │ • Databricks    │
│ • Retro TUI     │    │ • Tool system   │    │ • Embedded      │
│ • Autonomous    │    │ • Streaming     │    │   (llama.cpp)   │
│   mode          │    │ • Task exec     │    │ • OAuth flow    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                ┌───────────────┴───────────────┐
                │                               │
       ┌─────────────────┐             ┌─────────────────┐
       │  g3-execution   │             │    g3-config    │
       │                 │             │                 │
       │ • Code exec     │             │ • TOML config   │
       │ • Shell cmds    │             │ • Env overrides │
       │ • Streaming     │             │ • Provider      │
       │ • Error hdlg    │             │   settings      │
       └─────────────────┘             └─────────────────┘
```
## Core Components
### 1. g3-core: Agent Engine
**Primary Responsibilities:**
- Main orchestration logic for handling conversations and task execution
- Context window management with intelligent token tracking
- Built-in tool system for file operations and command execution
- Streaming response parsing with real-time tool call detection
- Error handling with automatic retry logic
**Key Features:**
- **Context Window Intelligence**: Automatic monitoring with percentage-based tracking (~80% capacity triggers auto-summarization)
- **Tool System**: Built-in tools for file operations (read, write, edit), shell commands, and structured output
- **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
- **Session Management**: Automatic session logging with detailed conversation history and token usage
- **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors
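As a rough illustration of the streaming parser, the sketch below accumulates streamed chunks and scans for a JSON tool call. The `"tool"` key convention and the `extract_tool_call` helper are illustrative assumptions, not g3-core's actual parser, which works incrementally rather than by buffer-and-rescan.
```rust
// The "tool" key convention and helper are illustrative assumptions;
// the real parser is incremental rather than buffer-and-rescan.
fn extract_tool_call(buffer: &str) -> Option<&str> {
    let start = buffer.find('{')?;
    let end = buffer.rfind('}')?;
    let candidate = &buffer[start..=end];
    candidate.contains("\"tool\"").then_some(candidate)
}

fn main() {
    let mut buffer = String::new();
    // Chunks arrive incrementally from the streaming LLM response.
    for chunk in ["Working on it... ", "{\"tool\": \"shell\", ", "\"command\": \"ls\"}"] {
        buffer.push_str(chunk);
        if let Some(call) = extract_tool_call(&buffer) {
            println!("tool call detected: {call}");
            break;
        }
    }
}
```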
**Available Tools:**
- `shell`: Execute shell commands with streaming output
- `read_file`: Read file contents with optional character range support
- `write_file`: Create or overwrite files with content
- `str_replace`: Apply unified diffs to files with precise editing
- `final_output`: Signal task completion with detailed summaries
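A hedged sketch of how dispatch over these built-in tools might look; the enum and argument shapes are assumptions for illustration, not the crate's real types.
```rust
// The enum and argument shapes are illustrative assumptions,
// not g3-core's real types.
enum ToolCall {
    Shell { command: String },
    ReadFile { path: String },
    WriteFile { path: String, content: String },
    StrReplace { path: String, diff: String },
    FinalOutput { summary: String },
}

fn dispatch(call: ToolCall) -> String {
    match call {
        ToolCall::Shell { command } => format!("would run: {command}"),
        ToolCall::ReadFile { path } => format!("would read: {path}"),
        ToolCall::WriteFile { path, .. } => format!("would write: {path}"),
        ToolCall::StrReplace { path, .. } => format!("would patch: {path}"),
        ToolCall::FinalOutput { summary } => format!("done: {summary}"),
    }
}

fn main() {
    for call in [
        ToolCall::Shell { command: "cargo test".into() },
        ToolCall::ReadFile { path: "README.md".into() },
        ToolCall::FinalOutput { summary: "task complete".into() },
    ] {
        println!("{}", dispatch(call));
    }
}
```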
### 2. g3-providers: LLM Provider Abstraction
**Primary Responsibilities:**
- Unified interface for multiple LLM providers
- Provider-specific optimizations and feature support
- OAuth authentication flows
- Streaming and non-streaming completion support
**Supported Providers:**
- **Anthropic**: Claude models via API with native tool calling support
- **Databricks**: Foundation Model APIs with OAuth and token-based authentication
- **Embedded**: Local models via llama.cpp with GPU acceleration (Metal/CUDA)
- **Provider Registry**: Dynamic provider management and hot-swapping
**Key Features:**
- **Native Tool Calling**: Full support for structured tool calls where available
- **Fallback Parsing**: JSON tool call parsing for providers without native support
- **OAuth Integration**: Built-in OAuth flow for secure provider authentication
- **Context-Aware**: Provider-specific context length and token limit handling
- **Streaming Support**: Real-time response streaming with tool call detection
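As a condensed sketch of the unified interface, the trait below models a provider registry. The real g3-providers trait is async and streaming; all names here are assumptions.
```rust
use std::collections::HashMap;

// All names here are assumptions sketching the unified interface;
// the real trait is async and supports streaming.
pub trait Provider {
    fn name(&self) -> &str;
    fn context_length(&self) -> usize;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

// A stand-in provider used only to exercise the registry.
struct Echo;

impl Provider for Echo {
    fn name(&self) -> &str { "echo" }
    fn context_length(&self) -> usize { 8192 }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

fn main() {
    // Boxed trait objects keyed by name are one way to support the
    // dynamic provider selection and hot-swapping described above.
    let mut registry: HashMap<&str, Box<dyn Provider>> = HashMap::new();
    registry.insert("echo", Box::new(Echo));

    let p = &registry["echo"];
    println!("{} ({} tokens): {:?}", p.name(), p.context_length(), p.complete("hi"));
}
```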
### 3. g3-cli: Command-Line Interface
**Primary Responsibilities:**
- Command-line argument parsing and validation
- Interactive terminal interface with history support
- Retro-style terminal UI (80s sci-fi inspired)
- Autonomous mode with coach-player feedback loops
- Session management and workspace handling
**Execution Modes:**
- **Single-shot**: Execute one task and exit
- **Interactive**: REPL-style conversation with the agent
- **Autonomous**: Coach-player feedback loop for complex projects
- **Retro TUI**: Full-screen terminal interface with real-time updates
**Key Features:**
- **Multi-line Input**: Support for complex, multi-line prompts with backslash continuation
- **Context Progress**: Real-time display of token usage and context window status
- **Error Recovery**: Automatic retry logic for timeout and recoverable errors
- **History Management**: Persistent command history across sessions
- **Theme Support**: Customizable color themes for retro mode
- **Cancellation**: Ctrl+C support for graceful operation cancellation
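A minimal sketch of this command surface using the `clap` derive API; whether g3 actually uses clap, and the default values shown, are assumptions. The flags mirror the execution modes above and the usage examples later in this document.
```rust
use clap::Parser;

// Illustrative CLI surface; flag names mirror the modes described
// above, and the defaults are assumptions.
#[derive(Parser, Debug)]
#[command(name = "g3")]
struct Cli {
    /// Single-shot task; omit it to start interactive mode
    task: Option<String>,

    /// Autonomous coach-player mode (reads requirements.md)
    #[arg(long)]
    autonomous: bool,

    /// Cap on coach-player iterations
    #[arg(long, default_value_t = 10)]
    max_turns: u32,

    /// Full-screen retro TUI
    #[arg(long)]
    retro: bool,

    /// Color theme for retro mode, e.g. "dracula"
    #[arg(long)]
    theme: Option<String>,
}

fn main() {
    let cli = Cli::parse();
    match (&cli.task, cli.autonomous, cli.retro) {
        (Some(task), false, false) => println!("single-shot: {task}"),
        (None, true, _) => println!("autonomous, up to {} turns", cli.max_turns),
        (None, false, true) => println!("retro TUI, theme {:?}", cli.theme),
        (None, false, false) => println!("interactive REPL"),
        _ => eprintln!("conflicting mode flags"),
    }
}
```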
### 4. g3-execution: Code Execution Engine
**Primary Responsibilities:**
- Safe execution of shell commands and scripts
- Streaming output capture and display
- Multi-language code execution support
- Error handling and result formatting
**Supported Languages:**
- **Bash/Shell**: Direct command execution with streaming output
- **Python**: Script execution via temporary files
- **JavaScript**: Node.js-based execution
- **Extensible**: Framework for adding additional language support
**Key Features:**
- **Streaming Output**: Real-time command output display
- **Error Capture**: Comprehensive stderr and stdout handling
- **Exit Code Tracking**: Proper success/failure detection
- **Async Execution**: Non-blocking command execution
- **Output Formatting**: Clean, user-friendly result presentation
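A minimal, synchronous sketch of streaming command execution using `std::process`; the real engine is async, but the shape is the same: spawn, stream stdout line by line, track the exit code.
```rust
// Synchronous sketch; g3-execution's real engine is async.
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

fn run_streaming(cmd: &str) -> std::io::Result<i32> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .stdout(Stdio::piped())
        .stderr(Stdio::inherit()) // pass stderr through; avoids pipe deadlock
        .spawn()?;

    if let Some(out) = child.stdout.take() {
        for line in BufReader::new(out).lines() {
            println!("{}", line?); // stream each line as it arrives
        }
    }

    let status = child.wait()?; // exit code tracking
    Ok(status.code().unwrap_or(-1))
}

fn main() -> std::io::Result<()> {
    let code = run_streaming("echo hello && echo done")?;
    println!("exit code: {code}");
    Ok(())
}
```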
### 5. g3-config: Configuration Management
**Primary Responsibilities:**
- TOML-based configuration file management
- Environment variable overrides
- Provider-specific settings and credentials
- CLI argument integration
**Configuration Hierarchy:**
1. Default configuration (embedded in code)
2. Configuration files (`~/.config/g3/config.toml`, `./g3.toml`)
3. Environment variables (`G3_*`)
4. CLI arguments (highest priority)
**Key Features:**
- **Auto-generation**: Creates default configuration files if none exist
- **Provider Overrides**: Runtime provider and model selection
- **Validation**: Configuration validation with helpful error messages
- **Flexible Paths**: Support for shell expansion (`~`, environment variables)
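The hierarchy can be summarized in a short sketch: each layer overrides the one before it. Field and environment-variable names (`G3_PROVIDER`, `G3_MODEL`) are illustrative assumptions, and the TOML file layer is omitted for brevity.
```rust
use std::env;

// Field and variable names are illustrative assumptions; the TOML
// file layer (layer 2) is omitted for brevity.
#[derive(Debug, Clone)]
struct Config {
    provider: String,
    model: String,
}

impl Config {
    // Layer 1: defaults embedded in code.
    fn defaults() -> Self {
        Self {
            provider: "anthropic".into(),
            model: "claude-3-5-sonnet-20241022".into(),
        }
    }

    // Layer 3: G3_* environment variables override file values.
    fn apply_env(mut self) -> Self {
        if let Ok(p) = env::var("G3_PROVIDER") { self.provider = p; }
        if let Ok(m) = env::var("G3_MODEL") { self.model = m; }
        self
    }

    // Layer 4: CLI arguments have the highest priority.
    fn apply_cli(mut self, provider: Option<String>) -> Self {
        if let Some(p) = provider { self.provider = p; }
        self
    }
}

fn main() {
    let cfg = Config::defaults().apply_env().apply_cli(Some("embedded".into()));
    println!("{cfg:?}");
}
```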
## Advanced Features
### Context Window Management
G3 implements sophisticated context window management:
- **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
- **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
- **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
- **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
- **Cumulative Tracking**: Monitors total token usage across entire sessions
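A minimal sketch of this percentage-based tracking, assuming a simple token counter; the types are illustrative, and the 80% threshold comes from this document.
```rust
// Types here are illustrative; the 80% threshold is from this document.
struct ContextWindow {
    used_tokens: usize,
    max_tokens: usize, // provider-specific: 4k for small local models, 200k+ for Claude
}

impl ContextWindow {
    fn usage_pct(&self) -> f64 {
        self.used_tokens as f64 / self.max_tokens as f64 * 100.0
    }

    /// Auto-summarization triggers near 80% of capacity.
    fn needs_summarization(&self) -> bool {
        self.usage_pct() >= 80.0
    }
}

fn main() {
    let ctx = ContextWindow { used_tokens: 170_000, max_tokens: 200_000 };
    println!("{:.1}% used, summarize: {}", ctx.usage_pct(), ctx.needs_summarization());
}
```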
### Error Handling & Resilience
Comprehensive error handling system:
- **Error Classification**: Distinguishes between recoverable and non-recoverable errors
- **Automatic Retry**: Exponential backoff with jitter for rate limits, timeouts, and server errors
- **Detailed Logging**: Comprehensive error context including stack traces and session data
- **Error Persistence**: Saves detailed error logs to `logs/errors/` for analysis
- **Graceful Degradation**: Continues operation when possible, fails gracefully when not
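A sketch of the retry policy under these assumptions: a small set of recoverable error classes and exponential backoff with jitter, where the delay constants are illustrative.
```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Error classes and delay constants are illustrative assumptions.
enum AgentError { RateLimited, Timeout, ServerError, Fatal }

fn is_recoverable(e: &AgentError) -> bool {
    matches!(e, AgentError::RateLimited | AgentError::Timeout | AgentError::ServerError)
}

// Exponential backoff, capped, with cheap clock-derived jitter so
// concurrent retries don't synchronize (no RNG crate needed).
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms = 500u64 * 2u64.pow(attempt.min(6));
    let jitter_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| u64::from(d.subsec_nanos()) % 250)
        .unwrap_or(0);
    Duration::from_millis(base_ms + jitter_ms)
}

fn main() {
    let err = AgentError::RateLimited;
    if is_recoverable(&err) {
        for attempt in 0..3 {
            println!("retry {attempt} after {:?}", backoff_delay(attempt));
        }
    }
}
```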
### Session Management
Automatic session tracking and logging:
- **Session IDs**: Generated based on initial prompts for easy identification
- **Complete Logs**: Full conversation history, token usage, and timing data
- **JSON Format**: Structured logs for easy parsing and analysis
- **Automatic Cleanup**: Organized in `logs/` directory with timestamps
- **Status Tracking**: Records session completion status (completed, cancelled, error)
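The log records might look roughly like the sketch below; the field names are assumptions based on the description above, not g3's actual schema.
```rust
// Field names are assumptions, not g3's actual log schema.
struct SessionLog {
    session_id: String,    // derived from the initial prompt
    status: SessionStatus, // completed, cancelled, or error
    total_tokens: u64,
    turns: Vec<Turn>,
}

#[derive(Debug)]
enum SessionStatus { Completed, Cancelled, Error }

struct Turn {
    role: String, // "user" | "assistant" | "tool"
    content: String,
}

fn main() {
    let log = SessionLog {
        session_id: "fix-readme-typos".into(),
        status: SessionStatus::Completed,
        total_tokens: 12_345,
        turns: vec![Turn { role: "user".into(), content: "fix the README typos".into() }],
    };
    println!(
        "session {} [{:?}]: {} turn(s), {} tokens",
        log.session_id, log.status, log.turns.len(), log.total_tokens
    );
}
```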
### Autonomous Mode
Advanced autonomous operation with coach-player feedback:
- **Requirements-Driven**: Reads `requirements.md` for project specifications
- **Dual-Agent System**: Separate player (implementation) and coach (review) agents
- **Iterative Improvement**: Multiple rounds of implementation and feedback
- **Progress Tracking**: Detailed reporting of turns, token usage, and final status
- **Workspace Management**: Automatic workspace setup and file organization
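The coach-player loop can be sketched in a few lines; the helper functions here are hypothetical stand-ins for the player (implementation) and coach (review) agents.
```rust
// Hypothetical stand-ins for the player and coach agents.
fn player_implement(requirements: &str, prior: &str) -> String {
    format!("{prior}\n// work toward: {requirements}")
}

// Returns None when the coach approves, or Some(feedback) otherwise.
fn coach_review(work: &str) -> Option<String> {
    (work.lines().count() < 3).then(|| "needs more detail".to_string())
}

fn main() {
    // Normally read from requirements.md in the workspace.
    let requirements = "build a CLI todo app";
    let max_turns = 10;
    let mut work = String::new();

    for turn in 1..=max_turns {
        work = player_implement(requirements, &work);
        match coach_review(&work) {
            None => { println!("turn {turn}: coach approved"); break; }
            Some(feedback) => println!("turn {turn}: {feedback}"),
        }
    }
}
```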
## Provider Comparison
| Feature | Anthropic | Databricks | Embedded |
|---------|-----------|------------|----------|
| **Cost** | Pay per token | Pay per token | Free after download |
| **Privacy** | Data sent to API | Data sent to API | Completely local |
| **Performance** | Very fast | Very fast | Depends on hardware |
| **Model Quality** | Excellent | Excellent | Good (varies by model) |
| **Offline Support** | No | No | Yes |
| **Setup Complexity** | API key only | OAuth or token | Model download required |
| **Context Window** | 200k tokens | Varies by model | 4k-32k tokens |
| **Tool Calling** | Native support | Native support | JSON fallback |
| **Hardware Requirements** | None | None | 4-16GB RAM, optional GPU |
## Configuration Examples
### Cloud-First Setup (Anthropic)
```toml
[providers]
default_provider = "openai"
default_provider = "anthropic"
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
max_tokens = 8192
temperature = 0.1
```
### Enterprise Setup (Databricks)
```toml
[providers]
default_provider = "databricks"
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 32000
temperature = 0.1
use_oauth = true
```
### Privacy-First Setup (Local Models)
```toml
[providers]
default_provider = "embedded"
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen"
context_length = 32768
max_tokens = 2048
temperature = 0.1
gpu_layers = 32
threads = 8
```
### Hybrid Setup
```toml
[providers]
default_provider = "embedded"
# Local model for most tasks
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
context_length = 16384
gpu_layers = 32
# Cloud fallback for complex tasks
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
```
## Usage Examples
### Single-Shot Mode
```bash
g3 "implement a fibonacci function in Rust"
```
### Interactive Mode
```bash
g3
g3> read the README and suggest improvements
g3> implement the suggestions you made
```
### Autonomous Mode
```bash
g3 --autonomous --max-turns 10
# Reads requirements.md and implements iteratively
```
### Retro TUI Mode
```bash
g3 --retro --theme dracula
# Full-screen terminal interface
```
## Future Enhancements
### Planned Features
- **Plugin System**: Custom tool and provider plugins
- **Web Interface**: Browser-based UI for remote access
- **Model Quantization**: Optimized local model deployment
- **Multi-Model Ensemble**: Combine multiple models for better results
- **Advanced Sandboxing**: Enhanced security for code execution
- **Collaborative Mode**: Multi-user sessions and shared workspaces
### Technical Improvements
- **Performance Optimization**: Faster streaming and tool execution
- **Memory Management**: Better handling of large contexts and files
- **Caching System**: Intelligent caching of model responses and computations
- **Monitoring**: Built-in metrics and performance monitoring
- **Testing**: Comprehensive test suite and CI/CD integration
## Development Guidelines
### Code Organization
- **Modular Design**: Each crate has a single, well-defined responsibility
- **Trait-Based**: Use traits for abstraction and testability
- **Error Handling**: Comprehensive error types with context
- **Documentation**: Inline docs and examples for all public APIs
- **Testing**: Unit tests, integration tests, and property-based testing
### Performance Considerations
- **Async-First**: All I/O operations are asynchronous
- **Streaming**: Real-time response processing where possible
- **Memory Efficiency**: Careful memory management for large contexts
- **Caching**: Strategic caching of expensive operations
- **Profiling**: Regular performance profiling and optimization
This design document reflects the current state of G3 as a mature, production-ready AI coding agent with a sophisticated architecture and comprehensive feature set.