# G3 General Purpose AI Agent - Design Document
## Overview
G3 is a **code-first AI agent** that helps you complete tasks by writing and executing code or scripts. Instead of just giving advice, G3 solves problems by generating executable code in the appropriate language.
## Core Principles
1. **Code-First Philosophy**: Always try to solve problems with executable code
2. **Multi-Language Support**: Generate scripts in Python, Bash, JavaScript, Rust, etc.
3. **Unix Philosophy**: Small, focused tools that do one thing well
4. **Modularity**: Clear separation of concerns
5. **Composability**: Components can be combined in different ways
6. **Performance**: Blazing fast execution
## Architecture
### High-Level Components
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Module    │    │   Core Engine   │    │  LLM Providers  │
│                 │    │                 │    │                 │
│ - Task commands │◄──►│ - Task          │◄──►│ - OpenAI        │
│ - Interactive   │    │   interpretation│    │ - Anthropic     │
│   mode          │    │ - Code          │    │ - Embedded      │
│ - Code exec     │    │   generation    │    │   (llama.cpp)   │
│   approval      │    │ - Script        │    │ - Custom APIs   │
│                 │    │   execution     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                       ┌─────────────────┐
                       │    Execution    │
                       │     Engine      │
                       │                 │
                       │ - Python        │
                       │ - Bash/Shell    │
                       │ - JavaScript    │
                       │ - Rust          │
                       │ - Sandboxing    │
                       └─────────────────┘
```
### Module Breakdown
#### 1. CLI Module (`g3-cli`)
- **Responsibility**: User interface and command handling
- **New Features**:
  - Progress indicators for script execution
#### 2. Core Engine (`g3-core`)
- **Responsibility**: Task interpretation and code generation
- **New Features**:
  - Task analysis and decomposition
  - Language selection based on task type
  - Code generation with execution context
  - Script template system
  - Autonomous execution of generated code
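The script template system can be sketched as follows. This is a minimal illustration, not the real `g3-core` API: `ScriptLang` and `render_script` are hypothetical names, and a real implementation would fill the body from LLM output rather than a literal string.

```rust
// Hypothetical sketch: wrap LLM-generated code in a runnable script
// template for the chosen language, recording the task as a comment.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ScriptLang {
    Python,
    Bash,
}

/// Wrap a generated code body in a minimal executable template.
fn render_script(lang: ScriptLang, task: &str, body: &str) -> String {
    match lang {
        ScriptLang::Python => {
            format!("#!/usr/bin/env python3\n# task: {task}\n{body}\n")
        }
        ScriptLang::Bash => {
            format!("#!/usr/bin/env bash\nset -euo pipefail\n# task: {task}\n{body}\n")
        }
    }
}

fn main() {
    let script = render_script(ScriptLang::Python, "count lines in a file", "print(42)");
    print!("{script}");
}
```

Keeping templates as plain string assembly keeps the core engine independent of any one interpreter; the execution engine decides how the rendered script is actually run.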
#### 3. LLM Providers (`g3-providers`)
- **Responsibility**: LLM communication and model abstraction
- **Supported Providers**:
  - **OpenAI**: GPT-4, GPT-3.5-turbo via API
  - **Anthropic**: Claude models via API
  - **Embedded**: Local open-weights models via llama.cpp
- **Enhanced Prompts**:
  - Code-first system prompts
  - Language-specific generation instructions
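The model abstraction could look like a single trait that all backends implement, so OpenAI, Anthropic, and the embedded backend are interchangeable at the call site. The trait and type names below are illustrative, not the actual `g3-providers` interface, and the embedded implementation is a stub that echoes instead of running inference.

```rust
// Sketch of a provider abstraction: one trait, many backends.

trait LlmProvider {
    fn name(&self) -> &str;
    fn complete(&self, system_prompt: &str, user_prompt: &str) -> Result<String, String>;
}

/// Stand-in for the llama.cpp-backed embedded provider; a real
/// implementation would run local inference over a GGUF model.
struct EmbeddedProvider {
    model_path: String,
}

impl LlmProvider for EmbeddedProvider {
    fn name(&self) -> &str {
        "embedded"
    }
    fn complete(&self, system_prompt: &str, user_prompt: &str) -> Result<String, String> {
        Ok(format!("[{}] {} :: {}", self.model_path, system_prompt, user_prompt))
    }
}

fn main() {
    // Callers hold a trait object and never know which backend is active.
    let provider: Box<dyn LlmProvider> = Box::new(EmbeddedProvider {
        model_path: "model.gguf".into(),
    });
    println!("{}", provider.complete("code-first", "list files").unwrap());
}
```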
#### 4. Embedded Provider (`g3-core/providers/embedded`) - NEW
- **Responsibility**: Local model inference using llama.cpp
- **Features**:
  - GGUF model support (Llama, CodeLlama, Mistral, etc.)
  - GPU acceleration via CUDA/Metal
  - Configurable context length and generation parameters
  - Async-compatible inference without blocking
  - Thread-safe model access
  - Stop sequence detection
#### 5. Execution Engine (`g3-execution`) - NEW
- **Responsibility**: Safe code execution
- **Features**:
  - Multi-language script execution
  - Sandboxing and security
  - Resource limits
  - Output capture and formatting
  - Error handling and recovery
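The core of the execution engine reduces to dispatching a script to the right interpreter and capturing its output, which can be sketched with the standard library alone. Sandboxing and resource limits are omitted here; the function name and the `ExecResult` type are illustrative, not the real `g3-execution` API.

```rust
use std::process::Command;

/// Captured result of one script run.
struct ExecResult {
    status: i32,
    stdout: String,
    stderr: String,
}

/// Dispatch a script body to an interpreter and capture its output.
/// A real engine would add sandboxing, timeouts, and resource limits.
fn run_script(language: &str, script: &str) -> std::io::Result<ExecResult> {
    let (cmd, flag) = match language {
        "python" => ("python3", "-c"),
        _ => ("sh", "-c"), // default to a POSIX shell
    };
    let out = Command::new(cmd).arg(flag).arg(script).output()?;
    Ok(ExecResult {
        status: out.status.code().unwrap_or(-1),
        stdout: String::from_utf8_lossy(&out.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
    })
}

fn main() {
    let r = run_script("shell", "echo hello").unwrap();
    print!("{}", r.stdout);
}
```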
### Task Types and Language Selection
| Task Type | Preferred Language | Use Cases |
|-----------|-------------------|-----------|
| Data Processing | Python | CSV/JSON analysis, data transformation |
| File Operations | Bash/Shell | File manipulation, backups, organization |
| System Admin | Bash/Shell | Process management, system monitoring |
| Text Processing | Python/Bash | Log analysis, text transformation |
| Database | Python/SQL | Data migration, queries, reporting |
| Image/Media | Python | Image processing, format conversion |
| Development | Rust | Code generation, project setup |
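The table above can be encoded as a simple lookup. The function below is an illustrative sketch; the default branch for unrecognized task types is an assumption, not something the table specifies.

```rust
/// Map a task type (as named in the design table) to its preferred language.
fn preferred_language(task_type: &str) -> &'static str {
    match task_type {
        "data processing" | "image/media" => "python",
        "file operations" | "system admin" => "bash",
        "text processing" => "python or bash",
        "database" => "python + sql",
        "development" => "rust",
        _ => "python", // assumed default; not specified by the table
    }
}

fn main() {
    println!("{}", preferred_language("file operations"));
}
```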
## Implementation Plan
### Phase 1: Core Refactoring ✅
1. ✅ Update CLI commands for task-oriented interface
2. ✅ Enhance system prompts for code-first approach
3. ✅ Add basic code execution capabilities
4. ✅ Update interactive mode messaging
### Phase 2: Enhanced Provider Support ✅
1. ✅ Implement embedded model provider using llama.cpp
2. ✅ Add GGUF model support for local inference
3. ✅ Configure GPU acceleration and performance optimization
4. ✅ Add comprehensive logging and debugging support
### Phase 3: Advanced Features (Future)
1. Model quantization and optimization
2. Multi-model ensemble support
3. Advanced code execution sandboxing
4. Plugin system for custom providers
5. Web interface for remote access
## Provider Comparison
| Feature | OpenAI | Anthropic | Embedded |
|---------|--------|-----------|----------|
| **Cost** | Pay per token | Pay per token | Free after download |
| **Privacy** | Data sent to API | Data sent to API | Completely local |
| **Performance** | Very fast | Very fast | Depends on hardware |
| **Model Quality** | Excellent | Excellent | Good (varies by model) |
| **Offline Support** | No | No | Yes |
| **Setup Complexity** | API key only | API key only | Model download required |
| **Hardware Requirements** | None | None | 4-16GB RAM, optional GPU |
## Configuration Examples
### Cloud-First Setup
```toml
[providers]
default_provider = "openai"
[providers.openai]
api_key = "sk-..."
model = "gpt-4"
```
### Privacy-First Setup
```toml
[providers]
default_provider = "embedded"
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
gpu_layers = 32
```
### Hybrid Setup
```toml
[providers]
default_provider = "embedded"
# Use embedded for most tasks
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
gpu_layers = 32
# Fallback to cloud for complex tasks
[providers.openai]
api_key = "sk-..."
model = "gpt-4"
```
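The hybrid setup's fallback rule could be implemented as a small selection step before dispatch. The sketch below assumes a 0–10 task-complexity score and a cutoff of 7; both the score and the threshold are illustrative, not part of the configuration format above.

```rust
/// Prefer the embedded provider, falling back to the cloud provider
/// when it is unavailable or the task looks too complex for the local model.
fn pick_provider(task_complexity: u8, embedded_available: bool) -> &'static str {
    const LOCAL_LIMIT: u8 = 7; // assumed cutoff on a 0-10 scale
    if embedded_available && task_complexity <= LOCAL_LIMIT {
        "embedded"
    } else {
        "openai"
    }
}

fn main() {
    println!("{}", pick_provider(3, true));
}
```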