# G3 General Purpose AI Agent - Design Document

## Overview
G3 is a code-first AI agent that completes tasks by writing and executing code. Instead of just giving advice, G3 solves problems by generating executable scripts in the language best suited to each task.
## Core Principles
- Code-First Philosophy: Always try to solve problems with executable code
- Multi-Language Support: Generate scripts in Python, Bash, JavaScript, Rust, etc.
- Unix Philosophy: Small, focused tools that do one thing well
- Modularity: Clear separation of concerns
- Composability: Components can be combined in different ways
- Performance: Fast startup and low-overhead execution
## Architecture

### High-Level Components
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Module    │    │   Core Engine   │    │  LLM Providers  │
│                 │    │                 │    │                 │
│ - Task commands │◄──►│ - Task          │◄──►│ - OpenAI        │
│ - Interactive   │    │   interpretation│    │ - Anthropic     │
│   mode          │    │ - Code          │    │ - Embedded      │
│ - Code exec     │    │   generation    │    │   (llama.cpp)   │
│   approval      │    │ - Script        │    │ - Custom APIs   │
│                 │    │   execution     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                       ┌─────────────────┐
                       │   Execution     │
                       │   Engine        │
                       │                 │
                       │ - Python        │
                       │ - Bash/Shell    │
                       │ - JavaScript    │
                       │ - Rust          │
                       │ - Sandboxing    │
                       └─────────────────┘
```
### Module Breakdown

#### 1. CLI Module (g3-cli)
- Responsibility: User interface and command handling
- New Features:
- Progress indicators for script execution
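Where a script run can take a while, the CLI can surround it with a live spinner. A minimal sketch, assuming the `indicatif` crate (a plausible choice, not confirmed by this design):

```rust
// Hypothetical progress indicator for script execution.
// Assumes indicatif 0.17; the real g3-cli may use something else entirely.
use std::time::Duration;
use indicatif::ProgressBar;

fn run_with_spinner<F>(script_name: &str, run: F) -> std::io::Result<String>
where
    F: FnOnce() -> std::io::Result<String>,
{
    let spinner = ProgressBar::new_spinner();
    spinner.set_message(format!("Executing {script_name}..."));
    spinner.enable_steady_tick(Duration::from_millis(100)); // keep animating while we block
    let result = run();
    spinner.finish_with_message("Done.");
    result
}
```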
#### 2. Core Engine (g3-core)
- Responsibility: Task interpretation and code generation
- New Features:
- Task analysis and decomposition
- Language selection based on task type
- Code generation with execution context
- Script template system
- Autonomous execution of generated code
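To make the flow concrete, here is a minimal sketch of how these pieces could compose. Every name (`Language`, `GeneratedScript`, `build_prompt`, `generate_script`) is illustrative rather than the actual g3-core API; the `LlmProvider` trait is sketched under the providers module below.

```rust
// Hypothetical g3-core pipeline sketch; names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub enum Language { Python, Bash, JavaScript, Rust }

pub struct GeneratedScript {
    pub language: Language,
    pub source: String,
}

/// Stand-in for the script template system: a code-first, language-specific prompt.
fn build_prompt(task: &str, language: Language) -> String {
    format!(
        "You are a code-first agent. Reply with one runnable {language:?} script \
         and nothing else.\n\nTask: {task}"
    )
}

/// Interpret a task and ask the provider for executable code.
pub async fn generate_script(
    provider: &dyn LlmProvider,
    task: &str,
    language: Language, // chosen by the selection logic shown later
) -> anyhow::Result<GeneratedScript> {
    let prompt = build_prompt(task, language);
    let source = provider.complete(&prompt).await?;
    Ok(GeneratedScript { language, source })
}
```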
#### 3. LLM Providers (g3-providers)
- Responsibility: LLM communication and model abstraction
- Supported Providers:
- OpenAI: GPT-4, GPT-3.5-turbo via API
- Anthropic: Claude models via API
- Embedded: Local open-weights models via llama.cpp
- Enhanced Prompts:
- Code-first system prompts
- Language-specific generation instructions
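One plausible shape for this abstraction is a single async trait that every backend implements. This is a hypothetical sketch (using the `async_trait` crate so the trait stays object-safe), not the actual g3-providers interface:

```rust
// Hypothetical provider abstraction; not the actual g3-providers API.
use async_trait::async_trait;

#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Stable provider name: "openai", "anthropic", "embedded", ...
    fn name(&self) -> &str;

    /// Send a prompt (already wrapped in the code-first system prompt)
    /// and return the model's completion.
    async fn complete(&self, prompt: &str) -> anyhow::Result<String>;
}
```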
#### 4. Embedded Provider (g3-core/providers/embedded) - NEW
- Responsibility: Local model inference using llama.cpp
- Features:
- GGUF model support (Llama, CodeLlama, Mistral, etc.)
- GPU acceleration via CUDA/Metal
- Configurable context length and generation parameters
- Async-compatible inference without blocking
- Thread-safe model access
- Stop sequence detection
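"Async-compatible inference without blocking" typically means pushing the synchronous, compute-bound llama.cpp call onto a blocking thread. A sketch under that assumption; `LlamaModel` is a stub standing in for the real binding:

```rust
// Hypothetical embedded-provider sketch. `LlamaModel` is a stub standing in
// for a real llama.cpp binding; only the threading pattern is the point.
use std::sync::{Arc, Mutex};

pub struct LlamaModel; // stub
impl LlamaModel {
    pub fn generate(&self, prompt: &str, stop: &[String]) -> anyhow::Result<String> {
        let _ = (prompt, stop); // a real binding would run llama.cpp inference here
        Ok(String::from("<completion>"))
    }
}

pub struct EmbeddedProvider {
    model: Arc<Mutex<LlamaModel>>, // thread-safe access to the single loaded model
}

impl EmbeddedProvider {
    pub async fn complete(&self, prompt: String, stop: Vec<String>) -> anyhow::Result<String> {
        let model = Arc::clone(&self.model);
        // Run the synchronous, compute-bound inference on a blocking thread so
        // the async runtime keeps serving other tasks.
        tokio::task::spawn_blocking(move || {
            let model = model.lock().expect("model mutex poisoned");
            model.generate(&prompt, &stop)
        })
        .await?
    }
}
```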
#### 5. Execution Engine (g3-execution) - NEW
- Responsibility: Safe code execution
- Features:
- Multi-language script execution
- Sandboxing and security
- Resource limits
- Output capture and formatting
- Error handling and recovery
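As an illustration of the execution path, the sketch below writes a generated script to a temporary file, runs the matching interpreter under a wall-clock timeout, and captures stdout/stderr. The `tokio` and `tempfile` usage and the 30-second limit are assumptions; real sandboxing needs far stronger isolation:

```rust
// Hypothetical execution sketch; real sandboxing (namespaces, seccomp,
// containers) requires much stronger isolation than a timeout.
use std::time::Duration;
use tokio::{process::Command, time::timeout};

pub async fn run_script(language: &str, source: &str) -> anyhow::Result<(String, String)> {
    let interpreter = match language {
        "python" => "python3",
        "bash" => "bash",
        "javascript" => "node",
        other => anyhow::bail!("unsupported language: {other}"),
    };
    // Write the generated code somewhere the interpreter can read it.
    let file = tempfile::NamedTempFile::new()?;
    std::fs::write(file.path(), source)?;

    // Wall-clock limit as a crude resource bound.
    let output = timeout(
        Duration::from_secs(30),
        Command::new(interpreter).arg(file.path()).output(),
    )
    .await
    .map_err(|_| anyhow::anyhow!("script timed out after 30s"))??;

    Ok((
        String::from_utf8_lossy(&output.stdout).into_owned(),
        String::from_utf8_lossy(&output.stderr).into_owned(),
    ))
}
```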
## Task Types and Language Selection
| Task Type | Preferred Language | Use Cases |
|---|---|---|
| Data Processing | Python | CSV/JSON analysis, data transformation |
| File Operations | Bash/Shell | File manipulation, backups, organization |
| System Admin | Bash/Shell | Process management, system monitoring |
| Text Processing | Python/Bash | Log analysis, text transformation |
| Database | Python/SQL | Data migration, queries, reporting |
| Image/Media | Python | Image processing, format conversion |
| Development | Rust | Code generation, project setup |
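In code, this table could collapse to a simple mapping from a classified task type to a preferred language. A hypothetical sketch (the classifier producing `TaskType` is elided, and these names are not the actual g3 API):

```rust
// Hypothetical mapping of the table above.
#[derive(Debug, Clone, Copy)]
pub enum TaskType {
    DataProcessing,
    FileOperations,
    SystemAdmin,
    TextProcessing,
    Database,
    ImageMedia,
    Development,
}

pub fn preferred_language(task: TaskType) -> &'static str {
    match task {
        TaskType::DataProcessing | TaskType::Database | TaskType::ImageMedia => "python",
        TaskType::FileOperations | TaskType::SystemAdmin => "bash",
        TaskType::TextProcessing => "python", // or "bash" for simple stream edits
        TaskType::Development => "rust",
    }
}
```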
## Implementation Plan

### Phase 1: Core Refactoring ✅
- ✅ Update CLI commands for task-oriented interface
- ✅ Enhance system prompts for code-first approach
- ✅ Add basic code execution capabilities
- ✅ Update interactive mode messaging
### Phase 2: Enhanced Provider Support ✅
- ✅ Implement embedded model provider using llama.cpp
- ✅ Add GGUF model support for local inference
- ✅ Configure GPU acceleration and performance optimization
- ✅ Add comprehensive logging and debugging support
### Phase 3: Advanced Features (Future)
- Model quantization and optimization
- Multi-model ensemble support
- Advanced code execution sandboxing
- Plugin system for custom providers
- Web interface for remote access
## Provider Comparison
| Feature | OpenAI | Anthropic | Embedded |
|---|---|---|---|
| Cost | Pay per token | Pay per token | Free after download |
| Privacy | Data sent to API | Data sent to API | Completely local |
| Performance | Very fast | Very fast | Depends on hardware |
| Model Quality | Excellent | Excellent | Good (varies by model) |
| Offline Support | No | No | Yes |
| Setup Complexity | API key only | API key only | Model download required |
| Hardware Requirements | None | None | 4-16 GB RAM, optional GPU |
## Configuration Examples

### Cloud-First Setup

```toml
[providers]
default_provider = "openai"

[providers.openai]
api_key = "sk-..."
model = "gpt-4"
```
### Privacy-First Setup

```toml
[providers]
default_provider = "embedded"

[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
gpu_layers = 32
```
### Hybrid Setup

```toml
[providers]
default_provider = "embedded"

# Use embedded for most tasks
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
gpu_layers = 32

# Fall back to the cloud for complex tasks
[providers.openai]
api_key = "sk-..."
model = "gpt-4"
```
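This hybrid configuration implies routing logic that prefers the local model and falls back to the cloud on failure. A hedged sketch reusing the hypothetical `LlmProvider` trait from earlier (not the actual g3 routing code):

```rust
// Hypothetical fallback routing for the hybrid setup.
pub async fn complete_with_fallback(
    embedded: &dyn LlmProvider,
    cloud: &dyn LlmProvider,
    prompt: &str,
) -> anyhow::Result<String> {
    match embedded.complete(prompt).await {
        Ok(text) => Ok(text),
        Err(err) => {
            // Local inference failed (model missing, OOM, ...): retry in the cloud.
            eprintln!("embedded provider failed ({err}); retrying via {}", cloud.name());
            cloud.complete(prompt).await
        }
    }
}
```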