G3 General Purpose AI Agent - Design Document
Overview
G3 is a modular, composable AI coding agent built in Rust that helps you complete tasks by writing and executing code. It provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation, file manipulation, and task automation capabilities.
The agent follows a tool-first philosophy: instead of just providing advice, G3 actively uses tools to read files, write code, execute commands, and complete tasks autonomously.
Core Principles
- Tool-First Philosophy: Solve problems by actively using tools rather than just describing solutions
- Modular Architecture: Clear separation of concerns across multiple Rust crates
- Provider Flexibility: Support multiple LLM providers through a unified interface
- Composability: Components can be combined in different ways
- Performance: Built in Rust for speed and reliability
- Context Intelligence: Smart context window management with auto-summarization
- Error Resilience: Robust error handling with automatic retry logic
Project Structure
G3 is organized as a Rust workspace with the following crates:
g3/
├── src/main.rs # Main entry point
├── crates/
│ ├── g3-cli/ # Command-line interface and TUI
│ ├── g3-core/ # Core agent engine and logic
│ ├── g3-providers/ # LLM provider abstractions
│ ├── g3-config/ # Configuration management
│ └── g3-execution/ # Code execution engine
├── logs/ # Session logs (auto-created)
├── README.md # Project documentation
└── DESIGN.md # This design document
Architecture Overview
High-Level Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ g3-cli │ │ g3-core │ │ g3-providers │
│ │ │ │ │ │
│ • CLI parsing │◄──►│ • Agent engine │◄──►│ • Anthropic │
│ • Interactive │ │ • Context mgmt │ │ • Databricks │
│ • Retro TUI │ │ • Tool system │ │ • Embedded │
│ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
│ mode │ │ • Task exec │ │ • OAuth flow │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌─────────────────┐ ┌─────────────────┐
│ g3-execution │ │ g3-config │
│ │ │ │
│ • Code exec │ │ • TOML config │
│ • Shell cmds │ │ • Env overrides │
│ • Streaming │ │ • Provider │
│ • Error hdlg │ │ settings │
└─────────────────┘ └─────────────────┘
Core Components
1. g3-core: Agent Engine
Primary Responsibilities:
- Main orchestration logic for handling conversations and task execution
- Context window management with intelligent token tracking
- Built-in tool system for file operations and command execution
- Streaming response parsing with real-time tool call detection
- Error handling with automatic retry logic
Key Features:
- Context Window Intelligence: Automatic monitoring with percentage-based tracking (~80% capacity triggers auto-summarization)
- Tool System: Built-in tools for file operations (read, write, edit), shell commands, and structured output
- Streaming Parser: Real-time parsing of LLM responses with tool call detection and execution
- Session Management: Automatic session logging with detailed conversation history and token usage
- Error Recovery: Sophisticated error classification and retry logic for recoverable errors
Available Tools:
- shell: Execute shell commands with streaming output
- read_file: Read file contents with optional character-range support
- write_file: Create or overwrite files with content
- str_replace: Apply unified diffs to files for precise editing
- final_output: Signal task completion with a detailed summary
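A minimal sketch of how such a tool set might be represented and dispatched is shown below; the enum and function names are illustrative assumptions, not G3's internals, which add streaming output, diff application, and error reporting on top of this shape.

```rust
// Hypothetical sketch of tool-call dispatch; names are illustrative,
// not G3's actual API.

/// The built-in tools a model response can invoke.
enum ToolCall {
    Shell { command: String },
    ReadFile { path: String },
    WriteFile { path: String, content: String },
    StrReplace { path: String, diff: String },
    FinalOutput { summary: String },
}

/// Route a parsed tool call to a short action description.
fn dispatch(call: &ToolCall) -> String {
    match call {
        ToolCall::Shell { command } => format!("exec: {command}"),
        ToolCall::ReadFile { path } => format!("read: {path}"),
        ToolCall::WriteFile { path, .. } => format!("write: {path}"),
        ToolCall::StrReplace { path, .. } => format!("edit: {path}"),
        ToolCall::FinalOutput { summary } => format!("done: {summary}"),
    }
}

fn main() {
    let call = ToolCall::Shell { command: "ls".into() };
    println!("{}", dispatch(&call));
}
```

Modeling tool calls as an enum lets the streaming parser hand each parsed call to a single exhaustive `match`, so adding a tool is a compile-checked change.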
2. g3-providers: LLM Provider Abstraction
Primary Responsibilities:
- Unified interface for multiple LLM providers
- Provider-specific optimizations and feature support
- OAuth authentication flows
- Streaming and non-streaming completion support
Supported Providers:
- Anthropic: Claude models via API with native tool calling support
- Databricks: Foundation Model APIs with OAuth and token-based authentication
- Embedded: Local models via llama.cpp with GPU acceleration (Metal/CUDA)
- Provider Registry: Dynamic provider management and hot-swapping
Key Features:
- Native Tool Calling: Full support for structured tool calls where available
- Fallback Parsing: JSON tool call parsing for providers without native support
- OAuth Integration: Built-in OAuth flow for secure provider authentication
- Context-Aware: Provider-specific context length and token limit handling
- Streaming Support: Real-time response streaming with tool call detection
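The unified provider interface can be pictured as a trait along these lines; the trait, method names, and the stub backend are assumptions for illustration, not G3's actual `g3-providers` API.

```rust
// Illustrative sketch of a provider abstraction; names are hypothetical,
// not G3's actual API.

/// A chat message sent to or received from a model.
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Unified interface every LLM backend implements.
pub trait Provider {
    /// Provider name (e.g. "anthropic", "databricks", "embedded").
    fn name(&self) -> &str;
    /// Maximum context window in tokens for the active model.
    fn context_length(&self) -> usize;
    /// Whether the backend supports native structured tool calls.
    fn supports_native_tools(&self) -> bool;
    /// Run a (non-streaming) completion over the conversation so far.
    fn complete(&self, messages: &[Message]) -> Result<String, String>;
}

/// A stub backend used here only to exercise the trait.
pub struct EchoProvider;

impl Provider for EchoProvider {
    fn name(&self) -> &str { "echo" }
    fn context_length(&self) -> usize { 4096 }
    fn supports_native_tools(&self) -> bool { false }
    fn complete(&self, messages: &[Message]) -> Result<String, String> {
        messages
            .last()
            .map(|m| format!("echo: {}", m.content))
            .ok_or_else(|| "empty conversation".to_string())
    }
}

fn main() {
    let provider = EchoProvider;
    let msgs = vec![Message { role: "user".into(), content: "hi".into() }];
    println!("{}", provider.complete(&msgs).unwrap());
}
```

Exposing `supports_native_tools` on the trait is what lets the core engine decide between native tool calls and the JSON fallback parser per provider.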
3. g3-cli: Command-Line Interface
Primary Responsibilities:
- Command-line argument parsing and validation
- Interactive terminal interface with history support
- Retro-style terminal UI (80s sci-fi inspired)
- Autonomous mode with coach-player feedback loops
- Session management and workspace handling
Execution Modes:
- Single-shot: Execute one task and exit
- Interactive: REPL-style conversation with the agent
- Autonomous: Coach-player feedback loop for complex projects
- Retro TUI: Full-screen terminal interface with real-time updates
Key Features:
- Multi-line Input: Support for complex, multi-line prompts with backslash continuation
- Context Progress: Real-time display of token usage and context window status
- Error Recovery: Automatic retry logic for timeout and recoverable errors
- History Management: Persistent command history across sessions
- Theme Support: Customizable color themes for retro mode
- Cancellation: Ctrl+C support for graceful operation cancellation
4. g3-execution: Code Execution Engine
Primary Responsibilities:
- Safe execution of shell commands and scripts
- Streaming output capture and display
- Multi-language code execution support
- Error handling and result formatting
Supported Languages:
- Bash/Shell: Direct command execution with streaming output
- Python: Script execution via temporary files
- JavaScript: Node.js-based execution
- Extensible: Framework for adding additional language support
Key Features:
- Streaming Output: Real-time command output display
- Error Capture: Comprehensive stderr and stdout handling
- Exit Code Tracking: Proper success/failure detection
- Async Execution: Non-blocking command execution
- Output Formatting: Clean, user-friendly result presentation
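The exit-code tracking described above can be sketched with `std::process::Command`; this blocking, Unix-only (`sh -c`) version is a simplification of G3's real engine, which streams output asynchronously as it arrives.

```rust
// Minimal sketch of shell execution with exit-code tracking, using
// std::process. Assumes a Unix-like system with `sh` available.
use std::process::Command;

/// Run a shell command and return (exit_code, stdout, stderr).
fn run_shell(cmd: &str) -> std::io::Result<(i32, String, String)> {
    let out = Command::new("sh").arg("-c").arg(cmd).output()?;
    Ok((
        // A None exit code means the process was killed by a signal.
        out.status.code().unwrap_or(-1),
        String::from_utf8_lossy(&out.stdout).into_owned(),
        String::from_utf8_lossy(&out.stderr).into_owned(),
    ))
}

fn main() {
    let (code, stdout, _stderr) = run_shell("echo hello").unwrap();
    println!("exit={code} out={stdout}");
}
```

Capturing stdout and stderr separately, as here, is what enables the comprehensive error capture and clean result presentation listed above.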
5. g3-config: Configuration Management
Primary Responsibilities:
- TOML-based configuration file management
- Environment variable overrides
- Provider-specific settings and credentials
- CLI argument integration
Configuration Hierarchy:
- Default configuration (embedded in code)
- Configuration files (~/.config/g3/config.toml, ./g3.toml)
- Environment variables (G3_*)
- CLI arguments (highest priority)
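The precedence chain reduces to "first set value wins, highest priority first", which can be sketched as a small generic helper (the function name and example values are illustrative):

```rust
// Sketch of the configuration precedence chain:
// CLI > environment variable > config file > built-in default.

/// Pick the highest-priority value that is actually set.
fn resolve<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}

fn main() {
    // The config file sets a model, an env var overrides it,
    // and no CLI flag was given.
    let model = resolve(
        None,                             // --model not passed
        Some("model-from-env".to_string()),  // hypothetical G3_MODEL
        Some("model-from-file".to_string()), // config.toml value
        "default-model".to_string(),
    );
    println!("{model}");
}
```

Because `Option::or` short-circuits on the first `Some`, each layer only applies when every higher-priority layer is unset.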
Key Features:
- Auto-generation: Creates default configuration files if none exist
- Provider Overrides: Runtime provider and model selection
- Validation: Configuration validation with helpful error messages
- Flexible Paths: Support for shell expansion (~, environment variables)
Advanced Features
Context Window Management
G3 implements sophisticated context window management:
- Automatic Monitoring: Tracks token usage with percentage-based thresholds
- Smart Summarization: Auto-triggers at 80% capacity to prevent context overflow
- Conversation Preservation: Maintains conversation continuity through intelligent summaries
- Provider-Specific Limits: Adapts to different model context windows (4k to 200k+ tokens)
- Cumulative Tracking: Monitors total token usage across entire sessions
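The 80% auto-summarization trigger amounts to a simple ratio check against the active model's context length; the sketch below uses assumed names and a hard-coded threshold for illustration.

```rust
// Sketch of the percentage-based summarization trigger described above.
// Threshold and function names are illustrative assumptions.

const SUMMARIZE_THRESHOLD: f64 = 0.80;

/// Fraction of the context window currently consumed.
fn context_usage(used_tokens: usize, context_length: usize) -> f64 {
    used_tokens as f64 / context_length as f64
}

/// True when the conversation should be summarized to free space.
fn should_summarize(used_tokens: usize, context_length: usize) -> bool {
    context_usage(used_tokens, context_length) >= SUMMARIZE_THRESHOLD
}

fn main() {
    // A 200k-token model with 170k tokens used is past the threshold;
    // a 32k local model hits the same trigger at ~25.6k tokens.
    println!("{}", should_summarize(170_000, 200_000));
}
```

Keying the check on a percentage rather than an absolute count is what lets the same logic adapt to context windows anywhere from 4k to 200k+ tokens.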
Error Handling & Resilience
Comprehensive error handling system:
- Error Classification: Distinguishes between recoverable and non-recoverable errors
- Automatic Retry: Exponential backoff with jitter for rate limits, timeouts, and server errors
- Detailed Logging: Comprehensive error context including stack traces and session data
- Error Persistence: Saves detailed error logs to logs/errors/ for analysis
- Graceful Degradation: Continues operation when possible, fails gracefully when not
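The retry schedule can be sketched as a pure delay function: exponential growth from a base delay, a hard cap, and a jitter term supplied by the caller. The base, cap, and signature here are illustrative assumptions, not G3's actual values.

```rust
// Sketch of exponential backoff with jitter for recoverable errors
// (rate limits, timeouts, server errors). Parameters are illustrative.
use std::time::Duration;

/// Delay before retry `attempt` (0-based): base * 2^attempt, capped,
/// plus a caller-supplied jitter fraction in [0.0, 1.0).
fn backoff_delay(attempt: u32, jitter: f64) -> Duration {
    let base_ms = 500u64;
    let cap_ms = 30_000u64;
    // Clamp the shift so the multiplier cannot overflow u64.
    let exp_ms = base_ms.saturating_mul(1u64 << attempt.min(16)).min(cap_ms);
    let jitter_ms = (exp_ms as f64 * jitter) as u64;
    Duration::from_millis(exp_ms + jitter_ms)
}

fn main() {
    // With zero jitter: 500ms, 1s, 2s, 4s, ... capped at 30s.
    for attempt in 0..4 {
        println!("attempt {attempt}: {:?}", backoff_delay(attempt, 0.0));
    }
}
```

In production the jitter fraction would come from a random source; randomizing the delay spreads retries out so simultaneous clients do not hammer a rate-limited API in lockstep.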
Session Management
Automatic session tracking and logging:
- Session IDs: Generated based on initial prompts for easy identification
- Complete Logs: Full conversation history, token usage, and timing data
- JSON Format: Structured logs for easy parsing and analysis
- Automatic Cleanup: Logs organized in the logs/ directory with timestamps
- Status Tracking: Records session completion status (completed, cancelled, error)
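A session record of this kind might look like the struct below; the field names and hand-rolled JSON are assumptions for illustration (a real implementation would use serde and carry the full conversation history).

```rust
// Sketch of a structured session log record; field names are
// illustrative, not G3's actual log schema.

struct SessionLog {
    session_id: String,
    status: String, // "completed" | "cancelled" | "error"
    total_tokens: usize,
}

impl SessionLog {
    /// Hand-rolled JSON line for the sketch; serde would do this
    /// properly (escaping, nesting, etc.) in a real implementation.
    fn to_json(&self) -> String {
        format!(
            "{{\"session_id\":\"{}\",\"status\":\"{}\",\"total_tokens\":{}}}",
            self.session_id, self.status, self.total_tokens
        )
    }
}

fn main() {
    let log = SessionLog {
        session_id: "fix-readme".into(),
        status: "completed".into(),
        total_tokens: 12345,
    };
    println!("{}", log.to_json());
}
```

Emitting one JSON object per session is what makes the logs easy to grep, parse, and aggregate for token-usage analysis.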
Autonomous Mode
Advanced autonomous operation with coach-player feedback:
- Requirements-Driven: Reads requirements.md for project specifications
- Dual-Agent System: Separate player (implementation) and coach (review) agents
- Iterative Improvement: Multiple rounds of implementation and feedback
- Progress Tracking: Detailed reporting of turns, token usage, and final status
- Workspace Management: Automatic workspace setup and file organization
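The coach-player loop reduces to: the player produces an attempt from the latest feedback, the coach either approves or returns new feedback, and the loop stops on approval or when the turn budget runs out. The sketch below models both agents as plain closures; the signature is an assumption, not G3's internals.

```rust
// Sketch of the coach-player feedback loop. Agents are modeled as
// closures; in G3 each would be a full LLM-backed agent.

/// Run up to `max_turns` rounds of implement-then-review.
/// Returns (final_attempt, turns_used, approved).
fn autonomous_loop(
    mut player: impl FnMut(&str) -> String,
    coach: impl Fn(&str) -> Option<String>, // None = approved
    max_turns: usize,
) -> (String, usize, bool) {
    let mut feedback = String::from("start");
    let mut attempt = String::new();
    for turn in 1..=max_turns {
        attempt = player(&feedback);
        match coach(&attempt) {
            None => return (attempt, turn, true),
            Some(f) => feedback = f, // iterate with the coach's notes
        }
    }
    (attempt, max_turns, false)
}

fn main() {
    // Toy agents: the player appends one '!' per turn; the coach
    // approves once the attempt reaches three characters.
    let mut s = String::new();
    let (out, turns, ok) = autonomous_loop(
        move |_| { s.push('!'); s.clone() },
        |a| if a.len() >= 3 { None } else { Some("more".into()) },
        10,
    );
    println!("{out} in {turns} turns, approved={ok}");
}
```

The turn cap mirrors the `--max-turns` CLI flag shown in the usage examples: it bounds cost even when the coach never approves.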
Provider Comparison
| Feature | Anthropic | Databricks | Embedded |
|---|---|---|---|
| Cost | Pay per token | Pay per token | Free after download |
| Privacy | Data sent to API | Data sent to API | Completely local |
| Performance | Very fast | Very fast | Depends on hardware |
| Model Quality | Excellent | Excellent | Good (varies by model) |
| Offline Support | No | No | Yes |
| Setup Complexity | API key only | OAuth or token | Model download required |
| Context Window | 200k tokens | Varies by model | 4k-32k tokens |
| Tool Calling | Native support | Native support | JSON fallback |
| Hardware Requirements | None | None | 4-16GB RAM, optional GPU |
Configuration Examples
Cloud-First Setup (Anthropic)
[providers]
default_provider = "anthropic"
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
max_tokens = 8192
temperature = 0.1
Enterprise Setup (Databricks)
[providers]
default_provider = "databricks"
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 32000
temperature = 0.1
use_oauth = true
Privacy-First Setup (Local Models)
[providers]
default_provider = "embedded"
[providers.embedded]
model_path = "~/.cache/g3/models/qwen2.5-7b-instruct-q3_k_m.gguf"
model_type = "qwen"
context_length = 32768
max_tokens = 2048
temperature = 0.1
gpu_layers = 32
threads = 8
Hybrid Setup
[providers]
default_provider = "embedded"
# Local model for most tasks
[providers.embedded]
model_path = "~/.cache/g3/models/codellama-7b-instruct.Q4_K_M.gguf"
model_type = "codellama"
context_length = 16384
gpu_layers = 32
# Cloud fallback for complex tasks
[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
Usage Examples
Single-Shot Mode
g3 "implement a fibonacci function in Rust"
Interactive Mode
g3
g3> read the README and suggest improvements
g3> implement the suggestions you made
Autonomous Mode
g3 --autonomous --max-turns 10
# Reads requirements.md and implements iteratively
Retro TUI Mode
g3 --retro --theme dracula
# Full-screen terminal interface
Future Enhancements
Planned Features
- Plugin System: Custom tool and provider plugins
- Web Interface: Browser-based UI for remote access
- Model Quantization: Optimized local model deployment
- Multi-Model Ensemble: Combine multiple models for better results
- Advanced Sandboxing: Enhanced security for code execution
- Collaborative Mode: Multi-user sessions and shared workspaces
Technical Improvements
- Performance Optimization: Faster streaming and tool execution
- Memory Management: Better handling of large contexts and files
- Caching System: Intelligent caching of model responses and computations
- Monitoring: Built-in metrics and performance monitoring
- Testing: Comprehensive test suite and CI/CD integration
Development Guidelines
Code Organization
- Modular Design: Each crate has a single, well-defined responsibility
- Trait-Based: Use traits for abstraction and testability
- Error Handling: Comprehensive error types with context
- Documentation: Inline docs and examples for all public APIs
- Testing: Unit tests, integration tests, and property-based testing
Performance Considerations
- Async-First: All I/O operations are asynchronous
- Streaming: Real-time response processing where possible
- Memory Efficiency: Careful memory management for large contexts
- Caching: Strategic caching of expensive operations
- Profiling: Regular performance profiling and optimization
This design document reflects the current state of G3 as a mature, production-ready AI coding agent with sophisticated architecture and comprehensive feature set.