Jochen 010a43d203 coach/player provider split + add OpenAI
Allows coach and player LLM providers to be separately specified.
Also adds OpenAI provider
2025-10-21 16:59:13 +11:00

G3 - AI Coding Agent

G3 is an AI coding agent that helps you complete tasks by writing code and executing commands. Built in Rust, it provides a flexible architecture for working with multiple Large Language Model (LLM) providers and supports code generation and task automation.

Architecture Overview

G3 follows a modular architecture organized as a Rust workspace with multiple crates, each responsible for specific functionality:

Core Components

g3-core

The heart of the agent system, containing:

  • Agent Engine: Main orchestration logic for handling conversations, tool execution, and task management
  • Context Window Management: Tracks token usage, thins context at the 50%, 60%, 70%, and 80% usage thresholds, and auto-summarizes at 80% capacity
  • Tool System: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
  • Streaming Response Parser: Real-time parsing of LLM responses with tool call detection and execution
  • Task Execution: Support for single and iterative task execution with automatic retry logic
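
The streaming parser has to recognize a tool call even when it is split across several response chunks. The README does not describe the wire format, so the following is only a sketch under the assumption that tool calls arrive as JSON objects embedded in the text stream. ToolCallScanner and its fields are hypothetical names, not part of g3-core. It tracks brace depth and string state so that a "}" inside a string literal does not end the object early.

```rust
/// Hypothetical scanner: collects streamed text and returns each complete
/// top-level JSON object. Assumes tool calls are JSON objects in the stream.
#[derive(Default)]
pub struct ToolCallScanner {
    buffer: String,  // text of the object currently being collected
    depth: usize,    // current brace nesting depth
    in_string: bool, // true while inside a JSON string literal
    escaped: bool,   // true right after a backslash inside a string
}

impl ToolCallScanner {
    /// Feeds one chunk and returns any JSON objects completed by it.
    pub fn feed(&mut self, chunk: &str) -> Vec<String> {
        let mut complete = Vec::new();
        for ch in chunk.chars() {
            if self.depth == 0 {
                // Outside an object: skip plain text until an opening brace.
                if ch == '{' {
                    self.depth = 1;
                    self.buffer.push(ch);
                }
                continue;
            }
            self.buffer.push(ch);
            if self.in_string {
                if self.escaped {
                    self.escaped = false;
                } else if ch == '\\' {
                    self.escaped = true;
                } else if ch == '"' {
                    self.in_string = false;
                }
                continue;
            }
            match ch {
                '"' => self.in_string = true,
                '{' => self.depth += 1,
                '}' => {
                    self.depth -= 1;
                    if self.depth == 0 {
                        // Object closed: hand it over and reset the buffer.
                        complete.push(std::mem::take(&mut self.buffer));
                    }
                }
                _ => {}
            }
        }
        complete
    }
}
```

Each returned string could then be handed to a JSON parser (the README lists Serde) and dispatched to the matching tool. Text outside objects is dropped here for brevity; a real parser would also forward it to the user.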

g3-providers

Abstraction layer for LLM providers:

  • Provider Interface: Common trait-based API for different LLM backends
  • Multiple Provider Support:
    • Anthropic (Claude models)
    • Databricks (DBRX and other models)
    • OpenAI
    • Local/embedded models via llama.cpp with Metal acceleration on macOS
  • OAuth Authentication: Built-in OAuth flow support for secure provider authentication
  • Provider Registry: Dynamic provider management and selection
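
A trait-based provider interface with a registry might look roughly like the sketch below. All names here (LlmProvider, EchoProvider, ProviderRegistry) are assumptions made for illustration; the actual trait in g3-providers is not shown in this README and is likely async and streaming. EchoProvider is a stand-in that makes no network calls.

```rust
use std::collections::HashMap;

/// Hypothetical provider interface; the real trait may differ.
pub trait LlmProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in backend that echoes the prompt instead of calling an API.
pub struct EchoProvider {
    label: String,
}

impl EchoProvider {
    pub fn new(label: &str) -> Self {
        Self { label: label.to_string() }
    }
}

impl LlmProvider for EchoProvider {
    fn name(&self) -> &str {
        &self.label
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[{}] {}", self.label, prompt))
    }
}

/// Registry that stores providers by name and looks them up at runtime.
#[derive(Default)]
pub struct ProviderRegistry {
    providers: HashMap<String, Box<dyn LlmProvider>>,
}

impl ProviderRegistry {
    /// Registers a provider under its own name, replacing any previous entry.
    pub fn register(&mut self, provider: Box<dyn LlmProvider>) {
        self.providers.insert(provider.name().to_string(), provider);
    }

    /// Returns the provider registered under `name`, if any.
    pub fn get(&self, name: &str) -> Option<&dyn LlmProvider> {
        self.providers.get(name).map(|boxed| boxed.as_ref())
    }
}
```

Because callers only see the trait object, switching backends comes down to looking up a different name, for example one read from configuration.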

g3-config

Configuration management system:

  • Environment-based configuration
  • Provider credentials and settings
  • Model selection and parameters
  • Runtime configuration options

g3-execution

Task execution framework:

  • Task planning and decomposition
  • Execution strategies (sequential, parallel)
  • Error handling and retry mechanisms
  • Progress tracking and reporting

g3-computer-control

Computer control capabilities:

  • Mouse and keyboard automation
  • UI element inspection and interaction
  • Screenshot capture and window management
  • OCR text extraction via Tesseract

g3-cli

Command-line interface:

  • Interactive terminal interface
  • Task submission and monitoring
  • Configuration management commands
  • Session management

Error Handling & Resilience

G3 includes robust error handling with automatic retry logic:

  • Recoverable Error Detection: Automatically identifies recoverable errors (rate limits, network issues, server errors, timeouts)
  • Exponential Backoff with Jitter: Implements intelligent retry delays to avoid overwhelming services
  • Detailed Error Logging: Captures comprehensive error context including stack traces, request/response data, and session information
  • Error Persistence: Saves detailed error logs to logs/errors/ for post-mortem analysis
  • Graceful Degradation: Non-recoverable errors are logged with full context before terminating
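
The retry behavior above can be illustrated with a small, synchronous sketch. It is not G3's implementation: the ErrorKind variants, the function names, and the jitter formula (scaling the capped delay by a factor in [0.5, 1.0]) are assumptions. The random source and the sleep function are passed in, so the sketch needs no external crates and stays deterministic in tests.

```rust
use std::time::Duration;

/// Hypothetical error classes, loosely following the categories listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    RateLimit,
    Network,
    Server,
    Timeout,
    InvalidRequest, // assumed example of a non-recoverable error
}

pub fn is_recoverable(kind: ErrorKind) -> bool {
    !matches!(kind, ErrorKind::InvalidRequest)
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at
/// `max`, then scaled by a jitter factor in [0.5, 1.0] from `unit_random`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration, unit_random: f64) -> Duration {
    let exp = 2u32.saturating_pow(attempt);
    let capped = base.saturating_mul(exp).min(max);
    let factor = 0.5 + 0.5 * unit_random.clamp(0.0, 1.0);
    capped.mul_f64(factor)
}

/// Runs `op` until it succeeds, fails with a non-recoverable error, or has
/// been tried `max_attempts` times, sleeping between attempts.
pub fn retry_with_backoff<T>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut next_random: impl FnMut() -> f64,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, ErrorKind>,
) -> Result<T, ErrorKind> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(kind) if is_recoverable(kind) && attempt + 1 < max_attempts => {
                sleep(backoff_delay(attempt, base, max, next_random()));
                attempt += 1;
            }
            Err(kind) => return Err(kind),
        }
    }
}
```

In blocking code, std::thread::sleep can be passed as the sleep argument. Since G3 runs on Tokio, an async variant would await a timer instead of blocking the thread.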

Key Features

Intelligent Context Management

  • Automatic context window monitoring with percentage-based tracking
  • Smart auto-summarization when approaching token limits
  • Context thinning at the 50%, 60%, 70%, and 80% thresholds, which automatically replaces large tool results with file references
  • Conversation history preservation through summaries
  • Dynamic token allocation for different providers (4k to 200k+ tokens)
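
To make the threshold behavior concrete, here is a minimal sketch of percentage-based tracking. Only the threshold values (50%, 60%, 70%, 80%) come from this README. ContextWindow, ContextAction, thin_tool_result, and the rule that each threshold fires once are assumptions, and the real logic in g3-core may differ.

```rust
/// Thinning and summarization thresholds, as percentages of the context
/// window. The values are taken from the README text.
const THIN_THRESHOLDS: [u8; 4] = [50, 60, 70, 80];
const SUMMARIZE_AT: u8 = 80;

/// Hypothetical actions the agent could take after usage grows.
#[derive(Debug, PartialEq)]
pub enum ContextAction {
    Thin { threshold: u8 },
    Summarize,
}

/// Hypothetical tracker for token usage within a fixed-size window.
pub struct ContextWindow {
    used_tokens: usize,
    total_tokens: usize,
    last_thinned_at: u8, // highest thinning threshold already handled
}

impl ContextWindow {
    pub fn new(total_tokens: usize) -> Self {
        // Guard against a zero-sized window to avoid division by zero.
        Self { used_tokens: 0, total_tokens: total_tokens.max(1), last_thinned_at: 0 }
    }

    pub fn percent_used(&self) -> u8 {
        (self.used_tokens.saturating_mul(100) / self.total_tokens).min(100) as u8
    }

    /// Records new tokens and returns the actions triggered by the increase.
    pub fn record_usage(&mut self, tokens: usize) -> Vec<ContextAction> {
        self.used_tokens = self.used_tokens.saturating_add(tokens);
        let percent = self.percent_used();
        let mut actions = Vec::new();
        for &threshold in THIN_THRESHOLDS.iter() {
            // Each threshold fires once, the first time usage crosses it.
            if percent >= threshold && threshold > self.last_thinned_at {
                actions.push(ContextAction::Thin { threshold });
                self.last_thinned_at = threshold;
            }
        }
        if percent >= SUMMARIZE_AT {
            actions.push(ContextAction::Summarize);
        }
        actions
    }
}

/// Replaces an oversized tool result with a reference to a saved file.
/// `saved_path` is a hypothetical location chosen by the caller.
pub fn thin_tool_result(result: &str, saved_path: &str, max_chars: usize) -> String {
    if result.chars().count() > max_chars {
        format!("[tool result moved to {}]", saved_path)
    } else {
        result.to_string()
    }
}
```

The sketch leaves out what happens after summarization, where the usage counter and thresholds would presumably be reset to reflect the shorter history.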

Tool Ecosystem

  • File Operations: Read, write, and edit files with line-range precision
  • Shell Integration: Execute system commands with output capture
  • Code Generation: Structured code generation with syntax awareness
  • TODO Management: Read and write TODO lists with markdown checkbox format
  • Computer Control (Experimental): Automate desktop applications
    • Mouse and keyboard control
    • UI element inspection
    • Screenshot capture and window management
    • OCR text extraction from images and screen regions
    • Window listing and identification
  • Final Output: Formatted result presentation
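
As an illustration of the markdown checkbox format used by the TODO tools, the sketch below parses and renders lines such as "- [ ] task" and "- [x] task". TodoItem, parse_todos, and render_todos are hypothetical names, and G3's own TODO tool may handle nesting or other details differently.

```rust
/// Hypothetical in-memory form of one TODO entry.
#[derive(Debug, Clone, PartialEq)]
pub struct TodoItem {
    pub done: bool,
    pub text: String,
}

/// Extracts checkbox items from markdown, ignoring all other lines.
pub fn parse_todos(markdown: &str) -> Vec<TodoItem> {
    markdown
        .lines()
        .filter_map(|line| {
            let line = line.trim_start();
            // Accept both "- [" and "* [" list markers.
            let rest = line
                .strip_prefix("- [")
                .or_else(|| line.strip_prefix("* ["))?;
            let mut chars = rest.chars();
            let mark = chars.next()?;
            let after = chars.as_str().strip_prefix(']')?;
            let done = match mark {
                'x' | 'X' => true,
                ' ' => false,
                _ => return None, // not a checkbox, e.g. a "- [link]" line
            };
            Some(TodoItem { done, text: after.trim().to_string() })
        })
        .collect()
}

/// Renders items back to markdown checkbox lines.
pub fn render_todos(items: &[TodoItem]) -> String {
    items
        .iter()
        .map(|item| format!("- [{}] {}", if item.done { "x" } else { " " }, item.text))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Parsing into a small struct and rendering back keeps the list readable to humans in its file form while still letting the agent toggle individual items.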

Provider Flexibility

  • Support for multiple LLM providers through a unified interface
  • Hot-swappable providers without code changes
  • Provider-specific optimizations and feature support
  • Local model support for offline operation

Task Automation

  • Single-shot task execution for quick operations
  • Iterative task mode for complex, multi-step workflows
  • Automatic error recovery and retry logic
  • Progress tracking and intermediate result handling

Language & Technology Stack

  • Language: Rust (2021 edition)
  • Async Runtime: Tokio for concurrent operations
  • HTTP Client: Reqwest for API communications
  • Serialization: Serde for JSON handling
  • CLI Framework: Clap for command-line parsing
  • Logging: Tracing for structured logging
  • Local Models: llama.cpp with Metal acceleration support

Use Cases

G3 is designed for:

  • Automated code generation and refactoring
  • File manipulation and project scaffolding
  • System administration tasks
  • Data processing and transformation
  • API integration and testing
  • Documentation generation
  • Complex multi-step workflows
  • Desktop application automation and testing

Getting Started

# Build the project
cargo build --release

# Run G3
cargo run

# Execute a task
g3 "implement a function to calculate fibonacci numbers"

WebDriver Browser Automation

G3 includes WebDriver support for browser automation tasks using Safari.

One-Time Setup (macOS only):

Safari Remote Automation must be enabled before using WebDriver tools. Run this once:

# Option 1: Use the provided script
./scripts/enable-safari-automation.sh

# Option 2: Enable manually
safaridriver --enable  # Requires password

# Option 3: Enable via Safari UI
# Safari → Preferences → Advanced → Show Develop menu
# Then: Develop → Allow Remote Automation

For detailed setup instructions and troubleshooting, see the WebDriver Setup Guide.

Usage: Run G3 with the --webdriver flag to enable browser automation tools.

Computer Control (Experimental)

G3 can interact with your computer's GUI for automation tasks:

Available Tools: mouse_click, type_text, find_element, take_screenshot, extract_text, find_text_on_screen, list_windows

Setup: Enable in config with computer_control.enabled = true and grant OS accessibility permissions:

  • macOS: System Preferences → Security & Privacy → Accessibility
  • Linux: Ensure X11 or Wayland access
  • Windows: Run as administrator (first time only)

Session Logs

G3 automatically saves session logs for each interaction in the logs/ directory. These logs contain:

  • Complete conversation history
  • Token usage statistics
  • Timestamps and session status

The logs/ directory is created automatically on first use and is excluded from version control.

License

MIT License. See the LICENSE file for details.

Contributing

G3 is an open-source project. Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
