docs: update README and DESIGN to reflect current project state

- Add g3-computer-control crate to architecture documentation
- Document all 13 tools including computer control and TODO management
- Add context thinning feature documentation (50-80% thresholds)
- Update tool ecosystem section with complete tool list
- Remove broken link to non-existent COMPUTER_CONTROL.md
- Update workspace count from 5 to 6 crates
- Add platform-specific implementation details for computer control
- Document OCR support via Tesseract
- Clarify setup instructions for computer control features

DESIGN.md (62 changes)
@@ -29,7 +29,8 @@ g3/
 │ ├── g3-core/ # Core agent engine, tools, and streaming logic
 │ ├── g3-providers/ # LLM provider abstractions and implementations
 │ ├── g3-config/ # Configuration management
-│ └── g3-execution/ # Code execution engine
+│ ├── g3-execution/ # Code execution engine
+│ └── g3-computer-control/ # Computer control and automation
 ├── logs/ # Session logs (auto-created)
 ├── README.md # Project documentation
 └── DESIGN.md # This design document
@@ -48,6 +49,7 @@ g3/
 │ • Retro TUI │ │ • Tool system │ │ • Embedded │
 │ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
 │ mode │ │ • Task exec │ │ • OAuth flow │
+│ │ │ • TODO mgmt │ │ │
 └─────────────────┘ └─────────────────┘ └─────────────────┘
 │ │ │
 └───────────────────────┼───────────────────────┘
@@ -59,7 +61,18 @@ g3/
 │ • Shell cmds │ │ • Env overrides │
 │ • Streaming │ │ • Provider │
 │ • Error hdlg │ │ settings │
-└─────────────────┘ └─────────────────┘
+└─────────────────┘ │ • Computer │
+│ │ control cfg │
+│ └─────────────────┘
+│ │
+┌─────────────────┐ │
+│ g3-computer- │◄────────────┘
+│ control │
+│ • Mouse/kbd │
+│ • Screenshots │
+│ • OCR/Tesseract │
+│ • Windows/UI │
+└─────────────────┘
 ```
 
 ## Core Components
@@ -79,6 +92,7 @@ g3/
 - **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Session Management**: Automatic session logging with detailed conversation history and token usage
 - **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors
+- **TODO Management**: In-memory TODO list with read/write tools for task tracking
 
 **Available Tools:**
 - `shell`: Execute shell commands with streaming output
@@ -86,7 +100,15 @@ g3/
 - `write_file`: Create or overwrite files with content
 - `str_replace`: Apply unified diffs to files with precise editing
 - `final_output`: Signal task completion with detailed summaries
-- **Project Management**: Workspace handling, requirements.md processing for autonomous mode
+- `todo_read`: Read the entire TODO list content
+- `todo_write`: Write or overwrite the entire TODO list
+- `mouse_click`: Click the mouse at specific coordinates
+- `type_text`: Type text at the current cursor position
+- `find_element`: Find UI elements by text, role, or attributes
+- `take_screenshot`: Capture screenshots of screen, region, or window
+- `extract_text`: Extract text from images or screen regions using OCR
+- `find_text_on_screen`: Find text visually on screen and return coordinates
+- `list_windows`: List all open windows with IDs and titles
 
 ### 2. g3-providers: LLM Provider Abstraction
 
@@ -172,6 +194,26 @@ g3/
 - **Validation**: Configuration validation with helpful error messages
 - **Flexible Paths**: Support for shell expansion (`~`, environment variables)
 
+### 6. g3-computer-control: Computer Control & Automation
+
+**Primary Responsibilities:**
+- Cross-platform computer control and automation
+- Mouse and keyboard input simulation
+- Window management and screenshot capture
+- OCR text extraction from images and screen regions
+
+**Platform Support:**
+- **macOS**: Core Graphics, Cocoa, screencapture integration
+- **Linux**: X11/Xtest for input, X11 for window management
+- **Windows**: Win32 APIs for input and window control
+
+**Key Features:**
+- **OCR Integration**: Tesseract-based text extraction from images
+- **Window Management**: List, identify, and capture specific application windows
+- **UI Automation**: Find elements, simulate clicks, type text
+- **Screenshot Capture**: Full screen, regions, or specific windows
+- **Accessibility**: Requires OS-level permissions for automation
+
 ## Advanced Features
 
 ### Context Window Management
@@ -180,6 +222,7 @@ G3 implements sophisticated context window management:
 
 - **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
 - **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
+- **Context Thinning**: Progressive thinning at 50%, 60%, 70%, 80% thresholds - replaces large tool results with file references
 - **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
 - **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
 - **Cumulative Tracking**: Monitors total token usage across entire sessions
@@ -354,20 +397,23 @@ This design document reflects the current state of G3 as a mature, production-re
 ### Fully Implemented
 - ✅ **Core Agent Engine**: Complete with streaming, tool execution, and context management
 - ✅ **Provider System**: Anthropic, Databricks, and Embedded providers with OAuth support
-- ✅ **Tool System**: All 5 core tools (shell, read_file, write_file, str_replace, final_output)
+- ✅ **Tool System**: 13 tools including file ops, shell, TODO management, and computer control
 - ✅ **CLI Interface**: Interactive mode, single-shot mode, retro TUI
 - ✅ **Autonomous Mode**: Coach-player feedback loop with requirements.md processing
 - ✅ **Configuration**: TOML-based config with environment overrides
 - ✅ **Error Handling**: Comprehensive retry logic and error classification
 - ✅ **Session Logging**: Automatic session tracking and JSON logs
-- ✅ **Context Management**: Auto-summarization at 80% capacity
+- ✅ **Context Management**: Context thinning (50-80%) and auto-summarization at 80% capacity
+- ✅ **Computer Control**: Cross-platform automation with OCR support
+- ✅ **TODO Management**: In-memory TODO list with read/write tools
 
 ### Architecture Highlights
-- **Workspace**: 5 crates with clear separation of concerns
+- **Workspace**: 6 crates with clear separation of concerns
 - **Dependencies**: Modern Rust ecosystem (Tokio, Clap, Serde, etc.)
 - **Streaming**: Real-time response processing with tool call detection
 - **Cross-Platform**: Works on macOS, Linux, and Windows
-- **GPU Support**: Metal acceleration for local models on macOS
+- **GPU Support**: Metal acceleration for local models on macOS, CUDA on Linux
+- **OCR Support**: Tesseract integration for text extraction from images
 
 ### Key Files
 - `src/main.rs`: main entry point delegating to g3-cli
@@ -376,3 +422,5 @@ This design document reflects the current state of G3 as a mature, production-re
 - `crates/g3-providers/src/lib.rs`: provider trait and registry
 - `crates/g3-config/src/lib.rs`: configuration management
 - `crates/g3-execution/src/lib.rs`: code execution engine
+- `crates/g3-computer-control/src/lib.rs`: computer control and automation
+- `crates/g3-computer-control/src/platform/`: platform-specific implementations
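
The context thinning documented in this change (progressive thinning at 50%, 60%, 70% and 80% of the context window that replaces large tool results with file references, alongside summarization at 80%) can be illustrated with a minimal Rust sketch. It is not g3's implementation: the `ContextWindow` and `Message` types, the four-characters-per-token estimate, the 1,000-character cutoff for a "large" result, the rule that each threshold fires only once, and the `logs/tool_result_{i}.txt` path are all assumptions, and the sketch never writes anything to disk.

```rust
// Hypothetical sketch of percentage-based context thinning. The names,
// the token heuristic and the size cutoff are assumptions, not g3 code.

/// Usage levels (percent of the window) at which a thinning pass runs.
const THIN_THRESHOLDS: [usize; 4] = [50, 60, 70, 80];
/// Usage level at which the agent would summarize the conversation instead.
const SUMMARIZE_AT: usize = 80;
/// Assumed cutoff: tool results longer than this many characters are thinned.
const LARGE_RESULT_CHARS: usize = 1_000;

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    ToolResult(String),
    /// A tool result whose body was moved out of the context window.
    ToolResultRef { path: String, original_chars: usize },
}

impl Message {
    /// Rough token estimate of about 4 characters per token (an assumption).
    fn estimated_tokens(&self) -> usize {
        let chars = match self {
            Message::User(s) | Message::Assistant(s) | Message::ToolResult(s) => s.len(),
            Message::ToolResultRef { path, .. } => path.len(),
        };
        chars / 4
    }
}

struct ContextWindow {
    messages: Vec<Message>,
    max_tokens: usize,
    /// Highest threshold already acted on, so each level fires only once.
    last_thinned_at: usize,
}

impl ContextWindow {
    fn new(max_tokens: usize) -> Self {
        Self { messages: Vec::new(), max_tokens, last_thinned_at: 0 }
    }

    /// Current usage as a whole percentage of the window, capped at 100.
    fn percent_used(&self) -> usize {
        let used: usize = self.messages.iter().map(Message::estimated_tokens).sum();
        (used * 100 / self.max_tokens).min(100)
    }

    fn needs_summary(&self) -> bool {
        self.percent_used() >= SUMMARIZE_AT
    }

    /// If usage has reached a threshold not yet acted on, replace every large
    /// tool result with a file reference. Returns how many results were thinned.
    fn maybe_thin(&mut self) -> usize {
        let pct = self.percent_used();
        let last = self.last_thinned_at;
        let Some(threshold) = THIN_THRESHOLDS
            .iter()
            .copied()
            .filter(|&t| pct >= t && t > last)
            .max()
        else {
            return 0;
        };
        self.last_thinned_at = threshold;

        let mut thinned = 0;
        for (i, msg) in self.messages.iter_mut().enumerate() {
            if let Message::ToolResult(body) = msg {
                if body.len() > LARGE_RESULT_CHARS {
                    let original_chars = body.len();
                    // A real implementation would first save `body` to this path.
                    let path = format!("logs/tool_result_{i}.txt");
                    *msg = Message::ToolResultRef { path, original_chars };
                    thinned += 1;
                }
            }
        }
        thinned
    }
}
```

With a 1,000-token window holding a 100-token user message and a 600-token tool result, usage is 70%, so a single `maybe_thin` call swaps the tool result for a reference and usage falls to about 10%. Recording the last threshold acted on keeps the agent from re-thinning on every turn while usage hovers near one level.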

README.md (42 changes)
@@ -11,8 +11,8 @@ G3 follows a modular architecture organized as a Rust workspace with multiple cr
 #### **g3-core**
 The heart of the agent system, containing:
 - **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
-- **Context Window Management**: Intelligent tracking of token usage with auto-summarization capabilities when approaching context limits (~80% capacity)
-- **Tool System**: Built-in tools for file operations (read, write, edit), shell command execution, and structured output generation
+- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity
+- **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
 - **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Task Execution**: Support for single and iterative task execution with automatic retry logic
 
@@ -44,8 +44,8 @@ Task execution framework:
 Computer control capabilities:
 - Mouse and keyboard automation
 - UI element inspection and interaction
-- Screenshot capture
-- OCR text extraction
+- Screenshot capture and window management
+- OCR text extraction via Tesseract
 
 #### **g3-cli**
 Command-line interface:
@@ -68,19 +68,21 @@ G3 includes robust error handling with automatic retry logic:
 ### Intelligent Context Management
 - Automatic context window monitoring with percentage-based tracking
 - Smart auto-summarization when approaching token limits
+- **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
 - Conversation history preservation through summaries
-- Dynamic token allocation for different providers
+- Dynamic token allocation for different providers (4k to 200k+ tokens)
 
 ### Tool Ecosystem
 - **File Operations**: Read, write, and edit files with line-range precision
 - **Shell Integration**: Execute system commands with output capture
 - **Code Generation**: Structured code generation with syntax awareness
+- **TODO Management**: Read and write TODO lists with markdown checkbox format
 - **Computer Control** (Experimental): Automate desktop applications
-- **OCR Support**: Extract and find text from images and screen regions using Tesseract
 - Mouse and keyboard control
 - UI element inspection
-- Screenshot capture
-- See [Computer Control Guide](docs/COMPUTER_CONTROL.md) for details
+- Screenshot capture and window management
+- OCR text extraction from images and screen regions
+- Window listing and identification
 - **Final Output**: Formatted result presentation
 
 ### Provider Flexibility
@@ -111,7 +113,7 @@ G3 is designed for:
 - Automated code generation and refactoring
 - File manipulation and project scaffolding
 - System administration tasks
 - Data processing and transformation
 - API integration and testing
 - Documentation generation
 - Complex multi-step workflows
@@ -134,24 +136,12 @@ g3 "implement a function to calculate fibonacci numbers"
 
 G3 can interact with your computer's GUI for automation tasks:
 
-### Setup
+**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`
 
-1. Enable in config:
-```toml
-[computer_control]
-enabled = true
-```
+**Setup**: Enable in config with `computer_control.enabled = true` and grant OS accessibility permissions:
+- **macOS**: System Preferences → Security & Privacy → Accessibility
+- **Linux**: Ensure X11 or Wayland access
+- **Windows**: Run as administrator (first time only)
 
-2. Grant OS permissions:
-- **macOS**: System Preferences → Security & Privacy → Accessibility
-- **Linux**: Ensure X11 or Wayland access
-- **Windows**: Run as administrator (first time only)
-
-3. Use computer control:
-```bash
-```
-
-See [Computer Control Guide](docs/COMPUTER_CONTROL.md) for detailed documentation.
-
 ## Session Logs
 
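
The TODO tools added here are described as an in-memory list that `todo_read` returns whole and `todo_write` overwrites whole, kept in markdown checkbox format. A minimal Rust sketch of that contract follows; the `TodoStore` type, the `Arc<RwLock<String>>` storage (chosen on the assumption that several tool handlers share one list), and the `progress` helper are hypothetical and not taken from g3.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical in-memory TODO list shared by the `todo_read` and
/// `todo_write` tools. Cloning the store shares the same underlying list.
#[derive(Clone, Default)]
struct TodoStore {
    content: Arc<RwLock<String>>,
}

impl TodoStore {
    /// `todo_read`: return the entire list as stored.
    fn read(&self) -> String {
        self.content.read().expect("TODO lock poisoned").clone()
    }

    /// `todo_write`: replace the entire list; there is no partial update.
    fn write(&self, new_content: &str) {
        *self.content.write().expect("TODO lock poisoned") = new_content.to_string();
    }

    /// Count markdown checkboxes as (done, total). Only `- [ ]`, `- [x]`
    /// and `- [X]` at the start of a (possibly indented) line are counted.
    fn progress(&self) -> (usize, usize) {
        let content = self.read();
        let mut done = 0;
        let mut total = 0;
        for line in content.lines().map(str::trim_start) {
            if line.starts_with("- [ ]") {
                total += 1;
            } else if line.starts_with("- [x]") || line.starts_with("- [X]") {
                total += 1;
                done += 1;
            }
        }
        (done, total)
    }
}
```

Because writes replace the entire list, a model that wants to tick off one item has to read the list, edit the markdown, and write all of it back.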