Compare commits
21 Commits
micn/agent...micn/auton
| Author | SHA1 | Date | |
|---|---|---|---|
| | f2ed303550 | | |
| | 93121c18e0 | | |
| | ed84a940f9 | | |
| | 3128b5d8b9 | | |
| | 758e255af8 | | |
| | 393826ae02 | | |
| | 3afad3d61f | | |
| | 2488cc54d5 | | |
| | 2ad0c9a3fd | | |
| | 2008a81193 | | |
| | 776f5034b8 | | |
| | 92bece957b | | |
| | 767299ff4e | | |
| | 9d35449be8 | | |
| | da652bf287 | | |
| | a566171203 | | |
| | 347c9e1e00 | | |
| | aa7eda0331 | | |
| | e42c76f3b9 | | |
| | dd211fab1c | | |
| | bcece38473 | | |
33
CHANGELOG.md
Normal file
@@ -0,0 +1,33 @@
# Changelog

## [Unreleased]

### Added

**Interactive Requirements Mode**
- **AI-Enhanced Interactive Requirements**: New `--interactive-requirements` flag for autonomous mode
  - User enters brief description of what they want to build
  - AI automatically enhances input into structured requirements.md document
  - Generates professional markdown with:
    - Project title and overview
    - Organized requirements (functional, technical, quality)
    - Acceptance criteria
  - User can review, accept, edit manually, or cancel before proceeding
  - Seamlessly transitions to autonomous mode

**Autonomous Mode Configuration**
- **Autonomous Mode Configuration**: Added ability to specify different models for coach and player agents in autonomous mode
  - New `[autonomous]` configuration section in `g3.toml`
  - `coach_provider` and `coach_model` options for coach agent
  - `player_provider` and `player_model` options for player agent
  - `Config::for_coach()` and `Config::for_player()` methods to generate role-specific configurations
  - Comprehensive test suite for autonomous configuration

### Changed
- Autonomous mode now uses `config.for_player()` for the player agent
- Coach agent creation now uses `config.for_coach()` for the coach agent

### Benefits
- **Cost Optimization**: Use cheaper models for execution, expensive models for review
- **Speed Optimization**: Use faster models for iteration, thorough models for validation
- **Specialization**: Leverage different providers' strengths for different roles
1202
Cargo.lock
generated
File diff suppressed because it is too large
@@ -4,7 +4,8 @@ members = [
|
|||||||
"crates/g3-core",
|
"crates/g3-core",
|
||||||
"crates/g3-providers",
|
"crates/g3-providers",
|
||||||
"crates/g3-config",
|
"crates/g3-config",
|
||||||
"crates/g3-execution"
|
"crates/g3-execution",
|
||||||
|
"crates/g3-computer-control"
|
||||||
]
|
]
|
||||||
resolver = "2"
|
resolver = "2"
|
||||||
|
|
||||||
|
|||||||
62
DESIGN.md
@@ -29,7 +29,8 @@ g3/
|
|||||||
│ ├── g3-core/ # Core agent engine, tools, and streaming logic
|
│ ├── g3-core/ # Core agent engine, tools, and streaming logic
|
||||||
│ ├── g3-providers/ # LLM provider abstractions and implementations
|
│ ├── g3-providers/ # LLM provider abstractions and implementations
|
||||||
│ ├── g3-config/ # Configuration management
|
│ ├── g3-config/ # Configuration management
|
||||||
│ └── g3-execution/ # Code execution engine
|
│ ├── g3-execution/ # Code execution engine
|
||||||
|
│ └── g3-computer-control/ # Computer control and automation
|
||||||
├── logs/ # Session logs (auto-created)
|
├── logs/ # Session logs (auto-created)
|
||||||
├── README.md # Project documentation
|
├── README.md # Project documentation
|
||||||
└── DESIGN.md # This design document
|
└── DESIGN.md # This design document
|
||||||
@@ -48,6 +49,7 @@ g3/
|
|||||||
│ • Retro TUI │ │ • Tool system │ │ • Embedded │
|
│ • Retro TUI │ │ • Tool system │ │ • Embedded │
|
||||||
│ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
|
│ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
|
||||||
│ mode │ │ • Task exec │ │ • OAuth flow │
|
│ mode │ │ • Task exec │ │ • OAuth flow │
|
||||||
|
│ │ │ • TODO mgmt │ │ │
|
||||||
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||||
│ │ │
|
│ │ │
|
||||||
└───────────────────────┼───────────────────────┘
|
└───────────────────────┼───────────────────────┘
|
||||||
@@ -59,7 +61,18 @@ g3/
|
|||||||
│ • Shell cmds │ │ • Env overrides │
|
│ • Shell cmds │ │ • Env overrides │
|
||||||
│ • Streaming │ │ • Provider │
|
│ • Streaming │ │ • Provider │
|
||||||
│ • Error hdlg │ │ settings │
|
│ • Error hdlg │ │ settings │
|
||||||
└─────────────────┘ └─────────────────┘
|
└─────────────────┘ │ • Computer │
|
||||||
|
│ │ control cfg │
|
||||||
|
│ └─────────────────┘
|
||||||
|
│ │
|
||||||
|
┌─────────────────┐ │
|
||||||
|
│ g3-computer- │◄────────────┘
|
||||||
|
│ control │
|
||||||
|
│ • Mouse/kbd │
|
||||||
|
│ • Screenshots │
|
||||||
|
│ • OCR/Tesseract │
|
||||||
|
│ • Windows/UI │
|
||||||
|
└─────────────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
## Core Components
|
## Core Components
|
||||||
@@ -79,6 +92,7 @@ g3/
|
|||||||
- **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
|
- **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
|
||||||
- **Session Management**: Automatic session logging with detailed conversation history and token usage
|
- **Session Management**: Automatic session logging with detailed conversation history and token usage
|
||||||
- **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors
|
- **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors
|
||||||
|
- **TODO Management**: In-memory TODO list with read/write tools for task tracking
|
||||||
|
|
||||||
**Available Tools:**
|
**Available Tools:**
|
||||||
- `shell`: Execute shell commands with streaming output
|
- `shell`: Execute shell commands with streaming output
|
||||||
@@ -86,7 +100,15 @@ g3/
|
|||||||
- `write_file`: Create or overwrite files with content
|
- `write_file`: Create or overwrite files with content
|
||||||
- `str_replace`: Apply unified diffs to files with precise editing
|
- `str_replace`: Apply unified diffs to files with precise editing
|
||||||
- `final_output`: Signal task completion with detailed summaries
|
- `final_output`: Signal task completion with detailed summaries
|
||||||
- **Project Management**: Workspace handling, requirements.md processing for autonomous mode
|
- `todo_read`: Read the entire TODO list content
|
||||||
|
- `todo_write`: Write or overwrite the entire TODO list
|
||||||
|
- `mouse_click`: Click the mouse at specific coordinates
|
||||||
|
- `type_text`: Type text at the current cursor position
|
||||||
|
- `find_element`: Find UI elements by text, role, or attributes
|
||||||
|
- `take_screenshot`: Capture screenshots of screen, region, or window
|
||||||
|
- `extract_text`: Extract text from images or screen regions using OCR
|
||||||
|
- `find_text_on_screen`: Find text visually on screen and return coordinates
|
||||||
|
- `list_windows`: List all open windows with IDs and titles
|
||||||
|
|
||||||
### 2. g3-providers: LLM Provider Abstraction
|
### 2. g3-providers: LLM Provider Abstraction
|
||||||
|
|
||||||
@@ -172,6 +194,26 @@ g3/
|
|||||||
- **Validation**: Configuration validation with helpful error messages
|
- **Validation**: Configuration validation with helpful error messages
|
||||||
- **Flexible Paths**: Support for shell expansion (`~`, environment variables)
|
- **Flexible Paths**: Support for shell expansion (`~`, environment variables)
|
||||||
|
|
||||||
### 6. g3-computer-control: Computer Control & Automation

**Primary Responsibilities:**
- Cross-platform computer control and automation
- Mouse and keyboard input simulation
- Window management and screenshot capture
- OCR text extraction from images and screen regions

**Platform Support:**
- **macOS**: Core Graphics, Cocoa, screencapture integration
- **Linux**: X11/Xtest for input, X11 for window management
- **Windows**: Win32 APIs for input and window control

**Key Features:**
- **OCR Integration**: Tesseract-based text extraction from images
- **Window Management**: List, identify, and capture specific application windows
- **UI Automation**: Find elements, simulate clicks, type text
- **Screenshot Capture**: Full screen, regions, or specific windows
- **Accessibility**: Requires OS-level permissions for automation

## Advanced Features
|
## Advanced Features
|
||||||
|
|
||||||
### Context Window Management
|
### Context Window Management
|
||||||
@@ -180,6 +222,7 @@ G3 implements sophisticated context window management:
|
|||||||
|
|
||||||
- **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
|
- **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
|
||||||
- **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
|
- **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
|
||||||
|
- **Context Thinning**: Progressive thinning at 50%, 60%, 70%, 80% thresholds - replaces large tool results with file references
|
||||||
- **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
|
- **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
|
||||||
- **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
|
- **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
|
||||||
- **Cumulative Tracking**: Monitors total token usage across entire sessions
|
- **Cumulative Tracking**: Monitors total token usage across entire sessions
|
||||||
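The progressive thinning thresholds listed in this hunk could be checked with a small helper like the one below. This is only a sketch of the idea, not G3's actual implementation: the names `THINNING_THRESHOLDS` and `thinning_threshold_crossed` are hypothetical, and the only values taken from the design text are the 50/60/70/80 percent marks.

```rust
/// Percent-of-context-window marks named in the design text.
/// The constant name is hypothetical.
const THINNING_THRESHOLDS: [u32; 4] = [50, 60, 70, 80];

/// Returns the highest threshold newly crossed when usage moves from
/// `prev_percent` to `curr_percent`, or `None` if no threshold was crossed.
fn thinning_threshold_crossed(prev_percent: u32, curr_percent: u32) -> Option<u32> {
    THINNING_THRESHOLDS
        .iter()
        .copied()
        // A threshold counts as crossed only if usage was below it before
        // and is at or above it now, so each mark fires once on the way up.
        .filter(|&t| prev_percent < t && curr_percent >= t)
        .max()
}
```

For example, `thinning_threshold_crossed(45, 62)` yields `Some(60)` because both the 50 and 60 marks were passed in one step, while `thinning_threshold_crossed(62, 65)` yields `None`.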
@@ -354,20 +397,23 @@ This design document reflects the current state of G3 as a mature, production-re
|
|||||||
### Fully Implemented
|
### Fully Implemented
|
||||||
- ✅ **Core Agent Engine**: Complete with streaming, tool execution, and context management
|
- ✅ **Core Agent Engine**: Complete with streaming, tool execution, and context management
|
||||||
- ✅ **Provider System**: Anthropic, Databricks, and Embedded providers with OAuth support
|
- ✅ **Provider System**: Anthropic, Databricks, and Embedded providers with OAuth support
|
||||||
- ✅ **Tool System**: All 5 core tools (shell, read_file, write_file, str_replace, final_output)
|
- ✅ **Tool System**: 13 tools including file ops, shell, TODO management, and computer control
|
||||||
- ✅ **CLI Interface**: Interactive mode, single-shot mode, retro TUI
|
- ✅ **CLI Interface**: Interactive mode, single-shot mode, retro TUI
|
||||||
- ✅ **Autonomous Mode**: Coach-player feedback loop with requirements.md processing
|
- ✅ **Autonomous Mode**: Coach-player feedback loop with requirements.md processing
|
||||||
- ✅ **Configuration**: TOML-based config with environment overrides
|
- ✅ **Configuration**: TOML-based config with environment overrides
|
||||||
- ✅ **Error Handling**: Comprehensive retry logic and error classification
|
- ✅ **Error Handling**: Comprehensive retry logic and error classification
|
||||||
- ✅ **Session Logging**: Automatic session tracking and JSON logs
|
- ✅ **Session Logging**: Automatic session tracking and JSON logs
|
||||||
- ✅ **Context Management**: Auto-summarization at 80% capacity
|
- ✅ **Context Management**: Context thinning (50-80%) and auto-summarization at 80% capacity
|
||||||
|
- ✅ **Computer Control**: Cross-platform automation with OCR support
|
||||||
|
- ✅ **TODO Management**: In-memory TODO list with read/write tools
|
||||||
|
|
||||||
### Architecture Highlights
|
### Architecture Highlights
|
||||||
- **Workspace**: 5 crates with clear separation of concerns
|
- **Workspace**: 6 crates with clear separation of concerns
|
||||||
- **Dependencies**: Modern Rust ecosystem (Tokio, Clap, Serde, etc.)
|
- **Dependencies**: Modern Rust ecosystem (Tokio, Clap, Serde, etc.)
|
||||||
- **Streaming**: Real-time response processing with tool call detection
|
- **Streaming**: Real-time response processing with tool call detection
|
||||||
- **Cross-Platform**: Works on macOS, Linux, and Windows
|
- **Cross-Platform**: Works on macOS, Linux, and Windows
|
||||||
- **GPU Support**: Metal acceleration for local models on macOS
|
- **GPU Support**: Metal acceleration for local models on macOS, CUDA on Linux
|
||||||
|
- **OCR Support**: Tesseract integration for text extraction from images
|
||||||
|
|
||||||
### Key Files
|
### Key Files
|
||||||
- `src/main.rs`: main entry point delegating to g3-cli
|
- `src/main.rs`: main entry point delegating to g3-cli
|
||||||
@@ -376,3 +422,5 @@ This design document reflects the current state of G3 as a mature, production-re
|
|||||||
- `crates/g3-providers/src/lib.rs`: provider trait and registry
|
- `crates/g3-providers/src/lib.rs`: provider trait and registry
|
||||||
- `crates/g3-config/src/lib.rs`: configuration management
|
- `crates/g3-config/src/lib.rs`: configuration management
|
||||||
- `crates/g3-execution/src/lib.rs`: code execution engine
|
- `crates/g3-execution/src/lib.rs`: code execution engine
|
||||||
|
- `crates/g3-computer-control/src/lib.rs`: computer control and automation
|
||||||
|
- `crates/g3-computer-control/src/platform/`: platform-specific implementations
|
||||||
|
|||||||
331
README.md
@@ -2,106 +2,14 @@
|
|||||||
|
|
||||||
G3 is a coding AI agent designed to help you complete tasks by writing code and executing commands. Built in Rust, it provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation and task automation capabilities.
|
G3 is a coding AI agent designed to help you complete tasks by writing code and executing commands. Built in Rust, it provides a flexible architecture for interacting with various Large Language Model (LLM) providers while offering powerful code generation and task automation capabilities.
|
||||||
|
|
||||||
## Architecture Overview
|
|
||||||
|
|
||||||
G3 follows a modular architecture organized as a Rust workspace with multiple crates, each responsible for specific functionality:
|
|
||||||
|
|
||||||
### Core Components
|
|
||||||
|
|
||||||
#### **g3-core**
|
|
||||||
The heart of the agent system, containing:
|
|
||||||
- **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
|
|
||||||
- **Context Window Management**: Intelligent tracking of token usage with auto-summarization capabilities when approaching context limits (~80% capacity)
|
|
||||||
- **Tool System**: Built-in tools for file operations (read, write, edit), shell command execution, and structured output generation
|
|
||||||
- **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
|
|
||||||
- **Task Execution**: Support for single and iterative task execution with automatic retry logic
|
|
||||||
|
|
||||||
#### **g3-providers**
|
|
||||||
Abstraction layer for LLM providers:
|
|
||||||
- **Provider Interface**: Common trait-based API for different LLM backends
|
|
||||||
- **Multiple Provider Support**:
|
|
||||||
- Anthropic (Claude models)
|
|
||||||
- Databricks (DBRX and other models)
|
|
||||||
- Local/embedded models via llama.cpp with Metal acceleration on macOS
|
|
||||||
- **OAuth Authentication**: Built-in OAuth flow support for secure provider authentication
|
|
||||||
- **Provider Registry**: Dynamic provider management and selection
|
|
||||||
|
|
||||||
#### **g3-config**
|
|
||||||
Configuration management system:
|
|
||||||
- Environment-based configuration
|
|
||||||
- Provider credentials and settings
|
|
||||||
- Model selection and parameters
|
|
||||||
- Runtime configuration options
|
|
||||||
|
|
||||||
#### **g3-execution**
|
|
||||||
Task execution framework:
|
|
||||||
- Task planning and decomposition
|
|
||||||
- Execution strategies (sequential, parallel)
|
|
||||||
- Error handling and retry mechanisms
|
|
||||||
- Progress tracking and reporting
|
|
||||||
|
|
||||||
#### **g3-cli**
|
|
||||||
Command-line interface:
|
|
||||||
- Interactive terminal interface
|
|
||||||
- Task submission and monitoring
|
|
||||||
- Configuration management commands
|
|
||||||
- Session management
|
|
||||||
|
|
||||||
### Error Handling & Resilience
|
|
||||||
|
|
||||||
G3 includes robust error handling with automatic retry logic:
|
|
||||||
- **Recoverable Error Detection**: Automatically identifies recoverable errors (rate limits, network issues, server errors, timeouts)
|
|
||||||
- **Exponential Backoff with Jitter**: Implements intelligent retry delays to avoid overwhelming services
|
|
||||||
- **Detailed Error Logging**: Captures comprehensive error context including stack traces, request/response data, and session information
|
|
||||||
- **Error Persistence**: Saves detailed error logs to `logs/errors/` for post-mortem analysis
|
|
||||||
- **Graceful Degradation**: Non-recoverable errors are logged with full context before terminating
|
|
||||||
|
|
||||||
## Key Features
|
## Key Features
|
||||||
|
|
||||||
### Intelligent Context Management
|
- **Multiple LLM Providers**: Anthropic (Claude), Databricks, OpenAI, and local models via llama.cpp
|
||||||
- Automatic context window monitoring with percentage-based tracking
|
- **Autonomous Mode**: Coach-player feedback loop for complex tasks
|
||||||
- Smart auto-summarization when approaching token limits
|
- **Intelligent Context Management**: Auto-summarization and context thinning at 50-80% thresholds
|
||||||
- Conversation history preservation through summaries
|
- **Rich Tool Ecosystem**: File operations, shell commands, computer control, browser automation
|
||||||
- Dynamic token allocation for different providers
|
- **Streaming Responses**: Real-time output with tool call detection
|
||||||
|
- **Error Recovery**: Automatic retry logic with exponential backoff
|
||||||
### Tool Ecosystem
|
|
||||||
- **File Operations**: Read, write, and edit files with line-range precision
|
|
||||||
- **Shell Integration**: Execute system commands with output capture
|
|
||||||
- **Code Generation**: Structured code generation with syntax awareness
|
|
||||||
- **Final Output**: Formatted result presentation
|
|
||||||
|
|
||||||
### Provider Flexibility
|
|
||||||
- Support for multiple LLM providers through a unified interface
|
|
||||||
- Hot-swappable providers without code changes
|
|
||||||
- Provider-specific optimizations and feature support
|
|
||||||
- Local model support for offline operation
|
|
||||||
|
|
||||||
### Task Automation
|
|
||||||
- Single-shot task execution for quick operations
|
|
||||||
- Iterative task mode for complex, multi-step workflows
|
|
||||||
- Automatic error recovery and retry logic
|
|
||||||
- Progress tracking and intermediate result handling
|
|
||||||
|
|
||||||
## Language & Technology Stack
|
|
||||||
|
|
||||||
- **Language**: Rust (2021 edition)
|
|
||||||
- **Async Runtime**: Tokio for concurrent operations
|
|
||||||
- **HTTP Client**: Reqwest for API communications
|
|
||||||
- **Serialization**: Serde for JSON handling
|
|
||||||
- **CLI Framework**: Clap for command-line parsing
|
|
||||||
- **Logging**: Tracing for structured logging
|
|
||||||
- **Local Models**: llama.cpp with Metal acceleration support
|
|
||||||
|
|
||||||
## Use Cases
|
|
||||||
|
|
||||||
G3 is designed for:
|
|
||||||
- Automated code generation and refactoring
|
|
||||||
- File manipulation and project scaffolding
|
|
||||||
- System administration tasks
|
|
||||||
- Data processing and transformation
|
|
||||||
- API integration and testing
|
|
||||||
- Documentation generation
|
|
||||||
- Complex multi-step workflows
|
|
||||||
|
|
||||||
## Getting Started
|
## Getting Started
|
||||||
|
|
||||||
@@ -109,21 +17,234 @@ G3 is designed for:
|
|||||||
# Build the project
|
# Build the project
|
||||||
cargo build --release
|
cargo build --release
|
||||||
|
|
||||||
# Run G3
|
# Execute a single task
|
||||||
cargo run
|
|
||||||
|
|
||||||
# Execute a task
|
|
||||||
g3 "implement a function to calculate fibonacci numbers"
|
g3 "implement a function to calculate fibonacci numbers"
|
||||||
|
|
||||||
|
# Start autonomous mode with interactive requirements
|
||||||
|
g3 --autonomous --interactive-requirements
|
||||||
```
|
```
|
||||||
|
|
||||||
## Configuration

Create `~/.config/g3/config.toml`:

```toml
[providers]
default_provider = "databricks"

[providers.anthropic]
api_key = "sk-ant-..."
model = "claude-3-5-sonnet-20241022"
max_tokens = 4096

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-meta-llama-3-1-70b-instruct"
max_tokens = 4096
use_oauth = true

[agent]
max_context_length = 8192
enable_streaming = true

# Optional: Use different models for coach and player in autonomous mode
[autonomous]
coach_provider = "anthropic"
coach_model = "claude-3-5-sonnet-20241022" # Thorough review
player_provider = "databricks"
player_model = "databricks-meta-llama-3-1-70b-instruct" # Fast execution
```

## Autonomous Mode (Coach-Player Loop)

G3 features an autonomous mode where two agents collaborate:
- **Player Agent**: Executes tasks and implements solutions
- **Coach Agent**: Reviews work and provides feedback

### Option 1: Interactive Requirements with AI Enhancement (Recommended)

```bash
g3 --autonomous --interactive-requirements
```

**How it works:**
1. Describe what you want to build (can be brief)
2. Press **Ctrl+D** (Unix/Mac) or **Ctrl+Z** (Windows)
3. AI enhances your input into a structured requirements document
4. Review the enhanced requirements
5. Choose to proceed, edit manually, or cancel
6. If accepted, autonomous mode starts automatically

**Example:**
```
You type: "build a todo app with cli in python"

AI generates:
# Todo List CLI Application

## Overview
A command-line todo list application built in Python...

## Functional Requirements
1. Add tasks with descriptions
2. Mark tasks as complete
3. Delete tasks
...
```

### Option 2: Direct Requirements

```bash
g3 --autonomous --requirements "Build a REST API with CRUD operations for user management"
```

### Option 3: Requirements File

Create `requirements.md` in your workspace:

```markdown
# Project Requirements

1. Create a REST API with user endpoints
2. Use SQLite for storage
3. Include input validation
4. Write unit tests
```

Then run:

```bash
g3 --autonomous
```

### Why Different Models for Coach and Player?

Configure different models in the `[autonomous]` section to:
- **Optimize Cost**: Use cheaper model for execution, expensive for review
- **Optimize Speed**: Use fast model for iteration, thorough for validation
- **Specialize**: Leverage provider strengths (e.g., Claude for analysis, Llama for code)

If not configured, both agents use the `default_provider` and its model.
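
The fallback rule above (role-specific settings win, otherwise the defaults apply) might look roughly like the following. This is a simplified sketch under stated assumptions: the method names `for_coach` and `for_player` come from the changelog, but the flat `Config` struct, its field names, and the `with_overrides` helper are hypothetical and do not mirror the real `g3-config` types.

```rust
/// Hypothetical, flattened stand-in for the `[providers]` and `[autonomous]` tables.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    default_provider: String,
    default_model: String,
    coach_provider: Option<String>,
    coach_model: Option<String>,
    player_provider: Option<String>,
    player_model: Option<String>,
}

impl Config {
    /// Copy the config, replacing the defaults with any overrides that are set.
    /// (Hypothetical helper, not part of the documented API.)
    fn with_overrides(&self, provider: &Option<String>, model: &Option<String>) -> Config {
        let mut cfg = self.clone();
        if let Some(p) = provider {
            cfg.default_provider = p.clone();
        }
        if let Some(m) = model {
            cfg.default_model = m.clone();
        }
        cfg
    }

    /// Role-specific config for the coach; falls back to the defaults.
    fn for_coach(&self) -> Config {
        self.with_overrides(&self.coach_provider, &self.coach_model)
    }

    /// Role-specific config for the player; falls back to the defaults.
    fn for_player(&self) -> Config {
        self.with_overrides(&self.player_provider, &self.player_model)
    }
}
```

With only the coach fields set, `for_coach()` returns a config pointing at the coach provider and model, while `for_player()` returns an unchanged copy of the base config.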

## Command-Line Options

```bash
# Autonomous mode
g3 --autonomous --interactive-requirements
g3 --autonomous --requirements "Your requirements"
g3 --autonomous --max-turns 10

# Single-shot mode
g3 "your task here"

# Options
--workspace <DIR> # Set workspace directory
--provider <NAME> # Override provider (anthropic, databricks, openai)
--model <NAME> # Override model
--quiet # Disable log files
--webdriver # Enable browser automation
--show-prompt # Show system prompt
--show-code # Show generated code
```

## Architecture Overview

G3 is organized as a Rust workspace with multiple crates:

- **g3-core**: Agent engine, context management, tool system, streaming parser
- **g3-providers**: LLM provider abstraction (Anthropic, Databricks, OpenAI, local models)
- **g3-config**: Configuration management
- **g3-execution**: Task execution framework
- **g3-computer-control**: Mouse/keyboard automation, OCR, screenshots
- **g3-cli**: Command-line interface

### Key Capabilities

**Intelligent Context Management**
- Automatic context window monitoring with percentage-based tracking
- Smart auto-summarization when approaching token limits
- Context thinning at 50%, 60%, 70%, 80% thresholds
- Dynamic token allocation (4k to 200k+ tokens)

**Tool Ecosystem**
- File operations (read, write, edit with line-range precision)
- Shell command execution
- TODO management
- Computer control (experimental): mouse, keyboard, OCR, screenshots
- Browser automation via WebDriver (Safari)

**Error Handling**
- Automatic retry logic with exponential backoff
- Recoverable error detection (rate limits, network issues, timeouts)
- Detailed error logging to `logs/errors/`
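
The exponential backoff mentioned in the error-handling list can be illustrated with a short delay calculation. This is a generic sketch of the technique, not G3's retry code: the function name `backoff_delay` and its parameters are assumptions, and jitter is left out so the result stays deterministic.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// Hypothetical helper; a real retry loop would usually add random jitter on top.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt, saturating instead of overflowing for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 500 ms base and a 30 s cap, attempt 0 waits 500 ms, attempt 3 waits 4 s, and attempt 10 is clamped to 30 s.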

## WebDriver Browser Automation

**One-Time Setup** (macOS):

```bash
# Enable Safari Remote Automation
safaridriver --enable # Requires password

# Or via Safari UI:
# Safari → Preferences → Advanced → Show Develop menu
# Then: Develop → Allow Remote Automation
```

**Usage**:

```bash
g3 --webdriver "scrape the top stories from Hacker News"
```

See [docs/webdriver-setup.md](docs/webdriver-setup.md) for detailed setup.

## Computer Control (Experimental)

Enable in config:

```toml
[computer_control]
enabled = true
require_confirmation = true
```

Grant accessibility permissions:
- **macOS**: System Preferences → Security & Privacy → Accessibility
- **Linux**: Ensure X11 or Wayland access
- **Windows**: Run as administrator (first time)

**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`

## Use Cases

- Automated code generation and refactoring
- File manipulation and project scaffolding
- System administration tasks
- Data processing and transformation
- API integration and testing
- Documentation generation
- Complex multi-step workflows
- Desktop application automation
|
|
||||||
## Session Logs
|
## Session Logs
|
||||||
|
|
||||||
G3 automatically saves session logs for each interaction in the `logs/` directory. These logs contain:
|
G3 automatically saves session logs to `logs/` directory:
|
||||||
- Complete conversation history
|
- Complete conversation history
|
||||||
- Token usage statistics
|
- Token usage statistics
|
||||||
- Timestamps and session status
|
- Timestamps and session status
|
||||||
|
|
||||||
The `logs/` directory is created automatically on first use and is excluded from version control.
|
Disable with `--quiet` flag.
|
||||||
|
|
||||||
|
## Technology Stack
|
||||||
|
|
||||||
|
- **Language**: Rust (2021 edition)
|
||||||
|
- **Async Runtime**: Tokio
|
||||||
|
- **HTTP Client**: Reqwest
|
||||||
|
- **Serialization**: Serde
|
||||||
|
- **CLI Framework**: Clap
|
||||||
|
- **Logging**: Tracing
|
||||||
|
- **Local Models**: llama.cpp with Metal acceleration
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
@@ -131,4 +252,4 @@ MIT License - see LICENSE file for details
|
|||||||
|
|
||||||
## Contributing
|
## Contributing
|
||||||
|
|
||||||
G3 is an open-source project. Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
|
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
|
||||||
|
|||||||
@@ -13,3 +13,8 @@ use_oauth = true
|
|||||||
max_context_length = 8192
|
max_context_length = 8192
|
||||||
enable_streaming = true
|
enable_streaming = true
|
||||||
timeout_seconds = 60
|
timeout_seconds = 60
|
||||||
|
|
||||||
|
[computer_control]
|
||||||
|
enabled = false # Set to true to enable computer control (requires OS permissions)
|
||||||
|
require_confirmation = true
|
||||||
|
max_actions_per_second = 5
|
||||||
|
|||||||
@@ -1,7 +1,5 @@
|
|||||||
use anyhow::Result;
|
use anyhow::Result;
|
||||||
use std::time::{Duration, Instant};
|
use std::time::{Duration, Instant};
|
||||||
/// Extract coach feedback by reading from the coach agent's specific log file
|
|
||||||
/// Uses the coach agent's session ID to find the exact log file
|
|
||||||
|
|
||||||
#[derive(Debug, Clone)]
|
#[derive(Debug, Clone)]
|
||||||
struct TurnMetrics {
|
struct TurnMetrics {
|
||||||
@@ -21,7 +19,7 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
|
|||||||
// Find max values for scaling
|
// Find max values for scaling
|
||||||
let max_tokens = turn_metrics.iter().map(|t| t.tokens_used).max().unwrap_or(1);
|
let max_tokens = turn_metrics.iter().map(|t| t.tokens_used).max().unwrap_or(1);
|
||||||
let max_time_ms = turn_metrics.iter()
|
let max_time_ms = turn_metrics.iter()
|
||||||
.map(|t| t.wall_clock_time.as_millis() as u32)
|
.map(|t| t.wall_clock_time.as_millis().min(u32::MAX as u128) as u32)
|
||||||
.max()
|
.max()
|
||||||
.unwrap_or(1);
|
.unwrap_or(1);
|
||||||
|
|
||||||
@@ -35,7 +33,7 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
|
|||||||
histogram.push_str(&format!(" {} = Wall Clock Time (max: {:.1}s)\n\n", TIME_CHAR, max_time_ms as f64 / 1000.0));
|
histogram.push_str(&format!(" {} = Wall Clock Time (max: {:.1}s)\n\n", TIME_CHAR, max_time_ms as f64 / 1000.0));
|
||||||
|
|
||||||
for metrics in turn_metrics {
|
for metrics in turn_metrics {
|
||||||
let turn_time_ms = metrics.wall_clock_time.as_millis() as u32;
|
let turn_time_ms = metrics.wall_clock_time.as_millis().min(u32::MAX as u128) as u32;
|
||||||
|
|
||||||
// Calculate bar lengths (proportional to max values)
|
// Calculate bar lengths (proportional to max values)
|
||||||
let token_bar_len = if max_tokens > 0 {
|
let token_bar_len = if max_tokens > 0 {
|
||||||
@@ -99,13 +97,17 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
|
|||||||
histogram
|
histogram
|
||||||
}
|
}
|
||||||
|
|
||||||
fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_agent: &g3_core::Agent<ConsoleUiWriter>, output: &SimpleOutput) -> Result<String> {
|
/// Extract coach feedback by reading from the coach agent's specific log file
|
||||||
// CORRECT APPROACH: Get the session ID from the current coach agent
|
/// Uses the coach agent's session ID to find the exact log file
|
||||||
// and read its specific log file directly
|
fn extract_coach_feedback_from_logs(
|
||||||
|
coach_result: &g3_core::TaskResult,
|
||||||
|
coach_agent: &g3_core::Agent<ConsoleUiWriter>,
|
||||||
|
output: &SimpleOutput,
|
||||||
|
) -> String {
|
||||||
// Get the coach agent's session ID
|
// Get the coach agent's session ID
|
||||||
let session_id = coach_agent.get_session_id()
|
let session_id = coach_agent
|
||||||
.ok_or_else(|| anyhow::anyhow!("Coach agent has no session ID"))?;
|
.get_session_id()
|
||||||
|
.expect("Coach agent has no session ID");
|
||||||
|
|
||||||
// Construct the log file path for this specific coach session
|
// Construct the log file path for this specific coach session
|
||||||
let logs_dir = std::path::Path::new("logs");
|
let logs_dir = std::path::Path::new("logs");
|
||||||
@@ -118,12 +120,75 @@ fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_a
|
|||||||
if let Some(context_window) = log_json.get("context_window") {
|
if let Some(context_window) = log_json.get("context_window") {
|
||||||
if let Some(conversation_history) = context_window.get("conversation_history") {
|
if let Some(conversation_history) = context_window.get("conversation_history") {
|
||||||
if let Some(messages) = conversation_history.as_array() {
|
if let Some(messages) = conversation_history.as_array() {
|
||||||
// Simply get the last message content - this is the coach's final feedback
|
// Look for the last assistant message (regardless of tool used)
|
||||||
if let Some(last_message) = messages.last() {
|
for message in messages.iter().rev() {
|
||||||
if let Some(content) = last_message.get("content") {
|
if let Some(role) = message.get("role") {
|
||||||
|
if role.as_str() == Some("assistant") {
|
||||||
|
if let Some(content) = message.get("content") {
|
||||||
if let Some(content_str) = content.as_str() {
|
if let Some(content_str) = content.as_str() {
|
||||||
output.print(&format!("✅ Extracted coach feedback from session: {}", session_id));
|
// First, check if this is plain text feedback (no tool call)
|
||||||
return Ok(content_str.to_string());
|
// This happens when the coach returns final feedback directly
|
||||||
|
if !content_str.contains("{\"tool\"") {
|
||||||
|
let trimmed = content_str.trim();
|
||||||
|
if !trimmed.is_empty() {
|
||||||
|
output.print(&format!(
|
||||||
|
"✅ Extracted coach feedback from session: {} ({} chars) [plain text]",
|
||||||
|
session_id,
|
||||||
|
trimmed.len()
|
||||||
|
));
|
||||||
|
return trimmed.to_string();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Look for ANY tool call in the message
|
||||||
|
// Pattern: {"tool": "...", "args": {...}}
|
||||||
|
if let Some(tool_start) = content_str.find("{\"tool\"") {
|
||||||
|
let json_part = &content_str[tool_start..];
|
||||||
|
|
||||||
|
// Find the end of the JSON object
|
||||||
|
if let Some(json_end) = find_json_end(json_part) {
|
||||||
|
let json_str = &json_part[..json_end];
|
||||||
|
|
||||||
|
if let Ok(tool_call) = serde_json::from_str::<serde_json::Value>(json_str) {
|
||||||
|
if let Some(args) = tool_call.get("args") {
|
||||||
|
// Try to extract feedback from different possible fields
|
||||||
|
let feedback = if let Some(summary) = args.get("summary") {
|
||||||
|
// final_output tool uses "summary"
|
||||||
|
summary.as_str().map(|s| s.to_string())
|
||||||
|
} else if let Some(content) = args.get("content") {
|
||||||
|
// todo_write and other tools might use "content"
|
||||||
|
content.as_str().map(|s| s.to_string())
|
||||||
|
} else {
|
||||||
|
// Fallback: use the entire args as JSON string
|
||||||
|
Some(serde_json::to_string_pretty(args).unwrap_or_default())
|
||||||
|
};
|
||||||
|
|
||||||
|
if let Some(feedback_str) = feedback {
|
||||||
|
if !feedback_str.trim().is_empty() {
|
||||||
|
output.print(&format!(
|
||||||
|
"✅ Extracted coach feedback from session: {} ({} chars)",
|
||||||
|
session_id,
|
||||||
|
feedback_str.len()
|
||||||
|
));
|
||||||
|
|
||||||
|
// Validate feedback length
|
||||||
|
if feedback_str.len() < 80 && !feedback_str.contains("IMPLEMENTATION_APPROVED") {
|
||||||
|
panic!(
|
||||||
|
"Coach feedback is too short ({} chars): '{}'",
|
||||||
|
feedback_str.len(),
|
||||||
|
feedback_str
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
return feedback_str;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -134,7 +199,47 @@ fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_a
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
Err(anyhow::anyhow!("Could not extract feedback from coach session: {}", session_id))
|
// If we couldn't extract from logs, panic with detailed error
|
||||||
|
panic!(
|
||||||
|
"CRITICAL: Could not extract coach feedback from session: {}\n\
|
||||||
|
Log file path: {:?}\n\
|
||||||
|
Log file exists: {}\n\
|
||||||
|
This indicates the coach did not call any tool or the log is corrupted.\n\
|
||||||
|
Coach result response length: {} chars",
|
||||||
|
session_id,
|
||||||
|
log_file_path,
|
||||||
|
log_file_path.exists(),
|
||||||
|
coach_result.response.len()
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Helper function to find the end of a JSON object using brace counting
|
||||||
|
fn find_json_end(json_str: &str) -> Option<usize> {
|
||||||
|
let mut depth = 0;
|
||||||
|
let mut in_string = false;
|
||||||
|
let mut escape_next = false;
|
||||||
|
|
||||||
|
for (i, ch) in json_str.char_indices() {
|
||||||
|
if escape_next {
|
||||||
|
escape_next = false;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
match ch {
|
||||||
|
'\\' if in_string => escape_next = true,
|
||||||
|
'"' => in_string = !in_string,
|
||||||
|
'{' if !in_string => depth += 1,
|
||||||
|
'}' if !in_string => {
|
||||||
|
depth -= 1;
|
||||||
|
if depth == 0 {
|
||||||
|
return Some(i + 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
_ => {}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
None
|
||||||
}
|
}
|
||||||
|
|
||||||
use clap::Parser;
|
use clap::Parser;
|
||||||
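Review note on the hunk above: the new extraction path locates a `{"tool"` marker in the assistant message and then uses brace counting to find where that JSON object ends. The sketch below reproduces that idea with the standard library only. `find_json_end` follows the diff; `extract_tool_json` is a hypothetical wrapper added here for illustration, and the real code goes on to parse the slice with `serde_json`.

```rust
/// Returns the byte offset just past the `}` that closes the first
/// JSON object in `s`, or None if the object is never closed.
/// Braces inside string literals are ignored, and a backslash inside
/// a string causes the next character to be skipped.
fn find_json_end(s: &str) -> Option<usize> {
    let mut depth: i32 = 0;
    let mut in_string = false;
    let mut escape_next = false;

    for (i, ch) in s.char_indices() {
        if escape_next {
            escape_next = false;
            continue;
        }
        match ch {
            '\\' if in_string => escape_next = true,
            '"' => in_string = !in_string,
            '{' if !in_string => depth += 1,
            '}' if !in_string => {
                depth -= 1;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            _ => {}
        }
    }
    None
}

/// Hypothetical wrapper (not in the diff): slice out the first
/// `{"tool" ...}` object embedded in free-form message text.
fn extract_tool_json(content: &str) -> Option<&str> {
    let start = content.find("{\"tool\"")?;
    let rest = &content[start..];
    let end = find_json_end(rest)?;
    Some(&rest[..end])
}
```

Tracking string state matters because feedback text can itself contain `}` or escaped quotes; without it the scan could stop inside the `summary` value. A truncated log entry yields `None` rather than a partial slice.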
@@ -197,6 +302,10 @@ pub struct Cli {
|
|||||||
#[arg(long, value_name = "TEXT")]
|
#[arg(long, value_name = "TEXT")]
|
||||||
pub requirements: Option<String>,
|
pub requirements: Option<String>,
|
||||||
|
|
||||||
|
/// Interactive mode: prompt for requirements and save to requirements.md before starting autonomous mode
|
||||||
|
#[arg(long)]
|
||||||
|
pub interactive_requirements: bool,
|
||||||
|
|
||||||
/// Use retro terminal UI (inspired by 80s sci-fi)
|
/// Use retro terminal UI (inspired by 80s sci-fi)
|
||||||
#[arg(long)]
|
#[arg(long)]
|
||||||
pub retro: bool,
|
pub retro: bool,
|
||||||
@@ -216,6 +325,10 @@ pub struct Cli {
|
|||||||
/// Disable log file creation (no logs/ directory or session logs)
|
/// Disable log file creation (no logs/ directory or session logs)
|
||||||
#[arg(long)]
|
#[arg(long)]
|
||||||
pub quiet: bool,
|
pub quiet: bool,
|
||||||
|
|
||||||
|
/// Enable WebDriver tools for browser automation (Safari)
|
||||||
|
#[arg(long)]
|
||||||
|
pub webdriver: bool,
|
||||||
}
|
}
|
||||||
|
|
||||||
pub async fn run() -> Result<()> {
|
pub async fn run() -> Result<()> {
|
||||||
@@ -284,6 +397,113 @@ pub async fn run() -> Result<()> {
|
|||||||
|
|
||||||
// Create project model
|
// Create project model
|
||||||
let project = if cli.autonomous {
|
let project = if cli.autonomous {
|
||||||
|
// Handle interactive requirements mode with AI enhancement
|
||||||
|
if cli.interactive_requirements {
|
||||||
|
println!("\n📝 Interactive Requirements Mode");
|
||||||
|
println!("================================\n");
|
||||||
|
println!("Describe what you want to build (can be brief):");
|
||||||
|
println!("Press Ctrl+D (Unix) or Ctrl+Z then Enter (Windows) when done.\n");
|
||||||
|
|
||||||
|
use std::io::{self, Read, Write};
|
||||||
|
let mut requirements_input = String::new();
|
||||||
|
io::stdin().read_to_string(&mut requirements_input)?;
|
||||||
|
|
||||||
|
if requirements_input.trim().is_empty() {
|
||||||
|
anyhow::bail!("No requirements provided. Exiting.");
|
||||||
|
}
|
||||||
|
|
||||||
|
println!("\n🤖 Enhancing your requirements with AI...\n");
|
||||||
|
|
||||||
|
// Create a temporary agent to enhance the requirements
|
||||||
|
let temp_config = Config::load_with_overrides(
|
||||||
|
cli.config.as_deref(),
|
||||||
|
cli.provider.clone(),
|
||||||
|
cli.model.clone(),
|
||||||
|
)?;
|
||||||
|
|
||||||
|
// Create a simple output writer for the enhancement task
|
||||||
|
let ui_writer = ConsoleUiWriter::new();
|
||||||
|
let mut temp_agent = Agent::new_with_readme_and_quiet(
|
||||||
|
temp_config,
|
||||||
|
ui_writer,
|
||||||
|
None,
|
||||||
|
true, // quiet mode for enhancement
|
||||||
|
).await?;
|
||||||
|
|
||||||
|
// Create enhancement prompt
|
||||||
|
let enhancement_prompt = format!(
|
||||||
|
r#"Convert the following user input into a well-structured requirements.md document.
|
||||||
|
|
||||||
|
User Input:
|
||||||
|
{}
|
||||||
|
|
||||||
|
Create a professional requirements document with:
|
||||||
|
1. A clear project title (# heading)
|
||||||
|
2. An overview section explaining what will be built
|
||||||
|
3. Organized requirements (functional, technical, quality)
|
||||||
|
4. Acceptance criteria
|
||||||
|
5. Any technical constraints or preferences mentioned
|
||||||
|
|
||||||
|
Format as proper markdown. Be specific and actionable. If the user's input is vague, make reasonable assumptions but keep it focused on what they described.
|
||||||
|
|
||||||
|
Output ONLY the markdown content, no explanations or meta-commentary."#,
|
||||||
|
requirements_input.trim()
|
||||||
|
);
|
||||||
|
|
||||||
|
// Execute enhancement task
|
||||||
|
let result = temp_agent
|
||||||
|
.execute_task_with_timing(&enhancement_prompt, None, false, false, false, false)
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
let enhanced_requirements = result.response.trim().to_string();
|
||||||
|
|
||||||
|
// Show the enhanced requirements
|
||||||
|
println!("\n📋 Enhanced Requirements Document:");
|
||||||
|
println!("{}\n", "=".repeat(60));
|
||||||
|
println!("{}", enhanced_requirements);
|
||||||
|
println!("{}\n", "=".repeat(60));
|
||||||
|
|
||||||
|
// Ask for confirmation
|
||||||
|
println!("\n❓ Is this requirements document acceptable?");
|
||||||
|
println!(" [y] Yes, proceed with autonomous mode");
|
||||||
|
println!(" [e] Edit and save manually");
|
||||||
|
println!(" [n] No, cancel\n");
|
||||||
|
|
||||||
|
print!("Your choice (y/e/n): ");
|
||||||
|
io::stdout().flush()?;
|
||||||
|
|
||||||
|
let mut choice = String::new();
|
||||||
|
io::stdin().read_line(&mut choice)?;
|
||||||
|
let choice = choice.trim().to_lowercase();
|
||||||
|
|
||||||
|
let requirements_path = workspace_dir.join("requirements.md");
|
||||||
|
|
||||||
|
match choice.as_str() {
|
||||||
|
"y" | "yes" => {
|
||||||
|
// Save enhanced requirements
|
||||||
|
std::fs::write(&requirements_path, &enhanced_requirements)?;
|
||||||
|
println!("\n✅ Requirements saved to: {}", requirements_path.display());
|
||||||
|
println!("🚀 Starting autonomous mode...\n");
|
||||||
|
}
|
||||||
|
"e" | "edit" => {
|
||||||
|
// Save enhanced requirements for manual editing
|
||||||
|
std::fs::write(&requirements_path, &enhanced_requirements)?;
|
||||||
|
println!("\n✅ Requirements saved to: {}", requirements_path.display());
|
||||||
|
println!("📝 Please edit the file and run: g3 --autonomous");
|
||||||
|
println!(" Exiting for now.\n");
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
"n" | "no" => {
|
||||||
|
println!("\n❌ Cancelled. No files were saved.\n");
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
_ => {
|
||||||
|
println!("\n❌ Invalid choice. Cancelled.\n");
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
if let Some(requirements_text) = cli.requirements {
|
if let Some(requirements_text) = cli.requirements {
|
||||||
// Use requirements text override
|
// Use requirements text override
|
||||||
Project::new_autonomous_with_requirements(workspace_dir.clone(), requirements_text)?
|
Project::new_autonomous_with_requirements(workspace_dir.clone(), requirements_text)?
|
||||||
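Review note on the hunk above: the confirmation prompt maps a free-form answer onto three outcomes plus a fallback. One way to make that mapping testable in isolation is to pull it into a small pure function. The `Choice` enum and `parse_choice` below are hypothetical names for illustration only; the diff matches on the string directly inside `run()`.

```rust
/// Hypothetical enum (not in the diff) naming the prompt outcomes.
#[derive(Debug, PartialEq)]
enum Choice {
    Accept,  // save requirements.md and continue into autonomous mode
    Edit,    // save requirements.md and exit so the user can edit it
    Cancel,  // exit without saving
    Invalid, // anything else, including an empty line
}

/// Normalize the raw line the same way the diff does
/// (trim, then lowercase) and classify it.
fn parse_choice(input: &str) -> Choice {
    match input.trim().to_lowercase().as_str() {
        "y" | "yes" => Choice::Accept,
        "e" | "edit" => Choice::Edit,
        "n" | "no" => Choice::Cancel,
        _ => Choice::Invalid,
    }
}
```

Note that an empty answer falls into `Invalid`. That case may be worth checking in practice, because the requirements text is read with `read_to_string` until end of input; if stdin is a pipe rather than a terminal, the later `read_line` would presumably return an empty string and the run would be cancelled.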
@@ -304,19 +524,25 @@ pub async fn run() -> Result<()> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Load configuration with CLI overrides
|
// Load configuration with CLI overrides
|
||||||
let config = Config::load_with_overrides(
|
let mut config = Config::load_with_overrides(
|
||||||
cli.config.as_deref(),
|
cli.config.as_deref(),
|
||||||
cli.provider.clone(),
|
cli.provider.clone(),
|
||||||
cli.model.clone(),
|
cli.model.clone(),
|
||||||
)?;
|
)?;
|
||||||
|
|
||||||
|
// Override webdriver setting from CLI flag
|
||||||
|
if cli.webdriver {
|
||||||
|
config.webdriver.enabled = true;
|
||||||
|
}
|
||||||
|
|
||||||
// Validate provider if specified
|
// Validate provider if specified
|
||||||
if let Some(ref provider) = cli.provider {
|
if let Some(ref provider) = cli.provider {
|
||||||
let valid_providers = ["anthropic", "databricks", "embedded", "openai"];
|
let valid_providers = ["anthropic", "databricks", "embedded", "openai"];
|
||||||
if !valid_providers.contains(&provider.as_str()) {
|
if !valid_providers.contains(&provider.as_str()) {
|
||||||
return Err(anyhow::anyhow!(
|
return Err(anyhow::anyhow!(
|
||||||
"Invalid provider '{}'. Valid options: {:?}",
|
"Invalid provider '{}'. Valid options: {:?}",
|
||||||
provider, valid_providers
|
provider,
|
||||||
|
valid_providers
|
||||||
));
|
));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -335,9 +561,22 @@ pub async fn run() -> Result<()> {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut agent = if cli.autonomous {
|
let mut agent = if cli.autonomous {
|
||||||
Agent::new_autonomous_with_readme_and_quiet(config.clone(), ui_writer, combined_content.clone(), cli.quiet).await?
|
Agent::new_autonomous_with_readme_and_quiet(
|
||||||
|
// Use player-specific config in autonomous mode
|
||||||
|
config.for_player()?,
|
||||||
|
ui_writer,
|
||||||
|
combined_content.clone(),
|
||||||
|
cli.quiet,
|
||||||
|
)
|
||||||
|
.await?
|
||||||
} else {
|
} else {
|
||||||
Agent::new_with_readme_and_quiet(config.clone(), ui_writer, combined_content.clone(), cli.quiet).await?
|
Agent::new_with_readme_and_quiet(
|
||||||
|
config.clone(),
|
||||||
|
ui_writer,
|
||||||
|
combined_content.clone(),
|
||||||
|
cli.quiet,
|
||||||
|
)
|
||||||
|
.await?
|
||||||
};
|
};
|
||||||
|
|
||||||
// Execute task, autonomous mode, or start interactive mode
|
// Execute task, autonomous mode, or start interactive mode
|
||||||
@@ -1119,7 +1358,10 @@ async fn run_autonomous(
|
|||||||
output.print("❌ Error: requirements.md not found in workspace directory");
|
output.print("❌ Error: requirements.md not found in workspace directory");
|
||||||
output.print(" Please either:");
|
output.print(" Please either:");
|
||||||
output.print(" 1. Create a requirements.md file with your project requirements at:");
|
output.print(" 1. Create a requirements.md file with your project requirements at:");
|
||||||
output.print(&format!(" {}/requirements.md", project.workspace().display()));
|
output.print(&format!(
|
||||||
|
" {}/requirements.md",
|
||||||
|
project.workspace().display()
|
||||||
|
));
|
||||||
output.print(" 2. Or use the --requirements flag to provide requirements text directly:");
|
output.print(" 2. Or use the --requirements flag to provide requirements text directly:");
|
||||||
output.print(" g3 --autonomous --requirements \"Your requirements here\"");
|
output.print(" g3 --autonomous --requirements \"Your requirements here\"");
|
||||||
output.print("");
|
output.print("");
|
||||||
@@ -1228,6 +1470,10 @@ async fn run_autonomous(
|
|||||||
loop {
|
loop {
|
||||||
let turn_start_time = Instant::now();
|
let turn_start_time = Instant::now();
|
||||||
let turn_start_tokens = agent.get_context_window().used_tokens;
|
let turn_start_tokens = agent.get_context_window().used_tokens;
|
||||||
|
|
||||||
|
// Reset filter suppression state at the start of each turn
|
||||||
|
g3_core::fixed_filter_json::reset_fixed_json_tool_state();
|
||||||
|
|
||||||
// Skip player turn if it's the first turn and implementation files exist
|
// Skip player turn if it's the first turn and implementation files exist
|
||||||
if !(turn == 1 && skip_first_player) {
|
if !(turn == 1 && skip_first_player) {
|
||||||
output.print(&format!(
|
output.print(&format!(
|
||||||
@@ -1254,11 +1500,17 @@ async fn run_autonomous(
|
|||||||
// If there's no coach feedback on subsequent turns, this is an error
|
// If there's no coach feedback on subsequent turns, this is an error
|
||||||
if coach_feedback.is_empty() {
|
if coach_feedback.is_empty() {
|
||||||
if turn > 1 {
|
if turn > 1 {
|
||||||
return Err(anyhow::anyhow!("Player mode error: No coach feedback received on turn {}", turn));
|
return Err(anyhow::anyhow!(
|
||||||
|
"Player mode error: No coach feedback received on turn {}",
|
||||||
|
turn
|
||||||
|
));
|
||||||
}
|
}
|
||||||
output.print("📋 Player starting initial implementation (no prior coach feedback)");
|
output.print("📋 Player starting initial implementation (no prior coach feedback)");
|
||||||
} else {
|
} else {
|
||||||
output.print(&format!("📋 Player received coach feedback ({} chars):", coach_feedback.len()));
|
output.print(&format!(
|
||||||
|
"📋 Player received coach feedback ({} chars):",
|
||||||
|
coach_feedback.len()
|
||||||
|
));
|
||||||
output.print(&format!("{}", coach_feedback));
|
output.print(&format!("{}", coach_feedback));
|
||||||
}
|
}
|
||||||
output.print(""); // Empty line for readability
|
output.print(""); // Empty line for readability
|
||||||
@@ -1356,7 +1608,7 @@ async fn run_autonomous(
|
|||||||
));
|
));
|
||||||
// Record turn metrics before incrementing
|
// Record turn metrics before incrementing
|
||||||
let turn_duration = turn_start_time.elapsed();
|
let turn_duration = turn_start_time.elapsed();
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
turn_metrics.push(TurnMetrics {
|
turn_metrics.push(TurnMetrics {
|
||||||
turn_number: turn,
|
turn_number: turn,
|
||||||
tokens_used: turn_tokens,
|
tokens_used: turn_tokens,
|
||||||
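Review note on the hunk above: the per-turn token count switches from `-` to `saturating_sub`. A minimal sketch of why that matters, assuming the counter is an unsigned integer (shown as `u32` here; the actual field type is not visible in this diff) and assuming the context window's `used_tokens` can shrink during a turn, for example if the context is compacted. `turn_tokens` is a hypothetical helper name.

```rust
/// Hypothetical helper: tokens consumed during a turn, never negative.
/// If the end-of-turn counter is lower than the start-of-turn counter,
/// the result is 0 rather than an arithmetic overflow.
fn turn_tokens(start: u32, end: u32) -> u32 {
    end.saturating_sub(start)
}
```

With plain subtraction on unsigned integers, `end < start` panics in debug builds and wraps to a very large value in release builds, which would distort the histogram's scale. `checked_sub` is an alternative when the caller wants to detect and log that case instead of flattening it to zero.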
@@ -1382,9 +1634,15 @@ async fn run_autonomous(
|
|||||||
|
|
||||||
// Create a new agent instance for coach mode to ensure fresh context
|
// Create a new agent instance for coach mode to ensure fresh context
|
||||||
// Use the same config with overrides that was passed to the player agent
|
// Use the same config with overrides that was passed to the player agent
|
||||||
let config = agent.get_config().clone();
|
let base_config = agent.get_config().clone();
|
||||||
|
let coach_config = base_config.for_coach()?;
|
||||||
|
|
||||||
|
// Reset filter suppression state before creating coach agent
|
||||||
|
g3_core::fixed_filter_json::reset_fixed_json_tool_state();
|
||||||
|
|
||||||
let ui_writer = ConsoleUiWriter::new();
|
let ui_writer = ConsoleUiWriter::new();
|
||||||
let mut coach_agent = Agent::new_autonomous_with_readme_and_quiet(config, ui_writer, None, quiet).await?;
|
let mut coach_agent =
|
||||||
|
Agent::new_autonomous_with_readme_and_quiet(coach_config, ui_writer, None, quiet).await?;
|
||||||
|
|
||||||
// Ensure coach agent is also in the workspace directory
|
// Ensure coach agent is also in the workspace directory
|
||||||
project.enter_workspace()?;
|
project.enter_workspace()?;
|
||||||
@@ -1414,13 +1672,13 @@ CRITICAL INSTRUCTIONS:
|
|||||||
3. Focus ONLY on what needs to be fixed or improved
|
3. Focus ONLY on what needs to be fixed or improved
|
||||||
4. Do NOT include your analysis process, file contents, or compilation output in the summary
|
4. Do NOT include your analysis process, file contents, or compilation output in the summary
|
||||||
|
|
||||||
If the implementation correctly meets all requirements and compiles without errors:
|
If the implementation generally meets all requirements and compiles without errors:
|
||||||
- Call final_output with summary: 'IMPLEMENTATION_APPROVED'
|
- Call final_output with summary: 'IMPLEMENTATION_APPROVED'
|
||||||
|
|
||||||
If improvements are needed:
|
If improvements are needed:
|
||||||
- Call final_output with a brief summary listing ONLY the specific issues to fix
|
- Call final_output with a brief summary listing ONLY the specific issues to fix
|
||||||
|
|
||||||
Remember: Be thorough in your review but concise in your feedback. APPROVE if the implementation works and generally fits the requirements.",
|
Remember: Be clear in your review and concise in your feedback. APPROVE if the implementation works and generally fits the requirements. Don't be picky.",
|
||||||
requirements
|
requirements
|
||||||
);
|
);
|
||||||
|
|
||||||
@@ -1511,7 +1769,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
||||||
// Record turn metrics before incrementing
|
// Record turn metrics before incrementing
|
||||||
let turn_duration = turn_start_time.elapsed();
|
let turn_duration = turn_start_time.elapsed();
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
turn_metrics.push(TurnMetrics {
|
turn_metrics.push(TurnMetrics {
|
||||||
turn_number: turn,
|
turn_number: turn,
|
||||||
tokens_used: turn_tokens,
|
tokens_used: turn_tokens,
|
||||||
@@ -1531,7 +1789,8 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
let coach_result = coach_result_opt.unwrap();
|
let coach_result = coach_result_opt.unwrap();
|
||||||
|
|
||||||
// Extract the complete coach feedback from final_output
|
// Extract the complete coach feedback from final_output
|
||||||
let coach_feedback_text = extract_coach_feedback_from_logs(&coach_result, &coach_agent, &output)?;
|
let coach_feedback_text =
|
||||||
|
extract_coach_feedback_from_logs(&coach_result, &coach_agent, &output);
|
||||||
|
|
||||||
// Log the size of the feedback for debugging
|
// Log the size of the feedback for debugging
|
||||||
info!(
|
info!(
|
||||||
@@ -1546,7 +1805,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
||||||
// Record turn metrics before incrementing
|
// Record turn metrics before incrementing
|
||||||
let turn_duration = turn_start_time.elapsed();
|
let turn_duration = turn_start_time.elapsed();
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
turn_metrics.push(TurnMetrics {
|
turn_metrics.push(TurnMetrics {
|
||||||
turn_number: turn,
|
turn_number: turn,
|
||||||
tokens_used: turn_tokens,
|
tokens_used: turn_tokens,
|
||||||
@@ -1558,6 +1817,15 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
|
|
||||||
output.print_smart(&format!("Coach feedback:\n{}", coach_feedback_text));
|
output.print_smart(&format!("Coach feedback:\n{}", coach_feedback_text));
|
||||||
|
|
||||||
|
// Record turn metrics before checking for approval or max turns
|
||||||
|
let turn_duration = turn_start_time.elapsed();
|
||||||
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
|
turn_metrics.push(TurnMetrics {
|
||||||
|
turn_number: turn,
|
||||||
|
tokens_used: turn_tokens,
|
||||||
|
wall_clock_time: turn_duration,
|
||||||
|
});
|
||||||
|
|
||||||
// Check if coach approved the implementation
|
// Check if coach approved the implementation
|
||||||
if coach_result.is_approved() || coach_feedback_text.contains("IMPLEMENTATION_APPROVED") {
|
if coach_result.is_approved() || coach_feedback_text.contains("IMPLEMENTATION_APPROVED") {
|
||||||
output.print("\n=== SESSION COMPLETED - IMPLEMENTATION APPROVED ===");
|
output.print("\n=== SESSION COMPLETED - IMPLEMENTATION APPROVED ===");
|
||||||
@@ -1566,6 +1834,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Turn counter is incremented further below, after the max-turns check
|
||||||
// Check if we've reached max turns
|
// Check if we've reached max turns
|
||||||
if turn >= max_turns {
|
if turn >= max_turns {
|
||||||
output.print("\n=== SESSION COMPLETED - MAX TURNS REACHED ===");
|
output.print("\n=== SESSION COMPLETED - MAX TURNS REACHED ===");
|
||||||
@@ -1575,14 +1844,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
|
|
||||||
// Store coach feedback for next iteration
|
// Store coach feedback for next iteration
|
||||||
coach_feedback = coach_feedback_text;
|
coach_feedback = coach_feedback_text;
|
||||||
// Record turn metrics before incrementing
|
|
||||||
let turn_duration = turn_start_time.elapsed();
|
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
|
||||||
turn_metrics.push(TurnMetrics {
|
|
||||||
turn_number: turn,
|
|
||||||
tokens_used: turn_tokens,
|
|
||||||
wall_clock_time: turn_duration,
|
|
||||||
});
|
|
||||||
turn += 1;
|
turn += 1;
|
||||||
|
|
||||||
output.print("🔄 Coach provided feedback for next iteration");
|
output.print("🔄 Coach provided feedback for next iteration");
|
||||||
|
|||||||
@@ -10,6 +10,7 @@ pub struct ConsoleUiWriter {
|
|||||||
current_tool_args: Mutex<Vec<(String, String)>>,
|
current_tool_args: Mutex<Vec<(String, String)>>,
|
||||||
current_output_line: Mutex<Option<String>>,
|
current_output_line: Mutex<Option<String>>,
|
||||||
output_line_printed: Mutex<bool>,
|
output_line_printed: Mutex<bool>,
|
||||||
|
in_todo_tool: Mutex<bool>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl ConsoleUiWriter {
|
impl ConsoleUiWriter {
|
||||||
@@ -19,6 +20,60 @@ impl ConsoleUiWriter {
|
|||||||
current_tool_args: Mutex::new(Vec::new()),
|
current_tool_args: Mutex::new(Vec::new()),
|
||||||
current_output_line: Mutex::new(None),
|
current_output_line: Mutex::new(None),
|
||||||
output_line_printed: Mutex::new(false),
|
output_line_printed: Mutex::new(false),
|
||||||
|
in_todo_tool: Mutex::new(false),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn print_todo_line(&self, line: &str) {
|
||||||
|
// Transform and print todo list lines elegantly
|
||||||
|
let trimmed = line.trim();
|
||||||
|
|
||||||
|
// Skip the "📝 TODO list:" prefix line
|
||||||
|
if trimmed.starts_with("📝 TODO list:") || trimmed == "📝 TODO list is empty" {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Handle empty lines
|
||||||
|
if trimmed.is_empty() {
|
||||||
|
println!();
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Detect indentation level
|
||||||
|
let indent_count = line.chars().take_while(|c| c.is_whitespace()).count();
|
||||||
|
let indent = " ".repeat(indent_count / 2); // Convert spaces to visual indent
|
||||||
|
|
||||||
|
// Format based on line type
|
||||||
|
if trimmed.starts_with("- [ ]") {
|
||||||
|
// Incomplete task
|
||||||
|
let task = trimmed.strip_prefix("- [ ]").unwrap_or(trimmed).trim();
|
||||||
|
println!("{}☐ {}", indent, task);
|
||||||
|
} else if trimmed.starts_with("- [x]") || trimmed.starts_with("- [X]") {
|
||||||
|
// Completed task
|
||||||
|
let task = trimmed.strip_prefix("- [x]")
|
||||||
|
.or_else(|| trimmed.strip_prefix("- [X]"))
|
||||||
|
.unwrap_or(trimmed)
|
||||||
|
.trim();
|
||||||
|
println!("{}\x1b[2m☑ {}\x1b[0m", indent, task);
|
||||||
|
} else if trimmed.starts_with("- ") {
|
||||||
|
// Regular bullet point
|
||||||
|
let item = trimmed.strip_prefix("- ").unwrap_or(trimmed).trim();
|
||||||
|
println!("{}• {}", indent, item);
|
||||||
|
} else if trimmed.starts_with("# ") {
|
||||||
|
// Heading
|
||||||
|
let heading = trimmed.strip_prefix("# ").unwrap_or(trimmed).trim();
|
||||||
|
println!("\n\x1b[1m{}\x1b[0m", heading);
|
||||||
|
} else if trimmed.starts_with("## ") {
|
||||||
|
// Subheading
|
||||||
|
let subheading = trimmed.strip_prefix("## ").unwrap_or(trimmed).trim();
|
||||||
|
println!("\n\x1b[1m{}\x1b[0m", subheading);
|
||||||
|
} else if trimmed.starts_with("**") && trimmed.ends_with("**") {
|
||||||
|
// Bold text (section marker)
|
||||||
|
let text = trimmed.trim_start_matches("**").trim_end_matches("**");
|
||||||
|
println!("{}\x1b[1m{}\x1b[0m", indent, text);
|
||||||
|
} else {
|
||||||
|
// Regular text or note
|
||||||
|
println!("{}{}", indent, trimmed);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
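Review note on the hunk above: `print_todo_line` pairs each `starts_with` check with a `strip_prefix(...).unwrap_or(trimmed)`. A simplified sketch of the same checkbox and bullet handling using `if let Some(..) = strip_prefix(..)` is shown below. `format_todo_line` is a hypothetical pure function that returns the string instead of printing it, and it deliberately leaves out the indentation, heading, and bold handling from the diff.

```rust
/// Hypothetical, simplified variant of the todo formatting in the diff.
/// Returns the rendered line instead of printing it; indentation and
/// heading handling are omitted.
fn format_todo_line(line: &str) -> String {
    let trimmed = line.trim();
    if let Some(task) = trimmed.strip_prefix("- [ ]") {
        // Incomplete task
        format!("☐ {}", task.trim())
    } else if let Some(task) = trimmed
        .strip_prefix("- [x]")
        .or_else(|| trimmed.strip_prefix("- [X]"))
    {
        // Completed task, rendered dim via ANSI escape codes
        format!("\x1b[2m☑ {}\x1b[0m", task.trim())
    } else if let Some(item) = trimmed.strip_prefix("- ") {
        // Regular bullet point
        format!("• {}", item.trim())
    } else {
        // Regular text or note
        trimmed.to_string()
    }
}
```

Returning a `String` keeps the formatting rules unit-testable without capturing stdout; the caller can still `println!` the result.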
@@ -53,6 +108,15 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
// Store the tool name and clear args for collection
|
// Store the tool name and clear args for collection
|
||||||
*self.current_tool_name.lock().unwrap() = Some(tool_name.to_string());
|
*self.current_tool_name.lock().unwrap() = Some(tool_name.to_string());
|
||||||
self.current_tool_args.lock().unwrap().clear();
|
self.current_tool_args.lock().unwrap().clear();
|
||||||
|
|
||||||
|
// Check if this is a todo tool call
|
||||||
|
let is_todo = tool_name == "todo_read" || tool_name == "todo_write";
|
||||||
|
*self.in_todo_tool.lock().unwrap() = is_todo;
|
||||||
|
|
||||||
|
// For todo tools, we'll skip the normal header and print a custom one later
|
||||||
|
if is_todo {
|
||||||
|
return;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_arg(&self, key: &str, value: &str) {
|
fn print_tool_arg(&self, key: &str, value: &str) {
|
||||||
@@ -75,6 +139,12 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_output_header(&self) {
|
fn print_tool_output_header(&self) {
|
||||||
|
// Skip normal header for todo tools
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
println!(); // Just add a newline
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
println!();
|
println!();
|
||||||
// Now print the tool header with the most important arg in bold green
|
// Now print the tool header with the most important arg in bold green
|
||||||
if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
|
if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
|
||||||
@@ -115,8 +185,8 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
String::new()
|
String::new()
|
||||||
};
|
};
|
||||||
|
|
||||||
// Print with bold green formatting using ANSI escape codes
|
// Print with bold green tool name, purple (non-bold) for pipe and args
|
||||||
println!("┌─\x1b[1;32m {} | {}{}\x1b[0m", tool_name, display_value, header_suffix);
|
println!("┌─\x1b[1;32m {}\x1b[0m\x1b[35m | {}{}\x1b[0m", tool_name, display_value, header_suffix);
|
||||||
} else {
|
} else {
|
||||||
// Print with bold green formatting using ANSI escape codes
|
// Print with bold green formatting using ANSI escape codes
|
||||||
println!("┌─\x1b[1;32m {}\x1b[0m", tool_name);
|
println!("┌─\x1b[1;32m {}\x1b[0m", tool_name);
|
||||||
@@ -144,10 +214,21 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_output_line(&self, line: &str) {
|
fn print_tool_output_line(&self, line: &str) {
|
||||||
|
// Special handling for todo tools
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
self.print_todo_line(line);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
println!("│ \x1b[2m{}\x1b[0m", line);
|
println!("│ \x1b[2m{}\x1b[0m", line);
|
||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_output_summary(&self, count: usize) {
|
fn print_tool_output_summary(&self, count: usize) {
|
||||||
|
// Skip for todo tools
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
println!(
|
println!(
|
||||||
"│ \x1b[2m({} line{})\x1b[0m",
|
"│ \x1b[2m({} line{})\x1b[0m",
|
||||||
count,
|
count,
|
||||||
@@ -156,7 +237,55 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_timing(&self, duration_str: &str) {
|
fn print_tool_timing(&self, duration_str: &str) {
|
||||||
println!("└─ ⚡️ {}", duration_str);
|
// For todo tools, skip the timing line: print a blank line and reset the todo flag
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
println!();
|
||||||
|
*self.in_todo_tool.lock().unwrap() = false;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse the duration string to determine color
|
||||||
|
// Format is like "1.5s", "500ms", "2m 30.0s"
|
||||||
|
let color_code = if duration_str.ends_with("ms") {
|
||||||
|
// Milliseconds - use default color (< 1s)
|
||||||
|
""
|
||||||
|
} else if duration_str.contains('m') {
|
||||||
|
// Contains minutes
|
||||||
|
// Extract minutes value
|
||||||
|
if let Some(m_pos) = duration_str.find('m') {
|
||||||
|
if let Ok(minutes) = duration_str[..m_pos].trim().parse::<u32>() {
|
||||||
|
if minutes >= 5 {
|
||||||
|
"\x1b[31m" // Red for >= 5 minutes
|
||||||
|
} else {
|
||||||
|
"\x1b[38;5;208m" // Orange for >= 1 minute but < 5 minutes
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color if parsing fails
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color if 'm' not found (shouldn't happen)
|
||||||
|
}
|
||||||
|
} else if duration_str.ends_with('s') {
|
||||||
|
// Seconds only
|
||||||
|
if let Some(s_value) = duration_str.strip_suffix('s') {
|
||||||
|
if let Ok(seconds) = s_value.trim().parse::<f64>() {
|
||||||
|
if seconds >= 1.0 {
|
||||||
|
"\x1b[33m" // Yellow for >= 1 second
|
||||||
|
} else {
|
||||||
|
"" // Default color for < 1 second
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color if parsing fails
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// Unrecognized format (milliseconds are handled above) - use default color
|
||||||
|
""
|
||||||
|
};
|
||||||
|
|
||||||
|
println!("└─ ⚡️ {}{}\x1b[0m", color_code, duration_str);
|
||||||
println!();
|
println!();
|
||||||
// Clear the stored tool info
|
// Clear the stored tool info
|
||||||
*self.current_tool_name.lock().unwrap() = None;
|
*self.current_tool_name.lock().unwrap() = None;
|
||||||
|
|||||||
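Review note: the color selection in `print_tool_timing` could be pulled out into a pure function so it can be unit-tested without a terminal. The sketch below is a hypothetical refactor, not code from this branch; the name `timing_color` is an assumption, and the thresholds and escape codes are copied from the hunk above.

```rust
/// Pick an ANSI color prefix for a duration string such as
/// "500ms", "1.5s" or "2m 30.0s". Returns "" for the default color.
/// Hypothetical helper: the name is illustrative, not part of the diff.
fn timing_color(duration_str: &str) -> &'static str {
    const YELLOW: &str = "\x1b[33m";
    const ORANGE: &str = "\x1b[38;5;208m";
    const RED: &str = "\x1b[31m";

    // Sub-second durations keep the default color.
    if duration_str.ends_with("ms") {
        return "";
    }
    // "<minutes>m <seconds>s": color by the minutes component.
    if let Some((minutes, _rest)) = duration_str.split_once('m') {
        return match minutes.trim().parse::<u32>() {
            Ok(m) if m >= 5 => RED,
            Ok(_) => ORANGE,
            Err(_) => "",
        };
    }
    // "<seconds>s": yellow from one second upward.
    if let Some(seconds) = duration_str.strip_suffix('s') {
        return match seconds.trim().parse::<f64>() {
            Ok(s) if s >= 1.0 => YELLOW,
            _ => "",
        };
    }
    // Anything else falls back to the default color.
    ""
}
```

With a helper like this, the method body would reduce to the todo check plus a single `println!("└─ ⚡️ {}{}\x1b[0m", timing_color(duration_str), duration_str);`.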
46
crates/g3-computer-control/Cargo.toml
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
[package]
|
||||||
|
name = "g3-computer-control"
|
||||||
|
version = "0.1.0"
|
||||||
|
edition = "2021"
|
||||||
|
|
||||||
|
[dependencies]
|
||||||
|
# Workspace dependencies
|
||||||
|
tokio = { workspace = true }
|
||||||
|
anyhow = { workspace = true }
|
||||||
|
thiserror = { workspace = true }
|
||||||
|
serde = { workspace = true }
|
||||||
|
serde_json = { workspace = true }
|
||||||
|
tracing = { workspace = true }
|
||||||
|
uuid = { workspace = true }
|
||||||
|
|
||||||
|
shellexpand = "3.1"
|
||||||
|
# Async trait support
|
||||||
|
async-trait = "0.1"
|
||||||
|
|
||||||
|
# WebDriver support
|
||||||
|
fantoccini = "0.21"
|
||||||
|
|
||||||
|
# OCR dependencies
|
||||||
|
tesseract = "0.14"
|
||||||
|
|
||||||
|
# macOS dependencies
|
||||||
|
[target.'cfg(target_os = "macos")'.dependencies]
|
||||||
|
core-graphics = "0.23"
|
||||||
|
core-foundation = "0.9"
|
||||||
|
cocoa = "0.25"
|
||||||
|
objc = "0.2"
|
||||||
|
image = "0.24"
|
||||||
|
|
||||||
|
# Linux dependencies
|
||||||
|
[target.'cfg(target_os = "linux")'.dependencies]
|
||||||
|
x11 = { version = "2.21", features = ["xlib", "xtest"] }
|
||||||
|
image = "0.24"
|
||||||
|
|
||||||
|
# Windows dependencies
|
||||||
|
[target.'cfg(target_os = "windows")'.dependencies]
|
||||||
|
windows = { version = "0.52", features = [
|
||||||
|
"Win32_Foundation",
|
||||||
|
"Win32_UI_WindowsAndMessaging",
|
||||||
|
"Win32_UI_Input_KeyboardAndMouse",
|
||||||
|
"Win32_Graphics_Gdi",
|
||||||
|
] }
|
||||||
46
crates/g3-computer-control/examples/debug_screenshot.rs
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
use core_graphics::display::CGDisplay;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
let display = CGDisplay::main();
|
||||||
|
let image = display.image().expect("Failed to capture screen");
|
||||||
|
|
||||||
|
println!("CGImage properties:");
|
||||||
|
println!(" Width: {}", image.width());
|
||||||
|
println!(" Height: {}", image.height());
|
||||||
|
println!(" Bits per component: {}", image.bits_per_component());
|
||||||
|
println!(" Bits per pixel: {}", image.bits_per_pixel());
|
||||||
|
println!(" Bytes per row: {}", image.bytes_per_row());
|
||||||
|
|
||||||
|
let data = image.data();
|
||||||
|
let expected_size = image.width() * image.height() * 4;
|
||||||
|
println!(" Data length: {}", data.len());
|
||||||
|
println!(" Expected (w*h*4): {}", expected_size);
|
||||||
|
|
||||||
|
// Check if there's padding in rows
|
||||||
|
let bytes_per_row = image.bytes_per_row();
|
||||||
|
let width = image.width();
|
||||||
|
let expected_bytes_per_row = width * 4;
|
||||||
|
println!("\nRow alignment:");
|
||||||
|
println!(" Actual bytes per row: {}", bytes_per_row);
|
||||||
|
println!(" Expected (width * 4): {}", expected_bytes_per_row);
|
||||||
|
println!(" Padding per row: {}", bytes_per_row - expected_bytes_per_row);
|
||||||
|
|
||||||
|
// Sample some pixels from different locations
|
||||||
|
println!("\nFirst 3 pixels (raw bytes):");
|
||||||
|
for i in 0..3 {
|
||||||
|
let offset = i * 4;
|
||||||
|
println!(" Pixel {}: [{:3}, {:3}, {:3}, {:3}]",
|
||||||
|
i, data[offset], data[offset+1], data[offset+2], data[offset+3]);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check a pixel from the middle
|
||||||
|
let mid_row = image.height() / 2;
|
||||||
|
let mid_col = image.width() / 2;
|
||||||
|
let mid_offset = (mid_row * bytes_per_row + mid_col * 4) as usize;
|
||||||
|
println!("\nMiddle pixel (row {}, col {}):", mid_row, mid_col);
|
||||||
|
println!(" Offset: {}", mid_offset);
|
||||||
|
if mid_offset + 3 < data.len() as usize {
|
||||||
|
println!(" Bytes: [{:3}, {:3}, {:3}, {:3}]",
|
||||||
|
data[mid_offset], data[mid_offset+1], data[mid_offset+2], data[mid_offset+3]);
|
||||||
|
}
|
||||||
|
}
|
||||||
56
crates/g3-computer-control/examples/list_windows.rs
Normal file
@@ -0,0 +1,56 @@
|
|||||||
|
use core_graphics::window::{kCGWindowListOptionOnScreenOnly, kCGNullWindowID, CGWindowListCopyWindowInfo};
|
||||||
|
use core_foundation::dictionary::CFDictionary;
|
||||||
|
use core_foundation::string::CFString;
|
||||||
|
use core_foundation::base::TCFType;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
println!("Listing all on-screen windows...");
|
||||||
|
println!("{:<10} {:<25} {}", "Window ID", "Owner", "Title");
|
||||||
|
println!("{}", "-".repeat(80));
|
||||||
|
|
||||||
|
unsafe {
|
||||||
|
let window_list = CGWindowListCopyWindowInfo(
|
||||||
|
kCGWindowListOptionOnScreenOnly,
|
||||||
|
kCGNullWindowID
|
||||||
|
);
|
||||||
|
|
||||||
|
let array = core_foundation::array::CFArray::<CFDictionary>::wrap_under_create_rule(window_list); // wrap once: the create rule takes ownership
|
||||||
|
let count = array.len();
|
||||||
|
|
||||||
|
for i in 0..count {
|
||||||
|
let dict = array.get(i).unwrap();
|
||||||
|
|
||||||
|
// Get window ID
|
||||||
|
let window_id_key = CFString::from_static_string("kCGWindowNumber");
|
||||||
|
let window_id: i64 = if let Some(value) = dict.find(window_id_key.as_concrete_TypeRef()) {
|
||||||
|
let num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*value as *const _);
|
||||||
|
num.to_i64().unwrap_or(0)
|
||||||
|
} else {
|
||||||
|
0
|
||||||
|
};
|
||||||
|
|
||||||
|
// Get owner name
|
||||||
|
let owner_key = CFString::from_static_string("kCGWindowOwnerName");
|
||||||
|
let owner: String = if let Some(value) = dict.find(owner_key.as_concrete_TypeRef()) {
|
||||||
|
let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
|
||||||
|
s.to_string()
|
||||||
|
} else {
|
||||||
|
"Unknown".to_string()
|
||||||
|
};
|
||||||
|
|
||||||
|
// Get window name/title
|
||||||
|
let name_key = CFString::from_static_string("kCGWindowName");
|
||||||
|
let title: String = if let Some(value) = dict.find(name_key.as_concrete_TypeRef()) {
|
||||||
|
let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
|
||||||
|
s.to_string()
|
||||||
|
} else {
|
||||||
|
"".to_string()
|
||||||
|
};
|
||||||
|
|
||||||
|
// Only print windows owned by iTerm or Terminal
|
||||||
|
if owner.contains("iTerm") || owner.contains("Terminal") {
|
||||||
|
println!("{:<10} {:<25} {}", window_id, owner, title);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
64
crates/g3-computer-control/examples/safari_demo.rs
Normal file
@@ -0,0 +1,64 @@
|
|||||||
|
use g3_computer_control::SafariDriver;
|
||||||
|
use g3_computer_control::webdriver::WebDriverController;
|
||||||
|
use anyhow::Result;
|
||||||
|
|
||||||
|
#[tokio::main]
|
||||||
|
async fn main() -> Result<()> {
|
||||||
|
println!("Safari WebDriver Demo");
|
||||||
|
println!("=====================\n");
|
||||||
|
|
||||||
|
println!("Make sure to:");
|
||||||
|
println!("1. Enable 'Allow Remote Automation' in Safari's Develop menu");
|
||||||
|
println!("2. Run: /usr/bin/safaridriver --enable");
|
||||||
|
println!("3. Start safaridriver in another terminal: safaridriver --port 4444\n");
|
||||||
|
|
||||||
|
println!("Connecting to SafariDriver...");
|
||||||
|
let mut driver = SafariDriver::new().await?;
|
||||||
|
println!("✅ Connected!\n");
|
||||||
|
|
||||||
|
// Navigate to a website
|
||||||
|
println!("Navigating to example.com...");
|
||||||
|
driver.navigate("https://example.com").await?;
|
||||||
|
println!("✅ Navigated\n");
|
||||||
|
|
||||||
|
// Get page title
|
||||||
|
let title = driver.title().await?;
|
||||||
|
println!("Page title: {}\n", title);
|
||||||
|
|
||||||
|
// Get current URL
|
||||||
|
let url = driver.current_url().await?;
|
||||||
|
println!("Current URL: {}\n", url);
|
||||||
|
|
||||||
|
// Find an element
|
||||||
|
println!("Finding h1 element...");
|
||||||
|
let mut h1 = driver.find_element("h1").await?;
|
||||||
|
let h1_text = h1.text().await?;
|
||||||
|
println!("H1 text: {}\n", h1_text);
|
||||||
|
|
||||||
|
// Find all paragraphs
|
||||||
|
println!("Finding all paragraphs...");
|
||||||
|
let paragraphs = driver.find_elements("p").await?;
|
||||||
|
println!("Found {} paragraphs\n", paragraphs.len());
|
||||||
|
|
||||||
|
// Get page source
|
||||||
|
println!("Getting page source...");
|
||||||
|
let source = driver.page_source().await?;
|
||||||
|
println!("Page source length: {} bytes\n", source.len());
|
||||||
|
|
||||||
|
// Execute JavaScript
|
||||||
|
println!("Executing JavaScript...");
|
||||||
|
let result = driver.execute_script("return document.title", vec![]).await?;
|
||||||
|
println!("JS result: {:?}\n", result);
|
||||||
|
|
||||||
|
// Take a screenshot
|
||||||
|
println!("Taking screenshot...");
|
||||||
|
driver.screenshot("/tmp/safari_demo.png").await?;
|
||||||
|
println!("✅ Screenshot saved to /tmp/safari_demo.png\n");
|
||||||
|
|
||||||
|
// Close the browser
|
||||||
|
println!("Closing browser...");
|
||||||
|
driver.quit().await?;
|
||||||
|
println!("✅ Done!");
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
@@ -0,0 +1,21 @@
|
|||||||
|
use g3_computer_control::{create_controller, ComputerController};
|
||||||
|
|
||||||
|
#[tokio::main]
|
||||||
|
async fn main() {
|
||||||
|
println!("Testing screenshot with permission prompt...");
|
||||||
|
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
match controller.take_screenshot("/tmp/test_with_prompt.png", None, None).await {
|
||||||
|
Ok(_) => {
|
||||||
|
println!("\n✅ Screenshot saved to /tmp/test_with_prompt.png");
|
||||||
|
println!("Opening screenshot...");
|
||||||
|
let _ = std::process::Command::new("open")
|
||||||
|
.arg("/tmp/test_with_prompt.png")
|
||||||
|
.spawn();
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
println!("❌ Screenshot failed: {}", e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,39 @@
|
|||||||
|
use std::process::Command;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
let path = "/tmp/rust_screencapture_test.png";
|
||||||
|
|
||||||
|
println!("Testing screencapture command from Rust...");
|
||||||
|
|
||||||
|
let mut cmd = Command::new("screencapture");
|
||||||
|
cmd.arg("-x"); // No sound
|
||||||
|
cmd.arg(path);
|
||||||
|
|
||||||
|
println!("Command: {:?}", cmd);
|
||||||
|
|
||||||
|
match cmd.output() {
|
||||||
|
Ok(output) => {
|
||||||
|
println!("Exit status: {}", output.status);
|
||||||
|
println!("Stdout: {}", String::from_utf8_lossy(&output.stdout));
|
||||||
|
println!("Stderr: {}", String::from_utf8_lossy(&output.stderr));
|
||||||
|
|
||||||
|
if output.status.success() {
|
||||||
|
println!("\n✅ Screenshot saved to: {}", path);
|
||||||
|
|
||||||
|
// Check file exists and size
|
||||||
|
if let Ok(metadata) = std::fs::metadata(path) {
|
||||||
|
println!("File size: {} bytes ({:.1} MB)", metadata.len(), metadata.len() as f64 / 1_000_000.0);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Open it
|
||||||
|
let _ = Command::new("open").arg(path).spawn();
|
||||||
|
println!("\nOpened screenshot - please verify it looks correct!");
|
||||||
|
} else {
|
||||||
|
println!("\n❌ Screenshot failed!");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
println!("❌ Failed to execute screencapture: {}", e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
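Review note: this example and the macOS controller later in this diff both assemble `screencapture` arguments by hand. A small pure function would make the flag handling testable without invoking the binary. The following is only a sketch under assumptions: `Region` is a hypothetical stand-in for the crate's `Rect`, the function name is illustrative, and the window id is taken as an already-resolved number rather than looked up through AppleScript.

```rust
/// Screen region in pixels (hypothetical stand-in for the crate's `Rect`).
struct Region {
    x: i32,
    y: i32,
    width: i32,
    height: i32,
}

/// Build the argument list for the macOS `screencapture` command.
/// Flags mirror the ones used in this diff: -x (no sound),
/// -R x,y,w,h (region) and -l<id> (specific window).
fn screencapture_args(
    region: Option<&Region>,
    window_id: Option<u32>,
    out_path: &str,
) -> Vec<String> {
    let mut args = vec!["-x".to_string()];
    if let Some(r) = region {
        args.push("-R".to_string());
        args.push(format!("{},{},{},{}", r.x, r.y, r.width, r.height));
    }
    if let Some(id) = window_id {
        args.push(format!("-l{}", id));
    }
    // The output path always goes last.
    args.push(out_path.to_string());
    args
}
```

The result can be passed straight to `std::process::Command::new("screencapture").args(...)`.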
69
crates/g3-computer-control/examples/test_screenshot_fix.rs
Normal file
@@ -0,0 +1,69 @@
|
|||||||
|
use core_graphics::display::CGDisplay;
|
||||||
|
use image::{ImageBuffer, RgbaImage};
|
||||||
|
use std::path::Path;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
let display = CGDisplay::main();
|
||||||
|
let image = display.image().expect("Failed to capture screen");
|
||||||
|
|
||||||
|
let width = image.width() as u32;
|
||||||
|
let height = image.height() as u32;
|
||||||
|
let bytes_per_row = image.bytes_per_row() as usize;
|
||||||
|
let data = image.data();
|
||||||
|
|
||||||
|
println!("Testing screenshot fix...");
|
||||||
|
println!("Image: {}x{}, bytes_per_row: {}", width, height, bytes_per_row);
|
||||||
|
println!("Expected bytes per row: {}", width * 4);
|
||||||
|
println!("Padding per row: {} bytes", bytes_per_row - (width as usize * 4));
|
||||||
|
|
||||||
|
// OLD METHOD (broken) - treating data as continuous
|
||||||
|
println!("\n=== OLD METHOD (BROKEN) ===");
|
||||||
|
let mut old_rgba = Vec::with_capacity(data.len() as usize);
|
||||||
|
for chunk in data.chunks_exact(4) {
|
||||||
|
old_rgba.push(chunk[2]); // R
|
||||||
|
old_rgba.push(chunk[1]); // G
|
||||||
|
old_rgba.push(chunk[0]); // B
|
||||||
|
old_rgba.push(chunk[3]); // A
|
||||||
|
}
|
||||||
|
println!("Converted {} pixels", old_rgba.len() / 4);
|
||||||
|
println!("Expected {} pixels", width * height);
|
||||||
|
|
||||||
|
// NEW METHOD (fixed) - handling row padding
|
||||||
|
println!("\n=== NEW METHOD (FIXED) ===");
|
||||||
|
let mut new_rgba = Vec::with_capacity((width * height * 4) as usize);
|
||||||
|
for row in 0..height as usize {
|
||||||
|
let row_start = row * bytes_per_row;
|
||||||
|
let row_end = row_start + (width as usize * 4);
|
||||||
|
|
||||||
|
for chunk in data[row_start..row_end].chunks_exact(4) {
|
||||||
|
new_rgba.push(chunk[2]); // R
|
||||||
|
new_rgba.push(chunk[1]); // G
|
||||||
|
new_rgba.push(chunk[0]); // B
|
||||||
|
new_rgba.push(chunk[3]); // A
|
||||||
|
}
|
||||||
|
}
|
||||||
|
println!("Converted {} pixels", new_rgba.len() / 4);
|
||||||
|
println!("Expected {} pixels", width * height);
|
||||||
|
|
||||||
|
// Save the first crop_size * crop_size pixels of each buffer as a square image (a linear slice, not a true spatial crop)
|
||||||
|
let crop_size = 200;
|
||||||
|
|
||||||
|
// Old method crop
|
||||||
|
let old_crop: Vec<u8> = old_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
|
||||||
|
if let Some(old_img) = ImageBuffer::from_raw(crop_size, crop_size, old_crop) {
|
||||||
|
let old_img: RgbaImage = old_img;
|
||||||
|
old_img.save("/tmp/screenshot_old_method.png").unwrap();
|
||||||
|
println!("\nSaved OLD method crop to: /tmp/screenshot_old_method.png");
|
||||||
|
}
|
||||||
|
|
||||||
|
// New method crop
|
||||||
|
let new_crop: Vec<u8> = new_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
|
||||||
|
if let Some(new_img) = ImageBuffer::from_raw(crop_size, crop_size, new_crop) {
|
||||||
|
let new_img: RgbaImage = new_img;
|
||||||
|
new_img.save("/tmp/screenshot_new_method.png").unwrap();
|
||||||
|
println!("Saved NEW method crop to: /tmp/screenshot_new_method.png");
|
||||||
|
}
|
||||||
|
|
||||||
|
println!("\nOpen both images to compare:");
|
||||||
|
println!(" open /tmp/screenshot_old_method.png /tmp/screenshot_new_method.png");
|
||||||
|
}
|
||||||
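Review note: the "new method" above is the core of the fix, so it may be worth isolating as a function that can be tested on synthetic data. The sketch below assumes a BGRA source layout with optional trailing padding per row, as in the example; the function name and the `Option` return for undersized buffers are assumptions, not part of this branch.

```rust
/// Convert a BGRA buffer whose rows may carry trailing padding into
/// tightly packed RGBA. Returns None if the buffer is too small for the
/// stated dimensions or the stride is shorter than one row of pixels.
fn bgra_padded_to_rgba(
    data: &[u8],
    width: usize,
    height: usize,
    bytes_per_row: usize,
) -> Option<Vec<u8>> {
    let row_len = width.checked_mul(4)?;
    if bytes_per_row < row_len {
        return None;
    }
    let mut rgba = Vec::with_capacity(row_len.checked_mul(height)?);
    for row in 0..height {
        // Each row starts at a multiple of the stride, not of width * 4.
        let start = row.checked_mul(bytes_per_row)?;
        let pixels = data.get(start..start.checked_add(row_len)?)?;
        for px in pixels.chunks_exact(4) {
            // Swap B and R; keep G and A in place.
            rgba.extend_from_slice(&[px[2], px[1], px[0], px[3]]);
        }
    }
    Some(rgba)
}
```

Using `get` instead of direct slicing turns a truncated capture into `None` rather than a panic, and it also tolerates a final row that omits its padding.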
45
crates/g3-computer-control/examples/test_window_capture.rs
Normal file
@@ -0,0 +1,45 @@
|
|||||||
|
use g3_computer_control::create_controller;
|
||||||
|
|
||||||
|
#[tokio::main]
|
||||||
|
async fn main() {
|
||||||
|
println!("Testing window-specific screenshot capture...");
|
||||||
|
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
// Test 1: Capture iTerm2 window
|
||||||
|
println!("\n1. Capturing iTerm2 window...");
|
||||||
|
match controller.take_screenshot("/tmp/iterm_window.png", None, Some("iTerm2")).await {
|
||||||
|
Ok(_) => {
|
||||||
|
println!(" ✅ iTerm2 window captured to /tmp/iterm_window.png");
|
||||||
|
let _ = std::process::Command::new("open").arg("/tmp/iterm_window.png").spawn();
|
||||||
|
}
|
||||||
|
Err(e) => println!(" ❌ Failed: {}", e),
|
||||||
|
}
|
||||||
|
|
||||||
|
// Wait a moment for the image to open
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
|
||||||
|
|
||||||
|
// Test 2: Full screen capture for comparison
|
||||||
|
println!("\n2. Capturing full screen for comparison...");
|
||||||
|
match controller.take_screenshot("/tmp/fullscreen.png", None, None).await {
|
||||||
|
Ok(_) => {
|
||||||
|
println!(" ✅ Full screen captured to /tmp/fullscreen.png");
|
||||||
|
let _ = std::process::Command::new("open").arg("/tmp/fullscreen.png").spawn();
|
||||||
|
}
|
||||||
|
Err(e) => println!(" ❌ Failed: {}", e),
|
||||||
|
}
|
||||||
|
|
||||||
|
println!("\n=== Comparison ===");
|
||||||
|
println!("iTerm window: /tmp/iterm_window.png (should show ONLY iTerm window)");
|
||||||
|
println!("Full screen: /tmp/fullscreen.png (should show entire desktop)");
|
||||||
|
|
||||||
|
// Show file sizes
|
||||||
|
if let Ok(meta1) = std::fs::metadata("/tmp/iterm_window.png") {
|
||||||
|
if let Ok(meta2) = std::fs::metadata("/tmp/fullscreen.png") {
|
||||||
|
println!("\nFile sizes:");
|
||||||
|
println!(" iTerm window: {:.1} MB", meta1.len() as f64 / 1_000_000.0);
|
||||||
|
println!(" Full screen: {:.1} MB", meta2.len() as f64 / 1_000_000.0);
|
||||||
|
println!("\nWindow capture should be smaller than full screen.");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
35
crates/g3-computer-control/src/lib.rs
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
pub mod types;
|
||||||
|
pub mod platform;
|
||||||
|
pub mod webdriver;
|
||||||
|
|
||||||
|
// Re-export webdriver types for convenience
|
||||||
|
pub use webdriver::{WebDriverController, WebElement, safari::SafariDriver};
|
||||||
|
|
||||||
|
use anyhow::Result;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use types::*;
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
pub trait ComputerController: Send + Sync {
|
||||||
|
// Screen capture
|
||||||
|
async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()>;
|
||||||
|
|
||||||
|
// OCR operations
|
||||||
|
async fn extract_text_from_screen(&self, region: Rect) -> Result<String>;
|
||||||
|
async fn extract_text_from_image(&self, path: &str) -> Result<String>;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Platform-specific constructor
|
||||||
|
pub fn create_controller() -> Result<Box<dyn ComputerController>> {
|
||||||
|
#[cfg(target_os = "macos")]
|
||||||
|
return Ok(Box::new(platform::macos::MacOSController::new()?));
|
||||||
|
|
||||||
|
#[cfg(target_os = "linux")]
|
||||||
|
return Ok(Box::new(platform::linux::LinuxController::new()?));
|
||||||
|
|
||||||
|
#[cfg(target_os = "windows")]
|
||||||
|
return Ok(Box::new(platform::windows::WindowsController::new()?));
|
||||||
|
|
||||||
|
#[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
|
||||||
|
anyhow::bail!("Unsupported platform")
|
||||||
|
}
|
||||||
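Review note: `create_controller` relies on `#[cfg]`-gated `return` statements, so exactly one branch survives compilation on each target. A stripped-down illustration of that pattern follows. It is a sketch only: the `Controller` trait and `NamedController` type are hypothetical stand-ins (synchronous, std-only) for `ComputerController` and the per-platform structs.

```rust
/// Minimal stand-in for the diff's `ComputerController` trait
/// (hypothetical and synchronous, to keep the sketch std-only).
trait Controller {
    fn platform(&self) -> &'static str;
}

/// Hypothetical controller that only remembers which platform built it.
struct NamedController(&'static str);

impl Controller for NamedController {
    fn platform(&self) -> &'static str {
        self.0
    }
}

/// Exactly one of the branches below is compiled for a given target.
fn create_controller() -> Result<Box<dyn Controller>, String> {
    #[cfg(target_os = "macos")]
    return Ok(Box::new(NamedController("macos")));

    #[cfg(target_os = "linux")]
    return Ok(Box::new(NamedController("linux")));

    #[cfg(target_os = "windows")]
    return Ok(Box::new(NamedController("windows")));

    #[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
    Err("Unsupported platform".to_string())
}
```

Because the unused branches are removed before type checking, every platform module must implement the same trait surface, or the crate will build on one OS and fail on another.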
161
crates/g3-computer-control/src/platform/linux.rs
Normal file
@@ -0,0 +1,161 @@
|
|||||||
|
use crate::{ComputerController, types::*};
|
||||||
|
use anyhow::Result;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use tesseract::Tesseract;
|
||||||
|
use uuid::Uuid;
|
||||||
|
|
||||||
|
pub struct LinuxController {
|
||||||
|
// Placeholder for X11 connection or other state
|
||||||
|
}
|
||||||
|
|
||||||
|
impl LinuxController {
|
||||||
|
pub fn new() -> Result<Self> {
|
||||||
|
// TODO: initialize X11 connection (not implemented yet)
|
||||||
|
tracing::warn!("Linux computer control not fully implemented");
|
||||||
|
Ok(Self {})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl ComputerController for LinuxController {
|
||||||
|
async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn click(&self, _button: MouseButton) -> Result<()> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn double_click(&self, _button: MouseButton) -> Result<()> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn type_text(&self, _text: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn press_key(&self, _key: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn list_windows(&self) -> Result<Vec<Window>> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn focus_window(&self, _window_id: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_text(&self, _element_id: &str) -> Result<String> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, _window_id: Option<&str>) -> Result<()> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_screen(&self, _region: Rect) -> Result<OCRResult> {
|
||||||
|
anyhow::bail!("Linux implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("which")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract:\n \
|
||||||
|
Ubuntu/Debian: sudo apt-get install tesseract-ocr\n \
|
||||||
|
RHEL/CentOS: sudo yum install tesseract\n \
|
||||||
|
Arch Linux: sudo pacman -S tesseract\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Initialize Tesseract
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n \
|
||||||
|
RHEL/CentOS: sudo yum install tesseract-langpack-eng\n \
|
||||||
|
Arch Linux: sudo pacman -S tesseract-data-eng", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let text = tess.set_image(_path)
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
|
||||||
|
|
||||||
|
// Get confidence (simplified - would need more complex API calls for per-word confidence)
|
||||||
|
let confidence = 0.85; // Placeholder
|
||||||
|
|
||||||
|
Ok(OCRResult {
|
||||||
|
text,
|
||||||
|
confidence,
|
||||||
|
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("which")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract:\n \
|
||||||
|
Ubuntu/Debian: sudo apt-get install tesseract-ocr\n \
|
||||||
|
RHEL/CentOS: sudo yum install tesseract\n \
|
||||||
|
Arch Linux: sudo pacman -S tesseract\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Take full screen screenshot
|
||||||
|
let temp_path = format!("/tmp/g3_ocr_search_{}.png", uuid::Uuid::new_v4());
|
||||||
|
self.take_screenshot(&temp_path, None, None).await?;
|
||||||
|
|
||||||
|
// Use Tesseract to find text with bounding boxes
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n \
|
||||||
|
RHEL/CentOS: sudo yum install tesseract-langpack-eng\n \
|
||||||
|
Arch Linux: sudo pacman -S tesseract-data-eng", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let full_text = tess.set_image(temp_path.as_str())
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
|
||||||
|
|
||||||
|
// Clean up temp file
|
||||||
|
let _ = std::fs::remove_file(&temp_path);
|
||||||
|
|
||||||
|
// Simple text search - full implementation would use get_component_images
|
||||||
|
// to get bounding boxes for each word
|
||||||
|
if full_text.contains(_text) {
|
||||||
|
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
|
||||||
|
Ok(Some(Point { x: 0, y: 0 }))
|
||||||
|
} else {
|
||||||
|
Ok(None)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
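Review note: the `which tesseract` check is repeated in each OCR method and depends on an external `which` binary. One possible alternative is to search `PATH` directly with the standard library. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it only checks that a regular file with that name exists, not that it is executable (nor Windows extensions such as `.exe`).

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Search the directories in `path_var` (a PATH-style list) for a
/// regular file called `name` and return the first match.
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    std::env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Convenience wrapper that consults the process's own PATH.
fn command_on_path(name: &str) -> bool {
    std::env::var_os("PATH")
        .map(|p| find_in_path(name, &p).is_some())
        .unwrap_or(false)
}
```

A call such as `command_on_path("tesseract")` could then replace both copies of the check, with the install hints kept in one shared error message.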
125
crates/g3-computer-control/src/platform/macos.rs
Normal file
@@ -0,0 +1,125 @@
|
|||||||
|
use crate::{ComputerController, types::Rect};
|
||||||
|
use anyhow::Result;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use std::path::Path;
|
||||||
|
use tesseract::Tesseract;
|
||||||
|
|
||||||
|
pub struct MacOSController {
|
||||||
|
// Empty struct for now
|
||||||
|
}
|
||||||
|
|
||||||
|
impl MacOSController {
|
||||||
|
pub fn new() -> Result<Self> {
|
||||||
|
Ok(Self {})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl ComputerController for MacOSController {
|
||||||
|
    async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
        // Determine the temporary directory for screenshots
        let temp_dir = std::env::var("TMPDIR")
            .or_else(|_| std::env::var("HOME").map(|h| format!("{}/tmp", h)))
            .unwrap_or_else(|_| "/tmp".to_string());

        // Ensure temp directory exists
        std::fs::create_dir_all(&temp_dir)?;

        // Absolute paths are used as-is; relative paths are placed under temp_dir
        let final_path = if path.starts_with('/') {
            path.to_string()
        } else {
            format!("{}/{}", temp_dir.trim_end_matches('/'), path)
        };

        let path_obj = Path::new(&final_path);
        if let Some(parent) = path_obj.parent() {
            std::fs::create_dir_all(parent)?;
        }

        let mut cmd = std::process::Command::new("screencapture");

        // Add flags
        cmd.arg("-x"); // No sound

        if let Some(region) = region {
            // Capture specific region: -R x,y,width,height
            cmd.arg("-R");
            cmd.arg(format!("{},{},{},{}", region.x, region.y, region.width, region.height));
        }

        if let Some(app_name) = window_id {
            // Capture specific window by app name
            // Use AppleScript to get window ID; escape backslashes and quotes so the
            // app name cannot break out of the AppleScript string literal
            let escaped_name = app_name.replace('\\', "\\\\").replace('"', "\\\"");
            let script = format!(r#"tell application "{}" to id of window 1"#, escaped_name);
            let output = std::process::Command::new("osascript")
                .arg("-e")
                .arg(&script)
                .output()?;

            if output.status.success() {
                let window_id_str = String::from_utf8_lossy(&output.stdout).trim().to_string();
                cmd.arg(format!("-l{}", window_id_str));
            }
        }

        cmd.arg(&final_path);

        let screenshot_result = cmd.output()?;

        if !screenshot_result.status.success() {
            let stderr = String::from_utf8_lossy(&screenshot_result.stderr);
            return Err(anyhow::anyhow!("screencapture failed: {}", stderr));
        }

        Ok(())
    }

    async fn extract_text_from_screen(&self, region: Rect) -> Result<String> {
        // Take screenshot of region first
        let temp_path = format!("/tmp/g3_ocr_{}.png", uuid::Uuid::new_v4());
        self.take_screenshot(&temp_path, Some(region), None).await?;

        // Extract text from the screenshot; defer error propagation until after cleanup
        let result = self.extract_text_from_image(&temp_path).await;

        // Clean up temp file even if OCR failed
        let _ = std::fs::remove_file(&temp_path);

        result
    }

    async fn extract_text_from_image(&self, path: &str) -> Result<String> {
        // Check if tesseract is available on the system
        let tesseract_available = std::process::Command::new("which")
            .arg("tesseract")
            .output()
            .map(|output| output.status.success())
            .unwrap_or(false);

        if !tesseract_available {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract:\n macOS: brew install tesseract\n \
                Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
                sudo yum install tesseract (RHEL/CentOS)\n \
                Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
                After installation, restart your terminal and try again.");
        }

        // Initialize Tesseract
        let tess = Tesseract::new(None, Some("eng"))
            .map_err(|e| {
                anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                    This usually means:\n1. Tesseract is not properly installed\n\
                    2. Language data files are missing\n\nTo fix:\n \
                    macOS: brew reinstall tesseract\n \
                    Linux: sudo apt-get install tesseract-ocr-eng\n \
                    Windows: Reinstall tesseract and ensure language files are included", e)
            })?;

        let text = tess.set_image(path)
            .map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", path, e))?
            .get_text()
            .map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;

        Ok(text)
    }
}
425
crates/g3-computer-control/src/platform/macos.rs.bak
Normal file
@@ -0,0 +1,425 @@
|
|||||||
|
use crate::{ComputerController, types::*};
|
||||||
|
use anyhow::Result;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use core_graphics::display::CGPoint;
|
||||||
|
use core_graphics::event::{CGEvent, CGEventType, CGMouseButton, CGEventTapLocation};
|
||||||
|
use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
|
||||||
|
use std::path::Path;
|
||||||
|
use tesseract::Tesseract;
|
||||||
|
|
||||||
|
// MacOSController doesn't store CGEventSource to avoid Send/Sync issues
|
||||||
|
// We create it fresh for each operation
|
||||||
|
pub struct MacOSController {
|
||||||
|
// Empty struct - event source created per operation
|
||||||
|
}
|
||||||
|
|
||||||
|
impl MacOSController {
|
||||||
|
pub fn new() -> Result<Self> {
|
||||||
|
// Test that we can create an event source
|
||||||
|
let _event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source. Make sure Accessibility permissions are granted."))?;
|
||||||
|
Ok(Self {})
|
||||||
|
}
|
||||||
|
|
||||||
|
fn key_to_keycode(&self, key: &str) -> Result<u16> {
|
||||||
|
// Map key names to macOS keycodes
|
||||||
|
let keycode = match key.to_lowercase().as_str() {
|
||||||
|
"return" | "enter" => 36,
|
||||||
|
"tab" => 48,
|
||||||
|
"space" => 49,
|
||||||
|
"delete" | "backspace" => 51,
|
||||||
|
"escape" | "esc" => 53,
|
||||||
|
"command" | "cmd" => 55,
|
||||||
|
"shift" => 56,
|
||||||
|
"capslock" => 57,
|
||||||
|
"option" | "alt" => 58,
|
||||||
|
"control" | "ctrl" => 59,
|
||||||
|
"left" => 123,
|
||||||
|
"right" => 124,
|
||||||
|
"down" => 125,
|
||||||
|
"up" => 126,
|
||||||
|
_ => anyhow::bail!("Unknown key: {}", key),
|
||||||
|
};
|
||||||
|
Ok(keycode)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl ComputerController for MacOSController {
|
||||||
|
async fn move_mouse(&self, x: i32, y: i32) -> Result<()> {
|
||||||
|
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
|
||||||
|
let point = CGPoint::new(x as f64, y as f64);
|
||||||
|
let event = CGEvent::new_mouse_event(
|
||||||
|
event_source,
|
||||||
|
CGEventType::MouseMoved,
|
||||||
|
point,
|
||||||
|
CGMouseButton::Left,
|
||||||
|
).map_err(|_| anyhow::anyhow!("Failed to create mouse move event"))?;
|
||||||
|
|
||||||
|
event.post(CGEventTapLocation::HID);
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn click(&self, button: MouseButton) -> Result<()> {
|
||||||
|
let (cg_button, down_type, up_type) = match button {
|
||||||
|
MouseButton::Left => (CGMouseButton::Left, CGEventType::LeftMouseDown, CGEventType::LeftMouseUp),
|
||||||
|
MouseButton::Right => (CGMouseButton::Right, CGEventType::RightMouseDown, CGEventType::RightMouseUp),
|
||||||
|
MouseButton::Middle => (CGMouseButton::Center, CGEventType::OtherMouseDown, CGEventType::OtherMouseUp),
|
||||||
|
};
|
||||||
|
|
||||||
|
let point = {
|
||||||
|
// Get current mouse position
|
||||||
|
let temp_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
|
||||||
|
let event = CGEvent::new(temp_source)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to get mouse position"))?;
|
||||||
|
event.location()
|
||||||
|
};
|
||||||
|
|
||||||
|
{
|
||||||
|
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
|
||||||
|
|
||||||
|
// Mouse down
|
||||||
|
let down_event = CGEvent::new_mouse_event(
|
||||||
|
event_source,
|
||||||
|
down_type,
|
||||||
|
point,
|
||||||
|
cg_button,
|
||||||
|
).map_err(|_| anyhow::anyhow!("Failed to create mouse down event"))?;
|
||||||
|
down_event.post(CGEventTapLocation::HID);
|
||||||
|
} // event_source and down_event dropped here
|
||||||
|
|
||||||
|
// Small delay
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
|
||||||
|
|
||||||
|
{
|
||||||
|
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
|
||||||
|
|
||||||
|
let up_event = CGEvent::new_mouse_event(
|
||||||
|
event_source,
|
||||||
|
up_type,
|
||||||
|
point,
|
||||||
|
cg_button,
|
||||||
|
).map_err(|_| anyhow::anyhow!("Failed to create mouse up event"))?;
|
||||||
|
up_event.post(CGEventTapLocation::HID);
|
||||||
|
} // event_source and up_event dropped here
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn double_click(&self, button: MouseButton) -> Result<()> {
|
||||||
|
self.click(button).await?;
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
|
||||||
|
self.click(button).await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn type_text(&self, text: &str) -> Result<()> {
|
||||||
|
for ch in text.chars() {
|
||||||
|
{
|
||||||
|
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
|
||||||
|
|
||||||
|
// Create keyboard event for character
|
||||||
|
let event = CGEvent::new_keyboard_event(
|
||||||
|
event_source,
|
||||||
|
0, // keycode (0 for unicode)
|
||||||
|
true,
|
||||||
|
).map_err(|_| anyhow::anyhow!("Failed to create keyboard event"))?;
|
||||||
|
|
||||||
|
// Set unicode string
|
||||||
|
let mut utf16_buf = [0u16; 2];
|
||||||
|
let utf16_slice = ch.encode_utf16(&mut utf16_buf);
|
||||||
|
let utf16_chars: Vec<u16> = utf16_slice.iter().copied().collect();
|
||||||
|
|
||||||
|
event.set_string_from_utf16_unchecked(utf16_chars.as_slice());
|
||||||
|
event.post(CGEventTapLocation::HID);
|
||||||
|
} // event_source and event dropped here
|
||||||
|
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
|
||||||
|
}
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn press_key(&self, key: &str) -> Result<()> {
|
||||||
|
let keycode = self.key_to_keycode(key)?;
|
||||||
|
|
||||||
|
{
|
||||||
|
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
|
||||||
|
|
||||||
|
// Key down
|
||||||
|
let down_event = CGEvent::new_keyboard_event(
|
||||||
|
event_source,
|
||||||
|
keycode,
|
||||||
|
true,
|
||||||
|
).map_err(|_| anyhow::anyhow!("Failed to create key down event"))?;
|
||||||
|
down_event.post(CGEventTapLocation::HID);
|
||||||
|
} // event_source and down_event dropped here
|
||||||
|
|
||||||
|
tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
|
||||||
|
|
||||||
|
{
|
||||||
|
let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
|
||||||
|
.map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
|
||||||
|
|
||||||
|
// Key up
|
||||||
|
let up_event = CGEvent::new_keyboard_event(
|
||||||
|
event_source,
|
||||||
|
keycode,
|
||||||
|
false,
|
||||||
|
).map_err(|_| anyhow::anyhow!("Failed to create key up event"))?;
|
||||||
|
up_event.post(CGEventTapLocation::HID);
|
||||||
|
} // event_source and up_event dropped here
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn list_windows(&self) -> Result<Vec<Window>> {
|
||||||
|
// Note: Full implementation would use CGWindowListCopyWindowInfo
|
||||||
|
// For now, return empty list as this requires more complex FFI
|
||||||
|
tracing::warn!("list_windows not fully implemented on macOS");
|
||||||
|
Ok(vec![])
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn focus_window(&self, _window_id: &str) -> Result<()> {
|
||||||
|
// Note: Full implementation would use NSWorkspace to activate application
|
||||||
|
tracing::warn!("focus_window not fully implemented on macOS");
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
|
||||||
|
// Note: Full implementation would use Accessibility API
|
||||||
|
tracing::warn!("get_window_bounds not fully implemented on macOS");
|
||||||
|
Ok(Rect { x: 0, y: 0, width: 800, height: 600 })
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
|
||||||
|
// Note: Full implementation would use macOS Accessibility API
|
||||||
|
tracing::warn!("find_element not fully implemented on macOS");
|
||||||
|
Ok(None)
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_text(&self, _element_id: &str) -> Result<String> {
|
||||||
|
// Note: Full implementation would use Accessibility API
|
||||||
|
tracing::warn!("get_element_text not fully implemented on macOS");
|
||||||
|
Ok(String::new())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
|
||||||
|
// Note: Full implementation would use Accessibility API
|
||||||
|
tracing::warn!("get_element_bounds not fully implemented on macOS");
|
||||||
|
Ok(Rect { x: 0, y: 0, width: 100, height: 30 })
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn take_screenshot(&self, path: &str, _region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
|
||||||
|
// Use native macOS screencapture command which handles all the format complexities
|
||||||
|
|
||||||
|
// Check if we have Screen Recording permission by attempting a test capture
|
||||||
|
// If we only get wallpaper/menubar but no windows, we need permission
|
||||||
|
let needs_permission_check = std::env::var("G3_SKIP_PERMISSION_CHECK").is_err();
|
||||||
|
|
||||||
|
if needs_permission_check {
|
||||||
|
// Try to open Screen Recording settings if this is the first screenshot
|
||||||
|
static PERMISSION_PROMPTED: std::sync::atomic::AtomicBool = std::sync::atomic::AtomicBool::new(false);
|
||||||
|
|
||||||
|
if !PERMISSION_PROMPTED.swap(true, std::sync::atomic::Ordering::Relaxed) {
|
||||||
|
tracing::warn!("\n=== Screen Recording Permission Required ===\n\
|
||||||
|
macOS requires explicit permission to capture window content.\n\
|
||||||
|
If screenshots only show wallpaper/menubar (no windows):\n\n\
|
||||||
|
1. Open System Settings > Privacy & Security > Screen Recording\n\
|
||||||
|
2. Enable permission for your terminal (iTerm/Terminal) or g3\n\
|
||||||
|
3. Restart your terminal if needed\n\n\
|
||||||
|
Opening Screen Recording settings now...\n");
|
||||||
|
|
||||||
|
// Try to open the settings (non-blocking)
|
||||||
|
let _ = std::process::Command::new("open")
|
||||||
|
.arg("x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture")
|
||||||
|
.spawn();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let path_obj = Path::new(path);
|
||||||
|
if let Some(parent) = path_obj.parent() {
|
||||||
|
std::fs::create_dir_all(parent)?;
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut cmd = std::process::Command::new("screencapture");
|
||||||
|
|
||||||
|
// Add flags
|
||||||
|
cmd.arg("-x"); // No sound
|
||||||
|
|
||||||
|
if let Some(window_id) = window_id {
|
||||||
|
// Capture specific window by getting its bounds and using region capture
|
||||||
|
// window_id format: "AppName" or "AppName:WindowTitle"
|
||||||
|
let app_name = window_id.split(':').next().unwrap_or(window_id);
|
||||||
|
|
||||||
|
// Use AppleScript to get window bounds
|
||||||
|
let script = format!(
|
||||||
|
r#"tell application "{}"
|
||||||
|
tell current window
|
||||||
|
get bounds
|
||||||
|
end tell
|
||||||
|
end tell"#,
|
||||||
|
app_name
|
||||||
|
);
|
||||||
|
|
||||||
|
let output = std::process::Command::new("osascript")
|
||||||
|
.arg("-e")
|
||||||
|
.arg(&script)
|
||||||
|
.output()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to get window bounds: {}", e))?;
|
||||||
|
|
||||||
|
if output.status.success() {
|
||||||
|
let bounds_str = String::from_utf8_lossy(&output.stdout);
|
||||||
|
let bounds: Vec<i32> = bounds_str
|
||||||
|
.trim()
|
||||||
|
.split(',')
|
||||||
|
.filter_map(|s| s.trim().parse().ok())
|
||||||
|
.collect();
|
||||||
|
|
||||||
|
if bounds.len() == 4 {
|
||||||
|
let (left, top, right, bottom) = (bounds[0], bounds[1], bounds[2], bounds[3]);
|
||||||
|
let width = right - left;
|
||||||
|
let height = bottom - top;
|
||||||
|
|
||||||
|
cmd.arg("-R");
|
||||||
|
cmd.arg(format!("{},{},{},{}", left, top, width, height));
|
||||||
|
|
||||||
|
tracing::debug!("Capturing window '{}' at region: {},{} {}x{}", app_name, left, top, width, height);
|
||||||
|
} else {
|
||||||
|
tracing::warn!("Failed to parse window bounds, capturing full screen");
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
tracing::warn!("Failed to get window bounds for '{}', capturing full screen", app_name);
|
||||||
|
}
|
||||||
|
} else if let Some(region) = _region {
|
||||||
|
// Capture specific region: -R x,y,width,height
|
||||||
|
cmd.arg("-R");
|
||||||
|
cmd.arg(format!("{},{},{},{}", region.x, region.y, region.width, region.height));
|
||||||
|
}
|
||||||
|
|
||||||
|
cmd.arg(path);
|
||||||
|
|
||||||
|
let output = cmd.output()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to execute screencapture: {}", e))?;
|
||||||
|
|
||||||
|
if !output.status.success() {
|
||||||
|
let stderr = String::from_utf8_lossy(&output.stderr);
|
||||||
|
anyhow::bail!("screencapture failed: {}", stderr);
|
||||||
|
}
|
||||||
|
|
||||||
|
tracing::debug!("Screenshot saved using screencapture: {}", path);
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_screen(&self, region: Rect) -> Result<OCRResult> {
|
||||||
|
// Take screenshot of region first
|
||||||
|
let temp_path = format!("/tmp/g3_ocr_{}.png", uuid::Uuid::new_v4());
|
||||||
|
self.take_screenshot(&temp_path, Some(region), None).await?;
|
||||||
|
|
||||||
|
// Extract text from the screenshot
|
||||||
|
let result = self.extract_text_from_image(&temp_path).await?;
|
||||||
|
|
||||||
|
// Clean up temp file
|
||||||
|
let _ = std::fs::remove_file(&temp_path);
|
||||||
|
|
||||||
|
Ok(result)
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("which")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !tesseract_check.map(|o| o.status.success()).unwrap_or(false) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract:\n macOS: brew install tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
|
||||||
|
sudo yum install tesseract (RHEL/CentOS)\n \
|
||||||
|
Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Initialize Tesseract
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
macOS: brew reinstall tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr-eng\n \
|
||||||
|
Windows: Reinstall tesseract and ensure language files are included", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let text = tess.set_image(_path)
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
|
||||||
|
|
||||||
|
// Get confidence (simplified - would need more complex API calls for per-word confidence)
|
||||||
|
let confidence = 0.85; // Placeholder
|
||||||
|
|
||||||
|
Ok(OCRResult {
|
||||||
|
text,
|
||||||
|
confidence,
|
||||||
|
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("which")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !tesseract_check.map(|o| o.status.success()).unwrap_or(false) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract:\n macOS: brew install tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
|
||||||
|
sudo yum install tesseract (RHEL/CentOS)\n \
|
||||||
|
Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Take full screen screenshot
|
||||||
|
let temp_path = format!("/tmp/g3_ocr_search_{}.png", uuid::Uuid::new_v4());
|
||||||
|
self.take_screenshot(&temp_path, None, None).await?;
|
||||||
|
|
||||||
|
// Use Tesseract to find text with bounding boxes
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
macOS: brew reinstall tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr-eng\n \
|
||||||
|
Windows: Reinstall tesseract and ensure language files are included", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let full_text = tess.set_image(temp_path.as_str())
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
|
||||||
|
|
||||||
|
// Clean up temp file
|
||||||
|
let _ = std::fs::remove_file(&temp_path);
|
||||||
|
|
||||||
|
// Simple text search - full implementation would use get_component_images
|
||||||
|
// to get bounding boxes for each word
|
||||||
|
if full_text.contains(_text) {
|
||||||
|
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
|
||||||
|
Ok(Some(Point { x: 0, y: 0 }))
|
||||||
|
} else {
|
||||||
|
Ok(None)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
8
crates/g3-computer-control/src/platform/mod.rs
Normal file
@@ -0,0 +1,8 @@
#[cfg(target_os = "macos")]
pub mod macos;

#[cfg(target_os = "linux")]
pub mod linux;

#[cfg(target_os = "windows")]
pub mod windows;
162
crates/g3-computer-control/src/platform/windows.rs
Normal file
@@ -0,0 +1,162 @@
|
|||||||
|
use crate::{ComputerController, types::*};
|
||||||
|
use anyhow::Result;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use tesseract::Tesseract;
|
||||||
|
use uuid::Uuid;
|
||||||
|
|
||||||
|
pub struct WindowsController {
|
||||||
|
// Placeholder for Windows-specific state
|
||||||
|
}
|
||||||
|
|
||||||
|
impl WindowsController {
|
||||||
|
pub fn new() -> Result<Self> {
|
||||||
|
tracing::warn!("Windows computer control not fully implemented");
|
||||||
|
Ok(Self {})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl ComputerController for WindowsController {
|
||||||
|
async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn click(&self, _button: MouseButton) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn double_click(&self, _button: MouseButton) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn type_text(&self, _text: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn press_key(&self, _key: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn list_windows(&self) -> Result<Vec<Window>> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn focus_window(&self, _window_id: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_text(&self, _element_id: &str) -> Result<String> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, _window_id: Option<&str>) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_screen(&self, _region: Rect) -> Result<OCRResult> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("where")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !tesseract_check.map(|o| o.status.success()).unwrap_or(false) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract on Windows:\n \
|
||||||
|
1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Run the installer and follow the instructions\n \
|
||||||
|
3. Add tesseract to your PATH environment variable\n \
|
||||||
|
4. Restart your terminal/command prompt\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Initialize Tesseract
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Make sure to select 'Additional language data' during installation\n \
|
||||||
|
3. Ensure tesseract is in your PATH", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let text = tess.set_image(_path)
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
|
||||||
|
|
||||||
|
// Get confidence (simplified - would need more complex API calls for per-word confidence)
|
||||||
|
let confidence = 0.85; // Placeholder
|
||||||
|
|
||||||
|
Ok(OCRResult {
|
||||||
|
text,
|
||||||
|
confidence,
|
||||||
|
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("where")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !tesseract_check.map(|o| o.status.success()).unwrap_or(false) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract on Windows:\n \
|
||||||
|
1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Run the installer and follow the instructions\n \
|
||||||
|
3. Add tesseract to your PATH environment variable\n \
|
||||||
|
4. Restart your terminal/command prompt\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Take full screen screenshot
|
||||||
|
let temp_path = std::env::temp_dir().join(format!("g3_ocr_search_{}.png", uuid::Uuid::new_v4())).to_string_lossy().into_owned();
|
||||||
|
self.take_screenshot(&temp_path, None, None).await?;
|
||||||
|
|
||||||
|
// Use Tesseract to find text with bounding boxes
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Make sure to select 'Additional language data' during installation\n \
|
||||||
|
3. Ensure tesseract is in your PATH", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let full_text = tess.set_image(temp_path.as_str())
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
|
||||||
|
|
||||||
|
// Clean up temp file
|
||||||
|
let _ = std::fs::remove_file(&temp_path);
|
||||||
|
|
||||||
|
// Simple text search - full implementation would use get_component_images
|
||||||
|
// to get bounding boxes for each word
|
||||||
|
if full_text.contains(_text) {
|
||||||
|
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
|
||||||
|
Ok(Some(Point { x: 0, y: 0 }))
|
||||||
|
} else {
|
||||||
|
Ok(None)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
9
crates/g3-computer-control/src/types.rs
Normal file
@@ -0,0 +1,9 @@
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}
111
crates/g3-computer-control/src/webdriver/mod.rs
Normal file
@@ -0,0 +1,111 @@
pub mod safari;

use anyhow::Result;
use async_trait::async_trait;
use serde_json::Value;

/// WebDriver controller for browser automation
#[async_trait]
pub trait WebDriverController: Send + Sync {
    /// Navigate to a URL
    async fn navigate(&mut self, url: &str) -> Result<()>;

    /// Get the current URL
    async fn current_url(&self) -> Result<String>;

    /// Get the page title
    async fn title(&self) -> Result<String>;

    /// Find an element by CSS selector
    async fn find_element(&mut self, selector: &str) -> Result<WebElement>;

    /// Find multiple elements by CSS selector
    async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>>;

    /// Execute JavaScript in the browser
    async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value>;

    /// Get the page source (HTML)
    async fn page_source(&self) -> Result<String>;

    /// Take a screenshot and save to path
    async fn screenshot(&mut self, path: &str) -> Result<()>;

    /// Close the current window/tab
    async fn close(&mut self) -> Result<()>;

    /// Quit the browser session
    async fn quit(self) -> Result<()>;
}

/// Represents a web element in the DOM
pub struct WebElement {
    pub(crate) inner: fantoccini::elements::Element,
}

impl WebElement {
    /// Click the element
    pub async fn click(&mut self) -> Result<()> {
        self.inner.click().await?;
        Ok(())
    }

    /// Send keys/text to the element
    pub async fn send_keys(&mut self, text: &str) -> Result<()> {
        self.inner.send_keys(text).await?;
        Ok(())
    }

    /// Clear the element's content (for input fields)
    pub async fn clear(&mut self) -> Result<()> {
        self.inner.clear().await?;
        Ok(())
    }

    /// Get the element's text content
    pub async fn text(&self) -> Result<String> {
        Ok(self.inner.text().await?)
    }

    /// Get an attribute value
    pub async fn attr(&self, name: &str) -> Result<Option<String>> {
        Ok(self.inner.attr(name).await?)
    }

    /// Get a property value
    pub async fn prop(&self, name: &str) -> Result<Option<String>> {
        Ok(self.inner.prop(name).await?)
    }

    /// Get the element's HTML
    pub async fn html(&self, inner: bool) -> Result<String> {
        Ok(self.inner.html(inner).await?)
    }

    /// Check if element is displayed
    pub async fn is_displayed(&self) -> Result<bool> {
        Ok(self.inner.is_displayed().await?)
    }

    /// Check if element is enabled
    pub async fn is_enabled(&self) -> Result<bool> {
        Ok(self.inner.is_enabled().await?)
    }

    /// Check if element is selected (for checkboxes/radio buttons)
    pub async fn is_selected(&self) -> Result<bool> {
        Ok(self.inner.is_selected().await?)
    }

    /// Find a child element by CSS selector
    pub async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
        let elem = self.inner.find(fantoccini::Locator::Css(selector)).await?;
        Ok(WebElement { inner: elem })
    }

    /// Find multiple child elements by CSS selector
    pub async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
        let elems = self.inner.find_all(fantoccini::Locator::Css(selector)).await?;
        Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
|
||||||
|
}
|
||||||
|
}
|
||||||
212
crates/g3-computer-control/src/webdriver/safari.rs
Normal file
@@ -0,0 +1,212 @@
use super::{WebDriverController, WebElement};
use anyhow::{Context, Result};
use async_trait::async_trait;
use fantoccini::{Client, ClientBuilder};
use serde_json::Value;
use std::time::Duration;

/// SafariDriver WebDriver controller
pub struct SafariDriver {
    client: Client,
}

impl SafariDriver {
    /// Create a new SafariDriver instance
    ///
    /// This connects to SafariDriver on port 4444, the default used by this crate.
    /// Make sure to enable "Allow Remote Automation" in Safari's Develop menu first.
    ///
    /// Enable remote automation once, then start SafariDriver on that port:
    /// ```bash
    /// /usr/bin/safaridriver --enable
    /// /usr/bin/safaridriver --port 4444
    /// ```
    pub async fn new() -> Result<Self> {
        Self::with_port(4444).await
    }

    /// Create a new SafariDriver instance with a custom port
    pub async fn with_port(port: u16) -> Result<Self> {
        let url = format!("http://localhost:{}", port);

        let mut caps = serde_json::Map::new();
        caps.insert("browserName".to_string(), Value::String("safari".to_string()));

        let client = ClientBuilder::native()
            .capabilities(caps)
            .connect(&url)
            .await
            .context("Failed to connect to SafariDriver. Make sure SafariDriver is running and 'Allow Remote Automation' is enabled in Safari's Develop menu.")?;

        Ok(Self { client })
    }

    /// Go back in browser history
    pub async fn back(&mut self) -> Result<()> {
        self.client.back().await?;
        Ok(())
    }

    /// Go forward in browser history
    pub async fn forward(&mut self) -> Result<()> {
        self.client.forward().await?;
        Ok(())
    }

    /// Refresh the current page
    pub async fn refresh(&mut self) -> Result<()> {
        self.client.refresh().await?;
        Ok(())
    }

    /// Get all window handles
    pub async fn window_handles(&mut self) -> Result<Vec<String>> {
        let handles = self.client.windows().await?;
        Ok(handles.into_iter().map(|h| h.into()).collect())
    }

    /// Switch to a window by handle
    pub async fn switch_to_window(&mut self, handle: &str) -> Result<()> {
        let window_handle: fantoccini::wd::WindowHandle = handle.to_string().try_into()?;
        self.client.switch_to_window(window_handle).await?;
        Ok(())
    }

    /// Get the current window handle
    pub async fn current_window_handle(&mut self) -> Result<String> {
        Ok(self.client.window().await?.into())
    }

    /// Close the current window
    pub async fn close_window(&mut self) -> Result<()> {
        self.client.close_window().await?;
        Ok(())
    }

    /// Create a new window/tab
    pub async fn new_window(&mut self, is_tab: bool) -> Result<String> {
        let response = self.client.new_window(is_tab).await?;
        Ok(response.handle.into())
    }

    /// Get cookies
    pub async fn get_cookies(&mut self) -> Result<Vec<fantoccini::cookies::Cookie<'static>>> {
        Ok(self.client.get_all_cookies().await?)
    }

    /// Add a cookie
    pub async fn add_cookie(&mut self, cookie: fantoccini::cookies::Cookie<'static>) -> Result<()> {
        self.client.add_cookie(cookie).await?;
        Ok(())
    }

    /// Delete all cookies
    pub async fn delete_all_cookies(&mut self) -> Result<()> {
        self.client.delete_all_cookies().await?;
        Ok(())
    }

    /// Wait for an element to appear (with timeout)
    pub async fn wait_for_element(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
        let start = std::time::Instant::now();
        let poll_interval = Duration::from_millis(100);

        loop {
            if let Ok(elem) = self.find_element(selector).await {
                return Ok(elem);
            }

            if start.elapsed() >= timeout {
                anyhow::bail!("Timeout waiting for element: {}", selector);
            }

            tokio::time::sleep(poll_interval).await;
        }
    }

    /// Wait for an element to be visible (with timeout)
    pub async fn wait_for_visible(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
        let start = std::time::Instant::now();
        let poll_interval = Duration::from_millis(100);

        loop {
            if let Ok(elem) = self.find_element(selector).await {
                if elem.is_displayed().await.unwrap_or(false) {
                    return Ok(elem);
                }
            }

            if start.elapsed() >= timeout {
                anyhow::bail!("Timeout waiting for element to be visible: {}", selector);
            }

            tokio::time::sleep(poll_interval).await;
        }
    }
}

#[async_trait]
impl WebDriverController for SafariDriver {
    async fn navigate(&mut self, url: &str) -> Result<()> {
        self.client.goto(url).await?;
        Ok(())
    }

    async fn current_url(&self) -> Result<String> {
        Ok(self.client.current_url().await?.to_string())
    }

    async fn title(&self) -> Result<String> {
        Ok(self.client.title().await?)
    }

    async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
        let elem = self.client.find(fantoccini::Locator::Css(selector)).await
            .with_context(|| format!("Failed to find element with selector: {}", selector))?;
        Ok(WebElement { inner: elem })
    }

    async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
        let elems = self.client.find_all(fantoccini::Locator::Css(selector)).await?;
        Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
    }

    async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value> {
        Ok(self.client.execute(script, args).await?)
    }

    async fn page_source(&self) -> Result<String> {
        Ok(self.client.source().await?)
    }

    async fn screenshot(&mut self, path: &str) -> Result<()> {
        let screenshot_data = self.client.screenshot().await?;

        // Expand tilde in path
        let expanded_path = shellexpand::tilde(path);
        let path_str = expanded_path.as_ref();

        // Create parent directories if needed
        if let Some(parent) = std::path::Path::new(path_str).parent() {
            std::fs::create_dir_all(parent)
                .context("Failed to create parent directories for screenshot")?;
        }

        std::fs::write(path_str, screenshot_data)
            .context("Failed to write screenshot to file")?;

        Ok(())
    }

    async fn close(&mut self) -> Result<()> {
        self.client.close_window().await?;
        Ok(())
    }

    async fn quit(mut self) -> Result<()> {
        self.client.close().await?;
        Ok(())
    }
}
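The two wait helpers in `SafariDriver` (`wait_for_element` and `wait_for_visible`) repeat the same poll-until-deadline loop. A minimal synchronous sketch of that pattern, using only the standard library, might look like the following. `poll_until` and its closure-based signature are hypothetical names chosen for illustration; they are not part of this crate or of fantoccini, and the real helpers are async and sleep with `tokio::time::sleep`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `probe` until it yields `Some(value)` or `timeout` elapses.
/// Mirrors the loop shape in `wait_for_element`: try, check the deadline, sleep, repeat.
pub fn poll_until<T, F>(timeout: Duration, interval: Duration, mut probe: F) -> Result<T, String>
where
    F: FnMut() -> Option<T>,
{
    let start = Instant::now();
    loop {
        // Probe first, so even a zero timeout gets one attempt.
        if let Some(value) = probe() {
            return Ok(value);
        }
        if start.elapsed() >= timeout {
            return Err(format!("timed out after {:?}", timeout));
        }
        thread::sleep(interval);
    }
}
```

Factoring the loop out this way would leave each wait helper to supply only its probe (element found, or element found and displayed), at the cost of one extra generic function.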
62
crates/g3-computer-control/tests/integration_test.rs
Normal file
@@ -0,0 +1,62 @@
use g3_computer_control::*;

#[tokio::test]
async fn test_mouse_movement() {
    let controller = create_controller().expect("Failed to create controller");

    // Move mouse to center of screen (assuming 1920x1080)
    let result = controller.move_mouse(960, 540).await;
    assert!(result.is_ok(), "Failed to move mouse: {:?}", result.err());
}

#[tokio::test]
async fn test_typing() {
    let controller = create_controller().expect("Failed to create controller");

    // Type some text
    let result = controller.type_text("Hello, World!").await;
    assert!(result.is_ok(), "Failed to type text: {:?}", result.err());
}

#[tokio::test]
async fn test_screenshot() {
    let controller = create_controller().expect("Failed to create controller");

    // Take screenshot
    let path = "/tmp/test_screenshot.png";
    let result = controller.take_screenshot(path, None, None).await;
    assert!(result.is_ok(), "Failed to take screenshot: {:?}", result.err());

    // Verify file exists
    assert!(std::path::Path::new(path).exists(), "Screenshot file was not created");

    // Clean up
    let _ = std::fs::remove_file(path);
}

#[tokio::test]
async fn test_click() {
    let controller = create_controller().expect("Failed to create controller");

    // Click at a safe location
    let result = controller.click(types::MouseButton::Left).await;
    assert!(result.is_ok(), "Failed to click: {:?}", result.err());
}

#[tokio::test]
async fn test_double_click() {
    let controller = create_controller().expect("Failed to create controller");

    // Double click
    let result = controller.double_click(types::MouseButton::Left).await;
    assert!(result.is_ok(), "Failed to double click: {:?}", result.err());
}

#[tokio::test]
async fn test_press_key() {
    let controller = create_controller().expect("Failed to create controller");

    // Press escape key
    let result = controller.press_key("escape").await;
    assert!(result.is_ok(), "Failed to press key: {:?}", result.err());
}
131
crates/g3-config/src/autonomous_config_tests.rs
Normal file
@@ -0,0 +1,131 @@
#[cfg(test)]
mod autonomous_config_tests {
    use crate::{Config, AnthropicConfig, DatabricksConfig};

    #[test]
    fn test_default_autonomous_config() {
        let config = Config::default();
        assert!(config.autonomous.coach_provider.is_none());
        assert!(config.autonomous.coach_model.is_none());
        assert!(config.autonomous.player_provider.is_none());
        assert!(config.autonomous.player_model.is_none());
    }

    #[test]
    fn test_for_coach_with_overrides() {
        let mut config = Config::default();

        // Set up base config with anthropic
        config.providers.anthropic = Some(AnthropicConfig {
            api_key: "test-key".to_string(),
            model: "claude-3-5-sonnet-20241022".to_string(),
            max_tokens: Some(4096),
            temperature: Some(0.1),
        });

        // Set coach overrides
        config.autonomous.coach_provider = Some("anthropic".to_string());
        config.autonomous.coach_model = Some("claude-3-opus-20240229".to_string());

        let coach_config = config.for_coach().unwrap();

        // Verify coach uses overridden provider and model
        assert_eq!(coach_config.providers.default_provider, "anthropic");
        assert_eq!(
            coach_config.providers.anthropic.as_ref().unwrap().model,
            "claude-3-opus-20240229"
        );
    }

    #[test]
    fn test_for_player_with_overrides() {
        let mut config = Config::default();

        // Set up base config with databricks
        config.providers.databricks = Some(DatabricksConfig {
            host: "https://test.databricks.com".to_string(),
            token: Some("test-token".to_string()),
            model: "databricks-meta-llama-3-1-70b-instruct".to_string(),
            max_tokens: Some(4096),
            temperature: Some(0.1),
            use_oauth: Some(false),
        });

        // Set player overrides
        config.autonomous.player_provider = Some("databricks".to_string());
        config.autonomous.player_model = Some("databricks-dbrx-instruct".to_string());

        let player_config = config.for_player().unwrap();

        // Verify player uses overridden provider and model
        assert_eq!(player_config.providers.default_provider, "databricks");
        assert_eq!(
            player_config.providers.databricks.as_ref().unwrap().model,
            "databricks-dbrx-instruct"
        );
    }

    #[test]
    fn test_no_overrides_uses_defaults() {
        let mut config = Config::default();
        config.providers.default_provider = "databricks".to_string();

        let coach_config = config.for_coach().unwrap();
        let player_config = config.for_player().unwrap();

        // Both should use the default provider when no overrides
        assert_eq!(coach_config.providers.default_provider, "databricks");
        assert_eq!(player_config.providers.default_provider, "databricks");
    }

    #[test]
    fn test_provider_override_only() {
        let mut config = Config::default();

        config.providers.anthropic = Some(AnthropicConfig {
            api_key: "test-key".to_string(),
            model: "claude-3-5-sonnet-20241022".to_string(),
            max_tokens: Some(4096),
            temperature: Some(0.1),
        });

        // Only override provider, not model
        config.autonomous.coach_provider = Some("anthropic".to_string());

        let coach_config = config.for_coach().unwrap();

        // Should use overridden provider with its default model
        assert_eq!(coach_config.providers.default_provider, "anthropic");
        assert_eq!(
            coach_config.providers.anthropic.as_ref().unwrap().model,
            "claude-3-5-sonnet-20241022"
        );
    }

    #[test]
    fn test_model_override_only() {
        let mut config = Config::default();
        config.providers.default_provider = "databricks".to_string();

        config.providers.databricks = Some(DatabricksConfig {
            host: "https://test.databricks.com".to_string(),
            token: Some("test-token".to_string()),
            model: "databricks-meta-llama-3-1-70b-instruct".to_string(),
            max_tokens: Some(4096),
            temperature: Some(0.1),
            use_oauth: Some(false),
        });

        // Only override model, not provider
        config.autonomous.player_model = Some("databricks-dbrx-instruct".to_string());

        let player_config = config.for_player().unwrap();

        // Should use default provider with overridden model
        assert_eq!(player_config.providers.default_provider, "databricks");
        assert_eq!(
            player_config.providers.databricks.as_ref().unwrap().model,
            "databricks-dbrx-instruct"
        );
    }
}
@@ -2,10 +2,16 @@ use serde::{Deserialize, Serialize};
 use anyhow::Result;
 use std::path::Path;
 
+#[cfg(test)]
+mod autonomous_config_tests;
+
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct Config {
     pub providers: ProvidersConfig,
     pub agent: AgentConfig,
+    pub computer_control: ComputerControlConfig,
+    pub webdriver: WebDriverConfig,
+    pub autonomous: AutonomousConfig,
 }
 
 #[derive(Debug, Clone, Serialize, Deserialize)]
@@ -62,6 +68,52 @@ pub struct AgentConfig {
     pub timeout_seconds: u64,
 }
 
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ComputerControlConfig {
+    pub enabled: bool,
+    pub require_confirmation: bool,
+    pub max_actions_per_second: u32,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct WebDriverConfig {
+    pub enabled: bool,
+    pub safari_port: u16,
+}
+
+impl Default for WebDriverConfig {
+    fn default() -> Self {
+        Self {
+            enabled: false,
+            safari_port: 4444,
+        }
+    }
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct AutonomousConfig {
+    pub coach_provider: Option<String>,
+    pub coach_model: Option<String>,
+    pub player_provider: Option<String>,
+    pub player_model: Option<String>,
+}
+
+impl Default for AutonomousConfig {
+    fn default() -> Self {
+        Self { coach_provider: None, coach_model: None, player_provider: None, player_model: None }
+    }
+}
+
+impl Default for ComputerControlConfig {
+    fn default() -> Self {
+        Self {
+            enabled: false, // Disabled by default for safety
+            require_confirmation: true,
+            max_actions_per_second: 5,
+        }
+    }
+}
+
 impl Default for Config {
     fn default() -> Self {
         Self {
@@ -84,6 +136,9 @@ impl Default for Config {
                 enable_streaming: true,
                 timeout_seconds: 60,
             },
+            computer_control: ComputerControlConfig::default(),
+            webdriver: WebDriverConfig::default(),
+            autonomous: AutonomousConfig::default(),
         }
     }
 }
@@ -194,6 +249,9 @@ impl Config {
                 enable_streaming: true,
                 timeout_seconds: 60,
             },
+            computer_control: ComputerControlConfig::default(),
+            webdriver: WebDriverConfig::default(),
+            autonomous: AutonomousConfig::default(),
         }
     }
 
@@ -262,4 +320,78 @@ impl Config {
 
         Ok(config)
     }
+
+    /// Create a config for the coach agent in autonomous mode
+    pub fn for_coach(&self) -> Result<Self> {
+        let mut config = self.clone();
+
+        // Apply coach-specific overrides if configured
+        if let Some(ref coach_provider) = self.autonomous.coach_provider {
+            config.providers.default_provider = coach_provider.clone();
+        }
+
+        if let Some(ref coach_model) = self.autonomous.coach_model {
+            // Apply model override to the coach's provider
+            match config.providers.default_provider.as_str() {
+                "anthropic" => {
+                    if let Some(ref mut anthropic) = config.providers.anthropic {
+                        anthropic.model = coach_model.clone();
+                    } else {
+                        return Err(anyhow::anyhow!(
+                            "Coach provider 'anthropic' is not configured. Please add anthropic configuration to your config file."
+                        ));
+                    }
+                }
+                "databricks" => {
+                    if let Some(ref mut databricks) = config.providers.databricks {
+                        databricks.model = coach_model.clone();
+                    } else {
+                        return Err(anyhow::anyhow!(
+                            "Coach provider 'databricks' is not configured. Please add databricks configuration to your config file."
+                        ));
+                    }
+                }
+                _ => {}
+            }
+        }
+
+        Ok(config)
+    }
+
+    /// Create a config for the player agent in autonomous mode
+    pub fn for_player(&self) -> Result<Self> {
+        let mut config = self.clone();
+
+        // Apply player-specific overrides if configured
+        if let Some(ref player_provider) = self.autonomous.player_provider {
+            config.providers.default_provider = player_provider.clone();
+        }
+
+        if let Some(ref player_model) = self.autonomous.player_model {
+            // Apply model override to the player's provider
+            match config.providers.default_provider.as_str() {
+                "anthropic" => {
+                    if let Some(ref mut anthropic) = config.providers.anthropic {
+                        anthropic.model = player_model.clone();
+                    } else {
+                        return Err(anyhow::anyhow!(
+                            "Player provider 'anthropic' is not configured. Please add anthropic configuration to your config file."
+                        ));
+                    }
+                }
+                "databricks" => {
+                    if let Some(ref mut databricks) = config.providers.databricks {
+                        databricks.model = player_model.clone();
+                    } else {
+                        return Err(anyhow::anyhow!(
+                            "Player provider 'databricks' is not configured. Please add databricks configuration to your config file."
+                        ));
+                    }
+                }
+                _ => {}
+            }
+        }
+
+        Ok(config)
+    }
 }
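`for_coach` and `for_player` differ only in which pair of `[autonomous]` fields they read. A minimal sketch of that shared logic, with cut-down stand-in types, could look like this. `ProviderEntry`, `Providers`, and `apply_role_overrides` are hypothetical names; the sketch keeps provider entries in a map rather than in the named `anthropic` and `databricks` fields used in the diff, and it returns an error for any unconfigured provider, whereas the `_ => {}` arm in the diff silently ignores a model override for provider names it does not recognise.

```rust
use std::collections::HashMap;

/// Stand-in for one provider's settings (hypothetical; model only).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderEntry {
    pub model: String,
}

/// Stand-in for the providers section: a default plus named entries (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub struct Providers {
    pub default_provider: String,
    pub entries: HashMap<String, ProviderEntry>,
}

/// Return a copy of `base` with optional role-specific overrides applied.
/// The provider override is applied first, so a model override targets the
/// role's provider, matching the order used in `for_coach` / `for_player`.
pub fn apply_role_overrides(
    base: &Providers,
    provider: Option<&str>,
    model: Option<&str>,
) -> Result<Providers, String> {
    let mut out = base.clone();
    if let Some(p) = provider {
        out.default_provider = p.to_string();
    }
    if let Some(m) = model {
        let name = out.default_provider.clone();
        match out.entries.get_mut(&name) {
            Some(entry) => entry.model = m.to_string(),
            None => return Err(format!("provider '{}' is not configured", name)),
        }
    }
    Ok(out)
}
```

With a helper of this shape, each role method would reduce to a single call that passes its own pair of optional fields, and the base configuration is never mutated.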
@@ -8,6 +8,7 @@ description = "Core engine for G3 AI coding agent"
 g3-providers = { path = "../g3-providers" }
 g3-config = { path = "../g3-config" }
 g3-execution = { path = "../g3-execution" }
+g3-computer-control = { path = "../g3-computer-control" }
 tokio = { workspace = true }
 reqwest = { workspace = true }
 anyhow = { workspace = true }
@@ -23,3 +24,4 @@ futures-util = "0.3"
 chrono = { version = "0.4", features = ["serde"] }
 rand = "0.8"
 regex = "1.0"
+shellexpand = "3.1"
File diff suppressed because it is too large
36
crates/g3-core/src/tilde_expansion_tests.rs
Normal file
@@ -0,0 +1,36 @@
#[cfg(test)]
mod tilde_expansion_tests {
    use std::env;

    #[test]
    fn test_tilde_expansion() {
        // Test that shellexpand works
        let path_with_tilde = "~/test.txt";
        let expanded = shellexpand::tilde(path_with_tilde);

        // Get the actual home directory
        let home = env::var("HOME").expect("HOME environment variable not set");

        // Verify expansion happened
        assert_eq!(expanded.as_ref(), format!("{}/test.txt", home));
        assert!(!expanded.contains("~"));
    }

    #[test]
    fn test_tilde_expansion_with_subdirs() {
        let path_with_tilde = "~/Documents/test.txt";
        let expanded = shellexpand::tilde(path_with_tilde);

        let home = env::var("HOME").expect("HOME environment variable not set");

        assert_eq!(expanded.as_ref(), format!("{}/Documents/test.txt", home));
    }

    #[test]
    fn test_no_tilde_unchanged() {
        let path_without_tilde = "/absolute/path/test.txt";
        let expanded = shellexpand::tilde(path_without_tilde);

        assert_eq!(expanded.as_ref(), path_without_tilde);
    }
}
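These tests exercise `shellexpand::tilde`. To make the expected behaviour concrete, here is a standard-library sketch of the same idea. `expand_tilde` is a hypothetical helper written only for illustration: it takes the home directory as a parameter so that it stays deterministic, and it is not how `shellexpand` is implemented (for instance, it leaves `~user` forms untouched).

```rust
/// Hypothetical helper: replace a leading `~` with `home`.
/// Only a bare `~` or a `~/...` prefix is expanded; anything else is returned unchanged.
pub fn expand_tilde(path: &str, home: &str) -> String {
    if path == "~" {
        return home.to_string();
    }
    match path.strip_prefix("~/") {
        // Trim a trailing slash on `home` so the result has exactly one separator.
        Some(rest) => format!("{}/{}", home.trim_end_matches('/'), rest),
        None => path.to_string(),
    }
}
```

Passing the home directory in explicitly also avoids the tests' dependence on the `HOME` environment variable being set.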
157
crates/g3-core/tests/test_context_thinning.rs
Normal file
@@ -0,0 +1,157 @@
use g3_core::ContextWindow;
use g3_providers::{Message, MessageRole};

#[test]
fn test_thinning_thresholds() {
    let mut context = ContextWindow::new(10000);

    // At 0%, should not thin
    assert!(!context.should_thin());

    // Simulate reaching 50% usage
    context.used_tokens = 5000;
    assert!(context.should_thin());

    // After thinning at 50%, should not thin again until next threshold
    context.last_thinning_percentage = 50;
    assert!(!context.should_thin());

    // At 60%, should thin again
    context.used_tokens = 6000;
    assert!(context.should_thin());

    // After thinning at 60%, should not thin
    context.last_thinning_percentage = 60;
    assert!(!context.should_thin());

    // At 70%, should thin
    context.used_tokens = 7000;
    assert!(context.should_thin());

    // At 80%, should thin
    context.last_thinning_percentage = 70;
    context.used_tokens = 8000;
    assert!(context.should_thin());

    // After 80%, should not thin (compaction takes over)
    context.last_thinning_percentage = 80;
    context.used_tokens = 8500;
    assert!(!context.should_thin());
}

#[test]
fn test_thin_context_basic() {
    let mut context = ContextWindow::new(10000);

    // Add 9 alternating messages; the first third is indices 0-2
    for i in 0..9 {
        if i % 2 == 0 {
            context.add_message(Message {
                role: MessageRole::Assistant,
                content: format!("Assistant message {}", i),
            });
        } else {
            // Add tool results with varying sizes
            let content = if i == 1 {
                // Large tool result (> 1000 chars)
                format!("Tool result: {}", "x".repeat(1500))
            } else if i == 3 {
                // Another large tool result (outside the first third)
                format!("Tool result: {}", "y".repeat(2000))
            } else {
                // Small tool result (< 1000 chars)
                format!("Tool result: small result {}", i)
            };

            context.add_message(Message {
                role: MessageRole::User,
                content,
            });
        }
    }

    // Trigger thinning at 50%
    context.used_tokens = 5000;
    let summary = context.thin_context();

    println!("Thinning summary: {}", summary);

    // Only index 1 is a large tool result within the first third,
    // so the summary should report 1 tool result
    assert!(summary.contains("1 tool result"), "Summary was: {}", summary);
    assert!(summary.contains("50%"));

    // Check that large tool results in the first third were replaced
    let first_third_end = context.conversation_history.len() / 3;
    for i in 0..first_third_end {
        if let Some(msg) = context.conversation_history.get(i) {
            if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
                if msg.content.len() > 1000 {
                    panic!("Found un-thinned large tool result at index {}", i);
                }
            }
        }
    }
}

#[test]
fn test_thin_context_no_large_results() {
    let mut context = ContextWindow::new(10000);

    // Add only small messages
    for i in 0..9 {
        context.add_message(Message {
            role: MessageRole::User,
            content: format!("Tool result: small {}", i),
        });
    }

    context.used_tokens = 5000;
    let summary = context.thin_context();

    // Should report no large results found
    assert!(summary.contains("no large tool results found"));
}
|
|
||||||
|
#[test]
|
||||||
|
fn test_thin_context_only_affects_first_third() {
|
||||||
|
let mut context = ContextWindow::new(10000);
|
||||||
|
|
||||||
|
// Add 12 messages (first third = 4 messages)
|
||||||
|
for i in 0..12 {
|
||||||
|
let content = if i % 2 == 1 {
|
||||||
|
// All odd indices are large tool results
|
||||||
|
format!("Tool result: {}", "x".repeat(1500))
|
||||||
|
} else {
|
||||||
|
format!("Assistant message {}", i)
|
||||||
|
};
|
||||||
|
|
||||||
|
let role = if i % 2 == 1 {
|
||||||
|
MessageRole::User
|
||||||
|
} else {
|
||||||
|
MessageRole::Assistant
|
||||||
|
};
|
||||||
|
|
||||||
|
context.add_message(Message { role, content });
|
||||||
|
}
|
||||||
|
|
||||||
|
context.used_tokens = 5000;
|
||||||
|
let summary = context.thin_context();
|
||||||
|
|
||||||
|
// First third is 4 messages (indices 0-3), so only indices 1 and 3 should be thinned
|
||||||
|
// That's 2 tool results
|
||||||
|
assert!(summary.contains("2 tool results"));
|
||||||
|
|
||||||
|
// Check that messages after the first third are NOT thinned
|
||||||
|
let first_third_end = context.conversation_history.len() / 3;
|
||||||
|
for i in first_third_end..context.conversation_history.len() {
|
||||||
|
if let Some(msg) = context.conversation_history.get(i) {
|
||||||
|
if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
|
||||||
|
// These should still be large (not thinned)
|
||||||
|
if i % 2 == 1 {
|
||||||
|
assert!(msg.content.len() > 1000,
|
||||||
|
"Message at index {} should not have been thinned", i);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
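The tests above pin down the observable behavior of `thin_context()`: only the first third of the history is touched, only user messages that start with `Tool result:` and exceed 1000 characters are thinned, and the returned summary reports the count and the context usage percentage. The following is a minimal sketch of logic that would satisfy those assertions. It is not the implementation from this change set: the type definitions, the `total_tokens` field, the `LARGE_RESULT_CHARS` constant, the placeholder text, and the exact summary wording are all assumptions made for illustration.

```rust
// Hypothetical stand-ins for the types the tests exercise.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageRole {
    User,
    Assistant,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: MessageRole,
    pub content: String,
}

pub struct ContextWindow {
    pub conversation_history: Vec<Message>,
    pub used_tokens: u32,
    pub total_tokens: u32, // assumed field name
}

// Assumed threshold, taken from the `> 1000` checks in the tests.
const LARGE_RESULT_CHARS: usize = 1000;

impl ContextWindow {
    pub fn new(total_tokens: u32) -> Self {
        Self {
            conversation_history: Vec::new(),
            used_tokens: 0,
            total_tokens,
        }
    }

    // Simplified: this sketch does no token accounting when a message is added.
    pub fn add_message(&mut self, message: Message) {
        self.conversation_history.push(message);
    }

    /// Replace large tool results in the first third of the history with a
    /// short placeholder and return a human-readable summary.
    pub fn thin_context(&mut self) -> String {
        let first_third_end = self.conversation_history.len() / 3;
        let mut thinned = 0usize;

        for msg in self.conversation_history.iter_mut().take(first_third_end) {
            let is_tool_result = matches!(msg.role, MessageRole::User)
                && msg.content.starts_with("Tool result:");
            if is_tool_result && msg.content.len() > LARGE_RESULT_CHARS {
                // Placeholder format is an assumption of this sketch.
                msg.content = format!(
                    "Tool result: [thinned, {} chars elided]",
                    msg.content.len()
                );
                thinned += 1;
            }
        }

        let percent = if self.total_tokens == 0 {
            0
        } else {
            u64::from(self.used_tokens) * 100 / u64::from(self.total_tokens)
        };

        if thinned == 0 {
            format!("Context at {}%: no large tool results found", percent)
        } else {
            let noun = if thinned == 1 { "tool result" } else { "tool results" };
            format!("Context at {}%: thinned {} {}", percent, thinned, noun)
        }
    }
}
```

Using integer division for `len() / 3` means histories shorter than three messages are never thinned, which matches how the tests compute `first_third_end`.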
39
test-ai-requirements.sh
Executable file
@@ -0,0 +1,39 @@
#!/bin/bash
# Test script for AI-enhanced interactive requirements mode

echo "Testing AI-enhanced interactive requirements mode..."
echo ""

# Create a test workspace
TEST_WORKSPACE="/tmp/g3-test-interactive-$(date +%s)"
mkdir -p "$TEST_WORKSPACE"

echo "Test workspace: $TEST_WORKSPACE"
echo ""

# Create sample brief input
BRIEF_INPUT="build a calculator cli in rust with basic operations"

echo "Brief input:"
echo "---"
echo "$BRIEF_INPUT"
echo "---"
echo ""

echo "This will:"
echo "1. Send brief input to AI"
echo "2. AI generates structured requirements.md"
echo "3. Show enhanced requirements"
echo "4. Prompt for confirmation (y/e/n)"
echo ""

echo "To test manually, run:"
echo "cargo run -- --autonomous --interactive-requirements --workspace $TEST_WORKSPACE"
echo ""
echo "Then type: $BRIEF_INPUT"
echo "Press Ctrl+D"
echo "Review the AI-generated requirements"
echo "Choose 'y' to proceed, 'e' to edit, or 'n' to cancel"
echo ""

echo "Test workspace will be at: $TEST_WORKSPACE"
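The script above describes a three-way confirmation step (`y` to proceed, `e` to edit, `n` to cancel) after the AI-generated requirements are shown. As a rough illustration of how such a prompt answer could be interpreted, here is a small sketch. The `Confirmation` enum and `parse_confirmation` function are hypothetical names, not taken from this change set, and accepting the full words `yes`, `edit`, and `no` is an added assumption.

```rust
/// Hypothetical outcome of the y/e/n confirmation prompt.
#[derive(Debug, PartialEq, Eq)]
pub enum Confirmation {
    Proceed,
    Edit,
    Cancel,
}

/// Interpret one line of user input; returns `None` for anything unrecognized
/// so the caller can ask again instead of guessing.
pub fn parse_confirmation(input: &str) -> Option<Confirmation> {
    match input.trim().to_ascii_lowercase().as_str() {
        "y" | "yes" => Some(Confirmation::Proceed),
        "e" | "edit" => Some(Confirmation::Edit),
        "n" | "no" => Some(Confirmation::Cancel),
        _ => None,
    }
}
```

Returning `Option` rather than defaulting to cancel keeps the decision about unrecognized input with the caller, which can repeat the prompt.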