Compare commits
21 Commits
micn/agent
...
micn/inter
| Author | SHA1 | Date | |
|---|---|---|---|
| | af6d37a8e2 | | |
| | c1c6680e03 | | |
| | f2d8e744bb | | |
| | 010a43d203 | | |
| | 758e255af8 | | |
| | 393826ae02 | | |
| | 3afad3d61f | | |
| | 2488cc54d5 | | |
| | 2ad0c9a3fd | | |
| | 2008a81193 | | |
| | 776f5034b8 | | |
| | 92bece957b | | |
| | 767299ff4e | | |
| | 9d35449be8 | | |
| | da652bf287 | | |
| | a566171203 | | |
| | 347c9e1e00 | | |
| | aa7eda0331 | | |
| | e42c76f3b9 | | |
| | dd211fab1c | | |
| | bcece38473 | | |
1203
Cargo.lock
generated
File diff suppressed because it is too large
@@ -4,7 +4,8 @@ members = [
     "crates/g3-core",
     "crates/g3-providers",
     "crates/g3-config",
-    "crates/g3-execution"
+    "crates/g3-execution",
+    "crates/g3-computer-control"
 ]
 resolver = "2"
 
62
DESIGN.md
@@ -29,7 +29,8 @@ g3/
 │ ├── g3-core/ # Core agent engine, tools, and streaming logic
 │ ├── g3-providers/ # LLM provider abstractions and implementations
 │ ├── g3-config/ # Configuration management
-│ └── g3-execution/ # Code execution engine
+│ ├── g3-execution/ # Code execution engine
+│ └── g3-computer-control/ # Computer control and automation
 ├── logs/ # Session logs (auto-created)
 ├── README.md # Project documentation
 └── DESIGN.md # This design document
@@ -48,6 +49,7 @@ g3/
 │ • Retro TUI │ │ • Tool system │ │ • Embedded │
 │ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
 │ mode │ │ • Task exec │ │ • OAuth flow │
+│ │ │ • TODO mgmt │ │ │
 └─────────────────┘ └─────────────────┘ └─────────────────┘
 │ │ │
 └───────────────────────┼───────────────────────┘
@@ -59,7 +61,18 @@ g3/
 │ • Shell cmds │ │ • Env overrides │
 │ • Streaming │ │ • Provider │
 │ • Error hdlg │ │ settings │
-└─────────────────┘ └─────────────────┘
+└─────────────────┘ │ • Computer │
+│ │ control cfg │
+│ └─────────────────┘
+│ │
+┌─────────────────┐ │
+│ g3-computer- │◄────────────┘
+│ control │
+│ • Mouse/kbd │
+│ • Screenshots │
+│ • OCR/Tesseract │
+│ • Windows/UI │
+└─────────────────┘
 ```
 
 ## Core Components
@@ -79,6 +92,7 @@ g3/
 - **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Session Management**: Automatic session logging with detailed conversation history and token usage
 - **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors
+- **TODO Management**: In-memory TODO list with read/write tools for task tracking
 
 **Available Tools:**
 - `shell`: Execute shell commands with streaming output
@@ -86,7 +100,15 @@ g3/
 - `write_file`: Create or overwrite files with content
 - `str_replace`: Apply unified diffs to files with precise editing
 - `final_output`: Signal task completion with detailed summaries
-- **Project Management**: Workspace handling, requirements.md processing for autonomous mode
+- `todo_read`: Read the entire TODO list content
+- `todo_write`: Write or overwrite the entire TODO list
+- `mouse_click`: Click the mouse at specific coordinates
+- `type_text`: Type text at the current cursor position
+- `find_element`: Find UI elements by text, role, or attributes
+- `take_screenshot`: Capture screenshots of screen, region, or window
+- `extract_text`: Extract text from images or screen regions using OCR
+- `find_text_on_screen`: Find text visually on screen and return coordinates
+- `list_windows`: List all open windows with IDs and titles
 
 ### 2. g3-providers: LLM Provider Abstraction
 
@@ -172,6 +194,26 @@ g3/
 - **Validation**: Configuration validation with helpful error messages
 - **Flexible Paths**: Support for shell expansion (`~`, environment variables)
 
+### 6. g3-computer-control: Computer Control & Automation
+
+**Primary Responsibilities:**
+- Cross-platform computer control and automation
+- Mouse and keyboard input simulation
+- Window management and screenshot capture
+- OCR text extraction from images and screen regions
+
+**Platform Support:**
+- **macOS**: Core Graphics, Cocoa, screencapture integration
+- **Linux**: X11/Xtest for input, X11 for window management
+- **Windows**: Win32 APIs for input and window control
+
+**Key Features:**
+- **OCR Integration**: Tesseract-based text extraction from images
+- **Window Management**: List, identify, and capture specific application windows
+- **UI Automation**: Find elements, simulate clicks, type text
+- **Screenshot Capture**: Full screen, regions, or specific windows
+- **Accessibility**: Requires OS-level permissions for automation
+
 ## Advanced Features
 
 ### Context Window Management
@@ -180,6 +222,7 @@ G3 implements sophisticated context window management:
 
 - **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
 - **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
+- **Context Thinning**: Progressive thinning at 50%, 60%, 70%, 80% thresholds - replaces large tool results with file references
 - **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
 - **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
 - **Cumulative Tracking**: Monitors total token usage across entire sessions
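Reviewer note: the new "Context Thinning" bullet names four percentage thresholds but not how a crossing is detected. A minimal sketch of one way this could work is given here; the function name `crossed_threshold`, the `last_fired` bookkeeping, and the constant are all hypothetical and are not taken from the g3 sources.

```rust
/// Thinning thresholds in percent, copied from the bullet in DESIGN.md.
/// The constant name is an assumption made for this sketch.
const THINNING_THRESHOLDS: [u8; 4] = [50, 60, 70, 80];

/// Returns the highest threshold that the current usage has reached and that
/// lies above the threshold that fired last, or `None` if nothing new was crossed.
pub fn crossed_threshold(used: usize, capacity: usize, last_fired: Option<u8>) -> Option<u8> {
    if capacity == 0 {
        return None; // avoid dividing by zero for an unconfigured window
    }
    // Integer percentage of the context window in use, capped at 100.
    let percent = (used.saturating_mul(100) / capacity).min(100) as u8;
    THINNING_THRESHOLDS
        .iter()
        .copied()
        .filter(|&t| percent >= t && last_fired.map_or(true, |last| t > last))
        .max()
}
```

Tracking `last_fired` means each threshold triggers at most once as usage grows, which is one plausible reading of "progressive" thinning.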
@@ -354,20 +397,23 @@ This design document reflects the current state of G3 as a mature, production-re
 ### Fully Implemented
 - ✅ **Core Agent Engine**: Complete with streaming, tool execution, and context management
 - ✅ **Provider System**: Anthropic, Databricks, and Embedded providers with OAuth support
-- ✅ **Tool System**: All 5 core tools (shell, read_file, write_file, str_replace, final_output)
+- ✅ **Tool System**: 13 tools including file ops, shell, TODO management, and computer control
 - ✅ **CLI Interface**: Interactive mode, single-shot mode, retro TUI
 - ✅ **Autonomous Mode**: Coach-player feedback loop with requirements.md processing
 - ✅ **Configuration**: TOML-based config with environment overrides
 - ✅ **Error Handling**: Comprehensive retry logic and error classification
 - ✅ **Session Logging**: Automatic session tracking and JSON logs
-- ✅ **Context Management**: Auto-summarization at 80% capacity
+- ✅ **Context Management**: Context thinning (50-80%) and auto-summarization at 80% capacity
+- ✅ **Computer Control**: Cross-platform automation with OCR support
+- ✅ **TODO Management**: In-memory TODO list with read/write tools
 
 ### Architecture Highlights
-- **Workspace**: 5 crates with clear separation of concerns
+- **Workspace**: 6 crates with clear separation of concerns
 - **Dependencies**: Modern Rust ecosystem (Tokio, Clap, Serde, etc.)
 - **Streaming**: Real-time response processing with tool call detection
 - **Cross-Platform**: Works on macOS, Linux, and Windows
-- **GPU Support**: Metal acceleration for local models on macOS
+- **GPU Support**: Metal acceleration for local models on macOS, CUDA on Linux
+- **OCR Support**: Tesseract integration for text extraction from images
 
 ### Key Files
 - `src/main.rs`: main entry point delegating to g3-cli
@@ -376,3 +422,5 @@ This design document reflects the current state of G3 as a mature, production-re
 - `crates/g3-providers/src/lib.rs`: provider trait and registry
 - `crates/g3-config/src/lib.rs`: configuration management
 - `crates/g3-execution/src/lib.rs`: code execution engine
+- `crates/g3-computer-control/src/lib.rs`: computer control and automation
+- `crates/g3-computer-control/src/platform/`: platform-specific implementations
59
README.md
@@ -11,8 +11,8 @@ G3 follows a modular architecture organized as a Rust workspace with multiple cr
 #### **g3-core**
 The heart of the agent system, containing:
 - **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
-- **Context Window Management**: Intelligent tracking of token usage with auto-summarization capabilities when approaching context limits (~80% capacity)
-- **Tool System**: Built-in tools for file operations (read, write, edit), shell command execution, and structured output generation
+- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity
+- **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
 - **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Task Execution**: Support for single and iterative task execution with automatic retry logic
 
@@ -40,6 +40,13 @@ Task execution framework:
 - Error handling and retry mechanisms
 - Progress tracking and reporting
 
+#### **g3-computer-control**
+Computer control capabilities:
+- Mouse and keyboard automation
+- UI element inspection and interaction
+- Screenshot capture and window management
+- OCR text extraction via Tesseract
+
 #### **g3-cli**
 Command-line interface:
 - Interactive terminal interface
@@ -61,13 +68,21 @@ G3 includes robust error handling with automatic retry logic:
 ### Intelligent Context Management
 - Automatic context window monitoring with percentage-based tracking
 - Smart auto-summarization when approaching token limits
+- **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
 - Conversation history preservation through summaries
-- Dynamic token allocation for different providers
+- Dynamic token allocation for different providers (4k to 200k+ tokens)
 
 ### Tool Ecosystem
 - **File Operations**: Read, write, and edit files with line-range precision
 - **Shell Integration**: Execute system commands with output capture
 - **Code Generation**: Structured code generation with syntax awareness
+- **TODO Management**: Read and write TODO lists with markdown checkbox format
+- **Computer Control** (Experimental): Automate desktop applications
+  - Mouse and keyboard control
+  - UI element inspection
+  - Screenshot capture and window management
+  - OCR text extraction from images and screen regions
+  - Window listing and identification
 - **Final Output**: Formatted result presentation
 
 ### Provider Flexibility
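Reviewer note: the diff describes TODO management as an in-memory list in markdown checkbox format, with `todo_read` returning the whole content and `todo_write` overwriting it. A minimal sketch under those stated semantics follows; the `TodoList` type and its method names are hypothetical and may not match the g3-core implementation.

```rust
/// Hypothetical in-memory TODO store. The whole list is a single markdown
/// string, matching the "read the entire" / "overwrite the entire" wording.
#[derive(Default)]
pub struct TodoList {
    content: String,
}

impl TodoList {
    /// Counterpart of `todo_write`: replaces the entire list.
    pub fn write(&mut self, content: &str) {
        self.content = content.to_string();
    }

    /// Counterpart of `todo_read`: returns the entire list.
    pub fn read(&self) -> &str {
        &self.content
    }

    /// Counts unchecked markdown checkboxes (`- [ ]`), including nested ones.
    pub fn open_items(&self) -> usize {
        self.content
            .lines()
            .filter(|line| line.trim_start().starts_with("- [ ]"))
            .count()
    }
}
```

Keeping the list as one string keeps both tools trivial; the cost is that every update rewrites the full list.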
@@ -98,10 +113,11 @@ G3 is designed for:
 - Automated code generation and refactoring
 - File manipulation and project scaffolding
 - System administration tasks
 - Data processing and transformation
 - API integration and testing
 - Documentation generation
 - Complex multi-step workflows
+- Desktop application automation and testing
 
 ## Getting Started
 
@@ -116,6 +132,41 @@ cargo run
 g3 "implement a function to calculate fibonacci numbers"
 ```
 
+## WebDriver Browser Automation
+
+G3 includes WebDriver support for browser automation tasks using Safari.
+
+**One-Time Setup** (macOS only):
+
+Safari Remote Automation must be enabled before using WebDriver tools. Run this once:
+
+```bash
+# Option 1: Use the provided script
+./scripts/enable-safari-automation.sh
+
+# Option 2: Enable manually
+safaridriver --enable # Requires password
+
+# Option 3: Enable via Safari UI
+# Safari → Preferences → Advanced → Show Develop menu
+# Then: Develop → Allow Remote Automation
+```
+
+**For detailed setup instructions and troubleshooting**, see [WebDriver Setup Guide](docs/webdriver-setup.md).
+
+**Usage**: Run G3 with the `--webdriver` flag to enable browser automation tools.
+
+## Computer Control (Experimental)
+
+G3 can interact with your computer's GUI for automation tasks:
+
+**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`
+
+**Setup**: Enable in config with `computer_control.enabled = true` and grant OS accessibility permissions:
+- **macOS**: System Preferences → Security & Privacy → Accessibility
+- **Linux**: Ensure X11 or Wayland access
+- **Windows**: Run as administrator (first time only)
+
 ## Session Logs
 
 G3 automatically saves session logs for each interaction in the `logs/` directory. These logs contain:
24
config.coach-player.example.toml
Normal file
@@ -0,0 +1,24 @@
+[providers]
+default_provider = "databricks"
+# Specify different providers for coach and player in autonomous mode
+coach = "databricks" # Provider for coach (code reviewer) - can be more powerful/expensive
+player = "anthropic" # Provider for player (code implementer) - can be faster/cheaper
+
+[providers.databricks]
+host = "https://your-workspace.cloud.databricks.com"
+# token = "your-databricks-token" # Optional - will use OAuth if not provided
+model = "databricks-claude-sonnet-4"
+max_tokens = 4096
+temperature = 0.1
+use_oauth = true
+
+[providers.anthropic]
+api_key = "your-anthropic-api-key"
+model = "claude-3-haiku-20240307" # Using a faster model for player
+max_tokens = 4096
+temperature = 0.3 # Slightly higher temperature for more creative implementations
+
+[agent]
+max_context_length = 8192
+enable_streaming = true
+timeout_seconds = 60
@@ -1,5 +1,10 @@
 [providers]
 default_provider = "databricks"
+# Optional: Specify different providers for coach and player in autonomous mode
+# If not specified, will use default_provider for both
+# coach = "databricks" # Provider for coach (code reviewer)
+# player = "anthropic" # Provider for player (code implementer)
+# Note: Make sure the specified providers are configured below
 
 [providers.databricks]
 host = "https://your-workspace.cloud.databricks.com"
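Reviewer note: the new config comments say that `coach` and `player` are optional and fall back to `default_provider`. A minimal sketch of that fallback is shown here; the `ProvidersConfig` struct and its accessor names are hypothetical stand-ins and are not the types from g3-config.

```rust
/// Hypothetical mirror of the `[providers]` table from the example config.
pub struct ProvidersConfig {
    pub default_provider: String,
    pub coach: Option<String>,
    pub player: Option<String>,
}

impl ProvidersConfig {
    /// Provider used for the coach role, falling back to the default.
    pub fn coach_provider(&self) -> &str {
        self.coach
            .as_deref()
            .unwrap_or(self.default_provider.as_str())
    }

    /// Provider used for the player role, falling back to the default.
    pub fn player_provider(&self) -> &str {
        self.player
            .as_deref()
            .unwrap_or(self.default_provider.as_str())
    }
}
```

With only `player = "anthropic"` set, the coach would resolve to `databricks` in the example config while the player resolves to `anthropic`.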
@@ -13,3 +18,8 @@ use_oauth = true
 max_context_length = 8192
 enable_streaming = true
 timeout_seconds = 60
+
+[computer_control]
+enabled = false # Set to true to enable computer control (requires OS permissions)
+require_confirmation = true
+max_actions_per_second = 5
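Reviewer note: the diff adds `max_actions_per_second = 5` without showing how the limit is applied. One simple interpretation is a minimum delay between consecutive actions, sketched here; the function name is hypothetical, and treating `0` as "no limit" is an assumption of this sketch, not documented behavior.

```rust
use std::time::Duration;

/// Converts a per-second action cap into the minimum spacing between actions.
/// Returns `None` for a cap of 0, which this sketch treats as "no limit".
pub fn min_action_interval(max_actions_per_second: u32) -> Option<Duration> {
    if max_actions_per_second == 0 {
        None
    } else {
        Some(Duration::from_secs(1) / max_actions_per_second)
    }
}
```

Under this reading, the default of 5 would space mouse and keyboard actions at least 200 ms apart.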
@@ -1,7 +1,5 @@
 use anyhow::Result;
 use std::time::{Duration, Instant};
-/// Extract coach feedback by reading from the coach agent's specific log file
-/// Uses the coach agent's session ID to find the exact log file
 
 #[derive(Debug, Clone)]
 struct TurnMetrics {
@@ -21,7 +19,7 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
     // Find max values for scaling
     let max_tokens = turn_metrics.iter().map(|t| t.tokens_used).max().unwrap_or(1);
     let max_time_ms = turn_metrics.iter()
-        .map(|t| t.wall_clock_time.as_millis() as u32)
+        .map(|t| t.wall_clock_time.as_millis().min(u32::MAX as u128) as u32)
         .max()
         .unwrap_or(1);
 
@@ -35,7 +33,7 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
     histogram.push_str(&format!(" {} = Wall Clock Time (max: {:.1}s)\n\n", TIME_CHAR, max_time_ms as f64 / 1000.0));
 
     for metrics in turn_metrics {
-        let turn_time_ms = metrics.wall_clock_time.as_millis() as u32;
+        let turn_time_ms = metrics.wall_clock_time.as_millis().min(u32::MAX as u128) as u32;
 
         // Calculate bar lengths (proportional to max values)
         let token_bar_len = if max_tokens > 0 {
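Reviewer note: the two hunks above replace a bare `as u32` cast with a clamp. `Duration::as_millis` returns a `u128`, and an `as` cast to `u32` keeps only the low 32 bits, so a duration longer than `u32::MAX` milliseconds (about 49.7 days) would wrap around. The sketch below contrasts the two behaviors; the helper names are hypothetical.

```rust
use std::time::Duration;

/// Old behavior: the cast silently discards the high bits.
pub fn truncating_millis(d: Duration) -> u32 {
    d.as_millis() as u32
}

/// New behavior: clamp to `u32::MAX` first, so the cast can no longer wrap.
pub fn saturating_millis(d: Duration) -> u32 {
    d.as_millis().min(u32::MAX as u128) as u32
}
```

For ordinary turn times both helpers agree; they only differ once the duration exceeds the `u32` range, where the clamped version reports the maximum instead of a small wrapped value.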
@@ -99,18 +97,25 @@ fn generate_turn_histogram(turn_metrics: &[TurnMetrics]) -> String {
     histogram
 }
 
-fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_agent: &g3_core::Agent<ConsoleUiWriter>, output: &SimpleOutput) -> Result<String> {
+/// Extract coach feedback by reading from the coach agent's specific log file
+/// Uses the coach agent's session ID to find the exact log file
+fn extract_coach_feedback_from_logs(
+    coach_result: &g3_core::TaskResult,
+    coach_agent: &g3_core::Agent<ConsoleUiWriter>,
+    output: &SimpleOutput,
+) -> Result<String> {
     // CORRECT APPROACH: Get the session ID from the current coach agent
     // and read its specific log file directly
 
     // Get the coach agent's session ID
-    let session_id = coach_agent.get_session_id()
+    let session_id = coach_agent
+        .get_session_id()
         .ok_or_else(|| anyhow::anyhow!("Coach agent has no session ID"))?;
 
     // Construct the log file path for this specific coach session
     let logs_dir = std::path::Path::new("logs");
     let log_file_path = logs_dir.join(format!("g3_session_{}.json", session_id));
 
     // Read the coach agent's specific log file
     if log_file_path.exists() {
         if let Ok(log_content) = std::fs::read_to_string(&log_file_path) {
@@ -122,7 +127,10 @@ fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_a
                     if let Some(last_message) = messages.last() {
                         if let Some(content) = last_message.get("content") {
                             if let Some(content_str) = content.as_str() {
-                                output.print(&format!("✅ Extracted coach feedback from session: {}", session_id));
+                                output.print(&format!(
+                                    "✅ Extracted coach feedback from session: {}",
+                                    session_id
+                                ));
                                 return Ok(content_str.to_string());
                             }
                         }
@@ -133,8 +141,19 @@ fn extract_coach_feedback_from_logs(_coach_result: &g3_core::TaskResult, coach_a
             }
         }
     }
 
-    Err(anyhow::anyhow!("Could not extract feedback from coach session: {}", session_id))
+    // If we couldn't extract from logs, panic with detailed error
+    panic!(
+        "CRITICAL: Could not extract coach feedback from session: {}\n\
+         Log file path: {:?}\n\
+         Log file exists: {}\n\
+         This indicates the coach did not call any tool or the log is corrupted.\n\
+         Coach result response length: {} chars",
+        session_id,
+        log_file_path,
+        log_file_path.exists(),
+        coach_result.response.len()
+    );
 }
 
 use clap::Parser;
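Reviewer note: the hunk above swaps a returned `Err` for a `panic!` in a function whose signature still returns `Result<String>`, so callers can no longer handle the failure. For comparison, the same diagnostics could travel in an error value. The sketch below is one way to do that with the standard library only; `CoachFeedbackError` and its fields are hypothetical and do not exist in the diff.

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical error type carrying the diagnostics from the panic message.
#[derive(Debug)]
pub struct CoachFeedbackError {
    pub session_id: String,
    pub log_file_path: PathBuf,
    pub log_file_exists: bool,
    pub response_len: usize,
}

impl fmt::Display for CoachFeedbackError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // A trailing backslash joins the lines and skips the leading indentation.
        write!(
            f,
            "Could not extract coach feedback from session: {}\n\
             Log file path: {:?}\n\
             Log file exists: {}\n\
             Coach result response length: {} chars",
            self.session_id, self.log_file_path, self.log_file_exists, self.response_len
        )
    }
}

impl std::error::Error for CoachFeedbackError {}
```

Because the type implements `std::error::Error`, it should convert into an `anyhow::Error` through `?`, which would keep the existing `Result<String>` signature meaningful.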
@@ -197,6 +216,10 @@ pub struct Cli {
     #[arg(long, value_name = "TEXT")]
     pub requirements: Option<String>,
 
+    /// Interactive mode: prompt for requirements and save to requirements.md before starting autonomous mode
+    #[arg(long)]
+    pub interactive_requirements: bool,
+
     /// Use retro terminal UI (inspired by 80s sci-fi)
     #[arg(long)]
     pub retro: bool,
@@ -284,6 +307,112 @@ pub async fn run() -> Result<()> {
|
|||||||
|
|
||||||
// Create project model
|
// Create project model
|
||||||
let project = if cli.autonomous {
|
let project = if cli.autonomous {
|
||||||
|
// Handle interactive requirements mode with AI enhancement
|
||||||
|
if cli.interactive_requirements {
|
||||||
|
println!("\n📝 Interactive Requirements Mode");
|
||||||
|
println!("================================\n");
|
||||||
|
println!("Describe what you want to build (can be brief):");
|
||||||
|
println!("Press Ctrl+D (Unix) or Ctrl+Z (Windows) when done.\n");
|
||||||
|
|
||||||
|
use std::io::{self, Read, Write};
|
||||||
|
let mut requirements_input = String::new();
|
||||||
|
io::stdin().read_to_string(&mut requirements_input)?;
|
||||||
|
|
||||||
|
if requirements_input.trim().is_empty() {
|
||||||
|
anyhow::bail!("No requirements provided. Exiting.");
|
||||||
|
}
|
||||||
|
|
||||||
|
println!("\n🤖 Enhancing your requirements with AI...\n");
|
||||||
|
|
||||||
|
// Create a temporary agent to enhance the requirements
|
||||||
|
let temp_config = Config::load_with_overrides(
|
||||||
|
cli.config.as_deref(),
|
||||||
|
cli.provider.clone(),
|
||||||
|
cli.model.clone(),
|
||||||
|
)?;
|
||||||
|
|
||||||
|
let ui_writer = ConsoleUiWriter::new();
|
||||||
|
let mut temp_agent = Agent::new_with_readme_and_quiet(
|
||||||
|
temp_config,
|
||||||
|
ui_writer,
|
||||||
|
None,
|
||||||
|
true, // quiet mode
|
||||||
|
).await?;
|
||||||
|
|
||||||
|
// Craft the enhancement prompt
|
||||||
|
let enhancement_prompt = format!(
|
||||||
|
r#"You are a requirements analyst. Take this brief user input and expand it into a structured requirements document.
|
||||||
|
|
||||||
|
USER INPUT:
|
||||||
|
{}
|
||||||
|
|
||||||
|
Create a professional requirements document with:
|
||||||
|
1. A clear project title (# heading)
|
||||||
|
2. An overview section explaining what will be built
|
||||||
|
3. Organized requirements (functional, technical, quality)
|
||||||
|
4. Acceptance criteria
|
||||||
|
5. Any technical constraints or preferences mentioned
|
||||||
|
|
||||||
|
Format as proper markdown. Be specific and actionable. If the user's input is vague, make reasonable assumptions but keep it focused on what they described.
|
||||||
|
|
||||||
|
Output ONLY the markdown content, no explanations or meta-commentary."#,
|
||||||
|
requirements_input.trim()
|
||||||
|
);
|
||||||
|
|
||||||
|
// Execute enhancement task
|
||||||
|
let result = temp_agent
|
||||||
|
.execute_task_with_timing(&enhancement_prompt, None, false, false, false, false)
|
||||||
|
.await?;
|
||||||
|
|
||||||
|
let enhanced_requirements = result.response.trim().to_string();
|
||||||
|
|
||||||
|
// Show the enhanced requirements
|
||||||
|
println!("\n📋 Enhanced Requirements Document:");
|
||||||
|
println!("{}\n", "=".repeat(60));
|
||||||
|
println!("{}", enhanced_requirements);
|
||||||
|
println!("{}\n", "=".repeat(60));
|
||||||
|
|
||||||
|
// Ask for confirmation
|
||||||
|
println!("\n❓ Is this requirements document acceptable?");
|
||||||
|
println!(" [y] Yes, proceed with autonomous mode");
|
||||||
|
println!(" [e] Edit and save manually");
|
||||||
|
println!(" [n] No, cancel\n");
|
||||||
|
|
||||||
|
print!("Your choice (y/e/n): ");
|
||||||
|
io::stdout().flush()?;
|
||||||
|
|
||||||
|
let mut choice = String::new();
|
||||||
|
io::stdin().read_line(&mut choice)?;
|
||||||
|
let choice = choice.trim().to_lowercase();
|
||||||
|
|
||||||
|
let requirements_path = workspace_dir.join("requirements.md");
|
||||||
|
|
||||||
|
match choice.as_str() {
|
||||||
|
"y" | "yes" => {
|
||||||
|
// Save enhanced requirements
|
||||||
|
std::fs::write(&requirements_path, &enhanced_requirements)?;
|
||||||
|
println!("\n✅ Requirements saved to: {}", requirements_path.display());
|
||||||
|
println!("🚀 Starting autonomous mode...\n");
|
||||||
|
}
|
||||||
|
"e" | "edit" => {
|
||||||
|
// Save enhanced requirements for manual editing
|
||||||
|
std::fs::write(&requirements_path, &enhanced_requirements)?;
|
||||||
|
println!("\n✅ Requirements saved to: {}", requirements_path.display());
|
||||||
|
println!("📝 Please edit the file and run: g3 --autonomous");
|
||||||
|
println!(" Exiting for now.\n");
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
"n" | "no" => {
|
||||||
|
println!("\n❌ Cancelled. No files were saved.\n");
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
_ => {
|
||||||
|
println!("\n❌ Invalid choice. Cancelled.\n");
|
||||||
|
return Ok(());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
if let Some(requirements_text) = cli.requirements {
|
if let Some(requirements_text) = cli.requirements {
|
||||||
// Use requirements text override
|
// Use requirements text override
|
||||||
Project::new_autonomous_with_requirements(workspace_dir.clone(), requirements_text)?
|
Project::new_autonomous_with_requirements(workspace_dir.clone(), requirements_text)?
|
||||||
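Review note: the y/e/n confirmation added in the hunk above matches on raw strings inline. A minimal sketch of pulling that decision into a small, testable function follows. The names `Choice` and `parse_choice` are hypothetical and do not appear in this change; the accepted spellings are copied from the `match` arms above.

```rust
/// Hypothetical enum for the three answers the prompt accepts.
#[derive(Debug, PartialEq)]
enum Choice {
    Proceed,      // "y" / "yes": save and continue into autonomous mode
    EditManually, // "e" / "edit": save, then exit so the user can edit
    Cancel,       // "n" / "no": exit without saving
}

/// Hypothetical helper: normalize the raw input line the same way the diff
/// does (trim, lowercase) and map it to a `Choice`. Returns `None` for
/// anything else, which the caller can treat as "invalid choice".
fn parse_choice(input: &str) -> Option<Choice> {
    match input.trim().to_lowercase().as_str() {
        "y" | "yes" => Some(Choice::Proceed),
        "e" | "edit" => Some(Choice::EditManually),
        "n" | "no" => Some(Choice::Cancel),
        _ => None,
    }
}
```

With a helper like this, the file writes and early returns could stay in `run()`, while the input handling gets unit tests that need no stdin.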
@@ -309,14 +438,15 @@ pub async fn run() -> Result<()> {
|
|||||||
cli.provider.clone(),
|
cli.provider.clone(),
|
||||||
cli.model.clone(),
|
cli.model.clone(),
|
||||||
)?;
|
)?;
|
||||||
|
|
||||||
// Validate provider if specified
|
// Validate provider if specified
|
||||||
if let Some(ref provider) = cli.provider {
|
if let Some(ref provider) = cli.provider {
|
||||||
let valid_providers = ["anthropic", "databricks", "embedded", "openai"];
|
let valid_providers = ["anthropic", "databricks", "embedded", "openai"];
|
||||||
if !valid_providers.contains(&provider.as_str()) {
|
if !valid_providers.contains(&provider.as_str()) {
|
||||||
return Err(anyhow::anyhow!(
|
return Err(anyhow::anyhow!(
|
||||||
"Invalid provider '{}'. Valid options: {:?}",
|
"Invalid provider '{}'. Valid options: {:?}",
|
||||||
provider, valid_providers
|
provider,
|
||||||
|
valid_providers
|
||||||
));
|
));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -335,9 +465,21 @@ pub async fn run() -> Result<()> {
|
|||||||
};
|
};
|
||||||
|
|
||||||
let mut agent = if cli.autonomous {
|
let mut agent = if cli.autonomous {
|
||||||
Agent::new_autonomous_with_readme_and_quiet(config.clone(), ui_writer, combined_content.clone(), cli.quiet).await?
|
Agent::new_autonomous_with_readme_and_quiet(
|
||||||
|
config.clone(),
|
||||||
|
ui_writer,
|
||||||
|
combined_content.clone(),
|
||||||
|
cli.quiet,
|
||||||
|
)
|
||||||
|
.await?
|
||||||
} else {
|
} else {
|
||||||
Agent::new_with_readme_and_quiet(config.clone(), ui_writer, combined_content.clone(), cli.quiet).await?
|
Agent::new_with_readme_and_quiet(
|
||||||
|
config.clone(),
|
||||||
|
ui_writer,
|
||||||
|
combined_content.clone(),
|
||||||
|
cli.quiet,
|
||||||
|
)
|
||||||
|
.await?
|
||||||
};
|
};
|
||||||
|
|
||||||
// Execute task, autonomous mode, or start interactive mode
|
// Execute task, autonomous mode, or start interactive mode
|
||||||
@@ -374,7 +516,7 @@ pub async fn run() -> Result<()> {
|
|||||||
if cli.retro {
|
if cli.retro {
|
||||||
// Use retro terminal UI
|
// Use retro terminal UI
|
||||||
run_interactive_retro(
|
run_interactive_retro(
|
||||||
config, // Already has overrides applied
|
config, // Already has overrides applied
|
||||||
cli.show_prompt,
|
cli.show_prompt,
|
||||||
cli.show_code,
|
cli.show_code,
|
||||||
cli.theme,
|
cli.theme,
|
||||||
@@ -1119,7 +1261,10 @@ async fn run_autonomous(
|
|||||||
output.print("❌ Error: requirements.md not found in workspace directory");
|
output.print("❌ Error: requirements.md not found in workspace directory");
|
||||||
output.print(" Please either:");
|
output.print(" Please either:");
|
||||||
output.print(" 1. Create a requirements.md file with your project requirements at:");
|
output.print(" 1. Create a requirements.md file with your project requirements at:");
|
||||||
output.print(&format!(" {}/requirements.md", project.workspace().display()));
|
output.print(&format!(
|
||||||
|
" {}/requirements.md",
|
||||||
|
project.workspace().display()
|
||||||
|
));
|
||||||
output.print(" 2. Or use the --requirements flag to provide requirements text directly:");
|
output.print(" 2. Or use the --requirements flag to provide requirements text directly:");
|
||||||
output.print(" g3 --autonomous --requirements \"Your requirements here\"");
|
output.print(" g3 --autonomous --requirements \"Your requirements here\"");
|
||||||
output.print("");
|
output.print("");
|
||||||
@@ -1254,11 +1399,17 @@ async fn run_autonomous(
|
|||||||
// If there's no coach feedback on subsequent turns, this is an error
|
// If there's no coach feedback on subsequent turns, this is an error
|
||||||
if coach_feedback.is_empty() {
|
if coach_feedback.is_empty() {
|
||||||
if turn > 1 {
|
if turn > 1 {
|
||||||
return Err(anyhow::anyhow!("Player mode error: No coach feedback received on turn {}", turn));
|
return Err(anyhow::anyhow!(
|
||||||
|
"Player mode error: No coach feedback received on turn {}",
|
||||||
|
turn
|
||||||
|
));
|
||||||
}
|
}
|
||||||
output.print("📋 Player starting initial implementation (no prior coach feedback)");
|
output.print("📋 Player starting initial implementation (no prior coach feedback)");
|
||||||
} else {
|
} else {
|
||||||
output.print(&format!("📋 Player received coach feedback ({} chars):", coach_feedback.len()));
|
output.print(&format!(
|
||||||
|
"📋 Player received coach feedback ({} chars):",
|
||||||
|
coach_feedback.len()
|
||||||
|
));
|
||||||
output.print(&format!("{}", coach_feedback));
|
output.print(&format!("{}", coach_feedback));
|
||||||
}
|
}
|
||||||
output.print(""); // Empty line for readability
|
output.print(""); // Empty line for readability
|
||||||
@@ -1356,7 +1507,7 @@ async fn run_autonomous(
|
|||||||
));
|
));
|
||||||
// Record turn metrics before incrementing
|
// Record turn metrics before incrementing
|
||||||
let turn_duration = turn_start_time.elapsed();
|
let turn_duration = turn_start_time.elapsed();
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
turn_metrics.push(TurnMetrics {
|
turn_metrics.push(TurnMetrics {
|
||||||
turn_number: turn,
|
turn_number: turn,
|
||||||
tokens_used: turn_tokens,
|
tokens_used: turn_tokens,
|
||||||
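Review note: the switch from `-` to `saturating_sub` matters if the context window's token counter can ever be lower at the end of a turn than at its start. That could plausibly happen after the context is compacted, but this is an assumption, not something the diff states. The sketch below illustrates the difference. The function name and the `u32` type are assumptions; the real type of `used_tokens` is not visible in this diff.

```rust
/// Hypothetical helper mirroring the changed line: tokens consumed during a
/// turn, clamped to 0 when the counter went backwards instead of underflowing.
fn turn_tokens(used_now: u32, used_at_start: u32) -> u32 {
    used_now.saturating_sub(used_at_start)
}

/// For comparison: `checked_sub` reports the backwards case explicitly, which
/// may be preferable if a shrinking counter should be logged rather than hidden.
fn turn_tokens_checked(used_now: u32, used_at_start: u32) -> Option<u32> {
    used_now.checked_sub(used_at_start)
}
```

With plain `-` on unsigned integers, the backwards case panics in debug builds and wraps to a very large value in release builds, so either variant above avoids corrupting the turn metrics.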
@@ -1382,9 +1533,15 @@ async fn run_autonomous(
|
|||||||
|
|
||||||
// Create a new agent instance for coach mode to ensure fresh context
|
// Create a new agent instance for coach mode to ensure fresh context
|
||||||
// Use the same config with overrides that was passed to the player agent
|
// Use the same config with overrides that was passed to the player agent
|
||||||
let config = agent.get_config().clone();
|
let base_config = agent.get_config().clone();
|
||||||
|
let coach_config = base_config.for_coach()?;
|
||||||
|
|
||||||
|
// Reset filter suppression state before creating coach agent
|
||||||
|
g3_core::fixed_filter_json::reset_fixed_json_tool_state();
|
||||||
|
|
||||||
let ui_writer = ConsoleUiWriter::new();
|
let ui_writer = ConsoleUiWriter::new();
|
||||||
let mut coach_agent = Agent::new_autonomous_with_readme_and_quiet(config, ui_writer, None, quiet).await?;
|
let mut coach_agent =
|
||||||
|
Agent::new_autonomous_with_readme_and_quiet(coach_config, ui_writer, None, quiet).await?;
|
||||||
|
|
||||||
// Ensure coach agent is also in the workspace directory
|
// Ensure coach agent is also in the workspace directory
|
||||||
project.enter_workspace()?;
|
project.enter_workspace()?;
|
||||||
@@ -1414,13 +1571,13 @@ CRITICAL INSTRUCTIONS:
|
|||||||
3. Focus ONLY on what needs to be fixed or improved
|
3. Focus ONLY on what needs to be fixed or improved
|
||||||
4. Do NOT include your analysis process, file contents, or compilation output in the summary
|
4. Do NOT include your analysis process, file contents, or compilation output in the summary
|
||||||
|
|
||||||
If the implementation correctly meets all requirements and compiles without errors:
|
If the implementation generally meets all requirements and compiles without errors:
|
||||||
- Call final_output with summary: 'IMPLEMENTATION_APPROVED'
|
- Call final_output with summary: 'IMPLEMENTATION_APPROVED'
|
||||||
|
|
||||||
If improvements are needed:
|
If improvements are needed:
|
||||||
- Call final_output with a brief summary listing ONLY the specific issues to fix
|
- Call final_output with a brief summary listing ONLY the specific issues to fix
|
||||||
|
|
||||||
Remember: Be thorough in your review but concise in your feedback. APPROVE if the implementation works and generally fits the requirements.",
|
Remember: Be clear in your review and concise in your feedback. APPROVE if the implementation works and generally fits the requirements. Don't be picky.",
|
||||||
requirements
|
requirements
|
||||||
);
|
);
|
||||||
|
|
||||||
@@ -1511,7 +1668,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
||||||
// Record turn metrics before incrementing
|
// Record turn metrics before incrementing
|
||||||
let turn_duration = turn_start_time.elapsed();
|
let turn_duration = turn_start_time.elapsed();
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
turn_metrics.push(TurnMetrics {
|
turn_metrics.push(TurnMetrics {
|
||||||
turn_number: turn,
|
turn_number: turn,
|
||||||
tokens_used: turn_tokens,
|
tokens_used: turn_tokens,
|
||||||
@@ -1531,7 +1688,8 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
let coach_result = coach_result_opt.unwrap();
|
let coach_result = coach_result_opt.unwrap();
|
||||||
|
|
||||||
// Extract the complete coach feedback from final_output
|
// Extract the complete coach feedback from final_output
|
||||||
let coach_feedback_text = extract_coach_feedback_from_logs(&coach_result, &coach_agent, &output)?;
|
let coach_feedback_text =
|
||||||
|
extract_coach_feedback_from_logs(&coach_result, &coach_agent, &output)?;
|
||||||
|
|
||||||
// Log the size of the feedback for debugging
|
// Log the size of the feedback for debugging
|
||||||
info!(
|
info!(
|
||||||
@@ -1546,7 +1704,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
coach_feedback = "The implementation needs review. Please ensure all requirements are met and the code compiles without errors.".to_string();
|
||||||
// Record turn metrics before incrementing
|
// Record turn metrics before incrementing
|
||||||
let turn_duration = turn_start_time.elapsed();
|
let turn_duration = turn_start_time.elapsed();
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
turn_metrics.push(TurnMetrics {
|
turn_metrics.push(TurnMetrics {
|
||||||
turn_number: turn,
|
turn_number: turn,
|
||||||
tokens_used: turn_tokens,
|
tokens_used: turn_tokens,
|
||||||
@@ -1577,7 +1735,7 @@ Remember: Be thorough in your review but concise in your feedback. APPROVE if th
|
|||||||
coach_feedback = coach_feedback_text;
|
coach_feedback = coach_feedback_text;
|
||||||
// Record turn metrics before incrementing
|
// Record turn metrics before incrementing
|
||||||
let turn_duration = turn_start_time.elapsed();
|
let turn_duration = turn_start_time.elapsed();
|
||||||
let turn_tokens = agent.get_context_window().used_tokens - turn_start_tokens;
|
let turn_tokens = agent.get_context_window().used_tokens.saturating_sub(turn_start_tokens);
|
||||||
turn_metrics.push(TurnMetrics {
|
turn_metrics.push(TurnMetrics {
|
||||||
turn_number: turn,
|
turn_number: turn,
|
||||||
tokens_used: turn_tokens,
|
tokens_used: turn_tokens,
|
||||||
|
|||||||
@@ -10,6 +10,7 @@ pub struct ConsoleUiWriter {
|
|||||||
current_tool_args: Mutex<Vec<(String, String)>>,
|
current_tool_args: Mutex<Vec<(String, String)>>,
|
||||||
current_output_line: Mutex<Option<String>>,
|
current_output_line: Mutex<Option<String>>,
|
||||||
output_line_printed: Mutex<bool>,
|
output_line_printed: Mutex<bool>,
|
||||||
|
in_todo_tool: Mutex<bool>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl ConsoleUiWriter {
|
impl ConsoleUiWriter {
|
||||||
@@ -19,6 +20,60 @@ impl ConsoleUiWriter {
|
|||||||
current_tool_args: Mutex::new(Vec::new()),
|
current_tool_args: Mutex::new(Vec::new()),
|
||||||
current_output_line: Mutex::new(None),
|
current_output_line: Mutex::new(None),
|
||||||
output_line_printed: Mutex::new(false),
|
output_line_printed: Mutex::new(false),
|
||||||
|
in_todo_tool: Mutex::new(false),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn print_todo_line(&self, line: &str) {
|
||||||
|
// Transform and print todo list lines elegantly
|
||||||
|
let trimmed = line.trim();
|
||||||
|
|
||||||
|
// Skip the "📝 TODO list:" prefix line
|
||||||
|
if trimmed.starts_with("📝 TODO list:") || trimmed == "📝 TODO list is empty" {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Handle empty lines
|
||||||
|
if trimmed.is_empty() {
|
||||||
|
println!();
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Detect indentation level
|
||||||
|
let indent_count = line.chars().take_while(|c| c.is_whitespace()).count();
|
||||||
|
let indent = " ".repeat(indent_count / 2); // Convert spaces to visual indent
|
||||||
|
|
||||||
|
// Format based on line type
|
||||||
|
if trimmed.starts_with("- [ ]") {
|
||||||
|
// Incomplete task
|
||||||
|
let task = trimmed.strip_prefix("- [ ]").unwrap_or(trimmed).trim();
|
||||||
|
println!("{}☐ {}", indent, task);
|
||||||
|
} else if trimmed.starts_with("- [x]") || trimmed.starts_with("- [X]") {
|
||||||
|
// Completed task
|
||||||
|
let task = trimmed.strip_prefix("- [x]")
|
||||||
|
.or_else(|| trimmed.strip_prefix("- [X]"))
|
||||||
|
.unwrap_or(trimmed)
|
||||||
|
.trim();
|
||||||
|
println!("{}\x1b[2m☑ {}\x1b[0m", indent, task);
|
||||||
|
} else if trimmed.starts_with("- ") {
|
||||||
|
// Regular bullet point
|
||||||
|
let item = trimmed.strip_prefix("- ").unwrap_or(trimmed).trim();
|
||||||
|
println!("{}• {}", indent, item);
|
||||||
|
} else if trimmed.starts_with("# ") {
|
||||||
|
// Heading
|
||||||
|
let heading = trimmed.strip_prefix("# ").unwrap_or(trimmed).trim();
|
||||||
|
println!("\n\x1b[1m{}\x1b[0m", heading);
|
||||||
|
} else if trimmed.starts_with("## ") {
|
||||||
|
// Subheading
|
||||||
|
let subheading = trimmed.strip_prefix("## ").unwrap_or(trimmed).trim();
|
||||||
|
println!("\n\x1b[1m{}\x1b[0m", subheading);
|
||||||
|
} else if trimmed.starts_with("**") && trimmed.ends_with("**") {
|
||||||
|
// Bold text (section marker)
|
||||||
|
let text = trimmed.trim_start_matches("**").trim_end_matches("**");
|
||||||
|
println!("{}\x1b[1m{}\x1b[0m", indent, text);
|
||||||
|
} else {
|
||||||
|
// Regular text or note
|
||||||
|
println!("{}{}", indent, trimmed);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
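Review note: `print_todo_line` above mixes classification and printing, which makes it hard to test. A minimal sketch of a pure variant that returns the text instead of printing it follows. `format_todo_line` is a hypothetical name, and indentation, headings, and bold markers are left out to keep the example short.

```rust
/// Hypothetical pure variant of the todo-line formatting: returns the text
/// to print, or `None` when the line should be suppressed.
fn format_todo_line(line: &str) -> Option<String> {
    let trimmed = line.trim();

    // The tool's own header lines are skipped, as in the diff.
    if trimmed.starts_with("📝 TODO list:") || trimmed == "📝 TODO list is empty" {
        return None;
    }

    // Checkbox prefixes must be tested before the plain "- " bullet,
    // because every checkbox line also starts with "- ".
    let text = if let Some(task) = trimmed.strip_prefix("- [ ]") {
        format!("☐ {}", task.trim())
    } else if let Some(task) = trimmed
        .strip_prefix("- [x]")
        .or_else(|| trimmed.strip_prefix("- [X]"))
    {
        // Completed tasks are dimmed with ANSI SGR code 2.
        format!("\x1b[2m☑ {}\x1b[0m", task.trim())
    } else if let Some(item) = trimmed.strip_prefix("- ") {
        format!("• {}", item.trim())
    } else {
        trimmed.to_string()
    };
    Some(text)
}
```

Using `strip_prefix` directly in the condition also removes the `starts_with` plus `strip_prefix(...).unwrap_or(trimmed)` pairs, where the fallback can never trigger.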
@@ -53,6 +108,15 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
// Store the tool name and clear args for collection
|
// Store the tool name and clear args for collection
|
||||||
*self.current_tool_name.lock().unwrap() = Some(tool_name.to_string());
|
*self.current_tool_name.lock().unwrap() = Some(tool_name.to_string());
|
||||||
self.current_tool_args.lock().unwrap().clear();
|
self.current_tool_args.lock().unwrap().clear();
|
||||||
|
|
||||||
|
// Check if this is a todo tool call
|
||||||
|
let is_todo = tool_name == "todo_read" || tool_name == "todo_write";
|
||||||
|
*self.in_todo_tool.lock().unwrap() = is_todo;
|
||||||
|
|
||||||
|
// For todo tools, we'll skip the normal header and print a custom one later
|
||||||
|
if is_todo {
|
||||||
|
return;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_arg(&self, key: &str, value: &str) {
|
fn print_tool_arg(&self, key: &str, value: &str) {
|
||||||
@@ -75,6 +139,12 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_output_header(&self) {
|
fn print_tool_output_header(&self) {
|
||||||
|
// Skip normal header for todo tools
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
println!(); // Just add a newline
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
println!();
|
println!();
|
||||||
// Now print the tool header with the most important arg in bold green
|
// Now print the tool header with the most important arg in bold green
|
||||||
if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
|
if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
|
||||||
@@ -115,8 +185,8 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
String::new()
|
String::new()
|
||||||
};
|
};
|
||||||
|
|
||||||
// Print with bold green formatting using ANSI escape codes
|
// Print with bold green tool name, purple (non-bold) for pipe and args
|
||||||
println!("┌─\x1b[1;32m {} | {}{}\x1b[0m", tool_name, display_value, header_suffix);
|
println!("┌─\x1b[1;32m {}\x1b[0m\x1b[35m | {}{}\x1b[0m", tool_name, display_value, header_suffix);
|
||||||
} else {
|
} else {
|
||||||
// Print with bold green formatting using ANSI escape codes
|
// Print with bold green formatting using ANSI escape codes
|
||||||
println!("┌─\x1b[1;32m {}\x1b[0m", tool_name);
|
println!("┌─\x1b[1;32m {}\x1b[0m", tool_name);
|
||||||
@@ -144,10 +214,21 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_output_line(&self, line: &str) {
|
fn print_tool_output_line(&self, line: &str) {
|
||||||
|
// Special handling for todo tools
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
self.print_todo_line(line);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
println!("│ \x1b[2m{}\x1b[0m", line);
|
println!("│ \x1b[2m{}\x1b[0m", line);
|
||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_output_summary(&self, count: usize) {
|
fn print_tool_output_summary(&self, count: usize) {
|
||||||
|
// Skip for todo tools
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
println!(
|
println!(
|
||||||
"│ \x1b[2m({} line{})\x1b[0m",
|
"│ \x1b[2m({} line{})\x1b[0m",
|
||||||
count,
|
count,
|
||||||
@@ -156,7 +237,55 @@ impl UiWriter for ConsoleUiWriter {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn print_tool_timing(&self, duration_str: &str) {
|
fn print_tool_timing(&self, duration_str: &str) {
|
||||||
println!("└─ ⚡️ {}", duration_str);
|
// For todo tools, print a blank line, reset the todo flag, and skip the timing footer
|
||||||
|
if *self.in_todo_tool.lock().unwrap() {
|
||||||
|
println!();
|
||||||
|
*self.in_todo_tool.lock().unwrap() = false;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse the duration string to determine color
|
||||||
|
// Format is like "1.5s", "500ms", "2m 30.0s"
|
||||||
|
let color_code = if duration_str.ends_with("ms") {
|
||||||
|
// Milliseconds - use default color (< 1s)
|
||||||
|
""
|
||||||
|
} else if duration_str.contains('m') {
|
||||||
|
// Contains minutes
|
||||||
|
// Extract minutes value
|
||||||
|
if let Some(m_pos) = duration_str.find('m') {
|
||||||
|
if let Ok(minutes) = duration_str[..m_pos].trim().parse::<u32>() {
|
||||||
|
if minutes >= 5 {
|
||||||
|
"\x1b[31m" // Red for >= 5 minutes
|
||||||
|
} else {
|
||||||
|
"\x1b[38;5;208m" // Orange for >= 1 minute but < 5 minutes
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color if parsing fails
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color if 'm' not found (shouldn't happen)
|
||||||
|
}
|
||||||
|
} else if duration_str.ends_with('s') {
|
||||||
|
// Seconds only
|
||||||
|
if let Some(s_value) = duration_str.strip_suffix('s') {
|
||||||
|
if let Ok(seconds) = s_value.trim().parse::<f64>() {
|
||||||
|
if seconds >= 1.0 {
|
||||||
|
"\x1b[33m" // Yellow for >= 1 second
|
||||||
|
} else {
|
||||||
|
"" // Default color for < 1 second
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color if parsing fails
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
"" // Default color
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// Unrecognized format (milliseconds are handled above) - use default color
|
||||||
|
""
|
||||||
|
};
|
||||||
|
|
||||||
|
println!("└─ ⚡️ {}{}\x1b[0m", color_code, duration_str);
|
||||||
println!();
|
println!();
|
||||||
// Clear the stored tool info
|
// Clear the stored tool info
|
||||||
*self.current_tool_name.lock().unwrap() = None;
|
*self.current_tool_name.lock().unwrap() = None;
|
||||||
|
|||||||
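Review note: the duration-to-color branching in `print_tool_timing` above nests several `if let` levels, and most of them fall back to the same empty string. A minimal sketch of the same thresholds as a standalone function follows. `timing_color` is a hypothetical name, and the input formats ("500ms", "1.5s", "2m 30.0s") are taken from the comment in the diff.

```rust
/// Hypothetical helper: map a duration string to an ANSI color prefix.
/// An empty string means "keep the default color".
fn timing_color(duration_str: &str) -> &'static str {
    // Milliseconds are always below one second.
    if duration_str.ends_with("ms") {
        return "";
    }

    // A minutes component such as "2m 30.0s": color by whole minutes.
    if let Some((minutes, _rest)) = duration_str.split_once('m') {
        return match minutes.trim().parse::<u32>() {
            Ok(m) if m >= 5 => "\x1b[31m", // red: 5 minutes or more
            Ok(_) => "\x1b[38;5;208m",     // orange: under 5 minutes
            Err(_) => "",                  // unparseable: default color
        };
    }

    // Seconds only, such as "1.5s".
    match duration_str.strip_suffix('s').map(|s| s.trim().parse::<f64>()) {
        Some(Ok(secs)) if secs >= 1.0 => "\x1b[33m", // yellow: 1 second or more
        _ => "",
    }
}
```

The call site would then reduce to `println!("└─ ⚡️ {}{}\x1b[0m", timing_color(duration_str), duration_str);`, and the thresholds could be covered by unit tests.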
46
crates/g3-computer-control/Cargo.toml
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
[package]
|
||||||
|
name = "g3-computer-control"
|
||||||
|
version = "0.1.0"
|
||||||
|
edition = "2021"
|
||||||
|
|
||||||
|
[dependencies]
|
||||||
|
# Workspace dependencies
|
||||||
|
tokio = { workspace = true }
|
||||||
|
anyhow = { workspace = true }
|
||||||
|
thiserror = { workspace = true }
|
||||||
|
serde = { workspace = true }
|
||||||
|
serde_json = { workspace = true }
|
||||||
|
tracing = { workspace = true }
|
||||||
|
uuid = { workspace = true }
|
||||||
|
|
||||||
|
shellexpand = "3.1"
|
||||||
|
# Async trait support
|
||||||
|
async-trait = "0.1"
|
||||||
|
|
||||||
|
# WebDriver support
|
||||||
|
fantoccini = "0.21"
|
||||||
|
|
||||||
|
# OCR dependencies
|
||||||
|
tesseract = "0.14"
|
||||||
|
|
||||||
|
# macOS dependencies
|
||||||
|
[target.'cfg(target_os = "macos")'.dependencies]
|
||||||
|
core-graphics = "0.23"
|
||||||
|
core-foundation = "0.9"
|
||||||
|
cocoa = "0.25"
|
||||||
|
objc = "0.2"
|
||||||
|
image = "0.24"
|
||||||
|
|
||||||
|
# Linux dependencies
|
||||||
|
[target.'cfg(target_os = "linux")'.dependencies]
|
||||||
|
x11 = { version = "2.21", features = ["xlib", "xtest"] }
|
||||||
|
image = "0.24"
|
||||||
|
|
||||||
|
# Windows dependencies
|
||||||
|
[target.'cfg(target_os = "windows")'.dependencies]
|
||||||
|
windows = { version = "0.52", features = [
|
||||||
|
"Win32_Foundation",
|
||||||
|
"Win32_UI_WindowsAndMessaging",
|
||||||
|
"Win32_UI_Input_KeyboardAndMouse",
|
||||||
|
"Win32_Graphics_Gdi",
|
||||||
|
] }
|
||||||
46
crates/g3-computer-control/examples/debug_screenshot.rs
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
use core_graphics::display::CGDisplay;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
let display = CGDisplay::main();
|
||||||
|
let image = display.image().expect("Failed to capture screen");
|
||||||
|
|
||||||
|
println!("CGImage properties:");
|
||||||
|
println!(" Width: {}", image.width());
|
||||||
|
println!(" Height: {}", image.height());
|
||||||
|
println!(" Bits per component: {}", image.bits_per_component());
|
||||||
|
println!(" Bits per pixel: {}", image.bits_per_pixel());
|
||||||
|
println!(" Bytes per row: {}", image.bytes_per_row());
|
||||||
|
|
||||||
|
let data = image.data();
|
||||||
|
let expected_size = image.width() * image.height() * 4;
|
||||||
|
println!(" Data length: {}", data.len());
|
||||||
|
println!(" Expected (w*h*4): {}", expected_size);
|
||||||
|
|
||||||
|
// Check if there's padding in rows
|
||||||
|
let bytes_per_row = image.bytes_per_row();
|
||||||
|
let width = image.width();
|
||||||
|
let expected_bytes_per_row = width * 4;
|
||||||
|
println!("\nRow alignment:");
|
||||||
|
println!(" Actual bytes per row: {}", bytes_per_row);
|
||||||
|
println!(" Expected (width * 4): {}", expected_bytes_per_row);
|
||||||
|
println!(" Padding per row: {}", bytes_per_row - expected_bytes_per_row);
|
||||||
|
|
||||||
|
// Sample some pixels from different locations
|
||||||
|
println!("\nFirst 3 pixels (raw bytes):");
|
||||||
|
for i in 0..3 {
|
||||||
|
let offset = i * 4;
|
||||||
|
println!(" Pixel {}: [{:3}, {:3}, {:3}, {:3}]",
|
||||||
|
i, data[offset], data[offset+1], data[offset+2], data[offset+3]);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check a pixel from the middle
|
||||||
|
let mid_row = image.height() / 2;
|
||||||
|
let mid_col = image.width() / 2;
|
||||||
|
let mid_offset = (mid_row * bytes_per_row + mid_col * 4) as usize;
|
||||||
|
println!("\nMiddle pixel (row {}, col {}):", mid_row, mid_col);
|
||||||
|
println!(" Offset: {}", mid_offset);
|
||||||
|
if mid_offset + 3 < data.len() as usize {
|
||||||
|
println!(" Bytes: [{:3}, {:3}, {:3}, {:3}]",
|
||||||
|
data[mid_offset], data[mid_offset+1], data[mid_offset+2], data[mid_offset+3]);
|
||||||
|
}
|
||||||
|
}
|
||||||
56
crates/g3-computer-control/examples/list_windows.rs
Normal file
@@ -0,0 +1,56 @@
|
|||||||
|
use core_graphics::window::{kCGWindowListOptionOnScreenOnly, kCGNullWindowID, CGWindowListCopyWindowInfo};
|
||||||
|
use core_foundation::dictionary::CFDictionary;
|
||||||
|
use core_foundation::string::CFString;
|
||||||
|
use core_foundation::base::TCFType;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
println!("Listing all on-screen windows...");
|
||||||
|
println!("{:<10} {:<25} {}", "Window ID", "Owner", "Title");
|
||||||
|
println!("{}", "-".repeat(80));
|
||||||
|
|
||||||
|
unsafe {
|
||||||
|
let window_list = CGWindowListCopyWindowInfo(
|
||||||
|
kCGWindowListOptionOnScreenOnly,
|
||||||
|
kCGNullWindowID
|
||||||
|
);
|
||||||
|
|
||||||
|
// Wrap the returned array exactly once: wrap_under_create_rule takes ownership, so a second wrapper would release it twice
|
||||||
|
let array = core_foundation::array::CFArray::<CFDictionary>::wrap_under_create_rule(window_list);
|
||||||
|
let count = array.len();
|
||||||
|
|
||||||
|
for i in 0..count {
|
||||||
|
let dict = array.get(i).unwrap();
|
||||||
|
|
||||||
|
// Get window ID
|
||||||
|
let window_id_key = CFString::from_static_string("kCGWindowNumber");
|
||||||
|
let window_id: i64 = if let Some(value) = dict.find(window_id_key.as_concrete_TypeRef()) {
|
||||||
|
let num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*value as *const _);
|
||||||
|
num.to_i64().unwrap_or(0)
|
||||||
|
} else {
|
||||||
|
0
|
||||||
|
};
|
||||||
|
|
||||||
|
// Get owner name
|
||||||
|
let owner_key = CFString::from_static_string("kCGWindowOwnerName");
|
||||||
|
let owner: String = if let Some(value) = dict.find(owner_key.as_concrete_TypeRef()) {
|
||||||
|
let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
|
||||||
|
s.to_string()
|
||||||
|
} else {
|
||||||
|
"Unknown".to_string()
|
||||||
|
};
|
||||||
|
|
||||||
|
// Get window name/title
|
||||||
|
let name_key = CFString::from_static_string("kCGWindowName");
|
||||||
|
let title: String = if let Some(value) = dict.find(name_key.as_concrete_TypeRef()) {
|
||||||
|
let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
|
||||||
|
s.to_string()
|
||||||
|
} else {
|
||||||
|
"".to_string()
|
||||||
|
};
|
||||||
|
|
||||||
|
// Show only windows owned by iTerm or Terminal
|
||||||
|
if owner.contains("iTerm") || owner.contains("Terminal") {
|
||||||
|
println!("{:<10} {:<25} {}", window_id, owner, title);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
64
crates/g3-computer-control/examples/safari_demo.rs
Normal file
@@ -0,0 +1,64 @@
|
|||||||
|
use g3_computer_control::SafariDriver;
|
||||||
|
use g3_computer_control::webdriver::WebDriverController;
|
||||||
|
use anyhow::Result;
|
||||||
|
|
||||||
|
#[tokio::main]
|
||||||
|
async fn main() -> Result<()> {
|
||||||
|
println!("Safari WebDriver Demo");
|
||||||
|
println!("=====================\n");
|
||||||
|
|
||||||
|
println!("Make sure to:");
|
||||||
|
println!("1. Enable 'Allow Remote Automation' in Safari's Develop menu");
|
||||||
|
println!("2. Run: /usr/bin/safaridriver --enable");
|
||||||
|
println!("3. Start safaridriver in another terminal: safaridriver --port 4444\n");
|
||||||
|
|
||||||
|
println!("Connecting to SafariDriver...");
|
||||||
|
let mut driver = SafariDriver::new().await?;
|
||||||
|
println!("✅ Connected!\n");
|
||||||
|
|
||||||
|
// Navigate to a website
|
||||||
|
println!("Navigating to example.com...");
|
||||||
|
driver.navigate("https://example.com").await?;
|
||||||
|
println!("✅ Navigated\n");
|
||||||
|
|
||||||
|
// Get page title
|
||||||
|
let title = driver.title().await?;
|
||||||
|
println!("Page title: {}\n", title);
|
||||||
|
|
||||||
|
// Get current URL
|
||||||
|
let url = driver.current_url().await?;
|
||||||
|
println!("Current URL: {}\n", url);
|
||||||
|
|
||||||
|
// Find an element
|
||||||
|
println!("Finding h1 element...");
|
||||||
|
let mut h1 = driver.find_element("h1").await?;
|
||||||
|
let h1_text = h1.text().await?;
|
||||||
|
println!("H1 text: {}\n", h1_text);
|
||||||
|
|
||||||
|
// Find all paragraphs
|
||||||
|
println!("Finding all paragraphs...");
|
||||||
|
let paragraphs = driver.find_elements("p").await?;
|
||||||
|
println!("Found {} paragraphs\n", paragraphs.len());
|
||||||
|
|
||||||
|
// Get page source
|
||||||
|
println!("Getting page source...");
|
||||||
|
let source = driver.page_source().await?;
|
||||||
|
println!("Page source length: {} bytes\n", source.len());
|
||||||
|
|
||||||
|
// Execute JavaScript
|
||||||
|
println!("Executing JavaScript...");
|
||||||
|
let result = driver.execute_script("return document.title", vec![]).await?;
|
||||||
|
println!("JS result: {:?}\n", result);
|
||||||
|
|
||||||
|
// Take a screenshot
|
||||||
|
println!("Taking screenshot...");
|
||||||
|
driver.screenshot("/tmp/safari_demo.png").await?;
|
||||||
|
println!("✅ Screenshot saved to /tmp/safari_demo.png\n");
|
||||||
|
|
||||||
|
// Close the browser
|
||||||
|
println!("Closing browser...");
|
||||||
|
driver.quit().await?;
|
||||||
|
println!("✅ Done!");
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
@@ -0,0 +1,21 @@
|
|||||||
|
use g3_computer_control::{create_controller, ComputerController};
|
||||||
|
|
||||||
|
#[tokio::main]
|
||||||
|
async fn main() {
|
||||||
|
println!("Testing screenshot with permission prompt...");
|
||||||
|
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
match controller.take_screenshot("/tmp/test_with_prompt.png", None, None).await {
|
||||||
|
Ok(_) => {
|
||||||
|
println!("\n✅ Screenshot saved to /tmp/test_with_prompt.png");
|
||||||
|
println!("Opening screenshot...");
|
||||||
|
let _ = std::process::Command::new("open")
|
||||||
|
.arg("/tmp/test_with_prompt.png")
|
||||||
|
.spawn();
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
println!("❌ Screenshot failed: {}", e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,39 @@
|
|||||||
|
use std::process::Command;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
let path = "/tmp/rust_screencapture_test.png";
|
||||||
|
|
||||||
|
println!("Testing screencapture command from Rust...");
|
||||||
|
|
||||||
|
let mut cmd = Command::new("screencapture");
|
||||||
|
cmd.arg("-x"); // No sound
|
||||||
|
cmd.arg(path);
|
||||||
|
|
||||||
|
println!("Command: {:?}", cmd);
|
||||||
|
|
||||||
|
match cmd.output() {
|
||||||
|
Ok(output) => {
|
||||||
|
println!("Exit status: {}", output.status);
|
||||||
|
println!("Stdout: {}", String::from_utf8_lossy(&output.stdout));
|
||||||
|
println!("Stderr: {}", String::from_utf8_lossy(&output.stderr));
|
||||||
|
|
||||||
|
if output.status.success() {
|
||||||
|
println!("\n✅ Screenshot saved to: {}", path);
|
||||||
|
|
||||||
|
// Check file exists and size
|
||||||
|
if let Ok(metadata) = std::fs::metadata(path) {
|
||||||
|
println!("File size: {} bytes ({:.1} MB)", metadata.len(), metadata.len() as f64 / 1_000_000.0);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Open it
|
||||||
|
let _ = Command::new("open").arg(path).spawn();
|
||||||
|
println!("\nOpened screenshot - please verify it looks correct!");
|
||||||
|
} else {
|
||||||
|
println!("\n❌ Screenshot failed!");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
println!("❌ Failed to execute screencapture: {}", e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
69
crates/g3-computer-control/examples/test_screenshot_fix.rs
Normal file
@@ -0,0 +1,69 @@
use core_graphics::display::CGDisplay;
use image::{ImageBuffer, RgbaImage};
use std::path::Path;

fn main() {
    let display = CGDisplay::main();
    let image = display.image().expect("Failed to capture screen");

    let width = image.width() as u32;
    let height = image.height() as u32;
    let bytes_per_row = image.bytes_per_row() as usize;
    let data = image.data();

    println!("Testing screenshot fix...");
    println!("Image: {}x{}, bytes_per_row: {}", width, height, bytes_per_row);
    println!("Expected bytes per row: {}", width * 4);
    println!("Padding per row: {} bytes", bytes_per_row - (width as usize * 4));

    // OLD METHOD (broken) - treating data as continuous
    println!("\n=== OLD METHOD (BROKEN) ===");
    let mut old_rgba = Vec::with_capacity(data.len() as usize);
    for chunk in data.chunks_exact(4) {
        old_rgba.push(chunk[2]); // R
        old_rgba.push(chunk[1]); // G
        old_rgba.push(chunk[0]); // B
        old_rgba.push(chunk[3]); // A
    }
    println!("Converted {} pixels", old_rgba.len() / 4);
    println!("Expected {} pixels", width * height);

    // NEW METHOD (fixed) - handling row padding
    println!("\n=== NEW METHOD (FIXED) ===");
    let mut new_rgba = Vec::with_capacity((width * height * 4) as usize);
    for row in 0..height as usize {
        let row_start = row * bytes_per_row;
        let row_end = row_start + (width as usize * 4);

        for chunk in data[row_start..row_end].chunks_exact(4) {
            new_rgba.push(chunk[2]); // R
            new_rgba.push(chunk[1]); // G
            new_rgba.push(chunk[0]); // B
            new_rgba.push(chunk[3]); // A
        }
    }
    println!("Converted {} pixels", new_rgba.len() / 4);
    println!("Expected {} pixels", width * height);

    // Save a small crop from both methods
    let crop_size = 200;

    // Old method crop
    let old_crop: Vec<u8> = old_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
    if let Some(old_img) = ImageBuffer::from_raw(crop_size, crop_size, old_crop) {
        let old_img: RgbaImage = old_img;
        old_img.save("/tmp/screenshot_old_method.png").unwrap();
        println!("\nSaved OLD method crop to: /tmp/screenshot_old_method.png");
    }

    // New method crop
    let new_crop: Vec<u8> = new_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
    if let Some(new_img) = ImageBuffer::from_raw(crop_size, crop_size, new_crop) {
        let new_img: RgbaImage = new_img;
        new_img.save("/tmp/screenshot_new_method.png").unwrap();
        println!("Saved NEW method crop to: /tmp/screenshot_new_method.png");
    }

    println!("\nOpen both images to compare:");
    println!(" open /tmp/screenshot_old_method.png /tmp/screenshot_new_method.png");
}
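The row-padding handling that test_screenshot_fix.rs exercises can be sketched without Core Graphics. This is a minimal illustration under stated assumptions, not the crate's code: `bgra_padded_to_rgba` and `crop_top_left` are hypothetical helper names, and the pixel layout is assumed to be 4-byte BGRA, matching the channel swap in the example. The crop is copied row by row because taking the first `crop_size * crop_size` pixels of a row-major buffer yields a wrapped strip of the top rows rather than a square region.

```rust
/// Convert a BGRA buffer whose rows may carry trailing padding into tightly
/// packed RGBA. Returns None if the buffer is too short for the stated layout.
pub fn bgra_padded_to_rgba(
    data: &[u8],
    width: usize,
    height: usize,
    bytes_per_row: usize,
) -> Option<Vec<u8>> {
    let row_bytes = width.checked_mul(4)?;
    if bytes_per_row < row_bytes {
        return None;
    }
    if height == 0 {
        return Some(Vec::new());
    }
    // The final row does not need its padding bytes to be present.
    let needed = bytes_per_row.checked_mul(height - 1)?.checked_add(row_bytes)?;
    if data.len() < needed {
        return None;
    }

    let mut rgba = Vec::with_capacity(row_bytes * height);
    for row in 0..height {
        // Step by the stride, but read only the pixel bytes of each row.
        let start = row * bytes_per_row;
        for px in data[start..start + row_bytes].chunks_exact(4) {
            // Source order is B, G, R, A; emit R, G, B, A.
            rgba.extend_from_slice(&[px[2], px[1], px[0], px[3]]);
        }
    }
    Some(rgba)
}

/// Copy the top-left `crop_w` x `crop_h` rectangle out of a tightly packed
/// RGBA buffer that is `width` pixels wide.
pub fn crop_top_left(
    rgba: &[u8],
    width: usize,
    height: usize,
    crop_w: usize,
    crop_h: usize,
) -> Option<Vec<u8>> {
    if crop_w > width || crop_h > height || rgba.len() < width * height * 4 {
        return None;
    }
    let mut out = Vec::with_capacity(crop_w * crop_h * 4);
    for row in 0..crop_h {
        let start = row * width * 4;
        out.extend_from_slice(&rgba[start..start + crop_w * 4]);
    }
    Some(out)
}
```

Returning `Option` instead of indexing blindly keeps a short or mis-described buffer from panicking; whether the real capture path needs that guard is a judgment call.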
45
crates/g3-computer-control/examples/test_window_capture.rs
Normal file
@@ -0,0 +1,45 @@
use g3_computer_control::create_controller;

#[tokio::main]
async fn main() {
    println!("Testing window-specific screenshot capture...");

    let controller = create_controller().expect("Failed to create controller");

    // Test 1: Capture iTerm2 window
    println!("\n1. Capturing iTerm2 window...");
    match controller.take_screenshot("/tmp/iterm_window.png", None, Some("iTerm2")).await {
        Ok(_) => {
            println!(" ✅ iTerm2 window captured to /tmp/iterm_window.png");
            let _ = std::process::Command::new("open").arg("/tmp/iterm_window.png").spawn();
        }
        Err(e) => println!(" ❌ Failed: {}", e),
    }

    // Wait a moment for the image to open
    tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;

    // Test 2: Full screen capture for comparison
    println!("\n2. Capturing full screen for comparison...");
    match controller.take_screenshot("/tmp/fullscreen.png", None, None).await {
        Ok(_) => {
            println!(" ✅ Full screen captured to /tmp/fullscreen.png");
            let _ = std::process::Command::new("open").arg("/tmp/fullscreen.png").spawn();
        }
        Err(e) => println!(" ❌ Failed: {}", e),
    }

    println!("\n=== Comparison ===");
    println!("iTerm window: /tmp/iterm_window.png (should show ONLY iTerm window)");
    println!("Full screen: /tmp/fullscreen.png (should show entire desktop)");

    // Show file sizes
    if let Ok(meta1) = std::fs::metadata("/tmp/iterm_window.png") {
        if let Ok(meta2) = std::fs::metadata("/tmp/fullscreen.png") {
            println!("\nFile sizes:");
            println!(" iTerm window: {:.1} MB", meta1.len() as f64 / 1_000_000.0);
            println!(" Full screen: {:.1} MB", meta2.len() as f64 / 1_000_000.0);
            println!("\nWindow capture should be smaller than full screen.");
        }
    }
}
35
crates/g3-computer-control/src/lib.rs
Normal file
@@ -0,0 +1,35 @@
pub mod types;
pub mod platform;
pub mod webdriver;

// Re-export webdriver types for convenience
pub use webdriver::{WebDriverController, WebElement, safari::SafariDriver};

use anyhow::Result;
use async_trait::async_trait;
use types::*;

#[async_trait]
pub trait ComputerController: Send + Sync {
    // Screen capture
    async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()>;

    // OCR operations
    async fn extract_text_from_screen(&self, region: Rect) -> Result<String>;
    async fn extract_text_from_image(&self, path: &str) -> Result<String>;
}

// Platform-specific constructor
pub fn create_controller() -> Result<Box<dyn ComputerController>> {
    #[cfg(target_os = "macos")]
    return Ok(Box::new(platform::macos::MacOSController::new()?));

    #[cfg(target_os = "linux")]
    return Ok(Box::new(platform::linux::LinuxController::new()?));

    #[cfg(target_os = "windows")]
    return Ok(Box::new(platform::windows::WindowsController::new()?));

    #[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
    anyhow::bail!("Unsupported platform")
}
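The `create_controller` constructor in lib.rs picks an implementation from the target OS and hands it back as a boxed trait object. A std-only sketch of that pattern follows. `Controller`, `StubController`, and `create` are hypothetical stand-ins (synchronous, without `anyhow` or `async_trait`), not the crate's types.

```rust
/// Hypothetical stand-in for the crate's controller trait.
pub trait Controller: Send + Sync {
    fn platform(&self) -> &'static str;
}

/// A single stub type keeps the sketch compilable on every platform.
pub struct StubController(&'static str);

impl Controller for StubController {
    fn platform(&self) -> &'static str {
        self.0
    }
}

/// Choose an implementation from the compile-time target OS.
/// `cfg!` expands to a constant bool, so every branch is still type-checked
/// on every platform; `#[cfg]` attributes instead remove code from the build.
pub fn create() -> Result<Box<dyn Controller>, String> {
    if cfg!(target_os = "macos") {
        Ok(Box::new(StubController("macos")))
    } else if cfg!(target_os = "linux") {
        Ok(Box::new(StubController("linux")))
    } else if cfg!(target_os = "windows") {
        Ok(Box::new(StubController("windows")))
    } else {
        Err("Unsupported platform".to_string())
    }
}
```

The `cfg!` form only works when every implementation compiles everywhere. When a platform module links OS-specific crates, as the macOS code here appears to, the attribute form used in lib.rs is the one that builds.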
161
crates/g3-computer-control/src/platform/linux.rs
Normal file
@@ -0,0 +1,161 @@
use crate::{ComputerController, types::*};
use anyhow::Result;
use async_trait::async_trait;
use tesseract::Tesseract;
use uuid::Uuid;

pub struct LinuxController {
    // Placeholder for X11 connection or other state
}

impl LinuxController {
    pub fn new() -> Result<Self> {
        // Initialize X11 connection
        tracing::warn!("Linux computer control not fully implemented");
        Ok(Self {})
    }
}

#[async_trait]
impl ComputerController for LinuxController {
    async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn click(&self, _button: MouseButton) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn double_click(&self, _button: MouseButton) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn type_text(&self, _text: &str) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn press_key(&self, _key: &str) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn list_windows(&self) -> Result<Vec<Window>> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn focus_window(&self, _window_id: &str) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn get_element_text(&self, _element_id: &str) -> Result<String> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, _window_id: Option<&str>) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn extract_text_from_screen(&self, _region: Rect) -> Result<OCRResult> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
        // Check if tesseract is available on the system
        let tesseract_check = std::process::Command::new("which")
            .arg("tesseract")
            .output();

        if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract:\n \
                Ubuntu/Debian: sudo apt-get install tesseract-ocr\n \
                RHEL/CentOS: sudo yum install tesseract\n \
                Arch Linux: sudo pacman -S tesseract\n\n\
                After installation, restart your terminal and try again.");
        }

        // Initialize Tesseract
        let tess = Tesseract::new(None, Some("eng"))
            .map_err(|e| {
                anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                    This usually means:\n1. Tesseract is not properly installed\n\
                    2. Language data files are missing\n\nTo fix:\n \
                    Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n \
                    RHEL/CentOS: sudo yum install tesseract-langpack-eng\n \
                    Arch Linux: sudo pacman -S tesseract-data-eng", e)
            })?;

        let text = tess.set_image(_path)
            .map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
            .get_text()
            .map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;

        // Get confidence (simplified - would need more complex API calls for per-word confidence)
        let confidence = 0.85; // Placeholder

        Ok(OCRResult {
            text,
            confidence,
            bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
        })
    }

    async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
        // Check if tesseract is available on the system
        let tesseract_check = std::process::Command::new("which")
            .arg("tesseract")
            .output();

        if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract:\n \
                Ubuntu/Debian: sudo apt-get install tesseract-ocr\n \
                RHEL/CentOS: sudo yum install tesseract\n \
                Arch Linux: sudo pacman -S tesseract\n\n\
                After installation, restart your terminal and try again.");
        }

        // Take full screen screenshot
        let temp_path = format!("/tmp/g3_ocr_search_{}.png", uuid::Uuid::new_v4());
        self.take_screenshot(&temp_path, None, None).await?;

        // Use Tesseract to find text with bounding boxes
        let tess = Tesseract::new(None, Some("eng"))
            .map_err(|e| {
                anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                    This usually means:\n1. Tesseract is not properly installed\n\
                    2. Language data files are missing\n\nTo fix:\n \
                    Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n \
                    RHEL/CentOS: sudo yum install tesseract-langpack-eng\n \
                    Arch Linux: sudo pacman -S tesseract-data-eng", e)
            })?;

        let full_text = tess.set_image(temp_path.as_str())
            .map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
            .get_text()
            .map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;

        // Clean up temp file
        let _ = std::fs::remove_file(&temp_path);

        // Simple text search - full implementation would use get_component_images
        // to get bounding boxes for each word
        if full_text.contains(_text) {
            tracing::warn!("Text found but precise coordinates not available in simplified implementation");
            Ok(Some(Point { x: 0, y: 0 }))
        } else {
            Ok(None)
        }
    }
}
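Both OCR functions in linux.rs repeat the same availability probe, `which tesseract` followed by an `is_err() || !unwrap()` test. A minimal sketch of how that probe could be factored into one helper is below. `is_on_path` is a hypothetical name and is not part of the crate. The sketch keeps the source's reliance on the `which` command, so it has the same limits on systems where `which` is absent.

```rust
use std::process::{Command, Stdio};

/// Returns true only if `which <program>` runs and exits successfully.
/// Any spawn error (for example, `which` itself being absent) counts as
/// "not found", which mirrors the fallback in the original check.
pub fn is_on_path(program: &str) -> bool {
    Command::new("which")
        .arg(program)
        // The probe only needs the exit status, so discard its output.
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

A caller would then write `if !is_on_path("tesseract") { ... }` before initializing the OCR engine, which avoids the `unwrap` on a value that was just tested for an error.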
125
crates/g3-computer-control/src/platform/macos.rs
Normal file
@@ -0,0 +1,125 @@
use crate::{ComputerController, types::Rect};
use anyhow::Result;
use async_trait::async_trait;
use std::path::Path;
use tesseract::Tesseract;

pub struct MacOSController {
    // Empty struct for now
}

impl MacOSController {
    pub fn new() -> Result<Self> {
        Ok(Self {})
    }
}

#[async_trait]
impl ComputerController for MacOSController {
    async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
        // Determine the temporary directory for screenshots
        let temp_dir = std::env::var("TMPDIR")
            .or_else(|_| std::env::var("HOME").map(|h| format!("{}/tmp", h)))
            .unwrap_or_else(|_| "/tmp".to_string());

        // Ensure temp directory exists
        std::fs::create_dir_all(&temp_dir)?;

        // If path is relative or doesn't specify a directory, use temp_dir
        let final_path = if path.starts_with('/') {
            path.to_string()
        } else {
            format!("{}/{}", temp_dir.trim_end_matches('/'), path)
        };

        let path_obj = Path::new(&final_path);
        if let Some(parent) = path_obj.parent() {
            std::fs::create_dir_all(parent)?;
        }

        let mut cmd = std::process::Command::new("screencapture");

        // Add flags
        cmd.arg("-x"); // No sound

        if let Some(region) = region {
            // Capture specific region: -R x,y,width,height
            cmd.arg("-R");
            cmd.arg(format!("{},{},{},{}", region.x, region.y, region.width, region.height));
        }

        if let Some(app_name) = window_id {
            // Capture specific window by app name
            // Use AppleScript to get window ID
            let script = format!(r#"tell application "{}" to id of window 1"#, app_name);
            let output = std::process::Command::new("osascript")
                .arg("-e")
                .arg(&script)
                .output()?;

            if output.status.success() {
                let window_id_str = String::from_utf8_lossy(&output.stdout).trim().to_string();
                cmd.arg(format!("-l{}", window_id_str));
            }
        }

        cmd.arg(&final_path);

        let screenshot_result = cmd.output()?;

        if !screenshot_result.status.success() {
            let stderr = String::from_utf8_lossy(&screenshot_result.stderr);
            return Err(anyhow::anyhow!("screencapture failed: {}", stderr));
        }

        Ok(())
    }

    async fn extract_text_from_screen(&self, region: Rect) -> Result<String> {
        // Take screenshot of region first
        let temp_path = format!("/tmp/g3_ocr_{}.png", uuid::Uuid::new_v4());
        self.take_screenshot(&temp_path, Some(region), None).await?;

        // Extract text from the screenshot; defer error propagation so cleanup always runs
        let result = self.extract_text_from_image(&temp_path).await;

        // Clean up temp file (also when OCR failed)
        let _ = std::fs::remove_file(&temp_path);

        result
    }

    async fn extract_text_from_image(&self, path: &str) -> Result<String> {
        // Check if tesseract is available on the system
        let tesseract_check = std::process::Command::new("which")
            .arg("tesseract")
            .output();

        if tesseract_check.is_err() || !tesseract_check.as_ref().unwrap().status.success() {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract:\n macOS: brew install tesseract\n \
                Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
                sudo yum install tesseract (RHEL/CentOS)\n \
                Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
                After installation, restart your terminal and try again.");
        }

        // Initialize Tesseract
        let tess = Tesseract::new(None, Some("eng"))
            .map_err(|e| {
                anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                    This usually means:\n1. Tesseract is not properly installed\n\
                    2. Language data files are missing\n\nTo fix:\n \
                    macOS: brew reinstall tesseract\n \
                    Linux: sudo apt-get install tesseract-ocr-eng\n \
                    Windows: Reinstall tesseract and ensure language files are included", e)
            })?;

        let text = tess.set_image(path)
            .map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", path, e))?
            .get_text()
            .map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;

        Ok(text)
    }
}
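The path resolution and `screencapture` argument assembly in macos.rs are pure string logic, so they can be pulled out and checked without taking a screenshot. The sketch below assumes hypothetical helper names (`resolve_output_path`, `parse_window_id`, `screencapture_args`) and stands in for `Rect` with an `(x, y, width, height)` tuple of `i32`, since the real field types are not visible in this diff. Unlike the source, which forwards the `osascript` output to `-l` unchanged, it accepts the window id only if it parses as an integer.

```rust
/// Paths starting with '/' are used as given; anything else is placed under
/// `temp_dir`, mirroring the rule in `take_screenshot`.
pub fn resolve_output_path(path: &str, temp_dir: &str) -> String {
    if path.starts_with('/') {
        path.to_string()
    } else {
        format!("{}/{}", temp_dir.trim_end_matches('/'), path)
    }
}

/// Accept the osascript output only if, after trimming, it parses as a u32.
pub fn parse_window_id(stdout: &str) -> Option<u32> {
    stdout.trim().parse().ok()
}

/// Build the argument list for `screencapture`:
/// `-x` (no sound), optional `-R x,y,w,h`, optional `-l<window id>`, then the
/// output path last.
pub fn screencapture_args(
    region: Option<(i32, i32, i32, i32)>,
    window_id: Option<u32>,
    output: &str,
) -> Vec<String> {
    let mut args = vec!["-x".to_string()];
    if let Some((x, y, w, h)) = region {
        args.push("-R".to_string());
        args.push(format!("{},{},{},{}", x, y, w, h));
    }
    if let Some(id) = window_id {
        args.push(format!("-l{}", id));
    }
    args.push(output.to_string());
    args
}
```

With the pieces split out, a caller can decide what to do when `parse_window_id` returns `None`. The source falls back to a full-screen capture without reporting it, which may or may not be the intended behaviour.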
425
crates/g3-computer-control/src/platform/macos.rs.bak
Normal file
@@ -0,0 +1,425 @@
use crate::{ComputerController, types::*};
use anyhow::Result;
use async_trait::async_trait;
use core_graphics::display::CGPoint;
use core_graphics::event::{CGEvent, CGEventType, CGMouseButton, CGEventTapLocation};
use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
use std::path::Path;
use tesseract::Tesseract;

// MacOSController doesn't store CGEventSource to avoid Send/Sync issues
// We create it fresh for each operation
pub struct MacOSController {
    // Empty struct - event source created per operation
}

impl MacOSController {
    pub fn new() -> Result<Self> {
        // Test that we can create an event source
        let _event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
            .map_err(|_| anyhow::anyhow!("Failed to create event source. Make sure Accessibility permissions are granted."))?;
        Ok(Self {})
    }

    fn key_to_keycode(&self, key: &str) -> Result<u16> {
        // Map key names to macOS keycodes
        let keycode = match key.to_lowercase().as_str() {
            "return" | "enter" => 36,
            "tab" => 48,
            "space" => 49,
            "delete" | "backspace" => 51,
            "escape" | "esc" => 53,
            "command" | "cmd" => 55,
            "shift" => 56,
            "capslock" => 57,
            "option" | "alt" => 58,
            "control" | "ctrl" => 59,
            "left" => 123,
            "right" => 124,
            "down" => 125,
            "up" => 126,
            _ => anyhow::bail!("Unknown key: {}", key),
        };
        Ok(keycode)
    }
}

#[async_trait]
impl ComputerController for MacOSController {
    async fn move_mouse(&self, x: i32, y: i32) -> Result<()> {
        let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
            .map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
        let point = CGPoint::new(x as f64, y as f64);
        let event = CGEvent::new_mouse_event(
            event_source,
            CGEventType::MouseMoved,
            point,
            CGMouseButton::Left,
        ).map_err(|_| anyhow::anyhow!("Failed to create mouse move event"))?;

        event.post(CGEventTapLocation::HID);
        Ok(())
    }

    async fn click(&self, button: MouseButton) -> Result<()> {
        let (cg_button, down_type, up_type) = match button {
            MouseButton::Left => (CGMouseButton::Left, CGEventType::LeftMouseDown, CGEventType::LeftMouseUp),
            MouseButton::Right => (CGMouseButton::Right, CGEventType::RightMouseDown, CGEventType::RightMouseUp),
            MouseButton::Middle => (CGMouseButton::Center, CGEventType::OtherMouseDown, CGEventType::OtherMouseUp),
        };

        let point = {
            // Get current mouse position
            let temp_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
                .map_err(|_| anyhow::anyhow!("Failed to create event source"))?;
            let event = CGEvent::new(temp_source)
                .map_err(|_| anyhow::anyhow!("Failed to get mouse position"))?;
            let p = event.location();
            p
        };

        {
            let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
                .map_err(|_| anyhow::anyhow!("Failed to create event source"))?;

            // Mouse down
            let down_event = CGEvent::new_mouse_event(
                event_source,
                down_type,
                point,
                cg_button,
            ).map_err(|_| anyhow::anyhow!("Failed to create mouse down event"))?;
            down_event.post(CGEventTapLocation::HID);
        } // event_source and down_event dropped here

        // Small delay
        tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;

        {
            let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
                .map_err(|_| anyhow::anyhow!("Failed to create event source"))?;

            let up_event = CGEvent::new_mouse_event(
                event_source,
                up_type,
                point,
                cg_button,
            ).map_err(|_| anyhow::anyhow!("Failed to create mouse up event"))?;
            up_event.post(CGEventTapLocation::HID);
        } // event_source and up_event dropped here

        Ok(())
    }

    async fn double_click(&self, button: MouseButton) -> Result<()> {
        self.click(button).await?;
        tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
        self.click(button).await?;
        Ok(())
    }

    async fn type_text(&self, text: &str) -> Result<()> {
        for ch in text.chars() {
            {
                let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
                    .map_err(|_| anyhow::anyhow!("Failed to create event source"))?;

                // Create keyboard event for character
                let event = CGEvent::new_keyboard_event(
                    event_source,
                    0, // keycode (0 for unicode)
                    true,
                ).map_err(|_| anyhow::anyhow!("Failed to create keyboard event"))?;

                // Set unicode string
                let mut utf16_buf = [0u16; 2];
                let utf16_slice = ch.encode_utf16(&mut utf16_buf);
                let utf16_chars: Vec<u16> = utf16_slice.iter().copied().collect();

                event.set_string_from_utf16_unchecked(utf16_chars.as_slice());
                event.post(CGEventTapLocation::HID);
            } // event_source and event dropped here

            tokio::time::sleep(tokio::time::Duration::from_millis(10)).await;
        }
        Ok(())
    }

    async fn press_key(&self, key: &str) -> Result<()> {
        let keycode = self.key_to_keycode(key)?;

        {
            let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
                .map_err(|_| anyhow::anyhow!("Failed to create event source"))?;

            // Key down
            let down_event = CGEvent::new_keyboard_event(
                event_source,
                keycode,
                true,
            ).map_err(|_| anyhow::anyhow!("Failed to create key down event"))?;
            down_event.post(CGEventTapLocation::HID);
        } // event_source and down_event dropped here

        tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;

        {
            let event_source = CGEventSource::new(CGEventSourceStateID::CombinedSessionState)
                .map_err(|_| anyhow::anyhow!("Failed to create event source"))?;

            // Key up
            let up_event = CGEvent::new_keyboard_event(
                event_source,
                keycode,
                false,
            ).map_err(|_| anyhow::anyhow!("Failed to create key up event"))?;
            up_event.post(CGEventTapLocation::HID);
        } // event_source and up_event dropped here

        Ok(())
    }

    async fn list_windows(&self) -> Result<Vec<Window>> {
        // Note: Full implementation would use CGWindowListCopyWindowInfo
        // For now, return empty list as this requires more complex FFI
        tracing::warn!("list_windows not fully implemented on macOS");
        Ok(vec![])
    }

    async fn focus_window(&self, _window_id: &str) -> Result<()> {
        // Note: Full implementation would use NSWorkspace to activate application
        tracing::warn!("focus_window not fully implemented on macOS");
        Ok(())
    }

    async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
        // Note: Full implementation would use Accessibility API
        tracing::warn!("get_window_bounds not fully implemented on macOS");
        Ok(Rect { x: 0, y: 0, width: 800, height: 600 })
    }

    async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
        // Note: Full implementation would use macOS Accessibility API
        tracing::warn!("find_element not fully implemented on macOS");
        Ok(None)
    }

    async fn get_element_text(&self, _element_id: &str) -> Result<String> {
        // Note: Full implementation would use Accessibility API
        tracing::warn!("get_element_text not fully implemented on macOS");
        Ok(String::new())
    }

    async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
        // Note: Full implementation would use Accessibility API
        tracing::warn!("get_element_bounds not fully implemented on macOS");
        Ok(Rect { x: 0, y: 0, width: 100, height: 30 })
    }

    async fn take_screenshot(&self, path: &str, _region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
        // Use native macOS screencapture command which handles all the format complexities

        // Check if we have Screen Recording permission by attempting a test capture
        // If we only get wallpaper/menubar but no windows, we need permission
        let needs_permission_check = std::env::var("G3_SKIP_PERMISSION_CHECK").is_err();

        if needs_permission_check {
            // Try to open Screen Recording settings if this is the first screenshot
            static PERMISSION_PROMPTED: std::sync::atomic::AtomicBool = std::sync::atomic::AtomicBool::new(false);

            if !PERMISSION_PROMPTED.swap(true, std::sync::atomic::Ordering::Relaxed) {
                tracing::warn!("\n=== Screen Recording Permission Required ===\n\
                    macOS requires explicit permission to capture window content.\n\
                    If screenshots only show wallpaper/menubar (no windows):\n\n\
                    1. Open System Settings > Privacy & Security > Screen Recording\n\
                    2. Enable permission for your terminal (iTerm/Terminal) or g3\n\
                    3. Restart your terminal if needed\n\n\
                    Opening Screen Recording settings now...\n");

                // Try to open the settings (non-blocking)
                let _ = std::process::Command::new("open")
                    .arg("x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture")
                    .spawn();
            }
        }

        let path_obj = Path::new(path);
        if let Some(parent) = path_obj.parent() {
            std::fs::create_dir_all(parent)?;
        }

        let mut cmd = std::process::Command::new("screencapture");

        // Add flags
        cmd.arg("-x"); // No sound

        if let Some(window_id) = window_id {
            // Capture specific window by getting its bounds and using region capture
            // window_id format: "AppName" or "AppName:WindowTitle"
            let app_name = window_id.split(':').next().unwrap_or(window_id);

            // Use AppleScript to get window bounds
            let script = format!(
                r#"tell application "{}"
                    tell current window
                        get bounds
                    end tell
                end tell"#,
                app_name
            );

            let output = std::process::Command::new("osascript")
                .arg("-e")
                .arg(&script)
                .output()
                .map_err(|e| anyhow::anyhow!("Failed to get window bounds: {}", e))?;

            if output.status.success() {
                let bounds_str = String::from_utf8_lossy(&output.stdout);
                let bounds: Vec<i32> = bounds_str
                    .trim()
                    .split(',')
                    .filter_map(|s| s.trim().parse().ok())
                    .collect();

                if bounds.len() == 4 {
                    let (left, top, right, bottom) = (bounds[0], bounds[1], bounds[2], bounds[3]);
                    let width = right - left;
                    let height = bottom - top;

                    cmd.arg("-R");
                    cmd.arg(format!("{},{},{},{}", left, top, width, height));

                    tracing::debug!("Capturing window '{}' at region: {},{} {}x{}", app_name, left, top, width, height);
                } else {
                    tracing::warn!("Failed to parse window bounds, capturing full screen");
                }
            } else {
                tracing::warn!("Failed to get window bounds for '{}', capturing full screen", app_name);
            }
        } else if let Some(region) = _region {
            // Capture specific region: -R x,y,width,height
            cmd.arg("-R");
            cmd.arg(format!("{},{},{},{}", region.x, region.y, region.width, region.height));
        }

        cmd.arg(path);

        let output = cmd.output()
            .map_err(|e| anyhow::anyhow!("Failed to execute screencapture: {}", e))?;

        if !output.status.success() {
            let stderr = String::from_utf8_lossy(&output.stderr);
            anyhow::bail!("screencapture failed: {}", stderr);
        }

        tracing::debug!("Screenshot saved using screencapture: {}", path);

        Ok(())
    }

}

    async fn extract_text_from_screen(&self, region: Rect) -> Result<OCRResult> {
        // Take screenshot of region first
        let temp_path = format!("/tmp/g3_ocr_{}.png", uuid::Uuid::new_v4());
        self.take_screenshot(&temp_path, Some(region), None).await?;

        // Extract text from the screenshot
        let result = self.extract_text_from_image(&temp_path).await?;

        // Clean up temp file
        let _ = std::fs::remove_file(&temp_path);

        Ok(result)
    }

    async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
        // Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("which")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !matches!(&tesseract_check, Ok(out) if out.status.success()) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract:\n macOS: brew install tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
|
||||||
|
sudo yum install tesseract (RHEL/CentOS)\n \
|
||||||
|
Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Initialize Tesseract
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
macOS: brew reinstall tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr-eng\n \
|
||||||
|
Windows: Reinstall tesseract and ensure language files are included", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let text = tess.set_image(_path)
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
|
||||||
|
|
||||||
|
// Get confidence (simplified - would need more complex API calls for per-word confidence)
|
||||||
|
let confidence = 0.85; // Placeholder: hard-coded, not measured by Tesseract
|
||||||
|
|
||||||
|
Ok(OCRResult {
|
||||||
|
text,
|
||||||
|
confidence,
|
||||||
|
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("which")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !matches!(&tesseract_check, Ok(out) if out.status.success()) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract:\n macOS: brew install tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n \
|
||||||
|
sudo yum install tesseract (RHEL/CentOS)\n \
|
||||||
|
Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Take full screen screenshot
|
||||||
|
let temp_path = format!("/tmp/g3_ocr_search_{}.png", uuid::Uuid::new_v4());
|
||||||
|
self.take_screenshot(&temp_path, None, None).await?;
|
||||||
|
|
||||||
|
// Use Tesseract to find text with bounding boxes
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
macOS: brew reinstall tesseract\n \
|
||||||
|
Linux: sudo apt-get install tesseract-ocr-eng\n \
|
||||||
|
Windows: Reinstall tesseract and ensure language files are included", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let full_text = tess.set_image(temp_path.as_str())
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
|
||||||
|
|
||||||
|
// Clean up temp file
|
||||||
|
let _ = std::fs::remove_file(&temp_path);
|
||||||
|
|
||||||
|
// Simple text search - full implementation would use get_component_images
|
||||||
|
// to get bounding boxes for each word
|
||||||
|
if full_text.contains(_text) {
|
||||||
|
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
|
||||||
|
Ok(Some(Point { x: 0, y: 0 }))
|
||||||
|
} else {
|
||||||
|
Ok(None)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
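The window-capture branch of `take_screenshot` above turns AppleScript bounds (`left, top, right, bottom`) into the `x,y,width,height` form passed to `screencapture -R`. The following is a minimal standalone sketch of that conversion, not the code in this diff: `bounds_to_region` is a hypothetical helper name, and unlike the original it rejects the whole string when any field fails to parse or when the rectangle is empty, rather than silently skipping bad fields.

```rust
/// Convert an AppleScript bounds string ("left, top, right, bottom")
/// into the (x, y, width, height) tuple that a region capture expects.
/// Hypothetical helper for illustration; not part of the diff above.
fn bounds_to_region(bounds: &str) -> Option<(i32, i32, i32, i32)> {
    // Parse every comma-separated field; any parse failure rejects the input.
    let parts: Vec<i32> = bounds
        .trim()
        .split(',')
        .map(|s| s.trim().parse::<i32>())
        .collect::<Result<_, _>>()
        .ok()?;

    match parts.as_slice() {
        // Only accept exactly four fields describing a non-empty rectangle.
        [left, top, right, bottom] if right > left && bottom > top => {
            Some((*left, *top, right - left, bottom - top))
        }
        _ => None,
    }
}
```

Returning `None` lets the caller fall back to a full-screen capture, which matches the warning path the original code already takes when the bounds cannot be parsed.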
8
crates/g3-computer-control/src/platform/mod.rs
Normal file
@@ -0,0 +1,8 @@
|
|||||||
|
#[cfg(target_os = "macos")]
|
||||||
|
pub mod macos;
|
||||||
|
|
||||||
|
#[cfg(target_os = "linux")]
|
||||||
|
pub mod linux;
|
||||||
|
|
||||||
|
#[cfg(target_os = "windows")]
|
||||||
|
pub mod windows;
|
||||||
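`platform/mod.rs` compiles exactly one backend module per target through `#[cfg(target_os = ...)]`. The sketch below shows the same gating pattern in a self-contained form. The `platform` module, its `NAME` constant, and `active_platform` are illustrative names, and the fallback branch for other targets is an addition that the file above does not have.

```rust
// Exactly one of these modules is compiled, depending on the build target.
#[cfg(target_os = "macos")]
mod platform {
    pub const NAME: &str = "macos";
}

#[cfg(target_os = "linux")]
mod platform {
    pub const NAME: &str = "linux";
}

#[cfg(target_os = "windows")]
mod platform {
    pub const NAME: &str = "windows";
}

// Assumed fallback so the sketch still builds on any other target.
#[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
mod platform {
    pub const NAME: &str = "unsupported";
}

/// Name of the backend selected at compile time.
fn active_platform() -> &'static str {
    platform::NAME
}
```

Because the selection happens at compile time, code for the other operating systems is never type-checked on the current one, which is worth remembering when reviewing the Linux and Windows backends from a macOS machine.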
162
crates/g3-computer-control/src/platform/windows.rs
Normal file
@@ -0,0 +1,162 @@
|
|||||||
|
use crate::{ComputerController, types::*};
|
||||||
|
use anyhow::Result;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use tesseract::Tesseract;
|
||||||
|
use uuid::Uuid;
|
||||||
|
|
||||||
|
pub struct WindowsController {
|
||||||
|
// Placeholder for Windows-specific state
|
||||||
|
}
|
||||||
|
|
||||||
|
impl WindowsController {
|
||||||
|
pub fn new() -> Result<Self> {
|
||||||
|
tracing::warn!("Windows computer control not fully implemented");
|
||||||
|
Ok(Self {})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl ComputerController for WindowsController {
|
||||||
|
async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn click(&self, _button: MouseButton) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn double_click(&self, _button: MouseButton) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn type_text(&self, _text: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn press_key(&self, _key: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn list_windows(&self) -> Result<Vec<Window>> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn focus_window(&self, _window_id: &str) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_text(&self, _element_id: &str) -> Result<String> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, _window_id: Option<&str>) -> Result<()> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_screen(&self, _region: Rect) -> Result<OCRResult> {
|
||||||
|
anyhow::bail!("Windows implementation not yet available")
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn extract_text_from_image(&self, _path: &str) -> Result<OCRResult> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("where")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !matches!(&tesseract_check, Ok(out) if out.status.success()) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract on Windows:\n \
|
||||||
|
1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Run the installer and follow the instructions\n \
|
||||||
|
3. Add tesseract to your PATH environment variable\n \
|
||||||
|
4. Restart your terminal/command prompt\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Initialize Tesseract
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Make sure to select 'Additional language data' during installation\n \
|
||||||
|
3. Ensure tesseract is in your PATH", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let text = tess.set_image(_path)
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", _path, e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;
|
||||||
|
|
||||||
|
// Get confidence (simplified - would need more complex API calls for per-word confidence)
|
||||||
|
let confidence = 0.85; // Placeholder: hard-coded, not measured by Tesseract
|
||||||
|
|
||||||
|
Ok(OCRResult {
|
||||||
|
text,
|
||||||
|
confidence,
|
||||||
|
bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_text_on_screen(&self, _text: &str) -> Result<Option<Point>> {
|
||||||
|
// Check if tesseract is available on the system
|
||||||
|
let tesseract_check = std::process::Command::new("where")
|
||||||
|
.arg("tesseract")
|
||||||
|
.output();
|
||||||
|
|
||||||
|
if !matches!(&tesseract_check, Ok(out) if out.status.success()) {
|
||||||
|
anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
|
||||||
|
To install tesseract on Windows:\n \
|
||||||
|
1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Run the installer and follow the instructions\n \
|
||||||
|
3. Add tesseract to your PATH environment variable\n \
|
||||||
|
4. Restart your terminal/command prompt\n\n\
|
||||||
|
After installation, restart your terminal and try again.");
|
||||||
|
}
|
||||||
|
|
||||||
|
// Take full screen screenshot
|
||||||
|
let temp_path = std::env::temp_dir().join(format!("g3_ocr_search_{}.png", uuid::Uuid::new_v4())).to_string_lossy().into_owned();
|
||||||
|
self.take_screenshot(&temp_path, None, None).await?;
|
||||||
|
|
||||||
|
// Use Tesseract to find text with bounding boxes
|
||||||
|
let tess = Tesseract::new(None, Some("eng"))
|
||||||
|
.map_err(|e| {
|
||||||
|
anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
|
||||||
|
This usually means:\n1. Tesseract is not properly installed\n\
|
||||||
|
2. Language data files are missing\n\nTo fix:\n \
|
||||||
|
1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
|
||||||
|
2. Make sure to select 'Additional language data' during installation\n \
|
||||||
|
3. Ensure tesseract is in your PATH", e)
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let full_text = tess.set_image(temp_path.as_str())
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
|
||||||
|
.get_text()
|
||||||
|
.map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;
|
||||||
|
|
||||||
|
// Clean up temp file
|
||||||
|
let _ = std::fs::remove_file(&temp_path);
|
||||||
|
|
||||||
|
// Simple text search - full implementation would use get_component_images
|
||||||
|
// to get bounding boxes for each word
|
||||||
|
if full_text.contains(_text) {
|
||||||
|
tracing::warn!("Text found but precise coordinates not available in simplified implementation");
|
||||||
|
Ok(Some(Point { x: 0, y: 0 }))
|
||||||
|
} else {
|
||||||
|
Ok(None)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
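The Tesseract availability check appears in several places across the macOS and Windows backends, shelling out to `which` or `where`. One possible way to share it, sketched here under the assumption that a plain `PATH` scan is acceptable, is a small standard-library helper. `find_in_path` is a hypothetical name, and the `.exe` handling is a simplification of how Windows resolves executables.

```rust
use std::env;
use std::path::PathBuf;

/// Look for an executable called `name` in the directories listed in PATH.
/// Hypothetical helper; a simplified stand-in for `which` / `where`.
fn find_in_path(name: &str) -> Option<PathBuf> {
    let path_var = env::var_os("PATH")?;

    // On Windows, also try the conventional `.exe` suffix (simplified:
    // the real lookup rules consult PATHEXT).
    let candidates: Vec<String> = if cfg!(windows) {
        vec![format!("{}.exe", name), name.to_string()]
    } else {
        vec![name.to_string()]
    };

    for dir in env::split_paths(&path_var) {
        for candidate in &candidates {
            let full = dir.join(candidate);
            if full.is_file() {
                return Some(full);
            }
        }
    }
    None
}
```

A caller could then write `if find_in_path("tesseract").is_none() { ... }` once and reuse the same installation message, instead of repeating the process spawn in every OCR method.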
9
crates/g3-computer-control/src/types.rs
Normal file
@@ -0,0 +1,9 @@
|
|||||||
|
use serde::{Deserialize, Serialize};
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
|
||||||
|
pub struct Rect {
|
||||||
|
pub x: i32,
|
||||||
|
pub y: i32,
|
||||||
|
pub width: i32,
|
||||||
|
pub height: i32,
|
||||||
|
}
|
||||||
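`Rect` is plain data in this diff, and `find_text_on_screen` currently returns `Point { x: 0, y: 0 }` because no bounding box is available. If per-word bounding boxes are added later, the natural click target would be the centre of the box. The sketch below illustrates that with local stand-in types: this `Rect` omits the serde derives, `Point` is assumed to have `i32` fields, and `center` and `contains` are hypothetical methods that are not in the crate.

```rust
/// Local stand-in for the crate's `Rect` (serde derives omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: i32,
    y: i32,
    width: i32,
    height: i32,
}

/// Local stand-in for the crate's `Point`; the field types are an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Rect {
    /// Centre of the rectangle, using integer division.
    fn center(&self) -> Point {
        Point {
            x: self.x + self.width / 2,
            y: self.y + self.height / 2,
        }
    }

    /// True if `p` lies inside the rectangle (right and bottom edges exclusive).
    fn contains(&self, p: Point) -> bool {
        p.x >= self.x
            && p.x < self.x + self.width
            && p.y >= self.y
            && p.y < self.y + self.height
    }
}
```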
111
crates/g3-computer-control/src/webdriver/mod.rs
Normal file
@@ -0,0 +1,111 @@
|
|||||||
|
pub mod safari;
|
||||||
|
|
||||||
|
use anyhow::Result;
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use serde_json::Value;
|
||||||
|
|
||||||
|
/// WebDriver controller for browser automation
|
||||||
|
#[async_trait]
|
||||||
|
pub trait WebDriverController: Send + Sync {
|
||||||
|
/// Navigate to a URL
|
||||||
|
async fn navigate(&mut self, url: &str) -> Result<()>;
|
||||||
|
|
||||||
|
/// Get the current URL
|
||||||
|
async fn current_url(&self) -> Result<String>;
|
||||||
|
|
||||||
|
/// Get the page title
|
||||||
|
async fn title(&self) -> Result<String>;
|
||||||
|
|
||||||
|
/// Find an element by CSS selector
|
||||||
|
async fn find_element(&mut self, selector: &str) -> Result<WebElement>;
|
||||||
|
|
||||||
|
/// Find multiple elements by CSS selector
|
||||||
|
async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>>;
|
||||||
|
|
||||||
|
/// Execute JavaScript in the browser
|
||||||
|
async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value>;
|
||||||
|
|
||||||
|
/// Get the page source (HTML)
|
||||||
|
async fn page_source(&self) -> Result<String>;
|
||||||
|
|
||||||
|
/// Take a screenshot and save to path
|
||||||
|
async fn screenshot(&mut self, path: &str) -> Result<()>;
|
||||||
|
|
||||||
|
/// Close the current window/tab
|
||||||
|
async fn close(&mut self) -> Result<()>;
|
||||||
|
|
||||||
|
/// Quit the browser session
|
||||||
|
async fn quit(self) -> Result<()>;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Represents a web element in the DOM
|
||||||
|
pub struct WebElement {
|
||||||
|
pub(crate) inner: fantoccini::elements::Element,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl WebElement {
|
||||||
|
/// Click the element
|
||||||
|
pub async fn click(&mut self) -> Result<()> {
|
||||||
|
self.inner.click().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Send keys/text to the element
|
||||||
|
pub async fn send_keys(&mut self, text: &str) -> Result<()> {
|
||||||
|
self.inner.send_keys(text).await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Clear the element's content (for input fields)
|
||||||
|
pub async fn clear(&mut self) -> Result<()> {
|
||||||
|
self.inner.clear().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get the element's text content
|
||||||
|
pub async fn text(&self) -> Result<String> {
|
||||||
|
Ok(self.inner.text().await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get an attribute value
|
||||||
|
pub async fn attr(&self, name: &str) -> Result<Option<String>> {
|
||||||
|
Ok(self.inner.attr(name).await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get a property value
|
||||||
|
pub async fn prop(&self, name: &str) -> Result<Option<String>> {
|
||||||
|
Ok(self.inner.prop(name).await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get the element's HTML
|
||||||
|
pub async fn html(&self, inner: bool) -> Result<String> {
|
||||||
|
Ok(self.inner.html(inner).await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Check if element is displayed
|
||||||
|
pub async fn is_displayed(&self) -> Result<bool> {
|
||||||
|
Ok(self.inner.is_displayed().await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Check if element is enabled
|
||||||
|
pub async fn is_enabled(&self) -> Result<bool> {
|
||||||
|
Ok(self.inner.is_enabled().await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Check if element is selected (for checkboxes/radio buttons)
|
||||||
|
pub async fn is_selected(&self) -> Result<bool> {
|
||||||
|
Ok(self.inner.is_selected().await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Find a child element by CSS selector
|
||||||
|
pub async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
|
||||||
|
let elem = self.inner.find(fantoccini::Locator::Css(selector)).await?;
|
||||||
|
Ok(WebElement { inner: elem })
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Find multiple child elements by CSS selector
|
||||||
|
pub async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
|
||||||
|
let elems = self.inner.find_all(fantoccini::Locator::Css(selector)).await?;
|
||||||
|
Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
|
||||||
|
}
|
||||||
|
}
|
||||||
212
crates/g3-computer-control/src/webdriver/safari.rs
Normal file
@@ -0,0 +1,212 @@
|
|||||||
|
use super::{WebDriverController, WebElement};
|
||||||
|
use anyhow::{Context, Result};
|
||||||
|
use async_trait::async_trait;
|
||||||
|
use fantoccini::{Client, ClientBuilder};
|
||||||
|
use serde_json::Value;
|
||||||
|
use std::time::Duration;
|
||||||
|
|
||||||
|
/// SafariDriver WebDriver controller
|
||||||
|
pub struct SafariDriver {
|
||||||
|
client: Client,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl SafariDriver {
|
||||||
|
/// Create a new SafariDriver instance
|
||||||
|
///
|
||||||
|
/// Connects to a SafariDriver instance listening on port 4444, the default used here.
|
||||||
|
/// Make sure to enable "Allow Remote Automation" in Safari's Develop menu first.
|
||||||
|
///
|
||||||
|
/// You can start SafariDriver manually with:
|
||||||
|
/// ```bash
|
||||||
|
/// /usr/bin/safaridriver --port 4444
|
||||||
|
/// ```
|
||||||
|
pub async fn new() -> Result<Self> {
|
||||||
|
Self::with_port(4444).await
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Create a new SafariDriver instance with a custom port
|
||||||
|
pub async fn with_port(port: u16) -> Result<Self> {
|
||||||
|
let url = format!("http://localhost:{}", port);
|
||||||
|
|
||||||
|
let mut caps = serde_json::Map::new();
|
||||||
|
caps.insert("browserName".to_string(), Value::String("safari".to_string()));
|
||||||
|
|
||||||
|
let client = ClientBuilder::native()
|
||||||
|
.capabilities(caps)
|
||||||
|
.connect(&url)
|
||||||
|
.await
|
||||||
|
.context("Failed to connect to SafariDriver. Make sure SafariDriver is running and 'Allow Remote Automation' is enabled in Safari's Develop menu.")?;
|
||||||
|
|
||||||
|
Ok(Self { client })
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Go back in browser history
|
||||||
|
pub async fn back(&mut self) -> Result<()> {
|
||||||
|
self.client.back().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Go forward in browser history
|
||||||
|
pub async fn forward(&mut self) -> Result<()> {
|
||||||
|
self.client.forward().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Refresh the current page
|
||||||
|
pub async fn refresh(&mut self) -> Result<()> {
|
||||||
|
self.client.refresh().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get all window handles
|
||||||
|
pub async fn window_handles(&mut self) -> Result<Vec<String>> {
|
||||||
|
let handles = self.client.windows().await?;
|
||||||
|
Ok(handles.into_iter()
|
||||||
|
.map(|h| h.into())
|
||||||
|
.collect())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Switch to a window by handle
|
||||||
|
pub async fn switch_to_window(&mut self, handle: &str) -> Result<()> {
|
||||||
|
let window_handle: fantoccini::wd::WindowHandle = handle.to_string().try_into()?;
|
||||||
|
self.client.switch_to_window(window_handle).await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get the current window handle
|
||||||
|
pub async fn current_window_handle(&mut self) -> Result<String> {
|
||||||
|
Ok(self.client.window().await?.into())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Close the current window
|
||||||
|
pub async fn close_window(&mut self) -> Result<()> {
|
||||||
|
self.client.close_window().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Create a new window/tab
|
||||||
|
pub async fn new_window(&mut self, is_tab: bool) -> Result<String> {
|
||||||
|
let response = self.client.new_window(is_tab).await?;
|
||||||
|
Ok(response.handle.into())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Get cookies
|
||||||
|
pub async fn get_cookies(&mut self) -> Result<Vec<fantoccini::cookies::Cookie<'static>>> {
|
||||||
|
Ok(self.client.get_all_cookies().await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Add a cookie
|
||||||
|
pub async fn add_cookie(&mut self, cookie: fantoccini::cookies::Cookie<'static>) -> Result<()> {
|
||||||
|
self.client.add_cookie(cookie).await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Delete all cookies
|
||||||
|
pub async fn delete_all_cookies(&mut self) -> Result<()> {
|
||||||
|
self.client.delete_all_cookies().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Wait for an element to appear (with timeout)
|
||||||
|
pub async fn wait_for_element(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
|
||||||
|
let start = std::time::Instant::now();
|
||||||
|
let poll_interval = Duration::from_millis(100);
|
||||||
|
|
||||||
|
loop {
|
||||||
|
if let Ok(elem) = self.find_element(selector).await {
|
||||||
|
return Ok(elem);
|
||||||
|
}
|
||||||
|
|
||||||
|
if start.elapsed() >= timeout {
|
||||||
|
anyhow::bail!("Timeout waiting for element: {}", selector);
|
||||||
|
}
|
||||||
|
|
||||||
|
tokio::time::sleep(poll_interval).await;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Wait for an element to be visible (with timeout)
|
||||||
|
pub async fn wait_for_visible(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
|
||||||
|
let start = std::time::Instant::now();
|
||||||
|
let poll_interval = Duration::from_millis(100);
|
||||||
|
|
||||||
|
loop {
|
||||||
|
if let Ok(elem) = self.find_element(selector).await {
|
||||||
|
if elem.is_displayed().await.unwrap_or(false) {
|
||||||
|
return Ok(elem);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if start.elapsed() >= timeout {
|
||||||
|
anyhow::bail!("Timeout waiting for element to be visible: {}", selector);
|
||||||
|
}
|
||||||
|
|
||||||
|
tokio::time::sleep(poll_interval).await;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[async_trait]
|
||||||
|
impl WebDriverController for SafariDriver {
|
||||||
|
async fn navigate(&mut self, url: &str) -> Result<()> {
|
||||||
|
self.client.goto(url).await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn current_url(&self) -> Result<String> {
|
||||||
|
Ok(self.client.current_url().await?.to_string())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn title(&self) -> Result<String> {
|
||||||
|
Ok(self.client.title().await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
|
||||||
|
let elem = self.client.find(fantoccini::Locator::Css(selector)).await
|
||||||
|
.with_context(|| format!("Failed to find element with selector: {}", selector))?;
|
||||||
|
Ok(WebElement { inner: elem })
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
|
||||||
|
let elems = self.client.find_all(fantoccini::Locator::Css(selector)).await?;
|
||||||
|
Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value> {
|
||||||
|
Ok(self.client.execute(script, args).await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn page_source(&self) -> Result<String> {
|
||||||
|
Ok(self.client.source().await?)
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn screenshot(&mut self, path: &str) -> Result<()> {
|
||||||
|
let screenshot_data = self.client.screenshot().await?;
|
||||||
|
|
||||||
|
// Expand tilde in path
|
||||||
|
let expanded_path = shellexpand::tilde(path);
|
||||||
|
let path_str = expanded_path.as_ref();
|
||||||
|
|
||||||
|
// Create parent directories if needed
|
||||||
|
if let Some(parent) = std::path::Path::new(path_str).parent() {
|
||||||
|
std::fs::create_dir_all(parent)
|
||||||
|
.context("Failed to create parent directories for screenshot")?;
|
||||||
|
}
|
||||||
|
|
||||||
|
std::fs::write(path_str, screenshot_data)
|
||||||
|
.context("Failed to write screenshot to file")?;
|
||||||
|
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn close(&mut self) -> Result<()> {
|
||||||
|
self.client.close_window().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn quit(mut self) -> Result<()> {
|
||||||
|
self.client.close().await?;
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
}
|
||||||
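`wait_for_element` and `wait_for_visible` share the same shape: probe, check the deadline, sleep, repeat. The sketch below factors that loop into one generic helper. It is a synchronous analogue that uses `std::thread::sleep` so it stays free of external crates, whereas the methods above are async and use `tokio::time::sleep`. `poll_until` is a hypothetical name.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `probe` repeatedly until it yields a value or `timeout` elapses.
/// The probe always runs at least once, even with a zero timeout.
fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Option<T> {
    let start = Instant::now();
    loop {
        if let Some(value) = probe() {
            return Some(value);
        }
        if start.elapsed() >= timeout {
            return None;
        }
        thread::sleep(interval);
    }
}
```

With a helper of this shape, the visibility wait reduces to a probe that returns the element only when it is displayed. The timeout error message would then be produced by the caller.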
62
crates/g3-computer-control/tests/integration_test.rs
Normal file
@@ -0,0 +1,62 @@
|
|||||||
|
use g3_computer_control::*;
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_mouse_movement() {
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
// Move mouse to center of screen (assuming 1920x1080)
|
||||||
|
let result = controller.move_mouse(960, 540).await;
|
||||||
|
assert!(result.is_ok(), "Failed to move mouse: {:?}", result.err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_typing() {
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
// Type some text
|
||||||
|
let result = controller.type_text("Hello, World!").await;
|
||||||
|
assert!(result.is_ok(), "Failed to type text: {:?}", result.err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_screenshot() {
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
// Take screenshot
|
||||||
|
let path = "/tmp/test_screenshot.png";
|
||||||
|
let result = controller.take_screenshot(path, None, None).await;
|
||||||
|
assert!(result.is_ok(), "Failed to take screenshot: {:?}", result.err());
|
||||||
|
|
||||||
|
// Verify file exists
|
||||||
|
assert!(std::path::Path::new(path).exists(), "Screenshot file was not created");
|
||||||
|
|
||||||
|
// Clean up
|
||||||
|
let _ = std::fs::remove_file(path);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_click() {
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
// Click at a safe location
|
||||||
|
let result = controller.click(types::MouseButton::Left).await;
|
||||||
|
assert!(result.is_ok(), "Failed to click: {:?}", result.err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_double_click() {
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
// Double click
|
||||||
|
let result = controller.double_click(types::MouseButton::Left).await;
|
||||||
|
assert!(result.is_ok(), "Failed to double click: {:?}", result.err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_press_key() {
|
||||||
|
let controller = create_controller().expect("Failed to create controller");
|
||||||
|
|
||||||
|
// Press escape key
|
||||||
|
let result = controller.press_key("escape").await;
|
||||||
|
assert!(result.is_ok(), "Failed to press key: {:?}", result.err());
|
||||||
|
}
|
||||||
@@ -12,3 +12,6 @@ thiserror = { workspace = true }
|
|||||||
toml = "0.8"
|
toml = "0.8"
|
||||||
shellexpand = "3.0"
|
shellexpand = "3.0"
|
||||||
dirs = "5.0"
|
dirs = "5.0"
|
||||||
|
|
||||||
|
[dev-dependencies]
|
||||||
|
tempfile = "3.8"
|
||||||
|
|||||||
@@ -6,6 +6,8 @@ use std::path::Path;
|
|||||||
pub struct Config {
|
pub struct Config {
|
||||||
pub providers: ProvidersConfig,
|
pub providers: ProvidersConfig,
|
||||||
pub agent: AgentConfig,
|
pub agent: AgentConfig,
|
||||||
|
pub computer_control: ComputerControlConfig,
|
||||||
|
pub webdriver: WebDriverConfig,
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||||
@@ -15,6 +17,8 @@ pub struct ProvidersConfig {
|
|||||||
pub databricks: Option<DatabricksConfig>,
|
pub databricks: Option<DatabricksConfig>,
|
||||||
pub embedded: Option<EmbeddedConfig>,
|
pub embedded: Option<EmbeddedConfig>,
|
||||||
pub default_provider: String,
|
pub default_provider: String,
|
||||||
|
pub coach: Option<String>, // Provider to use for coach in autonomous mode
|
||||||
|
pub player: Option<String>, // Provider to use for player in autonomous mode
|
||||||
}
|
}
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||||
@@ -62,6 +66,38 @@ pub struct AgentConfig {
     pub timeout_seconds: u64,
 }
 
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ComputerControlConfig {
+    pub enabled: bool,
+    pub require_confirmation: bool,
+    pub max_actions_per_second: u32,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct WebDriverConfig {
+    pub enabled: bool,
+    pub safari_port: u16,
+}
+
+impl Default for WebDriverConfig {
+    fn default() -> Self {
+        Self {
+            enabled: false,
+            safari_port: 4444,
+        }
+    }
+}
+
+impl Default for ComputerControlConfig {
+    fn default() -> Self {
+        Self {
+            enabled: false, // Disabled by default for safety
+            require_confirmation: true,
+            max_actions_per_second: 5,
+        }
+    }
+}
+
 impl Default for Config {
     fn default() -> Self {
         Self {
@@ -78,12 +114,16 @@ impl Default for Config {
                 }),
                 embedded: None,
                 default_provider: "databricks".to_string(),
+                coach: None, // Will use default_provider if not specified
+                player: None, // Will use default_provider if not specified
             },
             agent: AgentConfig {
                 max_context_length: 8192,
                 enable_streaming: true,
                 timeout_seconds: 60,
             },
+            computer_control: ComputerControlConfig::default(),
+            webdriver: WebDriverConfig::default(),
         }
     }
 }
@@ -188,12 +228,16 @@ impl Config {
                     threads: Some(8),
                 }),
                 default_provider: "embedded".to_string(),
+                coach: None, // Will use default_provider if not specified
+                player: None, // Will use default_provider if not specified
             },
             agent: AgentConfig {
                 max_context_length: 8192,
                 enable_streaming: true,
                 timeout_seconds: 60,
             },
+            computer_control: ComputerControlConfig::default(),
+            webdriver: WebDriverConfig::default(),
         }
     }
 
@@ -262,4 +306,67 @@ impl Config {
 
         Ok(config)
     }
+
+    /// Get the provider to use for coach mode in autonomous execution
+    pub fn get_coach_provider(&self) -> &str {
+        self.providers.coach
+            .as_deref()
+            .unwrap_or(&self.providers.default_provider)
+    }
+
+    /// Get the provider to use for player mode in autonomous execution
+    pub fn get_player_provider(&self) -> &str {
+        self.providers.player
+            .as_deref()
+            .unwrap_or(&self.providers.default_provider)
+    }
+
+    /// Create a copy of the config with a different default provider
+    pub fn with_provider_override(&self, provider: &str) -> Result<Self> {
+        // Validate that the provider is configured
+        match provider {
+            "anthropic" if self.providers.anthropic.is_none() => {
+                return Err(anyhow::anyhow!(
+                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
+                    provider, provider
+                ));
+            }
+            "databricks" if self.providers.databricks.is_none() => {
+                return Err(anyhow::anyhow!(
+                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
+                    provider, provider
+                ));
+            }
+            "embedded" if self.providers.embedded.is_none() => {
+                return Err(anyhow::anyhow!(
+                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
+                    provider, provider
+                ));
+            }
+            "openai" if self.providers.openai.is_none() => {
+                return Err(anyhow::anyhow!(
+                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
+                    provider, provider
+                ));
+            }
+            _ => {} // Provider is configured or unknown (will be caught later)
+        }
+
+        let mut config = self.clone();
+        config.providers.default_provider = provider.to_string();
+        Ok(config)
+    }
+
+    /// Create a copy of the config for coach mode in autonomous execution
+    pub fn for_coach(&self) -> Result<Self> {
+        self.with_provider_override(self.get_coach_provider())
+    }
+
+    /// Create a copy of the config for player mode in autonomous execution
+    pub fn for_player(&self) -> Result<Self> {
+        self.with_provider_override(self.get_player_provider())
+    }
 }
+
+#[cfg(test)]
+mod tests;
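The coach/player lookup added above boils down to "use the role-specific provider if set, otherwise fall back to `default_provider`, then refuse to switch to a provider that has no configuration". The following is a minimal, std-only sketch of that idea. The `Providers` struct, its `configured` list, and the method names are hypothetical stand-ins for illustration, not the crate's actual types. Unlike the diff, which lets unknown provider names pass through to be caught later, this sketch rejects any name that is not listed.

```rust
/// Hypothetical stand-in for the provider section of the config.
#[derive(Debug, Clone)]
pub struct Providers {
    pub default_provider: String,
    pub coach: Option<String>,
    pub player: Option<String>,
    /// Assumption for this sketch: names of providers that have a config section.
    pub configured: Vec<String>,
}

impl Providers {
    /// Role-specific provider if set, otherwise the default provider.
    pub fn coach_provider(&self) -> &str {
        self.coach.as_deref().unwrap_or(&self.default_provider)
    }

    /// Same fallback rule for the player role.
    pub fn player_provider(&self) -> &str {
        self.player.as_deref().unwrap_or(&self.default_provider)
    }

    /// Returns a copy whose default provider is `name`,
    /// or an error if `name` is not in the configured list.
    pub fn with_override(&self, name: &str) -> Result<Self, String> {
        if !self.configured.iter().any(|p| p == name) {
            return Err(format!("Provider '{name}' is specified but not configured."));
        }
        let mut copy = self.clone();
        copy.default_provider = name.to_string();
        Ok(copy)
    }
}
```

Returning a modified copy rather than mutating in place keeps the original config usable for the other role, which appears to be the intent of `for_coach` and `for_player`.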
131
crates/g3-config/src/tests.rs
Normal file
@@ -0,0 +1,131 @@
+#[cfg(test)]
+mod tests {
+    use crate::Config;
+    use std::fs;
+    use tempfile::TempDir;
+
+    #[test]
+    fn test_coach_player_providers() {
+        // Create a temporary directory for the test config
+        let temp_dir = TempDir::new().unwrap();
+        let config_path = temp_dir.path().join("test_config.toml");
+
+        // Write a test configuration with coach and player providers
+        let config_content = r#"
+[providers]
+default_provider = "databricks"
+coach = "anthropic"
+player = "embedded"
+
+[providers.databricks]
+host = "https://test.databricks.com"
+token = "test-token"
+model = "test-model"
+
+[providers.anthropic]
+api_key = "test-key"
+model = "claude-3"
+
+[providers.embedded]
+model_path = "test.gguf"
+model_type = "llama"
+
+[agent]
+max_context_length = 8192
+enable_streaming = true
+timeout_seconds = 60
+"#;
+
+        fs::write(&config_path, config_content).unwrap();
+
+        // Load the configuration
+        let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();
+
+        // Test that the providers are correctly identified
+        assert_eq!(config.providers.default_provider, "databricks");
+        assert_eq!(config.get_coach_provider(), "anthropic");
+        assert_eq!(config.get_player_provider(), "embedded");
+
+        // Test creating coach config
+        let coach_config = config.for_coach().unwrap();
+        assert_eq!(coach_config.providers.default_provider, "anthropic");
+
+        // Test creating player config
+        let player_config = config.for_player().unwrap();
+        assert_eq!(player_config.providers.default_provider, "embedded");
+    }
+
+    #[test]
+    fn test_coach_player_fallback_to_default() {
+        // Create a temporary directory for the test config
+        let temp_dir = TempDir::new().unwrap();
+        let config_path = temp_dir.path().join("test_config.toml");
+
+        // Write a test configuration WITHOUT coach and player providers
+        let config_content = r#"
+[providers]
+default_provider = "databricks"
+
+[providers.databricks]
+host = "https://test.databricks.com"
+token = "test-token"
+model = "test-model"
+
+[agent]
+max_context_length = 8192
+enable_streaming = true
+timeout_seconds = 60
+"#;
+
+        fs::write(&config_path, config_content).unwrap();
+
+        // Load the configuration
+        let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();
+
+        // Test that coach and player fall back to default provider
+        assert_eq!(config.get_coach_provider(), "databricks");
+        assert_eq!(config.get_player_provider(), "databricks");
+
+        // Test creating coach config (should use default)
+        let coach_config = config.for_coach().unwrap();
+        assert_eq!(coach_config.providers.default_provider, "databricks");
+
+        // Test creating player config (should use default)
+        let player_config = config.for_player().unwrap();
+        assert_eq!(player_config.providers.default_provider, "databricks");
+    }
+
+    #[test]
+    fn test_invalid_provider_error() {
+        // Create a temporary directory for the test config
+        let temp_dir = TempDir::new().unwrap();
+        let config_path = temp_dir.path().join("test_config.toml");
+
+        // Write a test configuration with an unconfigured provider
+        let config_content = r#"
+[providers]
+default_provider = "databricks"
+coach = "openai" # OpenAI is not configured
+
+[providers.databricks]
+host = "https://test.databricks.com"
+token = "test-token"
+model = "test-model"
+
+[agent]
+max_context_length = 8192
+enable_streaming = true
+timeout_seconds = 60
+"#;
+
+        fs::write(&config_path, config_content).unwrap();
+
+        // Load the configuration
+        let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();
+
+        // Test that trying to create a coach config with unconfigured provider fails
+        let result = config.for_coach();
+        assert!(result.is_err());
+        assert!(result.unwrap_err().to_string().contains("not configured"));
+    }
+}
@@ -8,6 +8,7 @@ description = "Core engine for G3 AI coding agent"
 g3-providers = { path = "../g3-providers" }
 g3-config = { path = "../g3-config" }
 g3-execution = { path = "../g3-execution" }
+g3-computer-control = { path = "../g3-computer-control" }
 tokio = { workspace = true }
 reqwest = { workspace = true }
 anyhow = { workspace = true }
@@ -23,3 +24,4 @@ futures-util = "0.3"
 chrono = { version = "0.4", features = ["serde"] }
 rand = "0.8"
 regex = "1.0"
+shellexpand = "3.1"
File diff suppressed because it is too large
36
crates/g3-core/src/tilde_expansion_tests.rs
Normal file
@@ -0,0 +1,36 @@
+#[cfg(test)]
+mod tilde_expansion_tests {
+    use std::env;
+
+    #[test]
+    fn test_tilde_expansion() {
+        // Test that shellexpand works
+        let path_with_tilde = "~/test.txt";
+        let expanded = shellexpand::tilde(path_with_tilde);
+
+        // Get the actual home directory
+        let home = env::var("HOME").expect("HOME environment variable not set");
+
+        // Verify expansion happened
+        assert_eq!(expanded.as_ref(), format!("{}/test.txt", home));
+        assert!(!expanded.contains("~"));
+    }
+
+    #[test]
+    fn test_tilde_expansion_with_subdirs() {
+        let path_with_tilde = "~/Documents/test.txt";
+        let expanded = shellexpand::tilde(path_with_tilde);
+
+        let home = env::var("HOME").expect("HOME environment variable not set");
+
+        assert_eq!(expanded.as_ref(), format!("{}/Documents/test.txt", home));
+    }
+
+    #[test]
+    fn test_no_tilde_unchanged() {
+        let path_without_tilde = "/absolute/path/test.txt";
+        let expanded = shellexpand::tilde(path_without_tilde);
+
+        assert_eq!(expanded.as_ref(), path_without_tilde);
+    }
+}
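The tests above exercise `shellexpand::tilde` against the real `HOME` variable. To make the expected behavior concrete without depending on the environment, here is a small std-only sketch of leading-tilde expansion. `expand_tilde` is a hypothetical helper written for illustration; it is not how `shellexpand` is implemented, and it takes the home directory as an explicit argument so the result is deterministic.

```rust
/// Hypothetical helper (not part of shellexpand): expands a leading `~`
/// using an explicitly supplied home directory.
pub fn expand_tilde(path: &str, home: &str) -> String {
    if path == "~" {
        // A bare tilde is the home directory itself.
        home.to_string()
    } else if let Some(rest) = path.strip_prefix("~/") {
        // Join home and the remainder, avoiding a doubled slash.
        format!("{}/{}", home.trim_end_matches('/'), rest)
    } else {
        // Anything else (absolute paths, `~user/...`) is returned unchanged.
        path.to_string()
    }
}
```

Passing `home` in, rather than reading `HOME` inside the function, would also let the tests run on machines where `HOME` is unset.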
157
crates/g3-core/tests/test_context_thinning.rs
Normal file
@@ -0,0 +1,157 @@
+use g3_core::ContextWindow;
+use g3_providers::{Message, MessageRole};
+
+#[test]
+fn test_thinning_thresholds() {
+    let mut context = ContextWindow::new(10000);
+
+    // At 0%, should not thin
+    assert!(!context.should_thin());
+
+    // Simulate reaching 50% usage
+    context.used_tokens = 5000;
+    assert!(context.should_thin());
+
+    // After thinning at 50%, should not thin again until next threshold
+    context.last_thinning_percentage = 50;
+    assert!(!context.should_thin());
+
+    // At 60%, should thin again
+    context.used_tokens = 6000;
+    assert!(context.should_thin());
+
+    // After thinning at 60%, should not thin
+    context.last_thinning_percentage = 60;
+    assert!(!context.should_thin());
+
+    // At 70%, should thin
+    context.used_tokens = 7000;
+    assert!(context.should_thin());
+
+    // At 80%, should thin
+    context.last_thinning_percentage = 70;
+    context.used_tokens = 8000;
+    assert!(context.should_thin());
+
+    // After 80%, should not thin (compaction takes over)
+    context.last_thinning_percentage = 80;
+    context.used_tokens = 8500;
+    assert!(!context.should_thin());
+}
+
+#[test]
+fn test_thin_context_basic() {
+    let mut context = ContextWindow::new(10000);
+
+    // Add some messages to the first third
+    for i in 0..9 {
+        if i % 2 == 0 {
+            context.add_message(Message {
+                role: MessageRole::Assistant,
+                content: format!("Assistant message {}", i),
+            });
+        } else {
+            // Add tool results with varying sizes
+            let content = if i == 1 {
+                // Large tool result (> 1000 chars)
+                format!("Tool result: {}", "x".repeat(1500))
+            } else if i == 3 {
+                // Another large tool result
+                format!("Tool result: {}", "y".repeat(2000))
+            } else {
+                // Small tool result (< 1000 chars)
+                format!("Tool result: small result {}", i)
+            };
+
+            context.add_message(Message {
+                role: MessageRole::User,
+                content,
+            });
+        }
+    }
+
+    // Trigger thinning at 50%
+    context.used_tokens = 5000;
+    let summary = context.thin_context();
+
+    println!("Thinning summary: {}", summary);
+
+    // Should have thinned at least 1 large tool result in the first third
+    assert!(summary.contains("1 tool result"), "Summary was: {}", summary);
+    assert!(summary.contains("50%"));
+
+    // Check that the large tool results were replaced
+    let first_third_end = context.conversation_history.len() / 3;
+    for i in 0..first_third_end {
+        if let Some(msg) = context.conversation_history.get(i) {
+            if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
+                if msg.content.len() > 1000 {
+                    panic!("Found un-thinned large tool result at index {}", i);
+                }
+            }
+        }
+    }
+}
+
+#[test]
+fn test_thin_context_no_large_results() {
+    let mut context = ContextWindow::new(10000);
+
+    // Add only small messages
+    for i in 0..9 {
+        context.add_message(Message {
+            role: MessageRole::User,
+            content: format!("Tool result: small {}", i),
+        });
+    }
+
+    context.used_tokens = 5000;
+    let summary = context.thin_context();
+
+    // Should report no large results found
+    assert!(summary.contains("no large tool results found"));
+}
+
+#[test]
+fn test_thin_context_only_affects_first_third() {
+    let mut context = ContextWindow::new(10000);
+
+    // Add 12 messages (first third = 4 messages)
+    for i in 0..12 {
+        let content = if i % 2 == 1 {
+            // All odd indices are large tool results
+            format!("Tool result: {}", "x".repeat(1500))
+        } else {
+            format!("Assistant message {}", i)
+        };
+
+        let role = if i % 2 == 1 {
+            MessageRole::User
+        } else {
+            MessageRole::Assistant
+        };
+
+        context.add_message(Message { role, content });
+    }
+
+    context.used_tokens = 5000;
+    let summary = context.thin_context();
+
+    // First third is 4 messages (indices 0-3), so only indices 1 and 3 should be thinned
+    // That's 2 tool results
+    assert!(summary.contains("2 tool results"));
+
+    // Check that messages after the first third are NOT thinned
+    let first_third_end = context.conversation_history.len() / 3;
+    for i in first_third_end..context.conversation_history.len() {
+        if let Some(msg) = context.conversation_history.get(i) {
+            if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
+                // These should still be large (not thinned)
+                if i % 2 == 1 {
+                    assert!(msg.content.len() > 1000,
+                        "Message at index {} should not have been thinned", i);
+                }
+            }
+        }
+    }
+}
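The threshold test above implies a simple rule: thinning fires once per 10% step from 50% up to 80%, and never again once the 80% step has been handled. The `ContextWindow` implementation is not visible in this part of the diff, so the following is only one way such a check could be written. The `Window` struct and its fields are hypothetical names chosen to mirror the test, and the bucket arithmetic is an assumption inferred from the assertions.

```rust
/// Hypothetical stand-in for the context window state used by the tests.
pub struct Window {
    pub total_tokens: u32,
    pub used_tokens: u32,
    pub last_thinning_percentage: u32,
}

impl Window {
    /// Integer percentage of the window currently in use.
    pub fn percentage_used(&self) -> u32 {
        if self.total_tokens == 0 {
            0
        } else {
            (self.used_tokens as u64 * 100 / self.total_tokens as u64) as u32
        }
    }

    /// True when usage has crossed a 10% step in 50..=80 that has not
    /// been thinned yet. Steps above 80 are clamped to 80, so nothing
    /// fires after the 80% step (assumed to be where compaction takes over).
    pub fn should_thin(&self) -> bool {
        let bucket = (self.percentage_used() / 10 * 10).min(80);
        bucket >= 50 && bucket > self.last_thinning_percentage
    }
}
```

Tracking the last handled step, rather than a boolean flag, is what lets the same check fire again at 60%, 70%, and 80% without re-firing inside a step.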
@@ -156,8 +156,9 @@ impl AnthropicProvider {
|
|||||||
.post(ANTHROPIC_API_URL)
|
.post(ANTHROPIC_API_URL)
|
||||||
.header("x-api-key", &self.api_key)
|
.header("x-api-key", &self.api_key)
|
||||||
.header("anthropic-version", ANTHROPIC_VERSION)
|
.header("anthropic-version", ANTHROPIC_VERSION)
|
||||||
|
// Anthropic's beta 1M-token context window. Enable only if needed: it costs extra, so check pricing first.
|
||||||
|
// .header("anthropic-beta", "context-1m-2025-08-07")
|
||||||
.header("content-type", "application/json");
|
.header("content-type", "application/json");
|
||||||
|
|
||||||
if streaming {
|
if streaming {
|
||||||
builder = builder.header("accept", "text/event-stream");
|
builder = builder.header("accept", "text/event-stream");
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -88,10 +88,12 @@ pub mod anthropic;
 pub mod databricks;
 pub mod embedded;
 pub mod oauth;
+pub mod openai;
 
 pub use anthropic::AnthropicProvider;
 pub use databricks::DatabricksProvider;
 pub use embedded::EmbeddedProvider;
+pub use openai::OpenAIProvider;
 
 /// Provider registry for managing multiple LLM providers
 pub struct ProviderRegistry {
495
crates/g3-providers/src/openai.rs
Normal file
@@ -0,0 +1,495 @@
+use anyhow::Result;
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures_util::stream::StreamExt;
+use reqwest::Client;
+use serde::Deserialize;
+use serde_json::json;
+use tokio::sync::mpsc;
+use tokio_stream::wrappers::ReceiverStream;
+use tracing::{debug, error};
+
+use crate::{
+    CompletionChunk, CompletionRequest, CompletionResponse, CompletionStream, LLMProvider,
+    Message, MessageRole, Tool, ToolCall, Usage,
+};
+
+#[derive(Clone)]
+pub struct OpenAIProvider {
+    client: Client,
+    api_key: String,
+    model: String,
+    base_url: String,
+    max_tokens: Option<u32>,
+    _temperature: Option<f32>,
+}
+
+impl OpenAIProvider {
+    pub fn new(
+        api_key: String,
+        model: Option<String>,
+        base_url: Option<String>,
+        max_tokens: Option<u32>,
+        temperature: Option<f32>,
+    ) -> Result<Self> {
+        Ok(Self {
+            client: Client::new(),
+            api_key,
+            model: model.unwrap_or_else(|| "gpt-4o".to_string()),
+            base_url: base_url.unwrap_or_else(|| "https://api.openai.com/v1".to_string()),
+            max_tokens,
+            _temperature: temperature,
+        })
+    }
+
+    fn create_request_body(
+        &self,
+        messages: &[Message],
+        tools: Option<&[Tool]>,
+        stream: bool,
+        max_tokens: Option<u32>,
+        _temperature: Option<f32>,
+    ) -> serde_json::Value {
+        let mut body = json!({
+            "model": self.model,
+            "messages": convert_messages(messages),
+            "stream": stream,
+        });
+
+        if let Some(max_tokens) = max_tokens.or(self.max_tokens) {
+            body["max_completion_tokens"] = json!(max_tokens);
+        }
+
+        // OpenAI calls with temp setting seem to fail, so don't send one.
+        // if let Some(temperature) = temperature.or(self.temperature) {
+        //     body["temperature"] = json!(temperature);
+        // }
+
+        if let Some(tools) = tools {
+            if !tools.is_empty() {
+                body["tools"] = json!(convert_tools(tools));
+            }
+        }
+
+        if stream {
+            body["stream_options"] = json!({
+                "include_usage": true,
+            });
+        }
+
+        body
+    }
+
+    async fn parse_streaming_response(
+        &self,
+        mut stream: impl futures_util::Stream<Item = reqwest::Result<Bytes>> + Unpin,
+        tx: mpsc::Sender<Result<CompletionChunk>>,
+    ) -> Option<Usage> {
+        let mut buffer = String::new();
+        let mut accumulated_content = String::new();
+        let mut accumulated_usage: Option<Usage> = None;
+        let mut current_tool_calls: Vec<OpenAIStreamingToolCall> = Vec::new();
+
+        while let Some(chunk_result) = stream.next().await {
+            match chunk_result {
+                Ok(chunk) => {
+                    let chunk_str = match std::str::from_utf8(&chunk) {
+                        Ok(s) => s,
+                        Err(e) => {
+                            error!("Failed to parse chunk as UTF-8: {}", e);
+                            continue;
+                        }
+                    };
+
+                    buffer.push_str(chunk_str);
+
+                    // Process complete lines
+                    while let Some(line_end) = buffer.find('\n') {
+                        let line = buffer[..line_end].trim().to_string();
+                        buffer.drain(..line_end + 1);
+
+                        if line.is_empty() {
+                            continue;
+                        }
+
+                        // Parse Server-Sent Events format
+                        if let Some(data) = line.strip_prefix("data: ") {
+                            if data == "[DONE]" {
+                                debug!("Received stream completion marker");
+
+                                // Send final chunk with accumulated content and tool calls
+                                if !accumulated_content.is_empty() || !current_tool_calls.is_empty() {
+                                    let tool_calls = if current_tool_calls.is_empty() {
+                                        None
+                                    } else {
+                                        Some(
+                                            current_tool_calls
+                                                .iter()
+                                                .filter_map(|tc| tc.to_tool_call())
+                                                .collect(),
+                                        )
+                                    };
+
+                                    let final_chunk = CompletionChunk {
+                                        content: accumulated_content.clone(),
+                                        finished: true,
+                                        tool_calls,
+                                        usage: accumulated_usage.clone(),
+                                    };
+                                    let _ = tx.send(Ok(final_chunk)).await;
+                                }
+
+                                return accumulated_usage;
+                            }
+
+                            // Parse the JSON data
+                            match serde_json::from_str::<OpenAIStreamChunk>(data) {
+                                Ok(chunk_data) => {
+                                    // Handle content
+                                    for choice in &chunk_data.choices {
+                                        if let Some(content) = &choice.delta.content {
+                                            accumulated_content.push_str(content);
+
+                                            let chunk = CompletionChunk {
+                                                content: content.clone(),
+                                                finished: false,
+                                                tool_calls: None,
+                                                usage: None,
+                                            };
+                                            if tx.send(Ok(chunk)).await.is_err() {
+                                                debug!("Receiver dropped, stopping stream");
+                                                return accumulated_usage;
+                                            }
+                                        }
+
+                                        // Handle tool calls
+                                        if let Some(delta_tool_calls) = &choice.delta.tool_calls {
+                                            for delta_tool_call in delta_tool_calls {
+                                                if let Some(index) = delta_tool_call.index {
+                                                    // Ensure we have enough tool calls in our vector
+                                                    while current_tool_calls.len() <= index {
+                                                        current_tool_calls
+                                                            .push(OpenAIStreamingToolCall::default());
+                                                    }
+
+                                                    let tool_call = &mut current_tool_calls[index];
+
+                                                    if let Some(id) = &delta_tool_call.id {
+                                                        tool_call.id = Some(id.clone());
+                                                    }
+
+                                                    if let Some(function) = &delta_tool_call.function {
+                                                        if let Some(name) = &function.name {
+                                                            tool_call.name = Some(name.clone());
+                                                        }
+                                                        if let Some(arguments) = &function.arguments {
+                                                            tool_call.arguments.push_str(arguments);
+                                                        }
+                                                    }
+                                                }
+                                            }
+                                        }
+                                    }
+
+                                    // Handle usage
+                                    if let Some(usage) = chunk_data.usage {
+                                        accumulated_usage = Some(Usage {
+                                            prompt_tokens: usage.prompt_tokens,
+                                            completion_tokens: usage.completion_tokens,
+                                            total_tokens: usage.total_tokens,
+                                        });
+                                    }
+                                }
+                                Err(e) => {
+                                    debug!("Failed to parse stream chunk: {} - Data: {}", e, data);
+                                }
+                            }
+                        }
+                    }
+                }
+                Err(e) => {
+                    error!("Stream error: {}", e);
+                    let _ = tx.send(Err(anyhow::anyhow!("Stream error: {}", e))).await;
+                    return accumulated_usage;
+                }
+            }
+        }
+
+        // Send final chunk if we haven't already
+        let tool_calls = if current_tool_calls.is_empty() {
+            None
+        } else {
+            Some(
+                current_tool_calls
+                    .iter()
+                    .filter_map(|tc| tc.to_tool_call())
+                    .collect(),
+            )
+        };
+
+        let final_chunk = CompletionChunk {
+            content: String::new(),
+            finished: true,
+            tool_calls,
+            usage: accumulated_usage.clone(),
+        };
+        let _ = tx.send(Ok(final_chunk)).await;
+
+        accumulated_usage
+    }
+}
+
+#[async_trait]
+impl LLMProvider for OpenAIProvider {
+    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse> {
+        debug!(
+            "Processing OpenAI completion request with {} messages",
+            request.messages.len()
+        );
+
+        let body = self.create_request_body(
+            &request.messages,
+            request.tools.as_deref(),
+            false,
+            request.max_tokens,
+            request.temperature,
+        );
+
+        debug!("Sending request to OpenAI API: model={}", self.model);
+
+        let response = self
+            .client
+            .post(&format!("{}/chat/completions", self.base_url))
+            .header("Authorization", format!("Bearer {}", self.api_key))
+            .json(&body)
+            .send()
+            .await?;
+
+        let status = response.status();
+        if !status.is_success() {
+            let error_text = response
+                .text()
+                .await
+                .unwrap_or_else(|_| "Unknown error".to_string());
+            return Err(anyhow::anyhow!("OpenAI API error {}: {}", status, error_text));
+        }
+
+        let openai_response: OpenAIResponse = response.json().await?;
+
+        let content = openai_response
+            .choices
+            .first()
+            .and_then(|choice| choice.message.content.clone())
+            .unwrap_or_default();
+
+        let usage = Usage {
+            prompt_tokens: openai_response.usage.prompt_tokens,
+            completion_tokens: openai_response.usage.completion_tokens,
+            total_tokens: openai_response.usage.total_tokens,
+        };
+
+        debug!(
+            "OpenAI completion successful: {} tokens generated",
+            usage.completion_tokens
+        );
+
+        Ok(CompletionResponse {
+            content,
+            usage,
+            model: self.model.clone(),
+        })
+    }
+
+    async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream> {
+        debug!(
+            "Processing OpenAI streaming request with {} messages",
+            request.messages.len()
+        );
+
+        let body = self.create_request_body(
+            &request.messages,
+            request.tools.as_deref(),
+            true,
+            request.max_tokens,
+            request.temperature,
+        );
+
+        debug!("Sending streaming request to OpenAI API: model={}", self.model);
+
+        let response = self
+            .client
+            .post(&format!("{}/chat/completions", self.base_url))
+            .header("Authorization", format!("Bearer {}", self.api_key))
+            .json(&body)
+            .send()
+            .await?;
+
+        let status = response.status();
+        if !status.is_success() {
+            let error_text = response
+                .text()
+                .await
+                .unwrap_or_else(|_| "Unknown error".to_string());
+            return Err(anyhow::anyhow!("OpenAI API error {}: {}", status, error_text));
+        }
+
+        let stream = response.bytes_stream();
+        let (tx, rx) = mpsc::channel(100);
+
+        // Spawn task to process the stream
+        let provider = self.clone();
+        tokio::spawn(async move {
+            let usage = provider.parse_streaming_response(stream, tx).await;
+            // Log the final usage if available
+            if let Some(usage) = usage {
+                debug!(
+                    "Stream completed with usage - prompt: {}, completion: {}, total: {}",
+                    usage.prompt_tokens, usage.completion_tokens, usage.total_tokens
+                );
+            }
+        });
+
+        Ok(ReceiverStream::new(rx))
+    }
+
+    fn name(&self) -> &str {
+        "openai"
+    }
+
+    fn model(&self) -> &str {
+        &self.model
+    }
+
+    fn has_native_tool_calling(&self) -> bool {
+        // OpenAI models support native tool calling
+        true
+    }
+}
+
+fn convert_messages(messages: &[Message]) -> Vec<serde_json::Value> {
+    messages
+        .iter()
+        .map(|msg| {
+            json!({
+                "role": match msg.role {
+                    MessageRole::System => "system",
+                    MessageRole::User => "user",
+                    MessageRole::Assistant => "assistant",
+                },
+                "content": msg.content,
+            })
+        })
+        .collect()
+}
+
+fn convert_tools(tools: &[Tool]) -> Vec<serde_json::Value> {
+    tools
+        .iter()
+        .map(|tool| {
+            json!({
+                "type": "function",
+                "function": {
+                    "name": tool.name,
+                    "description": tool.description,
+                    "parameters": tool.input_schema,
+                }
+            })
+        })
+        .collect()
+}
+
+// OpenAI API response structures
+#[derive(Debug, Deserialize)]
+struct OpenAIResponse {
+    choices: Vec<OpenAIChoice>,
+    usage: OpenAIUsage,
+}
+
+#[derive(Debug, Deserialize)]
+struct OpenAIChoice {
+    message: OpenAIMessage,
+}
+
+#[allow(dead_code)]
+#[derive(Debug, Deserialize)]
+struct OpenAIMessage {
+    content: Option<String>,
+    #[serde(default)]
+    tool_calls: Option<Vec<OpenAIToolCall>>,
+}
+
+#[allow(dead_code)]
+#[derive(Debug, Deserialize)]
+struct OpenAIToolCall {
+    id: String,
+    function: OpenAIFunction,
+}
+
+#[allow(dead_code)]
+#[derive(Debug, Deserialize)]
+struct OpenAIFunction {
+    name: String,
+    arguments: String,
+}
+
+// Streaming tool call accumulator
+#[derive(Debug, Default)]
+struct OpenAIStreamingToolCall {
+    id: Option<String>,
+    name: Option<String>,
+    arguments: String,
+}
|
|
||||||
|
impl OpenAIStreamingToolCall {
|
||||||
|
fn to_tool_call(&self) -> Option<ToolCall> {
|
||||||
|
let id = self.id.as_ref()?;
|
||||||
|
let name = self.name.as_ref()?;
|
||||||
|
|
||||||
|
let args = serde_json::from_str(&self.arguments).unwrap_or(serde_json::Value::Null);
|
||||||
|
|
||||||
|
Some(ToolCall {
|
||||||
|
id: id.clone(),
|
||||||
|
tool: name.clone(),
|
||||||
|
args,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
struct OpenAIUsage {
|
||||||
|
prompt_tokens: u32,
|
||||||
|
completion_tokens: u32,
|
||||||
|
total_tokens: u32,
|
||||||
|
}
|
||||||
|
|
||||||
|
// Streaming response structures
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
struct OpenAIStreamChunk {
|
||||||
|
choices: Vec<OpenAIStreamChoice>,
|
||||||
|
usage: Option<OpenAIUsage>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
struct OpenAIStreamChoice {
|
||||||
|
delta: OpenAIDelta,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
struct OpenAIDelta {
|
||||||
|
content: Option<String>,
|
||||||
|
#[serde(default)]
|
||||||
|
tool_calls: Option<Vec<OpenAIDeltaToolCall>>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
struct OpenAIDeltaToolCall {
|
||||||
|
index: Option<usize>,
|
||||||
|
id: Option<String>,
|
||||||
|
function: Option<OpenAIDeltaFunction>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
struct OpenAIDeltaFunction {
|
||||||
|
name: Option<String>,
|
||||||
|
arguments: Option<String>,
|
||||||
|
}
|
||||||
75
docs/coach-player-providers.md
Normal file
@@ -0,0 +1,75 @@
# Coach-Player Provider Configuration

G3 now supports specifying different LLM providers for the coach and player agents when running in autonomous mode. This allows you to optimize for different requirements:

- **Player**: The agent that implements code - might benefit from a faster, more cost-effective model
- **Coach**: The agent that reviews code - might benefit from a more powerful, analytical model

## Configuration

In your `config.toml` file, under the `[providers]` section, you can specify:

```toml
[providers]
default_provider = "databricks" # Used for normal operations
coach = "databricks" # Provider for coach (code reviewer)
player = "anthropic" # Provider for player (code implementer)
```

If `coach` or `player` are not specified, they will default to using the `default_provider`.
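
The fallback rule above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `ProvidersConfig`, `coach_provider`, and `player_provider` are hypothetical names chosen to mirror the `[providers]` table, not the types G3 actually uses.

```rust
/// Hypothetical mirror of the `[providers]` table (names are assumptions).
struct ProvidersConfig {
    default_provider: String,
    coach: Option<String>,
    player: Option<String>,
}

impl ProvidersConfig {
    /// Provider for the coach, falling back to `default_provider` when unset.
    fn coach_provider(&self) -> &str {
        self.coach
            .as_deref()
            .unwrap_or(self.default_provider.as_str())
    }

    /// Provider for the player, falling back to `default_provider` when unset.
    fn player_provider(&self) -> &str {
        self.player
            .as_deref()
            .unwrap_or(self.default_provider.as_str())
    }
}

fn main() {
    // `coach` is left unset, so it should resolve to the default provider.
    let cfg = ProvidersConfig {
        default_provider: "databricks".to_string(),
        coach: None,
        player: Some("anthropic".to_string()),
    };
    println!("coach: {}", cfg.coach_provider());
    println!("player: {}", cfg.player_provider());
}
```

Keeping both fields as `Option<String>` makes "not specified" distinct from an empty string, so the fallback stays a single `unwrap_or` per role.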

## Example Use Cases

### Cost Optimization
Use a cheaper, faster model for initial implementations (player) and a more powerful model for review (coach):

```toml
coach = "anthropic" # Claude Sonnet for thorough review
player = "anthropic" # Claude Haiku for quick implementation
```

### Speed vs Quality Trade-off
Use a local embedded model for fast iterations (player) and a cloud model for quality review (coach):

```toml
coach = "databricks" # Cloud model for quality review
player = "embedded" # Local model for fast implementation
```

### Specialized Models
Use different models optimized for different tasks:

```toml
coach = "databricks" # Model fine-tuned for code review
player = "openai" # Model optimized for code generation
```

## Requirements

- Both providers must be properly configured in your config file
- Each provider must have valid credentials
- The models specified for each provider must be accessible
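
The first requirement could be checked up front with something like the sketch below. It is an assumption-laden illustration, not G3's validation code: `missing_providers` is a hypothetical helper, and it only checks that each role names a provider that appears in the set of configured ones. It does not verify credentials or model access.

```rust
use std::collections::HashSet;

/// Returns the roles whose provider name is not among the configured providers.
/// `roles` holds hypothetical (role, provider) pairs such as ("coach", "databricks").
fn missing_providers<'a>(
    roles: &[(&'a str, &'a str)],
    configured: &HashSet<&str>,
) -> Vec<&'a str> {
    let mut missing = Vec::new();
    for &(role, provider) in roles {
        if !configured.contains(provider) {
            missing.push(role);
        }
    }
    missing
}

fn main() {
    // Assumed state: only these two providers have a config section.
    let configured: HashSet<&str> = ["databricks", "anthropic"].into_iter().collect();
    let roles = [("coach", "databricks"), ("player", "openai")];

    for role in missing_providers(&roles, &configured) {
        eprintln!("provider for `{}` is not configured", role);
    }
}
```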

## How It Works

When running in autonomous mode (`g3 --autonomous`), the system will:

1. Use the `player` provider (or default) for the initial implementation
2. Switch to the `coach` provider (or default) for code review
3. Return to the `player` provider for implementing feedback
4. Continue this cycle for the specified number of turns
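
The cycle in the steps above can be pictured as a simple alternation between two roles. The sketch below is illustrative only: `Role`, `turn_order`, and `provider_for` are hypothetical names, and it assumes that one "turn" is a player step followed by a coach step, which the description does not state explicitly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Player,
    Coach,
}

/// Order in which the roles act, assuming each turn is player then coach.
fn turn_order(turns: usize) -> Vec<Role> {
    let mut order = Vec::with_capacity(turns * 2);
    for _ in 0..turns {
        order.push(Role::Player);
        order.push(Role::Coach);
    }
    order
}

/// Picks the provider name for a role from the already-resolved settings.
fn provider_for<'a>(role: Role, player: &'a str, coach: &'a str) -> &'a str {
    match role {
        Role::Player => player,
        Role::Coach => coach,
    }
}

fn main() {
    // Provider names taken from the configuration example in this document.
    for (step, role) in turn_order(2).into_iter().enumerate() {
        let provider = provider_for(role, "anthropic", "databricks");
        println!("step {}: {:?} via {}", step + 1, role, provider);
    }
}
```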

The providers are logged at startup so you can verify which models are being used:

```
🎮 Player provider: anthropic
👨‍🏫 Coach provider: databricks
ℹ️ Using different providers for player and coach
```

## Benefits

- **Cost Efficiency**: Use expensive models only where they add the most value
- **Speed Optimization**: Use faster models for iterative development
- **Specialization**: Leverage models that excel at specific tasks
- **Flexibility**: Easy to experiment with different provider combinations
39
test-ai-requirements.sh
Executable file
@@ -0,0 +1,39 @@
#!/bin/bash
# Test script for AI-enhanced interactive requirements mode

echo "Testing AI-enhanced interactive requirements mode..."
echo ""

# Create a test workspace
TEST_WORKSPACE="/tmp/g3-test-interactive-$(date +%s)"
mkdir -p "$TEST_WORKSPACE"

echo "Test workspace: $TEST_WORKSPACE"
echo ""

# Create sample brief input
BRIEF_INPUT="build a calculator cli in rust with basic operations"

echo "Brief input:"
echo "---"
echo "$BRIEF_INPUT"
echo "---"
echo ""

echo "This will:"
echo "1. Send brief input to AI"
echo "2. AI generates structured requirements.md"
echo "3. Show enhanced requirements"
echo "4. Prompt for confirmation (y/e/n)"
echo ""

echo "To test manually, run:"
echo "cargo run -- --autonomous --interactive-requirements --workspace $TEST_WORKSPACE"
echo ""
echo "Then type: $BRIEF_INPUT"
echo "Press Ctrl+D"
echo "Review the AI-generated requirements"
echo "Choose 'y' to proceed, 'e' to edit, or 'n' to cancel"
echo ""

echo "Test workspace will be at: $TEST_WORKSPACE"