Compare commits: `micn/agent` ... `micn/alway` (45 commits)

| SHA1 |
|---|
| c42e0bce54 |
| b529d7f814 |
| 9752e81489 |
| 63c2aff7ba |
| aa4a0267ea |
| 6cfa1e225c |
| f53cd8e8f3 |
| 45bffc40da |
| 4bf0f71bbd |
| c1ce3038d8 |
| 4b1694b308 |
| 5e08d6bbba |
| c3f3f79dc5 |
| 834153ea69 |
| 65f25f840e |
| a8af5d7cc1 |
| 61d748034d |
| d0ac222e2e |
| e1e732150a |
| 0be4829ca9 |
| efd4eca755 |
| 3ec65e38ee |
| c5d6fbef08 |
| f93844d378 |
| af6d37a8e2 |
| c1c6680e03 |
| f2d8e744bb |
| 010a43d203 |
| 758e255af8 |
| 393826ae02 |
| 3afad3d61f |
| 2488cc54d5 |
| 2ad0c9a3fd |
| 2008a81193 |
| 776f5034b8 |
| 92bece957b |
| 767299ff4e |
| 9d35449be8 |
| da652bf287 |
| a566171203 |
| 347c9e1e00 |
| aa7eda0331 |
| e42c76f3b9 |
| dd211fab1c |
| bcece38473 |
`.cargo/config.toml` · 5 · new file

```toml
[target.aarch64-apple-darwin]
rustflags = ["-C", "link-args=-Wl,-rpath,@executable_path"]

[target.x86_64-apple-darwin]
rustflags = ["-C", "link-args=-Wl,-rpath,@executable_path"]
```
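`@executable_path` makes the macOS dynamic loader search the directory containing the running binary, which presumably is why the README's install step copies `libVisionBridge.dylib` next to `g3`. If the flag ever needs to be decided in code instead of in `.cargo/config.toml`, a build script could emit an equivalent linker argument with Cargo's `cargo:rustc-link-arg` directive. This is a sketch under that assumption, not part of this change; `rpath_directive` and `emit_rpath` are hypothetical names.

```rust
use std::env;

/// Hypothetical helper: the line a build script would print so the linker
/// adds an rpath of `@executable_path` (the directory of the binary).
/// Returns `None` for non-macOS targets, where `@executable_path` has no meaning.
fn rpath_directive(target_os: &str) -> Option<&'static str> {
    if target_os == "macos" {
        Some("cargo:rustc-link-arg=-Wl,-rpath,@executable_path")
    } else {
        None
    }
}

/// In a real `build.rs` this body would live in `fn main()`.
/// Cargo sets CARGO_CFG_TARGET_OS for build scripts, so the decision
/// follows the compilation target rather than the host.
#[allow(dead_code)]
fn emit_rpath() {
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    if let Some(directive) = rpath_directive(&target_os) {
        println!("{directive}");
    }
}
```

Keeping the flag in `.cargo/config.toml`, as this change does, avoids a build script entirely; the build-script route only pays off if the condition becomes more complex than a target triple.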
`.gitignore` · 1 · vendored

```diff
@@ -2,6 +2,7 @@
 # will have compiled files and executables
 debug
 target
+.build
 
 # These are backup files generated by rustfmt
 **/*.rs.bk
```
`Cargo.lock` · 1154 · generated

File diff suppressed because it is too large.
```diff
@@ -4,7 +4,8 @@ members = [
     "crates/g3-core",
     "crates/g3-providers",
     "crates/g3-config",
-    "crates/g3-execution"
+    "crates/g3-execution",
+    "crates/g3-computer-control"
 ]
 resolver = "2"
 
```
`DESIGN.md` · 62

````diff
@@ -29,7 +29,8 @@ g3/
 │ ├── g3-core/ # Core agent engine, tools, and streaming logic
 │ ├── g3-providers/ # LLM provider abstractions and implementations
 │ ├── g3-config/ # Configuration management
-│ └── g3-execution/ # Code execution engine
+│ ├── g3-execution/ # Code execution engine
+│ └── g3-computer-control/ # Computer control and automation
 ├── logs/ # Session logs (auto-created)
 ├── README.md # Project documentation
 └── DESIGN.md # This design document
@@ -48,6 +49,7 @@ g3/
 │ • Retro TUI │ │ • Tool system │ │ • Embedded │
 │ • Autonomous │ │ • Streaming │ │ (llama.cpp) │
 │ mode │ │ • Task exec │ │ • OAuth flow │
+│ │ │ • TODO mgmt │ │ │
 └─────────────────┘ └─────────────────┘ └─────────────────┘
 │ │ │
 └───────────────────────┼───────────────────────┘
@@ -59,7 +61,18 @@ g3/
 │ • Shell cmds │ │ • Env overrides │
 │ • Streaming │ │ • Provider │
 │ • Error hdlg │ │ settings │
-└─────────────────┘ └─────────────────┘
+└─────────────────┘ │ • Computer │
+│ │ control cfg │
+│ └─────────────────┘
+│ │
+┌─────────────────┐ │
+│ g3-computer- │◄────────────┘
+│ control │
+│ • Mouse/kbd │
+│ • Screenshots │
+│ • OCR/Tesseract │
+│ • Windows/UI │
+└─────────────────┘
 ```
 
 ## Core Components
@@ -79,6 +92,7 @@ g3/
 - **Streaming Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Session Management**: Automatic session logging with detailed conversation history and token usage
 - **Error Recovery**: Sophisticated error classification and retry logic for recoverable errors
+- **TODO Management**: In-memory TODO list with read/write tools for task tracking
 
 **Available Tools:**
 - `shell`: Execute shell commands with streaming output
@@ -86,7 +100,15 @@ g3/
 - `write_file`: Create or overwrite files with content
 - `str_replace`: Apply unified diffs to files with precise editing
 - `final_output`: Signal task completion with detailed summaries
-- **Project Management**: Workspace handling, requirements.md processing for autonomous mode
+- `todo_read`: Read the entire TODO list content
+- `todo_write`: Write or overwrite the entire TODO list
+- `mouse_click`: Click the mouse at specific coordinates
+- `type_text`: Type text at the current cursor position
+- `find_element`: Find UI elements by text, role, or attributes
+- `take_screenshot`: Capture screenshots of screen, region, or window
+- `extract_text`: Extract text from images or screen regions using OCR
+- `find_text_on_screen`: Find text visually on screen and return coordinates
+- `list_windows`: List all open windows with IDs and titles
 
 ### 2. g3-providers: LLM Provider Abstraction
 
@@ -172,6 +194,26 @@ g3/
 - **Validation**: Configuration validation with helpful error messages
 - **Flexible Paths**: Support for shell expansion (`~`, environment variables)
 
+### 6. g3-computer-control: Computer Control & Automation
+
+**Primary Responsibilities:**
+- Cross-platform computer control and automation
+- Mouse and keyboard input simulation
+- Window management and screenshot capture
+- OCR text extraction from images and screen regions
+
+**Platform Support:**
+- **macOS**: Core Graphics, Cocoa, screencapture integration
+- **Linux**: X11/Xtest for input, X11 for window management
+- **Windows**: Win32 APIs for input and window control
+
+**Key Features:**
+- **OCR Integration**: Tesseract-based text extraction from images
+- **Window Management**: List, identify, and capture specific application windows
+- **UI Automation**: Find elements, simulate clicks, type text
+- **Screenshot Capture**: Full screen, regions, or specific windows
+- **Accessibility**: Requires OS-level permissions for automation
+
 ## Advanced Features
 
 ### Context Window Management
@@ -180,6 +222,7 @@ G3 implements sophisticated context window management:
 
 - **Automatic Monitoring**: Tracks token usage with percentage-based thresholds
 - **Smart Summarization**: Auto-triggers at 80% capacity to prevent context overflow
+- **Context Thinning**: Progressive thinning at 50%, 60%, 70%, 80% thresholds - replaces large tool results with file references
 - **Conversation Preservation**: Maintains conversation continuity through intelligent summaries
 - **Provider-Specific Limits**: Adapts to different model context windows (4k to 200k+ tokens)
 - **Cumulative Tracking**: Monitors total token usage across entire sessions
````
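The threshold scheme described in the context-management hunk (thinning at 50%, 60%, 70% and 80%, summarization at 80%) can be sketched as a small decision function. Everything here is an assumption for illustration: the names `ContextAction` and `next_action`, the round-down integer percentage, and the ordering choice of thinning at 80% first and summarizing only when usage is still at or above 80% after the last thinning pass. The source does not say how g3 orders the two.

```rust
/// Percent-of-window thresholds at which context thinning runs (from the docs).
const THINNING_THRESHOLDS: [u64; 4] = [50, 60, 70, 80];
/// Percent at which auto-summarization triggers (from the docs).
const SUMMARIZE_AT: u64 = 80;

#[derive(Debug, PartialEq, Eq)]
enum ContextAction {
    /// Nothing to do yet.
    None,
    /// Thin the context; carries the highest threshold crossed.
    Thin(u64),
    /// Every crossed threshold has been thinned and usage is still high.
    Summarize,
}

/// Hypothetical decision function: given current usage and the last threshold
/// already thinned, decide whether to thin, summarize, or do nothing.
fn next_action(used_tokens: u64, context_window: u64, last_thinned: Option<u64>) -> ContextAction {
    if context_window == 0 {
        return ContextAction::None;
    }
    // Integer percentage, rounded down.
    let pct = used_tokens.saturating_mul(100) / context_window;
    // Highest threshold crossed so far, if any.
    let crossed = THINNING_THRESHOLDS.iter().copied().filter(|&t| pct >= t).max();
    match crossed {
        // A new, higher threshold was crossed since the last thinning pass.
        Some(t) if last_thinned.map_or(true, |prev| t > prev) => ContextAction::Thin(t),
        // Thinning is exhausted but the window is still at least 80% full.
        _ if pct >= SUMMARIZE_AT => ContextAction::Summarize,
        _ => ContextAction::None,
    }
}
```

With this design, jumping from 55% to 72% in one turn yields a single `Thin(70)` rather than two passes; whether g3 itself skips intermediate thresholds that way is not stated in the source.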
```diff
@@ -354,20 +397,23 @@ This design document reflects the current state of G3 as a mature, production-re
 ### Fully Implemented
 - ✅ **Core Agent Engine**: Complete with streaming, tool execution, and context management
 - ✅ **Provider System**: Anthropic, Databricks, and Embedded providers with OAuth support
-- ✅ **Tool System**: All 5 core tools (shell, read_file, write_file, str_replace, final_output)
+- ✅ **Tool System**: 13 tools including file ops, shell, TODO management, and computer control
 - ✅ **CLI Interface**: Interactive mode, single-shot mode, retro TUI
 - ✅ **Autonomous Mode**: Coach-player feedback loop with requirements.md processing
 - ✅ **Configuration**: TOML-based config with environment overrides
 - ✅ **Error Handling**: Comprehensive retry logic and error classification
 - ✅ **Session Logging**: Automatic session tracking and JSON logs
-- ✅ **Context Management**: Auto-summarization at 80% capacity
+- ✅ **Context Management**: Context thinning (50-80%) and auto-summarization at 80% capacity
+- ✅ **Computer Control**: Cross-platform automation with OCR support
+- ✅ **TODO Management**: In-memory TODO list with read/write tools
 
 ### Architecture Highlights
-- **Workspace**: 5 crates with clear separation of concerns
+- **Workspace**: 6 crates with clear separation of concerns
 - **Dependencies**: Modern Rust ecosystem (Tokio, Clap, Serde, etc.)
 - **Streaming**: Real-time response processing with tool call detection
 - **Cross-Platform**: Works on macOS, Linux, and Windows
-- **GPU Support**: Metal acceleration for local models on macOS
+- **GPU Support**: Metal acceleration for local models on macOS, CUDA on Linux
+- **OCR Support**: Tesseract integration for text extraction from images
 
 ### Key Files
 - `src/main.rs`: main entry point delegating to g3-cli
@@ -376,3 +422,5 @@ This design document reflects the current state of G3 as a mature, production-re
 - `crates/g3-providers/src/lib.rs`: provider trait and registry
 - `crates/g3-config/src/lib.rs`: configuration management
 - `crates/g3-execution/src/lib.rs`: code execution engine
+- `crates/g3-computer-control/src/lib.rs`: computer control and automation
+- `crates/g3-computer-control/src/platform/`: platform-specific implementations
```
`README.md` · 125

```diff
@@ -11,8 +11,8 @@ G3 follows a modular architecture organized as a Rust workspace with multiple cr
 #### **g3-core**
 The heart of the agent system, containing:
 - **Agent Engine**: Main orchestration logic for handling conversations, tool execution, and task management
-- **Context Window Management**: Intelligent tracking of token usage with auto-summarization capabilities when approaching context limits (~80% capacity)
-- **Tool System**: Built-in tools for file operations (read, write, edit), shell command execution, and structured output generation
+- **Context Window Management**: Intelligent tracking of token usage with context thinning (50-80%) and auto-summarization at 80% capacity
+- **Tool System**: Built-in tools for file operations, shell commands, computer control, TODO management, and structured output
 - **Streaming Response Parser**: Real-time parsing of LLM responses with tool call detection and execution
 - **Task Execution**: Support for single and iterative task execution with automatic retry logic
 
@@ -40,6 +40,13 @@ Task execution framework:
 - Error handling and retry mechanisms
 - Progress tracking and reporting
 
+#### **g3-computer-control**
+Computer control capabilities:
+- Mouse and keyboard automation
+- UI element inspection and interaction
+- Screenshot capture and window management
+- OCR text extraction via Tesseract
+
 #### **g3-cli**
 Command-line interface:
 - Interactive terminal interface
@@ -61,13 +68,32 @@ G3 includes robust error handling with automatic retry logic:
 ### Intelligent Context Management
 - Automatic context window monitoring with percentage-based tracking
 - Smart auto-summarization when approaching token limits
+- **Context thinning** at 50%, 60%, 70%, 80% thresholds - automatically replaces large tool results with file references
 - Conversation history preservation through summaries
-- Dynamic token allocation for different providers
+- Dynamic token allocation for different providers (4k to 200k+ tokens)
 
+### Interactive Control Commands
+G3's interactive CLI includes control commands for manual context management:
+- **`/compact`**: Manually trigger summarization to compact conversation history
+- **`/thinnify`**: Manually trigger context thinning to replace large tool results with file references
+- **`/readme`**: Reload README.md and AGENTS.md from disk without restarting
+- **`/stats`**: Show detailed context and performance statistics
+- **`/help`**: Display all available control commands
+
+These commands give you fine-grained control over context management, allowing you to proactively optimize token usage and refresh project documentation. See [Control Commands Documentation](docs/CONTROL_COMMANDS.md) for detailed usage.
+
 ### Tool Ecosystem
 - **File Operations**: Read, write, and edit files with line-range precision
 - **Shell Integration**: Execute system commands with output capture
 - **Code Generation**: Structured code generation with syntax awareness
+- **TODO Management**: Read and write TODO lists with markdown checkbox format
+- **Computer Control** (Experimental): Automate desktop applications
+  - Mouse and keyboard control
+  - macOS Accessibility API for native app automation (via `--macax` flag)
+  - UI element inspection
+  - Screenshot capture and window management
+  - OCR text extraction from images and screen regions
+  - Window listing and identification
 - **Final Output**: Formatted result presentation
 
 ### Provider Flexibility
```
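A minimal sketch of how the five control commands listed in the README could be recognized from a line of interactive input before it is treated as a new requirement. The `ControlCommand` enum and `parse_control_command` function are hypothetical; the diff does not show how g3's CLI actually dispatches these commands.

```rust
/// Hypothetical enum for the control commands documented in the README.
#[derive(Debug, PartialEq, Eq)]
enum ControlCommand {
    Compact,
    Thinnify,
    Readme,
    Stats,
    Help,
}

/// Returns `Some(cmd)` when the trimmed input is exactly one of the documented
/// control commands, and `None` for ordinary requirement text.
fn parse_control_command(input: &str) -> Option<ControlCommand> {
    match input.trim() {
        "/compact" => Some(ControlCommand::Compact),
        "/thinnify" => Some(ControlCommand::Thinnify),
        "/readme" => Some(ControlCommand::Readme),
        "/stats" => Some(ControlCommand::Stats),
        "/help" => Some(ControlCommand::Help),
        _ => None,
    }
}
```

Matching the whole trimmed line, rather than checking for a leading `/`, keeps requirement text such as "add a /health endpoint" from being mistaken for a command.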
````diff
@@ -98,24 +124,111 @@ G3 is designed for:
 - Automated code generation and refactoring
 - File manipulation and project scaffolding
 - System administration tasks
-- Data processing and transformation
+- Data processing and transformation
 - API integration and testing
 - Documentation generation
 - Complex multi-step workflows
+- Desktop application automation and testing
 
 ## Getting Started
 
+### Default Mode: Accumulative Autonomous
+
+The default interactive mode now uses **accumulative autonomous mode**, which combines the best of interactive and autonomous workflows:
+
+```bash
+# Simply run g3 in any directory
+g3
+
+# You'll be prompted to describe what you want to build
+# Each input you provide:
+# 1. Gets added to accumulated requirements
+# 2. Automatically triggers autonomous mode (coach-player loop)
+# 3. Implements your requirements iteratively
+
+# Example session:
+requirement> create a simple web server in Python with Flask
+# ... autonomous mode runs and implements it ...
+requirement> add a /health endpoint that returns JSON
+# ... autonomous mode runs again with both requirements ...
+```
+
+### Other Modes
+
+```bash
+# Single-shot mode (one task, then exit)
+g3 "implement a function to calculate fibonacci numbers"
+
+# Traditional autonomous mode (reads requirements.md)
+g3 --autonomous
+
+# Traditional chat mode (simple interactive chat without autonomous runs)
+g3 --chat
+```
+
 ```bash
 # Build the project
 cargo build --release
 
-# Run G3
-cargo run
+# Run from the build directory
+./target/release/g3
+
+# Or copy both files to somewhere in your PATH (macOS only needs both files)
+cp target/release/g3 ~/.local/bin/
+cp target/release/libVisionBridge.dylib ~/.local/bin/ # macOS only
 
 # Execute a task
 g3 "implement a function to calculate fibonacci numbers"
 ```
 
+## WebDriver Browser Automation
+
+G3 includes WebDriver support for browser automation tasks using Safari.
+
+**One-Time Setup** (macOS only):
+
+Safari Remote Automation must be enabled before using WebDriver tools. Run this once:
+
+```bash
+# Option 1: Use the provided script
+./scripts/enable-safari-automation.sh
+
+# Option 2: Enable manually
+safaridriver --enable # Requires password
+
+# Option 3: Enable via Safari UI
+# Safari → Preferences → Advanced → Show Develop menu
+# Then: Develop → Allow Remote Automation
+```
+
+**For detailed setup instructions and troubleshooting**, see [WebDriver Setup Guide](docs/webdriver-setup.md).
+
+**Usage**: Run G3 with the `--webdriver` flag to enable browser automation tools.
+
+## macOS Accessibility API Tools
+
+G3 includes support for controlling macOS applications via the Accessibility API, allowing you to automate native macOS apps.
+
+**Available Tools**: `macax_list_apps`, `macax_get_frontmost_app`, `macax_activate_app`, `macax_get_ui_tree`, `macax_find_elements`, `macax_click`, `macax_set_value`, `macax_get_value`, `macax_press_key`
+
+**Setup**: Enable with the `--macax` flag or in config with `macax.enabled = true`. Grant accessibility permissions:
+- **macOS**: System Preferences → Security & Privacy → Privacy → Accessibility → Add your terminal app
+
+**For detailed documentation**, see [macOS Accessibility Tools Guide](docs/macax-tools.md).
+
+**Note**: This is particularly useful for testing and automating apps you're building with G3, as you can add accessibility identifiers to your UI elements.
+
+## Computer Control (Experimental)
+
+G3 can interact with your computer's GUI for automation tasks:
+
+**Available Tools**: `mouse_click`, `type_text`, `find_element`, `take_screenshot`, `extract_text`, `find_text_on_screen`, `list_windows`
+
+**Setup**: Enable in config with `computer_control.enabled = true` and grant OS accessibility permissions:
+- **macOS**: System Preferences → Security & Privacy → Accessibility
+- **Linux**: Ensure X11 or Wayland access
+- **Windows**: Run as administrator (first time only)
+
 ## Session Logs
 
 G3 automatically saves session logs for each interaction in the `logs/` directory. These logs contain:
````
`config.coach-player.example.toml` · 24 · new file

```toml
[providers]
default_provider = "databricks"
# Specify different providers for coach and player in autonomous mode
coach = "databricks" # Provider for coach (code reviewer) - can be more powerful/expensive
player = "anthropic" # Provider for player (code implementer) - can be faster/cheaper

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
# token = "your-databricks-token" # Optional - will use OAuth if not provided
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true

[providers.anthropic]
api_key = "your-anthropic-api-key"
model = "claude-3-haiku-20240307" # Using a faster model for player
max_tokens = 4096
temperature = 0.3 # Slightly higher temperature for more creative implementations

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
```
```diff
@@ -1,5 +1,10 @@
 [providers]
 default_provider = "databricks"
+# Optional: Specify different providers for coach and player in autonomous mode
+# If not specified, will use default_provider for both
+# coach = "databricks" # Provider for coach (code reviewer)
+# player = "anthropic" # Provider for player (code implementer)
+# Note: Make sure the specified providers are configured below
 
 [providers.databricks]
 host = "https://your-workspace.cloud.databricks.com"
```
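The example config states that when `coach` or `player` is omitted, `default_provider` is used for both, and that any named provider must be configured. A minimal sketch of that fallback and check, assuming a hypothetical `ProvidersConfig` struct that mirrors the `[providers]` table; the real g3-config types are not shown in this diff, and TOML deserialization is left out so the sketch has no dependencies.

```rust
/// Hypothetical mirror of the `[providers]` table in the example config.
struct ProvidersConfig {
    default_provider: String,
    coach: Option<String>,
    player: Option<String>,
}

impl ProvidersConfig {
    /// Provider for the coach (code reviewer); falls back to `default_provider`.
    fn coach_provider(&self) -> &str {
        self.coach.as_deref().unwrap_or(self.default_provider.as_str())
    }

    /// Provider for the player (code implementer); falls back to `default_provider`.
    fn player_provider(&self) -> &str {
        self.player.as_deref().unwrap_or(self.default_provider.as_str())
    }

    /// Enforces the "make sure the specified providers are configured" note:
    /// each role must name a provider that has its own `[providers.<name>]` section.
    fn validate(&self, configured: &[&str]) -> Result<(), String> {
        for (role, name) in [("coach", self.coach_provider()), ("player", self.player_provider())] {
            if !configured.contains(&name) {
                return Err(format!("{role} provider '{name}' is not configured"));
            }
        }
        Ok(())
    }
}
```

Resolving the fallback in accessor methods, rather than copying `default_provider` into both fields at load time, keeps the "unset" state visible to any code that needs to distinguish it.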
```diff
@@ -13,3 +18,8 @@ use_oauth = true
 max_context_length = 8192
 enable_streaming = true
 timeout_seconds = 60
+
+[computer_control]
+enabled = false # Set to true to enable computer control (requires OS permissions)
+require_confirmation = true
+max_actions_per_second = 5
```
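The `max_actions_per_second = 5` setting implies throttling of computer-control actions. One way to enforce such a cap is a sliding one-second window; the sketch below assumes that interpretation (the source does not say how g3 applies the limit), and `ActionLimiter` is a hypothetical name. The caller passes `now` explicitly so the logic is deterministic and easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window limiter: at most `max_per_second` actions
/// in any one-second window.
struct ActionLimiter {
    max_per_second: usize,
    recent: VecDeque<Instant>,
}

impl ActionLimiter {
    fn new(max_per_second: usize) -> Self {
        Self { max_per_second, recent: VecDeque::new() }
    }

    /// Records and permits the action if under the cap; otherwise refuses it.
    /// `now` is expected to be non-decreasing across calls.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let window = Duration::from_secs(1);
        // Drop timestamps that have fallen out of the one-second window.
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) >= window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() < self.max_per_second {
            self.recent.push_back(now);
            true
        } else {
            false
        }
    }
}
```

A caller that wants to wait rather than refuse could sleep until the oldest timestamp ages out; refusing is simpler to reason about when `require_confirmation = true` already puts a human in the loop.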
File diff suppressed because it is too large.
`crates/g3-cli/src/machine_ui_writer.rs` · 94 · new file

```rust
use g3_core::ui_writer::UiWriter;
use std::io::{self, Write};

/// Machine-mode implementation of UiWriter that prints plain, unformatted output
/// This is designed for programmatic consumption and outputs everything verbatim
pub struct MachineUiWriter;

impl MachineUiWriter {
    pub fn new() -> Self {
        Self
    }
}

impl UiWriter for MachineUiWriter {
    fn print(&self, message: &str) {
        print!("{}", message);
    }

    fn println(&self, message: &str) {
        println!("{}", message);
    }

    fn print_inline(&self, message: &str) {
        print!("{}", message);
        let _ = io::stdout().flush();
    }

    fn print_system_prompt(&self, prompt: &str) {
        println!("SYSTEM_PROMPT:");
        println!("{}", prompt);
        println!("END_SYSTEM_PROMPT");
        println!();
    }

    fn print_context_status(&self, message: &str) {
        println!("CONTEXT_STATUS: {}", message);
    }

    fn print_context_thinning(&self, message: &str) {
        println!("CONTEXT_THINNING: {}", message);
    }

    fn print_tool_header(&self, tool_name: &str) {
        println!("TOOL_CALL: {}", tool_name);
    }

    fn print_tool_arg(&self, key: &str, value: &str) {
        println!("TOOL_ARG: {} = {}", key, value);
    }

    fn print_tool_output_header(&self) {
        println!("TOOL_OUTPUT:");
    }

    fn update_tool_output_line(&self, line: &str) {
        println!("{}", line);
    }

    fn print_tool_output_line(&self, line: &str) {
        println!("{}", line);
    }

    fn print_tool_output_summary(&self, count: usize) {
        println!("TOOL_OUTPUT_LINES: {}", count);
    }

    fn print_tool_timing(&self, duration_str: &str) {
        println!("TOOL_DURATION: {}", duration_str);
        println!("END_TOOL_OUTPUT");
        println!();
    }

    fn print_agent_prompt(&self) {
        println!("AGENT_RESPONSE:");
        let _ = io::stdout().flush();
    }

    fn print_agent_response(&self, content: &str) {
        print!("{}", content);
        let _ = io::stdout().flush();
    }

    fn notify_sse_received(&self) {
        // No-op for machine mode
    }

    fn flush(&self) {
        let _ = io::stdout().flush();
    }

    fn wants_full_output(&self) -> bool {
        true // Machine mode wants complete, untruncated output
    }
}
```
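Because `MachineUiWriter` prefixes its structured lines with fixed markers (`TOOL_CALL:`, `TOOL_ARG:`, `TOOL_OUTPUT:`, `TOOL_OUTPUT_LINES:`, `TOOL_DURATION:`, `END_TOOL_OUTPUT`, `CONTEXT_STATUS:`, `CONTEXT_THINNING:`, `AGENT_RESPONSE:`), a consuming program can classify its stdout line by line. A minimal consumer-side sketch, assuming a hypothetical `MachineEvent` type and `parse_machine_line` function. Note one limitation of the format itself: tool output and agent text are printed verbatim, so a content line that happens to start with a marker is indistinguishable from a real one.

```rust
/// Hypothetical event type for lines printed by MachineUiWriter.
#[derive(Debug, PartialEq, Eq)]
enum MachineEvent<'a> {
    ToolCall(&'a str),
    ToolArg { key: &'a str, value: &'a str },
    ToolOutputStart,
    ToolOutputLines(usize),
    ToolDuration(&'a str),
    ToolOutputEnd,
    ContextStatus(&'a str),
    ContextThinning(&'a str),
    AgentResponseStart,
    /// Anything else: verbatim tool output, agent text, prompts.
    Text(&'a str),
}

/// Classifies one line of machine-mode output by its marker prefix.
fn parse_machine_line(line: &str) -> MachineEvent<'_> {
    if let Some(name) = line.strip_prefix("TOOL_CALL: ") {
        MachineEvent::ToolCall(name)
    } else if let Some(rest) = line.strip_prefix("TOOL_ARG: ") {
        // Printed as `TOOL_ARG: {key} = {value}`; split on the first " = ".
        match rest.split_once(" = ") {
            Some((key, value)) => MachineEvent::ToolArg { key, value },
            None => MachineEvent::Text(line),
        }
    } else if line == "TOOL_OUTPUT:" {
        MachineEvent::ToolOutputStart
    } else if let Some(n) = line.strip_prefix("TOOL_OUTPUT_LINES: ") {
        n.trim()
            .parse()
            .map(MachineEvent::ToolOutputLines)
            .unwrap_or(MachineEvent::Text(line))
    } else if let Some(d) = line.strip_prefix("TOOL_DURATION: ") {
        MachineEvent::ToolDuration(d)
    } else if line == "END_TOOL_OUTPUT" {
        MachineEvent::ToolOutputEnd
    } else if let Some(m) = line.strip_prefix("CONTEXT_STATUS: ") {
        MachineEvent::ContextStatus(m)
    } else if let Some(m) = line.strip_prefix("CONTEXT_THINNING: ") {
        MachineEvent::ContextThinning(m)
    } else if line == "AGENT_RESPONSE:" {
        MachineEvent::AgentResponseStart
    } else {
        MachineEvent::Text(line)
    }
}
```

The events borrow from the input line instead of allocating, which suits a consumer reading a long stream with `BufRead::lines`.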
```diff
@@ -267,23 +267,23 @@ impl TerminalState {
         let mut current_text = String::new();
 
         // Check for headers first
-        if line.starts_with("### ") {
+        if let Some(stripped) = line.strip_prefix("### ") {
             return Line::from(Span::styled(
-                format!(" {}", &line[4..]),
+                format!(" {}", stripped),
                 Style::default()
                     .fg(self.theme.terminal_cyan.to_color())
                     .add_modifier(Modifier::BOLD | Modifier::UNDERLINED),
             ));
-        } else if line.starts_with("## ") {
+        } else if let Some(stripped) = line.strip_prefix("## ") {
             return Line::from(Span::styled(
-                format!(" {}", &line[3..]),
+                format!(" {}", stripped),
                 Style::default()
                     .fg(self.theme.terminal_amber.to_color())
                     .add_modifier(Modifier::BOLD),
             ));
-        } else if line.starts_with("# ") {
+        } else if let Some(stripped) = line.strip_prefix("# ") {
             return Line::from(Span::styled(
-                format!(" {}", &line[2..]),
+                format!(" {}", stripped),
                 Style::default()
                     .fg(self.theme.terminal_green.to_color())
                     .add_modifier(Modifier::BOLD),
```
```diff
@@ -343,7 +343,7 @@ impl TerminalState {
                     }
                     // Find closing *
                     let mut italic_text = String::new();
-                    while let Some(ch) = chars.next() {
+                    for ch in chars.by_ref() {
                         if ch == '*' {
                             break;
                         }
@@ -367,7 +367,7 @@ impl TerminalState {
                     }
                     // Find closing `
                     let mut code_text = String::new();
-                    while let Some(ch) = chars.next() {
+                    for ch in chars.by_ref() {
                         if ch == '`' {
                             break;
                         }
```
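`for ch in chars.by_ref()` behaves like the `while let Some(ch) = chars.next()` loop it replaces: `by_ref` lends the iterator to the loop instead of moving it, so breaking at the closing delimiter leaves `chars` positioned just after it and the outer markdown parser can keep consuming. A self-contained sketch of the pattern; `take_until` is a hypothetical helper, and returning `None` for an unclosed marker is an assumption, since the diff does not show how retro_tui handles that case.

```rust
/// Hypothetical helper: collects characters up to (not including) `delim`
/// and consumes the delimiter. Returns `None` if the input ends first.
fn take_until<I: Iterator<Item = char>>(chars: &mut I, delim: char) -> Option<String> {
    let mut out = String::new();
    // `by_ref` lends the iterator to the `for` loop instead of moving it,
    // so the caller's iterator resumes right after the delimiter.
    for ch in chars.by_ref() {
        if ch == delim {
            return Some(out);
        }
        out.push(ch);
    }
    None
}
```

Usage: after `take_until(&mut it, '*')` on `"bold* rest"`, the iterator `it` still yields `" rest"`.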
```diff
@@ -612,11 +612,9 @@ impl RetroTui {
             }
 
             // Update status blink only if status is "PROCESSING"
-            if state.status_line == "PROCESSING" {
-                if state.last_status_blink.elapsed() > Duration::from_millis(500) {
-                    state.status_blink = !state.status_blink;
-                    state.last_status_blink = Instant::now();
-                }
+            if state.status_line == "PROCESSING" && state.last_status_blink.elapsed() > Duration::from_millis(500) {
+                state.status_blink = !state.status_blink;
+                state.last_status_blink = Instant::now();
             }
 
             // Update activity area animation
@@ -771,12 +769,7 @@ impl RetroTui {
         let total_cursor_pos = cursor_position;
 
         // Determine the window into the buffer we should show
-        let window_start = if total_cursor_pos > available_width - 1 {
-            // Cursor is beyond the visible area, scroll the view
-            total_cursor_pos - (available_width - 1)
-        } else {
-            0
-        };
+        let window_start = total_cursor_pos.saturating_sub(available_width - 1);
 
         // Get the visible portion of the buffer
         let visible_buffer: String = input_buffer
```
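`saturating_sub` returns 0 where plain subtraction would underflow, which is exactly the `else { 0 }` branch of the removed code, so both forms give the same `window_start` for every cursor position. Both still compute `available_width - 1`, which would underflow if the width were ever 0. A minimal sketch of the horizontal scrolling window, assuming a char-indexed cursor; `visible_slice` is a hypothetical name, and taking `available_width` characters is an assumption because the diff truncates the `visible_buffer` expression.

```rust
/// Hypothetical helper: the part of `buffer` shown in `available_width` columns,
/// scrolled right just enough to keep the cursor (a char index) visible.
/// One column is reserved for the cursor, mirroring `available_width - 1`.
fn visible_slice(buffer: &str, cursor_pos: usize, available_width: usize) -> String {
    if available_width == 0 {
        // Guard the `0 - 1` underflow that the original expression would hit.
        return String::new();
    }
    let window_start = cursor_pos.saturating_sub(available_width - 1);
    buffer
        .chars()
        .skip(window_start)
        .take(available_width)
        .collect()
}
```

Working in chars rather than bytes avoids slicing inside a multi-byte UTF-8 character, although wide glyphs (for example CJK) can still occupy two terminal columns each.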
```diff
@@ -1013,9 +1006,9 @@ impl RetroTui {
         let fade_color = |color: Color| -> Color {
             match color {
                 Color::Rgb(r, g, b) => {
-                    let faded_r = ((r as f32 * opacity) as u8).max(0);
-                    let faded_g = ((g as f32 * opacity) as u8).max(0);
-                    let faded_b = ((b as f32 * opacity) as u8).max(0);
+                    let faded_r = (r as f32 * opacity) as u8;
+                    let faded_g = (g as f32 * opacity) as u8;
+                    let faded_b = (b as f32 * opacity) as u8;
                     Color::Rgb(faded_r, faded_g, faded_b)
                 }
                 _ => color,
@@ -1098,9 +1091,9 @@ impl RetroTui {
         let fade_color = |color: Color| -> Color {
             match color {
                 Color::Rgb(r, g, b) => {
-                    let faded_r = ((r as f32 * opacity) as u8).max(0);
-                    let faded_g = ((g as f32 * opacity) as u8).max(0);
-                    let faded_b = ((b as f32 * opacity) as u8).max(0);
+                    let faded_r = (r as f32 * opacity) as u8;
+                    let faded_g = (g as f32 * opacity) as u8;
+                    let faded_b = (b as f32 * opacity) as u8;
                     Color::Rgb(faded_r, faded_g, faded_b)
                 }
                 _ => color,
```
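Dropping `.max(0)` does not change behavior: `.max(0)` on a `u8` can never raise the value, and since Rust 1.45 a float-to-integer `as` cast saturates (results below 0 become 0, results above 255 become 255, and NaN becomes 0). A standalone sketch of the fade arithmetic; `fade_rgb` is a hypothetical helper that takes raw channels instead of ratatui's `Color`.

```rust
/// Hypothetical helper: scales each RGB channel by `opacity`.
/// The `as u8` casts saturate, so out-of-range results clamp to 0..=255
/// and a NaN opacity yields black, with no explicit checks needed.
fn fade_rgb((r, g, b): (u8, u8, u8), opacity: f32) -> (u8, u8, u8) {
    (
        (r as f32 * opacity) as u8,
        (g as f32 * opacity) as u8,
        (b as f32 * opacity) as u8,
    )
}
```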
```diff
@@ -1176,7 +1169,7 @@ impl RetroTui {
         }
 
         // Wave characters for smooth animation
-        let wave_chars = vec!['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
+        let wave_chars = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
 
         // Build the wave line
         let mut wave_line = String::new();
@@ -1190,7 +1183,7 @@ impl RetroTui {
             let idx = wave_data.len().saturating_sub(display_width) + i;
 
             if idx < wave_data.len() {
-                let value = wave_data[idx].min(1.0).max(0.0);
+                let value = wave_data[idx].clamp(0.0, 1.0);
                 let char_idx = ((value * 7.0) as usize).min(7);
                 wave_line.push(wave_chars[char_idx]);
             } else {
@@ -1206,8 +1199,6 @@ impl RetroTui {
         f.render_widget(wave_paragraph, area);
     }
 
-    /// Draw the status bar
-
     /// Draw the status bar
     fn draw_status_bar(
         f: &mut Frame,
```
`crates/g3-cli/src/simple_output.rs` · 32 · new file

```rust
/// Simple output helper for printing messages
pub struct SimpleOutput {
    machine_mode: bool,
}

impl SimpleOutput {
    pub fn new() -> Self {
        SimpleOutput { machine_mode: false }
    }

    pub fn new_with_mode(machine_mode: bool) -> Self {
        SimpleOutput { machine_mode }
    }

    pub fn print(&self, message: &str) {
        if !self.machine_mode {
            println!("{}", message);
        }
    }

    pub fn print_smart(&self, message: &str) {
        if !self.machine_mode {
            println!("{}", message);
        }
    }
}

impl Default for SimpleOutput {
    fn default() -> Self {
        Self::new()
    }
}
```
```diff
@@ -1,5 +1,6 @@
 use crossterm::style::Color;
 use crossterm::style::{SetForegroundColor, ResetColor};
+use std::io::{self, Write};
 use termimad::MadSkin;
 
 /// Simple output handler with markdown support
@@ -40,7 +41,7 @@ impl SimpleOutput {
             trimmed.starts_with("* ") ||
             trimmed.starts_with("+ ") ||
             (trimmed.len() > 2 &&
-             trimmed.chars().next().map_or(false, |c| c.is_ascii_digit()) &&
+             trimmed.chars().next().is_some_and(|c| c.is_ascii_digit()) &&
              trimmed.chars().nth(1) == Some('.') &&
              trimmed.chars().nth(2) == Some(' ')) ||
             (trimmed.contains('[') && trimmed.contains("]("))
@@ -93,6 +94,37 @@ impl SimpleOutput {
         print!("{}", ResetColor);
         println!(" {:.1}% | {}/{} tokens", percentage, used, total);
     }
+
+    pub fn print_context_thinning(&self, message: &str) {
+        // Animated highlight for context thinning
+        // Use bright cyan/green with a quick flash animation
+
+        // Flash animation: print with bright background, then normal
+        let frames = vec![
+            "\x1b[1;97;46m", // Frame 1: Bold white on cyan background
+            "\x1b[1;97;42m", // Frame 2: Bold white on green background
+            "\x1b[1;96;40m", // Frame 3: Bold cyan on black background
+        ];
+
+        println!();
+
+        // Quick flash animation
+        for frame in &frames {
+            print!("\r{} ✨ {} ✨\x1b[0m", frame, message);
+            let _ = io::stdout().flush();
+            std::thread::sleep(std::time::Duration::from_millis(80));
+        }
+
+        // Final display with bright cyan and sparkle emojis
+        print!("\r\x1b[1;96m✨ {} ✨\x1b[0m", message);
+        println!();
+
+        // Add a subtle "success" indicator line
+        println!("\x1b[2;36m └─ Context optimized successfully\x1b[0m");
+        println!();
+
+        let _ = io::stdout().flush();
+    }
 }
 
 #[cfg(test)]
```
@@ -1,8 +1,6 @@
|
||||
use crate::retro_tui::RetroTui;
|
||||
use g3_core::ui_writer::UiWriter;
|
||||
use std::io::{self, Write};
|
||||
use std::sync::Mutex;
|
||||
use std::time::Instant;
|
||||
|
||||
/// Console implementation of UiWriter that prints to stdout
|
||||
pub struct ConsoleUiWriter {
|
||||
@@ -10,6 +8,7 @@ pub struct ConsoleUiWriter {
|
||||
current_tool_args: Mutex<Vec<(String, String)>>,
|
||||
current_output_line: Mutex<Option<String>>,
|
||||
output_line_printed: Mutex<bool>,
|
||||
in_todo_tool: Mutex<bool>,
|
||||
}
|
||||
|
||||
impl ConsoleUiWriter {
|
||||
@@ -19,6 +18,60 @@ impl ConsoleUiWriter {
|
||||
current_tool_args: Mutex::new(Vec::new()),
|
||||
current_output_line: Mutex::new(None),
|
||||
output_line_printed: Mutex::new(false),
|
||||
in_todo_tool: Mutex::new(false),
|
||||
}
|
||||
}
|
||||
|
||||
fn print_todo_line(&self, line: &str) {
|
||||
// Transform and print todo list lines elegantly
|
||||
let trimmed = line.trim();
|
||||
|
||||
// Skip the "📝 TODO list:" prefix line
|
||||
if trimmed.starts_with("📝 TODO list:") || trimmed == "📝 TODO list is empty" {
|
||||
return;
|
||||
}
|
||||
|
||||
// Handle empty lines
|
||||
if trimmed.is_empty() {
|
||||
println!();
|
||||
return;
|
||||
}
|
||||
|
||||
// Detect indentation level
|
||||
let indent_count = line.chars().take_while(|c| c.is_whitespace()).count();
|
||||
let indent = " ".repeat(indent_count / 2); // Convert spaces to visual indent
|
||||
|
||||
// Format based on line type
|
||||
if trimmed.starts_with("- [ ]") {
|
||||
// Incomplete task
|
||||
let task = trimmed.strip_prefix("- [ ]").unwrap_or(trimmed).trim();
|
||||
println!("{}☐ {}", indent, task);
|
||||
} else if trimmed.starts_with("- [x]") || trimmed.starts_with("- [X]") {
|
||||
// Completed task
|
||||
let task = trimmed.strip_prefix("- [x]")
|
||||
.or_else(|| trimmed.strip_prefix("- [X]"))
|
||||
.unwrap_or(trimmed)
|
||||
.trim();
|
||||
println!("{}\x1b[2m☑ {}\x1b[0m", indent, task);
|
||||
} else if trimmed.starts_with("- ") {
|
||||
// Regular bullet point
|
||||
let item = trimmed.strip_prefix("- ").unwrap_or(trimmed).trim();
|
||||
println!("{}• {}", indent, item);
|
||||
} else if trimmed.starts_with("# ") {
|
||||
// Heading
|
||||
let heading = trimmed.strip_prefix("# ").unwrap_or(trimmed).trim();
|
||||
println!("\n\x1b[1m{}\x1b[0m", heading);
|
||||
} else if trimmed.starts_with("## ") {
|
||||
// Subheading
|
||||
let subheading = trimmed.strip_prefix("## ").unwrap_or(trimmed).trim();
|
||||
println!("\n\x1b[1m{}\x1b[0m", subheading);
|
||||
} else if trimmed.starts_with("**") && trimmed.ends_with("**") {
|
||||
// Bold text (section marker)
|
||||
let text = trimmed.trim_start_matches("**").trim_end_matches("**");
|
||||
println!("{}\x1b[1m{}\x1b[0m", indent, text);
|
||||
} else {
|
||||
// Regular text or note
|
||||
println!("{}{}", indent, trimmed);
|
||||
}
|
||||
}
|
||||
}
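`print_todo_line` mixes the parsing of todo markdown with printing it. A minimal sketch, assuming one wants to unit-test the mapping, moves the same branch logic into a pure function that returns the formatted text (or `None` for suppressed header lines) instead of printing. The glyphs and ANSI codes are copied from the diff; the name `render_todo_line` and its return type are hypothetical.

```rust
/// Hypothetical pure version of `print_todo_line`: returns `None` for lines the
/// console writer skips, otherwise the text it would print (without the newline).
fn render_todo_line(line: &str) -> Option<String> {
    let trimmed = line.trim();

    // Header lines emitted by the todo tools are suppressed entirely.
    if trimmed.starts_with("📝 TODO list:") || trimmed == "📝 TODO list is empty" {
        return None;
    }
    if trimmed.is_empty() {
        return Some(String::new());
    }

    // One visual space per two leading whitespace characters, as in the diff.
    let indent_count = line.chars().take_while(|c| c.is_whitespace()).count();
    let indent = " ".repeat(indent_count / 2);

    let rendered = if let Some(task) = trimmed.strip_prefix("- [ ]") {
        format!("{}☐ {}", indent, task.trim())
    } else if let Some(task) = trimmed
        .strip_prefix("- [x]")
        .or_else(|| trimmed.strip_prefix("- [X]"))
    {
        // Completed tasks are dimmed (SGR 2) and then reset (SGR 0).
        format!("{}\x1b[2m☑ {}\x1b[0m", indent, task.trim())
    } else if let Some(item) = trimmed.strip_prefix("- ") {
        format!("{}• {}", indent, item.trim())
    } else if let Some(heading) = trimmed
        .strip_prefix("# ")
        .or_else(|| trimmed.strip_prefix("## "))
    {
        // Headings ignore indentation and start on a fresh line, in bold.
        format!("\n\x1b[1m{}\x1b[0m", heading.trim())
    } else if trimmed.starts_with("**") && trimmed.ends_with("**") {
        let text = trimmed.trim_start_matches("**").trim_end_matches("**");
        format!("{}\x1b[1m{}\x1b[0m", indent, text)
    } else {
        format!("{}{}", indent, trimmed)
    };
    Some(rendered)
}
```

Checking `"# "` before `"## "` is safe because `"## Plan"` does not start with `"# "` (its second character is `#`).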
@@ -49,10 +102,49 @@ impl UiWriter for ConsoleUiWriter {
        println!("{}", message);
    }

    fn print_context_thinning(&self, message: &str) {
        // Animated highlight for context thinning
        // Use bright cyan/green with a quick flash animation

        // Flash animation: print with bright background, then normal
        let frames = vec![
            "\x1b[1;97;46m", // Frame 1: Bold white on cyan background
            "\x1b[1;97;42m", // Frame 2: Bold white on green background
            "\x1b[1;96;40m", // Frame 3: Bold cyan on black background
        ];

        println!();

        // Quick flash animation
        for frame in &frames {
            print!("\r{} ✨ {} ✨\x1b[0m", frame, message);
            let _ = io::stdout().flush();
            std::thread::sleep(std::time::Duration::from_millis(80));
        }

        // Final display with bright cyan and sparkle emojis
        print!("\r\x1b[1;96m✨ {} ✨\x1b[0m", message);
        println!();

        // Add a subtle "success" indicator line
        println!("\x1b[2;36m └─ Context optimized successfully\x1b[0m");
        println!();

        let _ = io::stdout().flush();
    }

    fn print_tool_header(&self, tool_name: &str) {
        // Store the tool name and clear args for collection
        *self.current_tool_name.lock().unwrap() = Some(tool_name.to_string());
        self.current_tool_args.lock().unwrap().clear();

        // Check if this is a todo tool call
        let is_todo = tool_name == "todo_read" || tool_name == "todo_write";
        *self.in_todo_tool.lock().unwrap() = is_todo;

        // For todo tools, we'll skip the normal header and print a custom one later
        if is_todo {
        }
    }

    fn print_tool_arg(&self, key: &str, value: &str) {
@@ -75,6 +167,12 @@ impl UiWriter for ConsoleUiWriter {
    }

    fn print_tool_output_header(&self) {
        // Skip normal header for todo tools
        if *self.in_todo_tool.lock().unwrap() {
            println!(); // Just add a newline
            return;
        }

        println!();
        // Now print the tool header with the most important arg in bold green
        if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
@@ -93,7 +191,12 @@ impl UiWriter for ConsoleUiWriter {

                // Truncate long values for display
                let display_value = if first_line.len() > 80 {
                    format!("{}...", &first_line[..77])
                    // Use char_indices to safely truncate at character boundary
                    let truncate_at = first_line.char_indices()
                        .nth(77)
                        .map(|(i, _)| i)
                        .unwrap_or(first_line.len());
                    format!("{}...", &first_line[..truncate_at])
                } else {
                    first_line.to_string()
                };
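The hunk above replaces `&first_line[..77]`, which slices by bytes and panics when byte 77 falls inside a multi-byte UTF-8 character, with a cut at the start of the 78th character found via `char_indices`. A sketch of the new logic as a standalone helper (the name `truncate_for_display` is hypothetical; the thresholds 80 and 77 come from the diff) makes the difference checkable:

```rust
/// Hypothetical extraction of the new truncation logic: lines longer than 80
/// bytes are cut after 77 characters and suffixed with "...".
fn truncate_for_display(first_line: &str) -> String {
    if first_line.len() > 80 {
        // `nth(77)` yields the byte index where the 78th character starts, which is
        // always a valid UTF-8 boundary; fall back to the full length when the line
        // has 77 characters or fewer.
        let truncate_at = first_line
            .char_indices()
            .nth(77)
            .map(|(i, _)| i)
            .unwrap_or(first_line.len());
        format!("{}...", &first_line[..truncate_at])
    } else {
        first_line.to_string()
    }
}
```

One quirk carries over from the diff: the length test counts bytes while the cut counts characters, so a line of 50 two-byte characters (100 bytes) gets "..." appended even though nothing is removed. Testing `first_line.chars().count() > 80` instead would avoid that.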
@@ -115,8 +218,8 @@ impl UiWriter for ConsoleUiWriter {
                    String::new()
                };

                // Print with bold green formatting using ANSI escape codes
                println!("┌─\x1b[1;32m {} | {}{}\x1b[0m", tool_name, display_value, header_suffix);
                // Print with bold green tool name, purple (non-bold) for pipe and args
                println!("┌─\x1b[1;32m {}\x1b[0m\x1b[35m | {}{}\x1b[0m", tool_name, display_value, header_suffix);
            } else {
                // Print with bold green formatting using ANSI escape codes
                println!("┌─\x1b[1;32m {}\x1b[0m", tool_name);
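This hunk changes the header from one bold-green span to bold green on the tool name only, followed by a plain magenta (SGR 35, which many terminals render as purple) `| args` part. A sketch of both branches as a hypothetical pure function, with the escape sequences copied from the new lines, so the exact output can be asserted on:

```rust
/// Hypothetical helper producing the header line of `print_tool_output_header`.
/// SGR 1;32 = bold green, 0 = reset, 35 = magenta foreground.
fn tool_header(tool_name: &str, display_value: Option<&str>, header_suffix: &str) -> String {
    match display_value {
        // Tool name in bold green, then the reset, then "| value" in magenta.
        Some(value) => format!(
            "┌─\x1b[1;32m {}\x1b[0m\x1b[35m | {}{}\x1b[0m",
            tool_name, value, header_suffix
        ),
        // No displayable argument: only the bold-green tool name.
        None => format!("┌─\x1b[1;32m {}\x1b[0m", tool_name),
    }
}
```

Resetting with `\x1b[0m` before `\x1b[35m` is what drops the bold attribute for the argument part.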
@@ -144,10 +247,21 @@ impl UiWriter for ConsoleUiWriter {
    }

    fn print_tool_output_line(&self, line: &str) {
        // Special handling for todo tools
        if *self.in_todo_tool.lock().unwrap() {
            self.print_todo_line(line);
            return;
        }

        println!("│ \x1b[2m{}\x1b[0m", line);
    }

    fn print_tool_output_summary(&self, count: usize) {
        // Skip for todo tools
        if *self.in_todo_tool.lock().unwrap() {
            return;
        }

        println!(
            "│ \x1b[2m({} line{})\x1b[0m",
            count,
@@ -156,7 +270,55 @@ impl UiWriter for ConsoleUiWriter {
    }

    fn print_tool_timing(&self, duration_str: &str) {
        println!("└─ ⚡️ {}", duration_str);
        // For todo tools, just print a simple completion message
        if *self.in_todo_tool.lock().unwrap() {
            println!();
            *self.in_todo_tool.lock().unwrap() = false;
            return;
        }

        // Parse the duration string to determine color
        // Format is like "1.5s", "500ms", "2m 30.0s"
        let color_code = if duration_str.ends_with("ms") {
            // Milliseconds - use default color (< 1s)
            ""
        } else if duration_str.contains('m') {
            // Contains minutes
            // Extract minutes value
            if let Some(m_pos) = duration_str.find('m') {
                if let Ok(minutes) = duration_str[..m_pos].trim().parse::<u32>() {
                    if minutes >= 5 {
                        "\x1b[31m" // Red for >= 5 minutes
                    } else {
                        "\x1b[38;5;208m" // Orange for >= 1 minute but < 5 minutes
                    }
                } else {
                    "" // Default color if parsing fails
                }
            } else {
                "" // Default color if 'm' not found (shouldn't happen)
            }
        } else if duration_str.ends_with('s') {
            // Seconds only
            if let Some(s_value) = duration_str.strip_suffix('s') {
                if let Ok(seconds) = s_value.trim().parse::<f64>() {
                    if seconds >= 1.0 {
                        "\x1b[33m" // Yellow for >= 1 second
                    } else {
                        "" // Default color for < 1 second
                    }
                } else {
                    "" // Default color if parsing fails
                }
            } else {
                "" // Default color
            }
        } else {
            // Milliseconds or other format - use default color
            ""
        };

        println!("└─ ⚡️ {}{}\x1b[0m", color_code, duration_str);
        println!();
        // Clear the stored tool info
        *self.current_tool_name.lock().unwrap() = None;
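The colour choice in `print_tool_timing` is a pure function of the duration string, so it can be pulled out and tested on its own. A sketch with the same thresholds and escape codes as the diff; the name `duration_color` is hypothetical, and `find('m')` / `strip_suffix('s')` stand in for the diff's `contains('m')` / `ends_with('s')` checks, which they match exactly:

```rust
/// Hypothetical extraction of the colour selection in `print_tool_timing`.
/// Accepts strings such as "500ms", "1.5s" or "2m 30.0s" and returns the
/// ANSI SGR prefix to print before the duration ("" means default colour).
fn duration_color(duration_str: &str) -> &'static str {
    if duration_str.ends_with("ms") {
        // Must be tested first: "500ms" also contains an 'm'.
        ""
    } else if let Some(m_pos) = duration_str.find('m') {
        // Minutes: red from 5 minutes up, orange (256-colour index 208) below.
        match duration_str[..m_pos].trim().parse::<u32>() {
            Ok(minutes) if minutes >= 5 => "\x1b[31m",
            Ok(_) => "\x1b[38;5;208m",
            Err(_) => "",
        }
    } else if let Some(s_value) = duration_str.strip_suffix('s') {
        // Seconds only: yellow from 1 second up, default colour below.
        match s_value.trim().parse::<f64>() {
            Ok(seconds) if seconds >= 1.0 => "\x1b[33m",
            _ => "",
        }
    } else {
        ""
    }
}
```

The ordering of the checks is the main design constraint: the millisecond suffix has to be ruled out before the generic minute test.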
@@ -183,223 +345,3 @@ impl UiWriter for ConsoleUiWriter {
    }
}

/// RetroTui implementation of UiWriter that sends output to the TUI
pub struct RetroTuiWriter {
    tui: RetroTui,
    current_tool_name: Mutex<Option<String>>,
    current_tool_output: Mutex<Vec<String>>,
    current_tool_start: Mutex<Option<Instant>>,
    current_tool_caption: Mutex<String>,
}

impl RetroTuiWriter {
    pub fn new(tui: RetroTui) -> Self {
        Self {
            tui,
            current_tool_name: Mutex::new(None),
            current_tool_output: Mutex::new(Vec::new()),
            current_tool_start: Mutex::new(None),
            current_tool_caption: Mutex::new(String::new()),
        }
    }
}

impl UiWriter for RetroTuiWriter {
    fn print(&self, message: &str) {
        self.tui.output(message);
    }

    fn println(&self, message: &str) {
        self.tui.output(message);
    }

    fn print_inline(&self, message: &str) {
        // For inline printing, we'll just append to the output
        self.tui.output(message);
    }

    fn print_system_prompt(&self, prompt: &str) {
        self.tui.output("🔍 System Prompt:");
        self.tui.output("================");
        for line in prompt.lines() {
            self.tui.output(line);
        }
        self.tui.output("================");
        self.tui.output("");
    }

    fn print_context_status(&self, message: &str) {
        self.tui.output(message);
    }

    fn print_tool_header(&self, tool_name: &str) {
        // Start collecting tool output
        *self.current_tool_start.lock().unwrap() = Some(Instant::now());
        *self.current_tool_name.lock().unwrap() = Some(tool_name.to_string());
        self.current_tool_output.lock().unwrap().clear();
        self.current_tool_output
            .lock()
            .unwrap()
            .push(format!("Tool: {}", tool_name));

        // Initialize caption
        *self.current_tool_caption.lock().unwrap() = String::new();
    }

    fn print_tool_arg(&self, key: &str, value: &str) {
        // Filter out any keys that look like they might be agent message content
        // (e.g., keys that are suspiciously long or contain message-like content)
        let is_valid_arg_key = key.len() < 50
            && !key.contains('\n')
            && !key.contains("I'll")
            && !key.contains("Let me")
            && !key.contains("Here's")
            && !key.contains("I can");

        if is_valid_arg_key {
            self.current_tool_output
                .lock()
                .unwrap()
                .push(format!("{}: {}", key, value));
        }

        // Build caption from first argument (usually the most important one)
        let mut caption = self.current_tool_caption.lock().unwrap();
        if caption.is_empty() && (key == "file_path" || key == "command" || key == "path") {
            // Truncate long values for the caption
            let truncated = if value.len() > 50 {
                format!("{}...", &value[..47])
            } else {
                value.to_string()
            };

            // Add range information for read_file tool calls
            let tool_name = self.current_tool_name.lock().unwrap();
            let range_suffix = if tool_name.as_ref().map_or(false, |name| name == "read_file") {
                // We need to check if start/end args will be provided - for now just check if this is a partial read
                // This is a simplified approach since we're building the caption incrementally
                String::new() // We'll handle this in print_tool_output_header instead
            } else {
                String::new()
            };

            *caption = format!("{}{}", truncated, range_suffix);
        }
    }

    fn print_tool_output_header(&self) {
        // This is called right before tool execution starts
        // Send the initial tool header to the TUI now
        if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
            let mut caption = self.current_tool_caption.lock().unwrap().clone();

            // Add range information for read_file tool calls
            if tool_name == "read_file" {
                // Check the tool output for start/end parameters
                let output = self.current_tool_output.lock().unwrap();
                let has_start = output.iter().any(|line| line.starts_with("start:"));
                let has_end = output.iter().any(|line| line.starts_with("end:"));

                if has_start || has_end {
                    let start_val = output.iter().find(|line| line.starts_with("start:")).map(|line| line.split(':').nth(1).unwrap_or("0").trim()).unwrap_or("0");
                    let end_val = output.iter().find(|line| line.starts_with("end:")).map(|line| line.split(':').nth(1).unwrap_or("end").trim()).unwrap_or("end");
                    caption = format!("{} [{}..{}]", caption, start_val, end_val);
                }
            }

            // Send the tool output with initial header
            self.tui.tool_output(tool_name, &caption, "");
        }

        self.current_tool_output.lock().unwrap().push(String::new());
        self.current_tool_output
            .lock()
            .unwrap()
            .push("Output:".to_string());
    }

    fn update_tool_output_line(&self, line: &str) {
        // For retro mode, we'll just add to the output buffer
        self.current_tool_output
            .lock()
            .unwrap()
            .push(line.to_string());
    }

    fn print_tool_output_line(&self, line: &str) {
        self.current_tool_output
            .lock()
            .unwrap()
            .push(line.to_string());
    }

    fn print_tool_output_summary(&self, hidden_count: usize) {
        self.current_tool_output.lock().unwrap().push(format!(
            "... ({} more line{})",
            hidden_count,
            if hidden_count == 1 { "" } else { "s" }
        ));
    }

    fn print_tool_timing(&self, duration_str: &str) {
        self.current_tool_output
            .lock()
            .unwrap()
            .push(format!("⚡️ {}", duration_str));

        // Calculate the actual duration
        let duration_ms = if let Some(start) = *self.current_tool_start.lock().unwrap() {
            start.elapsed().as_millis()
        } else {
            0
        };

        // Get the tool name and caption
        if let Some(tool_name) = self.current_tool_name.lock().unwrap().as_ref() {
            let content = self.current_tool_output.lock().unwrap().join("\n");
            let caption = self.current_tool_caption.lock().unwrap().clone();
            let caption = if caption.is_empty() {
                "Completed".to_string()
            } else {
                caption
            };

            // Update the tool detail panel with the complete output without adding a new header
            // This keeps the original header in place to be updated by tool_complete
            self.tui.update_tool_detail(tool_name, &content);

            // Determine success based on whether there's an error in the output
            // This is a simple heuristic - you might want to make this more sophisticated
            let success = !content.contains("error")
                && !content.contains("Error")
                && !content.contains("ERROR");

            // Send the completion status to update the header
            self.tui
                .tool_complete(tool_name, success, duration_ms, &caption);
        }

        // Clear the buffers
        *self.current_tool_name.lock().unwrap() = None;
        self.current_tool_output.lock().unwrap().clear();
        *self.current_tool_start.lock().unwrap() = None;
        *self.current_tool_caption.lock().unwrap() = String::new();
    }

    fn print_agent_prompt(&self) {
        self.tui.output("\n💬 ");
    }

    fn print_agent_response(&self, content: &str) {
        self.tui.output(content);
    }

    fn notify_sse_received(&self) {
        // Notify the TUI that an SSE was received
        self.tui.sse_received();
    }

    fn flush(&self) {
        // No-op for TUI since it handles its own rendering
    }
}
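The `read_file` caption logic in `RetroTuiWriter::print_tool_output_header` scans the collected `key: value` lines for `start:` and `end:` and appends a `[start..end]` range, defaulting to `0` and `end`. A sketch of the same lookup as standalone functions (`arg_value` and `caption_with_range` are hypothetical names) makes those defaults visible:

```rust
/// Hypothetical: value after the first ':' of the first collected line that
/// starts with `key` (e.g. "start:"), trimmed of surrounding whitespace.
fn arg_value<'a>(collected: &'a [String], key: &str) -> Option<&'a str> {
    collected
        .iter()
        .find(|line| line.starts_with(key))
        .and_then(|line| line.split(':').nth(1))
        .map(str::trim)
}

/// Hypothetical: appends " [start..end]" when either bound was supplied,
/// falling back to "0" and "end" like the original code.
fn caption_with_range(caption: &str, collected: &[String]) -> String {
    let start = arg_value(collected, "start:");
    let end = arg_value(collected, "end:");
    if start.is_none() && end.is_none() {
        return caption.to_string();
    }
    format!("{} [{}..{}]", caption, start.unwrap_or("0"), end.unwrap_or("end"))
}
```

Per its hunk header (`-183,223 +345,3`), this `RetroTuiWriter` block appears to be deleted in this change, so the sketch documents the old behavior rather than anything the new code relies on.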

47
crates/g3-computer-control/Cargo.toml
Normal file
@@ -0,0 +1,47 @@
[package]
name = "g3-computer-control"
version = "0.1.0"
edition = "2021"

[build-dependencies]
# Only needed for building Swift bridge on macOS

[dependencies]
# Workspace dependencies
tokio = { workspace = true }
anyhow = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }

shellexpand = "3.1"
# Async trait support
async-trait = "0.1"

# WebDriver support
fantoccini = "0.21"

# macOS dependencies
[target.'cfg(target_os = "macos")'.dependencies]
core-graphics = "0.23"
core-foundation = "0.10"
cocoa = "0.25"
objc = "0.2"
accessibility = "0.2"
image = "0.24"

# Linux dependencies
[target.'cfg(target_os = "linux")'.dependencies]
x11 = { version = "2.21", features = ["xlib", "xtest"] }
image = "0.24"

# Windows dependencies
[target.'cfg(target_os = "windows")'.dependencies]
windows = { version = "0.52", features = [
    "Win32_Foundation",
    "Win32_UI_WindowsAndMessaging",
    "Win32_UI_Input_KeyboardAndMouse",
    "Win32_Graphics_Gdi",
] }
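Each `[target.'cfg(target_os = "...")'.dependencies]` table only brings its crates into the build on that OS, so any Rust code that uses them has to be gated with the same `cfg` predicate. A minimal sketch of that pairing; `backend_name` is a hypothetical function that only reports which dependency set the manifest above would compile in:

```rust
/// Hypothetical: names the platform backend whose dependencies are compiled in.
/// Each definition uses the same predicate as one `[target.'cfg(...)'.dependencies]` table.
#[cfg(target_os = "macos")]
fn backend_name() -> &'static str {
    "macos (core-graphics, core-foundation, cocoa, objc, accessibility)"
}

#[cfg(target_os = "linux")]
fn backend_name() -> &'static str {
    "linux (x11 with xlib/xtest)"
}

#[cfg(target_os = "windows")]
fn backend_name() -> &'static str {
    "windows (windows crate, Win32 features)"
}

// Fallback so the crate still compiles on other targets.
#[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
fn backend_name() -> &'static str {
    "unsupported"
}
```

Note that `image` is declared for macOS and Linux but not for Windows, so code using it would need `#[cfg(any(target_os = "macos", target_os = "linux"))]`.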
63
crates/g3-computer-control/build.rs
Normal file
@@ -0,0 +1,63 @@
use std::env;
use std::path::PathBuf;
use std::process::Command;

fn main() {
    // Only build Vision bridge on macOS
    if env::var("CARGO_CFG_TARGET_OS").unwrap() != "macos" {
        return;
    }

    println!("cargo:rerun-if-changed=vision-bridge/Sources/VisionBridge/VisionOCR.swift");
    println!("cargo:rerun-if-changed=vision-bridge/Sources/VisionBridge/VisionBridge.h");
    println!("cargo:rerun-if-changed=vision-bridge/Package.swift");

    let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
    let vision_bridge_dir = manifest_dir.join("vision-bridge");

    // Build Swift package
    println!("cargo:warning=Building VisionBridge Swift package...");
    let build_status = Command::new("swift")
        .args(&["build", "-c", "release"])
        .current_dir(&vision_bridge_dir)
        .status()
        .expect("Failed to build Swift package");

    if !build_status.success() {
        panic!("Swift build failed");
    }

    // Find the built library
    let lib_path = vision_bridge_dir
        .join(".build/release")
        .canonicalize()
        .expect("Failed to find .build/release directory");

    // Copy the dylib to the output directory so it can be found at runtime
    let target_dir = manifest_dir.parent().unwrap().parent().unwrap().join("target");
    let profile = env::var("PROFILE").unwrap_or_else(|_| "debug".to_string());
    let output_dir = target_dir.join(&profile);

    let dylib_src = lib_path.join("libVisionBridge.dylib");
    let dylib_dst = output_dir.join("libVisionBridge.dylib");

    std::fs::copy(&dylib_src, &dylib_dst)
        .expect(&format!("Failed to copy dylib from {} to {}", dylib_src.display(), dylib_dst.display()));

    println!("cargo:warning=Copied libVisionBridge.dylib to {}", dylib_dst.display());

    // Add rpath so the dylib can be found at runtime
    println!("cargo:rustc-link-arg=-Wl,-rpath,@executable_path");
    println!("cargo:rustc-link-arg=-Wl,-rpath,@loader_path");
    println!("cargo:rustc-link-search=native={}", lib_path.display());
    println!("cargo:rustc-link-lib=dylib=VisionBridge");

    // Link required frameworks
    println!("cargo:rustc-link-lib=framework=Vision");
    println!("cargo:rustc-link-lib=framework=AppKit");
    println!("cargo:rustc-link-lib=framework=Foundation");
    println!("cargo:rustc-link-lib=framework=CoreGraphics");
    println!("cargo:rustc-link-lib=framework=CoreImage");

    println!("cargo:warning=VisionBridge built successfully at {}", lib_path.display());
}
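A build script communicates with Cargo only through `cargo:` lines on stdout. A sketch that assembles the same link directives as a list, so they can be inspected in a test before being printed; `link_directives` is a hypothetical helper, and the rpath entries and framework names are copied from the script above:

```rust
use std::path::Path;

/// Hypothetical helper returning the linker-related `cargo:` directives the
/// build script prints for the VisionBridge dylib and the Apple frameworks.
fn link_directives(lib_dir: &Path) -> Vec<String> {
    let mut out = vec![
        // Let the binary find libVisionBridge.dylib next to itself at runtime.
        "cargo:rustc-link-arg=-Wl,-rpath,@executable_path".to_string(),
        "cargo:rustc-link-arg=-Wl,-rpath,@loader_path".to_string(),
        // Where the linker searches, then the dylib to link against.
        format!("cargo:rustc-link-search=native={}", lib_dir.display()),
        "cargo:rustc-link-lib=dylib=VisionBridge".to_string(),
    ];
    // System frameworks the Swift bridge depends on.
    for fw in ["Vision", "AppKit", "Foundation", "CoreGraphics", "CoreImage"] {
        out.push(format!("cargo:rustc-link-lib=framework={}", fw));
    }
    out
}
```

In `main` the list would be emitted with `for d in link_directives(&lib_path) { println!("{}", d); }`. The script correctly reads `CARGO_CFG_TARGET_OS` for its macOS check: inside a build script, `cfg!(target_os = ...)` describes the host, while that environment variable describes the compilation target.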
46
crates/g3-computer-control/examples/debug_screenshot.rs
Normal file
@@ -0,0 +1,46 @@
use core_graphics::display::CGDisplay;

fn main() {
    let display = CGDisplay::main();
    let image = display.image().expect("Failed to capture screen");

    println!("CGImage properties:");
    println!(" Width: {}", image.width());
    println!(" Height: {}", image.height());
    println!(" Bits per component: {}", image.bits_per_component());
    println!(" Bits per pixel: {}", image.bits_per_pixel());
    println!(" Bytes per row: {}", image.bytes_per_row());

    let data = image.data();
    let expected_size = image.width() * image.height() * 4;
    println!(" Data length: {}", data.len());
    println!(" Expected (w*h*4): {}", expected_size);

    // Check if there's padding in rows
    let bytes_per_row = image.bytes_per_row();
    let width = image.width();
    let expected_bytes_per_row = width * 4;
    println!("\nRow alignment:");
    println!(" Actual bytes per row: {}", bytes_per_row);
    println!(" Expected (width * 4): {}", expected_bytes_per_row);
    println!(" Padding per row: {}", bytes_per_row - expected_bytes_per_row);

    // Sample some pixels from different locations
    println!("\nFirst 3 pixels (raw bytes):");
    for i in 0..3 {
        let offset = i * 4;
        println!(" Pixel {}: [{:3}, {:3}, {:3}, {:3}]",
            i, data[offset], data[offset+1], data[offset+2], data[offset+3]);
    }

    // Check a pixel from the middle
    let mid_row = image.height() / 2;
    let mid_col = image.width() / 2;
    let mid_offset = (mid_row * bytes_per_row + mid_col * 4) as usize;
    println!("\nMiddle pixel (row {}, col {}):", mid_row, mid_col);
    println!(" Offset: {}", mid_offset);
    if mid_offset + 3 < data.len() as usize {
        println!(" Bytes: [{:3}, {:3}, {:3}, {:3}]",
            data[mid_offset], data[mid_offset+1], data[mid_offset+2], data[mid_offset+3]);
    }
}
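`debug_screenshot` checks whether `bytes_per_row` exceeds `width * 4`. When rows are padded, the offset of pixel (row, col) must be computed from the row stride rather than the width, which is what `mid_offset` does. A sketch of that addressing on a synthetic buffer, assuming 4 bytes per pixel as the example does (`pixel_offset` and `pixel_at` are hypothetical names):

```rust
/// Byte offset of pixel (row, col) in a 4-bytes-per-pixel buffer whose rows are
/// `bytes_per_row` bytes long (possibly more than `width * 4` because of padding).
fn pixel_offset(row: usize, col: usize, bytes_per_row: usize) -> usize {
    row * bytes_per_row + col * 4
}

/// Returns the 4 raw bytes of a pixel, or `None` when they would run past the
/// end of the buffer (the example guards with `mid_offset + 3 < data.len()`).
fn pixel_at(data: &[u8], row: usize, col: usize, bytes_per_row: usize) -> Option<[u8; 4]> {
    let off = pixel_offset(row, col, bytes_per_row);
    let px = data.get(off..off + 4)?;
    Some([px[0], px[1], px[2], px[3]])
}
```

With a 2-pixel-wide image and a 12-byte stride, a width-based offset `(row * width + col) * 4` would land pixel (1, 0) on byte 8, which is padding; the stride-based offset correctly lands on byte 12.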
56
crates/g3-computer-control/examples/list_windows.rs
Normal file
@@ -0,0 +1,56 @@
use core_graphics::window::{kCGWindowListOptionOnScreenOnly, kCGNullWindowID, CGWindowListCopyWindowInfo};
use core_foundation::dictionary::CFDictionary;
use core_foundation::string::CFString;
use core_foundation::base::{TCFType, ToVoid};

fn main() {
    println!("Listing all on-screen windows...");
    println!("{:<10} {:<25} {}", "Window ID", "Owner", "Title");
    println!("{}", "-".repeat(80));

    unsafe {
        let window_list = CGWindowListCopyWindowInfo(
            kCGWindowListOptionOnScreenOnly,
            kCGNullWindowID
        );

        let array = core_foundation::array::CFArray::<CFDictionary>::wrap_under_create_rule(window_list); // wrap once: the wrapper releases the +1 reference on drop
        let count = array.len();

        for i in 0..count {
            let dict = array.get(i).unwrap();

            // Get window ID
            let window_id_key = CFString::from_static_string("kCGWindowNumber");
            let window_id: i64 = if let Some(value) = dict.find(window_id_key.to_void()) {
                let num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*value as *const _);
                num.to_i64().unwrap_or(0)
            } else {
                0
            };

            // Get owner name
            let owner_key = CFString::from_static_string("kCGWindowOwnerName");
            let owner: String = if let Some(value) = dict.find(owner_key.to_void()) {
                let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
                s.to_string()
            } else {
                "Unknown".to_string()
            };

            // Get window name/title
            let name_key = CFString::from_static_string("kCGWindowName");
            let title: String = if let Some(value) = dict.find(name_key.to_void()) {
                let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
                s.to_string()
            } else {
                "".to_string()
            };

            // Show all windows
            if !owner.is_empty() {
                println!("{:<10} {:<25} {}", window_id, owner, title);
            }
        }
    }
}
74
crates/g3-computer-control/examples/macax_demo.rs
Normal file
@@ -0,0 +1,74 @@
//! Example demonstrating macOS Accessibility API tools
//!
//! This example shows how to use the macax tools to control macOS applications.
//!
//! Run with: cargo run --example macax_demo

use anyhow::Result;
use g3_computer_control::MacAxController;

#[tokio::main]
async fn main() -> Result<()> {
    println!("🍎 macOS Accessibility API Demo\n");
    println!("This demo shows how to control macOS applications using the Accessibility API.\n");

    // Create controller
    let controller = MacAxController::new()?;
    println!("✅ MacAxController initialized\n");

    // List running applications
    println!("📱 Listing running applications:");
    match controller.list_applications() {
        Ok(apps) => {
            for app in apps.iter().take(10) {
                println!(" - {}", app.name);
            }
            if apps.len() > 10 {
                println!(" ... and {} more", apps.len() - 10);
            }
        }
        Err(e) => println!(" ❌ Error: {}", e),
    }
    println!();

    // Get frontmost app
    println!("🎯 Getting frontmost application:");
    match controller.get_frontmost_app() {
        Ok(app) => println!(" Current: {}", app.name),
        Err(e) => println!(" ❌ Error: {}", e),
    }
    println!();

    // Example: Activate Finder and get its UI tree
    println!("📂 Activating Finder and inspecting UI:");
    match controller.activate_app("Finder") {
        Ok(_) => {
            println!(" ✅ Finder activated");

            // Wait a moment for activation
            tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;

            // Get UI tree
            match controller.get_ui_tree("Finder", 2) {
                Ok(tree) => {
                    println!("\n UI Tree:");
                    for line in tree.lines().take(10) {
                        println!(" {}", line);
                    }
                }
                Err(e) => println!(" ❌ Error getting UI tree: {}", e),
            }
        }
        Err(e) => println!(" ❌ Error: {}", e),
    }
    println!();

    println!("✨ Demo complete!\n");
    println!("💡 Tips:");
    println!(" - Use --macax flag with g3 to enable these tools");
    println!(" - Grant accessibility permissions in System Preferences");
    println!(" - Add accessibility identifiers to your apps for easier automation");
    println!(" - See docs/macax-tools.md for full documentation\n");

    Ok(())
}
64
crates/g3-computer-control/examples/safari_demo.rs
Normal file
@@ -0,0 +1,64 @@
use g3_computer_control::SafariDriver;
use g3_computer_control::webdriver::WebDriverController;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    println!("Safari WebDriver Demo");
    println!("=====================\n");

    println!("Make sure to:");
    println!("1. Enable 'Allow Remote Automation' in Safari's Develop menu");
    println!("2. Run: /usr/bin/safaridriver --enable");
    println!("3. Start safaridriver in another terminal: safaridriver --port 4444\n");

    println!("Connecting to SafariDriver...");
    let mut driver = SafariDriver::new().await?;
    println!("✅ Connected!\n");

    // Navigate to a website
    println!("Navigating to example.com...");
    driver.navigate("https://example.com").await?;
    println!("✅ Navigated\n");

    // Get page title
    let title = driver.title().await?;
    println!("Page title: {}\n", title);

    // Get current URL
    let url = driver.current_url().await?;
    println!("Current URL: {}\n", url);

    // Find an element
    println!("Finding h1 element...");
    let h1 = driver.find_element("h1").await?;
    let h1_text = h1.text().await?;
    println!("H1 text: {}\n", h1_text);

    // Find all paragraphs
    println!("Finding all paragraphs...");
    let paragraphs = driver.find_elements("p").await?;
    println!("Found {} paragraphs\n", paragraphs.len());

    // Get page source
    println!("Getting page source...");
    let source = driver.page_source().await?;
    println!("Page source length: {} bytes\n", source.len());

    // Execute JavaScript
    println!("Executing JavaScript...");
    let result = driver.execute_script("return document.title", vec![]).await?;
    println!("JS result: {:?}\n", result);

    // Take a screenshot
    println!("Taking screenshot...");
    driver.screenshot("/tmp/safari_demo.png").await?;
    println!("✅ Screenshot saved to /tmp/safari_demo.png\n");

    // Close the browser
    println!("Closing browser...");
    driver.quit().await?;
    println!("✅ Done!");

    Ok(())
}
@@ -0,0 +1,21 @@
use g3_computer_control::create_controller;

#[tokio::main]
async fn main() {
    println!("Testing screenshot with permission prompt...");

    let controller = create_controller().expect("Failed to create controller");

    match controller.take_screenshot("/tmp/test_with_prompt.png", None, None).await {
        Ok(_) => {
            println!("\n✅ Screenshot saved to /tmp/test_with_prompt.png");
            println!("Opening screenshot...");
            let _ = std::process::Command::new("open")
                .arg("/tmp/test_with_prompt.png")
                .spawn();
        }
        Err(e) => {
            println!("❌ Screenshot failed: {}", e);
        }
    }
}
@@ -0,0 +1,39 @@
use std::process::Command;

fn main() {
    let path = "/tmp/rust_screencapture_test.png";

    println!("Testing screencapture command from Rust...");

    let mut cmd = Command::new("screencapture");
    cmd.arg("-x"); // No sound
    cmd.arg(path);

    println!("Command: {:?}", cmd);

    match cmd.output() {
        Ok(output) => {
            println!("Exit status: {}", output.status);
            println!("Stdout: {}", String::from_utf8_lossy(&output.stdout));
            println!("Stderr: {}", String::from_utf8_lossy(&output.stderr));

            if output.status.success() {
                println!("\n✅ Screenshot saved to: {}", path);

                // Check file exists and size
                if let Ok(metadata) = std::fs::metadata(path) {
                    println!("File size: {} bytes ({:.1} MB)", metadata.len(), metadata.len() as f64 / 1_000_000.0);
                }

                // Open it
                let _ = Command::new("open").arg(path).spawn();
                println!("\nOpened screenshot - please verify it looks correct!");
            } else {
                println!("\n❌ Screenshot failed!");
            }
        }
        Err(e) => {
            println!("❌ Failed to execute screencapture: {}", e);
        }
    }
}
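The test shells out to macOS's `screencapture` with `-x`, which suppresses the capture sound. Building the `Command` in its own function lets the program name and argument list be checked without executing anything; `screencapture_command` is a hypothetical name, and `Command::get_program`/`get_args` (stable since Rust 1.57) are used only for inspection.

```rust
use std::process::Command;

/// Hypothetical builder for the command the example runs: `screencapture -x <path>`.
/// `-x` tells screencapture not to play its capture sound.
fn screencapture_command(path: &str) -> Command {
    let mut cmd = Command::new("screencapture");
    cmd.arg("-x").arg(path);
    cmd
}
```

The example's `match cmd.output()` would then become `match screencapture_command(path).output()`, keeping the argument construction testable on any platform.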
68
crates/g3-computer-control/examples/test_screenshot_fix.rs
Normal file
@@ -0,0 +1,68 @@
use core_graphics::display::CGDisplay;
use image::{ImageBuffer, RgbaImage};

fn main() {
    let display = CGDisplay::main();
    let image = display.image().expect("Failed to capture screen");

    let width = image.width() as u32;
    let height = image.height() as u32;
    let bytes_per_row = image.bytes_per_row() as usize;
    let data = image.data();

    println!("Testing screenshot fix...");
    println!("Image: {}x{}, bytes_per_row: {}", width, height, bytes_per_row);
    println!("Expected bytes per row: {}", width * 4);
    println!("Padding per row: {} bytes", bytes_per_row - (width as usize * 4));

    // OLD METHOD (broken) - treating data as continuous
    println!("\n=== OLD METHOD (BROKEN) ===");
    let mut old_rgba = Vec::with_capacity(data.len() as usize);
    for chunk in data.chunks_exact(4) {
        old_rgba.push(chunk[2]); // R
        old_rgba.push(chunk[1]); // G
        old_rgba.push(chunk[0]); // B
        old_rgba.push(chunk[3]); // A
    }
    println!("Converted {} pixels", old_rgba.len() / 4);
    println!("Expected {} pixels", width * height);

    // NEW METHOD (fixed) - handling row padding
    println!("\n=== NEW METHOD (FIXED) ===");
    let mut new_rgba = Vec::with_capacity((width * height * 4) as usize);
    for row in 0..height as usize {
        let row_start = row * bytes_per_row;
        let row_end = row_start + (width as usize * 4);

        for chunk in data[row_start..row_end].chunks_exact(4) {
            new_rgba.push(chunk[2]); // R
            new_rgba.push(chunk[1]); // G
            new_rgba.push(chunk[0]); // B
            new_rgba.push(chunk[3]); // A
        }
    }
    println!("Converted {} pixels", new_rgba.len() / 4);
    println!("Expected {} pixels", width * height);

    // Save a small crop from both methods
    let crop_size = 200;

    // Old method crop
    let old_crop: Vec<u8> = old_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
    if let Some(old_img) = ImageBuffer::from_raw(crop_size, crop_size, old_crop) {
        let old_img: RgbaImage = old_img;
        old_img.save("/tmp/screenshot_old_method.png").unwrap();
        println!("\nSaved OLD method crop to: /tmp/screenshot_old_method.png");
    }

    // New method crop
    let new_crop: Vec<u8> = new_rgba.iter().take((crop_size * crop_size * 4) as usize).copied().collect();
    if let Some(new_img) = ImageBuffer::from_raw(crop_size, crop_size, new_crop) {
        let new_img: RgbaImage = new_img;
        new_img.save("/tmp/screenshot_new_method.png").unwrap();
        println!("Saved NEW method crop to: /tmp/screenshot_new_method.png");
    }

    println!("\nOpen both images to compare:");
    println!(" open /tmp/screenshot_old_method.png /tmp/screenshot_new_method.png");
}
|
||||
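`test_screenshot_fix.rs` shows why the conversion has to honour `bytes_per_row`: a captured row can be longer than `width * 4` bytes, and reading the buffer as tightly packed shifts every following row. The same logic as a reusable, bounds-checked function might look like the sketch below. `bgra_padded_to_rgba` is a hypothetical name, and, like the example, it assumes 4-byte BGRA source pixels.

```rust
/// Converts a row-padded BGRA buffer into a tightly packed RGBA buffer.
/// `bytes_per_row` may exceed `width * 4`; the extra bytes in each row are skipped.
/// Returns `None` if the dimensions are inconsistent or the buffer is too short.
pub fn bgra_padded_to_rgba(
    data: &[u8],
    width: usize,
    height: usize,
    bytes_per_row: usize,
) -> Option<Vec<u8>> {
    let row_bytes = width.checked_mul(4)?;
    if bytes_per_row < row_bytes {
        return None; // a row cannot be shorter than its pixels
    }
    // Every row but the last needs a full stride; the last needs only its pixels.
    let needed = match height {
        0 => 0,
        h => (h - 1).checked_mul(bytes_per_row)?.checked_add(row_bytes)?,
    };
    if data.len() < needed {
        return None;
    }
    let mut rgba = Vec::with_capacity(row_bytes * height);
    for row in 0..height {
        let start = row * bytes_per_row;
        for px in data[start..start + row_bytes].chunks_exact(4) {
            // BGRA -> RGBA: swap blue and red, keep green and alpha.
            rgba.extend_from_slice(&[px[2], px[1], px[0], px[3]]);
        }
    }
    Some(rgba)
}
```

Returning `None` instead of panicking on a short buffer makes the function safe to call on data whose reported stride turns out to be wrong.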
48
crates/g3-computer-control/examples/test_type_text.rs
Normal file
@@ -0,0 +1,48 @@
//! Test the new type_text functionality

use anyhow::Result;
use g3_computer_control::MacAxController;
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<()> {
    println!("🧪 Testing macax type_text functionality\n");

    let controller = MacAxController::new()?;
    println!("✅ Controller initialized\n");

    // Test 1: Type simple text
    println!("Test 1: Typing simple text into TextEdit");
    println!(" Please open TextEdit and create a new document...");
    // Use tokio's sleep so the async runtime thread is not blocked
    tokio::time::sleep(Duration::from_secs(3)).await;

    match controller.type_text("TextEdit", "Hello, World!") {
        Ok(_) => println!(" ✅ Successfully typed simple text\n"),
        Err(e) => println!(" ❌ Failed: {}\n", e),
    }

    tokio::time::sleep(Duration::from_secs(1)).await;

    // Test 2: Type unicode and emojis
    println!("Test 2: Typing unicode and emojis");
    match controller.type_text("TextEdit", "\n🌟 Unicode test: café, naïve, 日本語 🎉") {
        Ok(_) => println!(" ✅ Successfully typed unicode text\n"),
        Err(e) => println!(" ❌ Failed: {}\n", e),
    }

    tokio::time::sleep(Duration::from_secs(1)).await;

    // Test 3: Type special characters
    println!("Test 3: Typing special characters");
    match controller.type_text("TextEdit", "\nSpecial: @#$%^&*()_+-=[]{}|;':,.<>?/") {
        Ok(_) => println!(" ✅ Successfully typed special characters\n"),
        Err(e) => println!(" ❌ Failed: {}\n", e),
    }

    println!("\n✨ Tests complete!");
    println!("\n💡 Now try with Things3:");
    println!(" 1. Open Things3");
    println!(" 2. Press Cmd+N to create a new task");
    println!(" 3. Run: g3 --macax 'type \"🌟 My awesome task\" into Things'");

    Ok(())
}
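`MacAxController::type_text`, which this example exercises, types by swapping the clipboard: it saves the current text, sets the new text, sends Cmd+V, and then restores the old text. Tying the restore to an RAII guard makes it run on every exit path, including an early `?` return. The std-only sketch below assumes a hypothetical `Clipboard` trait and `ClipboardGuard` type standing in for the `NSPasteboard` calls; neither exists in the crate.

```rust
/// Hypothetical abstraction over a system clipboard (e.g. NSPasteboard).
pub trait Clipboard {
    fn get(&self) -> Option<String>;
    fn set(&mut self, text: &str);
}

/// Saves the clipboard text on creation and restores it when dropped, so the
/// restore also happens when the caller returns early with an error.
pub struct ClipboardGuard<'a, C: Clipboard> {
    clipboard: &'a mut C,
    saved: Option<String>,
}

impl<'a, C: Clipboard> ClipboardGuard<'a, C> {
    pub fn new(clipboard: &'a mut C, text: &str) -> Self {
        let saved = clipboard.get();
        clipboard.set(text);
        ClipboardGuard { clipboard, saved }
    }

    /// Current clipboard contents while the guard is alive.
    pub fn current(&self) -> Option<String> {
        self.clipboard.get()
    }
}

impl<C: Clipboard> Drop for ClipboardGuard<'_, C> {
    fn drop(&mut self) {
        // Like the original code, only restore when there was text before.
        if let Some(old) = self.saved.take() {
            self.clipboard.set(&old);
        }
    }
}
```

The guard only has to outlive the paste keystroke; once it goes out of scope, the previous text is back on the clipboard whether or not the paste succeeded.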
85
crates/g3-computer-control/examples/test_vision.rs
Normal file
@@ -0,0 +1,85 @@
use g3_computer_control::ocr::{OCREngine, DefaultOCR};
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    println!("🧪 Testing Apple Vision OCR");
    println!("===========================\n");

    // Initialize OCR engine
    println!("📦 Initializing OCR engine...");
    let ocr = DefaultOCR::new()?;
    println!("✅ OCR engine: {}\n", ocr.name());

    // Check if test image exists
    let test_image = "/tmp/safari_test.png";
    if !std::path::Path::new(test_image).exists() {
        println!("⚠️ Test image not found: {}", test_image);
        println!(" Creating a screenshot...");

        // -x: no capture sound; -R x,y,w,h: capture only this screen rectangle
        let status = std::process::Command::new("screencapture")
            .arg("-x")
            .arg("-R")
            .arg("0,0,1200,800")
            .arg(test_image)
            .status()?;

        if !status.success() {
            anyhow::bail!("Failed to create screenshot");
        }

        println!("✅ Screenshot created\n");
    }

    // Run OCR
    println!("🔍 Running Apple Vision OCR on {}...", test_image);
    let start = std::time::Instant::now();
    let locations = ocr.extract_text_with_locations(test_image).await?;
    let duration = start.elapsed();

    println!("✅ OCR completed in {:.3}s\n", duration.as_secs_f64());

    // Display results
    println!("📊 Results:");
    println!(" Found {} text elements\n", locations.len());

    if locations.is_empty() {
        println!("⚠️ No text found in image");
    } else {
        println!(" Top 20 results:");
        println!(" {:<4} {:<40} {:<15} {:<12} {:<8}", "#", "Text", "Position", "Size", "Conf");
        println!(" {}", "-".repeat(85));

        for (i, loc) in locations.iter().take(20).enumerate() {
            // Truncate by characters, not bytes: slicing a String at a fixed byte
            // offset panics if the offset falls inside a multi-byte character
            let text = if loc.text.chars().count() > 37 {
                format!("{}...", loc.text.chars().take(37).collect::<String>())
            } else {
                loc.text.clone()
            };

            println!(" {:<4} {:<40} ({:>4},{:>4}) {:>4}x{:<4} {:.2}",
                i + 1,
                text,
                loc.x,
                loc.y,
                loc.width,
                loc.height,
                loc.confidence
            );
        }

        if locations.len() > 20 {
            println!("\n ... and {} more", locations.len() - 20);
        }

        // Performance summary
        println!("\n📈 Performance:");
        println!(" OCR Speed: {:.3}s", duration.as_secs_f64());
        println!(" Text elements: {}", locations.len());
        println!(" Avg per element: {:.1}ms", duration.as_millis() as f64 / locations.len() as f64);
    }

    println!("\n✅ Test complete!");

    Ok(())
}
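OCR output frequently contains non-ASCII text, so truncating it for display should count characters rather than bytes; slicing a `String` at a fixed byte offset panics when that offset falls inside a multi-byte character. A small helper that finds the cut point with `char_indices` could look like this (`truncate_chars` is a hypothetical name, not part of the crate):

```rust
/// Truncates `s` to at most `max_chars` characters, appending "..." when it
/// was shortened. The cut is always on a `char` boundary, so it never panics.
pub fn truncate_chars(s: &str, max_chars: usize) -> String {
    match s.char_indices().nth(max_chars) {
        // A character exists at position `max_chars`, so the text is longer.
        Some((byte_idx, _)) => format!("{}...", &s[..byte_idx]),
        None => s.to_string(),
    }
}
```

Unlike counting all characters first, `nth` stops scanning as soon as it reaches the cut point.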
45
crates/g3-computer-control/examples/test_window_capture.rs
Normal file
@@ -0,0 +1,45 @@
use g3_computer_control::create_controller;

#[tokio::main]
async fn main() {
    println!("Testing window-specific screenshot capture...");

    let controller = create_controller().expect("Failed to create controller");

    // Test 1: Capture iTerm2 window
    println!("\n1. Capturing iTerm2 window...");
    match controller.take_screenshot("/tmp/iterm_window.png", None, Some("iTerm2")).await {
        Ok(_) => {
            println!(" ✅ iTerm2 window captured to /tmp/iterm_window.png");
            let _ = std::process::Command::new("open").arg("/tmp/iterm_window.png").spawn();
        }
        Err(e) => println!(" ❌ Failed: {}", e),
    }

    // Wait a moment for the image to open
    tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;

    // Test 2: Full screen capture for comparison
    println!("\n2. Capturing full screen for comparison...");
    match controller.take_screenshot("/tmp/fullscreen.png", None, None).await {
        Ok(_) => {
            println!(" ✅ Full screen captured to /tmp/fullscreen.png");
            let _ = std::process::Command::new("open").arg("/tmp/fullscreen.png").spawn();
        }
        Err(e) => println!(" ❌ Failed: {}", e),
    }

    println!("\n=== Comparison ===");
    println!("iTerm window: /tmp/iterm_window.png (should show ONLY iTerm window)");
    println!("Full screen: /tmp/fullscreen.png (should show entire desktop)");

    // Show file sizes
    if let Ok(meta1) = std::fs::metadata("/tmp/iterm_window.png") {
        if let Ok(meta2) = std::fs::metadata("/tmp/fullscreen.png") {
            println!("\nFile sizes:");
            println!(" iTerm window: {:.1} MB", meta1.len() as f64 / 1_000_000.0);
            println!(" Full screen: {:.1} MB", meta2.len() as f64 / 1_000_000.0);
            println!("\nWindow capture should be smaller than full screen.");
        }
    }
}
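This example compares file sizes as a rough proxy for "the window capture is smaller than the full screen". Pixel dimensions are a more direct check, and for a PNG they can be read from the header without an image library: the file starts with an 8-byte signature, followed by the `IHDR` chunk, whose data begins with the width and height as big-endian 32-bit integers at byte offsets 16 and 20. A sketch, with `png_dimensions` as a hypothetical helper:

```rust
/// Returns (width, height) in pixels from the start of a PNG file, or `None`
/// if the bytes do not begin with a PNG signature followed by an IHDR chunk.
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    const SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if bytes.len() < 24 || bytes[..8] != SIGNATURE || bytes[12..16] != *b"IHDR" {
        return None;
    }
    // Bytes 8..12 hold the IHDR length and 12..16 its type; width and height follow.
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}
```

For instance, `png_dimensions(&std::fs::read("/tmp/iterm_window.png")?)` could be compared with the full-screen capture. On Retina displays the values are device pixels, not points.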
49
crates/g3-computer-control/src/lib.rs
Normal file
@@ -0,0 +1,49 @@
// Suppress warnings from objc crate macros
#![allow(unexpected_cfgs)]

pub mod types;
pub mod platform;
pub mod ocr;
pub mod webdriver;
pub mod macax;

// Re-export webdriver types for convenience
pub use webdriver::{WebDriverController, WebElement, safari::SafariDriver};

// Re-export macax types for convenience
pub use macax::{MacAxController, AXElement, AXApplication};

use anyhow::Result;
use async_trait::async_trait;
use types::*;

#[async_trait]
pub trait ComputerController: Send + Sync {
    // Screen capture
    async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()>;

    // OCR operations
    async fn extract_text_from_screen(&self, region: Rect, window_id: &str) -> Result<String>;
    async fn extract_text_from_image(&self, path: &str) -> Result<String>;
    async fn extract_text_with_locations(&self, path: &str) -> Result<Vec<TextLocation>>;
    async fn find_text_in_app(&self, app_name: &str, search_text: &str) -> Result<Option<TextLocation>>;

    // Mouse operations
    fn move_mouse(&self, x: i32, y: i32) -> Result<()>;
    fn click_at(&self, x: i32, y: i32, app_name: Option<&str>) -> Result<()>;
}

// Platform-specific constructor
pub fn create_controller() -> Result<Box<dyn ComputerController>> {
    #[cfg(target_os = "macos")]
    return Ok(Box::new(platform::macos::MacOSController::new()?));

    #[cfg(target_os = "linux")]
    return Ok(Box::new(platform::linux::LinuxController::new()?));

    #[cfg(target_os = "windows")]
    return Ok(Box::new(platform::windows::WindowsController::new()?));

    #[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
    anyhow::bail!("Unsupported platform")
}
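Because callers receive a `Box<dyn ComputerController>`, code that drives the controller can be tested against a fake instead of the real screen. The real trait is async (via `async_trait`) and returns `anyhow::Result`; to keep this sketch std-only, it uses a hypothetical synchronous `PointerControl` trait that mirrors the two mouse methods. `RecordingController` and `click_center` are likewise illustrative names. The `Mutex` supplies interior mutability, since the methods take `&self` and the trait requires `Send + Sync`.

```rust
use std::sync::Mutex;

/// Hypothetical synchronous subset of `ComputerController`'s mouse operations.
pub trait PointerControl: Send + Sync {
    fn move_mouse(&self, x: i32, y: i32) -> Result<(), String>;
    fn click_at(&self, x: i32, y: i32, app_name: Option<&str>) -> Result<(), String>;
}

/// Fake controller that records every call instead of moving the real cursor.
#[derive(Default)]
pub struct RecordingController {
    pub calls: Mutex<Vec<String>>,
}

impl PointerControl for RecordingController {
    fn move_mouse(&self, x: i32, y: i32) -> Result<(), String> {
        self.calls.lock().unwrap().push(format!("move {x},{y}"));
        Ok(())
    }

    fn click_at(&self, x: i32, y: i32, app_name: Option<&str>) -> Result<(), String> {
        let app = app_name.unwrap_or("<frontmost>");
        self.calls.lock().unwrap().push(format!("click {x},{y} in {app}"));
        Ok(())
    }
}

/// Example caller that depends only on the trait object.
pub fn click_center(ctrl: &dyn PointerControl, x: i32, y: i32, w: i32, h: i32) -> Result<(), String> {
    let (cx, cy) = (x + w / 2, y + h / 2);
    ctrl.move_mouse(cx, cy)?;
    ctrl.click_at(cx, cy, None)
}
```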
822
crates/g3-computer-control/src/macax/controller.rs
Normal file
@@ -0,0 +1,822 @@
use super::{AXApplication, AXElement};
use anyhow::{Context, Result};
use std::collections::HashMap;

#[cfg(target_os = "macos")]
use accessibility::{AXUIElement, AXUIElementAttributes, ElementFinder, TreeVisitor, TreeWalker, TreeWalkerFlow};

#[cfg(target_os = "macos")]
use core_foundation::base::TCFType;

#[cfg(target_os = "macos")]
use core_foundation::string::CFString;

/// macOS Accessibility API controller using native APIs
pub struct MacAxController {
    // Cache of application elements, keyed by application name.
    // `AXUIElement` is only imported on macOS, so the field is gated as well.
    #[cfg(target_os = "macos")]
    app_cache: std::sync::Mutex<HashMap<String, AXUIElement>>,
}
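`app_cache` maps an application name to an `AXUIElement` built from the app's PID, and `get_app_element` never evicts entries, so a relaunched app keeps resolving to its old PID. One possible approach is to cache the PID and re-validate it cheaply before reuse, falling back to the full lookup when it is stale. This std-only sketch is an assumption, not the crate's design: `PidCache` is hypothetical, and on macOS `is_alive` might wrap a POSIX `kill(pid, 0)` probe while `lookup` wraps the `NSWorkspace` scan in `get_running_applications`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical name -> PID cache that re-validates entries before reuse.
#[derive(Default)]
pub struct PidCache {
    entries: Mutex<HashMap<String, i32>>,
}

impl PidCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns a PID for `name`. A cached PID is reused only while `is_alive`
    /// reports it as running; otherwise the expensive `lookup` runs and its
    /// result is cached. A live PID is not absolute proof of identity, since
    /// the OS can reuse PIDs.
    pub fn resolve(
        &self,
        name: &str,
        is_alive: impl Fn(i32) -> bool,
        lookup: impl Fn(&str) -> Option<i32>,
    ) -> Option<i32> {
        let mut entries = self.entries.lock().unwrap();
        let cached = entries.get(name).copied();
        if let Some(pid) = cached {
            if is_alive(pid) {
                return Some(pid); // cheap path: cached PID still running
            }
            // Stale entry: the process exited or the app was relaunched.
            entries.remove(name);
        }
        let pid = lookup(name)?;
        entries.insert(name.to_string(), pid);
        Some(pid)
    }
}
```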
impl MacAxController {
    pub fn new() -> Result<Self> {
        #[cfg(target_os = "macos")]
        {
            // Creating the system-wide element does not by itself verify that
            // Accessibility permission has been granted; later AX calls return
            // errors if it has not.
            let _system = AXUIElement::system_wide();

            Ok(Self {
                app_cache: std::sync::Mutex::new(HashMap::new()),
            })
        }

        #[cfg(not(target_os = "macos"))]
        {
            anyhow::bail!("macOS Accessibility API is only available on macOS")
        }
    }

    /// List running applications that have a regular (Dock) activation policy
    #[cfg(target_os = "macos")]
    pub fn list_applications(&self) -> Result<Vec<AXApplication>> {
        let apps = Self::get_running_applications()?;
        Ok(apps)
    }

    #[cfg(not(target_os = "macos"))]
    pub fn list_applications(&self) -> Result<Vec<AXApplication>> {
        anyhow::bail!("Not supported on this platform")
    }

    #[cfg(target_os = "macos")]
    fn get_running_applications() -> Result<Vec<AXApplication>> {
        use cocoa::appkit::NSApplicationActivationPolicy;
        use cocoa::base::{id, nil};
        use objc::{class, msg_send, sel, sel_impl};

        unsafe {
            let workspace: id = msg_send![class!(NSWorkspace), sharedWorkspace];
            let running_apps: id = msg_send![workspace, runningApplications];
            let count: usize = msg_send![running_apps, count];

            let mut apps = Vec::new();

            for i in 0..count {
                let app: id = msg_send![running_apps, objectAtIndex: i];

                // Get app name
                let localized_name: id = msg_send![app, localizedName];
                if localized_name == nil {
                    continue;
                }
                let name_ptr: *const i8 = msg_send![localized_name, UTF8String];
                let name = if !name_ptr.is_null() {
                    std::ffi::CStr::from_ptr(name_ptr)
                        .to_string_lossy()
                        .to_string()
                } else {
                    continue;
                };

                // Get bundle ID
                let bundle_id_obj: id = msg_send![app, bundleIdentifier];
                let bundle_id = if bundle_id_obj != nil {
                    let bundle_id_ptr: *const i8 = msg_send![bundle_id_obj, UTF8String];
                    if !bundle_id_ptr.is_null() {
                        Some(
                            std::ffi::CStr::from_ptr(bundle_id_ptr)
                                .to_string_lossy()
                                .to_string(),
                        )
                    } else {
                        None
                    }
                } else {
                    None
                };

                // Get PID
                let pid: i32 = msg_send![app, processIdentifier];

                // Keep only regular (Dock) apps; accessory and background-only apps are skipped
                let activation_policy: i64 = msg_send![app, activationPolicy];
                if activation_policy == NSApplicationActivationPolicy::NSApplicationActivationPolicyRegular as i64 {
                    apps.push(AXApplication {
                        name,
                        bundle_id,
                        pid,
                    });
                }
            }

            Ok(apps)
        }
    }

    /// Get the frontmost (active) application
    #[cfg(target_os = "macos")]
    pub fn get_frontmost_app(&self) -> Result<AXApplication> {
        use cocoa::base::{id, nil};
        use objc::{class, msg_send, sel, sel_impl};

        unsafe {
            let workspace: id = msg_send![class!(NSWorkspace), sharedWorkspace];
            let frontmost_app: id = msg_send![workspace, frontmostApplication];

            if frontmost_app == nil {
                anyhow::bail!("No frontmost application");
            }

            // Get app name. localizedName may be nil, and CStr::from_ptr must never
            // be given a null pointer, so check both before converting.
            let localized_name: id = msg_send![frontmost_app, localizedName];
            if localized_name == nil {
                anyhow::bail!("Frontmost application has no name");
            }
            let name_ptr: *const i8 = msg_send![localized_name, UTF8String];
            if name_ptr.is_null() {
                anyhow::bail!("Could not read frontmost application name");
            }
            let name = std::ffi::CStr::from_ptr(name_ptr)
                .to_string_lossy()
                .to_string();

            // Get bundle ID
            let bundle_id_obj: id = msg_send![frontmost_app, bundleIdentifier];
            let bundle_id = if bundle_id_obj != nil {
                let bundle_id_ptr: *const i8 = msg_send![bundle_id_obj, UTF8String];
                if !bundle_id_ptr.is_null() {
                    Some(
                        std::ffi::CStr::from_ptr(bundle_id_ptr)
                            .to_string_lossy()
                            .to_string(),
                    )
                } else {
                    None
                }
            } else {
                None
            };

            // Get PID
            let pid: i32 = msg_send![frontmost_app, processIdentifier];

            Ok(AXApplication {
                name,
                bundle_id,
                pid,
            })
        }
    }

    #[cfg(not(target_os = "macos"))]
    pub fn get_frontmost_app(&self) -> Result<AXApplication> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Get the AXUIElement for a running application by name (results are cached)
    #[cfg(target_os = "macos")]
    fn get_app_element(&self, app_name: &str) -> Result<AXUIElement> {
        // Check cache first. Entries are never invalidated, so a cached element
        // still refers to the old PID if the app is quit and relaunched.
        {
            let cache = self.app_cache.lock().unwrap();
            if let Some(element) = cache.get(app_name) {
                return Ok(element.clone());
            }
        }

        // Find the app by name
        let apps = Self::get_running_applications()?;
        let app = apps
            .iter()
            .find(|a| a.name == app_name)
            .ok_or_else(|| anyhow::anyhow!("Application '{}' not found", app_name))?;

        // Create AXUIElement for the app
        let element = AXUIElement::application(app.pid);

        // Cache it
        {
            let mut cache = self.app_cache.lock().unwrap();
            cache.insert(app_name.to_string(), element.clone());
        }

        Ok(element)
    }

    /// Activate (bring to front) an application
    #[cfg(target_os = "macos")]
    pub fn activate_app(&self, app_name: &str) -> Result<()> {
        use cocoa::base::id;
        use objc::{class, msg_send, sel, sel_impl};

        // Find the app
        let apps = Self::get_running_applications()?;
        let app = apps
            .iter()
            .find(|a| a.name == app_name)
            .ok_or_else(|| anyhow::anyhow!("Application '{}' not found", app_name))?;

        unsafe {
            let workspace: id = msg_send![class!(NSWorkspace), sharedWorkspace];
            let running_apps: id = msg_send![workspace, runningApplications];
            let count: usize = msg_send![running_apps, count];

            for i in 0..count {
                let running_app: id = msg_send![running_apps, objectAtIndex: i];
                let pid: i32 = msg_send![running_app, processIdentifier];

                if pid == app.pid {
                    // NSApplicationActivationOptions is an NSUInteger, so pass a
                    // pointer-sized unsigned value rather than an i32 literal
                    let _: bool = msg_send![running_app, activateWithOptions: 0usize];
                    return Ok(());
                }
            }
        }

        anyhow::bail!("Failed to activate application")
    }

    #[cfg(not(target_os = "macos"))]
    pub fn activate_app(&self, _app_name: &str) -> Result<()> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Get the UI hierarchy of an application
    #[cfg(target_os = "macos")]
    pub fn get_ui_tree(&self, app_name: &str, max_depth: usize) -> Result<String> {
        let app_element = self.get_app_element(app_name)?;
        let mut output = format!("Application: {}\n", app_name);

        Self::build_ui_tree(&app_element, &mut output, 0, max_depth)?;

        Ok(output)
    }

    #[cfg(not(target_os = "macos"))]
    pub fn get_ui_tree(&self, _app_name: &str, _max_depth: usize) -> Result<String> {
        anyhow::bail!("Not supported on this platform")
    }

    #[cfg(target_os = "macos")]
    fn build_ui_tree(
        element: &AXUIElement,
        output: &mut String,
        depth: usize,
        max_depth: usize,
    ) -> Result<()> {
        if depth >= max_depth {
            return Ok(());
        }

        let indent = " ".repeat(depth);

        // Get role
        let role = element.role().ok().map(|s| s.to_string())
            .unwrap_or_else(|| "Unknown".to_string());

        // Get title
        let title = element.title().ok()
            .map(|s| s.to_string());

        // Get identifier
        let identifier = element.identifier().ok()
            .map(|s| s.to_string());

        // Format output
        output.push_str(&format!("{}Role: {}", indent, role));
        if let Some(t) = title {
            output.push_str(&format!(", Title: {}", t));
        }
        if let Some(id) = identifier {
            output.push_str(&format!(", ID: {}", id));
        }
        output.push('\n');

        // Get children
        if let Ok(children) = element.children() {
            for i in 0..children.len() {
                if let Some(child) = children.get(i) {
                    let _ = Self::build_ui_tree(&child, output, depth + 1, max_depth);
                }
            }
        }

        Ok(())
    }

    /// Find UI elements in an application
    #[cfg(target_os = "macos")]
    pub fn find_elements(
        &self,
        app_name: &str,
        role: Option<&str>,
        title: Option<&str>,
        identifier: Option<&str>,
    ) -> Result<Vec<AXElement>> {
        let app_element = self.get_app_element(app_name)?;
        let mut found_elements = Vec::new();

        let visitor = ElementCollector {
            role_filter: role.map(|s| s.to_string()),
            title_filter: title.map(|s| s.to_string()),
            identifier_filter: identifier.map(|s| s.to_string()),
            results: std::cell::RefCell::new(&mut found_elements),
            depth: std::cell::Cell::new(0),
        };

        let walker = TreeWalker::new();
        walker.walk(&app_element, &visitor);

        Ok(found_elements)
    }

    #[cfg(not(target_os = "macos"))]
    pub fn find_elements(
        &self,
        _app_name: &str,
        _role: Option<&str>,
        _title: Option<&str>,
        _identifier: Option<&str>,
    ) -> Result<Vec<AXElement>> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Find a single element (helper for click, set_value, etc.)
    #[cfg(target_os = "macos")]
    fn find_element(
        &self,
        app_name: &str,
        role: &str,
        title: Option<&str>,
        identifier: Option<&str>,
    ) -> Result<AXUIElement> {
        let app_element = self.get_app_element(app_name)?;

        let role_str = role.to_string();
        let title_str = title.map(|s| s.to_string());
        let identifier_str = identifier.map(|s| s.to_string());

        let finder = ElementFinder::new(
            &app_element,
            move |element| {
                // Check role (substring match, so "Button" also matches "AXButton")
                let elem_role = element.role()
                    .ok()
                    .map(|s| s.to_string());

                if let Some(r) = elem_role {
                    if !r.contains(&role_str) {
                        return false;
                    }
                } else {
                    return false;
                }

                // Check title if specified (substring match)
                if let Some(ref title_filter) = title_str {
                    let elem_title = element.title()
                        .ok()
                        .map(|s| s.to_string());

                    if let Some(t) = elem_title {
                        if !t.contains(title_filter) {
                            return false;
                        }
                    } else {
                        return false;
                    }
                }

                // Check identifier if specified (substring match)
                if let Some(ref id_filter) = identifier_str {
                    let elem_id = element.identifier()
                        .ok()
                        .map(|s| s.to_string());

                    if let Some(id) = elem_id {
                        if !id.contains(id_filter) {
                            return false;
                        }
                    } else {
                        return false;
                    }
                }

                true
            },
            Some(std::time::Duration::from_secs(2)),
        );

        finder.find().context("Element not found")
    }

    /// Click on a UI element
    #[cfg(target_os = "macos")]
    pub fn click_element(
        &self,
        app_name: &str,
        role: &str,
        title: Option<&str>,
        identifier: Option<&str>,
    ) -> Result<()> {
        let element = self.find_element(app_name, role, title, identifier)?;

        // Perform the press action
        let action_name = CFString::new("AXPress");
        element
            .perform_action(&action_name)
            .map_err(|e| anyhow::anyhow!("Failed to perform press action: {:?}", e))?;

        Ok(())
    }

    #[cfg(not(target_os = "macos"))]
    pub fn click_element(
        &self,
        _app_name: &str,
        _role: &str,
        _title: Option<&str>,
        _identifier: Option<&str>,
    ) -> Result<()> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Set the value of a UI element
    #[cfg(target_os = "macos")]
    pub fn set_value(
        &self,
        app_name: &str,
        role: &str,
        value: &str,
        title: Option<&str>,
        identifier: Option<&str>,
    ) -> Result<()> {
        let element = self.find_element(app_name, role, title, identifier)?;

        // Set the value - convert CFString to CFType
        let cf_value = CFString::new(value);

        element.set_value(cf_value.as_CFType())
            .map_err(|e| anyhow::anyhow!("Failed to set value: {:?}", e))?;

        Ok(())
    }

    #[cfg(not(target_os = "macos"))]
    pub fn set_value(
        &self,
        _app_name: &str,
        _role: &str,
        _value: &str,
        _title: Option<&str>,
        _identifier: Option<&str>,
    ) -> Result<()> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Get the value of a UI element
    #[cfg(target_os = "macos")]
    pub fn get_value(
        &self,
        app_name: &str,
        role: &str,
        title: Option<&str>,
        identifier: Option<&str>,
    ) -> Result<String> {
        let element = self.find_element(app_name, role, title, identifier)?;

        // Get the value
        let value_type = element.value()
            .map_err(|e| anyhow::anyhow!("Failed to get value: {:?}", e))?;

        // Try to downcast to CFString
        if let Some(cf_string) = value_type.downcast::<CFString>() {
            Ok(cf_string.to_string())
        } else {
            // Non-string values (numbers, booleans, ...) are not converted; return a placeholder
            Ok("<non-string value>".to_string())
        }
    }

    #[cfg(not(target_os = "macos"))]
    pub fn get_value(
        &self,
        _app_name: &str,
        _role: &str,
        _title: Option<&str>,
        _identifier: Option<&str>,
    ) -> Result<String> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Type text into an app by pasting it via the clipboard (Cmd+V); the previous clipboard text is restored afterwards
    #[cfg(target_os = "macos")]
    pub fn type_text(&self, app_name: &str, text: &str) -> Result<()> {
        use cocoa::base::{id, nil};
        use cocoa::foundation::NSString;
        use objc::{class, msg_send, sel, sel_impl};

        // First, make sure the app is active
        self.activate_app(app_name)?;

        // Wait for app to fully activate
        std::thread::sleep(std::time::Duration::from_millis(500));

        // Send a Tab key to try to focus on a text field.
        // This helps ensure something is focused before we paste; note that if a
        // field is already focused, Tab typically moves focus to the next control.
        let _ = self.press_key(app_name, "tab", vec![]);
        std::thread::sleep(std::time::Duration::from_millis(800));

        // Save old clipboard, set new content, paste, then restore
        let old_content: id;
        unsafe {
            // Get the general pasteboard
            let pasteboard: id = msg_send![class!(NSPasteboard), generalPasteboard];

            // Save current clipboard content (nil if it holds no plain text)
            let ns_string_type = NSString::alloc(nil).init_str("public.utf8-plain-text");
            old_content = msg_send![pasteboard, stringForType: ns_string_type];

            // Clear and set new content (clearContents returns the new change count, an NSInteger)
            let _: isize = msg_send![pasteboard, clearContents];

            let ns_string = NSString::alloc(nil).init_str(text);
            let ns_type = NSString::alloc(nil).init_str("public.utf8-plain-text");
            let _: bool = msg_send![pasteboard, setString:ns_string forType:ns_type];
        }

        // Wait a moment for clipboard to update
        std::thread::sleep(std::time::Duration::from_millis(200));

        // Paste using Cmd+V (outside unsafe block). Keep the result instead of
        // returning early with `?`, so the clipboard is restored even if this fails.
        let paste_result = self.press_key(app_name, "v", vec!["command"]);

        // Wait for paste to complete
        std::thread::sleep(std::time::Duration::from_millis(300));

        // Restore old clipboard content if it existed
        unsafe {
            if old_content != nil {
                let pasteboard: id = msg_send![class!(NSPasteboard), generalPasteboard];
                let _: isize = msg_send![pasteboard, clearContents];
                let ns_type = NSString::alloc(nil).init_str("public.utf8-plain-text");
                let _: bool = msg_send![pasteboard, setString:old_content forType:ns_type];
            }
        }

        paste_result
    }

    #[cfg(not(target_os = "macos"))]
    pub fn type_text(&self, _app_name: &str, _text: &str) -> Result<()> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Focus on a text field or text area element
    #[cfg(target_os = "macos")]
    pub fn focus_element(
        &self,
        app_name: &str,
        role: &str,
        title: Option<&str>,
        identifier: Option<&str>,
    ) -> Result<()> {
        let element = self.find_element(app_name, role, title, identifier)?;

        // Set focused attribute to true
        use core_foundation::boolean::CFBoolean;
        let cf_true = CFBoolean::true_value();

        element.set_attribute(&accessibility::AXAttribute::focused(), cf_true)
            .map_err(|e| anyhow::anyhow!("Failed to focus element: {:?}", e))?;

        Ok(())
    }

    #[cfg(not(target_os = "macos"))]
    pub fn focus_element(
        &self,
        _app_name: &str,
        _role: &str,
        _title: Option<&str>,
        _identifier: Option<&str>,
    ) -> Result<()> {
        anyhow::bail!("Not supported on this platform")
    }

    /// Press a keyboard shortcut
    #[cfg(target_os = "macos")]
    pub fn press_key(
        &self,
        app_name: &str,
        key: &str,
        modifiers: Vec<&str>,
    ) -> Result<()> {
        use core_graphics::event::{
            CGEvent, CGEventFlags, CGEventTapLocation,
        };
        use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};

        // First, make sure the app is active
        self.activate_app(app_name)?;

        // Wait a bit for activation
        std::thread::sleep(std::time::Duration::from_millis(100));

        // Map key string to key code
        let key_code = Self::key_to_keycode(key)
            .ok_or_else(|| anyhow::anyhow!("Unknown key: {}", key))?;

        // Map modifiers to flags
        let mut flags = CGEventFlags::CGEventFlagNull;
        for modifier in modifiers {
            match modifier.to_lowercase().as_str() {
                "command" | "cmd" => flags |= CGEventFlags::CGEventFlagCommand,
                "option" | "alt" => flags |= CGEventFlags::CGEventFlagAlternate,
                "control" | "ctrl" => flags |= CGEventFlags::CGEventFlagControl,
                "shift" => flags |= CGEventFlags::CGEventFlagShift,
                // Unrecognized modifier names are silently ignored
                _ => {}
            }
        }

        // Create event source
        let source = CGEventSource::new(CGEventSourceStateID::HIDSystemState)
            .ok().context("Failed to create event source")?;

        // Create key down event
        let key_down = CGEvent::new_keyboard_event(source.clone(), key_code, true)
            .ok().context("Failed to create key down event")?;
        key_down.set_flags(flags);

        // Create key up event
        let key_up = CGEvent::new_keyboard_event(source, key_code, false)
            .ok().context("Failed to create key up event")?;
        key_up.set_flags(flags);

        // Post events at the HID level; they are delivered to whichever app is
        // frontmost, which is why the target app is activated first
        key_down.post(CGEventTapLocation::HID);
        std::thread::sleep(std::time::Duration::from_millis(50));
        key_up.post(CGEventTapLocation::HID);

        Ok(())
    }

    #[cfg(not(target_os = "macos"))]
    pub fn press_key(
        &self,
        _app_name: &str,
        _key: &str,
        _modifiers: Vec<&str>,
    ) -> Result<()> {
        anyhow::bail!("Not supported on this platform")
    }

    #[cfg(target_os = "macos")]
fn key_to_keycode(key: &str) -> Option<u16> {
|
||||
// Map common keys to keycodes
|
||||
// See: https://eastmanreference.com/complete-list-of-applescript-key-codes
|
||||
match key.to_lowercase().as_str() {
|
||||
"a" => Some(0x00),
|
||||
"s" => Some(0x01),
|
||||
"d" => Some(0x02),
|
||||
"f" => Some(0x03),
|
||||
"h" => Some(0x04),
|
||||
"g" => Some(0x05),
|
||||
"z" => Some(0x06),
|
||||
"x" => Some(0x07),
|
||||
"c" => Some(0x08),
|
||||
"v" => Some(0x09),
|
||||
"b" => Some(0x0B),
|
||||
"q" => Some(0x0C),
|
||||
"w" => Some(0x0D),
|
||||
"e" => Some(0x0E),
|
||||
"r" => Some(0x0F),
|
||||
"y" => Some(0x10),
|
||||
"t" => Some(0x11),
|
||||
"1" => Some(0x12),
|
||||
"2" => Some(0x13),
|
||||
"3" => Some(0x14),
|
||||
"4" => Some(0x15),
|
||||
"6" => Some(0x16),
|
||||
"5" => Some(0x17),
|
||||
"=" => Some(0x18),
|
||||
"9" => Some(0x19),
|
||||
"7" => Some(0x1A),
|
||||
"-" => Some(0x1B),
|
||||
"8" => Some(0x1C),
|
||||
"0" => Some(0x1D),
|
||||
"]" => Some(0x1E),
|
||||
"o" => Some(0x1F),
|
||||
"u" => Some(0x20),
|
||||
"[" => Some(0x21),
|
||||
"i" => Some(0x22),
|
||||
"p" => Some(0x23),
|
||||
"return" | "enter" => Some(0x24),
|
||||
"l" => Some(0x25),
|
||||
"j" => Some(0x26),
|
||||
"'" => Some(0x27),
|
||||
"k" => Some(0x28),
|
||||
";" => Some(0x29),
|
||||
"\\" => Some(0x2A),
|
||||
"," => Some(0x2B),
|
||||
"/" => Some(0x2C),
|
||||
"n" => Some(0x2D),
|
||||
"m" => Some(0x2E),
|
||||
"." => Some(0x2F),
|
||||
"tab" => Some(0x30),
|
||||
"space" => Some(0x31),
|
||||
"`" => Some(0x32),
|
||||
"delete" | "backspace" => Some(0x33),
|
||||
"escape" | "esc" => Some(0x35),
|
||||
"f1" => Some(0x7A),
|
||||
"f2" => Some(0x78),
|
||||
"f3" => Some(0x63),
|
||||
"f4" => Some(0x76),
|
||||
"f5" => Some(0x60),
|
||||
"f6" => Some(0x61),
|
||||
"f7" => Some(0x62),
|
||||
"f8" => Some(0x64),
|
||||
"f9" => Some(0x65),
|
||||
"f10" => Some(0x6D),
|
||||
"f11" => Some(0x67),
|
||||
"f12" => Some(0x6F),
|
||||
"left" => Some(0x7B),
|
||||
"right" => Some(0x7C),
|
||||
"down" => Some(0x7D),
|
||||
"up" => Some(0x7E),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
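`press_key` combines the modifier names into a `CGEventFlags` mask and looks the key up in the macOS virtual-keycode table above. The lookup-and-combine logic can be exercised without CoreGraphics. The minimal sketch below uses a hypothetical `ModifierMask` bitmask as a stand-in for `CGEventFlags`, a small subset of the keycode table, and an assumed `"cmd+shift+a"` combo-string format. `press_key` itself does not accept that format.

```rust
/// Hypothetical stand-in for `CGEventFlags`: one bit per modifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ModifierMask(u8);

impl ModifierMask {
    pub const COMMAND: u8 = 1 << 0;
    pub const OPTION: u8 = 1 << 1;
    pub const CONTROL: u8 = 1 << 2;
    pub const SHIFT: u8 = 1 << 3;

    pub fn contains(self, bit: u8) -> bool {
        self.0 & bit != 0
    }
}

/// Subset of the macOS virtual keycodes used by `key_to_keycode`.
fn keycode(key: &str) -> Option<u16> {
    match key.to_lowercase().as_str() {
        "a" => Some(0x00),
        "c" => Some(0x08),
        "v" => Some(0x09),
        "return" | "enter" => Some(0x24),
        "tab" => Some(0x30),
        "escape" | "esc" => Some(0x35),
        _ => None,
    }
}

/// Parse "cmd+shift+a" into (keycode, modifiers). The last `+`-separated token
/// is the key, and the earlier tokens are modifiers. Unlike `press_key`, which
/// silently ignores unknown modifiers, this sketch rejects them.
pub fn parse_combo(combo: &str) -> Result<(u16, ModifierMask), String> {
    let mut tokens: Vec<&str> = combo.split('+').map(str::trim).collect();
    let key = tokens.pop().filter(|k| !k.is_empty()).ok_or("empty key combo")?;
    let code = keycode(key).ok_or_else(|| format!("Unknown key: {}", key))?;

    let mut mask = ModifierMask::default();
    for m in tokens {
        mask.0 |= match m.to_lowercase().as_str() {
            "command" | "cmd" => ModifierMask::COMMAND,
            "option" | "alt" => ModifierMask::OPTION,
            "control" | "ctrl" => ModifierMask::CONTROL,
            "shift" => ModifierMask::SHIFT,
            other => return Err(format!("Unknown modifier: {}", other)),
        };
    }
    Ok((code, mask))
}
```

Rejecting unknown modifiers catches typos such as `"comand"`. `press_key` currently drops those without any warning.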
#[cfg(target_os = "macos")]
struct ElementCollector<'a> {
    role_filter: Option<String>,
    title_filter: Option<String>,
    identifier_filter: Option<String>,
    results: std::cell::RefCell<&'a mut Vec<AXElement>>,
    depth: std::cell::Cell<usize>,
}

#[cfg(target_os = "macos")]
impl<'a> TreeVisitor for ElementCollector<'a> {
    fn enter_element(&self, element: &AXUIElement) -> TreeWalkerFlow {
        self.depth.set(self.depth.get() + 1);

        if self.depth.get() > 20 {
            return TreeWalkerFlow::SkipSubtree;
        }

        // Get element properties
        let role = element.role()
            .ok()
            .map(|s| s.to_string())
            .unwrap_or_else(|| "Unknown".to_string());

        let title = element.title()
            .ok()
            .map(|s| s.to_string());

        let identifier = element.identifier()
            .ok()
            .map(|s| s.to_string());

        // Check if this element matches the filters
        let role_matches = self.role_filter.as_ref().map_or(true, |r| role.contains(r));
        let title_matches = self.title_filter.as_ref().map_or(true, |t| {
            title.as_ref().map_or(false, |title_str| title_str.contains(t))
        });
        let identifier_matches = self.identifier_filter.as_ref().map_or(true, |id| {
            identifier.as_ref().map_or(false, |id_str| id_str.contains(id))
        });

        if role_matches && title_matches && identifier_matches {
            // Get additional properties
            let value = element.value()
                .ok()
                .and_then(|v| {
                    v.downcast::<CFString>().map(|s| s.to_string())
                });

            let label = element.description()
                .ok()
                .map(|s| s.to_string());

            let enabled = element.enabled()
                .ok()
                .map(|b| b.into())
                .unwrap_or(false);

            let focused = element.focused()
                .ok()
                .map(|b| b.into())
                .unwrap_or(false);

            // Count children
            let children_count = element.children()
                .ok()
                .map(|arr| arr.len() as usize)
                .unwrap_or(0);

            self.results.borrow_mut().push(AXElement {
                role,
                title,
                value,
                label,
                identifier,
                enabled,
                focused,
                position: None,
                size: None,
                children_count,
            });
        }

        TreeWalkerFlow::Continue
    }

    fn exit_element(&self, _element: &AXUIElement) {
        self.depth.set(self.depth.get() - 1);
    }
}
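`ElementCollector` works through `&self`. It keeps a `Cell` depth counter that goes up in `enter_element` and down in `exit_element`, skips subtrees deeper than 20 levels, and collects elements that match every substring filter that is set. The minimal sketch below reproduces that pattern on a hypothetical in-memory `Node` tree. A hand-written recursive `walk` stands in for the accessibility crate's `TreeWalker`. This walker always pairs `enter` with `exit`, including for skipped subtrees. This diff does not show whether the real walker also calls `exit_element` after `SkipSubtree`. If it does not, the counter in `ElementCollector` would keep growing after the first skip.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for an accessibility element.
pub struct Node {
    pub role: String,
    pub title: Option<String>,
    pub children: Vec<Node>,
}

pub enum Flow {
    Continue,
    SkipSubtree,
}

pub trait Visitor {
    fn enter(&self, node: &Node) -> Flow;
    fn exit(&self, node: &Node);
}

/// Minimal walker: `enter` and `exit` are always paired, even for skipped subtrees.
pub fn walk(node: &Node, v: &dyn Visitor) {
    if let Flow::Continue = v.enter(node) {
        for child in &node.children {
            walk(child, v);
        }
    }
    v.exit(node);
}

pub struct Collector {
    pub role_filter: Option<String>,
    pub title_filter: Option<String>,
    pub max_depth: usize,
    pub depth: Cell<usize>,
    pub results: RefCell<Vec<String>>,
}

impl Visitor for Collector {
    fn enter(&self, node: &Node) -> Flow {
        self.depth.set(self.depth.get() + 1);
        if self.depth.get() > self.max_depth {
            return Flow::SkipSubtree;
        }
        // A `None` filter matches everything; a title filter never matches an untitled node.
        let role_ok = self.role_filter.as_ref().map_or(true, |r| node.role.contains(r.as_str()));
        let title_ok = self.title_filter.as_ref().map_or(true, |t| {
            node.title.as_ref().map_or(false, |s| s.contains(t.as_str()))
        });
        if role_ok && title_ok {
            self.results.borrow_mut().push(node.role.clone());
        }
        Flow::Continue
    }

    fn exit(&self, _node: &Node) {
        self.depth.set(self.depth.get() - 1);
    }
}
```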
65
crates/g3-computer-control/src/macax/mod.rs
Normal file
@@ -0,0 +1,65 @@
pub mod controller;

pub use controller::MacAxController;

use serde::{Deserialize, Serialize};

#[cfg(test)]
mod tests;

/// Represents an accessibility element in the UI hierarchy
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AXElement {
    pub role: String,
    pub title: Option<String>,
    pub value: Option<String>,
    pub label: Option<String>,
    pub identifier: Option<String>,
    pub enabled: bool,
    pub focused: bool,
    pub position: Option<(f64, f64)>,
    pub size: Option<(f64, f64)>,
    pub children_count: usize,
}

/// Represents a macOS application
#[derive(Debug, Clone)]
pub struct AXApplication {
    pub name: String,
    pub bundle_id: Option<String>,
    pub pid: i32,
}

impl AXElement {
    /// Convert to a human-readable string representation
    pub fn to_string(&self) -> String {
        let mut parts = vec![format!("Role: {}", self.role)];

        if let Some(ref title) = self.title {
            parts.push(format!("Title: {}", title));
        }
        if let Some(ref value) = self.value {
            parts.push(format!("Value: {}", value));
        }
        if let Some(ref label) = self.label {
            parts.push(format!("Label: {}", label));
        }
        if let Some(ref id) = self.identifier {
            parts.push(format!("ID: {}", id));
        }

        parts.push(format!("Enabled: {}", self.enabled));
        parts.push(format!("Focused: {}", self.focused));

        if let Some((x, y)) = self.position {
            parts.push(format!("Position: ({:.0}, {:.0})", x, y));
        }
        if let Some((w, h)) = self.size {
            parts.push(format!("Size: ({:.0}, {:.0})", w, h));
        }

        parts.push(format!("Children: {}", self.children_count));

        parts.join(", ")
    }
}
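`AXElement::to_string` is an inherent method that shadows `ToString::to_string`, and Clippy reports this as `inherent_to_string`. If `std::fmt::Display` is implemented instead, `element.to_string()` keeps working, so the existing test still compiles, and the element can also be passed straight to `format!` or `tracing`. The minimal sketch below works on a trimmed-down copy of the struct. Most optional fields and the serde derives are left out for brevity.

```rust
use std::fmt;

/// Trimmed copy of `AXElement`, for illustration only.
pub struct AXElement {
    pub role: String,
    pub title: Option<String>,
    pub enabled: bool,
    pub position: Option<(f64, f64)>,
    pub children_count: usize,
}

impl fmt::Display for AXElement {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Role: {}", self.role)?;
        if let Some(title) = &self.title {
            write!(f, ", Title: {}", title)?;
        }
        write!(f, ", Enabled: {}", self.enabled)?;
        if let Some((x, y)) = self.position {
            // `{:.0}` prints no decimal places, as the original formatting does
            write!(f, ", Position: ({:.0}, {:.0})", x, y)?;
        }
        write!(f, ", Children: {}", self.children_count)
    }
}
```

`ToString` is blanket-implemented for every `Display` type, so `to_string()` comes for free and the inherent method can be deleted.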
37
crates/g3-computer-control/src/macax/tests.rs
Normal file
@@ -0,0 +1,37 @@
#[cfg(test)]
mod tests {
    use crate::{AXElement, MacAxController};

    #[test]
    fn test_ax_element_to_string() {
        let element = AXElement {
            role: "button".to_string(),
            title: Some("Click Me".to_string()),
            value: None,
            label: Some("Submit Button".to_string()),
            identifier: Some("submitBtn".to_string()),
            enabled: true,
            focused: false,
            position: Some((100.0, 200.0)),
            size: Some((80.0, 30.0)),
            children_count: 0,
        };

        let string_repr = element.to_string();
        assert!(string_repr.contains("Role: button"));
        assert!(string_repr.contains("Title: Click Me"));
        assert!(string_repr.contains("Label: Submit Button"));
        assert!(string_repr.contains("ID: submitBtn"));
        assert!(string_repr.contains("Enabled: true"));
        assert!(string_repr.contains("Position: (100, 200)"));
        assert!(string_repr.contains("Size: (80, 30)"));
    }

    #[test]
    fn test_controller_creation() {
        // Just test that we can create a controller
        // Actual functionality requires macOS and permissions
        let result = MacAxController::new();
        assert!(result.is_ok());
    }
}
26
crates/g3-computer-control/src/ocr/mod.rs
Normal file
@@ -0,0 +1,26 @@
use crate::types::TextLocation;
use anyhow::Result;
use async_trait::async_trait;

/// OCR engine trait for text recognition with bounding boxes
#[async_trait]
pub trait OCREngine: Send + Sync {
    /// Extract text with locations from an image file
    async fn extract_text_with_locations(&self, path: &str) -> Result<Vec<TextLocation>>;

    /// Get the name of the OCR engine
    fn name(&self) -> &str;
}

// Platform-specific modules
#[cfg(target_os = "macos")]
pub mod vision;

pub mod tesseract;

// Re-export the default OCR engine for the platform
#[cfg(target_os = "macos")]
pub use vision::AppleVisionOCR as DefaultOCR;

#[cfg(not(target_os = "macos"))]
pub use tesseract::TesseractOCR as DefaultOCR;
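A caller of `extract_text_with_locations` usually needs a point to click, not a bounding box. The minimal sketch below shows a helper that runs the same case-insensitive substring search as `find_text_in_app` and returns the first match, plus a method that gives the center of its box. `TextLocation` is redeclared here with the fields the OCR engines fill in. The names `center` and `find_first`, and the `min_confidence` cut-off, are hypothetical additions that the original code does not have. The returned point is in the image's own pixel space. Turning it into a screen coordinate still needs a transform like the one `find_text_in_app` applies.

```rust
/// Mirrors the fields of `crate::types::TextLocation` used by the OCR engines.
#[derive(Debug, Clone, PartialEq)]
pub struct TextLocation {
    pub text: String,
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
    pub confidence: f32,
}

impl TextLocation {
    /// Center of the bounding box (integer division truncates toward zero).
    pub fn center(&self) -> (i32, i32) {
        (self.x + self.width / 2, self.y + self.height / 2)
    }
}

/// First location whose text contains `needle`, ignoring case, after dropping
/// results whose confidence (0.0..=1.0) is below `min_confidence`.
pub fn find_first<'a>(
    locations: &'a [TextLocation],
    needle: &str,
    min_confidence: f32,
) -> Option<&'a TextLocation> {
    let needle = needle.to_lowercase();
    locations
        .iter()
        .filter(|loc| loc.confidence >= min_confidence)
        .find(|loc| loc.text.to_lowercase().contains(&needle))
}
```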
84
crates/g3-computer-control/src/ocr/tesseract.rs
Normal file
@@ -0,0 +1,84 @@
use super::OCREngine;
use crate::types::TextLocation;
use anyhow::Result;
use async_trait::async_trait;

/// Tesseract OCR engine (fallback/cross-platform)
pub struct TesseractOCR;

impl TesseractOCR {
    pub fn new() -> Result<Self> {
        // Check if tesseract is available
        let tesseract_check = std::process::Command::new("which")
            .arg("tesseract")
            .output();

        if !matches!(&tesseract_check, Ok(output) if output.status.success()) {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract:\n  macOS: brew install tesseract\n  \
                Linux: sudo apt-get install tesseract-ocr (Ubuntu/Debian)\n         \
                sudo yum install tesseract (RHEL/CentOS)\n  \
                Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n\n\
                After installation, restart your terminal and try again.");
        }

        Ok(Self)
    }
}

#[async_trait]
impl OCREngine for TesseractOCR {
    async fn extract_text_with_locations(&self, path: &str) -> Result<Vec<TextLocation>> {
        // Use tesseract CLI with TSV output to get bounding boxes
        let output = std::process::Command::new("tesseract")
            .arg(path)
            .arg("stdout")
            .arg("tsv")
            .output()
            .map_err(|e| anyhow::anyhow!("Failed to run tesseract: {}", e))?;

        if !output.status.success() {
            anyhow::bail!("Tesseract failed: {}", String::from_utf8_lossy(&output.stderr));
        }

        let tsv_text = String::from_utf8_lossy(&output.stdout);
        let mut locations = Vec::new();

        // Parse TSV output, skipping the header line
        for line in tsv_text.lines().skip(1) {
            let parts: Vec<&str> = line.split('\t').collect();
            if parts.len() >= 12 {
                // TSV columns: level, page_num, block_num, par_num, line_num, word_num,
                // left, top, width, height, conf, text
                if let (Ok(x), Ok(y), Ok(w), Ok(h), Ok(conf), text) = (
                    parts[6].parse::<i32>(),
                    parts[7].parse::<i32>(),
                    parts[8].parse::<i32>(),
                    parts[9].parse::<i32>(),
                    parts[10].parse::<f32>(),
                    parts[11],
                ) {
                    let trimmed = text.trim();
                    if !trimmed.is_empty() && conf > 0.0 {
                        locations.push(TextLocation {
                            text: trimmed.to_string(),
                            x,
                            y,
                            width: w,
                            height: h,
                            confidence: conf / 100.0, // Convert from 0-100 to 0-1
                        });
                    }
                }
            }
        }

        Ok(locations)
    }

    fn name(&self) -> &str {
        "Tesseract OCR"
    }
}
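The TSV parsing loop can be moved into a pure function, which makes it unit-testable without running the `tesseract` binary. The minimal sketch below keeps the column layout the loop assumes: at least 12 columns, with `left`, `top`, `width`, `height`, `conf` and `text` at indexes 6 to 11. It also keeps the same filter (non-empty text and `conf > 0`). The `Word` struct and the `parse_tsv` name are hypothetical. In practice the function would return the crate's `TextLocation`.

```rust
/// One recognized word; mirrors the fields the loop fills into `TextLocation`.
#[derive(Debug, PartialEq)]
pub struct Word {
    pub text: String,
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
    pub confidence: f32,
}

/// Parse the output of `tesseract <image> stdout tsv`. Row 0 is the header.
/// Page/block/paragraph/line rows carry conf = -1 and empty text, so they are dropped.
pub fn parse_tsv(tsv: &str) -> Vec<Word> {
    tsv.lines()
        .skip(1)
        .filter_map(|line| {
            let cols: Vec<&str> = line.split('\t').collect();
            if cols.len() < 12 {
                return None;
            }
            let text = cols[11].trim();
            let conf: f32 = cols[10].parse().ok()?;
            if text.is_empty() || conf <= 0.0 {
                return None;
            }
            Some(Word {
                text: text.to_string(),
                x: cols[6].parse().ok()?,
                y: cols[7].parse().ok()?,
                width: cols[8].parse().ok()?,
                height: cols[9].parse().ok()?,
                confidence: conf / 100.0, // Tesseract reports confidence as 0-100
            })
        })
        .collect()
}
```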
103
crates/g3-computer-control/src/ocr/vision.rs
Normal file
@@ -0,0 +1,103 @@
use super::OCREngine;
use crate::types::TextLocation;
use anyhow::{Result, Context};
use async_trait::async_trait;
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_float, c_uint};

// FFI bindings to Swift VisionBridge
#[repr(C)]
struct VisionTextBox {
    text: *const c_char,
    text_len: c_uint,
    x: i32,
    y: i32,
    width: i32,
    height: i32,
    confidence: c_float,
}

extern "C" {
    fn vision_recognize_text(
        image_path: *const c_char,
        image_path_len: c_uint,
        out_boxes: *mut *mut std::ffi::c_void,
        out_count: *mut c_uint,
    ) -> bool;

    fn vision_free_boxes(boxes: *mut std::ffi::c_void, count: c_uint);
}

/// Apple Vision Framework OCR engine
pub struct AppleVisionOCR;

impl AppleVisionOCR {
    pub fn new() -> Result<Self> {
        Ok(Self)
    }
}

#[async_trait]
impl OCREngine for AppleVisionOCR {
    async fn extract_text_with_locations(&self, path: &str) -> Result<Vec<TextLocation>> {
        // Convert path to C string
        let c_path = CString::new(path)
            .context("Failed to convert path to C string")?;

        let mut boxes_ptr: *mut std::ffi::c_void = std::ptr::null_mut();
        let mut count: c_uint = 0;

        // Call Swift Vision API
        let success = unsafe {
            vision_recognize_text(
                c_path.as_ptr(),
                path.len() as c_uint,
                &mut boxes_ptr,
                &mut count,
            )
        };

        if !success || boxes_ptr.is_null() {
            anyhow::bail!("Apple Vision OCR failed");
        }

        // Convert C array to Rust Vec
        let mut locations = Vec::new();

        unsafe {
            // SAFETY: relies on the VisionBridge contract that, on success, `boxes_ptr`
            // points to `count` contiguous `VisionTextBox` values whose `text` fields are
            // NUL-terminated strings (or null), valid until `vision_free_boxes` is called.
            let typed_boxes = boxes_ptr as *const VisionTextBox;
            let boxes_slice = std::slice::from_raw_parts(typed_boxes, count as usize);

            for box_data in boxes_slice {
                // Convert C string to Rust String
                let text = if !box_data.text.is_null() {
                    CStr::from_ptr(box_data.text)
                        .to_string_lossy()
                        .into_owned()
                } else {
                    String::new()
                };

                if !text.is_empty() {
                    locations.push(TextLocation {
                        text,
                        x: box_data.x,
                        y: box_data.y,
                        width: box_data.width,
                        height: box_data.height,
                        confidence: box_data.confidence,
                    });
                }
            }

            // Free the C array
            vision_free_boxes(boxes_ptr, count);
        }

        Ok(locations)
    }

    fn name(&self) -> &str {
        "Apple Vision Framework"
    }
}
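In `extract_text_with_locations`, `vision_free_boxes` runs only when control reaches the end of the `unsafe` block. A panic inside that block, or an early return added there later, would skip the call and leak the Swift-allocated array. A common fix is a drop guard that owns the pointer and calls the release function from `Drop`. The minimal sketch below is generic over the release function, so it can be exercised without the Vision bridge. `FfiArray` and `FreeFn` are hypothetical names, and the test's `fake_free` stands in for `vision_free_boxes`.

```rust
use std::ffi::c_void;
use std::os::raw::c_uint;

/// Signature of the C-side release function, matching `vision_free_boxes`.
pub type FreeFn = unsafe extern "C" fn(*mut c_void, c_uint);

/// Owns a C-allocated array and releases it with `free` when dropped.
pub struct FfiArray {
    ptr: *mut c_void,
    count: c_uint,
    free: FreeFn,
}

impl FfiArray {
    /// # Safety
    /// `ptr` must be null or point to `count` elements allocated by the C side,
    /// and `free` must be the matching release function.
    pub unsafe fn new(ptr: *mut c_void, count: c_uint, free: FreeFn) -> Self {
        FfiArray { ptr, count, free }
    }

    /// # Safety
    /// `T` must match the element type (and `#[repr(C)]` layout) the C side wrote.
    pub unsafe fn as_slice<T>(&self) -> &[T] {
        if self.ptr.is_null() || self.count == 0 {
            &[]
        } else {
            unsafe { std::slice::from_raw_parts(self.ptr as *const T, self.count as usize) }
        }
    }
}

impl Drop for FfiArray {
    fn drop(&mut self) {
        // Runs on normal scope exit, early return (`?`), and panic unwinding alike.
        if !self.ptr.is_null() {
            unsafe { (self.free)(self.ptr, self.count) }
        }
    }
}
```

To use it in `extract_text_with_locations`, create the guard right after the null check with `vision_free_boxes` as `free`. Then replace the manual `from_raw_parts` call with `as_slice::<VisionTextBox>()`, and delete the explicit `vision_free_boxes` call so the array is not freed twice.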
166
crates/g3-computer-control/src/platform/linux.rs
Normal file
@@ -0,0 +1,166 @@
use crate::{ComputerController, types::*};
use anyhow::Result;
use async_trait::async_trait;
use tesseract::Tesseract;
use uuid::Uuid;

pub struct LinuxController {
    // Placeholder for X11 connection or other state
}

impl LinuxController {
    pub fn new() -> Result<Self> {
        // Initialize X11 connection
        tracing::warn!("Linux computer control not fully implemented");
        Ok(Self {})
    }
}

#[async_trait]
impl ComputerController for LinuxController {
    async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn click(&self, _button: MouseButton) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn double_click(&self, _button: MouseButton) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn type_text(&self, _text: &str) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn press_key(&self, _key: &str) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn list_windows(&self) -> Result<Vec<Window>> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn focus_window(&self, _window_id: &str) -> Result<()> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn get_element_text(&self, _element_id: &str) -> Result<String> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
        // Enforce that window_id must be provided
        if window_id.is_none() {
            anyhow::bail!("window_id is required. You must specify which window to capture (e.g., 'Firefox', 'Terminal', 'gedit'). Use list_windows to see available windows.");
        }

        anyhow::bail!("Linux implementation not yet available")
    }

    async fn extract_text_from_screen(&self, _region: Rect, _window_id: &str) -> Result<String> {
        anyhow::bail!("Linux implementation not yet available")
    }

    async fn extract_text_from_image(&self, path: &str) -> Result<OCRResult> {
        // Check if tesseract is available on the system
        let tesseract_check = std::process::Command::new("which")
            .arg("tesseract")
            .output();

        if !matches!(&tesseract_check, Ok(output) if output.status.success()) {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract:\n  \
                Ubuntu/Debian: sudo apt-get install tesseract-ocr\n  \
                RHEL/CentOS: sudo yum install tesseract\n  \
                Arch Linux: sudo pacman -S tesseract\n\n\
                After installation, restart your terminal and try again.");
        }

        // Initialize Tesseract
        let tess = Tesseract::new(None, Some("eng"))
            .map_err(|e| {
                anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                    This usually means:\n1. Tesseract is not properly installed\n\
                    2. Language data files are missing\n\nTo fix:\n  \
                    Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n  \
                    RHEL/CentOS: sudo yum install tesseract-langpack-eng\n  \
                    Arch Linux: sudo pacman -S tesseract-data-eng", e)
            })?;

        let text = tess.set_image(path)
            .map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", path, e))?
            .get_text()
            .map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;

        // Get confidence (simplified - would need more complex API calls for per-word confidence)
        let confidence = 0.85; // Placeholder

        Ok(OCRResult {
            text,
            confidence,
            bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Would need image dimensions
        })
    }

    async fn find_text_on_screen(&self, text: &str) -> Result<Option<Point>> {
        // Check if tesseract is available on the system
        let tesseract_check = std::process::Command::new("which")
            .arg("tesseract")
            .output();

        if !matches!(&tesseract_check, Ok(output) if output.status.success()) {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract:\n  \
                Ubuntu/Debian: sudo apt-get install tesseract-ocr\n  \
                RHEL/CentOS: sudo yum install tesseract\n  \
                Arch Linux: sudo pacman -S tesseract\n\n\
                After installation, restart your terminal and try again.");
        }

        // Take full screen screenshot
        // NOTE: take_screenshot currently rejects a missing window_id and is not yet
        // implemented on Linux, so this call always returns an error and the OCR
        // search below is unreachable until both are addressed.
        let temp_path = format!("/tmp/g3_ocr_search_{}.png", Uuid::new_v4());
        self.take_screenshot(&temp_path, None, None).await?;

        // Use Tesseract to find text with bounding boxes
        let tess = Tesseract::new(None, Some("eng"))
            .map_err(|e| {
                anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                    This usually means:\n1. Tesseract is not properly installed\n\
                    2. Language data files are missing\n\nTo fix:\n  \
                    Ubuntu/Debian: sudo apt-get install tesseract-ocr-eng\n  \
                    RHEL/CentOS: sudo yum install tesseract-langpack-eng\n  \
                    Arch Linux: sudo pacman -S tesseract-data-eng", e)
            })?;

        let full_text = tess.set_image(temp_path.as_str())
            .map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))?
            .get_text()
            .map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))?;

        // Clean up temp file
        let _ = std::fs::remove_file(&temp_path);

        // Simple text search - full implementation would use get_component_images
        // to get bounding boxes for each word
        if full_text.contains(text) {
            tracing::warn!("Text found but precise coordinates not available in simplified implementation");
            Ok(Some(Point { x: 0, y: 0 }))
        } else {
            Ok(None)
        }
    }
}
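The same `which tesseract` probe appears three times across `linux.rs` and `tesseract.rs`. It could be moved into one helper. The minimal sketch below does not spawn `which`, which is not a POSIX utility and can be missing on minimal systems. Instead, it scans the directories of a `PATH`-style string with `std::env::split_paths`. The names `find_in_path` and `command_available` are hypothetical. The check only confirms that a regular file exists; it does not test the execute bit.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Return the first `dir/name` that is a regular file, searching the
/// directories of a PATH-style list in order. Does not check execute bits.
pub fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Convenience wrapper over the current process's PATH.
pub fn command_available(name: &str) -> bool {
    env::var_os("PATH").map_or(false, |p| find_in_path(name, &p).is_some())
}
```

Each call site would then reduce to `if !command_available("tesseract") { anyhow::bail!(...) }`, keeping its own installation hints.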
507
crates/g3-computer-control/src/platform/macos.rs
Normal file
@@ -0,0 +1,507 @@
use crate::{ComputerController, types::{Rect, TextLocation}};
use crate::ocr::{OCREngine, DefaultOCR};
use anyhow::{Result, Context};
use async_trait::async_trait;
use std::path::Path;
use core_graphics::window::{kCGWindowListOptionOnScreenOnly, kCGNullWindowID, CGWindowListCopyWindowInfo};
use core_foundation::dictionary::CFDictionary;
use core_foundation::string::CFString;
use core_foundation::base::{TCFType, ToVoid};
use core_foundation::array::CFArray;

pub struct MacOSController {
    ocr_engine: Box<dyn OCREngine>,
    #[allow(dead_code)]
    ocr_name: String,
}

impl MacOSController {
    pub fn new() -> Result<Self> {
        let ocr = Box::new(DefaultOCR::new()?);
        let ocr_name = ocr.name().to_string();
        tracing::info!("Initialized macOS controller with OCR engine: {}", ocr_name);
        Ok(Self { ocr_engine: ocr, ocr_name })
    }
}

#[async_trait]
impl ComputerController for MacOSController {
    async fn take_screenshot(&self, path: &str, region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
        // Enforce that window_id must be provided
        if window_id.is_none() {
            return Err(anyhow::anyhow!("window_id is required. You must specify which window to capture (e.g., 'Safari', 'Terminal', 'Google Chrome'). Use list_windows to see available windows."));
        }

        // Determine the temporary directory for screenshots
        let temp_dir = std::env::var("TMPDIR")
            .or_else(|_| std::env::var("HOME").map(|h| format!("{}/tmp", h)))
            .unwrap_or_else(|_| "/tmp".to_string());

        // Ensure temp directory exists
        std::fs::create_dir_all(&temp_dir)?;

        // If path is relative or doesn't specify a directory, use temp_dir
        let final_path = if path.starts_with('/') {
            path.to_string()
        } else {
            format!("{}/{}", temp_dir.trim_end_matches('/'), path)
        };

        let path_obj = Path::new(&final_path);
        if let Some(parent) = path_obj.parent() {
            std::fs::create_dir_all(parent)?;
        }

        let app_name = window_id.unwrap(); // Safe because we checked is_none() above

        // Get the window ID for the specified application
        let cg_window_id = unsafe {
            let window_list = CGWindowListCopyWindowInfo(
                kCGWindowListOptionOnScreenOnly,
                kCGNullWindowID
            );

            let array = CFArray::<CFDictionary>::wrap_under_create_rule(window_list);
            let count = array.len();

            let mut found_window_id: Option<(u32, String)> = None; // (id, owner)
            let app_name_lower = app_name.to_lowercase();
            // Normalize by removing spaces for exact matching (computed once, outside the loop)
            let app_name_normalized = app_name_lower.replace(" ", "");

            for i in 0..count {
                let dict = array.get(i).unwrap();

                // Get owner name
                let owner_key = CFString::from_static_string("kCGWindowOwnerName");
                let owner: String = if let Some(value) = dict.find(owner_key.to_void()) {
                    let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
                    s.to_string()
                } else {
                    continue;
                };

                tracing::debug!("Checking window: owner='{}', looking for '{}'", owner, app_name);
                let owner_lower = owner.to_lowercase();
                let owner_normalized = owner_lower.replace(" ", "");

                // ONLY accept exact matches (case-insensitive, with or without spaces)
                // This prevents "Goose" from matching "GooseStudio"
                let is_match = owner_lower == app_name_lower || owner_normalized == app_name_normalized;

                if is_match {
                    // Get window ID
                    let window_id_key = CFString::from_static_string("kCGWindowNumber");
                    if let Some(value) = dict.find(window_id_key.to_void()) {
                        let num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*value as *const _);
                        if let Some(id) = num.to_i64() {
                            // Get window layer to filter out menu bar windows
                            let layer_key = CFString::from_static_string("kCGWindowLayer");
                            let layer: i32 = if let Some(value) = dict.find(layer_key.to_void()) {
                                let num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*value as *const _);
                                num.to_i32().unwrap_or(0)
                            } else {
                                0
                            };

                            // Get window bounds to verify it's a real window
                            let bounds_key = CFString::from_static_string("kCGWindowBounds");
                            let has_real_bounds = if let Some(value) = dict.find(bounds_key.to_void()) {
                                let bounds_dict: CFDictionary = TCFType::wrap_under_get_rule(*value as *const _);
                                let width_key = CFString::from_static_string("Width");
                                let height_key = CFString::from_static_string("Height");

                                if let (Some(w_val), Some(h_val)) = (
                                    bounds_dict.find(width_key.to_void()),
                                    bounds_dict.find(height_key.to_void()),
                                ) {
                                    let w_num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*w_val as *const _);
                                    let h_num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*h_val as *const _);
                                    let width = w_num.to_f64().unwrap_or(0.0);
                                    let height = h_num.to_f64().unwrap_or(0.0);
                                    // Real windows should be at least 100x100 pixels
                                    width >= 100.0 && height >= 100.0
                                } else {
                                    false
                                }
                            } else {
                                false
                            };

                            // Only accept windows that are:
                            // 1. At layer 0 (normal windows, not menu bar)
                            // 2. Have real bounds (width and height >= 100)
                            if layer == 0 && has_real_bounds {
                                tracing::info!("Found valid window: ID {} for app '{}' (layer={}, bounds valid)", id, owner, layer);
                                found_window_id = Some((id as u32, owner.clone()));
                                break;
                            } else {
                                tracing::debug!("Skipping window ID {} for '{}': layer={}, has_real_bounds={}", id, owner, layer, has_real_bounds);
                            }
                        }
                    }
                }
            }

            found_window_id
        };

        let (cg_window_id, matched_owner) = cg_window_id.ok_or_else(|| {
            anyhow::anyhow!("Could not find window for application '{}'. Use list_windows to see available windows.", app_name)
        })?;
        tracing::info!("Taking screenshot of window ID {} for app '{}'", cg_window_id, matched_owner);

        // Use screencapture with the window ID for now
        // TODO: Implement direct CGWindowListCreateImage approach with proper image saving
        let mut cmd = std::process::Command::new("screencapture");
        cmd.arg("-x"); // No sound
        cmd.arg("-l");
        cmd.arg(cg_window_id.to_string());

        if let Some(region) = region {
            cmd.arg("-R");
            cmd.arg(format!("{},{},{},{}", region.x, region.y, region.width, region.height));
        }

        cmd.arg(&final_path);

        let screenshot_result = cmd.output()?;

        if !screenshot_result.status.success() {
            let stderr = String::from_utf8_lossy(&screenshot_result.stderr);
            return Err(anyhow::anyhow!("screencapture failed for window {}: {}", cg_window_id, stderr));
        }

        Ok(())
    }

    async fn extract_text_from_screen(&self, region: Rect, window_id: &str) -> Result<String> {
        // Take screenshot of region first
        let temp_path = format!("/tmp/g3_ocr_{}.png", uuid::Uuid::new_v4());
        self.take_screenshot(&temp_path, Some(region), Some(window_id)).await?;

        // Extract text from the screenshot, then delete the temp file before
        // propagating any OCR error so a failed extraction does not leak the file
        let result = self.extract_text_from_image(&temp_path).await;
        let _ = std::fs::remove_file(&temp_path);

        result
    }

    async fn extract_text_from_image(&self, path: &str) -> Result<String> {
        // Extract all text and concatenate
        let locations = self.ocr_engine.extract_text_with_locations(path).await?;
        Ok(locations.iter().map(|loc| loc.text.as_str()).collect::<Vec<_>>().join(" "))
    }

    async fn extract_text_with_locations(&self, path: &str) -> Result<Vec<TextLocation>> {
        // Use the OCR engine
        self.ocr_engine.extract_text_with_locations(path).await
    }

    async fn find_text_in_app(&self, app_name: &str, search_text: &str) -> Result<Option<TextLocation>> {
        // Take screenshot of specific app window
        let home = std::env::var("HOME").unwrap_or_else(|_| "/tmp".to_string());
        let temp_path = format!("{}/tmp/g3_find_text_{}_{}.png", home, app_name, uuid::Uuid::new_v4());
        self.take_screenshot(&temp_path, None, Some(app_name)).await?;

        // Read the screenshot dimensions and extract all text with locations,
        // then delete the temp file before propagating either error
        let outcome = async {
            let dims = get_image_dimensions(&temp_path)?;
            let locations = self.extract_text_with_locations(&temp_path).await?;
            Ok::<_, anyhow::Error>((dims, locations))
        }
        .await;
        let _ = std::fs::remove_file(&temp_path);
        let (screenshot_dims, locations) = outcome?;

        // Get window bounds to calculate coordinate transformation
        let window_bounds = self.get_window_bounds(app_name)?;

        // Find matching text (case-insensitive)
        let search_lower = search_text.to_lowercase();
        for location in locations {
            if location.text.to_lowercase().contains(&search_lower) {
                // Transform coordinates from screenshot space to screen space
                let transformed = transform_screenshot_to_screen_coords(
                    location,
                    window_bounds,
                    screenshot_dims,
                );
                return Ok(Some(transformed));
            }
        }

        Ok(None)
    }
|
||||
|
||||
    fn move_mouse(&self, x: i32, y: i32) -> Result<()> {
        use core_graphics::event::{CGEvent, CGEventTapLocation, CGEventType, CGMouseButton};
        use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
        use core_graphics::geometry::CGPoint;

        let source = CGEventSource::new(CGEventSourceStateID::HIDSystemState)
            .ok()
            .context("Failed to create event source")?;

        let event = CGEvent::new_mouse_event(
            source,
            CGEventType::MouseMoved,
            CGPoint::new(x as f64, y as f64),
            CGMouseButton::Left,
        )
        .ok()
        .context("Failed to create mouse event")?;

        event.post(CGEventTapLocation::HID);

        Ok(())
    }

    fn click_at(&self, x: i32, y: i32, _app_name: Option<&str>) -> Result<()> {
        use core_graphics::display::CGDisplay;
        use core_graphics::event::{CGEvent, CGEventTapLocation, CGEventType, CGMouseButton};
        use core_graphics::event_source::{CGEventSource, CGEventSourceStateID};
        use core_graphics::geometry::CGPoint;

        // IMPORTANT: Coordinates passed here are in NSScreen/CGWindowListCopyWindowInfo space
        // (Y=0 at BOTTOM, increases UPWARD)
        // But CGEvent uses a different coordinate system (Y=0 at TOP, increases DOWNWARD)
        // We need to convert: CGEvent.y = screenHeight - NSScreen.y
        let screen_height = CGDisplay::main().pixels_high() as i32;
        let cgevent_x = x;
        let cgevent_y = screen_height - y;

        tracing::debug!(
            "click_at: NSScreen coords ({}, {}) -> CGEvent coords ({}, {}) [screen_height={}]",
            x, y, cgevent_x, cgevent_y, screen_height
        );

        let point = CGPoint::new(cgevent_x as f64, cgevent_y as f64);

        let source = CGEventSource::new(CGEventSourceStateID::HIDSystemState)
            .ok()
            .context("Failed to create event source")?;

        // Move mouse to position first
        let move_event = CGEvent::new_mouse_event(
            source.clone(),
            CGEventType::MouseMoved,
            point,
            CGMouseButton::Left,
        )
        .ok()
        .context("Failed to create mouse move event")?;
        move_event.post(CGEventTapLocation::HID);

        std::thread::sleep(std::time::Duration::from_millis(100));

        // Mouse down
        let mouse_down = CGEvent::new_mouse_event(
            source.clone(),
            CGEventType::LeftMouseDown,
            point,
            CGMouseButton::Left,
        )
        .ok()
        .context("Failed to create mouse down event")?;
        mouse_down.post(CGEventTapLocation::HID);

        std::thread::sleep(std::time::Duration::from_millis(50));

        // Mouse up
        let mouse_up = CGEvent::new_mouse_event(
            source,
            CGEventType::LeftMouseUp,
            point,
            CGMouseButton::Left,
        )
        .ok()
        .context("Failed to create mouse up event")?;
        mouse_up.post(CGEventTapLocation::HID);

        Ok(())
    }
}

impl MacOSController {
    /// Get window bounds for an application (helper method)
    fn get_window_bounds(&self, app_name: &str) -> Result<(i32, i32, i32, i32)> {
        // Normalize the requested name once: lowercase, plus a variant with spaces removed
        let app_name_lower = app_name.to_lowercase();
        let app_name_normalized = app_name_lower.replace(' ', "");

        unsafe {
            let window_list = CGWindowListCopyWindowInfo(
                kCGWindowListOptionOnScreenOnly,
                kCGNullWindowID,
            );
            // CGWindowListCopyWindowInfo can return NULL; never wrap a null reference
            if window_list.is_null() {
                anyhow::bail!("CGWindowListCopyWindowInfo returned no window list");
            }

            let array = CFArray::<CFDictionary>::wrap_under_create_rule(window_list);
            let count = array.len();

            for i in 0..count {
                let dict = array.get(i).unwrap();

                // Get owner name
                let owner_key = CFString::from_static_string("kCGWindowOwnerName");
                let owner: String = if let Some(value) = dict.find(owner_key.to_void()) {
                    let s: CFString = TCFType::wrap_under_get_rule(*value as *const _);
                    s.to_string()
                } else {
                    continue;
                };

                // ONLY accept exact matches (case-insensitive, with or without spaces).
                // This prevents "Goose" from matching "GooseStudio".
                let owner_lower = owner.to_lowercase();
                let owner_normalized = owner_lower.replace(' ', "");
                let is_match = owner_lower == app_name_lower || owner_normalized == app_name_normalized;
                if !is_match {
                    continue;
                }

                // Get window layer to filter out menu bar windows
                let layer_key = CFString::from_static_string("kCGWindowLayer");
                let layer: i32 = if let Some(value) = dict.find(layer_key.to_void()) {
                    let num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*value as *const _);
                    num.to_i32().unwrap_or(0)
                } else {
                    0
                };

                // Skip menu bar windows (layer >= 20)
                if layer >= 20 {
                    tracing::debug!("Skipping window for '{}' at layer {} (menu bar)", owner, layer);
                    continue;
                }

                // Get window bounds to verify it's a real window
                let bounds_key = CFString::from_static_string("kCGWindowBounds");
                if let Some(value) = dict.find(bounds_key.to_void()) {
                    let bounds_dict: CFDictionary = TCFType::wrap_under_get_rule(*value as *const _);

                    let x_key = CFString::from_static_string("X");
                    let y_key = CFString::from_static_string("Y");
                    let width_key = CFString::from_static_string("Width");
                    let height_key = CFString::from_static_string("Height");

                    if let (Some(x_val), Some(y_val), Some(w_val), Some(h_val)) = (
                        bounds_dict.find(x_key.to_void()),
                        bounds_dict.find(y_key.to_void()),
                        bounds_dict.find(width_key.to_void()),
                        bounds_dict.find(height_key.to_void()),
                    ) {
                        let x_num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*x_val as *const _);
                        let y_num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*y_val as *const _);
                        let w_num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*w_val as *const _);
                        let h_num: core_foundation::number::CFNumber = TCFType::wrap_under_get_rule(*h_val as *const _);

                        // Bounds are stored as CGFloat values: read them as f64 and round
                        // rather than requesting an integer conversion
                        let x = x_num.to_f64().unwrap_or(0.0).round() as i32;
                        let y = y_num.to_f64().unwrap_or(0.0).round() as i32;
                        let w = w_num.to_f64().unwrap_or(0.0).round() as i32;
                        let h = h_num.to_f64().unwrap_or(0.0).round() as i32;

                        // Only accept windows with real bounds (at least 100x100)
                        if w >= 100 && h >= 100 {
                            tracing::info!("Found valid window bounds for '{}': x={}, y={}, w={}, h={} (layer={})", owner, x, y, w, h, layer);
                            return Ok((x, y, w, h));
                        }
                        tracing::debug!("Skipping window for '{}': too small ({}x{})", owner, w, h);
                    }
                }
            }
        }

        Err(anyhow::anyhow!("Could not find window bounds for '{}'", app_name))
    }
}

/// Get image dimensions from a PNG file by reading its IHDR header
fn get_image_dimensions(path: &str) -> Result<(i32, i32)> {
    use std::fs::File;
    use std::io::Read;

    // 8-byte signature + 4-byte chunk length + 4-byte chunk type + 4-byte width + 4-byte height
    let mut buffer = [0u8; 24];
    File::open(path)?.read_exact(&mut buffer)?;

    // PNG signature check
    if &buffer[0..8] != b"\x89PNG\r\n\x1a\n" {
        anyhow::bail!("Not a valid PNG file");
    }

    // The first chunk must be IHDR; width and height are big-endian u32 values at bytes 16-23
    if &buffer[12..16] != b"IHDR" {
        anyhow::bail!("PNG file does not start with an IHDR chunk");
    }
    let width = u32::from_be_bytes([buffer[16], buffer[17], buffer[18], buffer[19]]) as i32;
    let height = u32::from_be_bytes([buffer[20], buffer[21], buffer[22], buffer[23]]) as i32;

    Ok((width, height))
}

/// Transform coordinates from screenshot space to screen space
///
/// The screenshot is taken of a window, and Vision OCR returns coordinates
/// relative to the screenshot image. We need to transform these to actual
/// screen coordinates for clicking.
///
/// On Retina displays, screenshots are taken at 2x resolution, so we need
/// to account for this scaling factor.
fn transform_screenshot_to_screen_coords(
    location: TextLocation,
    window_bounds: (i32, i32, i32, i32), // (x, y, width, height) in screen space
    screenshot_dims: (i32, i32),         // (width, height) in pixels
) -> TextLocation {
    let (win_x, win_y, win_width, win_height) = window_bounds;
    let (screenshot_width, screenshot_height) = screenshot_dims;

    // Calculate scale factors
    // On Retina displays, the screenshot is typically 2x the window size
    let scale_x = win_width as f64 / screenshot_width as f64;
    let scale_y = win_height as f64 / screenshot_height as f64;

    tracing::debug!(
        "Transform: screenshot={}x{}, window={}x{} at ({},{}), scale=({:.2},{:.2})",
        screenshot_width, screenshot_height, win_width, win_height, win_x, win_y, scale_x, scale_y
    );

    // Transform coordinates from image space to screen space
    // IMPORTANT: macOS screen coordinates have origin at BOTTOM-LEFT (Y increases upward)
    // Image coordinates have origin at TOP-LEFT (Y increases downward)
    // win_y is the BOTTOM of the window in screen coordinates
    // So we add win_height to win_y to get the window TOP, then subtract the scaled screenshot y
    let window_top_y = win_y + win_height;

    tracing::debug!(
        "[transform] Input location in image space: x={}, y={}, width={}, height={}",
        location.x, location.y, location.width, location.height
    );
    tracing::debug!("[transform] Scale factors: scale_x={:.4}, scale_y={:.4}", scale_x, scale_y);

    let transformed_x = win_x + (location.x as f64 * scale_x) as i32;
    let transformed_y = window_top_y - (location.y as f64 * scale_y) as i32;
    let transformed_width = (location.width as f64 * scale_x) as i32;
    let transformed_height = (location.height as f64 * scale_y) as i32;

    tracing::debug!("[transform] Calculation details:");
    tracing::debug!(" - transformed_x = {} + ({} * {:.4}) = {} + {:.2} = {}", win_x, location.x, scale_x, win_x, location.x as f64 * scale_x, transformed_x);
    tracing::debug!(" - transformed_width = ({} * {:.4}) = {:.2} -> {}", location.width, scale_x, location.width as f64 * scale_x, transformed_width);
    tracing::debug!(" - transformed_height = ({} * {:.4}) = {:.2} -> {}", location.height, scale_y, location.height as f64 * scale_y, transformed_height);

    tracing::debug!(
        "Transformed location: screenshot=({},{}) {}x{} -> screen=({},{}) {}x{}",
        location.x, location.y, location.width, location.height,
        transformed_x, transformed_y, transformed_width, transformed_height
    );

    TextLocation {
        text: location.text,
        x: transformed_x,
        y: transformed_y,
        width: transformed_width,
        height: transformed_height,
        confidence: location.confidence,
    }
}

#[path = "macos_window_matching_test.rs"]
#[cfg(test)]
mod tests;
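The screenshot-to-screen mapping above can also be written as a pure function, which makes the scale-and-offset arithmetic easy to unit-test in isolation. The sketch below is not the diff's implementation: it assumes that the window bounds and the output point share a top-left origin (the convention Apple documents for `kCGWindowBounds` and for `CGEvent` locations), so only scaling and translation apply and there is no vertical flip. `Bounds` and `screenshot_point_to_screen` are hypothetical names.

```rust
/// Window bounds in screen points, top-left origin (assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

/// Map a pixel position inside a window screenshot to a screen point.
/// Assumes the screenshot covers exactly the window, so the scale factor is
/// window size / screenshot size (0.5 for a 2x Retina capture).
/// Returns None for a degenerate (zero-sized) screenshot.
pub fn screenshot_point_to_screen(
    px: i32,
    py: i32,
    window: Bounds,
    shot_width: u32,
    shot_height: u32,
) -> Option<(i32, i32)> {
    if shot_width == 0 || shot_height == 0 {
        return None;
    }
    let scale_x = window.width as f64 / shot_width as f64;
    let scale_y = window.height as f64 / shot_height as f64;
    // Scale into points, then offset by the window's top-left corner.
    let x = window.x as f64 + px as f64 * scale_x;
    let y = window.y as f64 + py as f64 * scale_y;
    Some((x.round() as i32, y.round() as i32))
}
```

For example, pixel (400, 300) in a 1600x1200 capture of an 800x600-point window at (100, 50) maps to screen point (300, 200).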
@@ -0,0 +1,45 @@
#[cfg(test)]
mod window_matching_tests {
    /// Test that window name matching handles spaces correctly
    ///
    /// Issue: When a user requests a screenshot of "Goose Studio" but the actual
    /// application name is "GooseStudio" (no space), the fuzzy matching should
    /// still find the window.
    ///
    /// The fix normalizes both names by removing spaces before comparing.
    #[test]
    fn test_space_normalization() {
        let test_cases = vec![
            // (user_input, actual_app_name, should_match)
            ("Goose Studio", "GooseStudio", true),
            ("GooseStudio", "Goose Studio", true),
            ("Visual Studio Code", "VisualStudioCode", true),
            ("Google Chrome", "Google Chrome", true),
            ("Safari", "Safari", true),
            ("iTerm", "iTerm2", true),              // fuzzy match
            ("Code", "Visual Studio Code", true),   // fuzzy match
        ];

        for (user_input, app_name, should_match) in test_cases {
            let user_lower = user_input.to_lowercase();
            let app_lower = app_name.to_lowercase();

            let user_normalized = user_lower.replace(" ", "");
            let app_normalized = app_lower.replace(" ", "");

            let is_exact = app_lower == user_lower || app_normalized == user_normalized;
            let is_fuzzy = app_lower.contains(&user_lower)
                || user_lower.contains(&app_lower)
                || app_normalized.contains(&user_normalized)
                || user_normalized.contains(&app_normalized);

            let matches = is_exact || is_fuzzy;

            assert_eq!(
                matches, should_match,
                "Expected '{}' vs '{}' to match={}, but got match={}",
                user_input, app_name, should_match, matches
            );
        }
    }
}
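This test re-implements the comparison inline, and its last two cases pass only through substring matching, whereas `get_window_bounds` in macos.rs accepts exact (case- and space-insensitive) matches only. Putting each rule in a named function makes the difference explicit and lets tests exercise the same code production uses. A minimal sketch with hypothetical names; unlike `get_window_bounds`, this `normalize` strips all whitespace rather than only the space character.

```rust
/// Lowercase and drop all whitespace, so "Goose Studio" and "GooseStudio" compare equal.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| !c.is_whitespace())
        .flat_map(char::to_lowercase)
        .collect()
}

/// Exact match after normalization: "Goose" does NOT match "GooseStudio".
pub fn is_exact_match(requested: &str, owner: &str) -> bool {
    normalize(requested) == normalize(owner)
}

/// Substring match in either direction after normalization (the looser rule).
/// Empty names never match, since "" is a substring of everything.
pub fn is_fuzzy_match(requested: &str, owner: &str) -> bool {
    let (r, o) = (normalize(requested), normalize(owner));
    !r.is_empty() && !o.is_empty() && (o.contains(&r) || r.contains(&o))
}
```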
8
crates/g3-computer-control/src/platform/mod.rs
Normal file
@@ -0,0 +1,8 @@
#[cfg(target_os = "macos")]
pub mod macos;

#[cfg(target_os = "linux")]
pub mod linux;

#[cfg(target_os = "windows")]
pub mod windows;
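`platform/mod.rs` selects a backend at compile time with `#[cfg(target_os = ...)]`, so only the module for the current target is compiled at all. The sketch below shows the same pattern with inline modules that share one item name, plus a catch-all arm (not present in the diff) so that other targets still build; `platform_impl` and `platform_name` are hypothetical.

```rust
// Exactly one of these modules survives cfg evaluation on any given target,
// so `platform_impl::NAME` always resolves to a single item.
#[cfg(target_os = "macos")]
mod platform_impl {
    pub const NAME: &str = "macos";
}

#[cfg(target_os = "linux")]
mod platform_impl {
    pub const NAME: &str = "linux";
}

#[cfg(target_os = "windows")]
mod platform_impl {
    pub const NAME: &str = "windows";
}

// Fallback so the crate still compiles on targets with no backend.
#[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
mod platform_impl {
    pub const NAME: &str = "unsupported";
}

/// Name of the backend compiled into this build.
pub fn platform_name() -> &'static str {
    platform_impl::NAME
}
```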
167
crates/g3-computer-control/src/platform/windows.rs
Normal file
@@ -0,0 +1,167 @@
use crate::{ComputerController, types::*};
use anyhow::Result;
use async_trait::async_trait;
use tesseract::Tesseract;
use uuid::Uuid;

pub struct WindowsController {
    // Placeholder for Windows-specific state
}

impl WindowsController {
    pub fn new() -> Result<Self> {
        tracing::warn!("Windows computer control not fully implemented");
        Ok(Self {})
    }
}

#[async_trait]
impl ComputerController for WindowsController {
    async fn move_mouse(&self, _x: i32, _y: i32) -> Result<()> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn click(&self, _button: MouseButton) -> Result<()> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn double_click(&self, _button: MouseButton) -> Result<()> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn type_text(&self, _text: &str) -> Result<()> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn press_key(&self, _key: &str) -> Result<()> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn list_windows(&self) -> Result<Vec<Window>> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn focus_window(&self, _window_id: &str) -> Result<()> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn get_window_bounds(&self, _window_id: &str) -> Result<Rect> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn find_element(&self, _selector: &ElementSelector) -> Result<Option<UIElement>> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn get_element_text(&self, _element_id: &str) -> Result<String> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn get_element_bounds(&self, _element_id: &str) -> Result<Rect> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn take_screenshot(&self, _path: &str, _region: Option<Rect>, window_id: Option<&str>) -> Result<()> {
        // Enforce that window_id must be provided
        if window_id.is_none() {
            anyhow::bail!("window_id is required. You must specify which window to capture (e.g., 'Chrome', 'Terminal', 'Notepad'). Use list_windows to see available windows.");
        }

        anyhow::bail!("Windows implementation not yet available")
    }

    async fn extract_text_from_screen(&self, _region: Rect, _window_id: &str) -> Result<String> {
        anyhow::bail!("Windows implementation not yet available")
    }

    async fn extract_text_from_image(&self, path: &str) -> Result<OCRResult> {
        // Check if tesseract is available on the system (`where` is the Windows counterpart of `which`)
        let tesseract_found = std::process::Command::new("where")
            .arg("tesseract")
            .output()
            .map(|output| output.status.success())
            .unwrap_or(false);

        if !tesseract_found {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract on Windows:\n \
                1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
                2. Run the installer and follow the instructions\n \
                3. Add tesseract to your PATH environment variable\n\n\
                After installation, restart your terminal and try again.");
        }

        // Initialize Tesseract with English language data
        let tess = Tesseract::new(None, Some("eng")).map_err(|e| {
            anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                This usually means:\n1. Tesseract is not properly installed\n\
                2. Language data files are missing\n\nTo fix:\n \
                1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
                2. Make sure to select 'Additional language data' during installation\n \
                3. Ensure tesseract is in your PATH", e)
        })?;

        let text = tess
            .set_image(path)
            .map_err(|e| anyhow::anyhow!("Failed to load image '{}': {}", path, e))?
            .get_text()
            .map_err(|e| anyhow::anyhow!("Failed to extract text from image: {}", e))?;

        // Fixed placeholder confidence; per-word confidence would need more complex API calls
        let confidence = 0.85;

        Ok(OCRResult {
            text,
            confidence,
            bounds: Rect { x: 0, y: 0, width: 0, height: 0 }, // Placeholder: would need image dimensions
        })
    }

    async fn find_text_on_screen(&self, text: &str) -> Result<Option<Point>> {
        // Check if tesseract is available on the system
        let tesseract_found = std::process::Command::new("where")
            .arg("tesseract")
            .output()
            .map(|output| output.status.success())
            .unwrap_or(false);

        if !tesseract_found {
            anyhow::bail!("Tesseract OCR is not installed on your system.\n\n\
                To install tesseract on Windows:\n \
                1. Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki\n \
                2. Run the installer and follow the instructions\n \
                3. Add tesseract to your PATH environment variable\n\n\
                After installation, restart your terminal and try again.");
        }

        // Take a full-screen screenshot into the system temp directory.
        // Note: take_screenshot above rejects a missing window_id, so this call currently fails.
        let temp_path = std::env::temp_dir()
            .join(format!("g3_ocr_search_{}.png", Uuid::new_v4()))
            .to_string_lossy()
            .into_owned();
        self.take_screenshot(&temp_path, None, None).await?;

        // Run OCR, then remove the temp file before propagating any error
        let ocr_result = Tesseract::new(None, Some("eng"))
            .map_err(|e| {
                anyhow::anyhow!("Failed to initialize Tesseract: {}\n\n\
                    This usually means:\n1. Tesseract is not properly installed\n\
                    2. Language data files are missing\n\nTo fix:\n \
                    1. Reinstall tesseract from https://github.com/UB-Mannheim/tesseract/wiki\n \
                    2. Make sure to select 'Additional language data' during installation\n \
                    3. Ensure tesseract is in your PATH", e)
            })
            .and_then(|tess| {
                tess.set_image(temp_path.as_str())
                    .map_err(|e| anyhow::anyhow!("Failed to load screenshot: {}", e))
            })
            .and_then(|mut tess| {
                tess.get_text()
                    .map_err(|e| anyhow::anyhow!("Failed to extract text from screen: {}", e))
            });

        let _ = std::fs::remove_file(&temp_path);
        let full_text = ocr_result?;

        // Simple text search - full implementation would use get_component_images
        // to get bounding boxes for each word
        if full_text.contains(text) {
            tracing::warn!("Text found but precise coordinates not available in simplified implementation");
            Ok(Some(Point { x: 0, y: 0 }))
        } else {
            Ok(None)
        }
    }
}
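Both Tesseract checks in windows.rs spawn a `where tesseract` child process on every call. The standard library can scan `PATH` directly instead. The sketch below uses hypothetical `find_in_path` and `tesseract_available` helpers; it assumes that trying `.exe`, `.cmd`, and `.bat` suffixes is enough on Windows, and it only checks that a regular file exists, not that the file is executable.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Look for `name` in each directory of a PATH-style list.
/// On Windows, also try common executable suffixes (an assumption, not PATHEXT-aware).
pub fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    let candidates: Vec<String> = if cfg!(windows) {
        vec![
            name.to_string(),
            format!("{name}.exe"),
            format!("{name}.cmd"),
            format!("{name}.bat"),
        ]
    } else {
        vec![name.to_string()]
    };

    for dir in std::env::split_paths(path_var) {
        for candidate in &candidates {
            let full = dir.join(candidate);
            if full.is_file() {
                return Some(full);
            }
        }
    }
    None
}

/// True if a `tesseract` binary appears on the current PATH.
pub fn tesseract_available() -> bool {
    std::env::var_os("PATH")
        .map_or(false, |path| find_in_path("tesseract", &path).is_some())
}
```

Taking the PATH value as a parameter keeps `find_in_path` testable without mutating the process environment.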
19
crates/g3-computer-control/src/types.rs
Normal file
@@ -0,0 +1,19 @@
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TextLocation {
    pub text: String,
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
    pub confidence: f32,
}
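Callers acting on an OCR hit usually need one click point rather than a box. A minimal sketch of helper methods on a local copy of `Rect`, assuming `(x, y)` is the box's top-left corner and that `width` and `height` are non-negative; the `center` and `contains` methods are hypothetical and not part of types.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

impl Rect {
    /// Center point; integer division rounds toward the top-left for odd sizes.
    pub fn center(&self) -> (i32, i32) {
        (self.x + self.width / 2, self.y + self.height / 2)
    }

    /// True if (px, py) lies in the half-open box [x, x + width) x [y, y + height).
    pub fn contains(&self, px: i32, py: i32) -> bool {
        px >= self.x && px < self.x + self.width && py >= self.y && py < self.y + self.height
    }
}
```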
111
crates/g3-computer-control/src/webdriver/mod.rs
Normal file
@@ -0,0 +1,111 @@
pub mod safari;

use anyhow::Result;
use async_trait::async_trait;
use serde_json::Value;

/// WebDriver controller for browser automation
#[async_trait]
pub trait WebDriverController: Send + Sync {
    /// Navigate to a URL
    async fn navigate(&mut self, url: &str) -> Result<()>;

    /// Get the current URL
    async fn current_url(&self) -> Result<String>;

    /// Get the page title
    async fn title(&self) -> Result<String>;

    /// Find an element by CSS selector
    async fn find_element(&mut self, selector: &str) -> Result<WebElement>;

    /// Find multiple elements by CSS selector
    async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>>;

    /// Execute JavaScript in the browser
    async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value>;

    /// Get the page source (HTML)
    async fn page_source(&self) -> Result<String>;

    /// Take a screenshot and save to path
    async fn screenshot(&mut self, path: &str) -> Result<()>;

    /// Close the current window/tab
    async fn close(&mut self) -> Result<()>;

    /// Quit the browser session
    async fn quit(self) -> Result<()>;
}

/// Represents a web element in the DOM
pub struct WebElement {
    pub(crate) inner: fantoccini::elements::Element,
}

impl WebElement {
    /// Click the element
    pub async fn click(&mut self) -> Result<()> {
        self.inner.click().await?;
        Ok(())
    }

    /// Send keys/text to the element
    pub async fn send_keys(&mut self, text: &str) -> Result<()> {
        self.inner.send_keys(text).await?;
        Ok(())
    }

    /// Clear the element's content (for input fields)
    pub async fn clear(&mut self) -> Result<()> {
        self.inner.clear().await?;
        Ok(())
    }

    /// Get the element's text content
    pub async fn text(&self) -> Result<String> {
        Ok(self.inner.text().await?)
    }

    /// Get an attribute value
    pub async fn attr(&self, name: &str) -> Result<Option<String>> {
        Ok(self.inner.attr(name).await?)
    }

    /// Get a property value
    pub async fn prop(&self, name: &str) -> Result<Option<String>> {
        Ok(self.inner.prop(name).await?)
    }

    /// Get the element's HTML
    pub async fn html(&self, inner: bool) -> Result<String> {
        Ok(self.inner.html(inner).await?)
    }

    /// Check if element is displayed
    pub async fn is_displayed(&self) -> Result<bool> {
        Ok(self.inner.is_displayed().await?)
    }

    /// Check if element is enabled
    pub async fn is_enabled(&self) -> Result<bool> {
        Ok(self.inner.is_enabled().await?)
    }

    /// Check if element is selected (for checkboxes/radio buttons)
    pub async fn is_selected(&self) -> Result<bool> {
        Ok(self.inner.is_selected().await?)
    }

    /// Find a child element by CSS selector
    pub async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
        let elem = self.inner.find(fantoccini::Locator::Css(selector)).await?;
        Ok(WebElement { inner: elem })
    }

    /// Find multiple child elements by CSS selector
    pub async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
        let elems = self.inner.find_all(fantoccini::Locator::Css(selector)).await?;
        Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
    }
}
212
crates/g3-computer-control/src/webdriver/safari.rs
Normal file
@@ -0,0 +1,212 @@
use super::{WebDriverController, WebElement};
use anyhow::{Context, Result};
use async_trait::async_trait;
use fantoccini::{Client, ClientBuilder};
use serde_json::Value;
use std::time::Duration;

/// SafariDriver WebDriver controller
pub struct SafariDriver {
    client: Client,
}

impl SafariDriver {
    /// Create a new SafariDriver instance
    ///
    /// This connects to a SafariDriver server listening on port 4444.
    /// Make sure to enable "Allow Remote Automation" in Safari's Develop menu first.
    ///
    /// You can authorize and start SafariDriver manually with:
    /// ```bash
    /// /usr/bin/safaridriver --enable      # one-time authorization
    /// /usr/bin/safaridriver --port 4444   # start the server on the port used here
    /// ```
    pub async fn new() -> Result<Self> {
        Self::with_port(4444).await
    }

    /// Create a new SafariDriver instance with a custom port
    pub async fn with_port(port: u16) -> Result<Self> {
        let url = format!("http://localhost:{}", port);

        let mut caps = serde_json::Map::new();
        caps.insert("browserName".to_string(), Value::String("safari".to_string()));

        let client = ClientBuilder::native()
            .capabilities(caps)
            .connect(&url)
            .await
            .context("Failed to connect to SafariDriver. Make sure SafariDriver is running and 'Allow Remote Automation' is enabled in Safari's Develop menu.")?;

        Ok(Self { client })
    }

    /// Go back in browser history
    pub async fn back(&mut self) -> Result<()> {
        self.client.back().await?;
        Ok(())
    }

    /// Go forward in browser history
    pub async fn forward(&mut self) -> Result<()> {
        self.client.forward().await?;
        Ok(())
    }

    /// Refresh the current page
    pub async fn refresh(&mut self) -> Result<()> {
        self.client.refresh().await?;
        Ok(())
    }

    /// Get all window handles
    pub async fn window_handles(&mut self) -> Result<Vec<String>> {
        let handles = self.client.windows().await?;
        Ok(handles.into_iter().map(|h| h.into()).collect())
    }

    /// Switch to a window by handle
    pub async fn switch_to_window(&mut self, handle: &str) -> Result<()> {
        let window_handle: fantoccini::wd::WindowHandle = handle.to_string().try_into()?;
        self.client.switch_to_window(window_handle).await?;
        Ok(())
    }

    /// Get the current window handle
    pub async fn current_window_handle(&mut self) -> Result<String> {
        Ok(self.client.window().await?.into())
    }

    /// Close the current window
    pub async fn close_window(&mut self) -> Result<()> {
        self.client.close_window().await?;
        Ok(())
    }

    /// Create a new window or tab and return its handle
    pub async fn new_window(&mut self, is_tab: bool) -> Result<String> {
        let response = self.client.new_window(is_tab).await?;
        Ok(response.handle.into())
    }

    /// Get cookies
    pub async fn get_cookies(&mut self) -> Result<Vec<fantoccini::cookies::Cookie<'static>>> {
        Ok(self.client.get_all_cookies().await?)
    }

    /// Add a cookie
    pub async fn add_cookie(&mut self, cookie: fantoccini::cookies::Cookie<'static>) -> Result<()> {
        self.client.add_cookie(cookie).await?;
        Ok(())
    }

    /// Delete all cookies
    pub async fn delete_all_cookies(&mut self) -> Result<()> {
        self.client.delete_all_cookies().await?;
        Ok(())
    }

    /// Wait for an element to appear (with timeout)
    pub async fn wait_for_element(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
        let start = std::time::Instant::now();
        let poll_interval = Duration::from_millis(100);

        loop {
            if let Ok(elem) = self.find_element(selector).await {
                return Ok(elem);
            }

            if start.elapsed() >= timeout {
                anyhow::bail!("Timeout waiting for element: {}", selector);
            }

            tokio::time::sleep(poll_interval).await;
        }
    }

    /// Wait for an element to be visible (with timeout)
    pub async fn wait_for_visible(&mut self, selector: &str, timeout: Duration) -> Result<WebElement> {
        let start = std::time::Instant::now();
        let poll_interval = Duration::from_millis(100);

        loop {
            if let Ok(elem) = self.find_element(selector).await {
                if elem.is_displayed().await.unwrap_or(false) {
                    return Ok(elem);
                }
            }

            if start.elapsed() >= timeout {
                anyhow::bail!("Timeout waiting for element to be visible: {}", selector);
            }

            tokio::time::sleep(poll_interval).await;
        }
    }
}

#[async_trait]
impl WebDriverController for SafariDriver {
    async fn navigate(&mut self, url: &str) -> Result<()> {
        self.client.goto(url).await?;
        Ok(())
    }

    async fn current_url(&self) -> Result<String> {
        Ok(self.client.current_url().await?.to_string())
    }

    async fn title(&self) -> Result<String> {
        Ok(self.client.title().await?)
    }

    async fn find_element(&mut self, selector: &str) -> Result<WebElement> {
        let elem = self
            .client
            .find(fantoccini::Locator::Css(selector))
            .await
            .with_context(|| format!("Failed to find element with selector: {}", selector))?;
        Ok(WebElement { inner: elem })
    }

    async fn find_elements(&mut self, selector: &str) -> Result<Vec<WebElement>> {
        let elems = self.client.find_all(fantoccini::Locator::Css(selector)).await?;
        Ok(elems.into_iter().map(|inner| WebElement { inner }).collect())
    }

    async fn execute_script(&mut self, script: &str, args: Vec<Value>) -> Result<Value> {
        Ok(self.client.execute(script, args).await?)
    }

    async fn page_source(&self) -> Result<String> {
        Ok(self.client.source().await?)
    }

    async fn screenshot(&mut self, path: &str) -> Result<()> {
        let screenshot_data = self.client.screenshot().await?;

        // Expand tilde in path
        let expanded_path = shellexpand::tilde(path);
        let path_str = expanded_path.as_ref();

        // Create parent directories if needed
        if let Some(parent) = std::path::Path::new(path_str).parent() {
            std::fs::create_dir_all(parent)
                .context("Failed to create parent directories for screenshot")?;
        }

        std::fs::write(path_str, screenshot_data)
            .context("Failed to write screenshot to file")?;

        Ok(())
    }

    async fn close(&mut self) -> Result<()> {
        self.client.close_window().await?;
        Ok(())
    }

    async fn quit(mut self) -> Result<()> {
        self.client.close().await?;
        Ok(())
    }
}
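`wait_for_element` and `wait_for_visible` share one loop: probe, check the deadline, sleep. A synchronous, generic sketch of that loop follows; unlike the diff's version, it caps the last sleep at the remaining time so the wait does not overshoot the timeout by a whole polling interval. In the async driver, `std::thread::sleep` would become `tokio::time::sleep(...).await`. `poll_until` is a hypothetical name.

```rust
use std::time::{Duration, Instant};

/// Call `probe` until it returns Some, or give up once `timeout` has elapsed.
/// The probe always runs at least once, and once more at the deadline.
pub fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = probe() {
            return Ok(value);
        }
        let now = Instant::now();
        if now >= deadline {
            return Err(format!("timed out after {:?}", timeout));
        }
        // Never sleep past the deadline.
        std::thread::sleep(interval.min(deadline - now));
    }
}
```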
17
crates/g3-computer-control/tests/integration_test.rs
Normal file
@@ -0,0 +1,17 @@
use g3_computer_control::*;

#[tokio::test]
async fn test_screenshot() {
    let controller = create_controller().expect("Failed to create controller");

    // Take screenshot
    let path = "/tmp/test_screenshot.png";
    let result = controller.take_screenshot(path, None, None).await;
    assert!(result.is_ok(), "Failed to take screenshot: {:?}", result.err());

    // Verify file exists
    assert!(std::path::Path::new(path).exists(), "Screenshot file was not created");

    // Clean up
    let _ = std::fs::remove_file(path);
}
24
crates/g3-computer-control/vision-bridge/Package.swift
Normal file
@@ -0,0 +1,24 @@
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "VisionBridge",
    platforms: [
        .macOS(.v11)
    ],
    products: [
        .library(
            name: "VisionBridge",
            type: .dynamic,
            targets: ["VisionBridge"]
        ),
    ],
    targets: [
        .target(
            name: "VisionBridge",
            dependencies: [],
            path: "Sources/VisionBridge",
            publicHeadersPath: "."
        ),
    ]
)
@@ -0,0 +1,39 @@
#ifndef VisionBridge_h
#define VisionBridge_h

#include <stdint.h>
#include <stdbool.h>

#ifdef __cplusplus
extern "C" {
#endif

// Text box structure for FFI
typedef struct {
    const char* text;
    uint32_t text_len;
    int32_t x;
    int32_t y;
    int32_t width;
    int32_t height;
    float confidence;
} VisionTextBox;

// Recognize text in an image and return bounding boxes
// Returns true on success, false on failure
// Caller must free the returned boxes using vision_free_boxes
bool vision_recognize_text(
    const char* image_path,
    uint32_t image_path_len,
    VisionTextBox** out_boxes,
    uint32_t* out_count
);

// Free memory allocated by vision_recognize_text
void vision_free_boxes(VisionTextBox* boxes, uint32_t count);

#ifdef __cplusplus
}
#endif

#endif /* VisionBridge_h */
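The header fixes the C layout of `VisionTextBox`, and the caller must copy what it needs before handing the array back to `vision_free_boxes`. A Rust consumer might mirror the struct with `#[repr(C)]` and copy each box into owned data, as sketched below. `RawTextBox`, `TextBox`, and `copy_box` are hypothetical names, and the `extern "C"` declarations and linking are left out so the sketch runs without the Swift library.

```rust
use std::os::raw::c_char;

/// Mirror of the C `VisionTextBox`; field order and types must match VisionBridge.h.
#[repr(C)]
pub struct RawTextBox {
    pub text: *const c_char,
    pub text_len: u32,
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
    pub confidence: f32,
}

/// Owned copy that stays valid after `vision_free_boxes` releases the C buffer.
#[derive(Debug, Clone, PartialEq)]
pub struct TextBox {
    pub text: String,
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
    pub confidence: f32,
}

/// Copy one raw box into owned Rust data.
///
/// # Safety
/// `raw.text` must be null or point to at least `raw.text_len` readable bytes.
pub unsafe fn copy_box(raw: &RawTextBox) -> TextBox {
    let text = if raw.text.is_null() {
        String::new()
    } else {
        // Use the explicit byte length instead of scanning for the NUL terminator.
        let bytes = unsafe { std::slice::from_raw_parts(raw.text as *const u8, raw.text_len as usize) };
        String::from_utf8_lossy(bytes).into_owned()
    };
    TextBox {
        text,
        x: raw.x,
        y: raw.y,
        width: raw.width,
        height: raw.height,
        confidence: raw.confidence,
    }
}
```

Copying first and freeing afterwards keeps every Rust-side `String` independent of the Swift allocation.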
@@ -0,0 +1,145 @@
import Foundation
import Vision
import AppKit
import CoreGraphics

// MARK: - C Bridge Functions

@_cdecl("vision_recognize_text")
public func vision_recognize_text(
    _ imagePath: UnsafePointer<CChar>,
    _ imagePathLen: UInt32,
    _ outBoxes: UnsafeMutablePointer<UnsafeMutableRawPointer?>,
    _ outCount: UnsafeMutablePointer<UInt32>
) -> Bool {
    // Convert C string to Swift String
    guard let pathData = Data(bytes: imagePath, count: Int(imagePathLen)).withUnsafeBytes({
        String(bytes: $0, encoding: .utf8)
    }) else {
        return false
    }

    let path = pathData.trimmingCharacters(in: .whitespaces)

    // Load image
    guard let image = NSImage(contentsOfFile: path),
          let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
        return false
    }

    // Perform OCR
    var textBoxes: [CTextBox] = []
    let semaphore = DispatchSemaphore(value: 0)
    var success = false

    let request = VNRecognizeTextRequest { request, error in
        defer { semaphore.signal() }

        if let error = error {
            print("Vision OCR error: \(error.localizedDescription)")
            return
        }

        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            return
        }

        let imageSize = CGSize(width: cgImage.width, height: cgImage.height)

        for observation in observations {
            guard let candidate = observation.topCandidates(1).first else { continue }

            let text = candidate.string
            let boundingBox = observation.boundingBox

            // Convert normalized coordinates (bottom-left origin) to pixel coordinates (top-left origin)
            let x = Int32(boundingBox.origin.x * imageSize.width)
            let y = Int32((1.0 - boundingBox.origin.y - boundingBox.height) * imageSize.height)
            let width = Int32(boundingBox.width * imageSize.width)
            let height = Int32(boundingBox.height * imageSize.height)

            // Allocate C string for text (released in vision_free_boxes)
            let cString = strdup(text)

            textBoxes.append(CTextBox(
                text: cString,
                text_len: UInt32(text.utf8.count),
                x: x,
                y: y,
                width: width,
                height: height,
                confidence: observation.confidence
            ))
        }

        success = true
    }

    // Configure request for best accuracy
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true
    request.recognitionLanguages = ["en-US"]

    // Perform request
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Vision request failed: \(error.localizedDescription)")
        return false
    }

    // Wait for completion
    semaphore.wait()

    if !success {
        return false
    }

    // Allocate array for results; initialize (rather than assign into) the uninitialized memory
    let boxesPtr = UnsafeMutablePointer<CTextBox>.allocate(capacity: textBoxes.count)
    for (index, box) in textBoxes.enumerated() {
        (boxesPtr + index).initialize(to: box)
    }

    outBoxes.pointee = UnsafeMutableRawPointer(boxesPtr)
    outCount.pointee = UInt32(textBoxes.count)

    return true
}

@_cdecl("vision_free_boxes")
public func vision_free_boxes(
    _ boxes: UnsafeMutableRawPointer,
    _ count: UInt32
) {
    let typedBoxes = boxes.assumingMemoryBound(to: CTextBox.self)
    for i in 0..<Int(count) {
        if let text = typedBoxes[i].text {
            free(UnsafeMutableRawPointer(mutating: text))
        }
    }
    typedBoxes.deallocate()
}

// MARK: - C-Compatible Structure

// Must match VisionTextBox in VisionBridge.h field for field. Swift does not formally
// guarantee C layout for its own structs, so this relies on the layout matching in practice.
public struct CTextBox {
    public let text: UnsafePointer<CChar>?
    public let text_len: UInt32
    public let x: Int32
    public let y: Int32
    public let width: Int32
    public let height: Int32
    public let confidence: Float

    public init(text: UnsafePointer<CChar>?, text_len: UInt32, x: Int32, y: Int32, width: Int32, height: Int32, confidence: Float) {
        self.text = text
        self.text_len = text_len
        self.x = x
        self.y = y
        self.width = width
        self.height = height
        self.confidence = confidence
    }
}
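`vision_recognize_text` converts Vision's normalized `boundingBox` (origin at bottom-left, values from 0 to 1) into top-left pixel coordinates. The same arithmetic restated as a standalone Rust function, so it can be checked by hand; the name `to_pixel_rect` and the tuple return are illustrative, not part of the bridge.

```rust
/// Convert a normalized rect with a bottom-left origin into pixel coordinates
/// with a top-left origin, truncating toward zero like Swift's `Int32(_)`.
pub fn to_pixel_rect(
    nx: f64,
    ny: f64,
    nw: f64,
    nh: f64,
    image_w: f64,
    image_h: f64,
) -> (i32, i32, i32, i32) {
    let x = (nx * image_w) as i32;
    // The box's top edge sits at ny + nh measured from the bottom,
    // which is 1 - (ny + nh) measured from the top.
    let y = ((1.0 - ny - nh) * image_h) as i32;
    let w = (nw * image_w) as i32;
    let h = (nh * image_h) as i32;
    (x, y, w, h)
}
```

For an 800x400 image, a box at (0.25, 0.5) with size (0.5, 0.25) maps to x = 200, y = (1 - 0.5 - 0.25) * 400 = 100, width 400, height 100.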
@@ -12,3 +12,6 @@ thiserror = { workspace = true }
toml = "0.8"
shellexpand = "3.0"
dirs = "5.0"

[dev-dependencies]
tempfile = "3.8"
@@ -6,6 +6,9 @@ use std::path::Path;
pub struct Config {
    pub providers: ProvidersConfig,
    pub agent: AgentConfig,
    pub computer_control: ComputerControlConfig,
    pub webdriver: WebDriverConfig,
    pub macax: MacAxConfig,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -15,6 +18,8 @@ pub struct ProvidersConfig {
    pub databricks: Option<DatabricksConfig>,
    pub embedded: Option<EmbeddedConfig>,
    pub default_provider: String,
    pub coach: Option<String>, // Provider to use for coach in autonomous mode
    pub player: Option<String>, // Provider to use for player in autonomous mode
}

#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -62,6 +67,51 @@ pub struct AgentConfig {
    pub timeout_seconds: u64,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ComputerControlConfig {
    pub enabled: bool,
    pub require_confirmation: bool,
    pub max_actions_per_second: u32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WebDriverConfig {
    pub enabled: bool,
    pub safari_port: u16,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MacAxConfig {
    pub enabled: bool,
}

impl Default for MacAxConfig {
    fn default() -> Self {
        Self {
            enabled: false,
        }
    }
}

impl Default for WebDriverConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            safari_port: 4444,
        }
    }
}

impl Default for ComputerControlConfig {
    fn default() -> Self {
        Self {
            enabled: false, // Disabled by default for safety
            require_confirmation: true,
            max_actions_per_second: 5,
        }
    }
}

impl Default for Config {
    fn default() -> Self {
        Self {
@@ -78,12 +128,17 @@ impl Default for Config {
                }),
                embedded: None,
                default_provider: "databricks".to_string(),
                coach: None, // Will use default_provider if not specified
                player: None, // Will use default_provider if not specified
            },
            agent: AgentConfig {
                max_context_length: 8192,
                enable_streaming: true,
                timeout_seconds: 60,
            },
            computer_control: ComputerControlConfig::default(),
            webdriver: WebDriverConfig::default(),
            macax: MacAxConfig::default(),
        }
    }
}
@@ -188,12 +243,17 @@ impl Config {
                    threads: Some(8),
                }),
                default_provider: "embedded".to_string(),
                coach: None, // Will use default_provider if not specified
                player: None, // Will use default_provider if not specified
            },
            agent: AgentConfig {
                max_context_length: 8192,
                enable_streaming: true,
                timeout_seconds: 60,
            },
            computer_control: ComputerControlConfig::default(),
            webdriver: WebDriverConfig::default(),
            macax: MacAxConfig::default(),
        }
    }

@@ -262,4 +322,67 @@ impl Config {

        Ok(config)
    }

    /// Get the provider to use for coach mode in autonomous execution
    pub fn get_coach_provider(&self) -> &str {
        self.providers.coach
            .as_deref()
            .unwrap_or(&self.providers.default_provider)
    }

    /// Get the provider to use for player mode in autonomous execution
    pub fn get_player_provider(&self) -> &str {
        self.providers.player
            .as_deref()
            .unwrap_or(&self.providers.default_provider)
    }

    /// Create a copy of the config with a different default provider
    pub fn with_provider_override(&self, provider: &str) -> Result<Self> {
        // Validate that the provider is configured
        match provider {
            "anthropic" if self.providers.anthropic.is_none() => {
                return Err(anyhow::anyhow!(
                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
                    provider, provider
                ));
            }
            "databricks" if self.providers.databricks.is_none() => {
                return Err(anyhow::anyhow!(
                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
                    provider, provider
                ));
            }
            "embedded" if self.providers.embedded.is_none() => {
                return Err(anyhow::anyhow!(
                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
                    provider, provider
                ));
            }
            "openai" if self.providers.openai.is_none() => {
                return Err(anyhow::anyhow!(
                    "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
                    provider, provider
                ));
            }
            _ => {} // Provider is configured or unknown (will be caught later)
        }

        let mut config = self.clone();
        config.providers.default_provider = provider.to_string();
        Ok(config)
    }

    /// Create a copy of the config for coach mode in autonomous execution
    pub fn for_coach(&self) -> Result<Self> {
        self.with_provider_override(self.get_coach_provider())
    }

    /// Create a copy of the config for player mode in autonomous execution
    pub fn for_player(&self) -> Result<Self> {
        self.with_provider_override(self.get_player_provider())
    }
}

#[cfg(test)]
mod tests;
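`get_coach_provider` and `get_player_provider` fall back to `default_provider` through `Option::as_deref().unwrap_or(..)`, and `with_provider_override` rejects a known provider whose section is missing while letting unknown names through. A condensed, std-only sketch of both behaviors follows. It assumes a simplified, hypothetical `Providers` struct in which each provider section is reduced to a `bool` (the real config stores `Option<...Config>` values and clones the whole `Config`), and it folds the four match arms into one lookup.

```rust
/// Hypothetical stand-in for `ProvidersConfig`: booleans replace the
/// per-provider `Option<...Config>` sections.
#[derive(Debug, Clone)]
pub struct Providers {
    pub default_provider: String,
    pub coach: Option<String>,
    pub player: Option<String>,
    pub anthropic: bool,
    pub databricks: bool,
    pub embedded: bool,
    pub openai: bool,
}

impl Providers {
    /// Coach provider, falling back to `default_provider` when unset.
    pub fn coach_provider(&self) -> &str {
        self.coach.as_deref().unwrap_or(&self.default_provider)
    }

    /// Player provider, falling back to `default_provider` when unset.
    pub fn player_provider(&self) -> &str {
        self.player.as_deref().unwrap_or(&self.default_provider)
    }

    /// Copy with a new default provider, or an error if a known provider lacks a section.
    pub fn with_default(&self, provider: &str) -> Result<Self, String> {
        let configured = match provider {
            "anthropic" => self.anthropic,
            "databricks" => self.databricks,
            "embedded" => self.embedded,
            "openai" => self.openai,
            _ => true, // unknown names are left for later validation
        };
        if !configured {
            return Err(format!(
                "Provider '{}' is specified but not configured. Please add {} configuration to your config file.",
                provider, provider
            ));
        }
        let mut copy = self.clone();
        copy.default_provider = provider.to_string();
        Ok(copy)
    }
}
```

Computing a single `configured` flag keeps the error message in one place instead of repeating it per provider arm.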
131
crates/g3-config/src/tests.rs
Normal file
@@ -0,0 +1,131 @@
#[cfg(test)]
mod tests {
    use crate::Config;
    use std::fs;
    use tempfile::TempDir;

    #[test]
    fn test_coach_player_providers() {
        // Create a temporary directory for the test config
        let temp_dir = TempDir::new().unwrap();
        let config_path = temp_dir.path().join("test_config.toml");

        // Write a test configuration with coach and player providers
        let config_content = r#"
[providers]
default_provider = "databricks"
coach = "anthropic"
player = "embedded"

[providers.databricks]
host = "https://test.databricks.com"
token = "test-token"
model = "test-model"

[providers.anthropic]
api_key = "test-key"
model = "claude-3"

[providers.embedded]
model_path = "test.gguf"
model_type = "llama"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
"#;

        fs::write(&config_path, config_content).unwrap();

        // Load the configuration
        let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();

        // Test that the providers are correctly identified
        assert_eq!(config.providers.default_provider, "databricks");
        assert_eq!(config.get_coach_provider(), "anthropic");
        assert_eq!(config.get_player_provider(), "embedded");

        // Test creating coach config
        let coach_config = config.for_coach().unwrap();
        assert_eq!(coach_config.providers.default_provider, "anthropic");

        // Test creating player config
        let player_config = config.for_player().unwrap();
        assert_eq!(player_config.providers.default_provider, "embedded");
    }

    #[test]
    fn test_coach_player_fallback_to_default() {
        // Create a temporary directory for the test config
        let temp_dir = TempDir::new().unwrap();
        let config_path = temp_dir.path().join("test_config.toml");

        // Write a test configuration WITHOUT coach and player providers
        let config_content = r#"
[providers]
default_provider = "databricks"

[providers.databricks]
host = "https://test.databricks.com"
token = "test-token"
model = "test-model"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
"#;

        fs::write(&config_path, config_content).unwrap();

        // Load the configuration
        let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();

        // Test that coach and player fall back to default provider
        assert_eq!(config.get_coach_provider(), "databricks");
        assert_eq!(config.get_player_provider(), "databricks");

        // Test creating coach config (should use default)
        let coach_config = config.for_coach().unwrap();
        assert_eq!(coach_config.providers.default_provider, "databricks");

        // Test creating player config (should use default)
        let player_config = config.for_player().unwrap();
        assert_eq!(player_config.providers.default_provider, "databricks");
    }

    #[test]
    fn test_invalid_provider_error() {
        // Create a temporary directory for the test config
        let temp_dir = TempDir::new().unwrap();
        let config_path = temp_dir.path().join("test_config.toml");

        // Write a test configuration with an unconfigured provider
        let config_content = r#"
[providers]
default_provider = "databricks"
coach = "openai" # OpenAI is not configured

[providers.databricks]
host = "https://test.databricks.com"
token = "test-token"
model = "test-model"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
"#;

        fs::write(&config_path, config_content).unwrap();

        // Load the configuration
        let config = Config::load(Some(config_path.to_str().unwrap())).unwrap();

        // Test that trying to create a coach config with unconfigured provider fails
        let result = config.for_coach();
        assert!(result.is_err());
        assert!(result.unwrap_err().to_string().contains("not configured"));
    }
}
@@ -8,6 +8,7 @@ description = "Core engine for G3 AI coding agent"
g3-providers = { path = "../g3-providers" }
g3-config = { path = "../g3-config" }
g3-execution = { path = "../g3-execution" }
g3-computer-control = { path = "../g3-computer-control" }
tokio = { workspace = true }
reqwest = { workspace = true }
anyhow = { workspace = true }
@@ -23,3 +24,4 @@ futures-util = "0.3"
chrono = { version = "0.4", features = ["serde"] }
rand = "0.8"
regex = "1.0"
shellexpand = "3.1"
@@ -156,15 +156,15 @@ pub fn fixed_filter_json_tool_calls(content: &str) -> String {
            }

            // No JSON tool call detected, return only the new content we haven't returned yet
            let new_content = if state.buffer.len() > state.content_returned_up_to {


            if state.buffer.len() > state.content_returned_up_to {
                let result = state.buffer[state.content_returned_up_to..].to_string();
                state.content_returned_up_to = state.buffer.len();
                result
            } else {
                String::new()
            };

            new_content
        }
    })
}
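The hunk above hands back only the part of the buffer that has not been returned yet and then advances `content_returned_up_to`. Stripped of the JSON tool-call detection, that bookkeeping reduces to the sketch below, written with the `if`/`else` as the tail expression rather than through an intermediate `let new_content`. `StreamState` and `take_new_content` are hypothetical names standing in for the filter's internal state.

```rust
/// Hypothetical stand-in for the streaming filter's state.
#[derive(Default)]
pub struct StreamState {
    pub buffer: String,
    pub content_returned_up_to: usize,
}

impl StreamState {
    /// Append a streamed chunk to the buffer.
    pub fn push(&mut self, chunk: &str) {
        self.buffer.push_str(chunk);
    }

    /// Return everything buffered since the previous call (empty if nothing is new).
    pub fn take_new_content(&mut self) -> String {
        if self.buffer.len() > self.content_returned_up_to {
            let result = self.buffer[self.content_returned_up_to..].to_string();
            self.content_returned_up_to = self.buffer.len();
            result
        } else {
            String::new()
        }
    }
}
```

Because the offset is a byte index into a `String` that is only ever appended to, it always lands on a character boundary.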
File diff suppressed because it is too large
@@ -104,6 +104,7 @@ impl Project {
    }

    /// Recursively check a directory for implementation files
    #[allow(clippy::only_used_in_recursion)]
    fn check_dir_for_implementation_files(&self, dir: &Path) -> bool {
        // Common source file extensions
        let extensions = vec![
37
crates/g3-core/src/take_screenshot_test.rs
Normal file
@@ -0,0 +1,37 @@
// Tests for take_screenshot arguments with and without window_id (argument shape only; the tool is not executed)

#[cfg(test)]
mod take_screenshot_tests {
    use super::*;
    use serde_json::json;

    #[test]
    fn test_take_screenshot_requires_window_id() {
        // Create a tool call without window_id
        let tool_call = ToolCall {
            tool: "take_screenshot".to_string(),
            args: json!({
                "path": "test.png"
            }),
        };

        // Verify that window_id is missing
        assert!(tool_call.args.get("window_id").is_none());
    }

    #[test]
    fn test_take_screenshot_with_window_id() {
        // Create a tool call with window_id
        let tool_call = ToolCall {
            tool: "take_screenshot".to_string(),
            args: json!({
                "path": "test.png",
                "window_id": "Safari"
            }),
        };

        // Verify that window_id is present
        assert!(tool_call.args.get("window_id").is_some());
        assert_eq!(tool_call.args.get("window_id").unwrap().as_str().unwrap(), "Safari");
    }
}
36
crates/g3-core/src/tilde_expansion_tests.rs
Normal file
@@ -0,0 +1,36 @@
#[cfg(test)]
mod tilde_expansion_tests {
    use std::env;

    #[test]
    fn test_tilde_expansion() {
        // Test that shellexpand works
        let path_with_tilde = "~/test.txt";
        let expanded = shellexpand::tilde(path_with_tilde);

        // Get the actual home directory
        let home = env::var("HOME").expect("HOME environment variable not set");

        // Verify expansion happened
        assert_eq!(expanded.as_ref(), format!("{}/test.txt", home));
        assert!(!expanded.contains("~"));
    }

    #[test]
    fn test_tilde_expansion_with_subdirs() {
        let path_with_tilde = "~/Documents/test.txt";
        let expanded = shellexpand::tilde(path_with_tilde);

        let home = env::var("HOME").expect("HOME environment variable not set");

        assert_eq!(expanded.as_ref(), format!("{}/Documents/test.txt", home));
    }

    #[test]
    fn test_no_tilde_unchanged() {
        let path_without_tilde = "/absolute/path/test.txt";
        let expanded = shellexpand::tilde(path_without_tilde);

        assert_eq!(expanded.as_ref(), path_without_tilde);
    }
}
@@ -17,6 +17,9 @@ pub trait UiWriter: Send + Sync {
    /// Print a context window status message
    fn print_context_status(&self, message: &str);

    /// Print a context thinning success message with highlight and animation
    fn print_context_thinning(&self, message: &str);

    /// Print a tool execution header
    fn print_tool_header(&self, tool_name: &str);

@@ -49,6 +52,10 @@ pub trait UiWriter: Send + Sync {

    /// Flush any buffered output
    fn flush(&self);

    /// Returns true if this UI writer wants full, untruncated output
    /// Default is false (truncate for human readability)
    fn wants_full_output(&self) -> bool { false }
}

/// A no-op implementation for when UI output is not needed
@@ -60,6 +67,7 @@ impl UiWriter for NullUiWriter {
    fn print_inline(&self, _message: &str) {}
    fn print_system_prompt(&self, _prompt: &str) {}
    fn print_context_status(&self, _message: &str) {}
    fn print_context_thinning(&self, _message: &str) {}
    fn print_tool_header(&self, _tool_name: &str) {}
    fn print_tool_arg(&self, _key: &str, _value: &str) {}
    fn print_tool_output_header(&self) {}
@@ -71,4 +79,5 @@ impl UiWriter for NullUiWriter {
    fn print_agent_response(&self, _content: &str) {}
    fn notify_sse_received(&self) {}
    fn flush(&self) {}
    fn wants_full_output(&self) -> bool { false }
}
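`wants_full_output` is a provided trait method, so implementors inherit `false` unless they override it. The sketch below shows how a caller might use that flag to decide whether to truncate tool output. The `Writer` trait, the `render_output` helper, the `max_lines` cut-off, and the truncation marker are illustrative assumptions, not g3's actual truncation logic.

```rust
pub trait Writer {
    /// Provided method: truncate by default for human readers.
    fn wants_full_output(&self) -> bool {
        false
    }
}

/// Console-style writer that keeps the default.
pub struct Console;
impl Writer for Console {}

/// Machine-facing writer that opts into full output.
pub struct MachineWriter;
impl Writer for MachineWriter {
    fn wants_full_output(&self) -> bool {
        true
    }
}

/// Keep at most `max_lines` lines unless the writer asks for everything.
pub fn render_output(writer: &dyn Writer, output: &str, max_lines: usize) -> String {
    if writer.wants_full_output() {
        return output.to_string();
    }
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        output.to_string()
    } else {
        format!(
            "{}\n[truncated: {} of {} lines shown]",
            lines[..max_lines].join("\n"),
            max_lines,
            lines.len()
        )
    }
}
```

Putting the default on the trait means existing implementors such as a no-op writer need no change unless they want different behavior.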
270
crates/g3-core/tests/test_context_thinning.rs
Normal file
@@ -0,0 +1,270 @@
use g3_core::ContextWindow;
use g3_providers::{Message, MessageRole};

#[test]
fn test_thinning_thresholds() {
    let mut context = ContextWindow::new(10000);

    // At 0%, should not thin
    assert!(!context.should_thin());

    // Simulate reaching 50% usage
    context.used_tokens = 5000;
    assert!(context.should_thin());

    // After thinning at 50%, should not thin again until the next threshold
    context.last_thinning_percentage = 50;
    assert!(!context.should_thin());

    // At 60%, should thin again
    context.used_tokens = 6000;
    assert!(context.should_thin());

    // After thinning at 60%, should not thin again at 60%
    context.last_thinning_percentage = 60;
    assert!(!context.should_thin());

    // At 70%, should thin
    context.used_tokens = 7000;
    assert!(context.should_thin());

    // After thinning at 70%, reaching 80% should thin
    context.last_thinning_percentage = 70;
    context.used_tokens = 8000;
    assert!(context.should_thin());

    // Above 80%, thinning stops (compaction takes over)
    context.last_thinning_percentage = 80;
    context.used_tokens = 8500;
    assert!(!context.should_thin());
}
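`test_thinning_thresholds` pins down the trigger rule: thin when usage reaches a 10-point step between 50% and 80% that has not been thinned at yet, and never above the 80% step. The function below is a reconstruction inferred only from those assertions, not the actual `ContextWindow::should_thin`; the threshold list and the integer-percentage rounding are assumptions.

```rust
/// Thinning thresholds in percent of the context window (inferred from the test).
const THRESHOLDS: [u32; 4] = [50, 60, 70, 80];

/// True when usage has reached a threshold above the last one thinned at.
pub fn should_thin(used_tokens: u32, total_tokens: u32, last_thinning_percentage: u32) -> bool {
    if total_tokens == 0 {
        return false;
    }
    // Integer percentage, rounded down.
    let percentage = (used_tokens as u64 * 100 / total_tokens as u64) as u32;
    THRESHOLDS
        .iter()
        .any(|&t| percentage >= t && last_thinning_percentage < t)
}
```

With a 10,000-token window, 8,500 used tokens and a last thinning at 80% yield `false`, because no threshold lies above 80.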

#[test]
fn test_thin_context_basic() {
    let mut context = ContextWindow::new(10000);

    // Add 9 messages, alternating assistant messages and tool results
    for i in 0..9 {
        if i % 2 == 0 {
            context.add_message(Message {
                role: MessageRole::Assistant,
                content: format!("Assistant message {}", i),
            });
        } else {
            // Add tool results with varying sizes
            let content = if i == 1 {
                // Large tool result (> 1000 chars)
                format!("Tool result: {}", "x".repeat(1500))
            } else if i == 3 {
                // Another large tool result
                format!("Tool result: {}", "y".repeat(2000))
            } else {
                // Small tool result (< 1000 chars)
                format!("Tool result: small result {}", i)
            };

            context.add_message(Message {
                role: MessageRole::User,
                content,
            });
        }
    }

    // Trigger thinning at 50%
    context.used_tokens = 5000;
    let (summary, _chars_saved) = context.thin_context();

    println!("Thinning summary: {}", summary);

    // Should have thinned at least 1 large tool result in the first third
    assert!(summary.contains("1 tool result"), "Summary was: {}", summary);
    assert!(summary.contains("50%"));

    // Check that the large tool results were replaced
    let first_third_end = context.conversation_history.len() / 3;
    for i in 0..first_third_end {
        if let Some(msg) = context.conversation_history.get(i) {
            if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
                if msg.content.len() > 1000 {
                    panic!("Found un-thinned large tool result at index {}", i);
                }
            }
        }
    }
}

#[test]
fn test_thin_write_file_tool_calls() {
    let mut context = ContextWindow::new(10000);

    // Add some messages including a write_file tool call with large content
    context.add_message(Message {
        role: MessageRole::User,
        content: "Please create a large file".to_string(),
    });

    // Add an assistant message with a write_file tool call containing large content
    let large_content = "x".repeat(1500);
    let tool_call_json = format!(
        r#"{{"tool": "write_file", "args": {{"file_path": "test.txt", "content": "{}"}}}}"#,
        large_content
    );
    context.add_message(Message {
        role: MessageRole::Assistant,
        content: format!("I'll create that file.\n\n{}", tool_call_json),
    });

    context.add_message(Message {
        role: MessageRole::User,
        content: "Tool result: ✅ Successfully wrote 1500 lines".to_string(),
    });

    // Add more messages to ensure we have enough for "first third" logic
    for i in 0..6 {
        context.add_message(Message {
            role: MessageRole::Assistant,
            content: format!("Response {}", i),
        });
    }

    // Trigger thinning at 50%
    context.used_tokens = 5000;
    let (summary, _chars_saved) = context.thin_context();

    println!("Thinning summary: {}", summary);

    // Should have thinned the write_file tool call
    assert!(summary.contains("tool call") || summary.contains("chars saved"));

    // Check that the large content was replaced with a file reference
    let first_third_end = context.conversation_history.len() / 3;
    for i in 0..first_third_end {
        if let Some(msg) = context.conversation_history.get(i) {
            if matches!(msg.role, MessageRole::Assistant) && msg.content.contains("write_file") {
                // The content should now reference an external file
                assert!(msg.content.contains("<content saved to"));
                assert!(!msg.content.contains(&large_content));
            }
        }
    }
}

#[test]
fn test_thin_str_replace_tool_calls() {
    let mut context = ContextWindow::new(10000);

    // Add some messages including a str_replace tool call with large diff
    context.add_message(Message {
        role: MessageRole::User,
        content: "Please update the file".to_string(),
    });

    // Add an assistant message with a str_replace tool call containing large diff
    let large_diff = format!("--- old\n{}\n+++ new\n{}", "-old line\n".repeat(100), "+new line\n".repeat(100));
    let tool_call_json = format!(
        r#"{{"tool": "str_replace", "args": {{"file_path": "test.txt", "diff": "{}"}}}}"#,
        large_diff.replace('\n', "\\n")
    );
    context.add_message(Message {
        role: MessageRole::Assistant,
        content: format!("I'll update that file.\n\n{}", tool_call_json),
    });

    context.add_message(Message {
        role: MessageRole::User,
        content: "Tool result: ✅ applied unified diff".to_string(),
    });

    // Add more messages to ensure we have enough for "first third" logic
    for i in 0..6 {
        context.add_message(Message {
            role: MessageRole::Assistant,
            content: format!("Response {}", i),
        });
    }

    // Trigger thinning at 50%
    context.used_tokens = 5000;
    let (summary, _chars_saved) = context.thin_context();

    println!("Thinning summary: {}", summary);

    // Should have thinned the str_replace tool call
    assert!(summary.contains("tool call") || summary.contains("chars saved"));

    // Check that the large diff was replaced with a file reference
    let first_third_end = context.conversation_history.len() / 3;
    for i in 0..first_third_end {
        if let Some(msg) = context.conversation_history.get(i) {
            if matches!(msg.role, MessageRole::Assistant) && msg.content.contains("str_replace") {
                // The diff should now reference an external file
                assert!(msg.content.contains("<diff saved to"));
                // Should not contain the large diff content
                assert!(!msg.content.contains("old line"));
            }
        }
    }
}

#[test]
fn test_thin_context_no_large_results() {
    let mut context = ContextWindow::new(10000);

    // Add only small messages
    for i in 0..9 {
        context.add_message(Message {
            role: MessageRole::User,
            content: format!("Tool result: small {}", i),
        });
    }

    context.used_tokens = 5000;
    let (summary, _chars_saved) = context.thin_context();

    // Should report no large results found
    assert!(summary.contains("no large tool results or tool calls found"));
}

#[test]
fn test_thin_context_only_affects_first_third() {
    let mut context = ContextWindow::new(10000);

    // Add 12 messages (first third = 4 messages)
    for i in 0..12 {
        let content = if i % 2 == 1 {
            // All odd indices are large tool results
            format!("Tool result: {}", "x".repeat(1500))
        } else {
            format!("Assistant message {}", i)
        };

        let role = if i % 2 == 1 {
            MessageRole::User
        } else {
            MessageRole::Assistant
        };

        context.add_message(Message { role, content });
    }

    context.used_tokens = 5000;
    let (summary, _chars_saved) = context.thin_context();

    // First third is 4 messages (indices 0-3), so only indices 1 and 3 should be thinned
    // That's 2 tool results
    assert!(summary.contains("2 tool results"));

    // Check that messages after the first third are NOT thinned
    let first_third_end = context.conversation_history.len() / 3;
    for i in first_third_end..context.conversation_history.len() {
        if let Some(msg) = context.conversation_history.get(i) {
            if matches!(msg.role, MessageRole::User) && msg.content.starts_with("Tool result:") {
                // These should still be large (not thinned)
                if i % 2 == 1 {
                    assert!(msg.content.len() > 1000,
                        "Message at index {} should not have been thinned", i);
                }
            }
        }
    }
}
|
||||
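The second test pins down the scoping rule for `thin_context`: only the first third of the history (integer division, so indices 0 to 3 of 12) is eligible, and only large `Tool result:` messages inside it are touched. A minimal standalone sketch of that selection step, assuming a simplified `Msg` type and a hypothetical 1000-character threshold (the test only shows that 1500-character results count as large and that untouched ones stay above 1000). Neither name comes from `ContextWindow`.

```rust
/// Hypothetical, simplified view of a conversation message (not G3's `Message`).
pub struct Msg {
    pub is_user: bool,
    pub content: String,
}

/// Assumed size above which a tool result counts as "large".
pub const LARGE_THRESHOLD: usize = 1000;

/// Returns the indices a first-third thinning pass would touch:
/// user messages starting with "Tool result:" whose content exceeds
/// the threshold, restricted to indices `0..len / 3`.
pub fn thinning_candidates(history: &[Msg]) -> Vec<usize> {
    let first_third_end = history.len() / 3;
    history[..first_third_end]
        .iter()
        .enumerate()
        .filter(|(_, m)| {
            m.is_user && m.content.starts_with("Tool result:") && m.content.len() > LARGE_THRESHOLD
        })
        .map(|(i, _)| i)
        .collect()
}
```

Applied to the 12-message history from the test, this selects indices 1 and 3, which matches the "2 tool results" the test expects. Applied to the nine small results in the first test, it selects nothing.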
@@ -166,6 +166,31 @@ impl CodeExecutor {

    /// Execute Bash code
    async fn execute_bash(&self, code: &str) -> Result<ExecutionResult> {
        // Check if this is a detached/daemon command that should run independently
        let is_detached = code.trim_start().starts_with("setsid ")
            || code.trim_start().starts_with("nohup ")
            || code.contains(" disown")
            || (code.contains(" &") && (code.contains("nohup") || code.contains("setsid")));

        if is_detached {
            // For detached commands, just spawn and return immediately
            use std::process::Stdio;
            Command::new("bash")
                .arg("-c")
                .arg(code)
                .stdin(Stdio::null())
                .stdout(Stdio::null())
                .stderr(Stdio::null())
                .spawn()?;

            return Ok(ExecutionResult {
                stdout: "✅ Command launched in background (detached process)".to_string(),
                stderr: String::new(),
                exit_code: 0,
                success: true,
            });
        }

        let output = Command::new("bash")
            .arg("-c")
            .arg(code)
@@ -221,6 +246,29 @@ impl CodeExecutor {
        use tokio::io::{AsyncBufReadExt, BufReader};
        use tokio::process::Command as TokioCommand;

        // Check if this is a detached/daemon command that should run independently
        // Look for patterns like: setsid, nohup with &, or explicit backgrounding with disown
        let is_detached = code.trim_start().starts_with("setsid ")
            || code.trim_start().starts_with("nohup ")
            || code.contains(" disown")
            || (code.contains(" &") && (code.contains("nohup") || code.contains("setsid")));

        if is_detached {
            // For detached commands, just spawn and return immediately
            TokioCommand::new("bash")
                .arg("-c")
                .arg(code)
                .spawn()?;

            // Don't wait for the process - it's meant to run independently
            return Ok(ExecutionResult {
                stdout: "✅ Command launched in background (detached process)".to_string(),
                stderr: String::new(),
                exit_code: 0,
                success: true,
            });
        }

        let mut child = TokioCommand::new("bash")
            .arg("-c")
            .arg(code)
@@ -259,7 +307,7 @@ impl CodeExecutor {
                line = stderr_lines.next_line() => {
                    match line {
                        Ok(Some(line)) => {
                            receiver.on_output_line(&format!("{}", line));
                            receiver.on_output_line(&line.to_string());
                            stderr_output.push(line);
                        }
                        Ok(None) => {}, // stderr EOF, continue

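Both `execute_bash` variants above repeat the same string heuristic for spotting detached commands. Factoring it into a pure function would make it testable on its own. The sketch below copies the condition from the hunks. The name `is_detached_command` is hypothetical and does not appear in the diff.

```rust
/// Heuristic copied from the `execute_bash` hunks: treat a command as
/// detached if it starts with `setsid `/`nohup `, contains ` disown`, or
/// contains ` &` while also mentioning `nohup` or `setsid`.
/// (`is_detached_command` is a hypothetical helper name.)
pub fn is_detached_command(code: &str) -> bool {
    let trimmed = code.trim_start();
    trimmed.starts_with("setsid ")
        || trimmed.starts_with("nohup ")
        || code.contains(" disown")
        || (code.contains(" &") && (code.contains("nohup") || code.contains("setsid")))
}
```

Because the check is substring-based, it has blind spots in both directions. A plain `sleep 10 &` is not treated as detached, so it still goes through the waiting path. A ` disown` that only appears inside a quoted string is treated as detached.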
@@ -156,8 +156,9 @@ impl AnthropicProvider {
            .post(ANTHROPIC_API_URL)
            .header("x-api-key", &self.api_key)
            .header("anthropic-version", ANTHROPIC_VERSION)
            // Anthropic beta 1m context window. Enable if needed. It costs extra, so check first.
            // .header("anthropic-beta", "context-1m-2025-08-07")
            .header("content-type", "application/json");

        if streaming {
            builder = builder.header("accept", "text/event-stream");
        }

@@ -213,7 +213,7 @@ impl DatabricksProvider {

        let mut builder = self
            .client
            .post(&format!(
            .post(format!(
                "{}/serving-endpoints/{}/invocations",
                self.host, self.model
            ))
@@ -881,6 +881,14 @@ impl LLMProvider for DatabricksProvider {
            "Processing Databricks streaming request with {} messages",
            request.messages.len()
        );

        // Debug: Log tool count
        if let Some(ref tools) = request.tools {
            debug!("Request has {} tools", tools.len());
            for tool in tools.iter().take(5) {
                debug!(" Tool: {}", tool.name);
            }
        }

        let max_tokens = request.max_tokens.unwrap_or(self.max_tokens);
        let temperature = request.temperature.unwrap_or(self.temperature);

@@ -88,10 +88,12 @@ pub mod anthropic;
pub mod databricks;
pub mod embedded;
pub mod oauth;
pub mod openai;

pub use anthropic::AnthropicProvider;
pub use databricks::DatabricksProvider;
pub use embedded::EmbeddedProvider;
pub use openai::OpenAIProvider;

/// Provider registry for managing multiple LLM providers
pub struct ProviderRegistry {

@@ -102,7 +102,7 @@ async fn get_workspace_endpoints(host: &str) -> Result<OidcEndpoints> {
    if !resp.status().is_success() {
        return Err(anyhow::anyhow!(
            "Failed to get OIDC configuration from {}",
            oidc_url.to_string()
            oidc_url
        ));
    }

495
crates/g3-providers/src/openai.rs
Normal file
@@ -0,0 +1,495 @@
use anyhow::Result;
use async_trait::async_trait;
use bytes::Bytes;
use futures_util::stream::StreamExt;
use reqwest::Client;
use serde::Deserialize;
use serde_json::json;
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
use tracing::{debug, error};

use crate::{
    CompletionChunk, CompletionRequest, CompletionResponse, CompletionStream, LLMProvider,
    Message, MessageRole, Tool, ToolCall, Usage,
};

#[derive(Clone)]
pub struct OpenAIProvider {
    client: Client,
    api_key: String,
    model: String,
    base_url: String,
    max_tokens: Option<u32>,
    _temperature: Option<f32>,
}

impl OpenAIProvider {
    pub fn new(
        api_key: String,
        model: Option<String>,
        base_url: Option<String>,
        max_tokens: Option<u32>,
        temperature: Option<f32>,
    ) -> Result<Self> {
        Ok(Self {
            client: Client::new(),
            api_key,
            model: model.unwrap_or_else(|| "gpt-4o".to_string()),
            base_url: base_url.unwrap_or_else(|| "https://api.openai.com/v1".to_string()),
            max_tokens,
            _temperature: temperature,
        })
    }

    fn create_request_body(
        &self,
        messages: &[Message],
        tools: Option<&[Tool]>,
        stream: bool,
        max_tokens: Option<u32>,
        _temperature: Option<f32>,
    ) -> serde_json::Value {
        let mut body = json!({
            "model": self.model,
            "messages": convert_messages(messages),
            "stream": stream,
        });

        if let Some(max_tokens) = max_tokens.or(self.max_tokens) {
            body["max_completion_tokens"] = json!(max_tokens);
        }

        // OpenAI calls with temp setting seem to fail, so don't send one.
        // if let Some(temperature) = temperature.or(self.temperature) {
        // body["temperature"] = json!(temperature);
        // }

        if let Some(tools) = tools {
            if !tools.is_empty() {
                body["tools"] = json!(convert_tools(tools));
            }
        }

        if stream {
            body["stream_options"] = json!({
                "include_usage": true,
            });
        }

        body
    }

    async fn parse_streaming_response(
        &self,
        mut stream: impl futures_util::Stream<Item = reqwest::Result<Bytes>> + Unpin,
        tx: mpsc::Sender<Result<CompletionChunk>>,
    ) -> Option<Usage> {
        let mut buffer = String::new();
        let mut accumulated_content = String::new();
        let mut accumulated_usage: Option<Usage> = None;
        let mut current_tool_calls: Vec<OpenAIStreamingToolCall> = Vec::new();

        while let Some(chunk_result) = stream.next().await {
            match chunk_result {
                Ok(chunk) => {
                    let chunk_str = match std::str::from_utf8(&chunk) {
                        Ok(s) => s,
                        Err(e) => {
                            error!("Failed to parse chunk as UTF-8: {}", e);
                            continue;
                        }
                    };

                    buffer.push_str(chunk_str);

                    // Process complete lines
                    while let Some(line_end) = buffer.find('\n') {
                        let line = buffer[..line_end].trim().to_string();
                        buffer.drain(..line_end + 1);

                        if line.is_empty() {
                            continue;
                        }

                        // Parse Server-Sent Events format
                        if let Some(data) = line.strip_prefix("data: ") {
                            if data == "[DONE]" {
                                debug!("Received stream completion marker");

                                // Send final chunk with accumulated content and tool calls
                                if !accumulated_content.is_empty() || !current_tool_calls.is_empty() {
                                    let tool_calls = if current_tool_calls.is_empty() {
                                        None
                                    } else {
                                        Some(
                                            current_tool_calls
                                                .iter()
                                                .filter_map(|tc| tc.to_tool_call())
                                                .collect(),
                                        )
                                    };

                                    let final_chunk = CompletionChunk {
                                        content: accumulated_content.clone(),
                                        finished: true,
                                        tool_calls,
                                        usage: accumulated_usage.clone(),
                                    };
                                    let _ = tx.send(Ok(final_chunk)).await;
                                }

                                return accumulated_usage;
                            }

                            // Parse the JSON data
                            match serde_json::from_str::<OpenAIStreamChunk>(data) {
                                Ok(chunk_data) => {
                                    // Handle content
                                    for choice in &chunk_data.choices {
                                        if let Some(content) = &choice.delta.content {
                                            accumulated_content.push_str(content);

                                            let chunk = CompletionChunk {
                                                content: content.clone(),
                                                finished: false,
                                                tool_calls: None,
                                                usage: None,
                                            };
                                            if tx.send(Ok(chunk)).await.is_err() {
                                                debug!("Receiver dropped, stopping stream");
                                                return accumulated_usage;
                                            }
                                        }

                                        // Handle tool calls
                                        if let Some(delta_tool_calls) = &choice.delta.tool_calls {
                                            for delta_tool_call in delta_tool_calls {
                                                if let Some(index) = delta_tool_call.index {
                                                    // Ensure we have enough tool calls in our vector
                                                    while current_tool_calls.len() <= index {
                                                        current_tool_calls
                                                            .push(OpenAIStreamingToolCall::default());
                                                    }

                                                    let tool_call = &mut current_tool_calls[index];

                                                    if let Some(id) = &delta_tool_call.id {
                                                        tool_call.id = Some(id.clone());
                                                    }

                                                    if let Some(function) = &delta_tool_call.function {
                                                        if let Some(name) = &function.name {
                                                            tool_call.name = Some(name.clone());
                                                        }
                                                        if let Some(arguments) = &function.arguments {
                                                            tool_call.arguments.push_str(arguments);
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }

                                    // Handle usage
                                    if let Some(usage) = chunk_data.usage {
                                        accumulated_usage = Some(Usage {
                                            prompt_tokens: usage.prompt_tokens,
                                            completion_tokens: usage.completion_tokens,
                                            total_tokens: usage.total_tokens,
                                        });
                                    }
                                }
                                Err(e) => {
                                    debug!("Failed to parse stream chunk: {} - Data: {}", e, data);
                                }
                            }
                        }
                    }
                }
                Err(e) => {
                    error!("Stream error: {}", e);
                    let _ = tx.send(Err(anyhow::anyhow!("Stream error: {}", e))).await;
                    return accumulated_usage;
                }
            }
        }

        // Send final chunk if we haven't already
        let tool_calls = if current_tool_calls.is_empty() {
            None
        } else {
            Some(
                current_tool_calls
                    .iter()
                    .filter_map(|tc| tc.to_tool_call())
                    .collect(),
            )
        };

        let final_chunk = CompletionChunk {
            content: String::new(),
            finished: true,
            tool_calls,
            usage: accumulated_usage.clone(),
        };
        let _ = tx.send(Ok(final_chunk)).await;

        accumulated_usage
    }
}

#[async_trait]
impl LLMProvider for OpenAIProvider {
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse> {
        debug!(
            "Processing OpenAI completion request with {} messages",
            request.messages.len()
        );

        let body = self.create_request_body(
            &request.messages,
            request.tools.as_deref(),
            false,
            request.max_tokens,
            request.temperature,
        );

        debug!("Sending request to OpenAI API: model={}", self.model);

        let response = self
            .client
            .post(format!("{}/chat/completions", self.base_url))
            .header("Authorization", format!("Bearer {}", self.api_key))
            .json(&body)
            .send()
            .await?;

        let status = response.status();
        if !status.is_success() {
            let error_text = response
                .text()
                .await
                .unwrap_or_else(|_| "Unknown error".to_string());
            return Err(anyhow::anyhow!("OpenAI API error {}: {}", status, error_text));
        }

        let openai_response: OpenAIResponse = response.json().await?;

        let content = openai_response
            .choices
            .first()
            .and_then(|choice| choice.message.content.clone())
            .unwrap_or_default();

        let usage = Usage {
            prompt_tokens: openai_response.usage.prompt_tokens,
            completion_tokens: openai_response.usage.completion_tokens,
            total_tokens: openai_response.usage.total_tokens,
        };

        debug!(
            "OpenAI completion successful: {} tokens generated",
            usage.completion_tokens
        );

        Ok(CompletionResponse {
            content,
            usage,
            model: self.model.clone(),
        })
    }

    async fn stream(&self, request: CompletionRequest) -> Result<CompletionStream> {
        debug!(
            "Processing OpenAI streaming request with {} messages",
            request.messages.len()
        );

        let body = self.create_request_body(
            &request.messages,
            request.tools.as_deref(),
            true,
            request.max_tokens,
            request.temperature,
        );

        debug!("Sending streaming request to OpenAI API: model={}", self.model);

        let response = self
            .client
            .post(format!("{}/chat/completions", self.base_url))
            .header("Authorization", format!("Bearer {}", self.api_key))
            .json(&body)
            .send()
            .await?;

        let status = response.status();
        if !status.is_success() {
            let error_text = response
                .text()
                .await
                .unwrap_or_else(|_| "Unknown error".to_string());
            return Err(anyhow::anyhow!("OpenAI API error {}: {}", status, error_text));
        }

        let stream = response.bytes_stream();
        let (tx, rx) = mpsc::channel(100);

        // Spawn task to process the stream
        let provider = self.clone();
        tokio::spawn(async move {
            let usage = provider.parse_streaming_response(stream, tx).await;
            // Log the final usage if available
            if let Some(usage) = usage {
                debug!(
                    "Stream completed with usage - prompt: {}, completion: {}, total: {}",
                    usage.prompt_tokens, usage.completion_tokens, usage.total_tokens
                );
            }
        });

        Ok(ReceiverStream::new(rx))
    }

    fn name(&self) -> &str {
        "openai"
    }

    fn model(&self) -> &str {
        &self.model
    }

    fn has_native_tool_calling(&self) -> bool {
        // OpenAI models support native tool calling
        true
    }
}

fn convert_messages(messages: &[Message]) -> Vec<serde_json::Value> {
    messages
        .iter()
        .map(|msg| {
            json!({
                "role": match msg.role {
                    MessageRole::System => "system",
                    MessageRole::User => "user",
                    MessageRole::Assistant => "assistant",
                },
                "content": msg.content,
            })
        })
        .collect()
}

fn convert_tools(tools: &[Tool]) -> Vec<serde_json::Value> {
    tools
        .iter()
        .map(|tool| {
            json!({
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.input_schema,
                }
            })
        })
        .collect()
}

// OpenAI API response structures
#[derive(Debug, Deserialize)]
struct OpenAIResponse {
    choices: Vec<OpenAIChoice>,
    usage: OpenAIUsage,
}

#[derive(Debug, Deserialize)]
struct OpenAIChoice {
    message: OpenAIMessage,
}

#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct OpenAIMessage {
    content: Option<String>,
    #[serde(default)]
    tool_calls: Option<Vec<OpenAIToolCall>>,
}

#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct OpenAIToolCall {
    id: String,
    function: OpenAIFunction,
}

#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct OpenAIFunction {
    name: String,
    arguments: String,
}

// Streaming tool call accumulator
#[derive(Debug, Default)]
struct OpenAIStreamingToolCall {
    id: Option<String>,
    name: Option<String>,
    arguments: String,
}

impl OpenAIStreamingToolCall {
    fn to_tool_call(&self) -> Option<ToolCall> {
        let id = self.id.as_ref()?;
        let name = self.name.as_ref()?;

        let args = serde_json::from_str(&self.arguments).unwrap_or(serde_json::Value::Null);

        Some(ToolCall {
            id: id.clone(),
            tool: name.clone(),
            args,
        })
    }
}

#[derive(Debug, Deserialize)]
struct OpenAIUsage {
    prompt_tokens: u32,
    completion_tokens: u32,
    total_tokens: u32,
}

// Streaming response structures
#[derive(Debug, Deserialize)]
struct OpenAIStreamChunk {
    choices: Vec<OpenAIStreamChoice>,
    usage: Option<OpenAIUsage>,
}

#[derive(Debug, Deserialize)]
struct OpenAIStreamChoice {
    delta: OpenAIDelta,
}

#[derive(Debug, Deserialize)]
struct OpenAIDelta {
    content: Option<String>,
    #[serde(default)]
    tool_calls: Option<Vec<OpenAIDeltaToolCall>>,
}

#[derive(Debug, Deserialize)]
struct OpenAIDeltaToolCall {
    index: Option<usize>,
    id: Option<String>,
    function: Option<OpenAIDeltaFunction>,
}

#[derive(Debug, Deserialize)]
struct OpenAIDeltaFunction {
    name: Option<String>,
    arguments: Option<String>,
}

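`parse_streaming_response` combines two steps that are easy to get subtly wrong. First, it buffers raw text until a full newline-terminated SSE line is available. Second, it merges tool-call fragments by their `index`, overwriting `id`/`name` and appending `arguments`. The following std-only sketch isolates both steps. It assumes string input instead of `reqwest` byte chunks and uses hypothetical names (`SseLines`, `PartialToolCall`, `apply_delta`) that do not appear in the diff.

```rust
/// Buffers streamed text and yields complete `data:` payloads,
/// mirroring the line-splitting loop in `parse_streaming_response`.
#[derive(Default)]
pub struct SseLines {
    buffer: String,
}

impl SseLines {
    /// Appends a chunk and returns the `data:` payloads of every complete line.
    /// A trailing partial line stays buffered until its newline arrives.
    pub fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buffer.push_str(chunk);
        let mut payloads = Vec::new();
        while let Some(line_end) = self.buffer.find('\n') {
            let line = self.buffer[..line_end].trim().to_string();
            self.buffer.drain(..line_end + 1);
            // Empty lines and non-`data:` lines are skipped.
            if let Some(data) = line.strip_prefix("data: ") {
                payloads.push(data.to_string());
            }
        }
        payloads
    }
}

/// A tool call assembled from streamed fragments.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PartialToolCall {
    pub id: Option<String>,
    pub name: Option<String>,
    pub arguments: String,
}

/// Merges one delta into the call at `index`, growing the vector as needed:
/// `id` and `name` are overwritten when present, `arguments` are appended.
pub fn apply_delta(
    calls: &mut Vec<PartialToolCall>,
    index: usize,
    id: Option<&str>,
    name: Option<&str>,
    arguments: Option<&str>,
) {
    while calls.len() <= index {
        calls.push(PartialToolCall::default());
    }
    let call = &mut calls[index];
    if let Some(id) = id {
        call.id = Some(id.to_string());
    }
    if let Some(name) = name {
        call.name = Some(name.to_string());
    }
    if let Some(args) = arguments {
        call.arguments.push_str(args);
    }
}
```

Growing the vector up to `index` means a gap left by an out-of-order index holds a default entry with no `id` or `name`. In the diff, `to_tool_call` returns `None` for such entries, so `filter_map` silently drops them from the final chunk.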
389
docs/ACCUMULATIVE_MODE.md
Normal file
@@ -0,0 +1,389 @@
# Accumulative Autonomous Mode

## Overview

Accumulative Autonomous Mode is the **new default interactive mode** for G3. It combines the ease of interactive chat with the power of autonomous implementation, allowing you to build projects iteratively by describing what you want, one requirement at a time.

## How It Works

### The Flow

1. **Start G3** in any directory (no arguments needed)
2. **Describe** what you want to build
3. **G3 automatically**:
   - Adds your input to accumulated requirements
   - Runs autonomous mode (coach-player feedback loop)
   - Implements your requirements with quality checks
4. **Continue** adding more requirements or refinements
5. **Repeat** until your project is complete

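The flow above amounts to a read, accumulate, run loop. Here is a minimal std-only sketch of that loop. It assumes a generic `BufRead` input and a caller-supplied `run` closure that stands in for the autonomous coach-player run. Both, along with the name `accumulate_session`, are hypothetical. G3 itself reads input through rustyline and calls its own autonomous entry point.

```rust
use std::io::BufRead;

/// Reads requirements line by line, appending each non-empty one and
/// invoking `run` with the full accumulated list after every addition.
/// Stops on `exit`/`quit` or end of input (Ctrl+D). Returns the final list.
pub fn accumulate_session<R: BufRead, F: FnMut(&[String])>(input: R, mut run: F) -> Vec<String> {
    let mut requirements: Vec<String> = Vec::new();
    for line in input.lines() {
        let Ok(line) = line else { break };
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // ignore blank input rather than running on nothing
        }
        if trimmed == "exit" || trimmed == "quit" {
            break;
        }
        requirements.push(trimmed.to_string());
        // Every run sees all requirements so far, not just the newest one.
        run(&requirements);
    }
    requirements
}
```

Passing the whole slice to `run` on each turn is what lets the second run in the example session work "with BOTH requirements".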
### Example Session

```bash
$ cd ~/projects/my-new-app
$ g3

🪿 G3 AI Coding Agent - Accumulative Mode
>> describe what you want, I'll build it iteratively

📁 Workspace: /Users/you/projects/my-new-app

💡 Each input you provide will be added to requirements
and I'll automatically work on implementing them.

Type 'exit' or 'quit' to stop, Ctrl+D to finish

============================================================
📝 What would you like me to build? (describe your requirements)
============================================================
requirement> create a simple web server in Python with Flask that serves a homepage

📋 Current instructions and requirements (Turn 1):
create a simple web server in Python with Flask that serves a homepage

🚀 Starting autonomous implementation...

🤖 G3 AI Coding Agent - Autonomous Mode
📁 Using workspace: /Users/you/projects/my-new-app
📋 Requirements loaded from --requirements flag
🔄 Starting coach-player feedback loop...
📂 No existing implementation files detected
🎯 Starting with player implementation

=== TURN 1/5 - PLAYER MODE ===
🎯 Starting player implementation...
📋 Player starting initial implementation (no prior coach feedback)

[Player creates files, writes code...]

=== TURN 1/5 - COACH MODE ===
🎓 Starting coach review...
🎓 Coach review completed
Coach feedback:
The Flask server is implemented correctly with a homepage route.
The code follows best practices and meets the requirements.
IMPLEMENTATION_APPROVED

=== SESSION COMPLETED - IMPLEMENTATION APPROVED ===
✅ Coach approved the implementation!

============================================================
📊 AUTONOMOUS MODE SESSION REPORT
============================================================
⏱️ Total Duration: 12.34s
🔄 Turns Taken: 1/5
📝 Final Status: ✅ APPROVED
...
============================================================

✅ Autonomous run completed

============================================================
📝 Turn 2 - What's next? (add more requirements or refinements)
============================================================
requirement> add a /api/users endpoint that returns a list of users as JSON

📋 Current instructions and requirements (Turn 2):
add a /api/users endpoint that returns a list of users as JSON

🚀 Starting autonomous implementation...

[Autonomous mode runs again with BOTH requirements...]

============================================================
📝 Turn 3 - What's next? (add more requirements or refinements)
============================================================
requirement> exit

👋 Goodbye!
```

## Key Features

### 1. Requirement Accumulation

Each input you provide is:
- **Numbered sequentially** (1, 2, 3, ...)
- **Stored in memory** for the session
- **Included in all subsequent runs**

This means the agent always has the full context of what you've asked for.

### 2. Automatic Requirements Document

G3 automatically generates a structured requirements document:

```markdown
# Project Requirements

## Current Instructions and Requirements:

1. create a simple web server in Python with Flask that serves a homepage
2. add a /api/users endpoint that returns a list of users as JSON
3. add error handling for 404 and 500 errors

## Latest Requirement (Turn 3):

add error handling for 404 and 500 errors
```

This document is passed to autonomous mode, ensuring the agent:
- Knows all previous requirements
- Focuses on the latest addition
- Maintains consistency across iterations

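Here is a sketch of how such a document could be assembled from the in-memory list of requirements. It reproduces the layout of the example above. `build_requirements_doc` is a hypothetical name, and the exact whitespace G3 emits may differ.

```rust
/// Renders accumulated requirements in the layout shown above:
/// a numbered list of every requirement, then the latest one on its own.
/// Returns `None` when nothing has been entered yet.
pub fn build_requirements_doc(requirements: &[String]) -> Option<String> {
    let latest = requirements.last()?;
    let mut doc = String::from(
        "# Project Requirements\n\n## Current Instructions and Requirements:\n\n",
    );
    for (i, req) in requirements.iter().enumerate() {
        // Numbering starts at 1, so requirement N was added on turn N.
        doc.push_str(&format!("{}. {}\n", i + 1, req));
    }
    doc.push_str(&format!(
        "\n## Latest Requirement (Turn {}):\n\n{}\n",
        requirements.len(),
        latest
    ));
    Some(doc)
}
```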
### 3. Full Autonomous Quality

Each requirement triggers a complete autonomous run with:
- **Coach-Player Feedback Loop**: Built-in quality assurance
- **Multiple Turns**: Up to 5 iterations per requirement (configurable with `--max-turns`)
- **Compilation Checks**: Verifies that the code actually builds
- **Testing**: Coach can run tests to verify functionality

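Each autonomous run can be pictured as a bounded alternation between player and coach. In this sketch, `player` and `coach` are hypothetical closures. Approval is detected by the `IMPLEMENTATION_APPROVED` marker that appears in the coach feedback of the example session. The compilation and test checks are left out, and none of these names come from G3's code.

```rust
/// Outcome of a bounded coach-player run.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Approved { turn: u32 },
    MaxTurnsReached,
}

/// Alternates player and coach for up to `max_turns` turns (G3's default is 5).
/// The coach's feedback is handed to the next player turn; the run stops as
/// soon as the feedback contains the approval marker.
pub fn coach_player_loop<P, C>(max_turns: u32, mut player: P, mut coach: C) -> Outcome
where
    P: FnMut(u32, Option<&str>),
    C: FnMut(u32) -> String,
{
    // The first player turn starts with no prior coach feedback.
    let mut feedback: Option<String> = None;
    for turn in 1..=max_turns {
        player(turn, feedback.as_deref());
        let review = coach(turn);
        if review.contains("IMPLEMENTATION_APPROVED") {
            return Outcome::Approved { turn };
        }
        feedback = Some(review);
    }
    Outcome::MaxTurnsReached
}
```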
### 4. Error Recovery

If an autonomous run fails:
- You're notified of the error
- You can provide additional requirements to fix issues
- The session continues instead of exiting

### 5. Workspace Management

- Uses the **current directory** as the workspace by default
- All files are created in the workspace
- No need to specify a workspace path
- Works with existing projects or empty directories

## Command-Line Options

### Default (Accumulative Mode)

```bash
g3
```

Starts accumulative autonomous mode in the current directory.

### With Options

```bash
# Use a specific workspace
g3 --workspace ~/projects/my-app

# Limit autonomous turns per requirement
g3 --max-turns 3

# Enable macOS Accessibility tools
g3 --macax

# Enable WebDriver browser automation
g3 --webdriver

# Use a specific provider/model
g3 --provider anthropic --model claude-3-5-sonnet-20241022

# Show prompts and code during execution
g3 --show-prompt --show-code

# Disable log files
g3 --quiet
```

### Disable Accumulative Mode

To use the traditional chat mode (without automatic autonomous runs):

```bash
g3 --chat

# Alternative: legacy flag also works
g3 --accumulative
```

This gives you the old behavior where you chat with the agent without automatic autonomous runs.

## Use Cases

### 1. Rapid Prototyping

```bash
requirement> create a REST API for a todo app
requirement> add SQLite database storage
requirement> add authentication with JWT
requirement> add rate limiting
```

### 2. Iterative Refinement

```bash
requirement> create a data visualization dashboard
requirement> make the charts interactive
requirement> add dark mode support
requirement> optimize for mobile devices
```

### 3. Bug Fixing

```bash
requirement> fix the login form validation
requirement> handle edge case when username is empty
requirement> add better error messages
```

### 4. Feature Addition

```bash
requirement> add export to CSV functionality
requirement> add email notifications
requirement> add admin dashboard
```

## Tips and Best Practices

### 1. Start Simple

Begin with a basic requirement, let it be implemented, then add complexity:

```bash
✅ Good:
requirement> create a basic Flask web server
requirement> add a homepage with a form
requirement> add form validation

❌ Too Complex:
requirement> create a full-stack web app with authentication, database, API, and frontend
```

### 2. Be Specific

The more specific you are, the better the results:

```bash
✅ Good:
requirement> add a /api/users endpoint that returns JSON with id, name, and email fields

❌ Vague:
requirement> add users
```

### 3. One Thing at a Time

Focus each requirement on a single feature or fix:

```bash
✅ Good:
requirement> add error handling for database connections
requirement> add logging for all API requests

❌ Multiple Things:
requirement> add error handling and logging and monitoring and alerts
```

### 4. Review Between Turns

After each autonomous run completes:
- Check the generated files
- Test the functionality
- Decide what to add or fix next

### 5. Use Exit Commands

When done:
- Type `exit` or `quit`
- Or press `Ctrl+D` (EOF)

Pressing `Ctrl+C` cancels the current input.

## Comparison with Other Modes

| Feature | Accumulative (Default) | Traditional Interactive | Autonomous | Single-Shot |
|---------|----------------------|------------------------|------------|-------------|
| **Command** | `g3` | `g3 --chat` (or legacy `g3 --accumulative`) | `g3 --autonomous` | `g3 "task"` |
| **Input Style** | Iterative prompts | Chat messages | requirements.md file | Command-line arg |
| **Auto-Autonomous** | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| **Coach-Player Loop** | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| **Accumulates Requirements** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Multiple Iterations** | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| **Best For** | Iterative development | Quick questions | Pre-planned projects | One-off tasks |

## Technical Details

### Requirements Storage

- Stored in memory (not persisted to disk)
- Numbered sequentially starting from 1
- Formatted as a numbered Markdown list
- Passed to autonomous mode as a `--requirements` override

### History

- Saved to `~/.g3_accumulative_history`
- Separate from traditional interactive history
- Persists across sessions
- Uses rustyline for readline support

### Workspace

- Defaults to current directory
- Can be overridden with `--workspace`
- All files created in workspace
- Logs saved to `workspace/logs/`

### Autonomous Execution

- Full coach-player feedback loop
- Configurable max turns (default: 5)
- Respects all CLI flags (`--macax`, `--webdriver`, etc.)
- A failed run does not end the session

## Troubleshooting

### "No requirements provided"

This shouldn't happen in accumulative mode, but if it does:
- Check that you entered a requirement
- Ensure the requirement isn't empty
- Try restarting G3

### "Autonomous run failed"

If an autonomous run fails:
- Read the error message
- Provide a new requirement to fix the issue
- Or type `exit` and investigate manually

### "Context window full"

If you hit token limits:
- The agent automatically summarizes its context
- Or you can start a new session
- Consider using `--max-turns` to limit iterations

### "Coach never approves"

If the coach keeps rejecting:
- Check the coach feedback for specific issues
- Provide more specific requirements
- Consider increasing `--max-turns`

## Future Enhancements

Planned improvements:

1. **Persistence**: Save accumulated requirements to disk
2. **Editing**: Edit or remove previous requirements
3. **Branching**: Try different approaches
4. **Templates**: Pre-defined requirement sets
5. **Review**: Show all accumulated requirements
6. **Export**: Save to `requirements.md`
7. **Undo**: Remove last requirement
8. **Replay**: Re-run with same requirements

## Feedback

This is a new feature! Please provide feedback:
- What works well?
- What's confusing?
- What features would you like?
- Any bugs or issues?

Open an issue on GitHub or contribute improvements!
39
test-ai-requirements.sh
Executable file
@@ -0,0 +1,39 @@
#!/bin/bash
# Test script for AI-enhanced interactive requirements mode

echo "Testing AI-enhanced interactive requirements mode..."
echo ""

# Create a test workspace
TEST_WORKSPACE="/tmp/g3-test-interactive-$(date +%s)"
mkdir -p "$TEST_WORKSPACE"

echo "Test workspace: $TEST_WORKSPACE"
echo ""

# Create sample brief input
BRIEF_INPUT="build a calculator cli in rust with basic operations"

echo "Brief input:"
echo "---"
echo "$BRIEF_INPUT"
echo "---"
echo ""

echo "This will:"
echo "1. Send brief input to AI"
echo "2. AI generates structured requirements.md"
echo "3. Show enhanced requirements"
echo "4. Prompt for confirmation (y/e/n)"
echo ""

echo "To test manually, run:"
echo "cargo run -- --autonomous --interactive-requirements --workspace $TEST_WORKSPACE"
echo ""
echo "Then type: $BRIEF_INPUT"
echo "Press Ctrl+D"
echo "Review the AI-generated requirements"
echo "Choose 'y' to proceed, 'e' to edit, or 'n' to cancel"
echo ""

echo "Test workspace will be at: $TEST_WORKSPACE"