g3/AGENTS.md

# AGENTS.md - Machine Instructions for G3

**Last updated**: January 2025
**Purpose**: Enable AI agents to work safely and effectively with this codebase

## System Overview

G3 is an AI coding agent built in Rust. It uses LLM providers to execute tasks through a tool-based interface. The codebase is organized as a Cargo workspace with 9 crates.

### Quick Reference

| Crate | Purpose | Stability |
|-------|---------|----------|
| `g3-core` | Agent engine, tools, context management | Stable |
| `g3-providers` | LLM provider abstractions | Stable |
| `g3-cli` | Command-line interface | Stable |
| `g3-config` | Configuration management | Stable |
| `g3-execution` | Code execution | Stable |
| `g3-computer-control` | Computer automation | Experimental |
| `g3-planner` | Planning mode | Stable |
| `g3-ensembles` | Multi-agent (flock) mode | Experimental |
| `g3-console` | Web monitoring console | Experimental |

## Critical Invariants

### MUST Hold

1. **Tool calls must be valid JSON** - The streaming parser expects well-formed tool calls
2. **Context window limits must be respected** - Exceeding limits causes API errors
3. **Provider trait implementations must be Send + Sync** - Required for async runtime
4. **Session IDs must be unique** - Used for log file paths and TODO scoping
5. **File paths in tools support tilde expansion** - `~` expands to home directory

### MUST NOT Do

1. **Never block the async runtime** - Use `tokio::spawn` for CPU-intensive work
2. **Never store secrets in logs** - API keys are redacted in error logs
3. **Never modify files outside working directory without explicit permission**
4. **Never assume tool results fit in context** - Large results are thinned automatically

## Recommended Entry Points

### For Understanding the System

1. `src/main.rs` - Entry point (trivial)
2. `crates/g3-cli/src/lib.rs` - CLI logic and execution modes
3. `crates/g3-core/src/lib.rs` - Agent struct and orchestration
4. `crates/g3-providers/src/lib.rs` - Provider trait definition

### For Adding Features

1. **New tool**: `crates/g3-core/src/tool_definitions.rs` → `crates/g3-core/src/tools/`
2. **New provider**: `crates/g3-providers/src/` → implement `LLMProvider` trait
3. **New CLI mode**: `crates/g3-cli/src/lib.rs`
4. **New config option**: `crates/g3-config/src/lib.rs`

### For Debugging

1. Session logs: `.g3/sessions/<session_id>/session.json`
2. Error logs: `logs/errors/`
3. Context state: Use `/stats` command in interactive mode

## Dangerous/Subtle Code Paths

### Context Window Management (`g3-core/src/context_window.rs`)

- **Thinning**: Automatically replaces large tool results with file references
- **Summarization**: Compresses conversation history at 80% capacity
- **Token estimation**: Uses character-based heuristics, not exact tokenization
- **Risk**: Incorrect token estimates can cause context overflow

### Streaming Parser (`g3-core/src/streaming_parser.rs`)

- Parses LLM responses in real-time for tool calls
- Must handle partial JSON across chunk boundaries
- **Risk**: Malformed responses can cause parsing failures

### Tool Dispatch (`g3-core/src/tool_dispatch.rs`)

- Routes tool calls to implementations
- Handles both native and JSON-based tool calling
- **Risk**: Missing dispatch cases cause silent failures

### Retry Logic (`g3-core/src/retry.rs`)

- Exponential backoff with jitter
- Different configs for interactive vs autonomous mode
- **Risk**: Aggressive retries can hit rate limits harder

## Performance Constraints

1. **Streaming is preferred** - Non-streaming requests block UI
2. **Tool results are size-limited** - Large outputs are truncated or thinned
3. **Concurrent tool calls** - Enabled by `allow_multiple_tool_calls` config
4. **Background processes** - Long-running commands use `background_process` tool

## Testing Strategy

### Test Locations

- Unit tests: `crates/*/tests/`
- Integration tests: `crates/*/tests/`
- Test fixtures: `examples/test_code/`

### Running Tests

```bash
# All tests
cargo test

# Specific crate
cargo test -p g3-core

# With output
cargo test -- --nocapture
```

### Test Considerations

- Provider tests may require API keys
- Computer control tests require OS permissions
- WebDriver tests require browser setup

## Do's and Don'ts for Automated Changes

### Do

- ✅ Run `cargo check` after modifications
- ✅ Run `cargo test` before committing
- ✅ Update tool definitions when adding tools
- ✅ Add tests for new functionality
- ✅ Use existing patterns for similar features
- ✅ Keep functions under 80 lines
- ✅ Update documentation for user-facing changes

### Don't

- ❌ Modify `Cargo.toml` dependencies without justification
- ❌ Add blocking code in async contexts
- ❌ Store sensitive data in plain text
- ❌ Ignore error handling
- ❌ Create deeply nested conditionals (>6 levels)
- ❌ Add external dependencies for simple tasks

## Common Incorrect Assumptions

1. **"All providers support tool calling"** - Embedded models use JSON fallback
2. **"Context window is unlimited"** - Each provider has limits (4k-200k tokens)
3. **"Tool results are always small"** - File reads can return megabytes
4. **"Sessions persist across runs"** - Sessions are ephemeral by default
5. **"All platforms are equal"** - macOS has more features (Vision, Accessibility)

## Architecture Decisions

See `DESIGN.md` for original design rationale.

Key decisions:
- **Rust for performance and safety** - Async runtime, memory safety
- **Workspace structure** - Separation of concerns, independent compilation
- **Provider abstraction** - Swap providers without code changes
- **Tool-first philosophy** - Agent acts through tools, not just advice
- **Session-scoped state** - TODO lists, logs tied to sessions

## File Structure Quick Reference

```
g3/
├── src/main.rs                    # Entry point
├── crates/
│   ├── g3-cli/src/
│   │   ├── lib.rs                 # CLI logic (~112k chars)
│   │   └── retro_tui.rs           # Retro TUI mode
│   ├── g3-core/src/
│   │   ├── lib.rs                 # Agent struct (~3400 lines)
│   │   ├── context_window.rs      # Context management
│   │   ├── tool_definitions.rs    # Tool schemas
│   │   ├── tool_dispatch.rs       # Tool routing
│   │   ├── tools/                 # Tool implementations
│   │   ├── streaming_parser.rs    # Response parsing
│   │   └── retry.rs               # Retry logic
│   ├── g3-providers/src/
│   │   ├── lib.rs                 # Provider trait
│   │   ├── anthropic.rs           # Anthropic Claude
│   │   ├── databricks.rs          # Databricks
│   │   ├── openai.rs              # OpenAI
│   │   └── embedded.rs            # Local models
│   ├── g3-config/src/lib.rs       # Configuration
│   ├── g3-planner/src/            # Planning mode
│   ├── g3-ensembles/src/          # Flock mode
│   └── g3-computer-control/src/   # Automation
├── agents/                         # Agent personas
├── docs/                           # Documentation
└── logs/                           # Session logs
```

## Pointers to Documentation

- [Architecture](docs/architecture.md) - System design and data flow
- [Configuration](docs/configuration.md) - Config file format and options
- [Tools Reference](docs/tools.md) - All available tools
- [Providers Guide](docs/providers.md) - LLM provider setup
- [Control Commands](docs/CONTROL_COMMANDS.md) - Interactive commands
- [Code Search](docs/CODE_SEARCH.md) - Tree-sitter search guide
- [Flock Mode](docs/FLOCK_MODE.md) - Multi-agent development

## Dependency Analysis Artifacts

The `analysis/deps/` directory contains static analysis artifacts generated by the Euler agent:

| File | Purpose |
|------|--------|
| `graph.json` | Raw dependency graph data (crate and file-level edges with evidence) |
| `graph.summary.md` | Overview metrics: crate counts, edge counts, fan-in/fan-out rankings |
| `sccs.md` | Strongly Connected Components analysis (cycle detection via Tarjan's algorithm) |
| `layers.observed.md` | Mechanically-derived layer diagram showing crate hierarchy and intra-crate module structure |
| `hotspots.md` | Coupling hotspots: files/crates with disproportionate fan-in or fan-out (>2× average) |
| `limitations.md` | Known limitations of the static analysis (conditional compilation, macros, re-exports) |

**Key findings:**
- No cycles detected at crate or file level (strict DAG structure)
- `g3-config` and `g3-providers` are the most depended-upon crates (fan-in: 4)
- `g3-cli` has highest fan-out (5 crate dependencies) as the composition root
- `ui_writer.rs` is the most imported file (11 dependents)
- `g3-core/src/lib.rs` has highest fan-out (25 module declarations)

These artifacts are useful for understanding coupling, planning refactors, and identifying architectural boundaries.