# Configuring Ollama Provider in G3

This guide shows you how to configure G3 to use Ollama as your LLM provider.

## Quick Start

### 1. Install Ollama

```bash
# Visit https://ollama.ai to download and install
# Or use curl:
curl https://ollama.ai/install.sh | sh
```

### 2. Pull a Model

```bash
ollama pull llama3.2
# or any other model you prefer
```

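Once the pull finishes, you can confirm the model is available locally (`ollama list` is part of the standard Ollama CLI):

```bash
# List models that are downloaded and ready to use
ollama list
```
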
### 3. Create Configuration File

Copy the example configuration:

```bash
cp config.ollama.example.toml ~/.config/g3/config.toml
```

Or create it manually:

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
```

### 4. Run G3

```bash
g3
# G3 will now use Ollama with llama3.2!
```

## Configuration Options

### Basic Configuration

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
```

This is the minimal configuration needed. It uses all defaults:

- Base URL: `http://localhost:11434`
- Temperature: `0.7`
- Max tokens: Not limited (uses model default)

### Full Configuration

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"
max_tokens = 2048
temperature = 0.7
```

### Custom Ollama Host

If you're running Ollama on a different machine or port:

```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://192.168.1.100:11434"
```

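Note that a remote Ollama server only accepts network connections if it listens on a non-loopback address. One way to arrange this (assuming you start the server manually rather than through a system service) is Ollama's `OLLAMA_HOST` environment variable:

```bash
# On the remote machine: listen on all interfaces instead of 127.0.0.1
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
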
### Different Models

You can use any Ollama model:

```toml
[providers.ollama]
model = "qwen2.5:7b" # Alibaba's Qwen model
```

```toml
[providers.ollama]
model = "mistral" # Mistral AI
```

```toml
[providers.ollama]
model = "llama3.1:70b" # Larger Llama model
```

## Multiple Provider Configuration

You can configure multiple providers and switch between them:

```toml
[providers]
default_provider = "ollama" # Default for most operations

# Ollama for local, fast responses
[providers.ollama]
model = "llama3.2:3b"
temperature = 0.7

# Databricks for more complex tasks
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true
```

Then switch providers with:

```bash
g3 --provider databricks
```

## Autonomous Mode (Coach-Player)

Use different providers for code review (coach) and implementation (player):

```toml
[providers]
default_provider = "ollama"
coach = "databricks" # Use powerful cloud model for review
player = "ollama" # Use local model for implementation

[providers.ollama]
model = "qwen2.5:14b" # Larger local model for coding

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true
```

This gives you the best of both worlds:

- Fast local execution for coding tasks
- Powerful cloud review for quality assurance

## Recommended Models

### For Coding Tasks

| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:7b** | 7B | Fast | Excellent | Best balance for coding |
| **llama3.2:3b** | 3B | Very Fast | Good | Great for quick tasks |
| **llama3.1:8b** | 8B | Medium | Very Good | Solid all-rounder |
| **mistral** | 7B | Fast | Good | Good for general use |

### For Complex Tasks

| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:14b** | 14B | Medium | Excellent | Best local model for coding |
| **qwen2.5:32b** | 32B | Slow | Outstanding | If you have the resources |
| **llama3.1:70b** | 70B | Very Slow | Outstanding | Requires significant RAM/GPU |

## Temperature Settings

Temperature controls randomness in responses:

- **0.1-0.3**: Deterministic, good for code generation
- **0.5-0.7**: Balanced, good for most tasks
- **0.8-1.0**: Creative, good for brainstorming

```toml
[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2 # Focused code generation
```

## Max Tokens

Control response length:

```toml
[providers.ollama]
model = "llama3.2"
max_tokens = 1024 # Shorter responses
```

```toml
[providers.ollama]
model = "qwen2.5:7b"
max_tokens = 4096 # Longer, detailed responses
```

Leave it unset for model defaults (recommended).

## Performance Tuning

### GPU Acceleration

Ollama automatically uses GPU if available. To check:

```bash
ollama ps
```

### Quantized Models

For faster responses with less RAM:

```toml
[providers.ollama]
model = "llama3.2:3b-q4_0" # 4-bit quantization
```

Quantization options:

- `q4_0`: 4-bit, fastest, lowest quality
- `q5_0`: 5-bit, balanced
- `q8_0`: 8-bit, slower, better quality

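Quantized variants are separate tags, so pull the one you want before pointing your config at it. Exact tag names vary by model (check the model's page in the Ollama library; `ollama pull` reports an error for tags that don't exist):

```bash
# Pull the 4-bit quantized variant used in the config above
ollama pull llama3.2:3b-q4_0
```
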
### Multiple Models

You can pull multiple models and switch easily:

```bash
ollama pull llama3.2:3b # Fast for chat
ollama pull qwen2.5:7b # Better for code
ollama pull mistral # General purpose
```

Then change your config:

```toml
[providers.ollama]
model = "qwen2.5:7b" # Just change this line
```

## Troubleshooting

### Ollama Not Running

```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama (macOS/Linux)
ollama serve

# Or just run a model (auto-starts)
ollama run llama3.2
```

### Model Not Found

```bash
# List available models
ollama list

# Pull the model
ollama pull llama3.2
```

### Slow Responses

1. Use a smaller model:

   ```toml
   model = "llama3.2:1b" # Smallest, fastest
   ```

2. Use a quantized version:

   ```toml
   model = "llama3.2:3b-q4_0"
   ```

3. Reduce max_tokens:

   ```toml
   max_tokens = 512
   ```

### Out of Memory

1. Switch to a smaller model
2. Use a quantized version
3. Close other applications
4. Check GPU memory: `ollama ps`

### Connection Refused

Check that `base_url` is correct:

```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434" # Default
```

For remote Ollama:

```toml
base_url = "http://your-server:11434"
```

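If the remote setup still fails, confirm the server is reachable from the machine running G3. This uses the same version endpoint as the check above; substitute your actual host:

```bash
# Should return a JSON version string if the remote Ollama is reachable
curl http://your-server:11434/api/version
```
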
## Complete Example Configs

### Minimal Local Setup

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
```

### Optimized for Coding

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2
max_tokens = 2048

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```

### Fast Responses

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2:3b-q4_0"
temperature = 0.7
max_tokens = 1024

[agent]
max_context_length = 4096
enable_streaming = true
timeout_seconds = 30
```

### High Quality (Requires Good Hardware)

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:32b"
temperature = 0.3
max_tokens = 4096

[agent]
max_context_length = 32768
enable_streaming = true
timeout_seconds = 300
```

### Hybrid (Local + Cloud)

```toml
[providers]
default_provider = "ollama"
coach = "databricks"
player = "ollama"

[providers.ollama]
model = "qwen2.5:14b"
temperature = 0.2

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```

## Environment Variables

You can override config with environment variables:

```bash
# Override model
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b g3

# Override base URL
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 g3

# Override default provider
G3_PROVIDERS_DEFAULT_PROVIDER=ollama g3
```

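These examples set the variable for a single invocation; to keep an override for a whole shell session, export it first (standard shell behavior, not a G3-specific feature):

```bash
# Applies to every g3 run in this shell until unset
export G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b
g3
```
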
## Best Practices

1. **Start Small**: Begin with `llama3.2:3b`, scale up if needed
2. **Use Quantization**: `q4_0` or `q5_0` for best speed/quality balance
3. **Match Task to Model**:
   - Quick edits: 1B-3B models
   - Code generation: 7B-14B models
   - Complex refactoring: 14B-32B models
4. **Temperature for Code**: Use 0.1-0.3 for deterministic output
5. **Enable Streaming**: Always enable for better UX
6. **Local First**: Use Ollama by default, cloud for special cases

## Comparison with Other Providers

| Feature | Ollama | Databricks | OpenAI | Anthropic |
|---------|--------|------------|--------|-----------|
| Cost | Free | Paid | Paid | Paid |
| Privacy | Full | Medium | Low | Low |
| Speed (small models) | Fast | Fast | Medium | Medium |
| Speed (large models) | Slow | Fast | Fast | Fast |
| Setup Complexity | Low | Medium | Low | Low |
| Authentication | None | OAuth/Token | API Key | API Key |
| Offline Support | Yes | No | No | No |
| Tool Calling | Yes | Yes | Yes | Yes |

## Next Steps

1. Try different models: `ollama pull mistral`, `ollama pull qwen2.5`
2. Experiment with temperature settings
3. Set up a hybrid config with a cloud provider for complex tasks
4. Share your config in the community!

## Getting Help

- Ollama docs: https://ollama.ai/docs
- G3 issues: https://github.com/your-repo/issues
- Test your config: `g3 --help`