ollama support

Michael Neale
2025-11-05 12:17:01 +11:00
parent 22a0090cdc
commit 217df2f2af
7 changed files with 1545 additions and 0 deletions

OLLAMA_CONFIG.md Normal file

@@ -0,0 +1,456 @@
# Configuring Ollama Provider in G3
This guide shows you how to configure G3 to use Ollama as your LLM provider.
## Quick Start
### 1. Install Ollama
```bash
# Visit https://ollama.ai to download and install
# Or use curl:
curl https://ollama.ai/install.sh | sh
```
### 2. Pull a Model
```bash
ollama pull llama3.2
# or any other model you prefer
```
### 3. Create Configuration File
Copy the example configuration:
```bash
cp config.ollama.example.toml ~/.config/g3/config.toml
```
Or create it manually:
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
```
### 4. Run G3
```bash
g3
# G3 will now use Ollama with llama3.2!
```
## Configuration Options
### Basic Configuration
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
```
This is the minimal configuration needed; everything else falls back to the defaults:
- Base URL: `http://localhost:11434`
- Temperature: `0.7`
- Max tokens: Not limited (uses model default)
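Before pointing G3 at it, you can confirm Ollama is actually listening at the default base URL. This uses Ollama's standard version endpoint (the same check appears in the Troubleshooting section below):
```bash
# Returns a small JSON payload such as {"version":"..."} when Ollama is up
curl http://localhost:11434/api/version
```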
### Full Configuration
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"
max_tokens = 2048
temperature = 0.7
```
### Custom Ollama Host
If you're running Ollama on a different machine or port:
```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://192.168.1.100:11434"
```
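A stock Ollama install only listens on localhost, so for a remote setup the server side usually needs to be bound to a network interface as well. A minimal sketch using Ollama's `OLLAMA_HOST` environment variable (adjust the address and firewall rules for your network):
```bash
# On the Ollama machine: listen on all interfaces instead of 127.0.0.1 only
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From the G3 machine: confirm the server is reachable
curl http://192.168.1.100:11434/api/version
```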
### Different Models
You can use any Ollama model:
```toml
[providers.ollama]
model = "qwen2.5:7b" # Alibaba's Qwen model
```
```toml
[providers.ollama]
model = "mistral" # Mistral AI
```
```toml
[providers.ollama]
model = "llama3.1:70b" # Larger Llama model
```
## Multiple Provider Configuration
You can configure multiple providers and switch between them:
```toml
[providers]
default_provider = "ollama" # Default for most operations
# Ollama for local, fast responses
[providers.ollama]
model = "llama3.2:3b"
temperature = 0.7
# Databricks for more complex tasks
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true
```
Then switch providers with:
```bash
g3 --provider databricks
```
## Autonomous Mode (Coach-Player)
Use different providers for code review (coach) and implementation (player):
```toml
[providers]
default_provider = "ollama"
coach = "databricks" # Use powerful cloud model for review
player = "ollama" # Use local model for implementation
[providers.ollama]
model = "qwen2.5:14b" # Larger local model for coding
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true
```
This gives you the best of both worlds:
- Fast local execution for coding tasks
- Powerful cloud review for quality assurance
## Recommended Models
### For Coding Tasks
| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:7b** | 7B | Fast | Excellent | Best balance for coding |
| **llama3.2:3b** | 3B | Very Fast | Good | Great for quick tasks |
| **llama3.1:8b** | 8B | Medium | Very Good | Solid all-rounder |
| **mistral** | 7B | Fast | Good | Good for general use |
### For Complex Tasks
| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:14b** | 14B | Medium | Excellent | Best local model for coding |
| **qwen2.5:32b** | 32B | Slow | Outstanding | If you have the resources |
| **llama3.1:70b** | 70B | Very Slow | Outstanding | Requires significant RAM/GPU |
## Temperature Settings
Temperature controls randomness in responses:
- **0.1-0.3**: Focused, near-deterministic output; good for code generation
- **0.5-0.7**: Balanced; good for most tasks
- **0.8-1.0**: More creative; good for brainstorming
```toml
[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2 # Focused code generation
```
## Max Tokens
Control response length:
```toml
[providers.ollama]
model = "llama3.2"
max_tokens = 1024 # Shorter responses
```
```toml
[providers.ollama]
model = "qwen2.5:7b"
max_tokens = 4096 # Longer, detailed responses
```
Leave it unset for model defaults (recommended).
## Performance Tuning
### GPU Acceleration
Ollama automatically uses a GPU when one is available. To see which models are loaded and whether they are running on the GPU:
```bash
ollama ps
```
### Quantized Models
For faster responses with less RAM:
```toml
[providers.ollama]
model = "llama3.2:3b-q4_0" # 4-bit quantization
```
Quantization options:
- `q4_0`: 4-bit, fastest, lowest quality
- `q5_0`: 5-bit, balanced
- `q8_0`: 8-bit, slower, better quality
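Quantized variants are pulled like any other tag. Exact tag names vary from model to model, so check the model's page in the Ollama library if a pull fails:
```bash
# Pull the 4-bit variant used in the example above (tag availability varies by model)
ollama pull llama3.2:3b-q4_0
```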
### Multiple Models
You can pull multiple models and switch easily:
```bash
ollama pull llama3.2:3b # Fast for chat
ollama pull qwen2.5:7b # Better for code
ollama pull mistral # General purpose
```
Then change your config:
```toml
[providers.ollama]
model = "qwen2.5:7b" # Just change this line
```
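If the model you switch to hasn't been pulled yet, Ollama will report it as missing on the first request, so it's worth double-checking that it shows up locally first:
```bash
# qwen2.5:7b should appear in this list before you switch the config to it
ollama list
```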
## Troubleshooting
### Ollama Not Running
```bash
# Check if Ollama is running
curl http://localhost:11434/api/version
# Start Ollama (macOS/Linux)
ollama serve
# Or just run a model (auto-starts)
ollama run llama3.2
```
### Model Not Found
```bash
# List available models
ollama list
# Pull the model
ollama pull llama3.2
```
### Slow Responses
1. Use a smaller model:
```toml
model = "llama3.2:1b" # Smallest, fastest
```
2. Use quantized version:
```toml
model = "llama3.2:3b-q4_0"
```
3. Reduce max_tokens:
```toml
max_tokens = 512
```
### Out of Memory
1. Switch to smaller model
2. Use quantized version
3. Close other applications
4. Check GPU memory: `ollama ps`
### Connection Refused
Check base_url is correct:
```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434" # Default
```
For remote Ollama:
```toml
base_url = "http://your-server:11434"
```
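From the machine running G3, you can check whether that remote endpoint answers at all before digging deeper:
```bash
# Replace your-server with the actual hostname or IP of the Ollama machine
curl http://your-server:11434/api/version
```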
## Complete Example Configs
### Minimal Local Setup
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
```
### Optimized for Coding
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2
max_tokens = 2048
[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```
### Fast Responses
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2:3b-q4_0"
temperature = 0.7
max_tokens = 1024
[agent]
max_context_length = 4096
enable_streaming = true
timeout_seconds = 30
```
### High Quality (Requires Good Hardware)
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "qwen2.5:32b"
temperature = 0.3
max_tokens = 4096
[agent]
max_context_length = 32768
enable_streaming = true
timeout_seconds = 300
```
### Hybrid (Local + Cloud)
```toml
[providers]
default_provider = "ollama"
coach = "databricks"
player = "ollama"
[providers.ollama]
model = "qwen2.5:14b"
temperature = 0.2
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true
[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```
## Environment Variables
You can override config with environment variables:
```bash
# Override model
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b g3
# Override base URL
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 g3
# Override default provider
G3_PROVIDERS_DEFAULT_PROVIDER=ollama g3
```
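The same overrides can be exported for the rest of a shell session instead of being prefixed to each command:
```bash
# Applies to every g3 invocation in this shell until unset
export G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b
g3
```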
## Best Practices
1. **Start Small**: Begin with llama3.2:3b, scale up if needed
2. **Use Quantization**: q4_0 or q5_0 for best speed/quality balance
3. **Match Task to Model**:
- Quick edits: 1B-3B models
- Code generation: 7B-14B models
- Complex refactoring: 14B-32B models
4. **Temperature for Code**: Use 0.1-0.3 for deterministic output
5. **Enable Streaming**: Always enable for better UX
6. **Local First**: Use Ollama by default, cloud for special cases
## Comparison with Other Providers
| Feature | Ollama | Databricks | OpenAI | Anthropic |
|---------|--------|------------|--------|-----------|
| Cost | Free | Paid | Paid | Paid |
| Privacy | Full | Medium | Low | Low |
| Speed (small models) | Fast | Fast | Medium | Medium |
| Speed (large models) | Slow | Fast | Fast | Fast |
| Setup Complexity | Low | Medium | Low | Low |
| Authentication | None | OAuth/Token | API Key | API Key |
| Offline Support | Yes | No | No | No |
| Tool Calling | Yes | Yes | Yes | Yes |
## Next Steps
1. Try different models: `ollama pull mistral`, `ollama pull qwen2.5`
2. Experiment with temperature settings
3. Set up hybrid config with cloud provider for complex tasks
4. Share your config in the community!
## Getting Help
- Ollama docs: https://ollama.ai/docs
- G3 issues: https://github.com/your-repo/issues
- See available CLI options: `g3 --help`