# Configuring Ollama Provider in G3

This guide shows you how to configure G3 to use Ollama as your LLM provider.

## Quick Start

### 1. Install Ollama

```bash
# Visit https://ollama.ai to download and install
# Or use curl:
curl https://ollama.ai/install.sh | sh
```

### 2. Pull a Model

```bash
ollama pull llama3.2
# or any other model you prefer
```

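Once the pull finishes, you can confirm the model is available locally (`ollama list` is part of the standard Ollama CLI):

```bash
# List models that are downloaded and ready to use
ollama list
```
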
### 3. Create Configuration File

Copy the example configuration:

```bash
cp config.ollama.example.toml ~/.config/g3/config.toml
```

Or create it manually:

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
```

### 4. Run G3

```bash
g3
# G3 will now use Ollama with llama3.2!
```

## Configuration Options

### Basic Configuration

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
```

This is the minimal configuration needed. It uses all defaults:

- Base URL: `http://localhost:11434`
- Temperature: `0.7`
- Max tokens: Not limited (uses model default)

### Full Configuration

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"
max_tokens = 2048
temperature = 0.7
```

### Custom Ollama Host

If you're running Ollama on a different machine or port:

```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://192.168.1.100:11434"
```

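Note that a remote Ollama server only accepts network connections if it listens on a non-loopback address. One way to arrange this (assuming you start the server manually rather than through a system service) is Ollama's `OLLAMA_HOST` environment variable:

```bash
# On the remote machine: listen on all interfaces instead of 127.0.0.1
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
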
### Different Models

You can use any Ollama model:

```toml
[providers.ollama]
model = "qwen2.5:7b" # Alibaba's Qwen model
```

```toml
[providers.ollama]
model = "mistral" # Mistral AI
```

```toml
[providers.ollama]
model = "llama3.1:70b" # Larger Llama model
```

## Multiple Provider Configuration

You can configure multiple providers and switch between them:

```toml
[providers]
default_provider = "ollama" # Default for most operations

# Ollama for local, fast responses
[providers.ollama]
model = "llama3.2:3b"
temperature = 0.7

# Databricks for more complex tasks
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true
```

Then switch providers with:

```bash
g3 --provider databricks
```

## Autonomous Mode (Coach-Player)

Use different providers for code review (coach) and implementation (player):

```toml
[providers]
default_provider = "ollama"
coach = "databricks" # Use powerful cloud model for review
player = "ollama" # Use local model for implementation

[providers.ollama]
model = "qwen2.5:14b" # Larger local model for coding

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true
```

This gives you the best of both worlds:

- Fast local execution for coding tasks
- Powerful cloud review for quality assurance

## Recommended Models

### For Coding Tasks

| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:7b** | 7B | Fast | Excellent | Best balance for coding |
| **llama3.2:3b** | 3B | Very Fast | Good | Great for quick tasks |
| **llama3.1:8b** | 8B | Medium | Very Good | Solid all-rounder |
| **mistral** | 7B | Fast | Good | Good for general use |

### For Complex Tasks

| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:14b** | 14B | Medium | Excellent | Best local model for coding |
| **qwen2.5:32b** | 32B | Slow | Outstanding | If you have the resources |
| **llama3.1:70b** | 70B | Very Slow | Outstanding | Requires significant RAM/GPU |

## Temperature Settings

Temperature controls randomness in responses:

- **0.1-0.3**: Deterministic, good for code generation
- **0.5-0.7**: Balanced, good for most tasks
- **0.8-1.0**: Creative, good for brainstorming

```toml
[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2 # Focused code generation
```

## Max Tokens

Control response length:

```toml
[providers.ollama]
model = "llama3.2"
max_tokens = 1024 # Shorter responses
```

```toml
[providers.ollama]
model = "qwen2.5:7b"
max_tokens = 4096 # Longer, detailed responses
```

Leave it unset for model defaults (recommended).

## Performance Tuning

### GPU Acceleration

Ollama automatically uses GPU if available. To check:

```bash
ollama ps
```

### Quantized Models

For faster responses with less RAM:

```toml
[providers.ollama]
model = "llama3.2:3b-q4_0" # 4-bit quantization
```

Quantization options:

- `q4_0`: 4-bit, fastest, lowest quality
- `q5_0`: 5-bit, balanced
- `q8_0`: 8-bit, slower, better quality

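Quantized variants are separate tags, so pull the one you want before pointing your config at it. Exact tag names vary by model (check the model's page in the Ollama library; `ollama pull` reports an error for tags that don't exist):

```bash
# Pull the 4-bit quantized variant used in the config above
ollama pull llama3.2:3b-q4_0
```
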
### Multiple Models

You can pull multiple models and switch easily:

```bash
ollama pull llama3.2:3b # Fast for chat
ollama pull qwen2.5:7b # Better for code
ollama pull mistral # General purpose
```

Then change your config:

```toml
[providers.ollama]
model = "qwen2.5:7b" # Just change this line
```

## Troubleshooting

### Ollama Not Running

```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama (macOS/Linux)
ollama serve

# Or just run a model (auto-starts)
ollama run llama3.2
```

### Model Not Found

```bash
# List available models
ollama list

# Pull the model
ollama pull llama3.2
```

### Slow Responses

1. Use a smaller model:

   ```toml
   model = "llama3.2:1b" # Smallest, fastest
   ```

2. Use a quantized version:

   ```toml
   model = "llama3.2:3b-q4_0"
   ```

3. Reduce max_tokens:

   ```toml
   max_tokens = 512
   ```

### Out of Memory

1. Switch to a smaller model
2. Use a quantized version
3. Close other applications
4. Check GPU memory: `ollama ps`

### Connection Refused

Check that `base_url` is correct:

```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434" # Default
```

For remote Ollama:

```toml
base_url = "http://your-server:11434"
```

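If the remote setup still fails, confirm the server is reachable from the machine running G3. This uses the same version endpoint as the check above; substitute your actual host:

```bash
# Should return a JSON version string if the remote Ollama is reachable
curl http://your-server:11434/api/version
```
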
## Complete Example Configs

### Minimal Local Setup

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
```

### Optimized for Coding

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2
max_tokens = 2048

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```

### Fast Responses

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2:3b-q4_0"
temperature = 0.7
max_tokens = 1024

[agent]
max_context_length = 4096
enable_streaming = true
timeout_seconds = 30
```

### High Quality (Requires Good Hardware)

```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:32b"
temperature = 0.3
max_tokens = 4096

[agent]
max_context_length = 32768
enable_streaming = true
timeout_seconds = 300
```

### Hybrid (Local + Cloud)

```toml
[providers]
default_provider = "ollama"
coach = "databricks"
player = "ollama"

[providers.ollama]
model = "qwen2.5:14b"
temperature = 0.2

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```

## Environment Variables

You can override config with environment variables:

```bash
# Override model
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b g3

# Override base URL
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 g3

# Override default provider
G3_PROVIDERS_DEFAULT_PROVIDER=ollama g3
```

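These examples set the variable for a single invocation; to keep an override for a whole shell session, export it first (standard shell behavior, not a G3-specific feature):

```bash
# Applies to every g3 run in this shell until unset
export G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b
g3
```
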
## Best Practices

1. **Start Small**: Begin with `llama3.2:3b`, scale up if needed
2. **Use Quantization**: `q4_0` or `q5_0` for best speed/quality balance
3. **Match Task to Model**:
   - Quick edits: 1B-3B models
   - Code generation: 7B-14B models
   - Complex refactoring: 14B-32B models
4. **Temperature for Code**: Use 0.1-0.3 for deterministic output
5. **Enable Streaming**: Always enable for better UX
6. **Local First**: Use Ollama by default, cloud for special cases

## Comparison with Other Providers

| Feature | Ollama | Databricks | OpenAI | Anthropic |
|---------|--------|------------|--------|-----------|
| Cost | Free | Paid | Paid | Paid |
| Privacy | Full | Medium | Low | Low |
| Speed (small models) | Fast | Fast | Medium | Medium |
| Speed (large models) | Slow | Fast | Fast | Fast |
| Setup Complexity | Low | Medium | Low | Low |
| Authentication | None | OAuth/Token | API Key | API Key |
| Offline Support | Yes | No | No | No |
| Tool Calling | Yes | Yes | Yes | Yes |

## Next Steps

1. Try different models: `ollama pull mistral`, `ollama pull qwen2.5`
2. Experiment with temperature settings
3. Set up a hybrid config with a cloud provider for complex tasks
4. Share your config in the community!

## Getting Help

- Ollama docs: https://ollama.ai/docs
- G3 issues: https://github.com/your-repo/issues
- Test your config: `g3 --help`