# Configuring Ollama Provider in G3
This guide shows you how to configure G3 to use Ollama as your LLM provider.
## Quick Start

### 1. Install Ollama
```bash
# Visit https://ollama.ai to download and install
# Or use curl (-fsSL follows redirects and fails cleanly on errors):
curl -fsSL https://ollama.ai/install.sh | sh
```
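To confirm the install succeeded, check that the CLI responds:

```bash
ollama --version
```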
### 2. Pull a Model
```bash
ollama pull llama3.2
# or any other model you prefer
```
### 3. Create Configuration File
Copy the example configuration:
```bash
cp config.ollama.example.toml ~/.config/g3/config.toml
```
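If the `~/.config/g3` directory does not exist yet, create it first (this guide assumes G3 reads its config from that path):

```bash
mkdir -p ~/.config/g3
```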
Or create it manually:
```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
```
### 4. Run G3
```bash
g3
# G3 will now use Ollama with llama3.2!
```
## Configuration Options

### Basic Configuration
```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
```
This is the minimal configuration needed. It uses all defaults:

- Base URL: `http://localhost:11434`
- Temperature: `0.7`
- Max tokens: not limited (uses the model default)
### Full Configuration
```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"
max_tokens = 2048
temperature = 0.7
```
### Custom Ollama Host
If you're running Ollama on a different machine or port:
```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://192.168.1.100:11434"
```
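Note that Ollama listens only on localhost by default. On the remote machine you may need to bind it to a reachable interface; Ollama reads its bind address from the `OLLAMA_HOST` environment variable:

```bash
# On the machine running Ollama: listen on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```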
### Different Models

You can use any Ollama model; each block below is an alternative:

```toml
[providers.ollama]
model = "qwen2.5:7b"    # Alibaba's Qwen model

[providers.ollama]
model = "mistral"       # Mistral AI

[providers.ollama]
model = "llama3.1:70b"  # Larger Llama model
```
### Multiple Provider Configuration
You can configure multiple providers and switch between them:
```toml
[providers]
default_provider = "ollama"  # Default for most operations

# Ollama for local, fast responses
[providers.ollama]
model = "llama3.2:3b"
temperature = 0.7

# Databricks for more complex tasks
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true
```
Then switch providers with:
```bash
g3 --provider databricks
```
### Autonomous Mode (Coach-Player)
Use different providers for code review (coach) and implementation (player):
```toml
[providers]
default_provider = "ollama"
coach = "databricks"  # Use powerful cloud model for review
player = "ollama"     # Use local model for implementation

[providers.ollama]
model = "qwen2.5:14b"  # Larger local model for coding

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true
```
This gives you the best of both worlds:
- Fast local execution for coding tasks
- Powerful cloud review for quality assurance
## Recommended Models

### For Coding Tasks
| Model | Size | Speed | Quality | Notes |
|---|---|---|---|---|
| `qwen2.5:7b` | 7B | Fast | Excellent | Best balance for coding |
| `llama3.2:3b` | 3B | Very Fast | Good | Great for quick tasks |
| `llama3.1:8b` | 8B | Medium | Very Good | Solid all-rounder |
| `mistral` | 7B | Fast | Good | Good for general use |
### For Complex Tasks
| Model | Size | Speed | Quality | Notes |
|---|---|---|---|---|
| `qwen2.5:14b` | 14B | Medium | Excellent | Best local model for coding |
| `qwen2.5:32b` | 32B | Slow | Outstanding | If you have the resources |
| `llama3.1:70b` | 70B | Very Slow | Outstanding | Requires significant RAM/GPU |
## Temperature Settings
Temperature controls randomness in responses:
- 0.1-0.3: Deterministic, good for code generation
- 0.5-0.7: Balanced, good for most tasks
- 0.8-1.0: Creative, good for brainstorming
```toml
[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2  # Focused code generation
```
## Max Tokens
Control response length:
```toml
[providers.ollama]
model = "llama3.2"
max_tokens = 1024  # Shorter responses

[providers.ollama]
model = "qwen2.5:7b"
max_tokens = 4096  # Longer, detailed responses
```
Leave it unset for model defaults (recommended).
## Performance Tuning

### GPU Acceleration
Ollama automatically uses the GPU if one is available. To check which models are loaded and what they are running on:

```bash
ollama ps
```
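If `ollama ps` shows a model running on the CPU when you expected GPU, a generic first check on NVIDIA hardware (not specific to Ollama or G3) is whether the driver sees the card at all:

```bash
# Shows detected GPUs, driver version, and current VRAM usage
nvidia-smi
```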
### Quantized Models
For faster responses with less RAM:
```toml
[providers.ollama]
model = "llama3.2:3b-q4_0"  # 4-bit quantization
```
Quantization options:

- `q4_0`: 4-bit, fastest, lowest quality
- `q5_0`: 5-bit, balanced
- `q8_0`: 8-bit, slower, better quality
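Quantized variants are pulled like any other tag. Which quantizations are published varies by model, so check the model's page in the Ollama library if a tag isn't found:

```bash
ollama pull llama3.2:3b-q4_0
```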
### Multiple Models
You can pull multiple models and switch easily:
```bash
ollama pull llama3.2:3b  # Fast for chat
ollama pull qwen2.5:7b   # Better for code
ollama pull mistral      # General purpose
```
Then change your config:
```toml
[providers.ollama]
model = "qwen2.5:7b"  # Just change this line
```
## Troubleshooting

### Ollama Not Running
```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama (macOS/Linux)
ollama serve

# Or just run a model (auto-starts the server)
ollama run llama3.2
```
### Model Not Found
```bash
# List available models
ollama list

# Pull the model
ollama pull llama3.2
```
### Slow Responses

- Use a smaller model: `model = "llama3.2:1b"` (smallest, fastest)
- Use a quantized version: `model = "llama3.2:3b-q4_0"`
- Reduce max tokens: `max_tokens = 512`
### Out of Memory

- Switch to a smaller model
- Use a quantized version
- Close other applications
- Check GPU memory: `ollama ps`
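Recent Ollama releases can also unload a model on demand instead of waiting for its idle timeout; check `ollama --help` if your version predates this subcommand:

```bash
ollama stop llama3.2
```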
### Connection Refused

Check that `base_url` is correct:
```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"  # Default
```
For remote Ollama:

```toml
base_url = "http://your-server:11434"
```
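You can verify the URL is reachable from the machine running G3 with the same version check used above:

```bash
curl http://your-server:11434/api/version
```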
## Complete Example Configs

### Minimal Local Setup
```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
```
### Optimized for Coding
```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2
max_tokens = 2048

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```
### Fast Responses
```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2:3b-q4_0"
temperature = 0.7
max_tokens = 1024

[agent]
max_context_length = 4096
enable_streaming = true
timeout_seconds = 30
```
### High Quality (Requires Good Hardware)
```toml
[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:32b"
temperature = 0.3
max_tokens = 4096

[agent]
max_context_length = 32768
enable_streaming = true
timeout_seconds = 300
```
### Hybrid (Local + Cloud)
```toml
[providers]
default_provider = "ollama"
coach = "databricks"
player = "ollama"

[providers.ollama]
model = "qwen2.5:14b"
temperature = 0.2

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```
## Environment Variables
You can override config with environment variables:
```bash
# Override model
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b g3

# Override base URL
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 g3

# Override default provider
G3_PROVIDERS_DEFAULT_PROVIDER=ollama g3
```
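These overrides should compose like ordinary environment variables, so a one-off run against a remote host with a different model would look like:

```bash
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b \
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 \
g3
```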
## Best Practices

- **Start Small**: Begin with `llama3.2:3b`, scale up if needed
- **Use Quantization**: `q4_0` or `q5_0` for the best speed/quality balance
- **Match Task to Model**:
  - Quick edits: 1B-3B models
  - Code generation: 7B-14B models
  - Complex refactoring: 14B-32B models
- **Temperature for Code**: Use 0.1-0.3 for deterministic output
- **Enable Streaming**: Always enable it for better UX
- **Local First**: Use Ollama by default, cloud for special cases
## Comparison with Other Providers
| Feature | Ollama | Databricks | OpenAI | Anthropic |
|---|---|---|---|---|
| Cost | Free | Paid | Paid | Paid |
| Privacy | Full | Medium | Low | Low |
| Speed (small models) | Fast | Fast | Medium | Medium |
| Speed (large models) | Slow | Fast | Fast | Fast |
| Setup Complexity | Low | Medium | Low | Low |
| Authentication | None | OAuth/Token | API Key | API Key |
| Offline Support | Yes | No | No | No |
| Tool Calling | Yes | Yes | Yes | Yes |
## Next Steps

- Try different models: `ollama pull mistral`, `ollama pull qwen2.5`
- Experiment with temperature settings
- Set up a hybrid config with a cloud provider for complex tasks
- Share your config with the community!
## Getting Help

- Ollama docs: https://ollama.ai/docs
- G3 issues: https://github.com/your-repo/issues
- Check CLI options: `g3 --help`