ollama support

Michael Neale
2025-11-05 12:17:01 +11:00
parent 22a0090cdc
commit 217df2f2af
7 changed files with 1545 additions and 0 deletions

OLLAMA_CONFIG.md Normal file

@@ -0,0 +1,456 @@
# Configuring Ollama Provider in G3
This guide shows you how to configure G3 to use Ollama as your LLM provider.
## Quick Start
### 1. Install Ollama
```bash
# Visit https://ollama.ai to download and install
# Or use curl:
curl https://ollama.ai/install.sh | sh
```
### 2. Pull a Model
```bash
ollama pull llama3.2
# or any other model you prefer
```
### 3. Create Configuration File
Copy the example configuration:
```bash
cp config.ollama.example.toml ~/.config/g3/config.toml
```
Or create it manually:
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
```
### 4. Run G3
```bash
g3
# G3 will now use Ollama with llama3.2!
```
## Configuration Options
### Basic Configuration
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
```
This is the minimal configuration needed; everything else falls back to the defaults:
- Base URL: `http://localhost:11434`
- Temperature: `0.7`
- Max tokens: Not limited (uses model default)
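Before pointing G3 at it, you can confirm Ollama is actually listening at the default base URL. This uses Ollama's standard version endpoint (the same check appears in the Troubleshooting section below):
```bash
# Returns a small JSON payload such as {"version":"..."} when Ollama is up
curl http://localhost:11434/api/version
```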
### Full Configuration
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"
max_tokens = 2048
temperature = 0.7
```
### Custom Ollama Host
If you're running Ollama on a different machine or port:
```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://192.168.1.100:11434"
```
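A stock Ollama install only listens on localhost, so for a remote setup the server side usually needs to be bound to a network interface as well. A minimal sketch using Ollama's `OLLAMA_HOST` environment variable (adjust the address and firewall rules for your network):
```bash
# On the Ollama machine: listen on all interfaces instead of 127.0.0.1 only
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From the G3 machine: confirm the server is reachable
curl http://192.168.1.100:11434/api/version
```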
### Different Models
You can use any Ollama model:
```toml
[providers.ollama]
model = "qwen2.5:7b" # Alibaba's Qwen model
```
```toml
[providers.ollama]
model = "mistral" # Mistral AI
```
```toml
[providers.ollama]
model = "llama3.1:70b" # Larger Llama model
```
## Multiple Provider Configuration
You can configure multiple providers and switch between them:
```toml
[providers]
default_provider = "ollama" # Default for most operations
# Ollama for local, fast responses
[providers.ollama]
model = "llama3.2:3b"
temperature = 0.7
# Databricks for more complex tasks
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true
```
Then switch providers with:
```bash
g3 --provider databricks
```
## Autonomous Mode (Coach-Player)
Use different providers for code review (coach) and implementation (player):
```toml
[providers]
default_provider = "ollama"
coach = "databricks" # Use powerful cloud model for review
player = "ollama" # Use local model for implementation
[providers.ollama]
model = "qwen2.5:14b" # Larger local model for coding
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true
```
This gives you the best of both worlds:
- Fast local execution for coding tasks
- Powerful cloud review for quality assurance
## Recommended Models
### For Coding Tasks
| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:7b** | 7B | Fast | Excellent | Best balance for coding |
| **llama3.2:3b** | 3B | Very Fast | Good | Great for quick tasks |
| **llama3.1:8b** | 8B | Medium | Very Good | Solid all-rounder |
| **mistral** | 7B | Fast | Good | Good for general use |
### For Complex Tasks
| Model | Size | Speed | Quality | Notes |
|-------|------|-------|---------|-------|
| **qwen2.5:14b** | 14B | Medium | Excellent | Best local model for coding |
| **qwen2.5:32b** | 32B | Slow | Outstanding | If you have the resources |
| **llama3.1:70b** | 70B | Very Slow | Outstanding | Requires significant RAM/GPU |
## Temperature Settings
Temperature controls randomness in responses:
- **0.1-0.3**: Focused, near-deterministic output; good for code generation
- **0.5-0.7**: Balanced; good for most tasks
- **0.8-1.0**: More creative; good for brainstorming
```toml
[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2 # Focused code generation
```
## Max Tokens
Control response length:
```toml
[providers.ollama]
model = "llama3.2"
max_tokens = 1024 # Shorter responses
```
```toml
[providers.ollama]
model = "qwen2.5:7b"
max_tokens = 4096 # Longer, detailed responses
```
Leave it unset for model defaults (recommended).
## Performance Tuning
### GPU Acceleration
Ollama automatically uses a GPU when one is available. To see which models are loaded and whether they are running on the GPU:
```bash
ollama ps
```
### Quantized Models
For faster responses with less RAM:
```toml
[providers.ollama]
model = "llama3.2:3b-q4_0" # 4-bit quantization
```
Quantization options:
- `q4_0`: 4-bit, fastest, lowest quality
- `q5_0`: 5-bit, balanced
- `q8_0`: 8-bit, slower, better quality
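Quantized variants are pulled like any other tag. Exact tag names vary from model to model, so check the model's page in the Ollama library if a pull fails:
```bash
# Pull the 4-bit variant used in the example above (tag availability varies by model)
ollama pull llama3.2:3b-q4_0
```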
### Multiple Models
You can pull multiple models and switch easily:
```bash
ollama pull llama3.2:3b # Fast for chat
ollama pull qwen2.5:7b # Better for code
ollama pull mistral # General purpose
```
Then change your config:
```toml
[providers.ollama]
model = "qwen2.5:7b" # Just change this line
```
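If the model you switch to hasn't been pulled yet, Ollama will report it as missing on the first request, so it's worth double-checking that it shows up locally first:
```bash
# qwen2.5:7b should appear in this list before you switch the config to it
ollama list
```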
## Troubleshooting
### Ollama Not Running
```bash
# Check if Ollama is running
curl http://localhost:11434/api/version
# Start Ollama (macOS/Linux)
ollama serve
# Or just run a model (auto-starts)
ollama run llama3.2
```
### Model Not Found
```bash
# List available models
ollama list
# Pull the model
ollama pull llama3.2
```
### Slow Responses
1. Use a smaller model:
```toml
model = "llama3.2:1b" # Smallest, fastest
```
2. Use quantized version:
```toml
model = "llama3.2:3b-q4_0"
```
3. Reduce max_tokens:
```toml
max_tokens = 512
```
### Out of Memory
1. Switch to smaller model
2. Use quantized version
3. Close other applications
4. Check GPU memory: `ollama ps`
### Connection Refused
Check base_url is correct:
```toml
[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434" # Default
```
For remote Ollama:
```toml
base_url = "http://your-server:11434"
```
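From the machine running G3, you can check whether that remote endpoint answers at all before digging deeper:
```bash
# Replace your-server with the actual hostname or IP of the Ollama machine
curl http://your-server:11434/api/version
```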
## Complete Example Configs
### Minimal Local Setup
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2"
[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60
```
### Optimized for Coding
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2
max_tokens = 2048
[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```
### Fast Responses
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "llama3.2:3b-q4_0"
temperature = 0.7
max_tokens = 1024
[agent]
max_context_length = 4096
enable_streaming = true
timeout_seconds = 30
```
### High Quality (Requires Good Hardware)
```toml
[providers]
default_provider = "ollama"
[providers.ollama]
model = "qwen2.5:32b"
temperature = 0.3
max_tokens = 4096
[agent]
max_context_length = 32768
enable_streaming = true
timeout_seconds = 300
```
### Hybrid (Local + Cloud)
```toml
[providers]
default_provider = "ollama"
coach = "databricks"
player = "ollama"
[providers.ollama]
model = "qwen2.5:14b"
temperature = 0.2
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true
[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120
```
## Environment Variables
You can override config with environment variables:
```bash
# Override model
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b g3
# Override base URL
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 g3
# Override default provider
G3_PROVIDERS_DEFAULT_PROVIDER=ollama g3
```
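The same overrides can be exported for the rest of a shell session instead of being prefixed to each command:
```bash
# Applies to every g3 invocation in this shell until unset
export G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b
g3
```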
## Best Practices
1. **Start Small**: Begin with llama3.2:3b, scale up if needed
2. **Use Quantization**: q4_0 or q5_0 for best speed/quality balance
3. **Match Task to Model**:
- Quick edits: 1B-3B models
- Code generation: 7B-14B models
- Complex refactoring: 14B-32B models
4. **Temperature for Code**: Use 0.1-0.3 for deterministic output
5. **Enable Streaming**: Always enable for better UX
6. **Local First**: Use Ollama by default, cloud for special cases
## Comparison with Other Providers
| Feature | Ollama | Databricks | OpenAI | Anthropic |
|---------|--------|------------|--------|-----------|
| Cost | Free | Paid | Paid | Paid |
| Privacy | Full | Medium | Low | Low |
| Speed (small models) | Fast | Fast | Medium | Medium |
| Speed (large models) | Slow | Fast | Fast | Fast |
| Setup Complexity | Low | Medium | Low | Low |
| Authentication | None | OAuth/Token | API Key | API Key |
| Offline Support | Yes | No | No | No |
| Tool Calling | Yes | Yes | Yes | Yes |
## Next Steps
1. Try different models: `ollama pull mistral`, `ollama pull qwen2.5`
2. Experiment with temperature settings
3. Set up hybrid config with cloud provider for complex tasks
4. Share your config in the community!
## Getting Help
- Ollama docs: https://ollama.ai/docs
- G3 issues: https://github.com/your-repo/issues
- See available CLI options: `g3 --help`