Configuring Ollama Provider in G3

This guide shows you how to configure G3 to use Ollama as your LLM provider.

Quick Start

1. Install Ollama

# Visit https://ollama.ai to download and install
# Or use curl:
curl -fsSL https://ollama.ai/install.sh | sh

2. Pull a Model

ollama pull llama3.2
# or any other model you prefer

3. Create Configuration File

Copy the example configuration:

cp config.ollama.example.toml ~/.config/g3/config.toml
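
If the ~/.config/g3 directory doesn't exist yet, create it before copying:

mkdir -p ~/.config/g3
cp config.ollama.example.toml ~/.config/g3/config.toml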

Or create it manually:

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

4. Run G3

g3
# G3 will now use Ollama with llama3.2!

Configuration Options

Basic Configuration

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

This is the minimal configuration needed. It uses all defaults:

  • Base URL: http://localhost:11434
  • Temperature: 0.7
  • Max tokens: Not limited (uses model default)

Full Configuration

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"
max_tokens = 2048
temperature = 0.7

Custom Ollama Host

If you're running Ollama on a different machine or port:

[providers.ollama]
model = "llama3.2"
base_url = "http://192.168.1.100:11434"

Different Models

You can use any Ollama model:

[providers.ollama]
model = "qwen2.5:7b"  # Alibaba's Qwen model

[providers.ollama]
model = "mistral"  # Mistral AI

[providers.ollama]
model = "llama3.1:70b"  # Larger Llama model

Multiple Provider Configuration

You can configure multiple providers and switch between them:

[providers]
default_provider = "ollama"  # Default for most operations

# Ollama for local, fast responses
[providers.ollama]
model = "llama3.2:3b"
temperature = 0.7

# Databricks for more complex tasks
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true

Then switch providers with:

g3 --provider databricks
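
The same flag works for any configured provider; for example, to force the local model for a single run:

g3 --provider ollama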

Autonomous Mode (Coach-Player)

Use different providers for code review (coach) and implementation (player):

[providers]
default_provider = "ollama"
coach = "databricks"  # Use powerful cloud model for review
player = "ollama"     # Use local model for implementation

[providers.ollama]
model = "qwen2.5:14b"  # Larger local model for coding

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true

This gives you the best of both worlds:

  • Fast local execution for coding tasks
  • Powerful cloud review for quality assurance

Recommended Models

For Coding Tasks

| Model | Size | Speed | Quality | Notes |
|-------------|------|-----------|-----------|-------------------------|
| qwen2.5:7b | 7B | Fast | Excellent | Best balance for coding |
| llama3.2:3b | 3B | Very Fast | Good | Great for quick tasks |
| llama3.1:8b | 8B | Medium | Very Good | Solid all-rounder |
| mistral | 7B | Fast | Good | Good for general use |

For Complex Tasks

| Model | Size | Speed | Quality | Notes |
|--------------|------|-----------|-------------|------------------------------|
| qwen2.5:14b | 14B | Medium | Excellent | Best local model for coding |
| qwen2.5:32b | 32B | Slow | Outstanding | If you have the resources |
| llama3.1:70b | 70B | Very Slow | Outstanding | Requires significant RAM/GPU |

Temperature Settings

Temperature controls randomness in responses:

  • 0.1-0.3: Deterministic, good for code generation
  • 0.5-0.7: Balanced, good for most tasks
  • 0.8-1.0: Creative, good for brainstorming

For example:

[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2  # Focused code generation

Max Tokens

Control response length:

[providers.ollama]
model = "llama3.2"
max_tokens = 1024  # Shorter responses

[providers.ollama]
model = "qwen2.5:7b"
max_tokens = 4096  # Longer, detailed responses

Leave it unset for model defaults (recommended).

Performance Tuning

GPU Acceleration

Ollama automatically uses a GPU when one is available. To check which models are loaded and whether they are running on the GPU:

ollama ps

Quantized Models

For faster responses with less RAM:

[providers.ollama]
model = "llama3.2:3b-q4_0"  # 4-bit quantization

Quantization options:

  • q4_0: 4-bit, fastest, lowest quality
  • q5_0: 5-bit, balanced
  • q8_0: 8-bit, slower, better quality
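
Quantized variants are pulled like any other tag (assuming the tag exists in the Ollama library for the model you want):

ollama pull llama3.2:3b-q4_0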

Multiple Models

You can pull multiple models and switch easily:

ollama pull llama3.2:3b    # Fast for chat
ollama pull qwen2.5:7b     # Better for code
ollama pull mistral        # General purpose

Then change your config:

[providers.ollama]
model = "qwen2.5:7b"  # Just change this line

Troubleshooting

Ollama Not Running

# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama (macOS/Linux)
ollama serve

# Or just run a model (auto-starts)
ollama run llama3.2

Model Not Found

# List available models
ollama list

# Pull the model
ollama pull llama3.2

Slow Responses

  1. Use a smaller model:

    model = "llama3.2:1b"  # Smallest, fastest
    
  2. Use quantized version:

    model = "llama3.2:3b-q4_0"
    
  3. Reduce max_tokens:

    max_tokens = 512
    

Out of Memory

  1. Switch to a smaller model
  2. Use a quantized version (see the example after this list)
  3. Close other applications
  4. Check GPU memory: ollama ps
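
For example, the first two steps combined (model tag taken from the Quantized Models section above):

[providers.ollama]
model = "llama3.2:3b-q4_0"  # smaller model, 4-bit quantized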

Connection Refused

Check base_url is correct:

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"  # Default

For remote Ollama:

base_url = "http://your-server:11434"

Complete Example Configs

Minimal Local Setup

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60

Optimized for Coding

[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2
max_tokens = 2048

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120

Fast Responses

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2:3b-q4_0"
temperature = 0.7
max_tokens = 1024

[agent]
max_context_length = 4096
enable_streaming = true
timeout_seconds = 30

High Quality (Requires Good Hardware)

[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:32b"
temperature = 0.3
max_tokens = 4096

[agent]
max_context_length = 32768
enable_streaming = true
timeout_seconds = 300

Hybrid (Local + Cloud)

[providers]
default_provider = "ollama"
coach = "databricks"
player = "ollama"

[providers.ollama]
model = "qwen2.5:14b"
temperature = 0.2

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120

Environment Variables

You can override config with environment variables:

# Override model
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b g3

# Override base URL
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 g3

# Override default provider
G3_PROVIDERS_DEFAULT_PROVIDER=ollama g3

Best Practices

  1. Start Small: Begin with llama3.2:3b, scale up if needed
  2. Use Quantization: q4_0 or q5_0 for best speed/quality balance
  3. Match Task to Model:
    • Quick edits: 1B-3B models
    • Code generation: 7B-14B models
    • Complex refactoring: 14B-32B models
  4. Temperature for Code: Use 0.1-0.3 for deterministic output
  5. Enable Streaming: Always enable for better UX
  6. Local First: Use Ollama by default, cloud for special cases

Comparison with Other Providers

| Feature | Ollama | Databricks | OpenAI | Anthropic |
|----------------------|--------|-------------|---------|-----------|
| Cost | Free | Paid | Paid | Paid |
| Privacy | Full | Medium | Low | Low |
| Speed (small models) | Fast | Fast | Medium | Medium |
| Speed (large models) | Slow | Fast | Fast | Fast |
| Setup Complexity | Low | Medium | Low | Low |
| Authentication | None | OAuth/Token | API Key | API Key |
| Offline Support | Yes | No | No | No |
| Tool Calling | Yes | Yes | Yes | Yes |

Next Steps

  1. Try different models: ollama pull mistral, ollama pull qwen2.5
  2. Experiment with temperature settings
  3. Set up hybrid config with cloud provider for complex tasks
  4. Share your config in the community!

Getting Help