Configuring Ollama Provider in G3

This guide shows you how to configure G3 to use Ollama as your LLM provider.

Quick Start

1. Install Ollama

# Visit https://ollama.ai to download and install
# Or use curl:
curl -fsSL https://ollama.ai/install.sh | sh

2. Pull a Model

ollama pull llama3.2
# or any other model you prefer

3. Create Configuration File

Copy the example configuration:

cp config.ollama.example.toml ~/.config/g3/config.toml
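
If the ~/.config/g3 directory doesn't exist yet, create it before copying:

mkdir -p ~/.config/g3
cp config.ollama.example.toml ~/.config/g3/config.toml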

Or create it manually:

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

4. Run G3

g3
# G3 will now use Ollama with llama3.2!

Configuration Options

Basic Configuration

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

This is the minimal configuration needed. It uses all defaults:

  • Base URL: http://localhost:11434
  • Temperature: 0.7
  • Max tokens: Not limited (uses model default)

Full Configuration

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"
max_tokens = 2048
temperature = 0.7

Custom Ollama Host

If you're running Ollama on a different machine or port:

[providers.ollama]
model = "llama3.2"
base_url = "http://192.168.1.100:11434"

Different Models

You can use any Ollama model:

[providers.ollama]
model = "qwen2.5:7b"  # Alibaba's Qwen model

[providers.ollama]
model = "mistral"  # Mistral AI

[providers.ollama]
model = "llama3.1:70b"  # Larger Llama model

Multiple Provider Configuration

You can configure multiple providers and switch between them:

[providers]
default_provider = "ollama"  # Default for most operations

# Ollama for local, fast responses
[providers.ollama]
model = "llama3.2:3b"
temperature = 0.7

# Databricks for more complex tasks
[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
max_tokens = 4096
temperature = 0.1
use_oauth = true

Then switch providers with:

g3 --provider databricks
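
The same flag works for any configured provider; for example, to force the local model for a single run:

g3 --provider ollama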

Autonomous Mode (Coach-Player)

Use different providers for code review (coach) and implementation (player):

[providers]
default_provider = "ollama"
coach = "databricks"  # Use powerful cloud model for review
player = "ollama"     # Use local model for implementation

[providers.ollama]
model = "qwen2.5:14b"  # Larger local model for coding

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true

This gives you the best of both worlds:

  • Fast local execution for coding tasks
  • Powerful cloud review for quality assurance

Recommended Models

For Coding Tasks

| Model | Size | Speed | Quality | Notes |
|-------------|------|-----------|-----------|-------------------------|
| qwen2.5:7b | 7B | Fast | Excellent | Best balance for coding |
| llama3.2:3b | 3B | Very Fast | Good | Great for quick tasks |
| llama3.1:8b | 8B | Medium | Very Good | Solid all-rounder |
| mistral | 7B | Fast | Good | Good for general use |

For Complex Tasks

| Model | Size | Speed | Quality | Notes |
|--------------|------|-----------|-------------|------------------------------|
| qwen2.5:14b | 14B | Medium | Excellent | Best local model for coding |
| qwen2.5:32b | 32B | Slow | Outstanding | If you have the resources |
| llama3.1:70b | 70B | Very Slow | Outstanding | Requires significant RAM/GPU |

Temperature Settings

Temperature controls randomness in responses:

  • 0.1-0.3: Deterministic, good for code generation
  • 0.5-0.7: Balanced, good for most tasks
  • 0.8-1.0: Creative, good for brainstorming

For example:

[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2  # Focused code generation

Max Tokens

Control response length:

[providers.ollama]
model = "llama3.2"
max_tokens = 1024  # Shorter responses

[providers.ollama]
model = "qwen2.5:7b"
max_tokens = 4096  # Longer, detailed responses

Leave it unset for model defaults (recommended).

Performance Tuning

GPU Acceleration

Ollama automatically uses a GPU when one is available. To check which models are loaded and whether they are running on the GPU:

ollama ps

Quantized Models

For faster responses with less RAM:

[providers.ollama]
model = "llama3.2:3b-q4_0"  # 4-bit quantization

Quantization options:

  • q4_0: 4-bit, fastest, lowest quality
  • q5_0: 5-bit, balanced
  • q8_0: 8-bit, slower, better quality
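
Quantized variants are pulled like any other tag (assuming the tag exists in the Ollama library for the model you want):

ollama pull llama3.2:3b-q4_0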

Multiple Models

You can pull multiple models and switch easily:

ollama pull llama3.2:3b    # Fast for chat
ollama pull qwen2.5:7b     # Better for code
ollama pull mistral        # General purpose

Then change your config:

[providers.ollama]
model = "qwen2.5:7b"  # Just change this line

Troubleshooting

Ollama Not Running

# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama (macOS/Linux)
ollama serve

# Or just run a model (auto-starts)
ollama run llama3.2

Model Not Found

# List available models
ollama list

# Pull the model
ollama pull llama3.2

Slow Responses

  1. Use a smaller model:

    model = "llama3.2:1b"  # Smallest, fastest
    
  2. Use quantized version:

    model = "llama3.2:3b-q4_0"
    
  3. Reduce max_tokens:

    max_tokens = 512
    

Out of Memory

  1. Switch to a smaller model
  2. Use a quantized version (see the example after this list)
  3. Close other applications
  4. Check GPU memory: ollama ps
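
For example, the first two steps combined (model tag taken from the Quantized Models section above):

[providers.ollama]
model = "llama3.2:3b-q4_0"  # smaller model, 4-bit quantized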

Connection Refused

Check base_url is correct:

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434"  # Default

For remote Ollama:

base_url = "http://your-server:11434"

Complete Example Configs

Minimal Local Setup

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2"

[agent]
max_context_length = 8192
enable_streaming = true
timeout_seconds = 60

Optimized for Coding

[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:7b"
temperature = 0.2
max_tokens = 2048

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120

Fast Responses

[providers]
default_provider = "ollama"

[providers.ollama]
model = "llama3.2:3b-q4_0"
temperature = 0.7
max_tokens = 1024

[agent]
max_context_length = 4096
enable_streaming = true
timeout_seconds = 30

High Quality (Requires Good Hardware)

[providers]
default_provider = "ollama"

[providers.ollama]
model = "qwen2.5:32b"
temperature = 0.3
max_tokens = 4096

[agent]
max_context_length = 32768
enable_streaming = true
timeout_seconds = 300

Hybrid (Local + Cloud)

[providers]
default_provider = "ollama"
coach = "databricks"
player = "ollama"

[providers.ollama]
model = "qwen2.5:14b"
temperature = 0.2

[providers.databricks]
host = "https://your-workspace.cloud.databricks.com"
model = "databricks-claude-sonnet-4"
use_oauth = true

[agent]
max_context_length = 16384
enable_streaming = true
timeout_seconds = 120

Environment Variables

You can override config with environment variables:

# Override model
G3_PROVIDERS_OLLAMA_MODEL=qwen2.5:7b g3

# Override base URL
G3_PROVIDERS_OLLAMA_BASE_URL=http://192.168.1.100:11434 g3

# Override default provider
G3_PROVIDERS_DEFAULT_PROVIDER=ollama g3

Best Practices

  1. Start Small: Begin with llama3.2:3b, scale up if needed
  2. Use Quantization: q4_0 or q5_0 for best speed/quality balance
  3. Match Task to Model:
    • Quick edits: 1B-3B models
    • Code generation: 7B-14B models
    • Complex refactoring: 14B-32B models
  4. Temperature for Code: Use 0.1-0.3 for deterministic output
  5. Enable Streaming: Always enable for better UX
  6. Local First: Use Ollama by default, cloud for special cases

Comparison with Other Providers

| Feature | Ollama | Databricks | OpenAI | Anthropic |
|----------------------|--------|-------------|---------|-----------|
| Cost | Free | Paid | Paid | Paid |
| Privacy | Full | Medium | Low | Low |
| Speed (small models) | Fast | Fast | Medium | Medium |
| Speed (large models) | Slow | Fast | Fast | Fast |
| Setup Complexity | Low | Medium | Low | Low |
| Authentication | None | OAuth/Token | API Key | API Key |
| Offline Support | Yes | No | No | No |
| Tool Calling | Yes | Yes | Yes | Yes |

Next Steps

  1. Try different models: ollama pull mistral, ollama pull qwen2.5
  2. Experiment with temperature settings
  3. Set up hybrid config with cloud provider for complex tasks
  4. Share your config in the community!

Getting Help