# Ollama Provider for g3

A simple, local LLM provider implementation for g3 that connects to Ollama.

## Features

- ✅ **Simple Setup**: No API keys or authentication required
- ✅ **Local Execution**: Runs entirely on your machine
- ✅ **Tool Calling Support**: Native tool calling for compatible models
- ✅ **Streaming**: Full streaming support with real-time responses
- ✅ **Flexible Configuration**: Custom base URL, temperature, and max tokens
- ✅ **Model Discovery**: Automatic detection of available models

## Quick Start

### Prerequisites

1. Install and start Ollama: https://ollama.ai
2. Pull a model: `ollama pull llama3.2`

### Basic Usage

```rust
use g3_providers::{OllamaProvider, LLMProvider, CompletionRequest, Message, MessageRole};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create provider with default settings (localhost:11434)
    let provider = OllamaProvider::new(
        "llama3.2".to_string(),
        None, // base_url: defaults to http://localhost:11434
        None, // max_tokens: optional
        None, // temperature: defaults to 0.7
    )?;

    // Create a simple request
    let request = CompletionRequest {
        messages: vec![Message {
            role: MessageRole::User,
            content: "What is the capital of France?".to_string(),
        }],
        max_tokens: Some(1000),
        temperature: Some(0.7),
        stream: false,
        tools: None,
    };

    // Get completion
    let response = provider.complete(request).await?;
    println!("Response: {}", response.content);
    println!("Tokens: {}", response.usage.total_tokens);

    Ok(())
}
```

### Streaming Example

```rust
use futures_util::StreamExt;

let request = CompletionRequest {
    messages: vec![Message {
        role: MessageRole::User,
        content: "Write a short poem about coding".to_string(),
    }],
    max_tokens: Some(500),
    temperature: Some(0.8),
    stream: true,
    tools: None,
};

let mut stream = provider.stream(request).await?;

while let Some(chunk_result) = stream.next().await {
    match chunk_result {
        Ok(chunk) => {
            print!("{}", chunk.content);
            if chunk.finished {
                println!("\n\nDone!");
                if let Some(usage) = chunk.usage {
                    println!("Total tokens: {}", usage.total_tokens);
                }
            }
        }
        Err(e) => eprintln!("Error: {}", e),
    }
}
```

### Tool Calling Example

```rust
use g3_providers::Tool;
use serde_json::json;

let tools = vec![Tool {
    name: "get_weather".to_string(),
    description: "Get current weather for a location".to_string(),
    input_schema: json!({
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
            }
        },
        "required": ["location"]
    }),
}];

let request = CompletionRequest {
    messages: vec![Message {
        role: MessageRole::User,
        content: "What's the weather in Paris?".to_string(),
    }],
    max_tokens: Some(500),
    temperature: Some(0.5),
    stream: false,
    tools: Some(tools),
};

let response = provider.complete(request).await?;
println!("Response: {}", response.content);
```

### Custom Ollama Host

```rust
// Connect to a remote Ollama instance
let provider = OllamaProvider::new(
    "llama3.2".to_string(),
    Some("http://192.168.1.100:11434".to_string()),
    None,
    None,
)?;
```

### Fetch Available Models

```rust
// Discover what models are available
let models = provider.fetch_available_models().await?;
println!("Available models:");
for model in models {
    println!("  - {}", model);
}
```

## Supported Models

The provider works with any Ollama model, including:

- **llama3.2** (1B, 3B) - Meta's latest Llama models
- **llama3.1** (8B, 70B, 405B) - Previous generation
- **qwen2.5** (7B, 14B, 32B) - Alibaba's Qwen models
- **mistral** - Mistral AI models
- **mixtral** - Mixture of experts model
- **phi3** - Microsoft's Phi-3
- **gemma2** - Google's Gemma 2

## Configuration

### Constructor Parameters

```rust
OllamaProvider::new(
    model: String,             // Model name (e.g., "llama3.2")
    base_url: Option<String>,  // Ollama API URL (default: http://localhost:11434)
    max_tokens: Option<u32>,   // Maximum tokens to generate (optional)
    temperature: Option<f32>,  // Sampling temperature (default: 0.7)
)
```

### Request Options

```rust
CompletionRequest {
    messages: Vec<Message>,    // Conversation history
    max_tokens: Option<u32>,   // Override provider's max_tokens
    temperature: Option<f32>,  // Override provider's temperature
    stream: bool,              // Enable streaming responses
    tools: Option<Vec<Tool>>,  // Tools for function calling
}
```

## Comparison with Other Providers

| Feature | Ollama | OpenAI | Anthropic | Databricks |
|---------|--------|--------|-----------|------------|
| Local Execution | ✅ | ❌ | ❌ | ❌ |
| Authentication | None | API Key | API Key | OAuth/Token |
| Tool Calling | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Cost | Free | Paid | Paid | Paid |
| Privacy | High | Low | Low | Medium |

## Implementation Details

### API Endpoints

- **Chat Completion**: `POST /api/chat`
- **Model List**: `GET /api/tags`

### Response Format

Ollama uses a simple JSON-per-line streaming format:

```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" there"},"done":false}
{"done":true,"prompt_eval_count":10,"eval_count":20}
```
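To make the line-by-line format concrete, here is a minimal sketch of parsing a single streamed line with `serde_json`. The struct and function names (`StreamLine`, `StreamMessage`, `parse_stream_line`) are illustrative only and are not the provider's internal types.

```rust
use serde::Deserialize;

// Illustrative shape of one streamed line; not the provider's internal types.
#[derive(Deserialize)]
struct StreamLine {
    message: Option<StreamMessage>, // absent on the final usage-only line
    done: bool,
    prompt_eval_count: Option<u64>, // reported on the final line
    eval_count: Option<u64>,        // reported on the final line
}

#[derive(Deserialize)]
struct StreamMessage {
    content: String,
}

/// Parse one line of the stream: returns the content delta, whether the
/// stream has finished, and the total token count when it is reported.
fn parse_stream_line(line: &str) -> serde_json::Result<(String, bool, Option<u64>)> {
    let parsed: StreamLine = serde_json::from_str(line)?;
    let content = parsed.message.map(|m| m.content).unwrap_or_default();
    let total_tokens = match (parsed.prompt_eval_count, parsed.eval_count) {
        (Some(prompt), Some(eval)) => Some(prompt + eval),
        _ => None,
    };
    Ok((content, parsed.done, total_tokens))
}
```

The provider applies this kind of per-line parsing to the `POST /api/chat` response body and accumulates the deltas into the chunks shown in the streaming example above.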
### Tool Call Format

Tool calls are returned in the message structure:

```json
{
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_weather",
          "arguments": {"location": "Paris", "unit": "celsius"}
        }
      }
    ]
  },
  "done": true
}
```

## Troubleshooting

### Connection Errors

If you see connection errors, ensure Ollama is running:

```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama (if needed)
ollama serve
```

### Model Not Found

Pull the model first:

```bash
ollama pull llama3.2
ollama list  # Check available models
```

### Performance Issues

- Use smaller models (1B, 3B) for faster responses
- Reduce `max_tokens` to limit generation length
- Enable GPU acceleration if available
- Consider quantized models (e.g., `llama3.2:3b-q4_0`)

## Testing

Run the included tests:

```bash
cargo test --package g3-providers ollama
```

All tests should pass:

```
running 4 tests
test ollama::tests::test_custom_base_url ... ok
test ollama::tests::test_message_conversion ... ok
test ollama::tests::test_provider_creation ... ok
test ollama::tests::test_tool_conversion ... ok
```

## Architecture

The provider follows the same architecture as other g3 providers:

1. **OllamaProvider**: Main struct implementing the `LLMProvider` trait
2. **Request/Response Structures**: Internal types for the Ollama API
3. **Streaming Parser**: Handles line-by-line JSON parsing
4. **Tool Call Handling**: Accumulates and converts tool calls
5. **Error Handling**: Robust error handling with retries

## Contributing

The provider is part of the g3-providers crate. To contribute:

1. Add features to `ollama.rs`
2. Update tests
3. Run `cargo test --package g3-providers`
4. Update this documentation

## License

Same as the g3 project.