Ollama
Run open-source LLMs locally. Free, private, no API keys needed.
Ollama is well suited for development, testing, and applications that require data privacy.
Setup
1. Install Ollama
# Install with Homebrew
brew install ollama

# Or use the install script
curl -fsSL https://ollama.ai/install.sh | sh

You can also download the installer directly from ollama.ai.
2. Pull a Model
# Llama 3.1 (recommended)
ollama pull llama3.1
# Or smaller models
ollama pull llama3.1:8b
ollama pull mistral
ollama pull codellama
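Models can also be pulled programmatically over Ollama's HTTP API (once the server from the next step is running), which is handy in setup scripts. A minimal sketch; the /api/pull endpoint is part of Ollama's REST API, but check the request fields against your installed version:

// pull-model.ts - pull a model through the Ollama HTTP API instead of the CLI
const OLLAMA_URL = 'http://localhost:11434';

async function pullModel(model: string): Promise<void> {
  // stream: false asks Ollama to respond once the pull has completed
  const res = await fetch(`${OLLAMA_URL}/api/pull`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, stream: false }),
  });
  if (!res.ok) throw new Error(`Pull failed: HTTP ${res.status}`);
  console.log(`Pulled ${model}`);
}

pullModel('llama3.1');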
3. Start Ollama Server

ollama serve
# Server runs on http://localhost:11434
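Before wiring up the provider, it can help to confirm the server is actually reachable. A quick sketch using Ollama's /api/version endpoint (the response shape shown is an assumption based on Ollama's public REST API):

// check-ollama.ts - sanity-check that a local Ollama server is reachable
const OLLAMA_URL = process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434';

async function checkOllama(): Promise<void> {
  try {
    const res = await fetch(`${OLLAMA_URL}/api/version`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    // Expected shape: { "version": "0.x.y" }
    const { version } = (await res.json()) as { version: string };
    console.log(`Ollama is running (version ${version})`);
  } catch (err) {
    console.error(`Could not reach Ollama at ${OLLAMA_URL}:`, err);
  }
}

checkOllama();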
4. Configure Provider

<YourGPTProvider
  runtimeUrl="/api/chat"
  llm={{
    provider: 'ollama',
    model: 'llama3.1',
    baseUrl: 'http://localhost:11434',
  }}
>
  <CopilotChat />
</YourGPTProvider>
Available Models

| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| llama3.1 | 8B | 8GB | General use |
| llama3.1:70b | 70B | 48GB | High quality |
| mistral | 7B | 8GB | Fast |
| codellama | 7B | 8GB | Code |
| llava | 7B | 8GB | Vision |
Larger models need more RAM. Start with 8B models if you have 16GB RAM.
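To see which models are already installed locally (and how much disk they use), you can query Ollama's /api/tags endpoint. A sketch; the field names follow Ollama's REST API, with sizes reported in bytes:

// list-models.ts - list models that are installed locally
const OLLAMA_URL = 'http://localhost:11434';

interface LocalModel {
  name: string; // e.g. "llama3.1:latest"
  size: number; // on-disk size in bytes
}

async function listLocalModels(): Promise<LocalModel[]> {
  const res = await fetch(`${OLLAMA_URL}/api/tags`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as { models: LocalModel[] };
  return data.models;
}

listLocalModels().then((models) => {
  for (const m of models) {
    console.log(`${m.name} - ${(m.size / 1e9).toFixed(1)} GB`);
  }
});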
Configuration Options
llm={{
  provider: 'ollama',
  model: 'llama3.1',
  baseUrl: 'http://localhost:11434',
  temperature: 0.7,
  maxTokens: 4096,
  // Ollama-specific options
  numCtx: 4096,     // Context window size
  numPredict: 128,  // Max tokens to predict
}}
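numCtx and numPredict correspond to Ollama's native num_ctx and num_predict options. How the SDK maps them internally isn't shown here, but for reference, a roughly equivalent raw call to Ollama's /api/chat endpoint looks like this (a sketch against Ollama's REST API, not the SDK):

// raw-chat.ts - the same options expressed as a direct Ollama API call
async function rawChat(): Promise<void> {
  const res = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.1',
      messages: [{ role: 'user', content: 'Hello!' }],
      stream: false,
      options: {
        temperature: 0.7,
        num_ctx: 4096,    // context window (numCtx)
        num_predict: 128, // max tokens to generate (numPredict)
      },
    }),
  });
  const data = await res.json();
  console.log(data.message.content);
}

rawChat();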
Use Cases

Local Development
No API costs during development:
// Use Ollama locally, switch to OpenAI in production
const provider = process.env.NODE_ENV === 'development'
  ? { provider: 'ollama', model: 'llama3.1', baseUrl: 'http://localhost:11434' }
  : { provider: 'openai', model: 'gpt-4o' };

Data Privacy
All data stays on your machine:
// Sensitive data never leaves your network
<YourGPTProvider
  llm={{ provider: 'ollama', model: 'llama3.1' }}
  systemPrompt="Process this confidential data..."
>
  <CopilotChat />
</YourGPTProvider>

Offline Usage
Works without internet:
// Perfect for air-gapped environments
<YourGPTProvider
  llm={{
    provider: 'ollama',
    model: 'llama3.1',
    baseUrl: 'http://internal-server:11434',
  }}
>
  <CopilotChat />
</YourGPTProvider>

Tool Calling
Llama 3.1 supports function calling:
useToolWithSchema({
  name: 'local_search',
  description: 'Search local files',
  schema: z.object({
    query: z.string(),
    path: z.string().optional(),
  }),
  handler: async ({ query, path }) => {
    // Search runs locally too
    const results = await searchLocalFiles(query, path);
    return { success: true, data: results };
  },
});
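The searchLocalFiles helper above is not part of the SDK; it stands in for whatever local search you need. A minimal, purely illustrative Node sketch that matches file names under a directory:

// search-local-files.ts - naive local file-name search for the tool handler above
import { readdir } from 'node:fs/promises';
import { join } from 'node:path';

export async function searchLocalFiles(query: string, path = '.'): Promise<string[]> {
  // recursive: true (Node 20+) returns paths relative to the starting directory
  const entries = await readdir(path, { recursive: true });
  return entries
    .filter((entry) => entry.toLowerCase().includes(query.toLowerCase()))
    .map((entry) => join(path, entry));
}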
Performance Tips

- Use GPU acceleration - Ollama auto-detects NVIDIA/Apple Silicon
- Quantized models - Use :q4 variants for faster inference
- Adjust context - Lower numCtx for faster responses

# Run with less memory
ollama run llama3.1:8b-q4_0
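Quantized tags work anywhere a model name is accepted, including the provider config. For example (assuming the llama3.1:8b-q4_0 tag has been pulled):

llm={{
  provider: 'ollama',
  model: 'llama3.1:8b-q4_0', // quantized variant: less RAM, faster inference
  baseUrl: 'http://localhost:11434',
}}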
Runtime Configuration

// app/api/chat/route.ts
const runtime = createRuntime({
  providers: {
    ollama: {
      baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
    },
  },
});

Next Steps
- Custom Provider - Build your own
- Architecture - How it works