Ollama

Run open-source LLMs locally. Free, private, no API keys needed.

Ollama is perfect for development, testing, or applications requiring data privacy.


Setup

1. Install Ollama

# macOS (Homebrew)
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

Or download the installer directly from ollama.ai.

2. Pull a Model

# Llama 3.1 (recommended)
ollama pull llama3.1

# Or other models
ollama pull llama3.1:8b
ollama pull mistral
ollama pull codellama

3. Start Ollama Server

ollama serve
# Server runs on http://localhost:11434
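
To confirm the server is reachable before wiring it into your app, you can hit Ollama's /api/tags endpoint, which lists the models you have pulled. A minimal TypeScript check (file name is just an example):

// check-ollama.ts - quick health check against a local Ollama server
const res = await fetch('http://localhost:11434/api/tags');
if (!res.ok) throw new Error(`Ollama not reachable: ${res.status}`);
const { models } = await res.json();
console.log('Installed models:', models.map((m: { name: string }) => m.name));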

4. Configure Provider

<YourGPTProvider
  runtimeUrl="/api/chat"
  llm={{
    provider: 'ollama',
    model: 'llama3.1',
    baseUrl: 'http://localhost:11434',
  }}
>
  <CopilotChat />
</YourGPTProvider>

Available Models

Model          Size   RAM Needed   Best For
llama3.1       8B     8GB          General use
llama3.1:70b   70B    48GB         High quality
mistral        7B     8GB          Fast
codellama      7B     8GB          Code
llava          7B     8GB          Vision

Larger models need more RAM. Start with 8B models if you have 16GB RAM.


Configuration Options

llm={{
  provider: 'ollama',
  model: 'llama3.1',
  baseUrl: 'http://localhost:11434',
  temperature: 0.7,
  maxTokens: 4096,
  // Ollama-specific options
  numCtx: 4096,           // Context window
  numPredict: 128,        // Max tokens to predict
}}
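
These options correspond to Ollama's native REST fields (options.num_ctx and options.num_predict). If you ever need to call the Ollama API directly, a rough non-streaming sketch, assuming the server from step 3 is running:

// Direct, non-streaming call to Ollama's /api/chat endpoint
const res = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.1',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: false,
    options: {
      temperature: 0.7,
      num_ctx: 4096,     // corresponds to numCtx above
      num_predict: 128,  // corresponds to numPredict above
    },
  }),
});
const data = await res.json();
console.log(data.message.content);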

Use Cases

Local Development

No API costs during development:

// Use Ollama locally, switch to OpenAI in production
const provider = process.env.NODE_ENV === 'development'
  ? { provider: 'ollama', model: 'llama3.1', baseUrl: 'http://localhost:11434' }
  : { provider: 'openai', model: 'gpt-4o' };
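
The resulting config object can then be passed straight to the provider, just as in the setup example above:

<YourGPTProvider runtimeUrl="/api/chat" llm={provider}>
  <CopilotChat />
</YourGPTProvider>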

Data Privacy

All data stays on your machine:

// Sensitive data never leaves your network
<YourGPTProvider
  llm={{ provider: 'ollama', model: 'llama3.1' }}
  systemPrompt="Process this confidential data..."
>
  <CopilotChat />
</YourGPTProvider>

Offline Usage

Works without internet:

// Perfect for air-gapped environments
<YourGPTProvider
  llm={{
    provider: 'ollama',
    model: 'llama3.1',
    baseUrl: 'http://internal-server:11434',
  }}
>
  <CopilotChat />
</YourGPTProvider>

Tool Calling

Llama 3.1 supports function calling:

useToolWithSchema({
  name: 'local_search',
  description: 'Search local files',
  schema: z.object({
    query: z.string(),
    path: z.string().optional(),
  }),
  handler: async ({ query, path }) => {
    // Search runs locally too
    const results = await searchLocalFiles(query, path);
    return { success: true, data: results };
  },
});
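
searchLocalFiles is not part of the SDK - it stands in for whatever local search you implement. A minimal Node sketch (recursive file-name matching only, purely illustrative):

import { readdir } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical helper: recursively match file names against a query string
async function searchLocalFiles(query: string, path = '.'): Promise<string[]> {
  const entries = await readdir(path, { withFileTypes: true });
  const matches: string[] = [];
  for (const entry of entries) {
    const full = join(path, entry.name);
    if (entry.isDirectory()) {
      matches.push(...(await searchLocalFiles(query, full)));
    } else if (entry.name.toLowerCase().includes(query.toLowerCase())) {
      matches.push(full);
    }
  }
  return matches;
}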

Performance Tips

  1. Use GPU acceleration - Ollama auto-detects NVIDIA/Apple Silicon
  2. Quantized models - Use :q4 variants for faster inference
  3. Adjust context - Lower numCtx for faster responses

# Run a quantized variant with less memory
ollama run llama3.1:8b-q4_0
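
For tip 3, the context and output budget are just the numCtx / numPredict options shown earlier; the values below are illustrative:

// Smaller context window and output budget for snappier local responses
llm={{
  provider: 'ollama',
  model: 'llama3.1',
  numCtx: 2048,
  numPredict: 256,
}}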

Runtime Configuration

// app/api/chat/route.ts
const runtime = createRuntime({
  providers: {
    ollama: {
      baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
    },
  },
});
