Together AI
Cost-effective open-source model inference — Llama, DeepSeek, Qwen, Gemma and more
Together AI is a high-performance inference platform for open-source models. It offers fast, scalable serving for Llama, DeepSeek, Qwen, Gemma, Mistral and many others through an OpenAI-compatible API.
Setup
1. Install packages
```bash
npm install @yourgpt/copilot-sdk @yourgpt/llm-sdk openai
```

Together AI uses an OpenAI-compatible API, so the openai package is the only peer dependency needed.
2. Get API key
Sign up and get your API key at api.together.xyz/settings/api-keys.
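Before wiring the key into an app, it helps to fail fast when it is missing. A minimal sketch — `requireEnv` is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper: read a required environment variable or fail loudly.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing ${name}. Add it to your environment or .env file.`);
  }
  return value;
}

// const apiKey = requireEnv('TOGETHER_API_KEY');
```

Calling this once at startup surfaces a missing key immediately instead of as an opaque 401 at request time.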
3. Add environment variable
```bash
TOGETHER_API_KEY=your-key-here
```

4. Streaming API route

```typescript
import { streamText } from '@yourgpt/llm-sdk';
import { togetherai } from '@yourgpt/llm-sdk/togetherai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo'),
    system: 'You are a helpful assistant.',
    messages,
  });

  return result.toTextStreamResponse();
}
```

5. Generate text
```typescript
import { generateText } from '@yourgpt/llm-sdk';
import { togetherai } from '@yourgpt/llm-sdk/togetherai';

const result = await generateText({
  model: togetherai('deepseek-ai/DeepSeek-V3'),
  prompt: 'Explain quantum entanglement simply.',
});

console.log(result.text);
```

Available Models
```typescript
// DeepSeek
togetherai('deepseek-ai/DeepSeek-V3')                       // 128K ctx, tools
togetherai('deepseek-ai/DeepSeek-V3.1')                     // 128K ctx, tools
togetherai('deepseek-ai/DeepSeek-R1')                       // reasoning model

// Llama
togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo')       // 131K ctx, fast
togetherai('meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo') // 130K ctx
togetherai('meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo')
togetherai('meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo')

// Qwen
togetherai('Qwen/Qwen3.5-397B-A17B')                        // 262K ctx
togetherai('Qwen/Qwen3.5-9B')

// Gemma
togetherai('google/gemma-4-31B-it')

// Kimi
togetherai('moonshotai/Kimi-K2.5')                          // 262K ctx

// GLM
togetherai('zai-org/GLM-5.1')                               // 202K ctx
```

Any model ID listed on together.ai/models works.
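When an app targets several of these models, it can help to centralize the choice in one place. A sketch — the task categories and the mapping are my own, using model IDs from the list above:

```typescript
// Illustrative mapping from task type to a Together AI model ID.
// The categories are assumptions, not an SDK feature.
type Task = 'chat' | 'reasoning' | 'cheap';

function modelFor(task: Task): string {
  switch (task) {
    case 'reasoning':
      return 'deepseek-ai/DeepSeek-R1';
    case 'cheap':
      return 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo';
    default:
      return 'meta-llama/Llama-3.3-70B-Instruct-Turbo';
  }
}
```

The returned ID is passed straight to `togetherai()`, e.g. `togetherai(modelFor('chat'))`.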
Configuration
```typescript
import { togetherai } from '@yourgpt/llm-sdk/togetherai';

// Explicit API key
const model = togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo', {
  apiKey: 'your-key',
});

// Custom base URL (e.g. self-hosted or proxy)
const proxiedModel = togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo', {
  baseURL: 'https://my-proxy.example.com/v1',
});
```

Tool Calling
Many Together AI models support tool calling:
```typescript
import { generateText, tool } from '@yourgpt/llm-sdk';
import { togetherai } from '@yourgpt/llm-sdk/togetherai';
import { z } from 'zod';

const result = await generateText({
  model: togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo'),
  prompt: 'What is the weather in Miami?',
  tools: {
    getWeather: tool({
      description: 'Get weather for a city',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, temperature: 82, condition: 'sunny' }),
    }),
  },
  maxSteps: 5,
});
```

deepseek-ai/DeepSeek-R1 is a reasoning model and does not support tool calling. Use DeepSeek-V3 or a Llama model for tool use.
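That caveat can be encoded as a small guard so tool-use requests never reach a model that cannot handle them. A sketch — the deny-list and the fallback choice are illustrative, based only on the note above:

```typescript
// Models on this page noted as lacking tool-calling support (illustrative list).
const NO_TOOL_SUPPORT = new Set(['deepseek-ai/DeepSeek-R1']);

function supportsTools(modelId: string): boolean {
  return !NO_TOOL_SUPPORT.has(modelId);
}

// Fall back to DeepSeek-V3 when the preferred model cannot call tools.
function toolModel(preferred: string): string {
  return supportsTools(preferred) ? preferred : 'deepseek-ai/DeepSeek-V3';
}
```

Pass `togetherai(toolModel(preferredId))` wherever a request includes `tools`, and the rest of the call site stays unchanged.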
With Copilot UI
```tsx
'use client';

import { CopilotProvider } from '@yourgpt/copilot-sdk/react';

export function Providers({ children }: { children: React.ReactNode }) {
  return (
    <CopilotProvider runtimeUrl="/api/chat">
      {children}
    </CopilotProvider>
  );
}
```

Next Steps
- Fireworks - Another fast open-source model platform
- OpenRouter - Access 500+ models with one API key
- Fallback Chain - Automatic failover between providers
- generateText() - Full LLM SDK reference