Fallback
Automatic failover and load distribution across LLM providers
Automatically retry failed requests with backup models. When your primary provider returns a 5xx error, a rate limit (429), or a timeout, the SDK transparently tries the next model in your chain. Your application code doesn't change.
```ts
import { createRuntime } from '@yourgpt/llm-sdk';
import { createFallbackChain } from '@yourgpt/llm-sdk/fallback';
import { createOpenAI } from '@yourgpt/llm-sdk/openai';
import { createAnthropic } from '@yourgpt/llm-sdk/anthropic';

const chain = createFallbackChain({
  models: [
    createOpenAI({ apiKey: '...' }).languageModel('gpt-5.4'),
    createAnthropic({ apiKey: '...' }).languageModel('claude-haiku-4-5'),
  ],
});

const runtime = createRuntime({ adapter: chain });
```

Installation

```bash
npm install @yourgpt/llm-sdk openai @anthropic-ai/sdk
```

Basic Usage
Pass createFallbackChain() as the adapter in createRuntime(). The rest of your server code — streaming, tools, sessions — stays exactly the same.
```ts
import { createRuntime } from '@yourgpt/llm-sdk';
import { createFallbackChain } from '@yourgpt/llm-sdk/fallback';
import { createOpenAI } from '@yourgpt/llm-sdk/openai';
import { createAnthropic } from '@yourgpt/llm-sdk/anthropic';

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const runtime = createRuntime({
  adapter: createFallbackChain({
    models: [
      openai.languageModel('gpt-5.4'),           // tried first
      anthropic.languageModel('claude-haiku-4-5'), // tried if OpenAI fails
    ],
  }),
  systemPrompt: 'You are a helpful assistant.',
});

// Use exactly as normal — no other changes
app.post('/api/chat', async (req, res) => {
  await runtime.stream(req.body).pipeToResponse(res);
});
```

What Triggers Fallback
| Error type | Triggers fallback? |
|---|---|
| 5xx server errors | ✅ Yes |
| 429 rate limit | ✅ Yes |
| Network timeout / connection refused | ✅ Yes |
| 4xx client errors (bad key, bad request) | ❌ No — these are your bugs, not provider failures |
Once content has started streaming to the client, fallback is not attempted. You cannot restart a stream mid-flight.
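The classification above can be sketched as a simple predicate. This is an illustrative approximation, not the SDK's actual source: errors carrying a 5xx or 429 status trigger fallback, other HTTP statuses do not, and anything with no status at all is treated as a network-level failure.

```ts
// Hypothetical shape for a provider error; the SDK's internal type may differ.
interface ProviderError {
  status?: number; // HTTP status, if the provider responded at all
  code?: string;   // e.g. 'ECONNREFUSED' or 'ETIMEDOUT' for network errors
}

function shouldFallback(err: ProviderError): boolean {
  if (err.status !== undefined) {
    // Rate limits and server errors are provider failures; other 4xx are not.
    return err.status === 429 || err.status >= 500;
  }
  // No HTTP status: a timeout, refused connection, or DNS failure.
  return true;
}
```

If you need different behavior, the `retryableErrors` option described below lets you replace this logic entirely.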
Routing Strategies
Control which model is tried first on each request.
Priority (default)

Always tries models in the order defined. The first model handles all traffic until it fails.

```ts
createFallbackChain({
  models: [primaryModel, backupModel],
  strategy: 'priority', // default — can be omitted
});
```

Round-robin

Distributes load evenly. Request 1 starts at model A, request 2 starts at model B, and so on. If the starting model fails, the chain falls through to the next one as usual.
```ts
createFallbackChain({
  models: [openaiModel, anthropicModel],
  strategy: 'round-robin',
});
```

Multi-instance deployments: round-robin state is in-memory by default and resets on restart. For shared state across instances, plug in a custom store (Redis, Upstash, etc.) via the store option.
Per-Model Retries
Retry the same model before moving to the next one. Useful for transient errors like brief rate limits or flaky connections.
```ts
createFallbackChain({
  models: [openaiModel, anthropicModel],
  retries: 2,                  // retry each model up to 2 times
  retryDelay: 500,             // base delay: 500ms
  retryBackoff: 'exponential', // 500ms → 1000ms → 2000ms (default)
  onRetry: ({ model, retryAttempt, maxRetries, delayMs, error }) => {
    console.warn(`[retry] ${model} attempt ${retryAttempt}/${maxRetries} — waiting ${delayMs}ms`);
  },
});
```

Backoff options:
| retryBackoff | Pattern (retryDelay=500) |
|---|---|
| exponential (default) | 500ms → 1000ms → 2000ms |
| fixed | 500ms → 500ms → 500ms |
With retries: 2, each model gets 3 total attempts (1 initial + 2 retries) before the chain moves to the next model.
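The schedule in the table follows a standard doubling rule: with exponential backoff, retry attempt n waits retryDelay × 2^(n−1) milliseconds. A hypothetical helper (not part of the SDK) makes the arithmetic explicit:

```ts
// Illustrative only: reproduces the delay pattern documented above.
function retryDelayMs(
  attempt: number,                     // 1-based retry attempt number
  base: number,                        // the retryDelay option, in ms
  backoff: 'exponential' | 'fixed'
): number {
  // Exponential: base, 2*base, 4*base, ... Fixed: always base.
  return backoff === 'exponential' ? base * 2 ** (attempt - 1) : base;
}

retryDelayMs(1, 500, 'exponential'); // → 500
retryDelayMs(2, 500, 'exponential'); // → 1000
retryDelayMs(3, 500, 'exponential'); // → 2000
```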
Observability Callbacks
Two callbacks give you visibility into what the chain is doing.
```ts
createFallbackChain({
  models: [openaiModel, anthropicModel, googleModel],

  // Fires on each per-model retry (before the wait delay)
  onRetry: ({ model, provider, error, retryAttempt, maxRetries, delayMs }) => {
    console.warn(`[retry] ${provider}/${model} — attempt ${retryAttempt}/${maxRetries}`);
    metrics.increment('llm.retry', { provider, model });
  },

  // Fires when a model is abandoned and the next one is tried
  onFallback: ({ attemptedModel, nextModel, error, attempt }) => {
    console.warn(`[fallback] ${attemptedModel} → ${nextModel}: ${error.message}`);
    metrics.increment('llm.fallback', { from: attemptedModel, to: nextModel });
  },
});
```

Handling Full Failure
When every model in the chain fails, FallbackExhaustedError is thrown. It includes a per-model breakdown of what failed.
```ts
import { FallbackExhaustedError } from '@yourgpt/llm-sdk/fallback';

try {
  await runtime.stream(req.body).pipeToResponse(res);
} catch (err) {
  if (err instanceof FallbackExhaustedError) {
    // Per-model breakdown
    for (const f of err.failures) {
      console.error(
        `${f.provider}/${f.model} failed after ${f.retriesAttempted} retries: ${f.error.message}`
      );
    }
    res.status(503).json({ error: 'All models unavailable. Try again later.' });
  } else {
    throw err; // not fallback exhaustion — let it propagate
  }
}
```

Custom Error Filtering
By default the chain uses sensible rules (5xx, 429, network errors trigger fallback; 4xx does not). Override with retryableErrors for custom logic.
```ts
// Fall back on any error at all
createFallbackChain({
  models: [openaiModel, anthropicModel],
  retryableErrors: () => true,
});

// Or — only fall back on rate limits
createFallbackChain({
  models: [openaiModel, anthropicModel],
  retryableErrors: (err) => {
    return err instanceof Error && /429|rate.?limit/i.test(err.message);
  },
});
```

Shared Routing Store (Multi-Instance)
For round-robin to work correctly across multiple server instances or serverless functions, plug in a shared store.
```ts
import { createFallbackChain, type RoutingStore } from '@yourgpt/llm-sdk/fallback';

// Implement RoutingStore with any backend — Redis, Upstash, Cloudflare KV, etc.
// The SDK ships the interface. You own the implementation.
// `redis` here is your own client instance (e.g. from ioredis or node-redis).
const redisStore: RoutingStore = {
  async get(key) {
    const val = await redis.get(key);
    return val ? Number(val) : undefined;
  },
  async set(key, value) {
    await redis.set(key, String(value));
  },
};

createFallbackChain({
  models: [openaiModel, anthropicModel],
  strategy: 'round-robin',
  store: redisStore,
});
```

The default MemoryRoutingStore is zero-config and works for single-process apps. No store configuration is needed unless you run multiple instances.
With Tools
Tools work transparently across fallback providers. Define your tools once — whichever provider handles the request formats them natively.
```ts
const runtime = createRuntime({
  adapter: createFallbackChain({
    models: [openaiModel, anthropicModel],
  }),
  tools: [
    {
      name: 'get_weather',
      description: 'Get current weather for a city',
      location: 'server',
      inputSchema: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
      handler: async ({ city }) => fetchWeather(city),
    },
  ],
});
```

OpenAI receives tools as function-calling JSON. Anthropic receives them as tool_use blocks. Your handler always runs on your server regardless of which provider responded.
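To make the translation concrete, here is roughly how one tool definition maps onto each provider's wire format. The shapes follow OpenAI's function-calling and Anthropic's tools APIs; the SDK performs this conversion for you, so this is purely illustrative:

```ts
const tool = {
  name: 'get_weather',
  description: 'Get current weather for a city',
  inputSchema: {
    type: 'object',
    properties: { city: { type: 'string' } },
    required: ['city'],
  },
};

// OpenAI: a function-calling entry with the schema under `parameters`
const openaiTool = {
  type: 'function',
  function: {
    name: tool.name,
    description: tool.description,
    parameters: tool.inputSchema,
  },
};

// Anthropic: a flat tool object with the schema under `input_schema`
const anthropicTool = {
  name: tool.name,
  description: tool.description,
  input_schema: tool.inputSchema,
};
```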
Full Configuration Reference
```ts
import { createFallbackChain } from '@yourgpt/llm-sdk/fallback';

createFallbackChain({
  // Required: adapters to try in order
  models: LLMAdapter[],

  // Routing strategy (default: 'priority')
  strategy?: 'priority' | 'round-robin',

  // Pluggable store for round-robin state (default: MemoryRoutingStore)
  store?: RoutingStore,

  // Retries per model before moving to next (default: 0)
  retries?: number,

  // Base delay between retries in ms (default: 500)
  retryDelay?: number,

  // Backoff strategy (default: 'exponential')
  retryBackoff?: 'exponential' | 'fixed',

  // Called on each per-model retry attempt
  onRetry?: (info: RetryInfo) => void,

  // Called when a model is abandoned and next one is tried
  onFallback?: (info: FallbackInfo) => void,

  // Custom predicate to decide which errors trigger fallback/retry
  retryableErrors?: (error: unknown) => boolean,
})
```

Next Steps
- OpenAI — Configure your primary provider
- Anthropic — Add a Claude fallback
- Server Storage — Persist sessions alongside fallback chains