# Token Tracking

Monitor context window usage in real time with `useContextStats`.

> **Experimental** — This feature is under active development. APIs may change before stable release.

Monitor how much of the AI's context window is being used — broken down by message history, system prompt, tools, and injected context.
## `useContextStats`

```tsx
import { useContextStats } from "@yourgpt/copilot-sdk/react";

function ContextMonitor() {
  const {
    contextUsage,      // Full breakdown by bucket
    totalTokens,       // Convenience: total estimated tokens
    usagePercent,      // Convenience: window fill 0–1
    contextChars,      // Characters contributed by AI context injections
    toolCount,         // Number of currently registered tools
    messageCount,      // Visible (non-system) messages
    lastResponseUsage, // Token usage from the last assistant message
  } = useContextStats();

  return (
    <div>
      <p>{Math.round(usagePercent * 100)}% of context used</p>
      <p>{totalTokens} tokens / {toolCount} tools</p>
      {lastResponseUsage && (
        <p>Last turn: {lastResponseUsage.total_tokens} tokens</p>
      )}
    </div>
  );
}
```

### Return type
```ts
interface ContextStats {
  contextUsage: ContextUsage | null; // null until first message
  totalTokens: number;
  usagePercent: number; // 0 until first message
  contextChars: number;
  toolCount: number;
  messageCount: number;
  lastResponseUsage: MessageTokenUsage | null;
}

interface MessageTokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}
```

## Token Counting Utilities
Two tiers — pick the right trade-off between speed and accuracy.
### Tier 1: Fast (zero dependencies)

Uses a `chars / 3.5` heuristic. ~85–90% accurate for English. Always available, no bundle cost.
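The heuristic itself is simple enough to sketch inline. A rough equivalent (the function name and exact rounding here are illustrative, not the SDK's internals):

```typescript
// Rough sketch of the fast tier's chars / 3.5 heuristic.
// Name and rounding are illustrative; the SDK's internals may differ.
function estimateTokensRoughly(text: string): number {
  return Math.ceil(text.length / 3.5);
}

const n = estimateTokensRoughly("Hello world"); // 11 chars ≈ 4 tokens
```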
```tsx
import {
  estimateTokensFast,
  estimateMessageTokens,
  estimateMessagesTokens,
} from "@yourgpt/copilot-sdk/react";

const tokens = estimateTokensFast("Hello world"); // fast, synchronous
const msgTokens = estimateMessagesTokens(llmMessages);
```

### Tier 2: Accurate (lazy-loaded)
Uses `gpt-tokenizer` with the `o200k_base` encoding. Lazy-loaded only when called — no upfront bundle cost.
```tsx
import {
  countTokensAccurate,
  countMessagesTokensAccurate,
} from "@yourgpt/copilot-sdk/react";

// Only loads gpt-tokenizer on first call
const tokens = await countTokensAccurate("Hello world");
const msgTokens = await countMessagesTokensAccurate(llmMessages);
```

### Set estimation mode in `useMessageHistory`

```tsx
useMessageHistory({ tokenEstimation: "accurate" }); // "fast" | "accurate" | "off"
```

## Context Budget Enforcement
> **Experimental** — APIs may change before stable release.

Automatically enforce per-bucket token limits so the prompt never overflows. Configured via `optimization.contextBudget` on `<CopilotProvider>`.
```tsx
<CopilotProvider
  optimization={{
    contextBudget: {
      enabled: true,
      budget: {
        contextWindowTokens: 128000, // Total window size
        inputHeadroomRatio: 0.75,   // Use 75% for input, reserve rest for output
        systemPromptShare: 0.15,    // 15% of input budget for system prompt
        historyShare: 0.50,         // 50% for conversation history
        toolResultsShare: 0.30,     // 30% for tool results
        toolDefinitionsShare: 0.05, // 5% for tool definitions
      },
      enforcement: {
        mode: "truncate", // "warn" | "truncate" | "error"
        onBudgetExceeded: (usage) => {
          console.warn("Budget exceeded", usage.total.percent);
        },
      },
      monitoring: {
        enabled: true,
        onUsageUpdate: (usage) => trackMetrics(usage),
      },
    },
  }}
>
```

| `enforcement.mode` | Behaviour |
| --- | --- |
| `"warn"` | Logs a warning, sends the full payload |
| `"truncate"` | Trims content to fit — history first, then tool results |
| `"error"` | Throws before sending if the budget is exceeded |
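The shares compose multiplicatively: each bucket receives its share of the input budget, which is itself `inputHeadroomRatio` of the full window. A sketch of the arithmetic for the config above (variable names are illustrative, not SDK APIs):

```typescript
// Derive per-bucket token budgets from the config above.
// Names are illustrative; the SDK computes this internally.
const contextWindowTokens = 128_000;
const inputBudget = Math.floor(contextWindowTokens * 0.75); // 96,000 for input

const bucketBudgets = {
  systemPrompt: Math.floor(inputBudget * 0.15),    // 14,400
  history: Math.floor(inputBudget * 0.5),          // 48,000
  toolResults: Math.floor(inputBudget * 0.3),      // 28,800
  toolDefinitions: Math.floor(inputBudget * 0.05), //  4,800
};
// The shares sum to 1.0, so the buckets exactly fill the input budget.
```

Because the shares sum to 1.0, raising one share means lowering another; otherwise the buckets would overcommit the input budget.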
## Tool Result Truncation

> **Experimental** — APIs may change before stable release.

Prevent a single large tool response from consuming the entire context. Configure under `optimization.toolResultConfig`:
```tsx
<CopilotProvider
  optimization={{
    toolResultConfig: {
      truncation: {
        enabled: true,
        strategy: "head-tail", // Keep first + last chunk, drop middle
        maxContextShare: 0.3,  // Tool results can use at most 30% of context
        hardMaxChars: 40000,   // Absolute cap regardless of context size
        minKeepChars: 2000,    // Always keep at least this much per result
        preserveErrors: true,  // Never truncate error results
      },
    },
  }}
>
```

| `strategy` | Behaviour |
| --- | --- |
| `"head-tail"` | Keeps the beginning and end of the result; drops the middle |
| `"head"` | Keeps only the beginning |
| `"tail"` | Keeps only the end |
A truncation notice is appended so the model knows the result was trimmed.
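The `"head-tail"` strategy can be sketched as a plain string operation (a simplified illustration: the SDK's real chunking and notice text may differ):

```typescript
// Simplified head-tail truncation: keep the start and end, drop the
// middle, and splice in a notice so the model knows content was cut.
// Illustrative only; not the SDK's actual implementation.
function truncateHeadTail(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const notice = "\n...[truncated]...\n";
  const keep = Math.max(maxChars - notice.length, 0);
  const head = text.slice(0, Math.ceil(keep / 2));
  const tail = text.slice(text.length - Math.floor(keep / 2));
  return head + notice + tail;
}
```

Keeping both ends matters for tool output: the head usually carries headers or schema, while the tail often carries totals, status codes, or the final rows of a listing.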
## Example: Context Usage Indicator

```tsx
import { useContextStats, useMessageHistory } from "@yourgpt/copilot-sdk/react";

export function ChatPanel() {
  const { tokenUsage, isCompacting, compactSession } = useMessageHistory();
  const { usagePercent, toolCount } = useContextStats();

  return (
    <div>
      <p>
        {Math.round(usagePercent * 100)}% context used · {toolCount} tools
      </p>
      {tokenUsage.isApproaching && (
        <button onClick={() => compactSession()}>Compact now</button>
      )}
      {isCompacting && <span>Summarizing history…</span>}
    </div>
  );
}
```