# Token Tracking

Monitor context window usage in real time with `useContextStats`.

> **Experimental** — This feature is under active development. APIs may change before stable release.

Monitor how much of the AI's context window is being used — broken down by message history, system prompt, tools, and injected context.
## `useContextStats`

```tsx
import { useContextStats } from "@yourgpt/copilot-sdk/react";

function ContextMonitor() {
  const {
    contextUsage,      // Full breakdown by bucket
    totalTokens,       // Convenience: total estimated tokens
    usagePercent,      // Convenience: window fill 0–1
    contextChars,      // Characters contributed by AI context injections
    toolCount,         // Number of currently registered tools
    messageCount,      // Visible (non-system) messages
    lastResponseUsage, // Token usage from the last assistant message
  } = useContextStats();

  return (
    <div>
      <p>{Math.round(usagePercent * 100)}% of context used</p>
      <p>{totalTokens} tokens / {toolCount} tools</p>
      {lastResponseUsage && (
        <p>Last turn: {lastResponseUsage.total_tokens} tokens</p>
      )}
    </div>
  );
}
```

### Return type
```ts
interface ContextStats {
  contextUsage: ContextUsage | null; // null until first message
  totalTokens: number;
  usagePercent: number; // 0 until first message
  contextChars: number;
  toolCount: number;
  messageCount: number;
  lastResponseUsage: MessageTokenUsage | null;
}

interface MessageTokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}
```

## Token Counting Utilities
Two tiers — pick the right trade-off between speed and accuracy.
### Tier 1: Fast (zero dependencies)

Uses a `chars / 3.5` heuristic. ~85–90% accurate for English. Always available, no bundle cost.
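The heuristic itself is simple enough to sketch inline. A rough equivalent (the function name and exact rounding here are illustrative, not the SDK's internals):

```typescript
// Rough sketch of the fast tier's chars / 3.5 heuristic.
// Name and rounding are illustrative; the SDK's internals may differ.
function estimateTokensRoughly(text: string): number {
  return Math.ceil(text.length / 3.5);
}

const n = estimateTokensRoughly("Hello world"); // 11 chars ≈ 4 tokens
```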
```tsx
import {
  estimateTokensFast,
  estimateMessageTokens,
  estimateMessagesTokens,
} from "@yourgpt/copilot-sdk/react";

const tokens = estimateTokensFast("Hello world"); // fast, synchronous
const msgTokens = estimateMessagesTokens(llmMessages);
```

### Tier 2: Accurate (lazy-loaded)
Uses `gpt-tokenizer` with the `o200k_base` encoding. Lazy-loaded only when called — no upfront bundle cost.
```tsx
import {
  countTokensAccurate,
  countMessagesTokensAccurate,
} from "@yourgpt/copilot-sdk/react";

// Only loads gpt-tokenizer on first call
const tokens = await countTokensAccurate("Hello world");
const msgTokens = await countMessagesTokensAccurate(llmMessages);
```

### Set estimation mode in `useMessageHistory`

```tsx
useMessageHistory({ tokenEstimation: "accurate" }); // "fast" | "accurate" | "off"
```

## Context Budget Enforcement
> **Experimental** — APIs may change before stable release.

Automatically enforce per-bucket token limits so the prompt never overflows. Configured via `optimization.contextBudget` on `<CopilotProvider>`.
```tsx
<CopilotProvider
  optimization={{
    contextBudget: {
      enabled: true,
      budget: {
        contextWindowTokens: 128000, // Total window size
        inputHeadroomRatio: 0.75,   // Use 75% for input, reserve rest for output
        systemPromptShare: 0.15,    // 15% of input budget for system prompt
        historyShare: 0.50,         // 50% for conversation history
        toolResultsShare: 0.30,     // 30% for tool results
        toolDefinitionsShare: 0.05, // 5% for tool definitions
      },
      enforcement: {
        mode: "truncate", // "warn" | "truncate" | "error"
        onBudgetExceeded: (usage) => {
          console.warn("Budget exceeded", usage.total.percent);
        },
      },
      monitoring: {
        enabled: true,
        onUsageUpdate: (usage) => trackMetrics(usage),
      },
    },
  }}
>
```

| `enforcement.mode` | Behaviour |
| --- | --- |
| `"warn"` | Logs a warning, sends the full payload |
| `"truncate"` | Trims content to fit — history first, then tool results |
| `"error"` | Throws before sending if the budget is exceeded |
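The shares compose multiplicatively: each bucket receives its share of the input budget, which is itself `inputHeadroomRatio` of the full window. A sketch of the arithmetic for the config above (variable names are illustrative, not SDK APIs):

```typescript
// Derive per-bucket token budgets from the config above.
// Names are illustrative; the SDK computes this internally.
const contextWindowTokens = 128_000;
const inputBudget = Math.floor(contextWindowTokens * 0.75); // 96,000 for input

const bucketBudgets = {
  systemPrompt: Math.floor(inputBudget * 0.15),    // 14,400
  history: Math.floor(inputBudget * 0.5),          // 48,000
  toolResults: Math.floor(inputBudget * 0.3),      // 28,800
  toolDefinitions: Math.floor(inputBudget * 0.05), //  4,800
};
// The shares sum to 1.0, so the buckets exactly fill the input budget.
```

Because the shares sum to 1.0, raising one share means lowering another; otherwise the buckets would overcommit the input budget.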
## Tool Result Truncation

> **Experimental** — APIs may change before stable release.

Prevent a single large tool response from consuming the entire context. Configure under `optimization.toolResultConfig`:
```tsx
<CopilotProvider
  optimization={{
    toolResultConfig: {
      truncation: {
        enabled: true,
        strategy: "head-tail", // Keep first + last chunk, drop middle
        maxContextShare: 0.3,  // Tool results can use at most 30% of context
        hardMaxChars: 40000,   // Absolute cap regardless of context size
        minKeepChars: 2000,    // Always keep at least this much per result
        preserveErrors: true,  // Never truncate error results
      },
    },
  }}
>
```

| `strategy` | Behaviour |
| --- | --- |
| `"head-tail"` | Keeps the beginning and end of the result; drops the middle |
| `"head"` | Keeps only the beginning |
| `"tail"` | Keeps only the end |
A truncation notice is appended so the model knows the result was trimmed.
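The `"head-tail"` strategy can be sketched as a plain string operation (a simplified illustration: the SDK's real chunking and notice text may differ):

```typescript
// Simplified head-tail truncation: keep the start and end, drop the
// middle, and splice in a notice so the model knows content was cut.
// Illustrative only; not the SDK's actual implementation.
function truncateHeadTail(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const notice = "\n...[truncated]...\n";
  const keep = Math.max(maxChars - notice.length, 0);
  const head = text.slice(0, Math.ceil(keep / 2));
  const tail = text.slice(text.length - Math.floor(keep / 2));
  return head + notice + tail;
}
```

Keeping both ends matters for tool output: the head usually carries headers or schema, while the tail often carries totals, status codes, or the final rows of a listing.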
## Example: Context Usage Indicator

```tsx
import { useContextStats, useMessageHistory } from "@yourgpt/copilot-sdk/react";

export function ChatPanel() {
  const { tokenUsage, isCompacting, compactSession } = useMessageHistory();
  const { usagePercent, toolCount } = useContextStats();

  return (
    <div>
      <p>
        {Math.round(usagePercent * 100)}% context used · {toolCount} tools
      </p>
      {tokenUsage.isApproaching && (
        <button onClick={() => compactSession()}>Compact now</button>
      )}
      {isCompacting && <span>Summarizing history…</span>}
    </div>
  );
}
```