LLM Integration Reference

Canonical context for AI assistants implementing UsageTap integrations. Link to this page inside prompt kits instead of duplicating details.

Short Include Snippet (copy into your prompts):

(Reference: https://usagetap.com/llmreference — contains API flow, endpoints, auth, usage fields, SDK patterns, security rules)

📚 Complete Documentation:

• Integration Guide — One-shot integration examples for Next.js, Express, Node.js, React
• API Reference — Complete REST API & SDK documentation with all endpoints and types
• API & SDK Examples — Interactive code examples and prompt kits

0. API vs SDK: Quick Decision Guide

Choose your integration approach based on your environment and requirements:

✅ Use SDK When:

• You're using JavaScript/TypeScript in Node.js 18+, Next.js, Express, or edge runtimes
• You want automatic begin/end handling with error recovery
• You're integrating with OpenAI or OpenRouter (wrapOpenAI, wrapFetch)
• You need streaming support with automatic finalization
• You want built-in retries and idempotency handling
• You're building production apps that need reliability

Quick Start:

npm install @usagetap/sdk openai

import OpenAI from "openai";
import { UsageTapClient } from "@usagetap/sdk";
import { wrapOpenAI } from "@usagetap/sdk/openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!
});

const ai = wrapOpenAI(openai, usageTap, {
  defaultContext: {
    customerId: "user_123",
    feature: "chat.send"
  }
});

// That's it: call `ai` exactly like the OpenAI client; begin/end tracking is automatic

🔧 Use Direct API When:

• You're using Python, Go, Ruby, Java, or other languages
• You need to integrate from serverless functions that don't run Node.js
• You want full control over HTTP requests and retry logic
• You're integrating non-OpenAI vendors (Anthropic, Cohere, custom APIs)
• You need minimal dependencies or custom HTTP clients
• You're building CLIs, background jobs, or batch processors

Quick Start:

POST https://api.usagetap.com/call_begin
Headers:
  x-api-key: YOUR_KEY
  Accept: application/vnd.usagetap.v1+json
  Content-Type: application/json
Body:
  {
    "customerId": "user_123",
    "feature": "chat.send",
    "idempotencyKey": "uuid-here",
    "requested": { "standard": true }
  }

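# Use begin.data.callId for call_end
POST https://api.usagetap.com/call_end
Headers:
  x-api-key: YOUR_KEY
  Accept: application/vnd.usagetap.v1+json
  Content-Type: application/json
Body:
  {
    "callId": "FROM_begin",
    "modelUsed": "gpt-4o-mini",
    "inputTokens": 123,
    "responseTokens": 456,
    "responseStatusCode": 200
  }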

Every raw HTTP request must include the canonical media type header (Accept: application/vnd.usagetap.v1+json) along with Content-Type: application/json; the API responds with 406 Not Acceptable if the header is missing. The SDK sets these automatically.
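For the direct-API path in a JavaScript runtime, the header rules above can be centralized in one helper. This is a minimal sketch assuming Node 18+ (global fetch); usageTapHeaders and usageTapPost are hypothetical names, not part of the SDK, and the error handling is deliberately simple.

```typescript
// Every UsageTap POST carries the API key, the versioned Accept media type,
// and a JSON Content-Type; a missing Accept header yields 406 Not Acceptable.
const USAGETAP_MEDIA_TYPE = "application/vnd.usagetap.v1+json";

function usageTapHeaders(apiKey: string): Record<string, string> {
  return {
    "x-api-key": apiKey,
    Accept: USAGETAP_MEDIA_TYPE,
    "Content-Type": "application/json",
  };
}

// Hypothetical wrapper: POSTs a JSON body and returns the parsed JSON response.
async function usageTapPost<T>(
  path: "/call_begin" | "/call_end",
  body: unknown,
  apiKey: string,
  baseUrl = "https://api.usagetap.com",
): Promise<T> {
  const res = await fetch(`${baseUrl}${path}`, {
    method: "POST",
    headers: usageTapHeaders(apiKey),
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Surface non-2xx responses (such as 406) instead of treating them as success.
    throw new Error(`UsageTap ${path} failed with HTTP ${res.status}`);
  }
  return (await res.json()) as T;
}
```

A begin/end pair would call usageTapPost with "/call_begin" first, then pass begin.data.callId into the "/call_end" body.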

💡 Hybrid Approach

You can mix approaches! Use the SDK for your main application and direct API calls for background jobs, webhooks, or services in other languages. Both share the same authentication and data model.

Next.js API routes: Use SDK with wrapOpenAI
Python data pipeline: Use direct API with requests/httpx
Go microservice: Use direct API with net/http
Express middleware: Use SDK Express adapter

Still unsure? Start with the SDK if you're on Node.js—it handles all the complexity. Fall back to direct API only if you need a different language or have specific requirements the SDK doesn't cover.

1. Core Lifecycle

All tracked AI work follows a strict begin → vendor call → end lifecycle. You MUST always finalize a call (even on errors / aborts) so usage, entitlements, and limits stay consistent.

Before kicking off that flow, idempotently provision the customer with POST /customers (or usageTap.createCustomer()). The response mirrors call_begin, includes newCustomer, and is safe to run on every login or provisioning hook.

POST /call_begin
Invoke vendor (OpenAI, OpenRouter, other) — may stream or return JSON
POST /call_end with actual usage (or error)

Include Accept: application/vnd.usagetap.v1+json on every begin/end POST (and Content-Type: application/json); omitting the media type returns 406 Not Acceptable. The JavaScript SDK handles this header automatically.

Checking usage without a call: To query current quota, plan, or balances without creating a call record, use GET /customers/{customerId}/usage or the SDK's checkUsage() method. This returns the same rich snapshot (meters, entitlements, subscription details) without tracking vendor usage.

2. Authentication

Send x-api-key: YOUR_API_KEY or Authorization: Bearer YOUR_API_KEY.
Always include Accept: application/vnd.usagetap.v1+json plus Content-Type: application/json on /call_begin, /call_end, and other POSTs; requests without the media type receive 406 Not Acceptable.
Keys are server-side only. Never expose them in browser bundles or public repos.
Environment variable examples: USAGETAP_API_KEY, USAGETAP_BASE_URL (optional override, default https://api.usagetap.com).
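The environment variables above can be read once at startup. A sketch: loadUsageTapConfig is a hypothetical helper that fails fast when USAGETAP_API_KEY is missing, applies the documented default base URL, and refuses to run where a window global exists (a browser), since keys are server-side only.

```typescript
interface UsageTapConfig {
  apiKey: string;
  baseUrl: string;
}

// Hypothetical loader; pass process.env in Node.
function loadUsageTapConfig(env: Record<string, string | undefined>): UsageTapConfig {
  if ("window" in globalThis) {
    throw new Error("UsageTap keys are server-side only; do not load them in a browser.");
  }
  const apiKey = env.USAGETAP_API_KEY;
  if (!apiKey) {
    throw new Error("USAGETAP_API_KEY is not set");
  }
  // USAGETAP_BASE_URL is an optional override; strip trailing slashes for clean joins.
  const baseUrl = (env.USAGETAP_BASE_URL || "https://api.usagetap.com").replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```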

3. /call_begin Request Body

{
  "customerId": "CUSTOMER_123",        // required; stable per end-user / account
  "feature": "chat.send",              // optional but recommended; granular feature tag
  "tags": ["production", "web"],       // optional; classification / segmentation
  "customerName": "Ada Lovelace",      // optional; friendly display name
  "customerEmail": "ada@example.com",  // optional; contact email (validated server-side)
  "stripeCustomerId": "cus_123",       // optional; helps sync metering with Stripe
  "requested": {
    "standard": true,                   // ask for base tier access (maps to gpt-5-mini)
    "premium": true,                    // ask for premium tier access (maps to gpt-5)
    "audio": true,                      // ask for audio generation allowance
    "image": false,                     // ask for image generation allowance
    "search": true,                     // request web search tool allowance
    "reasoningLevel": "HIGH"            // maximum reasoning tier desired
  },
  "idempotencyKey": "UUID_OR_STABLE_KEY"  // optional but strongly recommended (also accepts deprecated "idempotency")
}

Response includes callId, entitlement decisions (allowed.*), entitlement hints, plan + subscription snapshots, remaining meter ratios, balances, and any sanitized profile fields echoed back (including stripeCustomerId for chaining to /call_end).

Fields such as feature, customerName, customerEmail, and stripeCustomerId are optional. Supply them when you have clean data; otherwise leave them out, and UsageTap processes the call without them.
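Because the optional profile fields should only be sent when clean, a typed builder can strip empty values before the request goes out. The field names come from the body shown above; CallBeginBody and buildCallBegin are hypothetical names, and treating whitespace-only strings as "not clean" is an assumption.

```typescript
type ReasoningLevel = "NONE" | "LOW" | "MEDIUM" | "HIGH";

interface CallBeginBody {
  customerId: string;
  feature?: string;
  tags?: string[];
  customerName?: string;
  customerEmail?: string;
  stripeCustomerId?: string;
  requested?: {
    standard?: boolean;
    premium?: boolean;
    audio?: boolean;
    image?: boolean;
    search?: boolean;
    reasoningLevel?: ReasoningLevel;
  };
  idempotencyKey?: string;
}

// Copies customerId and keeps optional fields only when they carry real data.
function buildCallBegin(body: CallBeginBody): CallBeginBody {
  const clean: CallBeginBody = { customerId: body.customerId };
  const optionalStrings = [
    "feature", "customerName", "customerEmail", "stripeCustomerId", "idempotencyKey",
  ] as const;
  for (const key of optionalStrings) {
    const value = body[key]?.trim();
    if (value) clean[key] = value; // empty or whitespace-only values are dropped
  }
  if (body.tags?.length) clean.tags = body.tags;
  if (body.requested) clean.requested = body.requested;
  return clean;
}
```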

3a. Entitlement Decisions

A successful begin response surfaces an allowed map plus extra guidance describing what the customer can use and how close they are to depletion:

allowed.standard / allowed.premium: choose gpt-5-mini for standard, gpt-5 for premium.
allowed.audio / allowed.image: gate audio or image generation helpers before invoking the vendor.
allowed.search: when true, enable the OpenAI web search tool ({ type: "web_search" }); the model name stays the same.
allowed.reasoningLevel: cap reasoning.effort (low/medium/high) at this returned tier; NONE means omit reasoning.

Use entitlementHints (policy + downgrade nudges) to pick fallbacks, remainingRatios to surface progress bars, and subscription/plan/balances for dashboards. The response also echoes a sanitized stripeCustomerId when present, so you can pass it straight into /call_end without juggling additional state.

// Assumes `begin` is the /call_begin result and `openai` is an OpenAI client.
const allow = begin.data.allowed;
const model = allow.premium ? "gpt-5" : "gpt-5-mini";

// Map UsageTap's reasoning tier onto OpenAI reasoning.effort values (NONE maps to no reasoning).
const effortByLevel = { LOW: "low", MEDIUM: "medium", HIGH: "high" } as const;
const effort = effortByLevel[allow.reasoningLevel as keyof typeof effortByLevel];

const response = await openai.responses.create({
  model,
  tools: allow.search ? [{ type: "web_search" }] : undefined,
  reasoning: effort ? { effort } : undefined,
  input: "What was a positive news story from today?",
});
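The remainingRatios map described above can drive progress bars. A minimal sketch, assuming each ratio is remaining/limit in the range 0 to 1 and that null means the meter has no finite limit (both are assumptions; confirm against the meter's unlimited flag in the Field Reference). remainingPercentLabels is a hypothetical name.

```typescript
// Turns remainingRatios into display labels, clamping out-of-range values.
function remainingPercentLabels(
  remainingRatios: Record<string, number | null>,
): Record<string, string> {
  const labels: Record<string, string> = {};
  for (const [meter, ratio] of Object.entries(remainingRatios)) {
    if (ratio === null) {
      labels[meter] = "no limit"; // assumption: null ratio means unlimited
    } else {
      const clamped = Math.min(Math.max(ratio, 0), 1);
      labels[meter] = `${Math.round(clamped * 100)}% left`;
    }
  }
  return labels;
}
```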

4. /call_end Request Body

{
  "callId": "FROM_begin",
  "modelUsed": "gpt-5-mini",        // echo the model actually invoked
  "inputTokens": 123,               // prompt/input tokens
  "responseTokens": 456,            // output/completion tokens
  "inputCacheTokens": 0,            // (optional) cached input tokens (prompt/context reuse)
  "reasoningTokens": 32,            // (optional) reasoning tokens, when reasoning is enabled
  "searches": 1,                    // (optional) number of web_search tool invocations
  "audio": 0,                       // (optional) audio generation units
  "audioSeconds": 0,                // (optional) processed audio seconds
  "isPremium": true,                // (optional) override premium/standard auto-detection
  "stripeCustomerId": "cus_123",    // (optional) propagates Stripe linkage if missing on begin
  "responseStatusCode": 200,        // HTTP status from the vendor response
  "error": {                        // include only on failure
    "code": "VENDOR_ERROR",
    "message": "Error detail"
  }
}

Always send an error object when the vendor call fails; UsageTap will still finalize gracefully. The response provides costUSD, optional costUsdNano + breakdown fields, metered usage totals, any updated balances, and the effective stripeCustomerId.
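Building the /call_end body from vendor output is mostly field renaming. This sketch assumes the OpenAI Responses API usage object (input_tokens, output_tokens, input_tokens_details.cached_tokens, output_tokens_details.reasoning_tokens). The helper names are hypothetical, and since OpenAI counts reasoning tokens inside output_tokens, confirm whether UsageTap expects responseTokens to include them before relying on this mapping.

```typescript
// Only the Responses API usage fields this mapper reads.
interface ResponsesUsage {
  input_tokens: number;
  output_tokens: number;
  input_tokens_details?: { cached_tokens?: number };
  output_tokens_details?: { reasoning_tokens?: number };
}

interface CallEndBody {
  callId: string;
  modelUsed?: string;
  inputTokens?: number;
  responseTokens?: number;
  inputCacheTokens?: number;
  reasoningTokens?: number;
  searches?: number;
  responseStatusCode?: number;
  error?: { code: string; message: string };
}

// Success path: map vendor usage onto UsageTap field names (missing usage becomes 0).
function callEndFromResponses(
  callId: string,
  modelUsed: string,
  usage: ResponsesUsage | undefined,
  searches = 0,
): CallEndBody {
  return {
    callId,
    modelUsed,
    inputTokens: usage?.input_tokens ?? 0,
    responseTokens: usage?.output_tokens ?? 0,
    inputCacheTokens: usage?.input_tokens_details?.cached_tokens ?? 0,
    reasoningTokens: usage?.output_tokens_details?.reasoning_tokens ?? 0,
    searches,
    responseStatusCode: 200,
  };
}

// Failure path: still finalize, sending an error object instead of usage.
function callEndFromError(callId: string, err: unknown, status?: number): CallEndBody {
  return {
    callId,
    responseStatusCode: status,
    error: { code: "VENDOR_ERROR", message: err instanceof Error ? err.message : String(err) },
  };
}
```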

Premium detection: UsageTap auto-detects premium calls when the model's output-token price exceeds $4 per million tokens. Override the detection by passing isPremium: true or isPremium: false in /call_end.
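The premium rule is simple enough to restate as code, for example to predict classification in tests. This only mirrors the documented threshold; UsageTap performs the real detection server-side, and isPremiumCall is a hypothetical name.

```typescript
// Documented threshold: output price strictly above $4 per million tokens.
const PREMIUM_OUTPUT_USD_PER_MILLION = 4;

// An explicit isPremium override always wins over price-based detection.
function isPremiumCall(outputUsdPerMillionTokens: number, isPremiumOverride?: boolean): boolean {
  if (isPremiumOverride !== undefined) return isPremiumOverride;
  return outputUsdPerMillionTokens > PREMIUM_OUTPUT_USD_PER_MILLION;
}
```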

5. Streaming Guidance

Call /call_begin before starting the vendor stream; finalize in a finally block or a stream-close handler.
Extract final usage from the vendor payload when provided (response.usage for OpenAI/OpenRouter; for OpenAI Chat Completions streams, request it with stream_options: { include_usage: true }).
If the vendor omits usage, estimate tokens (fallback heuristic: bytes / 4) and send the estimate in /call_end.
Abort handling: make sure /call_end is still called on user aborts (AbortController / signal).
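The streaming rules above can be combined into one wrapper that relays chunks and always finalizes. A sketch under stated assumptions: the vendor stream has already been normalized into { text, usage? } chunks, finalize is a placeholder for your /call_end request, and the bytes / 4 estimate is used only when no usage was reported. In a real integration, also pass the same AbortSignal to the vendor call so an abort cancels the upstream request.

```typescript
interface StreamChunk {
  text: string;
  usage?: { inputTokens: number; responseTokens: number }; // usually only on the last chunk
}

interface StreamUsage {
  inputTokens: number;
  responseTokens: number;
  estimated: boolean;
  aborted: boolean;
}

// Fallback heuristic from the text: roughly 4 bytes of UTF-8 per token.
function estimateTokens(text: string): number {
  return Math.ceil(new TextEncoder().encode(text).length / 4);
}

async function* trackStream(
  chunks: AsyncIterable<StreamChunk>,
  promptText: string,
  signal: AbortSignal,
  finalize: (usage: StreamUsage) => Promise<void>, // hypothetical stand-in for /call_end
): AsyncGenerator<string> {
  let output = "";
  let reported: StreamChunk["usage"];
  let aborted = false;
  try {
    for await (const chunk of chunks) {
      if (signal.aborted) {
        aborted = true;
        break;
      }
      if (chunk.usage) reported = chunk.usage;
      output += chunk.text;
      yield chunk.text;
    }
  } finally {
    // Runs on completion, vendor errors, consumer early exit, and aborts.
    await finalize({
      inputTokens: reported?.inputTokens ?? estimateTokens(promptText),
      responseTokens: reported?.responseTokens ?? estimateTokens(output),
      estimated: reported === undefined,
      aborted: aborted || signal.aborted,
    });
  }
}
```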

6. SDK Integration Paths (Node.js/TypeScript Only)

Installation: npm install @usagetap/sdk openai

🚀 Level 1: wrapOpenAI (Recommended)

Zero-boilerplate integration. Wraps the OpenAI SDK client with automatic begin/end, idempotency, streaming support, and entitlement-aware defaults.

Best for: Production apps, API routes, most use cases

Lines of code: ~8

import OpenAI from "openai";
import { UsageTapClient } from "@usagetap/sdk";
import { wrapOpenAI } from "@usagetap/sdk/openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const usageTap = new UsageTapClient({ apiKey: process.env.USAGETAP_API_KEY! });
const ai = wrapOpenAI(openai, usageTap, {
  defaultContext: { customerId: "user_123", feature: "chat.send" }
});

// Use the OpenAI SDK normally; tracking happens automatically
const completion = await ai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }]
});

⚡ Level 2: wrapFetch (Universal)

Wrap the fetch function used by the OpenAI SDK. Works with any HTTP-based AI API (OpenAI, OpenRouter, Anthropic, etc.).

Best for: Minimal changes, non-OpenAI vendors, maximum flexibility

Lines of code: ~4

import OpenAI from "openai";
import { UsageTapClient, wrapFetch } from "@usagetap/sdk";

const usageTap = new UsageTapClient({ apiKey: process.env.USAGETAP_API_KEY! });
const wrappedFetch = wrapFetch(usageTap, {
  defaultContext: { customerId: "user_123", feature: "chat" }
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  fetch: wrappedFetch
});

// Use the OpenAI SDK normally; begin/end happens inside the fetch wrapper

🎯 Level 3: withUsage (Manual Control)

Explicit begin/end with automatic error handling. Full control over vendor calls and usage reporting.

Best for: Learning fundamentals, custom vendors, complex workflows

Lines of code: ~15

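import { randomUUID } from "node:crypto";

// Assumes `usageTap`, `openai`, and `messages` are set up as in Level 1
const result = await usageTap.withUsage(
  { customerId: "user_123", feature: "chat", idempotencyKey: randomUUID() },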
  async ({ begin, setUsage }) => {
    // Read begin.data.allowed to make entitlement-aware decisions
    const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";
    
    const completion = await openai.chat.completions.create({ model, messages });
    
    // Report usage back
    setUsage({
      modelUsed: model,
      inputTokens: completion.usage?.prompt_tokens ?? 0,
      responseTokens: completion.usage?.completion_tokens ?? 0
    });
    
    return completion;
  }
);

🔧 Level 4: Explicit beginCall/endCall (Maximum Control)

Direct API calls through the SDK client. You manage the entire lifecycle, including error handling.

Best for: Advanced use cases, debugging, non-standard workflows

Lines of code: ~20+

import { randomUUID } from "node:crypto";

// Assumes `usageTap` and `openai` are set up as in Level 1
async function chatWithTracking(messages) {
  const begin = await usageTap.beginCall({
    customerId: "user_123",
    feature: "chat.send",
    idempotencyKey: randomUUID(),
    requested: { standard: true, premium: true }
  });

  const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";
  let completion;
  try {
    completion = await openai.chat.completions.create({ model, messages });
  } catch (error) {
    // Finalize failed calls too, then rethrow
    await usageTap.endCall({
      callId: begin.data.callId,
      error: { code: "OPENAI_ERROR", message: String(error) }
    });
    throw error;
  }

  // Finalize outside the try so a failing endCall can't trigger a second, error-flavored end
  await usageTap.endCall({
    callId: begin.data.callId,
    modelUsed: model,
    inputTokens: completion.usage?.prompt_tokens ?? 0,
    responseTokens: completion.usage?.completion_tokens ?? 0
  });

  return completion;
}

📦 Additional SDK Features

Express middleware: import { withUsage } from "@usagetap/sdk/express"
React hooks: import { useChatWithUsage } from "@usagetap/sdk/react"
Streaming helpers: toNextResponse(stream), pipeToResponse(stream, res)
Custom adapters: createOpenAIAdapter() for advanced vendor integrations

Documentation: See the API & SDK page for complete examples, the Quickstart for a step-by-step guide, and the Integration Guide for one-shot patterns.

7. Error & Idempotency Best Practices

UsageTap supports three methods for idempotency:

Method 1: Explicit Keys (Recommended)

Generate unique keys per request for maximum control:

// JavaScript/TypeScript
import { randomUUID } from "node:crypto";
const idempotencyKey = randomUUID();

await usageTap.beginCall({
  customerId: "cust_123",
  feature: "chat.send",
  idempotencyKey, // Pass explicit key
  requested: { standard: true, premium: true }
});

// Python: send the value as "idempotencyKey" in the /call_begin JSON body
import uuid
idempotency_key = str(uuid.uuid4())

✅ Best for API endpoints, webhooks, and user-facing features

Method 2: Deterministic Keys

Generate keys from request data for natural deduplication:

// Create a consistent key from request identifiers
import { createHash } from "node:crypto";

function generateIdempotencyKey(userId: string, sessionId: string, messageId: string): string {
  const data = `${userId}:${sessionId}:${messageId}`;
  return createHash("sha256").update(data).digest("hex");
}

const key = generateIdempotencyKey("user_123", "sess_456", "msg_789");

await usageTap.beginCall({
  customerId: "user_123",
  idempotencyKey: key,
  requested: { standard: true }
});

✅ Best for distributed systems with at-least-once delivery

Method 3: Auto-Generated (Default)

Omit idempotencyKey and UsageTap derives one:

await usageTap.beginCall({
  customerId: "cust_123",
  feature: "chat.send",
  // No idempotencyKey: the server derives one from org + customer + feature + requested
  // (the SDK auto-generates keys unless autoIdempotency: false is set)
  requested: { standard: true, premium: true }
});

// Repeating the same inputs returns the same callId

✅ Best for bulk operations and internal tools with natural deduplication

💡 Quick Decision Guide

Scenario	Approach	Example
API endpoints	Explicit UUID	`crypto.randomUUID()`
Background jobs	Deterministic	`job_${jobId}`
Webhooks	Event ID	`event.id`
Bulk operations	Auto-generated	`(omit key)`
Testing	Fixed string	`"test-scenario-1"`

Retry begin/end on transient network failures with exponential backoff (the SDK does this automatically).
Always finalize, even when the vendor call throws: wrap it in try/finally or use the provided helpers.
Prevent duplicate billing by reusing the same idempotency key for retried logical calls.
The SDK auto-generates keys by default; set autoIdempotency: false to rely on server-side derivation.
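Direct-API callers have to implement the retry policy themselves. A minimal sketch of exponential backoff with jitter; the key point is that the idempotency key is generated once, outside the retried function, so every attempt carries the same key. withRetry is a hypothetical helper, and the isTransient predicate (for example, network errors and 5xx responses) is an assumption you should adapt.

```typescript
// Retries `send` on transient errors with exponential backoff plus jitter.
async function withRetry<T>(
  send: (attempt: number) => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await send(attempt);
    } catch (err) {
      if (attempt >= maxAttempts || !isTransient(err)) throw err;
      // baseDelayMs, 2x, 4x, ... plus up to 100 ms of random jitter
      const delay = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Generate the key (for example with randomUUID()) before calling withRetry and capture it in the send closure; generating it inside send would give each attempt a new key and defeat deduplication.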

8. Security Rules

Never expose UsageTap or vendor API keys to browsers / mobile clients.
Perform all UsageTap operations in trusted server runtimes (Node 18+, edge workers, serverless functions).
Derive customerId from authenticated server context (session / JWT / DB lookup).
Use environment variables (.env / platform secrets) — no hard-coded secrets.
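Deriving customerId server-side means ignoring, or at most cross-checking, whatever the client sends. This sketch assumes a hypothetical Session object produced by your own auth layer (session cookie, JWT, or DB lookup); resolveCustomerId is likewise a hypothetical name.

```typescript
// Hypothetical shape produced by your authentication middleware.
interface Session {
  userId: string;
}

// The authenticated session is the only source of truth for customerId.
function resolveCustomerId(session: Session | null, body: { customerId?: unknown }): string {
  if (!session) throw new Error("Unauthenticated");
  if (body.customerId !== undefined && body.customerId !== session.userId) {
    // A client is trying to bill or read another customer's usage.
    throw new Error("customerId mismatch");
  }
  return session.userId;
}
```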

9. Usage Queries & Embeds

Provision customer subscriptions:

Endpoint: POST /customers (idempotent)
Returns newCustomer plus the same plan/entitlement snapshot exposed by call_begin and GET /customers/{customerId}/usage
SDK: await usageTap.createCustomer({ customerId, customerFriendlyName, customerEmail })
Repeated calls safely reuse the existing subscription; pass an idempotencyKey when coordinating across services.

Change customer plans:

Endpoint: POST /customers/{customerId}/change_plan
Body: { planId: "plan_premium_v2", strategy: "IMMEDIATE_RESET" | "IMMEDIATE_PRORATED" | "AT_NEXT_REPLENISH" }
SDK: await usageTap.changePlan({ customerId: "cust_123", planId: "plan_premium_v2", strategy: "IMMEDIATE_RESET" })
Strategies: IMMEDIATE_RESET (reset usage), IMMEDIATE_PRORATED (prorate usage), AT_NEXT_REPLENISH (schedule for next cycle, default)
Returns updated subscription with new plan details and any pending changes
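Because customerId is part of the path, it needs URL encoding. A hypothetical request builder for change_plan; it sends the strategy explicitly and defaults it to AT_NEXT_REPLENISH to match the documented server default.

```typescript
type ChangePlanStrategy = "IMMEDIATE_RESET" | "IMMEDIATE_PRORATED" | "AT_NEXT_REPLENISH";

// Builds the URL and JSON body for POST /customers/{customerId}/change_plan.
function changePlanRequest(
  customerId: string,
  planId: string,
  strategy: ChangePlanStrategy = "AT_NEXT_REPLENISH",
  baseUrl = "https://api.usagetap.com",
): { url: string; body: string } {
  return {
    url: `${baseUrl}/customers/${encodeURIComponent(customerId)}/change_plan`,
    body: JSON.stringify({ planId, strategy }),
  };
}
```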

Check usage without creating a call:

Endpoint: GET /customers/{customerId}/usage (URL-encode customerId)
Returns same snapshot as call_begin: meters, entitlements, subscription, plan, balances, models
SDK: await usageTap.checkUsage({ customerId: "cust_123" })
Use for dashboard widgets, pre-flight checks, or displaying quota status without tracking vendor usage
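The usage check is a plain GET. A sketch with raw fetch, assuming Node 18+; usageUrl and checkUsageRaw are hypothetical names, and the headers follow the rule that every raw request carries the versioned Accept media type.

```typescript
// customerId is URL-encoded because IDs may contain characters such as "|" or spaces.
function usageUrl(customerId: string, baseUrl = "https://api.usagetap.com"): string {
  return `${baseUrl}/customers/${encodeURIComponent(customerId)}/usage`;
}

// Reads the usage snapshot without creating a call record.
async function checkUsageRaw(customerId: string, apiKey: string): Promise<unknown> {
  const res = await fetch(usageUrl(customerId), {
    headers: { "x-api-key": apiKey, Accept: "application/vnd.usagetap.v1+json" },
  });
  if (!res.ok) throw new Error(`usage check failed: HTTP ${res.status}`);
  return res.json();
}
```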

Update custom meters (record custom usage):

Endpoint: POST /custom_meter with Accept: application/vnd.usagetap.v1+json, Content-Type: application/json, and an Idempotency-Key whenever you may retry.
Body: { customerId, meterSlot: "CUSTOM1" | "CUSTOM2", amount, feature?, tags?, metadata? }. amount must be positive; it is subtracted from the meter's remaining quota.
Response: returns an eventId, the updated meter snapshot (remaining/limit/label), and a blocked flag when policy = BLOCK.
SDK helper: await usageTap.incrementCustomMeter({ customerId, meterSlot: "CUSTOM1", amount }) mirrors the API and adds the headers automatically.
Requirements: enable the custom meter on the customer's plan, use integer amounts if you want whole-unit quotas, and log events/tags for downstream analytics.

import { randomUUID } from "node:crypto";

const response = await fetch("https://api.usagetap.com/custom_meter", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.USAGETAP_API_KEY}`,
    "Accept": "application/vnd.usagetap.v1+json",
    "Content-Type": "application/json",
    "Idempotency-Key": randomUUID()
  },
  body: JSON.stringify({
    customerId: "cust_123",
    meterSlot: "CUSTOM1",
    amount: 5,
    feature: "agent_actions",
    tags: ["workflow_automation"],
    metadata: { workflowId: "wf_abc123" }
  })
});

if (!response.ok) throw new Error(`custom_meter failed: HTTP ${response.status}`);
const payload = await response.json();
console.log(payload.data.eventId, payload.data.meter.remaining, payload.data.blocked);

Use a deterministic idempotency key when replaying background jobs so duplicate meter updates collapse into a single event.

Embed widgets (legacy):

Use an API key with embed permissions to request widget data.
Endpoint: GET /embed/widget-data?organization_id=ORG&customer_id=CUST&type=usage|plan|calls
Returned JSON powers the configurable widget shown on the authenticated /embed page.
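A hypothetical builder for the widget-data query; URLSearchParams handles the encoding. The base URL is an assumption, since this section does not say which host serves /embed/widget-data.

```typescript
type WidgetType = "usage" | "plan" | "calls";

// Builds GET /embed/widget-data?organization_id=...&customer_id=...&type=...
function widgetDataUrl(
  organizationId: string,
  customerId: string,
  type: WidgetType,
  baseUrl = "https://api.usagetap.com", // assumption: same host as the core API
): string {
  const params = new URLSearchParams({ organization_id: organizationId, customer_id: customerId, type });
  return `${baseUrl}/embed/widget-data?${params.toString()}`;
}
```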

10. Field Reference

Begin Response (key fields)

{
  callId: string,
  startTime: string,
  newCustomer: boolean,
  policy: "NONE" | "BLOCK" | "DOWNGRADE",
  allowed: {
    standard: boolean,
    premium: boolean,
    audio: boolean,
    image: boolean,
    search: boolean,
    reasoningLevel: "NONE" | "LOW" | "MEDIUM" | "HIGH"
  },
  entitlementHints: {
    suggestedModelTier: "premium" | "standard" | "none",
    policy: "NONE" | "BLOCK" | "DOWNGRADE",
    downgrade?: { reason: string, fallbackTier?: "premium" | "standard" | "none" }
  },
  meters: { [meter: string]: { remaining: number | null, limit: number | null, used: number | null, unlimited: boolean, ratio: number | null } },
  remainingRatios: { [meter: string]: number | null },
  subscription: {
    id: string | null,
    usagePlanVersionId: string | null,
    planName: string | null,
    planVersion: string | null,
    stripeCustomerId?: string | null,
    customerFriendlyName?: string | null,
    customerEmail?: string | null,
    pending?: { usagePlanVersionId: string | null, strategy: string | null, effectiveAt: string | null }
  },
  plan?: { id: string | null, name: string | null, version: string | null },
  balances?: { tokensRemaining?: number, standardCallsRemaining?: number, premiumCallsRemaining?: number, searchesRemaining?: number, audioSecondsRemaining?: number },
  models?: { [tier: string]: string[] },
  idempotency?: { key: string, source: "explicit" | "derived" },
  stripeCustomerId?: string | null
}

End Request + Response

request {
  modelUsed?: string,
  inputTokens?: number,
  responseTokens?: number,
  inputCacheTokens?: number,
  reasoningTokens?: number,
  searches?: number,
  audio?: number,
  audioSeconds?: number,
  isPremium?: boolean,
  stripeCustomerId?: string,
  responseStatusCode?: number,
  error?: { code: string, message: string }
}

response {
  costUSD: number,
  costUsdNano?: string,
  promptCostUsd?: number,
  completionCostUsd?: number,
  cacheReadCostUsd?: number,
  reasoningCostUsd?: number,
  metered?: {
    calls?: number,
    tokens?: number,
    reasoningTokens?: number,
    searches?: number,
    audio?: number,
    audioSeconds?: number
  },
  balances?: { ...same shape as begin },
  stripeCustomerId?: string | null
}

11. Minimal Pseudocode

// Pseudocode pattern
idempotencyKey = uuid()
begin = POST /call_begin { customerId, feature, idempotencyKey }
try {
  vendor = openai.chat.completions.create(...)
  usage = { inputTokens: vendor.usage.prompt_tokens, responseTokens: vendor.usage.completion_tokens } // or derive counts
  POST /call_end { callId: begin.data.callId, modelUsed, ...usage, responseStatusCode: 200 }
} catch (e) {
  POST /call_end { callId: begin.data.callId, error: { code: 'VENDOR_ERROR', message: e.message } }
  throw e
}
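A runnable TypeScript rendering of the pseudocode, with the HTTP layer and vendor call injected so it fits any client. Assumptions: post sends JSON to UsageTap with the required headers and returns the parsed body, and the vendor callback reports token counts already mapped to UsageTap field names. trackedCall, Post, and VendorResult are hypothetical names.

```typescript
import { randomUUID } from "node:crypto";

// Assumed transport: sends JSON with the UsageTap headers, returns the parsed body.
type Post = (path: "/call_begin" | "/call_end", body: object) => Promise<unknown>;

interface VendorResult<T> {
  value: T; // whatever your app returns to its caller
  modelUsed: string;
  inputTokens: number;
  responseTokens: number;
}

async function trackedCall<T>(
  post: Post,
  ctx: { customerId: string; feature: string },
  vendor: () => Promise<VendorResult<T>>,
): Promise<T> {
  const begin = (await post("/call_begin", {
    ...ctx,
    idempotencyKey: randomUUID(),
    requested: { standard: true },
  })) as { data: { callId: string } };
  const callId = begin.data.callId;

  let result: VendorResult<T>;
  try {
    result = await vendor();
  } catch (e) {
    // Always finalize, even on vendor failure, then rethrow.
    await post("/call_end", {
      callId,
      error: { code: "VENDOR_ERROR", message: e instanceof Error ? e.message : String(e) },
    });
    throw e;
  }

  // Success path sits outside the try, so a failing call_end is not
  // followed by a second, error-flavored call_end.
  await post("/call_end", {
    callId,
    modelUsed: result.modelUsed,
    inputTokens: result.inputTokens,
    responseTokens: result.responseTokens,
    responseStatusCode: 200,
  });
  return result.value;
}
```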

12. Recommended Prompt Include

Paste this line near the top of any AI assistant prompt:

Use the canonical integration facts at https://usagetap.com/llmreference for lifecycle, auth, fields, and best practices. Don't hallucinate: if uncertain, say so.

13. Change Log (Human Maintained)

v1 (initial): Core lifecycle, usage fields, streaming, SDK paths, embeds summary.