LLM Integration Reference
Canonical context for AI assistants implementing UsageTap integrations. Link to this page inside prompt kits instead of duplicating details.
Short Include Snippet (copy into your prompts):
(Reference: https://usagetap.com/llmreference - contains API flow, endpoints, auth, usage fields, SDK patterns, security rules)
📚 Complete Documentation:
- Integration Guide: One-shot integration examples for Next.js, Express, Node.js, React
- API Reference: Complete REST API & SDK documentation with all endpoints and types
- API & SDK Examples: Interactive code examples and prompt kits
0. API vs SDK: Quick Decision Guide
Choose your integration approach based on your environment and requirements:
✅ Use SDK When:
- You're using JavaScript/TypeScript in Node.js 18+, Next.js, Express, or edge runtimes
- You want automatic begin/end handling with error recovery
- You're integrating with OpenAI or OpenRouter (wrapOpenAI, wrapFetch)
- You need streaming support with automatic finalization
- You want built-in retries and idempotency handling
- You're building production apps that need reliability
npm install @usagetap/sdk openai
import OpenAI from "openai";
import { UsageTapClient } from "@usagetap/sdk";
import { wrapOpenAI } from "@usagetap/sdk/openai";
// Plain OpenAI client that the wrapper instruments
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!
});
const ai = wrapOpenAI(openai, usageTap, {
  defaultContext: {
    customerId: "user_123",
    feature: "chat.send"
  }
});
// That's it! Use the wrapped client (ai) like the normal OpenAI SDK
🔧 Use Direct API When:
- You're using Python, Go, Ruby, Java, or other languages
- You need to integrate from serverless functions without Node.js
- You want full control over HTTP requests and retry logic
- You're integrating non-OpenAI vendors (Anthropic, Cohere, custom APIs)
- You need minimal dependencies or custom HTTP clients
- You're building CLIs, background jobs, or batch processors
POST https://api.usagetap.com/call_begin
Headers:
  x-api-key: YOUR_KEY
  Accept: application/vnd.usagetap.v1+json
  Content-Type: application/json
Body:
{
  "customerId": "user_123",
  "feature": "chat.send",
  "idempotencyKey": "uuid-here",
  "requested": { "standard": true }
}
# Use begin.data.callId for call_end
POST https://api.usagetap.com/call_end
Headers:
  x-api-key: YOUR_KEY
  Accept: application/vnd.usagetap.v1+json
  Content-Type: application/json
Every raw HTTP request must include the canonical media type header (Accept: application/vnd.usagetap.v1+json) along with Content-Type: application/json; the API responds with 406 Not Acceptable if the header is missing. The SDK sets these automatically.
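For direct API callers, the header requirements above can be captured in one small helper. This is a minimal sketch: the function name usageTapHeaders and the constant name are hypothetical and not part of the SDK; only the header names and values come from this page.

```typescript
// Hypothetical helper (not part of @usagetap/sdk): builds the headers that
// every raw UsageTap POST needs, per the requirements described above.
const USAGETAP_MEDIA_TYPE = "application/vnd.usagetap.v1+json";

function usageTapHeaders(apiKey: string): Record<string, string> {
  if (!apiKey) {
    throw new Error("UsageTap API key is required");
  }
  return {
    "x-api-key": apiKey,
    // Omitting this media type is what triggers 406 Not Acceptable.
    Accept: USAGETAP_MEDIA_TYPE,
    "Content-Type": "application/json",
  };
}
```

Pass the result as the headers option of whatever HTTP client you use, so no request path can forget the Accept header.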
💡 Hybrid Approach
You can mix approaches! Use the SDK for your main application and direct API calls for background jobs, webhooks, or services in other languages. Both share the same authentication and data model.
- Next.js API routes: Use SDK with wrapOpenAI
- Python data pipeline: Use direct API with requests/httpx
- Go microservice: Use direct API with net/http
- Express middleware: Use SDK Express adapter
Still unsure? Start with the SDK if you're on Node.js—it handles all the complexity. Fall back to direct API only if you need a different language or have specific requirements the SDK doesn't cover.
1. Core Lifecycle
All tracked AI work follows a strict begin → vendor call → end lifecycle. You MUST always finalize a call (even on errors / aborts) so usage, entitlements, and limits stay consistent.
Before kicking off that flow, idempotently provision the customer with POST /customers (or usageTap.createCustomer()). The response mirrors call_begin, includes newCustomer, and is safe to run on every login or provisioning hook.
- POST /call_begin
- Invoke vendor (OpenAI, OpenRouter, other); may stream or return JSON
- POST /call_end with actual usage (or error)
Include Accept: application/vnd.usagetap.v1+json on every begin/end POST (and Content-Type: application/json); omitting the media type returns 406 Not Acceptable. The JavaScript SDK handles this header automatically.
Checking usage without a call: To query current quota, plan, or balances without creating a call record, use GET /customers/{customerId}/usage or the SDK's checkUsage() method. This returns the same rich snapshot (meters, entitlements, subscription details) without tracking vendor usage.
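The begin, vendor call, end lifecycle can be sketched for direct API callers as a wrapper that finalizes in a finally block. This is an illustration under assumptions: Transport and trackedCall are hypothetical names, the transport stands in for your own HTTP client, and the response is assumed to expose data.callId as in the raw examples on this page.

```typescript
// Hypothetical abstraction over your HTTP client; not an SDK type.
type Transport = (
  path: string,
  body: Record<string, unknown>,
) => Promise<{ data: { callId: string } }>;

// Runs vendorCall between /call_begin and /call_end, always finalizing.
async function trackedCall<T>(
  post: Transport,
  customerId: string,
  vendorCall: () => Promise<{ result: T; usage: Record<string, unknown> }>,
): Promise<T> {
  const begin = await post("/call_begin", { customerId });
  const callId = begin.data.callId;
  let endBody: Record<string, unknown> = { callId };
  try {
    const { result, usage } = await vendorCall();
    endBody = { callId, ...usage, responseStatusCode: 200 };
    return result;
  } catch (err) {
    // Report the failure so the call is still finalized.
    endBody = { callId, error: { code: "VENDOR_ERROR", message: String(err) } };
    throw err;
  } finally {
    await post("/call_end", endBody);
  }
}
```

The finally block is the design point: success, vendor error, and abort all pass through the same /call_end request.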
2. Authentication
- Send x-api-key: YOUR_API_KEY or Authorization: Bearer YOUR_API_KEY.
- Always include Accept: application/vnd.usagetap.v1+json plus Content-Type: application/json on /call_begin, /call_end, and other POSTs; requests without the media type receive 406 Not Acceptable.
- Keys are server-side only. Never expose them in browser bundles or public repos.
- Environment variable examples: USAGETAP_API_KEY, USAGETAP_BASE_URL (optional override, default https://api.usagetap.com).
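One possible way to read these environment variables on the server is sketched below. The names UsageTapConfig and resolveConfig are hypothetical; the variable names and the default base URL are the ones listed above. The environment is passed in as a plain object so the function stays testable.

```typescript
interface UsageTapConfig {
  apiKey: string;
  baseUrl: string;
}

// Hypothetical helper: resolves UsageTap settings from an environment map.
function resolveConfig(env: Record<string, string | undefined>): UsageTapConfig {
  const apiKey = env["USAGETAP_API_KEY"];
  if (!apiKey) {
    throw new Error("USAGETAP_API_KEY is not set");
  }
  // Fall back to the documented default and drop trailing slashes so
  // paths such as /call_begin can be appended safely.
  const baseUrl = (env["USAGETAP_BASE_URL"] || "https://api.usagetap.com").replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```

In a Node.js server this would typically be called once at startup as resolveConfig(process.env).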
3. /call_begin Request Body
{
"customerId": "CUSTOMER_123", // required; stable per end-user / account
"feature": "chat.send", // optional but recommended; granular feature tag
"tags": ["production", "web"], // optional; classification / segmentation
"customerName": "Ada Lovelace", // optional; friendly display name
"customerEmail": "ada@example.com", // optional; contact email (validated server-side)
"stripeCustomerId": "cus_123", // optional; helps sync metering with Stripe
"requested": {
"standard": true, // ask for base tier access (maps to gpt5-mini)
"premium": true, // ask for premium tier access (maps to gpt5)
"audio": true, // ask for audio generation allowance
"image": false, // ask for image generation allowance
"search": true, // request web search tool allowance
"reasoningLevel": "HIGH" // maximum reasoning tier desired
},
"idempotencyKey": "UUID_OR_STABLE_KEY" // optional but strongly recommended (also accepts deprecated "idempotency")
}
Response includes callId, entitlement decisions (allowed.*), entitlement hints, plan + subscription snapshots, remaining meter ratios, balances, and any sanitized profile fields echoed back (including stripeCustomerId for chaining to /call_end).
Fields such as feature, customerName, customerEmail, and stripeCustomerId are optional—supply them when you have clean data, otherwise UsageTap will omit them without error.
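If your profile data is sometimes empty or whitespace-only, one option is to drop those fields before sending. The sketch below covers only the string fields of the request body; CallBeginInput and buildCallBeginBody are hypothetical names, and the requested object would be merged in separately.

```typescript
interface CallBeginInput {
  customerId: string;
  feature?: string;
  customerName?: string;
  customerEmail?: string;
  stripeCustomerId?: string;
  idempotencyKey?: string;
}

// Hypothetical helper: keeps only fields that carry a non-empty value.
function buildCallBeginBody(input: CallBeginInput): Record<string, string> {
  const body: Record<string, string> = {};
  for (const key of Object.keys(input) as (keyof CallBeginInput)[]) {
    const value = input[key]?.trim();
    if (value) {
      body[key] = value;
    }
  }
  if (!body["customerId"]) {
    throw new Error("customerId is required");
  }
  return body;
}
```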
3a. Entitlement Decisions
A successful begin response surfaces an allowed map plus extra guidance describing what the customer can use and how close they are to depletion:
- allowed.standard / allowed.premium: choose gpt5-mini for standard, gpt5 for premium.
- allowed.audio / allowed.image: gate access to audio or image generation helpers before invoking the vendor.
- allowed.search: toggle the OpenAI Web Search tool ({ type: "web_search" }) when true; the model name stays the same.
- allowed.reasoningLevel: cap your reasoning.effort (LOW/MEDIUM/HIGH) to this returned tier.
Use entitlementHints (policy + downgrade nudges) to pick fallbacks, remainingRatios to surface progress bars, and subscription/plan/balances for dashboards. The response also echoes a sanitized stripeCustomerId when present, so you can pass it straight into /call_end without juggling additional state.
const allow = begin.data.allowed;
const model = allow.premium ? "gpt5" : "gpt5-mini";
const reasoning = allow.reasoningLevel === "HIGH"
  ? { effort: "high" }
  : allow.reasoningLevel === "MEDIUM"
    ? { effort: "medium" }
    : allow.reasoningLevel === "LOW"
      ? { effort: "low" }
      : undefined;
const response = await openai.responses.create({
  model,
  tools: allow.search ? [{ type: "web_search" }] : undefined,
  reasoning,
  input: "What was a positive news story from today?",
});
4. /call_end Request Body
{
"callId": "FROM_begin",
"modelUsed": "gpt5-mini", // echo the actual model invoked
"inputTokens": 123, // prompt/input tokens
"responseTokens": 456, // output/completion tokens
"inputCacheTokens": 0, // (optional) cached tokens (reused / context reuse)
"reasoningTokens": 32, // (optional) reasoning usage when enabled
"searches": 1, // (optional) increment when web_search tool used
"audio": 0, // (optional) audio generation units
"audioSeconds": 0, // (optional) processed audio seconds
"isPremium": true, // (optional) override: explicitly mark as premium/standard
"stripeCustomerId": "cus_123", // optional; propagates Stripe linkage if missing on begin
"responseStatusCode": 200, // HTTP status from vendor response
"error": { // only if failure
"code": "VENDOR_ERROR",
"message": "Error detail"
}
}
Always send an error object when the vendor call fails; UsageTap will still finalize gracefully. The response provides costUSD, optional costUsdNano + breakdown fields, metered usage totals, any updated balances, and the effective stripeCustomerId.
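When the vendor is OpenAI, the usage object of a chat completion can be mapped onto the /call_end fields roughly as follows. This is a sketch, not the SDK's mapping: toCallEndBody and the local OpenAIUsage interface are hypothetical, and whether inputTokens should include or exclude cached tokens is an assumption to verify against the API Reference.

```typescript
// Subset of the OpenAI chat completion usage object used here.
interface OpenAIUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Hypothetical helper: builds a /call_end body for a successful vendor call.
function toCallEndBody(
  callId: string,
  modelUsed: string,
  usage: OpenAIUsage | undefined,
  responseStatusCode = 200,
) {
  return {
    callId,
    modelUsed,
    // Assumption: prompt_tokens is reported as-is, cached tokens separately.
    inputTokens: usage?.prompt_tokens ?? 0,
    responseTokens: usage?.completion_tokens ?? 0,
    inputCacheTokens: usage?.prompt_tokens_details?.cached_tokens ?? 0,
    reasoningTokens: usage?.completion_tokens_details?.reasoning_tokens ?? 0,
    responseStatusCode,
  };
}
```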
Premium detection: UsageTap auto-detects premium calls when output token price > $4 per million. Override by passing isPremium: true or isPremium: false.
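The detection rule stated above can be mirrored locally, for example to predict how a call will be classified before sending it. This is only an illustration of the documented threshold; the real decision is made server-side, and resolveIsPremium and the constant name are hypothetical.

```typescript
// Threshold from the rule above: output price above $4 per million tokens.
const PREMIUM_OUTPUT_PRICE_USD_PER_MILLION = 4;

// Hypothetical local mirror of the rule; an explicit override always wins.
function resolveIsPremium(outputPriceUsdPerMillion: number, override?: boolean): boolean {
  if (override !== undefined) {
    return override;
  }
  return outputPriceUsdPerMillion > PREMIUM_OUTPUT_PRICE_USD_PER_MILLION;
}
```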
5. Streaming Guidance
- Begin before starting vendor stream; finalize in finally or stream close handlers.
- Extract final usage from vendor payload if provided (response.usage for OpenAI/OpenRouter).
- If vendor omits usage, estimate tokens (fallback: bytes / 4 heuristic) then send in /call_end.
- Abort handling: ensure /call_end is called on user aborts (AbortController / signal).
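The bytes / 4 fallback mentioned above might look like the sketch below. The function name estimateTokens is hypothetical, and rounding up is an assumption; the page only specifies the division by four.

```typescript
// Rough token estimate for when the vendor omits usage:
// UTF-8 byte length divided by 4, rounded up (rounding is an assumption).
function estimateTokens(text: string): number {
  const byteLength = new TextEncoder().encode(text).length;
  return Math.ceil(byteLength / 4);
}
```

For a stream, accumulate the emitted text and call estimateTokens once in the finally or close handler, then send the result as responseTokens in /call_end.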
6. SDK Integration Paths (Node.js/TypeScript Only)
Installation: npm install @usagetap/sdk openai
🚀 Level 1: wrapOpenAI (Recommended)
Zero-boilerplate integration. Wraps OpenAI SDK with automatic begin/end, idempotency, streaming support, and entitlement-aware defaults.
Best for: Production apps, API routes, most use cases
Lines of code: ~8
import OpenAI from "openai";
import { UsageTapClient } from "@usagetap/sdk";
import { wrapOpenAI } from "@usagetap/sdk/openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const usageTap = new UsageTapClient({ apiKey: process.env.USAGETAP_API_KEY! });
const ai = wrapOpenAI(openai, usageTap, {
  defaultContext: { customerId: "user_123", feature: "chat.send" }
});
// Use OpenAI SDK normally - tracking happens automatically
const completion = await ai.chat.completions.create({
  model: "gpt-4o-mini",
  messages
});
⚡ Level 2: wrapFetch (Universal)
Wrap the fetch function used by OpenAI SDK. Works with any HTTP-based AI API (OpenAI, OpenRouter, Anthropic, etc.)
Best for: Minimal changes, non-OpenAI vendors, maximum flexibility
Lines of code: ~4
import OpenAI from "openai";
import { UsageTapClient, wrapFetch } from "@usagetap/sdk";
const usageTap = new UsageTapClient({ apiKey: process.env.USAGETAP_API_KEY! });
const wrappedFetch = wrapFetch(usageTap, {
  defaultContext: { customerId: "user_123", feature: "chat" }
});
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  fetch: wrappedFetch
});
// Use SDK normally - begin/end happens in fetch wrapper
🎯 Level 3: withUsage (Manual Control)
Explicit begin/end with automatic error handling. Full control over vendor calls and usage reporting.
Best for: Learning fundamentals, custom vendors, complex workflows
Lines of code: ~15
const result = await usageTap.withUsage(
  { customerId: "user_123", feature: "chat", idempotencyKey: crypto.randomUUID() },
  async ({ begin, setUsage }) => {
    // Read begin.data.allowed to make entitlement-aware decisions
    const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";
    const completion = await openai.chat.completions.create({ model, messages });
    // Report usage back
    setUsage({
      modelUsed: model,
      inputTokens: completion.usage?.prompt_tokens ?? 0,
      responseTokens: completion.usage?.completion_tokens ?? 0
    });
    return completion;
  }
);
🔧 Level 4: Explicit beginCall/endCall (Maximum Control)
Direct API calls through SDK client. You manage the entire lifecycle including error handling.
Best for: Advanced use cases, debugging, non-standard workflows
Lines of code: ~20+
const begin = await usageTap.beginCall({
  customerId: "user_123",
  feature: "chat.send",
  idempotencyKey: crypto.randomUUID(),
  requested: { standard: true, premium: true }
});
try {
  const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";
  const completion = await openai.chat.completions.create({ model, messages });
  await usageTap.endCall({
    callId: begin.data.callId,
    modelUsed: model,
    inputTokens: completion.usage?.prompt_tokens ?? 0,
    responseTokens: completion.usage?.completion_tokens ?? 0
  });
  return completion;
} catch (error) {
  await usageTap.endCall({
    callId: begin.data.callId,
    error: { code: "OPENAI_ERROR", message: String(error) }
  });
  throw error;
}
📦 Additional SDK Features
- Express middleware: import { withUsage } from "@usagetap/sdk/express"
- React hooks: import { useChatWithUsage } from "@usagetap/sdk/react"
- Streaming helpers: toNextResponse(stream), pipeToResponse(stream, res)
- Custom adapters: createOpenAIAdapter() for advanced vendor integrations
Documentation: See API & SDK page for complete examples, Quickstart for step-by-step guide, and Integration Guide for one-shot patterns.
7. Error & Idempotency Best Practices
UsageTap supports three methods for idempotency:
Method 1: Explicit Keys (Recommended)
Generate unique keys per request for maximum control:
// JavaScript/TypeScript
import crypto from "crypto";
const idempotencyKey = crypto.randomUUID();
await usageTap.beginCall({
  customerId: "cust_123",
  feature: "chat.send",
  idempotencyKey, // Pass explicit key
  requested: { standard: true, premium: true }
});
# Python
import uuid
idempotency_key = str(uuid.uuid4())
✅ Best for API endpoints, webhooks, and user-facing features
Method 2: Deterministic Keys
Generate keys from request data for natural deduplication:
// Create consistent key from request identifiers
import crypto from "crypto";
function generateIdempotencyKey(userId, sessionId, messageId) {
  const data = `${userId}:${sessionId}:${messageId}`;
  return crypto.createHash("sha256").update(data).digest("hex");
}
const key = generateIdempotencyKey("user_123", "sess_456", "msg_789");
await usageTap.beginCall({
  customerId: "user_123",
  idempotencyKey: key,
  requested: { standard: true }
});
✅ Best for distributed systems with at-least-once delivery
Method 3: Auto-Generated (Default)
Omit idempotencyKey and UsageTap derives one:
await usageTap.beginCall({
  customerId: "cust_123",
  feature: "chat.send",
  // No idempotencyKey - auto-generated from org + customer + feature + requested
  requested: { standard: true, premium: true }
});
// Same inputs = same callId returned
✅ Best for bulk operations and internal tools with natural deduplication
💡 Quick Decision Guide
| Scenario | Approach | Example |
|---|---|---|
| API endpoints | Explicit UUID | crypto.randomUUID() |
| Background jobs | Deterministic | job_${jobId} |
| Webhooks | Event ID | event.id |
| Bulk operations | Auto-generated | (omit key) |
| Testing | Fixed string | "test-scenario-1" |
- Retry begin/end on transient network failures with exponential backoff (SDK handles automatically).
- Always finalize even when vendor throws: wrap in try/finally or use provided helpers.
- Prevent duplicate billing by reusing the same idempotency key for retried logical calls.
- The SDK auto-generates keys by default; set autoIdempotency: false to rely on server-side derivation.
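For direct API callers who implement retries themselves, the backoff and key-reuse rules above could be combined as in this sketch. Everything here is hypothetical (function names, the 200 ms base delay, the 5 s cap); it also retries on any error, whereas a production version should retry only transient failures.

```typescript
// Delay schedule: baseMs, 2x, 4x, ... capped at capMs.
function backoffDelaysMs(retries: number, baseMs = 200, capMs = 5000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

// Retries `send` with the SAME idempotency key so the server can
// deduplicate the logical call instead of billing it twice.
async function retryWithSameKey<T>(
  idempotencyKey: string,
  send: (key: string) => Promise<T>,
  retries = 3,
): Promise<T> {
  const delays = backoffDelaysMs(retries);
  for (let attempt = 0; ; attempt++) {
    try {
      return await send(idempotencyKey);
    } catch (err) {
      const delay = delays[attempt];
      if (delay === undefined) {
        throw err; // retries exhausted
      }
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The key is generated once by the caller and passed in, which is what keeps retried attempts from being treated as new calls.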
8. Security Rules
- Never expose UsageTap or vendor API keys to browsers / mobile clients.
- Perform all UsageTap operations in trusted server runtimes (Node 18+, edge workers, serverless functions).
- Derive customerId from authenticated server context (session / JWT / DB lookup).
- Use environment variables (.env / platform secrets); no hard-coded secrets.
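The rule about deriving customerId on the server can be sketched as below. ServerSession and customerIdFromSession are hypothetical; substitute whatever session or JWT type your auth library provides.

```typescript
// Hypothetical session shape; adapt to your auth library.
interface ServerSession {
  userId?: string;
}

// Resolves the UsageTap customerId from trusted server state only.
// A client-supplied ID is deliberately not accepted as a parameter.
function customerIdFromSession(session: ServerSession | null): string {
  const userId = session?.userId?.trim();
  if (!userId) {
    throw new Error("Unauthenticated: cannot resolve customerId");
  }
  return userId;
}
```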
9. Usage Queries & Embeds
Provision customer subscriptions:
- Endpoint: POST /customers (idempotent)
- Returns newCustomer plus the same plan/entitlement snapshot exposed by call_begin and GET /customers/{customerId}/usage
- SDK: await usageTap.createCustomer({ customerId, customerFriendlyName, customerEmail })
- Repeat calls safely reuse the existing subscription; pass an idempotencyKey when coordinating across services.
Change customer plans:
- Endpoint: POST /customers/{customerId}/change_plan
- Body: { planId: "plan_premium_v2", strategy: "IMMEDIATE_RESET" | "IMMEDIATE_PRORATED" | "AT_NEXT_REPLENISH" }
- SDK: await usageTap.changePlan({ customerId: "cust_123", planId: "plan_premium_v2", strategy: "IMMEDIATE_RESET" })
- Strategies: IMMEDIATE_RESET (reset usage), IMMEDIATE_PRORATED (prorate usage), AT_NEXT_REPLENISH (schedule for next cycle, default)
- Returns updated subscription with new plan details and any pending changes
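For direct API callers, a plan change request might be assembled as in this sketch. changePlanRequest is a hypothetical helper; URL-encoding the customerId in the path is an assumption carried over from the usage endpoint, and the default strategy follows the list above.

```typescript
type PlanChangeStrategy = "IMMEDIATE_RESET" | "IMMEDIATE_PRORATED" | "AT_NEXT_REPLENISH";

// Hypothetical helper: builds the URL and JSON body for a plan change.
function changePlanRequest(
  baseUrl: string,
  customerId: string,
  planId: string,
  strategy: PlanChangeStrategy = "AT_NEXT_REPLENISH",
) {
  return {
    // Assumption: customerId is URL-encoded in the path.
    url: `${baseUrl}/customers/${encodeURIComponent(customerId)}/change_plan`,
    method: "POST" as const,
    body: JSON.stringify({ planId, strategy }),
  };
}
```

Typing the strategy as a union lets the compiler reject misspelled values before any request is sent.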
Check usage without creating a call:
- Endpoint: GET /customers/{customerId}/usage (URL-encode customerId)
- Returns same snapshot as call_begin: meters, entitlements, subscription, plan, balances, models
- SDK: await usageTap.checkUsage({ customerId: "cust_123" })
- Use for dashboard widgets, pre-flight checks, or displaying quota status without tracking vendor usage
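For a dashboard widget, each meter in the snapshot can be turned into a percentage for a progress bar. The sketch below uses the meter shape from the Field Reference; percentRemaining is a hypothetical helper, and returning null for unlimited or unknown meters is a design choice rather than documented behavior.

```typescript
// Meter shape as listed in the Field Reference section.
interface Meter {
  remaining: number | null;
  limit: number | null;
  used: number | null;
  unlimited: boolean;
  ratio: number | null;
}

// Returns 0..100, or null when no meaningful bar can be drawn.
function percentRemaining(meter: Meter): number | null {
  if (meter.unlimited || meter.remaining === null || meter.limit === null || meter.limit <= 0) {
    return null;
  }
  const pct = (meter.remaining / meter.limit) * 100;
  // Clamp in case remaining exceeds the limit or has gone negative.
  return Math.max(0, Math.min(100, Math.round(pct)));
}
```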
Update custom meters (record custom usage):
- Endpoint: POST /custom_meter with Accept: application/vnd.usagetap.v1+json, Content-Type: application/json, and an Idempotency-Key whenever you may retry.
- Body: { customerId, meterSlot: "CUSTOM1" | "CUSTOM2", amount, feature?, tags?, metadata? }. amount must be positive; we subtract it from the meter's remaining quota.
- Response: returns an eventId, updated meter snapshot (remaining/limit/label), and blocked flag when policy = BLOCK.
- SDK helper: await usageTap.incrementCustomMeter({ customerId, meterSlot: "CUSTOM1", amount }) mirrors the API and auto-adds headers.
- Requirements: enable the custom meter on the customer's plan, keep amount values as integers if you want whole-unit quotas, and log events/tags for downstream analytics.
const response = await fetch("https://api.usagetap.com/custom_meter", {
  method: "POST",
  headers: {
    // Read the key from the server environment; never hard-code it
    "Authorization": `Bearer ${process.env.USAGETAP_API_KEY}`,
    "Accept": "application/vnd.usagetap.v1+json",
    "Content-Type": "application/json",
    "Idempotency-Key": crypto.randomUUID()
  },
  body: JSON.stringify({
    customerId: "cust_123",
    meterSlot: "CUSTOM1",
    amount: 5,
    feature: "agent_actions",
    tags: ["workflow_automation"],
    metadata: { workflowId: "wf_abc123" }
  })
});
const payload = await response.json();
console.log(payload.data.eventId, payload.data.meter.remaining, payload.data.blocked);
Use a deterministic idempotency key when replaying background jobs so duplicate meter updates collapse into a single event.
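A deterministic key for replayed jobs could be derived from stable identifiers, as in this sketch. The helper name customMeterRequest and the key format meter:jobId:customerId:slot are assumptions; any format works as long as a replay of the same job produces the same string.

```typescript
type MeterSlot = "CUSTOM1" | "CUSTOM2";

// Hypothetical helper: builds a replay-safe custom meter request.
function customMeterRequest(
  apiKey: string,
  jobId: string,
  customerId: string,
  meterSlot: MeterSlot,
  amount: number,
) {
  if (!(amount > 0)) {
    throw new Error("amount must be positive");
  }
  return {
    url: "https://api.usagetap.com/custom_meter",
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      Accept: "application/vnd.usagetap.v1+json",
      "Content-Type": "application/json",
      // Derived from stable identifiers, so a replayed job sends the same key.
      "Idempotency-Key": `meter:${jobId}:${customerId}:${meterSlot}`,
    },
    body: JSON.stringify({ customerId, meterSlot, amount }),
  };
}
```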
Embed widgets (legacy):
- Use an API key with embed permissions to request widget data.
- Endpoint: GET /embed/widget-data?organization_id=ORG&customer_id=CUST&type=usage|plan|calls
- Returned JSON powers the configurable widget shown on the authenticated /embed page.
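The query string for the widget endpoint can be built with URLSearchParams so that identifiers are encoded correctly. widgetDataUrl is a hypothetical helper, and the base URL is left as a parameter because this page does not state which host serves the embed endpoint.

```typescript
type WidgetType = "usage" | "plan" | "calls";

// Hypothetical helper: builds the widget-data URL with encoded parameters.
function widgetDataUrl(
  baseUrl: string,
  organizationId: string,
  customerId: string,
  type: WidgetType,
): string {
  const params = new URLSearchParams({
    organization_id: organizationId,
    customer_id: customerId,
    type,
  });
  return `${baseUrl}/embed/widget-data?${params.toString()}`;
}
```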
10. Field Reference
Begin Response (key fields)
{
callId: string,
startTime: string,
newCustomer: boolean,
policy: "NONE" | "BLOCK" | "DOWNGRADE",
allowed: {
standard: boolean,
premium: boolean,
audio: boolean,
image: boolean,
search: boolean,
reasoningLevel: "NONE" | "LOW" | "MEDIUM" | "HIGH"
},
entitlementHints: {
suggestedModelTier: "premium" | "standard" | "none",
policy: "NONE" | "BLOCK" | "DOWNGRADE",
downgrade?: { reason: string, fallbackTier?: "premium" | "standard" | "none" }
},
meters: { [meter: string]: { remaining: number | null, limit: number | null, used: number | null, unlimited: boolean, ratio: number | null } },
remainingRatios: { [meter: string]: number | null },
subscription: {
id: string | null,
usagePlanVersionId: string | null,
planName: string | null,
planVersion: string | null,
stripeCustomerId?: string | null,
customerFriendlyName?: string | null,
customerEmail?: string | null,
pending?: { usagePlanVersionId: string | null, strategy: string | null, effectiveAt: string | null }
},
plan?: { id: string | null, name: string | null, version: string | null },
balances?: { tokensRemaining?: number, standardCallsRemaining?: number, premiumCallsRemaining?: number, searchesRemaining?: number, audioSecondsRemaining?: number },
models?: { [tier: string]: string[] },
idempotency?: { key: string, source: "explicit" | "derived" },
stripeCustomerId?: string | null
}
End Request + Response
request {
callId: string,
modelUsed?: string,
inputTokens?: number,
responseTokens?: number,
inputCacheTokens?: number,
reasoningTokens?: number,
searches?: number,
audio?: number,
audioSeconds?: number,
isPremium?: boolean,
stripeCustomerId?: string,
responseStatusCode?: number,
error?: { code: string, message: string }
}
response {
costUSD: number,
costUsdNano?: string,
promptCostUsd?: number,
completionCostUsd?: number,
cacheReadCostUsd?: number,
reasoningCostUsd?: number,
metered?: {
calls?: number,
tokens?: number,
reasoningTokens?: number,
searches?: number,
audio?: number,
audioSeconds?: number
},
balances?: { ...same shape as begin },
stripeCustomerId?: string | null
}
11. Minimal Pseudocode
// Pseudocode pattern
idempotencyKey = uuid()
begin = POST /call_begin { customerId, feature, idempotencyKey }
callId = begin.data.callId
try {
  vendor = openai.chat.completions.create(...)
  usage = vendor.usage // or derive counts
  POST /call_end { callId, modelUsed, inputTokens: usage.prompt_tokens, responseTokens: usage.completion_tokens, responseStatusCode: 200 }
} catch (e) {
  POST /call_end { callId, error: { code: 'VENDOR_ERROR', message: e.message } }
  throw e
}
12. Recommended Prompt Include
Paste this line near the top of any AI assistant prompt:
Use the canonical integration facts at https://usagetap.com/llmreference for lifecycle, auth, fields, and best practices. Don't hallucinate: if uncertain, say so.
13. Change Log (Human Maintained)
- v1 (initial): Core lifecycle, usage fields, streaming, SDK paths, embeds summary.