UsageTap Integration Guide for AI Coding Tools

Quick Start for LLM Agents: This guide provides everything needed to integrate UsageTap into applications in a single pass. Copy the relevant code snippet for your stack, replace the placeholders, and you're done.

Overview

UsageTap tracks LLM API usage and enforces quotas automatically. The flow is simple:

  1. Begin Call: Request entitlements for a customer/feature
  2. Execute LLM Call: Use the allowed capabilities from step 1
  3. End Call: Report actual usage back to UsageTap

The SDK handles retries, idempotency, and automatic usage tracking for you.
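The three-step flow above can be sketched with the low-level `beginCall`/`endCall` methods (shapes follow the API Reference later in this guide). This is a minimal sketch, not the recommended integration: the clients are passed in as parameters purely to keep the function self-contained, and the higher-level wrappers shown below handle most of this for you.

```typescript
import crypto from "crypto";

// Sketch of the begin → execute → end lifecycle. The `usageTap` and
// `openai` arguments are initialized clients (see Installation below);
// they are parameters here only so the flow is easy to follow and test.
export async function trackedCompletion(
  usageTap: any,
  openai: any,
  customerId: string,
  prompt: string
): Promise<string | null> {
  // 1. Begin: request entitlements for this customer/feature
  const begin = await usageTap.beginCall({
    customerId,
    feature: "chat.completions",
    idempotencyKey: crypto.randomUUID(),
    requested: { standard: true, premium: true },
  });

  // 2. Execute: pick a model from the allowed capabilities
  const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";
  const response = await openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });

  // 3. End: report actual usage back to UsageTap
  await usageTap.endCall({
    callId: begin.data.callId,
    modelUsed: model,
    inputTokens: response.usage?.prompt_tokens ?? 0,
    responseTokens: response.usage?.completion_tokens ?? 0,
  });

  return response.choices[0].message.content;
}
```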


Installation

npm install @usagetap/sdk openai

Environment Variables Required:

USAGETAP_API_KEY=your_api_key_here
USAGETAP_BASE_URL=https://api.usagetap.com/
OPENAI_API_KEY=sk-...

One-Shot Integration Examples

Next.js App Router

File: app/api/chat/route.ts

import { NextRequest } from "next/server";
import OpenAI from "openai";
import { UsageTapClient } from "@usagetap/sdk";
import { wrapOpenAI, toNextResponse } from "@usagetap/sdk/openai";

// Initialize clients
const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL!,
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
});

// Wrap OpenAI client with UsageTap tracking
const ai = wrapOpenAI(openai, usageTap, {
  defaultContext: {
    // Set defaults that apply to all calls
    feature: "chat.completions",
    requested: {
      standard: true,
      premium: true,
      search: true,
      reasoningLevel: "HIGH",
    },
  },
});

export async function POST(req: NextRequest) {
  try {
    // Get user ID from your auth system
    const userId = req.headers.get("x-user-id") || "anonymous";
    
    const { messages } = await req.json();

    // Make streaming LLM call with automatic usage tracking
    const stream = await ai.chat.completions.create(
      {
        messages,
        stream: true,
        // model is optional - UsageTap selects based on entitlements
      },
      {
        usageTap: {
          customerId: userId,
          // Generate unique idempotency key for this request
          idempotencyKey: crypto.randomUUID(),
        },
      }
    );

    // Return streaming response
    return toNextResponse(stream, { mode: "text" });
  } catch (error) {
    console.error("Chat error:", error);
    return new Response(
      JSON.stringify({ error: "Failed to process chat request" }),
      { status: 500, headers: { "Content-Type": "application/json" } }
    );
  }
}

Key Features:

  • ✅ Automatic model selection based on the customer's plan
  • ✅ Usage tracking with zero boilerplate
  • ✅ Idempotent requests with idempotencyKey
  • ✅ Streaming responses

Express.js Server

File: server.ts

import express from "express";
import OpenAI from "openai";
import { UsageTapClient } from "@usagetap/sdk";
import { withUsage } from "@usagetap/sdk/express";

const app = express();
app.use(express.json());

// Initialize UsageTap client
const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL!,
});

// Attach UsageTap context to all requests
app.use(
  withUsage(usageTap, (req) => {
    // Extract customer ID from your auth middleware
    return req.user?.id || "anonymous";
  })
);

// Chat endpoint with automatic tracking
app.post("/api/chat", async (req, res) => {
  try {
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
    
    // Get UsageTap-wrapped OpenAI client from request
    const ai = req.usageTap!.openai(openai, {
      feature: "chat.assistant",
      requested: {
        standard: true,
        premium: true,
        search: true,
        reasoningLevel: "HIGH",
      },
    });

    const stream = await ai.chat.completions.create(
      {
        messages: req.body.messages,
        stream: true,
      },
      {
        usageTap: {
          // Auto-generate idempotency key
          idempotencyKey: crypto.randomUUID(),
        },
      }
    );

    // Pipe stream to response and finalize usage automatically
    req.usageTap!.pipeToResponse(stream, res);
  } catch (error) {
    console.error("Chat error:", error);
    res.status(500).json({ error: "Failed to process chat request" });
  }
});

app.listen(3000, () => {
  console.log("Server running on http://localhost:3000");
});

Key Features:

  • ✅ Middleware attaches UsageTap to every request
  • ✅ Extract customer ID once from auth system
  • ✅ Automatic usage tracking and streaming

Node.js Script

File: generate-summary.ts

import OpenAI from "openai";
import { UsageTapClient } from "@usagetap/sdk";
import crypto from "crypto";

// Initialize clients
const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL!,
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
});

async function generateSummary(customerId: string, text: string) {
  // High-level helper that handles begin → call → end automatically
  const completion = await usageTap.withUsage(
    {
      customerId,
      feature: "summarization",
      requested: {
        standard: true,
        premium: true,
        search: false,
        reasoningLevel: "LOW",
      },
      // Generate idempotency key for safe retries
      idempotencyKey: crypto.randomUUID(),
    },
    async ({ begin, setUsage }) => {
      // Select model based on what customer is allowed
      const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";

      // Make LLM call
      const response = await openai.chat.completions.create({
        model,
        messages: [
          {
            role: "system",
            content: "Summarize the following text concisely.",
          },
          {
            role: "user",
            content: text,
          },
        ],
      });

      // Report usage back to UsageTap
      setUsage({
        modelUsed: model,
        inputTokens: response.usage?.prompt_tokens ?? 0,
        responseTokens: response.usage?.completion_tokens ?? 0,
      });

      return response.choices[0].message.content;
    }
  );

  return completion;
}

// Example usage
generateSummary("customer_123", "Long document text here...")
  .then((summary) => console.log("Summary:", summary))
  .catch((error) => console.error("Error:", error));

Key Features:

  • ✅ withUsage handles the entire lifecycle
  • ✅ Automatic error handling and usage reporting
  • ✅ Works in any Node.js environment

React Chat UI

File: components/Chat.tsx

import { useChatWithUsage } from "@usagetap/sdk/react";

interface ChatProps {
  userId: string;
}

export function Chat({ userId }: ChatProps) {
  const { messages, input, setInput, handleSubmit, isLoading, error } =
    useChatWithUsage({
      api: "/api/chat", // Your Next.js API route
      customerId: userId,
      feature: "chat.assistant",
    });

  return (
    <div className="chat-container">
      {/* Messages display */}
      <div className="messages">
        {messages.map((m) => (
          <div key={m.id} className={`message ${m.role}`}>
            <strong>{m.role}:</strong>
            <p>{m.content}</p>
          </div>
        ))}
      </div>

      {/* Error display */}
      {error && (
        <div className="error">
          Error: {error.message}
        </div>
      )}

      {/* Input form */}
      <form onSubmit={handleSubmit} className="input-form">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Type a message..."
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? "Sending..." : "Send"}
        </button>
      </form>
    </div>
  );
}

Key Features:

  • ✅ React hook manages entire chat state
  • ✅ Automatic usage tracking through API endpoint
  • ✅ Loading and error states included

Idempotency Best Practices

Idempotency ensures safe retries without duplicate charges. UsageTap supports four approaches:

1. Explicit Idempotency Keys (Recommended)

Generate a unique key per logical operation:

import crypto from "crypto";

await usageTap.beginCall({
  customerId: "cust_123",
  feature: "chat.completions",
  // Generate unique key for this request
  idempotencyKey: crypto.randomUUID(),
  requested: {
    standard: true,
    premium: true,
  },
});

When to use:

  • API endpoints that may be retried by clients
  • Background jobs that might restart
  • Critical operations requiring duplicate prevention

2. Deterministic Keys

Use request-specific data to generate consistent keys:

import crypto from "crypto";

function generateIdempotencyKey(
  userId: string,
  sessionId: string,
  messageId: string
): string {
  const data = `${userId}:${sessionId}:${messageId}`;
  return crypto.createHash("sha256").update(data).digest("hex");
}

await usageTap.beginCall({
  customerId: userId,
  feature: "chat.send",
  idempotencyKey: generateIdempotencyKey(userId, sessionId, messageId),
  requested: { standard: true, premium: true },
});

When to use:

  • Multi-step workflows where steps might retry
  • Distributed systems with at-least-once delivery
  • Systems where request IDs already exist

3. Auto-Generated Keys (Default)

Omit idempotencyKey and let UsageTap derive one:

await usageTap.beginCall({
  customerId: "cust_123",
  feature: "chat.completions",
  // No idempotencyKey - UsageTap generates deterministically
  requested: { standard: true, premium: true },
});

How it works:

  • UsageTap hashes: orgId + customerId + feature + requested entitlements
  • Same inputs = same callId returned
  • Great for bulk operations with natural deduplication

When to use:

  • Simple scripts without explicit request IDs
  • Internal tools where deduplication by inputs is acceptable
  • Testing and development
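A sketch of the deduplication behavior described above, assuming the key derivation is stable across calls (feature name is illustrative):

```typescript
// Two beginCall requests with identical inputs and no idempotencyKey:
// UsageTap derives the same key from customerId + feature + requested
// entitlements, so both return the same callId.
const first = await usageTap.beginCall({
  customerId: "cust_123",
  feature: "nightly.report",
  requested: { standard: true },
});

const second = await usageTap.beginCall({
  customerId: "cust_123",
  feature: "nightly.report",
  requested: { standard: true },
});

// first.data.callId === second.data.callId per the derivation rule above
```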

4. Header-Based Idempotency

For raw HTTP requests, use the Idempotency-Key header:

const response = await fetch(`${baseUrl}/call_begin`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
    "Idempotency-Key": crypto.randomUUID(),
    "Accept": "application/vnd.usagetap.v1+json",
  },
  body: JSON.stringify({
    customerId: "cust_123",
    feature: "chat.completions",
    requested: { standard: true, premium: true },
    holdUsd: 0.05,
  }),
});

Best Practice Summary

Scenario          Recommended Approach               Example
API endpoints     Explicit crypto.randomUUID()       idempotencyKey: crypto.randomUUID()
Background jobs   Deterministic from job ID          idempotencyKey: `job_${jobId}`
Webhooks          Use the webhook event ID           idempotencyKey: event.id
Bulk operations   Auto-generated (omit key)          No idempotencyKey field
Testing           Fixed string for reproducibility   idempotencyKey: "test-scenario-1"

API Reference

Core Methods

beginCall(request, options?)

Start a usage tracking session and get entitlements.

Request:

{
  customerId: string;           // Required: Your customer's ID
  feature?: string;             // Optional: Feature being accessed
  requested?: {                 // Optional: Requested capabilities
    standard?: boolean;         // Access to standard models
    premium?: boolean;          // Access to premium models
    search?: boolean;           // Web search capability
    reasoningLevel?: "NONE" | "LOW" | "MEDIUM" | "HIGH";
  };
  idempotencyKey?: string;      // Optional: Unique key for safe retries
  tags?: string[];              // Optional: Tags for analytics
}

Response:

{
  result: { status: "ACCEPTED", code: string, timestamp: string },
  data: {
    callId: string;             // Use this in endCall()
    allowed: {                  // What customer can actually use
      standard: boolean;
      premium: boolean;
      search: boolean;
      reasoningLevel: "NONE" | "LOW" | "MEDIUM" | "HIGH";
    };
    entitlementHints: {
      suggestedModelTier: "premium" | "standard" | "none";
      policy: "NONE" | "BLOCK" | "DOWNGRADE";
    };
    subscription: {
      planName: string;
      limitType: string;
      // ... more subscription details
    };
    meters: {                   // Current usage levels
      [meterName: string]: {
        remaining: number | null;
        limit: number | null;
        used: number;
        unlimited: boolean;
        ratio: number | null;
      };
    };
  },
  correlationId: string;
}

endCall(request, options?)

Report actual usage for a call.

Request:

{
  callId: string;               // Required: From beginCall response
  modelUsed?: string;           // Model identifier (e.g., "gpt-4o")
  inputTokens?: number;         // Prompt tokens
  responseTokens?: number;      // Completion tokens
  reasoningTokens?: number;     // Reasoning tokens (o1 models)
  searches?: number;            // Number of web searches
  audioSeconds?: number;        // Audio processing time
  error?: {                     // If call failed
    code: string;
    message: string;
  };
}

Response:

{
  result: { status: "ACCEPTED", code: string, timestamp: string },
  data: {
    callId: string;
    costUSD: number;            // Calculated cost
    metered: {                  // Usage that was counted
      tokens: number;
      calls: number;
      searches: number;
    };
    balances: {                 // Remaining quotas
      tokensRemaining: number;
      searchesRemaining: number;
    };
  },
  correlationId: string;
}
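If the LLM call itself fails, you can still close out the call by reporting the failure through the `error` field shown in the request shape above. A sketch, assuming `begin` is the result of an earlier `beginCall` (the error code string here is illustrative, not a fixed enum):

```typescript
try {
  // ... LLM call configured from begin.data.allowed ...
} catch (err) {
  // Close the call and attribute the failure instead of reporting tokens
  await usageTap.endCall({
    callId: begin.data.callId,
    error: {
      code: "UPSTREAM_ERROR", // illustrative code
      message: err instanceof Error ? err.message : String(err),
    },
  });
  throw err;
}
```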

withUsage(request, handler, options?)

High-level helper that handles entire lifecycle:

const result = await usageTap.withUsage(
  {
    customerId: "cust_123",
    feature: "chat.send",
    idempotencyKey: crypto.randomUUID(),
    requested: { standard: true, premium: true },
  },
  async ({ begin, setUsage, setError }) => {
    // Your LLM call here
    const response = await openai.chat.completions.create({
      model: begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini",
      messages: [{ role: "user", content: "Hello" }],
    });

    // Report usage
    setUsage({
      modelUsed: response.model,
      inputTokens: response.usage?.prompt_tokens ?? 0,
      responseTokens: response.usage?.completion_tokens ?? 0,
    });

    return response.choices[0].message.content;
  }
);

Automatic behavior:

  • Calls beginCall before handler
  • Calls endCall after handler (even if it throws)
  • Captures errors and reports them
  • Returns handler result

createCustomer(request, options?)

Ensure a customer subscription exists before making calls.

Request:

{
  customerId: string;           // Required: Your customer's ID
  customerFriendlyName?: string; // Optional but strongly recommended: display name (aka customerName)
  customerEmail?: string;        // Optional but strongly recommended: used for billing notifications
  stripeCustomerId?: string;    // Link to Stripe customer
}

Response:

{
  result: { status: "ACCEPTED", ... },
  data: {
    customerId: string;
    newCustomer: boolean;       // true if just created
    allowed: { ... };           // Current entitlements
    subscription: { ... };      // Subscription details
    plan: { ... };              // Active plan info
  },
  correlationId: string;
}

Idempotent: Safe to call multiple times. Returns newCustomer: false if customer already exists.
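A typical call, for example during user signup (field values are illustrative):

```typescript
const created = await usageTap.createCustomer({
  customerId: "cust_123",
  customerFriendlyName: "Acme Corp",     // strongly recommended
  customerEmail: "billing@acme.example", // strongly recommended
});

// Safe to call on every login/signup; branch on newCustomer if needed
if (created.data.newCustomer) {
  console.log("Provisioned new subscription for", created.data.customerId);
}
```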

changePlan(request, options?)

Switch a customer to a different usage plan.

Request:

{
  customerId: string;           // Required
  planId: string;               // Required: Target plan ID
  strategy?: "IMMEDIATE_RESET" | "IMMEDIATE_PRORATED" | "AT_NEXT_REPLENISH";
}

Strategy options:

  • IMMEDIATE_RESET: Switch now, reset usage to zero
  • IMMEDIATE_PRORATED: Switch now, prorate existing usage
  • AT_NEXT_REPLENISH: Schedule change for next billing cycle (default)
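For example, an immediate upgrade that carries prorated usage forward (the plan ID is illustrative):

```typescript
await usageTap.changePlan({
  customerId: "cust_123",
  planId: "plan_pro_monthly",     // illustrative plan ID
  strategy: "IMMEDIATE_PRORATED", // switch now, prorate existing usage
});
```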

checkUsage(request, options?)

Query current usage status without creating a call.

Request:

{
  customerId: string;           // Required
}

Response: Same structure as beginCall but without creating a call record. Use for dashboard widgets or pre-flight checks.
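For example, a pre-flight check that gates an expensive job on remaining quota (the meter name is illustrative; use the meter names defined by your plan):

```typescript
const status = await usageTap.checkUsage({ customerId: "cust_123" });

// Read a meter from the response without creating a call record
const tokens = status.data.meters["tokens"]; // illustrative meter name
if (tokens && !tokens.unlimited && (tokens.remaining ?? 0) <= 0) {
  throw new Error("Token quota exhausted for this billing period");
}
```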


Error Handling

All SDK methods throw UsageTapError on failure:

import { UsageTapError, isUsageTapError } from "@usagetap/sdk";

try {
  await usageTap.beginCall({
    customerId: "cust_123",
    idempotencyKey: crypto.randomUUID(),
  });
} catch (error) {
  if (isUsageTapError(error)) {
    console.error("UsageTap error:", {
      code: error.code,          // Error code (e.g., "USAGETAP_RATE_LIMITED")
      message: error.message,    // Human-readable message
      retryable: error.retryable, // Whether retry might succeed
      correlationId: error.correlationId, // For support
      details: error.details,    // Additional context
    });

    if (error.retryable) {
      // Retry with exponential backoff
    } else {
      // Permanent error - handle gracefully
    }
  } else {
    // Non-UsageTap error
    console.error("Unexpected error:", error);
  }
}

Common Error Codes:

  • USAGETAP_AUTH_ERROR: Invalid API key
  • USAGETAP_RATE_LIMITED: Too many requests (retryable)
  • USAGETAP_BAD_REQUEST: Invalid request parameters
  • USAGETAP_SERVER_ERROR: Server issue (retryable)
  • USAGETAP_NETWORK_ERROR: Network failure (retryable)

Automatic Retries: The SDK retries transient errors automatically with exponential backoff. Configure retry behavior:

const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL!,
  retries: {
    maxAttempts: 3,       // Default: 3
    baseDelayMs: 250,     // Default: 250ms
    maxDelayMs: 5000,     // Default: 5000ms
    jitterRatio: 0.2,     // Default: 0.2 (20% jitter)
  },
});
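The SDK's exact backoff formula is internal, but the settings above map onto a conventional exponential-backoff-with-jitter schedule. A plausible sketch of how such parameters typically combine:

```typescript
// Sketch of an exponential backoff delay with jitter, using the same
// parameter names as the retry config above. This is a conventional
// interpretation of those settings, not the SDK's internal formula.
function backoffDelayMs(
  attempt: number, // 1-based attempt number
  baseDelayMs = 250,
  maxDelayMs = 5000,
  jitterRatio = 0.2,
  random: () => number = Math.random
): number {
  // Double the base delay on each attempt, capped at maxDelayMs
  const capped = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
  // Spread the delay by ±jitterRatio to avoid retry thundering herds
  const jitter = 1 + jitterRatio * (2 * random() - 1);
  return Math.round(capped * jitter);
}
```

With the defaults, attempts wait roughly 250ms, 500ms, 1000ms, and so on (±20%), never exceeding 5000ms.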

Quick Reference: Entitlement Mapping

How to use begin.data.allowed to configure LLM calls:

const begin = await usageTap.beginCall({
  customerId: "cust_123",
  idempotencyKey: crypto.randomUUID(),
  requested: {
    standard: true,
    premium: true,
    search: true,
    reasoningLevel: "HIGH",
  },
});

// Select model tier
const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";

// Configure reasoning effort (for o1 models)
const reasoningEffort = (() => {
  switch (begin.data.allowed.reasoningLevel) {
    case "HIGH": return "high";
    case "MEDIUM": return "medium";
    case "LOW": return "low";
    default: return undefined;
  }
})();

// Enable web search if allowed
const tools = begin.data.allowed.search
  ? [{ type: "web_search" }]
  : undefined;

// Make the call
const response = await openai.chat.completions.create({
  model,
  messages: [...],
  reasoning: reasoningEffort ? { effort: reasoningEffort } : undefined,
  tools,
});

Migration from Direct OpenAI Usage

Before (Direct OpenAI):

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

After (With UsageTap):

const completion = await usageTap.withUsage(
  {
    customerId: userId,
    feature: "chat.send",
    idempotencyKey: crypto.randomUUID(),
    requested: { standard: true, premium: true },
  },
  async ({ begin, setUsage }) => {
    const model = begin.data.allowed.premium ? "gpt-4o" : "gpt-4o-mini";
    
    const response = await openai.chat.completions.create({
      model,
      messages: [{ role: "user", content: "Hello" }],
    });

    setUsage({
      modelUsed: model,
      inputTokens: response.usage?.prompt_tokens ?? 0,
      responseTokens: response.usage?.completion_tokens ?? 0,
    });

    return response.choices[0].message.content;
  }
);

Changes Required:

  1. Wrap call in withUsage()
  2. Get customer ID from auth system
  3. Add idempotencyKey: crypto.randomUUID()
  4. Use entitlements to select model
  5. Report usage with setUsage()

That's it! 🎉


Support