Quickstart Guide

Add usage tracking to your AI app in under 10 minutes.

Under 10 minutes · Server-side only · 4–15 lines of code

1 Get your API key

Create an API key in Configure and keep it server-side only.

Security Notice

Never expose USAGETAP_API_KEY in the browser or commit it to git.

2 Install the SDK

Install the package with your preferred package manager:

npm install @usagetap/sdk
# or
pnpm add @usagetap/sdk
# or
yarn add @usagetap/sdk

SDK enforces server-only usage (throws if imported client-side).
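
If several route handlers share one client, you can centralize it in a server-only module. A minimal sketch, assuming a Next.js App Router project (implied by the app/api paths below) and the optional server-only package as an extra build-time guard:

// lib/usagetap.ts - shared, server-only client (sketch)
import 'server-only'; // optional: fails the build if this module reaches client code
import { UsageTapClient } from '@usagetap/sdk';

export const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL ?? 'https://api.usagetap.com', // default used later in this guide
});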

3 Configure environment variables

Add a .env.local file:

# .env.local
USAGETAP_API_KEY=ut_example_server_key
OPENAI_API_KEY=sk-proj-example-openai-key

See .env.example for more options.
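
The route handlers in the next step also read USAGETAP_BASE_URL. A sketch of what .env.example might add on top of .env.local (the base URL shown is the default used later in this guide):

# .env.example (sketch)
USAGETAP_API_KEY=ut_example_server_key
OPENAI_API_KEY=sk-proj-example-openai-key
# Used by the route handlers below; https://api.usagetap.com is the default in this guide
USAGETAP_BASE_URL=https://api.usagetap.com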

4 Add tracking

Pick one approach below:

Option A: withUsage + entitlements (recommended)

Call UsageTap first, then choose the OpenAI model, reasoning effort, and search tools from the returned allowances.

// app/api/chat/route.ts
import OpenAI from 'openai';
import { UsageTapClient } from '@usagetap/sdk';

const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL!,
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

function selectCapabilities(allowed: {
  standard?: boolean;
  premium?: boolean;
  reasoningLevel?: 'LOW' | 'MEDIUM' | 'HIGH' | null;
  search?: boolean;
}) {
  const tier = allowed.premium ? 'premium' : 'standard';
  const model = tier === 'premium' ? 'gpt5' : 'gpt5-mini';
  const reasoningEffort =
    allowed.reasoningLevel === 'HIGH'
      ? 'high'
      : allowed.reasoningLevel === 'MEDIUM'
        ? 'medium'
        : allowed.reasoningLevel === 'LOW'
          ? 'low'
          : undefined;

  return {
    model,
    reasoning: reasoningEffort ? { effort: reasoningEffort } : undefined,
    tools: allowed.search ? [{ type: 'web_search' as const }] : undefined,
  };
}

export async function POST(req: Request) {
  const { messages } = await req.json();
  const prompt = messages
    .map((m: { role: string; content: string }) => `${m.role}: ${m.content}`)
    .join('\n');

  const completion = await usageTap.withUsage(
    {
      customerId: 'quickstart-user',
      feature: 'chat.send',
      requested: { standard: true, premium: true, search: true, reasoningLevel: 'HIGH' },
    },
    async ({ begin, setUsage }) => {
      const { model, reasoning, tools } = selectCapabilities(begin.data.allowed);

      const response = await openai.responses.create({
        model,
        input: prompt,
        reasoning,
        tools,
      });

      setUsage({
        modelUsed: model,
        // Map the Responses API usage fields onto UsageTap's counters.
        inputTokens: response.usage?.input_tokens ?? 0,
        responseTokens: response.usage?.output_tokens ?? 0,
        reasoningTokens: reasoning
          ? response.usage?.output_tokens_details?.reasoning_tokens ?? 0
          : 0,
        // Count executed web searches from the response output items.
        searches: tools?.length
          ? response.output.filter((item) => item.type === 'web_search_call').length
          : 0,
      });

      return response;
    },
  );

  return new Response(completion.output_text ?? '', {
    headers: { 'Content-Type': 'text/plain' },
  });
}
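
If a customer is out of quota under a BLOCK policy, the API answers with HTTP 429 (see the notes at the end of this guide). How withUsage surfaces that in the SDK is not shown here, so treat the error handling below as a sketch:

// Sketch: map a failed or blocked call to an HTTP response.
// Assumption: withUsage rejects when call_begin denies the request; inspect the
// SDK's error type to distinguish quota blocks from transport failures.
function toErrorResponse(err: unknown): Response {
  console.error('UsageTap call failed', err);
  return new Response('Usage limit reached or tracking failed', { status: 429 });
}

// Inside the POST handler above:
//   try { /* withUsage flow */ } catch (err) { return toErrorResponse(err); }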

Option B: wrapOpenAI (zero-boilerplate)

wrapOpenAI automatically applies entitlement-aware defaults. It picks gpt5 (premium) or gpt5-mini (standard) based on begin.data.allowed.

// app/api/chat/route.ts - Minimal wrapOpenAI
import OpenAI from 'openai';
import { UsageTapClient, wrapOpenAI } from '@usagetap/sdk';

const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL!,
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

const ai = wrapOpenAI(openai, usageTap, {
  defaultContext: {
    customerId: 'quickstart-user',
    feature: 'chat.send',
    requested: { standard: true, premium: true, search: true, reasoningLevel: 'HIGH' },
  },
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  // wrapOpenAI automatically selects gpt5 (premium) or gpt5-mini (standard)
  // based on begin.data.allowed
  const completion = await ai.chat.completions.create(
    { messages },
    {
      usageTap: {
        requested: { standard: true, premium: true, search: true, reasoningLevel: 'MEDIUM' },
      },
    },
  );

  return new Response(completion.choices[0]?.message?.content ?? '', {
    headers: { 'Content-Type': 'text/plain' },
  });
}
Pro tip: Replace quickstart-user with a real authenticated user id.
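
For example, with wrapOpenAI the per-request usageTap options shown above could carry the authenticated user instead of the static defaultContext. A sketch that replaces the POST handler in Option B; getSession is a placeholder for your own auth layer, and the per-call customerId override is an assumption (only requested is shown above):

// Placeholder for your auth layer; replace with your real session lookup.
declare function getSession(req: Request): Promise<{ userId: string }>;

export async function POST(req: Request) {
  const { messages } = await req.json();
  const session = await getSession(req);

  const completion = await ai.chat.completions.create(
    { messages },
    {
      usageTap: {
        customerId: session.userId, // assumption: per-call override mirrors defaultContext
        requested: { standard: true, premium: true },
      },
    },
  );

  return new Response(completion.choices[0]?.message?.content ?? '', {
    headers: { 'Content-Type': 'text/plain' },
  });
}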

5 Test locally

Start the dev server and send a test request:

curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello!"}]}'

Look for the model's reply in the response body and UsageTap activity in your server logs.

Verify in dashboard

  • Real-time token & cost metrics
  • Per-customer feature usage
  • Quota and plan attribution
  • Error vs success breakdown

Success!

You're now tracking AI usage with UsageTap.

Next steps

Check usage without creating a call

Need to display current quota status or plan details without tracking a vendor call? Use checkUsage():

const usageStatus = await usageTap.checkUsage({ customerId: "cust_123" });

console.log("Meters:", usageStatus.data.meters);
console.log("Allowed:", usageStatus.data.allowed);
console.log("Plan:", usageStatus.data.plan);
console.log("Balances:", usageStatus.data.balances);

Returns the same rich snapshot as call_begin (meters, entitlements, subscription details, plan info, balances) but without creating a call record. Perfect for dashboard widgets, pre-flight checks, or displaying quota status.
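
For example, a read-only quota endpoint for a dashboard widget might look like this (a sketch; the route path is arbitrary and the response only re-exposes fields from the snippet above):

// app/api/usage/route.ts - sketch of a quota widget endpoint (no call record created)
import { UsageTapClient } from '@usagetap/sdk';

const usageTap = new UsageTapClient({
  apiKey: process.env.USAGETAP_API_KEY!,
  baseUrl: process.env.USAGETAP_BASE_URL ?? 'https://api.usagetap.com',
});

export async function GET() {
  // Replace with the authenticated user's id in a real app.
  const usageStatus = await usageTap.checkUsage({ customerId: 'cust_123' });

  // Expose only what the widget needs: plan, entitlements, and meter snapshots.
  return Response.json({
    plan: usageStatus.data.plan,
    allowed: usageStatus.data.allowed,
    meters: usageStatus.data.meters,
  });
}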

`call_begin` envelope (live response)

UsageTap now responds with the canonical { result, data, correlationId } envelope. The SDK sends the required Accept: application/vnd.usagetap.v1+json header for you:

{
  "result": {
    "status": "ACCEPTED",
    "code": "CALL_BEGIN_SUCCESS",
    "timestamp": "2025-10-04T18:21:37.482Z"
  },
  "data": {
    "callId": "call_123",
    "startTime": "2025-10-04T18:21:37.482Z",
    "policy": "DOWNGRADE",
    "newCustomer": false,
    "canceled": false,
    "allowed": {
      "standard": true,
      "premium": true,
      "audio": false,
      "image": false,
      "search": true,
      "reasoningLevel": "MEDIUM"
    },
    "entitlementHints": {
      "suggestedModelTier": "standard",
      "reasoningLevel": "MEDIUM",
      "policy": "DOWNGRADE",
      "downgrade": {
        "reason": "PREMIUM_QUOTA_EXHAUSTED",
        "fallbackTier": "standard"
      }
    },
    "meters": {
      "standardCalls": {
        "remaining": 12,
        "limit": 20,
        "used": 8,
        "unlimited": false,
        "ratio": 0.6
      },
      "premiumCalls": {
        "remaining": null,
        "limit": null,
        "used": null,
        "unlimited": true,
        "ratio": null
      },
      "standardTokens": {
        "remaining": 800,
        "limit": 1000,
        "used": 200,
        "unlimited": false,
        "ratio": 0.8
      }
    },
    "remainingRatios": {
      "standardCalls": 0.6,
      "standardTokens": 0.8
    },
    "subscription": {
      "id": "sub_123",
      "usagePlanVersionId": "plan_2025_01",
      "planName": "Pro",
      "planVersion": "2025-01",
      "limitType": "DOWNGRADE",
      "reasoningLevel": "MEDIUM",
      "lastReplenishedAt": "2025-10-04T00:00:00.000Z",
      "nextReplenishAt": "2025-11-04T00:00:00.000Z",
      "subscriptionVersion": 14
    },
    "models": {
      "standard": ["gpt5-mini"],
      "premium": ["gpt5"]
    },
    "idempotency": {
      "key": "call_123",
      "source": "derived"
    }
  },
  "correlationId": "corr_abc123"
}

The envelope now includes richer metadata:

  • entitlementHints summarizes the recommended model tier and any downgrade rationale
  • meters provides per-counter snapshots with remaining quotas and ratios
  • remainingRatios offers compact ratio lookups
  • subscription contains plan identity and replenishment timestamps
  • models surfaces vendor hints (standard vs premium model shortlists)

data.idempotency.key always matches callId. When you omit idempotency in the request, the backend derives a deterministic hash from the organization, customer, feature, and requested entitlements.
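
In code, entitlementHints and models can drive model choice directly instead of hand-rolled selection logic. A sketch against the envelope above (types abbreviated to the fields used):

// Sketch: pick a concrete model from the call_begin envelope shown above.
function pickModel(data: {
  entitlementHints: { suggestedModelTier: 'standard' | 'premium' };
  models: { standard: string[]; premium: string[] };
}): string {
  const tier = data.entitlementHints.suggestedModelTier;
  // Fall back to the standard shortlist if the suggested tier has no entries.
  return data.models[tier][0] ?? data.models.standard[0];
}

// With the envelope above, pickModel(envelope.data) === 'gpt5-mini'
// (suggestedModelTier is "standard" under the DOWNGRADE policy).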

Unified `/call` shortcut

Prefer a one-shot API flow without the SDK? Call the public `/call` endpoint: UsageTap runs begin → optional vendor call → end for you and returns both envelopes in one response.

const baseUrl = process.env.USAGETAP_BASE_URL ?? "https://api.usagetap.com";

const res = await fetch(`${baseUrl}/call`, {
  method: 'POST',
  headers: {
    Authorization: 'Bearer ' + process.env.USAGETAP_API_KEY,
    Accept: 'application/vnd.usagetap.v1+json',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    customerId: 'cust_demo',
    requested: { standard: true },
    feature: 'chat.completions',
    vendor: {
      url: 'https://api.openai.com/v1/chat/completions',
      method: 'POST',
      headers: {
        Authorization: 'Bearer ' + process.env.OPENAI_API_KEY,
        'Content-Type': 'application/json',
      },
      body: {
        model: 'gpt-4o-mini',
        messages: [{ role: 'user', content: 'Hi!' }],
      },
      responseType: 'json',
    },
  }),
});

const envelope = await res.json();
if (envelope.result.code !== 'CALL_SUCCESS') {
  console.error('UsageTap call failed', envelope);
}

Omit the vendor block if you just want begin → end with your own usage numbers. Vendor errors downgrade to `CALL_VENDOR_WARNING`, but UsageTap still records the end-of-call telemetry so quotas stay accurate.
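
Extending the fetch example above, a minimal way to branch on those outcomes (the result codes are the ones named in this guide):

// Sketch: distinguish a clean run from a vendor failure that was still metered.
if (envelope.result.code === 'CALL_SUCCESS') {
  // Both the begin and end envelopes are in the response; use them as needed.
  console.log('Call completed', envelope.correlationId);
} else if (envelope.result.code === 'CALL_VENDOR_WARNING') {
  // The vendor call failed, but UsageTap still recorded end-of-call telemetry.
  console.warn('Vendor error; usage still recorded', envelope.correlationId);
} else {
  console.error('UsageTap call failed', envelope);
}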

`call_end` success response

When you finalize usage, the envelope mirrors the same structure and surfaces a `metered` summary derived from the Dynamo counters:

{
  "result": {
    "status": "ACCEPTED",
    "code": "CALL_END_SUCCESS",
    "timestamp": "2025-10-04T18:21:52.103Z"
  },
  "data": {
    "callId": "call_123",
    "costUSD": 0,
    "metered": {
      "tokens": 768,
      "calls": 1,
      "searches": 1
    }
  },
  "correlationId": "corr_abc123"
}

`metered` is derived from the raw Dynamo deltas and reports the amounts consumed. Additional meters (audio seconds, reasoning tokens, balances) will populate in later phases without breaking the contract. BLOCK policy violations still return HTTP 429 with an error envelope.

Premium detection: UsageTap automatically classifies a call as premium when the model's output-token price exceeds $4 per million tokens. You can override this by passing `isPremium: true` or `isPremium: false` in your `call_end` request.
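
If you report usage through the SDK, the same override can ride along with the rest of your call_end fields. A sketch inside the Option A withUsage callback, assuming setUsage forwards isPremium to call_end (only the raw call_end field is documented above):

// Inside the withUsage callback from Option A (sketch).
// Assumption: setUsage passes isPremium through to call_end.
setUsage({
  modelUsed: 'gpt5',
  inputTokens: 512,
  responseTokens: 256,
  isPremium: false, // override the output-token-price heuristic for this call
});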