TokenSurf Cloud Routing API

Reference for the existing TokenSurf Cloud routing API — the managed, OpenAI-compatible proxy backend. Point your OpenAI SDK at the base URL below and your existing code works unchanged.

New direction: TokenSurf is now an AI-Agent-Quality framework that combines offline evaluation with online monitoring for your agents. The self-hostable platform (SDK, server, dashboard, and database) is open source, launching soon. This page documents the existing TokenSurf Cloud routing backend, which is still running.

Base URL: https://api.tokensurf.io/v1

All endpoints follow the OpenAI API spec.

Quickstart

1. Sign up — get 1,000 free credits:

curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'

2. Save your API key (starts with ts_, shown only once).

3. Add your provider key (e.g. your OpenAI key):

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "apiKey": "sk-your-openai-key"}'

4. Use it — change your base URL:

from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)
# Simple query → auto-routed to gpt-4o-mini by the backend
# Check response headers: X-TokenSurf-Downgraded: true

Authentication

All requests require a Bearer token in the Authorization header:

Authorization: Bearer ts_your_api_key_here

API keys start with ts_ and are generated at signup. We store only the SHA-256 hash — if you lose your key, you'll need to create a new account.
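
Storing only a hash means the server can verify a key without being able to recover it. A minimal sketch of that idea, assuming a plain SHA-256 hex digest; the function names are hypothetical and the backend's actual scheme may differ:

```python
import hashlib
import hmac


def hash_api_key(api_key: str) -> str:
    """Return the SHA-256 hex digest of an API key (assumed storage format)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Check a presented key against a stored hash using a constant-time compare."""
    if not presented_key.startswith("ts_"):
        return False
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


stored = hash_api_key("ts_example_key")
print(verify_api_key("ts_example_key", stored))  # True
print(verify_api_key("ts_wrong_key", stored))    # False
```

Because the digest cannot be reversed, a lost key cannot be shown again, which is why the key is displayed only once at signup.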

Provider Keys

TokenSurf doesn't call LLMs directly. You bring your own API keys for each provider. Keys are encrypted with AES-256-GCM before storage.

Supported providers:

Provider | Key format | Get one
OpenAI | sk-... | platform.openai.com
Anthropic | sk-ant-... | console.anthropic.com
Google | AIza... | aistudio.google.com
OpenRouter | sk-or-... | openrouter.ai/keys

Credits & Billing

1 credit = 1 routed request, regardless of tokens or model. The backend tracks credits per account.

TokenSurf Cloud is in early access. Plans and pricing are not self-serve — join the waitlist to get access.

Chat Completions

POST /v1/chat/completions

100% OpenAI-compatible. Supports streaming, tool calls, JSON mode.

Request body

Field | Type | Required | Description
model | string | Yes | Model ID (e.g. gpt-4o, claude-sonnet-4.6, deepseek/deepseek-r1)
messages | array | Yes | Array of message objects (role, content)
stream | boolean | No | Enable SSE streaming (default: false)
temperature | number | No | Sampling temperature
max_tokens | integer | No | Maximum output tokens
tools | array | No | Tool/function definitions (forces COMPLEX routing)
response_format | object | No | JSON mode (forces COMPLEX routing)

Example: non-streaming

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

Response

{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o-mini",  // ← downgraded for simple query
  "choices": [{
    "message": { "role": "assistant", "content": "Paris." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16 }
}

Example: streaming

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'

Example: OpenRouter model

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-r1", "messages": [{"role": "user", "content": "Explain quantum computing"}]}'
# Any model with provider/name format → routes via OpenRouter

List Models

GET /v1/models

Returns all supported models with pricing and routing info.

curl https://api.tokensurf.io/v1/models

Signup

POST /api/signup

Create a new account. Returns an API key (shown only once) and 1,000 free credits.

curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com"}'

Response

{
  "apiKey": "ts_7d96f6aac5009f1b...",
  "apiKeyPrefix": "ts_7d96f6...",
  "credits": 1000,
  "message": "Save your API key — it cannot be recovered."
}

Dashboard

GET /api/dashboard/

Returns your credit balance and usage stats for the current month.

curl https://tokensurf.io/api/dashboard/ \
  -H "Authorization: Bearer ts_your_key"

Manage Provider Keys

GET /api/keys/ — Check which providers are configured

POST /api/keys/ — Save a provider key

DELETE /api/keys/ — Remove a provider key

Save a key

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "anthropic", "apiKey": "sk-ant-your-key"}'

Valid providers: openai, anthropic, google, openrouter

Rotate API Key

POST /api/keys/ with {"action": "rotate"}

Generates a new API key. Your old key continues to work for 24 hours (grace period).

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "rotate"}'
# Returns: {"status": "rotated", "apiKey": "ts_new...", "prefix": "ts_abc123..."}

Routing Config

GET /api/routingConfigApi/ — Get current routing configuration

PUT /api/routingConfigApi/ — Update routing configuration

DELETE /api/routingConfigApi/ — Reset to defaults

Configuration fields

Field | Type | Description
enabled | boolean | Master switch for backend model routing
aiClassifier | boolean | Use Gemini Flash for ambiguous queries
ambiguousFallback | "conservative" or "aggressive" | How to handle ambiguous queries when AI is off
modelOverrides | object | Per-model {enabled, customTarget} overrides
providerEnabled | object | Enable/disable routing to each provider
providerPriority | string[] | Provider preference order for fallback chains
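
To make the fields above concrete, here is a sketch of a configuration payload with a small client-side sanity check. The shapes of modelOverrides (keyed by model ID) and providerEnabled (keyed by provider name) are assumptions, and validate_routing_config is a hypothetical helper, not part of the API:

```python
from typing import Any

ALLOWED_FALLBACKS = {"conservative", "aggressive"}
KNOWN_PROVIDERS = {"openai", "anthropic", "google", "openrouter"}


def validate_routing_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a routing config (empty if none)."""
    problems = []
    for flag in ("enabled", "aiClassifier"):
        if flag in config and not isinstance(config[flag], bool):
            problems.append(f"{flag} must be a boolean")
    fallback = config.get("ambiguousFallback")
    if fallback is not None and fallback not in ALLOWED_FALLBACKS:
        problems.append("ambiguousFallback must be 'conservative' or 'aggressive'")
    for provider in config.get("providerPriority", []):
        if provider not in KNOWN_PROVIDERS:
            problems.append(f"unknown provider in providerPriority: {provider}")
    return problems


example_config = {
    "enabled": True,
    "aiClassifier": False,
    "ambiguousFallback": "conservative",
    # Assumed shape: overrides keyed by model ID.
    "modelOverrides": {"gpt-4o": {"enabled": True, "customTarget": "gpt-4o-mini"}},
    # Assumed shape: flags keyed by provider name.
    "providerEnabled": {"openai": True, "anthropic": True, "google": False},
    "providerPriority": ["openai", "anthropic", "google"],
}
print(validate_routing_config(example_config))  # []
```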

Buy Credits

POST /api/checkout

Creates a Stripe Checkout session. Redirect the user to the returned URL.

curl -X POST https://tokensurf.io/api/checkout \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"amount": 25}'
# Returns: {"url": "https://checkout.stripe.com/..."}

Amount must be between $5 and $500. $1 = 1,000 credits.
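
The conversion is simple arithmetic. A sketch, assuming the amount is a whole number of dollars (the API may accept other values); credits_for_purchase is a hypothetical helper:

```python
def credits_for_purchase(amount_usd: int) -> int:
    """Convert a dollar amount to credits at $1 = 1,000 credits."""
    if not 5 <= amount_usd <= 500:
        raise ValueError("amount must be between 5 and 500 USD")
    return amount_usd * 1000


print(credits_for_purchase(25))  # 25000
```

Since 1 credit covers 1 routed request, a $25 purchase corresponds to 25,000 requests.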

Subscribe to a Plan

POST /api/subscribe

Creates a Stripe Checkout session for a monthly subscription plan.

curl -X POST https://tokensurf.io/api/subscribe \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"planId": "growth", "annual": false}'
# Returns: {"url": "https://checkout.stripe.com/..."}

Valid plan IDs: growth, scale. The annual flag selects annual billing.

Health Check

GET /api/health

Returns system status, provider health, cache hit rates, and latency percentiles. No authentication required.

curl https://tokensurf.io/api/health
# Returns: {"status":"healthy","region":"us-central1","providers":{...},"metrics":{...}}

Organizations (Teams)

Manage organizations for team-based API key management with per-key budgets and rate limits.

Method | Endpoint | Description
GET | /api/orgs/ | List your organizations
POST | /api/orgs/ | Create organization ({"name": "..."})
GET | /api/orgs/:id | Get org details + members
PUT | /api/orgs/:id | Update org (owner/admin)
DELETE | /api/orgs/:id | Delete org (owner only)
POST | /api/orgs/:id/members | Add member ({"email": "...", "role": "member"})
DELETE | /api/orgs/:id/members | Remove member ({"userId": "..."})

Roles: owner (full control), admin (manage keys + members), member (read-only).

Team API Keys

Create labeled API keys for your organization with per-key budgets, rate limits, and model restrictions.

Method | Endpoint | Description
GET | /api/org-keys/:orgId | List org's API keys
POST | /api/org-keys/:orgId | Create key (owner/admin)
DELETE | /api/org-keys/:orgId | Delete key (owner/admin)

Create team key

curl -X POST https://tokensurf.io/api/org-keys/ORG_ID \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"label": "production", "monthlyBudget": 10000, "rpm": 60}'
# Returns: {"apiKey": "ts_org_...", "prefix": "ts_org_abc123..."}

Team keys use the ts_org_ prefix. They consume credits from the organization's balance. Each key can have a monthly budget cap and model allowlist.

Architecture

TokenSurf is a single proxy that sits between your app and LLM providers. Every request flows through this pipeline:

Request Pipeline

// 1. Your app sends a standard OpenAI SDK request
POST /v1/chat/completions
Authorization: Bearer ts_your_key
{ "model": "gpt-4o", "messages": [...] }

// 2. TokenSurf proxy handles it
rate-limit  → In-memory token bucket (60 req/min, 10 req/s burst per key)
auth        → Validate API key (Redis cache → Firestore fallback)
abuse       → Abuse detection (throttle/block on anomalous patterns)
credits     → Redis DECR (atomic, lock-free, ~1ms) with Firestore background sync
classify    → Rule engine (0ms) → classifier cache → AI classifier (~50ms)
route       → If simple + downgrade target exists: swap model
circuit-brk → Check provider health → fallback to alternative provider if down
forward     → Pooled HTTP connection with retry (2 retries, exponential backoff)
translate   → Convert response to OpenAI format (if Anthropic or Google)
quality     → 5% of responses scored async by Gemini Flash-Lite (1-10 scale)
log         → Structured logging + async usage aggregation

// 3. Response returned to your app in OpenAI format
+ X-TokenSurf-Model: gpt-4o-mini
+ X-TokenSurf-Downgraded: true
+ X-TokenSurf-Complexity: simple
+ X-TokenSurf-Request-Id: a1b2c3d4...
+ X-TokenSurf-Region: us-central1
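
The rate-limit step in the pipeline above is described as a token bucket. The sketch below shows one way such a bucket can work, reading "60 req/min, 10 req/s burst" as a refill rate of one token per second with a capacity of ten; that reading, the class name, and the explicit now argument are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float = 10.0       # burst size (assumed from "10 req/s burst")
    refill_per_sec: float = 1.0  # 60 requests per minute
    tokens: float = 10.0         # bucket starts full
    last_refill: float = 0.0     # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket()
burst = [bucket.allow(now=0.0) for _ in range(12)]
print(burst.count(True))      # 10 allowed, 2 rejected
print(bucket.allow(now=1.0))  # True: one token refilled after a second
```

Passing the clock in as an argument keeps the example deterministic; a real limiter would read a monotonic clock and keep one bucket per API key.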

Complexity Classification

The classifier runs in two stages. It's conservative by design — when uncertain, it keeps your original model.

Signal | Result | What triggers it
Tools / function calling | Complex | Any request with tools parameter
Structured output | Complex | Any request with response_format
Code patterns | Complex | "analyze", "implement", "refactor", "debug", code blocks
Long conversation | Complex | 6+ messages in the conversation
Long message | Complex | 500+ estimated tokens in last user message
Factual question | Simple | "What is", "Define", "Translate", "Calculate"
Very short query | Simple | Under 50 tokens, 1-2 messages
Everything else | Ambiguous | Sent to Gemini Flash AI classifier or treated as complex
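
The signals in the table can be sketched as a small rule engine. This is an illustration only: the four-characters-per-token estimate, the order in which rules are checked, and the assumption that message content is a plain string are not confirmed by the backend.

```python
import re

COMPLEX_KEYWORDS = re.compile(r"\b(analyze|implement|refactor|debug)\b", re.IGNORECASE)
SIMPLE_PREFIXES = ("what is", "define", "translate", "calculate")
CODE_FENCE = "`" * 3  # marker for a fenced code block


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption)."""
    return len(text) // 4


def classify(request: dict) -> str:
    """Return 'simple', 'complex', or 'ambiguous' for a chat request."""
    messages = request.get("messages", [])
    last_user = next(
        (m.get("content", "") for m in reversed(messages) if m.get("role") == "user"),
        "",
    )
    # Complex signals are checked first so the result stays conservative.
    if request.get("tools") or request.get("response_format"):
        return "complex"
    if len(messages) >= 6 or estimate_tokens(last_user) >= 500:
        return "complex"
    if CODE_FENCE in last_user or COMPLEX_KEYWORDS.search(last_user):
        return "complex"
    if last_user.strip().lower().startswith(SIMPLE_PREFIXES):
        return "simple"
    if estimate_tokens(last_user) < 50 and len(messages) <= 2:
        return "simple"
    return "ambiguous"


print(classify({"messages": [{"role": "user", "content": "What is 2+2?"}]}))  # simple
print(classify({"messages": [{"role": "user", "content": "Please refactor this function"}]}))  # complex
```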

Provider Translation

You always send and receive the OpenAI format. TokenSurf translates internally:

Provider | Request translation | Response translation
OpenAI | Pass-through | Pass-through
Anthropic | Extract system messages, merge consecutive roles, ensure first message is user | Map end_turn → stop, reconstruct choices array
Google | System → systemInstruction, assistant → model role | Map STOP/MAX_TOKENS/SAFETY finish reasons
OpenRouter | Pass-through (OpenAI-compatible) | Pass-through
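
The Anthropic row describes three request-side steps. A minimal sketch of them follows, assuming string message content; the blank-line separator used when merging and the placeholder text for a missing first user message are assumptions, not the backend's documented behavior:

```python
def to_anthropic(messages: list[dict]) -> dict:
    """Translate OpenAI-style messages into an Anthropic-style request body."""
    # 1. Extract system messages into a separate top-level field.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    # 2. Merge consecutive messages that share a role.
    merged: list[dict] = []
    for m in messages:
        if m["role"] == "system":
            continue
        if merged and merged[-1]["role"] == m["role"]:
            merged[-1]["content"] += "\n\n" + m["content"]
        else:
            merged.append({"role": m["role"], "content": m["content"]})
    # 3. Ensure the conversation starts with a user message (placeholder is assumed).
    if not merged or merged[0]["role"] != "user":
        merged.insert(0, {"role": "user", "content": "(continue)"})
    body: dict = {"messages": merged}
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body


def to_openai_finish_reason(stop_reason: str) -> str:
    """Map end_turn to stop; other values pass through unchanged in this sketch."""
    return {"end_turn": "stop"}.get(stop_reason, stop_reason)


print(to_anthropic([
    {"role": "system", "content": "Be brief"},
    {"role": "user", "content": "Hi"},
    {"role": "user", "content": "Are you there?"},
]))
```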

Security

Streaming

Streaming ("stream": true) is fully supported across all providers. SSE events from Anthropic and Google are translated in real-time to the OpenAI chat.completion.chunk format.
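
As an illustration of that translation, the sketch below converts two kinds of Anthropic stream events into OpenAI-style chunks. It covers only text deltas and the final stop reason, omits fields such as created, and uses hypothetical function names; the backend's own translator is not shown in this reference.

```python
import json
from typing import Optional


def anthropic_event_to_chunk(event: dict, chunk_id: str, model: str) -> Optional[dict]:
    """Translate one Anthropic stream event; return None if it has no chunk equivalent here."""
    kind = event.get("type")
    payload = event.get("delta", {})
    if kind == "content_block_delta" and payload.get("type") == "text_delta":
        delta, finish_reason = {"content": payload["text"]}, None
    elif kind == "message_delta" and payload.get("stop_reason"):
        reason = payload["stop_reason"]
        delta, finish_reason = {}, ("stop" if reason == "end_turn" else reason)
    else:
        return None
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }


def to_sse_line(chunk: dict) -> str:
    """Serialize a chunk as a server-sent event data line."""
    return "data: " + json.dumps(chunk) + "\n\n"


event = {"type": "content_block_delta", "index": 0,
         "delta": {"type": "text_delta", "text": "Hello"}}
print(to_sse_line(anthropic_event_to_chunk(event, "chatcmpl-abc123", "claude-haiku-4.5")))
```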

Resilience

TokenSurf is built for millions of requests per month with multiple layers of fault tolerance:

Layer | Mechanism | Details
Rate Limiting | Token bucket | 60 req/min, 10 req/sec burst per API key. Returns 429 with Retry-After header.
Circuit Breaker | Per-provider state machine | CLOSED → OPEN (fail fast for 30s) → HALF_OPEN (probe) → CLOSED. Triggers on 5+ failures in 60s.
Retry | Exponential backoff | 2 retries with jitter on 429/500/502/503. Respects Retry-After headers.
Fallback Chains | Cross-provider equivalences | When a provider is down, routes to an equivalent model on another provider (e.g. gpt-4o → claude-sonnet-4-6).
Connection Pooling | undici HTTP pools | Persistent TCP/TLS connections to all providers. Saves 50-100ms per request.
Abuse Detection | Behavioral analysis | Throttles on high request rates (>600/hour) or error rates (>50%). Escalates to key blocking.
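
The circuit breaker row describes a three-state machine. A sketch of that state machine using the thresholds from the table follows; the class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and a production breaker would also limit HALF_OPEN to a single in-flight probe:

```python
from collections import deque


class CircuitBreaker:
    def __init__(self, threshold: int = 5, window: float = 60.0, open_for: float = 30.0):
        self.threshold = threshold  # failures needed to open the circuit
        self.window = window        # seconds over which failures are counted
        self.open_for = open_for    # seconds to fail fast before probing
        self.state = "CLOSED"
        self.failures: deque = deque()
        self.opened_at = 0.0

    def allow_request(self, now: float) -> bool:
        if self.state == "OPEN":
            if now - self.opened_at >= self.open_for:
                self.state = "HALF_OPEN"  # let a probe request through
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.state = "CLOSED"
            self.failures.clear()

    def record_failure(self, now: float) -> None:
        if self.state == "HALF_OPEN":
            self._open(now)  # failed probe: go back to failing fast
            return
        self.failures.append(now)
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = "OPEN"
        self.opened_at = now
        self.failures.clear()


breaker = CircuitBreaker()
for t in range(5):
    breaker.record_failure(now=float(t))
print(breaker.state)                    # OPEN
print(breaker.allow_request(now=10.0))  # False: still failing fast
print(breaker.allow_request(now=34.0))  # True: 30s elapsed, probe allowed
```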

Caching

Redis (Memorystore) caching eliminates Firestore from the hot path. All caching is transparent and gracefully degrades if Redis is unavailable.

Cache | Key | TTL | Impact
Auth | apikey:{hash} | 5 min | 90%+ of Firestore auth queries eliminated (50ms → 1ms)
Credits | credits:{userId} | 10 min | Atomic Redis DECR replaces Firestore transactions (30ms → 1ms)
Classifier | classify:{hash} | 1 hour | Skips Gemini AI call for repeated ambiguous queries (~200ms saved)

Cache is invalidated on: credit top-ups, key rotation, provider key changes, and routing config updates.
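
The read-through pattern behind these caches can be sketched with an in-memory dictionary standing in for Redis and a loader function standing in for Firestore. The TTLs come from the table; the class, the cached_lookup helper, and the "no cache available" fallback are illustrative assumptions:

```python
from typing import Callable, Optional

TTL_SECONDS = {"apikey": 300, "credits": 600, "classify": 3600}


class TTLCache:
    """In-memory stand-in for Redis with per-entry expiry."""

    def __init__(self) -> None:
        self._store: dict = {}

    def get(self, key: str, now: float):
        entry = self._store.get(key)
        if entry is None or entry[0] <= now:
            self._store.pop(key, None)  # drop expired entries
            return None
        return entry[1]

    def set(self, key: str, value, ttl: float, now: float) -> None:
        self._store[key] = (now + ttl, value)

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)


def cached_lookup(cache: Optional[TTLCache], kind: str, ident: str,
                  loader: Callable[[str], object], now: float):
    """Return the cached value, or load it from the slow store and cache it."""
    if cache is None:  # cache unavailable: degrade to the slow path
        return loader(ident)
    key = f"{kind}:{ident}"
    hit = cache.get(key, now)
    if hit is not None:
        return hit
    value = loader(ident)
    cache.set(key, value, TTL_SECONDS[kind], now)
    return value


calls = []
def load_credits(user_id: str) -> int:
    calls.append(user_id)  # counts how often the slow store is hit
    return 1000

cache = TTLCache()
cached_lookup(cache, "credits", "user-1", load_credits, now=0.0)
cached_lookup(cache, "credits", "user-1", load_credits, now=60.0)
print(len(calls))  # 1: the second lookup was served from the cache
```

Calling cache.invalidate("credits:user-1") after a top-up would force the next lookup back to the slow store, which mirrors the invalidation events listed above.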

Quality Scoring

TokenSurf automatically samples 5% of non-streaming responses and scores them for quality using Gemini Flash-Lite. This helps you verify that downgraded models still meet your quality bar.

Score | Rating | Meaning
9-10 | Excellent | Comprehensive, accurate, well-structured
7-8 | Good | Mostly correct with minor issues
4-6 | Fair | Partially correct or vague
1-3 | Poor | Incorrect, irrelevant, or harmful

Quality scores are aggregated per model per month and visible in your dashboard. Downgraded responses are tracked separately so you can compare original vs routed model quality.
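
To show how the bands and the separate tracking of downgraded responses fit together, here is a small sketch. The sample record shape (model, downgraded, score) is hypothetical, and the dashboard may aggregate differently:

```python
from collections import defaultdict
from statistics import mean


def rating_for_score(score: int) -> str:
    """Map a 1-10 quality score to the rating bands in the table above."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 9:
        return "Excellent"
    if score >= 7:
        return "Good"
    if score >= 4:
        return "Fair"
    return "Poor"


def average_by_model(samples: list[dict]) -> dict:
    """Average scores per (model, downgraded) pair so the two groups stay separate."""
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample["model"], sample["downgraded"])].append(sample["score"])
    return {key: mean(scores) for key, scores in groups.items()}


samples = [
    {"model": "gpt-4o-mini", "downgraded": True, "score": 8},
    {"model": "gpt-4o-mini", "downgraded": True, "score": 9},
    {"model": "gpt-4o", "downgraded": False, "score": 9},
]
print(rating_for_score(8))  # Good
print(average_by_model(samples))
```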

How Routing Works

Every request goes through a two-stage classifier:

  1. Rule-based pre-filter (0ms). Catches obvious simple/complex queries using pattern matching:
    • SIMPLE: Short factual questions, translations, calculations, definitions
    • COMPLEX: Code blocks, multi-step reasoning, tool calls, JSON mode, long prompts (500+ tokens), long conversations (6+ messages)
  2. AI classifier. For ambiguous queries, a Gemini Flash-Lite call classifies in <3 seconds. If it times out, defaults to COMPLEX.

Conservative by design: When in doubt, the backend keeps your original model. Complex queries are never downgraded.

Routing Table

Simple queries get downgraded to a lighter model. Complex queries and already-light models pass through unchanged. This is the existing backend's model-mapping behavior.

Provider | Model | Simple → Routes To
OpenAI | gpt-4o | gpt-4o-mini
OpenAI | gpt-4-turbo | gpt-4o-mini
OpenAI | gpt-4 | gpt-4o-mini
OpenAI | gpt-4o-mini / gpt-3.5-turbo | pass-through
Anthropic | claude-opus-4.6 / 4.5 | claude-haiku-4.5
Anthropic | claude-sonnet-4.6 / 4.5 | claude-haiku-4.5
Anthropic | claude-opus-4.1 / 4.0 | claude-haiku-4.5
Anthropic | claude-sonnet-4.0 | claude-haiku-4.5
Anthropic | claude-haiku-* | pass-through
Google | gemini-3.1-pro-preview | gemini-2.5-flash
Google | gemini-2.5-pro | gemini-2.5-flash
Google | gemini-*-flash* | pass-through
OpenRouter | Any provider/model format | pass-through (300+ models)
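
The table amounts to a lookup that applies only to simple queries. A sketch with the mappings copied from the table; the route function and its return shape are hypothetical:

```python
DOWNGRADE_TARGETS = {
    "gpt-4o": "gpt-4o-mini",
    "gpt-4-turbo": "gpt-4o-mini",
    "gpt-4": "gpt-4o-mini",
    "claude-opus-4.6": "claude-haiku-4.5",
    "claude-opus-4.5": "claude-haiku-4.5",
    "claude-sonnet-4.6": "claude-haiku-4.5",
    "claude-sonnet-4.5": "claude-haiku-4.5",
    "claude-opus-4.1": "claude-haiku-4.5",
    "claude-opus-4.0": "claude-haiku-4.5",
    "claude-sonnet-4.0": "claude-haiku-4.5",
    "gemini-3.1-pro-preview": "gemini-2.5-flash",
    "gemini-2.5-pro": "gemini-2.5-flash",
}


def route(model: str, complexity: str) -> tuple[str, bool]:
    """Return (model to call, whether it was downgraded)."""
    if complexity != "simple":
        return model, False  # complex and ambiguous queries keep the original model
    target = DOWNGRADE_TARGETS.get(model)
    if target is None:
        return model, False  # already-light and OpenRouter models pass through
    return target, True


print(route("gpt-4o", "simple"))                # ('gpt-4o-mini', True)
print(route("gpt-4o", "complex"))               # ('gpt-4o', False)
print(route("deepseek/deepseek-r1", "simple"))  # ('deepseek/deepseek-r1', False)
```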

Fallback Chains

When a provider is unavailable (circuit breaker open or persistent 5xx), TokenSurf automatically routes to an equivalent model on another provider. Fallback order follows your providerPriority setting.

Primary Model | Anthropic Fallback | Google Fallback
gpt-4o | claude-sonnet-4-6 | gemini-2.5-pro
gpt-4o-mini | claude-haiku-4-5 | gemini-2.5-flash

Primary Model | OpenAI Fallback | Google Fallback
claude-sonnet-4-6 | gpt-4o | gemini-2.5-pro
claude-haiku-4-5 | gpt-4o-mini | gemini-2.5-flash

Fallback only triggers when the provider is fully down (not for 4xx client errors). The X-TokenSurf-Fallback: true header indicates a fallback was used.
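
A sketch of how a fallback could be chosen from these tables, walking providerPriority in order and skipping providers that are unhealthy. The pick_fallback helper and the healthy set are hypothetical; the model IDs are copied from the tables as written:

```python
from typing import Optional

FALLBACKS = {
    "gpt-4o": {"anthropic": "claude-sonnet-4-6", "google": "gemini-2.5-pro"},
    "gpt-4o-mini": {"anthropic": "claude-haiku-4-5", "google": "gemini-2.5-flash"},
    "claude-sonnet-4-6": {"openai": "gpt-4o", "google": "gemini-2.5-pro"},
    "claude-haiku-4-5": {"openai": "gpt-4o-mini", "google": "gemini-2.5-flash"},
}


def pick_fallback(model: str, provider_priority: list[str],
                  healthy: set[str]) -> Optional[tuple[str, str]]:
    """Return (provider, model) for the first healthy provider with an equivalent."""
    candidates = FALLBACKS.get(model, {})
    for provider in provider_priority:
        target = candidates.get(provider)
        if target is not None and provider in healthy:
            return provider, target
    return None  # no fallback available; the caller would see a 503


priority = ["openai", "google", "anthropic"]
print(pick_fallback("gpt-4o", priority, healthy={"anthropic", "google"}))
# ('google', 'gemini-2.5-pro')
```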

Response Headers

Every proxy response includes these headers:

Header | Value | Description
X-TokenSurf-Model | gpt-4o-mini | The model that actually served the request
X-TokenSurf-Downgraded | true / false | Whether the model was downgraded
X-TokenSurf-Complexity | simple / complex | How the query was classified
X-TokenSurf-Request-Id | a1b2c3d4-... | Unique ID for tracing and support
X-TokenSurf-Region | us-central1 | Which region served the request
X-TokenSurf-Fallback | true | Present when a fallback provider was used
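
On the client side, these headers can be collected into a small record for logging. A sketch, assuming headers arrive as a plain mapping of strings; RoutingInfo and parse_routing_headers are hypothetical names:

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RoutingInfo:
    model: str
    downgraded: bool
    complexity: str
    request_id: str
    region: str
    fallback: bool


def parse_routing_headers(headers: Mapping[str, str]) -> RoutingInfo:
    """Read the X-TokenSurf-* headers, matching names case-insensitively."""
    h = {name.lower(): value for name, value in headers.items()}
    return RoutingInfo(
        model=h.get("x-tokensurf-model", ""),
        downgraded=h.get("x-tokensurf-downgraded", "false").lower() == "true",
        complexity=h.get("x-tokensurf-complexity", ""),
        request_id=h.get("x-tokensurf-request-id", ""),
        region=h.get("x-tokensurf-region", ""),
        # The fallback header is only present when a fallback was used.
        fallback=h.get("x-tokensurf-fallback", "false").lower() == "true",
    )


info = parse_routing_headers({
    "X-TokenSurf-Model": "gpt-4o-mini",
    "X-TokenSurf-Downgraded": "true",
    "X-TokenSurf-Complexity": "simple",
    "X-TokenSurf-Request-Id": "a1b2c3d4",
    "X-TokenSurf-Region": "us-central1",
})
print(info.model, info.downgraded, info.fallback)  # gpt-4o-mini True False
```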

OpenAI

Requests for OpenAI models are forwarded directly to api.openai.com. Format is pass-through — no translation needed.

Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo

Required key: openai

Anthropic

Requests are translated from OpenAI format to the Anthropic Messages API. System messages are extracted into the system parameter. Streaming events are transformed to OpenAI SSE format.

Models: claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4.1, claude-sonnet-4.0, claude-opus-4.0, claude-haiku-3.5

Required key: anthropic

Google Gemini

Requests are translated to Gemini's generateContent format. System messages become systemInstruction. Roles are mapped (assistant → model).

Models: gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

Required key: google

OpenRouter

Any model ID containing a / (e.g. deepseek/deepseek-r1) is automatically routed through OpenRouter. Format is OpenAI-compatible — no translation needed.

Popular models: meta-llama/llama-3.3-70b-instruct, meta-llama/llama-4-maverick, deepseek/deepseek-chat, deepseek/deepseek-r1, mistralai/mistral-large-latest, qwen/qwen-2.5-72b-instruct, cohere/command-r-plus

Required key: openrouter

See all 300+ models at openrouter.ai/models
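
The provider sections above imply a simple dispatch on the model ID. Only the "/" rule for OpenRouter is stated in this reference; matching the other providers by prefix is an assumption made for this sketch, and provider_for_model is a hypothetical helper:

```python
def provider_for_model(model: str) -> str:
    """Guess which provider key a model ID needs."""
    if "/" in model:
        return "openrouter"  # documented rule: any ID containing a slash
    if model.startswith("gpt-"):
        return "openai"      # assumed prefix rule
    if model.startswith("claude-"):
        return "anthropic"   # assumed prefix rule
    if model.startswith("gemini-"):
        return "google"      # assumed prefix rule
    raise ValueError(f"unsupported model: {model}")


print(provider_for_model("deepseek/deepseek-r1"))  # openrouter
print(provider_for_model("claude-sonnet-4.6"))     # anthropic
```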

Python

from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1"
)

# Works with any supported model
response = client.chat.completions.create(
    model="gpt-4o",  # or claude-sonnet-4.6, gemini-2.5-pro, deepseek/deepseek-r1
    messages=[{"role": "user", "content": "Hello"}]
)

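# Check if downgraded: the parsed response object has no .headers attribute.
# With openai>=1.0, call client.chat.completions.with_raw_response.create(...)
# instead, read raw.headers["X-TokenSurf-Downgraded"], then call raw.parse()
# to get the usual completion object.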

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ts_your_key",
  baseURL: "https://api.tokensurf.io/v1",
});

const response = await client.chat.completions.create({
  model: "claude-opus-4.6",
  messages: [{ role: "user", content: "Hello" }],
});

cURL

curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Error Codes

HTTP | Type | Meaning
400 | invalid_request_error | Missing model/messages, unsupported model, no provider key, body too large, or model not allowed for org key
401 | authentication_error | Missing or invalid API key (ts_ or ts_org_)
402 | insufficient_credits | No credits remaining, or org key monthly budget exhausted
403 | invalid_request_error | Model not in org key's allowlist
405 | invalid_request_error | Wrong HTTP method
409 | | Email already registered (signup), or member already in org
429 | rate_limit_error | Rate limit exceeded (per-key bucket) or abuse detection throttle. Check Retry-After header.
502 | provider_error | Upstream provider failed after retries; credit is automatically refunded
503 | provider_unavailable | Provider circuit breaker is open and no fallback configured; credit refunded

Provider errors (502/503): If the upstream provider fails, your credit is automatically refunded. You only pay for successful requests. The proxy retries up to 2 times with exponential backoff before returning an error.
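
On the client side, the status codes above suggest different reactions. The sketch below is one possible policy, not a recommendation from TokenSurf: the action names, the choice to retry 429/502/503, and the backoff parameters are all assumptions.

```python
import random
from typing import Callable, Optional

ACTIONS = {
    400: "fix_request",
    401: "check_api_key",
    402: "add_credits",
    403: "fix_request",
    405: "fix_request",
    409: "fix_request",
    429: "retry",
    502: "retry",
    503: "retry",
}


def action_for_status(status: int) -> str:
    """Suggest what a client should do for a given HTTP status."""
    return ACTIONS.get(status, "raise")


def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 0.5, cap: float = 8.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return retry_after  # honor the server's Retry-After value
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return rng() * min(cap, base * (2 ** attempt))


print(action_for_status(402))                # add_credits
print(retry_delay(0, retry_after=3.0))       # 3.0
print(retry_delay(2, rng=lambda: 1.0))       # 2.0 (upper bound for the third attempt)
```

Jitter spreads retries from many clients over time, so a recovering provider is not hit by a synchronized wave of requests.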