TokenSurf Cloud Routing API

Reference for the existing TokenSurf Cloud routing API — the managed, OpenAI-compatible proxy backend. Point your OpenAI SDK at the base URL below and your existing code works unchanged.

New direction: TokenSurf is now an AI-Agent-Quality framework — offline evaluation plus online monitoring for your agents. The self-hostable platform (SDK, server, dashboard, and database) is open source, launching soon. This page documents the existing TokenSurf Cloud routing backend, which is still running.

Base URL: https://api.tokensurf.io/v1
All endpoints under this base URL follow the OpenAI API spec. Account-management endpoints (signup, keys, billing, routing config) live under https://tokensurf.io/api instead.

Quickstart

1. Sign up — get 1,000 free credits:

curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'

2. Save your API key (starts with ts_, shown only once).

3. Add your provider key (e.g. your OpenAI key):

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "apiKey": "sk-your-openai-key"}'

4. Use it — change your base URL:

from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# A simple query like this is likely auto-routed to gpt-4o-mini by the backend.
# The response header X-TokenSurf-Downgraded: true confirms it; read headers with
# client.chat.completions.with_raw_response.create(...)

Authentication

All requests require a Bearer token in the Authorization header:

Authorization: Bearer ts_your_api_key_here

API keys start with ts_ and are generated at signup. We store only the SHA-256 hash — if you lose your key, you'll need to create a new account.
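Storing only a hash means the service can verify a key but can never display it again. The following is a minimal sketch of that pattern using Python's standard `hashlib`. The function names, the 32-hex-character key body, and the in-memory `key_store` dict (standing in for Redis/Firestore) are hypothetical. The `apikey:{hash}` key shape mirrors the Caching section.

```python
import hashlib
import secrets
from typing import Optional

def generate_api_key() -> str:
    """Create a new ts_ key (the 32-hex-character body is an assumption)."""
    return "ts_" + secrets.token_hex(16)

def hash_api_key(api_key: str) -> str:
    """SHA-256 hex digest: the only form of the key that gets stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

# Hypothetical key store mapping apikey:{hash} -> account ID.
key_store: dict = {}

def register(account_id: str) -> str:
    api_key = generate_api_key()
    key_store[f"apikey:{hash_api_key(api_key)}"] = account_id
    return api_key  # returned to the caller once; the plaintext is not kept

def authenticate(api_key: str) -> Optional[str]:
    """Hash the presented key and look it up; None corresponds to a 401."""
    return key_store.get(f"apikey:{hash_api_key(api_key)}")

key = register("acct-123")
print(authenticate(key))         # acct-123
print(authenticate("ts_wrong"))  # None
```

SHA-256 is one-way, so nothing in the store can be turned back into the plaintext key.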

Provider Keys

TokenSurf does not supply LLM access of its own: you bring your own API key for each provider, and the proxy forwards your requests with it. Keys are encrypted with AES-256-GCM before storage.

Supported providers:

Provider	Key format	Get one
OpenAI	`sk-...`	platform.openai.com
Anthropic	`sk-ant-...`	console.anthropic.com
Google	`AIza...`	aistudio.google.com
OpenRouter	`sk-or-...`	openrouter.ai/keys
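A hypothetical client-side check can catch a key pasted under the wrong provider before you call POST /api/keys/. The sketch below assumes only the prefixes in the table above. The helper name is illustrative, and the backend may validate keys differently.

```python
from typing import Optional

# Prefixes from the table above. The more specific prefixes come first,
# because "sk-ant-" and "sk-or-" also start with OpenAI's "sk-".
KEY_PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("sk-or-", "openrouter"),
    ("AIza", "google"),
    ("sk-", "openai"),
]

def guess_provider(provider_key: str) -> Optional[str]:
    """Return the provider a key most likely belongs to, or None if unrecognized."""
    for prefix, provider in KEY_PREFIXES:
        if provider_key.startswith(prefix):
            return provider
    return None

print(guess_provider("sk-ant-api03-abc"))  # anthropic
```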

Credits & Billing

1 credit = 1 routed request, regardless of token count or model. The backend tracks credits per account:

- Each routed request decrements the account's credit balance by 1.
- When the balance reaches 0, requests return 402.
- If a provider call fails, the credit is refunded automatically.
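The charge/refund rules above can be sketched as follows. This is a hypothetical in-memory model, not the backend's code. A lock makes the check-and-decrement atomic, which is the role Redis DECR plays in the real pipeline. All class and function names are illustrative.

```python
import threading

class InsufficientCredits(Exception):
    """Raised when the balance is 0; the proxy answers 402."""

class CreditLedger:
    """Hypothetical in-memory stand-in for the per-account credit counter."""

    def __init__(self) -> None:
        self._balances: dict = {}
        self._lock = threading.Lock()

    def balance(self, account: str) -> int:
        with self._lock:
            return self._balances.get(account, 0)

    def top_up(self, account: str, credits: int) -> None:
        with self._lock:
            self._balances[account] = self._balances.get(account, 0) + credits

    def charge(self, account: str) -> None:
        """Deduct 1 credit for one routed request, whatever the token count."""
        with self._lock:
            if self._balances.get(account, 0) <= 0:
                raise InsufficientCredits(account)
            self._balances[account] -= 1

    def refund(self, account: str) -> None:
        self.top_up(account, 1)

def handle_request(ledger: CreditLedger, account: str, call_provider):
    ledger.charge(account)          # raises InsufficientCredits -> 402
    try:
        return call_provider()
    except Exception:
        ledger.refund(account)      # failed provider call: the credit goes back
        raise
```

Charging before the provider call and refunding on failure means concurrent requests can never spend more credits than the account holds.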

TokenSurf Cloud is in early access. Plans and pricing are not self-serve — join the waitlist to get access.

Chat Completions

POST /v1/chat/completions

Fully OpenAI-compatible. Supports streaming, tool calls, and JSON mode.

Request body

Field	Type	Required	Description
`model`	string	Yes	Model ID (e.g. `gpt-4o`, `claude-sonnet-4.6`, `deepseek/deepseek-r1`)
`messages`	array	Yes	Array of message objects (`role`, `content`)
`stream`	boolean	No	Enable SSE streaming (default: false)
`temperature`	number	No	Sampling temperature
`max_tokens`	integer	No	Maximum output tokens
`tools`	array	No	Tool/function definitions (forces complex classification, so the model is never downgraded)
`response_format`	object	No	JSON mode (forces complex classification, so the model is never downgraded)

Example: non-streaming

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

Response

{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o-mini",  // ← downgraded for simple query
  "choices": [{
    "message": { "role": "assistant", "content": "Paris." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16 }
}
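The `model` field names the model that actually answered. Comparing it with the model you requested is therefore a quick way to spot a downgrade from the body alone. The sketch below uses the response above, with the inline `//` comment removed because JSON does not allow comments. The helper name is hypothetical. The `X-TokenSurf-Downgraded` header remains the authoritative signal, because a provider may return a dated model name that differs from the one you sent.

```python
import json

SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o-mini",
  "choices": [{
    "message": {"role": "assistant", "content": "Paris."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16}
}
"""

def summarize(requested_model: str, body: str) -> dict:
    """Pull the served model, answer, and token usage out of a completion body."""
    data = json.loads(body)
    served = data["model"]
    return {
        "served_model": served,
        "downgraded": served != requested_model,
        "answer": data["choices"][0]["message"]["content"],
        "total_tokens": data["usage"]["total_tokens"],
    }

print(summarize("gpt-4o", SAMPLE_RESPONSE))
```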

Example: streaming

curl -N -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
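If you consume the stream without the OpenAI SDK, you have to split the SSE `data:` lines and concatenate the `delta.content` fields. The sketch below assumes OpenAI's framing, including the final `data: [DONE]` sentinel that OpenAI sends. Whether TokenSurf emits that exact sentinel is an assumption. `SAMPLE_STREAM` is made-up input.

```python
import json
from typing import Iterable, Iterator

def iter_content(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from chat.completion.chunk SSE lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event: lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text

SAMPLE_STREAM = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo!"},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_content(SAMPLE_STREAM)))  # Hello!
```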

Example: OpenRouter model

curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-r1", "messages": [{"role": "user", "content": "Explain quantum computing"}]}'
# Any model with provider/name format → routes via OpenRouter

List Models

GET /v1/models

Returns all supported models with pricing and routing info.

curl https://api.tokensurf.io/v1/models

Signup

POST /api/signup

Create a new account. Returns an API key (shown only once) and 1,000 free credits.

curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com"}'

Response

{
  "apiKey": "ts_7d96f6aac5009f1b...",
  "apiKeyPrefix": "ts_7d96f6...",
  "credits": 1000,
  "message": "Save your API key — it cannot be recovered."
}

Dashboard

GET /api/dashboard/

Returns your credit balance and usage stats for the current month.

curl https://tokensurf.io/api/dashboard/ \
  -H "Authorization: Bearer ts_your_key"

Manage Provider Keys

GET /api/keys/ — Check which providers are configured

POST /api/keys/ — Save a provider key

DELETE /api/keys/ — Remove a provider key

Save a key

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "anthropic", "apiKey": "sk-ant-your-key"}'

Valid providers: openai, anthropic, google, openrouter

Rotate API Key

POST /api/keys/ with {"action": "rotate"}

Generates a new API key. Your old key continues to work for 24 hours (grace period). As the example shows, this call authenticates with a Firebase ID token rather than a ts_ API key.

curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "rotate"}'
# Returns: {"status": "rotated", "apiKey": "ts_new...", "prefix": "ts_abc123..."}
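The grace period works like this: both the new key and the old key are accepted until 24 hours after rotation. The sketch below is hypothetical. It compares key hashes, consistent with the hash-only storage described under Authentication, and the function and parameter names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(hours=24)

def key_is_valid(
    presented_hash: str,
    current_hash: str,
    previous_hash: Optional[str],
    rotated_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> bool:
    """Accept the current key, or the previous key until 24h after rotation."""
    now = now or datetime.now(timezone.utc)
    if presented_hash == current_hash:
        return True
    if previous_hash is not None and rotated_at is not None:
        return presented_hash == previous_hash and now < rotated_at + GRACE_PERIOD
    return False
```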

Routing Config

GET /api/routingConfigApi/ — Get current routing configuration

PUT /api/routingConfigApi/ — Update routing configuration

DELETE /api/routingConfigApi/ — Reset to defaults

Configuration fields

Field	Type	Description
`enabled`	boolean	Master switch for backend model routing
`aiClassifier`	boolean	Use Gemini Flash for ambiguous queries
`ambiguousFallback`	`"conservative"` or `"aggressive"`	How to handle ambiguous queries when AI is off
`modelOverrides`	object	Per-model `{enabled, customTarget}` overrides
`providerEnabled`	object	Enable/disable routing to each provider
`providerPriority`	string[]	Provider preference order for fallback chains
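Before calling PUT /api/routingConfigApi/, you can check a config client-side against the types in the table above. This is a sketch only. The field names and types come from the table, but two shapes are assumptions: `modelOverrides` keyed by model ID and `providerEnabled` keyed by provider name. The provider names are taken from the list of valid providers. The server's own validation may differ.

```python
VALID_PROVIDERS = {"openai", "anthropic", "google", "openrouter"}
FALLBACK_MODES = {"conservative", "aggressive"}

def validate_routing_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the config looks valid."""
    errors = []
    for flag in ("enabled", "aiClassifier"):
        if flag in cfg and not isinstance(cfg[flag], bool):
            errors.append(f"{flag} must be a boolean")
    if "ambiguousFallback" in cfg and cfg["ambiguousFallback"] not in FALLBACK_MODES:
        errors.append("ambiguousFallback must be 'conservative' or 'aggressive'")
    # Assumed shape: {"<model id>": {"enabled": bool, "customTarget": str}}
    for model, override in cfg.get("modelOverrides", {}).items():
        if not isinstance(override, dict) or not set(override) <= {"enabled", "customTarget"}:
            errors.append(f"modelOverrides[{model}] may only contain enabled/customTarget")
    # Assumed shape: {"<provider>": bool}
    for provider in cfg.get("providerEnabled", {}):
        if provider not in VALID_PROVIDERS:
            errors.append(f"unknown provider in providerEnabled: {provider}")
    if not all(p in VALID_PROVIDERS for p in cfg.get("providerPriority", [])):
        errors.append("providerPriority contains an unknown provider")
    return errors

example = {
    "enabled": True,
    "aiClassifier": False,
    "ambiguousFallback": "conservative",
    "modelOverrides": {"gpt-4o": {"enabled": True, "customTarget": "gpt-4o-mini"}},
    "providerEnabled": {"openai": True, "anthropic": True, "google": False},
    "providerPriority": ["openai", "anthropic", "google"],
}
print(validate_routing_config(example))  # []
```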

Buy Credits

POST /api/checkout

Creates a Stripe Checkout session. Redirect the user to the returned URL.

curl -X POST https://tokensurf.io/api/checkout \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"amount": 25}'
# Returns: {"url": "https://checkout.stripe.com/..."}

Amount must be between $5 and $500. $1 = 1,000 credits.
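The arithmetic is straightforward. Since $1 buys 1,000 credits and 1 credit covers 1 routed request, each request costs $0.001, so a $25 top-up covers 25,000 requests. A minimal check, assuming whole-dollar amounts (whether fractional amounts are accepted is not documented here). The function name is hypothetical.

```python
MIN_USD, MAX_USD = 5, 500
CREDITS_PER_USD = 1_000

def credits_for_purchase(amount_usd: int) -> int:
    """Validate a checkout amount and return the credits it buys."""
    if not MIN_USD <= amount_usd <= MAX_USD:
        raise ValueError(f"amount must be between ${MIN_USD} and ${MAX_USD}")
    return amount_usd * CREDITS_PER_USD

print(credits_for_purchase(25))  # 25000
```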

Subscribe to a Plan

POST /api/subscribe

Creates a Stripe Checkout session for a subscription plan, billed monthly or annually.

curl -X POST https://tokensurf.io/api/subscribe \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"planId": "growth", "annual": false}'
# Returns: {"url": "https://checkout.stripe.com/..."}

Valid plan IDs: growth, scale. Set annual to true for annual billing or false for monthly billing.

Health Check

GET /api/health

Returns system status, provider health, cache hit rates, and latency percentiles. No authentication required.

curl https://tokensurf.io/api/health
# Returns: {"status":"healthy","region":"us-central1","providers":{...},"metrics":{...}}

Organizations (Teams)

Manage organizations for team-based API key management with per-key budgets and rate limits.

Method	Endpoint	Description
GET	`/api/orgs/`	List your organizations
POST	`/api/orgs/`	Create organization (`{"name": "..."}`)
GET	`/api/orgs/:id`	Get org details + members
PUT	`/api/orgs/:id`	Update org (owner/admin)
DELETE	`/api/orgs/:id`	Delete org (owner only)
POST	`/api/orgs/:id/members`	Add member (`{"email": "...", "role": "member"}`)
DELETE	`/api/orgs/:id/members`	Remove member (`{"userId": "..."}`)

Roles: owner (full control), admin (manage keys + members), member (read-only).
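The role rules from the table above can be expressed as a permission map. This is a hypothetical sketch. The action names are illustrative, and each one maps to an endpoint: PUT is owner/admin, DELETE is owner only, members and team keys are managed by owner/admin.

```python
ROLE_PERMISSIONS = {
    "owner": {"read", "manage_keys", "manage_members", "update_org", "delete_org"},
    "admin": {"read", "manage_keys", "manage_members", "update_org"},
    "member": {"read"},
}

def can(role: str, action: str) -> bool:
    """True if the role may perform the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(can("admin", "delete_org"))  # False
```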

Team API Keys

Create labeled API keys for your organization with per-key budgets, rate limits, and model restrictions.

Method	Endpoint	Description
GET	`/api/org-keys/:orgId`	List org's API keys
POST	`/api/org-keys/:orgId`	Create key (owner/admin)
DELETE	`/api/org-keys/:orgId`	Delete key (owner/admin)

Create team key

curl -X POST https://tokensurf.io/api/org-keys/ORG_ID \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"label": "production", "monthlyBudget": 10000, "rpm": 60}'
# Returns: {"apiKey": "ts_org_...", "prefix": "ts_org_abc123..."}

Team keys use the ts_org_ prefix. They consume credits from the organization's balance. Each key can have a monthly budget cap and model allowlist.
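The per-key limits map onto the 403 and 402 statuses listed under Error Codes. This is a hypothetical sketch. Several details are assumptions: `monthly_budget` counts credits, the allowlist is a set of model IDs, and the allowlist is checked before the budget.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TeamKey:
    label: str
    monthly_budget: Optional[int] = None   # credits per month; None means no cap
    allowed_models: Optional[set] = None   # None means any model
    used_this_month: int = 0

def admit(key: TeamKey, model: str) -> int:
    """Return the HTTP status a request gets before it reaches a provider."""
    if key.allowed_models is not None and model not in key.allowed_models:
        return 403   # model not in the key's allowlist
    if key.monthly_budget is not None and key.used_this_month >= key.monthly_budget:
        return 402   # monthly budget exhausted
    key.used_this_month += 1
    return 200
```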

Architecture

TokenSurf is a single proxy that sits between your app and LLM providers. Every request flows through this pipeline:

Request Pipeline

// 1. Your app sends a standard OpenAI SDK request
POST /v1/chat/completions
Authorization: Bearer ts_your_key
{ "model": "gpt-4o", "messages": [...] }

// 2. TokenSurf proxy handles it
rate-limit  → In-memory token bucket (60 req/min, 10 req/s burst per key)
auth        → Validate API key (Redis cache → Firestore fallback)
abuse       → Abuse detection (throttle/block on anomalous patterns)
credits     → Redis DECR (atomic, lock-free, ~1ms) with Firestore background sync
classify    → Rule engine (0ms) → classifier cache → AI classifier (~50ms)
route       → If simple + downgrade target exists: swap model
circuit-brk → Check provider health → fallback to alternative provider if down
forward     → Pooled HTTP connection with retry (2 retries, exponential backoff)
translate   → Convert response to OpenAI format (if Anthropic or Google)
quality     → 5% of non-streaming responses scored async by Gemini Flash-Lite (1-10 scale)
log         → Structured logging + async usage aggregation

// 3. Response returned to your app in OpenAI format
+ X-TokenSurf-Model: gpt-4o-mini
+ X-TokenSurf-Downgraded: true
+ X-TokenSurf-Complexity: simple
+ X-TokenSurf-Request-Id: a1b2c3d4...
+ X-TokenSurf-Region: us-central1
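The rate-limit stage is a per-key token bucket. The sketch below is a generic token bucket, not TokenSurf's code. It reads "60 req/min, 10 req/s burst" as a refill rate of 1 token per second with a bucket capacity of 10, which is an interpretation. The injectable `clock` exists only to make the sketch testable.

```python
import time

class TokenBucket:
    """Per-key token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float = 60 / 60, capacity: int = 10, clock=time.monotonic):
        self.rate = rate            # 60 req/min sustained = 1 token per second
        self.capacity = capacity    # burst allowance
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> tuple:
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Not enough tokens: this wait would go in the 429's Retry-After header
        return False, (1 - self.tokens) / self.rate
```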

Complexity Classification

The classifier runs in two stages. It's conservative by design — when uncertain, it keeps your original model.

Signal	Result	What triggers it
Tools / function calling	Complex	Any request with `tools` parameter
Structured output	Complex	Any request with `response_format`
Code patterns	Complex	"analyze", "implement", "refactor", "debug", code blocks
Long conversation	Complex	6+ messages in the conversation
Long message	Complex	500+ estimated tokens in last user message
Factual question	Simple	"What is", "Define", "Translate", "Calculate"
Very short query	Simple	Under 50 tokens, 1-2 messages
Everything else	Ambiguous	Sent to the Gemini Flash AI classifier when `aiClassifier` is on; otherwise resolved by the `ambiguousFallback` setting
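The rule stage of the table above can be sketched as a chain of checks. The keyword lists come from the table. Two parts are assumptions: the token estimate (about 4 characters per token) and plain-string message content. The real rule engine likely uses more patterns. Complex signals are checked first, so a prompt that matches both sides keeps the original model.

```python
import re

CODE_FENCE = "`" * 3  # a Markdown code fence marks a code block
COMPLEX_WORDS = re.compile(r"\b(analy[sz]e|implement|refactor|debug)\b", re.IGNORECASE)
SIMPLE_OPENERS = re.compile(r"^\s*(what is|define|translate|calculate)\b", re.IGNORECASE)

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: about 4 characters per token

def classify_rules(request: dict) -> str:
    """Return 'complex', 'simple', or 'ambiguous' from the signals in the table."""
    messages = request["messages"]
    last_user = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )
    if request.get("tools") or request.get("response_format"):
        return "complex"
    if CODE_FENCE in last_user or COMPLEX_WORDS.search(last_user):
        return "complex"
    if len(messages) >= 6 or estimate_tokens(last_user) >= 500:
        return "complex"
    if SIMPLE_OPENERS.search(last_user):
        return "simple"
    if estimate_tokens(last_user) < 50 and len(messages) <= 2:
        return "simple"
    return "ambiguous"
```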

Provider Translation

You always send and receive the OpenAI format. TokenSurf translates internally:

Provider	Request translation	Response translation
OpenAI	Pass-through	Pass-through
Anthropic	Extract `system` messages, merge consecutive roles, ensure first message is `user`	Map `end_turn` → `stop`, reconstruct `choices` array
Google	System → `systemInstruction`, `assistant` → `model` role	Map `STOP`/`MAX_TOKENS`/`SAFETY` finish reasons
OpenRouter	Pass-through (OpenAI-compatible)	Pass-through
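The Anthropic request translation in the table can be sketched for text-only chats. Several parts are assumptions: the `max_tokens` default (the Messages API requires the field), the `"(continue)"` placeholder used to make the first turn a user turn, and the stop-reason mappings beyond `end_turn`. Mapping TokenSurf model IDs to Anthropic's own IDs is omitted.

```python
def to_anthropic(openai_body: dict) -> dict:
    """Translate a text-only OpenAI chat request into an Anthropic Messages body."""
    system_parts, messages = [], []
    for m in openai_body["messages"]:
        if m["role"] == "system":
            system_parts.append(m["content"])  # system prompt becomes a top-level field
            continue
        if messages and messages[-1]["role"] == m["role"]:
            messages[-1]["content"] += "\n\n" + m["content"]  # merge same-role turns
        else:
            messages.append({"role": m["role"], "content": m["content"]})
    if not messages or messages[0]["role"] != "user":
        messages.insert(0, {"role": "user", "content": "(continue)"})  # hypothetical placeholder
    body = {
        "model": openai_body["model"],
        "messages": messages,
        "max_tokens": openai_body.get("max_tokens", 1024),  # default is an assumption
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    if "temperature" in openai_body:
        body["temperature"] = openai_body["temperature"]
    return body

# Anthropic stop_reason -> OpenAI finish_reason (only end_turn -> stop is documented above)
FINISH_REASONS = {"end_turn": "stop", "max_tokens": "length",
                  "stop_sequence": "stop", "tool_use": "tool_calls"}
```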

Security

Your TokenSurf API key: Only the SHA-256 hash is stored. Plaintext is shown once at signup, then deleted.
Provider API keys: Encrypted with AES-256-GCM at rest. Decrypted only in-memory when forwarding a request.
Credits: Deducted via atomic Redis DECR with Firestore background sync. Refunded automatically on provider errors.
Key rotation: Generate a new key with 24-hour grace period for the old key.
Abuse detection: Automatic throttling and blocking on anomalous request patterns.
Audit logging: All security events (key changes, config updates, purchases) logged to Cloud Logging.

Streaming

Streaming ("stream": true) is fully supported across all providers. SSE events from Anthropic and Google are translated in real-time to the OpenAI chat.completion.chunk format.
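Real-time translation works one event at a time. The sketch below maps Anthropic stream events (`message_start`, `content_block_delta` with a `text_delta`, `message_delta` carrying `stop_reason`) to OpenAI `chat.completion.chunk` objects. It is a text-only sketch, not TokenSurf's translator. The omitted `created` field and the stop-reason mapping beyond `end_turn` are simplifications.

```python
import json
from typing import Optional

def anthropic_event_to_chunk(event: dict, chunk_id: str, model: str) -> Optional[dict]:
    """Map one Anthropic stream event to an OpenAI chunk, or None to drop it."""
    def chunk(delta: dict, finish_reason=None) -> dict:
        return {
            "id": chunk_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }

    kind = event.get("type")
    if kind == "message_start":
        return chunk({"role": "assistant"})
    if kind == "content_block_delta" and event["delta"].get("type") == "text_delta":
        return chunk({"content": event["delta"]["text"]})
    if kind == "message_delta" and event["delta"].get("stop_reason"):
        reason = {"end_turn": "stop", "max_tokens": "length"}.get(
            event["delta"]["stop_reason"], "stop")
        return chunk({}, finish_reason=reason)
    return None  # ping, content_block_start/stop, message_stop: nothing to forward

def to_sse(chunk: dict) -> str:
    """Frame a chunk as one SSE event."""
    return f"data: {json.dumps(chunk)}\n\n"
```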

Resilience

TokenSurf is built for millions of requests per month with multiple layers of fault tolerance:

Layer	Mechanism	Details
Rate Limiting	Token bucket	60 req/min, 10 req/sec burst per API key. Returns `429` with `Retry-After` header.
Circuit Breaker	Per-provider state machine	CLOSED → OPEN (fail fast for 30s) → HALF_OPEN (probe) → CLOSED. Triggers on 5+ failures in 60s.
Retry	Exponential backoff	2 retries with jitter on 429/500/502/503. Respects `Retry-After` headers.
Fallback Chains	Cross-provider equivalences	When a provider is down, routes to an equivalent model on another provider (e.g. gpt-4o → claude-sonnet-4-6).
Connection Pooling	undici HTTP pools	Persistent TCP/TLS connections to all providers. Saves 50-100ms per request.
Abuse Detection	Behavioral analysis	Throttles on high request rates (>600/hour) or error rates (>50%). Escalates to key blocking.
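The circuit-breaker row describes a standard three-state machine. The sketch below uses the thresholds from the table: 5 failures in 60 seconds, a 30-second OPEN period, and one probe in HALF_OPEN. It is a generic implementation of the pattern, not TokenSurf's code. The injectable `clock` is there for testing.

```python
import time
from collections import deque

class CircuitBreaker:
    """Per-provider breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, threshold=5, window=60.0, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold    # failures that trip the breaker...
        self.window = window          # ...within this many seconds
        self.cooldown = cooldown      # how long to fail fast once OPEN
        self.clock = clock
        self.state = "CLOSED"
        self.failures = deque()       # timestamps of recent failures
        self.opened_at = 0.0
        self.probing = False

    def allow_request(self) -> bool:
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.cooldown:
                return False          # fail fast; the caller tries a fallback provider
            self.state = "HALF_OPEN"
        if self.probing:              # HALF_OPEN admits one probe at a time
            return False
        self.probing = True
        return True

    def record_success(self) -> None:
        self.state = "CLOSED"
        self.failures.clear()
        self.probing = False

    def record_failure(self) -> None:
        now = self.clock()
        if self.state == "HALF_OPEN":
            self._trip(now)           # the probe failed: back to OPEN
            return
        self.failures.append(now)
        while now - self.failures[0] > self.window:
            self.failures.popleft()   # drop failures that left the window
        if len(self.failures) >= self.threshold:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = "OPEN"
        self.opened_at = now
        self.failures.clear()
        self.probing = False
```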

Caching

Redis (Memorystore) caching eliminates Firestore from the hot path. All caching is transparent and gracefully degrades if Redis is unavailable.

Cache	Key	TTL	Impact
Auth	`apikey:{hash}`	5 min	90%+ of Firestore auth queries eliminated (50ms → 1ms)
Credits	`credits:{userId}`	10 min	Atomic Redis DECR replaces Firestore transactions (30ms → 1ms)
Classifier	`classify:{hash}`	1 hour	Skips Gemini AI call for repeated ambiguous queries (~200ms saved)

Cache is invalidated on: credit top-ups, key rotation, provider key changes, and routing config updates.
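The auth cache follows a cache-aside pattern with graceful degradation. The sketch below uses a tiny in-memory `TTLCache` as a stand-in for Redis, with the `apikey:{hash}` key and 5-minute TTL from the table. It assumes the cache client raises the built-in `ConnectionError` when unavailable. Note that redis-py raises its own exception types, so a real client would need its own `except` clause. Invalidation on key rotation would be a `cache.delete(f"apikey:{old_hash}")`.

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for Redis: values expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}

    def get(self, key):
        hit = self.data.get(key)
        if hit is None or hit[1] <= self.clock():
            self.data.pop(key, None)  # expired or missing
            return None
        return hit[0]

    def set(self, key, value, ttl):
        self.data[key] = (value, self.clock() + ttl)

    def delete(self, key):
        self.data.pop(key, None)

def lookup_account(key_hash, cache, load_from_db, ttl=300):
    """Cache-aside read of the auth cache (apikey:{hash}, 5-minute TTL)."""
    cache_key = f"apikey:{key_hash}"
    try:
        cached = cache.get(cache_key)
        if cached is not None:
            return cached
    except ConnectionError:
        return load_from_db(key_hash)   # cache down: degrade to the database
    account = load_from_db(key_hash)
    if account is not None:
        try:
            cache.set(cache_key, account, ttl)
        except ConnectionError:
            pass                        # serving the request matters more than caching it
    return account
```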

Quality Scoring

TokenSurf automatically samples 5% of non-streaming responses and scores them for quality using Gemini Flash-Lite. This helps you verify that downgraded models still meet your quality bar.

Score	Rating	Meaning
9-10	Excellent	Comprehensive, accurate, well-structured
7-8	Good	Mostly correct with minor issues
4-6	Fair	Partially correct or vague
1-3	Poor	Incorrect, irrelevant, or harmful

Quality scores are aggregated per model per month and visible in your dashboard. Downgraded responses are tracked separately so you can compare original vs routed model quality.
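Three pieces make up quality scoring: sampling, rating, and aggregation. The sketch below covers each one. The 5% non-streaming sample rate and the score bands come from this section. The function names and the `(month, model, downgraded, score)` tuple layout are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

SAMPLE_RATE = 0.05  # 5% of non-streaming responses

def should_score(streaming: bool, rng=random.random) -> bool:
    """Decide whether to send this response to the async quality scorer."""
    return not streaming and rng() < SAMPLE_RATE

def rating(score: int) -> str:
    """Map a 1-10 quality score to the rating bands in the table above."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 9:
        return "Excellent"
    if score >= 7:
        return "Good"
    if score >= 4:
        return "Fair"
    return "Poor"

def monthly_averages(samples):
    """Average scores per (month, model, downgraded) so routed models compare fairly."""
    buckets = defaultdict(list)
    for month, model, downgraded, score in samples:
        buckets[(month, model, downgraded)].append(score)
    return {k: mean(v) for k, v in buckets.items()}
```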

How Routing Works

Every request goes through a two-stage classifier:

1. Rule-based pre-filter (0ms): catches obvious simple and complex queries using pattern matching.
   - SIMPLE: short factual questions, translations, calculations, definitions
   - COMPLEX: code blocks, multi-step reasoning, tool calls, JSON mode, long prompts (500+ tokens), long conversations (6+ messages)
2. AI classifier: for ambiguous queries, a Gemini Flash-Lite call classifies the query, with a 3-second timeout. If it times out, the query defaults to COMPLEX.

Conservative by design: When in doubt, the backend keeps your original model. Complex queries are never downgraded.
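The timeout rule for the AI classifier stage can be sketched with `concurrent.futures`. The classifier itself is a hypothetical callable, not a real Gemini client. Any timeout, error, or unexpected label falls back to `"complex"`, which keeps the original model.

```python
import concurrent.futures
import time

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def classify_with_ai(prompt: str, ai_classifier, timeout: float = 3.0) -> str:
    """Run a (hypothetical) AI classifier; default to 'complex' on timeout or error."""
    future = _executor.submit(ai_classifier, prompt)
    try:
        label = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return "complex"   # too slow: keep the original model
    except Exception:
        return "complex"   # classifier failed: same conservative choice
    return label if label in ("simple", "complex") else "complex"

def slow_classifier(prompt: str) -> str:
    time.sleep(0.2)  # stands in for a slow network call
    return "simple"

print(classify_with_ai("What is 2+2?", slow_classifier, timeout=0.05))  # complex
```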

Routing Table

Simple queries get downgraded to a lighter model. Complex queries and already-light models pass through unchanged. This is the existing backend's model-mapping behavior.

Provider	Model	Simple → Routes To
OpenAI	`gpt-4o`	`gpt-4o-mini`
	`gpt-4-turbo`	`gpt-4o-mini`
	`gpt-4`	`gpt-4o-mini`
	`gpt-4o-mini` / `gpt-3.5-turbo`	pass-through
Anthropic	`claude-opus-4.6` / `4.5`	`claude-haiku-4.5`
	`claude-sonnet-4.6` / `4.5`	`claude-haiku-4.5`
	`claude-opus-4.1` / `4.0`	`claude-haiku-4.5`
	`claude-sonnet-4.0`	`claude-haiku-4.5`
	`claude-haiku-*`	pass-through
Google	`gemini-3.1-pro-preview`	`gemini-2.5-flash`
	`gemini-2.5-pro`	`gemini-2.5-flash`
	`gemini-*-flash*`	pass-through
OpenRouter	Any `provider/model` format	pass-through (300+ models)
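The routing table is essentially a lookup. The sketch below encodes the rows above, using TokenSurf model IDs, and moves only simple queries whose model has a mapped target. Per-model overrides from the routing config are deliberately left out. The function name is hypothetical.

```python
# Downgrade targets from the table above.
DOWNGRADE = {
    "gpt-4o": "gpt-4o-mini",
    "gpt-4-turbo": "gpt-4o-mini",
    "gpt-4": "gpt-4o-mini",
    "claude-opus-4.6": "claude-haiku-4.5",
    "claude-opus-4.5": "claude-haiku-4.5",
    "claude-sonnet-4.6": "claude-haiku-4.5",
    "claude-sonnet-4.5": "claude-haiku-4.5",
    "claude-opus-4.1": "claude-haiku-4.5",
    "claude-opus-4.0": "claude-haiku-4.5",
    "claude-sonnet-4.0": "claude-haiku-4.5",
    "gemini-3.1-pro-preview": "gemini-2.5-flash",
    "gemini-2.5-pro": "gemini-2.5-flash",
}

def route(model: str, complexity: str) -> tuple:
    """Return (model_to_call, downgraded)."""
    target = DOWNGRADE.get(model)
    if complexity == "simple" and target is not None:
        return target, True
    return model, False  # complex, ambiguous, light, and OpenRouter models pass through

print(route("gpt-4o", "simple"))  # ('gpt-4o-mini', True)
```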

Fallback Chains

When a provider is unavailable (circuit breaker open or persistent 5xx), TokenSurf automatically routes to an equivalent model on another provider. Fallback order follows your providerPriority setting.

Primary Model	Anthropic Fallback	Google Fallback
`gpt-4o`	`claude-sonnet-4-6`	`gemini-2.5-pro`
`gpt-4o-mini`	`claude-haiku-4-5`	`gemini-2.5-flash`

Primary Model	OpenAI Fallback	Google Fallback
`claude-sonnet-4-6`	`gpt-4o`	`gemini-2.5-pro`
`claude-haiku-4-5`	`gpt-4o-mini`	`gemini-2.5-flash`

Fallback only triggers when the provider is fully down (not for 4xx client errors). The X-TokenSurf-Fallback: true header indicates a fallback was used.
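Fallback selection walks `providerPriority` and picks the first equivalent model whose provider is up. The sketch below uses the model IDs exactly as written in the tables above. `is_down` is a hypothetical health check, for example backed by a circuit breaker.

```python
from typing import Optional

FALLBACKS = {
    "gpt-4o": {"anthropic": "claude-sonnet-4-6", "google": "gemini-2.5-pro"},
    "gpt-4o-mini": {"anthropic": "claude-haiku-4-5", "google": "gemini-2.5-flash"},
    "claude-sonnet-4-6": {"openai": "gpt-4o", "google": "gemini-2.5-pro"},
    "claude-haiku-4-5": {"openai": "gpt-4o-mini", "google": "gemini-2.5-flash"},
}

def pick_fallback(model: str, provider_priority: list, is_down) -> Optional[str]:
    """Return the first equivalent model on a healthy provider, in priority order."""
    options = FALLBACKS.get(model, {})
    for provider in provider_priority:
        if provider in options and not is_down(provider):
            return options[provider]
    return None  # no fallback: the proxy answers 503 and refunds the credit
```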

Response Headers

Every proxy response includes these headers:

Header	Value	Description
`X-TokenSurf-Model`	`gpt-4o-mini`	The model that actually served the request
`X-TokenSurf-Downgraded`	`true` / `false`	Whether the model was downgraded
`X-TokenSurf-Complexity`	`simple` / `complex`	How the query was classified
`X-TokenSurf-Request-Id`	`a1b2c3d4-...`	Unique ID for tracing and support
`X-TokenSurf-Region`	`us-central1`	Which region served the request
`X-TokenSurf-Fallback`	`true`	Present when a fallback provider was used

OpenAI

Requests for OpenAI models are forwarded directly to api.openai.com. Format is pass-through — no translation needed.

Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo

Required key: openai

Anthropic

Requests are translated from OpenAI format to the Anthropic Messages API. System messages are extracted into the system parameter. Streaming events are transformed to OpenAI SSE format.

Models: claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4.1, claude-sonnet-4.0, claude-opus-4.0, claude-haiku-3.5

Required key: anthropic

Google Gemini

Requests are translated to Gemini's generateContent format. System messages become systemInstruction. Roles are mapped (assistant → model).

Models: gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

Required key: google
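The Gemini translation can be sketched for text-only chats using the `generateContent` REST body fields `contents`, `systemInstruction`, and `generationConfig`. The model is not part of this body because the REST API takes it in the URL path. The mapping of `MAX_TOKENS` and `SAFETY` to OpenAI's `length` and `content_filter`, and the `stop` default for other reasons, are assumptions.

```python
GEMINI_ROLES = {"user": "user", "assistant": "model"}
GEMINI_FINISH = {"STOP": "stop", "MAX_TOKENS": "length", "SAFETY": "content_filter"}

def to_gemini(openai_body: dict) -> dict:
    """Translate text-only OpenAI chat messages into a generateContent body."""
    system_texts, contents = [], []
    for m in openai_body["messages"]:
        if m["role"] == "system":
            system_texts.append(m["content"])  # becomes systemInstruction
            continue
        contents.append({"role": GEMINI_ROLES[m["role"]], "parts": [{"text": m["content"]}]})
    body = {"contents": contents}
    if system_texts:
        body["systemInstruction"] = {"parts": [{"text": "\n\n".join(system_texts)}]}
    config = {}
    if "temperature" in openai_body:
        config["temperature"] = openai_body["temperature"]
    if "max_tokens" in openai_body:
        config["maxOutputTokens"] = openai_body["max_tokens"]
    if config:
        body["generationConfig"] = config
    return body

def finish_reason_from_gemini(reason: str) -> str:
    return GEMINI_FINISH.get(reason, "stop")
```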

OpenRouter

Any model ID containing a / (e.g. deepseek/deepseek-r1) is automatically routed through OpenRouter. Format is OpenAI-compatible — no translation needed.

Popular models: meta-llama/llama-3.3-70b-instruct, meta-llama/llama-4-maverick, deepseek/deepseek-chat, deepseek/deepseek-r1, mistralai/mistral-large-latest, qwen/qwen-2.5-72b-instruct, cohere/command-r-plus

Required key: openrouter

See all 300+ models at openrouter.ai/models
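Provider selection can be pictured as a function of the model ID. The slash rule comes from this section. The `gpt-`, `claude-`, and `gemini-` prefix rules are assumptions inferred from the model lists above, and the function name is hypothetical. The slash check runs first, so a namespaced ID such as `openai/gpt-4o` still goes through OpenRouter.

```python
def provider_for_model(model: str) -> str:
    """Pick the upstream provider for a model ID (hypothetical prefix rules)."""
    if "/" in model:
        return "openrouter"  # any provider/model ID goes through OpenRouter
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("gpt-"):
        return "openai"
    raise ValueError(f"unsupported model: {model}")  # surfaces as a 400

print(provider_for_model("deepseek/deepseek-r1"))  # openrouter
```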

Python

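from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1",
)

# Works with any supported model. with_raw_response returns the HTTP
# response, so the TokenSurf headers are readable alongside the body.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",  # or claude-sonnet-4.6, gemini-2.5-pro, deepseek/deepseek-r1
    messages=[{"role": "user", "content": "Hello"}],
)
response = raw.parse()  # the usual ChatCompletion object

# Check whether the request was downgraded, and to which model
print(raw.headers.get("X-TokenSurf-Downgraded"), raw.headers.get("X-TokenSurf-Model"))
print(response.choices[0].message.content)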

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ts_your_key",
  baseURL: "https://api.tokensurf.io/v1",
});

const response = await client.chat.completions.create({
  model: "claude-opus-4.6",
  messages: [{ role: "user", content: "Hello" }],
});

cURL

curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Error Codes

HTTP	Type	Meaning
`400`	`invalid_request_error`	Missing model/messages, unsupported model, no provider key, body too large, or model not allowed for org key
`401`	`authentication_error`	Missing or invalid API key (`ts_` or `ts_org_`)
`402`	`insufficient_credits`	No credits remaining, or org key monthly budget exhausted
`403`	`invalid_request_error`	Model not in org key's allowlist
`405`	`invalid_request_error`	Wrong HTTP method
`409`	—	Email already registered (signup), or member already in org
`429`	`rate_limit_error`	Rate limit exceeded (per-key bucket) or abuse detection throttle. Check `Retry-After` header.
`502`	`provider_error`	Upstream provider failed after retries — credit is automatically refunded
`503`	`provider_unavailable`	Provider circuit breaker is open and no fallback configured — credit refunded

Provider errors (502/503): If the upstream provider fails, your credit is automatically refunded. You only pay for successful requests. The proxy retries up to 2 times with exponential backoff before returning an error.
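On the client side, 429 and 502/503 responses are worth retrying. Failed provider calls are refunded, so a retry costs nothing extra. The sketch below is a hypothetical client-side wrapper. `send` is any callable returning `(status, headers, body)`. The wrapper assumes `Retry-After` is given in seconds, though the HTTP spec also allows a date. Without `Retry-After`, it uses exponential backoff with jitter. Other errors such as 400 or 402 are returned immediately, because retrying them cannot succeed.

```python
import random
import time

RETRYABLE = {429, 502, 503}

def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() -> (status, headers, body), retrying retryable statuses."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, headers, body
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)               # the server said how long to wait
        else:
            delay = base_delay * 2 ** attempt        # exponential backoff
            delay += random.uniform(0, base_delay)   # jitter spreads out retries
        sleep(delay)
```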
