TokenSurf Cloud Routing API
Reference for the existing TokenSurf Cloud routing API — the managed, OpenAI-compatible proxy backend. Point your OpenAI SDK at the base URL below and your existing code works unchanged.
Base URL: https://api.tokensurf.io/v1
All endpoints follow the OpenAI API spec.
Quickstart
1. Sign up — get 1,000 free credits:
curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'
2. Save your API key (starts with ts_, shown only once).
3. Add your provider key (e.g. your OpenAI key):
curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "apiKey": "sk-your-openai-key"}'
4. Use it — change your base URL:
from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# Simple query → auto-routed to gpt-4o-mini by the backend
# Check response headers: X-TokenSurf-Downgraded: true
Authentication
All requests require a Bearer token in the Authorization header:
Authorization: Bearer ts_your_api_key_here
API keys start with ts_ and are generated at signup. We store only the SHA-256 hash — if you lose your key, you'll need to create a new account.
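As a rough illustration of hash-only storage, the sketch below shows how a service could verify a key without keeping its plaintext. The function names, key length, and in-memory dict are hypothetical; only the SHA-256 choice and the ts_ prefix come from this page.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> str:
    """Return a new key with the ts_ prefix (the key length is an assumption)."""
    return "ts_" + secrets.token_hex(24)


def hash_api_key(api_key: str) -> str:
    """SHA-256 hex digest of the key; only this value would be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare the digests in constant time."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)


# Hypothetical stand-in for the server-side key store: hash -> account data.
key = generate_api_key()
accounts = {hash_api_key(key): {"credits": 1000}}
```

Because only the digest is kept, the service can confirm a presented key but cannot show the original again, which is why a lost key cannot be recovered.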
Provider Keys
TokenSurf doesn't call LLMs directly. You bring your own API keys for each provider. Keys are encrypted with AES-256-GCM before storage.
Supported providers:
| Provider | Key format | Get one |
|---|---|---|
| OpenAI | sk-... | platform.openai.com |
| Anthropic | sk-ant-... | console.anthropic.com |
| Google | AIza... | aistudio.google.com |
| OpenRouter | sk-or-... | openrouter.ai/keys |
Credits & Billing
1 credit = 1 routed request, regardless of tokens or model. The backend tracks credits per account.
- Each account has a credit balance that is decremented per routed request
- When credits hit 0, requests return 402
- If a provider call fails, the credit is refunded automatically
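The credit rules above can be sketched as a small ledger. This is an illustrative model only: CreditLedger, InsufficientCredits, and routed_request are hypothetical names, and the real backend is described elsewhere on this page as using Redis rather than an in-process counter.

```python
class InsufficientCredits(Exception):
    """Stands in for an HTTP 402 response."""


class CreditLedger:
    """In-memory stand-in for an account's credit balance."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def charge(self) -> None:
        # One credit per routed request, regardless of tokens or model.
        if self.balance <= 0:
            raise InsufficientCredits("402: no credits remaining")
        self.balance -= 1

    def refund(self) -> None:
        self.balance += 1


def routed_request(ledger: CreditLedger, call_provider):
    """Charge one credit, call the provider, and refund if the call fails."""
    ledger.charge()
    try:
        return call_provider()
    except Exception:
        ledger.refund()
        raise
```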
Chat Completions
POST /v1/chat/completions
100% OpenAI-compatible. Supports streaming, tool calls, JSON mode.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g. gpt-4o, claude-sonnet-4.6, deepseek/deepseek-r1) |
| messages | array | Yes | Array of message objects (role, content) |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature |
| max_tokens | integer | No | Maximum output tokens |
| tools | array | No | Tool/function definitions (forces COMPLEX routing) |
| response_format | object | No | JSON mode (forces COMPLEX routing) |
Example: non-streaming
curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
Response
{
"id": "chatcmpl-abc123",
"model": "gpt-4o-mini", // ← downgraded for simple query
"choices": [{
"message": { "role": "assistant", "content": "Paris." },
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16 }
}
Example: streaming
curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
Example: OpenRouter model
curl -X POST https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-r1", "messages": [{"role": "user", "content": "Explain quantum computing"}]}'
# Any model with provider/name format → routes via OpenRouter
List Models
GET /v1/models
Returns all supported models with pricing and routing info.
curl https://api.tokensurf.io/v1/models
Signup
POST /api/signup
Create a new account. Returns an API key (shown only once) and 1,000 free credits.
curl -X POST https://tokensurf.io/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "dev@example.com"}'
Response
{
"apiKey": "ts_7d96f6aac5009f1b...",
"apiKeyPrefix": "ts_7d96f6...",
"credits": 1000,
"message": "Save your API key — it cannot be recovered."
}
Dashboard
GET /api/dashboard/
Returns your credit balance and usage stats for the current month.
curl https://tokensurf.io/api/dashboard/ \
-H "Authorization: Bearer ts_your_key"
Manage Provider Keys
GET /api/keys/ — Check which providers are configured
POST /api/keys/ — Save a provider key
DELETE /api/keys/ — Remove a provider key
Save a key
curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"provider": "anthropic", "apiKey": "sk-ant-your-key"}'
Valid providers: openai, anthropic, google, openrouter
Rotate API Key
POST /api/keys/ with {"action": "rotate"}
Generates a new API key. Your old key continues to work for 24 hours (grace period).
curl -X POST https://tokensurf.io/api/keys/ \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "rotate"}'
# Returns: {"status": "rotated", "apiKey": "ts_new...", "prefix": "ts_abc123..."}
Routing Config
GET /api/routingConfigApi/ — Get current routing configuration
PUT /api/routingConfigApi/ — Update routing configuration
DELETE /api/routingConfigApi/ — Reset to defaults
Configuration fields
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Master switch for backend model routing |
| aiClassifier | boolean | Use Gemini Flash for ambiguous queries |
| ambiguousFallback | "conservative" or "aggressive" | How to handle ambiguous queries when AI is off |
| modelOverrides | object | Per-model {enabled, customTarget} overrides |
| providerEnabled | object | Enable/disable routing to each provider |
| providerPriority | string[] | Provider preference order for fallback chains |
Buy Credits
POST /api/checkout
Creates a Stripe Checkout session. Redirect the user to the returned URL.
curl -X POST https://tokensurf.io/api/checkout \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"amount": 25}'
# Returns: {"url": "https://checkout.stripe.com/..."}
Amount must be between $5 and $500. $1 = 1,000 credits.
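The conversion is simple arithmetic. A minimal sketch, assuming whole-dollar amounts; credits_for_purchase is a hypothetical helper, not part of the API:

```python
def credits_for_purchase(amount_usd: int) -> int:
    """Convert a purchase amount in dollars to credits at $1 = 1,000 credits."""
    if not 5 <= amount_usd <= 500:
        raise ValueError("amount must be between $5 and $500")
    return amount_usd * 1000
```

With the request above, {"amount": 25} would correspond to 25,000 credits.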
Subscribe to a Plan
POST /api/subscribe
Creates a Stripe Checkout session for a monthly subscription plan.
curl -X POST https://tokensurf.io/api/subscribe \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"planId": "growth", "annual": false}'
# Returns: {"url": "https://checkout.stripe.com/..."}
Valid plan IDs: growth, scale. The annual flag selects annual billing.
Health Check
GET /api/health
Returns system status, provider health, cache hit rates, and latency percentiles. No authentication required.
curl https://tokensurf.io/api/health
# Returns: {"status":"healthy","region":"us-central1","providers":{...},"metrics":{...}}
Organizations (Teams)
Manage organizations for team-based API key management with per-key budgets and rate limits.
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/orgs/ | List your organizations |
| POST | /api/orgs/ | Create organization ({"name": "..."}) |
| GET | /api/orgs/:id | Get org details + members |
| PUT | /api/orgs/:id | Update org (owner/admin) |
| DELETE | /api/orgs/:id | Delete org (owner only) |
| POST | /api/orgs/:id/members | Add member ({"email": "...", "role": "member"}) |
| DELETE | /api/orgs/:id/members | Remove member ({"userId": "..."}) |
Roles: owner (full control), admin (manage keys + members), member (read-only).
Team API Keys
Create labeled API keys for your organization with per-key budgets, rate limits, and model restrictions.
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/org-keys/:orgId | List org's API keys |
| POST | /api/org-keys/:orgId | Create key (owner/admin) |
| DELETE | /api/org-keys/:orgId | Delete key (owner/admin) |
Create team key
curl -X POST https://tokensurf.io/api/org-keys/ORG_ID \
  -H "Authorization: Bearer YOUR_FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"label": "production", "monthlyBudget": 10000, "rpm": 60}'
# Returns: {"apiKey": "ts_org_...", "prefix": "ts_org_abc123..."}
Team keys use the ts_org_ prefix. They consume credits from the organization's balance. Each key can have a monthly budget cap and model allowlist.
Architecture
TokenSurf is a single proxy that sits between your app and LLM providers. Every request flows through this pipeline:
Request Pipeline
// 1. Your app sends a standard OpenAI SDK request
POST /v1/chat/completions
Authorization: Bearer ts_your_key
{ "model": "gpt-4o", "messages": [...] }

// 2. TokenSurf proxy handles it
rate-limit  → In-memory token bucket (60 req/min, 10 req/s burst per key)
auth        → Validate API key (Redis cache → Firestore fallback)
abuse       → Abuse detection (throttle/block on anomalous patterns)
credits     → Redis DECR (atomic, lock-free, ~1ms) with Firestore background sync
classify    → Rule engine (0ms) → classifier cache → AI classifier (~50ms)
route       → If simple + downgrade target exists: swap model
circuit-brk → Check provider health → fallback to alternative provider if down
forward     → Pooled HTTP connection with retry (2 retries, exponential backoff)
translate   → Convert response to OpenAI format (if Anthropic or Google)
quality     → 5% of responses scored async by Gemini Flash-Lite (1-10 scale)
log         → Structured logging + async usage aggregation

// 3. Response returned to your app in OpenAI format
+ X-TokenSurf-Model: gpt-4o-mini
+ X-TokenSurf-Downgraded: true
+ X-TokenSurf-Complexity: simple
+ X-TokenSurf-Request-Id: a1b2c3d4...
+ X-TokenSurf-Region: us-central1
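The rate-limit step at the start of the pipeline can be illustrated with a generic token bucket. This sketch reads "60 req/min, 10 req/s burst" as a refill rate of one token per second with a bucket capacity of 10; that reading, the class name, and the injectable clock are assumptions, not the backend's actual code.

```python
import time


class TokenBucket:
    """Generic token bucket: refills at a steady rate up to a fixed capacity."""

    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a burst is allowed immediately
        self.updated = clock()

    def allow(self) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would respond with 429 and Retry-After


# One bucket per API key: 60 requests per minute, bursts of up to 10.
bucket = TokenBucket(rate_per_sec=1.0, capacity=10)
```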
Complexity Classification
The classifier runs in two stages. It's conservative by design — when uncertain, it keeps your original model.
| Signal | Result | What triggers it |
|---|---|---|
| Tools / function calling | Complex | Any request with tools parameter |
| Structured output | Complex | Any request with response_format |
| Code patterns | Complex | "analyze", "implement", "refactor", "debug", code blocks |
| Long conversation | Complex | 6+ messages in the conversation |
| Long message | Complex | 500+ estimated tokens in last user message |
| Factual question | Simple | "What is", "Define", "Translate", "Calculate" |
| Very short query | Simple | Under 50 tokens, 1-2 messages |
| Everything else | Ambiguous | Sent to Gemini Flash AI classifier or treated as complex |
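The rule table above can be sketched as a single function. This is a simplified reading, not the backend's rule engine: the four-characters-per-token estimate, the keyword matching, the rule order, and the assumption that message content is a plain string are all illustrative choices.

```python
COMPLEX_KEYWORDS = ("analyze", "implement", "refactor", "debug")
SIMPLE_PREFIXES = ("what is", "define", "translate", "calculate")
CODE_FENCE = "`" * 3  # three backticks, built this way to keep the sample readable


def estimate_tokens(text: str) -> int:
    """Crude estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def classify(request: dict) -> str:
    """Return 'simple', 'complex', or 'ambiguous' for a chat request body."""
    messages = request.get("messages", [])
    # Assumes message content is a plain string.
    last_user = next(
        (m.get("content") or "" for m in reversed(messages) if m.get("role") == "user"),
        "",
    )
    text = last_user.lower()
    tokens = estimate_tokens(last_user)

    if request.get("tools") or request.get("response_format"):
        return "complex"
    if len(messages) >= 6 or tokens >= 500:
        return "complex"
    if CODE_FENCE in last_user or any(word in text for word in COMPLEX_KEYWORDS):
        return "complex"
    if text.lstrip().startswith(SIMPLE_PREFIXES):
        return "simple"
    if tokens < 50 and len(messages) <= 2:
        return "simple"
    return "ambiguous"
```

Checking the complex signals first mirrors the conservative stance described above: a request that matches both a complex and a simple signal keeps its original model.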
Provider Translation
You always send and receive the OpenAI format. TokenSurf translates internally:
| Provider | Request translation | Response translation |
|---|---|---|
| OpenAI | Pass-through | Pass-through |
| Anthropic | Extract system messages, merge consecutive roles, ensure first message is user | Map end_turn → stop, reconstruct choices array |
| Google | System → systemInstruction, assistant → model role | Map STOP/MAX_TOKENS/SAFETY finish reasons |
| OpenRouter | Pass-through (OpenAI-compatible) | Pass-through |
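To make the Anthropic row concrete, here is a minimal sketch of the request-side steps it lists, assuming string message content. The placeholder user message, the separator used when merging, and the max_tokens → length mapping are assumptions; only end_turn → stop is stated in the table.

```python
def to_anthropic(messages: list[dict]) -> dict:
    """Translate OpenAI-style messages into an Anthropic-style request body."""
    # 1. Extract system messages into a separate system string.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]

    # 2. Merge consecutive messages that share a role.
    merged: list[dict] = []
    for m in messages:
        if m["role"] == "system":
            continue
        if merged and merged[-1]["role"] == m["role"]:
            merged[-1]["content"] += "\n\n" + m["content"]
        else:
            merged.append({"role": m["role"], "content": m["content"]})

    # 3. Ensure the first message is from the user (placeholder text is hypothetical).
    if not merged or merged[0]["role"] != "user":
        merged.insert(0, {"role": "user", "content": "(continue)"})

    body = {"messages": merged}
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body


# end_turn -> stop comes from the table; max_tokens -> length is an assumption.
FINISH_REASONS = {"end_turn": "stop", "max_tokens": "length"}


def to_openai_finish_reason(stop_reason: str) -> str:
    return FINISH_REASONS.get(stop_reason, "stop")
```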
Security
- Your TokenSurf API key: Only the SHA-256 hash is stored. Plaintext is shown once at signup, then deleted.
- Provider API keys: Encrypted with AES-256-GCM at rest. Decrypted only in-memory when forwarding a request.
- Credits: Deducted via atomic Redis DECR with Firestore background sync. Refunded automatically on provider errors.
- Key rotation: Generate a new key with 24-hour grace period for the old key.
- Abuse detection: Automatic throttling and blocking on anomalous request patterns.
- Audit logging: All security events (key changes, config updates, purchases) logged to Cloud Logging.
Streaming
Streaming ("stream": true) is fully supported across all providers. SSE events from Anthropic and Google are translated in real-time to the OpenAI chat.completion.chunk format.
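As an illustration of the translation step, the sketch below converts one Anthropic text delta event into an OpenAI-style chunk. The event shape follows Anthropic's published streaming format as best understood here; the function names are hypothetical, and other event types (message start/stop, tool use) are deliberately ignored.

```python
import json
from typing import Optional


def anthropic_event_to_chunk(event: dict, chunk_id: str, model: str) -> Optional[dict]:
    """Map an Anthropic content_block_delta text event to a chat.completion.chunk."""
    if event.get("type") != "content_block_delta":
        return None  # not handled in this sketch
    delta = event.get("delta", {})
    if delta.get("type") != "text_delta":
        return None
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [
            {"index": 0, "delta": {"content": delta["text"]}, "finish_reason": None}
        ],
    }


def to_sse_line(chunk: dict) -> str:
    """Serialize a chunk as a server-sent event data line."""
    return "data: " + json.dumps(chunk) + "\n\n"
```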
Resilience
TokenSurf is built for millions of requests per month with multiple layers of fault tolerance:
| Layer | Mechanism | Details |
|---|---|---|
| Rate Limiting | Token bucket | 60 req/min, 10 req/sec burst per API key. Returns 429 with Retry-After header. |
| Circuit Breaker | Per-provider state machine | CLOSED → OPEN (fail fast for 30s) → HALF_OPEN (probe) → CLOSED. Triggers on 5+ failures in 60s. |
| Retry | Exponential backoff | 2 retries with jitter on 429/500/502/503. Respects Retry-After headers. |
| Fallback Chains | Cross-provider equivalences | When a provider is down, routes to an equivalent model on another provider (e.g. gpt-4o → claude-sonnet-4-6). |
| Connection Pooling | undici HTTP pools | Persistent TCP/TLS connections to all providers. Saves 50-100ms per request. |
| Abuse Detection | Behavioral analysis | Throttles on high request rates (>600/hour) or error rates (>50%). Escalates to key blocking. |
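The circuit breaker row can be sketched as a small state machine using the thresholds from the table (5 failures in 60s, 30s open period). The class and its injectable clock are hypothetical, and the half-open handling is simplified: it does not limit how many probe requests pass before a result is recorded.

```python
import time


class CircuitBreaker:
    """CLOSED -> OPEN -> HALF_OPEN -> CLOSED, one instance per provider."""

    def __init__(self, threshold=5, window=60.0, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self.clock = clock
        self.failures: list[float] = []
        self.opened_at = 0.0
        self.state = "CLOSED"

    def allow(self) -> bool:
        """Return False while OPEN; move to HALF_OPEN once the cooldown passes."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = "HALF_OPEN"  # let a probe request through
                return True
            return False
        return True

    def record_success(self) -> None:
        self.state = "CLOSED"
        self.failures.clear()

    def record_failure(self) -> None:
        now = self.clock()
        # Keep only failures inside the sliding window, then add this one.
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if self.state == "HALF_OPEN" or len(self.failures) >= self.threshold:
            self.state = "OPEN"
            self.opened_at = now
```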
Caching
Redis (Memorystore) caching eliminates Firestore from the hot path. All caching is transparent and gracefully degrades if Redis is unavailable.
| Cache | Key | TTL | Impact |
|---|---|---|---|
| Auth | apikey:{hash} | 5 min | 90%+ of Firestore auth queries eliminated (50ms → 1ms) |
| Credits | credits:{userId} | 10 min | Atomic Redis DECR replaces Firestore transactions (30ms → 1ms) |
| Classifier | classify:{hash} | 1 hour | Skips Gemini AI call for repeated ambiguous queries (~200ms saved) |
Cache is invalidated on: credit top-ups, key rotation, provider key changes, and routing config updates.
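The auth row follows a cache-aside pattern, sketched here with an in-memory TTL cache in place of Redis. TTLCache, lookup_account, and the load_from_db callback are hypothetical; the apikey:{hash} key format and the 5-minute TTL come from the table.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self.store[key] = (value, self.clock() + ttl)

    def invalidate(self, key):
        self.store.pop(key, None)


def lookup_account(cache, key_hash, load_from_db, ttl=300):
    """Return the account for a key hash, hitting the database only on a miss."""
    cache_key = f"apikey:{key_hash}"
    account = cache.get(cache_key)
    if account is None:
        account = load_from_db(key_hash)
        if account is not None:
            cache.set(cache_key, account, ttl)
    return account
```

Calling invalidate for the affected key on events such as key rotation keeps a stale entry from being served for the rest of its TTL.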
Quality Scoring
TokenSurf automatically samples 5% of non-streaming responses and scores them for quality using Gemini Flash-Lite. This helps you verify that downgraded models still meet your quality bar.
| Score | Rating | Meaning |
|---|---|---|
| 9-10 | Excellent | Comprehensive, accurate, well-structured |
| 7-8 | Good | Mostly correct with minor issues |
| 4-6 | Fair | Partially correct or vague |
| 1-3 | Poor | Incorrect, irrelevant, or harmful |
Quality scores are aggregated per model per month and visible in your dashboard. Downgraded responses are tracked separately so you can compare original vs routed model quality.
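A small sketch of the sampling rule and the score bands in the table above. Both helpers are hypothetical names, and the injectable random source exists only to make the sampling decision testable.

```python
import random


def should_sample(streaming: bool, rng=random.random, rate: float = 0.05) -> bool:
    """Sample roughly 5% of non-streaming responses for scoring."""
    return (not streaming) and rng() < rate


def rating_for(score: int) -> str:
    """Map a 1-10 quality score to its rating band."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 9:
        return "Excellent"
    if score >= 7:
        return "Good"
    if score >= 4:
        return "Fair"
    return "Poor"
```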
How Routing Works
Every request goes through a two-stage classifier:
- Rule-based pre-filter (0ms): catches obvious simple/complex queries using pattern matching
  - SIMPLE: Short factual questions, translations, calculations, definitions
  - COMPLEX: Code blocks, multi-step reasoning, tool calls, JSON mode, long prompts (500+ tokens), long conversations (6+ messages)
- AI classifier: for ambiguous queries, a Gemini Flash-Lite call classifies in <3 seconds. If it times out, the request defaults to COMPLEX.
Routing Table
Simple queries get downgraded to a lighter model. Complex queries and already-light models pass through unchanged. This is the existing backend's model-mapping behavior.
| Provider | Model | Simple → Routes To |
|---|---|---|
| OpenAI | gpt-4o | gpt-4o-mini |
| OpenAI | gpt-4-turbo | gpt-4o-mini |
| OpenAI | gpt-4 | gpt-4o-mini |
| OpenAI | gpt-4o-mini / gpt-3.5-turbo | pass-through |
| Anthropic | claude-opus-4.6 / 4.5 | claude-haiku-4.5 |
| Anthropic | claude-sonnet-4.6 / 4.5 | claude-haiku-4.5 |
| Anthropic | claude-opus-4.1 / 4.0 | claude-haiku-4.5 |
| Anthropic | claude-sonnet-4.0 | claude-haiku-4.5 |
| Anthropic | claude-haiku-* | pass-through |
| Google | gemini-3.1-pro-preview | gemini-2.5-flash |
| Google | gemini-2.5-pro | gemini-2.5-flash |
| Google | gemini-*-flash* | pass-through |
| OpenRouter | Any provider/model format | pass-through (300+ models) |
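The routing table amounts to a lookup applied only to simple queries. A minimal sketch, with the mapping copied from the table; the route function and its return shape are hypothetical:

```python
DOWNGRADE_TARGETS = {
    "gpt-4o": "gpt-4o-mini",
    "gpt-4-turbo": "gpt-4o-mini",
    "gpt-4": "gpt-4o-mini",
    "claude-opus-4.6": "claude-haiku-4.5",
    "claude-opus-4.5": "claude-haiku-4.5",
    "claude-sonnet-4.6": "claude-haiku-4.5",
    "claude-sonnet-4.5": "claude-haiku-4.5",
    "claude-opus-4.1": "claude-haiku-4.5",
    "claude-opus-4.0": "claude-haiku-4.5",
    "claude-sonnet-4.0": "claude-haiku-4.5",
    "gemini-3.1-pro-preview": "gemini-2.5-flash",
    "gemini-2.5-pro": "gemini-2.5-flash",
}


def route(model: str, complexity: str) -> tuple[str, bool]:
    """Return (model to call, whether it was downgraded)."""
    if complexity != "simple":
        return model, False  # complex and ambiguous-as-complex pass through
    target = DOWNGRADE_TARGETS.get(model)
    if target is None:
        return model, False  # already-light or OpenRouter models pass through
    return target, True
```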
Fallback Chains
When a provider is unavailable (circuit breaker open or persistent 5xx), TokenSurf automatically routes to an equivalent model on another provider. Fallback order follows your providerPriority setting.
| Primary Model | Anthropic Fallback | Google Fallback |
|---|---|---|
| gpt-4o | claude-sonnet-4-6 | gemini-2.5-pro |
| gpt-4o-mini | claude-haiku-4-5 | gemini-2.5-flash |

| Primary Model | OpenAI Fallback | Google Fallback |
|---|---|---|
| claude-sonnet-4-6 | gpt-4o | gemini-2.5-pro |
| claude-haiku-4-5 | gpt-4o-mini | gemini-2.5-flash |
Fallback only triggers when the provider is fully down (not for 4xx client errors). The X-TokenSurf-Fallback: true header indicates a fallback was used.
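Fallback selection can be sketched as a walk over providerPriority. The mapping below copies the two tables above; pick_fallback and the is_healthy callback (standing in for the circuit breaker check) are hypothetical.

```python
from typing import Callable, Optional

FALLBACKS = {
    "gpt-4o": {"anthropic": "claude-sonnet-4-6", "google": "gemini-2.5-pro"},
    "gpt-4o-mini": {"anthropic": "claude-haiku-4-5", "google": "gemini-2.5-flash"},
    "claude-sonnet-4-6": {"openai": "gpt-4o", "google": "gemini-2.5-pro"},
    "claude-haiku-4-5": {"openai": "gpt-4o-mini", "google": "gemini-2.5-flash"},
}


def pick_fallback(
    model: str,
    provider_priority: list[str],
    is_healthy: Callable[[str], bool],
) -> Optional[tuple[str, str]]:
    """Return (provider, model) for the first healthy fallback, or None."""
    options = FALLBACKS.get(model, {})
    for provider in provider_priority:
        if provider in options and is_healthy(provider):
            return provider, options[provider]
    return None  # no fallback available: the caller would return 503
```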
Response Headers
Every proxy response includes these headers:
| Header | Value | Description |
|---|---|---|
| X-TokenSurf-Model | gpt-4o-mini | The model that actually served the request |
| X-TokenSurf-Downgraded | true / false | Whether the model was downgraded |
| X-TokenSurf-Complexity | simple / complex | How the query was classified |
| X-TokenSurf-Request-Id | a1b2c3d4-... | Unique ID for tracing and support |
| X-TokenSurf-Region | us-central1 | Which region served the request |
| X-TokenSurf-Fallback | true | Present when a fallback provider was used |
OpenAI
Requests for OpenAI models are forwarded directly to api.openai.com. Format is pass-through — no translation needed.
Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo
Required key: openai
Anthropic
Requests are translated from OpenAI format to the Anthropic Messages API. System messages are extracted into the system parameter. Streaming events are transformed to OpenAI SSE format.
Models: claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5, claude-opus-4.5, claude-sonnet-4.5, claude-opus-4.1, claude-sonnet-4.0, claude-opus-4.0, claude-haiku-3.5
Required key: anthropic
Google Gemini
Requests are translated to Gemini's generateContent format. System messages become systemInstruction. Roles are mapped (assistant → model).
Models: gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
Required key: google
OpenRouter
Any model ID containing a / (e.g. deepseek/deepseek-r1) is automatically routed through OpenRouter. Format is OpenAI-compatible — no translation needed.
Popular models: meta-llama/llama-3.3-70b-instruct, meta-llama/llama-4-maverick, deepseek/deepseek-chat, deepseek/deepseek-r1, mistralai/mistral-large-latest, qwen/qwen-2.5-72b-instruct, cohere/command-r-plus
Required key: openrouter
See all 300+ models at openrouter.ai/models
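Provider selection by model ID can be sketched as follows. The slash rule comes from this section; the prefix rules for the other three providers are an assumption inferred from the model lists above, and provider_for is a hypothetical helper.

```python
def provider_for(model: str) -> str:
    """Guess which provider key a model ID needs."""
    if "/" in model:
        return "openrouter"  # provider/name format routes via OpenRouter
    # The prefix checks below are assumptions based on the listed model IDs.
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "google"
    raise ValueError(f"unsupported model: {model}")
```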
Python
from openai import OpenAI

client = OpenAI(
    api_key="ts_your_key",
    base_url="https://api.tokensurf.io/v1",
)

# Works with any supported model.
# with_raw_response exposes the HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",  # or claude-sonnet-4.6, gemini-2.5-pro, deepseek/deepseek-r1
    messages=[{"role": "user", "content": "Hello"}],
)
response = raw.parse()  # the usual ChatCompletion object

# Check if downgraded
downgraded = raw.headers.get("X-TokenSurf-Downgraded")
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ts_your_key",
  baseURL: "https://api.tokensurf.io/v1",
});

const response = await client.chat.completions.create({
  model: "claude-opus-4.6",
  messages: [{ role: "user", content: "Hello" }],
});
cURL
curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Error Codes
| HTTP | Type | Meaning |
|---|---|---|
| 400 | invalid_request_error | Missing model/messages, unsupported model, no provider key, body too large, or model not allowed for org key |
| 401 | authentication_error | Missing or invalid API key (ts_ or ts_org_) |
| 402 | insufficient_credits | No credits remaining, or org key monthly budget exhausted |
| 403 | invalid_request_error | Model not in org key's allowlist |
| 405 | invalid_request_error | Wrong HTTP method |
| 409 | (none) | Email already registered (signup), or member already in org |
| 429 | rate_limit_error | Rate limit exceeded (per-key bucket) or abuse detection throttle. Check Retry-After header. |
| 502 | provider_error | Upstream provider failed after retries; credit is automatically refunded |
| 503 | provider_unavailable | Provider circuit breaker is open and no fallback configured; credit refunded |
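On the client side, the table suggests a simple decision per status code. The sketch below is one possible policy, not a prescribed one: handle_status and its action labels are hypothetical, the one-second default delay is arbitrary, and Retry-After is assumed to be a number of seconds rather than an HTTP date.

```python
from typing import Optional


def handle_status(status: int, retry_after: Optional[str] = None) -> tuple[str, float]:
    """Return (action, delay in seconds) for a response status code."""
    if status < 400:
        return "ok", 0.0
    if status == 429:
        # Assumes Retry-After carries seconds; fall back to 1s if absent.
        return "retry", float(retry_after) if retry_after else 1.0
    if status in (502, 503):
        return "retry", 1.0  # upstream failure; the credit was refunded
    if status == 402:
        return "top_up", 0.0
    if status == 401:
        return "check_key", 0.0
    return "fix_request", 0.0  # 400, 403, 405, 409: retrying unchanged will not help
```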