Overview
0 credits
Get started with TokenSurf
Complete these steps to start saving on LLM costs.
Create account — get your API key
Done
Add a provider key — OpenAI, Anthropic, Google, or OpenRouter
Add key
Make your first request — swap your base URL and send a query
See code
Check your savings — see how much you're saving on the Usage page
View
Credits Remaining
0
$0.001 per request
Total Savings
$0.00
This month
Requests
0
0% auto-routed
Cost Saved
$0.00
vs $0.00 direct

Provider Status

OA
OpenAI
Not set
AN
Anthropic
Not set
GG
Google
Not set
OR
OpenRouter
Not set

Recent Activity

No requests yet. Make your first API call to see activity here.

Total Requests
0
This month
Downgraded
0
0% of requests
Original Cost
$0.00
Without routing
Actual Cost
$0.00
$0.00 saved

Cost Breakdown

Routing intelligence saves you money by sending simple queries to cheaper models.

Metric                  Value
Total Requests          0
Prompt Tokens           0
Completion Tokens       0
Original Cost           $0.0000
Actual Cost             $0.0000
Total Savings           $0.0000
Auto-Routed Requests    0
Downgrade Rate          0%

Request Logs

Recent API requests with routing decisions. Logs are available via response headers on each request.

No logs yet

Make your first API request to see logs here. Each response includes X-TokenSurf-Model and X-TokenSurf-Downgraded headers.

Response Headers Reference

Every proxy response includes these headers for debugging:

Header                   Description                                        Example
X-TokenSurf-Model        Final model used                                   gpt-4o-mini
X-TokenSurf-Downgraded   Whether the request was routed to a cheaper model  true
X-TokenSurf-Complexity   Classified complexity level                        simple

1. Your API Key

Use this as your api_key in the OpenAI SDK.

ts_...
Current key: ts_...

2. Base URL

Point your OpenAI SDK to this base URL:

https://api.tokensurf.io/v1

3. Add a Provider Key

Go to Providers and add at least one API key (OpenAI, Anthropic, Google, or OpenRouter).

4. Drop-in Replacement

Replace your base URL. That's it. Same SDK, same code, lower costs.

Python

from openai import OpenAI

client = OpenAI(
    api_key="ts_...",
    base_url="https://api.tokensurf.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",  # will auto-route simple queries to gpt-4o-mini
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

5. Check Response Headers

# Check response headers. The OpenAI Python SDK exposes headers via
# the raw-response wrapper, not on the completion object itself:
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(raw.headers["X-TokenSurf-Model"])       # e.g. gpt-4o-mini
print(raw.headers["X-TokenSurf-Downgraded"])  # e.g. true
print(raw.headers["X-TokenSurf-Complexity"])  # e.g. simple
Connect at least one provider to start routing requests.

Provider API Keys

Add your provider keys. They're encrypted with AES-256-GCM at rest. TokenSurf never stores plaintext keys.

OA
OpenAI
Not set
AN
Anthropic
Not set
GG
Google Gemini
Not set
OR
OpenRouter
Not set

How Routing Works

TokenSurf analyzes each request's complexity and routes it to the cheapest compatible model:

If you request      Simple queries route to   You save
gpt-4o              gpt-4o-mini               ~90%
claude-sonnet-4-6   claude-haiku-4-5          ~85%
gemini-2.5-pro      gemini-2.0-flash          ~80%
gpt-4-turbo         gpt-4o-mini               ~95%
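The mapping above can be sketched as a simple lookup. This is an illustration built from the table, not TokenSurf's actual implementation; the real router also applies your per-model overrides.

```python
# Illustrative downgrade map, mirroring the routing table.
DOWNGRADE_MAP = {
    "gpt-4o": "gpt-4o-mini",
    "claude-sonnet-4-6": "claude-haiku-4-5",
    "gemini-2.5-pro": "gemini-2.0-flash",
    "gpt-4-turbo": "gpt-4o-mini",
}

def route(requested_model: str, complexity: str) -> str:
    """Return the model to use: only simple queries are downgraded."""
    if complexity == "simple":
        return DOWNGRADE_MAP.get(requested_model, requested_model)
    return requested_model

print(route("gpt-4o", "simple"))   # gpt-4o-mini
print(route("gpt-4o", "complex"))  # gpt-4o
```

Models without a downgrade target fall through unchanged, which matches the note below that only models with a default target appear in the downgrade rules.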
Credit Balance
0
1 credit = 1 request
Total Saved
$0.00
Lifetime savings

Top Up Credits

Pay as you go. Credits never expire. Powered by Stripe.

$5
5,000 credits
$0.001 / request
$100
100,000 credits
$0.001 / request

Pricing

Simple, transparent pricing. You only pay for API requests, not for tokens.

Item             Price           Notes
API Request      1 credit        Regardless of tokens or model
Credit Cost      $0.001          $1 = 1,000 requests
Free Tier        1,000 credits   On signup, no card required
Provider Costs   Your keys       You pay providers directly via your own keys
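The fee math from the table reduces to a flat per-request charge, which a quick sketch makes concrete:

```python
def tokensurf_fee_usd(requests: int) -> float:
    """Flat routing fee: $0.001 per request, i.e. $1 per 1,000 requests.

    Provider token costs are billed separately through your own keys.
    """
    return requests / 1_000

print(tokensurf_fee_usd(100_000))  # 100.0
print(tokensurf_fee_usd(5_000))    # 5.0
```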

Routing Engine

Control exactly how TokenSurf routes your requests. Changes take effect immediately. View model catalog

Smart Routing
Master switch. When off, requests always use the exact model you specify — no downgrades, no cost savings.
AI Classifier (Gemini)
Uses Gemini Flash Lite to classify ambiguous queries. Adds ~50ms latency but improves routing accuracy. Requires a Google provider key.
Ambiguous Query Fallback
When the rule-based classifier can't decide and AI classifier is off — what should happen?

Provider Routing

Enable or disable routing to each provider without removing your API keys. Disabled providers will reject requests for their models.

OpenAI
gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
Anthropic
claude-opus, claude-sonnet, claude-haiku families
Google Gemini
gemini-2.5-pro, gemini-2.5-flash, gemini-3.x previews
OpenRouter
Llama, DeepSeek, Mistral, Qwen, Cohere, and 300+ models

Model Downgrade Rules

Fine-tune which models get downgraded and where they route to. Only models with a default downgrade target are shown.

Model   Downgrade   Routes to   Savings

How Classification Works

Understanding the routing pipeline helps you tune it for your workload.

// 1. Rule-based classifier runs first (0ms)
if (hasTools || hasResponseFormat)       → "complex"    // never downgrade
if (matchesComplexPattern(lastMessage))  → "complex"    // code, analyze, etc.
if (messages >= 6 || tokens >= 500)      → "complex"    // long context
if (tokens <= 50 && messages <= 2)       → "simple"     // short question
if (matchesSimplePattern(lastMessage))   → "simple"     // "what is", "define"
else                                     → "ambiguous"  // needs AI or fallback

// 2. If ambiguous + AI classifier enabled → Gemini classifies (~50ms)
// 3. If ambiguous + AI disabled → uses your fallback setting
// 4. If simple + model has downgrade target → route to cheaper model
// 5. Your per-model overrides apply last
Signal                     Classified as   What triggers it
Tools / function calling   Complex         Any request with a tools parameter
Structured output          Complex         Any request with response_format
Code patterns              Complex         "analyze", "implement", "refactor", code blocks
Long conversation          Complex         6+ messages in the conversation
Long message               Complex         500+ estimated tokens in the last user message
Factual question           Simple          "What is", "Define", "Translate", "Calculate"
Very short query           Simple          Under 50 tokens, 1-2 messages
Everything else            Ambiguous       Handled by the AI classifier or your fallback
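The rules above can be sketched in Python. This is an illustrative reimplementation: the pattern lists and the ~4-characters-per-token estimate are assumptions, not TokenSurf's actual source.

```python
import re

# Illustrative pattern lists; the real classifier's lists may differ.
COMPLEX_WORDS = re.compile(r"\b(analyze|implement|refactor)\b", re.I)
SIMPLE_OPENERS = re.compile(r"^(what is|define|translate|calculate)\b", re.I)
CODE_FENCE = "`" * 3  # a fenced code block in the message

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 chars per token (assumption)

def classify(messages, has_tools=False, has_response_format=False) -> str:
    last = messages[-1]["content"]
    tokens = estimate_tokens(last)
    if has_tools or has_response_format:
        return "complex"    # never downgrade structured requests
    if COMPLEX_WORDS.search(last) or CODE_FENCE in last:
        return "complex"    # code/analysis patterns
    if len(messages) >= 6 or tokens >= 500:
        return "complex"    # long context
    if tokens <= 50 and len(messages) <= 2:
        return "simple"     # very short query
    if SIMPLE_OPENERS.search(last):
        return "simple"     # factual question
    return "ambiguous"      # AI classifier or your fallback decides

print(classify([{"role": "user", "content": "What is a mutex?"}]))  # simple
```

Note the ordering: structured-request and complexity checks run before the simple checks, so a short message that includes tools still classifies as complex.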

Actions

Model Quality Scores

Auto-sampled

5% of API responses are automatically scored for quality using Gemini Flash-Lite. Scores help you verify that downgraded models still meet your quality bar.

Quality scores will appear after your first requests are sampled.

Score Scale

Score   Rating      Meaning
9-10    Excellent   Comprehensive, accurate, well-structured response
7-8     Good        Mostly correct with minor issues
4-6     Fair        Partially correct or vague
1-3     Poor        Incorrect, irrelevant, or harmful

How It Works

For each sampled response, TokenSurf sends the original prompt and the model's response to Gemini Flash-Lite, which scores it on accuracy, completeness, relevance, and helpfulness. Downgraded responses are tracked separately so you can compare quality between your original and routed models.
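The 5% sampling gate can be illustrated with a simple probabilistic check. This is a hedged sketch, not TokenSurf's implementation; `score_response` is a placeholder standing in for the actual call to Gemini Flash-Lite.

```python
import random

SAMPLE_RATE = 0.05  # 5% of responses are scored

def should_sample(rate: float = SAMPLE_RATE) -> bool:
    """Decide whether this response gets sent for quality scoring."""
    return random.random() < rate

def score_response(prompt: str, response: str) -> int:
    # Placeholder: the real system sends the prompt/response pair to
    # Gemini Flash-Lite and parses back a 1-10 score.
    raise NotImplementedError

# Over many requests, roughly 5% end up sampled:
sampled = sum(should_sample() for _ in range(100_000))
print(f"{sampled / 100_000:.1%} sampled")
```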

System Health

Error Rate
0%
Proxy p95
Cache Hit
Redis

Provider Health

Circuit breaker status per provider. When a provider has repeated failures, the circuit opens and requests fail fast or route to fallback providers.
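The described behavior can be sketched as a minimal circuit breaker. The thresholds here (5 failures, 30-second cooldown) are hypothetical placeholders, not TokenSurf's actual values.

```python
import time

class CircuitBreaker:
    """Fail fast after repeated provider errors; retry after a cooldown."""

    def __init__(self, max_failures: int = 5, cooldown_s: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True  # circuit closed: pass traffic through
        if now - self.opened_at >= self.cooldown_s:
            # Half-open: let a request probe the provider again.
            self.opened_at = None
            self.failures = 0
            return True
        return False  # circuit open: fail fast or use a fallback provider

    def record_failure(self, now=None) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic() if now is None else now

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None
```

While the circuit is open, the proxy can skip the failing provider entirely, which is what makes requests "fail fast" instead of waiting on timeouts.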

Response Headers

Every proxy response includes these headers for debugging and observability:

Header                   Description                                        Example
X-TokenSurf-Model        Final model used                                   gpt-4o-mini
X-TokenSurf-Downgraded   Whether the request was routed to a cheaper model  true
X-TokenSurf-Complexity   Classified complexity level                        simple
X-TokenSurf-Request-Id   Unique request ID for tracing                      a1b2c3d4...
X-TokenSurf-Region       Region that served the request                     us-central1
X-TokenSurf-Fallback     Whether a fallback provider was used               true

Account

Email
...
User ID
...
API Key Prefix
ts_...

Routing

Full routing controls are in the Routing Engine page.

Danger Zone

Regenerate API Key
Your old key will stop working immediately
Delete Account
Permanently delete your account, API keys, usage data, and provider keys. This cannot be undone.