TokenSurf - Cut Your LLM Costs by 50%

Integration in 1 Line

Same OpenAI SDK. Works with GPT, Claude, and Gemini. Just change the URL.

Python

from openai import OpenAI

client = OpenAI(
    api_key="ts_your_tokensurf_key",
    base_url="https://api.tokensurf.io/v1",  # That's it
)

# OpenAI: gpt-4o routed to gpt-4o-mini (94% savings)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

# Anthropic: claude-opus-4 routed to claude-haiku-3.5 (95% savings)
response = client.chat.completions.create(
    model="claude-opus-4",
    messages=[{"role": "user", "content": "Translate hello to French"}]
)

# Google: gemini-2.5-pro routed to gemini-2.5-flash (76% savings)
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Define photosynthesis"}]
)

Node.js

// Run as an ES module (e.g. a .mjs file) so top-level await works.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ts_your_tokensurf_key",
  baseURL: "https://api.tokensurf.io/v1", // That's it
});

// OpenAI: gpt-4o routed to gpt-4o-mini (94% savings)
const r1 = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is 2+2?" }],
});

// Anthropic: claude-sonnet-4 routed to claude-haiku-3.5 (73% savings)
const r2 = await client.chat.completions.create({
  model: "claude-sonnet-4",
  messages: [{ role: "user", content: "Translate hello to French" }],
});

// Google: gemini-2.5-pro routed to gemini-2.5-flash (76% savings)
const r3 = await client.chat.completions.create({
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Define photosynthesis" }],
});

cURL

# OpenAI: gpt-4o routed to gpt-4o-mini (94% savings)
curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_tokensurf_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

# Anthropic: claude-sonnet-4 routed to claude-haiku-3.5 (73% savings)
curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_tokensurf_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4", "messages": [{"role": "user", "content": "Translate hello to French"}]}'

# Google: gemini-2.5-pro routed to gemini-2.5-flash (76% savings)
curl https://api.tokensurf.io/v1/chat/completions \
  -H "Authorization: Bearer ts_your_tokensurf_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-2.5-pro", "messages": [{"role": "user", "content": "Define photosynthesis"}]}'

How It Works

One proxy between you and the providers. Change one URL.

Step 1

Swap one URL

Point your SDK at TokenSurf. Same code, same models. One line change.

Step 2

We classify & route

"What is 2+2?" goes to a cheap model. "Write me a React app" keeps yours.

Step 3

Cut your bill in half

Save 50-99% on simple calls. Same quality. Your keys, your providers.

Works with OpenAI, Anthropic, Google, and 300+ models via OpenRouter.

Real Pricing, Real Savings

When a query is simple, we route it to a cheaper model. Here's exactly what you pay and save.

You Request	Cost per 1M tokens		We Route To	Cost per 1M tokens	You Save
gpt-4	$30 / $60	→	gpt-4o-mini	$0.15 / $0.60	99%
gpt-4-turbo	$5 / $15	→	gpt-4o-mini	$0.15 / $0.60	97%
gpt-4o	$2.50 / $10	→	gpt-4o-mini	$0.15 / $0.60	94%
claude-opus-4	$15 / $75	→	claude-haiku-3.5	$0.80 / $4	95%
claude-sonnet-4	$3 / $15	→	claude-haiku-3.5	$0.80 / $4	73%
gemini-2.5-pro	$1.25 / $10	→	gemini-2.5-flash	$0.30 / $2.50	76%

Prices are shown as input / output per 1M tokens. Already on a cheap model? We pass the request through unchanged. 300+ models are available via OpenRouter.
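
The You Save column compares list prices. What a single request actually saves depends on its mix of input and output tokens. A small sketch of that arithmetic, using the prices from the table above; the token counts and the `request_cost` / `savings` helpers are illustrative, not part of TokenSurf's API.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.5-flash": (0.30, 2.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on `model`."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def savings(requested: str, routed: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the original cost saved when `requested` is routed to `routed`."""
    original = request_cost(requested, input_tokens, output_tokens)
    return 1 - request_cost(routed, input_tokens, output_tokens) / original


# Hypothetical request: 1,000 input tokens and 500 output tokens.
print(f"{request_cost('gpt-4o', 1000, 500):.5f}")                     # 0.00750
print(f"{request_cost('gpt-4o-mini', 1000, 500):.5f}")                # 0.00045
print(f"{savings('gpt-4o', 'gpt-4o-mini', 1000, 500):.0%}")           # 94%
print(f"{savings('gemini-2.5-pro', 'gemini-2.5-flash', 1000, 500):.0%}")  # 75%
```

With this 2:1 input-to-output mix the Gemini pair saves about 75% rather than the table's 76%, because its input and output prices drop by slightly different ratios (76% and 75%).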

Built for Production

Every feature you need to run LLMs at scale. All included on every plan.

Routing

⚙

Smart Cost Routing

AI classifies every request as simple or complex. Simple queries go to cheaper models automatically. Complex queries stay on your original model. For the model pairs in the pricing table, each downgraded request costs 73-99% less, without losing quality.
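
The routing decision can be pictured as classify, then map. A minimal sketch, assuming a keyword heuristic as a stand-in for TokenSurf's AI classifier (whose internals are not described here) and using the downgrade pairs from the pricing table; `classify`, `route`, and `COMPLEX_HINTS` are hypothetical names.

```python
# Downgrade targets taken from the pricing table; any other model passes through.
DOWNGRADE = {
    "gpt-4": "gpt-4o-mini",
    "gpt-4-turbo": "gpt-4o-mini",
    "gpt-4o": "gpt-4o-mini",
    "claude-opus-4": "claude-haiku-3.5",
    "claude-sonnet-4": "claude-haiku-3.5",
    "gemini-2.5-pro": "gemini-2.5-flash",
}

# Hypothetical stand-in for the AI classifier: long prompts or build/write
# requests count as complex, everything else as simple.
COMPLEX_HINTS = ("write", "build", "implement", "debug", "refactor")


def classify(prompt: str) -> str:
    text = prompt.lower()
    if len(text) > 500 or any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    return "simple"


def route(requested_model: str, prompt: str) -> str:
    """Return the model that should actually serve the request."""
    if classify(prompt) == "simple":
        return DOWNGRADE.get(requested_model, requested_model)
    return requested_model  # complex requests keep the requested model


print(route("gpt-4o", "What is 2+2?"))                      # gpt-4o-mini
print(route("gpt-4o", "Write me a React app"))              # gpt-4o
print(route("claude-opus-4", "Translate hello to French"))  # claude-haiku-3.5
print(route("gpt-4o-mini", "What is 2+2?"))                 # gpt-4o-mini
```

Models already on the cheap side (or missing from the map) fall through unchanged, matching the pass-through behavior described under the pricing table.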

📋

Content-Based Rules

Define regex patterns that override the classifier. "If the prompt contains a code block, never downgrade." "If the system prompt says translate, always downgrade." Your domain knowledge, our routing engine.
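
The two example rules can be expressed as regex overrides that run before the classifier. A sketch assuming a hypothetical rule format of (field, pattern, action) where the first match wins; TokenSurf's actual rule syntax is not shown here.

```python
import re
from typing import Optional

# Hypothetical rule format: (field to inspect, regex, action). First match wins.
RULES = [
    ("prompt", re.compile(r"`{3}"), "never_downgrade"),  # prompt contains a code block
    ("system", re.compile(r"\btranslate\b", re.IGNORECASE), "always_downgrade"),
]


def rule_action(system: str, prompt: str) -> Optional[str]:
    """Return the first matching rule's action, or None if no rule applies."""
    fields = {"system": system, "prompt": prompt}
    for field, pattern, action in RULES:
        if pattern.search(fields[field]):
            return action
    return None


def should_downgrade(system: str, prompt: str, classifier_says_simple: bool) -> bool:
    """Rules override the classifier; with no matching rule, the classifier decides."""
    action = rule_action(system, prompt)
    if action == "never_downgrade":
        return False
    if action == "always_downgrade":
        return True
    return classifier_says_simple


fence = "`" * 3
print(should_downgrade("", f"Fix this:\n{fence}js\nlet x = 1\n{fence}", True))   # False
print(should_downgrade("Translate the user's text to French.", "hello", False))  # True
print(should_downgrade("", "What is 2+2?", True))                                # True
```

Because the first match wins, protective rules such as "never downgrade code" should be listed before permissive ones.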

⏲

Priority & Latency Routing

Tag requests as high-priority (never downgrade) or low-priority (always downgrade). Set max latency targets — if a provider is slow, auto-switch to a faster model.
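
How requests are tagged is not specified here, so the following is only a sketch. It assumes a hypothetical `priority` tag and `max_latency_ms` target on each request, with recent per-model latencies supplied by the caller. In this sketch the latency switch applies even to high-priority requests; whether TokenSurf does the same is not stated.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Request:
    model: str                              # model the caller asked for
    priority: str = "normal"                # hypothetical tag: "high", "normal" or "low"
    max_latency_ms: Optional[float] = None  # hypothetical latency target


def pick_model(req: Request, simple: bool, cheap: str,
               recent_latency_ms: Dict[str, float], faster: str) -> str:
    """Apply the priority tag, then the latency target, to choose a model."""
    if req.priority == "high":
        chosen = req.model                  # never downgrade
    elif req.priority == "low":
        chosen = cheap                      # always downgrade
    else:
        chosen = cheap if simple else req.model  # defer to the classifier
    observed = recent_latency_ms.get(chosen, 0.0)
    if req.max_latency_ms is not None and observed > req.max_latency_ms:
        chosen = faster                     # provider too slow: switch models
    return chosen


latency = {"gpt-4o": 2400.0, "gpt-4o-mini": 600.0}  # illustrative numbers
print(pick_model(Request("gpt-4o", priority="high"), True, "gpt-4o-mini", latency, "gemini-2.5-flash"))
print(pick_model(Request("gpt-4o", priority="low"), False, "gpt-4o-mini", latency, "gemini-2.5-flash"))
print(pick_model(Request("gpt-4o", max_latency_ms=1000), False, "gpt-4o-mini", latency, "gemini-2.5-flash"))
```

The three calls print `gpt-4o`, `gpt-4o-mini`, and `gemini-2.5-flash`: the tag overrides the classifier, and the latency target overrides both when the chosen model is too slow.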

🧠

Custom Classifier

Replace our default AI classifier with your own prompt. Define what "simple" means for your business. A home improvement chatbot's "simple" is different from a coding assistant's.

Caching & Performance

⚡

Semantic Response Cache

Identical requests return cached responses instantly. Zero API call, zero token cost, zero credits consumed. Your chatbot answers "What are your hours?" 500 times a day — you pay once.
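
The "identical requests" behavior described above amounts to exact-match caching. A minimal sketch, assuming the cache key is a hash of the canonical request JSON; `ResponseCache` is hypothetical, has no expiry or eviction, and does not attempt semantic (paraphrase) matching.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache: identical (model, messages, params) -> stored response."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, messages: list, **params) -> str:
        # Canonical JSON (sorted keys, fixed separators) so equal requests hash identically.
        payload = json.dumps({"model": model, "messages": messages, "params": params},
                             sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, call: Callable[[], str], model: str, messages: list, **params) -> str:
        k = self.key(model, messages, **params)
        if k in self._store:
            self.hits += 1        # served from cache: no provider call
            return self._store[k]
        self.misses += 1
        self._store[k] = call()   # first time: call the provider and store the result
        return self._store[k]


calls = []


def fake_provider() -> str:
    calls.append(1)               # count real (simulated) provider calls
    return "stub response"


cache = ResponseCache()
msgs = [{"role": "user", "content": "What are your hours?"}]
for _ in range(500):
    cache.get_or_call(fake_provider, "gpt-4o", msgs)
print(len(calls), cache.hits)     # 1 499
```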

📈

Context Window Management

Long conversations that exceed model limits get auto-trimmed. System prompt preserved, oldest messages dropped, newest kept. No more 400 errors on long chats.
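
The trimming policy (keep the system prompt, keep the newest messages, drop the oldest) can be sketched as below. The `rough_tokens` estimate of about four characters per token is a hypothetical placeholder for a real tokenizer, and the sketch assumes system messages belong at the front of the conversation.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_tokens(msg: Message) -> int:
    """Hypothetical estimate (~4 characters per token); real tokenizers differ."""
    return max(1, len(msg["content"]) // 4)


def trim_to_fit(messages: List[Message], limit: int,
                count: Callable[[Message], int] = rough_tokens) -> List[Message]:
    """Keep system messages, then as many of the newest messages as fit in `limit`."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = limit - sum(count(m) for m in system)
    kept: List[Message] = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    # The newest message is always kept, even if it alone exceeds the budget.
    for msg in reversed(rest):
        cost = count(msg)
        if cost > budget and kept:
            break
        kept.append(msg)
        budget -= cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "x" * 40},      # ~10 tokens
    {"role": "user", "content": "a" * 400},       # ~100 tokens (oldest)
    {"role": "assistant", "content": "b" * 200},  # ~50 tokens
    {"role": "user", "content": "c" * 120},       # ~30 tokens (newest)
]
print([m["content"][0] for m in trim_to_fit(history, limit=100)])  # ['x', 'b', 'c']
```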

Reliability

🛡

Cross-Provider Fallbacks

If OpenAI is down, your request automatically goes to Claude or Gemini. Circuit breakers detect outages, retry logic handles transient errors, and fallback chains keep your app running.
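
Circuit breakers, retries, and a fallback chain fit together roughly as follows. This is a sketch under stated assumptions: `ConnectionError` stands for a transient provider failure, the threshold and cooldown are fixed numbers, and retries have no backoff (a production system would typically add one). `CircuitBreaker`, `call_with_fallback`, and the provider names are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a probe after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open once the cooldown has passed: let a request probe the provider.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()


def call_with_fallback(chain: List[str], providers: Dict[str, Callable[[], str]],
                       breakers: Dict[str, CircuitBreaker], retries: int = 1) -> str:
    """Try each provider in order, skipping open circuits and retrying transient errors."""
    for name in chain:
        breaker = breakers[name]
        if not breaker.available():
            continue
        for _ in range(retries + 1):
            try:
                result = providers[name]()
            except ConnectionError:
                breaker.record(ok=False)
                if not breaker.available():
                    break  # circuit just opened: move on to the next provider
            else:
                breaker.record(ok=True)
                return result
    raise RuntimeError("all providers in the fallback chain failed")


def openai_down() -> str:
    raise ConnectionError("simulated outage")


providers = {"openai": openai_down, "anthropic": lambda: "answer from anthropic"}
breakers = {name: CircuitBreaker(threshold=2) for name in providers}
print(call_with_fallback(["openai", "anthropic"], providers, breakers))  # answer from anthropic
print(breakers["openai"].available())                                   # False
```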

🌐

Multi-Region Deployment

Deployed in US, EU, and Asia. Requests are served from the nearest region. Redis caching, connection pooling, and rate limiting built in.

Observability

📊

Analytics Dashboard

See every request: which model was used, whether it was downgraded, and how much you saved. Cost breakdown by model and by feature tag. "Your checkout flow costs $12K/month, and 60% of that is one bad prompt."

🔔

Alerts & Quality Scoring

Get warned when credits run low, error rates spike, or daily spend exceeds budget. 5% of responses are auto-scored for quality so you can verify downgraded models still meet your bar.

📨

Webhooks

POST every routing decision to your endpoint. Feed into Slack, PagerDuty, Datadog, or your own analytics. Know exactly what's happening in real time.

Security

🔒

PII Redaction

Auto-detect and strip emails, Social Security numbers, credit card numbers, phone numbers, and IP addresses from prompts before they reach the LLM provider. One toggle. Enterprise compliance checkbox done.
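
A minimal regex-based sketch of the redaction step, assuming simple US-style formats. The patterns and placeholder labels are illustrative and far less thorough than a production detector (no Luhn check on card numbers, no IPv6, no international phone numbers); order matters, so more specific patterns run first.

```python
import re

# Illustrative patterns, applied in order; each match becomes a placeholder label.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CREDIT_CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),  # 13 to 16 digits
    ("PHONE", re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[ .-])\d{3}[ .-]\d{4}\b")),
    ("IP_ADDRESS", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),  # IPv4 only
]


def redact(text: str) -> str:
    """Replace each detected PII span with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


sample = ("Contact jane.doe@example.com or 555-123-4567. SSN 123-45-6789, "
          "card 4111 1111 1111 1111, from 192.168.1.20.")
print(redact(sample))
# Contact [EMAIL] or [PHONE]. SSN [SSN], card [CREDIT_CARD], from [IP_ADDRESS].
```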

🔑

Bring Your Own Keys

Your API keys are encrypted at rest with AES-256-GCM. We never store them in plaintext. Key rotation with a 24-hour grace period. Audit logging on every security event. Your keys, your control.

Platform

👥

Teams & Organizations

Create teams, invite members with roles (owner/admin/member), issue labeled API keys with per-key budget caps and rate limits. One bill, full control over who uses what.
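
Per-key budget caps and rate limits can be combined in one admission check. A sketch assuming a token-bucket rate limit and a running spend total per key; `KeyLimiter` and its parameters are hypothetical, and the sketch omits the monthly budget reset and any thread safety.

```python
import time


class KeyLimiter:
    """Hypothetical per-key limits: a spend cap plus a token-bucket rate limit."""

    def __init__(self, label: str, budget_usd: float, rate_per_sec: float, burst: int) -> None:
        self.label = label
        self.budget_usd = budget_usd
        self.rate_per_sec = rate_per_sec
        self.burst = burst
        self.spent_usd = 0.0
        self.tokens = float(burst)   # bucket starts full
        self.last = time.monotonic()

    def allow(self, est_cost_usd: float) -> bool:
        """Admit a request if the bucket has a token and the budget can cover it."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_per_sec)
        self.last = now
        if self.tokens < 1:
            return False             # over the rate limit
        if self.spent_usd + est_cost_usd > self.budget_usd:
            return False             # budget cap reached
        self.tokens -= 1
        self.spent_usd += est_cost_usd
        return True


checkout = KeyLimiter("checkout-service", budget_usd=10.0, rate_per_sec=1.0, burst=3)
print([checkout.allow(0.001) for _ in range(5)])  # [True, True, True, False, False]

batch = KeyLimiter("nightly-batch", budget_usd=0.0025, rate_per_sec=100.0, burst=10)
print([batch.allow(0.001) for _ in range(3)])     # [True, True, False]
```

The first key hits its burst limit of three rapid requests; the second has plenty of rate headroom but stops at its spend cap.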

📝

Prompt Template Library

Store system prompts server-side and reference them by ID. Change how your AI responds without redeploying your app. Version and manage prompts from the dashboard.

▶

API Playground

Test any model from the dashboard. Send a prompt, see the response, the routing decision, cache status, and cost breakdown side by side. No code required.

All features. Every plan. No gating.

The only difference between plans is volume and price per request.

Simple, Predictable Pricing

Bring your own API keys. $0.001 per request. Commit to volume, pay less.

Free

1,000 requests/month

No credit card required
All providers (BYOK)
Basic smart routing
Dashboard analytics

Pay As You Go

$0.001/req

Top up anytime, credits never expire

AI-powered routing
All providers (BYOK)
No commitment required
Full analytics dashboard
$10 minimum top-up

Growth

$400/mo

$0.0008/req (save 20%)

500K requests/month included
AI-powered routing
Priority support
Team access (up to 5 keys)
Overage at $0.0009/req
Annual billing: save 15% more

Scale

$3,000/mo

$0.0006/req (save 40%)

5M requests/month included
Dedicated support
Unlimited team keys
Custom routing rules
Quality scoring
Overage at $0.0007/req

Need 50M+ requests/month?

Contact for Enterprise Pricing

Enterprise: volume pricing from $0.0004/req · 99.9% SLA · dedicated account manager · annual contracts
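
Which plan is cheapest depends only on monthly request volume. A sketch of that arithmetic using the numbers on the plan cards above. It covers TokenSurf's per-request fee only (with BYOK, model usage is billed by your providers) and ignores the annual-billing discount and the $10 minimum top-up; `monthly_cost` and `cheapest` are illustrative names.

```python
# Plan numbers from the cards above: (monthly fee, included requests, overage per request), USD.
PLANS = {
    "growth": (400.0, 500_000, 0.0009),
    "scale": (3_000.0, 5_000_000, 0.0007),
}
PAYG_PER_REQUEST = 0.001
FREE_REQUESTS = 1_000


def monthly_cost(plan: str, requests: int) -> float:
    """TokenSurf fee for one month on `plan` ("payg", "growth" or "scale")."""
    if plan == "payg":
        cost = requests * PAYG_PER_REQUEST
    else:
        fee, included, overage = PLANS[plan]
        cost = fee + max(0, requests - included) * overage
    return round(cost, 2)


def cheapest(requests: int) -> str:
    """Name of the lowest-cost option for a monthly request volume."""
    if requests <= FREE_REQUESTS:
        return "free"
    return min(("payg", "growth", "scale"), key=lambda p: monthly_cost(p, requests))


for volume in (300_000, 1_000_000, 6_000_000):
    costs = {p: monthly_cost(p, volume) for p in ("payg", "growth", "scale")}
    print(volume, costs, cheapest(volume))
```

By this arithmetic, 1M requests/month costs $1,000 on pay-as-you-go versus $850 on Growth. Growth undercuts pay-as-you-go above 400K requests/month ($400 / $0.001), and Scale undercuts Growth above roughly 3.39M requests/month.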

One line change.
Half the cost.
