OpenAI, Anthropic, Google Gemini, OpenRouter with 300+ models. Unified OpenAI-compatible API format across all providers.
Live
Fallback Chains
If a provider is down, automatically retry with an equivalent model on another provider. Cross-provider model mapping (e.g. gpt-4o to claude-sonnet-4-6).
Live
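The fallback behavior described above can be sketched as a small loop over an equivalence table. This is an illustrative sketch, not TokenSurf's implementation: `EQUIVALENTS`, `ProviderDown`, and `call_provider` are hypothetical names invented for the example.

```python
# Hypothetical cross-provider equivalence table: requested model -> ordered fallbacks.
EQUIVALENTS = {
    "gpt-4o": ["claude-sonnet-4-6", "gemini-2.5-pro"],
}

class ProviderDown(Exception):
    """Raised by the (hypothetical) provider client when a provider is unavailable."""

def complete_with_fallback(model, prompt, call_provider):
    """Try the requested model first, then each mapped equivalent in order."""
    last_error = None
    for candidate in [model, *EQUIVALENTS.get(model, [])]:
        try:
            return call_provider(candidate, prompt)
        except ProviderDown as exc:
            last_error = exc  # provider down: fall through to the next equivalent
    raise last_error
```

The key design point is that the chain starts with the caller's own model, so fallbacks only engage on failure.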
Retry with Exponential Backoff
Auto-retry on 429/500/502/503 with jitter. Respects Retry-After headers. Up to 2 retries before failing.
Live
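The retry policy above (retryable status codes, exponential backoff with jitter, `Retry-After` taking precedence, at most 2 retries) can be sketched like this. The `send` callable and tuple shape are assumptions for the example, not the proxy's real internals:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503}

def request_with_retry(send, max_retries=2, base_delay=0.5, sleep=time.sleep):
    """Retry on 429/500/502/503 with exponential backoff and full jitter.

    A server-provided Retry-After header (in seconds) overrides the
    computed backoff. send() returns (status, headers, body)."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status not in RETRYABLE or attempt == max_retries:
            return status, body  # success, non-retryable error, or out of retries
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            # Full jitter: uniform in [0, base * 2^attempt * 2)
            delay = random.uniform(0, base_delay * 2 ** attempt * 2)
        sleep(delay)
```

Full jitter spreads retries out so that many clients hitting the same 429 don't retry in lockstep.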
Rate Limiting & Abuse Detection
Token bucket rate limiter (60 req/min per key). Automatic abuse detection with throttle and block escalation on anomalous patterns.
Live
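A 60 req/min token bucket amounts to a capacity of 60 tokens refilled at one token per second. A minimal in-memory sketch (the real limiter is per-key and presumably backed by Redis; this example covers a single key):

```python
import time

class TokenBucket:
    """60 req/min = capacity 60, refilled at 1 token per second."""

    def __init__(self, capacity=60, refill_per_sec=1.0, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.now = now                  # injectable clock for testing
        self.tokens = float(capacity)   # bucket starts full
        self.last = self.now()

    def allow(self):
        """Refill based on elapsed time, then spend one token if available."""
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill_per_sec)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Lazy refill on each call avoids a background timer: the bucket only needs to know how much time passed since the last request.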
Model Quality Scoring
5% of responses auto-scored by Gemini Flash-Lite (1-10 scale). Per-model quality aggregation, downgraded vs original comparison in dashboard.
Live
Teams & Organizations
Create orgs, invite members with roles (owner/admin/member). Labeled API keys with per-key budgets, rate limits, and model allowlists.
Live
System Health & Observability
Real-time health endpoint, circuit breaker per provider, latency percentiles (p50/p95/p99), cache hit rates, structured audit logging.
Live
Multi-Region Deployment
Proxy deployed in US, EU, and Asia (us-central1, europe-west1, asia-northeast1). Redis caching, connection pooling, API key rotation with 24h grace period.
Live
Semantic Response Cache
Exact-match caching of identical requests. Cache hit = instant response, zero provider cost, zero credits. Configurable TTL, toggle per account.
Live
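Exact-match caching with a TTL reduces to hashing a canonical form of the request and storing the response with an expiry. A sketch under stated assumptions (in-memory dict standing in for whatever store the proxy actually uses):

```python
import hashlib
import json
import time

class ExactMatchCache:
    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def key(request):
        # Canonical JSON (sorted keys) so byte-identical requests hash identically.
        blob = json.dumps(request, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get(self, request):
        entry = self._store.get(self.key(request))
        if entry is None:
            return None
        expires, response = entry
        return response if self.clock() <= expires else None

    def put(self, request, response):
        self._store[self.key(request)] = (self.clock() + self.ttl, response)
```

Sorting keys before hashing is what makes "identical requests" robust: two clients serializing the same payload with different key order still hit the same cache entry.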
Volume Subscription Plans
Growth ($400/mo, 500K requests) and Scale ($3K/mo, 5M requests) plans with volume discounts up to 40%. Annual billing saves 15% more. Enterprise custom pricing.
Live
Request Logs & Cost Breakdown
Per-request log viewer in dashboard with model routing, tokens, cost, savings, and latency. Per-model cost breakdown with bar charts.
Live
Teams Dashboard
Full org management UI: create teams, invite members, manage roles, create labeled API keys with per-key budget caps and rate limits.
Live
Content-Based Routing Rules
Custom regex/pattern rules for routing overrides. "If prompt contains code block, never downgrade." Rules evaluated in order, first match wins.
Live
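"Rules evaluated in order, first match wins" is an ordered scan over compiled patterns. A hedged sketch — the rule syntax and action names here are invented for illustration, not the dashboard's actual schema:

```python
import re

# Hypothetical rule list: (compiled pattern, routing action), checked in order.
RULES = [
    (re.compile(r"`{3}"), "never_downgrade"),       # prompt contains a code block
    (re.compile(r"(?i)\btranslate\b"), "downgrade"),  # translation is cheap-model work
]

def routing_override(prompt, rules=RULES, default="auto"):
    """Return the action of the first rule whose pattern matches, else default."""
    for pattern, action in rules:
        if pattern.search(prompt):
            return action
    return default
```

Because the first match wins, rule order is part of the policy: put the most specific protections (like the code-block rule) before broader cost-saving rules.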
Billing History
Full payment history in the dashboard — dates, amounts, credits, and transaction IDs. Supports enterprise expense reporting.
Live
Usage Alerts
In-dashboard alerts for low credits, high error rates, and daily spend exceeding budget. Catch runaway loops before they drain your wallet.
Live
API Playground
Test any model from the dashboard. Send a prompt, see the response, routing decision, cost breakdown, and cache status side-by-side.
Live
Webhooks on Routing Decisions
POST to your webhook URL for every routing decision. Configurable in dashboard. Feed into Slack, PagerDuty, or custom analytics. Fire-and-forget, never blocks.
Live
Streaming Metrics (TTFT, tok/s)
Time-to-first-token and throughput tracking for streaming responses. P50/P95 percentiles on system health dashboard. Know which provider is actually fastest.
Live
PII Redaction Guardrails
Auto-detect and redact emails, SSNs, credit cards, phone numbers, IP addresses from prompts before forwarding to providers. Toggle per account. Enterprise compliance checkbox.
Live
Cost Analytics by Feature Tag
Tag requests with X-TokenSurf-Tag header to track spend per feature, team, or environment. Dashboard shows cost breakdown by tag with savings.
Live
Prompt Template Library
Store and version system prompts server-side. Reference by ID via X-TokenSurf-Template header. Change routing behavior without redeploying your app.
Live
Custom Classifier Prompt
Override the default AI classifier with your own domain-specific prompt. Define what "simple" means for your use case. Full control over routing decisions.
Live
Latency-Based Routing
Set max latency target — if a provider's p95 exceeds it, auto-downgrade to faster models. Dashboard config with preset thresholds.
Live
Priority Routing
Per-request priority via X-TokenSurf-Priority header. High = never downgrade (user-facing). Low = always downgrade (batch jobs). Toggle in dashboard.
Live
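Setting per-request priority is just an extra header on an otherwise OpenAI-compatible request. A client-side sketch — the base URL and API key below are placeholders, and only the `X-TokenSurf-Priority` header name comes from the docs:

```python
def build_request(prompt, priority="high", api_key="sk-example",
                  base_url="https://api.example.com"):
    """Assemble a chat-completions request with a per-request priority.

    priority="high": never downgrade (user-facing traffic).
    priority="low":  always downgrade (batch jobs)."""
    return {
        "url": base_url + "/v1/chat/completions",
        "headers": {
            "Authorization": "Bearer " + api_key,
            "X-TokenSurf-Priority": priority,
        },
        "json": {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Because the priority rides on a header rather than the request body, the same payload can be sent at different priorities without touching the OpenAI-format JSON.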
Context Window Management
Auto-trim conversation history to fit model context limits. Prevents 400 errors on long conversations. Returns X-TokenSurf-Context-Trimmed header.
Up Next
In development
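Auto-trimming to a context limit typically means dropping the oldest non-system messages until the conversation fits. A sketch under stated assumptions: the crude length-based token counter below is a stand-in for a real tokenizer, and the function shape is invented for illustration.

```python
def trim_history(messages, max_tokens,
                 count_tokens=lambda m: len(m["content"]) // 4):
    """Drop oldest non-system messages until the conversation fits.

    The system prompt is always preserved. Returns (messages, was_trimmed);
    was_trimmed corresponds to what X-TokenSurf-Context-Trimmed would report."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    original_rest = len(rest)

    def total(msgs):
        return sum(count_tokens(m) for m in msgs)

    while rest and total(system + rest) > max_tokens:
        rest.pop(0)  # evict the oldest turn first
    return system + rest, len(rest) < original_rest
```

Evicting from the front keeps the most recent turns, which usually matter most for the next completion, while the preserved system prompt keeps behavior stable.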
Semantic Cache v2 (Embedding Similarity)
Fuzzy matching via embeddings. "What are your hours?" and "When are you open?" return the same cached response. 5-10x more cache hits than exact match.
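The fuzzy matching described above usually means a cosine-similarity lookup over cached prompt embeddings with a threshold. A minimal sketch (the 0.92 threshold and the cache shape are illustrative assumptions; a production version would use a vector index, not a linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_lookup(query_vec, cache, threshold=0.92):
    """cache: list of (embedding, cached_response) pairs.

    Return the response whose embedding best matches the query,
    provided it clears the similarity threshold; else None (cache miss)."""
    best_score, best_response = threshold, None
    for vec, response in cache:
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_score, best_response = score, response
    return best_response
```

The threshold is the whole trade-off: too low and "What are your hours?" starts matching unrelated prompts; too high and the cache degrades back toward exact match.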
Streaming Cache Support
Cache streaming (SSE) responses and replay them as streams. Extends semantic caching to the most common API usage pattern.
Exploring
Research & design
Prompt Optimization Engine
Automatically rewrite prompts to work well on cheaper models. Not just routing to a cheaper model — making the prompt work better on it. Defensible and high-value.