About TokenSurf

Quality testing & monitoring for AI agents.

The Founder

Cem Bas

Full-Stack AI Engineer

San Diego, CA

cem@appbaker.ai

The Mission

TokenSurf was built to solve a hard problem: shipping AI agents you can actually trust. Agents fail in ways traditional tests never catch — they hallucinate, drift, and regress silently between deploys.

TokenSurf is a quality framework for AI agents: offline evaluation that runs in CI to catch regressions before they ship, plus online monitoring that watches quality in production. It's Python-first, wraps your agent with a simple @track decorator, and stays out of your data path — TokenSurf never sits between your app and your model providers.
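As a rough illustration of the decorator approach, here is a minimal sketch. It is not TokenSurf's actual API: the `track` implementation, the `RECORDS` list, and `echo_agent` are all hypothetical stand-ins. It shows how a decorator can observe an agent's inputs, outputs, and latency from the side, while the agent itself still talks to its model provider directly.

```python
import functools
import time

# Hypothetical in-memory sink; a real SDK would likely send records elsewhere.
RECORDS = []


def track(func):
    """Illustrative stand-in for a @track-style decorator (assumed, not the real SDK)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        # The wrapped agent runs unchanged; nothing here proxies model traffic.
        result = func(*args, **kwargs)
        RECORDS.append(
            {
                "agent": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "output": result,
                "latency_s": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


@track
def echo_agent(prompt: str) -> str:
    # Placeholder agent; a real one would call an LLM provider here.
    return f"echo: {prompt}"
```

Calling `echo_agent("hi")` returns the agent's normal result and appends one record to `RECORDS`. The observation happens around the call, not in the request path to the provider.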

Quality is measured across four scorer families — correctness, safety, relevance, and task completion — so you can see exactly where an agent is strong and where it slips.
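To make the per-family view concrete, the sketch below averages scores for each of the four families and flags any family that falls below a baseline, which is the kind of check a CI job could run. The function names, the 0-to-1 score scale, and the `tolerance` parameter are assumptions for illustration only, not part of TokenSurf.

```python
from statistics import mean

# The four scorer families named above; the snake_case keys are an assumption.
FAMILIES = ("correctness", "safety", "relevance", "task_completion")


def summarize(results):
    """Average each family's score over a list of per-test-case score dicts."""
    return {fam: mean(r[fam] for r in results) for fam in FAMILIES}


def regressions(summary, baseline, tolerance=0.05):
    """Return the families whose mean score is more than `tolerance` below baseline."""
    return [fam for fam in FAMILIES if summary[fam] < baseline[fam] - tolerance]
```

A CI step could call `regressions(...)` on the latest evaluation run and fail the build if the returned list is non-empty.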

Open core

TokenSurf follows an open-core model. The self-hostable platform (SDK, server, dashboard, and its own database) will be free and licensed under Apache-2.0; the open-source release is launching soon. For teams that want a managed experience, TokenSurf Cloud will host the same platform; it is currently in early access.

The team kept its original Firebase and Cloud Run backend, which now serves as the seed of the managed TokenSurf Cloud product.