Our p50 latency is 47ms end to end. People ask how. The answer is unromantic: you write down the budget, hold each layer to its number, and accept that anything left over is variance you have to design around.

The budget started as a whiteboard exercise in early 2025. We had a target of 50ms at p50, and we knew we couldn't borrow latency from anywhere. So we sat down and divided the budget among every stage of the pipeline.

The breakdown

Network (12ms). Edge POP to origin, assuming TLS session resumption. We don't terminate TLS at the first hop because the savings aren't worth the operational complexity.
Queueing (8ms). Each submission lands on a per-tenant queue served by a worker pool. We've measured queueing delay in production, and 8ms is the 95th percentile, not the median.
Cheap checks (3ms). Rate limits, threat score, blocklist lookups. All in-memory or Redis.
Inference (24ms). Seven classifiers running in parallel on a shared embedding, so the stage is bounded by the slowest one.
Routing & emit (3ms). Compose the verdict object, fire webhooks asynchronously, return the response.
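The breakdown is small enough to live as data next to the code. What follows is a minimal sketch, assuming a Python service: the millisecond figures come from the list above, while the stage keys, helper names and sample classifier timings are hypothetical. It checks that the line items add up to the 50ms target, treats the inference stage as the maximum over classifiers that run in parallel, and computes a nearest-rank percentile, which is the kind of figure the queueing line is budgeted at.

```python
# Per-stage p50 budget in milliseconds, taken from the breakdown.
# The dictionary keys are illustrative names, not identifiers from a real codebase.
BUDGET_MS: dict[str, float] = {
    "network": 12.0,
    "queueing": 8.0,       # budgeted at the queue's p95, not its median
    "cheap_checks": 3.0,
    "inference": 24.0,
    "routing_emit": 3.0,
}
TARGET_P50_MS = 50.0


def total_budget(budget: dict[str, float]) -> float:
    """Sum of all line items; this must not exceed the end-to-end target."""
    return sum(budget.values())


def inference_latency(classifier_ms: list[float]) -> float:
    """Classifiers run in parallel, so the stage takes as long as the slowest one."""
    return max(classifier_ms)


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples or not 1 <= p <= 100:
        raise ValueError("need at least one sample and 1 <= p <= 100")
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # integer ceiling of p * n / 100
    return ordered[rank - 1]


assert total_budget(BUDGET_MS) <= TARGET_P50_MS

# Hypothetical timings for the seven classifiers, in milliseconds.
classifiers = [9.1, 14.0, 22.5, 11.2, 18.7, 6.4, 20.3]
print(inference_latency(classifiers))  # 22.5, inside the 24ms line item
```

Budgeting queueing at its p95 inside a p50 budget is the conservative choice: on a typical request the queue leaves some slack for the other stages instead of eating into them.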

If you don't write the budget down, the budget writes you.

When p50 starts to creep, we don't go hunting. We pull up the breakdown, see which line item moved, and fix that one. Most months it's the inference layer: a model got fatter, or an embedding cache went cold. Once or twice it's been the network: a routing change at the cloud provider, and suddenly our edge POP takes a worse path.
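The "see which line item moved" step can be automated. A minimal sketch, again assuming Python and reusing the budget figures from the breakdown: it compares each stage's observed p50 against its allowance and reports overruns worst first. The stage keys, the 0.5ms tolerance and the sample measurements are hypothetical; a real setup would read per-stage timings from whatever tracing or metrics system records them.

```python
# Per-stage p50 budget in milliseconds, from the breakdown; keys are illustrative.
BUDGET_MS: dict[str, float] = {
    "network": 12.0,
    "queueing": 8.0,
    "cheap_checks": 3.0,
    "inference": 24.0,
    "routing_emit": 3.0,
}


def find_overruns(
    budget: dict[str, float],
    observed_p50: dict[str, float],
    tolerance_ms: float = 0.5,  # hypothetical noise floor; smaller drifts are ignored
) -> list[tuple[str, float]]:
    """Return (stage, overrun_ms) for stages over budget by more than the tolerance, worst first."""
    overruns = [
        (stage, observed_p50[stage] - limit)
        for stage, limit in budget.items()
        if stage in observed_p50 and observed_p50[stage] - limit > tolerance_ms
    ]
    return sorted(overruns, key=lambda item: item[1], reverse=True)


# Hypothetical month in which an embedding cache went cold.
observed = {
    "network": 11.8,
    "queueing": 7.2,
    "cheap_checks": 2.9,
    "inference": 29.5,
    "routing_emit": 3.1,
}
print(find_overruns(BUDGET_MS, observed))  # [('inference', 5.5)]
```

Checking against the written budget keeps those numbers as the source of truth. Comparing against last month's measurements instead would also catch drift that is still under budget, at the cost of more noise.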

Latency budgets aren't glamorous. They don't make a good keynote slide. But they're the only thing standing between a fast product and a slow one, and they have to be written down somewhere a junior engineer can find them when something breaks at 3am.

P. Sundaram

Infrastructure · INFODIVE LABS

Infrastructure at Zenith. Spends most of her time staring at flame graphs.

The 50-millisecond budget.


Related posts.

Anatomy of a spam score.

Why we built our own webhook relay.

We removed our pricing page. Then we put it back, simpler.
