
The 50-millisecond budget.

Twelve milliseconds for network, eight for queueing, thirty for checks, inference, and routing. Everything that's left is variance.

FIG.05 — LATENCY BREAKDOWN

Our p50 latency is 47ms end to end. People ask how. The answer is unromantic: you write down the budget, hold each layer to its number, and accept that anything left over is variance you have to design around.

The budget started as a whiteboard exercise in early 2025. We had a target (50ms p50) and we knew we couldn't borrow latency from anywhere. So we sat down and divided the budget among the pieces of the pipeline.

The breakdown

  1. Network (12ms). Edge POP to origin, TLS resumption assumed. We don't terminate TLS at the first hop because the savings aren't worth the operational complexity.
  2. Queueing (8ms). Submission lands on a per-tenant queue, fronted by a worker pool. We've measured queue wait in production, and 8ms is the 95th percentile, not the median.
  3. Cheap checks (3ms). Rate limits, threat score, blocklist lookups. All in-memory or Redis.
  4. Inference (24ms). Seven classifiers running in parallel on a shared embedding. Bound by the slowest.
  5. Routing & emit (3ms). Compose the verdict object, fire webhooks asynchronously, return the response.

If you don't write the budget down, the budget writes you.
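
One way to write a budget like this down is as data that can be checked, not just a whiteboard photo. The following is a minimal sketch under assumptions: the stage names and millisecond values come from the breakdown above, but `Stage`, `BUDGET`, `TARGET_MS`, and both helper functions are hypothetical names invented for illustration, not the pipeline's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One line item in the latency budget (hypothetical structure)."""
    name: str
    budget_ms: float


# Values taken from the breakdown above; stages run one after another.
BUDGET = (
    Stage("network", 12),
    Stage("queueing", 8),
    Stage("cheap_checks", 3),
    Stage("inference", 24),
    Stage("routing_emit", 3),
)

TARGET_MS = 50  # the p50 target


def total_budget_ms(stages=BUDGET) -> float:
    """Sequential stages add up, so the total is a plain sum."""
    return sum(stage.budget_ms for stage in stages)


def parallel_stage_ms(latencies_ms) -> float:
    """A stage that fans out in parallel is bound by its slowest member."""
    return max(latencies_ms)


# Fail loudly if someone edits a line item and the total no longer fits.
assert total_budget_ms() <= TARGET_MS
```

The sum and the max are the two rules the breakdown relies on: sequential stages add, while the seven parallel classifiers cost only as much as the slowest one. With made-up classifier latencies of 18.0, 24.0, and 9.5 ms, `parallel_stage_ms` would return 24.0.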

When p50 starts to creep, we don't go hunting. We pull up the breakdown, see which line item moved, and fix that one. Most months it's the inference layer — a model got fatter or an embedding cache went cold. Once or twice it's been the network — a routing change in the cloud provider, and our edge POP suddenly takes a worse path.
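
The "see which line item moved" step can be sketched as a comparison of measured per-stage latency against the written budget. This is an illustration under assumptions: `BUDGET_MS` repeats the numbers from the breakdown, while `stages_over_budget`, its `tolerance_ms` parameter, and the sample measurements are hypothetical and do not describe real tooling or real production data.

```python
# Per-stage budget in milliseconds, copied from the breakdown.
BUDGET_MS = {
    "network": 12,
    "queueing": 8,
    "cheap_checks": 3,
    "inference": 24,
    "routing_emit": 3,
}


def stages_over_budget(measured_ms, budget_ms=BUDGET_MS, tolerance_ms=0.0):
    """Return (stage, overage_ms) pairs for stages over budget, worst first."""
    over = []
    for stage, budget in budget_ms.items():
        measured = measured_ms.get(stage)
        if measured is None:
            continue  # no measurement for this stage; nothing to compare
        overage = measured - budget
        if overage > tolerance_ms:
            over.append((stage, overage))
    return sorted(over, key=lambda pair: pair[1], reverse=True)


# Made-up measurements for illustration only.
measured = {
    "network": 11.0,
    "queueing": 6.5,
    "cheap_checks": 2.8,
    "inference": 29.0,
    "routing_emit": 2.9,
}

print(stages_over_budget(measured))  # [('inference', 5.0)]
```

Sorting by overage puts the worst offender first, which matches the habit of fixing the one line item that moved instead of hunting across the whole pipeline.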


Latency budgets aren't glamorous. They don't make a good keynote slide. But they're the only thing standing between a fast product and a slow one, and they have to be written down somewhere a junior engineer can find them when something breaks at 3am.
