Around 80% of the submissions hitting Zenith are spam. The interesting question isn't whether we can block them; that's table stakes. It's how cheaply and how fast we can do it, without false positives that strand real leads.
Each layer in the pipeline has a different cost. The cheapest checks run first; the expensive ones only see traffic that survived the cheap ones. By the time the AI ensemble runs, 78% of the original requests have already been rejected. That's why the median latency is 47ms and not 470.
Cheap first, expensive last
Bot detection at the edge costs us a CIDR lookup and a header check. Threat scoring is a hash lookup against a Redis set. IP and email rate limits are atomic increment operations. None of these touch the database, none of them call a model. Together they reject roughly two-thirds of incoming requests in under a millisecond.
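Two of these checks are easy to sketch. The snippet below is an illustration under assumptions, not our production code: the blocked ranges are reserved documentation networks, the limiter is a fixed-window counter, and a Python dictionary stands in for Redis (where the same step would be an INCR on a per-window key, with an EXPIRE to clean up old windows).

```python
import ipaddress
import time
from collections import defaultdict
from typing import Optional

# Hypothetical blocked ranges (reserved documentation networks, not real threat data).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked_ip(ip: str) -> bool:
    """CIDR lookup: True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


class FixedWindowLimiter:
    """Fixed-window rate limiter; the dict is an in-memory stand-in for Redis."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.counts: dict = defaultdict(int)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        bucket = int(now // self.window)   # which window this request falls in
        self.counts[(key, bucket)] += 1    # the increment step
        return self.counts[(key, bucket)] <= self.limit


limiter = FixedWindowLimiter(limit=3, window_seconds=60)
decisions = [limiter.allow("ip:192.0.2.10", now=0.0) for _ in range(4)]
# The first three calls in the window are allowed; the fourth is rejected.
```

A linear scan over the network list is fine for a handful of ranges. A long blocklist would want a prefix tree or a sorted interval lookup instead, but the cost profile stays the same: no database and no model.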
The disposable-email block and the domain blocklist are slightly more expensive: they hit a trie that lives in memory and is rebuilt nightly. The duplicate-content fingerprint is a min-hash similarity check against the last hour's submissions. All three run in single-digit milliseconds.
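For readers who haven't met these two structures, here is a rough sketch of each. Everything specific in it is an assumption made for illustration: the trie is keyed on reversed domain labels so that subdomains of a blocked domain also match, the shingle size is three words, and the signature uses 64 salted BLAKE2b hashes. The real layers may differ on all of those points.

```python
import hashlib

_END = object()  # terminal marker; can never collide with a string label


class DomainTrie:
    """Trie over reversed domain labels, e.g. 'mailinator.com' -> com -> mailinator."""

    def __init__(self) -> None:
        self.root: dict = {}

    def add(self, domain: str) -> None:
        node = self.root
        for label in reversed(domain.lower().split(".")):
            node = node.setdefault(label, {})
        node[_END] = True

    def matches(self, domain: str) -> bool:
        """True if the domain, or any parent domain of it, is in the trie."""
        node = self.root
        for label in reversed(domain.lower().split(".")):
            node = node.get(label)
            if node is None:
                return False
            if _END in node:
                return True
        return False


def shingles(text: str, k: int = 3) -> set:
    """Overlapping k-word windows; short texts collapse to a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(text: str, num_hashes: int = 64) -> list:
    """For each salted hash function, keep the minimum hash over all shingles."""
    items = [s.encode("utf-8") for s in shingles(text)]
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(item, digest_size=8, salt=salt).digest(), "big")
            for item in items
        ))
    return signature


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching positions, which estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

In use, a nightly job would build the trie with add, and the request path would call matches on the email's domain. For duplicates, each incoming submission's signature would be compared against recent ones and rejected above some similarity threshold, which you would have to tune against your own traffic.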
Eight layers of cheap is cheaper than one layer of expensive.
Why not just AI?
We get this question a lot. The honest answer: an LLM is expensive enough that you can't afford to call it on every request. If you did, your latency budget would evaporate and your inference bill would embarrass you. The cheap layers exist to make the expensive layer affordable.
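The arithmetic behind that is simple enough to write down. In the sketch below, the 22% pass rate follows from the 78% rejection figure quoted earlier; the cost units are made up purely for illustration, and the result is a mean per-request cost, not the median latency mentioned above.

```python
def expected_cost(cheap_cost: float, expensive_cost: float, pass_rate: float) -> float:
    """Mean per-request cost when only `pass_rate` of traffic reaches the expensive layer.

    Every request pays for the cheap layers; only survivors pay for the model.
    """
    return cheap_cost + pass_rate * expensive_cost


# Hypothetical units: the cheap layers cost 2 per request, the model costs 100.
model_on_everything = expected_cost(cheap_cost=0.0, expensive_cost=100.0, pass_rate=1.0)
layered = expected_cost(cheap_cost=2.0, expensive_cost=100.0, pass_rate=0.22)

savings_ratio = model_on_everything / layered  # a bit over 4x with these made-up numbers
```

The exact ratio depends entirely on the numbers you plug in. The point is the shape: the cheap layers add a small fixed cost to every request and remove most of the expensive calls.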
The other reason: the cheap layers are deterministic. When a request is rejected for hitting a rate limit, we can explain it. When it's rejected because an LLM said 'this feels spammy,' we can't, really. The eight-layer architecture lets us reserve the model for the cases where its judgement is actually adding signal.
Defense in depth gets a bad reputation because people associate it with bureaucratic security theater. In our case it's the opposite — it's how you stay fast. Every layer earns its place by being cheaper than what comes after it.