Webhook delivery is one of those problems that looks trivial on a whiteboard and produces a postmortem the first time a customer's endpoint goes down for an afternoon. We wrote our own relay last quarter and it's been the highest-leverage piece of infrastructure work we've done all year.

What we needed

Signed payloads. HMAC-SHA256 with a per-tenant secret. Standard, but easy to get wrong if you sign anything other than the exact bytes that go out on the wire, such as a re-serialized copy of the JSON.
Retries with exponential backoff. Customer endpoints fail. We retry with jitter, cap the backoff at four hours, and give up after 48 hours, at which point the payload goes to a dead-letter queue.
Replay protection. Every payload includes a timestamp and a nonce. Customers can reject anything older than five minutes, or any nonce they have already seen.
At-least-once delivery. Every payload includes an idempotency key, so customers can de-duplicate on their side.
Per-tenant isolation. A slow customer endpoint can't cause head-of-line blocking for other tenants. Each tenant has its own worker pool with bounded concurrency.

Why not a third-party service

We looked at hosted webhook-delivery services. They're good products, and we'd recommend them to teams that don't want to run their own relay. We built ours in-house because of the extra network hop: every additional service in the delivery path added latency we couldn't get back, on what is fundamentally a fan-out of one HTTP request per submission.

There's also the supply-chain dimension. Webhook delivery is the most security-critical part of our pipeline besides authentication. We wanted to own the signing path end to end, and to keep custody of customer signing secrets within our own boundary.

Webhook delivery is too important to outsource and too boring to be exciting. That's a sign you should build it.

What's tricky

The hardest part isn't the delivery logic — it's the observability. When a customer reports 'I'm not getting webhooks,' the answer can be 'your endpoint is returning 500s,' 'your signature is failing,' 'you've been rate-limited,' or 'we're not sending them.' We built a delivery-log UI in the dashboard before we built the retry logic, because we knew we'd need it.
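
As a rough illustration of what a delivery log has to answer, the sketch below maps a tenant's recorded attempts to one of the four explanations above. Everything here is hypothetical: the `Attempt` record, the `Diagnosis` names, and especially the status-code mapping (treating 401/403 as a rejected signature and 429 as rate limiting by the endpoint) are assumptions for the example, not a description of our dashboard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Diagnosis(Enum):
    DELIVERED = "delivered"
    ENDPOINT_ERROR = "your endpoint is returning 5xx"
    SIGNATURE_REJECTED = "your signature check is failing"
    RATE_LIMITED = "you've been rate-limited"
    NOT_SENT = "we're not sending them"
    OTHER = "other failure"


@dataclass
class Attempt:
    # HTTP status returned by the customer endpoint,
    # or None if the connection failed before a response arrived.
    status: Optional[int]


def diagnose(attempts: Sequence[Attempt]) -> Diagnosis:
    """Classify a delivery by its most recent attempt (assumed mapping)."""
    if not attempts:
        return Diagnosis.NOT_SENT       # nothing was ever sent: our side
    status = attempts[-1].status
    if status is None:
        return Diagnosis.OTHER          # timeout, DNS failure, TLS error, ...
    if 200 <= status < 300:
        return Diagnosis.DELIVERED
    if status == 429:
        return Diagnosis.RATE_LIMITED
    if status in (401, 403):
        return Diagnosis.SIGNATURE_REJECTED
    if status >= 500:
        return Diagnosis.ENDPOINT_ERROR
    return Diagnosis.OTHER
```

The useful property is that the empty-attempts case is distinguishable from every endpoint-side failure, which is the one question a support ticket usually starts with.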

If you're building a SaaS that ships webhooks, write your own relay. Or buy a third-party one. But pick deliberately. The implicit choice — 'we'll add retries later' — is the worst of both worlds.

P. Sundaram

Infrastructure · INFODIVE LABS

Infrastructure at Zenith. Spent two weeks writing the webhook relay; spends most months keeping it boring.

Why we built our own webhook relay.

Related posts.

Anatomy of a spam score.

The 50-millisecond budget.

We removed our pricing page. Then we put it back, simpler.
