Webhooks Done Right: The Production-Safe Playbook (Queues, Retries, Signatures, Idempotency)

If your store or fintech app depends on webhooks (Stripe, PayPal, shipping carriers, ERPs), treat them like money-moving infrastructure:
- Verify signatures and log what you reject.
- Acknowledge fast (200 OK) and do the real work async via a queue.
- Store every incoming event (raw payload + headers) for replay/debug.
- Make handlers idempotent and safe for out-of-order delivery.
- Add retries + dead-letter and alert on backlog/processing failures.
This is the exact pattern I implement in MERN/Next.js builds when webhooks are causing duplicated orders, missing payments, stuck shipments, or angry accounting teams.
Why webhooks go wrong in real production
Webhooks aren’t “requests from another API”. They’re at-least-once notifications sent by an external system with its own retries, timeouts, and ordering.
In ecommerce/fintech, common failure modes look like this:
- Duplicate events → duplicate fulfillment, duplicate emails, or double credits.
- Out-of-order events → you receive invoice.payment_failed after invoice.paid and your logic flips the subscription to “past_due”.
- Slow endpoint → provider retries → you process the same event multiple times.
- Bad signature verification (raw body issues) → you silently drop real payments.
- No observability → “Stripe says they sent it” turns into a 3-hour blame game.
So the goal isn’t “receive webhooks”. The goal is:
Make webhook ingestion boring and deterministic.
The production architecture (simple, but strict)
Here’s the model I recommend for MERN apps:
- Ingress endpoint (Express / Next.js API route)
  - verifies signature
  - stores the event (raw)
  - enqueues a job
  - returns 200 quickly
- Queue worker (BullMQ)
  - processes one event with retry/backoff
  - writes domain changes (orders, payments, ledger)
- Idempotency + locking
  - each provider event processed exactly once from your perspective
- Dead-letter + replay tools
  - failed jobs are visible and replayable
This separation is what stops “webhook storm” incidents.
Step 1: Design your webhook data model (MongoDB)
You want a collection that becomes your audit trail and your replay buffer.
Minimal schema:
// MongoDB collection: webhook_events
{
  _id: ObjectId,
  provider: "stripe", // stripe | paypal | shippo | etc
  eventId: "evt_123", // provider's event id
  type: "payment_intent.succeeded",
  livemode: false,
  receivedAt: ISODate("..."),
  processedAt: ISODate("...") | null,
  status: "received" | "processing" | "processed" | "failed",
  // store raw for debugging + re-verification
  rawBody: "{...}",
  headers: { ... },
  // helpful for routing
  resourceId: "pi_..." | "ch_..." | null,
  // retry bookkeeping
  processingAttempts: 0,
  lastError: "..." | null
}
Critical indexes:
- Unique: { provider: 1, eventId: 1 } (hard stop for duplicates)
- Query: { status: 1, receivedAt: -1 }
- Optional: { type: 1, receivedAt: -1 }
Why store rawBody? Because when finance says “a charge exists but no order”, you can replay the event without begging Stripe/PayPal support.
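The indexes above could be created once at startup. The sketch below assumes the official MongoDB Node.js driver, whose collections expose createIndex(keys, options). The helper name ensureWebhookIndexes and the index names are illustrative and not part of the original setup.

```javascript
// Sketch: create the three indexes on the webhook_events collection.
// `collection` is assumed to be a MongoDB driver Collection (or anything
// exposing an async createIndex(keys, options) method).
async function ensureWebhookIndexes(collection) {
  // Hard stop for duplicates: one document per provider event.
  await collection.createIndex(
    { provider: 1, eventId: 1 },
    { unique: true, name: 'uniq_provider_event' }
  );
  // Admin/replay queries such as "newest failed events first".
  await collection.createIndex(
    { status: 1, receivedAt: -1 },
    { name: 'status_receivedAt' }
  );
  // Optional: slice by event type.
  await collection.createIndex(
    { type: 1, receivedAt: -1 },
    { name: 'type_receivedAt' }
  );
}
```

Typical usage would be a single call during boot, for example ensureWebhookIndexes(db.collection('webhook_events')), before the webhook route starts accepting traffic.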
Step 2: Ingest webhooks safely (Express + raw body)
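Signature verification often fails because the framework parses the JSON body before your handler runs. The signature is computed over the exact bytes the provider sent, so a parsed and re-serialized body (different whitespace, newlines, or escaping) no longer matches. For Stripe, you must verify using the raw request body.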
Express: raw body + Stripe verification
import express from 'express';
import Stripe from 'stripe';

const app = express();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);

// IMPORTANT: raw body for this route. Register it before any global
// express.json() middleware, otherwise req.body arrives already parsed.
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  const sig = req.headers['stripe-signature'];
  let event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body, // Buffer
      sig,
      process.env.STRIPE_WEBHOOK_SECRET
    );
  } catch (err) {
    // Log + 400 so Stripe knows it's invalid
    console.error('Stripe signature verification failed', err.message);
    return res.status(400).send('Invalid signature');
  }
  // 1) Persist event (idempotent insert)
  // 2) Enqueue job
  // 3) Return 200 quickly
  res.status(200).send('ok');
});
Rule: The webhook route should do almost nothing besides validation + persistence + queueing.
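To see why the raw body matters, here is a minimal sketch of what Stripe-style verification does under the hood, following Stripe's documented scheme: the header carries t=<timestamp> and v1=<signature>, and the signature is an HMAC-SHA256 of "<timestamp>.<rawBody>" keyed with the endpoint secret. The function names and the test secret are made up for illustration. In production, keep using stripe.webhooks.constructEvent.

```javascript
// CommonJS require so the snippet runs standalone with plain `node`.
const crypto = require('node:crypto');

// Hypothetical helper: builds a header the way the provider would sign it.
function signPayload(rawBody, secret, timestamp) {
  const sig = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`, 'utf8')
    .digest('hex');
  return `t=${timestamp},v1=${sig}`;
}

// Hypothetical helper: returns true only if a v1 signature matches the
// exact bytes received and the timestamp is within the tolerance window.
function verifySignature(rawBody, header, secret, nowSec, toleranceSec = 300) {
  const parts = header.split(',').map((p) => p.split('='));
  const timestamp = Number(parts.find(([k]) => k === 't')?.[1]);
  const candidates = parts.filter(([k]) => k === 'v1').map(([, v]) => v);
  if (!Number.isFinite(timestamp) || candidates.length === 0) return false;
  if (Math.abs(nowSec - timestamp) > toleranceSec) return false; // replay guard

  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`, 'utf8')
    .digest();
  return candidates.some((hex) => {
    const given = Buffer.from(hex, 'hex');
    // timingSafeEqual throws on length mismatch, so check length first.
    return given.length === expected.length && crypto.timingSafeEqual(given, expected);
  });
}

const secret = 'whsec_test_only';
const raw = '{"id": "evt_123",  "type": "payment_intent.succeeded"}';
const header = signPayload(raw, secret, 1700000000);

console.log(verifySignature(raw, header, secret, 1700000010)); // true
// Parsing and re-serializing changes the whitespace, so the bytes differ:
const reserialized = JSON.stringify(JSON.parse(raw));
console.log(verifySignature(reserialized, header, secret, 1700000010)); // false
```

The second call is the classic failure: the payload is semantically identical, but the signed bytes are not, so a perfectly valid event gets rejected.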
Next.js route handler note
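In Next.js, signature verification needs the same raw body. In a Pages Router API route, disable the built-in body parser (export const config = { api: { bodyParser: false } }) and read the raw stream yourself. In an App Router Route Handler there is no parser to disable, which makes this cleaner, but you still need raw: read the unparsed body with await request.text() and verify that string.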
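For the App Router, a Route Handler could look roughly like the sketch below. It assumes Next.js passes a standard Web Request to the exported POST function. makeStripeWebhookHandler and its verifyEvent / persistAndEnqueue parameters are hypothetical names that stand in for stripe.webhooks.constructEvent and the persist + enqueue step described in this post.

```javascript
// Sketch of a route file such as app/api/webhooks/stripe/route.js
// (the path is an assumption). Dependencies are injected so the handler
// stays small and easy to test.
function makeStripeWebhookHandler({ verifyEvent, persistAndEnqueue }) {
  return async function POST(request) {
    // Read the body as text BEFORE any JSON parsing: this is the raw payload.
    const rawBody = await request.text();
    const signature = request.headers.get('stripe-signature');

    let event;
    try {
      // In a real route this would call
      // stripe.webhooks.constructEvent(rawBody, signature, webhookSecret).
      event = verifyEvent(rawBody, signature);
    } catch (err) {
      console.error('Stripe signature verification failed', err.message);
      return new Response('Invalid signature', { status: 400 });
    }

    await persistAndEnqueue(event, rawBody);
    return new Response('ok', { status: 200 });
  };
}

// In the route file you would then export the handler:
// export const POST = makeStripeWebhookHandler({ verifyEvent, persistAndEnqueue });
```

Injecting the two functions keeps the route as thin as the Express version: validation, persistence, queueing, nothing else.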
Step 3: Persist first, then enqueue (the “don’t lose money” rule)
Providers will retry delivery, but you still want your own durability.
Pseudo-flow:
- Insert webhook event document using a unique index.
- If duplicate key error → it’s a resend → return 200 (don’t reprocess).
- Enqueue job { provider, eventId }.
- Return 200.
Example (Mongo + BullMQ conceptually):
try {
  await WebhookEvent.create({
    provider: 'stripe',
    eventId: event.id,
    type: event.type,
    livemode: event.livemode,
    receivedAt: new Date(),
    status: 'received',
    rawBody: req.body.toString('utf8'),
    headers: req.headers,
  });
} catch (e) {
  if (e.code === 11000) {
    // duplicate event id: already received
    return res.status(200).send('ok');
  }
  throw e;
}

await webhookQueue.add(
  'processWebhook',
  { provider: 'stripe', eventId: event.id },
  {
    removeOnComplete: true,
    attempts: 8,
    backoff: { type: 'exponential', delay: 2000 },
  }
);

return res.status(200).send('ok');
This is the pattern that prevents “Stripe resent and we double-shipped”.
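One gap worth closing in this flow: if the insert succeeds but the enqueue fails (Redis hiccup, process crash), the event sits in received, and the provider's resend hits the duplicate-key branch, so nothing ever queues it. A periodic sweep is one way to handle that. The sketch below is an assumption layered on top of the flow above: findStaleReceived, sweepStaleEvents, and the 5-minute threshold are illustrative, and in a real app the filter would be a MongoDB query on status 'received' with receivedAt older than the cutoff.

```javascript
// Returns events that were stored but apparently never picked up by a worker.
function findStaleReceived(events, now, maxAgeMs = 5 * 60 * 1000) {
  const cutoff = now.getTime() - maxAgeMs;
  return events.filter(
    (ev) => ev.status === 'received' && ev.receivedAt.getTime() < cutoff
  );
}

// Re-enqueue each stale event. `enqueue` stands in for
// webhookQueue.add('processWebhook', { provider, eventId }, options).
async function sweepStaleEvents(events, enqueue, now = new Date()) {
  const stale = findStaleReceived(events, now);
  for (const ev of stale) {
    await enqueue({ provider: ev.provider, eventId: ev.eventId });
  }
  return stale.length;
}
```

Requeueing an event that did reach the queue is harmless as long as the worker claims events atomically before doing any work (Pattern A in Step 4).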
Step 4: Build idempotent handlers (your code must tolerate repeats)
Idempotency isn’t optional. It’s the difference between:
- “Webhook processed twice” → nothing changes
- vs.
- “Webhook processed twice” → duplicated order + corrupted ledger
Practical idempotency patterns I use
Pattern A: Provider event id as a processed marker
Before applying business logic, atomically mark the event as processing/processed.
const ev = await WebhookEvent.findOneAndUpdate(
  { provider, eventId, status: { $in: ['received', 'failed'] } },
  { $set: { status: 'processing' }, $inc: { processingAttempts: 1 } },
  { new: true }
);
if (!ev) {
  // already processing or processed
  return;
}
Then on success:
await WebhookEvent.updateOne(
  { provider, eventId },
  { $set: { status: 'processed', processedAt: new Date() } }
);
Pattern B: Domain-level idempotency key (recommended for money)
For payment flows, use a domain unique constraint like:
- payments.stripePaymentIntentId unique
- ledger_entries.externalRef unique
Then even if your worker runs twice, the DB stops duplicates.
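A minimal sketch of Pattern B, assuming a ledger_entries collection with a unique index on externalRef. applyLedgerEntry is a hypothetical helper; the only real behavior it relies on is insertOne rejecting with error code 11000 on a duplicate key. The in-memory fake exists purely to make the example runnable.

```javascript
// Insert the entry; treat a duplicate-key error as "already applied".
async function applyLedgerEntry(ledger, entry) {
  try {
    await ledger.insertOne(entry);
    return 'applied';
  } catch (e) {
    if (e.code === 11000) return 'duplicate'; // the unique index did its job
    throw e; // anything else should fail the job so it gets retried
  }
}

// In-memory stand-in for a collection with a unique index on externalRef.
function makeFakeLedger() {
  const byRef = new Map();
  return {
    entries: byRef,
    async insertOne(doc) {
      if (byRef.has(doc.externalRef)) {
        const err = new Error('E11000 duplicate key error');
        err.code = 11000;
        throw err;
      }
      byRef.set(doc.externalRef, doc);
    },
  };
}

async function demo() {
  const ledger = makeFakeLedger();
  const entry = { externalRef: 'stripe:evt_123', amount: 4999, currency: 'usd' };
  const first = await applyLedgerEntry(ledger, entry);
  const second = await applyLedgerEntry(ledger, entry); // worker ran twice
  return { first, second, count: ledger.entries.size };
}

demo().then(console.log); // { first: 'applied', second: 'duplicate', count: 1 }
```

The point of the design is that correctness lives in the database constraint, not in the worker remembering what it already did.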
Step 5: Handle out-of-order events without breaking state
Webhooks are not guaranteed to arrive in order.
Example: subscription status
If you naively set status from the last event you processed, you can regress state.
Better approach:
- Store provider “source of truth” identifiers (subscription id, invoice id)
- On important transitions, fetch the current state from the provider API (staying within its rate limits)
- Or apply state changes using monotonic rules (don’t move “paid” back to “unpaid” unless you see a later invoice/chargeback)
A simple monotonic rule:
- paid beats open
- refunded beats paid
- chargeback beats everything
For ecommerce, similar logic applies to fulfillment:
- Don’t mark an order “unfulfilled” after it was “shipped”.
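The monotonic rule can be expressed as a rank table: a status change is applied only when the incoming status outranks the stored one. This is a sketch under assumptions: the status names come from the rule above, while the numeric ranks and the nextPaymentStatus helper are illustrative. The rule applies per payment or invoice, so a genuinely new invoice starts from its own state.

```javascript
// Higher rank wins. The ranks encode: paid > open, refunded > paid,
// chargeback > everything.
const PAYMENT_STATUS_RANK = { open: 0, paid: 1, refunded: 2, chargeback: 3 };

// Returns the status to store after seeing `incoming` while at `current`.
function nextPaymentStatus(current, incoming) {
  const currentRank = PAYMENT_STATUS_RANK[current];
  const incomingRank = PAYMENT_STATUS_RANK[incoming];
  if (incomingRank === undefined) return current; // unknown status: ignore
  if (currentRank === undefined) return incoming; // no known state yet
  return incomingRank > currentRank ? incoming : current;
}

console.log(nextPaymentStatus('paid', 'open'));       // paid (late event ignored)
console.log(nextPaymentStatus('paid', 'refunded'));   // refunded
console.log(nextPaymentStatus('chargeback', 'paid')); // chargeback
```

The same shape works for fulfillment: give "shipped" a higher rank than "unfulfilled" and a late event can no longer regress the order.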
Step 6: Use a queue worker (BullMQ) with retries and a dead-letter view
In production, you want predictable retries and visibility.
BullMQ worker skeleton
import { Worker } from 'bullmq';

const worker = new Worker(
  'webhooks',
  async (job) => {
    const { provider, eventId } = job.data;
    // Skeleton only: in production, replace this plain read with the
    // atomic claim from Pattern A so a replayed job cannot run twice.
    const ev = await WebhookEvent.findOne({ provider, eventId });
    if (!ev) return;
    // Parse raw body if needed
    const payload = JSON.parse(ev.rawBody);
    // Route by type
    switch (ev.type) {
      case 'payment_intent.succeeded':
        await handlePaymentIntentSucceeded(payload);
        break;
      case 'charge.refunded':
        await handleRefund(payload);
        break;
      default:
        // Keep unknown types visible but non-fatal
        console.log('Unhandled webhook type', ev.type);
    }
    await WebhookEvent.updateOne(
      { provider, eventId },
      { $set: { status: 'processed', processedAt: new Date() } }
    );
  },
  { connection: { host: '127.0.0.1', port: 6379 } }
);

// Record the failure so it shows up in the admin view and can be replayed.
worker.on('failed', async (job, err) => {
  if (!job) return; // the job reference can be missing on this event
  await WebhookEvent.updateOne(
    { provider: job.data.provider, eventId: job.data.eventId },
    { $set: { status: 'failed', lastError: err.message } }
  );
});
Don’t hide failures. A failed webhook should:
- be visible in an admin screen
- be replayable with one click
- alert you when it piles up
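Before an event lands in that failed view, it goes through the retry schedule configured when the job was enqueued (attempts: 8, exponential backoff, delay: 2000). It helps to know how long that takes. The sketch below assumes the delay doubles on each retry starting from the base delay, which matches BullMQ's built-in exponential strategy as I understand it; verify against the version you run. exponentialBackoffSchedule is an illustrative helper, not a BullMQ API.

```javascript
// Delays (ms) between attempts, assuming baseDelay * 2^(retry - 1).
function exponentialBackoffSchedule(attempts, baseDelayMs) {
  const delays = [];
  // `attempts` includes the first try, so there are attempts - 1 retries.
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}

const delays = exponentialBackoffSchedule(8, 2000);
const totalMs = delays.reduce((sum, d) => sum + d, 0);

console.log(delays.join(', ')); // 2000, 4000, 8000, 16000, 32000, 64000, 128000
console.log(totalMs);           // 254000
```

Under that assumption, a persistently failing event is retried for a little over four minutes before it is marked failed, which is a useful number when you set alert thresholds.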
Step 7: Observability — the minimum dashboard I install
If you only implement one “nice to have”, make it this.
Metrics (Prometheus/Sentry/New Relic — pick one)
Track:
- webhook_received_total{provider,type}
- webhook_rejected_total{reason}
- webhook_processing_latency_ms{provider,type}
- webhook_queue_depth + age of oldest job
- webhook_failed_total{provider,type}
Logs
Log with a consistent correlation key:
- provider + eventId
- orderId/paymentId when you map it
Alerts (operator-grade)
Alert when:
- queue depth > X for > 5 minutes
- failed events > Y per hour
- signature rejection spikes
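The "queue depth > X for > 5 minutes" condition is a sustained-threshold check, not a single reading. Here is a rough sketch of that logic, assuming you sample the queue depth periodically. queueDepthAlert and the sample shape are hypothetical; in Prometheus the same idea is usually an alerting rule with a for: 5m clause.

```javascript
// samples: [{ at: <ms timestamp>, depth: <number> }], oldest first.
// Fires only if the backlog stayed above the threshold for the whole window.
function queueDepthAlert(samples, now, threshold, windowMs = 5 * 60 * 1000) {
  if (samples.length === 0) return false;
  const windowStart = now - windowMs;
  // Not enough history to claim the backlog lasted the whole window.
  if (samples[0].at > windowStart) return false;
  const recent = samples.filter((s) => s.at >= windowStart && s.at <= now);
  if (recent.length === 0) return false; // no fresh data: do not guess
  // Any healthy sample inside the window cancels the alert.
  return recent.every((s) => s.depth > threshold);
}
```

Requiring the full window avoids paging someone for a short burst that the workers drain on their own.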
Step 8: Security checklist
A webhook endpoint is public. Treat it accordingly:
- ✅ Verify signatures (Stripe/PayPal)
- ✅ Allowlist IPs only if provider supports it (many don’t reliably)
- ✅ Don’t rely on an unguessable endpoint URL; that is NOT security, signatures are
- ✅ Rate limit carefully (don’t block provider retries)
- ✅ Store minimal headers (avoid storing secrets)
- ✅ Separate secrets per environment (test vs live)
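For the "store minimal headers" item, an allowlist is simpler and safer than trying to strip known-bad headers. The sketch below is one way to do it. The header list is an assumption to adjust per provider, and pickStoredHeaders is a hypothetical helper.

```javascript
// Headers worth keeping for debugging and re-verification.
// The list is an assumption; adjust it per provider.
const STORED_HEADERS = ['stripe-signature', 'content-type', 'user-agent'];

function pickStoredHeaders(headers, allowlist = STORED_HEADERS) {
  const kept = {};
  for (const name of allowlist) {
    // Node lower-cases incoming header names, so plain lookups work.
    if (headers[name] !== undefined) kept[name] = headers[name];
  }
  return kept;
}

const incoming = {
  'stripe-signature': 't=1700000000,v1=abc',
  'content-type': 'application/json',
  authorization: 'Bearer should-not-be-stored',
  cookie: 'session=should-not-be-stored',
};
console.log(Object.keys(pickStoredHeaders(incoming)).join(', '));
// stripe-signature, content-type
```

In the persistence step, this would replace storing the full header object, for example headers: pickStoredHeaders(req.headers).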
Step 9: Replay strategy (the “3 AM fix”)
When things go wrong, you need replay without manual scripts.
I usually build:
- Admin table: received/failed webhooks with filters
- Detail view: raw payload + derived IDs + error
- Button: “Requeue” (adds {provider,eventId} again)
Because the payload is stored, you’re not relying on provider redelivery.
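The action behind the “Requeue” button can stay very small. A sketch, with assumptions stated: requeueWebhookEvent is a hypothetical helper, events is anything with an updateOne(filter, update) that resolves to an object carrying matchedCount (as the MongoDB driver does), and queue is anything with add(name, data) like a BullMQ queue.

```javascript
// Hypothetical admin action: reset the event and put it back on the queue.
async function requeueWebhookEvent(events, queue, { provider, eventId }) {
  // Only received/failed events are replayable; processed ones stay untouched.
  const result = await events.updateOne(
    { provider, eventId, status: { $in: ['received', 'failed'] } },
    { $set: { status: 'received', lastError: null } }
  );
  if (result.matchedCount === 0) return false; // nothing to replay
  await queue.add('processWebhook', { provider, eventId });
  return true;
}
```

Because the worker reads the stored rawBody, the replayed job processes exactly what the provider originally sent.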
Common ecommerce/fintech webhook flows (and what I watch)
Stripe Checkout / Payment Intents
Watch for:
- checkout.session.completed
- payment_intent.succeeded
- charge.refunded
- charge.dispute.created
My rule: create the order before taking money, then attach payment results via webhooks.
Shipping provider events (fulfillment + tracking)
Events are messy. Expect:
- duplicate tracking updates
- out-of-order statuses
Store the timeline and compute the final state from a “max status” model instead of trusting the last event.
Marketplace payouts / ledgers
Anything that affects balances should write a ledger entry with a unique externalRef:
externalRef = provider + ":" + eventId
Then duplicates become harmless.
Quick implementation checklist (copy/paste)
- Webhook endpoint uses raw body (no JSON parsing before signature verification)
- Signature verified; invalid signatures return 400 and are logged
- Event stored in DB with unique index (provider + eventId)
- Endpoint returns 200 quickly (under ~300ms)
- Worker processes event from queue with exponential backoff
- Business logic idempotent (DB unique constraints + processed markers)
- Failures visible + replayable; dead-letter queue or failed status
- Metrics + alerts for queue depth, failures, rejection spikes
Closing note (how I help)
If your webhooks are currently causing duplicate orders, missing payments, stuck subscriptions, or accounting mismatches, I can run a short stabilization sprint:
- audit your current webhook flows end-to-end
- implement the ingress + queue + idempotency pattern
- add monitoring + replay tooling
It’s the kind of cleanup that pays back fast because it directly protects revenue and customer trust.