
Webhooks Done Right: The Production-Safe Playbook (Queues, Retries, Signatures, Idempotency)

April 5, 2026

If your store or fintech app depends on webhooks (Stripe, PayPal, shipping carriers, ERPs), treat them like money-moving infrastructure:

Verify signatures and log what you reject.
Acknowledge fast (200 OK) and do the real work async via a queue.
Store every incoming event (raw payload + headers) for replay/debug.
Make handlers idempotent and safe for out-of-order delivery.
Add retries + dead-letter and alert on backlog/processing failures.

This is the exact pattern I implement in MERN/Next.js builds when webhooks are causing duplicated orders, missing payments, stuck shipments, or angry accounting teams.

Why webhooks go wrong in real production

Webhooks aren’t “requests from another API”. They’re at-least-once notifications sent by an external system with its own retries, timeouts, and ordering.

In ecommerce/fintech, common failure modes look like this:

Duplicate events → duplicate fulfillment, duplicate emails, or double credits.
Out-of-order events → you receive invoice.payment_failed after invoice.paid and your logic flips the subscription to “past_due”.
Slow endpoint → provider retries → you process the same event multiple times.
Bad signature verification (raw body issues) → you silently drop real payments.
No observability → “Stripe says they sent it” turns into a 3-hour blame game.

So the goal isn’t “receive webhooks”. The goal is:

Make webhook ingestion boring and deterministic.

The production architecture (simple, but strict)

Here’s the model I recommend for MERN apps:

Ingress endpoint (Express / Next.js API route)
- verifies signature
- stores the event (raw)
- enqueues a job
- returns 200 quickly
Queue worker (BullMQ)
- processes one event with retry/backoff
- writes domain changes (orders, payments, ledger)
Idempotency + locking
- each provider event takes effect exactly once from your perspective, even when it is delivered or executed more than once
Dead-letter + replay tools
- failed jobs are visible and replayable

This separation is what stops “webhook storm” incidents.

Step 1: Design your webhook data model (MongoDB)

You want a collection that becomes your audit trail and your replay buffer.

Minimal schema:

// MongoDB collection: webhook_events
{
  _id: ObjectId,
  provider: "stripe",           // stripe | paypal | shippo | etc
  eventId: "evt_123",           // provider's event id
  type: "payment_intent.succeeded",
  livemode: false,

  receivedAt: ISODate("..."),
  processedAt: ISODate("...") | null,
  status: "received" | "processing" | "processed" | "failed",

  // store raw for debugging + re-verification
  rawBody: "{...}",
  headers: { ... },

  // helpful for routing
  resourceId: "pi_..." | "ch_..." | null,

  // processing bookkeeping (retry count + last failure)
  processingAttempts: 0,
  lastError: "..." | null
}

Critical indexes:

Unique: { provider: 1, eventId: 1 } (hard stop for duplicates)
Query: { status: 1, receivedAt: -1 }
Optional: { type: 1, receivedAt: -1 }
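
As a rough sketch, these indexes could be declared once at startup. This assumes the official MongoDB Node.js driver, whose collection.createIndexes() accepts an array of { key, name, unique } specs. The function name ensureWebhookIndexes and the index names are made up for illustration.

```javascript
// Index specs for the webhook_events collection described above.
// The index names are illustrative, not a library convention.
const WEBHOOK_EVENT_INDEXES = [
  // Hard stop for duplicates: one document per provider event.
  { key: { provider: 1, eventId: 1 }, name: 'uniq_provider_event', unique: true },
  // Dashboard / replay queries such as "latest failed events".
  { key: { status: 1, receivedAt: -1 }, name: 'status_receivedAt' },
  // Optional: filter by event type.
  { key: { type: 1, receivedAt: -1 }, name: 'type_receivedAt' },
];

// `db` is assumed to be a connected Db instance from the `mongodb` driver.
async function ensureWebhookIndexes(db) {
  return db.collection('webhook_events').createIndexes(WEBHOOK_EVENT_INDEXES);
}
```

Creating an index that already exists with the same definition is a no-op, so calling this on every boot should be safe.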

Why store rawBody? Because when finance says “a charge exists but no order”, you can replay the event without begging Stripe/PayPal support.

Step 2: Ingest webhooks safely (Express + raw body)

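Signature verification often fails because the framework parses the JSON body before your handler runs, and re-serializing the parsed object does not reproduce the original bytes (whitespace, escapes, and number formatting can all change). Stripe signs the exact payload it sent, so you must verify against the raw request body.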
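
To make the raw-body problem concrete, here is a tiny illustration with a made-up payload: parsing and re-serializing yields different text, so any HMAC computed over it cannot match the provider's signature.

```javascript
// The exact text the provider signed (note the spacing, "1.0", and the escaped "é").
const rawBody = '{"id": "evt_123",  "amount": 1.0, "name": "caf\\u00e9"}';

// What you get if a JSON body parser runs first and you serialize the result again.
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(rawBody === reserialized); // false
console.log(reserialized);             // {"id":"evt_123","amount":1,"name":"café"}
```

The data is the same, but the bytes are not, and the signature covers the bytes.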

Express: raw body + Stripe verification

import express from 'express';
import Stripe from 'stripe';

const app = express();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);

// IMPORTANT: raw body for this route
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  const sig = req.headers['stripe-signature'];

  let event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body, // Buffer
      sig,
      process.env.STRIPE_WEBHOOK_SECRET
    );
  } catch (err) {
    // Log + 400 so Stripe knows it's invalid
    console.error('Stripe signature verification failed', err.message);
    return res.status(400).send('Invalid signature');
  }

  // 1) Persist event (idempotent insert)
  // 2) Enqueue job
  // 3) Return 200 quickly

  res.status(200).send('ok');
});

Rule: The webhook route should do almost nothing besides validation + persistence + queueing.

Next.js route handler note

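In Next.js, verify against the raw body too. With the Pages Router, disable the built-in body parser for the route (export const config = { api: { bodyParser: false } }) and read the request stream yourself. With App Router Route Handlers nothing is parsed for you, so await request.text() returns the raw body directly.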
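
A minimal sketch of the App Router variant, assuming a route file such as app/api/webhooks/stripe/route.js. The factory name createStripeWebhookHandler and the onEvent callback are hypothetical. verifyEvent stands in for stripe.webhooks.constructEvent and is injected only so the sketch has no external dependencies.

```javascript
// Builds a POST handler for an App Router route file.
// `verifyEvent(rawBody, signature, secret)` is a stand-in for
// stripe.webhooks.constructEvent; it must throw on an invalid signature.
function createStripeWebhookHandler({ verifyEvent, secret, onEvent }) {
  return async function POST(request) {
    // Route Handlers receive a standard Request; text() gives the unparsed body.
    const rawBody = await request.text();
    const signature = request.headers.get('stripe-signature');

    let event;
    try {
      event = verifyEvent(rawBody, signature, secret);
    } catch (err) {
      console.error('Stripe signature verification failed', err.message);
      return new Response('Invalid signature', { status: 400 });
    }

    // Persist + enqueue, same as the Express route; keep this fast.
    await onEvent(event, rawBody);
    return new Response('ok', { status: 200 });
  };
}
```

In the route file you would then export the result as POST, passing a wrapper around stripe.webhooks.constructEvent as verifyEvent.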

Step 3: Persist first, then enqueue (the “don’t lose money” rule)

Providers will retry delivery, but you still want your own durability.

Pseudo-flow:

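Insert webhook event document using a unique index.
If duplicate key error → it’s a resend. Return 200 without reprocessing, unless the stored event is still “received” (the first delivery was saved but never made it onto the queue).
Enqueue job { provider, eventId } with a deterministic job id.
Return 200.

Example (Mongo + BullMQ conceptually):

try {
  await WebhookEvent.create({
    provider: 'stripe',
    eventId: event.id,
    type: event.type,
    livemode: event.livemode,
    receivedAt: new Date(),
    status: 'received',
    rawBody: req.body.toString('utf8'),
    headers: req.headers,
  });
} catch (e) {
  if (e.code !== 11000) throw e;

  // Duplicate event id: the provider resent it.
  // Only fall through to enqueue if the first delivery was stored but never
  // picked up (e.g. the enqueue below failed after the insert succeeded).
  const existing = await WebhookEvent.findOne({ provider: 'stripe', eventId: event.id });
  if (existing && existing.status !== 'received') {
    return res.status(200).send('ok');
  }
}

await webhookQueue.add(
  'processWebhook',
  { provider: 'stripe', eventId: event.id },
  {
    // Deterministic id: BullMQ skips the add while a job with this id exists.
    // Avoid ':' here; recent BullMQ versions reject it in custom ids.
    jobId: `stripe_${event.id}`,
    removeOnComplete: true,
    attempts: 8,
    backoff: { type: 'exponential', delay: 2000 },
  }
);

return res.status(200).send('ok');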

This is the pattern that prevents “Stripe resent and we double-shipped”.

Step 4: Build idempotent handlers (your code must tolerate repeats)

Idempotency isn’t optional. It’s the difference between:

“Webhook processed twice” → nothing changes
vs.
“Webhook processed twice” → duplicated order + corrupted ledger

Practical idempotency patterns I use

Pattern A: Provider event id as a processed marker

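Before applying business logic, atomically claim the event by flipping its status to processing; only one worker can win the update. One caveat: a worker that crashes mid-job leaves the event stuck in processing, so pair this with a periodic check that moves stale processing events back to failed.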

const ev = await WebhookEvent.findOneAndUpdate(
  { provider, eventId, status: { $in: ['received', 'failed'] } },
  { $set: { status: 'processing' }, $inc: { processingAttempts: 1 } },
  { new: true }
);

if (!ev) {
  // already processing or processed
  return;
}

Then on success:

await WebhookEvent.updateOne(
  { provider, eventId },
  { $set: { status: 'processed', processedAt: new Date() } }
);

Pattern B: Domain-level idempotency key (recommended for money)

For payment flows, use a domain unique constraint like:

payments.stripePaymentIntentId unique
ledger_entries.externalRef unique

Then even if your worker runs twice, the DB stops duplicates.
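
A sketch of what that can look like for ledger writes, assuming a MongoDB collection with a unique index on externalRef. The function name recordLedgerEntry and the entry fields are illustrative, not from any library.

```javascript
// `ledgerEntries` is assumed to be a MongoDB collection (or a Mongoose model's
// underlying collection) with a unique index on `externalRef`.
async function recordLedgerEntry(ledgerEntries, { provider, eventId, amount, currency }) {
  const externalRef = `${provider}:${eventId}`;
  try {
    await ledgerEntries.insertOne({ externalRef, amount, currency, createdAt: new Date() });
    return { applied: true, externalRef };
  } catch (err) {
    // 11000 is MongoDB's duplicate-key error code: the entry already exists,
    // so a repeated run is a no-op instead of a second credit.
    if (err.code === 11000) return { applied: false, externalRef };
    throw err;
  }
}
```

The worker can call this as often as it likes for the same event; only the first call changes the balance.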

Step 5: Handle out-of-order events without breaking state

Webhooks are not guaranteed to arrive in order.

Example: subscription status

If you naively set status from the last event you processed, you can regress state.

Better approach:

Store provider “source of truth” identifiers (subscription id, invoice id)
On important transitions, fetch the current state from the provider API (mind its rate limits)
Or apply state changes using monotonic rules (don’t move “paid” back to “unpaid” unless you see a later invoice/chargeback)

A simple monotonic rule:

paid beats open
refunded beats paid
chargeback beats everything
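
One way to encode that rule is a rank per status, where an incoming status only wins if it outranks the current one. The status names and the helper nextPaymentStatus below are illustrative; map them to whatever your payment model actually stores.

```javascript
// Higher rank may replace lower rank, never the other way around.
const PAYMENT_STATUS_RANK = new Map([
  ['open', 0],
  ['paid', 1],
  ['refunded', 2],
  ['chargeback', 3],
]);

function nextPaymentStatus(current, incoming) {
  if (!PAYMENT_STATUS_RANK.has(incoming)) return current; // unknown status: ignore
  if (!PAYMENT_STATUS_RANK.has(current)) return incoming; // no known state yet
  return PAYMENT_STATUS_RANK.get(incoming) > PAYMENT_STATUS_RANK.get(current)
    ? incoming
    : current;
}

console.log(nextPaymentStatus('paid', 'open'));       // paid
console.log(nextPaymentStatus('paid', 'refunded'));   // refunded
console.log(nextPaymentStatus('chargeback', 'paid')); // chargeback
```

Because the function ignores anything that does not move the state forward, late or repeated events become harmless.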

For ecommerce, similar logic applies to fulfillment:

Don’t mark an order “unfulfilled” after it was “shipped”.

Step 6: Use a queue worker (BullMQ) with retries and a dead-letter view

In production, you want predictable retries and visibility.
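
It helps to know how long a retry policy actually covers. The sketch below works out the schedule for the Step 3 settings (attempts: 8, exponential backoff, 2000 ms base), assuming the delay doubles on each retry starting from the base delay, which is my understanding of BullMQ's built-in exponential strategy. The helper retryDelays is hypothetical.

```javascript
// Delay before each retry, assuming delay = base * 2^(retry - 1).
function retryDelays(attempts, baseDelayMs) {
  const delays = [];
  // `attempts` includes the first try, so there are attempts - 1 retries.
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}

const delays = retryDelays(8, 2000);
const totalMs = delays.reduce((sum, d) => sum + d, 0);

console.log(delays.join(', ')); // 2000, 4000, 8000, 16000, 32000, 64000, 128000
console.log(totalMs);           // 254000
```

Under that assumption the whole retry window is a little over four minutes. If a downstream dependency can be down for longer than that, raise the attempts or the base delay, or lean on the replay tooling from Step 9.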

BullMQ worker skeleton

import { Worker } from 'bullmq';

const worker = new Worker(
  'webhooks',
  async (job) => {
    const { provider, eventId } = job.data;

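    const ev = await WebhookEvent.findOne({ provider, eventId });
    // Missing document, or a duplicate/requeued job for an event already handled
    if (!ev || ev.status === 'processed') return;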

    // Parse raw body if needed
    const payload = JSON.parse(ev.rawBody);

    // Route by type
    switch (ev.type) {
      case 'payment_intent.succeeded':
        await handlePaymentIntentSucceeded(payload);
        break;
      case 'charge.refunded':
        await handleRefund(payload);
        break;
      default:
        // Keep unknown types visible but non-fatal
        console.log('Unhandled webhook type', ev.type);
    }

    await WebhookEvent.updateOne(
      { provider, eventId },
      { $set: { status: 'processed', processedAt: new Date() } }
    );
  },
  { connection: { host: '127.0.0.1', port: 6379 } }
);

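// Fires after every failed attempt, not only the final one.
worker.on('failed', async (job, err) => {
  // BullMQ may pass undefined if the job no longer exists
  if (!job) return;
  await WebhookEvent.updateOne(
    { provider: job.data.provider, eventId: job.data.eventId },
    { $set: { status: 'failed', lastError: err.message } }
  );
});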

Don’t hide failures. A failed webhook should:

be visible in an admin screen
be replayable with one click
alert you when it piles up

Step 7: Observability — the minimum dashboard I install

If you only implement one “nice to have”, make it this.

Metrics (Prometheus/Sentry/New Relic — pick one)

Track:

webhook_received_total{provider,type}
webhook_rejected_total{reason}
webhook_processing_latency_ms{provider,type}
webhook_queue_depth + age of oldest job
webhook_failed_total{provider,type}

Logs

Log with a consistent correlation key:

provider + eventId
orderId/paymentId when you map it

Alerts (operator-grade)

Alert when:

queue depth > X for > 5 minutes
failed events > Y per hour
signature rejection spikes
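
Most monitoring tools have a built-in way to express "above X for N minutes" (Prometheus alert rules have a for clause, for example). If you end up rolling a check yourself, the logic is roughly the following. The sample shape and the function name sustainedAbove are assumptions for illustration.

```javascript
// samples: [{ at: epochMs, depth: number }], oldest first.
// Returns true when depth has stayed above `threshold` for longer than `windowMs`.
function sustainedAbove(samples, threshold, windowMs, now) {
  let breachStart = null; // when the current run above the threshold began
  for (const { at, depth } of samples) {
    if (depth > threshold) {
      if (breachStart === null) breachStart = at;
    } else {
      breachStart = null; // the queue recovered; start over
    }
  }
  return breachStart !== null && now - breachStart > windowMs;
}
```

A short spike resets as soon as one sample drops back under the threshold, so only a backlog that persists triggers the alert.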

Step 8: Security checklist

Webhook endpoints are public. Treat them accordingly:

✅ Verify signatures (Stripe/PayPal)
✅ Allowlist IPs only if the provider publishes and maintains its ranges (many don’t reliably)
✅ Don’t rely on an unguessable endpoint URL; that is NOT security, signatures are
✅ Rate limit carefully (don’t block provider retries)
✅ Store minimal headers (avoid storing secrets)
✅ Separate secrets per environment (test vs live)
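
For the "store minimal headers" item, a small allowlist filter is usually enough. The header list below is an assumption for a Stripe endpoint (the signature header is kept because re-verification needs it), and pickHeaders is a hypothetical helper.

```javascript
// Headers worth keeping for debugging and re-verification.
// Everything else (authorization, cookie, proxy headers, ...) is dropped.
const STORED_HEADERS = ['content-type', 'user-agent', 'stripe-signature'];

function pickHeaders(headers, allowlist = STORED_HEADERS) {
  const kept = {};
  for (const name of allowlist) {
    // Node lower-cases incoming header names, so the allowlist is lower-case too.
    if (headers[name] !== undefined) kept[name] = headers[name];
  }
  return kept;
}
```

When persisting the event you would then store pickHeaders(req.headers) rather than the full req.headers object.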

Step 9: Replay strategy (the “3 AM fix”)

When things go wrong, you need replay without manual scripts.

I usually build:

Admin table: received/failed webhooks with filters
Detail view: raw payload + derived IDs + error
Button: “Requeue” (adds {provider,eventId} again)

Because the payload is stored, you’re not relying on provider redelivery.
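
The handler behind that “Requeue” button can be very small. This is a sketch under assumptions: events is the WebhookEvent model (anything with a Mongo-style updateOne), queue is the BullMQ queue, and requeueWebhookEvent is a name I made up.

```javascript
async function requeueWebhookEvent({ events, queue }, { provider, eventId }) {
  // Only replay events that are not in flight and not already processed.
  const result = await events.updateOne(
    { provider, eventId, status: { $in: ['received', 'failed'] } },
    { $set: { status: 'received' } }
  );
  if (result.matchedCount === 0) {
    return { requeued: false, reason: 'not found or not in a replayable state' };
  }

  // No custom job id here: a manual replay should always create a fresh job.
  await queue.add(
    'processWebhook',
    { provider, eventId },
    { attempts: 8, backoff: { type: 'exponential', delay: 2000 } }
  );
  return { requeued: true };
}
```

Keeping the status filter in the update means a double click, or two operators hitting the button at once, cannot replay an event that has already gone through.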

Common ecommerce/fintech webhook flows (and what I watch)

Stripe Checkout / Payment Intents

Watch for:

checkout.session.completed
payment_intent.succeeded
charge.refunded
charge.dispute.created

My rule: create the order (in a pending state) before taking money, then attach payment results via webhooks.

Shipping provider events (fulfillment + tracking)

Events are messy. Expect:

duplicate tracking updates
out-of-order statuses

Store the timeline and compute the final state from a “max status” model instead of trusting the last event.
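
A “max status” reduction over the stored timeline might look like this. The status names and their ordering are made up for illustration; carriers each have their own vocabulary, so you would map theirs onto your own ranks first.

```javascript
// Illustrative ordering of shipment statuses, lowest to highest.
const SHIPMENT_STATUS_RANK = new Map([
  ['label_created', 0],
  ['in_transit', 1],
  ['out_for_delivery', 2],
  ['delivered', 3],
]);

// timeline: array of { status } in arrival order; duplicates and
// out-of-order entries are expected.
function currentShipmentStatus(timeline) {
  let best = null;
  for (const { status } of timeline) {
    if (!SHIPMENT_STATUS_RANK.has(status)) continue; // skip unmapped statuses
    if (best === null || SHIPMENT_STATUS_RANK.get(status) > SHIPMENT_STATUS_RANK.get(best)) {
      best = status;
    }
  }
  return best;
}

console.log(currentShipmentStatus([
  { status: 'in_transit' },
  { status: 'delivered' },
  { status: 'in_transit' },       // duplicate update
  { status: 'out_for_delivery' }, // arrived late
])); // delivered
```

Since the result depends only on the set of events seen, not their order, duplicates and late arrivals cannot move the order backwards.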

Marketplace payouts / ledgers

Anything that affects balances should write a ledger entry with a unique externalRef:

externalRef = provider + ":" + eventId

Then duplicates become harmless.

Quick implementation checklist (copy/paste)

Webhook endpoint uses raw body (no JSON parsing before signature verification)
Signature verified; invalid signatures return 400 and are logged
Event stored in DB with unique index (provider + eventId)
Endpoint returns 200 quickly (under ~300ms)
Worker processes event from queue with exponential backoff
Business logic idempotent (DB unique constraints + processed markers)
Failures visible + replayable; dead-letter queue or failed status
Metrics + alerts for queue depth, failures, rejection spikes

Closing note (how I help)

If your webhooks are currently causing duplicate orders, missing payments, stuck subscriptions, or accounting mismatches, I can run a short stabilization sprint:

audit your current webhook flows end-to-end
implement the ingress + queue + idempotency pattern
add monitoring + replay tooling

It’s the kind of cleanup that pays back fast because it directly protects revenue and customer trust.

AhmadRaza365