AhmadRaza365

Webhooks Done Right: The Production-Safe Playbook (Queues, Retries, Signatures, Idempotency)

April 5, 2026

If your store or fintech app depends on webhooks (Stripe, PayPal, shipping carriers, ERPs), treat them like money-moving infrastructure:

  • Verify signatures and log what you reject.
  • Acknowledge fast (200 OK) and do the real work async via a queue.
  • Store every incoming event (raw payload + headers) for replay/debug.
  • Make handlers idempotent and safe for out-of-order delivery.
  • Add retries + dead-letter and alert on backlog/processing failures.

This is the exact pattern I implement in MERN/Next.js builds when webhooks are causing duplicated orders, missing payments, stuck shipments, or angry accounting teams.


Why webhooks go wrong in real production

Webhooks aren’t “requests from another API”. They’re at-least-once notifications sent by an external system with its own retries, timeouts, and ordering.

In ecommerce/fintech, common failure modes look like this:

  1. Duplicate events → duplicate fulfillment, duplicate emails, or double credits.
  2. Out-of-order events → you receive invoice.payment_failed after invoice.paid and your logic flips the subscription to “past_due”.
  3. Slow endpoint → provider retries → you process the same event multiple times.
  4. Bad signature verification (raw body issues) → you silently drop real payments.
  5. No observability → “Stripe says they sent it” turns into a 3-hour blame game.

So the goal isn’t “receive webhooks”. The goal is:

Make webhook ingestion boring and deterministic.


The production architecture (simple, but strict)

Here’s the model I recommend for MERN apps:

  1. Ingress endpoint (Express / Next.js API route)
    • verifies signature
    • stores the event (raw)
    • enqueues a job
    • returns 200 quickly
  2. Queue worker (BullMQ)
    • processes one event with retry/backoff
    • writes domain changes (orders, payments, ledger)
  3. Idempotency + locking
    • each provider event takes effect exactly once in your system, even when it is delivered more than once
  4. Dead-letter + replay tools
    • failed jobs are visible and replayable

This separation is what stops “webhook storm” incidents.


Step 1: Design your webhook data model (MongoDB)

You want a collection that becomes your audit trail and your replay buffer.

Minimal schema:

// MongoDB collection: webhook_events
{
  _id: ObjectId,
  provider: "stripe",           // stripe | paypal | shippo | etc
  eventId: "evt_123",           // provider's event id
  type: "payment_intent.succeeded",
  livemode: false,

  receivedAt: ISODate("..."),
  processedAt: ISODate("...") | null,
  status: "received" | "processing" | "processed" | "failed",

  // store raw for debugging + re-verification
  rawBody: "{...}",
  headers: { ... },

  // helpful for routing
  resourceId: "pi_..." | "ch_..." | null,

  // idempotency
  processingAttempts: 0,
  lastError: "..." | null
}

Critical indexes:

  • Unique: { provider: 1, eventId: 1 } (hard stop for duplicates)
  • Query: { status: 1, receivedAt: -1 }
  • Optional: { type: 1, receivedAt: -1 }
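
A minimal sketch of creating these indexes, assuming the official MongoDB Node.js driver and a collection handle supplied by the caller. The helper name ensureWebhookIndexes is hypothetical; the index keys are the ones listed above.

```javascript
// Hypothetical helper. `collection` is assumed to be a MongoDB driver
// Collection for webhook_events, e.g. db.collection('webhook_events').
async function ensureWebhookIndexes(collection) {
  // Hard stop for duplicates: one document per provider event.
  await collection.createIndex({ provider: 1, eventId: 1 }, { unique: true });
  // Supports "recent received/failed events" queries for dashboards and replay.
  await collection.createIndex({ status: 1, receivedAt: -1 });
  // Optional: browse recent events by type.
  await collection.createIndex({ type: 1, receivedAt: -1 });
}
```

Calling this once at startup is usually enough, since creating an index that already exists with the same definition is a no-op.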

Why store rawBody? Because when finance says “a charge exists but no order”, you can replay the event without begging Stripe/PayPal support.


Step 2: Ingest webhooks safely (Express + raw body)

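Signature verification often fails because the signature is computed over the exact bytes the provider sent. Once a framework parses the JSON and you re-serialize it, whitespace, escaping, or number formatting can differ and the signature no longer matches. For Stripe, you must verify using the raw request body.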

Express: raw body + Stripe verification

import express from 'express';
import Stripe from 'stripe';

const app = express();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);

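// IMPORTANT: raw body for this route. Register it before any global express.json(),
// otherwise req.body arrives already parsed and verification fails.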
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  const sig = req.headers['stripe-signature'];

  let event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body, // Buffer
      sig,
      process.env.STRIPE_WEBHOOK_SECRET
    );
  } catch (err) {
    // Log + 400 so Stripe knows it's invalid
    console.error('Stripe signature verification failed', err.message);
    return res.status(400).send('Invalid signature');
  }

  // 1) Persist event (idempotent insert)
  // 2) Enqueue job
  // 3) Return 200 quickly

  res.status(200).send('ok');
});

Rule: The webhook route should do almost nothing besides validation + persistence + queueing.

Next.js route handler note

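In Next.js Pages Router API routes, you must disable the built-in body parser (export const config = { api: { bodyParser: false } }) and read the raw stream yourself. App Router Route Handlers make this cleaner because await request.text() gives you the unparsed body, but you still need to pass that raw string to the verifier.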
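
Here is a rough sketch of the App Router variant. It is written as a factory so it stays free of framework imports: createStripeRoute, constructEvent, and persistAndEnqueue are hypothetical names, and constructEvent is assumed to wrap stripe.webhooks.constructEvent with your webhook secret. It relies on the standard Request/Response objects that Route Handlers receive and return.

```javascript
// Hypothetical factory. In app/api/webhooks/stripe/route.js you would export:
//   export async function POST(request) { return handler(request); }
function createStripeRoute({ constructEvent, persistAndEnqueue }) {
  return async function handler(request) {
    // Read the body as text BEFORE anything parses it as JSON.
    const rawBody = await request.text();
    const signature = request.headers.get('stripe-signature');

    let event;
    try {
      // Assumed to throw when the signature does not match the raw body.
      event = constructEvent(rawBody, signature);
    } catch (err) {
      console.error('Stripe signature verification failed', err.message);
      return new Response('Invalid signature', { status: 400 });
    }

    // Same rule as the Express route: persist, enqueue, return quickly.
    await persistAndEnqueue(event, rawBody);
    return new Response('ok', { status: 200 });
  };
}
```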


Step 3: Persist first, then enqueue (the “don’t lose money” rule)

Providers will retry delivery, but you still want your own durability.

Pseudo-flow:

  1. Insert webhook event document using a unique index.
  2. If duplicate key error → it’s a resend → return 200 (don’t reprocess).
  3. Enqueue job { provider, eventId }.
  4. Return 200.

Example (Mongo + BullMQ conceptually):

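// Runs inside the Express route from Step 2, after signature verification.
// Assumes WebhookEvent is a Mongoose model and webhookQueue is a BullMQ Queue.
try {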
  await WebhookEvent.create({
    provider: 'stripe',
    eventId: event.id,
    type: event.type,
    livemode: event.livemode,
    receivedAt: new Date(),
    status: 'received',
    rawBody: req.body.toString('utf8'),
    headers: req.headers,
  });
} catch (e) {
  if (e.code === 11000) {
    // duplicate event id — already received
    return res.status(200).send('ok');
  }
  throw e;
}

await webhookQueue.add(
  'processWebhook',
  { provider: 'stripe', eventId: event.id },
  {
    removeOnComplete: true,
    attempts: 8,
    backoff: { type: 'exponential', delay: 2000 },
  }
);

return res.status(200).send('ok');

This is the pattern that prevents “Stripe resent and we double-shipped”.
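
One gap worth covering: if the process dies after the insert but before the enqueue, the provider's retry hits the duplicate-key branch and the event is never queued. A small periodic sweeper can pick those up. The sketch below is an assumption on my part, not part of any library: requeueStrandedEvents is a hypothetical name, and WebhookEvent and webhookQueue are the same model and queue used above, passed in explicitly.

```javascript
// Hypothetical sweeper: re-enqueue events that were stored but still sit in
// "received" after a grace period. Only safe when the worker claims events
// atomically (Pattern A in Step 4), because an event that is queued but not
// yet picked up also has status "received" and may be enqueued twice.
async function requeueStrandedEvents({
  WebhookEvent,
  webhookQueue,
  olderThanMs = 5 * 60 * 1000, // illustrative grace period
  now = new Date(),
}) {
  const cutoff = new Date(now.getTime() - olderThanMs);
  const stranded = await WebhookEvent.find({
    status: 'received',
    receivedAt: { $lt: cutoff },
  });

  for (const ev of stranded) {
    await webhookQueue.add(
      'processWebhook',
      { provider: ev.provider, eventId: ev.eventId },
      { removeOnComplete: true, attempts: 8, backoff: { type: 'exponential', delay: 2000 } }
    );
  }
  return stranded.length;
}
```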


Step 4: Build idempotent handlers (your code must tolerate repeats)

Idempotency isn’t optional. It’s the difference between:

  • “Webhook processed twice” → nothing changes
  • vs.
  • “Webhook processed twice” → duplicated order + corrupted ledger

Practical idempotency patterns I use

Pattern A: Provider event id as a processed marker

Before applying business logic, atomically mark the event as processing/processed.

const ev = await WebhookEvent.findOneAndUpdate(
  { provider, eventId, status: { $in: ['received', 'failed'] } },
  { $set: { status: 'processing' }, $inc: { processingAttempts: 1 } },
  { new: true }
);

if (!ev) {
  // already processing or processed
  return;
}

Then on success:

await WebhookEvent.updateOne(
  { provider, eventId },
  { $set: { status: 'processed', processedAt: new Date() } }
);

Pattern B: Domain-level idempotency key (recommended for money)

For payment flows, use a domain unique constraint like:

  • payments.stripePaymentIntentId unique
  • ledger_entries.externalRef unique

Then even if your worker runs twice, the DB stops duplicates.
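
As a sketch of Pattern B, assuming a MongoDB driver collection with a unique index on stripePaymentIntentId: the insert either succeeds or fails with duplicate key error 11000, which the handler treats as "already done". The helper name recordPaymentOnce and the stored fields are illustrative.

```javascript
// Hypothetical helper. `payments` is assumed to be a MongoDB driver Collection
// with a unique index on stripePaymentIntentId.
async function recordPaymentOnce(payments, paymentIntent) {
  try {
    await payments.insertOne({
      stripePaymentIntentId: paymentIntent.id,
      amount: paymentIntent.amount,
      currency: paymentIntent.currency,
      createdAt: new Date(),
    });
    return { created: true };
  } catch (err) {
    // 11000 = duplicate key: this payment was already recorded, so a second
    // run of the worker becomes a harmless no-op.
    if (err.code === 11000) return { created: false };
    throw err;
  }
}
```

The caller can use the created flag to decide whether follow-up side effects (emails, fulfillment) should run at all.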


Step 5: Handle out-of-order events without breaking state

Webhooks are not guaranteed to arrive in order.

Example: subscription status

If you naively set status from the last event you processed, you can regress state.

Better approach:

  • Store provider “source of truth” identifiers (subscription id, invoice id)
  • On important transitions, fetch current state from provider API (with rate limits)
  • Or apply state changes using monotonic rules (don’t move “paid” back to “unpaid” unless you see a later invoice/chargeback)

A simple monotonic rule:

  • paid beats open
  • refunded beats paid
  • chargeback beats everything

For ecommerce, similar logic applies to fulfillment:

  • Don’t mark an order “unfulfilled” after it was “shipped”.
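
The monotonic rule above can be sketched as a rank table plus a function that never moves a status backwards. The names STATUS_RANK and nextStatus are hypothetical, and this deliberately ignores the "later invoice" exception, which needs per-invoice tracking.

```javascript
// Higher rank wins: paid beats open, refunded beats paid, chargeback beats all.
const STATUS_RANK = new Map([
  ['open', 0],
  ['paid', 1],
  ['refunded', 2],
  ['chargeback', 3],
]);

// Returns the status to store after seeing `incoming`, never regressing.
function nextStatus(current, incoming) {
  if (!STATUS_RANK.has(incoming)) return current; // unknown status: ignore it
  if (!STATUS_RANK.has(current)) return incoming; // nothing valid stored yet
  return STATUS_RANK.get(incoming) > STATUS_RANK.get(current) ? incoming : current;
}
```

The same shape works for fulfillment: give "shipped" a higher rank than "unfulfilled" and a late "unfulfilled" event is ignored.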

Step 6: Use a queue worker (BullMQ) with retries and a dead-letter view

In production, you want predictable retries and visibility.

BullMQ worker skeleton

import { Worker } from 'bullmq';

const worker = new Worker(
  'webhooks',
  async (job) => {
    const { provider, eventId } = job.data;

    const ev = await WebhookEvent.findOne({ provider, eventId });
    if (!ev) return;

    // Parse raw body if needed
    const payload = JSON.parse(ev.rawBody);

    // Route by type
    switch (ev.type) {
      case 'payment_intent.succeeded':
        await handlePaymentIntentSucceeded(payload);
        break;
      case 'charge.refunded':
        await handleRefund(payload);
        break;
      default:
        // Keep unknown types visible but non-fatal
        console.log('Unhandled webhook type', ev.type);
    }

    await WebhookEvent.updateOne(
      { provider, eventId },
      { $set: { status: 'processed', processedAt: new Date() } }
    );
  },
  { connection: { host: '127.0.0.1', port: 6379 } }
);

worker.on('failed', async (job, err) => {
  // BullMQ types `job` as possibly undefined here, so guard before using it.
  if (!job) return;
  await WebhookEvent.updateOne(
    { provider: job.data.provider, eventId: job.data.eventId },
    { $set: { status: 'failed', lastError: err.message } }
  );
});

Don’t hide failures. A failed webhook should:

  • be visible in an admin screen
  • be replayable with one click
  • alert you when it piles up

Step 7: Observability — the minimum dashboard I install

If you only implement one “nice to have”, make it this.

Metrics (Prometheus/Sentry/New Relic — pick one)

Track:

  • webhook_received_total{provider,type}
  • webhook_rejected_total{reason}
  • webhook_processing_latency_ms{provider,type}
  • webhook_queue_depth + age of oldest job
  • webhook_failed_total{provider,type}

Logs

Log with a consistent correlation key:

  • provider + eventId
  • orderId/paymentId when you map it
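
A small sketch of what a consistent correlation key can look like in structured logs. The function name webhookLogLine and the field name webhookKey are my own choices; any logger that accepts JSON would do.

```javascript
// Builds one JSON log line keyed by provider + eventId, with domain IDs
// added once the event has been mapped to an order or payment.
function webhookLogLine(level, message, fields) {
  const { provider, eventId, orderId, paymentId } = fields;
  const entry = {
    level,
    message,
    webhookKey: `${provider}:${eventId}`, // single field to grep or filter on
    provider,
    eventId,
  };
  if (orderId) entry.orderId = orderId;
  if (paymentId) entry.paymentId = paymentId;
  return JSON.stringify(entry);
}

// Usage:
// console.log(webhookLogLine('info', 'webhook processed',
//   { provider: 'stripe', eventId: 'evt_123', orderId: 'ord_9' }));
```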

Alerts (operator-grade)

Alert when:

  • queue depth > X for > 5 minutes
  • failed events > Y per hour
  • signature rejection spikes
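
The "queue depth > X for > 5 minutes" rule is about a sustained breach, not a single spike. Here is one way to express that, assuming you already sample queue depth periodically; backlogAlertFiring and the sample shape { at, depth } are hypothetical, and the threshold stays a parameter.

```javascript
// samples: [{ at: epochMs, depth: number }, ...] in any order.
// Fires only if depth has stayed above maxDepth continuously for windowMs.
function backlogAlertFiring(samples, { maxDepth, windowMs, now }) {
  const sorted = [...samples].sort((a, b) => a.at - b.at);
  let breachStart = null;
  for (const sample of sorted) {
    if (sample.depth > maxDepth) {
      // Start (or continue) the current breach streak.
      if (breachStart === null) breachStart = sample.at;
    } else {
      // Any healthy sample resets the streak.
      breachStart = null;
    }
  }
  return breachStart !== null && now - breachStart >= windowMs;
}
```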

Step 8: Security checklist

A webhook endpoint is public. Treat it accordingly:

  • ✅ Verify signatures (Stripe/PayPal)
  • ✅ Allowlist IPs only if the provider publishes stable ranges (many don’t reliably)
  • ✅ An unguessable endpoint URL is NOT security; signatures are
  • ✅ Rate limit carefully (don’t block provider retries)
  • ✅ Store minimal headers (avoid storing secrets)
  • ✅ Separate secrets per environment (test vs live)
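
For the "store minimal headers" item, an allowlist is the simplest approach: keep only what you need for debugging and drop everything else before the insert. The list below is an example, not a recommendation for every provider, and pickStoredHeaders is a hypothetical helper.

```javascript
// Example allowlist; adjust per provider. Node lowercases incoming header names.
const STORED_HEADERS = ['stripe-signature', 'content-type', 'user-agent'];

// Returns a copy of `headers` containing only allowlisted names, so things
// like Authorization or Cookie never reach the webhook_events collection.
function pickStoredHeaders(headers, allowlist = STORED_HEADERS) {
  const kept = {};
  for (const name of allowlist) {
    const key = name.toLowerCase();
    if (headers[key] !== undefined) kept[key] = headers[key];
  }
  return kept;
}
```

In the Step 3 insert, that would mean storing pickStoredHeaders(req.headers) instead of the full req.headers object.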

Step 9: Replay strategy (the “3 AM fix”)

When things go wrong, you need replay without manual scripts.

I usually build:

  • Admin table: received/failed webhooks with filters
  • Detail view: raw payload + derived IDs + error
  • Button: “Requeue” (adds {provider,eventId} again)

Because the payload is stored, you’re not relying on provider redelivery.
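
The "Requeue" button can be backed by something as small as the sketch below. It assumes the same Mongoose model and BullMQ queue as earlier, passed in explicitly; requeueWebhookEvent is a hypothetical name. It only touches events in received or failed state, so a processed event is never replayed by accident.

```javascript
// Hypothetical admin action: reset the event and add the job again.
async function requeueWebhookEvent({ WebhookEvent, webhookQueue }, provider, eventId) {
  const result = await WebhookEvent.updateOne(
    { provider, eventId, status: { $in: ['received', 'failed'] } },
    { $set: { status: 'received', lastError: null } }
  );
  // Nothing matched: unknown event, or it is processing/processed already.
  if (result.matchedCount === 0) return false;

  await webhookQueue.add(
    'processWebhook',
    { provider, eventId },
    { removeOnComplete: true, attempts: 8, backoff: { type: 'exponential', delay: 2000 } }
  );
  return true;
}
```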


Common ecommerce/fintech webhook flows (and what I watch)

Stripe Checkout / Payment Intents

Watch for:

  • checkout.session.completed
  • payment_intent.succeeded
  • charge.refunded
  • charge.dispute.created

My rule: create the order before taking money, then attach payment results via webhooks.

Shipping provider events (fulfillment + tracking)

Events are messy. Expect:

  • duplicate tracking updates
  • out-of-order statuses

Store the timeline and compute the final state from a “max status” model instead of trusting the last event.

Marketplace payouts / ledgers

Anything that affects balances should write a ledger entry with a unique externalRef:

  • externalRef = provider + ":" + eventId

Then duplicates become harmless.


Quick implementation checklist (copy/paste)

  • Webhook endpoint uses raw body (no JSON parsing before signature verification)
  • Signature verified; invalid signatures return 400 and are logged
  • Event stored in DB with unique index (provider + eventId)
  • Endpoint returns 200 quickly (under ~300ms)
  • Worker processes event from queue with exponential backoff
  • Business logic idempotent (DB unique constraints + processed markers)
  • Failures visible + replayable; dead-letter queue or failed status
  • Metrics + alerts for queue depth, failures, rejection spikes

Closing note (how I help)

If your webhooks are currently causing duplicate orders, missing payments, stuck subscriptions, or accounting mismatches, I can run a short stabilization sprint:

  • audit your current webhook flows end-to-end
  • implement the ingress + queue + idempotency pattern
  • add monitoring + replay tooling

It’s the kind of cleanup that pays back fast because it directly protects revenue and customer trust.
