When an order is short on stock at a warehouse, the system creates an internal transfer request to move inventory from another location. The existing flow publishes one Kafka message per order, which triggers one transfer request per order.

During peak hours, a single warehouse can generate hundreds of these messages in minutes. Each one spawns a small transfer note. The warehouse team ends up processing dozens of tiny requests for the same destination when a single consolidated batch would do.

Before:  Order A001 → transfer #1 for WH-42    (100 transfers in 5 min)
         Order A002 → transfer #2 for WH-42
         Order A003 → transfer #3 for WH-42
         ...

After:   WH-42 (1 consolidated batch)         → downstream

The fix: a debounced aggregator that sits between the existing Kafka topic and the downstream consumer. It collects per-order messages, buffers them by warehouse, and publishes a single aggregated message per warehouse after a configurable quiet period.

Architecture

The aggregator is a new service that consumes from the existing per-order topic. Because it only adds a new consumer group to a topic that is already published, the upstream flow does not change at all.

Debounced aggregator architecture: existing flow (CDC, usecase, per-order topic) and new aggregator (ingest, buffer, flush, aggregated topic)

The aggregator has two primary operations: ingest (Kafka → MySQL buffer) and flush (buffer → aggregated Kafka topic). We use location_id as the partition key. Delivery is at-least-once, with batch_id as the idempotency key, and Kafka remains the source of truth throughout.

Ingest

The consumer reads from the per-order topic and UPSERTs the data into a MySQL buffer table. If we encounter the same order for the same warehouse, we simply update the existing row instead of inserting a duplicate.

INSERT INTO event_buffer_table 
    (partition_key, location_id, order_id, payload, status)
VALUES (?, ?, ?, ?, 'PENDING')
ON DUPLICATE KEY UPDATE 
    payload    = IF(status = 'PENDING', VALUES(payload), payload),
    updated_at = IF(status = 'PENDING', NOW(), updated_at);

To maintain an at-least-once guarantee, we commit the Kafka offset manually (enable.auto.commit: false) only after a successful database write. If the consumer crashes, Kafka naturally replays the uncommitted messages upon restart.

Gotcha: If a message arrives for an order whose row is already CLAIMED or SENT, the write must be a no-op. Inserting a second row would cause the same order to appear in multiple downstream batches. To enforce this, the unique key must be exactly (partition_key, location_id, order_id), with status excluded. Had we included status in the unique key, a new ingest could create a duplicate row while the existing one was mid-flush.
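The intended upsert semantics can be sketched in Python with SQLite standing in for MySQL (SQLite uses ON CONFLICT rather than ON DUPLICATE KEY UPDATE; the table and column names follow the text, the payload values are made up):

```python
import sqlite3

# Buffer table with the unique key deliberately excluding `status`.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_buffer_table (
        partition_key TEXT,
        location_id   TEXT,
        order_id      TEXT,
        payload       TEXT,
        status        TEXT DEFAULT 'PENDING',
        UNIQUE (partition_key, location_id, order_id)
    )
""")

def ingest(pk, loc, order, payload):
    # Redeliveries update the payload only while the row is still PENDING;
    # CLAIMED/SENT rows are untouched, so an order never lands in two batches.
    conn.execute("""
        INSERT INTO event_buffer_table (partition_key, location_id, order_id, payload)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (partition_key, location_id, order_id) DO UPDATE SET
            payload = CASE WHEN status = 'PENDING' THEN excluded.payload ELSE payload END
    """, (pk, loc, order, payload))

ingest("WH-42", "WH-42", "A001", "qty=3")
ingest("WH-42", "WH-42", "A001", "qty=5")   # redelivery while PENDING: payload updated
conn.execute("UPDATE event_buffer_table SET status = 'CLAIMED'")
ingest("WH-42", "WH-42", "A001", "qty=9")   # redelivery after claim: ignored
row = conn.execute("SELECT payload, status FROM event_buffer_table").fetchone()
print(row)  # ('qty=5', 'CLAIMED')
```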

Flush

A poller runs on a frequent tick (e.g., every 30 seconds) and applies a hybrid debounce + hard window cap strategy to decide when a warehouse bucket is ready for aggregation.

  • Debounce idle: We flush when no new records arrive for 5 minutes.
  • Hard window cap: We force a flush when the first un-flushed record is older than 30 minutes.

Whichever condition fires first wins. We fundamentally need both checks. Debounce alone could run indefinitely for a warehouse with a steady, slow trickle of orders—there would always be a new record resetting the idle timer, starving the flush entirely. The hard cap fixes that.

Conversely, without debounce we would flush only at the hard cap. A single order could sit alone in the buffer for up to 30 minutes, and under heavy load thousands of records could pile up right until the cap, producing a load spike downstream. The debounce spreads the work by continually flushing quiet buckets. Together, the two checks give efficient batching when traffic is heavy and bounded latency when it's quiet.
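The hybrid trigger reduces to a small pure function. A minimal sketch (function and parameter names are illustrative, thresholds match the text):

```python
from datetime import datetime, timedelta

IDLE = timedelta(minutes=5)       # debounce: flush after 5 quiet minutes
HARD_CAP = timedelta(minutes=30)  # cap: flush once the oldest record is 30 min old

def is_flushable(oldest_created: datetime, newest_updated: datetime, now: datetime) -> bool:
    """A bucket is ready when either the idle debounce or the hard cap fires."""
    idle = now - newest_updated >= IDLE
    capped = now - oldest_created >= HARD_CAP
    return idle or capped

now = datetime(2024, 1, 1, 12, 0)
# Quiet bucket: last record arrived 6 minutes ago -> debounce fires.
assert is_flushable(now - timedelta(minutes=6), now - timedelta(minutes=6), now)
# Steady trickle: fresh records keep resetting the idle timer,
# but the first record is 31 minutes old -> hard cap fires.
assert is_flushable(now - timedelta(minutes=31), now - timedelta(minutes=1), now)
# Fresh, active bucket -> not ready yet.
assert not is_flushable(now - timedelta(minutes=10), now - timedelta(minutes=2), now)
```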

Hybrid debounce sequence diagram: 5 minute idle vs 30 minute hard cap triggering claim and flush

The flush operation runs in two distinct stages, each backed by two queries:

Stage 1: Claim ready batches. We discover buckets that meet our debounce or window criteria and atomically claim their rows by assigning a unique batch_id. If a warehouse bucket exceeds our max_batch_size (e.g., 500 rows), we claim only the first N rows; the rest patiently wait for the next tick. This acts as a natural backpressure mechanism.

-- Step 1.1: Find flushable location buckets
SELECT location_id
FROM event_buffer_table
WHERE status = 'PENDING' 
GROUP BY location_id
HAVING MAX(updated_at) < NOW() - INTERVAL 5 MINUTE    -- debounce idle
    OR MIN(created_at) < NOW() - INTERVAL 30 MINUTE;  -- hard window cap

-- Step 1.2: Claim rows atomically (capped by max_batch_size).
-- Bind one UUID generated in application code: calling UUID() here
-- would be evaluated per row, giving every row a different batch_id.
UPDATE event_buffer_table
SET status   = 'CLAIMED',
    batch_id = ?
WHERE status = 'PENDING' AND location_id = ?
ORDER BY created_at ASC
LIMIT 500;
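The claim step can be demonstrated end to end, again with SQLite as a stand-in (SQLite's default build lacks UPDATE ... ORDER BY ... LIMIT, so the cap is applied through a subquery; in MySQL the UPDATE can carry ORDER BY/LIMIT directly):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_buffer_table (
        id INTEGER PRIMARY KEY, location_id TEXT, status TEXT,
        batch_id TEXT, created_at INTEGER
    )
""")
# Four pending rows for one warehouse, ordered by arrival time.
conn.executemany(
    "INSERT INTO event_buffer_table (location_id, status, created_at) VALUES (?, 'PENDING', ?)",
    [("WH-42", t) for t in range(4)],
)

MAX_BATCH_SIZE = 3
batch_id = str(uuid.uuid4())  # generated once per claim, shared by all claimed rows
conn.execute("""
    UPDATE event_buffer_table
    SET status = 'CLAIMED', batch_id = ?
    WHERE id IN (
        SELECT id FROM event_buffer_table
        WHERE status = 'PENDING' AND location_id = ?
        ORDER BY created_at ASC
        LIMIT ?
    )
""", (batch_id, "WH-42", MAX_BATCH_SIZE))

claimed = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT batch_id) FROM event_buffer_table WHERE status = 'CLAIMED'"
).fetchone()
print(claimed)  # (3, 1): three oldest rows claimed, all under one batch_id
```

The fourth row stays PENDING and waits for the next tick, which is the backpressure behavior described above.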

Stage 2: Flush claimed batches. We read the claimed rows, aggregate each batch into a single payload, produce it to Kafka, and mark the rows as sent. We read every claimed batch in the system, not just the ones claimed in the current tick: batches that failed to produce on a prior tick (e.g., due to temporary Kafka unavailability) remain claimed and are retried here.

-- Step 2.1: Read claimed batches for aggregation.
-- MySQL does not allow LIMIT directly inside an IN subquery,
-- so the batch list is wrapped in a derived table.
SELECT *
FROM event_buffer_table
WHERE batch_id IN (
    SELECT batch_id FROM (
        SELECT DISTINCT batch_id
        FROM event_buffer_table
        WHERE status = 'CLAIMED'
        LIMIT 100
    ) AS ready_batches
);

-- Step 2.2: Mark sent after successful downstream publish
UPDATE event_buffer_table
SET status = 'SENT' 
WHERE batch_id IN (?);

The batch_id Trick

We assign the batch_id at claim time and never alter it until the record is definitively marked as sent. If the process crashes directly after the message is produced to Kafka but before the MySQL marking transaction commits, the next flush cycle safely re-reads the claimed rows and re-produces them with the exact same batch_id.

This makes batch_id a robust idempotency key: downstream consumers deduplicate incoming batch messages on it, turning at-least-once delivery into effectively-once processing.
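The downstream side of that contract is small. A sketch of idempotent consumption, with an in-memory set standing in for whatever durable dedup store a real consumer would use (names are illustrative):

```python
# Deduplicate on batch_id: a batch redelivered by a flush retry is processed once.
processed: set[str] = set()
results: list[str] = []

def handle_batch(message: dict) -> None:
    if message["batch_id"] in processed:
        return                       # duplicate delivery: skip
    processed.add(message["batch_id"])
    results.append(message["location_id"])  # placeholder for real processing

msg = {"batch_id": "b-1", "location_id": "WH-42"}
handle_batch(msg)
handle_batch(msg)                    # redelivery after a crashed flush tick
print(results)  # ['WH-42']
```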

Buffer and Cleanup

Because we retain sent rows as an audit log, the table grows without bound. Bulk DELETE statements at that scale would fragment InnoDB pages and hurt throughput.

Instead, we use MySQL range partitioning by month and drop the oldest partitions from a scheduled job. Dropping a partition frees disk space immediately and cleanly at the file-system level. Before a month's partition is dropped, a guard rail verifies that it contains zero unprocessed rows (status != 'SENT').

Downstream

The dedicated downstream consumer receives one consolidated message per warehouse on every successful flush. The message's partition key is location_id, and its payload carries the batch_id, a flushed_at timestamp, the order_ids, and an items array with the short-stock quantities aggregated by SKU.
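A sketch of building that consolidated payload from claimed buffer rows; the top-level field names follow the text, while the per-row sku/qty fields and the sample values are assumptions:

```python
from collections import defaultdict

# Claimed rows for one warehouse (illustrative data).
rows = [
    {"order_id": "A001", "sku": "SKU-1", "qty": 2},
    {"order_id": "A002", "sku": "SKU-1", "qty": 1},
    {"order_id": "A003", "sku": "SKU-9", "qty": 4},
]

def aggregate(location_id: str, batch_id: str, flushed_at: str, rows: list) -> dict:
    # Sum short-stock quantities per SKU across all orders in the batch.
    qty_by_sku: dict = defaultdict(int)
    for r in rows:
        qty_by_sku[r["sku"]] += r["qty"]
    return {
        "location_id": location_id,
        "batch_id": batch_id,        # idempotency key for downstream dedup
        "flushed_at": flushed_at,
        "order_ids": [r["order_id"] for r in rows],
        "items": [{"sku": s, "qty": q} for s, q in qty_by_sku.items()],
    }

payload = aggregate("WH-42", "b-1", "2024-01-01T12:00:00Z", rows)
print(payload["items"])  # [{'sku': 'SKU-1', 'qty': 3}, {'sku': 'SKU-9', 'qty': 4}]
```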

TL;DR

  • Ingest: Consumes Kafka messages into a MySQL buffer via UPSERT, committing offsets manually only after a durable write. The unique key excludes status so the same order can never land in multiple batches.
  • Flush: Runs in two stages: (1) select and atomically claim eligible batches; (2) aggregate and publish downstream. A stable batch_id makes retries safe and idempotent.
  • Resilience: Kafka is the source of truth; the MySQL buffer is derived state. Range partitioning keeps cleanup cheap and fragmentation-free.