GUIDEDECK · for data in motion

Message Queues &
Streaming — moving events
between systems that don't wait on each other.

A 36-minute working session on asynchronous messaging: why we put a broker between services, the difference between a work queue and a replayable log, Kafka's topics and partitions, the delivery guarantees that actually matter, and the patterns that keep a stream honest in production.

~36 MIN · DATA TEAM · KAFKA-FOCUSED
01 · Why queues & streams exist 4 min

When one service calls another directly,
they rise and fall together.

A checkout finishes. Now five things need to happen — charge the card, email a receipt, update inventory, notify analytics, warm the recommendations. If checkout calls each of those in turn and waits, it is only as fast as the slowest one and only as available as the flakiest one. A message broker breaks that chain: checkout announces "order placed" and moves on.

Message broker: a server that sits between systems and carries messages for them. A producer (or publisher) writes a message; the broker holds it; one or more consumers read it on their own schedule. Because neither side calls the other directly, this is asynchronous messaging — the sender doesn't block waiting for the receiver.

Synchronous vs. asynchronous

  • Synchronous — the caller sends a request and waits for the reply. Simple, but the caller's fate is tied to the callee's speed and uptime.
  • Asynchronous — the caller hands a message to the broker and returns immediately. The work still happens, just later, decoupled in time.
  • The trade: you gain resilience and scale, you give up the instant answer — the consumer's result arrives out-of-band.
[Diagram. SYNC, a fragile chain: Checkout → Email → Inventory → Analytics; one slow link blocks them all. ASYNC, announce once and fan out: Checkout → broker (order.placed) → Email, Inventory, Analytics.]

Direct calls chain services into one fragile line. A broker lets checkout announce the event once and each consumer react on its own time.

Direct call — coupled in time
async function placeOrder(o) {
  await save(o)
  await email.sendReceipt(o)   // blocks…
  await inventory.reserve(o)   // blocks…
  await analytics.track(o)     // if this is down, checkout fails
}
Publish an event — decoupled
async function placeOrder(o) {
  await save(o)
  await broker.publish("order.placed", o)   // one write, returns fast
}
// email, inventory, analytics each consume on their own
// one of them being down can't fail the checkout

Like the difference between phoning each colleague and waiting on hold, versus posting one note on the team board that everyone reads when they're free.
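
To make the decoupling concrete, here is a minimal in-memory sketch. MiniBroker and its methods (publish, subscribe, deliverAll) are hypothetical names invented for this illustration, not a real client API, and the sketch carries a single topic for brevity. It only shows the shape of the idea: the producer returns as soon as the broker has the message, and a failing consumer is recorded without reaching back to the producer.

```javascript
// Hypothetical in-memory broker (illustrative names, single topic for brevity).
class MiniBroker {
  constructor() {
    this.pending = [];     // messages accepted but not yet delivered
    this.subscribers = []; // { name, handler } for every consumer
  }

  subscribe(name, handler) {
    this.subscribers.push({ name, handler });
  }

  // The producer's whole job: hand the message over and return.
  publish(topic, payload) {
    this.pending.push({ topic, payload });
  }

  // Runs later, on the consumers' schedule. A failing consumer is
  // recorded and skipped; it cannot fail the producer after the fact.
  deliverAll() {
    const failures = [];
    for (const msg of this.pending.splice(0)) {
      for (const { name, handler } of this.subscribers) {
        try {
          handler(msg.payload);
        } catch (err) {
          failures.push({ consumer: name, error: err.message });
        }
      }
    }
    return failures;
  }
}

const broker = new MiniBroker();
const emailed = [];
const reserved = [];

broker.subscribe("email", (o) => emailed.push(o.id));
broker.subscribe("inventory", (o) => reserved.push(o.id));
broker.subscribe("analytics", () => {
  throw new Error("analytics is down");
});

function placeOrder(order) {
  broker.publish("order.placed", order); // one write, no waiting on consumers
  return "ok";
}

const checkoutResult = placeOrder({ id: "o-1" });
const failures = broker.deliverAll();
console.log(checkoutResult, emailed, reserved, failures);
```

Checkout has already returned "ok" by the time analytics throws; email and inventory still get the order. A real broker would also persist the message and retry the failed consumer, which this sketch leaves out.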

02 · Two shapes of broker 5 min

A queue hands out work once.
A log remembers everything.

"Message broker" covers two genuinely different designs. Pick the wrong one and you'll fight the tool forever. The split is simple: does a message disappear once someone handles it, or does it stay so anyone can read it again?

Work queue: each message is delivered to one consumer and then removed; the broker tracks what's been done.
Log (event log / commit log): an append-only sequence. Messages are kept for a retention window and consumers track their own position with an offset, so the same message can be read by many consumers and re-read later.
[Diagram. WORK QUEUE, consume once then gone: m3, m2, m1 → worker; m1 acked → deleted. LOG, append-only and replayable: offsets 0 1 2 3 4 5 → append; reader A @3, reader B @1; each reader keeps its own offset.]

The queue deletes m1 once the worker acks it. The log keeps every message; readers A and B sit at different offsets and can rewind.

What each is good at

  • Queue — task distribution. "Resize this image", "send this email". One job, one worker, then it's done. Add workers to drain a backlog faster.
  • Log — an event history many systems care about. Email, inventory and analytics all read the same order.placed stream independently.
  • Replay is the log's superpower: add a new consumer tomorrow and let it read from offset 0 to rebuild its state from the full history.
  • A queue can't replay — once a message is acked and deleted, it's gone. That's a feature for tasks, a limit for events.
Offset: a message's sequential position in the log (0, 1, 2, …). A consumer "commits" the offset it has processed up to; that bookmark is the only thing tying a consumer to the log. Move the bookmark back and you replay; the broker itself never changes.
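
The queue-versus-log split can be sketched in a few lines. WorkQueue and EventLog below are hypothetical in-memory classes written for this illustration (no retention window, no persistence); the bookmark stored per reader is the next offset to read, which is one common convention.

```javascript
// Work queue: a message is removed once a worker acknowledges it.
class WorkQueue {
  constructor() {
    this.messages = [];
  }
  send(msg) {
    this.messages.push(msg);
  }
  receive() {
    return this.messages[0]; // peek at the next job; undefined when empty
  }
  ack() {
    this.messages.shift(); // acked => deleted for good
  }
}

// Log: append-only; each reader owns a bookmark (its committed offset).
class EventLog {
  constructor() {
    this.entries = [];
    this.committed = new Map(); // reader name -> next offset to read
  }
  append(msg) {
    this.entries.push(msg);
    return this.entries.length - 1; // the new entry's offset
  }
  // Read everything from the reader's bookmark onward, then advance it.
  poll(reader) {
    const from = this.committed.get(reader) ?? 0;
    const batch = this.entries.slice(from);
    this.committed.set(reader, this.entries.length);
    return batch;
  }
  // Replay = move the bookmark back; the entries are untouched.
  seek(reader, offset) {
    this.committed.set(reader, offset);
  }
}

const queue = new WorkQueue();
queue.send("m1");
queue.send("m2");
const job = queue.receive();
queue.ack(); // m1 is gone; nobody can read it again

const log = new EventLog();
["e0", "e1", "e2"].forEach((e) => log.append(e));
const firstReadA = log.poll("A");  // A reads e0..e2
log.append("e3");
const secondReadA = log.poll("A"); // A sees only what is new
const firstReadB = log.poll("B");  // B starts from 0 and sees all four
log.seek("A", 0);
const replayA = log.poll("A");     // A rewinds and re-reads the full history
console.log({ job, firstReadA, secondReadA, firstReadB, replayA });
```

Note that seek never touches entries: replay is purely a consumer-side move, which is why a new reader can be added without changing anything upstream.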
03 · The log, scaled out 6 min

Topics, partitions, offsets,
and consumer groups.

Apache Kafka is the reference design for the replayable log, and its four nouns explain how a log scales to millions of messages a second while still keeping order where it counts. Learn these four and most streaming systems read the same.

Topic: a named stream (order.placed).
Partition: one topic is split into N ordered logs, and partitions are what let a topic scale across machines.
Offset: position within a partition.
Consumer group: a set of consumers that share the work of a topic, with each partition read by exactly one member of the group.
[Diagram. Topic order.placed with partitions P0 (offsets 0 1 2), P1 (0 1), P2 (0 1 2); each partition = one ordered log. Producer: key → hash → partition. Group billing: consumer 1, consumer 2, consumer 3; one partition → one consumer in the group.]

The producer hashes each message's key to choose a partition. The billing group spreads the three partitions across its three consumers — one each.
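
The "hash the key, pick a partition" step can be sketched as follows. The hash here is an FNV-1a-style stand-in chosen because it fits in a few lines; Kafka's default partitioner uses murmur2, so the partition numbers this produces will not match a real cluster. partitionFor is a hypothetical helper name.

```javascript
// Stand-in 32-bit hash (FNV-1a style over UTF-16 code units).
// Assumption: any stable hash illustrates the idea; Kafka itself uses murmur2.
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Same key always maps to the same partition for a fixed partition count.
function partitionFor(key, partitionCount) {
  return fnv1a(String(key)) % partitionCount;
}

const partitions = [[], [], []];
const events = [
  { key: "cust-1", type: "placed" },
  { key: "cust-2", type: "placed" },
  { key: "cust-1", type: "paid" },
  { key: "cust-1", type: "shipped" },
];
for (const e of events) {
  partitions[partitionFor(e.key, partitions.length)].push(e);
}

// Everything for cust-1 sits in one partition, in the order it was sent.
const cust1Partition = partitionFor("cust-1", partitions.length);
const cust1Types = partitions[cust1Partition]
  .filter((e) => e.key === "cust-1")
  .map((e) => e.type);
console.log(cust1Partition, cust1Types);
```

One consequence worth noticing: the mapping depends on partitionCount, so changing the number of partitions later can move a key to a different partition.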

How the pieces fit

  • Order is per-partition, not per-topic. Messages within a partition are strictly ordered; across partitions, all bets are off.
  • The key decides the partition. Same key (e.g. customer_id) → same partition → ordered for that customer. No key → the client spreads messages across partitions (round-robin, or sticky batching in newer clients).
  • Partitions are the unit of parallelism. Six partitions means at most six consumers in a group work in parallel — extras sit idle.
  • Groups give you both patterns. One group = a work queue (work split). Many groups on one topic = pub/sub (each group gets every message).
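// producer: the key pins related events to one partition
await producer.send({
  topic: "order.placed",
  messages: [{ key: order.customerId, value: JSON.stringify(order) }],
})

// consumer: joins a group; Kafka assigns it partitions
// (assumes a kafkajs-style client, where `kafka` is the configured client
//  and the group id is set when the consumer is created)
const consumer = kafka.consumer({ groupId: "billing" })
await consumer.connect()
await consumer.subscribe({ topic: "order.placed" })
await consumer.run({
  eachMessage: async ({ message }) => charge(message),
})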
[Diagram. Topic order.placed read by group billing and group analytics; every group sees every message.]

Two groups on one topic: billing and analytics each receive the full stream, independently.
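
How partitions get shared inside a group, and why extra consumers sit idle, can be sketched with a simple round-robin assignment. assignPartitions is a hypothetical helper; real Kafka clients negotiate assignments through a group coordinator with pluggable assignors, so treat this as an illustration of the constraint rather than of the protocol.

```javascript
// Hypothetical round-robin assignment: each partition gets exactly one owner
// within a group; members beyond the partition count receive nothing.
function assignPartitions(partitionCount, members) {
  const assignment = new Map(members.map((m) => [m, []]));
  for (let p = 0; p < partitionCount; p++) {
    assignment.get(members[p % members.length]).push(p);
  }
  return assignment;
}

// Group "billing": four members, three partitions => one member is idle.
const billing = assignPartitions(3, [
  "billing-1",
  "billing-2",
  "billing-3",
  "billing-4",
]);

// Group "analytics": one member reads all three partitions on its own.
const analytics = assignPartitions(3, ["analytics-1"]);

console.log(Object.fromEntries(billing), Object.fromEntries(analytics));
```

Each group is assigned the full set of partitions independently, which is what gives every group its own copy of the stream while members inside a group split the work.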

04 · Guarantees that matter 6 min

Messages get lost, or
messages get duplicated. Pick.

Networks fail mid-handshake, so the broker can never be sure a consumer finished. The only honest question is which failure you prefer: dropping a message, or delivering it twice. The answer is set by when you commit the offset relative to doing the work.

Delivery semantics: the guarantee a system makes about how many times a message is processed: at-most-once (0 or 1 — may lose), at-least-once (1 or more — may duplicate), or exactly-once (precisely 1 — hardest, and narrower than it sounds).
[Diagram. Commit BEFORE work → at-most-once: read → commit @5 → crash; work never ran, message lost. Commit AFTER work → at-least-once: read → process → crash; offset not committed → redelivered → processed twice. Fix: make the work idempotent → effectively exactly-once.]

Commit the offset first and a crash loses the message. Commit last and a crash re-delivers it. There is no third option at the transport layer.
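
The two orderings can be simulated directly. The sketch below is a toy: runWithCrash is a hypothetical function that models one consumer crashing once at a chosen offset and then restarting from its committed offset. In the commit-after case the crash is placed between finishing the work and committing, which is the window that causes duplicates.

```javascript
// Simulates one consumer that crashes once at offset `crashAt`, then restarts
// from whatever offset it had committed. All names are illustrative.
function runWithCrash(log, commitFirst, crashAt) {
  let committed = 0;    // next offset to read after a restart
  const processed = []; // side effects that actually happened
  let crashed = false;

  while (committed < log.length) {
    try {
      for (let offset = committed; offset < log.length; offset++) {
        if (commitFirst) committed = offset + 1; // bookmark moves before the work
        if (offset === crashAt && !crashed) {
          crashed = true;
          // commit-after case: the work finished, but the commit never lands
          if (!commitFirst) processed.push(log[offset]);
          throw new Error("crash");
        }
        processed.push(log[offset]);
        if (!commitFirst) committed = offset + 1; // bookmark moves after the work
      }
    } catch {
      // restart: the outer loop resumes from the committed offset
    }
  }
  return processed;
}

const atMostOnce = runWithCrash(["a", "b", "c"], true, 1);
const atLeastOnce = runWithCrash(["a", "b", "c"], false, 1);
console.log(atMostOnce);  // [ 'a', 'c' ]
console.log(atLeastOnce); // [ 'a', 'b', 'b', 'c' ]
```

With commit-first, "b" is skipped on restart because the bookmark had already moved past it. With commit-last, "b" is handled twice because the bookmark never moved.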

Why exactly-once is subtle

  • At-least-once is the sane default. Lost data is usually worse than duplicate data, and duplicates are fixable.
  • True exactly-once needs the broker and your processing to share one transaction. Kafka offers it for Kafka-to-Kafka stream processing — not for the email you send to the outside world.
  • The practical answer is at-least-once delivery plus an idempotent consumer: process twice, end up in the same state. That's "effectively-once".
Idempotent consumer: one where handling the same message twice has the same effect as handling it once. The usual trick: dedupe on a stable message id, or write with an upsert keyed on a business id — the same idempotency idea from the ETL & ELT pipelines deck, now applied per-message.
Blind handler — duplicates corrupt state
function eachMessage(msg) {
  const o = JSON.parse(msg.value)
  ledger.addCharge(o.amount)   // redelivery ⇒ charged twice
}
// at-least-once + non-idempotent = double billing
Idempotent handler — retry is a no-op
function eachMessage(msg) {
  const o = JSON.parse(msg.value)
  ledger.upsertCharge({
    id: o.orderId,       // dedupe key
    amount: o.amount,
  })                     // same id ⇒ same single row
}
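
To see the difference in numbers, the two handlers above can be run against the same redelivered stream. The ledgers here are hypothetical in-memory stand-ins for a database table (an array for the blind insert, a Map keyed by orderId for the upsert); the order ids and amounts are made up for the example.

```javascript
// Hypothetical in-memory ledgers standing in for a database table.
const blindLedger = [];             // append-only: every delivery adds a row
const idempotentLedger = new Map(); // keyed by orderId: redelivery overwrites

function blindHandler(msg) {
  const o = JSON.parse(msg.value);
  blindLedger.push({ orderId: o.orderId, amount: o.amount });
}

function idempotentHandler(msg) {
  const o = JSON.parse(msg.value);
  idempotentLedger.set(o.orderId, { amount: o.amount }); // upsert on the business id
}

const total = (rows) => rows.reduce((sum, r) => sum + r.amount, 0);

// At-least-once delivery: the second message arrives twice.
const deliveries = [
  { value: JSON.stringify({ orderId: "o-1", amount: 40 }) },
  { value: JSON.stringify({ orderId: "o-2", amount: 25 }) },
  { value: JSON.stringify({ orderId: "o-2", amount: 25 }) },
];

for (const msg of deliveries) {
  blindHandler(msg);
  idempotentHandler(msg);
}

const blindTotal = total(blindLedger);
const idempotentTotal = total([...idempotentLedger.values()]);
console.log(blindTotal);      // 90: o-2 was charged twice
console.log(idempotentTotal); // 65: one row per order
```

The upsert works because the key comes from the business event itself (orderId), so it is identical on every redelivery. A key generated at consume time would not dedupe anything.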
05 · What bites in production 6 min

The broker is easy.
The edges are where it hurts.

Getting messages flowing takes an afternoon. Keeping them ordered, absorbing bursts, handling the message that always fails, and replaying after a bug — that's the real work. Five edges every team meets.

Ordering
Order is per-partition — protect it with the key.

You only get ordering within a partition. If two events for the same order land in different partitions, a consumer can see shipped before paid. Route by a stable key so everything for one entity stays on one partition.

No key — order scrambles
producer.send({
  topic: "order.events",
  messages: [{ value: evt }],
})
// round-robin partition
// paid & shipped may land in different partitions
Key by entity — order preserved
producer.send({
  topic: "order.events",
  messages: [{ key: evt.orderId, value: evt }],
})
// same orderId ⇒ same partition ⇒ ordered
Backpressure & lag
Producers can outrun consumers.

When producers write faster than consumers read, the gap grows. That gap has a name — consumer lag: how many messages behind the head of the log a group is. In a queue with bounded capacity, a growing backlog eventually pushes back on producers (backpressure); in a log nothing pushes back, so a slow consumer just falls further behind.

  • Watch lag — it's the single best health metric for a stream; alert when it climbs and doesn't recover.
  • Add consumers up to the partition count — beyond that, add partitions first.
  • Don't drop silently. A log absorbs bursts in its retention buffer; that buffer is the shock absorber, not a place to fall off the end.
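
Lag is simple arithmetic once you have two numbers per partition: where the log ends and where the group has committed. The sketch below assumes both are "next offset" values and that a partition with no commit starts at offset 0 (a simplification that ignores retention). consumerLag and lagIsClimbing are hypothetical helper names, and the strictly-rising rule is just one possible way to express "climbs and doesn't recover".

```javascript
// lag per partition = log end offset - committed offset
function consumerLag(endOffsets, committedOffsets) {
  const perPartition = {};
  let total = 0;
  for (const [partition, end] of Object.entries(endOffsets)) {
    // Simplification: no commit yet means the whole partition counts as lag.
    const committed = committedOffsets[partition] ?? 0;
    const lag = Math.max(0, end - committed);
    perPartition[partition] = lag;
    total += lag;
  }
  return { perPartition, total };
}

// One possible alert rule: lag rose in every consecutive sample.
function lagIsClimbing(samples) {
  if (samples.length < 2) return false;
  return samples.every((value, i) => i === 0 || value > samples[i - 1]);
}

const lag = consumerLag(
  { 0: 120, 1: 80, 2: 100 }, // log end offset per partition
  { 0: 120, 1: 50 }          // committed offset per partition (none yet for 2)
);
console.log(lag.perPartition, lag.total);

const burstThatRecovered = lagIsClimbing([10, 40, 20]);
const steadyClimb = lagIsClimbing([10, 40, 90]);
console.log(burstThatRecovered, steadyClimb);
```

Looking at lag per partition rather than only the total helps spot a single hot key or a stuck consumer that a healthy-looking sum would hide.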
Dead-letter queue
Quarantine the message that always fails.

A poison message — malformed, or one that always throws — will be retried forever and block everything behind it. Cap the retries, then move it aside to a dead-letter queue (DLQ) so the rest of the stream keeps flowing and a human can inspect the reject later.

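async function eachMessage(msg) {
  try {
    await handle(msg)
  } catch (e) {
    // msg.attempts is assumed to be a retry counter tracked by your own code or framework
    if (msg.attempts >= 5) await dlq.send(msg)   // quarantine
    else throw e                                  // retry
  }
}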
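
The handler above leans on a retry counter that something else has to maintain. As a self-contained illustration of "bound the retries, then quarantine", here is a synchronous in-memory sketch. drain is a hypothetical function; a production consumer would normally add a delay or backoff between attempts, which is omitted here.

```javascript
// Hypothetical processing loop: retries each message up to maxAttempts,
// then parks it on a dead-letter list and moves on to the next message.
function drain(messages, handle, maxAttempts = 5) {
  const done = [];
  const deadLetters = [];
  for (const msg of messages) {
    let attempts = 0;
    for (;;) {
      attempts++;
      try {
        handle(msg);
        done.push(msg);
        break;
      } catch (err) {
        if (attempts >= maxAttempts) {
          // Quarantine with enough context for a human to inspect later.
          deadLetters.push({ msg, attempts, reason: err.message });
          break;
        }
        // Otherwise loop around and retry the same message.
      }
    }
  }
  return { done, deadLetters };
}

const result = drain(["1", "2", "not-a-number", "4"], (m) => {
  if (Number.isNaN(Number(m))) throw new Error(`cannot parse ${m}`);
});
console.log(result.done, result.deadLetters);
```

The message after the poison one ("4") still gets processed, which is the whole point: the failure is isolated instead of blocking everything behind it.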
Replay & reprocessing
Rewind the offset to rebuild state.

Shipped a transform bug? Because the log keeps history, you can reset a group's offset and re-read. New consumer that needs the full past? Start it at the earliest retained offset (offset 0 if nothing has aged out of retention). Replay is only safe if your consumers are idempotent (Part 4); otherwise every re-read message repeats its side effects.

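# move the analytics group back to the start of the topic
# (stop the group's consumers first; localhost:9092 is a placeholder broker address)
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --reset-offsets --to-earliest \
  --group analytics --topic order.placed --execute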
Schema drift
A changed message shape breaks every consumer.

Producers and consumers deploy independently, so a renamed field silently breaks downstream readers. Register message schemas and enforce compatible evolution (add optional fields, never repurpose one) — a schema registry rejects an incompatible change before it ships, the streaming cousin of the schema checks in the pipelines deck.
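
The "add optional fields, never repurpose one" rule can be written as a small check. The schema shape used here ({ fieldName: { type, optional } }) and the function compatibilityProblems are hypothetical, invented for this sketch; real registries apply richer, format-specific rules for Avro, Protobuf or JSON Schema.

```javascript
// Hypothetical schema shape: { fieldName: { type: "string", optional: boolean } }.
// Returns a list of reasons the new schema would break existing consumers.
function compatibilityProblems(oldSchema, newSchema) {
  const problems = [];
  for (const [name, field] of Object.entries(oldSchema)) {
    const next = newSchema[name];
    if (!next) {
      problems.push(`removed field: ${name}`);
    } else if (next.type !== field.type) {
      problems.push(`changed type of ${name}: ${field.type} -> ${next.type}`);
    }
  }
  for (const [name, field] of Object.entries(newSchema)) {
    if (!(name in oldSchema) && !field.optional) {
      problems.push(`new required field: ${name}`);
    }
  }
  return problems;
}

const v1 = {
  orderId: { type: "string" },
  amount: { type: "number" },
};

// Safe evolution: one new optional field.
const v2 = { ...v1, coupon: { type: "string", optional: true } };

// Unsafe evolution: a rename plus a repurposed field.
const v2Broken = {
  order_id: { type: "string" },
  amount: { type: "string" },
};

const safeProblems = compatibilityProblems(v1, v2);
const brokenProblems = compatibilityProblems(v1, v2Broken);
console.log(safeProblems, brokenProblems);
```

A rename shows up as two problems at once (a removed field and a new required one), which matches how it feels downstream: old readers lose their field and cannot know about the new name.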

[Diagram. topic → consumer → retry ×5 → give up → dead-letter (inspect later); the stream keeps flowing for everyone else.]

After a few failed attempts, the poison message is shunted to the dead-letter queue — the rest of the stream is never blocked.

  • Ordering — key by the entity you need ordered.
  • Lag — monitor it; scale consumers to the partition count.
  • Poison messages — bound retries, then DLQ.
  • Replay — keep consumers idempotent so rewinding is safe.
  • Schemas — evolve compatibly behind a registry.
06 · The streaming & queue landscape 5 min

Five systems, two questions:
log or queue, managed or self-run.

The market splits cleanly along the queue-vs-log line from Part 2, crossed with how much you want to operate yourself. Here are the leading systems, each with a one-line strength and the catch.

Replayable log

Apache Kafka

Pro — the de-facto standard for high-throughput event streaming; huge ecosystem, true replay, exactly-once for stream processing.
Con — operationally heavy to self-host and tune; overkill for simple task queues.

Work queue

RabbitMQ

Pro — a mature, flexible message broker with rich routing; ideal for task queues and request/reply.
Con — not a replayable log; once a message is acked it's gone, and throughput trails Kafka at scale.

Managed · AWS

SQS & Kinesis

Pro — fully managed, near-zero ops: SQS for simple queues, Kinesis for the log/streaming shape.
Con — AWS lock-in; smaller feature set and ecosystem than Kafka, and costs scale with usage.

Log + queue

Apache Pulsar

Pro — does both queueing and streaming in one system, with built-in geo-replication and tiered storage.
Con — more moving parts (it relies on a separate storage layer); smaller community than Kafka.

Kafka-compatible

Redpanda

Pro — speaks the Kafka API but is a single binary (no JVM, no separate coordinator); simpler to run, lower latency.
Con — younger project and smaller ecosystem; core is open, some features are commercial.

Rule of thumb

Default picks

Already on AWS and want no ops? SQS / Kinesis. Need a real event log and own your infra? Kafka (or Redpanda for a lighter run). Classic task queue with rich routing? RabbitMQ.

How to choose — Start from Part 2: do you need replay and multiple independent readers of the same history? That's a log — Kafka, Redpanda, Kinesis or Pulsar. Do you just need to hand discrete tasks to workers? That's a queue — RabbitMQ or SQS. Then weigh operations: a managed service (SQS, Kinesis, or hosted Kafka/Redpanda) trades money and lock-in for far less to run; self-hosting trades effort for control and portability. Optimize for throughput and replay only when the workload truly needs them — most teams overreach for Kafka when a queue would do.
07 · Putting it together 4 min

One event in,
many reactions out.

Trace a single order.placed event through everything we've covered: published once, partitioned by customer, read by independent groups, each idempotent, with a DLQ for the rejects.

[Diagram. Checkout (producer) → order.placed, partitions P0 and P1, key = customerId → groups billing, email, inventory, analytics, plus a dead-letter queue; each group: own offset, idempotent, replayable.]

One publish; four independent consumer groups, each at its own offset and idempotent; failures peel off to a dead-letter queue. Add a fifth consumer tomorrow and replay from offset 0 — nothing upstream changes.

1. Put a broker between services to decouple them in time — the producer shouldn't rise and fall with the consumer.
2. Queue for tasks, log for events. Need replay and many independent readers? That's a log (Kafka). One job, one worker? A queue.
3. Order lives in the partition. Key by the entity you need ordered; partitions are also your unit of parallelism.
4. Assume at-least-once; build idempotent consumers. That combination buys you "effectively-once" without the exactly-once fine print.
5. Mind the edges. Monitor lag, DLQ poison messages, evolve schemas compatibly, and keep replay safe.

Keep going

  • Kafka: The Definitive Guide — Narkhede, Shapira & Palino
  • Designing Data-Intensive Applications — Kleppmann (ch. 11, stream processing)
  • ETL & ELT pipelines — batch vs streaming and idempotency in the data flow
  • System design — where async messaging fits the bigger picture

One sentence to remember

"Publish the event once; let everyone read it on their own clock."

— the whole talk, compressed

