Message Queues & Streaming

01 · Why queues & streams exist 4 min

When one service calls another directly,
they rise and fall together.

A checkout finishes. Now five things need to happen — charge the card, email a receipt, update inventory, notify analytics, warm the recommendations. If checkout calls each of those in turn and waits, it is only as fast as the slowest one and only as available as the flakiest one. A message brokerbreaks that chain: checkout announces "order placed" and moves on.

Message broker — a server that sits between systems and carries messages for them. A producer (or publisher) writes a message; the broker holds it; one or more consumers read it on their own schedule. Because neither side calls the other directly, this is asynchronous messaging — the sender doesn't block waiting for the receiver.

Synchronous vs. asynchronous

Synchronous — the caller sends a request and waitsfor the reply. Simple, but the caller's fate is tied to the callee's speed and uptime.
Asynchronous — the caller hands a message to the broker and returns immediately. The work still happens, just later, decoupled in time.
The trade: you gain resilience and scale, you give up the instant answer — the consumer's result arrives out-of-band.

Direct calls chain services into one fragile line. A broker lets checkout announce the event once and each consumer react on its own time.

Direct call — coupled in time

async function placeOrder(o) { await save(o) await email.sendReceipt(o) // blocks… await inventory.reserve(o) // blocks… await analytics.track(o) // if this is down, checkout fails }

Publish an event — decoupled

async function placeOrder(o) { await save(o) await broker.publish("order.placed", o) // one write, returns fast } // email, inventory, analytics each consume on their own // one of them being down can't fail the checkout

Like the difference between phoning each colleague and waiting on hold, versus posting one note on the team board that everyone reads when they're free.

02 · Two shapes of broker 5 min

A queue hands out work once.
A log remembers everything.

"Message broker" covers two genuinely different designs. Pick the wrong one and you'll fight the tool forever. The split is simple: does a message disappear once someone handles it, or does it stay so anyone can read it again?

Work queue — each message is delivered to one consumer and then removed; the broker tracks what's been done. A log (event log / commit log) is an append-only sequence: messages are kept for a retention window and consumers track their own position with an offset, so the same message can be read by many consumers and re-read later.

The queue deletes m1 once the worker acks it. The log keeps every message; readers A and B sit at different offsets and can rewind.

What each is good at

Queue— task distribution. "Resize this image", "send this email". One job, one worker, then it's done. Add workers to drain a backlog faster.
Log — an event history many systems care about. Email, inventory and analytics all read the same order.placed stream independently.
Replayis the log's superpower: add a new consumer tomorrow and let it read from offset 0 to rebuild its state from the full history.
A queue can't replay — once a message is acked and deleted, it's gone. That's a feature for tasks, a limit for events.

Offset — a message's sequential position in the log (0, 1, 2, …). A consumer "commits" the offset it has processed up to; that bookmark is the only thing tying a consumer to the log. Move the bookmark back and you replay; the broker itself never changes.

03 · The log, scaled out 6 min

Topics, partitions, offsets,
and consumer groups.

Apache Kafka is the reference design for the replayable log, and its four nouns explain how a log scales to millions of messages a second while still keeping order where it counts. Learn these four and most streaming systems read the same.

Topic — a named stream (order.placed). Partition — one topic is split into N ordered logs, and partitions are what let a topic scale across machines. Offset — position within a partition. Consumer group — a set of consumers that share the work of a topic, with each partition read by exactly one member of the group.

The producer hashes each message's key to choose a partition. The billing group spreads the three partitions across its three consumers — one each.

How the pieces fit

Order is per-partition, not per-topic. Messages within a partition are strictly ordered; across partitions, all bets are off.
The key decides the partition. Same key (e.g. customer_id) → same partition → ordered for that customer. No key → round-robin.
Partitions are the unit of parallelism. Six partitions means at most six consumers in a group work in parallel — extras sit idle.
Groups give you both patterns. One group = a work queue (work split). Many groups on one topic = pub/sub (each group gets every message).

// producer — the key pins related events to one partition await producer.send({ topic: "order.placed", messages: [{ key: order.customerId, value: JSON.stringify(order) }], }) // consumer — joins a group; Kafka assigns it partitions await consumer.subscribe({ topic: "order.placed" }) await consumer.run({ groupId: "billing", eachMessage: async ({ message }) => charge(message), })

Two groups on one topic: billing and analytics each receive the full stream, independently.

04 · Guarantees that matter 6 min

Messages get lost, or
messages get duplicated. Pick.

Networks fail mid-handshake, so the broker can never be sure a consumer finished. The only honest question is which failure you prefer: dropping a message, or delivering it twice. The answer is set by when you commit the offset relative to doing the work.

Delivery semantics — the guarantee a system makes about how many times a message is processed: at-most-once (0 or 1 — may lose), at-least-once (1 or more — may duplicate), or exactly-once (precisely 1 — hardest, and narrower than it sounds).

Commit the offset first and a crash loses the message. Commit last and a crash re-delivers it. There is no third option at the transport layer.

Why exactly-once is subtle

At-least-once is the sane default. Lost data is usually worse than duplicate data, and duplicates are fixable.
True exactly-once needs the broker and your processing to share one transaction. Kafka offers it for Kafka-to-Kafka stream processing — not for the email you send to the outside world.
The practical answer is at-least-once delivery plus an idempotent consumer: process twice, end up in the same state. That's "effectively-once".

Idempotent consumer — one where handling the same message twice has the same effect as handling it once. The usual trick: dedupe on a stable message id, or write with an upsert keyed on a business id — the same idempotency idea from the ETL & ELT pipelines deck, now applied per-message.

Blind handler — duplicates corrupt state

eachMessage(msg) { const o = JSON.parse(msg.value) ledger.addCharge(o.amount) // redelivery ⇒ charged twice } // at-least-once + non-idempotent = double billing

Idempotent handler — retry is a no-op

eachMessage(msg) { const o = JSON.parse(msg.value) ledger.upsertCharge({ id: o.orderId, // dedupe key amount: o.amount, }) // same id ⇒ same single row }

05 · What bites in production 6 min

The broker is easy.
The edges are where it hurts.

Getting messages flowing takes an afternoon. Keeping them ordered, absorbing bursts, handling the message that always fails, and replaying after a bug — that's the real work. Five edges every team meets.

Ordering

Order is per-partition — protect it with the key.

You only get ordering within a partition. If two events for the same order land in different partitions, a consumer can see shipped before paid. Route by a stable key so everything for one entity stays on one partition.

No key — order scrambles

producer.send({ topic: "order.events", messages: [{ value: evt }] }) // round-robin partition // paid & shipped may land in different partitions

Key by entity — order preserved

producer.send({ topic: "order.events", messages: [{ key: evt.orderId, value: evt }] }) // same orderId ⇒ same partition ⇒ ordered

Backpressure & lag

Producers can outrun consumers.

When producers write faster than consumers read, the gap grows. That gap has a name — consumer lag: how many messages behind the head of the log a group is. In a queue an overwhelmed consumer feels backpressure; in a log it just falls further behind.

Watch lag— it's the single best health metric for a stream; alert when it climbs and doesn't recover.
Add consumers up to the partition count — beyond that, add partitions first.
Don't drop silently. A log absorbs bursts in its retention buffer; that buffer is the shock absorber, not a place to fall off the end.

Dead-letter queue

Quarantine the message that always fails.

A poison message — malformed, or one that always throws — will be retried forever and block everything behind it. Cap the retries, then move it aside to a dead-letter queue (DLQ) so the rest of the stream keeps flowing and a human can inspect the reject later.

eachMessage(msg) { try { handle(msg) } catch (e) { if (msg.attempts >= 5) await dlq.send(msg) // quarantine else throw e // retry } }

Replay & reprocessing

Rewind the offset to rebuild state.

Shipped a transform bug? Because the log keeps history, you can reset a group's offset and re-read. New consumer that needs the full past? Start it at offset 0. Replay only works if your consumers are idempotent (Part 4) — otherwise you replay the duplicates too.

# move the analytics group back to the start of the topic kafka-consumer-groups --reset-offsets --to-earliest \ --group analytics --topic order.placed --execute

Schema drift

A changed message shape breaks every consumer.

Producers and consumers deploy independently, so a renamed field silently breaks downstream readers. Register message schemas and enforce compatible evolution (add optional fields, never repurpose one) — a schema registry rejects an incompatible change before it ships, the streaming cousin of the schema checks in the pipelines deck.

After a few failed attempts, the poison message is shunted to the dead-letter queue — the rest of the stream is never blocked.

Ordering — key by the entity you need ordered.
Lag — monitor it; scale consumers to the partition count.
Poison messages — bound retries, then DLQ.
Replay — keep consumers idempotent so rewinding is safe.
Schemas — evolve compatibly behind a registry.

06 · The streaming & queue landscape 5 min

Five systems, two questions:
log or queue, managed or self-run.

The market splits cleanly along the queue-vs-log line from Part 2, crossed with how much you want to operate yourself. Here are the leading systems, each with a one-line strength and the catch.

Replayable log

Apache Kafka

Pro — the de-facto standard for high-throughput event streaming; huge ecosystem, true replay, exactly-once for stream processing.
Con — operationally heavy to self-host and tune; overkill for simple task queues.

Work queue

RabbitMQ

Pro — a mature, flexible message broker with rich routing; ideal for task queues and request/reply.
Con— not a replayable log; once a message is acked it's gone, and throughput trails Kafka at scale.

Managed · AWS

SQS & Kinesis

Pro — fully managed, near-zero ops: SQS for simple queues, Kinesis for the log/streaming shape.
Con — AWS lock-in; smaller feature set and ecosystem than Kafka, and costs scale with usage.

Log + queue

Apache Pulsar

Pro — does both queueing and streaming in one system, with built-in geo-replication and tiered storage.
Con — more moving parts (it relies on a separate storage layer); smaller community than Kafka.

Kafka-compatible

Redpanda

Pro — speaks the Kafka API but is a single binary (no JVM, no separate coordinator); simpler to run, lower latency.
Con — younger project and smaller ecosystem; core is open, some features are commercial.

Rule of thumb

Default picks

Already on AWS and want no ops? SQS / Kinesis. Need a real event log and own your infra? Kafka (or Redpanda for a lighter run). Classic task queue with rich routing? RabbitMQ.

How to choose — Start from Part 2: do you need replayand multiple independent readers of the same history? That's a log — Kafka, Redpanda, Kinesis or Pulsar. Do you just need to hand discrete tasks to workers? That's a queue — RabbitMQ or SQS. Then weigh operations: a managed service (SQS, Kinesis, or hosted Kafka/Redpanda) trades money and lock-in for far less to run; self-hosting trades effort for control and portability. Optimize for throughput and replay only when the workload truly needs them — most teams overreach for Kafka when a queue would do.

07 · Putting it together 4 min

One event in,
many reactions out.

Trace a single order.placedevent through everything we've covered: published once, partitioned by customer, read by independent groups, each idempotent, with a DLQ for the rejects.

One publish; four independent consumer groups, each at its own offset and idempotent; failures peel off to a dead-letter queue. Add a fifth consumer tomorrow and replay from offset 0 — nothing upstream changes.

1Put a broker between servicesto decouple them in time — the producer shouldn't rise and fall with the consumer.

2Queue for tasks, log for events.Need replay and many independent readers? That's a log (Kafka). One job, one worker? A queue.

3Order lives in the partition. Key by the entity you need ordered; partitions are also your unit of parallelism.

4Assume at-least-once; build idempotent consumers.That combination buys you "effectively-once" without the exactly-once fine print.

5Mind the edges. Monitor lag, DLQ poison messages, evolve schemas compatibly, and keep replay safe.

Keep going

Kafka: The Definitive Guide— Narkhede, Shapira & Palino
Designing Data-Intensive Applications — Kleppmann (ch. 11, stream processing)
ETL & ELT pipelines — batch vs streaming and idempotency in the data flow
System design — where async messaging fits the bigger picture

One sentence to remember

"Publish the event once; let everyone read it on their own clock."

— the whole talk, compressed

Knowledge check

Did it stick?

Five quick questions on queues vs logs, Kafka, delivery semantics and the production edges — instant feedback, no sign-in.

Rate this deck

be the first

Navigate with ← → or scroll · back to library

Message Queues &Streaming — moving eventsbetween systems that don't wait on each other.

When one service calls another directly,they rise and fall together.

Synchronous vs. asynchronous

A queue hands out work once.A log remembers everything.

What each is good at

Topics, partitions, offsets,and consumer groups.

How the pieces fit

Messages get lost, ormessages get duplicated. Pick.

Why exactly-once is subtle

The broker is easy.The edges are where it hurts.

Five systems, two questions:log or queue, managed or self-run.

Apache Kafka

RabbitMQ

SQS & Kinesis

Apache Pulsar

Redpanda

Default picks

One event in,many reactions out.

Keep going

One sentence to remember

Did it stick?

Message Queues &
Streaming — moving events
between systems that don't wait on each other.

When one service calls another directly,
they rise and fall together.

A queue hands out work once.
A log remembers everything.

Topics, partitions, offsets,
and consumer groups.

Messages get lost, or
messages get duplicated. Pick.

The broker is easy.
The edges are where it hurts.

Five systems, two questions:
log or queue, managed or self-run.

One event in,
many reactions out.