A 36-minute working session on asynchronous messaging: why we put a broker between services, the difference between a work queue and a replayable log, Kafka's topics and partitions, the delivery guarantees that actually matter, and the patterns that keep a stream honest in production.
A checkout finishes. Now five things need to happen: charge the card, email a receipt, update inventory, notify analytics, warm the recommendations. If checkout calls each of those in turn and waits, it is at least as slow as the slowest one (in practice, as slow as all of them added together) and no more available than the flakiest one. A message broker breaks that chain: checkout announces "order placed" and moves on.
Direct calls chain services into one fragile line. A broker lets checkout announce the event once and each consumer react on its own time.
Like the difference between phoning each colleague in turn and waiting on hold, and posting one note on the team board that everyone reads when they're free.
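The cost of the chain is plain arithmetic. This is a minimal sketch that assumes the five calls run one after another and fail independently. The latency and availability figures are made up for illustration; they are not measurements.

```python
from math import prod

# Hypothetical figures for illustration only: (latency in ms, availability).
SERVICES = {
    "charge_card": (120, 0.999),
    "email_receipt": (300, 0.995),
    "update_inventory": (80, 0.999),
    "notify_analytics": (50, 0.990),
    "warm_recommendations": (200, 0.980),
}


def chained_cost(services):
    """Checkout calls each service in turn and waits for all of them.

    Latencies add up. Assuming independent failures, the chance that every
    call succeeds is the product of the individual availabilities.
    """
    latency_ms = sum(ms for ms, _ in services.values())
    availability = prod(a for _, a in services.values())
    return latency_ms, availability


latency_ms, availability = chained_cost(SERVICES)
flakiest = min(a for _, a in SERVICES.values())
print(f"chained: {latency_ms} ms, availability {availability:.4f}")  # chained: 750 ms, availability 0.9634
print(f"flakiest single service: {flakiest:.4f}")  # flakiest single service: 0.9800
```

The chain ends up less available than even its weakest link. With a broker in between, checkout's request path shrinks to a single publish. Its latency and availability become those of that one broker write, and the five consumers catch up on their own.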
"Message broker" covers two genuinely different designs. Pick the wrong one and you'll fight the tool forever. The split is simple: does a message disappear once someone handles it, or does it stay so anyone can read it again?
The queue deletes m1 once the worker acks it. The log keeps every message; readers A and B sit at different offsets and can rewind.
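This minimal in-memory sketch contrasts the two designs. The class names `WorkQueue` and `Log` are hypothetical, and the code models only the semantics, not any real broker's API. The queue forgets a message once it is acked. The log keeps every entry and lets each reader move its own offset.

```python
from collections import deque


class WorkQueue:
    """Queue semantics: a message is deleted once a worker acknowledges it."""

    def __init__(self):
        self._ready = deque()
        self._unacked = {}
        self._next_tag = 0

    def publish(self, msg):
        self._ready.append(msg)

    def receive(self):
        """Hand the next message to one worker; it stays in flight until acked."""
        msg = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = msg
        return tag, msg

    def ack(self, tag):
        del self._unacked[tag]  # gone: no other reader will ever see it

    def depth(self):
        return len(self._ready) + len(self._unacked)


class Log:
    """Log semantics: entries are appended and kept; each reader has its own offset."""

    def __init__(self):
        self._entries = []
        self._offsets = {}

    def append(self, msg):
        self._entries.append(msg)
        return len(self._entries) - 1  # offset of the new entry

    def read(self, reader):
        """Return this reader's next entry and advance its offset, or None at the head."""
        pos = self._offsets.get(reader, 0)
        if pos >= len(self._entries):
            return None
        self._offsets[reader] = pos + 1
        return self._entries[pos]

    def seek(self, reader, offset):
        self._offsets[reader] = offset  # rewind (or skip ahead)


queue = WorkQueue()
queue.publish("m1")
tag, msg = queue.receive()
queue.ack(tag)
print(queue.depth())  # 0: m1 is gone for every reader

log = Log()
for m in ("m1", "m2", "m3"):
    log.append(m)
print(log.read("A"), log.read("A"))  # m1 m2
print(log.read("B"))  # m1: reader B has its own offset
log.seek("A", 0)
print(log.read("A"))  # m1 again after the rewind
```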
Apache Kafka is the reference design for the replayable log. Its four nouns explain how a log scales to millions of messages a second while still keeping order where it counts. Learn these four and most streaming systems read the same.
Topic: a named stream of related messages (e.g. order.placed).
Partition: one topic is split into N ordered logs, and partitions are what let a topic scale across machines.
Offset: a message's sequential position within its partition.
Consumer group: a set of consumers that share the work of a topic, with each partition read by exactly one member of the group.
The producer hashes each message's key to choose a partition. The billing group spreads the three partitions across its three consumers, one each.
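Same key (e.g. customer_id) → same partition → ordered for that customer. No key → messages are spread across partitions with no ordering guarantee (older clients round-robin per message; newer Kafka clients stick to one partition per batch).
Two groups on one topic: billing and analytics each receive the full stream, independently.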
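This toy model covers key-based partitioning and group assignment. Kafka's default partitioner hashes the serialized key with murmur2; this sketch uses `zlib.crc32` as a stand-in, so the partition numbers will not match a real cluster. The `assign` helper is a simplified round-robin, not one of Kafka's actual assignors.

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Same key -> same partition. crc32 stands in for Kafka's murmur2 hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Give every partition to exactly one consumer in the group, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# The topic: NUM_PARTITIONS independent, ordered logs.
topic = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("cust-42", "paid"), ("cust-7", "paid"),
                   ("cust-42", "shipped"), ("cust-7", "shipped")]:
    topic[partition_for(key)].append((key, value))

# Per-key order survives because each key's events share one partition.
for key in ("cust-42", "cust-7"):
    print(key, [v for k, v in topic[partition_for(key)] if k == key])  # ['paid', 'shipped']

# Two groups on the same topic: each one covers every partition on its own.
print(assign(list(topic), ["billing-1", "billing-2", "billing-3"]))
# {'billing-1': [0], 'billing-2': [1], 'billing-3': [2]}
print(assign(list(topic), ["analytics-1"]))
# {'analytics-1': [0, 1, 2]}
```

A fourth billing consumer would sit idle, because a partition is never split between members of the same group. The partition count therefore caps a group's parallelism.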
Networks fail mid-handshake, so the broker can never be sure a consumer finished. The only honest question is which failure you prefer: dropping a message, or delivering it twice. The answer is set by when you commit the offset relative to doing the work.
Commit the offset first (at-most-once) and a crash between the commit and the work loses the message. Commit last (at-least-once) and a crash between the work and the commit re-delivers it. There is no third option at the transport layer.
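The following toy simulation is not the Kafka consumer API. A single consumer walks a three-message log, crashes once between its two steps at a chosen offset, and restarts from the last committed offset. The helper name `run` and the crash injection are hypothetical. Committing first (at-most-once) loses the message; committing last (at-least-once) repeats it.

```python
def run(log, commit_first, crash_at):
    """Consume `log` with one consumer that crashes once at offset `crash_at`.

    commit_first=True:  commit the offset, then do the work (at-most-once).
    commit_first=False: do the work, then commit the offset (at-least-once).
    After the crash the consumer restarts from the last committed offset.
    Returns every message whose work was actually performed, in order.
    """
    processed = []
    committed = 0
    crashed = False

    def maybe_crash(offset):
        nonlocal crashed
        if not crashed and offset == crash_at:
            crashed = True
            raise RuntimeError("consumer died between its two steps")

    while committed < len(log):
        offset = committed  # (re)start from the last committed offset
        try:
            while offset < len(log):
                if commit_first:
                    committed = offset + 1  # 1. commit
                    maybe_crash(offset)
                    processed.append(log[offset])  # 2. work
                else:
                    processed.append(log[offset])  # 1. work
                    maybe_crash(offset)
                    committed = offset + 1  # 2. commit
                offset += 1
        except RuntimeError:
            pass  # the outer loop restarts at `committed`
    return processed


print(run(["m0", "m1", "m2"], commit_first=True, crash_at=1))   # ['m0', 'm2']: m1 lost
print(run(["m0", "m1", "m2"], commit_first=False, crash_at=1))  # ['m0', 'm1', 'm1', 'm2']: m1 twice
```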
Getting messages flowing takes an afternoon. Keeping them ordered, absorbing bursts, handling the message that always fails, replaying after a bug, and evolving message formats without breaking readers: that's the real work. Five edges every team meets.
You only get ordering within a partition. If two events for the same order land in different partitions, a consumer can see shipped before paid. Route by a stable key so everything for one entity stays on one partition.
When producers write faster than consumers read, the gap grows. That gap has a name: consumer lag, or how many messages behind the head of the log a group is. It is measured per partition as the log-end offset minus the group's committed offset. In a queue an overwhelmed consumer feels backpressure; in a log it just falls further behind.
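Here is a minimal sketch of how lag is usually computed. It uses the common convention that a committed offset points at the next message to read. The helper names, offsets, and rates are invented for illustration. A single hot partition (partition 2 below) can dominate the total, which is why lag is worth watching per partition.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Lag per partition = log-end offset minus the group's committed offset.

    Both arguments map partition number -> offset. A partition the group has
    never committed is counted from 0 here, a simplifying assumption.
    """
    per_partition = {
        p: end - committed_offsets.get(p, 0) for p, end in log_end_offsets.items()
    }
    return per_partition, sum(per_partition.values())


def lag_after(seconds, produce_per_s, consume_per_s, start_lag=0):
    """If producers outpace consumers, lag grows linearly; it never drops below 0."""
    return max(0, start_lag + (produce_per_s - consume_per_s) * seconds)


# Hypothetical snapshot of a three-partition topic.
per_partition, total = consumer_lag({0: 1200, 1: 980, 2: 1105}, {0: 1150, 1: 980, 2: 600})
print(per_partition, total)  # {0: 50, 1: 0, 2: 505} 555
print(lag_after(60, produce_per_s=1000, consume_per_s=800))  # 12000
```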
A poison message — malformed, or one that always throws — will be retried forever and block everything behind it. Cap the retries, then move it aside to a dead-letter queue (DLQ) so the rest of the stream keeps flowing and a human can inspect the reject later.
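This is a minimal sketch of "cap the retries, then move it aside". The retry cap, the helper names, and the in-memory list standing in for the dead-letter queue are all hypothetical. A real consumer would normally back off between attempts and publish rejects to a separate queue or topic.

```python
MAX_ATTEMPTS = 3  # hypothetical retry cap


def consume_with_dlq(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Run `handler` on each message, retrying up to `max_attempts` times.

    A message that still fails is parked in `dead_letters` with its last
    error, and consumption moves on to the next message.
    """
    results, dead_letters = [], []
    for msg in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                results.append(handler(msg))
                break
            except Exception as err:  # real code would separate retriable from fatal errors
                last_error = err
        else:  # no attempt succeeded
            dead_letters.append(
                {"message": msg, "error": repr(last_error), "attempts": max_attempts}
            )
    return results, dead_letters


def parse_amount(msg):
    """Hypothetical handler: fails on a malformed amount."""
    return int(msg["amount"])


results, dlq = consume_with_dlq(
    [{"amount": "10"}, {"amount": "oops"}, {"amount": "7"}], parse_amount
)
print(results)            # [10, 7]: the message after the poison one still ran
print(dlq[0]["message"])  # {'amount': 'oops'}
```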
Shipped a transform bug? Because the log keeps history, you can reset a group's offset and re-read. New consumer that needs the full past? Start it at the earliest retained offset (offset 0, if retention has not deleted anything yet). Replay is only safe if your consumers are idempotent (Part 4); otherwise re-reading re-applies every side effect and produces duplicates.
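This minimal sketch shows a replay after a bug fix. It assumes a hypothetical sink that upserts rows keyed by `order_id`, and the function and field names are illustrative. Because the write is an upsert, resetting the offset to 0 and re-reading overwrites the bad rows instead of adding new ones. An append-only sink has no such protection.

```python
def run_group(log, start_offset, transform, table):
    """Read log[start_offset:] and upsert each result into `table` by order id.

    The upsert keyed by order_id is what makes this consumer idempotent:
    reading the same event again overwrites one row instead of adding another.
    Returns the offset to commit (one past the last message read).
    """
    for offset in range(start_offset, len(log)):
        event = log[offset]
        table[event["order_id"]] = transform(event)
    return len(log)


LOG = [
    {"order_id": "o1", "amount_cents": 1250},
    {"order_id": "o2", "amount_cents": 899},
]


def buggy_to_dollars(event):
    return event["amount_cents"] / 10  # the shipped bug: wrong divisor


def fixed_to_dollars(event):
    return event["amount_cents"] / 100


table = {}
committed = run_group(LOG, 0, buggy_to_dollars, table)
print(table)  # {'o1': 125.0, 'o2': 89.9}

# Fix shipped: reset the group's offset to 0 and re-read the same history.
committed = run_group(LOG, 0, fixed_to_dollars, table)
print(table)  # {'o1': 12.5, 'o2': 8.99}

# Contrast: an append-only sink is not idempotent, so the replay doubles it.
rows = []
for _ in range(2):
    rows.extend(fixed_to_dollars(e) for e in LOG)
print(len(rows))  # 4
```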
Producers and consumers deploy independently, so a renamed field silently breaks downstream readers. Register message schemas and enforce compatible evolution (add optional fields, never repurpose one) — a schema registry rejects an incompatible change before it ships, the streaming cousin of the schema checks in the pipelines deck.
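This minimal sketch shows the kind of check a schema registry runs before it accepts a new version. The schema format and the three rules are simplified assumptions, not the behavior of any particular registry; real registries offer several compatibility modes. The rules encode the slide's advice: fields can be added only as optional, and an existing field can never be removed or repurposed.

```python
def compatibility_errors(old, new):
    """Return the reasons `new` would break readers of `old` (empty list = OK).

    Schemas map field name -> {"type": ..., "required": bool}. Rules, a
    simplified stand-in for a registry's compatibility check:
      * an existing field may not disappear (a rename counts as remove + add);
      * an existing field may not change type (no repurposing);
      * a brand-new field must be optional.
    """
    errors = []
    for name, spec in old.items():
        if name not in new:
            errors.append(f"removed field '{name}'")
        elif new[name]["type"] != spec["type"]:
            errors.append(f"field '{name}' changed type {spec['type']} -> {new[name]['type']}")
    for name, spec in new.items():
        if name not in old and spec["required"]:
            errors.append(f"new field '{name}' must be optional")
    return errors


V1 = {
    "order_id": {"type": "string", "required": True},
    "amount_cents": {"type": "int", "required": True},
}
V2_OK = {**V1, "coupon_code": {"type": "string", "required": False}}
V2_RENAME = {
    "order_id": {"type": "string", "required": True},
    "amount": {"type": "int", "required": True},
}

print(compatibility_errors(V1, V2_OK))      # []
print(compatibility_errors(V1, V2_RENAME))  # ["removed field 'amount_cents'", "new field 'amount' must be optional"]
```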
After a few failed attempts, the poison message is shunted to the dead-letter queue — the rest of the stream is never blocked.
The market splits cleanly along the queue-vs-log line from Part 2, crossed with how much you want to operate yourself. Here are the leading systems, each with a one-line strength and the catch.
Apache Kafka
Pro: the de-facto standard for high-throughput event streaming; huge ecosystem, true replay, exactly-once for stream processing.
Con: operationally heavy to self-host and tune; overkill for simple task queues.
RabbitMQ
Pro: a mature, flexible message broker with rich routing; ideal for task queues and request/reply.
Con: classic queues are not a replayable log, so once a message is acked it's gone (RabbitMQ Streams, added in 3.9, narrow this gap); throughput trails Kafka at scale.
AWS SQS / Kinesis
Pro: fully managed with near-zero ops; SQS for simple queues, Kinesis for the log/streaming shape.
Con: AWS lock-in; smaller feature set and ecosystem than Kafka, and costs scale with usage.
Apache Pulsar
Pro: does both queueing and streaming in one system, with built-in geo-replication and tiered storage.
Con: more moving parts (it relies on a separate storage layer, Apache BookKeeper); smaller community than Kafka.
Redpanda
Pro: speaks the Kafka API but ships as a single binary (no JVM, no separate coordinator); simpler to run, lower latency.
Con: younger project and smaller ecosystem; the core is source-available and some features are commercial.
Already on AWS and want no ops? SQS / Kinesis. Need a real event log and own your infra? Kafka (or Redpanda for a lighter run). Classic task queue with rich routing? RabbitMQ.
Trace a single order.placed event through everything we've covered: published once, partitioned by customer, read by independent groups, each idempotent, with a DLQ for the rejects.
One publish; four independent consumer groups, each at its own offset and idempotent; failures peel off to a dead-letter queue. Add a fifth consumer tomorrow and replay from offset 0 — nothing upstream changes.
"Publish the event once; let everyone read it on their own clock."
— the whole talk, compressed