A 34-minute working session on the art of not recomputing: HTTP caching, CDNs at the edge, Redis and Memcached, the patterns that keep data fresh, and the failure modes — stampedes, poisoning, stale reads — that bite teams who skip the fundamentals.
A cache stores the result of expensive work close to where it's needed, so the next request reads the answer instead of recomputing it. Done well it cuts latency, sheds load, and lowers cost all at once. The catch — and the reason this whole deck exists — is that a cached answer can quietly go stale.
memory reads vs. a fresh database query under load — the gap a cache buys back.
cross-country round trip light can't beat — only moving the data closer can.
of traffic on a busy site is reads of the same popular items.
a cache hit skips compute, egress, and database capacity you would otherwise pay for.
Every cache works the same way: check the fast copy first; on a miss, do the slow work once and remember it.
The hit ratio is the share of requests served from cache. At a 95% hit ratio only 1 request in 20 reaches your origin, so the origin sees 20× less load. Push it to 99% and that's 100×. Small ratio gains near the top are worth a lot.
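The arithmetic behind those numbers is just the inverse of the miss ratio. A small sketch (the function name origin_load_reduction is made up for illustration):

```python
def origin_load_reduction(hit_ratio: float) -> float:
    """Return how many times less traffic the origin sees at this hit ratio."""
    if not 0.0 <= hit_ratio < 1.0:
        raise ValueError("hit_ratio must be in [0, 1)")
    miss_ratio = 1.0 - hit_ratio  # the share of requests that still reach the origin
    return 1.0 / miss_ratio


if __name__ == "__main__":
    for ratio in (0.90, 0.95, 0.99):
        factor = origin_load_reduction(ratio)
        print(f"{ratio:.0%} hit ratio: origin load cut about {factor:.0f}x")
```

Going from 90% to 95% halves origin traffic, and going from 95% to 99% cuts it by another factor of five, which is why the last few points matter so much.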
Like keeping milk in the fridge instead of walking to the shop each time — fast and cheap, until the carton quietly goes off.
Browsers, proxies, and CDNs all obey the same rules, set by a few response headers. Learn these and you get caching for free across the entire chain — no extra infrastructure, just the right Cache-Control.
ETag or Last-Modified) and a cache can revalidate cheaply once the response goes stale.
A conditional request: the browser asks “still a3f9c2?” and a 304 means “yes, reuse what you have.”
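A minimal sketch of the server side of that conditional request, assuming the ETag is derived from a hash of the body. The helpers make_etag and respond are hypothetical, and real servers also handle lists of tags and the * wildcard in If-None-Match, which this skips:

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the body; HTTP requires the value to be quoted."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # The client's copy is still current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = respond(b"hello", None)
    print(status, headers["ETag"])
    print(respond(b"hello", headers["ETag"])[0])    # same content, revalidation succeeds
    print(respond(b"changed", headers["ETag"])[0])  # content changed, full response
```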
max-age=N: The response may be reused without asking the server for N seconds. After that it's stale and must be revalidated.
public / private: public lets shared caches (CDNs, proxies) keep it; private restricts it to the end-user's browser. Use private for anything personalized.
no-store: Do not keep a copy anywhere; use it for secrets and sensitive data. Note that no-cache is different: it means “store, but revalidate every time.”
immutable: Skip revalidation entirely for the lifetime of max-age. Safe only for content-hashed assets like app.3f9c.js.
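One way to apply these directives is a small policy table that maps a kind of response to its Cache-Control value. This is a sketch only: the category names and the max-age numbers are illustrative assumptions, not recommendations from this deck.

```python
# Hypothetical response categories mapped to Cache-Control values.
CACHE_POLICIES = {
    # Content-hashed file: cache for a year and never revalidate.
    "hashed_asset": "public, max-age=31536000, immutable",
    # Same for every visitor, fine to be a few minutes stale.
    "public_page": "public, max-age=300",
    # Per-user content: only the user's own browser may keep it.
    "personalized": "private, max-age=60",
    # May be stored, but must be revalidated before every reuse.
    "always_revalidate": "no-cache",
    # Secrets: never written to any cache.
    "secret": "no-store",
}


def cache_control_for(kind: str) -> str:
    """Look up the Cache-Control header value for a response category."""
    return CACHE_POLICIES[kind]


if __name__ == "__main__":
    for kind in CACHE_POLICIES:
        print(f"{kind:18} Cache-Control: {cache_control_for(kind)}")
```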
If-None-Match; the server compares and answers 304 if nothing changed.
If-Modified-Since. Cheaper to compute but only second-granular, so ETags win when edits are frequent.
The trick behind immutable assets: put a hash of the content in the filename. Change the file and the name changes, so the URL is new and old caches are simply bypassed, with no invalidation needed.
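The content-hash trick can be sketched in a few lines. Build tools normally do this for you; hashed_name is a hypothetical helper, and the 8-character digest length is an arbitrary choice:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short hash of the content between the file stem and its extension."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return f"{path.stem}.{digest}{path.suffix}"


if __name__ == "__main__":
    v1 = hashed_name("app.js", b"console.log('v1');")
    v2 = hashed_name("app.js", b"console.log('v2');")
    # Different bytes give a different name, so the new URL cannot hit an old cache entry.
    print(v1, v2, v1 != v2)
```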
No header trick beats physics: a request to a server 8,000 km away pays for the distance. A CDN keeps copies in hundreds of cities so most users hit a machine a few milliseconds away — and your origin barely notices the traffic.
The CDN fans your content out to the edge; the origin only handles the rare miss, so it scales far past its own capacity.
immutable, forget it.
Vary to split by header (e.g. Vary: Accept-Encoding) so gzip and brotli copies don't collide.
The CDN is the outermost cache layer: first hop for the user, last line of defense for your origin. It sits above the application cache and the database, each layer catching what the one outside it missed. See System Design for the full request path through these tiers.
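To make the Vary rule concrete, here is a rough sketch of how a shared cache might build its key. The cache_key function is hypothetical, and real CDNs usually normalize header values (for example, collapsing Accept-Encoding to a few known variants) before keying on them:

```python
from typing import Dict


def cache_key(method: str, url: str, request_headers: Dict[str, str], vary: str) -> str:
    """Build a cache key from the method, the URL, and every header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    parts = [method.upper(), url]
    # Header names are case-insensitive; sort them so the key is stable.
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        parts.append(f"{name}={lowered.get(name, '')}")
    return "|".join(parts)


if __name__ == "__main__":
    gzip_key = cache_key("GET", "/app.js", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
    br_key = cache_key("GET", "/app.js", {"Accept-Encoding": "br"}, "Accept-Encoding")
    print(gzip_key)
    print(br_key)
```

Headers that are not listed in Vary, such as User-Agent here, do not affect the key, so they cannot fragment the cache.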
HTTP caching is automatic but coarse. When you need to cache a specific query result, a session, a computed feed — anything keyed and read by your backend — you reach for an in-memory data store like Redis or Memcached shared by every app server.
The app checks Redis first; a miss hits the database once, then the result is written back with a TTL.
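That flow can be sketched as follows. To keep the example runnable without a server, TTLCache is an in-memory stand-in for Redis, and get_user and load_from_db are hypothetical names; with a real Redis client the get and set calls would go over the network instead.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis or Memcached: entries expire after ttl seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def get_user(user_id: int, cache: TTLCache,
             load_from_db: Callable[[int], Any], ttl: float = 300.0) -> Any:
    """Check the cache first; on a miss, query the database once and write back."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                 # hit: the database is never touched
    value = load_from_db(user_id)     # miss: do the slow work once
    cache.set(key, value, ttl)        # remember it for the next ttl seconds
    return value
```

Note that this sketch treats a stored None as a miss, so lookups for rows that do not exist always reach the database.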
Two names dominate. They overlap heavily; the right pick is about data shape and durability, not raw speed.
Pro — rich data types (lists, sets, sorted sets, streams), optional persistence, pub/sub, Lua, and clustering; one tool for cache, queue, leaderboard, rate limiter.
Con — largely single-threaded per node and feature-heavy, so it can be overkill if you only ever do GET/SET.
Choose Redis when you want one store for many jobs, or need structures beyond plain strings.
Pro — dead-simple multi-threaded key→blob cache; scales across cores effortlessly and sips memory per entry. Brutally fast for the one thing it does.
Con — strings only, no persistence, no replication; a restart wipes everything and there are no fancy data types.
Choose Memcached when your need is purely “cache opaque blobs by key, very fast, nothing more.”
A “pattern” here is just a rule for who writes to the cache and when. Pick one deliberately — each trades freshness, complexity, and write latency differently.
Cache-aside: the app reads and writes the database and updates the cache itself; the cache knows nothing of the DB.
Write-through: every write goes through the cache to the DB together, so the cache is never stale on read.
Write-behind: writes hit only the cache and return; a worker flushes batches to the DB. Fast, but data is at risk until it lands.
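The difference between the two write paths above is easiest to see side by side. In this sketch plain dicts stand in for both the cache and the database, the class names are made up, and flush is called by hand where a real system would run a background worker:

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class WriteThroughCache:
    """Each write updates the database and the cache in the same call."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value     # persist first, paying the database latency
        self.cache[key] = value  # then refresh the cached copy


class WriteBehindCache:
    """Writes land in the cache and a queue; the database catches up on flush."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}
        self.pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.pending.append((key, value))  # not durable until flushed

    def flush(self) -> int:
        """Drain queued writes into the database; return how many were written."""
        flushed = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
            flushed += 1
        return flushed
```

Anything still sitting in pending is lost if the process dies before flush runs, which is the risk the write-behind description refers to.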
A close cousin of cache-aside: instead of the app populating the cache on a miss, the cache library does it for you behind a single get(key) call. Same data flow, less boilerplate — the loader is registered once with the cache.
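A bare-bones sketch of that idea: the loader is registered once, and callers only ever see get(key). ReadThroughCache is a hypothetical class, and it leaves out TTLs and eviction to stay short.

```python
from typing import Any, Callable, Dict


class ReadThroughCache:
    """Cache that fills itself on a miss using a loader registered up front."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self._store:
            # Miss: the cache, not the caller, fetches and remembers the value.
            self._store[key] = self._loader(key)
        return self._store[key]


if __name__ == "__main__":
    def slow_lookup(key: str) -> str:
        print(f"loading {key} from the source")
        return key.upper()

    cache = ReadThroughCache(slow_lookup)
    print(cache.get("feed"))  # triggers the loader
    print(cache.get("feed"))  # served from the cache, loader not called
```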
Phil Karlton's line — cache invalidation and naming things — is a joke with a sharp point: deciding when a cached copy stops being valid is genuinely hard. Here are the practical tools that tame it.
Give each entry a Time To Live; it auto-expires after N seconds. Dead simple and self-healing — staleness is bounded by the TTL. The tuning question is just “how stale is tolerable?”
On every change to the source, actively del or update the key (and any derived keys). Precise and fresh — but you must find every place the data is cached, or it lingers.
stale-while-revalidate: once stale, serve the old copy instantly and kick off a background refresh. Users never wait for revalidation; the next request gets the fresh value.
SWR turns the cliff-edge of expiry into a soft ramp: stale answers stay fast while the fresh one loads.
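Here is one way the stale-while-revalidate behaviour could look in application code. It is a sketch under assumptions: SWRCache, max_age, and swr are illustrative names, the refresh runs on a plain background thread, and a real deployment would more likely rely on the HTTP header or the framework's own mechanism.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


def _run_in_thread(task: Callable[[], Any]) -> None:
    threading.Thread(target=task, daemon=True).start()


class SWRCache:
    """Serve fresh entries directly, stale ones instantly while refreshing in the background."""

    def __init__(self, loader: Callable[[str], Any], max_age: float, swr: float,
                 clock: Callable[[], float] = time.monotonic,
                 spawn: Callable[[Callable[[], Any]], None] = _run_in_thread) -> None:
        self._loader, self._max_age, self._swr = loader, max_age, swr
        self._clock, self._spawn = clock, spawn
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def _store(self, key: str) -> Any:
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (self._clock(), value)
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key: str) -> Any:
        start_refresh = False
        value = None
        with self._lock:
            entry = self._entries.get(key)
            usable = False
            if entry is not None:
                age = self._clock() - entry[0]
                usable = age < self._max_age + self._swr
                if usable and age >= self._max_age and key not in self._refreshing:
                    self._refreshing.add(key)  # only one refresh per key at a time
                    start_refresh = True
                value = entry[1]
        if not usable:
            return self._store(key)            # missing or too stale: caller waits
        if start_refresh:
            self._spawn(lambda: self._store(key))
        return value                           # fresh, or stale but served instantly
```

The caller that triggers the refresh still gets the old value; the request after the refresh completes sees the new one, matching the soft ramp described above.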
This is exactly the engine behind Next.js ISR and modern edge revalidation — see Rendering Strategies for how stale-while-revalidate powers incrementally-regenerated pages. The same idea, from an HTTP header to a whole rendering model.
Caches mostly fail in a handful of well-known ways. Recognize the pattern and the fix is usually one well-placed lock or a sprinkle of randomness.
A hot key expires and thousands of requests miss at once, all hammering the origin to recompute the same value.
Fix — a lock (or single-flight) so only one request rebuilds while the rest wait or serve stale; plus SWR to avoid the miss entirely.
A bad or attacker-crafted response gets cached and served to everyone — e.g. an unkeyed header smuggles a malicious variant into a shared cache.
Fix — cache only what you trust, key on every input that changes the body, and never cache error responses by accident.
Penetration: requests for keys that don't exist skip the cache and pound the DB. Avalanche: many keys expire at the same instant.
Fix — cache “not found” too, and add random jitter to TTLs so expiries spread out.
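Both halves of that last fix are small. In this sketch a plain dict stands in for the cache, lookup and jittered_ttl are hypothetical helpers, and the 10% spread is an arbitrary default; in practice the cached “not found” marker would also get a short TTL of its own.

```python
import random
from typing import Any, Callable, Dict, Optional

_MISSING = object()  # sentinel meaning "we looked and the row does not exist"


def jittered_ttl(base_ttl: float, spread: float = 0.1, rng: Any = random) -> float:
    """Return base_ttl shifted by a random amount within plus or minus spread."""
    return base_ttl * (1.0 + rng.uniform(-spread, spread))


def lookup(key: str, cache: Dict[str, Any],
           load_from_db: Callable[[str], Optional[Any]]) -> Optional[Any]:
    """Cache-aside lookup that also remembers keys the database does not have."""
    if key in cache:
        value = cache[key]
        return None if value is _MISSING else value
    value = load_from_db(key)
    # Store the sentinel for absent rows so repeat lookups stop reaching the DB.
    cache[key] = _MISSING if value is None else value
    return value


if __name__ == "__main__":
    # Keys written together with a base TTL of 300 s now expire at different times.
    print([round(jittered_ttl(300.0), 1) for _ in range(5)])
```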
A single-flight lock collapses a thundering herd into one origin rebuild; everyone else waits a beat or gets the stale value.
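A single-process sketch of single-flight using threads: the first caller for a key becomes the leader and rebuilds, and everyone who arrives meanwhile waits for its result. The SingleFlight class is illustrative; across several app servers you would need a shared lock (for example in Redis) instead of an in-process one.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """Bookkeeping for one in-progress rebuild."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent rebuilds of the same key into a single call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, rebuild: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if not leader:
            call.done.wait()              # follower: wait for the leader's result
            if call.error is not None:
                raise call.error
            return call.value
        try:
            call.value = rebuild()        # leader: the only call that hits the origin
            return call.value
        except BaseException as exc:
            call.error = exc              # followers see the same failure
            raise
        finally:
            with self._lock:
                del self._in_flight[key]  # later requests start a new flight
            call.done.set()
```

Followers here block until the leader finishes; serving the stale value instead, as the caption mentions, would mean returning the old cache entry rather than calling call.done.wait().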