Caching & CDNs · GuideDeck

01 · Why cache at all 4 min

The fastest work is the work
you never do twice.

A cache stores the result of expensive work close to where it's needed, so the next request reads the answer instead of recomputing it. Done well it cuts latency, sheds load, and lowers cost all at once. The catch — and the reason this whole deck exists — is that a cached answer can quietly go stale.

Cache — from French “to hide” — is a fast, usually smaller store that holds a copy of data so future reads are cheaper than going back to the original source of truth. Every cache trades a little freshness for a lot of speed; your job is to choose that trade deliberately.

~100×: memory reads vs. a fresh database query under load, the gap a cache buys back.

10 ms: a cross-country round trip that light can't beat; only moving the data closer can.

90%+: the share of traffic on a busy site that is reads of the same popular items.

A cache hit also skips compute, egress, and database capacity you would otherwise pay for.

Every cache works the same way: check the fast copy first; on a miss, do the slow work once and remember it.

The one metric that matters

The hit ratio is the share of requests served from cache. At a 95% hit ratio only 1 request in 20 reaches your origin, so the origin sees 20× less load. Push it to 99% and that's 100×. Small ratio gains near the top are worth a lot.
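The arithmetic behind those multipliers can be made explicit: if a fraction h of requests are hits, the origin sees 1 − h of the traffic, so its load drops by a factor of 1 / (1 − h). A minimal sketch follows; the function names are illustrative and not part of the deck.

```javascript
// Fraction of requests that still reach the origin at a given hit ratio.
function originShare(hitRatio) {
  if (hitRatio < 0 || hitRatio >= 1) {
    throw new RangeError("hitRatio must be in [0, 1)");
  }
  return 1 - hitRatio;
}

// How many times less load the origin sees compared with no cache at all.
function loadReduction(hitRatio) {
  return 1 / originShare(hitRatio);
}

// Average latency when hits and misses cost different amounts (same unit for both).
function expectedLatency(hitRatio, hitCost, missCost) {
  return hitRatio * hitCost + (1 - hitRatio) * missCost;
}

console.log(Math.round(loadReduction(0.95))); // 20
console.log(Math.round(loadReduction(0.99))); // 100
```

Moving from 95% to 99% looks like four points, but it cuts origin traffic by another factor of five, which is why gains near the top matter so much.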

Latency — a nearby hit answers in microseconds, not a network round trip. See Networking for why distance is the tax.
Load — caches absorb the read flood so databases stay healthy. More in Databases & SQL.
The price of stale — a cached answer is a snapshot. The whole skill is deciding how old is too old.

Like keeping milk in the fridge instead of walking to the shop each time — fast and cheap, until the carton quietly goes off.

02 · HTTP caching 6 min

The web has a cache
built into every response.

Browsers, proxies, and CDNs all obey the same rules, set by a few response headers. Learn these and you get caching for free across the entire chain — no extra infrastructure, just the right Cache-Control.

Cache-Control — the response header that sets the policy — tells every cache between origin and browser how long a response may be reused and who may store it. Pair it with a validator (an ETag or Last-Modified) and a cache can revalidate cheaply once the response goes stale.

    // a public asset: cache hard, far and wide
    Cache-Control: public, max-age=31536000, immutable

    // a personalized page: store only in the browser
    Cache-Control: private, max-age=0, must-revalidate

    // a validator lets caches re-check instead of refetch
    ETag: "a3f9c2"
    Last-Modified: Thu, 18 Jun 2026 10:00:00 GMT

A conditional request: the browser asks “still a3f9c2?” and a 304 means “yes, reuse what you have.”

The directives worth memorizing

max-age=N

Fresh for N seconds

The response may be reused without asking the server for N seconds. After that it is stale, and a cache should revalidate it before reusing it.

public / private

Who may store it

public lets shared caches (CDNs, proxies) keep it; private restricts it to the end-user's browser. Use private for anything personalized.

no-store

Never cache

Do not keep a copy anywhere. For secrets and sensitive data. Note that no-cache is different: it means “store, but revalidate every time.”

immutable

It will never change

Skip revalidation entirely for the lifetime of max-age. Safe only for content-hashed assets like app.3f9c.js.

ETag vs. Last-Modified

ETag — an opaque fingerprint of the content (often a hash). The client echoes it back in If-None-Match; the server compares and answers 304 if nothing changed.
Last-Modified — a timestamp; the client sends If-Modified-Since. Cheaper to compute but only second-granular, so ETags win when edits are frequent.
Either way the win is the same: a 304 skips re-sending the body. You still pay a round trip, but you save the bandwidth of the body.
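The ETag exchange above can be sketched as a pure function that decides between a full 200 and a body-less 304. Everything here is an assumption for illustration: the helper names, the lowercase header object (the shape Node's http module happens to use), and the toy 32-bit FNV-1a fingerprint, which stands in for whatever content hash or version number a real server would use.

```javascript
// Illustrative fingerprint (32-bit FNV-1a). An assumption for this sketch:
// real servers typically use a stronger content hash or a version number.
function etagFor(body) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < body.length; i++) {
    hash ^= body.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return `"${hash.toString(16).padStart(8, "0")}"`;
}

// Decide between a full 200 and a body-less 304 for a GET request.
// requestHeaders is a plain object with lowercase header names.
function conditionalResponse(requestHeaders, body) {
  const etag = etagFor(body);
  const headers = { ETag: etag, "Cache-Control": "private, max-age=0, must-revalidate" };

  // If-None-Match may carry several comma-separated validators, or "*".
  const sent = (requestHeaders["if-none-match"] || "")
    .split(",")
    .map((v) => v.trim().replace(/^W\//, "")); // weak comparison ignores the W/ prefix

  if (sent.includes("*") || sent.includes(etag)) {
    return { status: 304, headers, body: "" }; // client reuses the copy it already has
  }
  return { status: 200, headers, body };
}

const first = conditionalResponse({}, "hello");
const second = conditionalResponse({ "if-none-match": first.headers.ETag }, "hello");
console.log(first.status, second.status); // 200 304
```

Note that this sketch still needs the body in hand to compute the fingerprint, so it saves bandwidth but not server work; storing the ETag alongside the content would avoid that.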

The cache-busting pattern

The trick behind immutable assets: put a hash of the content in the filename. Change the file and the name changes, so the URL is new and old caches are simply bypassed — no invalidation needed.

    // build output: hash in the name
    /assets/app.3f9c2a.js   // cache forever
    /assets/app.7b1e44.js   // next deploy = new URL

    // HTML referencing them stays max-age=0
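One way to apply this split on the server is a small function that picks the header from the URL path. This is a sketch under stated assumptions: the function name is made up, and the pattern treats any run of four or more hex characters before the extension as a content hash, which will need adjusting to match your bundler's naming scheme.

```javascript
// Assumption: a content hash is 4+ hex characters just before the extension,
// as in app.3f9c2a.js. Adjust the pattern to your bundler's output.
const HASHED_ASSET = /\.[0-9a-f]{4,}\.(js|css|woff2|png|jpg|svg)$/;

function cacheControlFor(path) {
  if (HASHED_ASSET.test(path)) {
    // New content means a new URL, so this copy can be kept for a year.
    return "public, max-age=31536000, immutable";
  }
  // HTML and anything unhashed must be re-checked so it can point at new asset URLs.
  return "max-age=0, must-revalidate";
}

console.log(cacheControlFor("/assets/app.3f9c2a.js")); // public, max-age=31536000, immutable
console.log(cacheControlFor("/index.html"));           // max-age=0, must-revalidate
```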

03 · CDNs & edge caching 5 min

Move the bytes
closer to the user.

No header trick beats physics: a request to a server 8,000 km away pays for the distance. A CDN keeps copies in hundreds of cities so most users hit a machine a few milliseconds away — and your origin barely notices the traffic.

CDN — Content Delivery Network — is a fleet of cache servers (edge nodes or PoPs, points of presence) spread across the globe. Requests are routed to the nearest one; on a hit it serves locally, on a miss it fetches from your origin once and caches the result for everyone nearby.

The CDN fans your content out to the edge; the origin only handles the rare miss, so it scales far past its own capacity.

What CDNs cache — and how

Static assets — images, JS, CSS, fonts, video. The classic case: hash the filename, cache immutable, forget it.
Whole HTML pages — when a page is the same for everyone, it can be cached at the edge too. This is exactly the edge revalidation story behind modern SSG/ISR.
Cache keys — by default the URL. Add Vary to split by header (e.g. Vary: Accept-Encoding) so gzip and brotli copies don't collide.
Edge compute — Cloudflare Workers, Fastly Compute, CloudFront Functions run logic at the edge for personalization without a trip home.
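To make the cache-key point concrete, here is a rough sketch of how a key could be derived from the URL plus the request headers named in Vary. The helper is hypothetical; real CDNs expose this as configuration and do not necessarily build keys this way.

```javascript
// Build a cache key from the method, URL, and the request headers named in Vary.
// Hypothetical helper: real CDNs let you configure the key rather than code it.
function cacheKey(method, url, varyHeader, requestHeaders) {
  const varied = (varyHeader || "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .sort() // the order of names in Vary should not change the key
    .map((name) => `${name}=${requestHeaders[name] ?? ""}`);
  return [method.toUpperCase(), url, ...varied].join("|");
}

const gzip = cacheKey("GET", "/app.css", "Accept-Encoding", { "accept-encoding": "gzip" });
const br = cacheKey("GET", "/app.css", "Accept-Encoding", { "accept-encoding": "br" });
console.log(gzip);        // GET|/app.css|accept-encoding=gzip
console.log(gzip === br); // false
```

Keying on raw header values fragments the cache quickly, since browsers send many variations of the same header. Normalizing the value before it enters the key (for example, reducing Accept-Encoding to a single preferred encoding) keeps the hit ratio up.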

Where it sits in a bigger system

The CDN is the outermost cache layer — first hop for the user, last line of defense for your origin. It sits above the application cache and the database, each layer catching what the one outside it missed. See System Design for the full request path through these tiers.

04 · Application caches 5 min

A shared, in-memory store
your code controls.

HTTP caching is automatic but coarse. When you need to cache a specific query result, a session, a computed feed — anything keyed and read by your backend — you reach for an in-memory data store like Redis or Memcached shared by every app server.

In-memory cache (a key-value store that lives in RAM) sits beside your database and holds hot data so app servers read it in microseconds instead of re-querying. Because it's a separate, shared service, all your servers see the same cache, unlike a local per-process cache that each server keeps to itself.

    async function getUser(id) {
      const hit = await cache.get(`user:${id}`)
      if (hit) return JSON.parse(hit)        // fast path
      const user = await db.query(...)       // miss → origin
      await cache.set(`user:${id}`, JSON.stringify(user), "EX", 300) // TTL: 5 min
      return user
    }

The app checks Redis first; a miss hits the database once, then the result is written back with a TTL.

Tooling landscape — in-memory caches

Two names dominate. They overlap heavily; the right pick is about data shape and durability, not raw speed.

Redis

The Swiss-army store

Pro — rich data types (lists, sets, sorted sets, streams), optional persistence, pub/sub, Lua, and clustering; one tool for cache, queue, leaderboard, rate limiter.

Con — largely single-threaded per node and feature-heavy, so it can be overkill if you only ever do GET/SET.

Choose when you want one store for many jobs, or need structures beyond plain strings.

Memcached

The lean specialist

Pro — dead-simple multi-threaded key→blob cache; scales across cores effortlessly and sips memory per entry. Brutally fast for the one thing it does.

Con — strings only, no persistence, no replication; a restart wipes everything and there are no fancy data types.

Choose when your need is purely “cache opaque blobs by key, very fast, nothing more.”

How to choose

Default to Redis — its versatility usually pays for itself, and managed options (AWS ElastiCache, Upstash, Redis Cloud) remove the ops burden.
Reach for Memcached when the workload is a pure, huge, simple blob cache and you want the simplest possible thing to operate.
When the simpler option wins: for a single server, an in-process map with a TTL beats any network cache — no serialization, no extra hop. Add Redis only when you need it shared.
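For the single-server case, the "in-process map with a TTL" can be as small as the sketch below. The class name and the injectable clock are assumptions made for illustration and testing; nothing here comes from a particular library.

```javascript
class TtlCache {
  // `now` is injectable so the clock can be faked in tests (an assumption of this sketch).
  constructor(now = () => Date.now()) {
    this.entries = new Map();
    this.now = now;
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazy expiry: evict when the stale entry is read
      return undefined;
    }
    return entry.value;
  }
}

const cache = new TtlCache();
cache.set("user:42", { name: "Ada" }, 5 * 60 * 1000); // keep for 5 minutes
console.log(cache.get("user:42").name); // Ada
```

Values are stored by reference, so there is no serialization cost. The sketch has no size limit, though; a long-running process would want a maximum entry count or an LRU policy on top.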

05 · Caching patterns 5 min

Where reads and writes flow
decides what can go wrong.

A “pattern” here is just a rule for who writes to the cache and when. Pick one deliberately — each trades freshness, complexity, and write latency differently.

Cache-aside — the app manages the cache

    // read: check cache, fall back, populate on a miss only
    v = cache.get(k)
    if (v == null) {
      v = db.read(k)
      cache.set(k, v, ttl)
    }

    // write: update db, then drop the stale key
    db.write(k, v)
    cache.del(k)   // next read re-populates

The app reads/writes the database and updates the cache itself — the cache knows nothing of the DB.

Use when

The default for most apps. Simple, resilient — a cache outage just means slower reads.

Watch for

A brief window where the cache holds stale data after a write; delete-on-write keeps it small.

Write-through — the cache writes to the DB for you

    // the cache sits in front of the db on writes
    cache.write(k, v)      // cache updates itself…
      → db.write(k, v)     // …and the db, synchronously

    // reads are always warm: cache is authoritative
    v = cache.read(k)

Every write goes through the cache to the DB together, so the cache is never stale on read.

Use when

Reads must always be warm and consistent with the DB, and you can pay slightly slower writes.

Cost

Every write pays both hops; you cache data that may never be read again.

Write-back — write fast now, persist later

    // write only to cache; flush to db async
    cache.write(k, v)      // returns immediately
    queue.push(k)          // batch flush later

    // a worker drains the queue every N ms
    db.bulkWrite(queue.drain())   // fewer, bigger writes

Writes hit only the cache and return; a worker flushes batches to the DB — fast, but data is at risk until it lands.

Use when

Write-heavy workloads that tolerate brief loss (metrics, counters, view counts) and benefit from batching.

Danger

A cache crash before flush loses data. Never use it for anything you can't afford to drop.
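A runnable version of the write-back flow might look like the sketch below. The class and the `db.bulkWrite(entries)` interface are stand-ins invented for illustration, not a real driver API; a dirty-key set plays the role of the queue so that repeated writes to one key collapse into a single database write.

```javascript
class WriteBackCache {
  // `db` is any object with an async bulkWrite(entries) method: a stand-in, not a real driver.
  constructor(db) {
    this.db = db;
    this.values = new Map();
    this.dirty = new Set(); // keys written since the last successful flush
  }

  write(key, value) {
    this.values.set(key, value); // returns immediately; nothing is durable yet
    this.dirty.add(key);
  }

  read(key) {
    return this.values.get(key);
  }

  // Call on a timer. Returns the number of entries sent to the database.
  async flush() {
    if (this.dirty.size === 0) return 0;
    const batch = [...this.dirty].map((key) => [key, this.values.get(key)]);
    this.dirty.clear();
    try {
      await this.db.bulkWrite(batch);
    } catch (err) {
      // Put the keys back so the next flush retries them.
      for (const [key] of batch) this.dirty.add(key);
      throw err;
    }
    return batch.length;
  }
}
```

Everything in `dirty` is exactly the data at risk: if the process dies between `write` and a successful `flush`, those updates are gone. A shorter flush interval narrows that window at the cost of smaller batches.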

Read-through, in one line

A close cousin of cache-aside: instead of the app populating the cache on a miss, the cache library does it for you behind a single get(key) call. Same data flow, less boilerplate — the loader is registered once with the cache.
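As a rough illustration of read-through, the sketch below registers a loader once and hides the miss handling behind `get(key)`. The class name, the TTL parameter, and the injectable clock are all assumptions; caching libraries that offer read-through will have their own shapes.

```javascript
class ReadThroughCache {
  // `loader` is registered once; callers only ever use get(key).
  // `now` is injectable so tests can control the clock.
  constructor(loader, ttlMs, now = () => Date.now()) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  async get(key) {
    const entry = this.entries.get(key);
    if (entry && this.now() < entry.expiresAt) return entry.value; // hit

    const value = await this.loader(key); // miss: the cache does the fetch
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Call sites shrink to `await users.get(id)`, and the fallback logic lives in one place instead of being repeated in every function that reads the data.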

06 · Invalidation, TTL & SWR 5 min

“There are only two hard things”
— and this is one.

Phil Karlton's line — cache invalidation and naming things — is a joke with a sharp point: deciding when a cached copy stops being valid is genuinely hard. Here are the practical tools that tame it.

Invalidation — removing or refreshing a cached entry once the source changes — keeps the cache honest. The three levers are TTL (expire on a timer), explicit eviction (delete on write), and stale-while-revalidate (serve old, refresh in the background).

TTL

Expire on a timer

Give each entry a Time To Live; it auto-expires after N seconds. Dead simple and self-healing — staleness is bounded by the TTL. The tuning question is just “how stale is tolerable?”

Explicit

Evict on write

On every change to the source, actively del or update the key (and any derived keys). Precise and fresh — but you must find every place the data is cached, or it lingers.

SWR

Serve stale, refresh async

stale-while-revalidate: once stale, serve the old copy instantly and kick off a background refresh. Users never wait for revalidation; the next request gets the fresh value.

    // fresh for 60s, then serve stale for up to 1 more day
    // while a background fetch refreshes it
    Cache-Control: max-age=60, stale-while-revalidate=86400

    // inside the window the user never blocks on revalidation:
    //   age < 60s                → fresh, return as-is
    //   60s ≤ age < 60s + 1 day  → return stale + refresh behind the scenes
    //   age ≥ 60s + 1 day        → block and fetch fresh

SWR turns the cliff-edge of expiry into a soft ramp: stale answers stay fast while the fresh one loads.
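The same three-way decision can be applied inside an application cache. The sketch below is one possible shape, with hypothetical names throughout: `entry` is `{ value, storedAt }` with times in seconds, `policy` is `{ maxAge, swr }`, and `refresh` is an async function that returns a new value.

```javascript
// Classify a cached entry by age, mirroring max-age plus stale-while-revalidate.
function swrState(ageSeconds, maxAge, staleWhileRevalidate) {
  if (ageSeconds < maxAge) return "fresh";
  if (ageSeconds < maxAge + staleWhileRevalidate) return "stale-revalidate";
  return "expired";
}

// entry: { value, storedAt }; policy: { maxAge, swr }; refresh: async () => newValue.
async function serve(entry, nowSeconds, policy, refresh) {
  const state = swrState(nowSeconds - entry.storedAt, policy.maxAge, policy.swr);
  if (state === "fresh") return entry.value;

  if (state === "stale-revalidate") {
    // Fire and forget: the caller gets the old value without waiting.
    refresh()
      .then((value) => {
        entry.value = value;
        entry.storedAt = nowSeconds;
      })
      .catch(() => {}); // a failed refresh leaves the stale entry in place
    return entry.value;
  }

  // Too old to serve: block on the origin.
  entry.value = await refresh();
  entry.storedAt = nowSeconds;
  return entry.value;
}
```

As written, every request that arrives during the stale window starts its own refresh. In practice this is usually paired with a per-key lock so only one background refresh runs at a time.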

This is exactly the engine behind Next.js ISR and modern edge revalidation — see Rendering Strategies for how stale-while-revalidate powers incrementally-regenerated pages. The same idea, from an HTTP header to a whole rendering model.

07 · Pitfalls & recap 4 min

The failure modes
that bite at 3 a.m.

Caches mostly fail in a handful of well-known ways. Recognize the pattern and the fix is usually one well-placed lock or a sprinkle of randomness.

Stampede / thundering herd

A hot key expires and thousands of requests miss at once, all hammering the origin to recompute the same value.

Fix — a lock (or single-flight) so only one request rebuilds while the rest wait or serve stale; plus SWR to avoid the miss entirely.

Cache poisoning

A bad or attacker-crafted response gets cached and served to everyone — e.g. an unkeyed header smuggles a malicious variant into a shared cache.

Fix — cache only what you trust, key on every input that changes the body, and never cache error responses by accident.

Penetration & avalanche

Penetration: requests for keys that don't exist skip the cache and pound the DB. Avalanche: many keys expire at the same instant.

Fix: cache “not found” too, and add random jitter to TTLs so expiries spread out.
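Both fixes are a few lines each. The sketch below is illustrative only: the function names, the ±10% spread, and the 30 s and 300 s TTLs are assumed values, and the `cache` argument is a plain Map that records the chosen TTL without enforcing it.

```javascript
// Spread expiries: base TTL plus or minus up to `spread` (a fraction, 0.1 means ±10%).
// `random` is injectable for tests and defaults to Math.random.
function jitteredTtl(baseSeconds, spread = 0.1, random = Math.random) {
  const offset = (random() * 2 - 1) * spread; // in [-spread, +spread)
  return Math.max(1, Math.round(baseSeconds * (1 + offset)));
}

// Sentinel stored for keys the database does not have, so repeat lookups stay in the cache.
const NOT_FOUND = Symbol("not-found");

// cache: a Map of key -> { value, ttl }; load: async (key) => value, or null when absent.
async function getWithNegativeCaching(cache, key, load) {
  if (cache.has(key)) {
    const hit = cache.get(key).value;
    return hit === NOT_FOUND ? null : hit;
  }
  const loaded = await load(key);
  const missing = loaded === null || loaded === undefined;
  cache.set(key, {
    value: missing ? NOT_FOUND : loaded,
    ttl: missing ? jitteredTtl(30) : jitteredTtl(300), // shorter TTL for misses (assumed values)
  });
  return missing ? null : loaded;
}
```

Keeping the "not found" TTL short limits how long a newly created record stays invisible, while still absorbing a burst of lookups for a key that does not exist.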

A single-flight lock collapses a thundering herd into one origin rebuild; everyone else waits a beat or gets the stale value.
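Within a single process, single-flight needs nothing more than a map of in-progress promises. This is a minimal sketch with made-up names; it does not coordinate across servers, where a shared lock (for example a Redis key set with `SET ... NX EX`) would be one option.

```javascript
// Collapse concurrent rebuilds of the same key into one call.
const inFlight = new Map(); // key -> Promise of the value being rebuilt

function singleFlight(key, rebuild) {
  const pending = inFlight.get(key);
  if (pending) return pending; // someone is already rebuilding: share their result

  const promise = Promise.resolve()
    .then(() => rebuild(key))
    .finally(() => inFlight.delete(key)); // allow a fresh rebuild next time
  inFlight.set(key, promise);
  return promise;
}

// Hypothetical usage: a thousand concurrent misses trigger one expensive rebuild.
// const value = await singleFlight("feed:home", (key) => recomputeFeed(key));
```

A failed rebuild rejects for every waiter and then clears the entry, so the next request gets a clean retry instead of a cached failure.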

Five rules to walk out with

1. Cache the hot, read-heavy, slow-to-compute data, not everything. Measure the hit ratio.

2. Use HTTP caching first. Headers give you browser, proxy, and CDN caching for free.

3. Bound staleness on purpose. A TTL is a feature; pick it from the business, not by guessing.

4. Plan the miss. Locks, jitter, and SWR keep a cold cache from taking the origin down.

5. Simpler wins. An in-process map beats Redis until you need it shared; correctness beats a higher hit ratio.
