A 40-minute working session on relational databases — from why we model data as tables, through the SQL you write every day, transactions and ACID, indexes and query plans, which engine to pick, and how to design a schema that won't fight you later.
Code gets rewritten constantly; the data outlives all of it. The relational model wins because it makes one promise nothing else does: a single, trustworthy copy of your data that many programs can read and write at the same time without stepping on each other. Think of a bank's ledger — every teller and ATM works against the same balances, and nobody is allowed to leave it half-updated.
Many programs, one shared truth — the database, not any one app, owns the data.
The schema turns "please be careful" into rules the engine cannot let you break.
A good schema stores every fact once and links facts together with keys (a key is just a column whose job is to identify or point at a row). Get the keys right and the model explains itself — you can see how everything connects just by reading the tables, because the links live in the data instead of only in your head.
Unique and never null. Prefer a surrogate key (an auto id) over a natural one like email — natural values change, ids shouldn't.
order.user_id → users.id. The engine refusesan order for a user that doesn't exist — no orphan rows.
NOT NULL, UNIQUE, CHECK, DEFAULT — invariants enforced for every writer, not just the careful ones.
orders.user_id).UNIQUE constraint.Like a library: one borrower card links to many loans; each loan points back to one book and one borrower.
order_items is the junction table: it turns orders↔products (many-to-many) into two clean one-to-many links.
SQL is declarative: you describe the result set you want, not the loops to build it. Four clauses cover the vast majority of reads — and they run in an order that surprises most newcomers.
SELECT reads rows; WHERE filters them; JOIN combines tables by matching keys; GROUP BY collapses many rows into per-group summaries.You write SELECT … FROM … WHERE … GROUP BY …, but the engine evaluates FROM first and SELECT nearly last. That's why you can't use a SELECT alias inside WHERE— it doesn't exist yet.
The logical evaluation order — internalize it and confusing errors start to make sense.
WHERE, sort, then cut to a page.NULL isn't equal to anything — use IS NULL, never = NULL.INNER = rows present in both. LEFT = all of the left, padded with NULL.ON turns a join into a Cartesian product — rows explode.WHERE filters rows before grouping; HAVING filters groups after.SELECT column must appear in GROUP BY.WITH block is a named, readable subquery — chain them instead of nesting five deep.Reads are forgiving; writes are where data gets corrupted. The fix is the transaction — a boundary that says "all of these succeed, or none of them do". ACID is the four guarantees it gives you.
BEGIN … COMMIT; if anything fails, ROLLBACK undoes the whole batch as if it never ran.Atomicity in one picture: the transaction lands as a whole or disappears as a whole.
When transactions run at once, the isolation leveldecides what each one can see of the others' in-flight work. Looser = faster but allows anomalies; stricter = safer but more contention.
Like editing a shared doc: isolation is how much of someone else's half-finished edit you're allowed to see.
The engine doesn't run your SQL literally — it builds a query plan. Most performance work is reading that plan and giving it a faster path. The single biggest lever is the index.
O(n) scan into an O(log n) seek.WHERE, JOIN and ORDER BY on — not every column.Left: the engine reads everything. Right: the B-tree walks a few nodes straight to the match.
INSERT/UPDATE.(user_id, placed_at) helps a filter on user_id alone, but not on placed_at alone.EXPLAIN, don't guess.Normalize by default: one fact, one place, no contradictions. Denormalize deliberately — and only when a measured read problem justifies the duplication you take on.
Transactional systems — orders, accounts, inventory. Correctness and cheap updates beat raw read speed. This is the default.
A hot query joins five tables on every page load. Pre-compute or cache the shape — afteryou've proven the join is the bottleneck.
Every duplicated value is a future bug waiting for an update to miss a copy. Denormalization trades write-pain for read-speed — never for free.
Rule of thumb: normalize until it hurts, then denormalize until it works — with numbers, not vibes.
Almost everything so far is vendor-neutral — it works the same across the popular databases. The main fork in the road is relational vs document: do you split data into linked tables, or store each record as one self-contained blob? Here are the leading engines, what each is good and bad at, and a rule for picking.
users; their orders live in orders and point back with a key. Nothing is duplicated, and a join stitches them together when you need them.Left: facts kept once and linked by a key. Right: the same facts nested inside a single document.
Pro — extremely feature-rich and strict about correctness; handles JSON, full-text search and more without add-ons.
Con — more features means more to learn, and the out-of-the-box settings are tuned cautiously.
Pro — runs on practically every host, is easy to get started with, and is fast for simple read-heavy sites.
Con — historically looser on correctness and has a thinner feature set than PostgreSQL.
Pro — store nested, JSON-shaped records with no fixed schema; quick to start when your data is still changing shape.
Con — no real joins and no engine guarding your relationships, so your app now owns consistency.
Pro — zero setup, runs right inside your app, and is perfect for local apps, prototypes and tests.
Con— only one writer at a time, so it's the wrong tool for a busy multi-user server.
How to choose: reach for PostgreSQL for almost any app with real relationships; pick MySQL if your platform already standardizes on it; choose MongoDB only when your data is genuinely document-shaped and rarely joined; and use SQLite for local apps, prototypes and test suites.
Row stores keep a record together (great for "get this order"); column stores keep each field together, so an aggregate reads only the column it needs.
Rule of thumb:OLTP (Postgres / MySQL) runs the app; OLAP (ClickHouse or a warehouse) answers questions about it. Teams often run both — feeding the warehouse from the app database (that's the ETL/ELT deck).
Put it together: take a flat "everything in one table" design and turn it into keys, relationships and constraints — then walk out with the rules.
NOT NULL, CHECK — into the engine, not just the app.EXPLAIN first; index the columns you filter and join on.EXPLAIN — Postgres and MySQL both have excellent ones"Model the truth once, let the engine guard it, and ask it questions in SQL."
— the whole talk, compressed
Five quick questions on schema design, SQL, transactions and indexes — instant feedback, no sign-in.
Navigate with ← → or scroll · Part 2: Advanced SQL → · back to library