Library
00/08 · ~40 min
GUIDEDECK · for the data behind every app

Databases & SQL
the model, the queries,
and why they go slow.

A 40-minute working session on relational databases — from why we model data as tables, through the SQL you write every day, transactions and ACID, indexes and query plans, which engine to pick, and how to design a schema that won't fight you later.

~40 MINMIXED TEAMSQL · VENDOR-NEUTRAL
SCROLL
01 · Why relational 4 min

A database is the one place
your data has to stay true.

Code gets rewritten constantly; the data outlives all of it. The relational model wins because it makes one promise nothing else does: a single, trustworthy copy of your data that many programs can read and write at the same time without stepping on each other. Think of a bank's ledger — every teller and ATM works against the same balances, and nobody is allowed to leave it half-updated.

Relational database data kept in tables of rows and columns, where related tables are linked together (the word "relational" comes from those links). It follows a fixed schema — the agreed list of tables, their columns, and the type of value each column holds — and you ask questions of it in SQL. The engine itself (Postgres, MySQL, SQLite) enforces the rules — correct types, no duplicates, valid links — so every program that connects sees the same consistent picture.

What the engine does for you

  • Persistence — survives restarts, crashes and power loss.
  • Concurrency — hundreds of users read and write at once without corrupting each other.
  • Integrity — constraints reject bad data before it lands, not after.
  • Declarative queries — you say what you want; the engine figures out how to fetch it.
Web app Mobile API Analytics job DATABASE one source of truth

Many programs, one shared truth — the database, not any one app, owns the data.

Why not just keep it in the app?

Data living inside code / files
// a file each service writes to its own way [ { "id": 1, "total": "42.0", "user": "Mara" }, { "id": 1, "total": "oops" } // dup id, bad type ] // no rules · two writers race · who is "Mara"?
Data in a relational engine
CREATE TABLE orders ( id BIGINT PRIMARY KEY, total NUMERIC(10,2) NOT NULL, user_id BIGINT REFERENCES users(id) ); // types, uniqueness & references enforced for every writer

The schema turns "please be careful" into rules the engine cannot let you break.

02 · Tables, keys & relationships 6 min

Rows hold facts; keys connect
them without copying data.

A good schema stores every fact once and links facts together with keys (a key is just a column whose job is to identify or point at a row). Get the keys right and the model explains itself — you can see how everything connects just by reading the tables, because the links live in the data instead of only in your head.

Primary key the column that uniquely identifies a row. A foreign keyis a column that points at another table's primary key. Together they let one row reference another instead of duplicating its data — and the engine guarantees the reference is always valid.
Primary key

One row, one identity

Unique and never null. Prefer a surrogate key (an auto id) over a natural one like email — natural values change, ids shouldn't.

Foreign key

A pointer with a guarantee

order.user_id → users.id. The engine refusesan order for a user that doesn't exist — no orphan rows.

Constraints

Rules at the data layer

NOT NULL, UNIQUE, CHECK, DEFAULT — invariants enforced for every writer, not just the careful ones.

The three relationship shapes

  • One-to-many — a user has many orders. The foreign key lives on the many side (orders.user_id).
  • One-to-one — a user has one profile. Foreign key plus a UNIQUE constraint.
  • Many-to-many — orders contain many products; products appear in many orders. Needs a junction table.

Like a library: one borrower card links to many loans; each loan points back to one book and one borrower.

users id PK name email orders id PK user_id FK placed_at order_items order_id FK product_id FK products id PK price 1—∞

order_items is the junction table: it turns orders↔products (many-to-many) into two clean one-to-many links.

CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, email TEXT UNIQUE NOT NULL ); CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, price NUMERIC(10,2) CHECK (price >= 0) ); CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, user_id BIGINT REFERENCES users(id) ); CREATE TABLE order_items ( order_id BIGINT REFERENCES orders(id), product_id BIGINT REFERENCES products(id), qty INT NOT NULL CHECK (qty > 0), PRIMARY KEY (order_id, product_id) // composite key );
03 · The SQL you write 7 min

SELECT what, WHERE filtered,
JOIN related, GROUP to summarize.

SQL is declarative: you describe the result set you want, not the loops to build it. Four clauses cover the vast majority of reads — and they run in an order that surprises most newcomers.

SQLStructured Query Language — the standard language for relational data. A SELECT reads rows; WHERE filters them; JOIN combines tables by matching keys; GROUP BY collapses many rows into per-group summaries.

Written order ≠ run order

You write SELECT … FROM … WHERE … GROUP BY …, but the engine evaluates FROM first and SELECT nearly last. That's why you can't use a SELECT alias inside WHERE— it doesn't exist yet.

1 · FROM / JOIN 2 · WHERE 3 · GROUP BY 4 · HAVING 5 · SELECT build the row source, filter rows, fold into groups, filter groups, THEN shape columns.

The logical evaluation order — internalize it and confusing errors start to make sense.

Read and filter the rows you want

SELECT id, email, created_at FROM users WHERE country = 'PT' AND created_at >= '2026-01-01' ORDER BY created_at DESC LIMIT 50; // never SELECT * for big reads
Mental model
Take the table, keep rows matching WHERE, sort, then cut to a page.
Watch for
NULL isn't equal to anything — use IS NULL, never = NULL.

Combine tables by matching keys

SELECT u.name, o.id, o.placed_at FROM users u JOIN orders o ON o.user_id = u.id // inner: matches only WHERE o.placed_at >= '2026-06-01'; // LEFT JOIN keeps every user, NULLs where no order exists
INNER vs LEFT
INNER = rows present in both. LEFT = all of the left, padded with NULL.
Watch for
A missing or wrong ON turns a join into a Cartesian product — rows explode.

Collapse rows into per-group answers

SELECT u.country, COUNT(*) AS orders, SUM(o.total) AS revenue FROM orders o JOIN users u ON u.id = o.user_id GROUP BY u.country HAVING SUM(o.total) > 10000; // filter the GROUPS
WHERE vs HAVING
WHERE filters rows before grouping; HAVING filters groups after.
Rule
Every non-aggregated SELECT column must appear in GROUP BY.

Name a step, then build on it

WITH big_spenders AS ( SELECT user_id, SUM(total) AS spent FROM orders GROUP BY user_id HAVING SUM(total) > 5000 ) SELECT u.email, b.spent FROM big_spenders b JOIN users u ON u.id = b.user_id;
CTE
A WITH block is a named, readable subquery — chain them instead of nesting five deep.
It's really
Refactoring for SQL: small named steps beat one unreadable monster query.
04 · Writing data · transactions · ACID 6 min

A transaction turns many writes
into one all-or-nothing step.

Reads are forgiving; writes are where data gets corrupted. The fix is the transaction — a boundary that says "all of these succeed, or none of them do". ACID is the four guarantees it gives you.

Transaction a group of statements committed as a single unit. Wrap related writes in BEGIN … COMMIT; if anything fails, ROLLBACK undoes the whole batch as if it never ran.
A
Atomicity
all writes or none
C
Consistency
constraints always hold
I
Isolation
concurrent txns don't collide
D
Durability
committed = survives a crash

The classic example: moving money

Two separate writes
UPDATE accounts SET bal = bal - 100 WHERE id = 1; // ── crash here ── UPDATE accounts SET bal = bal + 100 WHERE id = 2; // $100 vanished. account 1 charged, account 2 never paid.
One atomic transaction
BEGIN; UPDATE accounts SET bal = bal - 100 WHERE id = 1; UPDATE accounts SET bal = bal + 100 WHERE id = 2; COMMIT; // crash before COMMIT → both roll back
BEGIN UPDATE account 1 UPDATE account 2 COMMIT ✓ both ROLLBACK ✕ none

Atomicity in one picture: the transaction lands as a whole or disappears as a whole.

Isolation, briefly

When transactions run at once, the isolation leveldecides what each one can see of the others' in-flight work. Looser = faster but allows anomalies; stricter = safer but more contention.

  • Read Committed — you only see committed data (the common default).
  • Repeatable Read — re-reading a row gives the same answer within the txn.
  • Serializable — as if transactions ran one at a time. Safest, slowest.

Like editing a shared doc: isolation is how much of someone else's half-finished edit you're allowed to see.

05 · Indexes & query plans 6 min

Slow queries are usually
a full scan that wants an index.

The engine doesn't run your SQL literally — it builds a query plan. Most performance work is reading that plan and giving it a faster path. The single biggest lever is the index.

Index a sorted lookup structure (usually a B-tree) over one or more columns. It lets the engine jump straight to matching rows instead of reading the whole table — turning an O(n) scan into an O(log n) seek.

Scan vs. seek

  • Sequential scan — read every row, throw most away. Fine for tiny tables; brutal on millions.
  • Index seek — walk a sorted tree to the rows you want, touching a handful of pages.
  • Index the columns you WHERE, JOIN and ORDER BY on — not every column.
SEQ SCAN reads all rows INDEX SEEK root < key ≥ key row ✓ few hops to the row

Left: the engine reads everything. Right: the B-tree walks a few nodes straight to the match.

Read the plan, then fix it

No index — Seq Scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42; Seq Scan on orders (rows=2,000,000) Filter: user_id = 42 actual time: 480 ms // reads every row
With index — Index Scan
CREATE INDEX ON orders (user_id); EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42; Index Scan using orders_user_id_idx actual time: 0.3 ms // ~1500x faster
  • Every index must be updated on every write — more indexes = slower INSERT/UPDATE.
  • Indexes take disk and memory; unused ones are pure cost.
  • Order matters in composite indexes: (user_id, placed_at) helps a filter on user_id alone, but not on placed_at alone.
  • Index for your real query patterns — measure with EXPLAIN, don't guess.
06 · Normalization vs denormalization 4 min

Store each fact once
then break the rule on purpose.

Normalize by default: one fact, one place, no contradictions. Denormalize deliberately — and only when a measured read problem justifies the duplication you take on.

Normalization structuring tables so each fact is stored exactly once, removing redundancy. Denormalization is the deliberate reverse: copying or pre-joining data to make reads faster, accepting that you must now keep the copies in sync.
Denormalized by accident
// user details copied into every order row orders(id, user_name, user_email, user_city, total) // Mara changes her email → update 4,000 rows. // miss one → the same user has two emails. truth forks.
Normalized — one home per fact
users(id, name, email, city) // the fact lives here orders(id, user_id, total) // just a reference // email changes in ONE place. every order sees it.
Normalize when

Writes & truth matter

Transactional systems — orders, accounts, inventory. Correctness and cheap updates beat raw read speed. This is the default.

Denormalize when

Reads dominate & hurt

A hot query joins five tables on every page load. Pre-compute or cache the shape — afteryou've proven the join is the bottleneck.

The cost

You own consistency now

Every duplicated value is a future bug waiting for an update to miss a copy. Denormalization trades write-pain for read-speed — never for free.

Rule of thumb: normalize until it hurts, then denormalize until it works — with numbers, not vibes.

07 · The tooling landscape 4 min

Which engine should
you actually reach for?

Almost everything so far is vendor-neutral — it works the same across the popular databases. The main fork in the road is relational vs document: do you split data into linked tables, or store each record as one self-contained blob? Here are the leading engines, what each is good and bad at, and a rule for picking.

Relational vs document — a relational database splits data into linked tables with a fixed schema (PostgreSQL, MySQL, SQLite). A document database stores each record as one self-contained, JSON-like document with no fixed shape (MongoDB). Relational keeps the relationships and rules for you; document hands you flexibility but makes your application responsible for keeping things consistent.

The two shapes, side by side

  • Relational — a user lives in users; their orders live in orders and point back with a key. Nothing is duplicated, and a join stitches them together when you need them.
  • Document— the user and their orders are nested inside one document, so a single read returns the whole thing — but the same user's data can drift out of sync if it's copied around.
  • Neither is "better". Relational fits data full of relationships; document fits self-contained records you mostly read whole.
RELATIONAL · linked tables users id PK name orders id PK user_id FK total link DOCUMENT · one nested record { name: "Mara", orders: [ { id: 1, total: 42 }, { id: 2, total: 19 } ] } read it all in one hit

Left: facts kept once and linked by a key. Right: the same facts nested inside a single document.

The engines worth knowing

PostgreSQL

The capable default

Pro — extremely feature-rich and strict about correctness; handles JSON, full-text search and more without add-ons.
Con — more features means more to learn, and the out-of-the-box settings are tuned cautiously.

MySQL / MariaDB

Everywhere & simple

Pro — runs on practically every host, is easy to get started with, and is fast for simple read-heavy sites.
Con — historically looser on correctness and has a thinner feature set than PostgreSQL.

MongoDB · document

Flexible JSON store

Pro — store nested, JSON-shaped records with no fixed schema; quick to start when your data is still changing shape.
Con — no real joins and no engine guarding your relationships, so your app now owns consistency.

SQLite

A database in a file

Pro — zero setup, runs right inside your app, and is perfect for local apps, prototypes and tests.
Con— only one writer at a time, so it's the wrong tool for a busy multi-user server.

How to choose: reach for PostgreSQL for almost any app with real relationships; pick MySQL if your platform already standardizes on it; choose MongoDB only when your data is genuinely document-shaped and rarely joined; and use SQLite for local apps, prototypes and test suites.

When the job is analytics — OLAP & columnar

Everything above is OLTP online transaction processing: lots of small reads and writes, a row at a time (running the app). Analytics is the opposite — OLAP, online analytical processing: scanning and aggregating millions to billionsof rows over a few columns ("revenue by country last quarter"). For that you want a columnar database that stores each column together and reads only the ones a query actually touches.
ROW STORE · OLTP id · name · total id · name · total id · name · total grab one whole row fast COLUMN STORE · OLAP id name total sum ONE column over billions

Row stores keep a record together (great for "get this order"); column stores keep each field together, so an aggregate reads only the column it needs.

The analytics engines

  • ClickHouse — blazing-fast open-source columnar OLAP; ingests huge event/log streams and aggregates them in milliseconds. The go-to for self-hosted, real-time analytics.
  • Cloud warehousesBigQuery, Snowflake, Redshift: fully managed, separate storage from compute, scale to petabytes.
  • DuckDB— "SQLite for analytics": an in-process columnar engine for crunching data files locally.
  • Vector databasespgvector, Qdrant, Pinecone, Weaviate, Chroma: store embeddings(numeric meaning) for semantic / similarity search — the backbone of AI retrieval (see the RAG & Vector Search deck).

Rule of thumb:OLTP (Postgres / MySQL) runs the app; OLAP (ClickHouse or a warehouse) answers questions about it. Teams often run both — feeding the warehouse from the app database (that's the ETL/ELT deck).

08 · Modeling a real schema · recap 3 min

From a spreadsheet smell
to a schema that scales.

Put it together: take a flat "everything in one table" design and turn it into keys, relationships and constraints — then walk out with the rules.

One wide table — the spreadsheet trap
CREATE TABLE signups ( email TEXT, // no key — dups allowed course TEXT, // "SQL", "sql ", "Sql" teacher TEXT, // repeated on every row price TEXT // "42", "free", "$42" ); // rename a course → hunt every row. no integrity at all.
Modeled — keys, refs, types, checks
CREATE TABLE students ( id BIGSERIAL PRIMARY KEY, email TEXT UNIQUE ); CREATE TABLE courses ( id BIGSERIAL PRIMARY KEY, name TEXT, price NUMERIC(8,2) ); CREATE TABLE enrollments ( student_id BIGINT REFERENCES students(id), course_id BIGINT REFERENCES courses(id), PRIMARY KEY (student_id, course_id) // no double-enroll );

Five rules to walk out with

1The schema is the contract. Push rules — keys, types, NOT NULL, CHECK — into the engine, not just the app.
2One fact, one place. Normalize by default; let foreign keys do the linking.
3Wrap related writes in a transaction. All-or-nothing is the difference between a bug and lost money.
4Slow query? Read the plan. EXPLAIN first; index the columns you filter and join on.
5Denormalize with numbers, not vibes. Trade write-pain for read-speed only when a measurement demands it.

Keep going

  • Use The Index, Luke! — Markus Winand (indexing, free online)
  • Designing Data-Intensive Applications — Martin Kleppmann
  • SQL Performance Explained — Markus Winand
  • Your engine's docs on EXPLAIN — Postgres and MySQL both have excellent ones

One sentence to remember

"Model the truth once, let the engine guard it, and ask it questions in SQL."

— the whole talk, compressed

Knowledge check

Did it stick?

Five quick questions on schema design, SQL, transactions and indexes — instant feedback, no sign-in.

Rate this deck
be the first

Navigate with ← → or scroll · Part 2: Advanced SQL → · back to library