Databases & SQL · GuideDeck

01 · Why relational 4 min

A database is the one place
your data has to stay true.

Code gets rewritten constantly; the data outlives all of it. The relational model wins because it makes one promise nothing else does: a single, trustworthy copy of your data that many programs can read and write at the same time without stepping on each other. Think of a bank's ledger — every teller and ATM works against the same balances, and nobody is allowed to leave it half-updated.

Relational database — data kept in tables of rows and columns, where related tables are linked together (the word "relational" comes from those links). It follows a fixed schema — the agreed list of tables, their columns, and the type of value each column holds — and you ask questions of it in SQL. The engine itself (Postgres, MySQL, SQLite) enforces the rules — correct types, no duplicates, valid links — so every program that connects sees the same consistent picture.

What the engine does for you

Persistence — survives restarts, crashes and power loss.
Concurrency — hundreds of users read and write at once without corrupting each other.
Integrity — constraints reject bad data before it lands, not after.
Declarative queries — you say what you want; the engine figures out how to fetch it.

Many programs, one shared truth — the database, not any one app, owns the data.

Why not just keep it in the app?

Data living inside code / files

// a file each service writes to its own way [ { "id": 1, "total": "42.0", "user": "Mara" }, { "id": 1, "total": "oops" } // dup id, bad type ] // no rules · two writers race · who is "Mara"?

Data in a relational engine

CREATE TABLE orders ( id BIGINT PRIMARY KEY, total NUMERIC(10,2) NOT NULL, user_id BIGINT REFERENCES users(id) ); // types, uniqueness & references enforced for every writer

The schema turns "please be careful" into rules the engine cannot let you break.

02 · Tables, keys & relationships 6 min

Rows hold facts; keys connect
them without copying data.

A good schema stores every fact once and links facts together with keys (a key is just a column whose job is to identify or point at a row). Get the keys right and the model explains itself — you can see how everything connects just by reading the tables, because the links live in the data instead of only in your head.

Primary key — the column that uniquely identifies a row. A foreign keyis a column that points at another table's primary key. Together they let one row reference another instead of duplicating its data — and the engine guarantees the reference is always valid.

Primary key

One row, one identity

Unique and never null. Prefer a surrogate key (an auto id) over a natural one like email — natural values change, ids shouldn't.

Foreign key

A pointer with a guarantee

order.user_id → users.id. The engine refusesan order for a user that doesn't exist — no orphan rows.

Constraints

Rules at the data layer

NOT NULL, UNIQUE, CHECK, DEFAULT — invariants enforced for every writer, not just the careful ones.

The three relationship shapes

One-to-many — a user has many orders. The foreign key lives on the many side (orders.user_id).
One-to-one — a user has one profile. Foreign key plus a UNIQUE constraint.
Many-to-many — orders contain many products; products appear in many orders. Needs a junction table.

Like a library: one borrower card links to many loans; each loan points back to one book and one borrower.

order_items is the junction table: it turns orders↔products (many-to-many) into two clean one-to-many links.

CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, email TEXT UNIQUE NOT NULL ); CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, price NUMERIC(10,2) CHECK (price >= 0) ); CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, user_id BIGINT REFERENCES users(id) ); CREATE TABLE order_items ( order_id BIGINT REFERENCES orders(id), product_id BIGINT REFERENCES products(id), qty INT NOT NULL CHECK (qty > 0), PRIMARY KEY (order_id, product_id) // composite key );

03 · The SQL you write 7 min

SELECT what, WHERE filtered,
JOIN related, GROUP to summarize.

SQL is declarative: you describe the result set you want, not the loops to build it. Four clauses cover the vast majority of reads — and they run in an order that surprises most newcomers.

SQL — Structured Query Language — the standard language for relational data. A SELECT reads rows; WHERE filters them; JOIN combines tables by matching keys; GROUP BY collapses many rows into per-group summaries.

Written order ≠ run order

You write SELECT … FROM … WHERE … GROUP BY …, but the engine evaluates FROM first and SELECT nearly last. That's why you can't use a SELECT alias inside WHERE— it doesn't exist yet.

The logical evaluation order — internalize it and confusing errors start to make sense.

Read and filter the rows you want

SELECT id, email, created_at FROM users WHERE country = 'PT' AND created_at >= '2026-01-01' ORDER BY created_at DESC LIMIT 50; // never SELECT * for big reads

Mental model

Take the table, keep rows matching WHERE, sort, then cut to a page.

Watch for

NULL isn't equal to anything — use IS NULL, never = NULL.

Combine tables by matching keys

SELECT u.name, o.id, o.placed_at FROM users u JOIN orders o ON o.user_id = u.id // inner: matches only WHERE o.placed_at >= '2026-06-01'; // LEFT JOIN keeps every user, NULLs where no order exists

INNER vs LEFT

INNER = rows present in both. LEFT = all of the left, padded with NULL.

Watch for

A missing or wrong ON turns a join into a Cartesian product — rows explode.

Collapse rows into per-group answers

SELECT u.country, COUNT(*) AS orders, SUM(o.total) AS revenue FROM orders o JOIN users u ON u.id = o.user_id GROUP BY u.country HAVING SUM(o.total) > 10000; // filter the GROUPS

WHERE vs HAVING

WHERE filters rows before grouping; HAVING filters groups after.

Rule

Every non-aggregated SELECT column must appear in GROUP BY.

Name a step, then build on it

WITH big_spenders AS ( SELECT user_id, SUM(total) AS spent FROM orders GROUP BY user_id HAVING SUM(total) > 5000 ) SELECT u.email, b.spent FROM big_spenders b JOIN users u ON u.id = b.user_id;

CTE

A WITH block is a named, readable subquery — chain them instead of nesting five deep.

It's really

Refactoring for SQL: small named steps beat one unreadable monster query.

04 · Writing data · transactions · ACID 6 min

A transaction turns many writes
into one all-or-nothing step.

Reads are forgiving; writes are where data gets corrupted. The fix is the transaction — a boundary that says "all of these succeed, or none of them do". ACID is the four guarantees it gives you.

Transaction — a group of statements committed as a single unit. Wrap related writes in BEGIN … COMMIT; if anything fails, ROLLBACK undoes the whole batch as if it never ran.

Atomicity

all writes or none

Consistency

constraints always hold

Isolation

concurrent txns don't collide

Durability

committed = survives a crash

The classic example: moving money

Two separate writes

UPDATE accounts SET bal = bal - 100 WHERE id = 1; // ── crash here ── UPDATE accounts SET bal = bal + 100 WHERE id = 2; // $100 vanished. account 1 charged, account 2 never paid.

One atomic transaction

BEGIN; UPDATE accounts SET bal = bal - 100 WHERE id = 1; UPDATE accounts SET bal = bal + 100 WHERE id = 2; COMMIT; // crash before COMMIT → both roll back

Atomicity in one picture: the transaction lands as a whole or disappears as a whole.

Isolation, briefly

When transactions run at once, the isolation leveldecides what each one can see of the others' in-flight work. Looser = faster but allows anomalies; stricter = safer but more contention.

Read Committed — you only see committed data (the common default).
Repeatable Read — re-reading a row gives the same answer within the txn.
Serializable — as if transactions ran one at a time. Safest, slowest.

Like editing a shared doc: isolation is how much of someone else's half-finished edit you're allowed to see.

05 · Indexes & query plans 6 min

Slow queries are usually
a full scan that wants an index.

The engine doesn't run your SQL literally — it builds a query plan. Most performance work is reading that plan and giving it a faster path. The single biggest lever is the index.

Index — a sorted lookup structure (usually a B-tree) over one or more columns. It lets the engine jump straight to matching rows instead of reading the whole table — turning an O(n) scan into an O(log n) seek.

Scan vs. seek

Sequential scan — read every row, throw most away. Fine for tiny tables; brutal on millions.
Index seek — walk a sorted tree to the rows you want, touching a handful of pages.
Index the columns you WHERE, JOIN and ORDER BY on — not every column.

Left: the engine reads everything. Right: the B-tree walks a few nodes straight to the match.

Read the plan, then fix it

No index — Seq Scan

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42; Seq Scan on orders (rows=2,000,000) Filter: user_id = 42 actual time: 480 ms // reads every row

With index — Index Scan

CREATE INDEX ON orders (user_id); EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42; Index Scan using orders_user_id_idx actual time: 0.3 ms // ~1500x faster

Every index must be updated on every write — more indexes = slower INSERT/UPDATE.
Indexes take disk and memory; unused ones are pure cost.
Order matters in composite indexes: (user_id, placed_at) helps a filter on user_id alone, but not on placed_at alone.
Index for your real query patterns — measure with EXPLAIN, don't guess.

06 · Normalization vs denormalization 4 min

Store each fact once —
then break the rule on purpose.

Normalize by default: one fact, one place, no contradictions. Denormalize deliberately — and only when a measured read problem justifies the duplication you take on.

Normalization — structuring tables so each fact is stored exactly once, removing redundancy. Denormalization is the deliberate reverse: copying or pre-joining data to make reads faster, accepting that you must now keep the copies in sync.

Denormalized by accident

// user details copied into every order row orders(id, user_name, user_email, user_city, total) // Mara changes her email → update 4,000 rows. // miss one → the same user has two emails. truth forks.

Normalized — one home per fact

users(id, name, email, city) // the fact lives here orders(id, user_id, total) // just a reference // email changes in ONE place. every order sees it.

Normalize when

Writes & truth matter

Transactional systems — orders, accounts, inventory. Correctness and cheap updates beat raw read speed. This is the default.

Denormalize when

Reads dominate & hurt

A hot query joins five tables on every page load. Pre-compute or cache the shape — afteryou've proven the join is the bottleneck.

The cost

You own consistency now

Every duplicated value is a future bug waiting for an update to miss a copy. Denormalization trades write-pain for read-speed — never for free.

Rule of thumb: normalize until it hurts, then denormalize until it works — with numbers, not vibes.

07 · The tooling landscape 4 min

Which engine should
you actually reach for?

Almost everything so far is vendor-neutral — it works the same across the popular databases. The main fork in the road is relational vs document: do you split data into linked tables, or store each record as one self-contained blob? Here are the leading engines, what each is good and bad at, and a rule for picking.

Relational vs document — a relational database splits data into linked tables with a fixed schema (PostgreSQL, MySQL, SQLite). A document database stores each record as one self-contained, JSON-like document with no fixed shape (MongoDB). Relational keeps the relationships and rules for you; document hands you flexibility but makes your application responsible for keeping things consistent.

The two shapes, side by side

Relational — a user lives in users; their orders live in orders and point back with a key. Nothing is duplicated, and a join stitches them together when you need them.
Document— the user and their orders are nested inside one document, so a single read returns the whole thing — but the same user's data can drift out of sync if it's copied around.
Neither is "better". Relational fits data full of relationships; document fits self-contained records you mostly read whole.

Left: facts kept once and linked by a key. Right: the same facts nested inside a single document.

The engines worth knowing

PostgreSQL

The capable default

Pro — extremely feature-rich and strict about correctness; handles JSON, full-text search and more without add-ons.
Con — more features means more to learn, and the out-of-the-box settings are tuned cautiously.

MySQL / MariaDB

Everywhere & simple

Pro — runs on practically every host, is easy to get started with, and is fast for simple read-heavy sites.
Con — historically looser on correctness and has a thinner feature set than PostgreSQL.

MongoDB · document

Flexible JSON store

Pro — store nested, JSON-shaped records with no fixed schema; quick to start when your data is still changing shape.
Con — no real joins and no engine guarding your relationships, so your app now owns consistency.

SQLite

A database in a file

Pro — zero setup, runs right inside your app, and is perfect for local apps, prototypes and tests.
Con— only one writer at a time, so it's the wrong tool for a busy multi-user server.

How to choose: reach for PostgreSQL for almost any app with real relationships; pick MySQL if your platform already standardizes on it; choose MongoDB only when your data is genuinely document-shaped and rarely joined; and use SQLite for local apps, prototypes and test suites.

When the job is analytics — OLAP & columnar

Everything above is OLTP — online transaction processing: lots of small reads and writes, a row at a time (running the app). Analytics is the opposite — OLAP, online analytical processing: scanning and aggregating millions to billionsof rows over a few columns ("revenue by country last quarter"). For that you want a columnar database that stores each column together and reads only the ones a query actually touches.

Row stores keep a record together (great for "get this order"); column stores keep each field together, so an aggregate reads only the column it needs.

The analytics engines

ClickHouse — blazing-fast open-source columnar OLAP; ingests huge event/log streams and aggregates them in milliseconds. The go-to for self-hosted, real-time analytics.
Cloud warehouses — BigQuery, Snowflake, Redshift: fully managed, separate storage from compute, scale to petabytes.
DuckDB— "SQLite for analytics": an in-process columnar engine for crunching data files locally.
Vector databases — pgvector, Qdrant, Pinecone, Weaviate, Chroma: store embeddings(numeric meaning) for semantic / similarity search — the backbone of AI retrieval (see the RAG & Vector Search deck).

Rule of thumb:OLTP (Postgres / MySQL) runs the app; OLAP (ClickHouse or a warehouse) answers questions about it. Teams often run both — feeding the warehouse from the app database (that's the ETL/ELT deck).

08 · Modeling a real schema · recap 3 min

From a spreadsheet smell
to a schema that scales.

Put it together: take a flat "everything in one table" design and turn it into keys, relationships and constraints — then walk out with the rules.

One wide table — the spreadsheet trap

CREATE TABLE signups ( email TEXT, // no key — dups allowed course TEXT, // "SQL", "sql ", "Sql" teacher TEXT, // repeated on every row price TEXT // "42", "free", "$42" ); // rename a course → hunt every row. no integrity at all.

Modeled — keys, refs, types, checks

CREATE TABLE students ( id BIGSERIAL PRIMARY KEY, email TEXT UNIQUE ); CREATE TABLE courses ( id BIGSERIAL PRIMARY KEY, name TEXT, price NUMERIC(8,2) ); CREATE TABLE enrollments ( student_id BIGINT REFERENCES students(id), course_id BIGINT REFERENCES courses(id), PRIMARY KEY (student_id, course_id) // no double-enroll );

Five rules to walk out with

1The schema is the contract. Push rules — keys, types, NOT NULL, CHECK — into the engine, not just the app.

2One fact, one place. Normalize by default; let foreign keys do the linking.

3Wrap related writes in a transaction. All-or-nothing is the difference between a bug and lost money.

4Slow query? Read the plan. EXPLAIN first; index the columns you filter and join on.

5Denormalize with numbers, not vibes. Trade write-pain for read-speed only when a measurement demands it.

Keep going

Use The Index, Luke! — Markus Winand (indexing, free online)
Designing Data-Intensive Applications — Martin Kleppmann
SQL Performance Explained — Markus Winand
Your engine's docs on EXPLAIN — Postgres and MySQL both have excellent ones

One sentence to remember

"Model the truth once, let the engine guard it, and ask it questions in SQL."

— the whole talk, compressed

Knowledge check

Did it stick?

Five quick questions on schema design, SQL, transactions and indexes — instant feedback, no sign-in.

Rate this deck

be the first

Navigate with ← → or scroll · Part 2: Advanced SQL → · back to library

Databases & SQLthe model, the queries,and why they go slow.

A database is the one placeyour data has to stay true.

What the engine does for you

Why not just keep it in the app?

Rows hold facts; keys connectthem without copying data.

One row, one identity

A pointer with a guarantee

Rules at the data layer

The three relationship shapes

SELECT what, WHERE filtered,JOIN related, GROUP to summarize.

Written order ≠ run order

Read and filter the rows you want

Combine tables by matching keys

Collapse rows into per-group answers

Name a step, then build on it

A transaction turns many writesinto one all-or-nothing step.

The classic example: moving money

Isolation, briefly

Slow queries are usuallya full scan that wants an index.

Scan vs. seek

Read the plan, then fix it

Store each fact once —then break the rule on purpose.

Writes & truth matter

Reads dominate & hurt

You own consistency now

Which engine shouldyou actually reach for?

The two shapes, side by side

The engines worth knowing

The capable default

Everywhere & simple

Flexible JSON store

A database in a file

When the job is analytics — OLAP & columnar

The analytics engines

From a spreadsheet smellto a schema that scales.

Five rules to walk out with

Keep going

One sentence to remember

Did it stick?

Databases & SQL
the model, the queries,
and why they go slow.

A database is the one place
your data has to stay true.

Rows hold facts; keys connect
them without copying data.

SELECT what, WHERE filtered,
JOIN related, GROUP to summarize.

A transaction turns many writes
into one all-or-nothing step.

Slow queries are usually
a full scan that wants an index.

Store each fact once —
then break the rule on purpose.

Which engine should
you actually reach for?

From a spreadsheet smell
to a schema that scales.