GUIDEDECK · for shipping code you can change without fear

Testing & TDD
— proof your code
does what you think.

A 38-minute working session on tests that earn their keep: the pyramid, the anatomy of a good test, the red-green-refactor loop, test doubles, the tools that run it all, and how to write tests that catch bugs instead of cementing them.

~38 min · mixed team · language-agnostic
01 · Why testing 4 min

A test is a claim about your code
that a machine re-checks for free.

Tests aren't about proving you're right today. They're about the next person — often future-you — being able to change this code without holding their breath. A bug is cheapest to catch the moment you write it, and the cost only climbs from there.

A test (an automated check) runs a slice of your code with known inputs and asserts the result matches what you expect. Green means the claim still holds; red means something changed. The whole value is the fast, repeatable re-check, not the one time you ran it by hand.
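As a minimal sketch of that idea, assuming no test framework at all: `slugify` and `assertEqual` below are hypothetical names invented for illustration, and a real project would lean on its runner's `test` and `expect` instead.

```typescript
// Hypothetical unit under test: turns a title into a URL slug.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each non-alphanumeric run into one dash
    .replace(/^-+|-+$/g, "");    // drop leading and trailing dashes
}

// Hypothetical stand-in for a runner's assertion: throws (red) on a mismatch.
function assertEqual<T>(actual: T, expected: T, claim: string): void {
  if (actual !== expected) {
    throw new Error(`RED: ${claim} (expected ${String(expected)}, got ${String(actual)})`);
  }
}

// Known inputs, expected outputs. Re-running these costs nothing.
assertEqual(slugify("  Hello, World!  "), "hello-world", "strips punctuation and spaces");
assertEqual(slugify("Testing & TDD"), "testing-tdd", "collapses symbols into one dash");
```

If both claims hold the script finishes silently; change `slugify` in a way that breaks either one and the run stops with a named failure.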

  • 1× (baseline): the cost to fix a bug caught while you're typing it.
  • 10×: the cost once it's merged and someone builds on it.
  • 100×: the cost once it's in production, paged at 2 a.m.
  • The bug that ships silently and erodes trust in the data.

What a test suite actually buys you

  • A safety net for change: refactor aggressively; the suite tells you the moment you break a contract.
  • Executable documentation: a good test names the behavior and shows exactly how the unit is meant to be used.
  • Design pressure: code that is painful to test is usually too tightly coupled, meaning that changing one part forces changes in another. The difficulty of writing the test is an early warning that the design needs untangling.
  • Faster feedback: seconds in a test runner instead of minutes clicking through the app.

The goal isn't 100% coverage. It's confidence per minute spent.

02 · The testing pyramid 5 min

Many fast tests,
a few slow ones.

Not all tests cost the same. The pyramid is a budget: push most of your checks down to the cheap, fast layer, and reserve the slow, brittle layer for the handful of journeys that truly need it.

The testing pyramid (a ratio guide) sorts tests by scope and cost: unit (one piece in isolation) at the wide base, integration (a few pieces wired together) in the middle, and end-to-end (the whole system, like a real user) at the narrow top. Wider = cheaper, faster, and more numerous.
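[Figure: the testing pyramid. E2E (~10%, slow) at the top, integration (~20%) in the middle, unit (~70%, fast) at the base; cost and scope increase toward the top.]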

A common rule of thumb is roughly 70% unit, 20% integration, 10% end-to-end — most of your budget at the fast base, thinning out toward the slow, full-system top.

The three layers

  • Unit — one function or class, dependencies replaced. Milliseconds. Pinpoints which line broke.
  • Integration — real collaborators wired together (code + DB, two modules, an HTTP handler). Proves the seams fit.
  • End-to-end — the deployed system driven like a user (browser, API call in, DB out). Proves the whole journey works.

Higher up = more realistic, but slower, flakier, and harder to debug. Confidence has a price; the pyramid keeps the bill sane.

The ice-cream cone (anti-pattern)

Many teams drift into the inverted pyramid: a thick layer of slow end-to-end tests and almost no unit tests. The suite takes 40 minutes, fails randomly, and nobody trusts the red.

  • Slow feedback → people stop running it locally.
  • A failure could be anywhere — debugging is a hunt.
  • Flakiness trains the team to ignore red builds.

A healthy shape

Push logic down: test the pricing rule as a unit, not by clicking through checkout. Keep E2E for a few critical paths — sign-up, checkout, the money path.

  • Fast base runs on every save.
  • A red unit test names the exact broken behavior.
  • E2E guards the journeys, not the arithmetic.
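Pushing logic down might look like the sketch below. The rule itself (5% off a line of 10 or more items) and the name `lineTotalCents` are assumptions made up for illustration; the point is only that the arithmetic is checked directly, with no browser and no checkout flow involved.

```typescript
// Hypothetical pricing rule: 10 or more of an item earns 5% off the line total.
// Amounts are integer cents so the arithmetic stays exact.
function lineTotalCents(unitCents: number, quantity: number): number {
  if (quantity < 0) throw new RangeError("quantity must not be negative");
  const gross = unitCents * quantity;
  return quantity >= 10 ? Math.round(gross * 0.95) : gross;
}

// Unit-level checks: each row is one behavior with a plain-English name.
const pricingCases: Array<[string, number, number]> = [
  ["no discount below 10 items", lineTotalCents(200, 9), 1800],
  ["5% off at exactly 10 items", lineTotalCents(200, 10), 1900],
  ["zero items cost nothing", lineTotalCents(200, 0), 0],
];

for (const [label, actual, expected] of pricingCases) {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}
```

An end-to-end test would still click through checkout once, but it no longer needs to cover every quantity boundary.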

Like proofreading: spell-check every sentence, but only read the whole essay aloud a couple of times.

03 · Anatomy of a good test 5 min

Arrange, Act, Assert
one behavior, named in plain English.

A test you can read in five seconds is a test people will keep passing. Structure it the same way every time, name it after the behavior, and assert one thing.

AAA (Arrange · Act · Assert) is the skeleton of almost every test. Arrange the inputs and the system under test, Act by calling the one thing you're testing, then Assert the outcome. Three visual blocks, in that order, every time.
Vague & tangled

    test("test1", () => {
      const c = new Cart()
      c.add(book); c.add(pen)
      expect(c.total()).toBe(30)
      c.applyCoupon("SAVE10") // second behavior…
      expect(c.total()).toBe(27)
      expect(c.items.length).toBe(2) // …and a third
    })
    // what broke? the name won't tell you.

One behavior, AAA, named

    test("applies a 10% coupon to the cart total", () => {
      // Arrange
      const cart = cartWith(book30)
      // Act
      cart.applyCoupon("SAVE10")
      // Assert
      expect(cart.total()).toBe(27)
    })

Name it after the behavior

The test name is documentation. A reader should know what broke without opening the body. Describe the scenario and the expected outcome — never test1.

does X when Y · returns empty for no matches · throws on a negative amount

    // scenario → expected outcome
    "rejects login after 3 failed attempts"
    "rounds tax to the nearest cent"
    "returns 404 when the order is missing"

The FIRST properties

  • F · Fast: milliseconds, not minutes
  • I · Isolated: no shared state or order
  • R · Repeatable: same result every run
  • S · Self-checking: pass/fail, no eyeballing
  • T · Timely: written with the code

Break any one and the suite gets slower, flakier, or quietly useless. Isolated and Repeatable are the ones teams violate most — usually via shared databases and the clock.
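One common way to keep tests Isolated and Repeatable is to build a fresh fixture inside every test rather than sharing one. The sketch below assumes a hypothetical `Cart` class and a `freshCart` helper, neither of which comes from the deck, and uses plain functions in place of a runner.

```typescript
// Hypothetical unit under test: a cart that holds item prices in cents.
class Cart {
  private readonly prices: number[] = [];
  add(priceCents: number): void { this.prices.push(priceCents); }
  total(): number { return this.prices.reduce((sum, p) => sum + p, 0); }
}

// Each test builds its own cart, so no test can see another test's leftovers.
function freshCart(...pricesCents: number[]): Cart {
  const cart = new Cart();
  for (const p of pricesCents) cart.add(p);
  return cart;
}

function testEmptyCartTotalsZero(): void {
  const cart = freshCart();
  if (cart.total() !== 0) throw new Error("empty cart should total 0");
}

function testTotalSumsItemPrices(): void {
  const cart = freshCart(3000, 250);
  if (cart.total() !== 3250) throw new Error("total should sum item prices");
}

// Isolated tests pass in any order, and pass again when repeated.
testTotalSumsItemPrices();
testEmptyCartTotalsZero();
testTotalSumsItemPrices();
```

Had both tests shared a single module-level cart, the second run of `testTotalSumsItemPrices` would see 6500 and fail for a reason unrelated to the code under test.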

04 · Test-Driven Development 6 min

Red, Green, Refactor.

TDD inverts the usual order: write the failing test first, make it pass with the simplest code, then clean up. The test isn't an afterthought — it's the spec you're coding toward.

TDD (Test-Driven Development) is a tight loop: write a small failing test for the next behavior (red), write just enough code to make it pass (green), then refactor with the test holding you safe. Repeat in minutes-long cycles. You never write production code without a failing test demanding it.
[Figure: the TDD loop. RED (failing test) → GREEN (make it pass) → REFACTOR (clean up).]

Small loops, minutes each. The test goes red before the code exists, then drives it green.

The three laws (Uncle Bob)

1. Write no production code until you have a failing test that requires it.
2. Write only enough test to fail; a compile error counts as failing.
3. Write only enough production code to pass the failing test, nothing more.

The discipline forces tiny steps. You're never more than a few minutes from a known-good state.
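The tiny steps can be replayed in one file. The sketch below uses a leap-year rule as a made-up exercise (it is not from the deck) and keeps each intermediate version around, so you can see which test was red against which version. The `failing` helper is a hypothetical stand-in for a test runner.

```typescript
type LeapYearFn = (year: number) => boolean;

// The tests, in the order they were written. Each was red before its code existed.
const tests: Array<[string, number, boolean]> = [
  ["2024 is a leap year", 2024, true],
  ["2023 is not a leap year", 2023, false],
  ["1900 is not a leap year", 1900, false],
  ["2000 is a leap year", 2000, true],
];

// Names of the tests an implementation fails; an empty list means green.
function failing(impl: LeapYearFn): string[] {
  return tests
    .filter(([, year, expected]) => impl(year) !== expected)
    .map(([name]) => name);
}

// Cycle 1, green: the simplest code that passes the first test.
const v1: LeapYearFn = () => true;
// Cycle 2, green: "2023 is not a leap year" was red against v1.
const v2: LeapYearFn = (year) => year % 4 === 0;
// Cycle 3, green: "1900 is not a leap year" was red against v2.
const v3: LeapYearFn = (year) => year % 4 === 0 && year % 100 !== 0;
// Cycle 4, green: "2000 is a leap year" was red against v3.
const v4: LeapYearFn = (year) => year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);

// Refactor: same behavior with the rules named. The suite stays green.
const isLeapYear: LeapYearFn = (year) => {
  const divisibleBy = (n: number): boolean => year % n === 0;
  return divisibleBy(4) && (!divisibleBy(100) || divisibleBy(400));
};

console.log(failing(v3));         // one red test: "2000 is a leap year"
console.log(failing(isLeapYear)); // empty list: all green
```

No version contains more logic than the tests written so far demanded, which is the third law in action.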

Why test-first changes the design

Code-first: test bolted on after

    // written, then "how do I even test this?"
    function checkout() {
      const now = Date.now() // hidden clock
      const db = new Postgres() // hidden dependency
      // 60 lines of mixed I/O + rules…
    }
    // untestable without a real DB and the right date.

Test-first: seams fall out naturally

    // the test you wished you could write FIRST:
    test("10% off orders over $100", () => {
      const price = discount(120)
      expect(price).toBe(108)
    })
    // forces a pure fn: inputs in, result out.
    function discount(total) { /* no clock, no DB */ }

TDD's real gift isn't the tests. It's that hard-to-test code is hard to test for a reason, and writing the test first surfaces that pain while it's still cheap to fix.
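One possible shape for the result, sketched under a few assumptions: amounts are integer cents rather than dollars (to keep the arithmetic exact), and the `Clock` and `OrderStore` interfaces are hypothetical seams invented here, not the deck's design.

```typescript
// Pure rule: 10% off orders over $100 (10000 cents). Inputs in, result out.
function discountCents(totalCents: number): number {
  return totalCents > 10000 ? Math.round(totalCents * 0.9) : totalCents;
}

// Hypothetical seams: the clock and the store arrive as parameters
// instead of being created inside checkout().
interface Clock { now(): number }
interface SavedOrder { totalCents: number; placedAt: number }
interface OrderStore { save(order: SavedOrder): void }

function checkout(totalCents: number, clock: Clock, store: OrderStore): number {
  const charged = discountCents(totalCents);
  store.save({ totalCents: charged, placedAt: clock.now() });
  return charged;
}

// Test wiring: a clock that stands still and a store that is just an array.
const saved: SavedOrder[] = [];
const charged = checkout(
  12000,
  { now: () => 1700000000000 },
  { save: (order) => { saved.push(order); } },
);

if (charged !== 10800) throw new Error("expected 10% off a $120 order");
if (saved.length !== 1) throw new Error("expected exactly one saved order");
```

The rule is now testable with no setup at all, and the I/O around it needs only two one-line stand-ins.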

05 · Test doubles 6 min

Stand-ins for the
collaborators you don't want to call.

To test a unit in isolation you replace its real dependencies — the database, the payment gateway, the clock — with controllable stand-ins. "Mock" gets used for all of them, but the five types do different jobs.

A test double (a stand-in for a real dependency) lets you run the system under test (SUT) without its real collaborators. The word mock is colloquially used for all of them, but precisely: dummy, stub, spy, mock, and fake are distinct tools.
[Figure: the SUT, OrderService, depends on a payment gateway. The real PaymentGateway (slow, network, charges money) is not used in a unit test; a FakePaymentGateway (in-memory, instant, controllable) is injected for the test.]

Inject the double in place of the real gateway. The SUT can't tell the difference — that's what dependency injection buys you.

State vs. behavior verification

  • Stubs & fakes feed the SUT data — you then assert on the result (state verification). Prefer these.
  • Mocks & spies record calls — you assert the SUT called a collaborator a certain way (behavior verification).
  • Behavior verification couples the test to how the code works, not what it produces. Use it only when the call is the behavior (e.g. "sends the email").
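The coupling is easiest to see when a harmless refactor changes the calls but not the result. In the sketch below, `TaxApi`, `totalV1`, `totalV2`, and `makeTaxDouble` are all hypothetical names made up for illustration.

```typescript
interface TaxApi { rateFor(region: string): number }

// Version 1: looks up the tax rate once per item.
function totalV1(pricesCents: number[], region: string, tax: TaxApi): number {
  return pricesCents.reduce((sum, p) => sum + Math.round(p * (1 + tax.rateFor(region))), 0);
}

// Version 2: a refactor that looks the rate up once. Same results.
function totalV2(pricesCents: number[], region: string, tax: TaxApi): number {
  const rate = tax.rateFor(region);
  return pricesCents.reduce((sum, p) => sum + Math.round(p * (1 + rate)), 0);
}

// A stub (canned rate) that also counts calls, so it can double as a spy.
function makeTaxDouble(rate: number): { api: TaxApi; callCount: () => number } {
  let calls = 0;
  return {
    api: { rateFor: () => { calls += 1; return rate; } },
    callCount: () => calls,
  };
}

const prices = [1000, 2000];
const d1 = makeTaxDouble(0.25);
const d2 = makeTaxDouble(0.25);

// State verification: assert on the result. Green for both versions.
const result1 = totalV1(prices, "CA", d1.api);
const result2 = totalV2(prices, "CA", d2.api);
if (result1 !== 3750 || result2 !== 3750) throw new Error("totals should match");

// Behavior verification would assert on the calls instead. "rateFor was
// called twice" holds for V1 only, so it turns red on a correct refactor.
console.log(d1.callCount(), d2.callCount()); // 2 and 1
```

Nothing observable changed between the two versions, yet a call-count assertion would have flagged the second one as broken.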

Dummy — a placeholder that's never used

    // passed only to satisfy a signature; never called
    const logger = {} as Logger
    new Invoice(items, logger) // this test never logs

Use when: A constructor or method demands an argument the path under test doesn't touch.
Watch for: If it does get called, you wanted a stub.

Stub — returns canned answers

    // pre-programmed responses, no logic of its own
    const rates: RateApi = {
      usdTo: () => 0.8 // always returns 0.8, whatever the input
    }
    expect(convert(100, rates)).toBe(80)

Use when: You need the collaborator to provide a specific input so you can assert on the result.
It's really: State verification. You check the output, not the call.

Spy — a stub that records how it was called

    const sent = []
    const mailer: Mailer = {
      send: (to, body) => { sent.push(to) } // records
    }
    notifyAll(users, mailer)
    expect(sent).toEqual(["a@x.io", "b@x.io"])

Use when: You want to inspect calls after the fact, without pre-setting expectations.
Vs. mock: A spy records and you assert later; a mock asserts as it goes.

Mock — pre-set expectations, fails if unmet

    // the expectation IS the assertion
    const gateway = mock<Payment>()
    // Act: run the code under test (a hypothetical call; the slide omits this step)
    new OrderService(gateway).checkout(order)
    expect(gateway.charge).toHaveBeenCalledWith(2000, "usd")
    // the test fails if charge() wasn't called exactly so

Use when: The interaction itself is the behavior, as in "it must charge the card once".
Watch for: Over-mocking welds tests to implementation. See Part 6.
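To make "asserts as it goes" concrete without a mocking library, here is a hand-rolled sketch. `MockPayment`, its `verify` method, and `placeOrder` are hypothetical names written for this example; real mocking libraries generate the equivalent for you.

```typescript
interface Payment { charge(amountCents: number, currency: string): void }

// Hand-rolled mock: the expectation is fixed up front, and any call that
// does not match it fails on the spot.
class MockPayment implements Payment {
  private readonly expectedAmount: number;
  private readonly expectedCurrency: string;
  private calls = 0;

  constructor(expectedAmount: number, expectedCurrency: string) {
    this.expectedAmount = expectedAmount;
    this.expectedCurrency = expectedCurrency;
  }

  charge(amountCents: number, currency: string): void {
    this.calls += 1;
    if (this.calls > 1) throw new Error("charge() called more than once");
    if (amountCents !== this.expectedAmount || currency !== this.expectedCurrency) {
      throw new Error(`unexpected charge(${amountCents}, "${currency}")`);
    }
  }

  // Called at the end of the test: fails if the expected call never happened.
  verify(): void {
    if (this.calls !== 1) throw new Error("expected charge() to be called exactly once");
  }
}

// Hypothetical SUT: charges the order total once, in US dollars.
function placeOrder(totalCents: number, gateway: Payment): void {
  gateway.charge(totalCents, "usd");
}

const gateway = new MockPayment(2000, "usd");
placeOrder(2000, gateway);
gateway.verify();
```

Compare this with the spy above: the spy only collects calls for a later assertion, while the mock rejects a wrong call at the moment it is made.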

Fake — a real, lightweight implementation

    // working logic, just not production-grade
    class InMemoryUserRepo implements UserRepo {
      private rows = new Map()
      save(u) { this.rows.set(u.id, u) }
      findById(id) { return this.rows.get(id) }
    }

Use when: You want realistic behavior (a DB, a clock) without the real thing's cost.
Best of: Fast like a stub, but genuinely works; great for integration tests.
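A fake earns its keep once a test needs to write and then read back. The sketch below gives the in-memory repo explicit types and puts it behind a hypothetical `changeEmail` function; the `User` shape and that function are assumptions made for illustration.

```typescript
interface User { id: string; email: string }
interface UserRepo {
  save(u: User): void;
  findById(id: string): User | undefined;
}

// The fake: real save/find behavior, backed by a Map instead of a database.
class InMemoryUserRepo implements UserRepo {
  private rows = new Map<string, User>();
  save(u: User): void { this.rows.set(u.id, u); }
  findById(id: string): User | undefined { return this.rows.get(id); }
}

// Hypothetical SUT: updates a user's email, reporting whether the user existed.
function changeEmail(repo: UserRepo, id: string, email: string): boolean {
  const user = repo.findById(id);
  if (user === undefined) return false;
  repo.save({ ...user, email });
  return true;
}

// Arrange
const repo = new InMemoryUserRepo();
repo.save({ id: "u1", email: "old@x.io" });
// Act
const changed = changeEmail(repo, "u1", "new@x.io");
const missing = changeEmail(repo, "nope", "x@x.io");
// Assert on state, read back through the same fake
if (!changed || repo.findById("u1")?.email !== "new@x.io") {
  throw new Error("email should be updated for an existing user");
}
if (missing) throw new Error("changing a missing user should report false");
```

A stub could not support this test, because the read after the write has to reflect what was just saved.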

06 · Coverage, flakiness & what not to test 5 min

Coverage is a map,
not the territory.

A green 100% can still ship bugs, and a flaky suite is worse than no suite. Knowing what not to test is as important as knowing what to.

Code coverage (the percent of lines/branches a run executed) tells you what your tests touched, never whether they asserted anything useful. It's a great way to find untested code and a terrible target to chase to 100%.
Covered, but proves nothing

    test("runs without error", () => {
      calculateTax(order) // 100% line coverage…
    })
    // no assertion → tax could be -∞ and this stays green.

Asserts the behavior that matters

    test("charges 8% tax, rounded to the cent", () => {
      expect(calculateTax({ subtotal: 49.99 })).toBe(4.00)
    })
    // same line covered, but now it actually checks the rule.
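For reference, here is a sketch of a `calculateTax` that would satisfy the 8% test, together with a few edge assertions that a coverage number alone would never ask for. The implementation, the `Order` shape, and the `expectTax` helper are assumptions made for illustration.

```typescript
interface Order { subtotal: number } // dollars

const TAX_RATE_PERCENT = 8;

function calculateTax(order: Order): number {
  if (order.subtotal < 0) throw new RangeError("subtotal must not be negative");
  // dollars times percent gives the tax in cents; round, then convert back to dollars
  return Math.round(order.subtotal * TAX_RATE_PERCENT) / 100;
}

// Hypothetical helper standing in for a runner's expect(...).toBe(...).
function expectTax(subtotal: number, expected: number): void {
  const actual = calculateTax({ subtotal });
  if (actual !== expected) {
    throw new Error(`tax on ${subtotal}: expected ${expected}, got ${actual}`);
  }
}

expectTax(49.99, 4.0);  // the rule itself: 3.9992 rounds to 4.00
expectTax(0, 0);        // empty order
expectTax(0.06, 0);     // 0.48 cents rounds down
expectTax(0.07, 0.01);  // 0.56 cents rounds up
```

A single call to `calculateTax` already yields full line coverage of the happy path; only the assertions distinguish a correct rule from a wrong one.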
Flakiness

The tests that cry wolf

A flaky test passes and fails without any code change. Each false alarm teaches the team to ignore red — and a suite nobody trusts is dead weight.

  • Time: Date.now(), timezones, sleeps. Inject a clock.
  • Order & shared state: a leftover DB row from another test. Isolate.
  • Async & races: fixed sleep(500) instead of waiting for a condition.
  • Randomness & network: unseeded random, live third-party calls.

Like a smoke alarm that chirps at random — people rip the battery out, and then it can't warn you.
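"Inject a clock" can be as small as the sketch below. The `Clock` and `Session` interfaces and the `isExpired` function are hypothetical; the technique is simply that the code asks a parameter for the time instead of calling `Date.now()` itself.

```typescript
interface Clock { now(): number }          // milliseconds since the epoch
interface Session { expiresAt: number }

// The unit never reads the system time directly, so tests control "now".
function isExpired(session: Session, clock: Clock): boolean {
  return clock.now() >= session.expiresAt;
}

// Production wiring: the real clock.
const systemClock: Clock = { now: () => Date.now() };

// Test wiring: a clock that stands still at a chosen instant.
function fixedClock(ms: number): Clock {
  return { now: () => ms };
}

const session: Session = { expiresAt: 1000 };
const results = [
  isExpired(session, fixedClock(999)),   // one millisecond before expiry
  isExpired(session, fixedClock(1000)),  // exactly at expiry
];
console.log(results); // false, then true

// The same function works unchanged with the real clock.
console.log(isExpired({ expiresAt: 0 }, systemClock));
```

Both boundary checks give the same answer on every run and in every timezone, which a test built on the real clock cannot promise.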

What NOT to test

Test behavior, not plumbing

  • The language / framework: don't test that a getter returns what you set.
  • Private internals: test through the public surface; private methods change freely.
  • Trivial glue: a one-line passthrough with no logic earns little.
  • Implementation details: "calls helper X then Y" cements how, blocking refactors.

Test observable behavior at a stable boundary: given these inputs and this state, the unit produces this output or this visible effect. If a correctness-preserving refactor turns a test red, that test was checking how, not what.

Aim coverage at a floor that catches obvious gaps (say 70–80% on changed code), then spend the saved energy on better assertions and edge cases — boundaries, empties, errors — not chasing the last untestable percent.

07 · The tooling landscape 4 min

Pick a runner for the base,
a browser driver for the top.

The tools split along the pyramid. A test runner is the workhorse for the unit and integration layers; an end-to-end (E2E) framework drives a real browser for the thin top. You generally pick one of each and move on; don't agonize.

A test runner (the program that runs your tests) finds your test files, executes each one, and prints a green/red report (most also measure coverage). An E2E framework goes further: it opens a real browser it can click and type into, so you can check a whole user journey. Below are the leading options, with one upside and one catch each.
[Figure: tools by pyramid layer. E2E (few, slow): Playwright, Cypress. Integration (some) and unit (many, fast): Jest, Vitest, JUnit, pytest. One runner plus one driver.]

The same runner usually covers your unit and integration tests; an E2E framework sits on top for the few full-journey tests.

It's really two choices

  • A runner: one per language. JavaScript teams pick Jest or Vitest; Java uses JUnit; Python uses pytest. This is where most of your tests live.
  • An E2E framework: Playwright or Cypress. It launches a browser and acts like a user, so it only earns its keep on a few critical paths.
  • Whatever you choose, the habits from the rest of this deck (AAA, naming, red-before-green) matter far more than the brand on the tool.

Unit & integration runners

JavaScript · Jest

Jest

The long-time default for JS/TS. One install gives you the runner, assertions, mocking, and coverage together.

Pro: Batteries-included, with the biggest community and the most tutorials.

Con: Slower on large suites; native ES-module and TypeScript setup can be fiddly.

JavaScript · Vitest

Vitest

The modern challenger, built on the Vite bundler. Its API mirrors Jest, so moving over is mostly a find-and-replace.

Pro: Very fast watch mode; TypeScript and ES modules work out of the box.

Con: Younger, so a few Jest plugins and edge cases aren't covered yet.

Java · JUnit

JUnit

The de-facto standard for Java (JUnit 5). Every IDE and build tool (Maven, Gradle) speaks it natively.

Pro: Rock-solid, universal, and deeply integrated with the Java toolchain.

Con: Bare-bones on its own; you'll add AssertJ for readable checks and Mockito for doubles.

Python · pytest

pytest

The go-to for Python. You write a plain assert and it produces a detailed failure message for you.

Pro: Almost no boilerplate; powerful fixtures and a huge plugin ecosystem.

Con: Fixture and plugin "magic" can get hard to trace in a big test suite.

End-to-end browser drivers

E2E · Playwright

Playwright

Microsoft's browser-automation tool. It controls the browser from the outside and waits for elements automatically, which keeps tests steadier.

Pro: Fast, tests all three browser engines (Chromium, Firefox, WebKit), with a great trace viewer for debugging.

Con: So many features that the API takes a little while to learn.

E2E · Cypress

Cypress

Runs your test inside the browser, with a live time-travel view of every step — loved for its developer experience.

Pro: Superb debugging: watch each step replay and inspect the page at any point.

Con: Weaker multi-tab and cross-browser support; parallel runs lean on a paid dashboard.

How to choose: For a fresh JS/TS project (especially one already using Vite), start with Vitest; stay on Jest if you have a big existing Jest suite. Java teams use JUnit 5 (plus Mockito); Python teams use pytest. For end-to-end, default to Playwright for speed, cross-browser coverage, and CI; choose Cypress if your team values its in-browser debugging and you only ship to Chromium.

08 · Help vs. cement · recap 3 min

A test should catch the bug —
not freeze it in place.

The most dangerous test is the one that asserts the current wrong behavior. It turns a bug into a "requirement" nobody dares to change. Assert the intended behavior.

Cements the bug

    // the code is wrong; the test "documents" the wrong number
    test("shipping for 0 items", () => {
      expect(shippingFee([])).toBe(5)
    })
    // empty carts shouldn't ship at all, but now green
    // "proves" the bug. Fixing it breaks the suite.

Catches it (write this from the bug report)

    // a failing test that names the intended behavior
    test("empty cart has no shipping fee", () => {
      expect(shippingFee([])).toBe(0)
    })
    // red now → fix shippingFee → green. The test is the spec.

The regression-test rule: every bug fix starts with a test that fails because of the bug. Reproduce, then fix — so it can never come back unnoticed.
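The rule can be replayed end to end in a few lines. In the sketch below the flat fee of 5, the `Item` shape, and both function bodies are assumptions standing in for whatever the real `shippingFee` does; the regression test is written as a plain function so it can be run against the code before and after the fix.

```typescript
interface Item { name: string }

// Before the fix (hypothetical): a flat fee, even for an empty cart.
function shippingFeeBefore(_items: Item[]): number {
  return 5;
}

// After the fix: an empty cart ships nothing, so it costs nothing.
function shippingFee(items: Item[]): number {
  return items.length === 0 ? 0 : 5;
}

// The regression test, written from the bug report. Returns true when green.
function emptyCartHasNoShippingFee(fee: (items: Item[]) => number): boolean {
  return fee([]) === 0;
}

console.log(emptyCartHasNoShippingFee(shippingFeeBefore)); // false: red, the bug is reproduced
console.log(emptyCartHasNoShippingFee(shippingFee));       // true: green after the fix
```

Seeing the test fail against the old code is what proves it would catch the bug if it ever came back.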

1. Optimize for confidence, not coverage. A green number proves execution, not correctness.
2. Shape the pyramid. Many fast unit tests, fewer integration, a few E2E on the paths that matter.
3. One behavior, AAA, named in English. A test you can read in five seconds is a test that survives.
4. Test what, not how. Assert observable behavior; let refactors stay green. Reach for mocks sparingly.
5. Red before green. Whether full TDD or just bug fixes, see the test fail first, or you don't know it tests anything.

Keep going

  • Test-Driven Development: By Example, by Kent Beck
  • Growing Object-Oriented Software, Guided by Tests, by Steve Freeman and Nat Pryce
  • xUnit Test Patterns, by Gerard Meszaros (the double vocabulary)
  • "Test Desiderata", Kent Beck's 12 properties of good tests

One sentence to remember

"Write tests until fear is transformed into boredom."

— Kent Beck

