A 38-minute working session on tests that earn their keep: the pyramid, the anatomy of a good test, the red-green-refactor loop, test doubles, the tools that run it all, and how to write tests that catch bugs instead of cementing them.
Tests aren't about proving you're right today. They're about the next person — often future-you — being able to change this code without holding their breath. A bug is cheapest to catch the moment you write it, and the cost only climbs from there.
The cost to fix a bug climbs at each stage: lowest when it is caught while you're typing it, higher once it's merged and someone builds on it, higher still once it's in production and paged at 2 a.m. Worst of all is the bug that ships silently and erodes trust in the data.
The goal isn't 100% coverage. It's confidence per minute spent.
Not all tests cost the same. The pyramid is a budget: push most of your checks down to the cheap, fast layer, and reserve the slow, brittle layer for the handful of journeys that truly need it.
A common rule of thumb is roughly 70% unit, 20% integration, 10% end-to-end — most of your budget at the fast base, thinning out toward the slow, full-system top.
Higher up = more realistic, but slower, flakier, and harder to debug. Confidence has a price; the pyramid keeps the bill sane.
Most teams drift into the inverted pyramid: a thick layer of slow end-to-end tests, almost no unit tests. The suite takes 40 minutes, fails randomly, and nobody trusts the red.
Push logic down: test the pricing rule as a unit, not by clicking through checkout. Keep E2E for a few critical paths — sign-up, checkout, the money path.
Like proofreading: spell-check every sentence, but only read the whole essay aloud a couple of times.
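As a rough illustration of pushing logic down, the pricing rule can be pulled into a pure function and checked in milliseconds, with no browser or checkout flow involved. The function name `order_total` and the 10% member discount at a 100.00 threshold are assumptions made up for this sketch, not rules from any real system.

```python
from decimal import Decimal


def order_total(subtotal: Decimal, is_member: bool) -> Decimal:
    """Hypothetical pricing rule: members get 10% off orders of 100.00 or more."""
    if is_member and subtotal >= Decimal("100.00"):
        return (subtotal * Decimal("0.90")).quantize(Decimal("0.01"))
    return subtotal


def test_member_gets_ten_percent_off_at_threshold():
    assert order_total(Decimal("100.00"), is_member=True) == Decimal("90.00")


def test_member_below_threshold_pays_full_price():
    assert order_total(Decimal("99.99"), is_member=True) == Decimal("99.99")


def test_non_member_pays_full_price():
    assert order_total(Decimal("100.00"), is_member=False) == Decimal("100.00")
```

With the rule covered here, a single end-to-end checkout test only has to prove the pieces are wired together, not re-verify every discount case.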
A test you can read in five seconds is a test people will keep passing. Structure it the same way every time, name it after the behavior, and assert one thing.
The test name is documentation. A reader should know what broke without opening the body. Describe the scenario and the expected outcome — never test1.
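One common way to "structure it the same way every time" is Arrange-Act-Assert (AAA). The sketch below assumes a tiny hypothetical `Cart` class invented for illustration; the point is the three-part layout and a name that states the scenario and the expected outcome.

```python
class Cart:
    """Hypothetical system under test: an in-memory shopping cart."""

    def __init__(self):
        self._items = []

    def add(self, name, price_cents, quantity=1):
        self._items.append((name, price_cents, quantity))

    def total_cents(self):
        return sum(price * qty for _, price, qty in self._items)


def test_total_sums_price_times_quantity_for_each_item():
    # Arrange: build the state the scenario needs.
    cart = Cart()
    cart.add("pen", price_cents=150, quantity=2)
    cart.add("notebook", price_cents=400)

    # Act: perform the one action under test.
    total = cart.total_cents()

    # Assert: check one observable outcome.
    assert total == 700
```

If this test fails, the name alone says which behavior broke: the total no longer equals price times quantity summed over items.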
Break any one and the suite gets slower, flakier, or quietly useless. Isolated and Repeatable are the ones teams violate most — usually via shared databases and the clock.
TDD inverts the usual order: write the failing test first, make it pass with the simplest code, then clean up. The test isn't an afterthought — it's the spec you're coding toward.
Small loops, minutes each. The test goes red before the code exists, then drives it green.
The discipline forces tiny steps. You're never more than a few minutes from a known-good state.
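One turn of the loop might look like the sketch below. The `slugify` function is a hypothetical example chosen for brevity; the comments mark which step of red-green-refactor each piece belongs to.

```python
# Step 1 (red): this test is written first. Run before slugify exists,
# it fails with a NameError, which confirms the test can actually fail.
def test_slugify_lowercases_and_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


# Step 2 (green): the simplest code that makes the test pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# Step 3 (refactor): tidy names and duplication while the test stays green,
# then start the next loop with a new failing test for the next behavior.
def test_slugify_collapses_repeated_whitespace():
    assert slugify("  Hello   World ") == "hello-world"
```

Each loop adds one small behavior, so a red result always points at the last few minutes of work.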
TDD's real gift isn't the tests — it's that hard-to-test code is hard-to-test for a reason, and writing the test first surfaces that pain while it's still cheap to fix.
To test a unit in isolation you replace its real dependencies — the database, the payment gateway, the clock — with controllable stand-ins. "Mock" gets used for all of them, but the five types do different jobs.
Inject the double in place of the real gateway. The SUT can't tell the difference — that's what dependency injection buys you.
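The sketch below assumes the five types are the commonly cited dummy, stub, spy, mock, and fake, and shows each one injected into a hypothetical `CheckoutService`. All class names here are invented for illustration; only `unittest.mock.Mock` comes from the Python standard library.

```python
from unittest.mock import Mock


class CheckoutService:
    """Hypothetical SUT: charges an injected gateway and logs the result."""

    def __init__(self, gateway, logger):
        self._gateway = gateway
        self._logger = logger

    def pay(self, amount_cents: int) -> bool:
        ok = self._gateway.charge(amount_cents)
        self._logger.info(f"charge {amount_cents}: {'ok' if ok else 'declined'}")
        return ok


class DummyLogger:
    """Dummy: fills a required parameter; no test inspects it."""

    def info(self, message):
        pass


class DecliningGatewayStub:
    """Stub: returns a canned answer to steer the SUT down one path."""

    def charge(self, amount_cents):
        return False


class GatewaySpy:
    """Spy: records calls so the test can inspect them afterwards."""

    def __init__(self):
        self.charges = []

    def charge(self, amount_cents):
        self.charges.append(amount_cents)
        return True


class FakeGateway:
    """Fake: a working but simplified implementation (in-memory balance)."""

    def __init__(self, balance_cents):
        self.balance_cents = balance_cents

    def charge(self, amount_cents):
        if amount_cents > self.balance_cents:
            return False
        self.balance_cents -= amount_cents
        return True


def test_pay_returns_false_when_gateway_declines():
    service = CheckoutService(DecliningGatewayStub(), DummyLogger())
    assert service.pay(500) is False


def test_pay_charges_the_requested_amount():
    spy = GatewaySpy()
    CheckoutService(spy, DummyLogger()).pay(500)
    assert spy.charges == [500]


def test_pay_is_declined_once_fake_balance_runs_out():
    service = CheckoutService(FakeGateway(balance_cents=700), DummyLogger())
    assert service.pay(500) is True
    assert service.pay(500) is False


def test_pay_calls_gateway_exactly_once():
    # Mock: the expectation about the interaction is verified on the double.
    mock_gateway = Mock()
    mock_gateway.charge.return_value = True
    CheckoutService(mock_gateway, DummyLogger()).pay(500)
    mock_gateway.charge.assert_called_once_with(500)
```

Because the gateway arrives through the constructor, swapping doubles needs no patching of globals; the production code path is the same one the tests exercise.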
A green 100% can still ship bugs, and a flaky suite is worse than no suite. Knowing what not to test is as important as knowing what to.
A flaky test passes and fails without any code change. Each false alarm teaches the team to ignore red — and a suite nobody trusts is dead weight.
Date.now(), timezones, sleeps. Inject a clock.
sleep(500) instead of waiting for a condition.
Like a smoke alarm that chirps at random: people rip the battery out, and then it can't warn you.
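Two of these flakiness sources can be sketched in a few lines: an injected clock that the test controls, and a polling helper that waits for a condition instead of sleeping for a fixed time. `FixedClock`, `Session`, and `wait_until` are hypothetical names for this illustration, not part of any library.

```python
import time
from datetime import datetime, timedelta, timezone


class FixedClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, now):
        self._now = now

    def now(self):
        return self._now

    def advance(self, delta):
        self._now += delta


class Session:
    """Hypothetical SUT that asks an injected clock for the time."""

    def __init__(self, clock, ttl=timedelta(minutes=30)):
        self._clock = clock
        self._expires_at = clock.now() + ttl

    def is_expired(self):
        return self._clock.now() >= self._expires_at


def test_session_expires_once_ttl_has_elapsed():
    clock = FixedClock(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
    session = Session(clock)

    clock.advance(timedelta(minutes=29))
    assert not session.is_expired()

    clock.advance(timedelta(minutes=1))
    assert session.is_expired()


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll until condition() is true or the timeout passes; return the result."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()
```

The clock test runs instantly and gives the same answer on any machine or timezone; the polling helper returns as soon as the condition holds rather than always paying for the worst case.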
Test observable behavior at a stable boundary: given these inputs and this state, the unit produces this output or this visible effect. If a correctness-preserving refactor turns a test red, that test was checking how, not what.
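To make the distinction concrete, here is a sketch with one test of each kind against a hypothetical `ReportFormatter`. The first asserts what comes out; the second pins a private helper and would turn red if that helper were inlined or renamed, even though the output stayed identical.

```python
from unittest.mock import patch


class ReportFormatter:
    """Hypothetical SUT: renders a list of names as one line."""

    def render(self, names):
        return ", ".join(self._normalize(n) for n in names)

    def _normalize(self, name):
        return name.strip().title()


def test_render_trims_and_title_cases_names():
    # Checks WHAT: given these inputs, this output. Survives refactoring.
    result = ReportFormatter().render(["  ada lovelace", "alan TURING "])
    assert result == "Ada Lovelace, Alan Turing"


def test_render_calls_normalize_for_each_name():
    # Checks HOW: couples the test to a private method. Shown as the
    # pattern to avoid, not as a recommendation.
    formatter = ReportFormatter()
    with patch.object(formatter, "_normalize", wraps=formatter._normalize) as helper:
        formatter.render(["ada", "alan"])
    assert helper.call_count == 2
```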
Aim coverage at a floor that catches obvious gaps (say 70–80% on changed code), then spend the saved energy on better assertions and edge cases — boundaries, empties, errors — not chasing the last untestable percent.
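The edge cases named above (boundaries, empties, errors) can be laid out as a small table of inputs and expected outputs. The `chunk` function is a hypothetical example; the table-driven loop uses plain Python so it runs anywhere, though a runner feature such as pytest's `parametrize` would report each row as its own test.

```python
def chunk(items, size):
    """Hypothetical SUT: split a list into consecutive pieces of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# (label, items, size, expected)
CASES = [
    ("empty input", [], 2, []),
    ("shorter than one chunk", [1], 2, [[1]]),
    ("exactly one chunk (boundary)", [1, 2], 2, [[1, 2]]),
    ("one past the boundary", [1, 2, 3], 2, [[1, 2], [3]]),
]


def test_chunk_handles_empties_and_boundaries():
    for label, items, size, expected in CASES:
        assert chunk(items, size) == expected, label


def test_chunk_rejects_non_positive_size():
    for bad_size in (0, -1):
        try:
            chunk([1, 2], bad_size)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for size={bad_size}")
```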
The tools split along the pyramid. A test runner is the workhorse for the unit and integration layers; an end-to-end (E2E) framework drives a real browser for the thin top. You generally pick one of each and move on; don't agonize.
The same runner usually covers your unit and integration tests; an E2E framework sits on top for the few full-journey tests.
Jest: the long-time default for JS/TS. One install gives you the runner, assertions, mocking, and coverage together.
Pro: Batteries-included, with the biggest community and the most tutorials.
Con: Slower on large suites; native ES-module and TypeScript setup can be fiddly.
Vitest: the modern challenger, built on the Vite bundler. Its API mirrors Jest, so moving over is mostly a find-and-replace.
Pro: Very fast watch mode; TypeScript and ES modules work out of the box.
Con: Younger, so a few Jest plugins and edge cases aren't covered yet.
JUnit: the de facto standard for Java (JUnit 5). Every IDE and build tool (Maven, Gradle) speaks it natively.
Pro: Rock-solid, universal, and deeply integrated with the Java toolchain.
Con: Bare-bones on its own; you'll add AssertJ for readable checks and Mockito for doubles.
pytest: the go-to for Python. You write a plain assert and it produces a detailed failure message for you.
Pro: Almost no boilerplate; powerful fixtures and a huge plugin ecosystem.
Con: Fixture and plugin "magic" can get hard to trace in a big test suite.
Playwright: Microsoft's browser-automation tool. It controls the browser from the outside and waits for elements automatically, which keeps tests steadier.
Pro: Fast, tests all three browser engines (Chromium, Firefox, WebKit), with a great trace viewer for debugging.
Con: So many features that the API takes a little while to learn.
Cypress: runs your test inside the browser, with a live time-travel view of every step. It is loved for its developer experience.
Pro: Superb debugging: watch each step replay and inspect the page at any point.
Con: Weaker multi-tab and cross-browser support; parallel runs lean on a paid dashboard.
The most dangerous test is the one that asserts the current wrong behavior. It turns a bug into a "requirement" nobody dares to change. Assert the intended behavior.
The regression-test rule: every bug fix starts with a test that fails because of the bug. Reproduce, then fix — so it can never come back unnoticed.
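A regression test following this rule might look like the sketch below. The bug is hypothetical: assume an earlier `parse_price` passed its input straight to `Decimal`, which raises on a thousands separator. The test is written first, fails against that old code, and passes once the fix lands.

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Parse a price string such as "1,299.00".

    Fixed version: the hypothetical buggy version called Decimal(text)
    directly, which raises on "1,299.00" because of the comma.
    """
    return Decimal(text.replace(",", "").strip())


def test_parse_price_accepts_thousands_separator():
    # Reproduces the reported input exactly, and asserts the intended
    # behavior rather than whatever the old code happened to do.
    assert parse_price("1,299.00") == Decimal("1299.00")


def test_parse_price_still_accepts_plain_numbers():
    # Guards the behavior that worked before the fix.
    assert parse_price("42.50") == Decimal("42.50")
```

Keeping the reproducing input in the test means the same bug cannot return without turning the suite red.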
"Test until fear turns to boredom."
— Kent Beck