Part 2 of Testing & TDD. The intro gave you the pyramid, the TDD loop, and the five test doubles. This session goes deep: generators that hunt edge cases, contracts between services, testing your tests, and the dark arts of time, concurrency, and the things you don't own.
This is Part 2: it builds straight on the intro deck, so we assume you're fluent with example-based tests and the arrange-act-assert shape. The leap here: instead of asserting an output for one hand-picked input, you state a property that must hold for all inputs, then let the framework manufacture hundreds of them, including the cruel ones you'd never think to type.
fast-check, Hypothesis, and jqwik carry the idea into JS, Python, and the JVM.
Example tests probe where you already looked. Generators probe where you didn't, then shrink the failure to a one-line repro.
NaN.
Generate → run → on failure, shrink toward the minimal counterexample, then report it with its seed.
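The generate, run, shrink, report loop can be sketched by hand in a few lines. This is a toy illustration only, not how fast-check or Hypothesis are implemented; `shrink`, `check`, and the deliberately false property are all hypothetical names made up for this sketch.

```python
import random

def shrink(xs):
    """Yield simpler candidates: shorter lists first, then smaller numbers."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]                      # drop one element
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [int(x / 2)] + xs[i + 1:]   # halve toward zero
            step = x - 1 if x > 0 else x + 1
            yield xs[:i] + [step] + xs[i + 1:]         # step toward zero

def check(prop, seed, runs=200):
    """Return None if prop held on every run, else a shrunk counterexample."""
    rng = random.Random(seed)                          # the seed makes the run replayable
    for _ in range(runs):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        if prop(xs):
            continue
        shrinking = True
        while shrinking:                               # greedy: take the first simpler failure
            shrinking = False
            for candidate in shrink(xs):
                if not prop(candidate):
                    xs, shrinking = candidate, True
                    break
        return {"seed": seed, "counterexample": xs}
    return None

# A deliberately false property: "no element is 50 or more".
report = check(lambda xs: all(x < 50 for x in xs), seed=1234)
```

Whatever random list first breaks the property, the greedy shrinker drops the irrelevant elements and walks the offending number down until one more step would make the property pass again.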
You rarely have an oracle that recomputes the right answer. These five archetypes are how experienced testers find properties without one.
Anything that serializes, parses, compresses, or encrypts should come back unchanged after the inverse operation: decode(encode(x)) equals x. It is the single most productive property to write, and it catches lossy edge cases (trailing whitespace, Unicode normalization, -0) for free.
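A minimal round-trip sketch using only the standard library, assuming a hand-rolled generator (`random_value` is a hypothetical helper; a real suite would use Hypothesis strategies instead). It restricts itself to data JSON can represent exactly, then shows one lossy case the property exposes.

```python
import json
import random

def random_value(rng, depth=0):
    """Generate nested JSON-compatible data (no floats, string keys only)."""
    kinds = ["int", "str", "bool", "none"]
    if depth < 3:
        kinds += ["list", "dict"]
    kind = rng.choice(kinds)
    if kind == "int":
        return rng.randint(-10**12, 10**12)
    if kind == "str":
        return "".join(chr(rng.randint(32, 0x2FF)) for _ in range(rng.randint(0, 8)))
    if kind == "bool":
        return rng.random() < 0.5
    if kind == "none":
        return None
    if kind == "list":
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {str(rng.randint(0, 99)): random_value(rng, depth + 1)
            for _ in range(rng.randint(0, 4))}

def round_trips(value):
    """The property: decoding the encoded value gives the value back."""
    return json.loads(json.dumps(value)) == value

rng = random.Random(2024)
all_ok = all(round_trips(random_value(rng)) for _ in range(500))

# A lossy case: JSON has no tuple type, so a tuple comes back as a list.
tuple_ok = round_trips((1, 2))
```

The tuple case is the kind of silent lossiness the property surfaces: the data survives, but not its type.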
After sort(xs): the result is the same length, is a permutation of the input, and each element ≤ the next. You assert the shape of correctness without recomputing the answer.
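The three sort invariants can be written as one reusable check. A sketch, assuming a hypothetical `insertion_sort` as the code under test; nothing here recomputes the answer with a trusted sort.

```python
from collections import Counter
import random

def insertion_sort(xs):
    """The implementation under test (hypothetical stand-in)."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def satisfies_sort_invariants(sort_fn, xs):
    ys = sort_fn(list(xs))
    same_length = len(ys) == len(xs)
    is_permutation = Counter(ys) == Counter(xs)           # same elements, same counts
    is_ordered = all(a <= b for a, b in zip(ys, ys[1:]))  # each element <= the next
    return same_length and is_permutation and is_ordered

def dedup_sort(xs):
    """A subtly wrong sort: ordered, but it drops duplicates."""
    return sorted(set(xs))

rng = random.Random(7)
samples = [[rng.randint(-5, 5) for _ in range(rng.randint(0, 12))] for _ in range(300)]
```

The permutation check is what catches `dedup_sort`: its output is perfectly ordered, so an ordering-only assertion would pass.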
Normalizing a path, trimming a string, applying a migration twice — doing it again should change nothing. A classic source of real-world bugs in "retry-safe" code.
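Idempotence is a one-line property: f(f(x)) equals f(x). The sketch below uses two hypothetical functions, one that is idempotent and one that double-escapes on a second pass, the kind of bug a retry would trigger.

```python
import random

def normalize_tags(tags):
    """Trim, lowercase, de-duplicate, and sort a list of tags."""
    return sorted({t.strip().lower() for t in tags if t.strip()})

def escape_ampersands(text):
    """Naive escaping: running it twice escapes the escape."""
    return text.replace("&", "&amp;")

def is_idempotent(fn, value):
    once = fn(value)
    return fn(once) == once

rng = random.Random(11)
tag_lists = [
    ["".join(rng.choice(" aAbB") for _ in range(rng.randint(0, 4)))
     for _ in range(rng.randint(0, 6))]
    for _ in range(300)
]
```

`escape_ampersands("a&b")` gives `a&amp;b` the first time and `a&amp;amp;b` the second, so the property fails on the first input that contains an ampersand.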
Test the fast implementation against a slow, naïve one you trust — or against the old code during a refactor. Also: commutativity (add(a,b) === add(b,a)) and comparison to a reference library.
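A sketch of the oracle pattern, assuming maximum-subarray-sum as the example problem (chosen for illustration, it is not from the deck): a linear-time implementation is compared against a brute-force version that is slow but obviously correct.

```python
import random

def max_subarray_fast(xs):
    """Kadane's algorithm, O(n). Assumes xs is non-empty."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best

def max_subarray_naive(xs):
    """Try every non-empty slice. Slow, but easy to trust."""
    n = len(xs)
    return max(sum(xs[i:j]) for i in range(n) for j in range(i + 1, n + 1))

def first_disagreement(seed, runs=300):
    """Return an input where the two implementations differ, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = [rng.randint(-20, 20) for _ in range(rng.randint(1, 12))]
        if max_subarray_fast(xs) != max_subarray_naive(xs):
            return xs
    return None
```

During a refactor the same shape works with the old function in the role of `max_subarray_naive`.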
When there's no oracle at all: assert a relationship. Searching for "cat" should return at least as many hits as "black cat". Scaling every price by 2 should double the cart total. Powerful for ML and search.
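Both metamorphic relations from the text can be sketched without knowing any correct output. `cart_total` and `search` are hypothetical stand-ins for the real system; prices are integer cents so doubling is exact.

```python
def cart_total(items):
    """items: list of (price_in_cents, quantity) pairs."""
    return sum(price * qty for price, qty in items)

def search(docs, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    return [d for d in docs if all(w in d.lower().split() for w in words)]

def doubling_relation_holds(items):
    doubled = [(price * 2, qty) for price, qty in items]
    return cart_total(doubled) == 2 * cart_total(items)

def narrowing_relation_holds(docs, broad, narrow):
    """A narrower query must never return more hits than the broad one."""
    return len(search(docs, broad)) >= len(search(docs, narrow))

docs = ["black cat", "cat nap", "black dog", "the cat sat"]
items = [(199, 2), (500, 1)]
```

Neither check states what the total or the hit list should be; each only asserts how the output must move when the input changes.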
Model-based / stateful testing generates random sequences of commands against your system and a tiny in-memory model, asserting they agree at every step — fast-check's fc.commands, Hypothesis' RuleBasedStateMachine. It finds ordering and concurrency bugs a single-call property never would.
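A hand-rolled sketch of the idea, not the fc.commands or RuleBasedStateMachine API: random push/pop commands run against a hypothetical `RingQueue` and a plain list acting as the model, with agreement asserted after every step.

```python
import random

class RingQueue:
    """System under test: a fixed-capacity FIFO queue on a circular buffer."""
    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0
        self._size = 0

    def push(self, x):
        if self._size == len(self._buf):
            return False                      # full: reject
        self._buf[(self._head + self._size) % len(self._buf)] = x
        self._size += 1
        return True

    def pop(self):
        if self._size == 0:
            return None                       # empty
        x = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return x

    def __len__(self):
        return self._size

def run_model_check(seed, steps=200, capacity=4):
    """Drive the queue and a list model with the same random commands."""
    rng = random.Random(seed)
    real, model = RingQueue(capacity), []
    for _ in range(steps):
        if rng.choice(["push", "pop"]) == "push":
            x = rng.randint(0, 9)
            expected = len(model) < capacity
            if expected:
                model.append(x)
            assert real.push(x) == expected
        else:
            expected = model.pop(0) if model else None
            assert real.pop() == expected
        assert len(real) == len(model)        # agree at every step
    return steps
```

The wrap-around index arithmetic is exactly the kind of code where a bug only appears after a particular sequence of pushes and pops, which is why the sequence is generated rather than hand-written.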
Two services agree on an API. Service B changes a field; Service A breaks in production — and no unit test caught it, because each side mocked the other with its own assumptions. Contract testing pins those assumptions to a shared, verifiable artifact.
The consumer's expectations become a file the provider must honor — checked in two separate pipelines, never live together.
can-i-deploy queries the broker: "is every consumer of this version still compatible?" before you ship.
Match on type, not exact value: the contract pins the shape the consumer depends on, and tolerates new data it ignores.
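To make "match on type, not exact value" concrete, here is a toy matcher. It is an illustration of the idea only and is not Pact's matcher API; `matches_shape` and the sample payloads are hypothetical.

```python
def matches_shape(body, shape):
    """True if every field the consumer relies on is present with the right type.

    Extra fields in body are ignored: the consumer does not depend on them.
    """
    for field, expected in shape.items():
        if field not in body:
            return False
        value = body[field]
        if isinstance(expected, dict):
            if not isinstance(value, dict) or not matches_shape(value, expected):
                return False
        elif not isinstance(value, expected):
            return False
    return True

# What the consumer actually reads from the response.
consumer_shape = {"id": int, "name": str, "address": {"city": str}}

ok_response = {"id": 7, "name": "Ada", "address": {"city": "Paris", "zip": "75001"},
               "loyalty_points": 120}            # new fields are tolerated
renamed_field = {"id": 7, "full_name": "Ada", "address": {"city": "Paris"}}
changed_type = {"id": "7", "name": "Ada", "address": {"city": "Paris"}}
```

The provider can add `loyalty_points` freely, but renaming `name` or turning `id` into a string breaks the shape the consumer pinned.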
Each interaction names a precondition. On the provider side you wire that state name to a fixture (seed the row, stub the dep) so verification is deterministic.
The broker (PactFlow or self-hosted) tracks which consumer and provider versions verified against each other. can-i-deploy gates the release on a green matrix.
Pact covers message contracts (Kafka, queues) too. For schema-only guarantees, OpenAPI / Protobuf + buf breaking-change checks complement Pact: they verify shape, not consumer intent.
Coverage tells you a line ran. It says nothing about whether your test would notice if that line were wrong. Mutation testing measures exactly that — by deliberately breaking your code and checking whether your suite screams.
A mutation tool makes small edits (mutants) to your code: < becomes <=, + becomes -, a return true flips, a statement is deleted. It runs your suite against each one. If a test fails, the mutant is killed (good: your tests caught the bug). If every test still passes, the mutant survived: a real assertion gap.
age >= 17 survives → no test exercises the boundary at 17. That survivor is a precise instruction for the test to write next.
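The surviving-mutant example can be replayed by hand. This sketch assumes the original check was `age >= 18` (the deck only shows the mutant) and represents a test suite as a list of (input, expected) pairs; real tools such as Stryker or PIT generate and run the mutants for you.

```python
def is_adult(age):
    """Original code (assumed threshold: 18)."""
    return age >= 18

def mutant(age):
    """The mutated version: the constant has been changed to 17."""
    return age >= 17

def survives(candidate, suite):
    """A candidate survives when every test in the suite still passes."""
    return all(candidate(age) == expected for age, expected in suite)

# Covers both branches, so line coverage is 100%, but never touches the boundary.
weak_suite = [(30, True), (5, False)]

# The survivor tells us which tests to add: the values either side of the boundary.
strong_suite = weak_suite + [(17, False), (18, True)]
```

With `weak_suite` both versions pass, so the mutant survives; adding the two boundary cases kills it while the original still passes.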
Mutation score: killed / (total − equivalent). A survivor is a place your code could be wrong and no test would tell you. It is a far stronger signal than line coverage, and far slower to compute.
i < n vs i != n in a loop that only ever increments produces identical results, so no test can kill it. Detecting equivalent mutants is undecidable in general, so tools cannot subtract them automatically; in a reported score they sit in the denominator as unkillable noise, which is why 100% is the wrong target. Chase survivors that represent real behavior, not the last few percent.
It's slow: every mutant reruns (a slice of) your suite, so a full run can take hours. Make it practical: run it on changed files only in CI (Stryker's incremental mode, PIT's history), set a sane threshold (break the build below, say, 60% on touched code), and treat the survivor list as a to-do, not a vanity metric.
Most suite rot isn't bad assertions — it's unmanaged test data and hidden coupling. A 40-line setup block, a shared mutable fixture, an order-dependent pass: these are the things that make a green suite untrustworthy. Architect them deliberately.
aVipCustomer()) and factories (Factory Boy, fishery, FactoryBot) that also persist to a DB.
One factory, many variants. The test names only the field under test; everything else is a trusted default.
Scoped setup/teardown: a temp DB, a logged-in client, a seeded row. pytest's @pytest.fixture with function/module/session scope is the model: declare a dependency, the runner wires it in and tears it down. Risk: a session-scoped fixture mutated by one test leaks into the next.
Produces fresh, independent instances on demand — optionally persisted. Prefer factories over shared fixtures for entities: each test owns its data, so there's nothing to leak. Reserve fixtures for expensive infrastructure (the container, the connection pool).
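A minimal in-memory factory sketch in plain Python, assuming a hypothetical `Customer` entity; libraries like Factory Boy add persistence and sequences on top of the same idea. Each call returns a fresh object, and the test overrides only the field it cares about.

```python
from dataclasses import dataclass, field
import itertools

_next_id = itertools.count(1)

@dataclass
class Customer:
    id: int
    name: str = "Test Customer"
    tier: str = "standard"
    tags: list = field(default_factory=list)   # a new list per instance, never shared

def make_customer(**overrides):
    """Build a valid customer; callers name only the fields under test."""
    values = {"id": next(_next_id), **overrides}
    return Customer(**values)

def a_vip_customer(**overrides):
    """A named variant layered on the same factory."""
    return make_customer(**{"tier": "vip", **overrides})

first = make_customer()
second = make_customer(name="Grace")
vip = a_vip_customer()
first.tags.append("mutated-in-one-test")       # does not leak into other instances
```

Because every test builds its own instance, mutating `first` cannot affect `second`, which is the leak a shared fixture would allow.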
Every flake traces to non-determinism. Remove the source; don't paper over it with retry(3).
pytest-randomly) to expose hidden coupling.
The intro deck introduced the five doubles. Now the hard cases: code that depends on the wall clock, on race conditions, on a third-party API. The unifying move is the same every time: turn the uncontrollable thing into an injected dependency you command from the test.
Date.now() / datetime.now() sprinkled in business logic makes "expires in 30 days" untestable. Inject a Clock and advance it by hand, or freeze it with fake timers (vi.useFakeTimers, sinon), freezegun (Python), or a fixed java.time.Clock.
sleep(500) is the #1 cause of flake — too short it races, too long it crawls. Await a condition (the element, the event, the queue depth). For races, use a deterministic scheduler or controllable fake timers so you decide interleavings.
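"Await a condition" can be as small as a polling helper with a deadline. A sketch using only the standard library; `wait_until` is a hypothetical helper name, and test frameworks often ship their own equivalent.

```python
import threading
import time

def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate until it is true or the timeout elapses.

    Returns as soon as the condition holds, so the common case is fast,
    and the timeout only bounds the failure case.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()                 # one last check at the deadline

# Simulated background work that finishes after roughly 50 ms.
done = threading.Event()
threading.Timer(0.05, done.set).start()

finished = wait_until(done.is_set)     # returns shortly after the event fires
never = wait_until(lambda: False, timeout=0.05)
```

Unlike a fixed `sleep(500)`, the wait ends when the work ends, and a generous timeout costs nothing on the happy path.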
Wrap the third-party SDK behind your interface, then test against a fake of that interface. Plus real-ish options: Testcontainers (real Postgres/Kafka in Docker), WireMock / MSW for HTTP, record-replay cassettes (VCR).
Inject the clock and the "expires" logic is just data — real clock in prod, frozen clock in the test.
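A sketch of clock injection in plain Python, assuming a hypothetical 30-day trial rule. `SystemClock`, `FakeClock`, and `is_expired` are illustrative names; freezegun or a fixed java.time.Clock achieve the same effect without the hand-written fake.

```python
from datetime import datetime, timedelta, timezone

class SystemClock:
    """Production clock: reads the real wall clock."""
    def now(self):
        return datetime.now(timezone.utc)

class FakeClock:
    """Test clock: time moves only when the test says so."""
    def __init__(self, start):
        self._now = start

    def now(self):
        return self._now

    def advance(self, delta):
        self._now += delta

TRIAL_LENGTH = timedelta(days=30)

def is_expired(started_at, clock):
    """Business rule under test: the clock is a parameter, not a global."""
    return clock.now() >= started_at + TRIAL_LENGTH

clock = FakeClock(datetime(2026, 1, 1, tzinfo=timezone.utc))
started_at = clock.now()

fresh = is_expired(started_at, clock)                 # day 0
clock.advance(timedelta(days=29, hours=23, minutes=59, seconds=59))
one_second_before = is_expired(started_at, clock)     # still inside the trial
clock.advance(timedelta(seconds=1))
at_boundary = is_expired(started_at, clock)           # exactly 30 days
```

The boundary second is tested in microseconds of real time, with no sleeping and no dependence on when the suite runs.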
Run the same contract suite against the fake and the real thing. If both pass, your fake is faithful — and fast unit tests stay trustworthy.
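A sketch of one suite run against both implementations. The key-value interface is hypothetical, and an in-memory SQLite database stands in for "the real thing" so the example stays self-contained; in practice that slot would be a Testcontainers-managed database or the actual SDK.

```python
import sqlite3

class InMemoryKV:
    """The fast fake used by unit tests."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

class SqliteKV:
    """Stand-in for the real adapter, backed by an actual SQL engine."""
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self._db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def get(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def delete(self, key):
        self._db.execute("DELETE FROM kv WHERE k = ?", (key,))

def kv_contract(make_store):
    """The shared suite: every implementation must pass the same assertions."""
    store = make_store()
    assert store.get("missing") is None
    store.put("a", "1")
    assert store.get("a") == "1"
    store.put("a", "2")                # overwrite, not duplicate
    assert store.get("a") == "2"
    store.delete("a")
    assert store.get("a") is None
    store.delete("a")                  # deleting a missing key is a no-op
    return True

results = [kv_contract(factory) for factory in (InMemoryKV, SqliteKV)]
```

If someone later changes the fake so that overwriting raises, the suite fails for the fake alone, which flags the drift before unit tests start lying.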
One representative tool per technique, per ecosystem — with the honest trade-off and the moment to reach for it. All are mature and actively maintained as of 2026.
Pro: first-class TS types, excellent shrinking, integrates with Jest/Vitest, stateful fc.commands.
Con: writing good generators for complex domains takes practice.
Reach for it on any JS/TS pure logic: parsers, pricing, encoders.
Pro: the gold standard — rich strategies, a saved example database that replays past failures, RuleBasedStateMachine.
Con: generation can be slow; tune settings deadlines.
Reach for it as the default for any Python logic worth testing hard.
Pro: a JUnit 5 engine, so it drops into existing Java/Kotlin suites; integrated shrinking.
Con: smaller community; more boilerplate than Hypothesis.
Reach for it when you're already on JUnit 5 and want PBT alongside.
Pro: the de-facto consumer-driven standard; clients in every major language; HTTP + messaging; a broker with can-i-deploy.
Con: a real learning curve; provider states and broker ops add moving parts.
Reach for it when independent teams ship services on separate cadences.
Pro: producer-driven, deeply integrated with Spring; generates stubs for consumers.
Con: Spring-centric; awkward outside the JVM ecosystem.
Reach for it in an all-Spring estate where the provider leads.
Pro: cheap breaking-change detection on REST/gRPC; no consumer test needed.
Con: verifies shape, not what a consumer actually relies on.
Reach for it as a fast first line; pair with Pact for intent.
Pro: mature, great HTML reports, incremental mode for CI, also .NET & Scala flavors.
Con: slow on big suites without scoping to changed files.
Reach for it to audit a critical module's tests, on the diff in CI.
Pro: fast (bytecode mutation), the JVM standard, history-driven incremental runs.
Con: Java-centric config; report tuning takes effort.
Reach for it as the default mutation tool on any JVM project.
Pro: simple to drop in; clear survivor diffs to act on.
Con: notably slow; best scoped to a package, not the whole repo.
Reach for it to harden the tests around one high-stakes Python module.
Pro: cross-browser, auto-waiting (kills most flake), traces & video, parallel sharding, first-class API + component testing.
Con: still the slowest, most expensive band — keep it thin.
Reach for it for a handful of critical user-journey smoke tests — not feature coverage.
Pro: a real Postgres / Kafka / Redis in Docker, disposable per run — high fidelity, no shared env.
Con: needs Docker; slower than an in-memory fake.
Reach for it to verify the real adapter (and your fake) against the genuine engine.
Pro: stub external HTTP at the network edge; MSW shares mocks between tests and the browser.
Con: stubs encode your assumptions — pair with contract tests for the ones you own.
Reach for it to isolate from third-party HTTP you can't run locally.
A property test and a contract test, end to end — the two techniques most likely to change how your team works this quarter.
No example test would've tried 1 ÷ 3. The shrinker hands you the exact minimal repro — a classic remainder bug.
can-i-deploy.