The incident
An executive dashboard showed "active users." The company ran on it — quarterly targets, headcount, the board deck. It had been stable and plausible for a year, which is exactly why nobody watched it: a number that has been right for a year earns a trust that makes it dangerous.
Upstream, a different team changed how sessions were recorded — a small, sensible change, shipped without announcement. It quietly altered what "a session" meant, and therefore what "active user" counted. The number drifted just enough to be wrong, not enough to look broken, and settled at a new, believable baseline.
Types matched, schema loaded, dashboard rendered, number looked plausible. Decisions were made on it for weeks. It was caught by an analyst who happened to cross-check against billing. The expensive loss was not the metric — it was the trust. Once one number was known wrong and undetected, every number became suspect.
Two failures compounded, neither a bug: the pipeline trusted its source unconditionally, and "active user" was never given a single owned definition. You cannot detect that a number is wrong if you never agreed what right was. The thesis of the whole book, in its sharpest form: a requirements failure wearing the costume of a data-quality incident.
Lab · One metric, three honest answers
Here is one week of identical raw activity. Three reasonable people define "active user" three reasonable ways. Pick a definition and watch the number change — none of them is a bug.
"Active users" — choose the definition
All three, from the same raw data:
None of these numbers is wrong — they answer different questions that share a name. The failure is not the SQL; it is shipping a metric with no single, written, owned definition. The engineer's responsibility is to force the definition into the open and pin it to one owner before the dashboard exists, so "active users" means one thing everywhere.
Lab · The silent drift, and the guard that catches it
The upstream meaning of a session changed. Toggle whether the pipeline validates its input at the boundary — and watch the difference between failing loud and drifting silent.
Validate at the boundary · fail loud
upstream source
changed silently
your pipeline
dashboard
wrong number
A check at the boundary on the data's statistical shape (not just its types) would have caught a drift that schema validation never could. The deeper fix is a data contract: a versioned agreement with the producing team covering schema, semantics, and change notification — so a "small internal change" upstream cannot silently redefine your metric.
Lab · The dimensions of data quality
"Quality" is not one thing. The easy dimensions are checked by default; the hard ones are where silent wrongness hides. Tap each.
Six dimensions
Before trusting a number on a dashboard
Tap to check each off.
Your data can be trusted when every metric has one written definition and a named owner, sources are untrusted until validated at the boundary on shape as well as type, critical feeds have contracts, numbers are reconciled against independent sources, checks fail loud, and you have elicited the real requirement behind the request. The dashboard was wrong because of an unstated assumption and an unresolved definition — not complex technology. That is the thesis of the whole book: data failures are requirements failures in disguise.