The incident
A revenue report had run every morning for a year: total the amount of every order that had not been refunded. It was written the way the requirement reads in English — orders whose ID is not in the set of refunded IDs.
SELECT SUM(amount)
FROM orders
WHERE order_id NOT IN (SELECT order_id FROM refunds);
Not a dip — zero. Every order vanished from the total at once, and nothing errored. The job ran green. The cause was a single row in refunds with a NULL order_id — a refund recorded before its order reference was populated. One NULL zeroed out an entire company's reported revenue.
NOT IN does not mean what its English implies when the value set contains a NULL. SQL does not use two truth values — it uses three: TRUE, FALSE, and UNKNOWN. A comparison against NULL returns UNKNOWN, and a WHERE clause keeps a row only when its condition is TRUE. Run the incident yourself below.
Lab · Reproduce the zero-revenue bug
Here is the exact dataset: three real orders, and a refunds table containing one genuine refund plus that one poisoned NULL row. Run each query and watch what the engine actually does.
orders
| order_id | amount |
|---|---|
| 101 | $300 |
| 102 | $150 |
| 103 | $250 |
refunds
| refund_id | order_id |
|---|---|
| 1 | 102 |
| 2 | NULL |
Run it against the poisoned data
EXISTS asks "does any row match?" — strictly TRUE or FALSE, never UNKNOWN. The NULL refund matches no order, so it excludes no order; a real refund still matches and excludes correctly. The query does what the English always meant. For "exclude rows that have a match," prefer NOT EXISTS — for correctness, not speed.
Lab · Why the row gets dropped
Trace the logic. order_id NOT IN (102, NULL) expands to order_id <> 102 AND order_id <> NULL. That last comparison is UNKNOWN for every row — and TRUE AND UNKNOWN is UNKNOWN. Explore how UNKNOWN propagates.
Three-valued logic
The rule that prevents the most bugs: a WHERE clause keeps a row only when its condition is TRUE. UNKNOWN is discarded as silently as FALSE — which is exactly how one NULL dropped every row.
Lab · The join that multiplied the money
NULL produces wrong answers by dropping rows. Joins produce them by adding rows. Join an orders table (one row per order) to its order-lines (many rows per order), then SUM the order amount — and watch the total inflate.
Fan-out / duplicate amplification
Order 101 has 3 lines; order 102 has 1. Toggle the join and watch o.amount get duplicated once per matching line.
orders (the "one")
order_lines (the "many")
Rows after the join
A measure has a grain, and you can only safely sum it at its own grain. The order total lives at the order grain; drag it across a join to the line grain and it multiplies by the number of lines. Know the cardinality of every join before you aggregate across it.
Before trusting a query in production
Run this on any query whose number someone will make a decision from. Tap to check.
You know your query is safe because you stopped trusting the English reading of the SQL and started reasoning in the engine's terms: three truth values, not two; joins as claims about cardinality you can be wrong about; measures summed only at their own grain; and a query plan you have actually read. The query that worked until it didn't worked by luck. A safe query works by construction.