The incident
A single function had grown to three hundred lines. It read files, called an API, computed business logic, mutated a list its caller still held, wrote to a database, and read the current date in the middle to decide what to compute. It worked — until the API was slow, a job was retried, and the numbers came out doubled. Nobody could debug it, because there was no way to test any one part without running the whole thing against live systems.
It was not one bug. It was five ordinary properties violated at once — impure (logic tangled with I/O, untestable), mutating shared state, not idempotent (a retry doubled revenue), eager (loaded gigabytes to count a few rows), and uncomposed (one monolith instead of named stages). Each is cheap to avoid and expensive to retrofit.
The five properties that keep code alive
Each failure in the chapter maps to one missing property.
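As one illustration of the mutation property, the sketch below contrasts a function that edits the list its caller still holds with one that returns a new list. The function names and the `(day, amount)` tuple rows are assumptions made for this example, not code from the incident.

```python
def drop_refunds_in_place(rows):
    """Trap: removes refund rows from the caller's own list."""
    rows[:] = [r for r in rows if r[1] >= 0]
    return rows


def without_refunds(rows):
    """Discipline: builds a new list and leaves the input untouched."""
    return [r for r in rows if r[1] >= 0]


orders = [("2024-01-01", 100), ("2024-01-01", -30), ("2024-01-02", 50)]

clean = without_refunds(orders)
assert len(orders) == 3   # the caller's data is unchanged
assert len(clean) == 2

drop_refunds_in_place(orders)
assert len(orders) == 2   # the caller's data changed behind its back
```

The in-place version is not wrong in isolation. It becomes a trap when some other code still holds `orders` and expects the refund rows to be there.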
Lab · The retry that doubled revenue
A daily job loads one day's revenue total. The scheduler retries late jobs; engineers re-kick failures. So the operation will run more than once for the same day.
Append vs. delete-then-insert
Strategy A · append
target.append(row): adds a delta
Strategy B · delete-then-insert
replace_day(target, row): establishes end state
In a system where retries are routine, an operation that cannot run twice is a bug that has not happened yet. Design a transform to establish the correct end state for its slice (delete-then-insert, or an upsert keyed on that slice), not to append a delta to whatever was there. "It worked when it ran once" is not the bar; "it is correct after running an unknown number of times" is.
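The two strategies can be sketched with an in-memory list standing in for the target table. The body of `replace_day`, the dict-shaped rows, and the helper `total_for_day` are assumptions made for illustration; the original names only the two calls.

```python
def append_day(target, row):
    """Strategy A: add a delta to whatever is already there."""
    target.append(row)


def replace_day(target, row):
    """Strategy B: delete-then-insert, keyed on the row's day."""
    day = row["day"]
    target[:] = [r for r in target if r["day"] != day]  # delete the slice
    target.append(row)                                   # insert the end state


def total_for_day(target, day):
    """Sum the revenue stored for one day."""
    return sum(r["revenue"] for r in target if r["day"] == day)


row = {"day": "2024-03-01", "revenue": 1000}

appended, replaced = [], []
for _ in range(3):  # the original run plus two retries
    append_day(appended, row)
    replace_day(replaced, row)

print(total_for_day(appended, "2024-03-01"))  # 3000
print(total_for_day(replaced, "2024-03-01"))  # 1000
```

Against a real database, the delete and the insert would normally run in one transaction so that a crash between them cannot leave the day empty.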
Lab · Why nobody could test it
A function is pure when its output depends only on its inputs and it causes no side effects. Purity is what makes logic testable without touching files, clocks, or networks. Compare the two shapes — then see which one a unit test can reach.
The impure tangle vs. the pure core
✗ impure — logic tangled with I/O
def process(path):
    data = open(path).read()       # I/O
    today = date.today()           # hidden clock
    rows = parse(data)
    total = sum(r.amt for r in rows
                if r.day == today)
    db.write(total)                # side effect
    return total
✓ pure core, impure shell
# pure: input -> output, no I/O
def total_for(rows, day):
    return sum(r.amt for r in rows
               if r.day == day)

# thin shell does the I/O
def process(path, day):
    rows = parse(read(path))
    total = total_for(rows, day)   # testable
    db.write(total)
Can a unit test verify the business logic without a file, a database, or a real calendar?
Push side effects to the edges. Keep a pure core that maps inputs to outputs, wrapped in a thin impure shell that does the reading and writing. The clock is an input, not a global — pass the date in. Now the logic is reachable by a test that needs nothing but a list and an expected number.
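A test for the pure core might look like the sketch below. The `Row` namedtuple and the sample amounts are assumptions made for the example; `total_for` follows the shape shown above.

```python
from collections import namedtuple
from datetime import date

# Hypothetical row type with the two fields the logic reads.
Row = namedtuple("Row", ["day", "amt"])


def total_for(rows, day):
    """Pure core: the result depends only on the arguments."""
    return sum(r.amt for r in rows if r.day == day)


def test_total_for_counts_only_the_requested_day():
    rows = [
        Row(date(2024, 3, 1), 40),
        Row(date(2024, 3, 1), 60),
        Row(date(2024, 3, 2), 500),
    ]
    # The date is passed in, so the test gives the same result on any day.
    assert total_for(rows, date(2024, 3, 1)) == 100
    assert total_for(rows, date(2024, 3, 3)) == 0


test_total_for_counts_only_the_requested_day()
```

No file, database, or patched clock is involved. The test builds a list, picks a date, and checks a number.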
Checklist for future transformations
Each question maps to one of the five properties.
Your code survives production when its logic is pure and reachable by a test, it mutates nothing its caller holds, it is idempotent so a retry cannot corrupt, it is lazy so memory tracks output not input, and it is composed of small named stages a newcomer can read top to bottom. The 300-line function failed all five at once. None of the fixes are clever — they are just decisions made before production forced them.
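The last two properties, lazy and composed, can be sketched as a chain of small generator stages. The stage names, the `day,amount` line format, and the use of `io.StringIO` in place of a real file are all assumptions made for illustration.

```python
import io


def read_lines(stream):
    """Stage 1: yield one stripped line at a time, never the whole file."""
    for line in stream:
        yield line.rstrip("\n")


def parse_rows(lines):
    """Stage 2: turn 'day,amount' text into (day, amount) tuples."""
    for line in lines:
        day, amount = line.split(",")
        yield day, int(amount)


def only_day(rows, day):
    """Stage 3: keep the rows for one day."""
    return (row for row in rows if row[0] == day)


def count_rows(rows):
    """Stage 4: consume the stream while holding one row at a time."""
    return sum(1 for _ in rows)


source = io.StringIO("2024-03-01,40\n2024-03-02,500\n2024-03-01,60\n")
matching = only_day(parse_rows(read_lines(source)), "2024-03-01")
print(count_rows(matching))  # 2
```

Nothing is read until `count_rows` starts pulling, and each stage can be tested on its own with a short list in place of the upstream generator.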