The Data Engineer's Field Guide
Interactive Companion

Five incidents.
Five disciplines.

A hands-on companion to the book. Each chapter opens with a real production failure — then hands you the controls to reproduce it, understand it, and fix it yourself.

Every chapter answers one question of the form "how do I know my ___ is correct?" The recurring lesson is fitness for purpose: a structure built for one workload is a liability in another, and the skill is knowing which structure belongs where.

The recurring question

01 · Is my data model correct? · Grain, facts & dimensions
02 · Is my query safe? · NULLs, joins, windows
03 · Will my code survive prod? · Purity, idempotency
04 · Can my pipeline recover? · Backfills, late data
05 · Can my data be trusted? · Contracts, metrics

The chapters

01

The Wrong Data Model Cost Us Six Months

A warehouse stored one row per order. Finance asked for revenue by product line — and the detail had been thrown away at load time. The fix was a six-month rebuild.
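
A minimal sketch of the grain problem, assuming hypothetical order-line records with `order_id`, `product_line`, and `amount` fields (none of these names come from the book). Once the load collapses lines to one row per order, the product-line breakdown cannot be recovered from the stored rows.

```python
from collections import defaultdict

# Hypothetical source data: one row per order line.
order_lines = [
    {"order_id": 1, "product_line": "hardware", "amount": 80.0},
    {"order_id": 1, "product_line": "software", "amount": 20.0},
    {"order_id": 2, "product_line": "software", "amount": 50.0},
]

def load_at_order_grain(lines):
    """Collapse to one row per order; product_line is discarded here."""
    totals = defaultdict(float)
    for line in lines:
        totals[line["order_id"]] += line["amount"]
    return [{"order_id": oid, "amount": amt} for oid, amt in totals.items()]

def revenue_by_product_line(lines):
    """Only answerable while the rows are still at order-line grain."""
    totals = defaultdict(float)
    for line in lines:
        totals[line["product_line"]] += line["amount"]
    return dict(totals)

order_grain = load_at_order_grain(order_lines)
line_grain_revenue = revenue_by_product_line(order_lines)

print(order_grain)         # no product_line column survives
print(line_grain_revenue)  # the question finance asked
```

Storing the finest grain the business might ask about keeps both questions answerable, since order totals can always be re-aggregated from lines but not the reverse.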

How do I know my data model is correct?
Open lab →

02

The Query That Worked Until It Didn't

A revenue report ran green for a year, then reported zero. One NULL order_id in the refunds table, and NOT IN silently excluded every row.
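
The failure can be reproduced with Python's built-in `sqlite3` module. This is a sketch under assumed table and column names (`orders`, `refunds`, `amount`), not the report from the incident. With a NULL in the subquery, `x NOT IN (...)` evaluates to NULL rather than true for every non-matching row, so the filter keeps nothing; `NOT EXISTS` does not have this problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER, amount REAL);
    CREATE TABLE refunds (order_id INTEGER);
    INSERT INTO orders  VALUES (1, 100.0), (2, 250.0), (3, 75.0);
    -- One refund with a missing order_id is enough to trigger the bug.
    INSERT INTO refunds VALUES (2), (NULL);
""")

# Unsafe: the NULL makes NOT IN evaluate to NULL for orders 1 and 3.
not_in = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM orders "
    "WHERE order_id NOT IN (SELECT order_id FROM refunds)"
).fetchone()[0]

# Safe: NOT EXISTS compares row by row and ignores the NULL refund.
not_exists = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM orders o "
    "WHERE NOT EXISTS (SELECT 1 FROM refunds r WHERE r.order_id = o.order_id)"
).fetchone()[0]

print(not_in)      # 0
print(not_exists)  # 175.0
```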

How do I know my query is safe?
Open lab →

03

The 300-Line Transformation Nobody Could Debug

A transform mixed pure logic with I/O, mutated data it didn't own, and double-counted on every re-run. Untestable by construction.
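
One common way to make such a transform testable is to split it into a pure core and a thin I/O shell. The sketch below assumes hypothetical rows with `qty` and `unit_price` fields, and the `read_rows` / `write_rows` callables are placeholders for whatever storage the pipeline uses.

```python
import copy

def add_line_totals(rows):
    """Pure core: builds new rows, never mutates the input, does no I/O."""
    return [{**row, "total": row["qty"] * row["unit_price"]} for row in rows]

def run(read_rows, write_rows):
    """Thin shell: all I/O lives here, so the core can be tested alone."""
    write_rows(add_line_totals(read_rows()))

# The pure core can be exercised with plain in-memory data.
source = [{"qty": 2, "unit_price": 3.0}]
snapshot = copy.deepcopy(source)

first = add_line_totals(source)
second = add_line_totals(source)  # a re-run gives the same answer

# The shell can be exercised with in-memory stand-ins for storage.
sink = []
run(lambda: source, sink.extend)
```

Because the core returns new rows instead of editing its input, running it twice cannot double-count, and a unit test needs no database.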

Will my code survive production?
Open lab →

03

The Backfill That Corrupted Three Years of Data

A re-run appended instead of replacing. Three years of history doubled, silently, because the load path was not idempotent.
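
The difference between the two load paths fits in a few lines. This sketch uses an in-memory list as a stand-in for the target table and assumes a hypothetical `day` partition key; a real warehouse would use a partition overwrite or a delete-then-insert in one transaction.

```python
def append_load(table, batch):
    """Not idempotent: every re-run adds the same rows again."""
    table.extend(batch)

def replace_partition_load(table, batch, partition_key="day"):
    """Idempotent: drop the partitions the batch covers, then insert."""
    days = {row[partition_key] for row in batch}
    table[:] = [row for row in table if row[partition_key] not in days]
    table.extend(batch)

batch = [
    {"day": "2024-01-01", "amount": 10.0},
    {"day": "2024-01-01", "amount": 5.0},
]

appended = []
append_load(appended, batch)
append_load(appended, batch)  # backfill re-run

replaced = []
replace_partition_load(replaced, batch)
replace_partition_load(replaced, batch)  # backfill re-run

print(len(appended))  # 4: history doubled
print(len(replaced))  # 2: same result as a single run
```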

How do I know my pipeline can recover?
Open lab →

05

The Dashboard Was Wrong and Nobody Noticed

One metric, three reasonable definitions, three different numbers. The dashboard rendered confidently — and was trusted for months.
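
To illustrate how one metric name can hide several numbers, here is a sketch with three plausible definitions of "revenue". The definitions, field names, and sample orders are all hypothetical, chosen only to show the divergence; the chapter's actual metric may differ.

```python
# Hypothetical orders; field names are assumptions for this example.
orders = [
    {"amount": 100.0, "refunded": 0.0,  "status": "complete"},
    {"amount": 50.0,  "refunded": 50.0, "status": "complete"},
    {"amount": 30.0,  "refunded": 0.0,  "status": "pending"},
]

# Three reasonable readings of the same word.
definitions = {
    "gross": lambda rows: sum(r["amount"] for r in rows),
    "net_of_refunds": lambda rows: sum(r["amount"] - r["refunded"] for r in rows),
    "net_completed_only": lambda rows: sum(
        r["amount"] - r["refunded"] for r in rows if r["status"] == "complete"
    ),
}

revenue = {name: compute(orders) for name, compute in definitions.items()}

for name, value in revenue.items():
    print(f"{name}: {value}")
```

Writing the definition down in one named, versioned place is what lets a dashboard and a finance report agree on which number they mean.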

How do I know my data can be trusted?
Open lab →

Appendix

Data Structures & Algorithms, by Practice

Twelve runnable lessons — linear and binary search, stacks, queues, dicts, sets, recursion, and the functional toolkit (itertools, functools) — each tied to a real pipeline use, with editable Python you run in the browser.

The foundations the chapters lean on
Open appendix →

How each lab works

1 · Reproduce

Trigger the failure

Run the original decision or query and watch it break exactly the way it did in production.

2 · Understand

See the mechanism

The framework from the chapter, made visual — grain, cardinality, idempotency, in motion.

3 · Fix

Apply the discipline

Make the deliberate choice the chapter teaches, and watch the same question now answer correctly.