The Notebook Anti-Pattern

The “notebook anti-pattern” is the name practitioners give to a cluster of failure modes that the Jupyter notebook format quietly encourages. The format is wonderful for exploration: write a cell, run it, see the output, adjust, repeat. But the same properties that make it good for poking at data make it treacherous for producing software whose results others (or one’s future self) can trust. The critique is not that notebooks are bad, but that their defaults lead careful people into sloppy outcomes.

The root issue is hidden, mutable state. A notebook keeps a long-lived kernel in memory, and cells can be run in any order, any number of times. A variable defined in cell 20 might be used in cell 5; a cell that was edited after it ran still shows stale output; a name might exist only because an earlier, since-deleted cell created it. The visible document and the actual machine state can drift apart entirely. The execution counts shown beside each cell often reveal the damage: when they jump around out of sequence, the notebook was not run cleanly from top to bottom.
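The execution-count check described above can be automated. The following is a minimal sketch, assuming a notebook in the nbformat 4 JSON layout, where each code cell carries an "execution_count" field that is an integer or null. The function name execution_order_problems and the wording of the warnings are illustrative choices, not part of any Jupyter tool.

```python
import json


def execution_order_problems(notebook_json: str) -> list[str]:
    """Return warnings about the execution counts stored in a notebook.

    Hypothetical helper: it only inspects saved counts, so it can flag
    suspicious notebooks but cannot prove that a notebook is reproducible.
    """
    nb = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    executed = [c for c in counts if c is not None]

    problems = []
    if len(executed) < len(counts):
        problems.append("some code cells were never executed")
    if executed != sorted(executed):
        problems.append("cells were executed out of document order")
    if executed and executed != list(range(1, len(executed) + 1)):
        problems.append(
            "execution counts are not 1..N "
            "(re-runs, deleted cells, or a kernel that was not fresh)"
        )
    return problems


# A hand-written two-cell notebook: the first cell uses a name that the
# second cell defines, and the saved counts show it was run later.
example = json.dumps({
    "cells": [
        {"cell_type": "code", "execution_count": 7,
         "source": "print(total)", "outputs": []},
        {"cell_type": "markdown", "source": "Notes"},
        {"cell_type": "code", "execution_count": 3,
         "source": "total = 42", "outputs": []},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

for warning in execution_order_problems(example):
    print(warning)
```

A notebook that was restarted and run from top to bottom should produce no warnings, since its code cells would be numbered 1, 2, 3, and so on in document order.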

This was measured, not merely asserted. In “A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks,” presented at the Mining Software Repositories conference in 2019, Pimentel, Murta, Braganholo, and Freire analyzed roughly 1.4 million notebooks from GitHub (https://2019.msrconf.org/details/msr-2019-papers/36/A-Large-scale-Study-about-Quality-and-Reproducibility-of-Jupyter-Notebooks). Their abstract notes the “growing criticism that the way notebooks are being used leads to unexpected behavior, encourage poor coding practices, and that their results can be hard to reproduce.” When they tried to re-execute the notebooks, only about 24% ran without errors, and only about 4% reproduced the outputs originally saved in them.

The problems compound beyond the kernel. The notebook file is JSON that mixes code, prose, output, and binary image data, so a one-line code change can produce a sprawling, unreadable diff, and two people editing the same notebook collide in ways plain source never would. This breaks the normal discipline of version control and code review. Notebooks also tempt authors to keep everything global, to skip functions and tests, and to leave dead cells lying around, because the medium rewards quick iteration over structure.
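The diff problem can be seen with a toy example. The sketch below builds two versions of a one-cell notebook in the nbformat 4 layout and diffs their JSON text. The helper names make_notebook and changed_lines are hypothetical, and the "image/png" values are short made-up placeholders rather than real image data.

```python
import difflib
import json


def make_notebook(source: str, count: int, image_b64: str) -> str:
    """Serialize a one-cell notebook whose output is a (fake) PNG image."""
    nb = {
        "cells": [{
            "cell_type": "code",
            "execution_count": count,
            "metadata": {},
            "source": [source],
            "outputs": [{
                "output_type": "display_data",
                "data": {"image/png": image_b64, "text/plain": ["<Figure>"]},
                "metadata": {},
            }],
        }],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
    }
    return json.dumps(nb, indent=1, sort_keys=True)


def changed_lines(old: str, new: str) -> list[str]:
    """Return the added and removed lines of a unified diff, without headers."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm="", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# The author changes one character of code and re-runs the cell.
before = make_notebook("plot(df.x)", 4, "PLACEHOLDER-IMAGE-A")
after = make_notebook("plot(df.y)", 9, "PLACEHOLDER-IMAGE-B")

changed = changed_lines(before, after)
code_changes = [line for line in changed if "plot(" in line]

print(len(changed), "changed lines in the notebook file")
print(len(code_changes), "of them are the actual code edit")
```

Here the remaining changed lines are execution-count and output churn. In a real notebook the image line would hold a base64 payload that can run to many kilobytes, which is what makes such diffs so hard to review.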

The mitigations are now well known. Restart the kernel and run all cells before trusting a result, so the document and state agree. Keep notebooks for exploration and move reusable logic into imported, tested modules. Use tools that strip outputs or convert notebooks to scripts for cleaner diffs, and pin environments so the dependencies are reproducible. The anti-pattern endures not because the fixes are unknown but because the format’s frictionless exploration is exactly what makes ignoring them so easy. That tension sits at the heart of how modern data science and machine learning code gets written.
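As an illustration of the output-stripping mitigation, here is a minimal sketch that clears outputs and execution counts from a notebook's JSON, again assuming the nbformat 4 layout. The function strip_outputs is a hypothetical stand-in written for this example; existing tools such as nbstripout, or nbconvert's --clear-output option, handle more cases and are likely the better choice in practice.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return the notebook with code-cell outputs and execution counts cleared.

    Hypothetical helper: cell sources, markdown cells, and metadata are
    left untouched.
    """
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep later diffs small.
    return json.dumps(nb, indent=1, sort_keys=True) + "\n"


raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {"cell_type": "code", "execution_count": 12, "metadata": {},
         "source": ["total = 40 + 2\n", "total"],
         "outputs": [{"output_type": "execute_result", "execution_count": 12,
                      "data": {"text/plain": ["42"]}, "metadata": {}}]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

stripped = json.loads(strip_outputs(raw))
for cell in stripped["cells"]:
    print(cell["cell_type"], cell.get("execution_count"), cell.get("outputs"))
```

After stripping, two commits that differ only in a re-run produce identical files, so version control shows changes to code and prose only.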

Sources

Related