The unreasonable effectiveness of data

In 2009 Alon Halevy, Peter Norvig, and Fernando Pereira, all at Google, published “The Unreasonable Effectiveness of Data” in IEEE Intelligent Systems. The short essay, whose title nods to physicist Eugene Wigner’s 1960 article “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” made a deliberately provocative argument about how to build AI systems.

The authors observed that the biggest successes in language-related machine learning were statistical speech recognition and statistical machine translation, tasks where huge amounts of real-world input-output data exist “in the wild.” Their central claim, widely quoted ever since, is that “invariably, simple models and a lot of data trump more elaborate models based on less data.” Rather than chase elegant theories of language, they urged researchers to “embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.”

This was a clear statement of the data-first worldview, written about a decade before the formal scaling laws of the 2020s put numbers on the same intuition. It captured a shift in the field’s center of gravity, away from carefully engineered models and toward gathering and exploiting ever larger datasets. The essay drew on Google’s own experience of releasing a trillion-word corpus in 2006 and seeing simple methods trained on it outperform more elaborate alternatives.

For business readers, the lesson is durable: the quality and quantity of your data often matter more than the sophistication of your algorithm. Much of the later success of large language models can be read as this 2009 thesis taken to its logical extreme.