On October 2, 2006, Netflix launched the Netflix Prize, an open competition offering one million US dollars to the first team that could improve the accuracy of its movie recommendation system, Cinematch, by 10 percent. Netflix released a training dataset of roughly 100 million anonymized movie ratings from about 480,000 subscribers and challenged anyone in the world to beat its in-house algorithm. The contest is documented in “The Netflix Prize,” a paper by Netflix’s James Bennett and Stan Lanning presented at the KDD Cup 2007 workshop.
Accuracy was measured by root mean squared error (RMSE) on a held-out set of ratings, which Netflix split into a quiz subset and a test subset. Cinematch scored about 0.9514 on the quiz subset and about 0.9525 on the test subset; to claim the grand prize a team had to reach roughly 0.8572 on the test subset, a 10 percent improvement over Cinematch. A public leaderboard, based on quiz scores, showed everyone’s progress in near real time, which turned the contest into a years-long public race.
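To make the metric concrete, here is a minimal Python sketch of RMSE and of the 10 percent arithmetic. The function names and the toy ratings are illustrative assumptions, not anything from Netflix's own code. The 0.9525 baseline is Cinematch's reported score on the test subset, which is the figure the 0.8572 target is derived from.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def improvement_percent(baseline_rmse, candidate_rmse):
    """Relative RMSE reduction versus a baseline, in percent (lower RMSE is better)."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy data (hypothetical): predicted vs. actual star ratings for four movies.
toy_predicted = [3.5, 4.0, 2.0, 5.0]
toy_actual = [4, 4, 1, 5]
toy_score = rmse(toy_predicted, toy_actual)

# Assumption: Cinematch's reported RMSE on the test subset.
CINEMATCH_TEST_RMSE = 0.9525

# A 10 percent reduction of the baseline gives the grand prize threshold.
grand_prize_target = CINEMATCH_TEST_RMSE * (1 - 0.10)

print(f"toy RMSE: {toy_score:.4f}")
print(f"grand prize target: {grand_prize_target:.4f}")
print(f"improvement at 0.8572: {improvement_percent(CINEMATCH_TEST_RMSE, 0.8572):.2f}%")
```

Because RMSE squares each error before averaging, one badly wrong prediction costs more than several small misses. That is part of why shaving even a few hundredths off the score took contestants so long.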
The Netflix Prize mattered far beyond Netflix. It showed that a company could crowdsource a hard machine learning problem to the entire world, and that outside teams could beat an algorithm built in-house. The winning solutions popularized ensemble methods and matrix factorization for recommendations. The contest also became the template for the public machine learning competition, a model later scaled up by platforms like Kaggle and by benchmark challenges such as ImageNet.
For business readers, the prize is an early case study in open innovation: a clearly defined metric, a public dataset, and a cash incentive can mobilize thousands of outside experts. It also carried a hard lesson about data privacy, told in a companion entry on the dataset’s de-anonymization.