Learning Representations by Back-Propagating Errors

“Learning Representations by Back-Propagating Errors” was published in Nature in October 1986 by David Rumelhart, Geoffrey Hinton, and Ronald Williams. It is the paper that brought backpropagation into the mainstream of neural network research and helped end the long slump in the field that followed Minsky and Papert’s 1969 critique of the perceptron’s limitations.

The problem the paper solved was how to train a network with hidden layers - layers of neurons between the input and the output. A single-layer perceptron is easy to train, because every weight connects directly to an output whose error can be measured. Stacking layers makes the network far more powerful but raises a hard question: when the final answer is wrong, which of the many internal weights is to blame? Backpropagation answers this with the chain rule from calculus. It computes the gradient of the error with respect to each weight by propagating an error signal backward through the network, layer by layer. Gradient descent then adjusts every weight a little in the direction that reduces the error.
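
The backward pass described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's own code: it assumes one hidden layer of sigmoid units, a single sigmoid output, and a squared-error loss on one training example. The helper names (`init_params`, `forward`, `loss`, `backprop`, `sgd_step`) and the default sizes and learning rate are hypothetical choices made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(n_in=2, n_hidden=3, seed=0):
    """Small random weights and zero biases (sizes are illustrative)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    return W1, b1, W2, b2


def forward(params, x):
    """Return the hidden activations and the output for one input vector."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def loss(params, x, t):
    """Squared error between the output and the target t."""
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradient of the squared error with respect to every parameter."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    # Output unit: dE/dy times the sigmoid slope y * (1 - y).
    delta_out = (y - t) * y * (1.0 - y)
    grad_W2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Hidden units: send the output error back through W2 (chain rule),
    # then multiply by each hidden unit's own sigmoid slope.
    delta_hid = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hid]
    grad_b1 = list(delta_hid)
    return grad_W1, grad_b1, grad_W2, grad_b2


def sgd_step(params, grads, lr=0.5):
    """Move every parameter a small step against its gradient."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    new_W1 = [[w - lr * g for w, g in zip(row, grow)]
              for row, grow in zip(W1, gW1)]
    new_b1 = [b - lr * g for b, g in zip(b1, gb1)]
    new_W2 = [w - lr * g for w, g in zip(W2, gW2)]
    new_b2 = b2 - lr * gb2
    return new_W1, new_b1, new_W2, new_b2
```

A common way to check a sketch like this is to compare one entry of the result of `backprop` with a finite-difference estimate of the same derivative, obtained by nudging that weight slightly and re-evaluating `loss`. The two numbers should agree closely.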

What made the paper land was not pure novelty - related ideas had appeared earlier, including Paul Werbos’s 1974 thesis - but clarity, a compelling demonstration, and timing. The authors showed that hidden layers trained this way learn useful internal representations of the data on their own, rather than relying on features hand-designed by a programmer. That insight, that networks can discover their own features, is the heart of what would later be called deep learning.

The honest caveat is that backpropagation alone was not enough to make deep networks work in 1986. Training deep networks remained slow and unreliable for roughly two more decades, held back by limited data, limited computing power, and problems like vanishing gradients, where the error signal shrinks as it passes backward through many layers. The algorithm only reached its full potential once GPUs, large datasets, and refinements arrived in the late 2000s and 2010s - at which point it became the workhorse of essentially all modern neural networks.
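
The vanishing-gradient problem can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification, not a model of any real network: it uses a chain of one-unit sigmoid layers with a fixed weight, and the function name `chain_gradient` and its default values are hypothetical. Because the slope of a sigmoid is never larger than 0.25, the derivative of the output with respect to the input shrinks roughly geometrically as layers are added.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def chain_gradient(depth, w=1.0, x=0.5):
    """d(output)/d(input) for a chain of `depth` one-unit sigmoid layers."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(w * a)
        # Chain rule: multiply by this layer's local slope w * a * (1 - a).
        grad *= w * a * (1.0 - a)
    return grad


# Print the gradient magnitude for a few depths to see how fast it shrinks.
for depth in (1, 5, 10, 20):
    print(depth, chain_gradient(depth))
```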

Last verified June 6, 2026