“Gradient-Based Learning Applied to Document Recognition” was published in 1998 in the Proceedings of the IEEE by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. It is a long, influential paper best known for introducing LeNet-5, a convolutional neural network that read handwritten digits accurately enough to be deployed in commercial systems.
The paper’s lasting contribution is its thorough account of the convolutional neural network, or CNN, an architecture LeCun and colleagues had been developing since the late 1980s. Instead of connecting every input pixel to every neuron, a CNN slides small learnable filters across the image, so the same feature detector is applied at every position. This design encodes two facts about images directly into the architecture: that a useful pattern can appear anywhere in the frame, and that nearby pixels are related. The result is a network with far fewer parameters than a fully connected network of comparable size, one that learns features such as edges and strokes on its own and then combines them into recognizable shapes.
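The weight sharing described above can be made concrete with a minimal sketch in plain Python. It is an illustration, not the paper's implementation: the function names `conv2d_valid`, `conv_params`, and `dense_params` are hypothetical, the edge-detecting kernel is set by hand rather than learned, and the sizes (a 32x32 input, six 5x5 filters, 28x28 output maps) are chosen to match LeNet-5's first convolutional layer.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image (stride 1, no padding).

    The same kernel weights are reused at every position, which is
    the weight sharing that defines a convolutional layer.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def conv_params(kh, kw, in_channels, out_channels):
    """Trainable parameters in a conv layer: one kernel plus one bias per output map."""
    return (kh * kw * in_channels + 1) * out_channels


def dense_params(n_in, n_out):
    """Trainable parameters in a fully connected layer: all weights plus biases."""
    return n_in * n_out + n_out


# Hand-set kernel (an assumption for illustration) that responds to a
# dark-to-bright step between horizontally adjacent pixels.
kernel = [[-1, 1]]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Same edge, moved one pixel to the left.
shifted = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

feature_map = conv2d_valid(image, kernel)
shifted_map = conv2d_valid(shifted, kernel)

# Parameter counts for a 32x32 input mapped to six 28x28 output maps.
shared = conv_params(5, 5, 1, 6)
fully_connected = dense_params(32 * 32, 28 * 28 * 6)

print(feature_map)      # response peaks where the edge is
print(shifted_map)      # the peak moves with the edge
print(shared, fully_connected)
```

The detector fires wherever the edge sits, and the response moves when the edge moves, without any retraining. The parameter comparison shows the saving: 156 shared parameters against roughly 4.8 million for a fully connected layer producing the same number of outputs.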
LeNet-5 was not just a research demonstration. Systems based on this work were used by banks to read the handwritten amounts on checks; the paper itself reports a commercially deployed system reading several million checks per day, and such systems are often credited with processing a sizable share of the checks in the United States at the time. That practical success showed that gradient-based learning could deliver reliable, commercial computer vision.
The caveat is one of timing. The architecture was largely right, but the era was not ready for it: data and computing power were too limited for CNNs to beat hand-engineered methods on harder problems, so the approach stayed niche for over a decade. Its vindication came in 2012, when AlexNet, a deeper descendant of LeNet trained on GPUs and the large ImageNet dataset, won the ImageNet Large Scale Visual Recognition Challenge by a wide margin and helped launch the deep learning boom.