In 1995 Corinna Cortes and Vladimir Vapnik, both at AT&T Bell Labs, published “Support-Vector Networks” in the journal Machine Learning (volume 20, pages 273 to 297). The paper’s abstract describes the support-vector network as “a new learning machine for two-group classification problems,” in which “input vectors are non-linearly mapped to a very high-dimension feature space” where “a linear decision surface is constructed.” The canonical record is the Springer page for the article; an author-side scan of the published paper is mirrored at marenglenbiba.net for readers without journal access.
The key idea was to find the decision boundary that separates two classes with the widest possible margin. That boundary is determined only by the handful of training examples lying closest to it (the “support vectors”); the remaining examples could be removed without changing it. A mathematical trick, the kernel, let the same method draw curved boundaries by implicitly working in a much higher-dimensional space without ever computing the coordinates there. The paper reports that on a benchmark optical character recognition task, support-vector networks matched or beat classical learning algorithms.
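For readers who want to see the kernel trick concretely, here is a minimal sketch in Python. It is an illustration and not code from the paper: the degree-2 polynomial kernel, the two sample points, and the helper names `phi`, `dot`, and `poly2_kernel` are all assumptions chosen for the example. It shows that a single calculation in the original two-dimensional space gives the same number as first mapping both points into a three-dimensional feature space and taking the dot product there.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def phi(v):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice).

    Maps (x1, x2) to (x1^2, sqrt(2)*x1*x2, x2^2), a 3-D feature space.
    """
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z)^2.

    Works entirely in the original 2-D space.
    """
    return dot(x, z) ** 2


# Two arbitrary sample points (hypothetical data, for illustration only).
x = (1.0, 2.0)
z = (3.0, -1.0)

explicit = dot(phi(x), phi(z))  # map to feature space, then take the dot product
implicit = poly2_kernel(x, z)   # same quantity, never leaving the input space

# The two routes agree up to floating-point rounding.
assert math.isclose(explicit, implicit)
```

In this toy case the feature space has only three dimensions, so the shortcut saves little. The point is that the kernel's cost depends on the size of the original input, not on the size of the feature space, which is why the same idea still works when that space is far too large to compute in directly.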
The paper’s historical importance lies in what the method did to the standing of neural networks. Support vector machines had strong theoretical guarantees from Vapnik’s statistical learning theory, trained reliably, and often won head-to-head on the benchmarks of the day. Through the late 1990s and 2000s, when neural networks were hard to train and out of fashion, SVMs and related kernel methods were widely seen as the more principled, better-performing choice. They are a central reason the neural network revival did not arrive until the 2010s.
For business readers, the SVM era is a reminder that the technology that eventually dominates is not always the one leading the field at a given moment. For roughly fifteen years, the smart money in machine learning was on a method that has since faded from the headlines, even though it remains a solid workhorse for many classification tasks.