Contrastive Learning

Contrastive learning trains a model to produce representations in which similar items sit close together and dissimilar items sit far apart. In practice, the model is shown pairs: a “positive” pair (two views of the same thing, such as two augmented crops of one image) should map to nearby vectors, while “negative” pairs (different things) should map far apart. Training minimizes a loss that is low only when the model scores each match above the non-matches, which forces it to capture the features that actually distinguish one item from another.
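As a rough illustration of the idea above, the sketch below computes an InfoNCE-style contrastive loss for a single anchor in plain Python. It is a minimal sketch, not the method of any particular paper: the function names, the temperature of 0.5, and the toy 2-D vectors are all assumptions chosen for readability.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (illustrative; temperature is an assumed value).

    The loss is the negative log-probability of picking the positive
    out of the set {positive} + negatives, using scaled cosine similarity.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - pos_logit

# Toy vectors (hypothetical): the positive points almost the same way as the anchor.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

good_loss = contrastive_loss(anchor, positive, negatives)
# Swap roles: treat an unrelated vector as the "positive" to see the loss rise.
bad_loss = contrastive_loss(anchor, negatives[0], [positive, negatives[1]])
print(good_loss < bad_loss)
```

The loss is small when the anchor sits closer to its positive than to any negative, and grows when a non-match is scored as the match. Training adjusts the model so that the first situation holds across many pairs.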

The 2020 SimCLR paper, “A Simple Framework for Contrastive Learning of Visual Representations” by Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton, showed how effective this approach is for images. A simple linear classifier trained on SimCLR’s self-supervised representations reached 76.5 percent top-1 accuracy on ImageNet, matching a supervised ResNet-50. The paper found that the composition of data augmentations is critical, that a learnable nonlinear projection placed between the representation and the contrastive loss substantially improves representation quality, and that contrastive learning benefits more from large batch sizes and longer training than supervised learning does. Contrastive learning also underlies CLIP, which contrasts images against their text captions to learn a shared image-text space.
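To make the image-caption contrast more concrete, here is a minimal sketch of a symmetric contrastive loss over a small batch, where every other caption in the batch serves as a negative for a given image, and vice versa. This is an illustrative reconstruction of the general idea, not CLIP’s or SimCLR’s actual code: the function names, the temperature of 0.1, and the toy embeddings are assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_row(logits, target_index):
    """Negative log-softmax probability of the target entry (stable log-sum-exp)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[target_index]

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Average of image-to-text and text-to-image losses (temperature is assumed).

    Pair i is (image_embs[i], text_embs[i]); every other item in the
    batch acts as a negative, so larger batches supply more negatives.
    """
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    # sim[i][j]: scaled cosine similarity between image i and caption j.
    sim = [
        [sum(a * b for a, b in zip(images[i], texts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    image_to_text = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    text_to_image = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2

# Hypothetical 2-D embeddings for two image-caption pairs.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.2, 0.8]]   # caption i matches image i
shuffled_texts = [[0.2, 0.8], [0.9, 0.1]]  # captions swapped

aligned_loss = symmetric_contrastive_loss(image_embs, aligned_texts)
shuffled_loss = symmetric_contrastive_loss(image_embs, shuffled_texts)
print(aligned_loss < shuffled_loss)
```

Because the negatives come from the rest of the batch, a larger batch gives each item more non-matches to be told apart from, which is one intuition for why batch size matters in this family of methods.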

Why business readers should care: Contrastive learning is a major reason AI systems can learn useful features from unlabeled data, which reduces dependence on expensive labeled datasets. Those features power semantic search, recommendation, and multimodal tools that connect images and text.
