Generative Adversarial Networks

“Generative Adversarial Networks” was submitted to arXiv in June 2014 by Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, then at the University of Montreal. It introduced GANs, a training scheme that produced some of the first convincingly realistic machine-generated images.

The idea is a contest between two networks. A generator tries to produce fake data (images, say) that look real. A discriminator tries to tell the generator’s fakes apart from genuine examples. The two are trained together, each pushing the other to improve: the generator learns to fool the discriminator, and the discriminator learns to catch the generator. At equilibrium, in theory, the generator’s output is indistinguishable from real data. The paper framed this as a two-player minimax game over a single value function, which the discriminator is trained to maximize and the generator to minimize.
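In the paper's notation, the contest described above is written as the objective below. Here D(x) is the discriminator's estimated probability that x is a real example, G(z) is the generator's output for a noise vector z drawn from a prior p_z, and p_data is the distribution of the real data.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator with output distribution p_g,
% the discriminator that maximizes V is
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

When the generator matches the data exactly, so that p_g = p_data, the optimal discriminator outputs 1/2 everywhere: it can do no better than guess, which is the formal sense in which fakes become indistinguishable from real data.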

GANs were new in treating generation as an adversarial game rather than directly maximizing the likelihood of the data, which had been the dominant approach. The framework caught on quickly and drove years of rapid progress in image synthesis, producing photorealistic faces of people who do not exist, image-to-image translation, and style transfer. In public discussion it also became closely associated with the word “deepfake,” a label for the kind of synthetic media these methods helped make possible.

Two caveats apply. GANs were notoriously difficult to train, prone to instability and to “mode collapse,” where the generator produces only a narrow slice of the possible outputs. And from the start they raised the concern that realistic synthetic images and video could be used to deceive. In recent years, diffusion models have largely overtaken GANs as the leading approach to image generation, but the adversarial idea remains a landmark.
