Attention Is All You Need

“Attention Is All You Need” was submitted to arXiv on June 12, 2017 by eight researchers at Google: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. It was published at the NIPS 2017 conference (Advances in Neural Information Processing Systems 30).

The paper proposed the Transformer: a neural network architecture for sequence transduction built entirely on attention mechanisms, dispensing with the recurrence and convolutions that dominated prior models. The authors demonstrated state-of-the-art results on English-to-German and English-to-French machine translation while requiring significantly less training time than recurrent architectures, because attention allows far more parallelization during training.

Its influence extends far beyond translation. The Transformer is the architectural foundation of essentially every modern large language model - the “T” in GPT stands for Transformer - as well as many vision, audio, and multimodal systems. It is among the most cited papers in the history of computer science.

Sources

Related