The Transformer is introduced

milestone June 12, 2017

On June 12, 2017, eight Google researchers submitted “Attention Is All You Need” to arXiv, introducing the Transformer architecture. The paper was published at NIPS 2017 later that year.

At the time it read as a machine-translation result: state-of-the-art BLEU scores on the WMT 2014 English-German and English-French benchmarks (28.4 and 41.8), reached at a small fraction of the training cost of the best recurrent and convolutional models. In hindsight it marks the dividing line in modern AI. By replacing recurrence with attention, the Transformer let computation across a sequence run in parallel, which made it practical to train vastly larger models on vastly more data. That was the direct precondition for the large language model era that followed: BERT in 2018, the GPT series, and essentially every frontier model since.

Sources

PRIMARY https://arxiv.org/abs/1706.03762

PRIMARY https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

Last verified June 6, 2026
