Transformers, the tech behind LLMs

This is chapter five of 3Blue1Brown’s deep learning series and the entry point to its treatment of transformers. Grant Sanderson follows a piece of text as it moves through a GPT-style model: how words are turned into vectors, how those vectors move through the layers, and how the final step produces a probability distribution over the next token.
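The path described above, from tokens to vectors to a probability distribution, can be sketched in a few lines of Python. This is a minimal illustration and not code from the video: the five-word vocabulary, the vector size, the random weights, and every function name are assumptions made for the example, and the stack of layers is left as a placeholder that passes the vectors through unchanged.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary and vector size, chosen only for illustration.
VOCAB = ["the", "cat", "sat", "on", "mat"]
D_MODEL = 4


def random_matrix(rows, cols):
    """Stand-in for learned weights: a real model learns these in training."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Embedding matrix: one row of D_MODEL numbers per vocabulary entry.
W_embed = random_matrix(len(VOCAB), D_MODEL)
# Unembedding matrix: maps a D_MODEL vector to one score per vocabulary entry.
W_unembed = random_matrix(D_MODEL, len(VOCAB))


def embed(tokens):
    """Look up the vector for each token."""
    return [list(W_embed[VOCAB.index(tok)]) for tok in tokens]


def layers(vectors):
    """Placeholder for the attention and feed-forward layers (assumption:
    identity here, so only the input and output ends of the model are shown)."""
    return vectors


def softmax(scores):
    """Turn arbitrary scores into positive numbers that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(tokens):
    vectors = layers(embed(tokens))
    last = vectors[-1]  # the final position is used to predict what comes next
    logits = [
        sum(last[i] * W_unembed[i][j] for i in range(D_MODEL))
        for j in range(len(VOCAB))
    ]
    return dict(zip(VOCAB, softmax(logits)))


dist = next_token_distribution(["the", "cat", "sat"])
for token, prob in dist.items():
    print(f"{token:>4}: {prob:.3f}")
```

Because the weights are random, the printed probabilities are meaningless; the point is only the shape of the computation. Real models use tensor libraries and vocabularies of tens of thousands of tokens instead of Python lists.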

Rather than starting from the original paper’s notation, the video builds the picture visually. It shows what embeddings are, how the network refines its representation of each token, and how attention and the feed-forward layers fit together. The unembedding step at the end shows how all of this becomes a concrete prediction of the next token.
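As a rough illustration of how attention and a feed-forward layer fit together, the sketch below runs one simplified block over three token vectors. It rests on several assumptions that are not taken from the video: a single attention head, no layer normalization, a ReLU nonlinearity, random weights, and hypothetical names and sizes throughout. It also applies the causal masking used in GPT-style models, so each position only draws on itself and earlier positions.

```python
import math
import random

random.seed(0)

# Hypothetical toy sizes; real models use far larger values.
D_MODEL, D_HEAD, D_FF = 4, 2, 8


def rand_matrix(rows, cols, scale=0.5):
    """Stand-in for learned weights."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


W_q, W_k = rand_matrix(D_MODEL, D_HEAD), rand_matrix(D_MODEL, D_HEAD)
W_v, W_o = rand_matrix(D_MODEL, D_HEAD), rand_matrix(D_HEAD, D_MODEL)
W_up, W_down = rand_matrix(D_MODEL, D_FF), rand_matrix(D_FF, D_MODEL)


def attention_weights(x):
    """For each position, how much it draws on each position (rows sum to 1)."""
    q, k = matmul(x, W_q), matmul(x, W_k)
    weights = []
    for i, q_i in enumerate(q):
        # Causal masking: position i scores only positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(D_HEAD)
            for j in range(i + 1)
        ]
        weights.append(softmax(scores) + [0.0] * (len(x) - i - 1))
    return weights


def attention(x):
    """Mix information between positions, then project back to D_MODEL."""
    values = matmul(x, W_v)
    mixed = matmul(attention_weights(x), values)
    return matmul(mixed, W_o)


def feed_forward(x):
    """Process each position independently: expand, apply ReLU, project back."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, W_up)]
    return matmul(hidden, W_down)


def add(x, update):
    """Residual connection: add the layer's output to its input."""
    return [[a + b for a, b in zip(r, u)] for r, u in zip(x, update)]


def block(x):
    x = add(x, attention(x))        # tokens exchange information
    return add(x, feed_forward(x))  # each token is refined on its own


x = rand_matrix(3, D_MODEL, scale=1.0)  # three token vectors
y = block(x)
weights = attention_weights(x)
```

The output `y` has the same shape as the input `x`, which is what lets a model stack many such blocks one after another. Both layers add their result to the existing vectors rather than replacing them, so each block adjusts the representation of every token instead of rebuilding it.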

Transformers are the architecture behind essentially every current large language model, yet most explanations of them assume a research background. By rendering the whole pipeline as something a viewer can see and follow, this explainer gives a broad audience a working understanding of how these models generate text. It pairs naturally with the channel’s dedicated chapter on attention.

Last verified June 7, 2026