Noam Shazeer

Noam Shazeer is among the most consequential engineers of the modern AI era. He is the second-listed author of “Attention Is All You Need” (arXiv 1706.03762), the 2017 paper that introduced the Transformer architecture now underpinning nearly every large language model.

He is also a leading figure in the mixture-of-experts (MoE) approach. He is the first author of “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer” (arXiv 1701.06538), which showed how a trainable gating network can route each input through a small subset of specialized sub-networks (“experts”) instead of the full model. The paper reported “greater than 1000x improvements in model capacity with only minor losses in computational efficiency,” scaling to as many as 137 billion parameters. Sparse MoE designs are now central to many frontier models.
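To make the routing idea concrete, here is a minimal sketch of top-k gating in plain Python. It is a simplified illustration, not the paper's implementation: it omits the noise term and load-balancing losses the paper describes, and the names (top_k_gate, moe_layer), the toy experts, and the gate weights are all hypothetical choices made for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(logits, k):
    # Keep only the k highest-scoring experts and renormalize their weights.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))

def moe_layer(x, gate_weights, experts, k=2):
    # One gate logit per expert, computed as a linear function of the input.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gate(logits, k)
    out = [0.0] * len(x)
    # Only the selected experts are evaluated; the rest cost nothing.
    for idx, g in gates.items():
        y = experts[idx](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates

# Hypothetical toy setup: four "experts" that just scale the input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

out, gates = moe_layer([1.0, 2.0], gate_weights, experts, k=2)
print(sorted(gates))  # indices of the two experts that were selected
print(out)
```

In this toy setup only two of the four experts run for the input, which is the source of the capacity-versus-compute tradeoff: total parameters grow with the number of experts, while per-input work grows only with k.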

Shazeer co-founded Character.AI after leaving Google. In August 2024, Character.AI announced what it called “Our Next Phase of Growth”: Shazeer, co-founder Daniel De Freitas, and members of the research team joined Google, and Character.AI granted Google a non-exclusive license to its current LLM technology.

After rejoining Google, Shazeer became a technical lead on the Gemini effort within Google DeepMind. For business readers, his career traces the arc of the field: he helped invent the core architecture, helped develop the efficiency technique that scales it, and then helped steer one of the major frontier model families.