Mixture of Experts (MoE)

Mixture of Experts (MoE) is a neural network design that splits part of a model into many specialized sub-networks, or “experts,” and uses a small routing network to send each piece of input (in language models, each token) to only a few of them. The result is a model with very high total capacity that activates only a fraction of its parameters for any given token, keeping the compute cost per answer low.
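To make the routing idea concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not how any particular model implements it: the experts are toy scalar functions standing in for sub-networks, the router scores are hard-coded rather than computed from the input, and the names `top_k_route` and `moe_forward` are hypothetical.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Weights are normalized over the chosen experts only; the rest get zero.
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and blend their outputs by weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Four toy "experts" (assumption: scalar functions stand in for sub-networks).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]

# Hypothetical router scores for one input; a real router computes these
# from the input itself using learned weights.
router_logits = [0.1, 2.0, 2.0, -1.0]

routes = top_k_route(router_logits, k=2)
output = moe_forward(3.0, experts, router_logits, k=2)
print(routes)   # experts 1 and 2 are selected with equal weight
print(output)   # 0.5 * (2 * 3.0) + 0.5 * (3.0 * 3.0) = 7.5
```

Only two of the four experts run for this input, which is the source of the compute savings. Production systems add details this sketch leaves out, such as training the router and keeping the load balanced across experts.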

The modern, large-scale version was introduced by Shazeer et al. in the 2017 paper “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” The authors began from the observation that “the capacity of a neural network to absorb information is limited by its number of parameters,” then showed how sparse routing lets parameter count grow enormously without a matching rise in cost. The author list includes Geoffrey Hinton and Jeff Dean.

MoE is now used in many leading models, including openly documented ones such as Mixtral 8x7B and DeepSeek-V3, because it offers a favorable trade between capability and serving cost.
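The trade can be seen with some back-of-the-envelope arithmetic. The sketch below uses made-up numbers (8 experts, 2 used per token, 5 billion parameters per expert, 2 billion shared parameters) purely for illustration; they do not describe any real model, and the function name `moe_param_counts` is hypothetical.

```python
def moe_param_counts(num_experts, experts_per_token,
                     params_per_expert, shared_params):
    """Return (total, active) parameter counts for a simplified MoE model.

    total  = everything that must be stored
    active = what is actually used to process one token
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + experts_per_token * params_per_expert
    return total, active

# Illustrative, made-up configuration (not taken from any real model).
total, active = moe_param_counts(
    num_experts=8,
    experts_per_token=2,
    params_per_expert=5_000_000_000,
    shared_params=2_000_000_000,
)

print(f"total parameters:  {total / 1e9:.0f}B")   # 42B
print(f"active per token:  {active / 1e9:.0f}B")  # 12B
print(f"active fraction:   {active / total:.0%}")
```

In this hypothetical setup the model stores 42 billion parameters but uses only 12 billion per token. Active parameters are a rough proxy for per-token compute; the full set of parameters still has to be held in memory, so the savings apply mainly to computation rather than to hardware footprint.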

Why business readers should care: MoE is a key reason newer models can be both more capable and cheaper to run. When a vendor advertises a huge model at a competitive price, MoE is often how they do it.

Sources

Related