Chinchilla rule: double the data for every doubling of model size

fact

DeepMind’s 2022 paper “Training Compute-Optimal Large Language Models” argued that the largest language models of the day were significantly undertrained. Its central finding is that, for compute-optimal training, model size and the number of training tokens should be scaled in equal proportion: “for every doubling of model size the number of training tokens should also be doubled.” The 70-billion-parameter Chinchilla model, trained with the same compute budget as the 280-billion-parameter Gopher but on about four times as much data, outperformed much larger models such as Gopher and the 175-billion-parameter GPT-3.
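The proportional-scaling rule can be illustrated with a small sketch. It rests on two assumptions that are not stated in the text above: the common approximation that training compute is about 6 FLOPs per parameter per token (C ≈ 6·N·D), and a ratio of roughly 20 training tokens per parameter, which is what Chinchilla's reported 70 billion parameters and about 1.4 trillion tokens work out to. The function name and constants are hypothetical, chosen only for illustration.

```python
import math

# Assumption: training compute C is approximately 6 * N * D FLOPs,
# where N is the parameter count and D is the number of training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: about 20 tokens per parameter, i.e. roughly 1.4e12 tokens
# divided by 70e9 parameters. This is a rule of thumb, not an exact constant.
TOKENS_PER_PARAM = 20.0


def compute_optimal_allocation(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a compute budget into (parameters, tokens) at a fixed token/parameter ratio.

    Solves C = 6 * N * D with D = tokens_per_param * N, so
    N = sqrt(C / (6 * tokens_per_param)).
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    # Each step quadruples the budget, which doubles both parameters and tokens.
    for budget in (1e21, 4e21, 1.6e22):
        n, d = compute_optimal_allocation(budget)
        print(f"C={budget:.1e} FLOPs -> N={n:.3e} params, D={d:.3e} tokens")
```

Because the token-to-parameter ratio is held fixed, doubling the model size doubles the token count, and doing both requires about four times the compute. Under these assumptions a budget of 6 × 70e9 × 1.4e12 FLOPs recovers a model of roughly Chinchilla's size.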

Sources

PRIMARY https://arxiv.org/abs/2203.15556

Last verified June 6, 2026
