Chinchilla: double the model, double the data

DeepMind’s 2022 paper “Training Compute-Optimal Large Language Models” (Hoffmann et al.) found that, for a fixed compute budget, model size and the number of training tokens should be scaled in equal proportion: every doubling of model size should be matched by a doubling of training tokens. Measured against that rule, the large language models of the time were substantially undertrained for their size.
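The equal-scaling rule can be illustrated with a small calculation. The sketch below is not taken from the paper's text above; it rests on two commonly quoted approximations that are labeled as assumptions here: training compute of roughly 6 FLOPs per parameter per token (C ≈ 6·N·D), and a ratio of roughly 20 training tokens per parameter. The function name and its parameters are hypothetical, chosen only for this example.

```python
import math


def compute_optimal_allocation(compute_flops,
                               tokens_per_param=20.0,
                               flops_per_param_token=6.0):
    """Split a training compute budget between model size and data.

    Assumptions (not stated in the text above):
      * compute C ~= flops_per_param_token * N * D   (the common 6*N*D estimate)
      * data D = tokens_per_param * N                (the ~20 tokens/param rule of thumb)

    Returns (n_params, n_tokens) as floats.
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # Substituting D = r * N into C = k * N * D gives N = sqrt(C / (k * r)).
    n_params = math.sqrt(
        compute_flops / (flops_per_param_token * tokens_per_param)
    )
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    base = 5.88e23  # example budget in FLOPs
    for budget in (base, 4 * base):
        n, d = compute_optimal_allocation(budget)
        print(f"C={budget:.3g} FLOPs -> N={n:.3g} params, D={d:.3g} tokens")
```

Under these assumptions, a budget of 5.88e23 FLOPs works out to about 7e10 parameters and 1.4e12 tokens, and quadrupling the budget doubles both numbers. Because the tokens-per-parameter ratio is held fixed, model size and data always grow by the same factor, which is the "double the model, double the data" rule in numerical form.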

Last verified June 6, 2026