Chinchilla: double the model, double the data

fact

DeepMind’s 2022 paper “Training Compute-Optimal Large Language Models” found that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size, the number of training tokens should also double. The paper showed that the large language models of the time were substantially undertrained for their size.
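To make "scale equally" concrete, here is a minimal sketch. It rests on two assumptions that are not stated in this card. First, training compute is approximated as C ≈ 6·N·D FLOPs, where N is the parameter count and D is the number of training tokens. Second, D stays a fixed multiple of N; the ratio used here is about 20 tokens per parameter, which matches Chinchilla itself (70B parameters, 1.4T tokens). The helper name `compute_optimal_split` is hypothetical. The sketch treats the fitted scaling exponents as exactly 0.5, which the paper's estimates only approximate.

```python
import math

# Assumption: training compute C (in FLOPs) is approximated as C ≈ 6 * N * D,
# where N = number of parameters and D = number of training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: a fixed tokens-per-parameter ratio of about 20, matching
# Chinchilla's own configuration (70B params, 1.4T tokens). Illustrative only.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops: float,
                          tokens_per_param: float = TOKENS_PER_PARAM) -> tuple[float, float]:
    """Hypothetical helper: return (params, tokens) that spend compute_flops
    with D = r * N.

    Solving C = 6 * N * (r * N) gives N = sqrt(C / (6 * r)), so both N and D
    grow as sqrt(C), i.e. they are scaled equally.
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Roughly Chinchilla's training budget under the C ≈ 6ND approximation.
    base_c = 6 * 70e9 * 1.4e12
    for scale in (1, 4, 16):
        n, d = compute_optimal_split(base_c * scale)
        print(f"{scale:>2}x compute -> {n / 1e9:7.1f}B params, {d / 1e12:6.2f}T tokens")
```

Under these assumptions, doubling the model and doubling the data together multiplies the compute budget by four. Run in reverse, every 4x increase in compute buys a 2x larger model trained on 2x more tokens.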

Sources

PRIMARY https://arxiv.org/abs/2203.15556

Last verified June 6, 2026
