OpenAI's Jukebox was trained on 1.2 million songs

fact

To train Jukebox, its 2020 raw-audio music model, OpenAI curated a dataset of 1.2 million songs, 600,000 of them in English, each paired with its lyrics and metadata. Because it worked directly on raw audio rather than on symbolic notes, the model learned to generate music with rudimentary singing, conditioned on genre, artist, and lyrics, with coherence lasting up to several minutes.

Sources

PRIMARY https://arxiv.org/abs/2005.00341

Last verified June 7, 2026
