To train Jukebox, its 2020 raw-audio music model, OpenAI curated a dataset of 1.2 million songs - 600,000 of them in English - each paired with the corresponding lyrics and metadata. Working directly on the raw audio rather than on symbolic notes, the model learned to produce music with rough singing that could be conditioned on genre, artist, and lyrics, with coherence stretching up to several minutes.