LibriSpeech: An ASR Corpus from Public Domain Audiobooks

LibriSpeech is a corpus of about 1,000 hours of read English speech sampled at 16 kHz, prepared by Vassil Panayotov with assistance from Daniel Povey and released through OpenSLR in 2015 alongside the ICASSP paper “LibriSpeech: an ASR corpus based on public domain audio books.” The audio is drawn from LibriVox, a collection of public-domain audiobooks read by volunteers, and was carefully segmented and aligned with the corresponding book text.

Its scale, clean licensing, and standard train and test splits made LibriSpeech the default benchmark for English speech recognition. Headline results from systems like Conformer are routinely reported as word error rates on its test-clean and test-other partitions, which makes progress directly comparable across papers.
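For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that calculation, assuming plain whitespace tokenization and no text normalization. The function names are illustrative and are not part of any LibriSpeech tooling.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions, and insertions."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word errors divided by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / n_ref


if __name__ == "__main__":
    # Made-up sentences for illustration, not LibriSpeech data.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In this made-up example one of the six reference words is dropped, so the rate is one error in six words. Published results typically sum the errors over a whole test set before dividing, rather than averaging per-sentence rates.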

Why business readers should care: shared, openly licensed benchmarks are what let a field measure real progress rather than vendor claims. LibriSpeech has anchored a decade of speech recognition advances, and improvements reported on it have tended to translate into the transcription tools people actually use.
