DeepMind and EMBL-EBI give away the AlphaFold protein database

On July 22, 2021, DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) launched the AlphaFold Protein Structure Database, making AlphaFold’s predicted protein structures freely and openly available to the scientific community. The launch announcement framed the release as the most complete and accurate database of predicted 3D structures of human proteins, covering the roughly 20,000 proteins expressed by the human genome.

The peer-reviewed account, “AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models” by Mihaly Varadi, Stephen Anyango and colleagues, appeared in Nucleic Acids Research in November 2021. The paper reports free access to over 360,000 predicted structures across 21 proteomes at the time of publication, accessible through web pages, programmatic APIs, and bulk download.
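
As a rough illustration of the programmatic route, the sketch below builds a per-protein metadata URL and fetches it as JSON. The base URL and the `/prediction/<accession>` path are assumptions about the public API and should be checked against the database's current documentation. The helper names `prediction_url` and `fetch_prediction` are hypothetical, and the accession check is a loose sanity test rather than a full UniProt format validator.

```python
import json
import urllib.request

# Assumed base URL for the AlphaFold DB REST API; verify against current docs.
API_BASE = "https://alphafold.ebi.ac.uk/api"


def prediction_url(accession: str) -> str:
    """Build the (assumed) metadata URL for one UniProt accession."""
    acc = accession.strip().upper()
    # Loose sanity check only: UniProt accessions are alphanumeric.
    if not acc.isalnum():
        raise ValueError(f"not a plausible UniProt accession: {accession!r}")
    return f"{API_BASE}/prediction/{acc}"


def fetch_prediction(accession: str, timeout: float = 30.0):
    """Fetch and decode the JSON metadata for one accession (needs network)."""
    with urllib.request.urlopen(prediction_url(accession), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # P69905 is the UniProt accession for human hemoglobin subunit alpha.
    print(prediction_url("P69905"))
```

Calling `fetch_prediction("P69905")` would issue a single HTTP request. For whole proteomes, the bulk download route is likely the better fit than looping over per-protein requests.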

The point of the database was scale. Determining a protein structure experimentally can take months or years; AlphaFold made prediction fast, and giving the predictions away meant any researcher could look up a predicted structure instead of solving it from scratch. EMBL-EBI Director Ewan Birney described the resource as one of the most important datasets since the mapping of the human genome.

The database kept growing. After an expansion in 2022 it held over 200 million entries, providing broad coverage of the UniProt protein database, close to every catalogued protein known to science. This turned AlphaFold from a contest-winning model (see AlphaFold 2 at CASP14, 2020) into shared scientific infrastructure used by millions of researchers worldwide.