The AlphaFold Protein Structure Database (AFDB), a joint project between DeepMind and EMBL-EBI, was launched on July 22, 2021. At launch, the database contained AlphaFold-predicted models for nearly the complete UniProt proteome of humans and 20 model organisms, totaling over 365,000 proteins. The database does not include proteins with fewer than 16 or more than 2,700 amino acid residues, although for humans these are available in the whole batch file. In July 2021, UniProt-KB and InterPro were updated to show AlphaFold predictions when available.

The team's initial goal (as of early 2022) was to expand the database to cover most of the UniRef90 set, which contains over 100 million proteins. As of May 15, 2022, the database contained 992,316 predictions. On July 28, 2022, the team uploaded the structures of around 200 million proteins from 1 million species, covering nearly every known protein on the planet. As of 2024, the database holds 214 million structures, of which 26 million are duplicates (exact sequence matches) of another protein in the database. The predicted structures can differ significantly between duplicates.

As of 2025, the AFDB uses AlphaFold 2 for its predictions. All structures produced remain monomeric, but multimeric structures from other databases are linked on the page through the 3D-Beacons API. Foldseek, which provides fast and accurate structure searches, is also integrated, as is information from AlphaMissense (a tool that uses AlphaFold to predict the outcome of missense mutations).
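
The length limits stated above (16 to 2,700 residues) can be expressed as a simple check on a sequence. The following is a minimal illustrative sketch, not part of any AFDB tooling: the function name `afdb_exclusion_reasons` and the reason labels are hypothetical. It also flags the special FASTA characters B, Z, U and X, on the assumption (taken from the description of AlphaSync under derived databases) that such sequences are among the AFDB's gaps.

```python
from typing import List

# Limits taken from the text; every name in this sketch is hypothetical.
MIN_LEN = 16      # sequences shorter than this are not included
MAX_LEN = 2700    # sequences longer than this are not included
SPECIAL_CHARS = set("BZUX")  # special FASTA characters named in the text


def afdb_exclusion_reasons(sequence: str) -> List[str]:
    """Return the reasons a sequence would fall outside the stated limits.

    An empty list means the sequence passes both checks. This says nothing
    about whether a prediction actually exists in the database.
    """
    seq = sequence.strip().upper()
    reasons = []
    if len(seq) < MIN_LEN:
        reasons.append("too_short")
    elif len(seq) > MAX_LEN:
        reasons.append("too_long")
    # Assumption: sequences with these characters are treated as a gap.
    if SPECIAL_CHARS & set(seq):
        reasons.append("special_characters")
    return reasons
```

A sequence for which the function returns an empty list lies within the stated limits. For human proteins, the text notes that sequences outside the length limits are still available in the whole batch file, which this sketch does not model.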

=== Derived databases ===
AlphaFill adds cofactors to AlphaFold models where appropriate. This is achieved by searching the Protein Data Bank for similar structures and transplanting cofactors to analogous positions. UniProt also links to it.

TmAlphaFold docks AlphaFold models to biological membranes, similar to what OPM does for PDB structures. AFTM uses AlphaFold models to identify transmembrane regions in human proteins, similar to what PDBTM does for PDB structures.

The AFDB is not updated when UniProt sequences change. AlphaSync keeps the AFDB in sync with UniProt entry changes, generating updated structures, residue-level features and contacts. It uses an AFDB entry for the exact updated sequence when one is available and otherwise runs AlphaFold 2 independently. It also fills the AFDB's gaps for large (> 2,700 aa) proteins and for proteins with special FASTA characters such as B, Z, U or X.

The Encyclopedia of Domains (TED) applies the domain-recognition method from the CATH database to 188 million unique structures from the AFDB, identifying nearly 365 million domains, which is 100 million more than sequence-based methods could identify.

== Performance, validations and limitations ==