The source of protein structures is the Protein Data Bank. The unit of classification of structure in SCOP is the protein domain. What the SCOP authors mean by "domain" is suggested by their statement that small proteins and most medium-sized ones have just one domain, and by the observation that human hemoglobin, which has an α2β2 structure, is assigned two SCOP domains, one for the α and one for the β subunit. The shapes of domains are called "folds" in SCOP. Domains belonging to the same fold have the same major secondary structures in the same arrangement with the same topological connections. SCOP version 1.75 lists 1195 folds, each with a short description. For example, the "globin-like" fold is described as
core: 6 helices; folded leaf, partly opened. The fold to which a domain belongs is determined by inspection, rather than by software. The levels of SCOP version 1.75 are as follows.
• Class: Types of folds, e.g., beta sheets.
• Fold: The different shapes of domains within a class.
• Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.
• Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor.
• Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.
• Species: The domains in "protein domains" are grouped according to species.
• Domain: part of a protein. For simple proteins, it can be the entire protein.
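The ordering of these levels can be written down as a small data structure. The following is only an illustrative sketch: the names `SCOP_LEVELS`, `parent_level` and `is_broader` are hypothetical helpers introduced here, not part of any SCOP distribution.

```python
# Levels of SCOP 1.75 from broadest to narrowest, in the order listed above.
# All names in this block are hypothetical and exist only for illustration.
SCOP_LEVELS = (
    "class",
    "fold",
    "superfamily",
    "family",
    "protein domain",
    "species",
    "domain",
)


def parent_level(level):
    """Return the level directly above `level`, or None for the top level."""
    index = SCOP_LEVELS.index(level)  # raises ValueError for unknown levels
    return SCOP_LEVELS[index - 1] if index > 0 else None


def is_broader(level_a, level_b):
    """True if `level_a` sits above `level_b` in the hierarchy."""
    return SCOP_LEVELS.index(level_a) < SCOP_LEVELS.index(level_b)
```

For instance, `parent_level("family")` gives `"superfamily"`, matching the statement that families are groupings within a superfamily.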
Classes
The broadest groups in SCOP version 1.75 are the protein fold classes. These classes group structures with similar secondary structure composition, but different overall tertiary structures and evolutionary origins. This is the top-level "root" of the SCOP hierarchical classification.
• All alpha proteins [46456] (284): Domains consisting of α-helices
• All beta proteins [48724] (174): Domains consisting of β-sheets
• Alpha and beta proteins (a/b) [51349] (147): Mainly parallel beta sheets (beta-alpha-beta units)
• Alpha and beta proteins (a+b) [53931] (376): Mainly antiparallel beta sheets (segregated alpha and beta regions)
• Multi-domain proteins (alpha and beta) [56572] (66): Folds consisting of two or more domains belonging to different classes
• Membrane and cell surface proteins and peptides [56835] (58): Does not include proteins in the immune system
• Small proteins [56992] (90): Usually dominated by metal ligand, cofactor, and/or disulfide bridges
• Coiled-coil proteins [57942] (7): Not a true class
• Low resolution protein structures [58117] (26): Peptides and fragments. Not a true class
• Peptides [58231] (121): Peptides and fragments. Not a true class
• Designed proteins [58788] (44): Experimental structures of proteins with essentially non-natural sequences. Not a true class
The number in brackets, called a "sunid", is a SCOP unique integer identifier for each node in the SCOP hierarchy. The number in parentheses indicates how many elements are in each category. For example, there are 284 folds in the "All alpha proteins" class. Each member of the hierarchy is a link to the next level of the hierarchy.
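An entry in this listing has the shape "name [sunid] (count)", so its three parts can be pulled apart mechanically. The sketch below assumes exactly that printed layout; `ENTRY_RE` and `parse_entry` are hypothetical names introduced for illustration and do not correspond to any official SCOP tool or file format.

```python
import re

# Matches one listing entry of the form "<name> [<sunid>] (<count>)".
# The name is matched lazily so that parentheses inside it, as in
# "Alpha and beta proteins (a/b)", stay part of the name.
ENTRY_RE = re.compile(
    r"^(?P<name>.+?)\s*\[(?P<sunid>\d+)\]\s*\((?P<count>\d+)\)\s*:?\s*$"
)


def parse_entry(line):
    """Split a listing entry into (name, sunid, count). Hypothetical helper."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a SCOP listing entry: {line!r}")
    return match["name"], int(match["sunid"]), int(match["count"])


# Entries copied from the class listing above.
entries = [
    "All alpha proteins [46456] (284)",
    "All beta proteins [48724] (174)",
    "Alpha and beta proteins (a/b) [51349] (147)",
    "Alpha and beta proteins (a+b) [53931] (376)",
]
parsed = [parse_entry(entry) for entry in entries]
```

Here the second field of each tuple is the sunid and the third is the number of child nodes (folds, for a class).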
Folds
Each class contains a number of distinct folds. This classification level indicates similar tertiary structure, but not necessarily evolutionary relatedness. For example, the "All alpha proteins" class contains more than 280 distinct folds, including globin-like (core: 6 helices; folded leaf, partly opened), long alpha-hairpin (2 helices; antiparallel hairpin, left-handed twist) and Type I dockerin domains (tandem repeat of two calcium-binding loop-helix motifs, distinct from the EF-hand).
Superfamilies
Domains within a fold are further classified into superfamilies. A superfamily is the largest grouping of proteins for which structural similarity is sufficient to indicate evolutionary relatedness, and hence a common ancestor. However, this ancestor is presumed to be distant, because the different members of a superfamily have low sequence identities. For example, the two superfamilies of the "Globin-like" fold are the Globin-like superfamily and the alpha-helical ferredoxin superfamily (contains two Fe4-S4 clusters).
Families
Protein families are more closely related than superfamilies. Domains are placed in the same family if they have either:
• >30% sequence identity
• some sequence identity (e.g., 15%) and perform the same function
The similarity in sequence and structure is evidence that these proteins have a closer evolutionary relationship than do proteins in the same superfamily. Sequence tools, such as BLAST, are used to assist in placing domains into superfamilies and families. For example, the four families in the "globin-like" superfamily of the "globin-like" fold are truncated hemoglobin (lacks the first helix), nerve tissue mini-hemoglobin (lacks the first helix but is otherwise more similar to conventional globins than to the truncated ones), globins (heme-binding proteins), and phycocyanin-like phycobilisome proteins (oligomers of two different types of globin-like subunits containing two extra helices at the N-terminus; they bind a bilin chromophore). Each family in SCOP is assigned a concise classification string, sccs, in which the letter identifies the class to which the domain belongs and the following integers identify the fold, superfamily, and family, respectively (e.g., a.1.1.2 for the "Globin" family).
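Because an sccs string encodes four levels of the hierarchy, it can be split into its components and compared level by level. The following is a minimal sketch assuming the four-part "letter.fold.superfamily.family" layout described above; `Sccs`, `parse_sccs` and `same_superfamily` are hypothetical names, and every sccs value other than a.1.1.2 is used only as illustrative input.

```python
from typing import NamedTuple


class Sccs(NamedTuple):
    """Components of a family-level sccs string (hypothetical container)."""
    scop_class: str   # single letter identifying the class, e.g. "a"
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> Sccs:
    """Parse a string such as "a.1.1.2" into its four components."""
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated parts: {sccs!r}")
    letter, numbers = parts[0], parts[1:]
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"class must be a single letter: {sccs!r}")
    if not all(number.isdigit() for number in numbers):
        raise ValueError(f"fold, superfamily and family must be integers: {sccs!r}")
    return Sccs(letter, *(int(number) for number in numbers))


def same_superfamily(sccs_a: str, sccs_b: str) -> bool:
    """True if both strings share class, fold and superfamily."""
    return parse_sccs(sccs_a)[:3] == parse_sccs(sccs_b)[:3]


globin = parse_sccs("a.1.1.2")  # the "Globin" family example from the text
```

Comparing only the first three components, as `same_superfamily` does, groups families that differ in their last integer but share the class letter, fold, and superfamily.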
PDB entry domains
A "TaxId" is the taxonomy ID number and links to the NCBI taxonomy browser, which provides more information about the species to which the protein belongs. Clicking on a species or isoform brings up a list of domains. For example, the "Hemoglobin, alpha-chain from Human (Homo sapiens)" protein has more than 190 solved protein structures, such as 2dn3 (complexed with cmo) and 2dn1 (complexed with hem, mbn, oxy). Clicking on a PDB number is supposed to display the structure of the molecule, but the links are currently broken (links work in pre-SCOP).
==Example==