The importance of domains as structural building blocks and elements of evolution has brought about many automated methods for their identification and classification in proteins of known structure. Automatic procedures for reliable domain assignment are essential for the generation of the domain databases, especially as the number of known protein structures is increasing. Although the boundaries of a domain can be determined by visual inspection, construction of an automated method is not straightforward. Problems occur when faced with domains that are discontinuous or highly associated. Because there is no standard definition of what a domain really is, domain assignments have varied enormously, with each researcher using a unique set of criteria. A structural domain is a compact, globular sub-structure with more interactions within it than with the rest of the protein. Therefore, a structural domain can be determined by two visual characteristics: its compactness and its extent of isolation. Measures of local compactness in proteins have been used in many of the early methods of domain assignment and in several of the more recent methods.

== Methods ==

One of the first algorithms was based on the calculated interface areas between two chain segments repeatedly cleaved at various residue positions. Interface areas were calculated by comparing the surface areas of the cleaved segments with that of the native structure. Potential domain boundaries could be identified at sites where the interface area was at a minimum. Other methods have used measures of solvent accessibility to calculate compactness. The PUU algorithm

The RIBFIND rigid bodies have been used to flexibly fit protein structures into cryo-electron microscopy density maps.

A general method to identify dynamical domains, that is, protein regions that behave approximately as rigid units in the course of structural fluctuations, has been introduced by Potestio et al. Among other applications, it was used to compare the consistency of the dynamics-based domain subdivisions with standard structure-based ones. The method, termed PiSQRD, is publicly available in the form of a webserver. The webserver allows users to optimally subdivide single-chain or multimeric proteins into quasi-rigid domains; alternatively, pre-calculated essential dynamical spaces can be uploaded by the user.
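
The interface-minimum idea described at the start of this section (cleave the chain at successive positions and look for the cut with the smallest interface) can be illustrated with a short Python sketch. This is not the original algorithm: it substitutes a simple count of Cα-Cα contacts across the cut for the surface-area-based interface measure, and the function names, the 8 Å cutoff, the `min_len` parameter and the toy coordinates are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def interface_contacts(coords, cut, cutoff=8.0):
    """Count residue pairs that straddle the cut and lie within `cutoff` angstroms.

    `coords` is a list of (x, y, z) C-alpha positions in chain order;
    residues [0, cut) form one segment and residues [cut, n) the other.
    The contact count is used here as a crude proxy for interface area.
    """
    n_term, c_term = coords[:cut], coords[cut:]
    return sum(1 for a in n_term for b in c_term if dist(a, b) <= cutoff)


def best_cut(coords, min_len=3, cutoff=8.0):
    """Scan all cut positions that leave at least `min_len` residues on each
    side and return (cut, contacts) for the cut with the fewest contacts."""
    candidates = range(min_len, len(coords) - min_len + 1)
    scores = {cut: interface_contacts(coords, cut, cutoff) for cut in candidates}
    cut = min(scores, key=scores.get)
    return cut, scores[cut]


# Toy chain (hypothetical data): two compact six-residue clusters 30 angstroms apart.
unit = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0),
        (0, 3.8, 0), (0, 3.8, 3.8), (3.8, 3.8, 3.8)]
coords = unit + [(x + 30.0, y, z) for x, y, z in unit]

print(best_cut(coords))  # (6, 0): the boundary falls between the two clusters
```

On real structures a single linear cut of this kind cannot describe discontinuous domains, which is one reason such scans are only a starting point.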

== Example domains ==

• Armadillo repeats: named after the β-catenin-like Armadillo protein of the fruit fly Drosophila melanogaster.
• Basic leucine zipper domain (bZIP domain): found in many DNA-binding eukaryotic proteins. One part of the domain contains a region that mediates sequence-specific DNA binding; another part, the leucine zipper, is required for the dimerization of two DNA-binding regions. The DNA-binding region comprises a number of basic amino acids such as arginine and lysine.
• Cadherin repeats: Cadherins function as Ca2+-dependent cell–cell adhesion proteins. Cadherin domains are extracellular regions which mediate cell-to-cell homophilic binding between cadherins on the surface of adjacent cells.
• Death effector domain (DED): allows protein–protein binding by homotypic interactions (DED-DED). Caspase proteases trigger apoptosis via proteolytic cascades. Pro-caspase-8 and pro-caspase-9 bind to specific adaptor molecules via DEDs, which leads to autoactivation of caspases.
• EF hand: a helix-turn-helix structural motif found in each structural domain of the signaling protein calmodulin and in the muscle protein troponin-C.
• Foldon domain: a small protein domain from fibritin in T4 bacteriophage that can cause proteins to trimerize.
• Immunoglobulin-like domains: found in proteins of the immunoglobulin superfamily (IgSF). They contain about 70–110 amino acids and are classified into different categories (IgV, IgC1, IgC2 and IgI) according to their size and function. They possess a characteristic fold in which two beta sheets form a "sandwich" that is stabilized by interactions between conserved cysteines and charged amino acids. They are important for protein–protein interactions in processes of cell adhesion, cell activation, and molecular recognition. These domains are commonly found in molecules with roles in the immune system.
• Phosphotyrosine-binding domain (PTB): PTB domains usually bind to phosphorylated tyrosine residues. They are often found in signal transduction proteins. PTB-domain binding specificity is determined by residues to the amino-terminal side of the phosphotyrosine. Examples: the PTB domains of both SHC and IRS-1 bind to an NPXpY sequence. PTB-containing proteins such as SHC and IRS-1 are important for insulin responses of human cells.
• Pleckstrin homology domain (PH): PH domains bind phosphoinositides with high affinity. Specificities for PtdIns(3)P, PtdIns(4)P, PtdIns(3,4)P2, PtdIns(4,5)P2, and PtdIns(3,4,5)P3 have all been observed. Because phosphoinositides are sequestered to various cell membranes (owing to their long lipophilic tails), PH domains usually recruit the protein in question to a membrane, where it can exert a function in cell signalling, cytoskeletal reorganization or membrane trafficking.
• Src homology 2 domain (SH2): SH2 domains are often found in signal transduction proteins. SH2 domains confer binding to phosphorylated tyrosine (pTyr). Named after the phosphotyrosine-binding domain of the src viral oncogene, which is itself a tyrosine kinase. See also: SH3 domain.
• Zinc finger DNA-binding domain (ZnF_GATA): ZnF_GATA domain-containing proteins are typically transcription factors that usually bind to the DNA sequence [AT]GATA[AG] of promoters.

== Domains of unknown function ==