Intrinsically disordered proteins

In molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered as a separate class of proteins along with globular, fibrous and membrane proteins.

History

Figure: NMR structures of the thylakoid soluble phosphoprotein TSP9, which shows a largely flexible protein chain.

In the 1930s to 1950s, the first protein structures were solved by protein crystallography. These early structures suggested that a fixed three-dimensional structure might be generally required to mediate the biological functions of proteins. These publications solidified a central tenet of molecular biology: the amino acid sequence of a protein determines its structure, which in turn determines its function. In 1950, Fred Karush at the Neurological Institute of New York described the "configurational adaptability" found in serum albumins, contradicting this assumption. Karush was convinced that proteins have more than one configuration at the same energy level and can choose one when binding to other substrates. In the 1960s, Levinthal's paradox suggested that a systematic conformational search of a long polypeptide is unlikely to yield a single folded protein structure on biologically relevant timescales (i.e. microseconds to minutes). Curiously, for many (small) proteins or protein domains, relatively rapid and efficient refolding can be observed in vitro. As stated in Anfinsen's Dogma from 1973, the fixed 3D structure of these proteins is uniquely encoded in their primary structure (the amino acid sequence), is kinetically accessible and stable under a range of (near) physiological conditions, and can therefore be considered the native state of such "ordered" proteins.

During the subsequent decades, however, many large protein regions could not be assigned in X-ray datasets, indicating that they occupy multiple positions, which average out in electron density maps. The lack of fixed, unique positions relative to the crystal lattice suggested that these regions were "disordered". Nuclear magnetic resonance spectroscopy of proteins also demonstrated the presence of large flexible linkers and termini in many solved structural ensembles.
In 2001, Dunker questioned whether this information had been ignored for 50 years, and more quantitative analyses became available in the 2000s. In the 2010s it became clear that IDPs are common among disease-related proteins, such as alpha-synuclein and tau.

Abundance

Proteins exist as an ensemble of similar structures with some regions more constrained than others. IDPs occupy the extreme end of this spectrum of flexibility and include proteins with a considerable tendency toward local structure as well as flexible multidomain assemblies. Intrinsic disorder is particularly elevated among proteins that regulate chromatin and transcription, and bioinformatic predictions indicate that it is more common in genomes and proteomes than in known structures in the protein database. Based on DISOPRED2 prediction, long (>30 residue) disordered segments occur in 2.0% of archaean, 4.2% of eubacterial and 33.0% of eukaryotic proteins, including certain disease-related proteins.
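The percentages above can be read as the share of proteins that contain at least one long disordered segment. The following is a minimal sketch of that counting step, assuming per-residue disorder scores between 0 and 1 are already available from some predictor. The 0.5 score cutoff, the function names and the dictionary input format are illustrative assumptions and do not reproduce DISOPRED2's interface or settings.

```python
from typing import Dict, List, Sequence, Tuple


def long_disordered_segments(
    scores: Sequence[float],
    threshold: float = 0.5,   # assumed cutoff, not a DISOPRED2 setting
    min_length: int = 31,     # ">30 residues" means at least 31
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive, for every run
    of consecutive residues scoring >= threshold that is long enough."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new disordered run begins here
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


def fraction_with_long_disorder(proteome: Dict[str, Sequence[float]]) -> float:
    """Fraction of proteins with at least one long disordered segment."""
    if not proteome:
        return 0.0
    hits = sum(1 for s in proteome.values() if long_disordered_segments(s))
    return hits / len(proteome)
```

Applied to a dictionary that maps protein identifiers to score lists, `fraction_with_long_disorder` returns a value comparable in kind to the per-kingdom percentages quoted above, although the exact number depends on the predictor and cutoff chosen.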

Biological roles

Highly dynamic disordered regions of proteins have been linked to functionally important phenomena such as allosteric regulation and enzyme catalysis. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling and transcription. Genes that have recently been born de novo tend to have higher disorder. In animals, genes with high disorder are lost at higher rates during evolution.

Flexible linkers

Disordered regions are often found as flexible linkers or loops connecting domains. Linker sequences vary greatly in length but are typically rich in polar uncharged amino acids. Flexible linkers allow the connected domains to twist and rotate freely to recruit their binding partners via protein domain dynamics. They also allow their binding partners to induce larger-scale conformational changes by long-range allostery.

Linear motifs

Linear motifs are short disordered segments of proteins that mediate functional interactions with other proteins or other biomolecules (RNA, DNA, sugars etc.). Some IDPs contain pre-structured motifs (PreSMos) that are transient secondary structural elements primed for target recognition. In several cases it has been demonstrated that these transient structures become full and stable secondary structures, e.g., helices, upon target binding. Hence, PreSMos are the putative active sites in IDPs.

Coupled folding and binding

Many unstructured proteins undergo transitions to ordered states upon binding to their targets (e.g. molecular recognition features (MoRFs)). The coupled folding and binding may be local, involving only a few interacting residues, or it might involve an entire protein domain. It was recently shown that coupled folding and binding allows the burial of a large surface area, which fully structured proteins could achieve only if they were much larger.
Moreover, certain disordered regions might serve as "molecular switches" that regulate biological functions by switching to an ordered conformation upon molecular recognition, such as small-molecule binding, DNA/RNA binding or ion interactions. The ability of disordered proteins to bind, and thus to exert a function, shows that structural stability is not a required condition. Many short functional sites, for example short linear motifs, are over-represented in disordered proteins. Disordered proteins and short linear motifs are particularly abundant in many RNA viruses such as Hendra virus, HCV, HIV-1 and human papillomaviruses. This enables such viruses to overcome their informationally limited genomes by facilitating binding to, and manipulation of, a large number of host cell proteins.

Disorder in the bound state (fuzzy complexes)

Intrinsically disordered proteins can retain their conformational freedom even when they bind specifically to other proteins. The structural disorder in the bound state can be static or dynamic. In fuzzy complexes, structural multiplicity is required for function, and manipulation of the bound disordered region changes activity. The conformational ensemble of the complex is modulated via post-translational modifications or protein interactions. The specificity of DNA-binding proteins often depends on the length of fuzzy regions, which is varied by alternative splicing. Some fuzzy complexes may exhibit high binding affinity, although other studies showed different affinity values for the same system in a different concentration regime.

Structural aspects

Intrinsically disordered proteins adopt a dynamic range of rapidly interchanging conformations in vivo according to the cell's conditions, creating a structural or conformational ensemble. Their structures are strongly function-related. Few proteins are fully disordered in their native state. Disorder is mostly found in intrinsically disordered regions (IDRs) within an otherwise well-structured protein. By employing a topological approach, one can categorize motifs according to their topological buildup and the timescale of their formation. A common aspect of IDP structural ensembles is the ability or tendency to fold upon interaction with a binding partner in the cell. Examples of IDP folding in a binding context are binding-coupled folding and the formation of fuzzy complexes. However, it is also possible for proteins to remain entirely disordered in a binding scenario. Conversely, it is also possible for an isolated IDP to form compact states while preserving disorder and high solvent accessibility.

Experimental validation

IDPs can be validated in several contexts. Most approaches for experimental validation of IDPs are restricted to extracted or purified proteins. Some new experimental strategies aim to explore in vivo conformations and structural variations of IDPs inside intact living cells, and to compare their dynamics in vivo and in vitro systematically. (In the study of IDPs, the term in vivo is used a little differently from its ordinary meaning: it refers to the state as found in living cells, not necessarily the entire living organism, as opposed to the traditional cell-free method of study.)

In vitro approaches

Intrinsically unfolded proteins, once purified, can be identified by various experimental methods. The primary method to obtain information on disordered regions of a protein is NMR spectroscopy. The lack of electron density in X-ray crystallographic studies may also be a sign of disorder. Folded proteins have a high density (partial specific volume of 0.72-0.74 mL/g) and a commensurately small radius of gyration. Hence, unfolded proteins can be detected by methods that are sensitive to molecular size, density or hydrodynamic drag, such as size exclusion chromatography, analytical ultracentrifugation, small angle X-ray scattering (SAXS), and measurements of the diffusion constant. Unfolded proteins are also characterized by their lack of secondary structure, as assessed by far-UV (170–250 nm) circular dichroism (esp. a pronounced minimum at ~200 nm) or infrared spectroscopy. Unfolded proteins also have backbone peptide groups exposed to solvent, so that they are readily cleaved by proteases, undergo rapid hydrogen-deuterium exchange and exhibit a small dispersion (<1 ppm) in their 1H amide chemical shifts as measured by NMR. (Folded proteins typically show dispersions as large as 5 ppm for the amide protons.)
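The amide chemical-shift criterion above reduces to a simple range calculation. The sketch below assumes a list of assigned backbone 1H amide shifts in ppm is already in hand; the function names are hypothetical and the 1 ppm cutoff is taken directly from the text. It is a rough screening heuristic, not a substitute for a full NMR analysis.

```python
from typing import Sequence


def amide_dispersion_ppm(amide_shifts_ppm: Sequence[float]) -> float:
    """Spread (max minus min) of backbone 1H amide chemical shifts, in ppm."""
    if not amide_shifts_ppm:
        raise ValueError("at least one chemical shift is required")
    return max(amide_shifts_ppm) - min(amide_shifts_ppm)


def looks_unfolded(amide_shifts_ppm: Sequence[float],
                   cutoff_ppm: float = 1.0) -> bool:
    """True when the dispersion falls below the cutoff quoted in the text
    (<1 ppm); folded proteins are described as reaching about 5 ppm."""
    return amide_dispersion_ppm(amide_shifts_ppm) < cutoff_ppm
```

For example, a made-up set of shifts clustered between 8.0 and 8.5 ppm would be flagged as likely unfolded, while a set spanning 6.5 to 10.1 ppm would not.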
Recently, new methods including fast parallel proteolysis (FASTpp) have been introduced, which make it possible to determine the folded/disordered fraction without the need for purification. Even subtle differences in the stability of missense mutations, protein partner binding and (self)polymerisation-induced folding of (e.g.) coiled-coils can be detected using FASTpp, as recently demonstrated using the tropomyosin-troponin protein interaction. Fully unstructured protein regions can be experimentally validated by their hypersusceptibility to proteolysis using short digestion times and low protease concentrations.

Bulk methods to study IDP structure and dynamics include SAXS for ensemble shape information, NMR for atomistic ensemble refinement, fluorescence for visualising molecular interactions and conformational transitions, X-ray crystallography to highlight more mobile regions in otherwise rigid protein crystals, cryo-EM to reveal less fixed parts of proteins, light scattering to monitor size distributions of IDPs or their aggregation kinetics, and NMR chemical shift and circular dichroism to monitor the secondary structure of IDPs.

Single-molecule methods to study IDPs include spFRET to study the conformational flexibility of IDPs and the kinetics of structural transitions, optical tweezers for high-resolution insights into the ensembles of IDPs and their oligomers or aggregates, nanopores to reveal global shape distributions of IDPs, magnetic tweezers to study structural transitions for long times at low forces, and high-speed atomic force microscopy (AFM) to visualise the spatio-temporal flexibility of IDPs directly.

In vivo approaches

The first direct evidence for in vivo persistence of intrinsic disorder was obtained in 2016 by in-cell NMR, upon electroporation of a purified IDP and recovery of the cells to an intact state. In vivo biotinylation was originally used to study which proteins come into proximity with each other (proximity tagging).
It was hypothesized and demonstrated in 2018 that biotinylation favors intrinsically disordered regions, as they are more accessible. Although the association between biotinylation and disorder is not absolute, the preference is strongly suggestive of a disordered state in vivo. Because biotinylation status can be checked for a large number of residues at once, large-scale biotin "painting" can identify many likely disordered regions at the same time.

Computer simulations

Owing to high structural heterogeneity, the experimental parameters obtained from NMR or SAXS are an average over a large number of highly diverse and disordered states (an ensemble of disordered states). Hence, to understand the structural implications of these experimental parameters, these ensembles need to be represented accurately by computer simulations. All-atom molecular dynamics (MD) simulations can be used for this purpose, but their use is limited by the accuracy of current force fields in representing disordered proteins. Nevertheless, some force fields have been explicitly developed for studying disordered proteins by optimising force-field parameters against available NMR data for disordered proteins (examples are CHARMM 22*, CHARMM 32, Amber ff03* etc.). MD simulations restrained by experimental parameters (restrained MD) have also been used to characterise disordered proteins. In principle, one can sample the whole conformational space provided that an MD simulation with an accurate force field is run long enough. Because of the very high structural heterogeneity, the time scales that need to be simulated for this purpose are very large and are limited by computational power. However, other computational techniques such as accelerated MD simulations, replica exchange simulations, metadynamics, multicanonical MD simulations, or methods using coarse-grained representations with implicit and explicit solvents have been used to sample a broader conformational space in shorter time scales. Moreover, various protocols and methods of analyzing IDPs, such as studies based on quantitative analysis of GC content in genes and their respective chromosomal bands, have been used to understand functional IDP segments.
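To illustrate what "an average over an ensemble" means for one SAXS-related observable, the sketch below computes the radius of gyration of individual conformers and combines them into a single ensemble value. It rests on several stated assumptions: every atom is given equal weight (no atomic masses or scattering factors), the conformers come from any simulation or sampling method, and the ensemble value is taken as the root of the population-weighted mean of the squared radii. The function names and input format are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coordinate]) -> float:
    """Radius of gyration of one conformer, all atoms weighted equally."""
    n = len(coords)
    if n == 0:
        raise ValueError("a conformer needs at least one atom")
    # Geometric centre of the conformer.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def ensemble_radius_of_gyration(
    conformers: Sequence[Sequence[Coordinate]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Ensemble value: sqrt of the weighted mean of Rg**2 over conformers.
    Weights default to uniform populations (an assumption)."""
    if not conformers:
        raise ValueError("the ensemble is empty")
    if weights is None:
        weights = [1.0] * len(conformers)
    if len(weights) != len(conformers):
        raise ValueError("one weight per conformer is required")
    total = float(sum(weights))
    mean_sq = sum(
        w * radius_of_gyration(c) ** 2 for w, c in zip(weights, conformers)
    ) / total
    return math.sqrt(mean_sq)
```

Because the ensemble value mixes compact and extended conformers, a single number of this kind can be matched by many different ensembles, which is why the text stresses the need for accurate simulated ensembles when interpreting such data.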

Disorder annotation

Figure: blue and red arrows point to missing residues on the receptor and growth hormone, respectively.

Intrinsic disorder can be either annotated from experimental information or predicted with specialized software. Disorder prediction algorithms can predict intrinsic disorder (ID) propensity with high accuracy (approaching 80%) based on primary sequence composition, similarity to unassigned segments in protein X-ray datasets, flexible regions in NMR studies and physico-chemical properties of amino acids.

Disorder databases

Databases have been established to annotate protein sequences with intrinsic disorder information. The DisProt database contains a collection of manually curated protein segments which have been experimentally determined to be disordered. MobiDB is a database combining experimentally curated disorder annotations (e.g. from DisProt) with data derived from missing residues in X-ray crystallographic structures and flexible regions in NMR structures.

Predicting IDPs by sequence

Separating disordered from ordered proteins is essential for disorder prediction. One of the first steps in finding a factor that distinguishes IDPs from non-IDPs is to specify biases within the amino acid composition. The largely hydrophilic or charged amino acids A, R, G, Q, S, P, E and K have been characterized as disorder-promoting, while the order-promoting amino acids W, C, F, I, Y, V, L and N are mostly hydrophobic and uncharged. The remaining amino acids H, M, T and D are ambiguous, found in both ordered and unstructured regions. As the list shows, small, charged, hydrophilic residues often promote disorder, while large and hydrophobic residues promote order. This information is the basis of most sequence-based predictors. Regions with little to no secondary structure, also known as NORS (no regular secondary structure) regions, and low-complexity regions can easily be detected.
However, not all disordered proteins contain such low-complexity sequences.

Prediction methods

Determining disordered regions by biochemical methods is very costly and time-consuming. Due to the variable nature of IDPs, only certain aspects of their structure can be detected, so that a full characterization requires a large number of different methods and experiments. This further increases the expense of IDP determination. To overcome this obstacle, computer-based methods have been created for predicting protein structure and function. It is one of the main goals of bioinformatics to derive knowledge by prediction. Predictors for IDP function are also being developed, but they mainly use structural information such as linear motif sites. There are different approaches for predicting IDP structure, such as neural networks or matrix calculations, based on different structural and/or biophysical properties. Many computational methods exploit sequence information to predict whether a protein is disordered. Notable examples of such software include IUPRED and Disopred. Different methods may use different definitions of disorder. Meta-predictors combine different primary predictors to create a more competent and accurate predictor. Because the approaches to predicting disordered proteins differ, estimating their relative accuracy is fairly difficult. For example, neural networks are often trained on different datasets. The disorder prediction category is part of the biennial CASP experiment, which is designed to test methods according to their accuracy in finding regions with missing 3D structure (marked in PDB files as REMARK 465, missing electron densities in X-ray structures). Disorder prediction can be more complicated for de novo-emerged and orphan proteins, which often lack detectable homologs and are generally shorter than “classical” proteins, reducing the reliability of predictors trained largely on conserved, globular proteins.
Comparative benchmarks further show that structure/disorder predictors behave differently on de novo and random proteins than on conserved proteins, including different relationships between predicted disorder and the confidence scores of 3D structure predictors such as AlphaFold and ESMFold.
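The compositional bias described under "Predicting IDPs by sequence" can be sketched as a simple residue count. The three residue groups below are copied from the lists in this section; the function name, the returned dictionary keys and the choice to ignore non-standard letters are illustrative assumptions. This is only the first compositional step, not a reimplementation of IUPRED, Disopred or any other published predictor.

```python
from typing import Dict

# Residue groups as listed in this section (one-letter codes).
DISORDER_PROMOTING = set("ARGQSPEK")
ORDER_PROMOTING = set("WCFIYVLN")
AMBIGUOUS = set("HMTD")


def composition_bias(sequence: str) -> Dict[str, float]:
    """Fraction of residues in each group; letters outside the 20
    standard amino acids (e.g. X) are ignored."""
    counts = {"disorder_promoting": 0, "order_promoting": 0, "ambiguous": 0}
    for residue in sequence.upper():
        if residue in DISORDER_PROMOTING:
            counts["disorder_promoting"] += 1
        elif residue in ORDER_PROMOTING:
            counts["order_promoting"] += 1
        elif residue in AMBIGUOUS:
            counts["ambiguous"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {group: n / total for group, n in counts.items()}
```

A sequence whose disorder-promoting fraction clearly exceeds its order-promoting fraction is a candidate for closer inspection, but composition alone does not establish disorder, as the caveat about low-complexity sequences above indicates.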

Disorder and disease

Intrinsically unstructured proteins have been implicated in a number of diseases. Many key tumour suppressors have large intrinsically unstructured regions, for example p53 and BRCA1. These regions of the proteins are responsible for mediating many of their interactions. Taking the cell's native defense mechanisms as a model, drugs can be developed that try to occupy the place of noxious substrates and inhibit them, thereby counteracting the disease.

Source: Wikipedia ↗
