== Databases ==
SLiMs are usually described in the motif literature by regular expressions, with the important residues defined on the basis of a combination of experimental, structural and evolutionary evidence. However, high-throughput screening methods such as phage display have greatly increased the available information for many motif classes, allowing them to be described with sequence logos. Several diverse repositories currently curate the available motif data. In terms of scope, the Eukaryotic Linear Motif resource (ELM) and MiniMotif Miner (MnM) are the two largest motif databases, as both attempt to capture all motifs described in the available literature. Several more specialised databases also exist: PepCyber and ScanSite focus on smaller subsets of motifs (phosphopeptide binding and important signalling domains, respectively), and PDZBase focuses solely on PDZ domain ligands.
MEROPS and CutDB curate proteolytic event data, including protease specificity and cleavage sites. The number of publications describing motif-mediated interactions has grown substantially over the past decade, and as a result much of the available literature remains to be curated. The recently developed tool MiMosa aims to expedite the annotation process and to encourage semantically robust motif descriptions.
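As a minimal sketch of the two description styles, the Python below scans a sequence with a motif regular expression and computes the per-column information content that sets letter-stack height in a sequence logo. The example patterns (the SH3-binding PxxP core and a simplified class I PDZ-binding C-terminus) and the function names are illustrative assumptions, not entries copied from ELM or any other database, and the information content omits the small-sample correction usually applied when drawing logos.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def find_motif_instances(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched peptide) for every match of a motif regex.

    Wrapping the pattern in a zero-width lookahead lets re.finditer test every
    start position, so overlapping matches are reported as well.
    """
    scanner = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence)]


def column_information(peptides: List[str]) -> List[float]:
    """Information content (bits) of each column of equal-length aligned peptides.

    Uses R = log2(20) - H, the quantity that sets stack height in a protein
    sequence logo; no small-sample correction is applied.
    """
    max_bits = math.log2(20)
    info = []
    for column in zip(*peptides):  # zip(*...) walks the alignment column by column
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(max_bits - entropy)
    return info


if __name__ == "__main__":
    # Hypothetical examples: the SH3-binding PxxP core and a simplified
    # class I PDZ-binding C-terminus (S/T, any residue, then V/I/L at the end).
    print(find_motif_instances("PPPPP", "P..P"))
    print(find_motif_instances("MKLESKV", "[ST].[VIL]$"))
    print(column_information(["PA", "PC"]))
```

For example, `find_motif_instances("PPPPP", "P..P")` reports both overlapping matches, `[(1, 'PPPP'), (2, 'PPPP')]`, whereas a plain `re.finditer` over the same pattern would report only the first. The regular expression gives a yes/no answer per peptide, while the information-content profile records how strongly each position is constrained, which is the extra detail that large phage-display datasets make available.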
== Discovery tools ==
SLiMs are short and degenerate, and as a result the proteome is littered with stochastically occurring peptides that resemble functional motifs. Biologically relevant cellular partners can readily distinguish functional motifs from these chance matches, but computational tools have yet to reach a level of sophistication at which motif discovery succeeds at high rates. Motif discovery tools fall into two major categories: discovery of novel instances of known functional motif classes, and discovery of novel functional motif classes. Tools in both categories, however, use a limited and overlapping set of attributes to discriminate true from false positives. The main discriminatory attributes used in motif discovery are:
• Accessibility: the motif must be accessible to the binding partner. Intrinsic disorder prediction tools (such as IUPred or GlobPlot), domain databases (such as Pfam and SMART) and experimentally derived structural data (from sources such as the PDB) can be used to check the accessibility of predicted motif instances.
• Conservation: the conservation of a motif correlates strongly with functionality, and many experimentally characterised motifs appear as islands of strong constraint in regions of weak conservation. Alignments of homologous proteins can be used to calculate a conservation metric for a motif.
• Physicochemical properties: certain intrinsic properties of residues or stretches of amino acids are strong discriminators of functionality, for example the propensity of a disordered region to undergo a disorder-to-order transition.
• Enrichment in groupings of similar proteins: motifs often evolve convergently to carry out similar tasks in different proteins, such as mediating binding to a specific partner or targeting proteins to a particular subcellular localisation. In such groupings the motif often occurs more frequently than expected by chance and can be detected by searching for enriched motifs.
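Both the observation that chance matches are common and the enrichment criterion can be made quantitative with a simple probability model. The sketch below assumes residues are independent and, for illustration only, uses uniform amino acid frequencies (real proteomes are not uniform); the motif is written in a hypothetical per-position encoding, a list of allowed-residue strings with "." as a wildcard. The binomial over-representation test is a textbook choice, not the scoring scheme of any particular tool mentioned in this article.

```python
import math
from typing import Dict, List

# Illustrative background model only: real proteome composition is not uniform.
UNIFORM_FREQS: Dict[str, float] = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}


def motif_probability(positions: List[str], freqs: Dict[str, float]) -> float:
    """Chance that one random window matches the motif, assuming independent residues.

    Each entry of `positions` lists the residues allowed there; "." is a wildcard.
    """
    p = 1.0
    for allowed in positions:
        if allowed != ".":
            p *= sum(freqs[aa] for aa in allowed)
    return p


def expected_occurrences(positions: List[str], freqs: Dict[str, float], length: int) -> float:
    """Expected number of chance matches in a protein of the given length."""
    windows = max(length - len(positions) + 1, 0)
    return windows * motif_probability(positions, freqs)


def prob_protein_has_motif(positions: List[str], freqs: Dict[str, float], length: int) -> float:
    """Chance of at least one match; treats overlapping windows as independent (an approximation)."""
    windows = max(length - len(positions) + 1, 0)
    return 1 - (1 - motif_probability(positions, freqs)) ** windows


def binomial_upper_tail(k: int, n: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(n, q): how surprising it is that k of n proteins carry the motif."""
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    pxxp = ["P", ".", ".", "P"]  # the PxxP core in the hypothetical per-position encoding
    print(motif_probability(pxxp, UNIFORM_FREQS))          # about 0.0025
    print(expected_occurrences(pxxp, UNIFORM_FREQS, 400))  # about 0.99
    q = prob_protein_has_motif(pxxp, UNIFORM_FREQS, 400)
    print(binomial_upper_tail(8, 10, q))                   # hypothetical group: 8 of 10 proteins
```

Under these assumptions the PxxP core has a per-window probability of 0.05 × 0.05 = 0.0025, so a 400-residue protein is expected to contain about 397 × 0.0025 ≈ 0.99 matches by chance, which illustrates why a bare pattern match is weak evidence. For a group of n proteins in which k carry the motif, a small `binomial_upper_tail(k, n, q)` suggests the motif is enriched beyond chance in that group.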
=== Novel functional motif instances ===
The Eukaryotic Linear Motif resource (ELM)
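Predicting new instances of a known motif class usually combines a pattern match with the accessibility and conservation filters from the list of discriminatory attributes. The sketch below is a hypothetical illustration, not the procedure of any named tool: it assumes per-residue disorder scores in [0, 1] (the kind of output produced by predictors such as IUPred), an alignment whose first row is the ungapped query, and a deliberately simple conservation metric, the fraction of homologues whose motif columns are identical to the query. Both cut-offs are illustrative defaults.

```python
import re
from typing import List, Sequence, Tuple


def candidate_spans(sequence: str, pattern: str) -> List[Tuple[int, int]]:
    """0-based, half-open (start, end) spans of every match, overlapping ones included."""
    scanner = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.start() + len(m.group(1))) for m in scanner.finditer(sequence)]


def mean_disorder(scores: Sequence[float], start: int, end: int) -> float:
    """Average per-residue disorder score over a motif span (0.0 for an empty span)."""
    window = scores[start:end]
    return sum(window) / len(window) if window else 0.0


def identity_conservation(alignment: List[str], start: int, end: int) -> float:
    """Fraction of homologues (rows 1..n) whose motif columns match the query (row 0) exactly.

    Gaps count as mismatches. Row 0 is assumed to be ungapped, so alignment
    columns coincide with query residue positions.
    """
    query = alignment[0][start:end]
    homologues = alignment[1:]
    if not homologues:
        return 0.0
    return sum(row[start:end] == query for row in homologues) / len(homologues)


def filter_instances(
    alignment: List[str],
    pattern: str,
    disorder_scores: Sequence[float],
    disorder_cutoff: float = 0.5,
    conservation_cutoff: float = 0.5,
) -> List[Tuple[int, int]]:
    """Keep pattern matches in the query that look accessible and conserved."""
    kept = []
    for start, end in candidate_spans(alignment[0], pattern):
        if mean_disorder(disorder_scores, start, end) < disorder_cutoff:
            continue  # low disorder: possibly buried in a folded domain
        if identity_conservation(alignment, start, end) < conservation_cutoff:
            continue  # motif not retained in enough homologues
        kept.append((start, end))
    return kept


if __name__ == "__main__":
    # Toy data: row 0 is the query, rows 1-2 are hypothetical homologues.
    alignment = ["PAAPGGGPAAP", "PAAPGGGPAAP", "PAAPKKKPAAA"]
    disorder = [0.9] * 4 + [0.2] * 7  # hypothetical predictor output
    print(filter_instances(alignment, "P..P", disorder))  # [(0, 4)]
```

In the toy data the second PxxP match falls in a low-disorder stretch and is discarded, while the first is both disordered and fully conserved. Exact identity over the whole span is a strict stand-in; a practical score would more likely weight individual motif positions and compare them with the conservation of the flanking region.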
=== Novel functional motif classes ===
More recently, computational methods have been developed that can identify new short linear motifs de novo. Interactome-based tools rely on identifying a set of proteins that are likely to share a common function, such as binding the same protein or being cleaved by the same peptidase; DILIMOT and SLiMFinder are two examples of such software. ANCHOR and α-MoRF-Pred use physicochemical properties to search for motif-like peptides in disordered regions (termed MoRFs, among other names). ANCHOR identifies stretches of intrinsically disordered regions that cannot form sufficient favourable intrachain interactions to fold without the additional stabilising energy contributed by a globular interaction partner. α-MoRF-Pred exploits the inherent propensity of many SLiMs to undergo a disorder-to-order transition upon binding to discover stretches within disordered regions that are prone to forming α-helices. MoRFPred and MoRFchibi SYSTEM are SVM-based predictors that combine multiple features in their predictions, including local physicochemical properties of the sequence, long stretches of disordered regions and conservation. SLiMPred is a neural network-based method for the de novo discovery of SLiMs from the protein sequence. Information about the structural context of the motif (predicted secondary structure, structural motifs, solvent accessibility and disorder) is used during the prediction process. Importantly, no previous knowledge about the protein (i.e., no evolutionary or experimental information) is required.