==Protein identification==
There are two main ways MS is used to identify proteins.

Peptide mass fingerprinting uses the masses of proteolytic peptides as input to a search of a database of predicted masses that would arise from digestion of a list of known proteins. If a protein sequence in the reference list gives rise to a significant number of predicted masses that match the experimental values, there is some evidence that this protein was present in the original sample. Because the matching is most reliable when the digest comes from a single protein or a simple mixture, samples usually need to be purified first, and these purification steps limit the throughput of the peptide mass fingerprinting approach. Alternatively, peptides can be fragmented with MS/MS to identify them more definitively. MS is also preferred over other approaches, such as antibody-based methods, for identifying post-translational modifications in proteins.

An annotated peptide spectral library can also be used as a reference for protein/peptide identification. Its strengths are a reduced search space and increased specificity. Its limitations are that spectra not included in the library will not be identified, that spectra collected on different types of mass spectrometers can have quite distinct features, and that reference spectra in the library may contain noise peaks, which may lead to false positive identifications.

A number of different algorithmic approaches have been described to identify peptides and proteins from tandem mass spectrometry (MS/MS) data, including peptide de novo sequencing and sequence tag-based searching.
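The peptide mass fingerprinting search described in this section can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions that do not come from the text: digestion follows an idealized trypsin rule (cleavage after K or R unless the next residue is P) with no missed cleavages, masses are neutral monoisotopic masses rather than measured m/z values, and proteins are ranked by a plain count of matched masses instead of a statistical score. The function names and the toy sequences are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide accounts for the free termini


def tryptic_digest(sequence):
    """Idealized trypsin: cleave after K or R unless followed by P.

    Assumption: no missed cleavages are considered.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.01):
    """Count observed masses that match any predicted peptide mass."""
    predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(observed - mass) <= tolerance for mass in predicted)
        for observed in observed_masses
    )


def rank_proteins(observed_masses, database, tolerance=0.01):
    """Rank database entries by number of matched masses (simple count)."""
    scores = {
        name: count_matches(observed_masses, sequence, tolerance)
        for name, sequence in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real proteins.
    database = {"protA": "AKGGR", "protB": "WWK"}
    observed = [217.1426, 288.1546]
    print(rank_proteins(observed, database))
```

In this toy run both observed masses match predicted peptides of `protA` and none match `protB`, so `protA` ranks first. Practical search engines replace the raw match count with a probabilistic score and account for missed cleavages, modifications and charge states.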
==Antigen presentation==
Antigen presentation is the first step in educating the immune system to recognize new pathogens. To this end, antigen-presenting cells expose protein fragments to the immune system via MHC molecules. However, not all protein fragments bind to the MHC molecules of a given individual. Using mass spectrometry, the true spectrum of molecules presented to the immune system can be determined.
==Protein quantitation==
Multiple methods allow for the quantitation of proteins by mass spectrometry, and recent advances have enabled quantifying thousands of proteins in single cells. Compared to other methods, protein quantitation by mass spectrometry benefits from efficient sampling (counting) of many ions per protein. Quantitation can be performed by label-free methods and by multiplexed methods, which use isotopic mass tags as labels. Multiplexed methods can improve both quantitative accuracy and throughput.

Typically, stable (i.e. non-radioactive) heavier isotopes of carbon (13C) or nitrogen (15N) are incorporated into one sample while the other is labeled with the corresponding light isotopes (e.g. 12C and 14N). The two samples are mixed before the analysis. Peptides derived from the different samples can be distinguished by their mass difference, and the ratio of their peak intensities corresponds to the relative abundance ratio of the peptides (and proteins). The first generation of isotope labeling methods included SILAC (stable isotope labeling by amino acids in cell culture), trypsin-catalyzed 18O labeling, ICAT (isotope coded affinity tagging), and iTRAQ (isobaric tags for relative and absolute quantitation). The more recent generation of multiplexing methods includes tandem mass tags (TMT) for data-dependent acquisition (DDA) data and mTRAQ for multiplexed data-independent acquisition (plexDIA).

"Semi-quantitative" mass spectrometry can be performed without labeling of samples. Typically, this is done with MALDI analysis (in linear mode). The peak intensity, or the peak area, of individual molecules (typically proteins) is correlated with the amount of protein in the sample. However, the individual signal depends on the primary structure of the protein, on the complexity of the sample, and on the settings of the instrument. Other types of "label-free" quantitative mass spectrometry use the spectral counts (or peptide counts) of digested proteins as a means of determining relative protein amounts.

Comparing charge state distributions can give information about the structure of a protein. A wide distribution of high charge states indicates that the protein is disordered, whereas more compact, folded proteins produce lower charge states. By using chemical crosslinking to couple parts of the protein that are close in space but far apart in sequence, information about the overall structure can be inferred. By following the exchange of amide protons with deuterium from the solvent, it is possible to probe the solvent accessibility of various parts of the protein. Hydrogen-deuterium exchange mass spectrometry has been used to study proteins and their conformations for over 20 years. This type of protein structural analysis can be suitable for proteins that are challenging for other structural methods. Another avenue in protein structural studies is laser-induced covalent labeling. In this technique, solvent-exposed sites of the protein are modified by hydroxyl radicals. Its combination with rapid mixing has been used in protein folding studies.
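Returning to the isotope-labeling strategy described earlier in this section, the relationship between peak intensities and relative abundance can be sketched in a few lines of Python. The sketch assumes that heavy and light peak intensities have already been extracted and paired for each peptide. Summarizing a protein by the median of its peptide ratios is one common choice and is an assumption here, not something the text prescribes. All names and numbers are hypothetical.

```python
from statistics import median


def peptide_ratio(heavy_intensity, light_intensity):
    """Heavy/light intensity ratio for one peptide pair."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


def protein_ratios(peptide_pairs):
    """Summarize peptide ratios per protein.

    peptide_pairs: iterable of (protein_id, light_intensity, heavy_intensity).
    Assumption: the protein-level ratio is the median of its peptide ratios.
    """
    ratios_by_protein = {}
    for protein_id, light, heavy in peptide_pairs:
        ratios_by_protein.setdefault(protein_id, []).append(
            peptide_ratio(heavy, light)
        )
    return {
        protein_id: median(ratios)
        for protein_id, ratios in ratios_by_protein.items()
    }


if __name__ == "__main__":
    # Made-up intensities for illustration only.
    pairs = [
        ("P1", 100.0, 200.0),
        ("P1", 50.0, 100.0),
        ("P1", 10.0, 40.0),
        ("P2", 80.0, 40.0),
    ]
    print(protein_ratios(pairs))
```

With these made-up values, `P1` has peptide ratios of 2, 2 and 4, so its median ratio is 2 (twice as abundant in the heavy-labeled sample), while `P2` has a ratio of 0.5. The median is used here because it is less sensitive to a single outlying peptide than the mean.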
==Proteogenomics==
In what is now commonly referred to as proteogenomics, peptides identified with mass spectrometry are used for improving gene annotations (for example, gene start sites) and protein annotations. Parallel analysis of the genome and the proteome facilitates discovery of post-translational modifications and proteolytic events, especially when comparing multiple species.

==References==