Point accepted mutation

A point accepted mutation — also known as a PAM — is the replacement of a single amino acid in the primary structure of a protein with another single amino acid, which is accepted by the processes of natural selection. This definition does not include all point mutations in the DNA of an organism. In particular, silent mutations are not point accepted mutations, nor are mutations that are lethal or that are rejected by natural selection in other ways.

Biological background

The genetic instructions of every replicating cell in a living organism are contained within its DNA. This DNA may be altered, for example when it is copied; such an alteration is known as a mutation. At the molecular level, there are regulatory systems that correct most, but not all, of these changes to the DNA before it is replicated.

One of the possible mutations that occurs is the replacement of a single nucleotide, known as a point mutation. If a point mutation occurs within an expressed region of a gene, an exon, then this will change the codon specifying a particular amino acid in the protein produced by that gene. Because the genetic code is redundant, the altered codon may still specify the same amino acid (a silent mutation), but it may instead specify a different one. Changing a single amino acid in a protein may reduce its ability to carry out its function, or the mutation may even change the function that the protein carries out. Alternatively, the change may allow the cell to continue functioning, albeit differently, and the mutation can be passed on to the organism's offspring. If this change does not result in any significant physical disadvantage to the offspring, the possibility exists that this mutation will persist within the population. The possibility also exists that the change in function becomes advantageous. In either case, while being subjected to the processes of natural selection, the point mutation has been accepted into the genetic pool.

The 20 amino acids translated by the genetic code vary greatly in the physical and chemical properties of their side chains. However, these amino acids can be categorised into groups with similar physicochemical properties. Substituting an amino acid with another from the same category is likely to have a smaller impact on the structure and function of a protein than replacement with an amino acid from a different category. Consequently, acceptance of point mutations depends heavily on the amino acid being replaced in the mutation, and on the replacement amino acid. The PAM matrices are a mathematical tool that accounts for these varying rates of acceptance when evaluating the similarity of proteins during alignment.

Terminology

The term accepted point mutation was initially used to describe the mutation phenomenon. However, the acronym PAM was preferred over APM due to readability, and so the term point accepted mutation is used more regularly. Because the value n in the PAMn matrix represents the number of mutations per 100 amino acids, which can be likened to a percentage of mutations, the term percentage accepted mutation is sometimes used.

It is important to distinguish between point accepted mutations (PAMs), point accepted mutation matrices (PAM matrices) and the PAMn matrix. The term 'point accepted mutation' refers to the mutation event itself. However, 'PAM matrix' refers to one of a family of matrices which contain scores representing the likelihood of two amino acids being aligned due to a series of mutation events, rather than due to random chance. The 'PAMn matrix' is the PAM matrix corresponding to a time frame long enough for n mutation events to occur per 100 amino acids.

Construction of PAM matrices

PAM matrices were introduced by Margaret Dayhoff in 1978. The calculation of these matrices was based on 1572 observed mutations in the phylogenetic trees of 71 families of closely related proteins. The proteins to be studied were selected on the basis of having high similarity with their predecessors: the protein alignments included were required to display at least 85% identity. As a result, it is reasonable to assume that any aligned mismatches were the result of a single mutation event, rather than several at the same location.

Each PAM matrix has twenty rows and twenty columns, one for each of the twenty amino acids translated by the genetic code. The value in each cell of a PAM matrix is related to the probability of a row amino acid before the mutation being aligned with a column amino acid afterwards. The base unit of time for the PAM matrices is the time required for 1 mutation to occur per 100 amino acids, sometimes called 'a PAM unit' or 'a PAM' of time.

Writing f(i) for the frequency of the ith amino acid in proteins, M for the mutation probability matrix over one PAM unit of time, and M_n = M^n for the corresponding matrix over n PAM units, the entries of the PAMn matrix are

:\text{PAM}_n(i,j) = \log \frac{f(j)M_n(i,j)}{f(i)f(j)} = \log \frac{f(j)M^n(i,j)}{f(i)f(j)} = \log \frac{M^n(i,j)}{f(i)}

Note that in Gusfield's book, the entries M(i,j) and \text{PAM}_n(i,j) are related to the probability of the ith amino acid mutating into the jth amino acid, whereas here M(i,j) relates to the jth amino acid being replaced by the ith. This is the origin of the different equation given there for the entries of the PAM matrices.

When using the PAMn matrix to score an alignment of two proteins, the following assumption is made:

::If these two proteins are related, the evolutionary interval separating them is the time taken for n point accepted mutations to occur per 100 amino acids.

When the alignment of the ith and jth amino acids is considered, the score indicates the relative likelihoods of the alignment being due to the proteins being related or due to random chance.

• If the proteins are related, a series of point accepted mutations must have occurred to mutate the original amino acid into its replacement. Suppose the jth amino acid is the original. Based on the abundance of amino acids in proteins, the probability of the jth amino acid being the original is f(j). Given any particular unit of this amino acid, the probability of its being replaced by the ith amino acid in the assumed time interval is M_n(i,j). Thus, the probability of the alignment is f(j)M_n(i,j), the numerator within the logarithm.
• If the proteins are not related, the events that the two aligned amino acids are the ith and jth amino acids must be independent. The probabilities of these events are f(i) and f(j), which means the probability of the alignment is f(i)f(j), the denominator within the logarithm.
• Thus, the logarithm in the equation results in a positive entry if the alignment is more likely due to point accepted mutations, and a negative entry if the alignment is more likely due to chance.
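The log-odds calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-letter alphabet, the frequencies f, the exchangeabilities s and the base-10 logarithm are all hypothetical choices, and the way the one-step matrix M is built here is a simplification rather than Dayhoff's procedure. The convention follows the text: M[i][j] is the probability that letter j is replaced by letter i, so each column sums to 1.

```python
import math

# Hypothetical 3-letter alphabet; a real PAM matrix uses the 20 amino acids.
f = [0.5, 0.3, 0.2]            # assumed background frequencies, sum to 1
s = [[0.0, 1.0, 0.4],          # assumed symmetric exchangeabilities
     [1.0, 0.0, 2.0],
     [0.4, 2.0, 0.0]]

def one_pam_matrix(f, s, rate=0.01):
    """Column-stochastic M with M[i][j] = P(letter j is replaced by letter i),
    scaled so that on average `rate` of all positions change in one step."""
    k = len(f)
    raw = [[s[i][j] * f[i] if i != j else 0.0 for j in range(k)] for i in range(k)]
    change = sum(f[j] * sum(raw[i][j] for i in range(k)) for j in range(k))
    scale = rate / change
    M = [[raw[i][j] * scale for j in range(k)] for i in range(k)]
    for j in range(k):
        # Diagonal entry: probability that letter j is left unchanged.
        M[j][j] = 1.0 - sum(M[i][j] for i in range(k) if i != j)
    return M

def matmul(X, Y):
    k = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]

def matrix_power(M, n):
    """M multiplied by itself n times (M^n), starting from the identity."""
    k = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = matmul(result, M)
    return result

def pam_log_odds(M, f, n):
    """PAM_n(i, j) = log10(M^n(i, j) / f(i)); the base is an assumption."""
    Mn = matrix_power(M, n)
    k = len(f)
    return [[math.log10(Mn[i][j] / f[i]) for j in range(k)] for i in range(k)]

M = one_pam_matrix(f, s)
pam50 = pam_log_odds(M, f, 50)
for row in pam50:
    print([round(x, 3) for x in row])
```

Because the toy M is built from symmetric exchangeabilities, f(j)M(i,j) = f(i)M(j,i) holds, and the resulting log-odds matrix comes out symmetric even though M itself is not.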

Properties of the PAM matrices

Symmetry of the PAM matrices

While the mutation probability matrix M is not symmetric, each of the PAM matrices is.

The value n, the number of mutations per 100 amino acids, can be related to m, the number of amino acids per 100 that differ after the corresponding time interval, by the estimate

:\frac{m}{100} = 1 - e^{-\frac{n}{100}}

The validity of these estimates can be verified by counting the number of amino acids that remain unchanged under the action of the matrix M. Writing n(j) for the number of occurrences of the jth amino acid and N for the total number of amino acids, so that f(j) = n(j)/N, the total number of unchanged amino acids for the time interval of the PAMn matrix is

:\sum_{j=1}^{20}n(j)M^n(j,j)

and so the proportion of unchanged amino acids is

:\frac{\sum_{j=1}^{20}n(j)M^n(j,j)}{N} = \sum_{j=1}^{20}f(j)M^n(j,j) = 1 - \frac{m}{100}
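The estimate relating m and n above is easy to evaluate numerically. The following Python sketch computes m from n and, by rearranging the same equation, n from m. The function names are hypothetical, and the inverse is plain algebra on the estimate rather than a separate result from the source.

```python
import math

def changed_per_100(n):
    """Estimated m: amino acids per 100 that differ after n PAM units,
    using m/100 = 1 - exp(-n/100)."""
    return 100.0 * (1.0 - math.exp(-n / 100.0))

def pam_distance(m):
    """Inverse of the estimate: n = -100 * ln(1 - m/100)."""
    if not 0.0 <= m < 100.0:
        raise ValueError("m must satisfy 0 <= m < 100")
    return -100.0 * math.log(1.0 - m / 100.0)

for n in (1, 50, 100, 250):
    print(n, round(changed_per_100(n), 2))
```

For small n the two quantities nearly coincide, since few positions are hit more than once; as n grows, m levels off below 100 because repeated mutations at the same position are counted only once.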

An example - PAM250

PAM250 is a commonly used scoring matrix for sequence comparison. Only the lower half of the matrix needs to be computed, since, by their construction, PAM matrices are required to be symmetric. Each of the 20 amino acids is shown along the top and down the side of the matrix, together with 3 additional ambiguous amino acids. The amino acids are most commonly listed alphabetically, or listed in groups. These groups reflect characteristics shared among the amino acids.
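As a rough sketch of how a symmetric scoring matrix stored as its lower half might be used, the Python below looks up pair scores in either order and sums them over an ungapped alignment. The three-letter subset and every score in it are placeholder values chosen for illustration, not entries of a published PAM250 table, and the function names are hypothetical.

```python
# Placeholder scores for a 3-letter subset (illustration only, not PAM250).
# Only the lower half is stored: keys are (row, column) with row >= column
# in the order A, C, W.
LOWER_HALF = {
    ("A", "A"): 5,
    ("C", "A"): -1, ("C", "C"): 5,
    ("W", "A"): -3, ("W", "C"): -2, ("W", "W"): 5,
}

def pair_score(a, b):
    """Score for aligning a with b; symmetry lets us try both key orders."""
    if (a, b) in LOWER_HALF:
        return LOWER_HALF[(a, b)]
    return LOWER_HALF[(b, a)]

def alignment_score(x, y):
    """Sum of pair scores over an ungapped alignment of equal-length strings."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(x, y))

print(alignment_score("ACW", "ACW"))
print(alignment_score("ACW", "CAW"))
```

Storing one half keeps the table small and guarantees that the score does not depend on which sequence is written first.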

Uses in bioinformatics

Determining the time of divergence in phylogenetic trees

The molecular clock hypothesis predicts that the rate of amino acid substitution in a particular protein will be approximately constant over time, though this rate may vary between protein families.

Comparing PAM and BLOSUM

Although the PAM log-odds matrices were the first scoring matrices used with BLAST, the PAM matrices have largely been replaced by the BLOSUM matrices. Although both families of matrices produce similar scoring outcomes, they were generated using differing methodologies. The BLOSUM matrices were generated directly from the amino acid differences in aligned blocks that have diverged to varying degrees, whereas the PAM matrices reflect the extrapolation of evolutionary information based on closely related sequences to longer timescales. Since the scoring information for the PAM and BLOSUM matrices was generated in very different ways, the numbers associated with the matrices have fundamentally different meanings: the numbers for PAM matrices increase for comparisons among more divergent proteins, whereas the numbers for the BLOSUM matrices decrease. However, all amino acid substitution matrices can be compared in an information theoretic framework using their relative entropy.
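One common way to compute the relative entropy of a log-odds matrix is as the expected score per aligned pair under the "related" model. The Python sketch below assumes that definition, uses base-2 logarithms (bits), and takes the pair probability to be f(j)M^n(i,j) as in the construction of the PAM matrices; the two-letter alphabet and the function name are hypothetical.

```python
import math

def relative_entropy_bits(Mn, f):
    """Expected log-odds score in bits: sum over i, j of
    q(i, j) * log2(q(i, j) / (f(i) * f(j))), with q(i, j) = f(j) * Mn[i][j]."""
    k = len(f)
    total = 0.0
    for i in range(k):
        for j in range(k):
            q = f[j] * Mn[i][j]
            if q > 0.0:  # pairs with zero probability contribute nothing
                total += q * math.log2(q / (f[i] * f[j]))
    return total

# Toy 2-letter alphabet with equal frequencies (assumed for illustration).
f = [0.5, 0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]    # no divergence at all
saturated = [[0.5, 0.5], [0.5, 0.5]]   # sequences fully randomised

print(relative_entropy_bits(identity, f))   # 1.0
print(relative_entropy_bits(saturated, f))  # 0.0
```

The two extremes show the intended reading: a matrix for very similar sequences carries the most information per aligned position, and the value falls towards zero as the modelled divergence increases.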

Source: Wikipedia
