PAM matrices were introduced by Margaret Dayhoff in 1978. The calculation of these matrices was based on 1572 observed mutations in the phylogenetic trees of 71 families of closely related proteins. The proteins studied were selected for their high similarity to their predecessors: the protein alignments included were required to display at least 85% identity. As a result, it is reasonable to assume that any aligned mismatch was the result of a single mutation event, rather than several at the same location.

Each PAM matrix has twenty rows and twenty columns, one for each of the twenty amino acids encoded by the genetic code. The value in each cell of a PAM matrix is related to the probability of a row amino acid before the mutation being aligned with a column amino acid afterwards. The base unit of time for the PAM matrices is the time required for 1 mutation to occur per 100 amino acids, sometimes called 'a PAM unit' or 'a PAM' of time.

:\text{PAM}_n(i,j) = \log \frac{f(j)M_n(i,j)}{f(i)f(j)} = \log \frac{f(j)M^n(i,j)}{f(i)f(j)} = \log \frac{M^n(i,j)}{f(i)}

Here f(i) is the frequency of the ith amino acid in proteins, M_n(i,j) is the probability that the jth amino acid is replaced by the ith amino acid within n PAM units, and M_n = M^n is the nth power of the mutation probability matrix M for one PAM unit.

Note that in Gusfield's book, the entries M(i,j) and \text{PAM}_n(i,j) are related to the probability of the ith amino acid mutating into the jth amino acid, the reverse of the convention used here. This is why the equation for the entries of the PAM matrices takes a different form there.

When using the PAMn matrix to score an alignment of two proteins, the following assumption is made:
If these two proteins are related, the evolutionary interval separating them is the time taken for n point accepted mutations to occur per 100 amino acids.

When the alignment of the ith and jth amino acids is considered, the score indicates the relative likelihoods of the alignment due to the proteins being related or due to random chance.
• If the proteins are related, a series of point accepted mutations must have occurred to mutate the original amino acid into its replacement. Suppose the jth amino acid is the original. Based on the abundance of amino acids in proteins, the probability of the jth amino acid being the original is f(j). Given any particular unit of this amino acid, the probability of being replaced by the ith amino acid in the assumed time interval is M_n(i,j). Thus, the probability of the alignment is f(j)M_n(i,j), the numerator within the logarithm.
• If the proteins are not related, the events that the two aligned amino acids are the ith and jth amino acids must be independent. The probabilities of these events are f(i) and f(j), which means the probability of the alignment is f(i)f(j), the denominator of the logarithm.
• Thus, the logarithm in the equation results in a positive entry if the alignment is more likely due to point accepted mutations, and a negative entry if the alignment is more likely due to chance.

==Properties of the PAM matrices==
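The log-odds definition of the PAMn entries, log(M^n(i,j)/f(i)), can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not Dayhoff's actual procedure: it uses a hypothetical 3-letter alphabet instead of the twenty amino acids, made-up frequencies, and a toy one-PAM matrix built so that on average 1 letter in 100 changes. The function names (mat_mul, mat_pow, pam_scores) and the choice of the natural logarithm are illustrative; a different base only rescales the scores.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(m, n):
    """Return m raised to the n-th power (n >= 1) by repeated multiplication."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


def pam_scores(m1, freqs, n):
    """Return the log-odds entries log(M^n(i,j) / f(i)) for n PAM units.

    m1[i][j] is the probability that letter j is replaced by letter i
    within one PAM unit, so every column of m1 sums to 1.
    """
    mn = mat_pow(m1, n)
    size = len(freqs)
    return [
        [math.log(mn[i][j] / freqs[i]) for j in range(size)]
        for i in range(size)
    ]


# Hypothetical letter frequencies f(i); not real amino acid data.
FREQS = [0.5, 0.3, 0.2]

# Toy model (an assumption): letter j changes into letter i with probability
# f(i) * RATE. RATE is calibrated so that the expected fraction of changed
# letters after one step is 0.01, i.e. 1 mutation per 100 letters.
RATE = 0.01 / (1.0 - sum(f * f for f in FREQS))
M1 = [[FREQS[i] * RATE if i != j else 0.0 for j in range(3)] for i in range(3)]
for j in range(3):
    # The diagonal holds the probability that letter j stays unchanged.
    M1[j][j] = 1.0 - sum(M1[i][j] for i in range(3))

# Score matrix for an interval of 250 PAM units.
for row in pam_scores(M1, FREQS, 250):
    print([round(score, 3) for score in row])
```

In this toy model, aligning a letter with itself gives a positive score and aligning two different letters gives a negative score, matching the interpretation of the sign of the logarithm as "more likely due to point accepted mutations" versus "more likely due to chance".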