Ka/Ks ratio

In genetics, the Ka/Ks ratio, also known as ω or the dN/dS ratio, is used to estimate the balance between neutral mutations, purifying selection and beneficial mutations acting on a set of homologous protein-coding genes. It is calculated as the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka), in a given period of time, to the number of synonymous substitutions per synonymous site (Ks), in the same period. Synonymous substitutions are assumed to be neutral, so the ratio indicates the net balance between deleterious and beneficial mutations. Values of Ka/Ks significantly above 1 are unlikely to occur without at least some of the mutations being advantageous. If beneficial mutations are assumed to make little contribution, then Ka/Ks estimates the degree of evolutionary constraint.
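The definition can be written compactly as follows. This is a sketch only: the symbols N_d and S_d (observed nonsynonymous and synonymous substitutions) are labels chosen here for illustration, while N and S denote the numbers of nonsynonymous and synonymous sites. The approximations hold only when divergence is low enough that multiple substitutions at the same site can be ignored.

```latex
\omega = \frac{K_a}{K_s},
\qquad
K_a \approx \frac{N_d}{N},
\qquad
K_s \approx \frac{S_d}{S}
```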

Context

Selection acts on variation in phenotypes, which are often the result of mutations in protein-coding genes. The genetic code is written in DNA sequences as codons, groups of three nucleotides. Each codon represents a single amino acid in a protein chain. However, there are more codons (64) than amino acids found in proteins (20), so many codons are effectively synonyms. For example, the DNA codons TTT and TTC both code for the amino acid phenylalanine, so a change from the third T to C makes no difference to the resulting protein. On the other hand, the codon GAG codes for glutamic acid while the codon GTG codes for valine, so a change from the middle A to T does change the resulting protein, for better or (more likely) worse; such a change is nonsynonymous. These changes are illustrated in the tables below. The Ka/Ks ratio measures the relative rates of synonymous and nonsynonymous substitutions at a particular site.
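The two examples above can be checked with a short Python sketch. It assumes the standard genetic code, written in the conventional T, C, A, G ordering with `*` marking stop codons; the function name `classify_change` is hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(codon_before, codon_after):
    """Label a codon change as synonymous or nonsynonymous."""
    same_product = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same_product else "nonsynonymous"


print(classify_change("TTT", "TTC"))  # phenylalanine in both cases
print(classify_change("GAG", "GTG"))  # glutamic acid replaced by valine
```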

Methods

Methods for estimating Ka and Ks use a sequence alignment of two or more nucleotide sequences of homologous genes that code for proteins (rather than being genetic switches, controlling development or the rate of activity of other genes). Methods can be classified into three groups: approximate methods, maximum-likelihood methods, and counting methods. However, unless the sequences to be compared are distantly related (in which case maximum-likelihood methods prevail), the class of method used has minimal impact on the results obtained; more important are the assumptions implicit in the chosen method. The maximum-likelihood approach estimates critical parameters, including the divergence between sequences and the transition/transversion ratio, by deducing the values most likely to have produced the input data.
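As an illustration of the approximate class, the following Python sketch counts synonymous and nonsynonymous sites and differences in the style of Nei and Gojobori. It is a simplified sketch, not a reference implementation: it assumes an ungapped, in-frame alignment of two sequences, counts changes to a stop codon as nonsynonymous, and refuses codon pairs that differ at more than one position (the full method averages over mutational pathways for those). The function names are hypothetical, and the returned values are uncorrected proportions rather than final Ka and Ks estimates.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Each position has 3 possible changes; count the synonymous fraction.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3
    return total


def uncorrected_proportions(seq1, seq2):
    """Return (pS, pN): synonymous and nonsynonymous differences per site."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    syn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Site counts are averaged over the two sequences.
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            raise ValueError("pathway averaging for multi-difference codons is not implemented")
        if n_diff == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = len(seq1) - syn_sites
    return syn_diffs / syn_sites, nonsyn_diffs / nonsyn_sites


p_s, p_n = uncorrected_proportions("TTTGAGGGG", "TTCGTGGGG")
print(p_s, p_n)
```

The toy alignment reuses the TTT/TTC and GAG/GTG codon pairs; real alignments are far longer, and a sequence with no synonymous sites at all would need separate handling to avoid division by zero.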

Interpreting results

The Ka/Ks ratio is used to infer the direction and magnitude of natural selection acting on protein-coding genes. A ratio greater than 1 implies positive or Darwinian selection (driving change); less than 1 implies purifying or stabilizing selection (acting against change); and a ratio of exactly 1 indicates neutral (i.e. no) selection. However, positive and purifying selection acting at different points within the gene, or at different times along its evolution, may cancel each other out. The resulting averaged value can mask the presence of one form of selection and lower the apparent magnitude of the other. A statistical analysis is therefore needed to determine whether a result is significantly different from 1, or whether any apparent difference may simply reflect a limited data set. The appropriate statistical test for an approximate method involves approximating dN − dS with a normal distribution, and determining whether 0 falls within the central region of that distribution. More sophisticated likelihood techniques can be used to analyse the results of a maximum-likelihood analysis, by performing a chi-squared test to distinguish between a null model (Ka/Ks = 1) and the observed results.
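Both tests can be sketched in a few lines of Python using only the standard library. The sketch assumes that variance estimates for dN and dS are already available (for example from a bootstrap) and that the two estimates are independent, and it assumes the likelihood-ratio test compares nested models differing by one free parameter, so that one degree of freedom applies. The function names and the input values are hypothetical.

```python
import math


def z_test_dn_minus_ds(dn, ds, var_dn, var_ds):
    """Two-sided Z-test of the null hypothesis dN = dS.

    Assumes dN and dS are independent, so the variance of their
    difference is the sum of the two variances.
    """
    z = (dn - ds) / math.sqrt(var_dn + var_ds)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Chi-squared test (1 degree of freedom) on twice the log-likelihood gain."""
    statistic = max(2 * (lnl_alt - lnl_null), 0.0)
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical inputs, for illustration only.
print(z_test_dn_minus_ds(dn=0.12, ds=0.30, var_dn=0.0004, var_ds=0.0025))
print(likelihood_ratio_test(lnl_null=-1001.92, lnl_alt=-1000.0))
```

A small p-value from either test suggests that Ka/Ks differs from 1; the sign of dN − dS, or the fitted ratio, then indicates the direction of selection.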

Utility

The Ka/Ks ratio is a more powerful test of the neutral model of evolution than many others available in population genetics as it requires fewer assumptions.

Complications

There is often a systematic bias in the frequency at which various nucleotides are swapped, as certain mutations are more probable than others. Some simpler approximate methods, such as those of Miyata & Yasunaga and Nei & Gojobori, neglect these biases, which shortens computation time at the expense of accuracy; these methods will systematically overestimate N and underestimate S. In addition, as time progresses, it is possible for a site to undergo multiple modifications. For instance, a codon may switch from AAA→AAC→AAT→AAA. There is no way of detecting multiple substitutions at a single site, so the estimate of the number of substitutions is always an underestimate. In the example above, two nonsynonymous substitutions and one synonymous substitution occurred at the third codon position; however, because the substitutions restored the original sequence, there is no evidence of any substitution. As the divergence time between two sequences increases, so too does the number of multiple substitutions. Thus "long branches" in a dN/dS analysis can lead to underestimates of both dN and dS, and the longer the branch, the harder it is to correct for the introduced noise. Of course, the ancestral sequence is usually unknown, and two lineages being compared will have been evolving in parallel since their last common ancestor. This effect can be mitigated by reconstructing the ancestral sequence; the accuracy of this reconstruction is enhanced by having a large number of sequences descended from that common ancestor to constrain its sequence by phylogenetic methods. Methods that account for biases in codon usage and transition/transversion rates are substantially more reliable than those that do not.
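Because hidden substitutions cannot be observed directly, approximate methods typically apply a model-based correction to the observed proportion of differences. The Python sketch below uses the Jukes-Cantor formula, the correction applied in the Nei & Gojobori method, as one example; it assumes equal rates for all nucleotide changes, which is exactly the kind of simplification criticised above. The function name is hypothetical.

```python
import math


def jukes_cantor(p):
    """Estimate substitutions per site from an observed proportion of differences.

    Jukes-Cantor correction: d = -(3/4) * ln(1 - 4p/3), defined for 0 <= p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# The gap between observed and corrected values widens as divergence grows.
for observed in (0.05, 0.30, 0.60):
    print(observed, jukes_cantor(observed))
```

As the observed proportion approaches 0.75 the corrected value grows without bound, which reflects the difficulty of correcting long branches.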

Limitations

Although the Ka/Ks ratio is a good indicator of selective pressure at the sequence level, evolutionary change can often take place in the regulatory region of a gene, which affects the level, timing or location of gene expression. Ka/Ks analysis will not detect such change; it only measures selective pressure within protein-coding regions. In addition, selection that does not cause differences at the amino acid level (for instance, balancing selection) cannot be detected by these techniques. This limits the usefulness of the Ka/Ks ratio for comparing closely related populations.

Individual codon approach

Additional information can be gleaned by determining the Ka/Ks ratio at specific codons within a gene sequence. For instance, the frequency-tuning region of an opsin may be under enhanced selective pressure when a species colonises and adapts to a new environment, whereas the region responsible for initializing a nerve signal may be under purifying selection. In order to detect such effects, one would ideally calculate the Ka/Ks ratio at each site. However, this is computationally expensive, and in practice a number of Ka/Ks classes are established and each site is assigned to the best-fitting class. The first step in identifying whether positive selection acts on sites is to compare a model in which the Ka/Ks ratio is constrained not to exceed 1 at any site with one in which it may, and to test whether allowing Ka/Ks to exceed 1 in some sites improves the fit of the model. If this is the case, then sites fitting into the class where Ka/Ks > 1 are candidates to be experiencing positive selection. This form of test can identify sites that further laboratory research can examine to determine possible selective pressure; alternatively, sites believed to have functional significance can be assigned to different Ka/Ks classes before the model is run.

Source: Wikipedia ↗
