to produce two
paralogs (Genes A and B). A speciation event produces
orthologs in the two daughter species. Bottom: in a separate species, an unrelated gene has a similar function (Gene C) but has a
separate evolutionary origin and so is an
analog. Homologous sequences are orthologous if they are inferred to be descended from the same ancestral sequence separated by a
speciation event: when a species diverges into two separate species, the copies of a single gene in the two resulting species are said to be orthologous. Orthologs, or orthologous genes, are genes in different species that originated by vertical descent from a single gene of the
last common ancestor. The term "ortholog" was coined in 1970 by the
molecular evolutionist
Walter Fitch. For instance, the plant
Flu regulatory protein is present both in
Arabidopsis (multicellular higher plant) and
Chlamydomonas (single-celled green alga). The
Chlamydomonas version is more complex: it crosses the membrane twice rather than once, contains additional domains, and undergoes
alternative splicing. However, it can fully substitute for the much simpler
Arabidopsis protein if transferred from the algal genome to the plant genome by means of
genetic engineering. Significant sequence similarity and shared functional domains indicate that these two genes are orthologous genes, inherited from the
shared ancestor. Orthology is strictly defined in terms of ancestry. Given that the exact ancestry of genes in different organisms is difficult to ascertain due to
gene duplication and genome rearrangement events, the strongest evidence that two similar genes are orthologous is usually found by carrying out phylogenetic analysis of the gene lineage. Orthologs often, but not always, have the same function.
Orthologous sequences provide useful information in taxonomic classification and phylogenetic studies of organisms. The pattern of genetic divergence can be used to trace the relatedness of organisms. Two closely related organisms are likely to have very similar DNA sequences for a given pair of orthologs. Conversely, an organism that is further removed evolutionarily from another is likely to show greater divergence in the sequences of the orthologs being studied.
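The relationship between sequence divergence and relatedness can be illustrated with a minimal Python sketch. It computes the proportion of differing sites (the p-distance) between aligned copies of one orthologous gene. The species names and sequences are hypothetical, hand-made examples rather than real data, and the function names `p_distance` and `pairwise_distances` are illustrative. Real analyses would normally start from a proper multiple sequence alignment and use a substitution model rather than raw mismatch counts.

```python
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return mismatches / compared


def pairwise_distances(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every unordered pair of named sequences."""
    names = sorted(seqs)
    return {
        (x, y): p_distance(seqs[x], seqs[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical aligned fragments of one orthologous gene (assumed data).
orthologs = {
    "species_1": "ATGGCTAAGCTTGACGTTCA",
    "species_2": "ATGGCTAAGCTCGACGTTCA",
    "species_3": "ATGGCAAAACTCGATGTGCA",
}

for pair, dist in sorted(pairwise_distances(orthologs).items(), key=lambda kv: kv[1]):
    print(pair, round(dist, 3))
```

In this toy data set, species_1 and species_2 differ at one of twenty positions, while species_3 differs from both at several. Under the reasoning above, this would suggest that species_1 and species_2 are the more closely related pair.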
=== Databases of orthologous genes and de novo orthology inference tools ===
Given their tremendous importance for biology and
bioinformatics, orthologous genes have been organized in several specialized
databases that provide tools to identify and analyze orthologous gene sequences. These resources employ approaches that can be generally classified into those that use
heuristic analysis of all pairwise sequence comparisons, and those that use
phylogenetic methods. Sequence comparison methods were pioneered in the COGs database in 1997. These methods have since been extended and automated in twelve different databases, the most advanced being AYbRAH (Analyzing Yeasts by Reconstructing Ancestry of Homologs), as well as the databases listed below. Some tools predict orthologs de novo from the input protein sequences and might not provide any database; among these tools are SonicParanoid and OrthoFinder.
• eggNOG
• GreenPhylDB for plants
• InParanoid focuses on pairwise ortholog relationships
• OHNOLOGS is a repository of the genes retained from whole genome duplications in vertebrate genomes, including human and mouse
• OMA
• OrthoDB treats orthology as relative to different speciation points by providing a hierarchy of orthologs along the species tree
• OrthoInspector is a repository of orthologous genes for 4753 organisms covering the three domains of life
• OrthologID
• OrthoMaM for mammals
• OrthoMCL
• Roundup
• SonicParanoid is a graph-based method that uses machine learning to reduce execution times and infer orthologs at the domain level
Tree-based phylogenetic approaches aim to distinguish speciation from gene duplication events by comparing gene trees with species trees, as implemented in databases and software tools such as:
• LOFT
• TreeFam
• OrthoFinder
A third category of hybrid approaches uses both heuristic and phylogenetic methods to construct clusters and determine trees, for example:
• EnsemblCompara GeneTrees
• HomoloGene
• Ortholuge
== Paralogy ==