Traditional single-gene studies are effective in establishing phylogenetic trees among closely related organisms, but have drawbacks when comparing more distantly related organisms or microorganisms. This is because of lateral gene transfer, convergence, and varying rates of evolution for different genes. By using entire genomes in these comparisons, the anomalies created by these factors are overwhelmed by the pattern of evolution indicated by the majority of the data. Using this method, it is theoretically possible to create fully resolved phylogenetic trees, and timing constraints can be recovered more accurately. However, in practice this is not always the case. Due to insufficient data, multiple trees can sometimes be supported by the same data when analyzed using different methods.

Notable results of phylogenomics (in the sense of massive multigene phylogenies):
• Using 135 genes from 65 different species of photosynthetic organisms, a study found that most photosynthetic eukaryotes are linked and possibly share a single ancestor. These include plants, alveolates, rhizarians, haptophytes and cryptomonads. This has been referred to as the Plants+HC+SAR megagroup. The study concatenated these genes in what is called a "supermatrix" approach.
• The root of the bacterial tree of life and the extent of horizontal gene transfer were determined by tracing the evolution of 11,272 gene families. This is a "supertree" approach.
• The root of the archaeal tree of life was determined using a 45-protein supermatrix analysis and a 3242-protein supertree analysis. The 31,236 gene families in archaea were then mapped onto the tree to infer what the ancestral archaea may have possessed.
• Using 120 proteins from bacteria or 53 proteins from archaea (supermatrix), the Genome Taxonomy Database generates a taxonomy of all bacteria and archaea with high-quality sequenced genomes.

Using massive multigene phylogenies to classify organisms does not necessarily require the use of whole genomes. A 2023 study used target enrichment to selectively sequence 114 chosen marker genes, reducing the cost of the work by skipping irrelevant sequences.

==Databases==