==Molecular versus morphological data==
The characteristics used to create a cladogram can be roughly categorized as either morphological (synapsid skull, warm-blooded, unicellular, etc.) or molecular (DNA, RNA, or protein sequence). Prior to the advent of DNA sequencing, cladistic analysis primarily used morphological data. For animals, behavioral data was also sometimes used.
As DNA sequencing has become cheaper and easier, molecular systematics has become an increasingly popular way to infer phylogenetic hypotheses. Using a parsimony criterion is only one of several methods to infer a phylogeny from molecular data. Approaches such as maximum likelihood, which incorporate explicit models of sequence evolution, are non-Hennigian ways to evaluate sequence data. Another powerful method of reconstructing phylogenies is the use of genomic retrotransposon markers, which are thought to be less prone to the problem of reversion that plagues sequence data. They are also generally assumed to have a low incidence of homoplasies because it was once thought that their integration into the genome was entirely random; this seems at least sometimes not to be the case, however.
Figure (ancestral and terminal character states in cladistics): This diagram indicates "A" and "C" as ancestral states, and "B", "D" and "E" as states that are present in terminal taxa. Note that in practice, ancestral conditions are not known a priori (as shown in this heuristic example), but must be inferred from the pattern of shared states observed in the terminals. Given that each terminal in this example has a unique state, in reality we would not be able to infer anything conclusive about the ancestral states (other than the fact that the existence of unobserved states "A" and "C" would be unparsimonious inferences).
==Plesiomorphies and synapomorphies==
Researchers must decide which character states are "ancestral" (plesiomorphies) and which are derived (synapomorphies), because only synapomorphic character states provide evidence of grouping. This determination is usually made by comparison to the character states of one or more outgroups. States shared between the outgroup and some members of the in-group are symplesiomorphies; states that are present only in a subset of the in-group are synapomorphies. Note that character states unique to a single terminal (autapomorphies) do not provide evidence of grouping. The choice of an outgroup is a crucial step in cladistic analysis because different outgroups can produce trees with profoundly different topologies.
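The outgroup comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `classify_states`, the taxon names, and the 0/1 state codes are all hypothetical, and it assumes a single outgroup whose states are taken to be ancestral.

```python
def classify_states(matrix, outgroup):
    """Label each in-group character state by comparison with one outgroup.

    matrix   -- dict: taxon name -> list of character states (equal lengths)
    outgroup -- name of the taxon assumed to show the ancestral states
    Returns one dict per character, mapping state -> label.
    """
    ingroup = [taxon for taxon in matrix if taxon != outgroup]
    labels_per_character = []
    for i, ancestral in enumerate(matrix[outgroup]):
        labels = {}
        for state in {matrix[taxon][i] for taxon in ingroup}:
            holders = [t for t in ingroup if matrix[t][i] == state]
            if state == ancestral:
                # Shared with the outgroup: no evidence of grouping.
                labels[state] = "symplesiomorphy"
            elif len(holders) == 1:
                # Derived but unique to one terminal: no evidence of grouping.
                labels[state] = "autapomorphy"
            else:
                # Derived and shared by several terminals: evidence of grouping.
                labels[state] = "synapomorphy"
        labels_per_character.append(labels)
    return labels_per_character


# Hypothetical data: 0/1 are arbitrary state codes, not real observations.
matrix = {
    "outgroup": [0, 0, 0],
    "A": [1, 0, 1],
    "B": [1, 0, 0],
    "C": [0, 1, 0],
}
labels = classify_states(matrix, "outgroup")
for index, character in enumerate(labels):
    print(index, character)
```

In this toy matrix only the first character supports a group (A plus B); the other two derived states are each confined to one terminal. Choosing a different outgroup row would change which states count as ancestral, which is why outgroup choice matters.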
==Homoplasies==
A homoplasy is a character state that is shared by two or more taxa due to some cause other than common ancestry. The two main types of homoplasy are convergence (evolution of the "same" character in at least two distinct lineages) and reversion (the return to an ancestral character state). Characters that are obviously homoplastic, such as white fur in different lineages of Arctic mammals, should not be included as characters in a phylogenetic analysis, as they do not contribute anything to our understanding of relationships. However, homoplasy is often not evident from inspection of the character itself (as in DNA sequence, for example), and is then detected by its incongruence (unparsimonious distribution) on a most-parsimonious cladogram. Note that characters that are homoplastic may still contain phylogenetic signal.
A well-known example of homoplasy due to convergent evolution is the character "presence of wings". Although the wings of birds, bats, and insects serve the same function, each evolved independently, as can be seen from their anatomy. If a bird, a bat, and a winged insect were scored for the character "presence of wings", a homoplasy would be introduced into the dataset, and this could confound the analysis, possibly resulting in a false hypothesis of relationships. Of course, the only reason a homoplasy is recognizable in the first place is that other characters imply a pattern of relationships that reveals its homoplastic distribution.
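One way to make "unparsimonious distribution" concrete is to count how many state changes a character needs on a given tree. The sketch below uses Fitch's small-parsimony counting on a rooted binary tree written as nested tuples. The tree, the taxa, and the character scorings are illustrative assumptions chosen to mirror the wings example, and the helper names `fitch_steps` and `extra_steps` are hypothetical.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree   -- nested 2-tuples; leaves are taxon names (strings)
    states -- dict: taxon name -> observed state
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            # Children agree on at least one state: no change needed here.
            return shared, left_cost + right_cost
        # Children disagree: one change is required at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def extra_steps(tree, states):
    """Steps beyond the minimum (number of observed states minus one).

    A value above zero marks the character as homoplastic on this tree.
    """
    minimum = len(set(states.values())) - 1
    return fitch_steps(tree, states) - minimum


# Illustrative tree and scorings (assumed for this example).
tree = (("beetle", "silverfish"), (("bird", "crocodile"), ("bat", "mouse")))
wings = {"beetle": 1, "silverfish": 0, "bird": 1,
         "crocodile": 0, "bat": 1, "mouse": 0}
amnion = {"beetle": 0, "silverfish": 0, "bird": 1,
          "crocodile": 1, "bat": 1, "mouse": 1}

print(fitch_steps(tree, wings), extra_steps(tree, wings))    # 3 2
print(fitch_steps(tree, amnion), extra_steps(tree, amnion))  # 1 0
```

On this tree the wings character needs three changes where one would suffice for a two-state character, so it shows two extra steps of homoplasy, while the amnion character fits the tree with a single change. The tree itself is what exposes the homoplasy, in line with the point that it is recognizable only because other characters imply the pattern of relationships.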
==What is not a cladogram==
A cladogram is the diagrammatic result of an analysis that groups taxa on the basis of synapomorphies alone. There are many other phylogenetic algorithms that treat data somewhat differently, and result in phylogenetic trees that look like cladograms but are not cladograms. For example, phenetic algorithms, such as UPGMA and Neighbor-Joining, group by overall similarity, and treat both synapomorphies and symplesiomorphies as evidence of grouping. The resulting diagrams are phenograms, not cladograms. Similarly, the results of model-based methods (Maximum Likelihood or Bayesian approaches), which take into account both branching order and "branch length", count both synapomorphies and autapomorphies as evidence for or against grouping. The diagrams resulting from those sorts of analysis are not cladograms, either.
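The difference between grouping by overall similarity and grouping by synapomorphy can be shown on a toy matrix. The sketch below is not an implementation of UPGMA or Neighbor-Joining; it only compares the first pair that a similarity criterion (smallest Hamming distance) would join with the pair that shares the most derived states. The data, taxon names, and helper names are hypothetical, and the outgroup row is assumed to give the ancestral states.

```python
from itertools import combinations


def hamming(a, b):
    """Overall dissimilarity: number of characters in which two taxa differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_derived(a, b, ancestral):
    """Number of characters in which two taxa share a non-ancestral state."""
    return sum(x == y and x != z for x, y, z in zip(a, b, ancestral))


# Hypothetical matrix: C carries three autapomorphies (characters 2 to 4),
# while B and C share the only synapomorphy (character 1).
ancestral = [0, 0, 0, 0]
matrix = {
    "A": [0, 0, 0, 0],
    "B": [1, 0, 0, 0],
    "C": [1, 1, 1, 1],
}

pairs = list(combinations(matrix, 2))
most_similar = min(pairs, key=lambda p: hamming(matrix[p[0]], matrix[p[1]]))
most_derived = max(
    pairs, key=lambda p: shared_derived(matrix[p[0]], matrix[p[1]], ancestral)
)

print("joined by overall similarity:", most_similar)  # ('A', 'B')
print("joined by synapomorphy:", most_derived)        # ('B', 'C')
```

Here A and B look most alike only because both retain ancestral states that C has lost, so a similarity criterion joins them on symplesiomorphies, whereas the single shared derived state groups B with C.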
==Cladogram selection==
There are several algorithms available to identify the "best" cladogram. The algorithms are almost always performed by computers, though some can be followed manually for data sets of only a few species and characteristics. Most algorithms use a metric (also called a distance) to measure how consistent a candidate cladogram is with the data. Most cladogram algorithms use the techniques of mathematical optimization; the computational problem to be solved is the minimization of the metric. Some algorithms are useful only for molecular characteristic data, others only for morphological data, and still others for data sets that combine both. Algorithms for cladograms or phylogenetic trees include least squares, neighbor-joining, parsimony, maximum likelihood, and Bayesian inference.
Algorithms that perform optimization tasks (such as building cladograms) can be sensitive to the order in which the input data (the list of species and their characteristics) is presented. Inputting the data in various orders can cause the same algorithm to produce different "best" cladograms. In these situations, the user should input the data in various orders and compare the results. Using different algorithms on a single data set can sometimes yield different "best" cladograms, because each algorithm may have a unique definition of what is "best".
Because of the astronomical number of possible cladograms, algorithms cannot guarantee that the solution is the overall best solution. A nonoptimal cladogram will be selected if the program settles on a local minimum rather than the desired global minimum. To help solve this problem, many cladogram algorithms use a simulated annealing approach to increase the likelihood that the selected cladogram is the optimal one.
Biologists sometimes use the term parsimony for a specific kind of cladogram generation algorithm and sometimes as an umbrella term for all phylogenetic algorithms.
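To give a sense of how quickly the search space grows, the following sketch counts the distinct rooted, fully bifurcating trees on n labeled terminals, using the standard combinatorial result (2n - 3)!! (the product of the odd numbers up to 2n - 3). The function name `rooted_binary_trees` is a hypothetical helper, and the count assumes strictly binary, rooted trees; allowing polytomies or unrooted trees gives different numbers.

```python
def rooted_binary_trees(n_taxa):
    """Number of distinct rooted binary trees on n_taxa labeled terminals.

    Equals (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3) for n >= 2, and 1 for n = 1.
    """
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    count = 1
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


for n in (3, 4, 5, 10, 20):
    print(n, rooted_binary_trees(n))
```

Three terminals allow 3 trees, five allow 105, and ten already allow more than 34 million, which is why exhaustive evaluation is practical only for very small data sets and why heuristic searches can end in a local rather than a global minimum.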
==Terminology==
The basal position is the direction of the base (or root) of a rooted phylogenetic tree or cladogram. A basal clade is the earliest clade (of a given taxonomic rank[a]) to branch within a larger clade.
==Statistics==