The working example is based on a
JC69 genetic distance matrix computed from the
5S ribosomal RNA sequence alignment of five bacteria:
Bacillus subtilis (a),
Bacillus stearothermophilus (b),
Lactobacillus viridescens (c),
Acholeplasma modicum (d), and
Micrococcus luteus (e).
First step •
First clustering Let us assume that we have five elements (a,b,c,d,e) and the following matrix D_1 of pairwise distances between them: In this example, D_1 (a,b)=17 is the smallest value of D_1, so we join elements a and b. •
First branch length estimation Let u denote the node to which a and b are now connected. Setting \delta(a,u)=\delta(b,u)=D_1(a,b)/2 ensures that elements a and b are equidistant from u. This corresponds to the expectation of the
ultrametricity hypothesis. The branches joining a and b to u then have lengths \delta(a,u)=\delta(b,u)=17/2=8.5 (
see the final dendrogram) •
First distance matrix update We then proceed to update the initial proximity matrix D_1 into a new proximity matrix D_2 (see below), reduced in size by one row and one column because of the clustering of a with b. Bold values in D_2 correspond to the new distances, calculated by retaining the
maximum distance between each element of the first cluster (a,b) and each of the remaining elements: D_2((a,b),c)=max(D_1(a,c),D_1(b,c))=max(21,30)=30 D_2((a,b),d)=max(D_1(a,d),D_1(b,d))=max(31,34)=34 D_2((a,b),e)=max(D_1(a,e),D_1(b,e))=max(23,21)=23 Italicized values in D_2 are not affected by the matrix update as they correspond to distances between elements not involved in the first cluster.
Second step •
Second clustering We now reiterate the three previous steps, starting from the new distance matrix D_2 : Here, D_2 ((a,b),e)=23 is the lowest value of D_2, so we join cluster (a,b) with element e. •
Second branch length estimation Let v denote the node to which (a,b) and e are now connected. Because of the ultrametricity constraint, the branches joining a or b to v, and e to v, are equal and have the following total length: \delta(a,v)=\delta(b,v)=\delta(e,v)=23/2=11.5 We deduce the missing branch length: \delta(u,v)=\delta(e,v)-\delta(a,u)=\delta(e,v)-\delta(b,u)=11.5-8.5=3 (
see the final dendrogram) •
Second distance matrix update We then proceed to update the D_2 matrix into a new distance matrix D_3 (see below), reduced in size by one row and one column because of the clustering of (a,b) with e : D_3(((a,b),e),c)=max(D_2((a,b),c),D_2(e,c))=max(30,39)=39 D_3(((a,b),e),d)=max(D_2((a,b),d),D_2(e,d))=max(34,43)=43
Third step •
Third clustering We again reiterate the three previous steps, starting from the updated distance matrix D_3. Here, D_3 (c,d)=28 is the smallest value of D_3, so we join elements c and d. •
Third branch length estimation Let w denote the node to which c and d are now connected. The branches joining c and d to w then have lengths \delta(c,w)=\delta(d,w)=28/2=14 (
see the final dendrogram) •
Third distance matrix update There is a single entry to update: D_4((c,d),((a,b),e))=max(D_3(c,((a,b),e)), D_3(d,((a,b),e)))=max(39, 43)=43
Final step The final D_4 matrix is: So we join clusters ((a,b),e) and (c,d). Let r denote the (root) node to which ((a,b),e) and (c,d) are now connected. The branches joining ((a,b),e) and (c,d) to r then have lengths: \delta(((a,b),e),r)=\delta((c,d),r)=43/2=21.5 We deduce the two remaining branch lengths: \delta(v,r)=\delta(((a,b),e),r)-\delta(e,v)=21.5-11.5=10 \delta(w,r)=\delta((c,d),r)-\delta(c,w)=21.5-14=7.5
The complete-linkage dendrogram The dendrogram is now complete. It is ultrametric because all tips (a to e) are equidistant from r : \delta(a,r)=\delta(b,r)=\delta(e,r)=\delta(c,r)=\delta(d,r)=21.5 The dendrogram is therefore rooted by r, its deepest node. == Comparison with other linkages ==