Ward's method Ward's method is an agglomerative clustering method in which the dissimilarity between two clusters and is measured by the amount by which merging the two clusters into a single larger cluster would increase the average squared distance of a point to its cluster
centroid. That is, :d(A,B)=\sum_{x\in A, y\in B} \frac{d^2(x,y)} - \sum_{x,y\in A} \frac{d^2(x,y)} - \sum_{x,y\in B} \frac{d^2(x,y)}. Expressed in terms of the centroid c_A and
cardinality n_A of the two clusters, it has the simpler formula :d(A,B)=\frac{d^2(c_a,c_b)}{1/n_A + 1/n_B}, allowing it to be computed in constant time per distance calculation. Although highly sensitive to
outliers, Ward's method is the most popular variation of agglomerative clustering both because of the round shape of the clusters it typically forms and because of its principled definition as the clustering that at each step has the smallest variance within its clusters. Alternatively, this distance can be seen as the difference in
k-means cost between the new cluster and the two old clusters. Ward's distance is also reducible, as can be seen more easily from a different formula for calculating the distance of a merged cluster from the distances of the clusters it was merged from: :d(A\cup B,C) = \frac{n_A+n_C}{n_A+n_B+n_C} d(A,C) + \frac{n_B+n_C}{n_A+n_B+n_C} d(B,C) - \frac{n_C}{n_A+n_B+n_C} d(A,B). Distance update formulas such as this one are called formulas "of Lance–Williams type" after the work of . If d(A,B) is the smallest of the three distances on the right hand side (as would necessarily be true if A and B are mutual nearest-neighbors) then the negative contribution from its term is cancelled by the n_C coefficient of one of the two other terms, leaving a positive value added to the weighted average of the other two distances. Therefore, the combined distance is always at least as large as the minimum of d(A,C) and d(B,C), meeting the definition of reducibility. Because Ward's distance is reducible, the nearest-neighbor chain algorithm using Ward's distance calculates exactly the same clustering as the standard greedy algorithm. For points in a
Euclidean space of constant dimension, it takes time and space .
Single linkage In
single-linkage or nearest-neighbor clustering, the oldest form of agglomerative hierarchical clustering,
Centroid distance Another distance measure commonly used in agglomerative clustering is the distance between the centroids of pairs of clusters, also known as the weighted group method. ==History==