Consider two hierarchical clusterings of n objects, labeled A_1 and A_2. Each of the trees A_1 and A_2 can be cut to produce k=2,\ldots,n-1 clusters (either by cutting at a particular height of the tree or by setting a different strength of the hierarchical clustering). For each value of k, the following table can then be created:
:M=[m_{i,j}] \qquad (i=1,\ldots,k \text{ and } j=1,\ldots,k)
where m_{i,j} is the number of objects common to the ith cluster of A_1 and the jth cluster of A_2. The Fowlkes–Mallows index for the specific value of k is then defined as
: B_k=\frac{T_k}{\sqrt{P_kQ_k}}
where
:T_k=\sum_{i=1}^{k}\sum_{j=1}^{k}m_{i,j}^2-n
:P_k=\sum_{i=1}^{k}\left(\sum_{j=1}^{k}m_{i,j}\right)^2-n
:Q_k=\sum_{j=1}^{k}\left(\sum_{i=1}^{k}m_{i,j}\right)^2-n
B_k can then be calculated for every value of k, and the similarity between the two clusterings can be shown by plotting B_k versus k. For each k we have 0 \le B_k \le 1.
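As an illustration of the formulas for T_k, P_k and Q_k, the following minimal Python sketch builds the matching table from two flat label lists (one cluster label per object, as obtained after cutting each tree) and evaluates B_k. The function names `matching_table` and `fowlkes_mallows_from_table` are hypothetical and chosen only for this example; it assumes both labelings cover the same n objects in the same order.

```python
from math import sqrt


def matching_table(labels_1, labels_2):
    """Build M = [m_ij]: objects shared by cluster i of A_1 and cluster j of A_2."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both labelings must cover the same objects")
    # Map each distinct cluster label to a row / column index.
    rows = {c: i for i, c in enumerate(sorted(set(labels_1)))}
    cols = {c: j for j, c in enumerate(sorted(set(labels_2)))}
    m = [[0] * len(cols) for _ in rows]
    for a, b in zip(labels_1, labels_2):
        m[rows[a]][cols[b]] += 1
    return m


def fowlkes_mallows_from_table(m):
    """Evaluate B_k = T_k / sqrt(P_k * Q_k) for a matching table m."""
    n = sum(sum(row) for row in m)                  # total number of objects
    t = sum(x * x for row in m for x in row) - n    # T_k: sum of squared cells
    p = sum(sum(row) ** 2 for row in m) - n         # P_k: squared row sums
    q = sum(sum(col) ** 2 for col in zip(*m)) - n   # Q_k: squared column sums
    if p == 0 or q == 0:
        # Happens only when a clustering consists entirely of singletons.
        raise ValueError("B_k is undefined when P_k or Q_k is zero")
    return t / sqrt(p * q)


# Six objects, each tree cut into k = 2 clusters.
table = matching_table([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
b_k = fowlkes_mallows_from_table(table)
```

For this example the table is [[2, 1], [0, 3]], giving T_k = 8, P_k = 12 and Q_k = 14, so B_k = 8/\sqrt{168}, roughly 0.617.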
The Fowlkes–Mallows index can also be defined in terms of the pairs of points that the two clusterings place together or apart. Define
:TP as the number of pairs of points that are in the same cluster in both A_1 and A_2,
:FP as the number of pairs of points that are in the same cluster in A_1 but not in A_2,
:FN as the number of pairs of points that are in the same cluster in A_2 but not in A_1,
:TN as the number of pairs of points that are in different clusters in both A_1 and A_2.
Each pair of points is counted in exactly one of TP, FP, FN, or TN, so these four counts sum to the total number of pairs:
: TP+FP+FN+TN = {n\choose 2} = \frac{n(n-1)}{2}
The Fowlkes–Mallows index for two clusterings can then be defined as
: FM = \sqrt{ PPV \cdot TPR} = \sqrt{ \frac {TP}{TP+FP} \cdot \frac{TP}{TP+FN} }
where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. TPR is the true positive rate, also called sensitivity or recall, and PPV is the positive predictive value, also known as precision. The Fowlkes–Mallows index is therefore the geometric mean of precision and recall. The two definitions agree, since T_k = 2\,TP, P_k = 2(TP+FP) and Q_k = 2(TP+FN).

==Discussion==