Bimodal distributions are a commonly used example of how summary statistics such as the mean, median, and standard deviation can be deceptive when applied to an arbitrary distribution. For example, in the distribution in Figure 1, the mean and median would be about zero, even though zero is not a typical value. The standard deviation is also larger than the standard deviation of each normal distribution.

Although several have been suggested, there is at present no generally agreed summary statistic (or set of statistics) to quantify the parameters of a general bimodal distribution. For a mixture of two normal distributions, the two means and two standard deviations, along with the mixing parameter (the weight for the combination), are usually used, for a total of five parameters.
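As a minimal illustration of how these summary statistics can mislead, the following Python sketch draws from a hypothetical equal-weight mixture of two normal distributions centred at -2 and +2 with unit standard deviation. These parameters, the function name `sample_mixture` and the seed are assumptions chosen for illustration; they are not the parameters of Figure 1.

```python
import random
import statistics


def sample_mixture(n, mu1=-2.0, mu2=2.0, sigma=1.0, p=0.5, seed=0):
    """Draw n values from a two-component normal mixture (illustrative parameters)."""
    rng = random.Random(seed)
    # With probability p draw from the first component, otherwise from the second.
    return [rng.gauss(mu1 if rng.random() < p else mu2, sigma) for _ in range(n)]


data = sample_mixture(20_000)
mean = statistics.fmean(data)
median = statistics.median(data)
sd = statistics.pstdev(data)
# Share of observations that actually lie close to the mean/median region.
near_zero = sum(abs(x) < 0.5 for x in data) / len(data)

print(f"mean = {mean:.3f}, median = {median:.3f}, sd = {sd:.3f}")
print(f"share of values within 0.5 of zero = {near_zero:.3f}")
```

Under these assumptions the mean and median both fall near zero, where only a small share of the observations lie, and the overall standard deviation (√5 ≈ 2.24 in theory) exceeds the unit standard deviation of each component.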
===Ashman's D===
A statistic that may be useful is Ashman's D:

D = \frac{ \left| \mu_1 - \mu_2 \right| }{ \sqrt{ \frac{1}{2} \left( \sigma_1^2 + \sigma_2^2 \right) } }

where μ1, μ2 are the means and σ1, σ2 are the standard deviations. For a mixture of two normal distributions, D > 2 is required for a clean separation of the distributions.
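The formula for Ashman's D can be evaluated directly once the component parameters are known. The sketch below assumes the means and standard deviations have already been estimated (for example by fitting a mixture model); the function name `ashman_d` and the numerical values are hypothetical.

```python
import math


def ashman_d(mu1, mu2, sigma1, sigma2):
    """Ashman's D for two components with the given means and standard deviations."""
    return abs(mu1 - mu2) / math.sqrt(0.5 * (sigma1 ** 2 + sigma2 ** 2))


# Hypothetical component parameters, chosen only for illustration.
d_separated = ashman_d(-2.0, 2.0, 1.0, 1.0)    # means 4 apart, unit sigmas
d_overlapping = ashman_d(0.0, 1.0, 1.0, 1.0)   # means 1 apart, unit sigmas

print(d_separated > 2, d_overlapping > 2)
```

With equal standard deviations D reduces to the distance between the means in units of σ, so the first pair exceeds the threshold of 2 and the second does not.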
===van der Eijk's A===
This measure is a weighted average of the degree of agreement in the frequency distribution. A ranges from -1 (perfect bimodality) to +1 (perfect unimodality). It is defined as

A = U \left( 1 - \frac{ S - 1 }{ K - 1 } \right)

where U is the unimodality of the distribution, S is the number of categories that have nonzero frequencies and K is the total number of categories.

The value of U is 1 if the distribution has any of the three following characteristics:
• all responses are in a single category
• the responses are evenly distributed among all the categories
• the responses are evenly distributed among two or more contiguous categories, with the other categories having zero responses

For distributions other than these, the data must be divided into 'layers'. Within a layer the responses are either equal or zero. The categories do not have to be contiguous. A value of A for each layer (Ai) is calculated and a weighted average for the distribution is determined. The weights (wi) for each layer are the number of responses in that layer. In symbols

A_\text{overall} = \frac{ \sum_i w_i A_i }{ \sum_i w_i }

A uniform distribution has A = 0; when all the responses fall into one category, A = +1.

One theoretical problem with this index is that it assumes that the intervals are equally spaced. This may limit its applicability.
===Bimodal separation===
This index assumes that the distribution is a mixture of two normal distributions with means (μ1 and μ2) and standard deviations (σ1 and σ2):

S = \frac{ \mu_1 - \mu_2 }{ 2( \sigma_1 +\sigma_2 ) }
===Bimodality coefficient===
Sarle's bimodality coefficient is

\beta = \frac{ \gamma^2 + 1 }{ \kappa }

where γ is the skewness and κ is the kurtosis. The kurtosis is here defined to be the standardised fourth moment around the mean. The value of β lies between 0 and 1. The logic behind this coefficient is that a bimodal distribution with light tails will have very low kurtosis, an asymmetric character, or both, and either of these increases the coefficient.

The formula for a finite sample is

b = \frac{ g^2 + 1 }{ k + \frac{ 3( n - 1 )^2 }{ ( n - 2 )( n - 3 ) } }

where n is the number of items in the sample, g is the sample skewness and k is the sample excess kurtosis.

The value of β for the uniform distribution is 5/9. This is also its value for the exponential distribution. Values greater than 5/9 may indicate a bimodal or multimodal distribution, though corresponding values can also result for heavily skewed unimodal distributions. The maximum value (1.0) is reached only by a Bernoulli distribution with only two distinct values or the sum of two different Dirac delta functions (a bi-delta distribution).

The distribution of this statistic is unknown. It is related to a statistic proposed earlier by Pearson, namely the difference between the kurtosis and the square of the skewness (see below).
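The finite-sample formula can be sketched in Python as follows. This is an illustration under a stated assumption: g and k are taken to be the bias-corrected sample skewness and sample excess kurtosis commonly reported by statistical packages, which the text above does not specify. The function name `sample_bimodality_coefficient` and the test data are hypothetical.

```python
def sample_bimodality_coefficient(data):
    """Finite-sample bimodality coefficient b = (g^2 + 1) / (k + 3(n-1)^2 / ((n-2)(n-3))).

    Assumption: g and k are the bias-corrected sample skewness and
    sample excess kurtosis.
    """
    n = len(data)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data) / (n - 1)  # unbiased sample variance
    if s2 == 0:
        raise ValueError("data must not be constant")
    s = s2 ** 0.5
    z3 = sum(((x - mean) / s) ** 3 for x in data)
    z4 = sum(((x - mean) / s) ** 4 for x in data)
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    g = n / ((n - 1) * (n - 2)) * z3                                     # sample skewness
    k = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4 - correction    # sample excess kurtosis
    return (g ** 2 + 1) / (k + correction)


evenly_spaced = [i / 1000 for i in range(1001)]  # deterministic stand-in for a uniform sample
two_point = [0.0] * 50 + [1.0] * 50              # two equally frequent distinct values

b_uniform = sample_bimodality_coefficient(evenly_spaced)
b_two_point = sample_bimodality_coefficient(two_point)
print(f"{b_uniform:.3f} {b_two_point:.3f}")
```

For the evenly spaced values the result is close to 5/9, while the two-point sample gives a value near the maximum of 1; the small shortfall in each case comes from the finite-sample correction terms.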
===Bimodality amplitude===
This is defined as

B = \sqrt{ \frac{ A_r }{ A_l } } \sum_i P_i

where Al and Ar are the amplitudes of the left and right peaks respectively and Pi is the logarithm taken to the base 2 of the proportion of the distribution in the ith interval. The maximal value of ΣP is 1, but the value of B may be greater than this.

To use this index, the logarithms of the values are taken. The data are then divided into intervals of width Φ, whose value is log 2. The width of each peak is taken to be four times 1/4Φ, centered on its maximum value.
===Bimodality indices===
====Wang's index====
The bimodality index proposed by Wang et al. assumes that the distribution is a sum of two normal distributions with equal variances but differing means. It is defined as follows:

\delta = \frac{ | \mu_1 - \mu_2 |}{ \sigma }

where μ1, μ2 are the means and σ is the common standard deviation, and

BI = \delta \sqrt{ p( 1 - p ) }

where p is the mixing parameter.
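The two formulas above combine into a single calculation. The sketch below assumes that the means, the common standard deviation and the mixing parameter have already been estimated by some other means; the function name `wang_bimodality_index` and the example values are hypothetical.

```python
import math


def wang_bimodality_index(mu1, mu2, sigma, p):
    """BI = delta * sqrt(p * (1 - p)), with delta = |mu1 - mu2| / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    delta = abs(mu1 - mu2) / sigma  # standardised distance between the means
    return delta * math.sqrt(p * (1.0 - p))


# Same separation (delta = 3), different mixing parameters.
bi_balanced = wang_bimodality_index(0.0, 3.0, 1.0, 0.5)
bi_unbalanced = wang_bimodality_index(0.0, 3.0, 1.0, 0.1)
print(f"{bi_balanced:.2f} {bi_unbalanced:.2f}")
```

For a fixed separation δ the index is largest when the two components are equally weighted (p = 0.5) and shrinks as one component comes to dominate.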
====Sturrock's index====
A different bimodality index has been proposed by Sturrock. This index (B) is defined as

B = \frac{ 1 }{ N } \left[ \left( \sum_{j=1}^N \cos ( 2 \pi m \gamma_j ) \right)^2 + \left( \sum_{j=1}^N \sin ( 2 \pi m \gamma_j ) \right)^2 \right]

where the sums run over the N values of γ. When m = 2 and γ is uniformly distributed, B is exponentially distributed.

This statistic is a form of periodogram. It suffers from the usual problems of estimation and spectral leakage common to this form of statistic.
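A minimal Python sketch of Sturrock's index is given below. It assumes that γ is supplied as a list of N numeric values and that both sums run over those values, which is a reading of the formula rather than something stated in the text; the function name `sturrock_index` and the test inputs are hypothetical.

```python
import math


def sturrock_index(gammas, m):
    """B = (1/N) * [(sum cos(2*pi*m*gamma))^2 + (sum sin(2*pi*m*gamma))^2]."""
    n = len(gammas)
    if n == 0:
        raise ValueError("gammas must not be empty")
    cos_sum = sum(math.cos(2 * math.pi * m * g) for g in gammas)
    sin_sum = sum(math.sin(2 * math.pi * m * g) for g in gammas)
    return (cos_sum ** 2 + sin_sum ** 2) / n


# Eight identical values: all phases align, so B equals N.
b_aligned = sturrock_index([0.25] * 8, m=1)
# Eight values evenly spaced over one period: the phases cancel, so B is about 0.
b_cancelling = sturrock_index([j / 8 for j in range(8)], m=1)
print(f"{b_aligned:.3f} {b_cancelling:.3f}")
```

The two extremes show the periodogram-like behaviour: B grows with N when the values share a common phase at frequency m and vanishes when the phases are spread evenly.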
====de Michele and Accatino's index====
Another bimodality index has been proposed by de Michele and Accatino. Their index (B) is

B = | \mu - \mu_M |

where μ is the arithmetic mean of the sample and

\mu_M = \frac{ \sum_{ i = 1 }^L m_i x_i }{ \sum_{ i = 1 }^L m_i }

where mi is the number of data points in the ith bin, xi is the center of the ith bin and L is the number of bins. The authors suggested a cut-off value of 0.1 for B to distinguish between a bimodal (B > 0.1) and unimodal (B < 0.1) distribution.

====Sambrook Smith's index====
A further index (B) has been proposed by Sambrook Smith et al.:

B = | \phi_2 - \phi_1 | \frac{ p_2 }{ p_1 }

where p1 and p2 are the proportions contained in the primary (that with the greater amplitude) and secondary (that with the lesser amplitude) modes and φ1 and φ2 are the φ-sizes of the primary and secondary modes. The φ-size is defined as minus one times the log of the data size taken to the base 2. This transformation is commonly used in the study of sediments.

The authors recommended a cut-off value of 1.5, with B being greater than 1.5 for a bimodal distribution and less than 1.5 for a unimodal distribution. No statistical justification for this value was given.
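The φ-size index above can be sketched in Python as follows. The sketch assumes that the "data size" is a grain diameter in millimetres and that the modal sizes and proportions have already been identified; the function names `phi_size` and `phi_bimodality_index` and the example values are hypothetical.

```python
import math


def phi_size(size):
    """phi = -log2(size); size is assumed to be a grain diameter in millimetres."""
    if size <= 0:
        raise ValueError("size must be positive")
    return -math.log2(size)


def phi_bimodality_index(size1, p1, size2, p2):
    """B = |phi2 - phi1| * p2 / p1, where mode 1 is the primary (larger) mode."""
    if p1 <= 0 or p2 < 0 or p2 > p1:
        raise ValueError("expected p1 > 0 and 0 <= p2 <= p1")
    return abs(phi_size(size2) - phi_size(size1)) * p2 / p1


# Hypothetical sediment: primary mode at 8 mm (60 %), secondary mode at 0.5 mm (40 %).
b_index = phi_bimodality_index(8.0, 0.6, 0.5, 0.4)
print(f"B = {b_index:.3f}, above cut-off of 1.5: {b_index > 1.5}")
```

Here the two modes are 4 φ units apart (φ = -3 and φ = 1), giving B = 4 × 0.4 / 0.6 ≈ 2.67, which lies above the recommended cut-off of 1.5.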
===Otsu's method===
Otsu's method for finding a threshold for separation between two modes relies on minimizing the quantity

\frac{ n_1 \sigma_1^2 + n_2 \sigma_2^2 }{ m \sigma^2 }

where ni is the number of data points in the ith subpopulation, σi² is the variance of the ith subpopulation, m is the total size of the sample and σ² is the sample variance. Some researchers (particularly in the field of digital image processing) have applied this quantity more broadly as an index for detecting bimodality, with a small value indicating a more bimodal distribution.

==Statistical tests==