It is considered good practice to use at least one additional analysis of aggregation (other than Taylor's law), because reliance on a single index may be misleading. Although a number of other methods for detecting relationships between the variance and mean in biological samples have been proposed, to date none have achieved the popularity of Taylor's law. The analysis most often used in conjunction with Taylor's law is probably Iwao's patchiness regression, but all the methods listed here have been used in the literature.
Bartlett–Iwao model Bartlett in 1936 and later Iwao independently in 1968 proposed an alternative relationship between the variance and the mean:
: s_i^2 = am_i + bm_i^2 \,
where si2 is the variance of the ith sample and mi is the mean of the ith sample. When the population follows a negative binomial distribution, a = 1 and b = 1/k (where k is the exponent of the negative binomial distribution). This alternative formulation has not been found to be as good a fit as Taylor's law in most studies.
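The coefficients a and b can be estimated by an ordinary least-squares fit of the sample variances on m and m2. A minimal Python sketch (the helper name and the synthetic data are illustrative, not from the literature):

```python
import numpy as np

def bartlett_iwao_fit(means, variances):
    """Least-squares fit of s^2 = a*m + b*m^2 (no intercept).

    `means` and `variances` are per-sample estimates; the coefficient
    names follow the article's notation.
    """
    m = np.asarray(means, dtype=float)
    s2 = np.asarray(variances, dtype=float)
    # Design matrix with columns m and m^2; solve for (a, b).
    X = np.column_stack([m, m**2])
    (a, b), *_ = np.linalg.lstsq(X, s2, rcond=None)
    return a, b

# Synthetic check: data generated exactly from a = 1, b = 0.5.
m = np.array([1.0, 2.0, 4.0, 8.0])
s2 = 1.0 * m + 0.5 * m**2
a, b = bartlett_iwao_fit(m, s2)
```

With data generated exactly from the model, the fit recovers a = 1 and b = 0.5.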
Nachman model Nachman proposed a relationship between the mean density and the proportion of samples with zero counts:
: p_0 = \exp( -a m^b )
where p0 is the proportion of sample units with zero counts, m is the mean density, a is a scale parameter and b is a dispersion parameter. If a = b = 1 the distribution is random. This relationship is usually tested in its logarithmic form:
: \log m = c + d \log p_0
Allsop used this relationship along with Taylor's law to derive an expression for the proportion of infested units in a sample:
: P_1 = 1 - \exp\left( -\exp\left( \frac{ \frac{ \log_e \left( \frac{ A^2 } a \right) }{ b - 2 } + \log_e( n )\left( \frac{ b - 1 }{ b - 2 } - 1 \right) - c } d \right) \right)
: N = n P_1
where
: A^2 = \frac{D^2}{z^2_{\alpha/2}}
where
D is the degree of precision desired, zα/2 is the upper α/2 point of the normal distribution, a and b are the Taylor's law coefficients, c and d are the Nachman coefficients, n is the sample size and N is the number of infested units.
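Allsop's expression is easily evaluated numerically. A small Python sketch (the function name and the example parameter values are hypothetical, chosen only to exercise the formula):

```python
import math

def allsop_infested_units(n, D, z, a, b, c, d):
    """Proportion (P1) and number (N) of infested units in a sample of
    size n, from Taylor's law coefficients (a, b) and Nachman
    coefficients (c, d); D is the desired precision and z the normal
    quantile z_{alpha/2}."""
    A2 = D**2 / z**2
    inner = (math.log(A2 / a) / (b - 2)
             + math.log(n) * ((b - 1) / (b - 2) - 1)
             - c) / d
    P1 = 1 - math.exp(-math.exp(inner))
    return P1, n * P1

# Hypothetical parameter values, for illustration only.
P1, N = allsop_infested_units(n=100, D=0.1, z=1.96, a=2.0, b=1.5, c=0.5, d=1.2)
```

P1 is a proportion, so it always lies between 0 and 1, and N = nP1.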
Kono–Sugino equation Binary sampling is commonly used in ecology. In 1958 Kono and Sugino derived an equation that relates the proportion of samples without individuals to the mean density of the samples:
: \log( m ) = \log( a ) + b \log( - \log( p_0 ) )
where p0 is the proportion of sample units with no individuals, m is the mean sample density and a and b are constants. Like Taylor's law this equation has been found to fit a variety of populations, including ones that obey Taylor's law. Unlike the negative binomial distribution this model is independent of the mean density. The derivation of this equation is straightforward. Let the proportion of empty units be p0 and assume that these are distributed exponentially. Then
: p_0 = \exp( -A m^B )
Taking logs twice and rearranging, we obtain the equation above. This model is the same as that proposed by Nachman. Its advantage is that it requires only the presence or absence of individuals rather than counts of them; counting individuals may not be possible in many cases, particularly where insects are the subject of study.
;Note The equation was derived while examining the relationship between the proportion P of a series of rice hills infested and the mean severity of infestation m. The model studied was
: P = 1 - a e^{ b m }
where a and b are empirical constants. Based on this model the constants a and b were derived and a table prepared relating the values of P and m.
;Uses The predicted estimates of m from this equation are subject to bias and it is recommended that the adjusted mean ( ma ) be used instead
: m_a = m \left( 1 - \frac { \operatorname{var}( \log( m_i ) ) } 2 \right)
where var( log( mi ) ) is the variance of the logarithms of the sample unit means mi and m is the overall mean. An alternative adjustment to the mean estimates is
: \operatorname{var}( m ) = m^2 ( c_1 + c_2 - c_3 + \text{MSE} )
where
: c_1 = \frac{ \beta^2 ( 1 - p_0 ) }{ n p_0 \log_e( p_0 )^2 }
: c_2 = \frac{ \text{MSE} } { N } + s_\beta^2 \left( \log_e( -\log_e( p_0 ) ) - p \right)^2
: c_3 = \frac{ \exp( a + ( b - 2 )[\alpha - \beta \log_e( p_0 ) ] ) } n
where MSE is the
mean square error of the regression, α and β are the constant and slope of the regression respectively, sβ2 is the variance of the slope of the regression, N is the number of points in the regression, n is the number of sample units and p is the mean value of p0 in the regression. The parameters a and b are estimated from Taylor's law in its logarithmic form:
: \log_e( s^2 ) = a + b \log_e( m )
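The back-transformation m = a(−log p0)^b and the bias adjustment above can be sketched in Python (hypothetical helper names; the sample variance of the log means uses the n − 1 denominator as an assumption):

```python
import math

def kono_sugino_mean(p0, a, b):
    """Mean density from the proportion of empty units, via
    log(m) = log(a) + b*log(-log(p0)), i.e. m = a*(-log(p0))**b."""
    return a * (-math.log(p0)) ** b

def adjusted_mean(unit_means):
    """Bias-adjusted overall mean m_a = m*(1 - var(log(m_i))/2),
    where m is the mean of the sample unit means."""
    n = len(unit_means)
    logs = [math.log(x) for x in unit_means]
    lbar = sum(logs) / n
    var_log = sum((l - lbar) ** 2 for l in logs) / (n - 1)  # sample variance
    m = sum(unit_means) / n
    return m * (1 - var_log / 2)

# If p0 = exp(-1) then -log(p0) = 1, so m = a for any b.
m_est = kono_sugino_mean(p0=math.exp(-1), a=2.0, b=1.0)
```

When the unit means are all equal, var(log(mi)) = 0 and the adjustment leaves the mean unchanged.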
Hughes–Madden equation Hughes and Madden have proposed testing a similar relationship applicable to binary observations in clusters, where each cluster contains from 0 to n individuals. They suggested testing the regression
: \log( \operatorname{var}_\text{obs} / n^2 ) = a + b \log \frac{p ( 1 - p )} n
where varobs is the observed variance, a and b are the constants of the regression, n here is the sample size (not the number per cluster) and p is the probability of a sample containing at least one individual.
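A sketch of this regression fit in Python (assuming varobs and p have been tabulated over several data sets and n is a common sample size; the helper name and the synthetic check data are illustrative):

```python
import numpy as np

def binary_power_law_fit(var_obs, p, n):
    """Fit log(var_obs/n^2) = a + b*log(p*(1-p)/n) by least squares.

    var_obs and p are arrays over data sets; n is the sample size."""
    p = np.asarray(p, dtype=float)
    y = np.log(np.asarray(var_obs, dtype=float) / n**2)
    x = np.log(p * (1 - p) / n)
    b, a = np.polyfit(x, y, 1)  # returns (slope, intercept)
    return a, b

# Synthetic check: points generated exactly on the line a = 0.2, b = 1.3.
n = 10
p = np.array([0.1, 0.2, 0.4])
x = np.log(p * (1 - p) / n)
var_obs = np.exp(0.2 + 1.3 * x) * n**2
a_hat, b_hat = binary_power_law_fit(var_obs, p, n)
```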
Negative binomial distribution model A negative binomial model has also been proposed. The dispersion parameter (k) using the method of moments is m2 / ( s2 − m ), and pi is the proportion of samples with counts > 0. The s2 used in the calculation of k are the values predicted by Taylor's law. pi is plotted against 1 − ( k( k + m )−1 )k and the fit of the data is visually inspected. Perry and Taylor have proposed an alternative estimator of k based on Taylor's law:
: \frac 1 k = a m^{ b - 2 } - \frac 1 m
A better estimate of the dispersion parameter can be made with the method of maximum likelihood. For the negative binomial it can be estimated from a regression equation; the estimate kc is equal to the reciprocal of the slope of this regression.
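The moments estimator and the Taylor's-law-based estimator can be sketched in Python (hypothetical names; `k_from_taylor` substitutes s2 = a·m^b into the moments relation 1/k = (s2 − m)/m2, so the two agree whenever the variance lies exactly on the Taylor's law curve):

```python
def k_moments(m, s2):
    """Method-of-moments dispersion parameter: k = m^2 / (s^2 - m)."""
    return m**2 / (s2 - m)

def k_from_taylor(m, a, b):
    """k implied by Taylor's law s^2 = a*m^b, via 1/k = a*m**(b-2) - 1/m."""
    return 1.0 / (a * m ** (b - 2) - 1.0 / m)
```

For example, with a = 0.5 and b = 2.5 at m = 4, Taylor's law gives s2 = 16 and both estimators return k = 4/3.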
Charlier coefficient This coefficient (C) is defined as
: C = \frac{ 100 ( s^2 - m )^{0.5} } m
If the population can be assumed to be distributed in a negative binomial fashion, then C = 100 (1/k)0.5 where k is the dispersion parameter of the distribution.
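A one-line Python sketch (illustrative name):

```python
def charlier(m, s2):
    """Charlier coefficient C = 100*sqrt(s^2 - m)/m (a percentage)."""
    return 100.0 * (s2 - m) ** 0.5 / m
```

As a consistency check, a negative binomial population with m = 5 and k = 4 has s2 = m + m2/k = 11.25, and both C = 100·sqrt(s2 − m)/m and C = 100·(1/k)^0.5 give 50.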
Cole's index of dispersion This index (Ic) is defined as
: I_c = \frac{ \sum x^2 }{ ( \sum x )^2 }
The usual interpretation is that values of Ic below, equal to and above 1 indicate a uniform, a random and an aggregated distribution respectively. Because s2 = Σ x2 − (Σx)2, the index can also be written
: I_c = \frac{ s^2 + ( nm )^2 }{ ( nm )^2 } = \frac{ 1 }{ n^2} \frac{ s^2 }{ m^2 } + 1
If Taylor's law can be assumed to hold, then
: I_c = \frac{ a m^{ b - 2 } }{ n^2 } + 1
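A Python sketch of the first form of the index (illustrative name):

```python
def cole_index(x):
    """Cole's index of dispersion I_c = sum(x^2) / (sum(x))^2."""
    return sum(v * v for v in x) / sum(x) ** 2
```

For a perfectly even sample such as [1, 1, 1, 1] the index is 4/16 = 0.25.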
Lloyd's indexes Lloyd's index of mean crowding (IMC) is the average number of other points contained in the sample unit that contains a randomly chosen point:
: \mathrm{IMC} = m + \frac{s^2}{m} - 1
where m is the sample mean and s2 is the variance. Lloyd's index of patchiness (IP) is the index of mean crowding divided by the mean:
: \mathrm{IP} = \frac{ \mathrm{IMC} }{ m }
Iwao's patchiness regression Let
: y_i = m_i + \frac{s^2}{m_i} - 1
yi here is Lloyd's index of mean crowding for the ith sample; regressing yi on the sample means mi gives a constant (a) and a slope (b). The sample size (n) for a given degree of precision (D) for this regression is given by
: n = \frac{ t^2 }{ D^2 } \left( \frac{ a + 1 } m + b - 1 \right)
where t is the normal distribution abscissa. The upper and lower limits of this test are based on critical densities mc where control of a pest requires action to be taken.
: N_u = im_c + t( i ( a + 1 ) m_c + ( b - 1 ) m_c^2 )^{1/2}
: N_l = im_c - t( i ( a + 1 ) m_c + ( b - 1 ) m_c^2 )^{1/2}
where
Nu and Nl are the upper and lower bounds respectively, a is the constant from the regression, b is the slope and i is the number of samples. Kuno has proposed an alternative sequential stopping test also based on this regression:
: T_n = \frac { a + 1 } { D^2 - \frac { b - 1 } n }
where Tn is the total sample size, D is the degree of precision, n is the number of sample units, a is the constant and b is the slope from the regression. Kuno's test is subject to the condition that n ≥ ( b − 1 ) / D2. Parrella and Jones have proposed an alternative but related stop line:
: T_n = \left( 1 - \frac n N \right) \frac {a + 1} { D^2 - \left( 1 - \frac n N \right) \frac {b - 1} n }
where a and b are the parameters from the regression, N is the maximum number of sampled units and n is the individual sample size.
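These sequential-sampling quantities are straightforward to compute. A minimal Python sketch of Lloyd's mean crowding, the upper and lower action limits Nu and Nl, and Kuno's stopping total Tn (the function names are hypothetical, and t defaults to 1.96 as an assumed normal multiplier):

```python
import math

def mean_crowding(m, s2):
    """Lloyd's index of mean crowding: IMC = m + s^2/m - 1."""
    return m + s2 / m - 1

def sequential_limits(m_c, a, b, i, t=1.96):
    """Lower/upper action limits N_l, N_u around the cumulative count
    i*m_c at critical density m_c; a and b are the regression constant
    and slope, t an assumed normal multiplier."""
    half = t * math.sqrt(i * (a + 1) * m_c + (b - 1) * m_c**2)
    return i * m_c - half, i * m_c + half

def kuno_stop(a, b, D, n):
    """Kuno's stopping total T_n = (a+1)/(D^2 - (b-1)/n), defined only
    when n >= (b-1)/D^2."""
    if n < (b - 1) / D**2:
        raise ValueError("sample count too small for Kuno's test")
    return (a + 1) / (D**2 - (b - 1) / n)
```

For a Poisson-like sample (s2 = m) the mean crowding equals the mean, and the two action limits are symmetric about i·mc.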
Morisita's index of dispersion Masaaki Morisita's index of dispersion (Im) is the scaled probability that two points chosen at random from the whole population are in the same sample. Higher values indicate a more clumped distribution.
: I_m = \frac { \sum x ( x - 1 ) } { m ( n m - 1 ) }
An alternative formulation is
: I_m = n \frac{ \sum x^2 - \sum x } { ( \sum x )^2 - \sum x }
where n is the total sample size, m is the sample mean and x are the individual values, with the sum taken over the whole sample. It is also equal to
: I_m = \frac { n \operatorname{IMC} } {nm - 1}
where IMC is Lloyd's index of mean crowding. Significance can be assessed with
: z = \frac { I_m - 1 } { 2 / (n m^2)}
where m is the overall sample mean, n is the number of sample units and z is the normal distribution abscissa. Significance is tested by comparing the value of z against the values of the normal distribution. A function for its calculation is available in the vegan package for the R statistical language. Note: this index is not to be confused with Morisita's overlap index.
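A Python sketch of the second formulation (illustrative name):

```python
import numpy as np

def morisita(x):
    """Morisita's index of dispersion:
    I_m = n * (sum(x^2) - sum(x)) / (sum(x)^2 - sum(x))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    S = x.sum()
    return n * ((x**2).sum() - S) / (S**2 - S)
```

For the maximally clumped sample [6, 0, 0] the index is 3, while the even sample [2, 2, 2] gives 0.6 (< 1, indicating under-dispersion).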
Standardised Morisita's index Smith-Gill developed a statistic based on Morisita's index which is independent of both sample size and population density and bounded by −1 and +1. This statistic is calculated as follows. First determine Morisita's index (Id) in the usual fashion. Then let k be the number of units the population was sampled from. Calculate the two critical values
: M_u = \frac { \chi^2_{0.975} - k + \sum x } { \sum x - 1 }
: M_c = \frac { \chi^2_{0.025} - k + \sum x } { \sum x - 1 }
where χ2 is the chi square value for k − 1 degrees of freedom at the 97.5% and 2.5% levels of confidence. The standardised index (Ip) is then calculated from one of the formulae below.
When Id ≥ Mc > 1
: I_p = 0.5 + 0.5 \left( \frac { I_d - M_c } { k - M_c } \right)
When Mc > Id ≥ 1
: I_p = 0.5 \left( \frac { I_d - 1 } { M_u - 1 } \right)
When 1 > Id ≥ Mu
: I_p = -0.5 \left( \frac { I_d - 1 } { M_u - 1 } \right)
When 1 > Mu > Id
: I_p = -0.5 + 0.5 \left( \frac { I_d - M_u } { M_u } \right)
Ip ranges between +1 and −1 with 95% confidence intervals of ±0.5. Ip has the value 0 if the pattern is random, is greater than 0 if the pattern is aggregated and is less than 0 if the pattern is uniform.
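A Python sketch of the four-branch calculation (illustrative name; scipy is assumed for the chi-square quantiles, which are taken in the upper-tail convention used above, so the clumped critical value Mc comes from the larger quantile):

```python
from scipy.stats import chi2

def standardised_morisita(I_d, x):
    """Smith-Gill standardised index I_p from Morisita's I_d and the
    raw counts x (k = number of sample units).  chi2_{0.025} in the
    upper-tail notation is the larger quantile, ppf(0.975)."""
    k = len(x)
    S = sum(x)
    M_u = (chi2.ppf(0.025, k - 1) - k + S) / (S - 1)  # uniform critical value
    M_c = (chi2.ppf(0.975, k - 1) - k + S) / (S - 1)  # clumped critical value
    if I_d >= M_c > 1:
        return 0.5 + 0.5 * (I_d - M_c) / (k - M_c)
    if M_c > I_d >= 1:
        return 0.5 * (I_d - 1) / (M_u - 1)
    if 1 > I_d >= M_u:
        return -0.5 * (I_d - 1) / (M_u - 1)
    return -0.5 + 0.5 * (I_d - M_u) / M_u
```

For ten sample units totalling 50 individuals, an Id of 1 (random) maps to Ip = 0, while an Id well above Mc maps into the significant band above 0.5.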
Southwood's index of spatial aggregation Southwood's index of spatial aggregation (k) is defined as
: \frac {1}{k} = \frac{m^*}{m} - 1
where m is the mean of the sample and m* is Lloyd's index of mean crowding.
Index of dispersion The index of dispersion (ID) is
: \mathrm{ID} = \frac{( n - 1 ) s^2 }{ m }
This index may be used to test for overdispersion of the population. It is recommended that in applications n > 5 and that the sample total divided by the number of samples is > 3. In symbols
: \frac { \sum x } { n } > 3
where x is an individual sample value. When the population is Poisson distributed the index is distributed as the chi-square distribution with n − 1 degrees of freedom, so its expectation is n − 1.
Index of cluster size Under a random (Poisson) distribution the index of cluster size (ICS) is expected to equal 0. Positive values indicate a clumped distribution; negative values indicate a uniform distribution.
: \mathrm{ ICS } = \frac{s^2}{m} - 1
where s2 is the variance and m is the mean. If the population obeys Taylor's law
: \mathrm{ ICS } = a m^{ b - 1 } - 1
The ICS is also equal to Katz's test statistic divided by ( n / 2 )1/2, where n is the sample size. It is also related to Clapham's test statistic. It is sometimes referred to as the clumping index.
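Both indices can be sketched in Python (illustrative names; the sample variance uses the n − 1 denominator as an assumption):

```python
def index_of_dispersion(x):
    """ID = (n - 1) * s^2 / m with sample variance s^2 and mean m."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / (n - 1)  # sample variance
    return (n - 1) * s2 / m

def ics(m, s2):
    """Index of cluster size ICS = s^2/m - 1 (0 under a Poisson)."""
    return s2 / m - 1
```

For the sample [1, 2, 3] the mean is 2 and the sample variance is 1, so ID = 1; and ICS is 0 whenever s2 = m.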
Green's index Green's index (GI) is a modification of the index of cluster size that is independent of n, the number of sample units:
: C_x = \frac { s^2 / m - 1 } { nm - 1 }
This index equals 0 if the distribution is random, 1 if it is maximally aggregated and −1 / ( nm − 1 ) if it is uniform. The distribution of Green's index is not currently known, so statistical tests have been difficult to devise for it. If the population obeys Taylor's law
: C_x = \frac { a m^{ b - 1 } - 1 } {nm - 1}
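A one-line Python sketch (illustrative name):

```python
def greens_index(m, s2, n):
    """Green's index C_x = (s^2/m - 1) / (n*m - 1)."""
    return (s2 / m - 1) / (n * m - 1)
```

A Poisson-like sample (s2 = m) gives 0, as expected for a random distribution.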
Binary dispersal index Binary sampling (presence/absence) is frequently used where it is difficult to obtain accurate counts. The dispersal index (D) is used when the study population is divided into a series of equal samples (number of units = N; number of units per sample = n; total population size = n × N). The theoretical variance of a sample from a population with a binomial distribution is
: s^2 = n p ( 1 - p )
where s2 is the variance, n is the number of units sampled and p is the mean proportion of sampling units with at least one individual present. The dispersal index (D) is defined as the ratio of observed variance to the expected variance. In symbols
: D = \frac {\text{var}_\text{obs} } { \text{var}_\text{bin} } = \frac{s^2} {np(1 - p)}
where varobs is the observed variance and varbin is the expected variance. The expected variance is calculated with the overall mean of the population. Values of D > 1 are considered to suggest aggregation. D( n − 1 ) is distributed as a chi-squared variable with n − 1 degrees of freedom, where n is the number of units sampled. An alternative test is the C test:
: C = \frac { D( n N - 1 ) - n N } { ( 2 N (n^2 - n))^{1/2} }
where D is the dispersal index, n is the number of units per sample and N is the number of samples. C is distributed normally. A statistically significant value of C indicates overdispersion of the population.
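A Python sketch of D and the C test (illustrative names; x holds the count of positive units in each of N samples of n units, and the observed variance uses the N − 1 denominator as an assumption):

```python
import math

def dispersal_index(x, n):
    """Binary dispersal index D = var_obs / (n*p*(1-p)), where x are
    counts of positive units per sample of n units and p is the
    overall proportion of positive units."""
    N = len(x)
    p = sum(x) / (n * N)          # overall mean proportion
    m = sum(x) / N                # mean count per sample
    s2 = sum((v - m) ** 2 for v in x) / (N - 1)  # observed variance
    return s2 / (n * p * (1 - p))

def c_statistic(D, n, N):
    """C test: C = (D*(n*N - 1) - n*N) / sqrt(2*N*(n^2 - n))."""
    return (D * (n * N - 1) - n * N) / math.sqrt(2 * N * (n**2 - n))
```

For example, counts [2, 4] over two samples of 10 units give p = 0.3, an observed variance of 2 and D = 2/2.1 ≈ 0.95, suggesting no aggregation.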
D is also related to the intraclass correlation (ρ), which is defined as
: \rho = 1 - \frac{ \sum x_i ( T - x_i ) } { p ( 1 - p ) N T ( T - 1 ) }
where T is the number of organisms per sample, p is the likelihood of the organism having the sought-after property (diseased, pest free, etc.) and xi is the number of organisms in the ith unit with this property. T must be the same for all sampled units. In this case with n constant
: \rho = \frac{ D - 1 } { n - 1 }
If the data can be fitted with a beta-binomial distribution, ρ equals the correlation parameter of that distribution. Under Taylor's law, the mean density (m0) at which the population appears randomly distributed (s2 = m) is
: m_0 = \exp\left( \frac{\log a}{1 - b} \right)
==Related statistics==
beta-binomial distribution then : m_0 = \exp\left( \frac{\log a}{1 - b} \right) ==Related statistics==