Diversity index

A diversity index is a method of measuring how many different types there are in a dataset. Diversity indices are statistical representations of different aspects of biodiversity, which are useful simplifications for comparing different communities or sites.

Effective number of species or Hill numbers

True diversity, or the effective number of types, refers to the number of equally abundant types needed for the average proportional abundance of the types to equal that observed in the dataset of interest (where all types may not be equally abundant). The true diversity in a dataset is calculated by first taking the weighted generalized mean of the proportional abundances of the types in the dataset, and then taking the reciprocal of this. The equation is: When , the above equation is undefined. However, the mathematical limit as approaches 1 is well defined and the corresponding diversity is calculated with the following equation: {}^1\!D={1 \over {\prod_{i=1}^R p_i^{p_i}}} = \exp\left(-\sum_{i=1}^R p_i \ln(p_i)\right) which is the exponential of the Shannon entropy calculated with natural logarithms (see above). In other domains, this statistic is also known as the perplexity. The general equation of diversity is often written in the form {}^q\!D = \left ( {\sum_{i=1}^R p_i^q} \right )^{1/(1-q)} and the term inside the parentheses is called the basic sum. Some popular diversity indices correspond to the basic sum as calculated with different values of . == Sensitivity of the diversity value to rare vs. abundant species ==

Sensitivity of the diversity value to rare vs. abundant species

The value of is often referred to as the order of the diversity. It defines the sensitivity of the true diversity to rare vs. abundant species by modifying how the weighted mean of the species' proportional abundances is calculated. With some values of the parameter , the value of the generalized mean assumes familiar kinds of weighted means as special cases. In particular, • corresponds to the weighted harmonic mean, • to the weighted geometric mean, and • to the weighted arithmetic mean. • As approaches infinity, the weighted generalized mean with exponent approaches the maximum value, which is the proportional abundance of the most abundant species in the dataset. Generally, increasing the value of increases the effective weight given to the most abundant species. This leads to obtaining a larger value and a smaller true diversity () value with increasing . When , the weighted geometric mean of the values is used, and each species is exactly weighted by its proportional abundance (in the weighted geometric mean, the weights are the exponents). When , the weight given to abundant species is exaggerated, and when , the weight given to rare species is. At , the species weights exactly cancel out the species proportional abundances, such that the weighted mean of the values equals even when all species are not equally abundant. At , the effective number of species, , hence equals the actual number of species . In the context of diversity, is generally limited to non-negative values. This is because negative values of would give rare species so much more weight than abundant ones that would exceed . ==Richness==

Richness

Richness simply quantifies how many different types the dataset of interest contains. For example, species richness (usually noted ) is simply the number of species, e.g. at a particular site. Richness is a simple measure, so it has been a popular diversity index in ecology, where abundance data are often not available. If true diversity is calculated with , the effective number of types () equals the actual number of types, which is identical to Richness (). ==Shannon index==

{{anchor|Shannon index}}Shannon index

The Shannon index has been a popular diversity index in the ecological literature, where it is also known as '''Shannon's diversity index, Shannon–Wiener index, and (erroneously) Shannon–Weaver index'. The measure was originally proposed by Claude Shannon in 1948 to quantify the entropy (hence Shannon entropy'', related to Shannon information content) in strings of text. The idea is that the more letters there are, and the closer their proportional abundances in the string of interest, the more difficult it is to correctly predict which letter will be the next one in the string. The Shannon entropy quantifies the uncertainty (entropy or degree of surprise) associated with this prediction. It is most often calculated as follows: H' = -\sum_{i=1}^R p_i \ln(p_i) where is the proportion of characters belonging to the th type of letter in the string of interest. In ecology, is often the proportion of individuals belonging to the th species in the dataset of interest. Then the Shannon entropy quantifies the uncertainty in predicting the species identity of an individual that is taken at random from the dataset. Although the equation is here written with natural logarithms, the base of the logarithm used when calculating the Shannon entropy can be chosen freely. Shannon himself discussed logarithm bases 2, 10 and , and these have since become the most popular bases in applications that use the Shannon entropy. Each log base corresponds to a different measurement unit, which has been called binary digits (bits), decimal digits (decits), and natural digits (nats) for the bases 2, 10 and , respectively. Comparing Shannon entropy values that were originally calculated with different log bases requires converting them to the same log base: change from the base to base is obtained with multiplication by . The Shannon index () is related to the weighted geometric mean of the proportional abundances of the types. Specifically, it equals the logarithm of true diversity as calculated with : H' = -\sum_{i=1}^R p_i \ln(p_i) = -\sum_{i=1}^R \ln\left(p_i^{p_i}\right) This can also be written \begin{align} H' &= -\left[\ln\left(p_1^{p_1}\right) +\ln\left(p_2^{p_2}\right) +\ln\left(p_3^{p_3}\right) + \cdots + \ln\left(p_R^{p_R}\right)\right] \\[1ex] &= -\ln\left(p_1^{p_1}p_2^{p_2}p_3^{p_3} \cdots p_R^{p_R}\right) = \ln \left ( {1 \over p_1^{p_1}p_2^{p_2}p_3^{p_3} \cdots p_R^{p_R}} \right ) \\ &= \ln \left ( {1 \over {\prod_{i=1}^R p_i^{p_i}}} \right ) \end{align} Since the sum of the values equals 1 by definition, the denominator equals the weighted geometric mean of the values, with the values themselves being used as the weights (exponents in the equation). The term within the parentheses hence equals true diversity , and equals . When all types in the dataset of interest are equally common, all values equal , and the Shannon index hence takes the value . The more unequal the abundances of the types, the larger the weighted geometric mean of the values, and the smaller the corresponding Shannon entropy. If practically all abundance is concentrated to one type, and the other types are very rare (even if there are many of them), Shannon entropy approaches zero. When there is only one type in the dataset, Shannon entropy exactly equals zero (there is no uncertainty in predicting the type of the next randomly chosen entity). In machine learning the Shannon index is also called as Information gain. Rényi entropy The Rényi entropy is a generalization of the Shannon entropy to other values of than 1. It can be expressed: {}^qH = \frac{1}{1-q} \; \ln\left ( \sum_{i=1}^R p_i^q \right ) which equals {}^qH = \ln\left ( {1 \over \sqrt[q-1]{{\sum_{i=1}^R p_i p_i^{q-1}}}} \right ) = \ln({}^q\!D) This means that taking the logarithm of true diversity based on any value of gives the Rényi entropy corresponding to the same value of . ==Simpson index==

Simpson index

The Simpson index was introduced in 1949 by Edward H. Simpson to measure the degree of concentration when individuals are classified into types. The same index was rediscovered by Orris C. Herfindahl in 1950. The square root of the index had already been introduced in 1945 by the economist Albert O. Hirschman. As a result, the same measure is usually known as the Simpson index in ecology, and as the Herfindahl index or the Herfindahl–Hirschman index (HHI) in economics. The measure equals the probability that two entities taken at random from the dataset of interest represent the same type. Since the mean proportional abundance of the types increases with decreasing number of types and increasing abundance of the most abundant type, λ obtains small values in datasets of high diversity and large values in datasets of low diversity. This is counterintuitive behavior for a diversity index, so often, such transformations of λ that increase with increasing diversity have been used instead. The most popular of such indices have been the inverse Simpson index (1/λ) and the Gini–Simpson index (1 − λ). in the field of machine learning. The original Simpson index λ equals the probability that two entities taken at random from the dataset of interest (with replacement) represent the same type. Its transformation 1 − λ, therefore, equals the probability that the two entities represent different types. This measure is also known in ecology as the probability of interspecific encounter (PIE) and the Gini–Simpson index. which is also known as the Blau index, is the same measure as the Gini–Simpson index. The quantity is also known as the expected heterozygosity in population genetics. ==Berger–Parker index==

Berger–Parker index

The Berger–Parker index, named after Wolfgang H. Berger and Frances Lawrence Parker, equals the maximum value in the dataset, i.e., the proportional abundance of the most abundant type. This corresponds to the weighted generalized mean of the values when approaches infinity, and hence equals the inverse of the true diversity of order infinity (). ==See also==

Source: Wikipedia ↗

tickerdossier.com tickerdossier.substack.com