Measures of central tendency

Mode

The mode of a beta distributed random variable X with α, β > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression: \frac{\alpha - 1}{\alpha + \beta - 2}. When both parameters are less than one (α, β < 1), this is the anti-mode: the lowest point of the probability density curve. Letting α = β, the expression simplifies to 1/2, showing that for α = β > 1 the mode (resp. anti-mode when α, β < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of
α and
β. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of
α = 2,
β = 1 (or
α = 1,
β = 2), the density function becomes a
right-triangle distribution which is finite at both ends. In several other cases there is a
singularity at one end, where the value of the density function approaches infinity. For example, in the case
α =
β = 1/2, the beta distribution simplifies to become the
arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends (
x = 0, and
x = 1) can be called
modes or not. • Whether the ends are part of the
domain of the density function • Whether a
singularity can ever be called a
mode • Whether cases with two maxima should be called
bimodal

Median

The median of the beta distribution is the unique real number x = I_{1/2}^{[-1]}(\alpha,\beta) for which the
regularized incomplete beta function I_x(\alpha,\beta) = \tfrac{1}{2} . There is no general
closed-form expression for the
median of the beta distribution for arbitrary values of
α and
β.
Closed-form expressions for particular values of the parameters
α and
β follow: • For symmetric cases
α =
β, median = 1/2. • For
α = 1 and
β > 0, median =1-2^{-1/\beta} (this case is the
mirror-image of the
power function distribution) • For
α > 0 and
β = 1, median = 2^{-1/\alpha} (this case is the power function distribution). For such short-tailed distributions it has been remarked that "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(α, β) such that α, β < 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs for example for
random walks, since the probability for the time of the last visit to the origin in a random walk is distributed as the
arcsine distribution Beta(1/2, 1/2): the mean of a number of
realizations of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).
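As an illustrative sketch (not part of the original text; the parameter choices are arbitrary), the mode and median expressions above can be checked numerically with SciPy:

```python
# Sketch: checking the closed-form mode and median expressions for the
# beta distribution against numerical values from SciPy.
import numpy as np
from scipy.stats import beta
from scipy.special import betaincinv

a, b = 4.0, 2.0  # arbitrary shape parameters with alpha, beta > 1

# Mode (alpha - 1)/(alpha + beta - 2) vs. a grid search over the PDF peak.
mode_closed = (a - 1) / (a + b - 2)
xs = np.linspace(0.0, 1.0, 20001)
mode_numeric = xs[np.argmax(beta.pdf(xs, a, b))]
print(mode_closed, mode_numeric)                   # both ~0.75

# Median as the inverse of the regularized incomplete beta function at 1/2.
print(betaincinv(a, b, 0.5), beta.median(a, b))

# Special cases: alpha = 1 (mirror-image power function) and beta = 1.
print(beta.median(1.0, 3.0), 1 - 2 ** (-1 / 3.0))  # alpha = 1, beta > 0
print(beta.median(3.0, 1.0), 2 ** (-1 / 3.0))      # alpha > 0, beta = 1
```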
Geometric mean

The logarithm of the
geometric mean GX of a distribution with
random variable X is the arithmetic mean of ln(
X), or, equivalently, its expected value: \ln G_X = \operatorname{E}[\ln X] For a beta distribution, the expected value integral gives: \begin{align} \operatorname{E}[\ln X] &= \int_0^1 \ln x\, f(x;\alpha,\beta)\,dx \\[4pt] &= \int_0^1 \ln x \,\frac{ x^{\alpha-1}(1-x)^{\beta-1}}{\Beta(\alpha,\beta)}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \, \int_0^1 \frac{\partial x^{\alpha-1}(1-x)^{\beta-1}}{\partial \alpha}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial}{\partial \alpha} \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial \Beta(\alpha,\beta)}{\partial \alpha} \\[4pt] &= \frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} \\[4pt] &= \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha} - \frac{\partial \ln \Gamma(\alpha + \beta)}{\partial \alpha} \\[4pt] &= \psi(\alpha) - \psi(\alpha + \beta) \end{align} where
ψ is the
digamma function. Therefore, the geometric mean of a beta distribution with shape parameters
α and
β is the exponential of a difference of digamma functions of α and β, as follows: G_X = e^{\operatorname{E}[\ln X]}= e^{\psi(\alpha) - \psi(\alpha + \beta)} While for a beta distribution with equal shape parameters
α =
β, it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: 0 < G_X < 1/2. The reason for this is that the logarithmic transformation strongly weights the values of
X close to zero, as ln(
X) strongly tends towards negative infinity as
X approaches zero, while ln(
X) flattens towards zero as X → 1. Along the line α = β, the following limits apply: \begin{align} &\lim_{\alpha = \beta \to 0} G_X = 0 \\ &\lim_{\alpha = \beta \to \infty} G_X =\tfrac{1}{2} \end{align} Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} \lim_{\beta \to 0} G_X = \lim_{\alpha \to \infty} G_X = 1\\ \lim_{\alpha\to 0} G_X = \lim_{\beta \to \infty} G_X = 0 \end{align} The accompanying plot shows the difference between the mean and the geometric mean for shape parameters
α and
β from zero to 2. Besides the fact that the difference between them approaches zero as
α and
β approach infinity and that the difference becomes large for values of
α and
β approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters
α and
β. The difference between the geometric mean and the mean is larger for small values of
α in relation to
β than when exchanging the magnitudes of
β and
α.
N. L. Johnson and S. Kotz suggest, for α, β > 1, approximating the digamma function as ψ(α) ≈ ln(α − 1/2), which yields G_X \approx \frac{\alpha - \frac{1}{2}}{\alpha+\beta-\frac{1}{2}} \text{ if } \alpha, \beta > 1. This is relevant because the beta distribution is a suitable model for the random behavior of percentages, and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation; see section "Parameter estimation, maximum likelihood." Actually, when performing maximum likelihood estimation, besides the
geometric mean GX based on the random variable X, also another geometric mean appears naturally: the
geometric mean based on the linear transformation (1 − X), the mirror-image of
X, denoted by
G(1−
X): G_{1-X} = e^{\operatorname{E}[\ln(1-X)] } = e^{\psi(\beta) - \psi(\alpha + \beta)} Along the line α = β, the following limits apply: \begin{align} &\lim_{\alpha = \beta \to 0} G_{1-X} =0 \\ &\lim_{\alpha = \beta \to \infty} G_{1-X} =\tfrac{1}{2} \end{align} Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} \lim_{\beta \to 0} G_{(1-X)} = \lim_{\alpha \to \infty} G_{(1-X)} = 0\\ \lim_{\alpha\to 0} G_{(1-X)} = \lim_{\beta \to \infty} G_{(1-X)} = 1 \end{align} It has the following approximate value: G_{(1-X)} \approx \frac{\beta - \frac{1}{2}}{\alpha+\beta-\frac{1}{2}}\text{ if } \alpha, \beta > 1. Although both
GX and
G1−
X are asymmetric, in the case that both shape parameters are equal, α = β, the geometric means are equal:
GX =
G(1−
X). This equality follows from the following symmetry displayed between both geometric means: G_X (\Beta(\alpha, \beta) ) = G_{1-X}(\Beta(\beta, \alpha) ).
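A short numerical sketch (arbitrary parameters; not from the original article) of the geometric mean identities above:

```python
# Sketch: G_X = exp(psi(a) - psi(a+b)) vs. direct numerical evaluation of
# E[ln X], plus the approximation and mirror symmetry quoted above.
import numpy as np
from scipy.stats import beta
from scipy.special import digamma
from scipy.integrate import quad

a, b = 2.5, 4.0  # arbitrary shape parameters

g_closed = np.exp(digamma(a) - digamma(a + b))
e_ln_x, _ = quad(lambda x: np.log(x) * beta.pdf(x, a, b), 0, 1)
print(g_closed, np.exp(e_ln_x))           # agree to quadrature accuracy

# Approximation G_X ~ (a - 1/2)/(a + b - 1/2), reasonable for a, b > 1
print((a - 0.5) / (a + b - 0.5))

# Symmetry: G_X under Beta(a, b) equals G_{1-X} under Beta(b, a)
e_ln_1mx, _ = quad(lambda x: np.log1p(-x) * beta.pdf(x, b, a), 0, 1)
print(np.exp(e_ln_1mx))                   # same value as g_closed
```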
Harmonic mean

The inverse of the harmonic mean (HX) of a distribution with random variable X is the arithmetic mean of 1/
X, or, equivalently, its expected value. Therefore, the
harmonic mean (
HX) of a beta distribution with shape parameters
α and
β is: \begin{align} H_X &= \frac{1}{\operatorname{E}\left[\frac{1}{X}\right]} \\ &=\frac{1}{\int_0^1 \frac{f(x;\alpha,\beta)}{x}\,dx} \\ &=\frac{1}{\int_0^1 \frac{x^{\alpha-1}(1-x)^{\beta-1}}{x \Beta(\alpha,\beta)}\,dx} \\ &= \frac{\alpha - 1}{\alpha + \beta - 1}\text{ if } \alpha > 1 \text{ and } \beta > 0 \\ \end{align} The
harmonic mean (
HX) of a beta distribution with
α = β is: H_X = \frac{\alpha-1}{2\alpha-1}, showing that for
α =
β the harmonic mean ranges from 0, for
α =
β = 1, to 1/2, for
α =
β → ∞. Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} &\lim_{\alpha\to 0} H_X \text{ is undefined} \\ &\lim_{\alpha\to 1} H_X = \lim_{\beta \to \infty} H_X = 0 \\ &\lim_{\beta \to 0} H_X = \lim_{\alpha \to \infty} H_X = 1 \end{align} The harmonic mean plays a role in maximum likelihood estimation for the four parameter case, in addition to the geometric mean. Actually, when performing maximum likelihood estimation for the four parameter case, besides the harmonic mean
HX based on the random variable
X, also another harmonic mean appears naturally: the harmonic mean based on the linear transformation (1 −
X), the mirror-image of
X, denoted by
H1 −
X: H_{1-X} = \frac{1}{\operatorname{E} \left[\frac 1 {1-X}\right]} = \frac{\beta - 1}{\alpha + \beta-1} \text{ if } \beta > 1, \text{ and } \alpha> 0. The
harmonic mean (
H(1 −
X)) of a beta distribution with
α = β is: H_{(1-X)} = \frac{\beta-1}{2\beta-1}, showing that for
α =
β the harmonic mean ranges from 0, for
α =
β = 1, to 1/2, for
α =
β → ∞. Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} &\lim_{\beta\to 0} H_{1-X} \text{ is undefined} \\ &\lim_{\beta\to 1} H_{1-X} = \lim_{\alpha\to \infty} H_{1-X} = 0 \\ &\lim_{\alpha\to 0} H_{1-X} = \lim_{\beta\to \infty} H_{1-X} = 1 \end{align} Although both
HX and
H1−
X are asymmetric, in the case that both shape parameters are equal
α =
β, the harmonic means are equal:
HX =
H1−
X. This equality follows from the following symmetry displayed between both harmonic means: H_X (\Beta(\alpha, \beta) )=H_{1-X}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta> 1.
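A minimal sketch (arbitrary parameters) checking both harmonic means against direct integration:

```python
# Sketch: H_X = (a-1)/(a+b-1) for a > 1, and the mirror-image
# H_{1-X} = (b-1)/(a+b-1) for b > 1, vs. numerical 1/E[1/X].
from scipy.stats import beta
from scipy.integrate import quad

a, b = 3.0, 2.0  # arbitrary, with a > 1 and b > 1

e_inv_x, _ = quad(lambda x: beta.pdf(x, a, b) / x, 0, 1)
print((a - 1) / (a + b - 1), 1 / e_inv_x)       # ~0.5 both

e_inv_1mx, _ = quad(lambda x: beta.pdf(x, a, b) / (1 - x), 0, 1)
print((b - 1) / (a + b - 1), 1 / e_inv_1mx)     # ~0.25 both
```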
Measures of statistical dispersion

Variance

The
variance (the second moment centered on the mean) of a beta distributed
random variable X with parameters
α and
β is: \operatorname{var}(X) = \operatorname{E}\left[(X - \mu)^2\right] = \frac{\alpha \beta}{\left(\alpha + \beta\right)^2 \left(\alpha + \beta + 1\right)} Letting
α =
β in the above expression one obtains \operatorname{var}(X) = \frac{1}{4(2\beta + 1)}, showing that for
α =
β the variance decreases monotonically as α = β increases. Setting α = β = 0 in this expression, one finds the maximum variance var(X) = 1/4, which only occurs approaching the limit, at α = β = 0.
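A quick sketch (arbitrary parameters) of the variance formula and its symmetric special case:

```python
# Sketch: var(X) = a*b/((a+b)^2 (a+b+1)) against SciPy's built-in variance,
# and the monotone decrease of var for the symmetric case a = b.
from scipy.stats import beta

a, b = 2.0, 5.0  # arbitrary shape parameters
print(a * b / ((a + b) ** 2 * (a + b + 1)), beta.var(a, b))

# Symmetric case a = b: var = 1/(4(2b + 1)), decreasing as b grows
for ab in (0.5, 1.0, 2.0, 10.0):
    print(ab, beta.var(ab, ab), 1 / (4 * (2 * ab + 1)))
```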
Kurtosis

Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than to signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis, but
Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: \begin{align} \text{excess kurtosis} &=\text{kurtosis} - 3\\ &=\frac{\operatorname{E}[(X - \mu)^4]}{{(\operatorname{var}(X))^{2}}}-3\\ &=\frac{6[\alpha^3-\alpha^2(2\beta - 1) + \beta^2(\beta + 1) - 2\alpha\beta(\beta + 2)]}{\alpha \beta (\alpha + \beta + 2)(\alpha + \beta + 3)}\\ &=\frac{6[(\alpha - \beta)^2 (\alpha +\beta + 1) - \alpha \beta (\alpha + \beta + 2)]} {\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)} . \end{align} Letting
α =
β in the above expression one obtains \text{excess kurtosis} =- \frac{6}{3+2\alpha} \text{ if } \alpha = \beta. Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as {
α =
β} → 0, and approaching a maximum value of zero as {
α =
β} → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end
x = 0 and
x = 1, with nothing in between: a 2-point
Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of
kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions, including the beta distribution. The more that rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For
α ≠
β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for
α → 0 for finite
β, or for
β → 0 for finite
α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the
parametrization in terms of mean
μ and sample size
ν =
α +
β: \begin{align} \alpha & {} = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & {} = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. \end{align} one can express the excess kurtosis in terms of the mean
μ and the sample size
ν as follows: \text{excess kurtosis} =\frac{6}{3 + \nu}\bigg (\frac{(1 - 2 \mu)^2 (1 + \nu)}{\mu (1 - \mu) (2 + \nu)} - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance var, and the sample size
ν as follows: \text{excess kurtosis} =\frac{6}{(3 + \nu)(2 + \nu)}\left(\frac{1}{\text{var}} - 6 - 5 \nu \right)\text{ if } \text{var} < \mu(1-\mu), and, in terms of the variance
var and the mean
μ as follows: \text{excess kurtosis} =\frac{6 \,\text{var}\, (1 - \text{var} - 5 \mu (1 - \mu) )}{(\text{var} + \mu (1 - \mu))(2\,\text{var} + \mu (1 - \mu) )}\text{ if } \text{var} < \mu(1-\mu). The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (
μ = 1/2). This occurs for the symmetric case of
α =
β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution with equal probability 1/2 at each
Dirac delta function end
x = 0 and
x = 1 and zero probability everywhere else. (A coin toss: one face of the coin being
x = 0 and the other face being
x = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (
μ = 0 or
μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: \text{excess kurtosis} =\frac{6}{3 + \nu}\bigg(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\bigg)\text{ if } (\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2} (\text{skewness})^2 From this last expression, one can obtain the same limits published over a century ago by Karl Pearson: (\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2} (\text{skewness})^2 (see section "Kurtosis bounded by the square of the skewness" below).

Characteristic function

The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind): \begin{align} \varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\ &= \int_0^1 e^{itx} f(x;\alpha,\beta) \, dx \\ &={}_1F_1(\alpha; \alpha+\beta; it)\\ &=\sum_{n=0}^\infty \frac {\alpha^{\overline{n}} (it)^n} {(\alpha+\beta)^{\overline{n}} n!}\\ &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!} \end{align} where x^{\overline{n}}=x(x+1)(x+2)\cdots(x+n-1) is the
rising factorial. The value of the characteristic function for
t = 0, is one: \varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1. Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable
t: \operatorname{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \operatorname{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ] \operatorname{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \operatorname{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case
α =
β simplifies the characteristic function of the beta distribution to a
Bessel function, since in the special case
α +
β = 2
α the
confluent hypergeometric function (of the first kind) reduces to a
Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac 1 2} ) using
Kummer's second transformation as follows: \begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}} {}_0F_1 \left(; \alpha+\tfrac{1}{2}; \frac{(it)^2}{16} \right) \\ &= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac 1 2} \left(\frac{it}{2}\right).\end{align} In the accompanying plots, the
real part (Re) of the
characteristic function of the beta distribution is displayed for symmetric (
α =
β) and skewed (
α ≠
β) cases.
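As a sketch (arbitrary parameters and truncation length), the rising-factorial series for the characteristic function can be compared with direct numerical integration of E[e^{itX}]:

```python
# Sketch: characteristic function 1F1(a; a+b; it) via its series expansion
# vs. numerical integration of E[exp(itX)] for a beta density.
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

a, b, t = 2.0, 3.0, 1.5  # arbitrary shape parameters and argument

# Series: 1 + sum_k [prod_{r=0}^{k-1} (a+r)/(a+b+r)] (it)^k / k!
phi, term = 1.0 + 0j, 1.0 + 0j
for k in range(1, 60):
    term *= (a + k - 1) / (a + b + k - 1) * (1j * t) / k
    phi += term

re, _ = quad(lambda x: np.cos(t * x) * beta.pdf(x, a, b), 0, 1)
im, _ = quad(lambda x: np.sin(t * x) * beta.pdf(x, a, b), 0, 1)
print(phi, complex(re, im))   # agree to quadrature accuracy
```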
Other moments

Moment generating function

It also follows that the moment generating function is: \begin{align} M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\ &= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\ &= {}_1F_1(\alpha; \alpha+\beta; t) \\ &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!} \end{align}
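The coefficients of this series are the raw moments; a small sketch (arbitrary parameters) reads E[X^k] off the product formula and checks it against SciPy:

```python
# Sketch: raw moments E[X^k] = prod_{r=0}^{k-1} (a+r)/(a+b+r), read off the
# moment generating function's series, vs. scipy.stats.beta.moment.
from scipy.stats import beta

a, b = 2.0, 3.0  # arbitrary shape parameters
for k in range(1, 5):
    mk = 1.0
    for r in range(k):
        mk *= (a + r) / (a + b + r)
    print(k, mk, beta.moment(k, a, b))
```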
Moments of transformed random variables

Moments of logarithmically transformed random variables

One can also show the following expectations for logarithmically transformed random variables; these transformations usually turn various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: \begin{align} \operatorname{E}\left[\ln \frac{X}{1-X} \right] &= \psi(\alpha) - \psi(\beta)= \operatorname{E}[\ln X] +\operatorname{E} \left[\ln \frac{1}{1-X} \right],\\ \operatorname{E}\left [\ln \frac{1-X}{X} \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname{E} \left[\ln \frac{X}{1-X} \right] . \end{align} Johnson considered the distribution of the
logit-transformed variable ln(X/(1 − X)), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable
X to infinite support in both directions of the real line (−∞, +∞). The logit of a beta variate has the
logistic-beta distribution. Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order polygamma functions as follows: \begin{align} \operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname{E} \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end{align} Therefore the
variance of the logarithmic variables and
covariance of ln(
X) and ln(1−
X) are: \begin{align} \operatorname{cov}[\ln X, \ln(1-X)] &= \operatorname{E}\left[\ln X \ln(1-X)\right] - \operatorname{E}[\ln X]\operatorname{E}[\ln(1-X)] \\ &= -\psi_1(\alpha+\beta) \\ & \\ \operatorname{var}[\ln X] &= \operatorname{E}[\ln^2 X] - (\operatorname{E}[\ln X])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname{cov}[\ln X, \ln(1-X)] \\ & \\ \operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname{cov}[\ln X, \ln(1-X)] \end{align} where the
trigamma function, denoted
ψ1(
α), is the second of the
polygamma functions, and is defined as the derivative of the
digamma function: \psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d \psi(\alpha)}{d\alpha}. The variances and covariance of the logarithmically transformed variables
X and (1 −
X) are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables
X and (1 −
X), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: \begin{align} \operatorname{var}\left[\ln \frac{1}{X} \right] &=\operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname{var}\left[\ln \frac{1}{1-X} \right] &=\operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname{cov}\left[\ln \frac{1}{X} ,\, \ln \frac{1}{1-X} \right] &=\operatorname{cov}[\ln X, \ln(1-X)]= -\psi_1(\alpha + \beta).\end{align} It also follows that the variances of the
logit-transformed variables are \begin{align} \operatorname{var}\left[\ln \frac{X}{1-X} \right] &= \operatorname{var}\left[\ln \frac{1-X}{X} \right] \\ &= -\operatorname{cov}\left [\ln \frac{X}{1-X}, \, \ln \frac{1-X}{X} \right] \\[1ex] &= \psi_1(\alpha) + \psi_1(\beta). \end{align}
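A Monte Carlo sketch (arbitrary parameters, sample size and seed) of the trigamma expressions above:

```python
# Sketch: var[ln X], var[ln(1-X)], cov[ln X, ln(1-X)] and the logit variance
# estimated from samples vs. their trigamma (psi_1) closed forms.
import numpy as np
from scipy.special import polygamma

a, b = 2.0, 3.0
rng = np.random.default_rng(0)
x = rng.beta(a, b, size=1_000_000)
lx, l1x = np.log(x), np.log1p(-x)     # ln X and ln(1 - X)

def trigamma(z):
    return polygamma(1, z)

print(np.var(lx), trigamma(a) - trigamma(a + b))
print(np.var(l1x), trigamma(b) - trigamma(a + b))
print(np.cov(lx, l1x)[0, 1], -trigamma(a + b))
print(np.var(lx - l1x), trigamma(a) + trigamma(b))   # logit variance
```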
Quantities of information (entropy)

Given a beta distributed random variable,
X ~ Beta(
α,
β), the
differential entropy of
X is (measured in
nats), the expected value of the negative of the logarithm of the
probability density function: \begin{align} h(X) &= \operatorname{E}\left[-\ln f(X;\alpha,\beta)\right] \\[4pt] &= \int_0^1 -f(x;\alpha,\beta) \ln f(x;\alpha,\beta) \, dx \\[4pt] &= \ln \Beta(\alpha,\beta) - (\alpha-1)\psi(\alpha) - (\beta-1) \psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end{align} where
f(
x;
α,
β) is the
probability density function of the beta distribution: f(x;\alpha,\beta) = \frac{x^{\alpha-1} \left(1-x\right)^{\beta-1}}{\Beta(\alpha,\beta)} The
digamma function ψ appears in the formula for the differential entropy as a consequence of Euler's integral formula for the
harmonic numbers, which follows from the integral: \int_0^1 \frac {1-x^{\alpha-1}}{1-x} \, dx = \psi(\alpha)-\psi(1)
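A short sketch comparing the closed-form differential entropy with SciPy's built-in value (both in nats); the parameter pairs are arbitrary:

```python
# Sketch: h(X) = ln B(a,b) - (a-1) psi(a) - (b-1) psi(b) + (a+b-2) psi(a+b)
# vs. scipy.stats.beta.entropy, both in nats.
from scipy.special import betaln, digamma
from scipy.stats import beta

def beta_entropy(a, b):
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

for a, b in [(1.0, 1.0), (2.0, 3.0), (0.5, 0.5)]:
    print((a, b), beta_entropy(a, b), beta.entropy(a, b))
# (1, 1) gives 0, the maximum (the uniform case); the others are negative.
```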
The differential entropy of the beta distribution is negative for all values of
α and
β greater than zero, except at
α =
β = 1 (for which values the beta distribution is the same as the
uniform distribution), where the
differential entropy reaches its
maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For
α or
β approaching zero, the
differential entropy approaches its
minimum value of negative infinity. For (either or both)
α or
β approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both)
α or
β approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either
α or
β approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case),
α =
β, and they approach infinity simultaneously, the probability density becomes a spike (
Dirac delta function) concentrated at the middle
x = 1/2, and hence there is 100% probability at the middle
x = 1/2 and zero probability everywhere else. The (continuous case)
differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the
discrete entropy. It is known since then that the differential entropy may differ from the
infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables,
X1 ~ Beta(
α,
β) and
X2 ~ Beta(α′, β′), the
cross-entropy is (measured in nats): \begin{align} H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln f(x;\alpha',\beta') \,dx \\[4pt] &= \ln \Beta(\alpha',\beta') - (\alpha'-1)\psi(\alpha) - (\beta'-1)\psi(\beta) + \left(\alpha'+\beta'-2\right) \psi(\alpha+\beta). \end{align}
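A sketch (arbitrary parameter pairs) checking this closed form against numerical integration:

```python
# Sketch: cross-entropy H(X1, X2) between Beta(a, b) and Beta(a2, b2),
# closed form vs. numerical integration, in nats.
import numpy as np
from scipy.integrate import quad
from scipy.special import betaln, digamma
from scipy.stats import beta

a, b, a2, b2 = 2.0, 3.0, 4.0, 2.5  # arbitrary

h_closed = (betaln(a2, b2) - (a2 - 1) * digamma(a) - (b2 - 1) * digamma(b)
            + (a2 + b2 - 2) * digamma(a + b))
h_num, _ = quad(lambda x: -beta.pdf(x, a, b) * np.log(beta.pdf(x, a2, b2)), 0, 1)
print(h_closed, h_num)
```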
The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section "Parameter estimation, maximum likelihood").

Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean. Expressing the mode (only for
α,
β > 1), and the mean in terms of
α and
β: \frac{ \alpha - 1 }{ \alpha + \beta - 2 } \le \text{median} \le \frac{ \alpha }{ \alpha + \beta }. If 1 < β < α, the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of
x. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of
x, for the (
pathological) case of
α = 1 and
β = 1, for which values the beta distribution approaches the uniform distribution and the
differential entropy approaches its
maximum value, and hence maximum "disorder". For example, for
α = 1.0001 and
β = 1.00000001: • mode = 0.9999; PDF(mode) = 1.00010 • mean = 0.500025; PDF(mean) = 1.00003 • median = 0.500035; PDF(median) = 1.00003 • mean − mode = −0.499875 • mean − median = −9.65538 × 10−6 where PDF stands for the value of the
probability density function.
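The numbers in this example are easy to reproduce; a sketch:

```python
# Sketch: reproducing the near-uniform example above
# (alpha = 1.0001, beta = 1.00000001).
from scipy.stats import beta

a, b = 1.0001, 1.00000001
mode = (a - 1) / (a + b - 2)
mean, median = beta.mean(a, b), beta.median(a, b)
print(mode, beta.pdf(mode, a, b))       # ~0.9999,    PDF ~1.00010
print(mean, beta.pdf(mean, a, b))       # ~0.500025,  PDF ~1.00003
print(median, beta.pdf(median, a, b))   # ~0.500035,  PDF ~1.00003
print(mean - mode, mean - median)       # ~-0.499875, ~-9.66e-06
```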
Mean, geometric mean and harmonic mean relationship

For a beta distribution with α, β > 1, the harmonic mean, geometric mean and (arithmetic) mean are ordered as HX < GX < mean, an instance of the general harmonic-geometric-arithmetic mean inequality for non-degenerate positive random variables.

Kurtosis bounded by the square of the skewness

Karl Pearson, in a paper published in 1916, presented a graph with the
kurtosis as the vertical axis (
ordinate) and the square of the
skewness as the horizontal axis (
abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two
lines in the (skewness2,kurtosis)
plane, or the (skewness2,excess kurtosis)
plane: (\text{skewness})^2+1 < \text{kurtosis} < \tfrac{3}{2} (\text{skewness})^2 + 3 or, equivalently, (\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2} (\text{skewness})^2 At a time when there were no powerful digital computers,
Karl Pearson accurately computed further boundaries. Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the
gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/
k and the square of the skewness is 4/
k, hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the
chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the
chi-squared distribution the excess kurtosis is 12/
k and the square of the skewness is 8/
k, hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution
X ~ χ2(
k) is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both
α and
β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (
ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence
Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal U-shaped distributions for which the parameters
α and
β approach zero and hence all the probability density is concentrated at the ends:
x = 0, 1 with practically nothing in between them. Since for
α ≈
β ≈ 0 the probability density is concentrated at the two ends
x = 0 and
x = 1, this "impossible boundary" is determined by a
Bernoulli distribution, where the two only possible outcomes occur with respective probabilities
p and
q = 1 −
p. For cases approaching this limit boundary with symmetry
α =
β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are
p ≈
q ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha + \beta} at the left end
x = 0 and q = 1-p = \tfrac{\alpha}{\alpha + \beta} at the right end
x = 1.
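A sketch (arbitrary parameter pairs, including the two near-boundary examples above) checking Pearson's bounds numerically:

```python
# Sketch: verifying skewness^2 - 2 < excess kurtosis < (3/2) skewness^2
# for several beta parameter pairs, using SciPy's moments.
from scipy.stats import beta

for a, b in [(0.1, 1000.0), (0.0001, 0.1), (2.0, 5.0)]:
    _, _, skew, ex_kurt = (float(v) for v in beta.stats(a, b, moments='mvsk'))
    sk2 = skew ** 2
    print((a, b), sk2 - 2 < ex_kurt < 1.5 * sk2,
          ex_kurt / sk2,          # approaches 3/2 near the gamma line
          (ex_kurt + 2) / sk2)    # approaches 1 near the impossible region
```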
Symmetry

All statements are conditional on
α,
β > 0: •
Probability density function reflection symmetry f(x;\alpha,\beta) = f(1-x;\beta,\alpha) •
Cumulative distribution function reflection symmetry plus unitary
translation F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha) •
Mode reflection symmetry plus unitary
translation \operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1) •
Median reflection symmetry plus unitary
translation \operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha)) •
Mean reflection symmetry plus unitary
translation \mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) •
Geometric means each is individually asymmetric, the following symmetry applies between the geometric mean based on
X and the geometric mean based on its
reflection 1−
X G_X (\Beta(\alpha, \beta) ) = G_{1-X}(\Beta(\beta, \alpha) ) •
Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on
X and the harmonic mean based on its
reflection 1−
X H_X (\Beta(\alpha, \beta) ) = H_{1-X}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1. •
Variance symmetry \operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) ) •
Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection 1−
X \ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha))) •
Geometric covariance symmetry \ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha)) •
Mean absolute deviation around the mean symmetry \operatorname{E}[|X - E[X]| ] (\Beta(\alpha, \beta))=\operatorname{E}[| X - E[X]|] (\Beta(\beta, \alpha)) •
Skewness skew-symmetry \operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{ skewness} (\Beta(\beta, \alpha) ) •
Excess kurtosis symmetry \text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) ) •
Characteristic function symmetry of
Real part (with respect to the origin of variable "
t") \text{Re} [{}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)] •
Characteristic function skew-symmetry of
Imaginary part (with respect to the origin of variable "
t") \text{Im} [{}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ] •
Characteristic function symmetry of
Absolute value (with respect to the origin of variable "
t") \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ] •
Differential entropy symmetry h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) •
Relative entropy (also called Kullback–Leibler divergence) symmetry D_{\mathrm{KL}}(X_1\parallel X_2) = D_{\mathrm{KL}}(X_2\parallel X_1), \text{ if }h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta •
Fisher information matrix symmetry {\mathcal{I}}_{i, j} = {\mathcal{I}}_{j, i}
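Several of these symmetries are easy to spot-check numerically; a sketch with one arbitrary parameter pair:

```python
# Sketch: spot-checking reflection symmetries of the beta distribution.
import numpy as np
from scipy.stats import beta

a, b, x = 2.0, 5.0, 0.3  # arbitrary
print(np.isclose(beta.pdf(x, a, b), beta.pdf(1 - x, b, a)))      # PDF
print(np.isclose(beta.cdf(x, a, b), 1 - beta.cdf(1 - x, b, a)))  # CDF
print(np.isclose(beta.median(a, b), 1 - beta.median(b, a)))      # median
print(np.isclose(beta.mean(a, b), 1 - beta.mean(b, a)))          # mean
print(np.isclose(beta.var(a, b), beta.var(b, a)))                # variance
s_ab = float(beta.stats(a, b, moments='s'))
s_ba = float(beta.stats(b, a, moments='s'))
print(np.isclose(s_ab, -s_ba))                                   # skewness
```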
Geometry of the probability density function

Inflection points

For certain values of the shape parameters α and β, the
probability density function has
inflection points, at which the
curvature changes sign. The position of these inflection points can be useful as a measure of the
dispersion or spread of the distribution. Defining the following quantity: \kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2} points of inflection occur, depending on the value of the shape parameters α and β; in the bell-shaped case (α, β > 2) they lie symmetrically about the mode at x = \text{mode} \pm \kappa (a numerical sketch follows the symmetric shape cases below).

Shapes

Symmetric (α = β)

•
α =
β → 0 is a 2-point
Bernoulli distribution with equal probability 1/2 at each
Dirac delta function end
x = 0 and
x = 1 and zero probability everywhere else. A coin toss: one face of the coin being
x = 0 and the other face being
x = 1. • \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4} • \lim_{\alpha = \beta \to 0} \operatorname{excess \ kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach) • The
differential entropy approaches a
minimum value of −∞

• α = β = 1
• the uniform [0, 1] distribution • no mode • var(
X) = 1/12 • excess kurtosis(
X) = −6/5 • The (negative anywhere else)
differential entropy reaches its
maximum value of zero • CF = Sinc (t)

• α = β > 1
• symmetric unimodal • mode = 1/2 • 0 < var(X) < 1/12 • −6/5 < excess kurtosis(X) < 0 • α = β > 2 is bell-shaped, with inflection points located to either side of the mode • \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0 • \lim_{\alpha = \beta \to \infty} \operatorname{excess \ kurtosis}(X) = 0 • The
differential entropy approaches a
minimum value of −∞
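As a sketch for the bell-shaped case (arbitrary α, β > 2), the inflection points x = mode ± κ defined above can be recovered numerically as sign changes of the PDF's second derivative:

```python
# Sketch: inflection points of a bell-shaped beta PDF (a, b > 2),
# closed form mode +/- kappa vs. numerical sign changes of f''.
import numpy as np
from scipy.stats import beta

a, b = 4.0, 6.0  # arbitrary, both > 2
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

xs = np.linspace(1e-4, 1 - 1e-4, 200001)
d2 = np.gradient(np.gradient(beta.pdf(xs, a, b), xs), xs)
crossings = xs[np.where(np.diff(np.sign(d2)))[0]]
print(mode - kappa, mode + kappa)   # closed-form inflection points
print(crossings)                    # numerical estimates, should match
```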
Skewed (α ≠ β)

The density function is
skewed. An interchange of parameter values yields the
mirror image (the reverse) of the initial curve; some more specific cases:

• α < 1, β < 1
• U-shaped • Positive skew for α < β, negative skew for α > β. • bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha + \beta-2} • 0 < median < 1 • 0 < var(X) < 1/4

• α > 1, β > 1
• unimodal (magenta & cyan plots), • Positive skew for α < β, negative skew for α > β. • \text{mode}= \tfrac{\alpha-1}{\alpha + \beta-2} • 0 < median < 1 • 0 < var(X) < 1/12

• α < 1, β ≥ 1
• reverse J-shaped with a right tail, • positively skewed, • strictly decreasing, convex • mode = 0 • 0 < median < 1/2 • 0 < var(X) < \tfrac{-11+5\sqrt{5}}{2} (maximum variance occurs for \alpha=\tfrac{-1+\sqrt{5}}{2}, \beta=1, or
α =
Φ the
golden ratio conjugate)

• α ≥ 1, β < 1
• J-shaped with a left tail, • negatively skewed, • strictly increasing, • mode = 1 • 1/2 < median < 1 • 0 < var(X) < \tfrac{-11+5\sqrt{5}}{2} (maximum variance occurs for \alpha=1, \beta=\tfrac{-1+\sqrt{5}}{2}, or
β =
Φ the
golden ratio conjugate)

• α = 1, β > 1
• positively skewed, • strictly decreasing (red plot), • a reversed (mirror-image)
power function distribution • mean = 1 / (
β + 1) • median = 1 − 2^{-1/\beta} • mode = 0

• α = 1, 1 < β < 2
• concave • 1-\tfrac{1}{\sqrt{2}} < \text{median} < \tfrac{1}{2} • 1/18 < var(X) < 1/12

• α = 1, β = 2
• a straight line with slope −2: the right-triangular distribution with its right angle at the left end, at x = 0 • \text{median}=1-\tfrac {1}{\sqrt{2}} • var(
X) = 1/18

• α = 1, β > 2
• reverse J-shaped with a right tail, •
convex • 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}} • 0 < var(X) < 1/18

• α > 1, β = 1
• negatively skewed, • strictly increasing (green plot), • the power function distribution • mean = α / (α + 1) • median = 2^{-1/\alpha} • mode = 1

• 2 > α > 1, β = 1
•
concave • \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}} • 1/18 < var(X) < 1/12

• α = 2, β = 1
• a straight line with slope +2: the right-triangular distribution with its right angle at the right end, at x = 1 • \text{median}=\tfrac {1}{\sqrt{2}} • var(
X) = 1/18

• α > 2, β = 1
• J-shaped with a left tail,
convex • \tfrac{1}{\sqrt{2}} < \text{median} < 1 • 0 < var(X) < 1/18

Related distributions