
Beta distribution

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or (0, 1) in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.

Definitions
Probability density function

The probability density function (PDF) of the beta distribution, for 0 ≤ x ≤ 1 or 0 < x < 1, and shape parameters \alpha , \beta > 0 , is a power function of the variable x and of its reflection (1-x) as follows: \begin{align} f(x;\alpha,\beta) & = \mathrm{constant}\cdot x^{\alpha-1}(1-x)^{\beta-1} \\[3pt] & = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\displaystyle \int_0^1 u^{\alpha-1} (1-u)^{\beta-1}\, du} \\[6pt] & = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \\[6pt] & = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1} \end{align} where \Gamma(z) is the gamma function. The beta function, \Beta, is a normalization constant to ensure that the total probability is 1. In the above equations x is a realization—an observed value that actually occurred—of a random variable X . A random variable X that is beta-distributed with shape parameters α and β is denoted by X \sim \operatorname{Beta}(\alpha, \beta); several authors, including N. L. Johnson and S. Kotz, use the alternative notation X \sim \beta_{\alpha, \beta}.

Cumulative distribution function

The cumulative distribution function is F(x;\alpha,\beta) = \frac{\Beta{}(x;\alpha,\beta)}{\Beta{}(\alpha,\beta)} = I_x(\alpha,\beta) where \Beta(x;\alpha,\beta) is the incomplete beta function and I_x(\alpha,\beta) is the regularized incomplete beta function. For positive integers α and β, the cumulative distribution function of a beta distribution can be expressed in terms of the cumulative distribution function of a binomial distribution: F_{\text{beta}}(x;\alpha,\beta) = F_{\text{binomial}}(\beta-1;\alpha+\beta-1,1-x).

Alternative parameterizations

Two parameters

Mean and sample size

The beta distribution may also be reparameterized in terms of its mean μ (0 < μ < 1) and the sum of the two shape parameters ν = α + β > 0. Denoting by αPosterior and βPosterior the shape parameters of the posterior beta distribution resulting from applying Bayes' theorem to a binomial likelihood function and a prior probability, the interpretation of the sum of both shape parameters as a sample size, ν = αPosterior + βPosterior, is only correct for the Haldane prior probability Beta(0,0). Specifically, for the Bayes (uniform) prior Beta(1,1) the correct interpretation would be sample size = αPosterior + βPosterior − 2, or ν = (sample size) + 2. For sample size much larger than 2, the difference between these two priors becomes negligible. (See the section on Bayesian inference for further details.) ν = α + β is referred to as the "sample size" of a beta distribution, but one should remember that it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0,0) prior in Bayes' theorem. This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ θ ≤ 1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters α and β via α = μν and β = (1 − μ)ν.

Mode and concentration

The mode ω = (α − 1)/(α + β − 2) and the "concentration" κ = α + β can also be used as parameters; the shape parameters are then \begin{align} \alpha &= \omega (\kappa - 2) + 1 \\ \beta &= (1 - \omega)(\kappa - 2) + 1 \end{align} For the mode, 0 < ω < 1, to be well-defined, we need \alpha,\beta>1, or equivalently \kappa>2. If instead we define the concentration as c=\alpha+\beta-2, the condition simplifies to c>0, and the beta density at \alpha=1+c\omega and \beta=1+c(1-\omega) can be written as: f(x;\omega,c) = \frac{x^{c\omega}(1-x)^{c(1-\omega)}}{\Beta\bigl(1+c\omega,1+c(1-\omega)\bigr)} where c directly scales the sufficient statistics, \log(x) and \log(1-x). Note also that in the limit, c\to0, the distribution becomes flat.
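A minimal numerical sketch (assuming NumPy and SciPy are available; the parameter values are illustrative) of the PDF and CDF formulas given in the Definitions above, evaluating the gamma-function form of the density and checking the binomial identity for the CDF with integer shape parameters:

```python
import numpy as np
from scipy.stats import beta, binom
from scipy.special import gammaln

a, b, x = 3, 5, 0.4   # illustrative integer shape parameters and a point in (0, 1)

# PDF from Gamma(a+b)/(Gamma(a)Gamma(b)) * x^(a-1) * (1-x)^(b-1), computed in log space
log_pdf = (gammaln(a + b) - gammaln(a) - gammaln(b)
           + (a - 1) * np.log(x) + (b - 1) * np.log(1 - x))
print(np.exp(log_pdf), beta.pdf(x, a, b))                    # both give the same density

# CDF = regularized incomplete beta I_x(a, b); for integer a, b it matches a binomial CDF
print(beta.cdf(x, a, b), binom.cdf(b - 1, a + b - 1, 1 - x))  # both give the same value
```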
Mean and variance

Solving the system of (coupled) equations given in the above sections as the equations for the mean and the variance of the beta distribution in terms of the original parameters α and β, one can express the α and β parameters in terms of the mean (μ) and the variance (var): \begin{align} \nu &= \alpha + \beta = \frac{\mu(1-\mu)}{\mathrm{var}}-1, \text{ where }\nu =(\alpha + \beta) >0, \text{ therefore } \mathrm{var} < \mu(1-\mu) \\ \alpha &= \mu \nu = \mu \left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1-\mu) \\ \beta &= (1-\mu)\nu = (1-\mu)\left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1-\mu) \end{align} This parametrization of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters α and β. For example, one can express the mode, skewness, excess kurtosis and differential entropy in terms of the mean and the variance.
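A minimal sketch (assuming NumPy/SciPy; the function name and parameter values are illustrative) of recovering the shape parameters from a given mean and variance, which is valid only when var < μ(1 − μ):

```python
import numpy as np
from scipy.stats import beta

def shape_from_mean_var(mu, var):
    # alpha = mu*nu, beta = (1-mu)*nu, with nu = mu*(1-mu)/var - 1
    if not (0 < mu < 1 and 0 < var < mu * (1 - mu)):
        raise ValueError("need 0 < mu < 1 and 0 < var < mu*(1 - mu)")
    nu = mu * (1 - mu) / var - 1
    return mu * nu, (1 - mu) * nu

a, b = shape_from_mean_var(0.3, 0.01)
m, v = beta.stats(a, b, moments="mv")   # round-trip check
print(a, b, m, v)                       # mean ~ 0.3, variance ~ 0.01
```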
Four parameters

A beta distribution with the two shape parameters α and β is supported on the range [0,1] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, a, and maximum c (c > a), values of the distribution, by a linear transformation substituting the non-dimensional variable x in terms of the new variable y (with support [a,c] or (a,c)) and the parameters a and c: y = x(c-a) + a, \text{ therefore } x = \frac{y-a}{c-a}. The probability density function of the four parameter beta distribution is equal to the two parameter distribution, scaled by the range (c − a) (so that the total area under the density curve equals a probability of one), and with the "y" variable shifted and scaled as follows: \begin{align} f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} &= \frac{\left(\frac{y-a}{c-a}\right)^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)B(\alpha, \beta)} \\[1ex] &= \frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}B(\alpha, \beta)}. \end{align} That a random variable Y is beta-distributed with four parameters α, β, a, and c will be denoted by: Y \sim \operatorname{Beta}(\alpha, \beta, a, c). Some measures of central location are scaled (by (c − a)) and shifted (by a), as follows: \begin{align} \mu_Y &= \mu_X(c-a) + a \\[1ex] & = \frac{\alpha}{\alpha+\beta} \left(c-a\right) + a = \frac{\alpha c+ \beta a}{\alpha+\beta} \end{align} \begin{align} \text{mode}(Y) &=\text{mode}(X)(c-a) + a \\[1ex] & = \frac{\alpha - 1}{\alpha+\beta - 2} \left(c-a\right) + a \\[1ex] & = \frac{(\alpha-1) c+(\beta-1) a}{\alpha+\beta-2}\ , & \text{ if } \alpha,\, \beta>1 \end{align} \begin{align} \text{median}(Y) &= \text{median}(X)(c-a) + a \\[1ex] & = I_{\frac{1}{2}}^{[-1]}(\alpha,\beta) \left(c-a\right)+a \end{align} Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.

The shape parameters of Y can be written in terms of its mean and variance as \begin{align} \alpha &= \frac{\left(a - \mu_Y\right) \left(a \, c - a \, \mu_Y - c \, \mu_Y + \mu_Y^2 + \sigma_Y^2\right)}{\sigma_Y^2(c-a)} \\ \beta &= -\frac{\left(c - \mu_Y\right) \left(a \, c - a \, \mu_Y - c \, \mu_Y + \mu_Y^2 + \sigma_Y^2\right)}{\sigma_Y^2(c-a)} \end{align} The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (c − a), linearly for the mean deviation and nonlinearly for the variance: \begin{align} &\text{(mean deviation around mean)}(Y) \\[1ex] &= (\text{(mean deviation around mean)}(X))(c-a) \\ &= \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}}(c-a) \end{align} \text{var}(Y) = \text{var}(X)(c-a)^2 =\frac{\alpha\beta (c-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}. Since the skewness and excess kurtosis are non-dimensional quantities (as moments centered on the mean and normalized by the standard deviation), they are independent of the parameters a and c, and therefore equal to the expressions given above in terms of X (with support [0,1] or (0,1)): \text{skewness}(Y) =\text{skewness}(X) = \frac{2 (\beta - \alpha) \sqrt{\alpha + \beta + 1} }{(\alpha + \beta + 2) \sqrt{\alpha \beta}}. \text{kurtosis excess}(Y) =\text{kurtosis excess}(X) = \frac{6\left[(\alpha - \beta)^2 (\alpha +\beta + 1) - \alpha \beta (\alpha + \beta + 2)\right]} {\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)}
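A minimal sketch (assuming SciPy; the numerical values are illustrative) of the four-parameter beta on [a, c], which SciPy expresses through the loc and scale arguments of the two-parameter distribution:

```python
from scipy.stats import beta

alpha, b_shape, a, c = 2.0, 5.0, 10.0, 30.0
Y = beta(alpha, b_shape, loc=a, scale=c - a)   # support [a, c] = [10, 30]

print(Y.mean())   # (alpha*c + beta*a)/(alpha+beta) = (2*30 + 5*10)/7 ~ 15.71
print(Y.var())    # alpha*beta*(c-a)^2 / ((alpha+beta)^2 (alpha+beta+1)) ~ 10.20
```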
Properties
Measures of central tendency

Mode

The mode of a beta distributed random variable X with α, β > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression: \frac{\alpha - 1} {\alpha + \beta - 2} . When both parameters are less than one (α, β < 1), this expression instead gives the anti-mode: the lowest point of the probability density curve. Letting α = β, the expression simplifies to 1/2, showing that for α = β > 1 the mode (resp. anti-mode when α, β < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of α and β. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of α = 2, β = 1 (or α = 1, β = 2), the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end, where the value of the density function approaches infinity. For example, in the case α = β = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends (x = 0 and x = 1) can be called modes or not:
• Whether the ends are part of the domain of the density function
• Whether a singularity can ever be called a mode
• Whether cases with two maxima should be called bimodal

Median

The median of the beta distribution is the unique real number x = I_{1/2}^{[-1]}(\alpha,\beta) for which the regularized incomplete beta function I_x(\alpha,\beta) = \tfrac{1}{2} . There is no general closed-form expression for the median of the beta distribution for arbitrary values of α and β. Closed-form expressions for particular values of the parameters α and β follow:
• For symmetric cases α = β, median = 1/2.
• For α = 1 and β > 0, median = 1-2^{-1/\beta} (this case is the mirror-image of the power function distribution)
• For α > 0 and β = 1, median = 2^{-1/\alpha} (this case is the power function distribution)

For short-tailed distributions, one author remarks (p. 207) that "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(α, β) such that α, β < 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs for example for random walks, since the probability for the time of the last visit to the origin in a random walk is distributed as the arcsine distribution Beta(1/2, 1/2): the mean of a number of realizations of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).
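A minimal sketch (assuming NumPy/SciPy; parameter values are illustrative) computing the mode from the closed form above (valid for α, β > 1) and the median as the inverse of the regularized incomplete beta function:

```python
from scipy.stats import beta
from scipy.special import betaincinv

a, b = 2.0, 5.0
mode = (a - 1) / (a + b - 2)               # closed form, requires a, b > 1
median = betaincinv(a, b, 0.5)             # solves I_x(a, b) = 1/2
print(mode, median, beta.ppf(0.5, a, b))   # ppf(0.5) gives the same median
```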
Geometric mean

The logarithm of the geometric mean GX of a distribution with random variable X is the arithmetic mean of ln(X), or, equivalently, its expected value: \ln G_X = \operatorname{E}[\ln X] For a beta distribution, the expected value integral gives: \begin{align} \operatorname{E}[\ln X] &= \int_0^1 \ln x\, f(x;\alpha,\beta)\,dx \\[4pt] &= \int_0^1 \ln x \,\frac{ x^{\alpha-1}(1-x)^{\beta-1}}{\Beta(\alpha,\beta)}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \, \int_0^1 \frac{\partial x^{\alpha-1}(1-x)^{\beta-1}}{\partial \alpha}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial}{\partial \alpha} \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx \\[4pt] &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial \Beta(\alpha,\beta)}{\partial \alpha} \\[4pt] &= \frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} \\[4pt] &= \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha} - \frac{\partial \ln \Gamma(\alpha + \beta)}{\partial \alpha} \\[4pt] &= \psi(\alpha) - \psi(\alpha + \beta) \end{align} where ψ is the digamma function. Therefore, the geometric mean of a beta distribution with shape parameters α and β is the exponential of the digamma functions of α and β as follows: G_X = e^{\operatorname{E}[\ln X]}= e^{\psi(\alpha) - \psi(\alpha + \beta)} While for a beta distribution with equal shape parameters α = β it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: 0 < G_X < 1/2. The reason for this is that the logarithmic transformation strongly weights the values of X close to zero, as ln(X) strongly tends towards negative infinity as X approaches zero, while ln(X) flattens towards zero as X → 1. Along a line α = β, the following limits apply: \begin{align} &\lim_{\alpha = \beta \to 0} G_X = 0 \\ &\lim_{\alpha = \beta \to \infty} G_X =\tfrac{1}{2} \end{align} Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} \lim_{\beta \to 0} G_X = \lim_{\alpha \to \infty} G_X = 1\\ \lim_{\alpha\to 0} G_X = \lim_{\beta \to \infty} G_X = 0 \end{align} The accompanying plot shows the difference between the mean and the geometric mean for shape parameters α and β from zero to 2. Besides the fact that the difference between them approaches zero as α and β approach infinity and that the difference becomes large for values of α and β approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters α and β. The difference between the geometric mean and the mean is larger for small values of α in relation to β than when exchanging the magnitudes of β and α. N. L. Johnson and S. Kotz suggest, for α, β > 1, the logarithmic approximation to the digamma function ψ(α) ≈ ln(α − 1/2), which yields the approximation G_X \approx \frac{\alpha - \frac{1}{2}}{\alpha+\beta-\frac{1}{2}}\text{ if } \alpha, \beta > 1. This is relevant because the beta distribution is a suitable model for the random behavior of percentages and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation, see section "Parameter estimation, maximum likelihood."
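A minimal sketch (assuming NumPy/SciPy; parameter values and sample size are illustrative) comparing the closed-form geometric mean G_X = exp(ψ(α) − ψ(α + β)) with a Monte Carlo estimate and with the arithmetic mean:

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import beta

a, b = 2.0, 3.0
G_X = np.exp(digamma(a) - digamma(a + b))

samples = beta.rvs(a, b, size=200_000, random_state=0)
print(G_X, np.exp(np.mean(np.log(samples))))   # formula vs. empirical geometric mean
print(a / (a + b))                             # arithmetic mean, larger than G_X
```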
When performing maximum likelihood estimation, besides the geometric mean GX based on the random variable X, another geometric mean appears naturally: the geometric mean based on the linear transformation (1 − X), the mirror-image of X, denoted by G(1−X): G_{1-X} = e^{\operatorname{E}[\ln(1-X)] } = e^{\psi(\beta) - \psi(\alpha + \beta)} Along a line α = β, the following limits apply: \begin{align} &\lim_{\alpha = \beta \to 0} G_{1-X} =0 \\ &\lim_{\alpha = \beta \to \infty} G_{1-X} =\tfrac{1}{2} \end{align} Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} \lim_{\beta \to 0} G_{(1-X)} = \lim_{\alpha \to \infty} G_{(1-X)} = 0\\ \lim_{\alpha\to 0} G_{(1-X)} = \lim_{\beta \to \infty} G_{(1-X)} = 1 \end{align} It has the following approximate value: G_{(1-X)} \approx \frac{\beta - \frac{1}{2}}{\alpha+\beta-\frac{1}{2}}\text{ if } \alpha, \beta > 1. Although both GX and G1−X are asymmetric, in the case that both shape parameters are equal (α = β) the geometric means are equal: GX = G(1−X). This equality follows from the following symmetry displayed between both geometric means: G_X (\Beta(\alpha, \beta) ) = G_{1-X}(\Beta(\beta, \alpha) ).

Harmonic mean

The inverse of the harmonic mean (HX) of a distribution with random variable X is the arithmetic mean of 1/X, or, equivalently, its expected value. Therefore, the harmonic mean (HX) of a beta distribution with shape parameters α and β is: \begin{align} H_X &= \frac{1}{\operatorname{E}\left[\frac{1}{X}\right]} \\ &=\frac{1}{\int_0^1 \frac{f(x;\alpha,\beta)}{x}\,dx} \\ &=\frac{1}{\int_0^1 \frac{x^{\alpha-1}(1-x)^{\beta-1}}{x \Beta(\alpha,\beta)}\,dx} \\ &= \frac{\alpha - 1}{\alpha + \beta - 1}\text{ if } \alpha > 1 \text{ and } \beta > 0 \\ \end{align} The harmonic mean (HX) of a beta distribution with α = β is H_X = \frac{\alpha-1}{2\alpha-1}, showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞. Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} &\lim_{\alpha\to 0} H_X \text{ is undefined} \\ &\lim_{\alpha\to 1} H_X = \lim_{\beta \to \infty} H_X = 0 \\ &\lim_{\beta \to 0} H_X = \lim_{\alpha \to \infty} H_X = 1 \end{align} The harmonic mean plays a role in maximum likelihood estimation for the four parameter case, in addition to the geometric mean. Actually, when performing maximum likelihood estimation for the four parameter case, besides the harmonic mean HX based on the random variable X, another harmonic mean appears naturally: the harmonic mean based on the linear transformation (1 − X), the mirror-image of X, denoted by H1 − X: H_{1-X} = \frac{1}{\operatorname{E} \left[\frac 1 {1-X}\right]} = \frac{\beta - 1}{\alpha + \beta-1} \text{ if } \beta > 1, \text{ and } \alpha> 0. The harmonic mean (H(1 − X)) of a beta distribution with α = β is H_{(1-X)} = \frac{\beta-1}{2\beta-1}, showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞. Following are the limits with one parameter finite (non-zero) and the other approaching these limits: \begin{align} &\lim_{\beta\to 0} H_{1-X} \text{ is undefined} \\ &\lim_{\beta\to 1} H_{1-X} = \lim_{\alpha\to \infty} H_{1-X} = 0 \\ &\lim_{\alpha\to 0} H_{1-X} = \lim_{\beta\to \infty} H_{1-X} = 1 \end{align} Although both HX and H1−X are asymmetric, in the case that both shape parameters are equal (α = β) the harmonic means are equal: HX = H1−X. This equality follows from the following symmetry displayed between both harmonic means: H_X (\Beta(\alpha, \beta) )=H_{1-X}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta> 1.
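A minimal sketch (assuming NumPy/SciPy; parameter values are illustrative) of the two harmonic means from their closed forms, checked against Monte Carlo estimates:

```python
import numpy as np
from scipy.stats import beta

a, b = 3.0, 4.0
H_X   = (a - 1) / (a + b - 1)    # harmonic mean of X,     requires alpha > 1
H_1mX = (b - 1) / (a + b - 1)    # harmonic mean of 1 - X, requires beta  > 1

samples = beta.rvs(a, b, size=200_000, random_state=0)
print(H_X,   1 / np.mean(1 / samples))          # formula vs. empirical value
print(H_1mX, 1 / np.mean(1 / (1 - samples)))
```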
Measures of statistical dispersion

Variance

The variance (the second moment centered on the mean) of a beta distribution random variable X with parameters α and β is: \operatorname{var}(X) = \operatorname{E}\left[(X - \mu)^2\right] = \frac{\alpha \beta}{\left(\alpha + \beta\right)^2 \left(\alpha + \beta + 1\right)} Letting α = β in the above expression one obtains \operatorname{var}(X) = \frac{1}{4(2\beta + 1)}, showing that for α = β the variance decreases monotonically as α = β increases. Setting α = β = 0 in this expression, one finds the maximum variance var(X) = 1/4, which is only reached in the limit as α = β → 0.

Kurtosis

Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the excess kurtosis, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: \begin{align} \text{excess kurtosis} &=\text{kurtosis} - 3\\ &=\frac{\operatorname{E}[(X - \mu)^4]}{{(\operatorname{var}(X))^{2}}}-3\\ &=\frac{6[\alpha^3-\alpha^2(2\beta - 1) + \beta^2(\beta + 1) - 2\alpha\beta(\beta + 2)]}{\alpha \beta (\alpha + \beta + 2)(\alpha + \beta + 3)}\\ &=\frac{6[(\alpha - \beta)^2 (\alpha +\beta + 1) - \alpha \beta (\alpha + \beta + 2)]} {\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)} . \end{align} Letting α = β in the above expression one obtains \text{excess kurtosis} =- \frac{6}{3+2\alpha} \text{ if } \alpha = \beta. Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as {α = β} → 0, and approaching a maximum value of zero as {α = β} → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end x = 0 and x = 1, with nothing in between: a 2-point Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. The more that rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values.
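A minimal sketch (assuming NumPy/SciPy; parameter values are illustrative) that checks the closed-form variance and excess kurtosis above against SciPy, which reports excess (Fisher) kurtosis directly:

```python
from scipy.stats import beta

a, b = 2.0, 6.0
var = a * b / ((a + b) ** 2 * (a + b + 1))
exk = (6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
       / (a * b * (a + b + 2) * (a + b + 3)))

m, v, s, k = beta.stats(a, b, moments="mvsk")
print(var, v)    # should agree
print(exk, k)    # should agree
```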
Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: \begin{align} \alpha & {} = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & {} = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. \end{align} one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: \text{excess kurtosis} =\frac{6}{3 + \nu}\bigg (\frac{(1 - 2 \mu)^2 (1 + \nu)}{\mu (1 - \mu) (2 + \nu)} - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance var, and the sample size ν as follows: \text{excess kurtosis} =\frac{6}{(3 + \nu)(2 + \nu)}\left(\frac{1}{\text{var}} - 6 - 5 \nu \right)\text{ if }\text{var} < \mu(1-\mu) and, in terms of the variance var and the mean μ as follows: \text{excess kurtosis} =\frac{6 \text{ var } (1 - \text{ var } - 5 \mu (1 - \mu) )}{(\text{var } + \mu (1 - \mu))(2\text{ var } + \mu (1 - \mu) )}\text{ if }\text{var} < \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case α = β → 0, with zero skewness. At the limit, this is the 2 point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end x = 0 and x = 1 and zero probability everywhere else. (A coin toss: one face of the coin being x = 0 and the other face being x = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: \text{excess kurtosis} =\frac{6}{3 + \nu}\bigg(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\bigg)\text{ if } (\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2} (\text{skewness})^2 From this last expression, one can obtain the same limits published over a century ago by Karl Pearson for the beta distribution (see the section "Kurtosis bounded by the square of the skewness" below): (\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2} (\text{skewness})^2

Characteristic function

The characteristic function is the Fourier transform of the probability density function; for the beta distribution it is Kummer's confluent hypergeometric function (of the first kind): \begin{align} \varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\ &= \int_0^1 e^{itx} f(x;\alpha,\beta) \, dx \\ &={}_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_{n=0}^\infty \frac {\alpha^\overline{n} (it)^n} {(\alpha+\beta)^\overline{n} n!}\\ &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!} \end{align} where x^\overline{n}=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial. The value of the characteristic function for t = 0 is one: \varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1.
Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable t: \operatorname{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \operatorname{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ] \operatorname{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \operatorname{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac 1 2} ) using Kummer's second transformation as follows: \begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}} {}_0F_1 \left(; \alpha+\tfrac{1}{2}; \frac{(it)^2}{16} \right) \\ &= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac 1 2} \left(\frac{it}{2}\right).\end{align} In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.

Other moments

Moment generating function

It also follows that the moment generating function is M_X(\alpha; \beta; t) = \operatorname{E}\left[e^{tX}\right] = {}_1F_1(\alpha; \alpha+\beta; t).

Moments of logarithmically transformed random variables

One can also show the following expectations for logarithmically and logit-transformed random variables; such transformations usually turn various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: \begin{align} \operatorname{E}\left[\ln \frac{X}{1-X} \right] &= \psi(\alpha) - \psi(\beta)= \operatorname{E}[\ln X] +\operatorname{E} \left[\ln \frac{1}{1-X} \right],\\ \operatorname{E}\left [\ln \frac{1-X}{X} \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname{E} \left[\ln \frac{X}{1-X} \right] . \end{align} Johnson considered the distribution of the logit-transformed variable ln(X/(1 − X)), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable X to infinite support in both directions of the real line (−∞, +∞). The logit of a beta variate has the logistic-beta distribution. Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: \begin{align} \operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname{E} \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). 
\end{align} therefore the variance of the logarithmic variables and covariance of ln(X) and ln(1−X) are: \begin{align} \operatorname{cov}[\ln X, \ln(1-X)] &= \operatorname{E}\left[\ln X \ln(1-X)\right] - \operatorname{E}[\ln X]\operatorname{E}[\ln(1-X)] \\ &= -\psi_1(\alpha+\beta) \\ & \\ \operatorname{var}[\ln X] &= \operatorname{E}[\ln^2 X] - (\operatorname{E}[\ln X])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname{cov}[\ln X, \ln(1-X)] \\ & \\ \operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname{cov}[\ln X, \ln(1-X)] \end{align} where the trigamma function, denoted ψ1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function: \psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d \psi(\alpha)}{d\alpha}. The variances and covariance of the logarithmically transformed variables X and (1 − X) are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables X and (1 − X), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: \begin{align} \operatorname{var}\left[\ln \frac{1}{X} \right] &=\operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname{var}\left[\ln \frac{1}{1-X} \right] &=\operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname{cov}\left[\ln \frac{1}{X} ,\, \ln \frac{1}{1-X} \right] &=\operatorname{cov}[\ln X, \ln(1-X)]= -\psi_1(\alpha + \beta).\end{align} It also follows that the variances of the logit-transformed variables are \begin{align} \operatorname{var}\left[\ln \frac{X}{1-X} \right] &= \operatorname{var}\left[\ln \frac{1-X}{X} \right] \\ &= -\operatorname{cov}\left [\ln \frac{X}{1-X}, \, \ln \frac{1-X}{X} \right] \\[1ex] &= \psi_1(\alpha) + \psi_1(\beta). \end{align} Quantities of information (entropy) Given a beta distributed random variable, X ~ Beta(αβ), the differential entropy of X is (measured in nats), the expected value of the negative of the logarithm of the probability density function: \begin{align} h(X) &= \operatorname{E}\left[-\ln f(X;\alpha,\beta)\right] \\[4pt] &= \int_0^1 -f(x;\alpha,\beta) \ln f(x;\alpha,\beta) \, dx \\[4pt] &= \ln \Beta(\alpha,\beta) - (\alpha-1)\psi(\alpha) - (\beta-1) \psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end{align} where f(x; α, β) is the probability density function of the beta distribution: f(x;\alpha,\beta) = \frac{x^{\alpha-1} \left(1-x\right)^{\beta-1}}{\Beta(\alpha,\beta)} The digamma function ψ appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: \int_0^1 \frac {1-x^{\alpha-1}}{1-x} \, dx = \psi(\alpha)-\psi(1) The differential entropy of the beta distribution is negative for all values of α and β greater than zero, except at α = β = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero. 
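A minimal sketch (assuming NumPy/SciPy; parameter values and sample size are illustrative) of the trigamma-based variances/covariance of ln X and ln(1 − X) and of the differential entropy formula above, each checked by Monte Carlo:

```python
import numpy as np
from scipy.special import polygamma, digamma, betaln
from scipy.stats import beta

a, b = 2.0, 3.0
trigamma = lambda z: polygamma(1, z)

var_lnX  = trigamma(a) - trigamma(a + b)
cov_logs = -trigamma(a + b)
h = (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
     + (a + b - 2) * digamma(a + b))            # differential entropy in nats

x = beta.rvs(a, b, size=300_000, random_state=0)
print(var_lnX, np.var(np.log(x)))                             # close to each other
print(cov_logs, np.cov(np.log(x), np.log(1 - x))[0, 1])       # close to each other
print(h, -np.mean(beta.logpdf(x, a, b)), beta.entropy(a, b))  # three estimates of h(X)
```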
It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For α or β approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) α or β approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) α or β approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either α or β approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), α = β, and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle x = 1/2, and hence there is 100% probability at the middle x = 1/2 and zero probability everywhere else. The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, X1 ~ Beta(α, β) and X2 ~ Beta(α′, β′), the cross-entropy is (measured in nats) \begin{align} H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln f(x;\alpha',\beta') \,dx \\[4pt] &= \ln \Beta(\alpha',\beta') - (\alpha'-1)\psi(\alpha) - (\beta'-1)\psi(\beta) + \left(\alpha'+\beta'-2\right) \psi(\alpha+\beta). \end{align} The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see the section on parameter estimation by maximum likelihood).

Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean. Expressing the mode (only for α, β > 1) and the mean in terms of α and β: \frac{ \alpha - 1 }{ \alpha + \beta - 2 } \le \text{median} \le \frac{ \alpha }{ \alpha + \beta } . If 1 < β < α the order of these inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of x. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of x, for the (pathological) case of α and β both approaching 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001:
• mode = 0.9999; PDF(mode) = 1.00010
• mean = 0.500025; PDF(mean) = 1.00003
• median = 0.500035; PDF(median) = 1.00003
• mean − mode = −0.499875
• mean − median = −9.65538 × 10−6
where PDF stands for the value of the probability density function.

Mean, geometric mean and harmonic mean relationship

It follows from Jensen's inequality that the harmonic mean, the geometric mean and the (arithmetic) mean of a beta distribution satisfy H_X \le G_X \le \mu_X.

Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I. Karl Pearson, in a paper published in 1916, presented a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed.
The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane: (\text{skewness})^2+1 < \text{kurtosis} < \tfrac{3}{2} (\text{skewness})^2 + 3 or, equivalently, (\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2} (\text{skewness})^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries. Pearson (1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/k and the square of the skewness is 4/k, hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k".) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/k and the square of the skewness is 8/k, hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution X ~ χ²(k) is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value of −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.) Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal U-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: x = 0, 1 with practically nothing in between them. Since for α ≈ 0 and β ≈ 0 the probability density is concentrated at the two ends x = 0 and x = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities p and q = 1 − p. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are p ≈ q ≈ 1/2.
For cases approaching this limit boundary with skewness (α ≠ β), excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha + \beta} at the left end x = 0 and q = 1-p = \tfrac{\alpha}{\alpha + \beta} at the right end x = 1.

Symmetry

All statements are conditional on α, β > 0:
• Probability density function reflection symmetry: f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
• Cumulative distribution function reflection symmetry plus unitary translation: F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
• Mode reflection symmetry plus unitary translation: \operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
• Median reflection symmetry plus unitary translation: \operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
• Mean reflection symmetry plus unitary translation: \mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
• Geometric means: each is individually asymmetric, but the following symmetry applies between the geometric mean based on X and the geometric mean based on its reflection 1−X: G_X (\Beta(\alpha, \beta) ) = G_{1-X}(\Beta(\beta, \alpha) )
• Harmonic means: each is individually asymmetric, but the following symmetry applies between the harmonic mean based on X and the harmonic mean based on its reflection 1−X: H_X (\Beta(\alpha, \beta) ) = H_{1-X}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1.
• Variance symmetry: \operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
• Geometric variances: each is individually asymmetric, but the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its reflection 1−X: \ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
• Geometric covariance symmetry: \ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
• Mean absolute deviation around the mean symmetry: \operatorname{E}[|X - E[X]| ] (\Beta(\alpha, \beta))=\operatorname{E}[| X - E[X]|] (\Beta(\beta, \alpha))
• Skewness skew-symmetry: \operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{ skewness} (\Beta(\beta, \alpha) )
• Excess kurtosis symmetry: \text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
• Characteristic function symmetry of the real part (with respect to the origin of variable "t"): \text{Re} [{}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)]
• Characteristic function skew-symmetry of the imaginary part (with respect to the origin of variable "t"): \text{Im} [{}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
• Characteristic function symmetry of the absolute value (with respect to the origin of variable "t"): \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
• Differential entropy symmetry: h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
• Relative entropy (also called Kullback–Leibler divergence) symmetry: D_{\mathrm{KL}}(X_1\parallel X_2) = D_{\mathrm{KL}}(X_2\parallel X_1), \text{ if }h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta
• Fisher information matrix symmetry: {\mathcal{I}}_{i, j} = {\mathcal{I}}_{j, i}
Geometry of the probability density function

Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution. Defining the following quantity: \kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2} Points of inflection, when they exist, occur at x = \text{mode} \pm \kappa. For bell-shaped densities with α, β > 2 there are two inflection points, one on either side of the mode; in other cases there may be a single inflection point inside the support, or none.

Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters α and β.

Symmetric (α = β): the density function is symmetric about 1/2, and mean = median = 1/2.
• α = β < 1
• U-shaped, bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
• 1/12 < var(X) < 1/4 and −2 < excess kurtosis(X) < −6/5
• α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end x = 0 and x = 1 and zero probability everywhere else. A coin toss: one face of the coin being x = 0 and the other face being x = 1.
• \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
• \lim_{\alpha = \beta \to 0} \operatorname{excess \ kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
• The differential entropy approaches a minimum value of −∞
• α = β = 1
• the continuous uniform distribution on [0, 1]
• no mode
• var(X) = 1/12
• excess kurtosis(X) = −6/5
• The (negative anywhere else) differential entropy reaches its maximum value of zero
• CF = Sinc (t)
• α = β > 1
• symmetric unimodal
• mode = 1/2
• 0 < var(X) < 1/12 and −6/5 < excess kurtosis(X) < 0
• α = β > 2 is bell-shaped, with inflection points located to either side of the mode
• \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
• \lim_{\alpha = \beta \to \infty} \operatorname{excess \ kurtosis}(X) = 0
• The differential entropy approaches a minimum value of −∞

Skewed (α ≠ β)

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
• α < 1, β < 1
• U-shaped, with positive skew for α < β and negative skew for α > β
• bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha + \beta-2}
• 0 < median < 1
• 0 < var(X) < 1/4
• α > 1, β > 1
• unimodal (magenta & cyan plots), with positive skew for α < β and negative skew for α > β
• \text{mode}= \tfrac{\alpha-1}{\alpha + \beta-2}
• 0 < median < 1
• 0 < var(X) < 1/12
• α < 1, β ≥ 1
• reverse J-shaped with a right tail, strictly decreasing, positively skewed, mode = 0
• 0 < var(X) (maximum variance occurs for \alpha=\tfrac{-1+\sqrt{5}}{2}, \beta=1, or α = Φ the golden ratio conjugate)
• α ≥ 1, β < 1
• J-shaped with a left tail, strictly increasing, negatively skewed, mode = 1
• 0 < var(X) (maximum variance occurs for \alpha=1, \beta=\tfrac{-1+\sqrt{5}}{2}, or β = Φ the golden ratio conjugate)
• α = 1, β > 1
• positively skewed
• strictly decreasing (red plot)
• a reversed (mirror-image) power function distribution
• mean = 1 / (β + 1)
• median = 1 − 2^{−1/β}
• mode = 0
• α = 1, 1 < β < 2
• concave
• 1-\tfrac{1}{\sqrt{2}} < median < \tfrac{1}{2}
• 1/18 < var(X) < 1/12
• α = 1, β = 2
• a straight line with slope −2, the right-triangular distribution with right angle at the left end
• \text{median}=1-\tfrac {1}{\sqrt{2}}
• var(X) = 1/18
• α = 1, β > 2
• reverse J-shaped with a right tail
• convex
• 0 < median < 1-\tfrac{1}{\sqrt{2}}
• 0 < var(X) < 1/18
• α > 1, β = 1
• negatively skewed
• strictly increasing (green plot)
• the power function distribution
• mean = α / (α + 1)
• median = 2^{−1/α}
• mode = 1
• 2 > α > 1, β = 1
• concave
• \tfrac{1}{2} < median < \tfrac{1}{\sqrt{2}}
• 1/18 < var(X) < 1/12
• α = 2, β = 1
• a straight line with slope +2, the right-triangular distribution with right angle at the right end
• \text{median}=\tfrac {1}{\sqrt{2}}
• var(X) = 1/18
• α > 2, β = 1
• J-shaped with a left tail, convex
• \tfrac{1}{\sqrt{2}} < median < 1
• 0 < var(X) < 1/18
Related distributions
Transformations
• If X ~ Beta(α, β) then 1 − X ~ Beta(β, α), mirror-image symmetry
• If X ~ Beta(α, β) then \tfrac{X}{1-X} \sim {\beta'}(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
• If X\sim\text{Beta}(\alpha,\beta), then Y=\log\frac{X}{1-X} has a generalized logistic distribution, with density \frac{\sigma(y)^\alpha\sigma(-y)^\beta}{B(\alpha,\beta)}, where \sigma is the logistic sigmoid.
• If X ~ Beta(α, β) then \tfrac{1}{X} -1 \sim {\beta'}(\beta,\alpha).
• If X\sim\text{Beta}(\alpha_1,\beta_1) and Y\sim\text{Beta}(\alpha_2,\beta_2) then Z = \tfrac{X}{Y} has density \tfrac{B(\alpha_1 +\alpha_2, \beta_2) z^{\alpha_1 - 1} {}_2F_1(\alpha_1 + \alpha_2, 1- \beta_1; \alpha_1 +\alpha_2 + \beta_2; z) }{B(\alpha_1, \beta_1)B(\alpha_2, \beta_2)} for 0 < z \leq 1 and \tfrac{B(\alpha_1 +\alpha_2, \beta_1) z^{-(\alpha_2 + 1)} {}_2F_1(\alpha_1 + \alpha_2, 1- \beta_2; \alpha_1 +\alpha_2 + \beta_1; \tfrac{1}{z})}{B(\alpha_1, \beta_1)B(\alpha_2, \beta_2)} for z \geq 1 , where {}_2F_1(a, b; c; x) is the hypergeometric function.
• If X ~ Beta(n/2, m/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming n > 0 and m > 0), the Fisher–Snedecor F distribution.
• If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + X(max − min) ~ PERT(min, max, m, λ) where PERT denotes a PERT distribution used in PERT analysis, and m = most likely value. Traditionally λ = 4 in PERT analysis.
• If X ~ Beta(1, β) then X ~ Kumaraswamy distribution with parameters (1, β)
• If X ~ Beta(α, 1) then X ~ Kumaraswamy distribution with parameters (α, 1)
• If X ~ Beta(α, 1) then −ln(X) ~ Exponential(α)

Special and limiting cases

The Beta(1/2, 1/2) arcsine probability density was proposed by Harold Jeffreys to represent uncertainty for a Bernoulli or a binomial distribution in Bayesian inference, and is now commonly referred to as Jeffreys prior: p−1/2(1 − p)−1/2. This distribution also appears in several random walk fundamental theorems.
• Beta(1, 1) ~ U(0, 1) with density 1 on that interval.
• Beta(n, 1) ~ Maximum of n independent rvs. with U(0, 1), sometimes called a standard power function distribution with density n x^{n−1} on that interval.
• Beta(1, n) ~ Minimum of n independent rvs. with U(0, 1), with density n(1 − x)^{n−1} on that interval.
• If X ~ Beta(3/2, 3/2) and r > 0 then 2rX − r ~ Wigner semicircle distribution.
• Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the Bernoulli and binomial distributions.
• \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1) the exponential distribution.
• \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1) the gamma distribution.
• For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right) the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as n increases.

Derived from other distributions
• The kth order statistic of a sample of size n from the uniform distribution is a beta random variable, U(k) ~ Beta(k, n+1−k).
• Gamma distribution: If X ~ Gamma(α, θ) and Y ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\, (see the numerical sketch after this list).
• Chi-squared distribution: If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
• The power transformation for the uniform distribution: If X ~ U(0, 1) and α > 0 then X^{1/α} ~ Beta(α, 1).
• Cauchy distribution: If X ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,

Combination with other distributions
• If X ~ Beta(α, β) and Y ~ F(2β, 2α) then \Pr(X \leq \tfrac \alpha {\alpha+\beta x}) = \Pr(Y \geq x)\, for all x > 0.

Compounding with other distributions
• If p ~ Beta(α, β) and X ~ Bin(k, p) then X ~ beta-binomial distribution
• If p ~ Beta(α, β) and X ~ NB(r, p) then X ~ beta negative binomial distribution

Generalisations
• The generalization to multiple variables, i.e. a multivariate Beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
• The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
• The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{NonCentralBeta}(\alpha,\beta,0).
• The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
• The matrix variate beta distribution is a distribution for positive-definite matrices.
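A minimal sketch (assuming NumPy/SciPy; parameter values and sample size are illustrative) of the gamma-ratio construction from the list above, with a Kolmogorov–Smirnov check that the resulting ratio follows the expected beta distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 2.5, 4.0
x = rng.gamma(shape=a, scale=1.0, size=100_000)
y = rng.gamma(shape=b, scale=1.0, size=100_000)
z = x / (x + y)                                # should be Beta(a, b) distributed

print(stats.kstest(z, stats.beta(a, b).cdf))   # large p-value: consistent with Beta(a, b)
```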
Statistical inference
Parameter estimation

Method of moments

Two unknown parameters

Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0,1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let: \text{sample mean(X)}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i be the sample mean estimate and \text{sample variance(X)} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N \left(X_i - \bar{x}\right)^2 be the sample variance estimate. The method-of-moments estimates of the parameters are \hat{\alpha} = \bar{x} \left(\frac{\bar{x} (1 - \bar{x})}{\bar{v}} - 1 \right)\ \text{if}\ \bar{v} < \bar{x}(1-\bar{x}) and \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x} (1 - \bar{x})}{\bar{v}} - 1 \right)\ \text{if}\ \bar{v} < \bar{x}(1-\bar{x}). When the distribution is required over a known interval other than [0, 1] with random variable X, say [a, c] with random variable Y, then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Four unknown parameters" section below), where: \text{sample mean(Y)}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i \text{sample variance(Y)} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N \left(Y_i - \bar{y}\right)^2

Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}) of a beta distribution supported in the [a, c] interval (see section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see previous section "Kurtosis") as follows: \text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if } (\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^2 One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows: \hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\,\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2}(\text{sample skewness})^2 - (\text{sample excess kurtosis})}, \text{ if } (\text{sample skewness})^2-2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2 As Bowman and Shenton remark, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely G1 and G2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill). Gnanadesikan et al. give numerical solutions for a few cases.
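A minimal sketch (assuming NumPy/SciPy; the function name and data are illustrative) of the two-parameter method-of-moments estimates from data on [0, 1], valid only when the sample variance is below x̄(1 − x̄):

```python
import numpy as np
from scipy.stats import beta

def beta_mom(x):
    x_bar = np.mean(x)
    v_bar = np.var(x, ddof=1)
    if v_bar >= x_bar * (1 - x_bar):
        raise ValueError("sample variance too large for a beta model")
    common = x_bar * (1 - x_bar) / v_bar - 1   # this is nu = alpha + beta
    return x_bar * common, (1 - x_bar) * common

data = beta.rvs(2.0, 5.0, size=10_000, random_state=1)
print(beta_mom(data))    # estimates should be near the true (2.0, 5.0)
```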
Maximum likelihood

Two unknown parameters

For N iid observations X1, ..., XN, maximizing the log likelihood with respect to the shape parameters gives the coupled maximum likelihood equations \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}) = \frac{1}{N}\sum_{i=1}^N \ln X_i = \ln \hat{G}_X and \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}) = \frac{1}{N}\sum_{i=1}^N \ln (1-X_i) = \ln \hat{G}_{(1-X)}, where ψ is the digamma function and \hat{G}_X, \hat{G}_{(1-X)} are the sample geometric means based on X and on its mirror image (1 − X). These equations have no closed-form solution in general. Subtracting the second equation from the first and following N. L. Johnson and S. Kotz, \hat{\alpha} can be obtained from the inverse digamma function of the right hand side of this equation: \psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta}) \hat{\alpha} = \psi^{-1} \left(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta})\right) In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(x + 1) = ψ(x) + 1/x in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is \hat{\alpha} = -\frac{1}{\ln \hat{G}_X} = -\frac{N}{\sum_{i=1}^N \ln X_i}.

Fisher information matrix

Let a random variable X have a probability density f(x; α) depending on a parameter α. The partial derivative with respect to α of the log likelihood function is called the score, and the second moment of the score is the Fisher information: \mathcal{I}(\alpha) = \operatorname{E}\left[\left(\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X)\right)^2\right]. If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): \mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln \mathcal{L}(\alpha\mid X) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any estimator of a parameter α: \operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses of a parameter. When there are N parameters \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix}, then the Fisher information takes the form of an N×N positive semidefinite symmetric matrix, the Fisher information matrix, with typical element: (\mathcal{I}(\theta))_{i, j} = \operatorname{E} \left [\frac{\partial \ln \mathcal{L}}{\partial\theta_i} \cdot \frac {\partial \ln \mathcal{L}} {\partial\theta_j} \right ]. Under certain regularity conditions, it can further be shown that the (Shannon) differential entropy h(X) is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
Two parameters

For X1, ..., XN independent random variables each having a beta distribution parametrized with shape parameters α and β, the joint log likelihood function for N iid observations is: \ln \mathcal{L} (\alpha, \beta\mid X) = (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per N iid observations is \frac{1}{N} \ln \mathcal{L} (\alpha, \beta \mid X) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1) \frac{1}{N}\sum_{i=1}^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta). For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: - \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\partial \alpha^2} \right ] = \ln \operatorname{var}_{GX} - \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\beta, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)} - \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) ={\mathcal{I}}_{\alpha, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} \right] = \ln \operatorname{cov}_{G{X,(1-X)}} Since the Fisher information matrix is symmetric \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln \operatorname{cov}_{G{X,(1-X)}} The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function: \psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d \psi(\alpha)}{d\alpha}. These quantities are derived in the section titled "Moments of logarithmically transformed random variables", which gives formulas for the moments, variances and covariance of ln(X) and ln(1 − X) in terms of digamma and trigamma functions. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability).
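A minimal sketch (assuming NumPy/SciPy; the data, starting point and variable names are illustrative) of solving the coupled digamma (maximum likelihood) equations numerically and assembling the per-observation Fisher information matrix from trigamma values:

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.optimize import fsolve
from scipy.stats import beta

data = beta.rvs(2.0, 5.0, size=5_000, random_state=2)
lnGx, lnG1mx = np.mean(np.log(data)), np.mean(np.log(1 - data))

def ml_equations(p):
    a, b = p
    return [digamma(a) - digamma(a + b) - lnGx,
            digamma(b) - digamma(a + b) - lnG1mx]

a_hat, b_hat = fsolve(ml_equations, x0=[1.0, 1.0])
print(a_hat, b_hat)              # near the true (2.0, 5.0)

tri = lambda z: polygamma(1, z)  # trigamma
fisher = np.array([[tri(a_hat) - tri(a_hat + b_hat), -tri(a_hat + b_hat)],
                   [-tri(a_hat + b_hat),             tri(b_hat) - tri(a_hat + b_hat)]])
print(np.linalg.det(fisher))     # positive: the matrix is positive-definite
```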
From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: \begin{align} \det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\alpha, \beta} \\[4pt] &=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\[4pt] &= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\[4pt] \lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\[4pt] \lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0 \end{align} From Sylvester's criterion (positivity of the leading principal minors), it follows that the Fisher information matrix for the two-parameter case is positive-definite (under the standard condition that the shape parameters are positive, α > 0 and β > 0). Four parameters If Y1, ..., YN are independent random variables each having a beta distribution with four parameters: the exponents α and β, and also a (the minimum of the distribution range) and c (the maximum of the distribution range) (see the section titled "Alternative parametrizations, Four parameters"), with probability density function: f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)B(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}B(\alpha, \beta)}, the joint log likelihood function per N iid observations is: \frac{1}{N} \ln \mathcal{L} (\alpha, \beta, a, c\mid Y) = \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four-parameter case, the Fisher information has 4 × 4 = 16 components, 12 of which are off-diagonal. Since the Fisher information matrix is symmetric, half of these off-diagonal components (12/2 = 6) are independent, so the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components; Aryal and Nadarajah calculated this matrix for the four-parameter case. Rule of succession A classic use of the beta distribution in Bayesian inference is Laplace's rule of succession, introduced in the 18th century in the course of treating the sunrise problem. It states that, given s successes in n conditionally independent Bernoulli trials with probability p, the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over p, namely Beta(s+1, n−s+1), which is given by Bayes' rule if one assumes a uniform prior probability over p (i.e., Beta(1, 1)) and then observes that p generated s successes in n trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ( p. 89) as "a travesty of the proper use of the principle". Keynes remarks ( Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson showed that the probability that the next (n + 1) trials will all be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law.
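As a concrete illustration of the rule of succession before turning to its critics, the posterior mean under a uniform Beta(1,1) prior can be computed directly. A minimal sketch (the function name rule_of_succession is an illustrative choice):

```python
from scipy.stats import beta

def rule_of_succession(s, n):
    """Laplace's rule of succession: posterior mean of p after s successes
    in n Bernoulli trials, starting from a uniform Beta(1, 1) prior.
    The posterior is Beta(s + 1, n - s + 1), whose mean is (s + 1)/(n + 2).
    """
    posterior = beta(s + 1, n - s + 1)
    return posterior.mean()

print(rule_of_succession(s=10, n=10))   # 11/12, not 1, despite 100% successes
print((10 + 1) / (10 + 2))              # same value from the closed form
```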
As pointed out by Jeffreys, Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the prior probability representing complete uncertainty should be proportional to p−1(1−p)−1. The function p−1(1−p)−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, p−1(1−p)−1 divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: in effect a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation \log(p/(1-p))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit-transformed variable ln(p/(1 − p)) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability. Jeffreys prior probability (Beta(1/2,1/2) for a Bernoulli or a binomial distribution) Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability p ∈ [0, 1] and is "tails" with probability 1 − p, for a given (H,T) ∈ {(0,1), (1,0)} the probability is pH(1 − p)T. Since T = 1 − H, the Bernoulli distribution is pH(1 − p)1 − H. Considering p as the only parameter, it follows that the log likelihood for the Bernoulli distribution is \ln \mathcal{L} (p\mid H) = H \ln p + (1-H) \ln(1-p). The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: p), therefore: \begin{align} \sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[ \left( \frac{d}{dp} \ln \mathcal{L} (p\mid H) \right)^2\right]} \\[6pt] &= \sqrt{\operatorname{E}\!\left[ \left( \frac{H}{p} - \frac{1-H}{1-p}\right)^2 \right]} \\[6pt] &= \sqrt{p^1 (1-p)^0 \left( \frac{1}{p} - \frac{0}{1-p}\right)^2 + p^0 (1-p)^1 \left(\frac{0}{p} - \frac{1}{1-p}\right)^2} \\ &= \frac{1}{\sqrt{p(1-p)}}. \end{align} Similarly, for the Binomial distribution with n Bernoulli trials, it can be shown that \sqrt{\mathcal{I}(p)}= \sqrt{\frac{n}{p(1-p)}}.
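The single-trial Fisher information derived above can be checked numerically by summing the squared score over the two Bernoulli outcomes. A minimal sketch (the function name bernoulli_fisher_info is an illustrative choice):

```python
def bernoulli_fisher_info(p):
    """Fisher information of one Bernoulli trial, computed directly as
    E[(d/dp log L)^2] by summing over the outcomes H in {0, 1}.
    The closed form derived above is 1 / (p (1 - p)).
    """
    def score(h):
        # d/dp of the log likelihood H ln p + (1 - H) ln(1 - p)
        return h / p - (1 - h) / (1 - p)
    return p * score(1) ** 2 + (1 - p) * score(0) ** 2

for p in (0.1, 0.3, 0.5):
    print(bernoulli_fisher_info(p), 1 / (p * (1 - p)))   # the two columns agree
```

The square root of this quantity, 1/sqrt(p(1 − p)), is (up to the 1/π normalizing constant) the Beta(1/2,1/2) density discussed next.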
Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable x = p, and shape parameters α = β = 1/2, the arcsine distribution: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes' theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes' theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the square root of the determinant of Fisher's information for the beta distribution, which, as shown in the previous section, is a function of the trigamma function ψ1 of shape parameters α and β as follows: \begin{align} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)} \\ \lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\ \lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0 \end{align} As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional curve that looks like a basin as a function of the parameter p of the Bernoulli and binomial distributions. The walls of the basin are formed by p approaching the singularities at the ends p → 0 and p → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a 2-dimensional surface (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and in some cases it simply does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution.
They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \propto \frac{1}{\sqrt{\theta(1-\theta)}} where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex c = θ, left end a = 0, and right end b = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore Jeffreys prior is the most uninformative prior (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables. Effect of different prior probability choices on the posterior beta distribution If samples are drawn from the population of a random variable X that result in s successes and f failures in n Bernoulli trials (n = s + f), then the likelihood function for parameters s and f given x = p (the notation x = p in the expressions below will emphasize that the domain x stands for the value of the parameter p in the binomial distribution), is the following binomial distribution: \mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n - s}.
If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters α Prior and β Prior, then: {\operatorname{PriorProbability}}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{ x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})} According to Bayes' theorem for a continuous event space, the posterior probability density is given by the product of the prior probability and the likelihood function (given the evidence s and f = n − s), normalized so that the area under the curve equals one, as follows: \begin{align} & \text{posterior probability density}(x=p\mid s,n-s) \\[6pt] = {} & \frac{\operatorname{prior probability density}(x=p;\alpha \operatorname{prior},\beta \operatorname{prior}) \mathcal{L}(s,f\mid x=p)} {\int_0^1\text{prior probability density}(x=p;\alpha \operatorname{prior},\beta \operatorname{prior}) \mathcal{L}(s,f\mid x=p) \, dx} \\[6pt] = {} & \frac{{{n \choose s} x^{s+\alpha \operatorname{prior}-1}(1-x)^{n-s+\beta \operatorname{prior}-1} / \Beta(\alpha \operatorname{prior},\beta \operatorname{prior})}}{\int_0^1 \left({n \choose s} x^{s+\alpha \operatorname{prior}-1}(1-x)^{n-s+\beta \operatorname{prior}-1} /\Beta(\alpha \operatorname{prior}, \beta \operatorname{prior})\right) \, dx} \\[6pt] = {} & \frac{x^{s+\alpha \operatorname{prior}-1}(1-x)^{n-s+\beta \operatorname{prior}-1}}{\int_0^1 \left(x^{s+\alpha \operatorname{prior}-1}(1-x)^{n-s+\beta \operatorname{prior}-1}\right) \, dx} \\[6pt] = {} & \frac{x^{s+\alpha \operatorname{prior}-1}(1-x)^{n-s+\beta \operatorname{prior}-1}}{\Beta(s+\alpha \operatorname{prior},n-s+\beta \operatorname{prior})}. \end{align} The binomial coefficient {s+f \choose s}={n \choose s}=\frac{(s+f)!}{s! f!}=\frac{n!}{s!(n-s)!} appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable x, hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior) cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior x^{\alpha \operatorname{prior}-1}(1-x)^{\beta \operatorname{prior}-1} because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(s + α Prior, n − s + β Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio s/n of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. 
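Before specializing to particular priors, the conjugate update derived above can be sketched in code: a Beta(a, b) prior combined with s successes in n trials yields a Beta(a + s, b + n − s) posterior, all normalizing constants having cancelled. A minimal sketch (the helper name beta_binomial_update is an illustrative choice):

```python
from scipy.stats import beta

def beta_binomial_update(a_prior, b_prior, s, n):
    """Conjugate Bayesian update described above: a Beta(a, b) prior times a
    binomial likelihood with s successes in n trials gives a
    Beta(a + s, b + n - s) posterior.
    """
    return beta(a_prior + s, b_prior + n - s)

posterior = beta_binomial_update(a_prior=1, b_prior=1, s=7, n=10)
print(posterior.mean())         # (s + 1)/(n + 2) = 8/12 for the uniform prior
print(posterior.interval(0.95)) # a 95% credible interval for p
```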
For the Bayes' prior probability (Beta(1,1)), the posterior probability is: \operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean }=\frac{s+1}{n+2},\text{ (and mode}=\frac{s}{n}\text{ if } 0 < s < n). For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is: \operatorname{posterior probability}(p=x\mid s,f) = {x^{s-\tfrac{1}{2}}(1-x)^{n-s-\frac{1}{2}} \over \Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})} ,\text{ with mean} = \frac{s+\tfrac{1}{2}}{n+1},\text{ (and mode}=\frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n - \tfrac{1}{2}), and for the Haldane prior probability (Beta(0,0)), the posterior probability is: \operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n},\text{ (and mode}=\frac{s-1}{n-2}\text{ if } 1 < s < n-1). From the above expressions it follows that for s/n = 1/2 all three of the above prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For s/n < 1/2, mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior, while for s/n > 1/2 the order of these inequalities is reversed, so that the Haldane prior probability results in the largest posterior mean. The Haldane prior probability Beta(0,0) results in a posterior probability density with mean (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials; therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The Bayes prior probability Beta(1,1) results in a posterior probability density with mode identical to the ratio s/n (the maximum likelihood estimate). In the case that 100% of the trials have been successful (s = n), the Bayes prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (n + 1)/(n + 2), the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial), and the Jeffreys prior probability results in a posterior expected value equal to (n + 1/2)/(n + 1). Perks (p. 144 of 1900 edition) maintained that the Bayes Beta(1,1) uniform prior was not a complete-ignorance prior, and that it should be used only when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our experience of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
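The orderings of the posterior means described above can be checked numerically. A minimal sketch (the helper name posterior_mean is an illustrative choice; the prior pseudo-counts 1, 1/2 and 0 correspond to the Bayes, Jeffreys and Haldane priors):

```python
def posterior_mean(s, n, a_prior, b_prior):
    # Mean of the Beta(a + s, b + n - s) posterior for p.
    return (s + a_prior) / (n + a_prior + b_prior)

s, n = 3, 10   # s/n < 1/2, so the ordering Bayes > Jeffreys > Haldane should hold
print("Bayes    Beta(1,1):     ", posterior_mean(s, n, 1.0, 1.0))   # (s + 1)/(n + 2)
print("Jeffreys Beta(1/2,1/2): ", posterior_mean(s, n, 0.5, 0.5))   # (s + 1/2)/(n + 1)
print("Haldane  Beta(0,0):     ", posterior_mean(s, n, 0.0, 0.0))   # s/n
```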
If there is sufficient sampling data, and the posterior probability mode is not located at one of the extremes of the domain (x = 0 or x = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar posterior probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there is a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
Occurrence and applications
Order statistics The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the kth smallest of a sample of size n from a continuous uniform distribution has a beta distribution. This result is summarized as U_{(k)} \sim \operatorname{Beta}(k,n+1-k). From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived. Wavelet analysis A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including, but certainly not limited to, audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β. Population genetics The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population: \begin{align} \alpha &= \mu \nu,\\ \beta &= (1 - \mu) \nu, \end{align} where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here F is (Wright's) genetic distance between two populations. Project management: task cost and schedule modeling The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution: \begin{align} \mu(X) & = \frac{a + 4b + c}{6} \\[8pt] \sigma(X) & = \frac{c-a}{6} \end{align} where a is the minimum, c is the maximum, and b is the most likely value (the mode for α > 1 and β > 1). The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of β (for arbitrary α within these ranges): :β = α > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2 \sqrt {1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3+2 \alpha} or :β = 6 − α for 5 > α > 1 (skewed case) with standard deviation \sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6 \sqrt 7}, skewness =\frac{(3-\alpha) \sqrt 7}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis =\frac{21}{\alpha (6- \alpha)} - 3. The above estimate for the standard deviation σ(X) = (c − a)/6 is exact for either of the following values of α and β: :α = β = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:β = 6 − α and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness =\frac{1}{\sqrt 2}, and excess kurtosis = 0 :β = 6 − α and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt 2}, and excess kurtosis = 0 Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
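The PERT shorthand can be compared against the exact moments of a four-parameter beta distribution on [a, c]. A minimal sketch (the function name pert_estimates and the chosen endpoints are illustrative; the case α = β = 4 is one of the exact cases listed above, so both columns agree):

```python
from scipy.stats import beta

def pert_estimates(a, b, c):
    """PERT three-point shorthand for the mean and standard deviation of a
    beta distribution on [a, c] with most likely value (mode) b:
        mean  ~ (a + 4 b + c) / 6
        sigma ~ (c - a) / 6
    """
    return (a + 4 * b + c) / 6, (c - a) / 6

a, c = 10.0, 70.0                 # minimum and maximum of the task duration
alpha_, beta_ = 4.0, 4.0          # one of the exact symmetric cases
mode = a + (alpha_ - 1) / (alpha_ + beta_ - 2) * (c - a)   # most likely value
exact = beta(alpha_, beta_, loc=a, scale=c - a)
print(pert_estimates(a, mode, c))   # (40.0, 10.0)
print(exact.mean(), exact.std())    # 40.0 10.0 for this symmetric case
```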
Random variate generation
If X and Y are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then \frac{X}{X+Y} \sim \Beta(\alpha, \beta). So one algorithm for generating beta variates is to generate \frac{X}{X + Y}, where X is a gamma variate with parameters (α, 1) and Y is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the kth order statistic of n uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement; at every trial an additional ball is added with the same color as the ball last drawn. Asymptotically, the proportion of black balls is distributed according to the beta distribution, and each repetition of the experiment produces a different limiting value. It is also possible to use inverse transform sampling.
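The first two of these recipes are easy to sketch with NumPy's random generator (the function names beta_from_gammas and beta_from_order_statistic are illustrative choices; the check against the theoretical mean α/(α+β) is only a sanity test):

```python
import numpy as np

def beta_from_gammas(alpha, beta_, size, rng):
    # Beta(alpha, beta) variates as X/(X+Y) with independent gamma variates.
    x = rng.gamma(alpha, 1.0, size)
    y = rng.gamma(beta_, 1.0, size)
    return x / (x + y)

def beta_from_order_statistic(alpha, beta_, size, rng):
    # For small integer alpha, beta: the alpha-th smallest of
    # alpha + beta - 1 uniform variates is Beta(alpha, beta) distributed.
    u = rng.uniform(size=(size, alpha + beta_ - 1))
    return np.sort(u, axis=1)[:, alpha - 1]

rng = np.random.default_rng(0)
a, b = 2, 5
print(beta_from_gammas(a, b, 100_000, rng).mean())           # ~ a/(a+b) = 0.2857
print(beta_from_order_statistic(a, b, 100_000, rng).mean())  # ~ 0.2857
```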
Normal approximation to the Beta distribution
A beta distribution \Beta(\alpha,\beta) with \alpha \approx \beta and \alpha, \beta \gg 1 is approximately normal with mean 1/2 and variance 1/(4(2\alpha + 1)). If \alpha \geq \beta, the normal approximation can be improved by taking the cube-root of the logarithm of the reciprocal of \Beta(\alpha,\beta).
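The quality of the basic approximation can be checked by comparing the two cumulative distribution functions. A minimal sketch under the stated symmetric assumption α = β (the chosen value α = 50 and the quantiles are illustrative):

```python
import numpy as np
from scipy.stats import beta, norm

# Compare a symmetric Beta(a, a) with its normal approximation
# N(mean = 1/2, variance = 1/(4 (2a + 1))) at a few quantiles.
a = 50.0
approx = norm(loc=0.5, scale=np.sqrt(1.0 / (4.0 * (2.0 * a + 1.0))))
exact = beta(a, a)
for q in (0.55, 0.60, 0.65):
    print(q, exact.cdf(q), approx.cdf(q))   # the two CDF columns nearly coincide
```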
History
Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson, who analyzed it as the solution of Type I of the Pearson distributions. Elderton, in his 1906 monograph, further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." In a later paper (published three years after Pearson's retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), Pearson writes: "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, 'more efficient values' of the curve constants". David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian ... who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."