Real-world observations such as the measurements of yesterday's rain throughout the day typically cannot be complete sets of all possible observations that could be made. As such, the variance calculated from the finite set will in general not match the variance that would have been calculated from the full population of possible observations. This means that one
estimates the mean and variance from a limited set of observations by using an
estimator equation. The estimator is a function of the
sample of
observations drawn without observational bias from the whole
population of potential observations. In this example, the sample would be the set of actual measurements of yesterday's rainfall from available rain gauges within the geography of interest. The simplest estimators for population mean and population variance are simply the mean and variance of the sample, the
sample mean and
(uncorrected) sample variance – these are
consistent estimators (they converge to the correct value as the number of samples increases) but can be improved. Most simply, the sample variance is computed as the sum of
squared deviations about the (sample) mean, divided by n, the number of samples. However, using values other than n improves the estimator in various ways. Four common values for the denominator are n, n − 1, n + 1, and n − 1.5: n is the simplest (the variance of the sample), n − 1 eliminates bias, n + 1 minimizes mean squared error for the normal distribution, and n − 1.5 mostly eliminates bias in unbiased estimation of the standard deviation for the normal distribution. Firstly, if the true population mean is unknown, then the sample variance (which uses the sample mean in place of the true mean) is a
biased estimator: it underestimates the variance by a factor of (n − 1)/n; correcting this factor, resulting in the sum of squared deviations about the sample mean divided by n − 1 instead of n, is called ''
Bessel's correction''. The resulting estimator is unbiased and is called the
(corrected) sample variance or
unbiased sample variance. If the mean is determined in some other way than from the same samples used to estimate the variance, then this bias does not arise, and the variance can safely be estimated as that of the samples about the (independently known) mean. Secondly, the sample variance does not generally minimize
mean squared error between sample variance and population variance. Correcting for bias often makes this worse: one can always choose a scale factor that performs better than the corrected sample variance, though the optimal scale factor depends on the
excess kurtosis of the population (see ''mean squared error: variance'') and introduces bias. This always consists of scaling down the unbiased estimator (dividing by a number larger than n − 1) and is a simple example of a shrinkage estimator: one "shrinks" the unbiased estimator towards zero. For the normal distribution, dividing by n + 1 (instead of n − 1 or n) minimizes mean squared error. The resulting estimator is biased, however, and is known as the '''biased sample variance'''.
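As an illustration of the denominator choices discussed above, the following Python sketch (an illustrative addition; the sample size, seed and number of trials are arbitrary choices) estimates the bias and mean squared error of the estimators that divide by n, n − 1, and n + 1 when sampling from a normal distribution.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0            # true population variance
n, trials = 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for denom, label in [(n, "n"), (n - 1, "n - 1"), (n + 1, "n + 1")]:
    est = ss / denom
    bias = est.mean() - sigma2
    mse = ((est - sigma2) ** 2).mean()
    print(f"denominator {label:5s}: bias ~ {bias:+.3f}, MSE ~ {mse:.3f}")

# Expected pattern: dividing by n - 1 has (almost) zero bias,
# while dividing by n + 1 gives the smallest MSE for normal data.
</syntaxhighlight>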
== Population variance ==
In general, the
population variance of a
finite population of size N with values x_1, \dots, x_N is given by \begin{align} \sigma^2 &= \frac{1}{N} \sum_{i=1}^N {\left(x_i - \mu\right)}^2 = \frac{1}{N} \sum_{i=1}^N \left(x_i^2 - 2 \mu x_i + \mu^2 \right) \\[5pt] &= \left(\frac{1}{N} \sum_{i=1}^N x_i^2\right) - 2\mu \left(\frac{1}{N} \sum_{i=1}^N x_i\right) + \mu^2 \\[5pt] &= \operatorname{E}[x_i^2] - \mu^2 , \end{align} where the population mean is \mu = \operatorname{E}[x_i] = \frac 1N \sum_{i=1}^N x_i and {{tmath|1= \textstyle \operatorname{E}[x_i^2] = \left(\frac{1}{N} \sum_{i=1}^N x_i^2\right)}}, where \operatorname{E} is the
expectation value operator. The population variance can also be computed using \sigma^2 = \frac{1}{N^2} \sum_{i<j} {\left(x_i - x_j\right)}^2 = \frac{1}{2N^2} \sum_{i,j=1}^N {\left(x_i - x_j\right)}^2 . (The right side has duplicate terms in the sum while the middle side has only unique terms to sum.) This is true because \begin{align} &\frac{1}{2N^2} \sum_{i, j=1}^N {\left( x_i - x_j \right)}^2 \\[5pt] ={} &\frac{1}{2N^2} \sum_{i, j=1}^N \left( x_i^2 - 2x_i x_j + x_j^2 \right) \\[5pt] ={} &\frac{1}{2N} \sum_{j=1}^N \left(\frac{1}{N} \sum_{i=1}^N x_i^2\right) - \left(\frac{1}{N} \sum_{i=1}^N x_i\right) \left(\frac{1}{N} \sum_{j=1}^N x_j\right) + \frac{1}{2N} \sum_{i=1}^N \left(\frac{1}{N} \sum_{j=1}^N x_j^2\right) \\[5pt] ={} &\frac{1}{2} \left( \sigma^2 + \mu^2 \right) - \mu^2 + \frac{1}{2} \left( \sigma^2 + \mu^2 \right) \\[5pt] ={} &\sigma^2. \end{align} The population variance matches the variance of the generating probability distribution. In this sense, the concept of population can be extended to continuous random variables with infinite populations.
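The identity above can be checked numerically. The following Python sketch (an illustrative addition; the data set is arbitrary) computes the population variance both from the direct definition and from the pairwise-difference form.

<syntaxhighlight lang="python">
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # arbitrary finite population
N = len(x)
mu = x.mean()

direct = ((x - mu) ** 2).mean()                           # (1/N) * sum (x_i - mu)^2
pairwise = ((x[:, None] - x[None, :]) ** 2).sum() / (2 * N**2)

print(direct, pairwise, np.var(x))  # all three agree (np.var uses the 1/N convention)
</syntaxhighlight>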
== Sample variance ==
=== Biased sample variance ===
In many practical situations, the true variance of a population is not known
a priori and must be computed somehow. When dealing with extremely large populations, it is not possible to count every object in the population, so the computation must be performed on a
sample of the population. This is generally referred to as
sample variance or
empirical variance. Sample variance can also be applied to the estimation of the variance of a continuous distribution from a sample of that distribution. We take a
sample with replacement of n values Y_1, \dots, Y_n from the population of size N, where n < N, and estimate the variance on the basis of this sample. Directly taking the variance of the sample data gives the average of the
squared deviations: \tilde{S}_Y^2 = \frac{1}{n} \sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 = \left(\frac 1n \sum_{i=1}^n Y_i^2\right) - \overline{Y}^2 = \frac{1}{n^2} \sum_{i,j\,:\,i<j} {\left(Y_i - Y_j\right)}^2 . (See the section ''Population variance'' for the derivation of this formula.) Here, \overline{Y} denotes the
sample mean: \overline{Y} = \frac{1}{n} \sum_{i=1}^n Y_i. Since the Y_i are selected randomly, both \overline{Y} and \tilde{S}_Y^2 are
random variables. Their expected values can be evaluated by averaging over the ensemble of all possible samples of size n from the population. For \tilde{S}_Y^2 this gives: \begin{align} \operatorname{E}[\tilde{S}_Y^2] &= \operatorname{E}\left[ \frac{1}{n} \sum_{i=1}^n {\left(Y_i - \frac{1}{n} \sum_{j=1}^n Y_j \right)}^2 \right] \\[5pt] &= \frac 1n \sum_{i=1}^n \operatorname{E}\left[ Y_i^2 - \frac{2}{n} Y_i \sum_{j=1}^n Y_j + \frac{1}{n^2} \sum_{j=1}^n Y_j \sum_{k=1}^n Y_k \right] \\[5pt] &= \frac 1n \sum_{i=1}^n \left( \operatorname{E}\left[Y_i^2\right] - \frac{2}{n} \left( \sum_{j \neq i} \operatorname{E}\left[Y_i Y_j\right] + \operatorname{E}\left[Y_i^2\right] \right) + \frac{1}{n^2} \sum_{j=1}^n \sum_{k \neq j}^n \operatorname{E}\left[Y_j Y_k\right] +\frac{1}{n^2} \sum_{j=1}^n \operatorname{E}\left[Y_j^2\right] \right) \\[5pt] &= \frac 1n \sum_{i=1}^n \left( \frac{n - 2}{n} \operatorname{E}\left[Y_i^2\right] - \frac{2}{n} \sum_{j \neq i} \operatorname{E}\left[Y_i Y_j\right] + \frac{1}{n^2} \sum_{j=1}^n \sum_{k \neq j}^n \operatorname{E}\left[Y_j Y_k\right] +\frac{1}{n^2} \sum_{j=1}^n \operatorname{E}\left[Y_j^2\right] \right) \\[5pt] &= \frac 1n \sum_{i=1}^n \left[ \frac{n - 2}{n} \left(\sigma^2 + \mu^2\right) - \frac{2}{n} (n - 1)\mu^2 + \frac{1}{n^2} n(n - 1)\mu^2 + \frac{1}{n} \left(\sigma^2 + \mu^2\right) \right] \\[5pt] &= \frac{n - 1}{n} \sigma^2. \end{align} Here \sigma^2 = \operatorname{E}[Y_i^2] - \mu^2 is the
population variance, as derived in the section ''Population variance'' above, and \operatorname{E}[Y_i Y_j] = \operatorname{E}[Y_i] \operatorname{E}[Y_j] = \mu^2 due to the independence of Y_i and Y_j. Hence \tilde{S}_Y^2 gives an estimate of the population variance \sigma^2 that is biased by a factor of \frac{n - 1}{n}, because the expectation value of \tilde{S}_Y^2 is smaller than the population variance (true variance) by that factor. For this reason, \tilde{S}_Y^2 is referred to as the
biased sample variance.
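The bias factor (n − 1)/n can be observed by simulation. The following Python sketch (an illustrative addition; the population, sample size and number of trials are arbitrary choices) repeatedly draws samples with replacement and compares the average of the biased sample variance with the population variance.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
population = rng.uniform(0.0, 100.0, size=1000)   # arbitrary finite population
sigma2 = np.var(population)                        # population variance (1/N convention)

n, trials = 5, 100_000
samples = rng.choice(population, size=(trials, n), replace=True)
biased = samples.var(axis=1, ddof=0)               # divides by n

print(biased.mean() / sigma2)   # close to (n - 1) / n
print((n - 1) / n)
</syntaxhighlight>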
=== Unbiased sample variance ===
Correcting for this bias yields the
unbiased sample variance, denoted S^2: S^2 = \frac{n}{n - 1} \tilde{S}_Y^2 = \frac{n}{n - 1} \left[ \frac{1}{n} \sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 \right] = \frac{1}{n - 1} \sum_{i=1}^n \left(Y_i - \overline{Y} \right)^2 . Either estimator may be simply referred to as the
sample variance when the version can be determined by context. The same proof is also applicable for samples taken from a continuous probability distribution. The use of the term n − 1 is called
Bessel's correction, and it is also used in
sample covariance and the
sample standard deviation (the square root of variance). The square root is a
concave function and thus introduces negative bias (by
Jensen's inequality), which depends on the distribution, and thus the corrected sample standard deviation (using Bessel's correction) is biased. The
unbiased estimation of standard deviation is a technically involved problem, though for the normal distribution using the term n − 1.5 yields an almost unbiased estimator. The unbiased sample variance is a
U-statistic for the function f(y_1, y_2) = (y_1 - y_2)^2/2, meaning that it is obtained by averaging a 2-sample statistic over 2-element subsets of the population.
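The U-statistic view can be made concrete with a small Python sketch (an illustrative addition; the sample values are arbitrary): averaging (y_i − y_j)²/2 over all 2-element subsets of the sample reproduces the unbiased sample variance.

<syntaxhighlight lang="python">
from itertools import combinations
import statistics

y = [3.1, 4.7, 5.0, 6.2, 9.4]   # arbitrary sample

pairs = list(combinations(y, 2))
u_stat = sum((a - b) ** 2 / 2 for a, b in pairs) / len(pairs)

print(u_stat)                    # same value as below
print(statistics.variance(y))    # unbiased sample variance (divides by n - 1)
</syntaxhighlight>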
=== Example ===
For a set of twelve numbers, if this set is the whole data population for some measurement, then the variance is the population variance: the sum of the squared deviations about the mean of the set divided by 12, the number of set members, giving 932.743 in this example. If the set is instead a sample from the whole population, then the unbiased sample variance is the sum of the squared deviations about the sample mean divided by 11 instead of 12, giving 1017.538. The function VAR.S in
Microsoft Excel gives the unbiased sample variance, while VAR.P gives the population variance.
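For comparison, the Python standard library provides the same pair of estimators (a brief illustrative note; the data values below are arbitrary).

<syntaxhighlight lang="python">
import statistics

data = [4.0, 7.5, 9.0, 12.5, 20.0]   # arbitrary sample

print(statistics.variance(data))    # unbiased sample variance, like Excel's VAR.S (divides by n - 1)
print(statistics.pvariance(data))   # population variance, like Excel's VAR.P (divides by n)
</syntaxhighlight>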
== Distribution of the sample variance ==
Being a function of
random variables, the sample variance is itself a random variable, and it is natural to study its distribution. In the case that
Y_i are independent observations from a
normal distribution,
Cochran's theorem shows that the
unbiased sample variance S^2 follows a scaled
chi-squared distribution (see also:
asymptotic properties and an
elementary proof): (n - 1) \frac{S^2}{\sigma^2} \sim \chi^2_{n-1} , where \sigma^2 is the
population variance. As a direct consequence, it follows that \operatorname{E}\left(S^2\right) = \operatorname{E}\left(\frac{\sigma^2}{n - 1} \chi^2_{n-1}\right) = \sigma^2 , and \operatorname{Var}\left[S^2\right] = \operatorname{Var}\left(\frac{\sigma^2}{n - 1} \chi^2_{n-1}\right) = \frac{\sigma^4}{{\left(n - 1\right)}^2} \operatorname{Var}\left(\chi^2_{n-1}\right) = \frac{2\sigma^4}{n - 1}. If the Y_i are independent and identically distributed, but not necessarily normally distributed, then \operatorname{E}\left[S^2\right] = \sigma^2; \quad \operatorname{Var}\left[S^2\right] = \frac{\sigma^4}{n} \left(\kappa - 1 + \frac{2}{n - 1} \right) = \frac{1}{n} \left(\mu_4 - \frac{n - 3}{n - 1}\sigma^4\right), where
κ is the
kurtosis of the distribution and \mu_4 is the fourth
central moment. If the conditions of the
law of large numbers hold for the squared observations, S^2 is a
consistent estimator of \sigma^2. Indeed, one can see that the variance of the estimator tends asymptotically to zero. An asymptotically equivalent formula was given in Kenney and Keeping (1951:164), Rose and Smith (2002:264), and Weisstein (n.d.).
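The normal-sample results above can be checked by simulation. The following Python sketch (an illustrative addition; σ², the sample size and the number of trials are arbitrary choices) draws normal samples and compares the empirical mean and variance of S² with σ² and 2σ⁴/(n − 1).

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 9.0
n, trials = 8, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
s2 = samples.var(axis=1, ddof=1)          # unbiased sample variance per trial

print(s2.mean(), sigma2)                  # E[S^2] = sigma^2
print(s2.var(), 2 * sigma2**2 / (n - 1))  # Var[S^2] = 2 sigma^4 / (n - 1) for normal data
</syntaxhighlight>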
== Samuelson's inequality ==
Samuelson's inequality is a result that states bounds on the values that individual observations in a sample can take, given that the sample mean and (biased) variance have been calculated. Values must lie within the limits {{tmath|\bar y \pm \sigma_Y (n-1)^{1/2} }}.
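A minimal Python check of this bound (an illustrative addition; the sample is arbitrary), using the biased (1/n) standard deviation:

<syntaxhighlight lang="python">
import numpy as np

y = np.array([1.0, 2.0, 2.5, 4.0, 10.0])   # arbitrary sample
n = len(y)
mean = y.mean()
sigma = y.std(ddof=0)                       # biased standard deviation

lower = mean - sigma * np.sqrt(n - 1)
upper = mean + sigma * np.sqrt(n - 1)
print(lower, upper)
print(np.all((y >= lower) & (y <= upper)))  # True: every observation lies within the limits
</syntaxhighlight>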
== Effect of adding one observation on variance ==
When a single new observation x_{n+1} is added to a set of n observations with mean \bar{x}_n and variance s_n^2, the new variance s_{n+1}^2 can be expressed using a recursive updating formula. Based on the identity for the sum of squares provided by Chan et al. (1983): :n s_{n+1}^2 = (n-1)s_n^2 + \frac{n}{n+1}(x_{n+1} - \bar{x}_n)^2 From this relationship, the impact of the new observation on the variance depends on its distance from the current mean. If x_{n+1} = \bar{x}_n \pm s_n \sqrt{\frac{n+1}{n}}, then the
variance will remain unchanged. Accordingly, if the new observation is closer to the mean, that is, if |x_{n+1} - \bar{x}_n| < s_n \sqrt{\frac{n+1}{n}}, then the variance will decrease, and if it is further from the mean, that is, if |x_{n+1} - \bar{x}_n| > s_n \sqrt{\frac{n+1}{n}}, then the variance will increase.
=== Derivation for sample variance ===
Using the updating formula for the sum of squares (SS): :SS_{n+1} = SS_n + \frac{n}{n+1}(x_{n+1} - \bar{x}_n)^2 Substituting the relationship for sample variance (SS = (n-1)s^2): :n s^2_{n+1} = (n-1)s^2_n + \frac{n}{n+1}(x_{n+1} - \bar{x}_n)^2 Setting s^2_{n+1} = s^2_n: :n s^2_n = (n-1)s^2_n + \frac{n}{n+1}(x_{n+1} - \bar{x}_n)^2 :s^2_n = \frac{n}{n+1}(x_{n+1} - \bar{x}_n)^2 Solving for x_{n+1} yields: :x_{n+1} = \bar{x}_n \pm s_n \sqrt{\frac{n+1}{n}}
=== Derivation for population variance ===
For population variance (\sigma^2 = \frac{SS}{n}), the updating formula is: :(n+1)\sigma^2_{n+1} = n\sigma^2_n + \frac{n}{n+1}(x_{n+1} - \mu_n)^2 Setting \sigma^2_{n+1} = \sigma^2_n: :(n+1)\sigma^2_n = n\sigma^2_n + \frac{n}{n+1}(x_{n+1} - \mu_n)^2 :\sigma^2_n = \frac{n}{n+1}(x_{n+1} - \mu_n)^2 Solving for x_{n+1} yields: :x_{n+1} = \mu_n \pm \sigma_n \sqrt{\frac{n+1}{n}}
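The updating formula lends itself to incremental computation. Below is a Python sketch (an illustrative addition, not part of the original text; the data values are arbitrary) that updates the running mean and the sum of squared deviations when one observation is added, using the Chan et al. identity, and checks the result against a direct computation.

<syntaxhighlight lang="python">
import statistics

def add_observation(n, mean, ss, x_new):
    """Update count, mean, and sum of squared deviations (SS) with one new value."""
    ss_new = ss + n / (n + 1) * (x_new - mean) ** 2   # Chan et al. (1983) identity
    mean_new = mean + (x_new - mean) / (n + 1)
    return n + 1, mean_new, ss_new

data = [12.0, 15.0, 9.0, 20.0]          # arbitrary running data
n, mean, ss = 1, data[0], 0.0
for x in data[1:]:
    n, mean, ss = add_observation(n, mean, ss, x)

print(ss / (n - 1), statistics.variance(data))    # sample variance (divide SS by n - 1)
print(ss / n, statistics.pvariance(data))         # population variance (divide SS by n)
</syntaxhighlight>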
== Relations with the harmonic and arithmetic means ==
It has been shown that for a sample of positive real numbers, \sigma_y^2 \le 2y_{\max} (A - H) , where y_{\max} is the maximum of the sample, A is the arithmetic mean, H is the
harmonic mean of the sample and \sigma_y^2 is the (biased) variance of the sample. This bound has been improved, and it is known that the variance is bounded by \begin{align} \sigma_y^2 &\le \frac{y_{\max} (A - H)(y_\max - A)}{y_\max - H}, \\[1ex] \sigma_y^2 &\ge \frac{y_{\min} (A - H)(A - y_\min)}{H - y_\min}, \end{align} where y_\min is the minimum of the sample.

== Tests of equality of variances ==