===Sample variance===
The sample variance highlights two different issues about bias and risk. First, the “naive” estimator that divides by n is biased downward because the sample mean is estimated from the same data. Multiplying by n/(n−1) (Bessel’s correction) yields an unbiased estimator. Second, unbiasedness does not imply minimum mean squared error.
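To make the first point concrete, the following minimal Python sketch computes both versions of the sample variance and applies the n/(n−1) rescaling; the data values are invented solely for illustration.

<syntaxhighlight lang="python">
# Minimal sketch of the two divisors (the data values are made up for
# illustration only): the naive estimator divides by n, Bessel's
# correction rescales it by n/(n - 1).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

naive_var = sum((x - mean) ** 2 for x in data) / n   # divides by n, biased downward
corrected_var = naive_var * n / (n - 1)              # Bessel's correction applied

print(naive_var, corrected_var)   # 4.0 and about 4.571
</syntaxhighlight>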
Suppose X₁, …, Xₙ are independent and identically distributed (i.i.d.) random variables with expectation μ and variance σ². If the sample mean and the uncorrected sample variance are defined as

\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i, \qquad S^2 = \frac{1}{n} \sum_{i=1}^n \left(X_i - \overline{X}\right)^2,

then S² is a biased estimator of σ². This follows immediately from the law of total variance, since \operatorname{E}\left[X \mid \overline{X}\right] = \overline{X} and \operatorname{Var}(\overline{X}) = \sigma^2/n for i.i.d. variables:

\underbrace{\operatorname{Var}(X)}_{\sigma^2} = \underbrace{\operatorname{E}\left[\operatorname{Var}\left(X \mid \overline{X}\right)\right]}_{\operatorname{E}[S^2]} + \underbrace{\operatorname{Var}\left(\operatorname{E}\left[X \mid \overline{X}\right]\right)}_{\sigma^2/n} \quad \implies \quad \operatorname{E}[S^2] = \frac{n-1}{n}\,\sigma^2.

In other words, the expected value of the uncorrected sample variance does not equal the population variance σ² unless it is multiplied by the normalization factor n/(n−1). This factor, the ratio between the unbiased and the biased (uncorrected) estimates of the variance, is known as Bessel's correction. The sample mean, on the other hand, is an unbiased estimator of the population mean μ.
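The relation E[S²] = (n−1)/n · σ², along with the unbiasedness of the corrected variance and of the sample mean, can also be checked empirically. The sketch below is illustrative only: the normal distribution and the particular values of μ, σ, n and the replication count are arbitrary choices, not part of the argument above.

<syntaxhighlight lang="python">
# Monte Carlo check of E[S^2] = (n - 1)/n * sigma^2. The normal
# distribution and the values of mu, sigma, n and reps are arbitrary
# illustrative choices.
import random

mu, sigma, n, reps = 3.0, 2.0, 5, 200_000
random.seed(0)

s2_total = corrected_total = mean_total = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n   # uncorrected sample variance
    s2_total += s2
    corrected_total += s2 * n / (n - 1)         # Bessel-corrected version
    mean_total += xbar

print(s2_total / reps, (n - 1) / n * sigma**2)  # both about 3.2
print(corrected_total / reps, sigma**2)         # both about 4.0
print(mean_total / reps, mu)                    # both about 3.0
</syntaxhighlight>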
===Estimating a Poisson probability===
Suppose that X has a Poisson distribution with expectation λ, and suppose it is desired to estimate

\operatorname{P}(X=0)^2 = e^{-2\lambda}

with a sample of size 1. (For example, when incoming calls at a telephone switchboard are modeled as a Poisson process and λ is the average number of calls per minute, then e^{−2λ} (the estimand) is the probability that no calls arrive in the next two minutes.)

Since the expectation of an unbiased estimator δ(X) is equal to the estimand, i.e.

\operatorname{E}(\delta(X)) = \sum_{x=0}^\infty \delta(x) \frac{\lambda^x e^{-\lambda}}{x!} = e^{-2\lambda},

the only function of the data constituting an unbiased estimator is

\delta(x) = (-1)^x.

To see this, note that when e^{−λ} is factored out of the above expression for the expectation, the sum that is left is a Taylor series expansion of e^{−λ} as well, yielding e^{−λ} · e^{−λ} = e^{−2λ} (see Characterizations of the exponential function).
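The claim is easy to verify numerically: summing δ(x)·P(X = x) over x reproduces e^{−2λ}. In the sketch below, λ = 1.7 is an arbitrary test value chosen only for illustration.

<syntaxhighlight lang="python">
# Numerical check that E[(-1)^X] = e^{-2*lambda} for X ~ Poisson(lambda),
# by summing the series term by term. The value lambda = 1.7 is an
# arbitrary test value.
import math

lam = 1.7
term = math.exp(-lam)      # x = 0 term: (-1)^0 * lam^0 * e^{-lam} / 0!
expectation = term
for x in range(1, 100):
    term *= -lam / x       # step from the (x-1)-th term to the x-th term
    expectation += term

print(expectation, math.exp(-2 * lam))   # agree to floating-point precision
</syntaxhighlight>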
If the observed value of X is 100, then the estimate is 1, although the true value of the quantity being estimated is very likely to be near 0, which is the opposite extreme. And if X is observed to be 101, then the estimate is even more absurd: it is −1, although the quantity being estimated must be positive.

The (biased) maximum likelihood estimator

e^{-2X}

is far better than this unbiased estimator. Not only is its value always positive, but it is also more accurate in the sense that its mean squared error

e^{-4\lambda} - 2e^{\lambda(1/e^2 - 3)} + e^{\lambda(1/e^4 - 1)}

is smaller; compare the unbiased estimator's MSE of

1 - e^{-4\lambda}.

The MSEs are functions of the true value λ. The bias of the maximum-likelihood estimator is

e^{\lambda(1/e^2 - 1)} - e^{-2\lambda}.
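A rough Monte Carlo sketch makes the comparison tangible. Everything specific in the code below (λ = 1, the number of replications, and the small product-of-uniforms Poisson sampler) is an assumption chosen for illustration; the two printed pairs should match the closed-form MSEs given above.

<syntaxhighlight lang="python">
# Monte Carlo comparison of the two estimators of e^{-2*lambda} against the
# closed-form MSEs quoted above. lambda = 1.0 and the replication count are
# arbitrary illustrative choices; the Poisson sampler is a standard
# Knuth-style product-of-uniforms method, not anything from the text.
import math
import random

lam, reps = 1.0, 500_000
target = math.exp(-2 * lam)
random.seed(0)

def poisson(rate):
    threshold, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

mse_unbiased = mse_mle = 0.0
for _ in range(reps):
    x = poisson(lam)
    mse_unbiased += ((-1) ** x - target) ** 2    # estimator delta(X) = (-1)^X
    mse_mle += (math.exp(-2 * x) - target) ** 2  # MLE e^{-2X}

print(mse_unbiased / reps, 1 - math.exp(-4 * lam))            # both about 0.98
print(mse_mle / reps,
      math.exp(-4 * lam)
      - 2 * math.exp(lam * (1 / math.e**2 - 3))
      + math.exp(lam * (1 / math.e**4 - 1)))                  # both about 0.28
</syntaxhighlight>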
===Maximum of a discrete uniform distribution===
The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 to n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X given n is only (n + 1)/2; we can be certain only that n is at least X and is probably more. In this case, the natural unbiased estimator is 2X − 1.
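As a quick illustration, the following sketch repeats the single-draw experiment many times; n = 37 and the number of repetitions are arbitrary choices made only for this demonstration.

<syntaxhighlight lang="python">
# Simulation of the ticket-in-a-box example. n = 37 and the replication
# count are arbitrary illustrative choices: the MLE X averages to about
# (n + 1)/2, while 2X - 1 averages to about n.
import random

n, reps = 37, 200_000
random.seed(0)

draws = [random.randint(1, n) for _ in range(reps)]   # one ticket per experiment
avg_mle = sum(draws) / reps
avg_unbiased = sum(2 * x - 1 for x in draws) / reps

print(avg_mle, (n + 1) / 2)   # about 19.0 in both cases
print(avg_unbiased, n)        # about 37.0 in both cases
</syntaxhighlight>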
==Median-unbiased estimators==