Two versions of the law of large numbers are described below: the strong law of large numbers and the weak law of large numbers. In both versions, mutual independence of the random variables can be replaced by pairwise independence or exchangeability. The two versions differ in the mode of convergence they assert. For an interpretation of these modes, see Convergence of random variables.
===Weak law===
The weak law of large numbers (also called Khinchin's law) states that, given a collection of independent and identically distributed (i.i.d.) samples X_1, X_2, \ldots from a random variable with finite mean μ, the sample mean \overline{X}_n = \tfrac{1}{n}(X_1 + \cdots + X_n) converges in probability to the expected value:
{{NumBlk|| \overline{X}_n\ \overset{P}{\rightarrow}\ \mu \qquad\textrm{when}\ n \to \infty. |}}
That is, for any positive number ε,
\lim_{n\to\infty}\Pr\!\left(\,|\overline{X}_n-\mu| < \varepsilon\,\right) = 1.
Interpreting this result, the weak law states that for any nonzero margin specified (ε), no matter how small, a sufficiently large sample gives a very high probability that the average of the observations is close to the expected value; that is, within the margin.

The weak law is stated above for i.i.d. random variables, but it also applies in some other cases. For example, the variance may differ from one random variable in the series to the next while the expected value stays constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, the law can be applied to the average deviation from the respective expected values; it then states that this average converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first n values goes to zero as n goes to infinity.

===Strong law===
The strong law of large numbers states that the sample mean converges almost surely to the expected value:
{{NumBlk|| \overline{X}_n\ \overset{\text{a.s.}}{\longrightarrow}\ \mu \qquad\textrm{when}\ n \to \infty. |}}
That is,
\Pr\!\left( \lim_{n\to\infty}\overline{X}_n = \mu \right) = 1.
What this means is that, as the number of trials
n goes to infinity, the average of the observations converges to the expected value with probability one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence.

If the summands are independent but not identically distributed, then
{{NumBlk|| \overline{X}_n - \operatorname{E}\big[\overline{X}_n\big]\ \overset{\text{a.s.}}{\longrightarrow}\ 0, |}}
provided that each X_k has a finite second moment and
\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.
This statement is known as ''Kolmogorov's strong law''.
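As an informal illustration of the two modes of convergence, the following Python sketch uses Uniform(0, 1) draws (so μ = 0.5); that distribution, and the helper names `deviation_probability` and `max_tail_deviation`, are assumptions made only for this example. The first helper estimates by Monte Carlo the probability that the sample mean misses μ by at least ε at one fixed n, which is the quantity the weak law drives to zero. The second follows a single sample path and reports the largest deviation of the running mean over a range of n, which is closer in spirit to the strong law. A finite simulation can only suggest either statement, not prove it.

```python
import random


def sample_mean(n, rng):
    """Mean of n independent Uniform(0, 1) draws (true mean mu = 0.5)."""
    return sum(rng.random() for _ in range(n)) / n


def deviation_probability(n, eps, trials, rng, mu=0.5):
    """Monte Carlo estimate of Pr(|sample mean of n draws - mu| >= eps)."""
    hits = sum(abs(sample_mean(n, rng) - mu) >= eps for _ in range(trials))
    return hits / trials


def max_tail_deviation(total, start, rng, mu=0.5):
    """Largest |running mean - mu| for n = start..total along ONE sample path."""
    running_sum = 0.0
    worst = 0.0
    for n in range(1, total + 1):
        running_sum += rng.random()
        if n >= start:
            worst = max(worst, abs(running_sum / n - mu))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
    for n in (10, 100, 1000):
        print(n, deviation_probability(n, eps=0.05, trials=1000, rng=rng))
    print(max_tail_deviation(total=100_000, start=50_000, rng=rng))
```

With ε = 0.05 the estimated probability should fall sharply as n grows from 10 to 1000; the exact figures depend on the seed and the number of trials.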
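===Differences between the weak law and the strong law===
The weak law states that for a specified large n, the average \overline{X}_n is likely to be near μ. Thus, it leaves open the possibility that |\overline{X}_n -\mu| > \varepsilon happens an infinite number of times, although at infrequent intervals. (Not necessarily |\overline{X}_n -\mu| \neq 0 for all n.) The strong law shows that this almost surely will not occur. That is, with probability 1, for any ε > 0 the inequality |\overline{X}_n -\mu| < \varepsilon holds for all large enough n.

The strong law does not hold in the following cases, but the weak law does.
{{ordered list
|1= Let X be an exponentially distributed random variable with parameter 1. The random variable \sin(X)e^X/X has no expected value in the sense of Lebesgue integration, but interpreting the integral as a conditionally convergent improper integral gives E\left(\frac{\sin(X)e^X}{X}\right) =\ \int_{x=0}^{\infty}\frac{\sin(x)e^x}{x}e^{-x}dx = \frac{\pi}{2}
|2= Let X be a geometrically distributed random variable with parameter 1/2. The random variable 2^X(-1)^X/X has no expected value in the conventional sense, because the series is not absolutely convergent, but using conditional convergence gives E\left(\frac{2^X(-1)^X}{X}\right) =\ \sum_{x=1}^{\infty}\frac{2^x(-1)^x}{x}2^{-x}=-\ln(2)
|3= If the cumulative distribution function F of a random variable is \begin{cases} 1-F(x)&=\frac{e}{2x\ln(x)},&x \ge e \\ F(x)&=\frac{e}{-2x\ln(-x)},&x \le -e \end{cases} then it has no expected value, but the weak law is true.
}}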
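The value −ln(2) in the second example above (the sum over x with weights 2^{-x}) can be checked numerically. The sketch below, whose function name `partial_sum` is hypothetical, adds up the terms (−1)^x/x, which is what 2^x(−1)^x/x multiplied by 2^{−x} simplifies to. It can also sum the absolute values 1/x instead, to show that the series is not absolutely convergent, which is why the expectation fails to exist in the conventional sense.

```python
import math


def partial_sum(terms, absolute=False):
    """Sum of (-1)**x / x for x = 1..terms.

    With absolute=True the absolute values 1/x are summed instead,
    giving a partial sum of the (divergent) harmonic series.
    """
    total = 0.0
    for x in range(1, terms + 1):
        term = (-1) ** x / x
        total += abs(term) if absolute else term
    return total


if __name__ == "__main__":
    for terms in (10, 1_000, 100_000):
        signed = partial_sum(terms)
        unsigned = partial_sum(terms, absolute=True)
        # The signed sums approach -ln(2); the unsigned sums keep growing.
        print(terms, signed, -math.log(2), unsigned)
```

The signed partial sums settle near −ln(2), while the sums of absolute values grow without bound (roughly like ln of the number of terms).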
===Uniform laws of large numbers===
There are extensions of the law of large numbers to collections of estimators, where the convergence is uniform over the collection; thus the name uniform law of large numbers.

Suppose f(x,θ) is some function defined for θ ∈ Θ, and continuous in θ. Then for any fixed θ, the sequence {f(X_1,θ), f(X_2,θ), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[f(X,θ)]. This is the pointwise (in θ) convergence.

A particular example of a uniform law of large numbers states the conditions under which the convergence happens uniformly in θ. If
• Θ is compact,
• f(x,θ) is continuous at each θ ∈ Θ for almost all x, and a measurable function of x at each θ,
• there exists a dominating function d(x) such that E[d(X)] < ∞ and \left\| f(x,\theta) \right\| \leq d(x) \quad\text{for all}\ \theta\in\Theta,
then E[f(X,θ)] is continuous in θ, and
\sup_{\theta\in\Theta} \left\| \frac 1 n \sum_{i=1}^n f(X_i,\theta) - \operatorname{E}[f(X,\theta)] \right\| \overset{\mathrm{P}}{\rightarrow} \ 0.
This result is useful to derive consistency of a large class of estimators (see Extremum estimator).
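The uniform statement can be illustrated with a small simulation. The sketch below is only an example under assumptions not taken from the text: it picks f(x,θ) = (x − θ)^2, X uniform on (0, 1), and Θ = [−1, 1], for which E[f(X,θ)] = 1/12 + (1/2 − θ)^2 and d(x) = (|x| + 1)^2 is a dominating function with finite mean. The supremum over Θ is approximated by a maximum over a finite grid `THETAS`, and all names are hypothetical.

```python
import random


def f(x, theta):
    """Example function f(x, theta) = (x - theta)^2, continuous in theta."""
    return (x - theta) ** 2


def expected_f(theta):
    """E[(X - theta)^2] for X ~ Uniform(0, 1): Var(X) + (E[X] - theta)^2."""
    return 1.0 / 12.0 + (0.5 - theta) ** 2


def sup_deviation(n, thetas, rng):
    """Max over the grid of |sample mean of f(X_i, theta) - E[f(X, theta)]|."""
    xs = [rng.random() for _ in range(n)]  # one sample shared by all theta
    worst = 0.0
    for theta in thetas:
        avg = sum(f(x, theta) for x in xs) / n
        worst = max(worst, abs(avg - expected_f(theta)))
    return worst


# Grid over the compact set Theta = [-1, 1], endpoints included.
THETAS = [k / 10 - 1 for k in range(21)]

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1_000, 10_000):
        print(n, sup_deviation(n, THETAS, rng))
```

The same sample is reused for every θ, which is what makes the maximum a statement about uniform rather than pointwise convergence; the printed values should tend to shrink as n grows.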
===Borel's law of large numbers===
'''Borel's law of large numbers''', named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event is expected to occur approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if E denotes the event in question, p its probability of occurrence, and N_n(E) the number of times E occurs in the first n trials, then with probability one,
\frac{N_n(E)}{n}\to p\text{ as }n\to\infty.
This theorem makes rigorous the intuitive notion of probability as the expected long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.

==Proof of the weak law==