For X_1, X_2, ..., X_n independent and identically distributed random variables with values in \mathbb{R} and cumulative distribution function F(x), the empirical distribution function is defined as
:F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),
where I_C is the indicator function of the set C. For every fixed x, F_n(x) is a sequence of random variables that converges to F(x) almost surely by the strong law of large numbers. That is, F_n converges to F pointwise. Glivenko and Cantelli strengthened this result by proving that F_n converges to F uniformly in x, a result known as the Glivenko–Cantelli theorem.

A centered and scaled version of the empirical measure is the
signed measure
:G_n(A)=\sqrt{n}(P_n(A)-P(A))
It induces a map on measurable functions f given by
:f\mapsto G_n f=\sqrt{n}(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right)
By the central limit theorem, G_n(A) converges in distribution to a normal random variable N(0, P(A)(1 − P(A))) for a fixed measurable set A. Similarly, for a fixed function f, G_n f converges in distribution to a normal random variable N(0,\mathbb{E}(f-\mathbb{E}f)^2), provided that \mathbb{E}f and \mathbb{E}f^2 exist.
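
The quantities above can be checked numerically. The following is a minimal Python sketch, not a reference implementation: it assumes the X_i are Uniform(0, 1), so that F(x) = x on [0, 1], and takes A = (−∞, x], so that P_n(A) = F_n(x) and P(A) = F(x). The function names (`empirical_cdf`, `sup_distance_uniform`, `G_n_halfline`), the seed, and the sample sizes are illustrative choices rather than anything fixed by the theory.

```python
import bisect
import math
import random


def empirical_cdf(sample):
    """Return F_n as a function: F_n(x) = (1/n) * #{i : X_i <= x}."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the observations that are <= x
        return bisect.bisect_right(ordered, x) / n

    return F_n


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| under the assumption F(x) = x on [0, 1]."""
    ordered = sorted(sample)
    n = len(ordered)
    # F_n is a step function, so the supremum is attained at a jump point:
    # compare F(x_i) with F_n at the jump (i/n) and just before it ((i-1)/n).
    return max(
        max(i / n - x, x - (i - 1) / n)
        for i, x in enumerate(ordered, start=1)
    )


def G_n_halfline(sample, x):
    """G_n(A) = sqrt(n) * (P_n(A) - P(A)) for A = (-inf, x], Uniform(0, 1) data."""
    n = len(sample)
    p = min(max(x, 0.0), 1.0)  # P(A) = F(x) under the Uniform(0, 1) assumption
    return math.sqrt(n) * (empirical_cdf(sample)(x) - p)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility

# Uniform convergence: the sup distance should shrink as n grows.
for n in (100, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance_uniform(sample), 4))

# Fixed set A: the variance of G_n(A) should be close to P(A)(1 - P(A)).
x, n, reps = 0.3, 500, 2000
draws = [G_n_halfline([rng.random() for _ in range(n)], x) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
print(round(var, 3), "vs", x * (1 - x))
```

The first loop illustrates the Glivenko–Cantelli statement on a single sample path per n, while the second estimates the variance of G_n(A) across repeated samples and compares it with P(A)(1 − P(A)). Other distributions can be substituted by replacing the assumed F in `sup_distance_uniform` and `G_n_halfline`.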
Definition
:\bigl(G_n(c)\bigr)_{c\in\mathcal{C}} is called an empirical process indexed by \mathcal{C}, a collection of measurable subsets of S.
:\bigl(G_nf\bigr)_{f\in\mathcal{F}} is called an empirical process indexed by \mathcal{F}, a collection of measurable functions from S to \mathbb{R}.

A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.

==Example==