Let T: X → X be a measure-preserving transformation on a measure space (X, Σ, μ) and suppose f is a μ-integrable function, i.e. f ∈ L^1(μ). Then we define the following
averages:
{{block indent | text = Time average: This is defined as the average (if it exists) over iterations of T starting from some initial point x: \hat f(x) = \lim_{n\to\infty}\; \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x).}}
{{block indent | text = Space average: If μ(X) is finite and nonzero, we can consider the space or phase average of f: \bar f = \frac 1{\mu(X)} \int f\,d\mu. \quad \text{ (For a probability space, } \mu(X)=1.) }}
In general the time average and space average may be different. But if the transformation is ergodic, and the measure is invariant, then the time average is equal to the space average
almost everywhere. This is the celebrated ergodic theorem, in an abstract form due to George David Birkhoff. (Actually, Birkhoff's paper considers not the abstract general case but only the case of dynamical systems arising from differential equations on a smooth manifold.) The equidistribution theorem is a special case of the ergodic theorem, dealing specifically with the distribution of probabilities on the unit interval.

More precisely, the
pointwise or strong ergodic theorem states that the limit in the definition of the time average of f exists for almost every x and that the (almost everywhere defined) limit function \hat f is integrable: \hat f \in L^1(\mu). Furthermore, \hat f is T-invariant, that is to say \hat f \circ T = \hat f holds almost everywhere, and if μ(X) is finite, then the normalization is the same: \int \hat f\, d\mu = \int f\, d\mu. In particular, if
T is ergodic, then \hat f must be a constant (almost everywhere), and so one has that \bar f = \hat f almost everywhere. Joining the first to the last claim and assuming that μ(X) is finite and nonzero, one has that \lim_{n\to\infty}\; \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \frac 1 {\mu(X)} \int f\,d\mu for almost all x, i.e., for all x except for a set of measure zero. For an ergodic transformation, the time average equals the space average almost surely.

As an example, assume that the measure space (
X, Σ, μ) models the particles of a gas as above, and let f(x) denote the velocity of the particle at position x. Then the pointwise ergodic theorem says that the average velocity of all particles at some given time is equal to the average velocity of one particle over time.

A generalization of Birkhoff's theorem is Kingman's subadditive ergodic theorem.

==Probabilistic formulation: Birkhoff–Khinchin theorem==
Birkhoff–Khinchin theorem. Let f be measurable, E(|f|) < ∞, and T be a measure-preserving map. Then with probability 1: \lim_{n\to\infty}\; \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x)=E(f \mid \mathcal{C})(x), where E(f \mid \mathcal{C}) is the conditional expectation given the σ-algebra \mathcal{C} of invariant sets of T.
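
The statement above can be checked numerically in a simple non-ergodic case. The following is a minimal sketch, not taken from the source: it assumes the rotation of the unit interval [0, 1) by 1/4 (which preserves Lebesgue measure but is not ergodic, since every orbit has exactly four points) and the test function f(x) = x. The names `time_average`, `rotate`, `f` and `conditional_expectation` are illustrative choices. For this map the invariant sets are exactly those unchanged by a shift of 1/4, so the conditional expectation at x is the mean of f over the four points of the orbit of x.

```python
def time_average(f, T, x, n):
    """Return (1/n) * sum_{k=0}^{n-1} f(T^k x) by direct iteration."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n


# Assumed example map: rotation of [0, 1) by 1/Q. It preserves Lebesgue
# measure but is not ergodic, because every orbit is a finite set of Q points.
Q = 4


def rotate(x):
    return (x + 1.0 / Q) % 1.0


def f(x):
    # Assumed test function; its space average over [0, 1) is 1/2.
    return x


def conditional_expectation(f, x):
    """E(f | C)(x) for the rotation by 1/Q: the mean of f over the orbit of x."""
    return sum(f((x + j / Q) % 1.0) for j in range(Q)) / Q


if __name__ == "__main__":
    for x0 in (0.1, 0.2):
        # n is a multiple of Q, so the time average covers whole orbits.
        print(x0, time_average(f, rotate, x0, 4000), conditional_expectation(f, x0))
```

For the starting point x = 0.1 both quantities come out near 0.475, while the space average of f is 0.5. The time average therefore depends on the starting point and agrees with the conditional expectation rather than with E(f), which is what the theorem predicts when T is not ergodic.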
Corollary (Pointwise Ergodic Theorem): In particular, if T is also ergodic, then \mathcal{C} is the trivial σ-algebra, and thus with probability 1: \lim_{n\to\infty}\; \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x)=E(f).

==Mean ergodic theorem==