Let us assume the canonical process with H represented by 1 and T represented by 0. The law of large numbers states that the average of the sequence, i.e., \bar{X}_{n}:=\frac{1}{n}\sum_{i=1}^{n}X_{i}, will approach the expected value almost surely; that is, the events which do not satisfy this limit have zero probability. Since H is represented by 1, the expected value of a single flip is simply the probability of flipping heads:

:\mathbb{E}[X_i]=\mathbb{P}([X_i=1])=p

for any given random variable X_i out of the infinite sequence of Bernoulli trials that compose the Bernoulli process.
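A minimal simulation sketch of this convergence (the seed and the choice p = 0.3 are illustrative, not taken from the text above):

<syntaxhighlight lang="python">
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample_mean(p, n):
    """Average of n Bernoulli(p) flips, with H = 1 and T = 0."""
    return sum(rng.random() < p for _ in range(n)) / n

p = 0.3  # probability of heads
for n in (10, 100, 10_000, 1_000_000):
    # By the law of large numbers, this average approaches E[X_i] = p.
    print(n, sample_mean(p, n))
</syntaxhighlight>

Each printed average lies closer to p as n grows, although for any single run deviations of order 1/\sqrt{n} remain; quantifying those deviations is the content of the central limit theorem discussed below.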
One is often interested in knowing how often one will observe H in a sequence of n coin flips. This is given by simply counting: given n successive coin flips, that is, given the set of all possible strings of length n, the number N(k,n) of such strings that contain k occurrences of H is given by the binomial coefficient

:N(k,n) = {n \choose k}=\frac{n!}{k! (n-k)!}
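For example, for n=3 and k=2 one has N(2,3)={3 \choose 2}=3, counting the three strings HHT, HTH and THH.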
If the probability of flipping heads is given by p, then the total probability of seeing a string of length n with k heads is

:\mathbb{P}([S_n=k]) = {n\choose k} p^k (1-p)^{n-k},

where S_n=\sum_{i=1}^{n}X_i. The probability measure thus defined is known as the binomial distribution.
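A minimal sketch checking this formula numerically (the helper name and the parameters n = 5, p = 0.3 are illustrative choices, not from the text):

<syntaxhighlight lang="python">
from math import comb

def binomial_pmf(k, n, p):
    """P([S_n = k]): choose which k of the n flips are heads,
    then multiply the probabilities of the independent flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.3
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # 1.0: the pmf sums to one
print(binomial_pmf(2, n, p))  # probability of exactly two heads in five flips
</syntaxhighlight>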
As can be seen from the above formula, if n=1, the binomial distribution reduces to the Bernoulli distribution; the Bernoulli distribution is thus exactly the special case of the binomial distribution with n=1.
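Explicitly, setting n=1 in the formula above leaves only the outcomes k=0 and k=1, with {1 \choose 0}={1 \choose 1}=1, so

:\mathbb{P}([S_1=k]) = p^k (1-p)^{1-k},\qquad k\in\{0,1\},

which is precisely the Bernoulli distribution with parameter p.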
Of particular interest is the question of the value of S_{n} for sufficiently long sequences of coin flips, that is, for the limit n\to\infty. In this case, one may make use of Stirling's approximation to the factorial, and write

:n! = \sqrt{2\pi n} \;n^n e^{-n} \left(1 + \mathcal{O}\left(\frac{1}{n}\right)\right)

Inserting this into the expression for \mathbb{P}([S_n=k]), one obtains the normal distribution; this is the content of the central limit theorem, of which this is the simplest example.
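Carrying the substitution through (a sketch of the de Moivre–Laplace computation, with the intermediate algebra omitted) gives, for k in the neighborhood of the peak np,

:\mathbb{P}([S_n=k]) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \exp\left(-\frac{(k-np)^2}{2np(1-p)}\right),

that is, a Gaussian centered at np with width \sqrt{np(1-p)}.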
The combination of the law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the asymptotic equipartition property. Put informally, one notes that, yes, over many coin flips, one will observe H exactly a fraction p of the time, and that this corresponds exactly with the peak of the Gaussian. The asymptotic equipartition property essentially states that this peak is infinitely sharp, with infinitely steep fall-off on either side. That is, given the set of all possible infinitely long strings of H and T occurring in the Bernoulli process, this set is partitioned into two: the typical strings, which collectively occur with probability 1, and the atypical strings, which collectively occur with probability 0. This partitioning is known as the Kolmogorov 0-1 law.
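In symbols, the asymptotic equipartition property states that, for almost every string x_1 x_2 \cdots produced by the process,

:-\frac{1}{n}\log_2 \mathbb{P}([X_1=x_1,\dots,X_n=x_n]) \to H \quad \text{as } n\to\infty,

so that each typical string of length n has probability close to 2^{-nH}, where H is the entropy derived below.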
The size of this typical set is interesting, also, and can be explicitly determined: its logarithm, divided by n, is exactly the entropy of the Bernoulli process. Once again, consider the set of all strings of length n. The size of this set is 2^n. Of these, only a certain subset is likely; the size of this subset is 2^{nH} for H\le 1. By using Stirling's approximation, putting it into the expression for \mathbb{P}([S_n=k]), solving for the location and width of the peak, and finally taking n\to\infty, one finds that

:H=-p\log_2 p - (1-p)\log_2(1-p)

This value is the Bernoulli entropy of a Bernoulli process. Here, H stands for entropy; it is not to be confused with the same symbol H standing for heads.
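A minimal numerical check of this counting argument (the choice p = 0.3 is arbitrary): the number of strings with exactly k = pn heads is {n \choose pn}, so \tfrac{1}{n}\log_2 {n \choose pn} should approach H as n grows.

<syntaxhighlight lang="python">
from math import comb, log2

def bernoulli_entropy(p):
    """H = -p log2 p - (1-p) log2 (1-p), in bits per flip."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

p = 0.3
print(bernoulli_entropy(p))  # about 0.8813 bits

# (1/n) log2 of the number of strings with round(p*n) heads converges
# to the entropy, illustrating the 2^{nH} count of likely strings.
for n in (10, 100, 1000, 10000):
    k = round(p * n)
    print(n, log2(comb(n, k)) / n)
</syntaxhighlight>

The printed values approach H from below, with a correction of order (\log_2 n)/n coming from the \sqrt{2\pi n} prefactor in Stirling's approximation.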
John von Neumann posed a question about the Bernoulli process regarding the possibility of a given process being isomorphic to another, in the sense of the isomorphism of dynamical systems. The question long defied analysis, but was finally and completely answered with the Ornstein isomorphism theorem. This breakthrough resulted in the understanding that the Bernoulli process is unique and universal; in a certain sense, it is the single most random process possible; nothing is "more" random than the Bernoulli process (although one must be careful with this informal statement: in the standard hierarchy of dynamical systems, the Bernoulli property is the strongest one, as every Bernoulli process is mixing and every mixing system is ergodic, but not conversely. In particular, a mixing system need not consist of independent random variables: indeed, many purely deterministic, non-random systems can be mixing).

==Dynamical systems==