== Exchangeability and the i.i.d. statistical model ==
The property of exchangeability is closely related to the use of independent and identically distributed (i.i.d.) random variables in statistical models. A sequence of random variables that are i.i.d., conditional on some underlying distributional form, is exchangeable. This follows directly from the structure of the joint probability distribution generated by the i.i.d. form: the joint distribution is a product of identical marginal factors, and a product is unchanged when its factors are permuted. Mixtures of exchangeable sequences (in particular, mixtures of sequences of i.i.d. variables) are also exchangeable. The converse can be established for infinite sequences through an important representation theorem by Bruno de Finetti (later extended by other probability theorists such as Halmos and Savage). The extended versions of the theorem show that in any infinite sequence of exchangeable random variables, the random variables are conditionally i.i.d., given the underlying distributional form. This theorem is stated briefly below. (De Finetti's original theorem only showed this to be true for random indicator variables, but it was later extended to more general random variables.) Another way of putting this is that de Finetti's theorem characterizes infinite exchangeable sequences as mixtures of i.i.d. sequences: while an exchangeable sequence need not itself be unconditionally i.i.d., it can be expressed as a mixture of underlying i.i.d. sequences.
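The claim that a mixture of i.i.d. sequences is exchangeable can be checked numerically in a small case. The sketch below assumes, purely for illustration, binary variables that are i.i.d. Bernoulli(θ) given θ, with θ drawn from a Beta(a, b) mixing distribution; all function names are hypothetical. It computes joint probabilities exactly with rational arithmetic using the standard Beta-Bernoulli (Pólya urn) predictive rule, confirms they agree with the mixture integral in closed form, and checks that they are unchanged by every permutation. A two-state Markov chain, which is not exchangeable, is included as a contrast.

```python
from fractions import Fraction
from itertools import permutations, product


def mixture_prob(seq, a=Fraction(1), b=Fraction(1)):
    """P(X_1=x_1, ..., X_n=x_n) when theta ~ Beta(a, b) (illustrative choice)
    and, given theta, the X_i are i.i.d. Bernoulli(theta).
    Uses the Beta-Bernoulli predictive rule
    P(X_{m+1} = 1 | x_1..x_m) = (a + #ones) / (a + b + m)."""
    prob, ones = Fraction(1), 0
    for m, x in enumerate(seq):
        p_one = (a + ones) / (a + b + m)
        prob *= p_one if x == 1 else 1 - p_one
        ones += x
    return prob


def closed_form(seq, a=Fraction(1), b=Fraction(1)):
    """Same probability from the mixture integral
    int theta^k (1 - theta)^(n - k) dBeta(a, b) = a^(k) b^(n-k) / (a+b)^(n),
    where x^(m) denotes the rising factorial x (x+1) ... (x+m-1)."""
    def rising(x, m):
        out = Fraction(1)
        for j in range(m):
            out *= x + j
        return out
    n, k = len(seq), sum(seq)
    return rising(a, k) * rising(b, n - k) / rising(a + b, n)


def markov_prob(seq, stay=Fraction(3, 4)):
    """Contrast case: a two-state Markov chain started uniformly at random
    that keeps its previous value with probability `stay`."""
    prob = Fraction(1, 2)
    for prev, cur in zip(seq, seq[1:]):
        prob *= stay if cur == prev else 1 - stay
    return prob


def is_exchangeable(prob, n):
    """True if every length-n 0/1 sequence has the same probability
    as each of its permutations."""
    for seq in product((0, 1), repeat=n):
        p = prob(seq)
        if any(prob(perm) != p for perm in set(permutations(seq))):
            return False
    return True


if __name__ == "__main__":
    a, b = Fraction(2), Fraction(3)
    mix = lambda s: mixture_prob(s, a, b)
    print(is_exchangeable(mix, 4))          # True
    print(is_exchangeable(markov_prob, 4))  # False
    print(mix((1, 1, 0, 0)), mix((0, 1, 0, 1)))  # 3/70 3/70
```

Exact fractions are used instead of floating point so that the permutation check is an equality test rather than a tolerance comparison. Under the mixture, the probability of a sequence depends only on how many ones it contains. This is precisely the symmetry that the Markov chain lacks, because its probabilities depend on how often consecutive values switch.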
The representation theorem: This statement is based on the presentation in O'Neill (2009) in the references below. Given an infinite sequence of random variables \mathbf{X}=(X_1,X_2,X_3,\ldots), we define the limiting empirical distribution function F_\mathbf{X} by
: F_\mathbf{X}(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^n I(X_i \le x).
(This is the Cesàro limit of the indicator functions. In cases where the Cesàro limit does not exist, this function can instead be defined as the Banach limit of the indicator functions, which extends the ordinary limit. A Banach limit exists for every bounded sequence, and sequences of indicator functions are bounded, so the empirical distribution is always well-defined.) If \mathbf{X} is exchangeable, then for every n the joint distribution function of (X_1,\ldots,X_n) is given by
: \Pr(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n) = \int \prod_{i=1}^n F_\mathbf{X}(x_i)\,dP(F_\mathbf{X}),
where P denotes the distribution of the random function F_\mathbf{X}. If the distribution function F_\mathbf{X} is indexed by another parameter \theta, then (with densities appropriately defined) we have
: p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \int \prod_{i=1}^n p_{X_i}(x_i\mid\theta)\,dP(\theta).
These equations characterise the joint distribution or density as a mixture distribution based on the underlying limiting empirical distribution (or on a parameter indexing this distribution).
Note that not all finite exchangeable sequences are mixtures of i.i.d. sequences. To see this, consider sampling without replacement from a finite set until no elements are left. The resulting sequence is exchangeable, but not a mixture of i.i.d. sequences. Indeed, conditioned on all other elements in the sequence, the remaining element is known. For example, drawing both balls without replacement from an urn containing one ball labelled 0 and one labelled 1 gives \Pr(X_1 = X_2) = 0, whereas any mixture of i.i.d. Bernoulli(\theta) pairs gives \Pr(X_1 = X_2) = \int (\theta^2 + (1-\theta)^2)\,dP(\theta) \ge 1/2.
== Covariance and correlation ==