=== Conditioning on an event ===
If A is an event in \mathcal{F} with nonzero probability, and X is a discrete random variable, the conditional expectation of X given A is
: \begin{aligned} \operatorname{E} (X \mid A) &= \sum_x x P(X = x \mid A) \\ & = \sum_x x \frac{P(\{X = x\} \cap A)}{P(A)} \end{aligned}
where the sum is taken over all possible outcomes of X. If P(A) = 0, the conditional expectation is undefined due to the division by zero.
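The definition can be checked on a small finite example. The following sketch (the die model and all names are illustrative, not part of the article) computes \operatorname{E}(X \mid A) directly from the formula above, for a fair six-sided die X and the event A that the roll is even:

<syntaxhighlight lang="python">
# Illustrative sketch: E(X | A) for a fair six-sided die X
# and the conditioning event A = {X is even}.
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # P(X = x)
A = {x for x in pmf if x % 2 == 0}              # the event A

P_A = sum(pmf[x] for x in A)                    # P(A) = 1/2; must be nonzero
# E(X | A) = sum_x x * P({X = x} ∩ A) / P(A); only x in A contribute
E_X_given_A = sum(x * pmf[x] for x in A) / P_A

print(E_X_given_A)  # 4, i.e. (2 + 4 + 6) / 3
</syntaxhighlight>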
=== Discrete random variables ===
If X and Y are discrete random variables, the conditional expectation of X given Y = y is
: \begin{aligned} \operatorname{E} (X \mid Y=y) &= \sum_x x P(X = x \mid Y = y) \\ &= \sum_x x \frac{P(X = x, Y = y)}{P(Y=y)} \end{aligned}
where P(X = x, Y = y) is the joint probability mass function of X and Y. The sum is taken over all possible outcomes of X. As above, the expression is undefined if P(Y=y) = 0.

Conditioning on a discrete random variable is the same as conditioning on the corresponding event:
:\operatorname{E} (X \mid Y=y) = \operatorname{E} (X \mid A)
where A is the set \{ Y = y \}.
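As a concrete illustration, the sketch below (with an arbitrarily chosen joint pmf) computes \operatorname{E}(X \mid Y = y) from the second formula above:

<syntaxhighlight lang="python">
# Illustrative sketch: E(X | Y = y) from a joint probability mass function.
from fractions import Fraction

# P(X = x, Y = y); the values are an arbitrary example
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def cond_exp(y):
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # P(Y = y)
    if p_y == 0:
        raise ValueError("undefined when P(Y = y) = 0")
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

print(cond_exp(0), cond_exp(1))  # 3/4 1/2
</syntaxhighlight>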
=== Continuous random variables ===
Let X and Y be continuous random variables with joint density f_{X,Y}(x,y), marginal density f_{Y}(y) of Y, and conditional density \textstyle f_{X\mid Y}(x\mid y) = \frac{ f_{X,Y}(x,y) }{f_{Y}(y)} of X given the event Y=y. The conditional expectation of X given Y=y is
: \begin{aligned} \operatorname{E} (X \mid Y=y) &= \int_{-\infty}^\infty x f_{X\mid Y}(x\mid y) \, \mathrm{d}x \\ &= \frac{1}{f_Y(y)}\int_{-\infty}^\infty x f_{X,Y}(x,y) \, \mathrm{d}x. \end{aligned}
When the denominator f_Y(y) is zero, the expression is undefined.

Conditioning on a continuous random variable is not the same as conditioning on the event \{ Y = y \} as it was in the discrete case. For a discussion, see Conditioning on an event of probability zero. Not respecting this distinction can lead to contradictory conclusions, as illustrated by the Borel–Kolmogorov paradox.
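The integral in the definition can be evaluated numerically. The sketch below (grid bounds and parameter values are illustrative choices) approximates \operatorname{E}(X \mid Y = y) for a standard bivariate normal pair with correlation \rho, for which the closed form \operatorname{E}(X \mid Y = y) = \rho y is available as a check:

<syntaxhighlight lang="python">
# Illustrative sketch: E(X | Y = y) by numerical integration of the
# joint density of a standard bivariate normal with correlation rho.
import numpy as np

rho, y = 0.6, 1.5
x = np.linspace(-10.0, 10.0, 20001)

# f_{X,Y}(x, y) for the standard bivariate normal
c = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
f_xy = c * np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))

f_y = np.trapz(f_xy, x)          # marginal density f_Y(y)
e = np.trapz(x * f_xy, x) / f_y  # (1/f_Y(y)) * ∫ x f_{X,Y}(x, y) dx

print(e, rho * y)                # both ≈ 0.9
</syntaxhighlight>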
=== L^2 random variables ===
All random variables in this section are assumed to be in L^2, that is, square integrable. In its full generality, conditional expectation is developed without this assumption; see below under Conditional expectation with respect to a sub-σ-algebra. The L^2 theory is, however, considered more intuitive and admits important generalizations. In the context of L^2 random variables, conditional expectation is also called regression.

In what follows let (\Omega, \mathcal{F}, P) be a probability space, and X: \Omega \to \mathbb{R} in L^2 with mean \mu_X and variance \sigma_X^2. The expectation \mu_X minimizes the mean squared error:
: \min_{x \in \mathbb{R}} \operatorname{E}\left((X - x)^2\right) = \operatorname{E}\left((X - \mu_X)^2\right) = \sigma_X^2.
The conditional expectation of X is defined analogously, except instead of a single number \mu_X, the result will be a function e_X(y). Let Y: \Omega \to \mathbb{R}^n be a random vector. The conditional expectation e_X: \mathbb{R}^n \to \mathbb{R} is a measurable function such that
: \min_{g \text{ measurable}} \operatorname{E}\left((X - g(Y))^2\right) = \operatorname{E}\left((X - e_X(Y))^2\right).
Note that unlike \mu_X, the conditional expectation e_X is not generally unique: there may be multiple minimizers of the mean squared error.
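The minimization property can be illustrated by Monte Carlo simulation. In the sketch below (the model X = Y^2 + \varepsilon and the competing functions g are illustrative assumptions), the conditional mean e_X(y) = y^2 attains a smaller mean squared error than each alternative:

<syntaxhighlight lang="python">
# Monte Carlo sketch: the conditional mean minimizes E((X - g(Y))^2).
# Model: X = Y^2 + noise, so e_X(y) = y^2 and the minimal MSE is the
# noise variance (here 1).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
Y = rng.normal(size=n)
X = Y**2 + rng.normal(size=n)

candidates = {
    "e_X(y) = y^2 (conditional mean)": Y**2,
    "g(y) = y^2 + 0.3": Y**2 + 0.3,
    "g(y) = |y|": np.abs(Y),
    "g(y) = E(X), a constant": np.full(n, X.mean()),
}
for name, g in candidates.items():
    print(f"{name}: MSE ≈ {np.mean((X - g)**2):.3f}")
# The conditional mean prints the smallest value, ≈ 1.
</syntaxhighlight>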
==== Uniqueness ====
Example 1: Consider the case where Y is the constant random variable that is always 1. Then the mean squared error is minimized by any function of the form
: e_X(y) = \begin{cases} \mu_X & \text{if } y = 1, \\ \text{any number} & \text{otherwise.} \end{cases}

Example 2: Consider the case where Y is the 2-dimensional random vector (X, 2X). Then clearly
:\operatorname{E}(X \mid Y) = X
but in terms of functions it can be expressed as e_X(y_1, y_2) = 3y_1 - y_2 or e'_X(y_1, y_2) = y_2 - y_1 or in infinitely many other ways. In the context of linear regression, this lack of uniqueness is called multicollinearity.

Conditional expectation is unique up to a set of measure zero in \mathbb{R}^n. The measure used is the pushforward measure induced by Y. In the first example, the pushforward measure is a Dirac distribution at 1. In the second it is concentrated on the "diagonal" \{ y : y_2 = 2 y_1 \}, so that any set not intersecting it has measure 0.
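Example 2 can be replayed numerically: e_X and e'_X disagree off the diagonal, but the pushforward measure never charges that region. A minimal sketch (sample size and seed are arbitrary):

<syntaxhighlight lang="python">
# Sketch for Example 2: two different functions on R^2 that agree
# on the support {y_2 = 2*y_1} of the pushforward measure of Y = (X, 2X).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=5)
y1, y2 = X, 2 * X               # Y = (X, 2X) lies on the diagonal

e1 = 3 * y1 - y2                # e_X(y1, y2)  = 3*y1 - y2
e2 = y2 - y1                    # e'_X(y1, y2) = y2 - y1

print(np.allclose(e1, X), np.allclose(e2, X))  # True True on the support
print(3 * 1.0 - 0.0, 0.0 - 1.0)                # differ at (1, 0), off the diagonal
</syntaxhighlight>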
==== Existence ====
The existence of a minimizer for \min_g \operatorname{E}\left((X - g(Y))^2\right) is non-trivial. It can be shown that
: M := \{ g(Y) : g \text{ is measurable and } \operatorname{E}(g(Y)^2) < \infty \}
is a closed subspace of the Hilbert space L^2(\Omega). By the Hilbert projection theorem, the necessary and sufficient condition for e_X to be a minimizer is that for all f(Y) in M we have
: \langle X - e_X(Y), f(Y) \rangle = 0.
In words, this equation says that the residual X - e_X(Y) is orthogonal to the space M of all functions of Y. This orthogonality condition, applied to the indicator functions f(Y) = 1_{Y \in H}, is used below to extend conditional expectation to the case that X and Y are not necessarily in L^2.
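The orthogonality condition can also be checked by simulation. In the sketch below (the model X = \sin(Y) + \varepsilon is an illustrative assumption), the sample inner products \operatorname{E}[(X - e_X(Y)) f(Y)] come out close to zero, including for an indicator function of the form used above:

<syntaxhighlight lang="python">
# Monte Carlo sketch of the orthogonality condition: the residual
# X - e_X(Y) is orthogonal to functions f(Y).
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
Y = rng.uniform(-np.pi, np.pi, size=n)
X = np.sin(Y) + rng.normal(size=n)   # here e_X(y) = sin(y)
residual = X - np.sin(Y)

for f in (np.sin, np.cos, np.square, lambda y: (y > 0).astype(float)):
    # <X - e_X(Y), f(Y)> = E[(X - e_X(Y)) * f(Y)] ≈ 0
    print(np.mean(residual * f(Y)))
</syntaxhighlight>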
==== Connections to regression ====
The conditional expectation is often approximated in applied mathematics and statistics because of the difficulty of calculating it analytically, and for interpolation. The Hilbert subspace
: M = \{ g(Y) : \operatorname{E}(g(Y)^2) < \infty \}
defined above is replaced with subsets thereof by restricting the functional form of g, rather than allowing any measurable function. Examples of this are decision tree regression when g is required to be a simple function, linear regression when g is required to be affine, etc.

These generalizations of conditional expectation come at the cost of many of its properties no longer holding. For example, let M be the space of all linear functions of Y and let \mathcal{E}_{M} denote this generalized conditional expectation/L^2 projection. If M does not contain the constant functions, the tower property \operatorname{E}(\mathcal{E}_M(X)) = \operatorname{E}(X) will not hold.

An important special case is when X and Y are jointly normally distributed. In this case it can be shown that the conditional expectation is equivalent to linear regression:
: e_X(Y) = \alpha_0 + \sum_i \alpha_i Y_i
for coefficients \{\alpha_i\}_{i = 0..n} described in Multivariate normal distribution#Conditional distributions.
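For scalar Y the coefficients have the closed form \alpha_1 = \operatorname{Cov}(X,Y)/\operatorname{Var}(Y), which the following sketch compares against an empirical least-squares fit; the mean vector and covariance matrix are arbitrary illustrative choices:

<syntaxhighlight lang="python">
# Sketch: for jointly normal (X, Y), E(X | Y) is affine and matches
# ordinary least squares, with alpha_1 = Cov(X, Y) / Var(Y).
import numpy as np

rng = np.random.default_rng(3)
mean = [1.0, -2.0]
cov = [[2.0, 0.8],
       [0.8, 1.0]]
X, Y = rng.multivariate_normal(mean, cov, size=500_000).T

alpha_1 = cov[0][1] / cov[1][1]          # 0.8
alpha_0 = mean[0] - alpha_1 * mean[1]    # 1.0 - 0.8 * (-2.0) = 2.6

b1, b0 = np.polyfit(Y, X, deg=1)         # empirical least-squares fit
print((alpha_0, alpha_1), (b0, b1))      # the two pairs agree closely
</syntaxhighlight>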
=== Conditional expectation with respect to a sub-σ-algebra ===
[[File:LokaleMittelwertbildung.svg|thumb|upright=1.5|'''Conditional expectation with respect to a σ-algebra:''' in this example the probability space (\Omega, \mathcal{F}, P) is the [0,1] interval with the Lebesgue measure. We define the following σ-algebras: \mathcal{A} = \mathcal{F}; \mathcal{B} is the σ-algebra generated by the intervals with end-points 0, ¼, ½, ¾, 1; and \mathcal{C} is the σ-algebra generated by the intervals with end-points 0, ½, 1. Here the conditional expectation is effectively the average over the minimal sets of the σ-algebra.]]
Consider the following:
• (\Omega, \mathcal{F}, P) is a probability space.
• X\colon\Omega \to \mathbb{R}^n is a random variable on that probability space with finite expectation.
• \mathcal{H} \subseteq \mathcal{F} is a sub-σ-algebra of \mathcal{F}.

Since \mathcal{H} is a sub-σ-algebra of \mathcal{F}, the function X\colon\Omega \to \mathbb{R}^n is usually not \mathcal{H}-measurable, so the existence of integrals of the form \int_H X \,\mathrm{d}P|_\mathcal{H}, where H\in\mathcal{H} and P|_\mathcal{H} is the restriction of P to \mathcal{H}, cannot be stated in general. However, the local averages \int_H X\,\mathrm{d}P can be recovered in (\Omega, \mathcal{H}, P|_\mathcal{H}) with the help of the conditional expectation.
A conditional expectation of X given \mathcal{H}, denoted \operatorname{E}(X\mid\mathcal{H}), is any \mathcal{H}-measurable function \Omega \to \mathbb{R}^n which satisfies
: \int_H\operatorname{E}(X \mid \mathcal{H})\,\mathrm{d}P = \int_H X \,\mathrm{d}P
for each H \in \mathcal{H}.

The law of the unconscious statistician is then
: \operatorname{E}[f(X)\mid\mathcal{H}] = \int f(x) \, \kappa_\mathcal{H}(-, \mathrm{d}x),
where \kappa_\mathcal{H} is the conditional probability distribution of X given \mathcal{H}. This shows that conditional expectations are, like their unconditional counterparts, integrals against a conditional measure.
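The defining property can be made concrete on the probability space of the figure. The sketch below (the choice X(\omega) = \omega^2 is an illustrative assumption) discretizes [0,1], takes \mathcal{B} to be generated by the four quarter intervals, and checks that \operatorname{E}(X \mid \mathcal{B}), the average of X on each minimal set, integrates to the same value as X over each such set:

<syntaxhighlight lang="python">
# Sketch of the figure's example: Omega = [0, 1] with Lebesgue measure,
# X(omega) = omega^2, and B generated by the four quarter intervals.
import numpy as np

omega = np.linspace(0.0, 1.0, 400_000, endpoint=False)
X = omega**2
quarter = (omega * 4).astype(int)        # index of the minimal set containing omega

E_X_given_B = np.empty_like(X)
for k in range(4):
    mask = quarter == k
    E_X_given_B[mask] = X[mask].mean()   # constant on each minimal set

# Defining property: ∫_H E(X|B) dP = ∫_H X dP for each minimal set H
for k in range(4):
    mask = quarter == k
    print(X[mask].mean() / 4, E_X_given_B[mask].mean() / 4)  # equal pairs
</syntaxhighlight>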
=== General definition ===
In full generality, consider:
• A probability space (\Omega,\mathcal{A},P).
• A Banach space (E,\|\cdot\|_E).
• A Bochner integrable random variable X:\Omega\to E.
• A sub-σ-algebra \mathcal{H}\subseteq \mathcal{A}.

The conditional expectation of X given \mathcal{H} is the (up to a P-null set unique) integrable, E-valued, \mathcal{H}-measurable random variable \operatorname{E}(X \mid \mathcal{H}) satisfying
:\int_H \operatorname{E}(X \mid \mathcal{H}) \,\mathrm{d}P = \int_H X \,\mathrm{d}P
for all H \in \mathcal{H}. In this setting the conditional expectation is sometimes also denoted in operator notation as \operatorname{E}^\mathcal{H}X.

== Basic properties ==