Chain rule (probability)

==Two events==
For two events A and B, the chain rule states that
:\mathbb P(A \cap B) = \mathbb P(B \mid A) \mathbb P(A),
where \mathbb P(B \mid A) denotes the conditional probability of B given A.

===Example===
One urn contains 1 black ball and 2 white balls, and a second urn contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then draw a ball from that urn. Let A be the event of choosing the first urn, so that \mathbb P(A) = \mathbb P(\overline{A}) = 1/2, where \overline A is the complementary event of A. Let B be the event of drawing a white ball. The probability of drawing a white ball, given that we have chosen the first urn, is \mathbb P(B \mid A) = 2/3. The intersection A \cap B then describes choosing the first urn and drawing a white ball from it. Its probability can be calculated by the chain rule as follows:
:\mathbb P(A \cap B) = \mathbb P(B \mid A) \mathbb P(A) = \frac 23 \cdot \frac 12 = \frac 13.

==Finitely many events==
For events A_1,\ldots,A_n whose intersection has nonzero probability, the chain rule states
:\begin{align}
\mathbb P\left(A_1 \cap A_2 \cap \ldots \cap A_n\right)
&= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_1 \cap \ldots \cap A_{n-1}\right) \\
&= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_{n-1} \mid A_1 \cap \ldots \cap A_{n-2}\right) \mathbb P\left(A_1 \cap \ldots \cap A_{n-2}\right) \\
&= \mathbb P\left(A_n \mid A_1 \cap \ldots \cap A_{n-1}\right) \mathbb P\left(A_{n-1} \mid A_1 \cap \ldots \cap A_{n-2}\right) \cdot \ldots \cdot \mathbb P(A_3 \mid A_1 \cap A_2) \mathbb P(A_2 \mid A_1) \mathbb P(A_1)\\
&= \mathbb P(A_1) \mathbb P(A_2 \mid A_1) \mathbb P(A_3 \mid A_1 \cap A_2) \cdot \ldots \cdot \mathbb P(A_n \mid A_1 \cap \dots \cap A_{n-1})\\
&= \prod_{k=1}^n \mathbb P(A_k \mid A_1 \cap \dots \cap A_{k-1})\\
&= \prod_{k=1}^n \mathbb P\left(A_k \,\Bigg|\, \bigcap_{j=1}^{k-1} A_j\right).
\end{align}
For k=1 the intersection in the condition is empty and the conditioning is dropped, so the first factor is simply \mathbb P(A_1).

===Example 1===
For n=4, i.e.
four events, the chain rule reads
:\begin{align}
\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4)
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\mathbb P(A_3 \cap A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\mathbb P(A_3 \mid A_2 \cap A_1)\mathbb P(A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\mathbb P(A_3 \mid A_2 \cap A_1)\mathbb P(A_2 \mid A_1)\mathbb P(A_1).
\end{align}

===Example 2===
We randomly draw 4 cards, one at a time and without replacement, from a deck of 52 cards. What is the probability that all 4 cards are aces? First, we set A_n := \left\{ \text{draw an ace in the } n^{\text{th}} \text{ try} \right\}. Each ace already drawn leaves one fewer ace and one fewer card in the deck, so we get the following probabilities:
:\mathbb P(A_1) = \frac 4{52}, \qquad \mathbb P(A_2 \mid A_1) = \frac 3{51}, \qquad \mathbb P(A_3 \mid A_1 \cap A_2) = \frac 2{50}, \qquad \mathbb P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac 1{49}.
Applying the chain rule,
:\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \frac 4{52} \cdot \frac 3{51} \cdot \frac 2{50} \cdot \frac 1{49} = \frac{24}{6497400} = \frac{1}{270725}.

==Statement of the theorem and proof==
Let (\Omega, \mathcal A, \mathbb P) be a probability space. Recall that the conditional probability of an event A \in \mathcal A given B \in \mathcal A is defined as
:\begin{align}
\mathbb P(A \mid B) := \begin{cases} \frac{\mathbb P(A \cap B)}{\mathbb P(B)}, & \mathbb P(B) > 0,\\ 0, & \mathbb P(B) = 0. \end{cases}
\end{align}
Then we have the following theorem.

{{math theorem|name = Chain rule| Let (\Omega, \mathcal A, \mathbb P) be a probability space. Let A_1, \ldots, A_n \in \mathcal A. Then
:\begin{align}
\mathbb P\left(A_1 \cap A_2 \cap \ldots \cap A_n\right)
&= \mathbb P(A_1) \mathbb P(A_2 \mid A_1) \mathbb P(A_3 \mid A_1 \cap A_2) \cdot \ldots \cdot \mathbb P(A_n \mid A_1 \cap \dots \cap A_{n-1})\\
&= \mathbb P(A_1) \prod_{j=2}^n \mathbb P(A_j \mid A_1 \cap \dots \cap A_{j-1}).
\end{align}}}

{{Math proof|The formula follows immediately by recursion:
:\begin{align}
(1) && \mathbb P(A_1) \mathbb P(A_2 \mid A_1) &= \mathbb P(A_1 \cap A_2), \\
(2) && \mathbb P(A_1) \mathbb P(A_2 \mid A_1) \mathbb P(A_3 \mid A_1 \cap A_2) &= \mathbb P(A_1 \cap A_2) \mathbb P(A_3 \mid A_1 \cap A_2) \\
&&&= \mathbb P(A_1 \cap A_2 \cap A_3),
\end{align}
and so on up to n, where each step uses the definition of the conditional probability. If the conditioning event has probability zero, both sides of the corresponding step are zero, so the identity still holds.}}

==Chain rule for discrete random variables==