Two independent events become conditionally dependent given that at least one of them occurs. Symbolically:

: If P(A\cap B)=P(A)P(B), P(A)P(B)>0, and P(A\cup B)<1, then
:: P(A\cap B\mid A\cup B) < P(A\mid A\cup B)\,P(B\mid A\cup B)
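The inequality is easy to verify numerically. Below is a minimal Python sketch (an illustration added here, with P(A) = P(B) = 1/2 chosen arbitrarily) that computes both sides directly from the definitions:

```python
# Check: for independent A, B with P(A ∪ B) < 1, conditioning on (A ∪ B)
# makes the joint probability fall below the product of the conditionals.
p_a, p_b = 0.5, 0.5            # arbitrary example probabilities
p_ab = p_a * p_b               # independence: P(A ∩ B) = P(A)P(B)
p_union = p_a + p_b - p_ab     # inclusion-exclusion: P(A ∪ B) = 0.75 < 1

lhs = p_ab / p_union                      # P(A ∩ B | A ∪ B)         = 1/3
rhs = (p_a / p_union) * (p_b / p_union)   # P(A | A ∪ B)P(B | A ∪ B) = 4/9

assert lhs < rhs   # the two events are negatively dependent given (A ∪ B)
```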
Proof: Note that P(A\mid A\cup B)=P(A)/P(A\cup B) and P(B\mid A\cup B)=P(B)/P(A\cup B), which, together with P(A\cap B)=P(A)P(B) and P(A\cup B)<1 (so \frac{1}{P(A \cup B)}>1), implies that

: \begin{align} P(A\cap B\mid A\cup B) &= \frac{P(A\cap B)}{P(A\cup B)} = \frac{P(A)P(B)}{P(A\cup B)} \\ &< \frac{P(A)P(B)}{P(A\cup B)^2} = P(A\mid A\cup B)\,P(B\mid A\cup B). \end{align}

One can see this in tabular form as follows: the outcomes where at least one event occurs are the three cells other than ~A ∩ ~B (here ~A means "not A"):

{| class="wikitable"
!
! A
! ~A
|-
! B
| A ∩ B
| ~A ∩ B
|-
! ~B
| A ∩ ~B
| ~A ∩ ~B
|}

For instance, if one has a sample of 100, and both A and B occur independently half the time ( P(A) = P(B) = 1 / 2 ), one obtains:

{| class="wikitable"
!
! A
! ~A
|-
! B
| 25
| 25
|-
! ~B
| 25
| 25
|}

So in 75 outcomes, either A or B occurs, of which 50 have A occurring. By comparing the conditional probability of A to the unconditional probability of A:

:P(A\mid A \cup B) = 50 / 75 = 2 / 3 > P(A) = 50 / 100 = 1 / 2

We see that the probability of A is higher (2 / 3) in the subset of outcomes where (A \text{ or } B) occurs than in the overall population (1 / 2).
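The same 2/3-versus-1/2 comparison can be reproduced by simulation. The following Python sketch (illustrative; the sample size and seed are arbitrary) draws independent events with P(A) = P(B) = 1/2 and conditions on at least one occurring:

```python
import random

random.seed(0)   # arbitrary seed, for reproducibility
n = 100_000      # large sample, so frequencies settle near 25/25/25/25

# Independent events A and B, each occurring with probability 1/2.
draws = [(random.random() < 0.5, random.random() < 0.5) for _ in range(n)]

p_a = sum(a for a, _ in draws) / n
union = [(a, b) for a, b in draws if a or b]   # keep outcomes where A or B occurs
p_a_given_union = sum(a for a, _ in union) / len(union)

print(f"P(A)         = {p_a:.3f}")              # close to 1/2
print(f"P(A | A ∪ B) = {p_a_given_union:.3f}")  # close to 2/3
```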
On the other hand, the probability of A given both B and (A \text{ or } B) is simply the unconditional probability of A, P(A), since A is independent of B. In the numerical example, we have conditioned on being in the top row:

{| class="wikitable"
!
! A
! ~A
|-
! B
| 25
| 25
|}

Here the probability of A is 25 / 50 = 1 / 2.

Berkson's paradox arises because the conditional probability of A given B within the three-cell subset equals the conditional probability in the overall population, but the unconditional probability within the subset is inflated relative to the unconditional probability in the overall population. Hence, within the subset, the presence of B decreases the conditional probability of A (back to its overall unconditional probability):

:P(A\mid B, A \cup B) = P(A\mid B) = P(A)
:P(A\mid A \cup B) > P(A)

Because the effect of conditioning on (A \cup B) derives from the relative size of P(A\mid A \cup B) and P(A), the effect is particularly large when A is rare (P(A)\ll 1) but very strongly correlated with B (P(A\mid B) \approx 1). For example, consider the case below, where N is very large:

{| class="wikitable"
!
! A
! ~A
|-
! B
| 1
| 0
|-
! ~B
| 0
| N
|}

For the case without conditioning on (A \cup B) we have

:P(A) = 1/(N+1)
:P(A\mid B) = 1

So A occurs rarely, unless B is present, in which case A always occurs. Thus B dramatically increases the likelihood of A.

For the case with conditioning on (A \cup B) we have

:P(A\mid A \cup B) = 1
:P(A\mid B, A \cup B) = P(A\mid B) = 1
Now A occurs always, whether B is present or not. So B has no impact on the likelihood of A, as the sketch below confirms.
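These probabilities can be checked directly from the table counts. The following Python sketch is illustrative only; N = 10**6 is an arbitrary stand-in for "very large":

```python
from fractions import Fraction

N = 10**6  # arbitrary stand-in for "N is very large"

# Cell counts from the table above: rows are B / ~B, columns are A / ~A.
counts = {("A", "B"): 1, ("~A", "B"): 0, ("A", "~B"): 0, ("~A", "~B"): N}
total = sum(counts.values())

p_a = Fraction(counts[("A", "B")] + counts[("A", "~B")], total)
p_a_given_b = Fraction(counts[("A", "B")],
                       counts[("A", "B")] + counts[("~A", "B")])

# Condition on (A ∪ B): drop the ~A ∩ ~B cell.
union_total = total - counts[("~A", "~B")]
p_a_given_union = Fraction(counts[("A", "B")] + counts[("A", "~B")], union_total)

print(p_a)              # 1/(N+1): A is rare overall
print(p_a_given_b)      # 1: given B, A always occurs
print(p_a_given_union)  # 1: within (A ∪ B), A always occurs, with or without B
```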
Thus we see that, for highly correlated data, a huge positive correlation of B with A can be effectively removed when one conditions on (A \cup B).

== See also ==