We separate the case in which the measure space is a probability space from the more general case because the probability case is more accessible for the general reader.
Intuition \operatorname{E}(X) = \operatorname{P}(X < a)\cdot \operatorname{E}(X|X<a) + \operatorname{P}(X\geq a)\cdot \operatorname{E}(X|X\geq a), where \operatorname{E}(X|X<a) is larger than or equal to 0 because the random variable X is non-negative, and \operatorname{E}(X|X\geq a) is larger than or equal to a because the conditional expectation only takes into account values larger than or equal to a that X can take.
Property 1: \operatorname{P}(X < a) \cdot \operatorname{E}(X \mid X < a) \geq 0 Given a non-negative random variable X, the conditional expectation \operatorname{E}(X \mid X < a) \geq 0 because X \geq 0. Also, probabilities are always non-negative, i.e., \operatorname{P}(X < a) \geq 0. Thus, the product: \operatorname{P}(X < a) \cdot \operatorname{E}(X \mid X < a) \geq 0. This is intuitive since conditioning on X < a still results in non-negative values, ensuring the product remains non-negative.
Property 2: \operatorname{P}(X \geq a) \cdot \operatorname{E}(X \mid X \geq a) \geq a \cdot \operatorname{P}(X \geq a) Conditional on X \geq a, every value that X can take is at least a, so the conditional expectation satisfies \operatorname{E}(X \mid X \geq a) \geq a. Multiplying both sides by \operatorname{P}(X \geq a), we get: \operatorname{P}(X \geq a) \cdot \operatorname{E}(X \mid X \geq a) \geq a \cdot \operatorname{P}(X \geq a). This is intuitive since all values considered are at least a, making their average also greater than or equal to a. Hence intuitively, \operatorname{E}(X)\geq \operatorname{P}(X \geq a)\cdot \operatorname{E}(X|X\geq a)\geq a \cdot \operatorname{P}(X\geq a), which, for a > 0, directly leads to \operatorname{P}(X\geq a)\leq \frac{\operatorname{E}(X)}{a}.
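The decomposition and the two properties above can be checked exactly on a small example. The following is a minimal sketch, assuming a fair six-sided die as the distribution and a = 5 as the threshold; both choices, and the helper name `decompose`, are hypothetical and serve only as illustration.

```python
from fractions import Fraction

# Hypothetical example distribution: a fair six-sided die (non-negative values).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
a = 5  # hypothetical threshold, a > 0


def decompose(pmf, a):
    """Return E(X), P(X<a)*E(X|X<a) and P(X>=a)*E(X|X>=a).

    Each product equals the sum of x*p(x) over the corresponding range,
    so no conditional expectation has to be computed separately.
    """
    mean = sum(x * p for x, p in pmf.items())
    low = sum(x * p for x, p in pmf.items() if x < a)
    high = sum(x * p for x, p in pmf.items() if x >= a)
    return mean, low, high


mean, low, high = decompose(pmf, a)
tail = sum(p for x, p in pmf.items() if x >= a)  # P(X >= a)

assert mean == low + high   # decomposition of E(X)
assert low >= 0             # Property 1
assert high >= a * tail     # Property 2
assert tail <= mean / a     # Markov's inequality
print(mean, low, high, tail)
```

Exact rational arithmetic is used so that the equalities hold without floating-point tolerance.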
Probability-theoretic proof Method 1: Assume that X has a probability density function f and that a > 0. From the definition of expectation:
:\operatorname{E}(X)=\int_{-\infty}^{\infty} xf(x) \, dx
Since X is a non-negative random variable,
:\operatorname{E}(X)=\int_{-\infty}^\infty xf(x) \, dx = \int_0^\infty xf(x) \, dx
From this we can derive
:\operatorname{E}(X)=\int_0^a xf(x) \, dx + \int_a^\infty xf(x) \, dx \ge \int_a^\infty xf(x) \, dx \ge\int_a^\infty af(x) \, dx = a\int_a^\infty f(x) \, dx= a \operatorname{Pr}(X \ge a)
The first inequality drops a non-negative integral, and the second uses x \ge a on the range of integration. Dividing through by a shows that
:\operatorname{Pr}(X \ge a) \le \operatorname{E}(X)/a
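As an illustration of the chain of inequalities in Method 1, the sketch below evaluates each term in closed form for one specific density. It assumes X is exponentially distributed with rate 1, so f(x) = e^{-x} for x ≥ 0; this distribution and the function name `chain` are assumptions made for the example, not part of the proof.

```python
import math


def chain(a):
    """Terms of the Method 1 chain for X ~ Exponential(1), in closed form."""
    mean = 1.0                             # integral of x*exp(-x) over [0, inf)
    upper_part = (a + 1.0) * math.exp(-a)  # integral of x*exp(-x) over [a, inf)
    a_times_tail = a * math.exp(-a)        # a * P(X >= a)
    return mean, upper_part, a_times_tail


for a in (0.5, 1.0, 2.0, 5.0):  # hypothetical thresholds
    mean, upper_part, a_times_tail = chain(a)
    # E(X) >= integral over [a, inf) of x f(x) >= a * P(X >= a)
    assert mean >= upper_part >= a_times_tail
    # Markov's inequality after dividing by a
    assert math.exp(-a) <= mean / a
    print(a, math.exp(-a), mean / a)
```

For this distribution the exact tail e^{-a} is far below the bound 1/a when a is large, which shows that the inequality is valid but can be loose.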
Method 2: For any event E, let I_E be the indicator random variable of E, that is, I_E=1 if E occurs and I_E=0 otherwise. Using this notation, we have I_{(X\geq a)}=1 if the event X\geq a occurs, and I_{(X\geq a)}=0 if X<a. Then, given a>0,
:aI_{(X \geq a)} \leq X
which is clear if we consider the two possible cases. If X<a, then I_{(X\geq a)}=0, and so aI_{(X\geq a)}=0\leq X. Otherwise, we have X\geq a, for which I_{(X\geq a)}=1 and so aI_{(X\geq a)}=a\leq X. Since \operatorname{E} is a monotonically increasing function, taking expectation of both sides of an inequality cannot reverse it. Therefore,
:\operatorname{E}(aI_{(X \geq a)}) \leq \operatorname{E}(X).
Now, using linearity of expectations, the left side of this inequality is the same as
:a\operatorname{E}(I_{(X \geq a)}) = a(1\cdot\operatorname{P}(X \geq a) + 0\cdot\operatorname{P}(X < a)) = a\operatorname{P}(X \geq a).
Thus we have
:a\operatorname{P}(X \geq a) \leq \operatorname{E}(X)
and since a > 0, we can divide both sides by a.
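The indicator argument can also be observed on simulated data, because the pointwise bound a·I ≤ X carries over to sample averages. The following is a small sketch, assuming samples drawn from an exponential distribution with rate 1, a threshold a = 2 and an arbitrary seed; all three are illustrative choices.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility
samples = [random.expovariate(1.0) for _ in range(10_000)]  # non-negative draws
a = 2.0  # hypothetical threshold, a > 0

# Indicator of the event X >= a, evaluated on each sample.
indicator = [1 if x >= a else 0 for x in samples]

# Pointwise inequality a * I <= X from Method 2.
assert all(a * i <= x for i, x in zip(indicator, samples))

# Averaging both sides preserves the inequality (monotonicity of the mean).
lhs = a * sum(indicator) / len(samples)  # a * empirical P(X >= a)
rhs = sum(samples) / len(samples)        # empirical E(X)
assert lhs <= rhs
print(lhs, rhs)
```

The sample mean plays the role of the expectation here, so the check mirrors the proof step by step rather than estimating how tight the bound is.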
Measure-theoretic proof We may assume that the function f is non-negative, since only its absolute value enters in the equation. Now, consider the real-valued function s on X given by
: s(x) = \begin{cases} \varepsilon, & \text{if } f(x) \geq \varepsilon \\ 0, & \text{if } f(x) < \varepsilon \end{cases}
Then 0\leq s(x)\leq f(x). By the definition of the Lebesgue integral
: \int_X f(x) \, d\mu \geq \int_X s(x) \, d \mu = \varepsilon \mu( \{ x\in X : \, f(x) \geq \varepsilon \} )
and since \varepsilon >0, both sides can be divided by \varepsilon, obtaining
:\mu(\{x\in X : \, f(x) \geq \varepsilon \}) \leq {1\over \varepsilon }\int_X f \,d\mu.
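On a finite measure space the integrals reduce to weighted sums, so the construction of s can be checked directly. The sketch below assumes a four-point space with a measure that is not a probability measure and a function that takes a negative value, so that the absolute value matters; the point labels, weights, function values and ε = 2 are all hypothetical.

```python
from fractions import Fraction

# Hypothetical finite measure space: points with non-negative weights (total mass 13/2).
mu = {"p": Fraction(2), "q": Fraction(1, 2), "r": Fraction(3), "t": Fraction(1)}
# Hypothetical measurable function; only |f| enters the inequality.
f = {"p": Fraction(-4), "q": Fraction(1), "r": Fraction(1, 2), "t": Fraction(3)}
eps = Fraction(2)

abs_f = {x: abs(v) for x, v in f.items()}
# Simple function s: eps where |f| >= eps, 0 elsewhere, so 0 <= s <= |f|.
s = {x: (eps if v >= eps else Fraction(0)) for x, v in abs_f.items()}
assert all(0 <= s[x] <= abs_f[x] for x in mu)

integral_f = sum(abs_f[x] * mu[x] for x in mu)           # integral of |f| d(mu)
integral_s = sum(s[x] * mu[x] for x in mu)               # integral of s d(mu)
measure_set = sum(mu[x] for x in mu if abs_f[x] >= eps)  # mu({|f| >= eps})

assert integral_s == eps * measure_set
assert integral_f >= integral_s
assert measure_set <= integral_f / eps
print(measure_set, integral_f / eps)
```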
Discrete case We now provide a proof for the special case when X is a discrete random variable which only takes on non-negative integer values. Let a be a positive integer. By definition
:a\operatorname{Pr}(X > a) = a\operatorname{Pr}(X = a + 1) + a\operatorname{Pr}(X = a + 2) + a\operatorname{Pr}(X = a + 3) + \cdots
:\leq a\operatorname{Pr}(X = a) + (a+1)\operatorname{Pr}(X = a + 1) + (a+2)\operatorname{Pr}(X = a + 2) + \cdots
:\leq \operatorname{Pr}(X = 1) + 2\operatorname{Pr}(X = 2) + 3\operatorname{Pr}(X = 3) + \cdots + a\operatorname{Pr}(X = a) + (a+1)\operatorname{Pr}(X = a + 1) + (a+2)\operatorname{Pr}(X = a + 2) + \cdots
:=\operatorname{E}(X)
Dividing by a yields the desired result.
==Corollaries==