Several extensions of Chebyshev's inequality have been developed.
===Selberg's inequality===
Selberg derived a generalization of Chebyshev's inequality to arbitrary intervals. Suppose X is a random variable with mean \mu and variance \sigma^2. Selberg's inequality states that if \beta \geq \alpha \geq 0, then
: \Pr( X \in [\mu - \alpha, \mu + \beta] ) \ge \begin{cases}\frac{ \alpha^2 }{\alpha^2 + \sigma^2} &\text{if } \alpha(\beta-\alpha) \geq 2\sigma^2 \\ \frac{4\alpha\beta - 4\sigma^2}{(\alpha + \beta)^2} &\text{if } 2\alpha\beta \geq 2\sigma^2 \geq \alpha(\beta - \alpha) \\ 0 &\text{if } \sigma^2 \geq \alpha\beta\end{cases}
When \alpha = \beta, this reduces to Chebyshev's inequality. These bounds are known to be the best possible.
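The bound is easy to check numerically. The following is a minimal sketch (the helper name selberg_bound and the choice of an exponential test distribution are illustrative assumptions, not part of Selberg's result) comparing the bound with a Monte Carlo estimate:

<syntaxhighlight lang="python">
import numpy as np

def selberg_bound(alpha, beta, sigma2):
    """Selberg's lower bound on Pr(X in [mu - alpha, mu + beta])."""
    if alpha * (beta - alpha) >= 2 * sigma2:
        return alpha**2 / (alpha**2 + sigma2)
    if 2 * alpha * beta >= 2 * sigma2:   # middle case, given the first test failed
        return (4 * alpha * beta - 4 * sigma2) / (alpha + beta)**2
    return 0.0                           # remaining case: sigma^2 >= alpha * beta

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # mean 1, variance 1
mu, sigma2 = 1.0, 1.0
alpha, beta = 0.8, 2.0                          # must satisfy beta >= alpha >= 0
empirical = np.mean((x >= mu - alpha) & (x <= mu + beta))
print(empirical, ">=", selberg_bound(alpha, beta, sigma2))
</syntaxhighlight>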
===Finite-dimensional vector===
Chebyshev's inequality naturally extends to the multivariate setting, where one has n random variables X_i with mean \mu_i and variance \sigma_i^2. Then the following inequality holds:
:\Pr\left(\sum_{i=1}^n (X_i - \mu_i)^2 \ge k^2 \sum_{i=1}^n \sigma_i^2 \right) \le \frac{1}{k^2}
This is known as the Birnbaum–Raymond–Zuckerman inequality after the authors who proved it for two dimensions. This result can be rewritten in terms of vectors X = (X_1, X_2, \ldots) with mean \mu = (\mu_1, \mu_2, \ldots) and standard deviation \sigma = (\sigma_1, \sigma_2, \ldots), using the Euclidean norm \| \cdot \|:
: \Pr(\| X - \mu \| \ge k \| \sigma \|) \le \frac{ 1 }{ k^2 }.
One can also get a similar infinite-dimensional Chebyshev's inequality.
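As a sanity check, the Birnbaum–Raymond–Zuckerman bound can be verified by simulation; the sketch below uses independent zero-mean normal coordinates purely as an example distribution:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
sigma = np.array([1.0, 2.0, 0.5])   # per-coordinate standard deviations
# One million zero-mean vectors with independent coordinates.
x = rng.normal(loc=0.0, scale=sigma, size=(1_000_000, 3))

k = 2.0
event = np.sum(x**2, axis=1) >= k**2 * np.sum(sigma**2)
print(np.mean(event), "<=", 1 / k**2)
</syntaxhighlight>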
A second related inequality has also been derived by Chen. Let n be the dimension of the stochastic vector X and let \operatorname{E}(X) be the mean of X. Let S be the covariance matrix and k > 0. Then
: \Pr \left( ( X - \operatorname{E}(X) )^T S^{-1} (X - \operatorname{E}(X)) \ge k \right) \le \frac{n}{k}
where Y^T is the transpose of Y. The inequality can be written in terms of the Mahalanobis distance as
: \Pr \left( d^2_S(X,\operatorname{E}(X)) \ge k \right) \le \frac{n}{k}
where the Mahalanobis distance based on S is defined by
: d_S(x,y) = \sqrt{ (x - y)^T S^{-1} (x - y) }
Navarro proved that these bounds are sharp; that is, they are the best possible bounds for those regions when only the mean and the covariance matrix of X are known. Stellato et al. showed that this multivariate version of the Chebyshev inequality can be easily derived analytically as a special case of Vandenberghe et al., where the bound is computed by solving a semidefinite program (SDP).
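A short simulation of Chen's bound, using a multivariate normal purely as an illustrative distribution (the bound itself requires only the mean and covariance):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 3
mean = np.zeros(n)
A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)          # an arbitrary positive-definite covariance

x = rng.multivariate_normal(mean, S, size=500_000)
Sinv = np.linalg.inv(S)
# Squared Mahalanobis distance of each sample from the mean.
d2 = np.einsum("ij,jk,ik->i", x - mean, Sinv, x - mean)

for k in [5.0, 10.0, 20.0]:
    print(k, np.mean(d2 >= k), "<=", n / k)
</syntaxhighlight>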
===Known correlation===
If the variables are independent this inequality can be sharpened:
:\Pr\left(\bigcap_{i = 1}^n \frac{ | X_i - \mu_i | }{ \sigma_i } \le k_i \right) \ge \prod_{i=1}^n \left(1 - \frac{1}{k_i^2} \right)
Berge derived an inequality for two correlated variables X_1, X_2. Let \rho be the correlation coefficient between X_1 and X_2 and let \sigma_i^2 be the variance of X_i. Then
: \Pr\left( \bigcap_{ i = 1}^2 \left[ \frac{ | X_i - \mu_i | }{ \sigma_i } < k \right] \right) \ge 1 - \frac{ 1 + \sqrt{ 1 - \rho^2 } }{ k^2 }
This result can be sharpened to having different bounds for the two random variables and having asymmetric bounds, as in Selberg's inequality. Olkin and Pratt derived an inequality for n correlated variables:
: \Pr\left(\bigcap_{i = 1}^n \frac{ | X_i - \mu_i | }{ \sigma_i } < k_i \right) \ge 1 - \frac{ 1 }{ n^2 } \left( \sqrt{u} + \sqrt{ n - 1 } \sqrt{ n \sum_{i=1}^n \frac{1}{k_i^2} - u } \right)^2
where the sum is taken over the n variables and
: u = \sum_{i=1}^n \frac{1}{ k_i^2 } + 2 \sum_{i=1}^n \sum_{j < i} \frac{ \rho_{ij} }{ k_i k_j }
where \rho_{ij} is the correlation between X_i and X_j. Olkin and Pratt's inequality was subsequently generalised by Godwin.
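Berge's two-variable bound can likewise be checked by simulation; the bivariate normal below is only an example of a pair of variables with correlation \rho:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
rho, k = 0.6, 2.0
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)

# Probability that both standardized variables stay below k in absolute value.
empirical = np.mean(np.all(np.abs(x) < k, axis=1))
berge = 1 - (1 + np.sqrt(1 - rho**2)) / k**2
print(empirical, ">=", berge)
</syntaxhighlight>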
===Higher moments===
Mitzenmacher and Upfal note that by applying Markov's inequality to the nonnegative variable | X - \operatorname{E}(X) |^n, one can get a family of tail bounds
: \Pr\left(| X - \operatorname{E}(X) | \ge k \operatorname{E}\left(|X - \operatorname{E}(X) |^n \right)^{ \frac{1}{n} }\right) \le \frac{1}{k^n}, \qquad k > 0,\ n \geq 2.
For n = 2 we obtain Chebyshev's inequality. For k \geq 1,\ n > 4 and assuming that the nth moment exists, this bound is tighter than Chebyshev's inequality. This strategy, called the
method of moments, is often used to prove tail bounds.
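The gain from higher moments is easy to see numerically; the sketch below uses a standard normal sample purely as an example and compares the empirical tail with the k^{-n} bound:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=2_000_000)      # example distribution: standard normal
dev = np.abs(x - x.mean())

k = 2.0
for n in [2, 4, 6, 8]:
    scale = np.mean(dev**n) ** (1 / n)     # E(|X - E(X)|^n)^(1/n)
    empirical = np.mean(dev >= k * scale)  # empirical tail probability
    print(n, empirical, "<=", k**-n)
</syntaxhighlight>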
===Exponential moment===
A related inequality sometimes known as the exponential Chebyshev's inequality is
: \Pr(X \ge \varepsilon) \le e^{ -t \varepsilon }\operatorname{E}\left( e^{ t X } \right), \qquad t > 0.
Let K(t) be the cumulant generating function,
: K( t ) = \log \left(\operatorname{E}\left( e^{ t X } \right) \right).
Taking the Legendre–Fenchel transformation of K(t) and using the exponential Chebyshev's inequality we have
: -\log( \Pr (X \ge \varepsilon )) \ge \sup_t( t \varepsilon - K( t ) ).
This inequality may be used to obtain exponential inequalities for unbounded variables.
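For a standard normal variable, K(t) = t^2/2, so the supremum equals \varepsilon^2/2 and the bound becomes e^{-\varepsilon^2/2}. A numerical sketch of this computation (the coarse grid over t is an illustrative shortcut for the supremum, not part of the inequality):

<syntaxhighlight lang="python">
import numpy as np

def K(t):
    return t**2 / 2                     # cumulant generating function of N(0, 1)

eps = 2.0
t = np.linspace(0.01, 10.0, 10_000)
sup = np.max(t * eps - K(t))            # Legendre-Fenchel transform at eps
bound = np.exp(-sup)                    # Pr(X >= eps) <= e^{-sup}

x = np.random.default_rng(5).normal(size=1_000_000)
print(np.mean(x >= eps), "<=", bound)   # Monte Carlo tail vs exponential bound
</syntaxhighlight>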
===Bounded variables===
If X has finite support contained in the interval [a, b], let M = \max(|a|, |b|), where |x| is the absolute value of x. If the mean of X is zero then for all k > 0
: \frac{\operatorname{E}(|X|^r ) - k^r }{M^r} \le \Pr( | X | \ge k ) \le \frac{\operatorname{E}(| X |^r ) }{ k^r }.
The second of these inequalities with r = 2 is the Chebyshev bound. The first provides a lower bound for \Pr( | X | \ge k ).

==Finite samples==