== Subgaussian norm ==
The subgaussian norm of X, denoted \Vert X \Vert_{\psi_2}, is
\Vert X \Vert_{\psi_2} = \inf\left\{ c>0 : \operatorname{E}\left[\exp{\left(\frac{X^2}{c^2}\right)}\right] \leq 2 \right\}.
In other words, it is the
Orlicz norm of X generated by the Orlicz function \Phi(u)=e^{u^2}-1. By condition (2) below, subgaussian random variables can be characterized as those random variables with finite subgaussian norm.
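For example, a direct computation identifies the subgaussian norm of two basic distributions. If X \sim \mathcal{N}(0,\sigma^2), then
\operatorname{E}\left[\exp{\left(\frac{X^2}{c^2}\right)}\right] = \left(1 - \frac{2\sigma^2}{c^2}\right)^{-1/2} \quad\text{for } c^2 > 2\sigma^2,
which is at most 2 exactly when c^2 \geq \frac{8}{3}\sigma^2, so \Vert X \Vert_{\psi_2} = \sqrt{8/3}\,\sigma. If X is a Rademacher variable (X = \pm 1 with probability 1/2 each), then X^2 = 1 almost surely, so
\operatorname{E}\left[\exp{\left(\frac{X^2}{c^2}\right)}\right] = e^{1/c^2} \leq 2 \iff c \geq \frac{1}{\sqrt{\log 2}},
giving \Vert X \Vert_{\psi_2} = \frac{1}{\sqrt{\log 2}}.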
== Variance proxy ==
If there exists a number s^2 \geq 0 such that \operatorname{E}[e^{(X-\operatorname{E}[X])t}] \leq e^{\frac{s^2t^2}{2}} for all t \in \mathbb{R}, then s^2 is called a variance proxy. The smallest such s^2 is called the optimal variance proxy and is denoted by \Vert X \Vert_{\mathrm{vp}}^2.
The optimal variance proxy and the subgaussian norm are related by
\sqrt{3/8} \cdot \Vert X \Vert_{\psi_2} \leq \Vert X \Vert_{\mathrm{vp}} \leq \sqrt{\log 2} \cdot \Vert X \Vert_{\psi_2},
and both bounds are sharp, attained by the standard Gaussian and Rademacher distributions, respectively. For a Gaussian random variable X \sim \mathcal{N}(\mu,\sigma^2), one has \operatorname{E}[e^{(X-\operatorname{E}[X])t}] = e^{\frac{\sigma^2 t^2}{2}}, and therefore \Vert X \Vert_{\mathrm{vp}}^2 = \sigma^2.
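The two examples above witness the sharpness of both bounds. For X \sim \mathcal{N}(0,\sigma^2), one has \Vert X \Vert_{\mathrm{vp}} = \sigma and \Vert X \Vert_{\psi_2} = \sqrt{8/3}\,\sigma, so the left inequality holds with equality. For a Rademacher variable,
\operatorname{E}[e^{tX}] = \cosh t \leq e^{\frac{t^2}{2}}
(term by term, since (2k)! \geq 2^k k!), and no smaller proxy suffices because \cosh t = 1 + \frac{t^2}{2} + O(t^4) as t \to 0; hence \Vert X \Vert_{\mathrm{vp}} = 1 = \sqrt{\log 2} \cdot \Vert X \Vert_{\psi_2}, so the right inequality holds with equality.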
== Equivalent definitions ==
Let X be a random variable with zero mean, and let K_1, K_2, K_3, \dots be positive constants. The following conditions are equivalent (Proposition 2.5.2):
1. Tail probability bound: \operatorname{P}(|X| \geq t) \leq 2 \exp{(-t^2/K_1^2)} for all t \geq 0;
2. Finite subgaussian norm: \Vert X \Vert_{\psi_2} = K_2 < \infty;
3. Moment bound: \operatorname{E}|X|^p \leq 2K_3^p \Gamma\left(\frac{p}{2}+1\right) for all p \geq 1, where \Gamma is the Gamma function;
4. Moment bound: \operatorname{E}|X|^p \leq K_4^p p^{p/2} for all p \geq 1;
5. Moment-generating function (of X), or variance proxy: \operatorname{E}[e^{(X-\operatorname{E}[X])t}] \leq e^{\frac{K_5^2 t^2}{2}} for all t \in \mathbb{R};
6. Moment-generating function (of X^2): \operatorname{E}[e^{X^2 t^2}] \leq e^{K_6^2 t^2} for all t \in [-1/K_6, +1/K_6];
7. Union bound: for some c > 0, \operatorname{E}[\max\{|X_1 - \operatorname{E}[X]|, \ldots, |X_n - \operatorname{E}[X]|\}] \leq c \sqrt{\log n} for all n > c, where X_1, \ldots, X_n are i.i.d. copies of X;
8. Subexponential: X^2 has a subexponential distribution.
Furthermore, the constants K_i in definitions (1) to (5) agree up to an absolute constant factor. So, for example, given a random variable satisfying (1) and (2), the minimal constants K_1, K_2 in the two definitions satisfy K_1 \leq cK_2 and K_2 \leq c' K_1, where c, c' are absolute constants independent of the random variable.
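As a simple application of condition (2), every bounded random variable is subgaussian: if |X| \leq M almost surely, then
\operatorname{E}\left[\exp{\left(\frac{X^2}{c^2}\right)}\right] \leq \exp{\left(\frac{M^2}{c^2}\right)} \leq 2 \quad\text{whenever } c \geq \frac{M}{\sqrt{\log 2}},
so \Vert X \Vert_{\psi_2} \leq M/\sqrt{\log 2} < \infty.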
== Proof of equivalence ==
As an example, the proof below shows that the first four definitions are equivalent.
Proof. (1) \implies (3): By the layer cake representation and the change of variables t \mapsto t^p,
\begin{align} \operatorname{E} |X|^p &= \int_0^\infty \operatorname{P}(|X|^p \geq t)\, dt\\ &= \int_0^\infty pt^{p-1}\operatorname{P}(|X| \geq t)\, dt\\ &\leq 2\int_0^\infty pt^{p-1}\exp\left(-\frac{t^2}{K_1^2}\right) dt. \end{align}
After a further change of variables u = t^2/K_1^2, we find that
\begin{align} \operatorname{E} |X|^p &\leq 2K_1^p \frac{p}{2}\int_0^\infty u^{\frac{p}{2}-1}e^{-u}\, du\\ &= 2K_1^p \frac{p}{2}\Gamma\left(\frac{p}{2}\right)\\ &= 2K_1^p \Gamma\left(\frac{p}{2}+1\right). \end{align}
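For instance, taking p = 2 in this bound gives \operatorname{E}[X^2] \leq 2K_1^2 \Gamma(2) = 2K_1^2: the second moment of a subgaussian random variable is controlled by the square of its tail constant.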
(3) \implies (2): By the Taylor series e^x = 1 + \sum_{p=1}^\infty \frac{x^p}{p!} and condition (3) applied with exponent 2p,
\begin{align} \operatorname{E}[\exp{(\lambda X^2)}] &= 1 + \sum_{p=1}^\infty \frac{\lambda^p \operatorname{E}{[X^{2p}]}}{p!}\\ &\leq 1 + \sum_{p=1}^\infty \frac{2\lambda^p K_3^{2p} \Gamma\left(p+1\right)}{p!}\\ &= 1 + 2 \sum_{p=1}^\infty \lambda^p K_3^{2p}\\ &= 2 \sum_{p=0}^\infty (\lambda K_3^2)^p - 1\\ &= \frac{2}{1-\lambda K_3^2}-1 \quad\text{for } \lambda K_3^2 < 1, \end{align}
which is less than or equal to 2 if and only if \lambda \leq \frac{1}{3K_3^2} (at \lambda = \frac{1}{3K_3^2} the bound equals \frac{2}{1-1/3}-1 = 2 exactly). Let K_2 \geq 3^{\frac{1}{2}}K_3; then \operatorname{E}[\exp{(X^2/K_2^2)}] \leq 2, so \Vert X \Vert_{\psi_2} \leq \sqrt{3}\,K_3.
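To see this step in action, a Rademacher variable satisfies (3) with K_3 = 1, since \operatorname{E}|X|^p = 1 and 2\,\Gamma\left(\frac{p}{2}+1\right) \geq 2\,\Gamma\left(\frac{3}{2}\right) > 1 for p \geq 1; the step then gives \Vert X \Vert_{\psi_2} \leq \sqrt{3} \approx 1.73, compared with the exact value 1/\sqrt{\log 2} \approx 1.20.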
(2) \implies (1): By Markov's inequality,
\operatorname{P}(|X|\geq t) = \operatorname{P}\left( \exp\left(\frac{X^2}{K_2^2}\right) \geq \exp\left(\frac{t^2}{K_2^2}\right) \right) \leq \frac{\operatorname{E}[\exp{(X^2/K_2^2)}]}{\exp\left(\frac{t^2}{K_2^2}\right)} \leq 2 \exp\left(-\frac{t^2}{K_2^2}\right).
(3) \iff (4): By the asymptotic formula for the Gamma function,
\Gamma\left(\frac{p}{2} + 1\right) \sim \sqrt{\pi p} \left(\frac{p}{2e} \right)^{p/2}.
From the proof, we can extract a cycle of three inequalities:
• If \operatorname{P}(|X| \geq t) \leq 2 \exp{(-t^2/K^2)}, then \operatorname{E} |X|^p \leq 2K^p \Gamma\left(\frac{p}{2}+1\right) for all p \geq 1.
• If \operatorname{E} |X|^p \leq 2K^p \Gamma\left(\frac{p}{2}+1\right) for all p \geq 1, then \Vert X \Vert_{\psi_2} \leq \sqrt{3}\,K.
• If \Vert X \Vert_{\psi_2} \leq K, then \operatorname{P}(|X| \geq t) \leq 2 \exp{(-t^2/K^2)}.
In particular, the constants K provided by these definitions agree up to an absolute factor, so the definitions are equivalent up to a constant independent of X. Similarly, because \Gamma\left(\frac{p}{2} + 1\right) = p^{p/2} \left((2e)^{-1/2}p^{1/(2p)}\right)^p up to a positive multiplicative constant for all p \geq 1, definitions (3) and (4) are also equivalent up to a constant.
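For a concrete instance of this cycle, take X standard Gaussian. The Chernoff bound gives \operatorname{P}(|X| \geq t) \leq 2e^{-t^2/2}, so condition (1) holds with K_1 = \sqrt{2}, and the first two inequalities of the cycle yield \Vert X \Vert_{\psi_2} \leq \sqrt{3}\,K_1 = \sqrt{6} \approx 2.45, while the exact value is \sqrt{8/3} \approx 1.63: the constants agree up to a modest absolute factor, as claimed.
== Basic properties ==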