Symmetrization Lemma: Let V=\{x\in X^m:|Q_P(h)-\widehat{Q}_x(h)|\geq\varepsilon \text{ for some } h\in H\} and

: R=\{(r,s)\in X^m \times X^m:|\widehat{Q}_r(h)-\widehat{Q}_s(h)|\geq \varepsilon/2 \text{ for some } h\in H\}.

Then for m\geq\frac{2}{\varepsilon^2}, P^m(V)\leq 2P^{2m}(R).

Proof: By the
triangle inequality, if |Q_P(h)-\widehat{Q}_r(h)|\geq\varepsilon and |Q_P(h)-\widehat{Q}_s(h)|\leq\varepsilon/2 then |\widehat{Q}_r(h)-\widehat{Q}_s(h)|\geq\varepsilon/2. Therefore,

: \begin{align} & P^{2m}(R) \\[5pt] \geq {} & P^{2m}\{\exists h\in H, |Q_P(h)-\widehat{Q}_r(h)| \geq \varepsilon \text{ and } |Q_P(h)-\widehat{Q}_s(h)|\leq\varepsilon/2\} \\[5pt] = {} & \int_V P^m\{s:\exists h\in H, |Q_P(h)-\widehat{Q}_r(h)|\geq\varepsilon \text{ and } |Q_P(h)-\widehat{Q}_s(h)|\leq\varepsilon/2\} \, dP^m(r) \\[5pt] = {} & A, \end{align}

since r and s are independent. Now for each r\in V fix an h\in H such that |Q_P(h)-\widehat{Q}_r(h)|\geq\varepsilon. For this h, we shall show that

: P^m \left\{ |Q_P(h)-\widehat{Q}_s(h)|\leq\frac{\varepsilon}{2}\right\} \geq\frac{1}{2}.

It then follows that A\geq\frac{P^m(V)}{2}, and hence P^{2m}(R)\geq\frac{P^m(V)}{2}, which completes the first step of the high-level argument. Notice that m\cdot \widehat{Q}_s(h) is a binomial
random variable with expectation m\cdot Q_{P}(h) and variance m\cdot Q_P(h)(1-Q_P(h)). By
Chebyshev's inequality we get

: P^m \left\{ |Q_P(h)-\widehat{Q}_s(h)| > \frac{\varepsilon}{2} \right\} \leq \frac{m\,Q_P(h)(1-Q_P(h))}{(\varepsilon m/2)^2} \leq \frac{1}{\varepsilon^2 m} \leq \frac{1}{2},

where the second inequality uses Q_P(h)(1-Q_P(h))\leq\frac{1}{4} and the third uses m\geq\frac{2}{\varepsilon^2}. This proves the Symmetrization Lemma.

Permutation Lemma: Let \Gamma_m be the set of all permutations of \{1,2,\ldots,2m\} that swap i and m+i, for each i in some subset of \{1,2,\ldots,m\}. Then for any R\subseteq X^{2m},

: P^{2m}(R)\leq\max_{x\in X^{2m}}(\Pr[\sigma(x)\in R]),

where the probability is over \sigma drawn uniformly from \Gamma_m.

Proof: Since coordinate permutations preserve the product measure P^{2m}, we have

: \begin{align} P^{2m}(R) = {} & \frac{1}{|\Gamma_m|}\sum_{\sigma\in\Gamma_m} \int_{X^{2m}} 1_R(\sigma(x)) \, dP^{2m}(x) \\[5pt] = {} & \int_{X^{2m}} \frac{1}{|\Gamma_m|}\sum_{\sigma\in\Gamma_m} 1_R (\sigma(x)) \, dP^{2m}(x) \quad \text{(because } |\Gamma_{m}| \text{ is finite)} \\[5pt] = {} & \int_{X^{2m}} \Pr[\sigma(x)\in R] \, dP^{2m}(x) \quad \text{(the expectation)} \\[5pt] \leq {} & \max_{x\in X^{2m}}(\Pr[\sigma(x)\in R]). \end{align}

The maximum is guaranteed to exist, since there is only a finite set of values that the probability under a random permutation can take.
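The Permutation Lemma can be verified exactly for tiny instances. The sketch below (not part of the original proof) takes m = 2, X = \{0,1\} with the uniform product measure, and a hypothetical event R, then checks that P^{2m}(R) equals the integral of \Pr[\sigma(x)\in R] and is bounded by its maximum; the names `in_R` and `pr_sigma` are illustrative choices.

```python
from itertools import product

# Exact sanity check of the Permutation Lemma for m = 2, X = {0, 1},
# with the uniform product measure P^{2m} on X^{2m}.
m = 2
points = list(product([0, 1], repeat=2 * m))  # all of X^{2m}
weight = 1.0 / len(points)                    # uniform product measure

# Gamma_m: permutations that swap i and m+i for i in a subset of {0, ..., m-1}.
gamma = []
for mask in range(2 ** m):
    perm = list(range(2 * m))
    for i in range(m):
        if mask >> i & 1:
            perm[i], perm[m + i] = perm[m + i], perm[i]
    gamma.append(perm)

# A hypothetical event R: the first-half sum strictly exceeds the second-half sum.
def in_R(x):
    return sum(x[:m]) > sum(x[m:])

def pr_sigma(x):
    """Pr[sigma(x) in R] over sigma drawn uniformly from Gamma_m."""
    return sum(in_R(tuple(x[j] for j in perm)) for perm in gamma) / len(gamma)

p_R = sum(weight for x in points if in_R(x))     # P^{2m}(R)
avg = sum(weight * pr_sigma(x) for x in points)  # integral of Pr[sigma(x) in R]
mx = max(pr_sigma(x) for x in points)            # max over x in X^{2m}

print(p_R, avg, mx)  # P^{2m}(R) equals the integral and is at most the max
```

Because the product measure is invariant under coordinate permutations, the first two printed values coincide, and the third dominates them, exactly as the lemma asserts.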
Reduction to a finite class Lemma: Building on the previous lemma,

: \max_{x\in X^{2m}}(\Pr[\sigma(x)\in R])\leq 4\Pi_H(2m)e^{-\varepsilon^2 m/8}.

Proof: Let x=(x_1,x_2,\ldots,x_{2m}) and let t=|H|_x|, which is at most \Pi_H(2m). This means there are functions h_1,h_2,\ldots,h_t\in H such that for any h\in H there is an i with 1\leq i\leq t such that h_i(x_k)=h(x_k) for 1\leq k\leq 2m. We see that \sigma(x)\in R iff some h\in H satisfies

: \left|\frac{1}{m}|\{1\leq i\leq m:h(x_{\sigma_i})=1\}|-\frac{1}{m}|\{m+1\leq i\leq 2m:h(x_{\sigma_i})=1\}|\right|\geq\frac{\varepsilon}{2}.

Hence, if we define w^j_i=1 if h_j(x_i)=1 and w^j_i=0 otherwise, for 1\leq i\leq 2m and 1\leq j\leq t, then \sigma(x)\in R iff some j\in\{1,\ldots,t\} satisfies \left|\frac{1}{m}\left(\sum_i w^j_{\sigma_i}-\sum_i w^j_{\sigma_{m+i}}\right)\right|\geq\frac{\varepsilon}{2}. By the union bound we get

: \Pr[\sigma(x)\in R]\leq t\cdot \max_j\left(\Pr\left[\left|\frac{1}{m}\left(\sum_i w^j_{\sigma_i} - \sum_i w^j_{\sigma_{m+i}}\right)\right| \geq \frac{\varepsilon}{2}\right]\right) \leq \Pi_{H}(2m)\cdot \max_j\left(\Pr\left[ \left| \frac{1}{m} \left(\sum_i w^j_{\sigma_i}-\sum_i w^j_{\sigma_{m+i}}\right)\right| \geq \frac{\varepsilon}{2} \right] \right).

Since the distribution over the permutations \sigma is uniform, for each i the difference w^j_{\sigma_i}-w^j_{\sigma_{m+i}} equals \pm |w^j_i-w^j_{m+i}| with equal probability, independently across i. Thus

: \Pr\left[\left|\frac{1}{m} \left(\sum_i \left(w^j_{\sigma_i}-w^j_{\sigma_{m+i}}\right)\right)\right|\geq\frac{\varepsilon}{2}\right] = \Pr\left[ \left| \frac{1}{m} \left( \sum_i|w^j_i-w^j_{m+i}|\beta_i\right)\right|\geq\frac{\varepsilon}{2}\right],

where the probability on the right is over independent random signs \beta_i\in\{-1,+1\}, both values equally likely. By
Hoeffding's inequality, this is at most 2e^{-m\varepsilon^2/8}. Finally, combining the three parts of the proof, we obtain the
Uniform Convergence Theorem.

==References==