We will mostly consider functions defined on the domain \{-1,1\}^n. Sometimes it is more convenient to work with the domain \{0,1\}^n instead. If f is defined on \{-1,1\}^n, then the corresponding function defined on \{0,1\}^n is : f_{01}(x_1,\ldots,x_n) = f((-1)^{x_1},\ldots,(-1)^{x_n}). Similarly, for us a Boolean function is a \{-1,1\}-valued function, though often it is more convenient to consider \{0,1\}-valued functions instead.
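For example, if f(x_1,x_2) = x_1 x_2 on \{-1,1\}^2 (the \pm 1 encoding of XOR), then f_{01}(x_1,x_2) = (-1)^{x_1}(-1)^{x_2} = (-1)^{x_1+x_2}, which equals 1 exactly when x_1 \oplus x_2 = 0.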
===Fourier expansion===
Every real-valued function f\colon \{-1,1\}^n \to \mathbb{R} has a unique expansion as a multilinear polynomial: : f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x), \quad \chi_S(x) = \prod_{i \in S} x_i. (Note that even if the function is 0-1 valued this is not a sum mod 2, but just an ordinary sum of real numbers.) This is the Hadamard transform of the function f, which is the Fourier transform in the group \mathbb{Z}_2^n. The coefficients \hat{f}(S) are known as Fourier coefficients, and the entire sum is known as the Fourier expansion of f. The functions \chi_S are known as Fourier characters, and they form an orthonormal basis for the space of all functions over \{-1,1\}^n, with respect to the inner product \langle f,g \rangle = 2^{-n} \sum_{x \in \{-1,1\}^n} f(x) g(x). The Fourier coefficients can be calculated using an inner product: : \hat{f}(S) = \langle f, \chi_S \rangle. In particular, this shows that \hat{f}(\emptyset) = \operatorname{E}[f], where the expected value is taken with respect to the uniform distribution over \{-1,1\}^n.
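Since the characters are orthonormal, the coefficients can be computed by brute force directly from the inner-product formula. The following is a minimal Python sketch (all helper names are our own, not a standard API), applied to the majority function on three bits:
<syntaxhighlight lang="python">
import itertools

def chi(S, x):
    """Fourier character chi_S(x) = prod_{i in S} x_i (empty product = 1)."""
    result = 1
    for i in S:
        result *= x[i]
    return result

def fourier_coefficient(f, S, n):
    """f_hat(S) = <f, chi_S> = 2^{-n} * sum over x in {-1,1}^n of f(x) * chi_S(x)."""
    return sum(f(x) * chi(S, x) for x in itertools.product([-1, 1], repeat=n)) / 2 ** n

# Majority on 3 bits; inputs are tuples over {-1,1}.  (The {0,1}^n version f_01
# would instead take x_i in {0,1} and evaluate f at ((-1)^{x_1}, ..., (-1)^{x_n}).)
maj = lambda x: 1 if sum(x) > 0 else -1

n = 3
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        c = fourier_coefficient(maj, S, n)
        if c != 0:
            print(S, c)
# Prints 0.5 on each singleton and -0.5 on (0, 1, 2);
# f_hat(emptyset) = E[maj] = 0, so the empty set is not printed.
</syntaxhighlight>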
Parseval's identity states that : \|f\|_2^2 = \operatorname{E}[f^2] = \sum_S \hat{f}(S)^2. Restricting the sum to non-empty S gives the variance of f: : \operatorname{Var}[f] = \sum_{S \neq \emptyset} \hat{f}(S)^2.
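As a worked example, the majority function on three bits has Fourier expansion \operatorname{Maj}_3 = \frac{1}{2}x_1 + \frac{1}{2}x_2 + \frac{1}{2}x_3 - \frac{1}{2}x_1x_2x_3, so Parseval's identity gives \operatorname{E}[\operatorname{Maj}_3^2] = 3 \cdot \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^2 = 1, as it must for any \{-1,1\}-valued function; since \hat{f}(\emptyset) = 0, also \operatorname{Var}[\operatorname{Maj}_3] = 1.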
===Fourier degree and Fourier levels===
The degree of a function f\colon \{-1,1\}^n \to \mathbb{R} is the maximum d such that \hat{f}(S) \neq 0 for some set S of size d. In other words, the degree of f is its degree as a multilinear polynomial. It is convenient to decompose the Fourier expansion into levels: the Fourier coefficient \hat{f}(S) is on level |S|. The degree d part of f is : f^{=d} = \sum_{|S|=d} \hat{f}(S) \chi_S.

===Noise stability===
Given -1 \leq \rho \leq 1, the noise stability of f at \rho is : \operatorname{Stab}_\rho[f] = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S)^2 = \operatorname{E}[f(x)f(y)], where x is uniform and y \sim N_\rho(x) is \rho-correlated with x: independently for each i, y_i = x_i with probability \frac{1+\rho}{2} and y_i = -x_i otherwise. For 0 \leq \delta \leq 1, the noise sensitivity of f at \delta is : \operatorname{NS}_\delta[f] = \frac{1}{2} - \frac{1}{2} \operatorname{Stab}_{1-2\delta}[f]. If f is Boolean, then this is the probability that the value of f changes if we flip each coordinate with probability \delta, independently.
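As a sanity check, the spectral formula can be compared against direct sampling; a minimal Python sketch (helper names our own), using the fact that \operatorname{Maj}_3 has Fourier weight 3/4 on level 1 and 1/4 on level 3:
<syntaxhighlight lang="python">
import random

# Noise sensitivity of 3-bit majority at delta, computed two ways.
maj = lambda x: 1 if sum(x) > 0 else -1
delta = 0.1

# Spectral formula: Stab_rho = (3/4) * rho + (1/4) * rho^3 with rho = 1 - 2*delta.
rho = 1 - 2 * delta
print("NS via Fourier weights:", 0.5 - 0.5 * (0.75 * rho + 0.25 * rho ** 3))

# Monte Carlo: flip each coordinate independently with probability delta.
trials = 200_000
changed = 0
for _ in range(trials):
    x = [random.choice([-1, 1]) for _ in range(3)]
    y = [-xi if random.random() < delta else xi for xi in x]
    changed += maj(x) != maj(y)
print("NS via sampling:", changed / trials)
</syntaxhighlight>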
===Noise operator===
The noise operator T_\rho is an operator taking a function f\colon \{-1,1\}^n \to \mathbb{R} and returning another function T_\rho f\colon \{-1,1\}^n \to \mathbb{R} given by : (T_\rho f)(x) = \operatorname{E}_{y \sim N_\rho(x)}[f(y)] = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S) \chi_S(x). When \rho > 0, the noise operator can also be defined using a continuous-time Markov chain in which each bit is flipped independently at rate 1. The operator T_\rho corresponds to running this Markov chain for time \frac{1}{2}\log\frac{1}{\rho} starting at x, and taking the average value of f at the final state. This Markov chain is generated by the Laplacian of the Hamming graph, and this relates total influence to the noise operator. Noise stability can be expressed in terms of the noise operator: \operatorname{Stab}_\rho[f] = \langle f, T_\rho f \rangle.
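The two descriptions of T_\rho — averaging f over noisy inputs, and damping each level-|S| coefficient by \rho^{|S|} — agree, and the identity \operatorname{Stab}_\rho[f] = \langle f, T_\rho f \rangle can be checked numerically on small examples. A minimal Python sketch (helper names our own):
<syntaxhighlight lang="python">
import itertools
from math import prod

n = 3
points = list(itertools.product([-1, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
f = {x: (1 if sum(x) > 0 else -1) for x in points}  # Maj_3 as a lookup table

def fhat(S):
    """Fourier coefficient via the inner product with chi_S."""
    return sum(f[x] * prod(x[i] for i in S) for x in points) / 2 ** n

def T(rho, x):
    """(T_rho f)(x) = sum_S rho^{|S|} f_hat(S) chi_S(x)."""
    return sum(rho ** len(S) * fhat(S) * prod(x[i] for i in S) for S in subsets)

rho = 0.6
lhs = sum(f[x] * T(rho, x) for x in points) / 2 ** n      # <f, T_rho f>
rhs = sum(rho ** len(S) * fhat(S) ** 2 for S in subsets)  # Stab_rho[f]
print(lhs, rhs)  # both 0.504 for Maj_3 at rho = 0.6
</syntaxhighlight>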
===Hypercontractivity===
For 1 \leq q < \infty, the L_q-norm of a function f\colon \{-1,1\}^n \to \mathbb{R} is defined by : \|f\|_q = \sqrt[q]{\operatorname{E}[|f|^q]}. We also define \|f\|_\infty = \max_{x \in \{-1,1\}^n} |f(x)|. The hypercontractivity theorem states that for all p > q > 1, if |\rho| \leq \sqrt{\frac{q-1}{p-1}} then : \|T_\rho f\|_p \leq \|f\|_q. Hypercontractivity is closely related to the logarithmic Sobolev inequalities of functional analysis. A similar result for 1 > p > q, which holds for nonnegative functions, is known as reverse hypercontractivity. It states that if |\rho| \leq \sqrt{\frac{1-p}{1-q}} then : \|T_\rho f\|_q \geq \|f\|_p.
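For instance, with q = 2 and p = 4 the theorem asserts \|T_{1/\sqrt{3}} f\|_4 \leq \|f\|_2 for every f. The following Python sketch (helper names our own; a numerical check, not a proof) searches for counterexamples among random functions on \{-1,1\}^3:
<syntaxhighlight lang="python">
import itertools, random
from math import prod

n = 3
points = list(itertools.product([-1, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]

def norm(g, p):
    """L_p norm with respect to the uniform measure on {-1,1}^n."""
    return (sum(abs(g[x]) ** p for x in points) / 2 ** n) ** (1 / p)

rho = 3 ** -0.5  # the critical rho = sqrt((q-1)/(p-1)) for q = 2, p = 4
for _ in range(1000):
    f = {x: random.gauss(0, 1) for x in points}
    fh = {S: sum(f[x] * prod(x[i] for i in S) for x in points) / 2 ** n
          for S in subsets}
    Tf = {x: sum(rho ** len(S) * fh[S] * prod(x[i] for i in S) for S in subsets)
          for x in points}
    assert norm(Tf, 4) <= norm(f, 2) + 1e-9
print("no counterexample found")
</syntaxhighlight>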
===p-Biased analysis===
In many situations the input to the function is not uniformly distributed over \{-1,1\}^n, but instead has a bias toward -1 or 1. In these situations it is customary to consider functions over the domain \{0,1\}^n. For 0 < p < 1, the p-biased measure \mu_p is given by : \mu_p(x) = p^{\sum_i x_i} (1-p)^{\sum_i (1-x_i)}. This measure can be generated by choosing each coordinate independently to be 1 with probability p and 0 with probability 1-p. The classical Fourier characters are no longer orthogonal with respect to this measure. Instead, we use the following characters: : \omega_S(x) = \left(\sqrt{\frac{p}{1-p}}\right)^{\#\{i \in S : x_i = 0\}} \left(-\sqrt{\frac{1-p}{p}}\right)^{\#\{i \in S : x_i = 1\}}. The p-biased Fourier expansion of f is the expansion of f as a linear combination of the p-biased characters: : f = \sum_{S \subseteq [n]} \hat{f}(S) \omega_S. We can extend the definitions of influence and the noise operator to the p-biased setting by using their spectral definitions.
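Note that sign conventions for \omega_S vary between authors; with the convention above, orthonormality of the characters with respect to \mu_p can be verified directly. A minimal Python sketch (helper names our own):
<syntaxhighlight lang="python">
import itertools
from math import prod, sqrt

n, p = 3, 0.3
points = list(itertools.product([0, 1], repeat=n))
mu = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in points}  # mu_p

def omega(S, x):
    """p-biased character omega_S(x) with the sign convention above."""
    return prod(sqrt(p / (1 - p)) if x[i] == 0 else -sqrt((1 - p) / p)
                for i in S)

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
for S in subsets:
    for T in subsets:
        inner = sum(mu[x] * omega(S, x) * omega(T, x) for x in points)
        assert abs(inner - (1.0 if S == T else 0.0)) < 1e-9
print("the omega_S are orthonormal with respect to mu_p")
</syntaxhighlight>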
====Influence====
The influence of the i-th coordinate is given by : \operatorname{Inf}_i[f] = \sum_{S \ni i} \hat{f}(S)^2 = p(1-p) \operatorname{E}[(f-f^{\oplus i})^2], where f^{\oplus i} denotes f with its i-th input coordinate flipped. The total influence is the sum of the individual influences: : \operatorname{Inf}[f] = \sum_{i=1}^n \operatorname{Inf}_i[f] = \sum_{S} |S| \hat{f}(S)^2.
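The two expressions for the influence can be checked against each other on a small example, such as the OR function on three bits; a Python sketch (helper names our own):
<syntaxhighlight lang="python">
import itertools
from math import prod, sqrt

n, p, i = 3, 0.3, 0
points = list(itertools.product([0, 1], repeat=n))
mu = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in points}
f = {x: max(x) for x in points}  # OR on 3 bits, {0,1}-valued

def omega(S, x):
    return prod(sqrt(p / (1 - p)) if x[j] == 0 else -sqrt((1 - p) / p)
                for j in S)

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
fhat = {S: sum(mu[x] * f[x] * omega(S, x) for x in points) for S in subsets}

flip = lambda x: x[:i] + (1 - x[i],) + x[i + 1:]
spectral = sum(fhat[S] ** 2 for S in subsets if i in S)
combinatorial = p * (1 - p) * sum(mu[x] * (f[x] - f[flip(x)]) ** 2 for x in points)
print(spectral, combinatorial)  # both equal p(1-p)^3 = 0.1029
</syntaxhighlight>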
====Noise operator====
A pair of \rho-correlated random variables can be obtained by choosing x,z \sim \mu_p independently and y \sim N_\rho(x), where N_\rho is given by : y_i = \begin{cases} x_i & \text{w.p. } \rho, \\ z_i & \text{w.p. } 1-\rho. \end{cases} The noise operator is then given by : (T_\rho f)(x) = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S) \omega_S(x) = \operatorname{E}_{y \sim N_\rho(x)} [f(y)]. Using this we can define noise stability and noise sensitivity, as before.
====Russo–Margulis formula====
The Russo–Margulis formula (also called the Margulis–Russo formula) states that for monotone Boolean functions f\colon \{0,1\}^n \to \{0,1\}, : \frac{d}{dp} \operatorname{E}_{x \sim \mu_p} [f(x)] = \frac{\operatorname{Inf}[f]}{p(1-p)} = \sum_{i=1}^n \Pr[f \neq f^{\oplus i}]. Both the influence and the probabilities are taken with respect to \mu_p, and the right-hand side is the average sensitivity of f. If we think of f as a property, then the formula states that as p varies, the derivative of the probability that f occurs at p equals the average sensitivity at p. The Russo–Margulis formula is key for proving sharp threshold theorems such as Friedgut's theorem.
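The formula can be verified numerically for a small monotone function by comparing a finite-difference derivative with the average sensitivity; a Python sketch for \operatorname{Maj}_3 (helper names our own):
<syntaxhighlight lang="python">
import itertools

n = 3
points = list(itertools.product([0, 1], repeat=n))
f = lambda x: 1 if sum(x) >= 2 else 0  # monotone: Maj_3 on {0,1}^3

def mu(x, p):
    return p ** sum(x) * (1 - p) ** (n - sum(x))

def expectation(p):
    return sum(mu(x, p) * f(x) for x in points)

p, h = 0.4, 1e-6
lhs = (expectation(p + h) - expectation(p - h)) / (2 * h)  # d/dp E[f]

flip = lambda x, i: x[:i] + (1 - x[i],) + x[i + 1:]
rhs = sum(mu(x, p) * sum(f(x) != f(flip(x, i)) for i in range(n))
          for x in points)  # average sensitivity at mu_p
print(lhs, rhs)  # both 6p(1-p) = 1.44 for Maj_3
</syntaxhighlight>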
===Gaussian space===
One of the deepest results in the area, the invariance principle, connects the distribution of functions on the Boolean cube \{-1,1\}^n to their distribution on Gaussian space, which is the space \mathbb{R}^n endowed with the standard n-dimensional Gaussian measure. Many of the basic concepts of Fourier analysis on the Boolean cube have counterparts in Gaussian space:
* The counterpart of the Fourier expansion in Gaussian space is the Hermite expansion, an expansion into an infinite sum (converging in L^2) of multivariate Hermite polynomials.
* The counterpart of total influence or average sensitivity, for the indicator function of a set, is Gaussian surface area, which is the Minkowski content of the boundary of the set.
* The counterpart of the noise operator is the Ornstein–Uhlenbeck operator (related to the Mehler transform), given by (U_\rho f)(x) = \operatorname{E}_{z \sim N(0, I_n)}[f(\rho x + \sqrt{1-\rho^2}z)], or alternatively by (U_\rho f)(x) = \operatorname{E}[f(y)], where x,y is a pair of \rho-correlated standard Gaussians; see the sketch after this list.
* Hypercontractivity holds (with appropriate parameters) in Gaussian space as well.
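To illustrate, in one dimension U_\rho applied to the sign function has the closed form (U_\rho \operatorname{sign})(x) = \operatorname{erf}\left(\rho x / \sqrt{2(1-\rho^2)}\right), which a Monte Carlo estimate reproduces; a Python sketch (helper names our own):
<syntaxhighlight lang="python">
import random
from math import erf, sqrt

# (U_rho f)(x) = E_z[f(rho * x + sqrt(1 - rho^2) * z)] with z ~ N(0, 1),
# applied to f = sign in one dimension.
rho, x, trials = 0.7, 1.2, 200_000
mc = sum(1 if rho * x + sqrt(1 - rho ** 2) * random.gauss(0, 1) > 0 else -1
         for _ in range(trials)) / trials
exact = erf(rho * x / sqrt(2 * (1 - rho ** 2)))
print(mc, exact)  # agree up to sampling error
</syntaxhighlight>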
Gaussian space is more symmetric than the Boolean cube (for example, it is rotation-invariant) and supports continuous arguments that may be harder to carry out in the discrete setting of the Boolean cube. The invariance principle links the two settings, allowing results on the Boolean cube to be deduced from results on Gaussian space.

==Basic results==