Min-max theorem

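Let A be an n \times n Hermitian matrix. As with many other variational results on eigenvalues, one considers the Rayleigh–Ritz quotient R_A : \mathbb{C}^n \setminus \{0\} \to \mathbb{R} defined by
:R_A(x) = \frac{(Ax, x)}{(x,x)}
where (\cdot, \cdot) denotes the Euclidean inner product on \mathbb{C}^n. The Rayleigh quotient of an eigenvector v is its associated eigenvalue \lambda, because R_A(v) = (\lambda v, v)/(v, v) = \lambda. For a Hermitian matrix A, the range of the continuous function R_A is a compact interval [a, b] of the real line, since R_A(x) = R_A(x/\|x\|) and the unit sphere is compact and connected. The maximum b and the minimum a are the largest and smallest eigenvalues of A, respectively. The min-max theorem is a refinement of this fact.
Min-max theorem
Let A be Hermitian on an inner product space V with dimension n, with spectrum ordered in descending order \lambda_1 \geq \dots \geq \lambda_n. Let v_1, \dots, v_n be the corresponding unit-length, mutually orthogonal eigenvectors. Reverse the spectrum ordering, so that \xi_1 = \lambda_n, \dots, \xi_n = \lambda_1.
{{Math theorem|name=Poincaré's inequality|note=|math_statement= Let M be a subspace of V with dimension k. Then there exist unit vectors x, y \in M such that \langle x,Ax \rangle \leq \lambda_k and \langle y,Ay \rangle \geq \xi_k. }}
{{Math proof|title=Proof|proof= The second inequality follows from the first applied to -A. For the first, let N := \operatorname{span}(v_k, \dots, v_n), which has dimension n-k+1. Since \dim M + \dim N = n+1 > n, the subspaces M and N intersect in at least a line. Take a unit vector x \in M\cap N. Since x\in N,
: x = \sum_{i=k}^n a_i v_i,
and since \sum_{i=k}^n |a_i|^2 = 1, we find \langle x,Ax \rangle = \sum_{i=k}^n |a_i|^2\lambda_i \leq \lambda_k. }}
{{Math theorem|name=Min-max theorem|note=|math_statement= \begin{aligned} \lambda_k &=\max _{\begin{array}{c} \mathcal{M} \subset V \\ \operatorname{dim}(\mathcal{M})=k \end{array}} \min _{\begin{array}{c} x \in \mathcal{M} \\ \|x\|=1 \end{array}}\langle x, A x\rangle\\ &=\min _{\begin{array}{c} \mathcal{M} \subset V \\ \operatorname{dim}(\mathcal{M})=n-k+1 \end{array}} \max _{\begin{array}{c} x \in \mathcal{M} \\ \|x\|=1 \end{array}}\langle x, A x\rangle \text{. } \end{aligned} }}
{{Math proof|title=Proof|proof= By Poincaré's inequality, every k-dimensional \mathcal{M} contains a unit x with \langle x,Ax \rangle \leq \lambda_k, so the max-min is at most \lambda_k; taking \mathcal{M} = \operatorname{span}(v_1, \dots, v_k) attains \lambda_k. Likewise, every (n-k+1)-dimensional \mathcal{M} contains a unit y with \langle y,Ay \rangle \geq \xi_{n-k+1} = \lambda_k, so the min-max is at least \lambda_k; taking \mathcal{M} = \operatorname{span}(v_k, \dots, v_n) attains \lambda_k. }}
Define the partial trace tr_V(A) to be the trace of the compression P_V A|_V of A onto V, where P_V is the orthogonal projection onto V. It equals \sum_i u_i^*Au_i for any orthonormal basis u_1, \dots, u_m of V.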
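The variational statements above can be checked numerically. The following is a minimal pure-Python sketch, assuming real symmetric matrices (a special case of Hermitian ones); the helper names `jacobi_eigh`, `orthonormalize`, `min_on_subspace` and `partial_trace` are hypothetical, and the eigenvalues come from a textbook cyclic Jacobi iteration rather than a library routine. It relies on the standard fact that, for an orthonormal basis u_1, \dots, u_k of a subspace M, the minimum of \langle x, Ax\rangle over unit x \in M is the smallest eigenvalue of the k \times k matrix (\langle u_i, Au_j\rangle). The demo checks that a random k-dimensional M never beats \lambda_k and that \operatorname{span}(v_1, \dots, v_k) attains it.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def rayleigh(A, x):
    """Rayleigh quotient R_A(x) = (Ax, x) / (x, x) for real symmetric A and x != 0."""
    return dot(matvec(A, x), x) / dot(x, x)


def jacobi_eigh(A, max_sweeps=50, tol=1e-20):
    """Cyclic Jacobi eigensolver for a small real symmetric matrix.

    Returns (lam, vecs) with lam[0] >= lam[1] >= ... and vecs[i] a unit
    eigenvector for lam[i]; the eigenvectors are mutually orthogonal.
    """
    n = len(A)
    a = [[float(x) for x in row] for row in A]
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulates rotations
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for m in (a, v):  # right-multiply by the rotation (columns p, q)
                    for row in m:
                        rp, rq = row[p], row[q]
                        row[p], row[q] = c * rp - s * rq, s * rp + c * rq
                ap, aq = a[p], a[q]  # left-multiply by its transpose (rows p, q)
                a[p] = [c * x - s * y for x, y in zip(ap, aq)]
                a[q] = [s * x + c * y for x, y in zip(ap, aq)]
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[r][i] for r in range(n)] for i in order]


def orthonormalize(vectors):
    """Gram-Schmidt orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for w in vectors:
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def min_on_subspace(A, basis):
    """min of (x, Ax) over unit x in span(basis): smallest eigenvalue of the compression."""
    Au = [matvec(A, u) for u in basis]
    compressed = [[dot(ui, Auj) for Auj in Au] for ui in basis]
    return jacobi_eigh(compressed)[0][-1]


def partial_trace(A, basis):
    """tr_V(A) = sum_i u_i^T A u_i for an orthonormal basis u_1, ..., u_m of V."""
    return sum(dot(u, matvec(A, u)) for u in basis)


# Demo on a random real symmetric 5x5 matrix.
random.seed(1)
n, k = 5, 3
B = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
A = [[(B[i][j] + B[j][i]) / 2 for j in range(n)] for i in range(n)]
lam, vecs = jacobi_eigh(A)
M = orthonormalize([[random.gauss(0, 1) for _ in range(n)] for _ in range(k)])
print(min_on_subspace(A, M) <= lam[k - 1] + 1e-9)             # bound for a random k-dim M
print(abs(min_on_subspace(A, vecs[:k]) - lam[k - 1]) < 1e-9)  # attained on span(v_1..v_k)
```

The Jacobi loop is only there so the sketch has no third-party dependencies; in practice a library eigensolver such as NumPy's `numpy.linalg.eigh` would be the usual choice.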
{{Math theorem|name=Wielandt minimax formula|note=|math_statement= Let 1 \leq i_1 < \dots < i_k \leq n be integers. Define a partial flag to be a nested collection V_1 \subset \cdots \subset V_k of subspaces of \mathbb{C}^n such that \operatorname{dim}\left(V_j\right)=i_j for all 1 \leq j \leq k. Define the associated Schubert variety X\left(V_1, \ldots, V_k\right) to be the collection of all k-dimensional subspaces W such that \operatorname{dim}\left(W \cap V_j\right) \geq j for all 1 \leq j \leq k. Then
: \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)=\sup _{V_1, \ldots, V_k} \inf_{W \in X\left(V_1, \ldots, V_k\right)} tr_W(A). }}
{{Math proof|title=Proof|proof= Throughout, e_1, \dots, e_n is an orthonormal eigenbasis of A with Ae_i = \lambda_i(A) e_i.
The \leq case. Let V_j = \operatorname{span}(e_1, \dots, e_{i_j}). For any W \in X\left(V_1, \ldots, V_k\right), it remains to show that
: \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A) \leq tr_W(A).
To show this, we construct an orthonormal basis v_1, \dots, v_k of W such that v_j \in V_j \cap W. Then
: tr_W(A) = \sum_j \langle v_j, Av_j\rangle \geq \sum_j \lambda_{i_j}(A),
because each v_j is a unit vector in \operatorname{span}(e_1, \dots, e_{i_j}). Since \dim(V_1 \cap W) \geq 1, we pick any unit v_1 \in V_1 \cap W. Next, since \dim(V_2 \cap W) \geq 2, we pick any unit v_2 \in V_2 \cap W that is perpendicular to v_1, and so on.
The \geq case. For any partial flag V_1, \dots, V_k, we must find some W \in X\left(V_1, \ldots, V_k\right) such that
: \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A) \geq tr_W(A).
We prove this by induction on n. The base case n=1 is a special case of the Courant-Fischer theorem. Assume now n \geq 2.
If i_1 \geq 2, we can apply the induction hypothesis. Let E = \operatorname{span}(e_{i_1}, \dots, e_n). We construct a partial flag within E from the intersections of E with V_1, \dots, V_k. We begin by picking an (i_k-(i_1-1))-dimensional subspace W_k' \subset E \cap V_k, which exists by counting dimensions; it has codimension i_1-1 within V_k. Then we go down by one space and pick an (i_{k-1} - (i_1 - 1))-dimensional subspace W_{k-1}' \subset W_k' \cap V_{k-1}, which again exists by counting dimensions, and so on.
Now, since \dim(E) \leq n-1, the induction hypothesis gives some W \in X(W_1', \dots, W_k') such that
: \lambda_{i_1 - (i_1-1)}(A|_E)+\cdots+\lambda_{i_k- (i_1-1)}(A|_E) \geq tr_W(A).
Here \lambda_{i_j - (i_1-1)}(A|_E) is the (i_j-(i_1-1))-th eigenvalue of A orthogonally projected onto E. By the Cauchy interlacing theorem, \lambda_{i_j - (i_1-1)}(A|_E) \leq \lambda_{i_j}(A). Since X(W_1', \dots, W_k')\subset X(V_1, \dots, V_k), we are done.
If i_1 = 1, we perform a similar construction. Let E = \operatorname{span}(e_2, \dots, e_n). If V_k \subset E, then we can induct. Otherwise, we construct a partial flag W_2, \dots, W_k within E in the same way. By induction, there exists some W' \in X(W_2, \dots, W_k)\subset X(V_2, \dots, V_k) such that
: \lambda_{i_2-1}(A|_E)+\cdots+\lambda_{i_k-1}(A|_E) \geq tr_{W'}(A),
and thus
: \lambda_{i_2}(A)+\cdots+\lambda_{i_k}(A) \geq tr_{W'}(A).
It remains to find some v such that W' \oplus v \in X(V_1, \dots, V_k): any such W = W' \oplus v satisfies tr_W(A) = tr_{W'}(A) + \langle u, Au\rangle \leq tr_{W'}(A) + \lambda_1(A), where u is a unit vector in W orthogonal to W', which gives the claim because i_1 = 1. If V_1 \not\subset W', then any v \in V_1 \setminus W' works. Otherwise, if V_2 \not\subset W', then any v \in V_2 \setminus W' works, and so on. If none of these works, then V_k \subset W' \subset E, a contradiction. }} This has some corollaries: {{Math theorem|name=Extremal partial trace|note=|math_statement=
: \lambda_1(A)+\dots+\lambda_k(A)=\sup_{\operatorname{dim}(V)=k }tr_V(A)
: \xi_1(A)+\dots+\xi_k(A)=\inf_{\operatorname{dim}(V)=k }tr_V(A) }}
{{Math theorem|name=Corollary|note=|math_statement= The sum \lambda_1(A)+\dots+\lambda_k(A) is a convex function of A, and \xi_1(A)+\dots+\xi_k(A) is a concave function of A.
(Schur-Horn inequality) For any k distinct indices i_1 < \dots < i_k,
: \xi_1(A)+\dots+\xi_k(A) \leq a_{i_1,i_1} + \dots + a_{i_k,i_k} \leq \lambda_1(A)+\dots+\lambda_k(A).
Equivalently, the diagonal vector of A is majorized by its eigenspectrum.
}} {{Math theorem|name=Schatten-norm Hölder inequality|note=|math_statement= Given Hermitian A, B and a Hölder pair 1/p + 1/q = 1,
: |\operatorname{tr}(A B)| \leq\|A\|_{S^p}\|B\|_{S^q}. }} {{Math proof|title=Proof|proof= Since the trace and the Schatten norms are invariant under unitary conjugation, we may assume without loss of generality that B is diagonal. Then we need to show
: \left|\sum_i B_{ii} A_{ii} \right| \leq \|A \|_{S^p} \|(B_{ii})\|_{l^q}.
By the standard Hölder inequality, it suffices to show
: \|(A_{ii})\|_{l^p}\leq \|A \|_{S^p}.
By the Schur-Horn inequality, the diagonal of A is majorized by the eigenspectrum of A. Since the map f(x_1, \dots, x_n) = \|x\|_p is symmetric and convex, it is Schur-convex, so \|(A_{ii})\|_{l^p} \leq \|(\lambda_i(A))\|_{l^p} = \|A\|_{S^p}. }}
== Counterexample in the non-Hermitian case ==
Let N be the nilpotent matrix
:\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
Define the Rayleigh quotient R_N(x) exactly as above in the Hermitian case. Then it is easy to see that the only eigenvalue of N is zero, while the maximum value of the Rayleigh quotient over real vectors x is 1/2, attained at x = (1, 1)^T. That is, the maximum value of the Rayleigh quotient is larger than the maximum eigenvalue.
== Applications ==