===Single-vector version===

====Preliminaries: Gradient descent for eigenvalue problems====
The method performs an
iterative maximization (or minimization) of the generalized
Rayleigh quotient \rho(x) := \rho(A,B; x) := \frac{x^\mathsf{T} A x}{x^\mathsf{T} B x}, which results in finding the largest (or smallest) eigenpairs of A x = \lambda B x. The direction of the steepest ascent, which is the gradient of the generalized Rayleigh quotient, is positively proportional to the vector r := Ax - \rho(x) Bx, called the eigenvector residual. If a
preconditioner T is available, it is applied to the residual, giving the vector w := Tr, called the preconditioned residual. Without preconditioning, we set T := I and so w := r. An iterative method x^{i+1} := x^i + \alpha^i T \left(Ax^i - \rho(x^i) Bx^i\right), or, in short, \begin{align} x^{i+1} &:= x^i + \alpha^i w^i, \\ w^i &:= Tr^i, \\ r^i &:= Ax^i - \rho(x^i) Bx^i, \end{align} is known as preconditioned
steepest ascent (or descent), where the scalar \alpha^i is called the step size. The optimal step size can be determined by maximizing the Rayleigh quotient, i.e., x^{i+1} := \arg\max_{y\in \operatorname{span}\{x^i,w^i\}} \rho(y) (or \arg\min in case of minimizing), in which case the method is called locally optimal.
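For illustration, below is a minimal NumPy/SciPy sketch of this locally optimal preconditioned steepest ascent for the largest eigenpair; the dense-matrix setting, the function name, and its signature are assumptions of the example, not part of any particular implementation.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh

def locally_optimal_psa(A, B, T, x, maxiter=100, tol=1e-8):
    """Locally optimal preconditioned steepest ascent for A x = rho B x
    (a sketch: A, B symmetric, B and T positive definite, dense arrays)."""
    x = x / np.linalg.norm(x)
    for _ in range(maxiter):
        rho = (x @ A @ x) / (x @ B @ x)   # Rayleigh quotient rho(x)
        r = A @ x - rho * (B @ x)         # eigenvector residual
        if np.linalg.norm(r) < tol:
            break
        w = T @ r                         # preconditioned residual w = T r
        S = np.column_stack([x, w])       # basis of the trial subspace span{x, w}
        # Rayleigh-Ritz on span{x, w}: taking the top Ritz vector maximizes
        # the Rayleigh quotient and thus implicitly picks the optimal step size
        _, C = eigh(S.T @ A @ S, S.T @ B @ S)
        x = S @ C[:, -1]                  # Ritz vector of the largest Ritz value
        x /= np.linalg.norm(x)
    return rho, x
</syntaxhighlight>

Here scipy.linalg.eigh solves the 2-by-2 generalized symmetric eigenvalue problem with eigenvalues in ascending order, so the last Ritz vector maximizes the Rayleigh quotient; for minimization one takes the first column instead.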
====Three-term recurrence====
To dramatically accelerate the convergence of the locally optimal preconditioned steepest ascent (or descent), one extra vector can be added to the two-term
recurrence relation to make it three-term: x^{i+1} := \arg\max_{y\in \operatorname{span}\{x^i,w^i,x^{i-1}\}} \rho(y) (use \arg\min in case of minimizing). The maximization/minimization of the Rayleigh quotient in a 3-dimensional subspace can be performed numerically by the
Rayleigh–Ritz method. Adding more vectors (see, e.g., Richardson extrapolation) does not result in significant acceleration but increases computation costs, so it is not generally recommended.
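Under the same illustrative assumptions as the sketch above, only the trial subspace changes; production implementations add safeguards that this sketch omits.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh

def lobpcg_single_vector(A, B, T, x, maxiter=100, tol=1e-8):
    """Single-vector three-term recurrence (a sketch): Rayleigh-Ritz
    over span{x^i, w^i, x^{i-1}} at every step."""
    x = x / np.linalg.norm(x)
    x_prev = None
    for _ in range(maxiter):
        rho = (x @ A @ x) / (x @ B @ x)
        r = A @ x - rho * (B @ x)
        if np.linalg.norm(r) < tol:
            break
        w = T @ r
        # Two basis vectors on the first step, three afterwards
        cols = [x, w] if x_prev is None else [x, w, x_prev]
        S = np.column_stack(cols)
        _, C = eigh(S.T @ A @ S, S.T @ B @ S)  # at most a 3-by-3 problem
        x_prev, x = x, S @ C[:, -1]
        x /= np.linalg.norm(x)
    return rho, x
</syntaxhighlight>

As written, the basis S becomes ill-conditioned when x and x_prev align near convergence, which is precisely the issue addressed in the next subsection.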
====Numerical stability improvements====
As the iterations converge, the vectors x^i and x^{i-1} become nearly linearly dependent, resulting in a precision loss and making the Rayleigh–Ritz method numerically unstable in the presence of round-off errors. The loss of precision may be avoided by substituting the vector x^{i-1} with a vector p^i that may be farther from x^{i}, in the basis of the three-dimensional subspace \operatorname{span}\{x^i, w^i, x^{i-1}\}, while keeping the subspace unchanged and avoiding orthogonalization or any other extra operations. Some LOBPCG implementations further utilize the unstable but efficient Cholesky decomposition of the normal matrix, which is performed only on the individual matrices W^{i} and P^{i}, rather than on the whole subspace. The constantly increasing amount of computer memory allows typical block sizes nowadays in the 10^3–10^4 range, where the percentage of compute time spent on orthogonalizations and the Rayleigh–Ritz method starts dominating.
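Sketched below, under the same assumptions as above, is one common way to realize this substitution: the direction vector p collects only the w- and p-components of the Ritz update, so the next basis {x, w, p} spans the same subspace as {x, w, x^{i-1}} but remains well conditioned; a Cholesky-based B-orthonormalization of an individual block is shown as a helper.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh, cholesky, solve_triangular

def b_orthonormalize(V, B):
    """B-orthonormalize the columns of one block V via a Cholesky
    factorization of the normal matrix V.T B V = R.T R (cheap but
    potentially unstable when V is ill-conditioned)."""
    R = cholesky(V.T @ (B @ V))                      # upper triangular
    return solve_triangular(R.T, V.T, lower=True).T  # equals V @ inv(R)

def lobpcg_single_vector_stable(A, B, T, x, maxiter=100, tol=1e-8):
    """Three-term recurrence with x^{i-1} implicitly replaced by p (a sketch)."""
    x = x / np.linalg.norm(x)
    p = None
    for _ in range(maxiter):
        rho = (x @ A @ x) / (x @ B @ x)
        r = A @ x - rho * (B @ x)
        if np.linalg.norm(r) < tol:
            break
        w = T @ r
        S = np.column_stack([x, w] if p is None else [x, w, p])
        _, C = eigh(S.T @ A @ S, S.T @ B @ S)
        c = C[:, -1]                # coefficients of the top Ritz vector
        # p keeps only the w- and p-components, so it has no x-component;
        # span{x_new, w_new, p_new} still contains the previous iterate
        # (provided c[0] != 0), while the basis stays well conditioned.
        p = S[:, 1:] @ c[1:]
        x = c[0] * x + p            # identical to S @ c
        nrm = np.linalg.norm(x)
        x, p = x / nrm, p / nrm
    return rho, x
</syntaxhighlight>

In a block implementation, the helper would be applied to W^{i} and P^{i} separately before the Rayleigh–Ritz step, rather than to the whole subspace basis.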
====Locking of previously converged eigenvectors====
Block methods for eigenvalue problems that iterate subspaces commonly have some of the iterative eigenvectors converge faster than others, which motivates locking the already converged eigenvectors, i.e., removing them from the iterative loop, in order to eliminate unnecessary computations and improve numerical stability. A simple removal of an eigenvector is likely to result in forming a duplicate of it among the still iterating vectors. The fact that the eigenvectors of symmetric eigenvalue problems are pair-wise orthogonal suggests keeping all iterative vectors orthogonal to the locked vectors. Locking can be implemented in different ways, maintaining numerical accuracy and stability while minimizing the compute costs. For example, LOBPCG implementations separate hard locking, i.e., a deflation by restriction, where the locked eigenvectors serve as a code input and do not change, from soft locking, where the locked vectors do not participate in the typically most expensive iterative step of computing the residuals, but fully participate in the Rayleigh–Ritz method and thus are allowed to be changed by it.
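A sketch of the projection behind hard locking (deflation by restriction), under the illustrative assumption that the locked block Y is already B-orthonormal; the function name is hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def deflate_against_locked(W, Y, B):
    """Make the columns of the active block W B-orthogonal to the locked
    eigenvectors Y, assuming Y.T @ B @ Y = I; applied every iteration,
    e.g., to the residuals, it prevents the active vectors from drifting
    back toward already converged eigenvectors."""
    return W - Y @ (Y.T @ (B @ W))
</syntaxhighlight>

Soft locking, by contrast, needs no extra projection: the locked vectors are simply kept in the Rayleigh–Ritz basis while their residuals are no longer computed.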
====Modifications, LOBPCG II====
LOBPCG includes all columns of the matrices X^{i},\, W^{i}, and P^{i} in the Rayleigh–Ritz method, resulting in an up to 3k-by-3k eigenvalue problem that needs to be solved and up to 9k^2 dot products to compute at every iteration, where k denotes the block size, i.e., the number of columns. For large block sizes k this starts dominating compute and I/O costs and limiting parallelization, where multiple compute devices run simultaneously. The original LOBPCG paper goes further, applying the LOBPCG algorithm to each approximate eigenvector separately, i.e., running the unblocked version of the LOBPCG method for each desired eigenpair for a fixed number of iterations. The Rayleigh–Ritz procedures in these runs only need to solve a set of 3-by-3 projected eigenvalue problems. The global Rayleigh–Ritz procedure for all desired eigenpairs is only applied periodically, at the end of a fixed number of unblocked LOBPCG iterations. Such modifications may be less robust compared to the original LOBPCG. Individually running branches of the single-vector LOBPCG may not follow continuous iterative paths, flipping instead and creating duplicated approximations to the same eigenvector. The single-vector LOBPCG may be unsuitable for clustered eigenvalues, but separate small-block LOBPCG runs require determining their block sizes automatically during the iterations, since the number of the clusters of eigenvalues and their sizes may be unknown a priori.
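A rough sketch of this strategy, reusing the hypothetical single-vector routine from the stability sketch above (all names and the fixed iteration counts are illustrative):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh

def lobpcg_ii(A, B, T, X, outer=10, inner=5):
    """LOBPCG II strategy (a sketch): iterate each column of X by the
    unblocked method, whose Rayleigh-Ritz problems are only 3-by-3, and
    apply a global Rayleigh-Ritz over all k columns periodically."""
    for _ in range(outer):
        # Independent unblocked runs, one per desired eigenpair
        # (lobpcg_single_vector_stable is defined in the earlier sketch)
        for j in range(X.shape[1]):
            _, X[:, j] = lobpcg_single_vector_stable(A, B, T, X[:, j],
                                                     maxiter=inner)
        # The periodic global Rayleigh-Ritz separates the branches and
        # removes duplicated approximations to the same eigenvector
        lam, C = eigh(X.T @ A @ X, X.T @ B @ X)
        X = X @ C[:, ::-1]            # order by descending Ritz value
    return lam[::-1], X
</syntaxhighlight>

==Convergence theory and practice==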