Given a function f:\mathbb{R}^n \to \mathbb{R}^n, consider the problem of finding a
fixed point of f, which is a solution to the equation f(x) = x. A classical approach to the problem is to employ a
fixed-point iteration scheme; that is, given an initial guess x_0 for the solution, to compute the sequence x_{k+1} = f(x_k) until some convergence criterion is met. However, the convergence of such a scheme is not guaranteed in general; moreover, the
rate of convergence is usually linear, which can become too slow if the evaluation of the function f is computationally expensive. Anderson acceleration is a technique for accelerating the convergence of such fixed-point iterations by combining the most recent iterates. Denote the residual by g(x) = f(x) - x, and write f_i = f(x_i) and g_i = g(x_i). Given an initial guess x_0 and a memory parameter m \geq 1, the method proceeds as follows:

:x_1 = f(x_0)
:\forall k = 1, 2, \dots
::m_k = \min\{m, k\}
::G_k = \begin{bmatrix} g_{k-m_k} & \dots & g_k \end{bmatrix}
::\alpha_k = \operatorname{argmin}_{\alpha\in A_k} \|G_k \alpha\|_2, \quad \text{where}\;A_k = \left\{\alpha = (\alpha_0, \dots, \alpha_{m_k}) \in \mathbb{R}^{m_k+1} : \sum_{i=0}^{m_k}\alpha_i = 1\right\}
::x_{k+1} = \sum_{i=0}^{m_k}(\alpha_k)_i f_{k-m_k+i}

where the matrix–vector product is G_k \alpha = \sum_{i=0}^{m_k}(\alpha)_i g_{k-m_k+i}, and (\alpha)_i denotes the ith element of \alpha. Conventional stopping criteria can be used to end the iterations of the method. For example, iterations can be stopped when \|x_{k+1} - x_k\| falls below a prescribed tolerance, or when the norm of the residual, \|g(x_k)\|, falls below a prescribed tolerance.

The constrained minimization over A_k can be recast in an equivalent unconstrained form: solve
:\theta_k = \{(\theta_k)_i\}_{i = 1}^{m_k} = \operatorname{argmin}_{\theta\in\mathbb{R}^{m_k}}\left\|g_k + \sum_{i=1}^{m_k}\theta_i(g_{k-i} - g_k)\right\|_2,
then set
:x_{k+1} = x_k + g_k + \sum_{j = 1}^{m_k}(\theta_k)_j(x_{k-j} - x_k + g_{k-j} - g_k).
In this form, the optimization problem is an unconstrained
linear least-squares problem, which can be solved by standard methods including
QR decomposition and
singular value decomposition, possibly with regularization techniques to deal with
rank deficiencies and
conditioning issues in the optimization problem. Solving the least-squares problem by solving the
normal equations is generally not advisable due to potential
numerical instabilities and high computational cost. Stagnation of the method (i.e., successive iterates with the same value, x_{k+1} = x_k) causes it to break down, because the least-squares problem becomes singular. Similarly, near-stagnation (x_{k+1}\approx x_k) results in ill-conditioning of the least-squares problem. Moreover, the choice of the parameter m can affect the conditioning of the least-squares problem, as discussed
below.
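
As a concrete illustration, the following Python/NumPy sketch implements the scheme above using the unconstrained \theta-formulation, solving the least-squares problem with numpy.linalg.lstsq (an SVD-based routine that also copes with rank deficiency). The function name anderson and its parameters are illustrative, not a reference implementation.

<syntaxhighlight lang="python">
import numpy as np

def anderson(f, x0, m=5, tol=1e-10, max_iter=100):
    """Anderson acceleration for the fixed-point problem f(x) = x.

    Minimal sketch using the unconstrained theta-formulation; the
    least-squares problem is solved with an SVD-based routine.
    """
    x = np.asarray(x0, dtype=float)
    g = f(x) - x
    xs, gs = [x], [g]              # histories of iterates and residuals
    x = x + g                      # x_1 = f(x_0)
    for k in range(1, max_iter + 1):
        g = f(x) - x               # residual g_k = f(x_k) - x_k
        xs.append(x)
        gs.append(g)
        if np.linalg.norm(g) < tol:   # stop when ||g(x_k)|| is small
            return x
        mk = min(m, k)
        # Columns of D are g_{k-i} - g_k for i = 1, ..., m_k (newest first).
        D = np.column_stack([gs[-1 - i] - g for i in range(1, mk + 1)])
        # theta_k = argmin ||g_k + D theta||_2, i.e. lstsq(D, -g_k)
        theta, *_ = np.linalg.lstsq(D, -g, rcond=None)
        # x_{k+1} = x_k + g_k + sum_j theta_j (x_{k-j} - x_k + g_{k-j} - g_k)
        dx = sum(theta[j - 1] * (xs[-1 - j] - x + gs[-1 - j] - g)
                 for j in range(1, mk + 1))
        x = x + g + dx
        xs, gs = xs[-(m + 1):], gs[-(m + 1):]   # keep only the last m+1 entries
    return x

# Example usage on the scalar fixed point x = cos(x):
# x_star = anderson(np.cos, np.array([1.0]), m=3)
</syntaxhighlight>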
== Relaxation ==
The algorithm can be modified by introducing a variable relaxation parameter (or mixing parameter) \beta_k > 0. At each step, the new iterate is computed as
:x_{k+1} = (1 - \beta_k)\sum_{i=0}^{m_k}(\alpha_k)_i x_{k-m_k+i} + \beta_k \sum_{i=0}^{m_k}(\alpha_k)_i f(x_{k-m_k+i})\;.
The choice of \beta_k is crucial to the convergence properties of the method; in principle, \beta_k might vary at each iteration, although it is often chosen to be constant.
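
In terms of the sketch above, relaxation only changes the final update of x: writing \bar{x}_k and \bar{g}_k for the \alpha-weighted averages of the iterates and residuals, the update becomes x_{k+1} = \bar{x}_k + \beta_k \bar{g}_k. A minimal sketch, assuming a given value beta_k:

<syntaxhighlight lang="python">
# Relaxed update, replacing "x = x + g + dx" in the sketch above.
# xbar and gbar are the alpha-weighted averages of iterates and residuals;
# beta_k == 1 recovers the unrelaxed step.
xbar = x + sum(theta[j - 1] * (xs[-1 - j] - x) for j in range(1, mk + 1))
gbar = g + sum(theta[j - 1] * (gs[-1 - j] - g) for j in range(1, mk + 1))
x = (1 - beta_k) * xbar + beta_k * (xbar + gbar)   # = xbar + beta_k * gbar
</syntaxhighlight>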
== Choice of m ==
The parameter m determines how much information from previous iterations is used to compute the new iterate x_{k+1}. On the one hand, if m is chosen too small, too little information is used and convergence may be undesirably slow. On the other hand, if m is too large, information from old iterations may be retained for too many subsequent iterations, so that again convergence may be slow. Moreover, the choice of m affects the size of the optimization problem: too large a value of m may worsen both the conditioning of the least-squares problem and the cost of its solution. In general, the particular problem to be solved determines the best choice of m.
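
As an illustration of this trade-off, one can compare the number of function evaluations needed for convergence as m varies on a toy problem. The counting wrapper below assumes the anderson sketch from above; the specific values of m are arbitrary.

<syntaxhighlight lang="python">
import numpy as np

# Toy parameter study: count evaluations of f for several memory sizes m,
# using the anderson() sketch above on the fixed point of x = cos(x).
def f(x):
    f.calls += 1
    return np.cos(x)

for m in (1, 2, 5, 10):
    f.calls = 0
    anderson(f, np.array([1.0]), m=m, tol=1e-12)
    print(f"m = {m:2d}: {f.calls} evaluations of f")
</syntaxhighlight>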
== Choice of m_k ==
With respect to the algorithm described above, the choice of m_k at each iteration can be modified. One possibility is to choose m_k = k for each iteration k (sometimes referred to as Anderson acceleration without truncation), so that every new iterate x_{k+1} is computed using all the previously computed iterates. A more sophisticated technique is based on choosing m_k so as to maintain a small enough conditioning for the least-squares problem, as sketched below.
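
For instance, with the column ordering used in the sketch above (newest differences first), a hypothetical truncation routine might repeatedly drop the oldest difference columns until the condition number of the least-squares matrix falls below a threshold; the name truncate_for_conditioning and the bound cond_max are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def truncate_for_conditioning(D, cond_max=1e8):
    """Drop the oldest difference columns of D while cond(D) > cond_max.

    D has columns g_{k-i} - g_k, i = 1, ..., m_k, ordered from newest
    (first column) to oldest (last column); the effective m_k is
    D.shape[1] after truncation.
    """
    while D.shape[1] > 1 and np.linalg.cond(D) > cond_max:
        D = D[:, :-1]  # discard the column from the oldest iterate
    return D
</syntaxhighlight>

== Relations to other classes of methods ==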