Let \pi denote a probability density function on \mathbb{R}^{d}, one from which it is desired to draw an ensemble of
independent and identically distributed samples. We consider the overdamped Langevin
Itô diffusion : \dot{X} = \nabla \log \pi(X) + \sqrt{2} \dot{W} driven by the time derivative of a standard
Brownian motion W. (Note that another commonly used normalization for this diffusion is :\dot{X} = \frac{1}{2} \nabla \log \pi(X) + \dot{W}, which generates the same dynamics up to a rescaling of time.) In the limit as t \to \infty, the probability distribution \rho(t) of X(t) approaches a stationary distribution \rho_{\infty}, which is also invariant under the diffusion; in fact, \rho_\infty = \pi. Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the
Euler–Maruyama method with a fixed time step \tau > 0. We set X_0 := x_0 and then recursively define an approximation X_k to the true solution X(k \tau) by :X_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k, where each \xi_{k} is an independent draw from a
multivariate normal distribution on \mathbb{R}^{d} with
mean 0 and
covariance matrix equal to the d \times d
identity matrix. Note that X_{k + 1} is normally distributed with mean X_k + \tau \nabla \log \pi(X_k) and covariance equal to 2 \tau times the d \times d identity matrix. In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates X_k according to the update rule :X_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k, MALA incorporates an additional step. We consider the above update rule as defining a
proposal \tilde{X}_{k + 1} for a new state, :\tilde{X}_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k. This proposal is accepted or rejected according to the Metropolis–Hastings algorithm: set :\alpha := \min \left\{ 1 , \frac{\pi(\tilde{X}_{k + 1}) q(X_{k}\mid\tilde{X}_{k + 1})}{\pi({X}_{k}) q(\tilde{X}_{k + 1}\mid X_k)} \right\}, where :q(x'\mid x) \propto \exp \left( - \frac{1}{4 \tau} \| x' - x - \tau \nabla \log \pi(x) \|_2^2 \right) is the transition probability density from x to x' (note that, in general, q(x'\mid x) \neq q(x\mid x')). Let u be drawn from the
continuous uniform distribution on the interval [0, 1]. If u \leq \alpha, then the proposal is accepted, and we set X_{k + 1} := \tilde{X}_{k + 1}; otherwise, the proposal is rejected, and we set X_{k + 1} := X_k. The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the
detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution \rho_{\infty} = \pi. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher \pi probability, which are then more likely to be accepted. On the other hand, when \pi is strongly
anisotropic (i.e. it varies much more quickly in some directions than others), the time step \tau must be taken very small in order to properly capture the Langevin dynamics; the use of a positive-definite
preconditioning matrix A \in \mathbb{R}^{d \times d} can help to alleviate this problem, by generating proposals according to :\tilde{X}_{k + 1} := X_k + \tau A \nabla \log \pi(X_k) + \sqrt{2 \tau} A^{1/2} \xi_k, where A^{1/2} is a matrix square root of A (e.g. its Cholesky factor), so that \tilde{X}_{k + 1} has mean X_k + \tau A \nabla \log \pi(X_k) and covariance 2 \tau A. For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be 0.574; if the observed acceptance rate differs substantially from this value in practice, \tau should be adjusted accordingly.
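The full MALA iteration described above can be sketched in Python with NumPy as follows. This is an illustrative sketch, not a reference implementation: the names `mala_sample`, `log_pi`, and `grad_log_pi` are hypothetical, and the Metropolis–Hastings ratio is computed in log space to avoid overflow in \pi(x).

```python
import numpy as np

def mala_sample(log_pi, grad_log_pi, x0, tau, n_steps, rng=None):
    """Sketch of the Metropolis-adjusted Langevin algorithm (MALA).

    log_pi, grad_log_pi : callables returning log pi(x) and its gradient
    tau                 : fixed time step of the Euler-Maruyama proposal
    Returns the chain of states and the empirical acceptance rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    samples = np.empty((n_steps, d))
    n_accept = 0

    def log_q(x_to, x_from):
        # log of the proposal density q(x_to | x_from), up to a constant
        # that cancels in the acceptance ratio (both directions share it)
        diff = x_to - x_from - tau * grad_log_pi(x_from)
        return -np.dot(diff, diff) / (4.0 * tau)

    for k in range(n_steps):
        xi = rng.standard_normal(d)
        # Euler-Maruyama proposal: mean x + tau * grad log pi(x), cov 2 tau I
        x_prop = x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi
        # log of alpha = min{1, pi(x~) q(x | x~) / (pi(x) q(x~ | x))}
        log_alpha = (log_pi(x_prop) + log_q(x, x_prop)
                     - log_pi(x) - log_q(x_prop, x))
        if np.log(rng.uniform()) <= min(0.0, log_alpha):
            x = x_prop          # accept the proposal
            n_accept += 1
        # on rejection the chain stays at the current state
        samples[k] = x
    return samples, n_accept / n_steps

# Usage: sample from a 2-d standard normal, pi(x) proportional to exp(-||x||^2 / 2)
log_pi = lambda x: -0.5 * np.dot(x, x)
grad_log_pi = lambda x: -x
samples, acc_rate = mala_sample(log_pi, grad_log_pi, np.zeros(2),
                                tau=0.5, n_steps=5000,
                                rng=np.random.default_rng(0))
```

The returned acceptance rate can be compared against the 0.574 guideline above: if it is much lower, \tau should be decreased, and if it is much higher, \tau can usually be increased.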