In the case of a linear forward map and when we deal with a finite number of model parameters, the forward map can be written as a
linear system d = Fp where F is the
matrix that characterizes the forward map. The linear system can be systematically solved by means of both regularization and Bayesian methods.
An elementary example: Earth's gravitational field Only a few physical systems are actually linear with respect to the model parameters. One such system from geophysics is that of the
Earth's gravitational field. The Earth's gravitational field is determined by the density distribution of the Earth in the subsurface. Because the
lithology of the Earth changes quite significantly, we are able to observe minute differences in the Earth's gravitational field on the surface of the Earth. From our understanding of gravity (Newton's Law of Gravitation), we know that the mathematical expression for gravity is: d= \frac{G p}{r^2}; here d is a measure of the local gravitational acceleration, G is the
universal gravitational constant, p is the local mass (which is related to density) of the rock in the subsurface and r is the distance from the mass to the observation point. By discretizing the above expression, we are able to relate the discrete data observations on the surface of the Earth to the discrete model parameters (density) in the subsurface that we wish to know more about. For example, consider the case where we have measurements carried out at 5 locations on the surface of the Earth. In this case, our data vector, d is a column vector of dimension (5×1): its i-th component is associated with the i-th observation location. We also know that we only have five unknown masses p_j in the subsurface (unrealistic but used to demonstrate the concept) with known location: we denote by r_{ij} the distance between the i-th observation location and the j-th mass. Thus, we can construct the linear system relating the five unknown masses to the five data points as follows: d = F p, d = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \end{bmatrix}, \quad p = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{bmatrix}, F = \begin{bmatrix} \frac{G}{r_{11}^2} & \frac{G}{r_{12}^2} & \frac{G}{r_{13}^2} & \frac{G}{r_{14}^2} & \frac{G}{r_{15}^2} \\ \frac{G}{r_{21}^2} & \frac{G}{r_{22}^2} & \frac{G}{r_{23}^2} & \frac{G}{r_{24}^2} & \frac{G}{r_{25}^2} \\ \frac{G}{r_{31}^2} & \frac{G}{r_{32}^2} & \frac{G}{r_{33}^2} & \frac{G}{r_{34}^2} & \frac{G}{r_{35}^2} \\ \frac{G}{r_{41}^2} & \frac{G}{r_{42}^2} & \frac{G}{r_{43}^2} & \frac{G}{r_{44}^2} & \frac{G}{r_{45}^2} \\ \frac{G}{r_{51}^2} & \frac{G}{r_{52}^2} & \frac{G}{r_{53}^2} & \frac{G}{r_{54}^2} & \frac{G}{r_{55}^2} \end{bmatrix} To solve for the model parameters that fit our data, we might be able to invert the matrix F to directly convert the measurements into our model parameters. For example: p = F^{-1} d_\text{obs} A system with five equations and five unknowns is a very specific situation: our example was designed to end up with this specificity. In general, the numbers of data and unknowns are different so that matrix F is not square. However, even a square matrix can have no inverse: matrix F can be
rank deficient (i.e. has zero eigenvalues) and the solution of the system p = F^{-1} d_\text{obs} is not unique. Then the solution of the inverse problem will be undetermined. This is a first difficulty. Over-determined systems (more equations than unknowns) have other issues. Also noise may corrupt our observations making d possibly outside the space F(P) of possible responses to model parameters so that solution of the system p = F^{-1} d_\text{obs} may not exist. This is another difficulty.
Tools to overcome the first difficulty The first difficulty reflects a crucial problem: Our observations do not contain enough information and additional data are required. Additional data can come from physical
prior information on the parameter values, on their spatial distribution or, more generally, on their mutual dependence. It can also come from other experiments: For instance, we may think of integrating data recorded by gravimeters and seismographs for a better estimation of densities. The integration of this additional information is basically a problem of
statistics. This discipline is the one that can answer the question: How to mix quantities of different nature? We will be more precise in the section "Bayesian approach" below. Concerning distributed parameters, prior information about their spatial distribution often consists of information about some derivatives of these distributed parameters. Also, it is common practice, although somewhat artificial, to look for the "simplest" model that reasonably matches the data. This is usually achieved by
penalizing the
L^1 norm of the gradient (or the
total variation) of the parameters (this approach is also referred to as the maximization of the entropy). One can also make the model simple through a parametrization that introduces degrees of freedom only when necessary. Additional information may also be integrated through inequality constraints on the model parameters or some functions of them. Such constraints are important to avoid unrealistic values for the parameters (negative values for instance). In this case, the space spanned by model parameters will no longer be a vector space but a
subset of admissible models denoted by P_\text{adm} in the sequel.
Tools to overcome the second difficulty As mentioned above, noise may be such that our measurements are not the image of any model, so that we cannot look for a model that produces the data but rather look for
the best (or optimal) model: that is, the one that best matches the data. This leads us to minimize an
objective function, namely a
functional that quantifies how big the residuals are or how far the predicted data are from the observed data. Of course, when we have perfect data (i.e. no noise) then the recovered model should fit the observed data perfectly. A standard objective function, \varphi, is of the form: \varphi(p) = \|F p-d_\text{obs} \|^2 where \| \cdot \| is the Euclidean norm (it will be the
L^2 norm when the measurements are functions instead of samples) of the residuals. This approach amounts to making use of
ordinary least squares, an approach widely used in statistics. However, the Euclidean norm is known to be very sensitive to outliers: to avoid this difficulty we may think of using other distances, for instance the L^1 norm, in replacement of the L^2 norm.
Bayesian approach Very similar to the least-squares approach is the probabilistic approach: If we know the statistics of the noise that contaminates the data, we can think of seeking the most likely model m, which is the model that matches the
maximum likelihood criterion. If the noise is
Gaussian, the maximum likelihood criterion appears as a least-squares criterion, the Euclidean scalar product in data space being replaced by a scalar product involving the
co-variance of the noise. Also, should prior information on model parameters be available, we could think of using
Bayesian inference to formulate the solution of the inverse problem. This approach is described in detail in Tarantola's book.
Numerical solution of our elementary example Here we make use of the Euclidean norm to quantify the data misfits. As we deal with a linear inverse problem, the objective function is quadratic. For its minimization, it is classical to compute its gradient using the same rationale (as we would to minimize a function of only one variable). At the optimal model p_\text{opt}, this gradient vanishes which can be written as: \nabla_p \varphi = 2 (F^\mathrm{T} F p_\text{opt} - F^\mathrm{T} d_\text{obs}) = 0 where
FT denotes the
matrix transpose of
F. This equation simplifies to: F^\mathrm{T} F p_\text{opt} = F^\mathrm{T} d_\text{obs} This expression is known as the normal equation and gives us a possible solution to the inverse problem. In our example matrix F^\mathrm{T} F turns out to be generally full rank so that the equation above makes sense and determines uniquely the model parameters: we do not need integrating additional information for ending up with a unique solution.
Mathematical and computational aspects Inverse problems are typically ill-posed, as opposed to the
well-posed problems usually met in mathematical modeling. Of the three conditions for a
well-posed problem suggested by
Jacques Hadamard (existence, uniqueness, and stability of the solution or solutions) the condition of stability is most often violated. In the sense of
functional analysis, the inverse problem is represented by a mapping between
metric spaces. While inverse problems are often formulated in infinite dimensional spaces, limitations to a finite number of measurements, and the practical consideration of recovering only a finite number of unknown parameters, may lead to the problems being recast in discrete form. In this case the inverse problem will typically be
ill-conditioned. In these cases,
regularization may be used to introduce mild assumptions on the solution and prevent
overfitting. Many instances of regularized inverse problems can be interpreted as special cases of
Bayesian inference.
Numerical solution of the optimization problem Some inverse problems have a very simple solution, for instance, when one has a set of
unisolvent functions, meaning a set of functions such that evaluating them at distinct points yields a set of
linearly independent vectors. This means that given a
linear combination of these functions, the coefficients can be computed by arranging the vectors as the columns of a matrix and then inverting this matrix. The simplest example of unisolvent functions is polynomials constructed, using the
unisolvence theorem, so as to be unisolvent. Concretely, this is done by inverting the
Vandermonde matrix. But this a very specific situation. In general, the solution of an inverse problem requires sophisticated optimization algorithms. When the model is described by a large number of parameters (the number of unknowns involved in some diffraction tomography applications can reach one billion), solving the linear system associated with the normal equations can be cumbersome. The numerical method to be used for solving the optimization problem depends in particular on the cost required for computing the solution F p of the forward problem. Once chosen the appropriate algorithm for solving the forward problem (a straightforward matrix-vector multiplication may be not adequate when matrix F is huge), the appropriate algorithm for carrying out the minimization can be found in textbooks dealing with numerical methods for the solution of linear systems and for the minimization of quadratic functions (see for instance Ciarlet or Nocedal). Also, the user may wish to add physical constraints to the models: In this case, they have to be familiar with
constrained optimization methods, a subject in itself. In all cases, computing the gradient of the objective function often is a key element for the solution of the optimization problem. As mentioned above, information about the spatial distribution of a distributed parameter can be introduced through the parametrization. One can also think of adapting this parametrization during the optimization. Should the objective function be based on a norm other than the Euclidean norm, we have to leave the area of quadratic optimization. As a result, the optimization problem becomes more difficult. In particular, when the L^1 norm is used for quantifying the data misfit the objective function is no longer differentiable: its gradient does not make sense any longer. Dedicated methods (see for instance Lemaréchal) from non differentiable optimization come in. Once the optimal model is computed we have to address the question: "Can we trust this model?" The question can be formulated as follows: How large is the set of models that match the data "nearly as well" as this model? In the case of quadratic objective functions, this set is contained in a hyper-ellipsoid, a subset of R^M (M is the number of unknowns), whose size depends on what we mean with "nearly as well", that is on the noise level. The direction of the largest axis of this ellipsoid (
eigenvector associated with the smallest eigenvalue of matrix F^T F) is the direction of poorly determined components: if we follow this direction, we can bring a strong perturbation to the model without changing significantly the value of the objective function and thus end up with a significantly different quasi-optimal model. We clearly see that the answer to the question "can we trust this model" is governed by the noise level and by the eigenvalues of the
Hessian of the objective function or equivalently, in the case where no regularization has been integrated, by the
singular values of matrix F. Of course, the use of regularization (or other kinds of prior information) reduces the size of the set of almost optimal solutions and, in turn, increases the confidence we can put in the computed solution.
Stability, regularization and model discretization in infinite dimension We focus here on the recovery of a distributed parameter. When looking for distributed parameters we have to discretize these unknown functions. Doing so, we reduce the dimension of the problem to something finite. But now, the question is: is there any link between the solution we compute and that of the initial problem? Then another question: what do we mean with the solution of the initial problem? Since a finite number of data does not allow the determination of an infinity of unknowns, the original data misfit functional has to be regularized to ensure the uniqueness of the solution. Many times, reducing the unknowns to a finite-dimensional space will provide an adequate regularization: the computed solution will look like a discrete version of the solution we were looking for. For example, a naïve discretization will often work for solving the
deconvolution problem: it will work as long as we do not allow missing frequencies to show up in the numerical solution. But many times, regularization has to be integrated explicitly in the objective function. In order to understand what may happen, we have to keep in mind that solving such a linear inverse problem amount to solving a Fredholm integral equation of the first kind: d(x) = \int_\Omega K(x,y) p(y) dy where K is the kernel, x and y are vectors of R^2, and \Omega is a domain in R^2. This holds for a 2D application. For a 3D application, we consider x,y \in R^3. Note that here the model parameters p consist of a function and that the response of a model also consists of a function denoted by d(x). This equation is an extension to infinite dimension of the matrix equation d=Fp given in the case of discrete problems. For sufficiently smooth K the operator defined above is
compact on reasonable
Banach spaces such as the
L^2.
F. Riesz theory states that the set of singular values of such an operator contains zero (hence the existence of a null-space), is finite or at most countable, and, in the latter case, they constitute a sequence that goes to zero. In the case of a symmetric kernel, we have an infinity of eigenvalues and the associated eigenvectors constitute a hilbertian basis of L^2. Thus any solution of this equation is determined up to an additive function in the null-space and, in the case of infinity of singular values, the solution (which involves the reciprocal of arbitrary small eigenvalues) is unstable: two ingredients that make the solution of this integral equation a typical ill-posed problem! However, we can define a solution through the
pseudo-inverse of the forward map (again up to an arbitrary additive function). When the forward map is compact, the classical
Tikhonov regularization will work if we use it for integrating prior information stating that the L^2 norm of the solution should be as small as possible: this will make the inverse problem well-posed. Yet, as in the finite dimension case, we have to question the confidence we can put in the computed solution. Again, basically, the information lies in the eigenvalues of the Hessian operator. Should subspaces containing eigenvectors associated with small eigenvalues be explored for computing the solution, then the solution can hardly be trusted: some of its components will be poorly determined. The smallest eigenvalue is equal to the weight introduced in Tikhonov regularization. Irregular kernels may yield a forward map which is not compact and even
unbounded if we naively equip the space of models with the L^2 norm. In such cases, the Hessian is not a bounded operator and the notion of eigenvalue does not make sense any longer. A mathematical analysis is required to make it a
bounded operator and design a well-posed problem: an illustration can be found in. Again, we have to question the confidence we can put in the computed solution and we have to generalize the notion of eigenvalue to get the answer. Analysis of the spectrum of the Hessian operator is thus a key element to determine how reliable the computed solution is. However, such an analysis is usually a very heavy task. This has led several authors to investigate alternative approaches in the case where we are not interested in all the components of the unknown function but only in sub-unknowns that are the images of the unknown function by a linear operator. These approaches are referred to as the " Backus and Gilbert method",
Lions's sentinels approach, and the SOLA method: these approaches turned out to be strongly related with one another as explained in Chavent Finally, the concept of
limited resolution, often invoked by physicists, is nothing but a specific view of the fact that some poorly determined components may corrupt the solution. But, generally speaking, these poorly determined components of the model are not necessarily associated with high frequencies.
Some classical linear inverse problems for the recovery of distributed parameters The problems mentioned below correspond to different versions of the Fredholm integral: each of these is associated with a specific kernel K.
Deconvolution The goal of
deconvolution is to reconstruct the original image or signal p(x) which appears as noisy and blurred on the data d(x). From a mathematical point of view, the kernel K(x,y) here only depends on the difference between x and y.
Tomographic methods In these methods we attempt at recovering a distributed parameter, the observation consisting in the measurement of the integrals of this parameter carried out along a family of lines. We denote by \Gamma_x the line in this family associated with measurement point x. The observation at x can thus be written as: d(x) = \int_{\Gamma_x} w(x,y) p(y) \, dy where w(x,y) a known weighting function. Comparing this equation with the Fredholm integral above, we notice that the kernel K(x,y) is kind of a
delta function that peaks on line {\Gamma_x}. With such a kernel, the forward map is not compact.
Computed tomography In
X-ray computed tomography the lines on which the parameter is integrated are straight lines: the
tomographic reconstruction of the parameter distribution is based on the inversion of the
Radon transform. Although from a theoretical point of view many linear inverse problems are well understood, problems involving the Radon transform and its generalisations still present many practical challenges, such as with of sufficiency of data. Such problems include incomplete data for the x-ray transform in three dimensions and problems involving the generalisation of the x-ray transform to tensor fields. Solutions explored include
Algebraic Reconstruction Technique,
filtered backprojection, and as computing power has increased,
iterative reconstruction methods such as
iterative Sparse Asymptotic Minimum Variance.
Diffraction tomography Diffraction tomography is a classical linear inverse problem in exploration seismology: the amplitude recorded at one time for a given source-receiver pair is the sum of contributions arising from points such that the sum of the distances, measured in traveltimes, from the source and the receiver, respectively, is equal to the corresponding recording time. In 3D the parameter is not integrated along lines but over surfaces. Should the propagation velocity be constant, such points are distributed on an ellipsoid. The inverse problems consists in retrieving the distribution of diffracting points from the seismograms recorded along the survey, the velocity distribution being known. A direct solution has been originally proposed by Beylkin and Lambaré et al.: these works were the starting points of approaches known as amplitude preserved migration (see Beylkin and Bleistein). Should geometrical optics techniques (i.e. rays) be used for the solving the wave equation, these methods turn out to be closely related to the so-called least-squares migration methods derived from the least-squares approach (see Lailly, Tarantola).
Doppler tomography (astrophysics) If we consider a rotating stellar object, the spectral lines we can observe on a spectral profile will be shifted due to Doppler effect. Doppler tomography aims at converting the information contained in spectral monitoring of the object into a 2D image of the emission (as a function of the radial velocity and of the phase in the periodic rotation movement) of the stellar atmosphere. As explained by
Tom Marsh this linear inverse problem is tomography like: we have to recover a distributed parameter which has been integrated along lines to produce its effects in the recordings.
Inverse heat conduction Early publications on inverse heat conduction arose from determining surface heat flux during atmospheric re-entry from buried temperature sensors. Other applications where surface heat flux is needed but surface sensors are not practical include: inside reciprocating engines, inside rocket engines; and, testing of nuclear reactor components. A variety of numerical techniques have been developed to address the ill-posedness and sensitivity to measurement error caused by damping and lagging in the temperature signal.{{cite journal | author1 = Beck, J. V. | author2 = Blackwell, B. |author3 = Haji-Sheikh, B. | title = Comparison of some inverse heat conduction methods using experimental data | journal = International Journal of Heat and Mass Transfer | volume = 39 | issue = 17 | year = 1996 | pages = 3649–3657 | doi = 10.1016/0017-9310(96)00034-8 ==Non-linear inverse problems==