Many filtering and control methods represent estimates of the state of a system in the form of a mean vector and an associated error covariance matrix. As an example, the estimated two-dimensional position of an object of interest might be represented by a mean position vector, [x, y], with an uncertainty given in the form of a 2×2 covariance matrix giving the variance in x, the variance in y, and the cross covariance between the two. A covariance matrix that is zero implies that there is no uncertainty or error and that the position of the object is exactly what is specified by the mean vector. The mean and covariance representation only gives the first two moments of an underlying, but otherwise unknown, probability distribution. In the case of a moving object, the unknown probability distribution might represent the uncertainty of the object's position at a given time.

The mean and covariance representation of uncertainty is mathematically convenient because any linear transformation T can be applied to a mean vector m and covariance matrix M as Tm and TMT^\mathrm{T}. This linearity property does not hold for moments beyond the first raw moment (the mean) and the second
central moment (the covariance), so it is not generally possible to determine the mean and covariance resulting from a nonlinear transformation, because the result depends on all the moments and only the first two are given.

Although the covariance matrix is often treated as being the expected squared error associated with the mean, in practice the matrix is maintained as an upper bound on the actual squared error. Specifically, a mean and covariance estimate (m, M) is conservatively maintained so that the covariance matrix M is greater than or equal to the actual squared error associated with m. Mathematically this means that the result of subtracting the expected squared error (which is not usually known) from M is a positive semi-definite or positive-definite matrix. The reason for maintaining a conservative covariance estimate is that most filtering and control algorithms will tend to diverge (fail) if the covariance is underestimated. This is because a spuriously small covariance implies less uncertainty and leads the filter to place more weight (confidence) than is justified in the accuracy of the mean.

Returning to the example above, when the covariance is zero it is trivial to determine the location of the object after it moves according to an arbitrary nonlinear function f(x, y): just apply the function to the mean vector. When the covariance is not zero the transformed mean will
not generally be equal to f(x, y), and it is not even possible to determine the mean of the transformed probability distribution from only its prior mean and covariance. Given this indeterminacy, the nonlinearly transformed mean and covariance can only be approximated. The earliest approximation was to linearize the nonlinear function and apply the resulting Jacobian matrix to the given mean and covariance. This is the basis of the extended Kalman filter (EKF), and although it was known to yield poor results in many circumstances, there was no practical alternative for many decades.

==Motivation==
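The contrast described above — exact moment propagation under a linear map versus bias under a nonlinear one — can be checked numerically by Monte Carlo sampling. The following is a minimal sketch, not part of the article itself; the particular numbers and the nonlinear map f are illustrative assumptions chosen so that the transformed mean has a closed form to compare against:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior estimate: mean vector m and covariance matrix M for a 2-D position.
m = np.array([1.0, 2.0])
M = np.array([[0.5, 0.2],
              [0.2, 0.3]])

samples = rng.multivariate_normal(m, M, size=200_000)

# Linear case: the transformed moments are exactly T m and T M T^T,
# so the sample moments of the mapped points match them (up to sampling error).
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
lin = samples @ T.T
assert np.allclose(lin.mean(axis=0), T @ m, atol=0.05)
assert np.allclose(np.cov(lin.T), T @ M @ T.T, atol=0.05)

# Nonlinear case: applying f to the mean does NOT give the transformed mean.
def f(p):
    x, y = p[..., 0], p[..., 1]
    return np.stack([x * y, x**2], axis=-1)   # an arbitrary nonlinear map

nonlin = f(samples)
print(f(m))                 # [2. 1.]  (the nonlinear function applied to the mean)
print(nonlin.mean(axis=0))  # ~ [2.2, 1.5]: E[xy] = m_x m_y + M_xy, E[x^2] = m_x^2 + M_xx
```

For this choice of f the true transformed mean depends on the covariance entries (E[xy] = m_x m_y + M_xy and E[x²] = m_x² + M_xx), which is exactly why a nonlinear map cannot, in general, be handled by pushing only the mean vector through the function.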