The KLT feature tracker is based on two papers. In the first, Lucas and Kanade developed the idea of a local search using image gradients weighted by an approximation to the second derivative of the image.
===One-dimensional case===
If h is the displacement between two images F(x) and G(x) = F(x+h), then the linear approximation

F'(x) \approx \dfrac{F(x+h)-F(x)}{h}=\dfrac{G(x)-F(x)}{h}

can be made, so that

h \approx \dfrac{G(x)-F(x)}{F'(x)}.

This approximation to the gradient of the image is accurate only if the displacement of the local area between the two images to be registered is not too large. The approximation to h depends on x; to combine the estimates of h obtained at various values of x, it is natural to average them:

h \approx \dfrac{\sum_{x}\dfrac{G(x)-F(x)}{F'(x)}}{\sum_{x}1}.

The average can be improved further by weighting the contribution of each term inversely proportionally to an estimate of \left\vert F''(x)\right\vert, where

F''(x) \approx \dfrac{G'(x)-F'(x)}{h}.

To simplify the expression, a weighting function is defined:

w(x) = \dfrac{1}{\left\vert G'(x)-F'(x)\right\vert}.

The weighted average is then

h \approx \dfrac{\sum_{x}\dfrac{w(x)\left[G(x)-F(x)\right]}{F'(x)}}{\sum_{x}w(x)}.

Once an estimate of h is obtained, F(x) can be moved by that estimate, and the procedure applied repeatedly, yielding a type of Newton–Raphson iteration. The sequence of estimates will ideally converge to the best h. The iteration can be expressed as

\begin{cases} h_{0} = 0 \\ h_{k+1} = h_{k} + \dfrac{\sum_{x}\dfrac{w(x)\left[G(x)-F(x+h_{k})\right]}{F'(x+h_{k})}}{\sum_{x}w(x)} \end{cases}
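Assuming F and G are available as smooth callables and derivatives are estimated by small central differences, the weighted iteration above can be sketched in a few lines (the name klt_1d, the derivative step size, and the eps guard against division by zero are illustrative choices, not part of the original method):

```python
import numpy as np

def klt_1d(F, G, x, num_iters=20, eps=1e-8):
    """Weighted Newton-Raphson estimate of the displacement h with G(x) = F(x + h).

    F, G are callables; x is an array of sample positions in the region of interest.
    """
    dx = 1e-4  # step for central-difference derivative estimates

    def deriv(f, t):
        return (f(t + dx) - f(t - dx)) / (2 * dx)

    h = 0.0
    for _ in range(num_iters):
        Fp = deriv(F, x + h)                          # F'(x + h_k)
        w = 1.0 / (np.abs(deriv(G, x) - Fp) + eps)    # w(x) = 1/|G'(x) - F'(x)|
        step = np.sum(w * (G(x) - F(x + h)) / Fp) / np.sum(w)
        h += step
        if abs(step) < 1e-10:                         # converged
            break
    return h
```

For example, with F(x) = sin x, G(x) = sin(x + 0.3), and sample points chosen away from the zeros of F', the iterates converge to 0.3 within a few steps.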
===An alternative derivation===
The derivation above does not generalize well to two dimensions, because the 2-D linear approximation arises differently. This can be corrected by applying the linear approximation in the form

F(x+h) \approx F(x)+hF'(x)

to find the h which minimizes the L2 norm measure of the difference (or error) between the curves:

E=\sum_{x}\left[F(x+h)-G(x)\right]^{2}.

To minimize the error with respect to h, partially differentiate E and set the derivative to zero:

\begin{align} 0 = \dfrac{\partial E}{\partial h} & \approx \dfrac{\partial}{\partial h}\sum_{x}\left[F(x)+hF'(x)-G(x)\right]^{2} \\ & = \sum_{x}2F'(x)\left[F(x)+hF'(x)-G(x)\right], \end{align}

\Rightarrow h \approx \dfrac{\sum_{x} F'(x)\left[G(x)-F(x)\right]}{\sum_{x} F'(x)^{2}}.

This is basically the same as the 1-D case, except that the weighting function is w(x)=F'(x)^{2}. The iteration form with weighting can be expressed as

\begin{cases} h_0 = 0 \\ h_{k+1}=h_k + \dfrac{\sum_x w(x)F'(x+h_k)\left[G(x)-F(x+h_k)\right]}{\sum_x w(x)F'(x+h_k)^{2}} \end{cases}
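Taking w(x) = 1, the least-squares form can be sketched the same way (register_1d is a hypothetical name; F'(x + h_k) is again estimated by central differences):

```python
import numpy as np

def register_1d(F, G, x, num_iters=20):
    """Least-squares estimate of h with G(x) = F(x + h), using w(x) = 1."""
    dx = 1e-4  # step for central-difference derivative estimates
    h = 0.0
    for _ in range(num_iters):
        Fp = (F(x + h + dx) - F(x + h - dx)) / (2 * dx)  # F'(x + h_k)
        step = np.sum(Fp * (G(x) - F(x + h))) / np.sum(Fp ** 2)
        h += step
        if abs(step) < 1e-10:                            # converged
            break
    return h
```

With F(x) = sin x and G(x) = sin(x + 0.5) sampled over one period, the first step lands near sin(0.5) ≈ 0.48, and subsequent steps converge to 0.5.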
===Performance===
To evaluate the performance of the algorithm, it is natural to ask under what conditions, and how fast, the sequence of h_k's converges to the real h. Consider the case

\begin{align} F(x)&=\sin x, \\ G(x)&=F(x+h)=\sin (x+h). \end{align}

Both versions of the registration algorithm will converge to the correct h for \left\vert h\right\vert < \pi, i.e. for initial misregistrations as large as one-half wavelength. The range of convergence can be improved by suppressing high spatial frequencies in the image, which can be achieved by smoothing it; however, smoothing also undesirably suppresses small details. If the window of smoothing is much larger than the size of the object being matched, the object may be suppressed entirely, so that a match is no longer possible.

Since lowpass-filtered images can be sampled at lower resolution with no loss of information, a coarse-to-fine strategy is adopted: a low-resolution smoothed version of the image is used to obtain an approximate match, and applying the algorithm to higher-resolution images refines the match obtained at lower resolution. While smoothing extends the range of convergence, the weighting function improves the accuracy of the approximation, speeding up convergence. Without weighting, the calculated displacement h_1 of the first iteration with F(x) = \sin x falls off to zero as the displacement approaches one-half wavelength.
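The coarse-to-fine strategy can be sketched on discrete 1-D signals. Box-filter smoothing, factor-of-two subsampling per level, linear interpolation at fractional positions, and the helper names lk_refine and coarse_to_fine are all illustrative choices:

```python
import numpy as np

def lk_refine(F, G, h, iters=10):
    """Refine a shift estimate h between discrete signals with G(x) = F(x + h)."""
    x = np.arange(10, len(F) - 10)             # interior samples, safe margin
    for _ in range(iters):
        xs = x + h
        i = xs.astype(int)                     # integer part (xs > 0)
        t = xs - i                             # fractional part
        Fh = (1 - t) * F[i] + t * F[i + 1]     # F(x + h) by linear interpolation
        Fp = ((1 - t) * (F[i + 1] - F[i - 1]) + t * (F[i + 2] - F[i])) / 2
        h += np.sum(Fp * (G[x] - Fh)) / np.sum(Fp ** 2)
    return h

def coarse_to_fine(F, G, levels=3):
    """Estimate h from coarse to fine: smooth, subsample, refine, repeat."""
    h = 0.0
    for lvl in reversed(range(levels)):
        s = 2 ** lvl                           # subsampling factor at this level
        k = np.ones(s) / s                     # box smoothing filter of width s
        Fs = np.convolve(F, k, mode='same')[::s]
        Gs = np.convolve(G, k, mode='same')[::s]
        h = lk_refine(Fs, Gs, h / s) * s       # refine in coarse pixels, rescale
    return h
```

The coarse level brings a large shift within the convergence range of the basic iteration; each finer level then starts from a nearly correct estimate.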
===Implementation===
The implementation requires calculating the weighted sums of the quantities F'G, F'F, and (F')^{2} over the region of interest R. Although F'(x) cannot be calculated exactly, it can be estimated by

F'(x) \approx \dfrac{F(x+\Delta x)-F(x)}{\Delta x},

where \Delta x is chosen appropriately small. More sophisticated techniques can be used for estimating the first derivatives, but in general such techniques are equivalent to first smoothing the function and then taking the difference.
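As a sketch of the "smooth, then difference" equivalence, F' can be estimated by convolving with a Gaussian kernel and then taking central differences (the 3σ kernel radius is an arbitrary but common truncation; the function name is illustrative):

```python
import numpy as np

def smoothed_derivative(f, sigma=2.0):
    """Estimate the per-sample derivative of a discrete signal f:
    Gaussian smoothing followed by central differences."""
    radius = int(3 * sigma)                    # truncate the kernel at 3 sigma
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()                               # normalize to unit gain
    fs = np.convolve(f, k, mode='same')        # smoothed signal
    d = np.empty_like(fs)
    d[1:-1] = (fs[2:] - fs[:-2]) / 2           # central differences
    d[0], d[-1] = d[1], d[-2]                  # replicate at the ends
    return d
```

The returned values are derivatives per sample index; dividing by the sample spacing converts them to derivatives with respect to x.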
===Generalization to multiple dimensions===
The registration algorithm for 1-D and 2-D can be generalized to more dimensions by minimizing the L2 norm measure of error

E=\sum_{\mathbf{x}\in R}\left[F(\mathbf{x}+\mathbf{h})-G(\mathbf{x})\right]^{2},

where \mathbf{x} and \mathbf{h} are n-dimensional row vectors. The linear approximation analogous to the 1-D case is

F(\mathbf{x}+\mathbf{h}) \approx F(\mathbf{x})+\mathbf{h}\left(\dfrac{\partial}{\partial \mathbf{x}}F(\mathbf{x})\right)^{T}.

Partially differentiating E with respect to \mathbf{h} and setting the result to zero gives

\begin{align} 0 = \dfrac{\partial E}{\partial \mathbf{h}} & \approx \dfrac{\partial}{\partial \mathbf{h}}\sum_{\mathbf{x}}\left[F(\mathbf{x})+\mathbf{h}\left(\dfrac{\partial F}{\partial \mathbf{x}}\right)^{T}-G(\mathbf{x})\right]^{2} \\ & = \sum_{\mathbf{x}}2\left[F(\mathbf{x})+\mathbf{h}\left(\dfrac{\partial F}{\partial \mathbf{x}}\right)^{T}-G(\mathbf{x})\right]\left(\dfrac{\partial F}{\partial \mathbf{x}}\right), \end{align}

\Rightarrow \mathbf{h} \approx \left[\sum_{\mathbf{x}}\left[G(\mathbf{x})-F(\mathbf{x})\right]\left(\dfrac{\partial F}{\partial\mathbf{x}}\right)\right]\left[\sum_{\mathbf{x}}\left(\dfrac{\partial F}{\partial\mathbf{x}}\right)^{T}\left(\dfrac{\partial F}{\partial\mathbf{x}}\right)\right]^{-1},

which has much the same form as the 1-D version.
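For n = 2, a single update of \mathbf{h} amounts to solving a small linear system (the normal equations). A minimal numpy sketch, assuming the displacement is small enough that one step suffices and using np.gradient to estimate the partial derivatives:

```python
import numpy as np

def lk_step_nd(F, G):
    """One registration step for arrays F and G with G(x) = F(x + h).

    Returns the row vector h from
    h = [sum (G - F) dF/dx] [sum (dF/dx)^T (dF/dx)]^(-1),
    with components ordered by array axis.
    """
    # One gradient image per axis; each row of `grads` is dF/dx at one pixel.
    grads = np.stack([g.ravel() for g in np.gradient(F)], axis=1)
    diff = (G - F).ravel()                  # G(x) - F(x) at each pixel
    b = diff @ grads                        # sum of [G(x) - F(x)] (dF/dx)
    M = grads.T @ grads                     # sum of (dF/dx)^T (dF/dx)
    return np.linalg.solve(M, b)            # M is symmetric, so b M^(-1) = M^(-1) b
```

For larger displacements this step would be iterated, warping F by the current estimate each time, as in the 1-D case.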
===Further generalizations===
The method can also be extended to registration under more complex transformations, such as rotation, scaling, and shearing, by considering G(x) = F(Ax+h), where A is a linear spatial transform. The error to be minimized is then

E = \sum_{x}\left[F(Ax+h)-G(x)\right]^{2}.

To determine the adjustment \Delta A to A and \Delta h to h, again use the linear approximation

F((A+\Delta A)x+(h+\Delta h)) \approx F(Ax+h)+(\Delta A x+\Delta h)\dfrac{\partial}{\partial x}F(x).

Substituting this approximation into the error expression makes the error quadratic in the quantities to be minimized. Differentiating with respect to those quantities and setting the results to zero yields a set of linear equations to be solved.

A further generalization accounts for the fact that the brightness may differ between the two views, due to the different viewpoints of the cameras or to differences in the processing of the two images. Assume the difference is a linear transformation:

F(x)=\alpha G(x)+\beta,

where \alpha represents a contrast adjustment and \beta represents a brightness adjustment. Combining this with the general linear-transformation registration problem gives

E=\sum_{x}\left[F(Ax+h)-\left(\alpha G(x)+\beta\right)\right]^{2}

as the quantity to minimize with respect to \alpha, \beta, A, and h.

==Detection and tracking of point features==