Harris–Laplace detector (initial region points) The Harris affine detector relies heavily on both the Harris measure and a Gaussian
scale space representation; a brief examination of both therefore follows. For more exhaustive derivations, see
corner detection and
Gaussian scale space or their associated papers.
Harris corner measure The Harris corner detector algorithm relies on a central principle: at a corner, the image intensity changes significantly in multiple directions. This can alternatively be formulated by examining the changes in intensity due to shifts of a local window: around a corner point, the image intensity changes greatly when the window is shifted in any direction. Following this intuition, and through a clever decomposition, the Harris detector uses the
second moment matrix as the basis of its corner decisions. (See
corner detection for a more complete derivation). The matrix A has also been called the autocorrelation matrix and has values closely related to the
derivatives of image intensity. : A(\mathbf{x}) = \sum_{p,q} w(p,q) \begin{bmatrix} I_x^2(p,q) & I_x I_y(p,q) \\ I_x I_y(p,q) & I_y^2(p,q)\\ \end{bmatrix} where I_{x} and I_{y} are the respective
derivatives (of pixel intensity) in the x and y direction at point (p,q); p and q are the position parameters of the weighting function w. The off-diagonal entries are the product of I_{x} and I_{y}, while the diagonal entries are squares of the respective
derivatives. The weighting function w(x,y) can be uniform, but is more typically an isotropic, circular Gaussian, : w(x,y) = g(x,y,\sigma) = \frac{1}{2\pi \sigma ^2} e^{ \left (-\frac{ x^2 + y^2}{2\sigma ^2} \right )} that averages over a local region while weighting the values near the center more heavily. The matrix A describes the shape of the autocorrelation measure due to shifts in window location. Thus, if we let \lambda_1 and \lambda_2 be the eigenvalues of A, these values provide a quantitative description of how the autocorrelation measure changes in space: its principal curvatures. As Harris and Stephens (1988) point out, the matrix A centered on a corner point will have two large, positive eigenvalues. In a Gaussian scale space representation, L(\mathbf{x}, s) = g(s) \otimes I(\mathbf{x}) denotes the image smoothed by a Gaussian kernel of scale s. So that derivative responses can be compared across scales, a derivative of order m, D_{i_1, \dots, i_m}, must be normalized by a factor s^m in the following manner: : D_{i_1, \dots, i_m}(\mathbf{x},s) = s^m L_{i_1, \dots, i_m}(\mathbf{x},s) These derivatives, or any arbitrary measure, can be adapted to a scale space representation by computing the measure over a set of exponentially spaced scales, where the nth scale is s_n = k^n s_0 . See
scale space for a more complete description.
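The effect of the s^m normalization can be illustrated numerically: for a Gaussian blob of standard deviation t_0, the scale-normalized Laplacian response s^2 |L_{xx} + L_{yy}| at the blob center peaks when the analysis scale s matches t_0. A minimal sketch using NumPy and SciPy (the blob size and scale grid are illustrative choices, not from the source):

```python
import numpy as np
from scipy import ndimage

# Synthetic image: a Gaussian blob of standard deviation t0 = 4 pixels.
t0 = 4.0
y, x = np.mgrid[-32:33, -32:33]
img = np.exp(-(x**2 + y**2) / (2 * t0**2))

# Scale-normalized Laplacian |s^2 (L_xx + L_yy)| at the blob center,
# evaluated over a range of analysis scales s.
scales = np.arange(1.0, 9.1, 0.25)
responses = [s**2 * abs(ndimage.gaussian_laplace(img, s)[32, 32]) for s in scales]

best = scales[int(np.argmax(responses))]
print(best)  # the maximizing scale lies near t0 = 4
```

Without the s^2 factor, the raw Laplacian response decays monotonically with scale and has no interior maximum, which is why the normalization is what makes scale selection possible.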
Combining Harris detector across Gaussian scale-space The
Harris–Laplace detector combines the traditional 2D Harris corner detector with the idea of a Gaussian
scale space representation in order to create a scale-invariant detector. Harris corner points are good starting points because they have been shown to have good rotational and illumination invariance in addition to identifying the interesting points of the image. However, the points are not scale invariant, and thus the second-moment matrix must be modified to reflect a scale-invariant property. Let us denote by M = \mu(\mathbf{x}, \sigma_{\mathit{I}}, \sigma_{\mathit{D}}) the scale-adapted second-moment matrix used in the Harris–Laplace detector: : M = \mu(\mathbf{x}, \sigma_{\mathit{I}}, \sigma_{\mathit{D}}) = \sigma_D^2 g(\sigma_I) \otimes \begin{bmatrix} L_{x}^2(\mathbf{x}, \sigma_{D}) & L_{x}L_{y}(\mathbf{x}, \sigma_{D}) \\ L_{x}L_{y}(\mathbf{x}, \sigma_{D}) & L_{y}^2(\mathbf{x}, \sigma_{D}) \end{bmatrix} where g(\sigma_I) is the Gaussian kernel of scale \sigma_I and \mathbf{x} = (x,y). As in the Gaussian scale space, L(\mathbf{x}) denotes the Gaussian-smoothed image, and the \otimes operator denotes convolution. L_{x}(\mathbf{x},\sigma_{D}) and L_{y}(\mathbf{x}, \sigma_{D}) are the derivatives in their respective directions applied to the smoothed image, calculated using a Gaussian kernel with scale \sigma_D (the differentiation scale). In terms of our Gaussian scale-space framework, the \sigma_I parameter (the integration scale) determines the current scale at which the Harris corner points are detected. Building upon this scale-adapted second-moment matrix, the
Harris–Laplace detector is a twofold process: applying the Harris corner detector at multiple scales and automatically choosing the
characteristic scale.
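The scale-adapted second-moment matrix can be sketched in code as follows. This is a minimal NumPy/SciPy illustration, not the reference implementation: the synthetic corner image, the particular scale values, and the Harris–Stephens cornerness det(\mu) - \alpha \operatorname{trace}^2(\mu) with \alpha = 0.06 are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def scale_adapted_second_moment(img, sigma_i, sigma_d):
    """mu(x, sigma_I, sigma_D): Gaussian derivatives at scale sigma_D,
    products smoothed by g(sigma_I), normalized by sigma_D^2."""
    # L_x, L_y: derivatives of the smoothed image at the differentiation scale.
    Lx = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))
    Ly = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
    w = sigma_d**2
    # Convolve each product with the integration-scale Gaussian g(sigma_I).
    Mxx = w * ndimage.gaussian_filter(Lx * Lx, sigma_i)
    Mxy = w * ndimage.gaussian_filter(Lx * Ly, sigma_i)
    Myy = w * ndimage.gaussian_filter(Ly * Ly, sigma_i)
    return Mxx, Mxy, Myy

# Synthetic L-shaped step corner: bright quadrant whose corner sits at (32, 32).
img = np.zeros((64, 64))
img[32:, 32:] = 1.0

Mxx, Mxy, Myy = scale_adapted_second_moment(img, sigma_i=2.0, sigma_d=1.4)
# Harris cornerness: det(mu) - alpha * trace(mu)^2 (alpha = 0.06, a common choice).
R = Mxx * Myy - Mxy**2 - 0.06 * (Mxx + Myy) ** 2
r, c = np.unravel_index(np.argmax(R), R.shape)
print(r, c)  # the response peaks near the corner at (32, 32)
```

Along the straight edges of the quadrant only one eigenvalue of \mu is large, so the cornerness is near zero or negative there; only near the corner, where both gradient directions are present, does the measure peak.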
Multi-scale Harris corner points The algorithm searches over a fixed number of predefined scales. This set of scales is defined as: : \{\sigma_1 \dots \sigma_n\} = \{k^{1}\sigma_0 \dots k^{n}\sigma_0\} Mikolajczyk and Schmid (2004) use k = 1.4. For each integration scale, \sigma_I, chosen from this set, the appropriate differentiation scale is chosen to be a constant factor of the integration scale: \sigma_D = s\sigma_I. Mikolajczyk and Schmid (2004) used s = 0.7. The characteristic scale is then chosen as a local maximum, across scales, of the scale-normalized Laplacian of Gaussian (LoG), \sigma_I^2 \left| L_{xx}(\mathbf{x},\sigma_I) + L_{yy}(\mathbf{x},\sigma_I) \right|. The \sigma_I^2 factor (as discussed above for the Gaussian scale space) normalizes the LoG across scales and makes these measures comparable, thus making a maximum meaningful. Mikolajczyk and Schmid (2001) demonstrate that the LoG measure attains the highest percentage of correctly detected corner points in comparison to other scale-selection measures. Affine-invariant points The eigenvectors and eigenvalues of the covariance matrix define the rotation and size of an ellipsoid; this representation therefore allows us to completely define an arbitrary elliptical affine region over which we want to integrate or differentiate. The goal of the affine-invariant detector is to identify regions in images that are related through affine transformations. We thus consider a point \mathbf{x}_L and the transformed point \mathbf{x}_R = A\mathbf{x}_L, where A is an affine transformation. In the case of images, both \mathbf{x}_R and \mathbf{x}_L live in R^2 space. The second-moment matrices are related in the following manner: : \begin{align} \mu(\mathbf{x}_L,\Sigma_{I,L}, \Sigma_{D,L}) & {} = A^T \mu (\mathbf{x}_R, \Sigma_{I,R}, \Sigma_{D,R}) A \\ M_L & {} = \mu(\mathbf{x}_L,\Sigma_{I,L}, \Sigma_{D,L}) \\ M_R & {} = \mu (\mathbf{x}_R, \Sigma_{I,R}, \Sigma_{D,R}) \\ M_L & {} = A^T M_R A \\ \Sigma_{I,R} & {} = A \Sigma_{I,L} A^T\text{ and }\Sigma_{D,R} = A \Sigma_{D,L} A^T \end{align} where \Sigma_{I,b} and \Sigma_{D,b} are the covariance matrices for reference frame b.
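The claim that these matrices define affine-covariant elliptical regions can be checked numerically: if M_L = A^T M_R A (equivalently M_R = A^{-T} M_L A^{-1}), then every point on the ellipse \mathbf{x}^T M_L \mathbf{x} = 1 maps under A to a point on the ellipse defined by M_R. A small sketch with arbitrary test matrices (the particular values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive-definite second-moment matrix M_L
# and an arbitrary invertible affine transformation A.
B = rng.normal(size=(2, 2))
M_L = B @ B.T + 0.5 * np.eye(2)
A = np.array([[1.3, 0.4], [-0.2, 0.8]])

# M_L = A^T M_R A  =>  M_R = A^{-T} M_L A^{-1}
A_inv = np.linalg.inv(A)
M_R = A_inv.T @ M_L @ A_inv

# Sample points on the ellipse x^T M_L x = 1 and map them with x_R = A x_L.
theta = np.linspace(0, 2 * np.pi, 100)
circle = np.stack([np.cos(theta), np.sin(theta)])          # unit circle
C = np.linalg.cholesky(M_L)                                # M_L = C C^T
ell_L = np.linalg.inv(C.T) @ circle                        # x^T M_L x = 1
ell_R = A @ ell_L

# Every mapped point satisfies x_R^T M_R x_R = 1: the elliptical region in one
# frame is carried exactly to the corresponding elliptical region in the other.
vals = np.sum(ell_R * (M_R @ ell_R), axis=0)
print(np.allclose(vals, 1.0))  # True
```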
If we continue with this formulation and enforce that : \begin{align} \Sigma_{I,L} &= \sigma_I M_L^{-1} \\ \Sigma_{D,L} &= \sigma_D M_L^{-1} \end{align} where \sigma_I and \sigma_D are scalar factors, one can show that the covariance matrices for the related point are similarly related: : \begin{align} \Sigma_{I,R} &= \sigma_I M_R^{-1} \\ \Sigma_{D,R} &= \sigma_D M_R^{-1} \end{align} By requiring the covariance matrices to satisfy these conditions, several nice properties arise. One of these properties is that the square root of the second-moment matrix, M^{\tfrac{1}{2}}, transforms the original anisotropic region into an isotropic region, and the resulting isotropic regions are related simply through a pure rotation matrix R. These new isotropic regions can be thought of as a normalized reference frame. The following equations formulate the relation between the normalized points x_R' and x_L': : \begin{align} A &= M_R^{-\tfrac{1}{2}} R M_L^{\tfrac{1}{2}} \\ x_R' &= M_R^{\tfrac{1}{2}}x_R \\ x_L' &= M_L^{\tfrac{1}{2}}x_L \\ x_R' &= R x_L'\\ \end{align} The
rotation matrix can be recovered using gradient methods like those in the
SIFT descriptor. As discussed with the Harris detector, the
eigenvalues and eigenvectors of the second-moment matrix M = \mu(\mathbf{x}, \Sigma_I, \Sigma_D) characterize the curvature and shape of the pixel intensities. That is, the eigenvector associated with the largest eigenvalue indicates the direction of largest change, and the eigenvector associated with the smallest eigenvalue defines the direction of least change. In the 2D case, the eigenvectors and eigenvalues define an ellipse. For an isotropic region, the region should be circular in shape, not elliptical; this is the case when the two eigenvalues have the same magnitude. Thus a measure of the isotropy of a local region is defined as: : \mathcal{Q} = \frac{\lambda_{\min}(M)}{\lambda_{\max}(M)} where \lambda_{\min} and \lambda_{\max} denote the smallest and largest eigenvalues of M. This measure has the range [0 \dots 1], where a value of 1 corresponds to perfect isotropy.
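The normalization property can be verified numerically: with M_L = A^T M_R A, rearranging A = M_R^{-\tfrac{1}{2}} R M_L^{\tfrac{1}{2}} gives R = M_R^{\tfrac{1}{2}} A M_L^{-\tfrac{1}{2}}, which is a pure rotation, and in the normalized frame the second-moment matrix becomes the identity, so \mathcal{Q} = 1. A sketch with arbitrary test matrices (the specific values are illustrative):

```python
import numpy as np

def sqrtm_spd(M):
    # Symmetric square root of a symmetric positive-definite matrix
    # via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
B = rng.normal(size=(2, 2))
M_R = B @ B.T + 0.5 * np.eye(2)          # arbitrary SPD second-moment matrix
A = np.array([[1.2, 0.5], [0.1, 0.9]])   # arbitrary affine transformation
M_L = A.T @ M_R @ A                      # M_L = A^T M_R A

# R = M_R^{1/2} A M_L^{-1/2} is orthogonal: R^T R = M_L^{-1/2} M_L M_L^{-1/2} = I.
R = sqrtm_spd(M_R) @ A @ np.linalg.inv(sqrtm_spd(M_L))
print(np.allclose(R @ R.T, np.eye(2)))   # True: R is a pure rotation

def isotropy(M):
    # Q = lambda_min / lambda_max, in [0, 1]; 1 means perfect isotropy.
    w = np.linalg.eigvalsh(M)
    return w[0] / w[-1]

# The normalized second-moment matrix M_L^{-1/2} M_L M_L^{-1/2} is the identity.
Mn = np.linalg.inv(sqrtm_spd(M_L)) @ M_L @ np.linalg.inv(sqrtm_spd(M_L))
print(round(isotropy(Mn), 6))  # 1.0
```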
Iterative algorithm Using this mathematical framework, the Harris affine detector algorithm iteratively discovers the second-moment matrix that transforms the anisotropic region into a normalized region in which the isotropic measure is sufficiently close to one. The algorithm uses this
shape adaptation matrix, U, to transform the image into a normalized reference frame. In this normalized space, the interest point's parameters (spatial location, integration scale and differentiation scale) are refined using methods similar to the Harris–Laplace detector. The second-moment matrix is computed in this normalized reference frame and should have an isotropic measure close to one at the final iteration. At every iteration k, each interest region is defined by several parameters that the algorithm must discover: the matrix U^{(k)}, position \mathbf{x}^{(k)}, integration scale \sigma_I^{(k)} and differentiation scale \sigma_D^{(k)}. Because the detector computes the second-moment matrix in the transformed domain, it is convenient to denote this transformed position as \mathbf{x}_w^{(k)}, where U^{(k)}\mathbf{x}_w^{(k)} = \mathbf{x}^{(k)}. The algorithm proceeds as follows: 1. Initialize: U^{(0)} = \mathit{identity}, and \mathbf{x}^{(0)}, \sigma_D^{(0)}, and \sigma_I^{(0)} are those from the Harris–Laplace detector. 2. Select the integration scale: : \sigma_I^{(k)} = \underset{{\sigma_I = t\sigma_I^{(k-1)}\atop t \in [0.7, \dots, 1.4]}}{\operatorname{argmax}} \, \sigma_I^2 \left| L_{xx}(\mathbf{x}, \sigma_I) + L_{yy}(\mathbf{x},\sigma_I) \right| Note that the integration scale in the U-normalized space differs significantly from that in the non-normalized space; therefore, it is necessary to search for the integration scale rather than reusing the scale from the non-normalized space. 3. Select the differentiation scale: : \sigma_D^{(k)} = \underset{\sigma_D = s\sigma_I^{(k)},\; s \in [0.5, \dots, 0.75]}{\operatorname{argmax}} \, \frac{\lambda_{\min}(\mu(\mathbf{x}_w^{(k)}, \sigma_I^{(k)}, \sigma_D))}{\lambda_{\max}(\mu(\mathbf{x}_w^{(k)}, \sigma_I^{(k)}, \sigma_D))} where \mu(\mathbf{x}_w^{(k)}, \sigma_I^{(k)}, \sigma_D) is the second-moment matrix evaluated in the normalized reference frame. This maximization process causes the eigenvalues to converge toward the same value.
4. Localize spatially: : \mathbf{x}_w^{(k)} = \underset{\mathbf{x}_w \in W(\mathbf{x}_w^{(k-1)})}{\operatorname{argmax}} \, \det(\mu(\mathbf{x}_w, \sigma_I^{(k)}, \sigma_D^{(k)})) - \alpha \operatorname{trace}^2(\mu(\mathbf{x}_w, \sigma_I^{(k)}, \sigma_D^{(k)})) where \mu is the second-moment matrix as defined above. The window W(\mathbf{x}_w^{(k-1)}) is the set of 8-nearest neighbors of the previous iteration's point in the normalized reference frame. Because the spatial localization was done in the U-normalized reference frame, the newly chosen point must be transformed back to the original reference frame. This is achieved by transforming a displacement vector and adding it to the previous point: : \mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} + U^{(k-1)}\cdot (\mathbf{x}_w^{(k)} - \mathbf{x}_w^{(k-1)}) 5. Update the shape adaptation matrix using the second-moment matrix computed in the normalized frame, \mu_i^{(k)} = (\mu^{-\tfrac{1}{2}})^{(k)}, so that after k iterations : U = \prod_{k} \mu_i^{(k)} \cdot U^{(0)} = \prod_{k} (\mu^{-\tfrac{1}{2}})^{(k)} \cdot U^{(0)} 6. Stop when the region is sufficiently isotropic, that is, when : 1 - \frac{\lambda_{\min}(\mu_i^{(k)})}{\lambda_{\max}(\mu_i^{(k)})} < \epsilon_C for a small threshold \epsilon_C; otherwise return to step 2. Mikolajczyk and Schmid (2004) had good success with \epsilon_C = 0.05 .

== Computation and implementation ==
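The core of the iteration, the shape-adaptation update U \leftarrow \mu^{-\tfrac{1}{2}} U, can be illustrated with a toy sketch. Here a fixed matrix M0 stands in for the anisotropic structure; the second-moment matrix measured in the U-normalized frame is modeled as U^T M_0 U (an idealization: a real implementation re-estimates \mu from the warped image at every step, and also refines position and scales as described above).

```python
import numpy as np

def sqrtm_spd(M):
    # Symmetric square root of an SPD matrix via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

def isotropy(M):
    w = np.linalg.eigvalsh(M)
    return w[0] / w[-1]   # Q = lambda_min / lambda_max

# Idealized anisotropic structure: its second-moment matrix in the original frame.
M0 = np.array([[4.0, 1.0], [1.0, 1.0]])

U = np.eye(2)    # U^(0) = identity
eps_c = 0.05     # stopping threshold from Mikolajczyk and Schmid (2004)
for k in range(10):
    # Toy model: second-moment matrix measured in the U-normalized frame.
    mu = U.T @ M0 @ U
    if 1.0 - isotropy(mu) < eps_c:
        break
    # Shape-adaptation update U^(k) = mu^{-1/2} U^(k-1), rescaled so the
    # largest singular value of U is 1 (a stand-in for lambda_max(U) = 1).
    U = np.linalg.inv(sqrtm_spd(mu)) @ U
    U /= np.linalg.eigvalsh(U.T @ U).max() ** 0.5

Q = isotropy(U.T @ M0 @ U)
print(round(Q, 6))  # close to 1: the adapted frame is isotropic
```

In this noiseless toy the loop converges in a single update, because \mu^{-\tfrac{1}{2}} \mu \mu^{-\tfrac{1}{2}} = I exactly; on real image data the measured \mu changes with the warp, so several iterations are typically needed.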