Harris–Laplace detector (initial region points) The Harris affine detector relies heavily on both the Harris measure and a Gaussian
scale space representation; a brief examination of both therefore follows. For more exhaustive derivations, see
corner detection and
Gaussian scale space or their associated papers.
Harris corner measure The Harris corner detector algorithm relies on a central principle: at a corner, the image intensity changes significantly in multiple directions. This can alternatively be formulated by examining the changes in intensity due to shifts of a local window: around a corner point, the image intensity changes greatly when the window is shifted in any direction. Following this intuition, and through a clever decomposition, the Harris detector uses the
second moment matrix as the basis of its corner decisions. (See
corner detection for a more complete derivation). The matrix A has also been called the autocorrelation matrix and has values closely related to the
derivatives of image intensity. : A(\mathbf{x}) = \sum_{p,q} w(p,q) \begin{bmatrix} I_x^2(p,q) & I_x I_y(p,q) \\ I_x I_y(p,q) & I_y^2(p,q)\\ \end{bmatrix} where I_{x} and I_{y} are the respective
derivatives (of pixel intensity) in the x and y direction at point (p,q); p and q are the position parameters of the weighting function w. The off-diagonal entries are the product of I_{x} and I_{y}, while the diagonal entries are squares of the respective
derivatives. The weighting function w(x,y) can be uniform, but is more typically an isotropic, circular Gaussian, : w(x,y) = g(x,y,\sigma) = \frac{1}{2\pi \sigma ^2} e^{ \left (-\frac{ x^2 + y^2}{2\sigma ^2} \right )} that averages over a local region while weighting the values near the center more heavily. The matrix A describes the shape of the autocorrelation measure due to shifts in window location. Thus, if we let \lambda_1 and \lambda_2 be the eigenvalues of A, these values provide a quantitative description of how the autocorrelation measure changes in space: its principal curvatures. As Harris and Stephens (1988) point out, the matrix A centered on a corner point will have two large, positive eigenvalues. In a Gaussian scale space representation, L(\mathbf{x}, s) = g(s) \otimes I(\mathbf{x}) denotes the image smoothed by a Gaussian kernel of scale s. So that derivative responses can be compared across scales, a derivative of order m, D_{i_1, \dots, i_m}, must be normalized by a factor s^m in the following manner: : D_{i_1, \dots, i_m}(\mathbf{x},s) = s^m L_{i_1, \dots, i_m}(\mathbf{x},s) These derivatives, or any arbitrary measure, can be adapted to a scale space representation by computing the measure over a set of exponentially spaced scales, where the nth scale is s_n = k^n s_0 . See
scale space for a more complete description.
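The effect of the s^m normalization can be illustrated numerically: for a Gaussian blob of standard deviation t_0, the scale-normalized Laplacian response s^2 |L_{xx} + L_{yy}| at the blob center peaks when the analysis scale s matches t_0. A minimal sketch using NumPy and SciPy (the blob size and scale grid are illustrative choices, not from the source):

```python
import numpy as np
from scipy import ndimage

# Synthetic image: a Gaussian blob of standard deviation t0 = 4 pixels.
t0 = 4.0
y, x = np.mgrid[-32:33, -32:33]
img = np.exp(-(x**2 + y**2) / (2 * t0**2))

# Scale-normalized Laplacian |s^2 (L_xx + L_yy)| at the blob center,
# evaluated over a range of analysis scales s.
scales = np.arange(1.0, 9.1, 0.25)
responses = [s**2 * abs(ndimage.gaussian_laplace(img, s)[32, 32]) for s in scales]

best = scales[int(np.argmax(responses))]
print(best)  # the maximizing scale lies near t0 = 4
```

Without the s^2 factor, the raw Laplacian response decays monotonically with scale and has no interior maximum, which is why the normalization is what makes scale selection possible.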
Combining Harris detector across Gaussian scale-space The
Harris–Laplace detector combines the traditional 2D Harris corner detector with the idea of a Gaussian
scale space representation in order to create a scale-invariant detector. Harris corner points are good starting points because they have been shown to have good rotational and illumination invariance in addition to identifying the interesting points of the image. However, the points are not scale invariant, and thus the second-moment matrix must be modified to reflect a scale-invariant property. Let us denote by M = \mu(\mathbf{x}, \sigma_{\mathit{I}}, \sigma_{\mathit{D}}) the scale-adapted second-moment matrix used in the Harris–Laplace detector: : M = \mu(\mathbf{x}, \sigma_{\mathit{I}}, \sigma_{\mathit{D}}) = \sigma_D^2 g(\sigma_I) \otimes \begin{bmatrix} L_{x}^2(\mathbf{x}, \sigma_{D}) & L_{x}L_{y}(\mathbf{x}, \sigma_{D}) \\ L_{x}L_{y}(\mathbf{x}, \sigma_{D}) & L_{y}^2(\mathbf{x}, \sigma_{D}) \end{bmatrix} where g(\sigma_I) is the Gaussian kernel of scale \sigma_I and \mathbf{x} = (x,y). As in the Gaussian scale space, L(\mathbf{x}) denotes the Gaussian-smoothed image, and the \otimes operator denotes convolution. L_{x}(\mathbf{x},\sigma_{D}) and L_{y}(\mathbf{x}, \sigma_{D}) are the derivatives in their respective directions applied to the smoothed image, calculated using a Gaussian kernel with scale \sigma_D (the differentiation scale). In terms of our Gaussian scale-space framework, the \sigma_I parameter (the integration scale) determines the current scale at which the Harris corner points are detected. Building upon this scale-adapted second-moment matrix, the
Harris–Laplace detector is a twofold process: applying the Harris corner detector at multiple scales and automatically choosing the
characteristic scale.
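The scale-adapted second-moment matrix can be sketched in code as follows. This is a minimal NumPy/SciPy illustration, not the reference implementation: the synthetic corner image, the particular scale values, and the Harris–Stephens cornerness det(\mu) - \alpha \operatorname{trace}^2(\mu) with \alpha = 0.06 are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def scale_adapted_second_moment(img, sigma_i, sigma_d):
    """mu(x, sigma_I, sigma_D): Gaussian derivatives at scale sigma_D,
    products smoothed by g(sigma_I), normalized by sigma_D^2."""
    # L_x, L_y: derivatives of the smoothed image at the differentiation scale.
    Lx = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))
    Ly = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
    w = sigma_d**2
    # Convolve each product with the integration-scale Gaussian g(sigma_I).
    Mxx = w * ndimage.gaussian_filter(Lx * Lx, sigma_i)
    Mxy = w * ndimage.gaussian_filter(Lx * Ly, sigma_i)
    Myy = w * ndimage.gaussian_filter(Ly * Ly, sigma_i)
    return Mxx, Mxy, Myy

# Synthetic L-shaped step corner: bright quadrant whose corner sits at (32, 32).
img = np.zeros((64, 64))
img[32:, 32:] = 1.0

Mxx, Mxy, Myy = scale_adapted_second_moment(img, sigma_i=2.0, sigma_d=1.4)
# Harris cornerness: det(mu) - alpha * trace(mu)^2 (alpha = 0.06, a common choice).
R = Mxx * Myy - Mxy**2 - 0.06 * (Mxx + Myy) ** 2
r, c = np.unravel_index(np.argmax(R), R.shape)
print(r, c)  # the response peaks near the corner at (32, 32)
```

Along the straight edges of the quadrant only one eigenvalue of \mu is large, so the cornerness is near zero or negative there; only near the corner, where both gradient directions are present, does the measure peak.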
Multi-scale Harris corner points The algorithm searches over a fixed number of predefined scales. This set of scales is defined as: : \{\sigma_1 \dots \sigma_n\} = \{k^{1}\sigma_0 \dots k^{n}\sigma_0\} Mikolajczyk and Schmid (2004) use k = 1.4. For each integration scale, \sigma_I, chosen from this set, the appropriate differentiation scale is chosen to be a constant factor of the integration scale: \sigma_D = s\sigma_I. Mikolajczyk and Schmid (2004) used s = 0.7. The characteristic scale is then chosen as a local maximum, across scales, of the scale-normalized Laplacian of Gaussian (LoG), \sigma_I^2 \left| L_{xx}(\mathbf{x},\sigma_I) + L_{yy}(\mathbf{x},\sigma_I) \right|. The \sigma_I^2 factor (as discussed above for the Gaussian scale space) normalizes the LoG across scales and makes these measures comparable, thus making a maximum meaningful. Mikolajczyk and Schmid (2001) demonstrate that the LoG measure attains the highest percentage of correctly detected corner points in comparison to other scale-selection measures. Affine-invariant points The eigenvectors and eigenvalues of the covariance matrix define the rotation and size of an ellipsoid; this representation therefore allows us to completely define an arbitrary elliptical affine region over which we want to integrate or differentiate. The goal of the affine-invariant detector is to identify regions in images that are related through affine transformations. We thus consider a point \mathbf{x}_L and the transformed point \mathbf{x}_R = A\mathbf{x}_L, where A is an affine transformation. In the case of images, both \mathbf{x}_R and \mathbf{x}_L live in R^2 space. The second-moment matrices are related in the following manner: : \begin{align} \mu(\mathbf{x}_L,\Sigma_{I,L}, \Sigma_{D,L}) & {} = A^T \mu (\mathbf{x}_R, \Sigma_{I,R}, \Sigma_{D,R}) A \\ M_L & {} = \mu(\mathbf{x}_L,\Sigma_{I,L}, \Sigma_{D,L}) \\ M_R & {} = \mu (\mathbf{x}_R, \Sigma_{I,R}, \Sigma_{D,R}) \\ M_L & {} = A^T M_R A \\ \Sigma_{I,R} & {} = A \Sigma_{I,L} A^T\text{ and }\Sigma_{D,R} = A \Sigma_{D,L} A^T \end{align} where \Sigma_{I,b} and \Sigma_{D,b} are the covariance matrices for reference frame b.
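The claim that these matrices define affine-covariant elliptical regions can be checked numerically: if M_L = A^T M_R A (equivalently M_R = A^{-T} M_L A^{-1}), then every point on the ellipse \mathbf{x}^T M_L \mathbf{x} = 1 maps under A to a point on the ellipse defined by M_R. A small sketch with arbitrary test matrices (the particular values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive-definite second-moment matrix M_L
# and an arbitrary invertible affine transformation A.
B = rng.normal(size=(2, 2))
M_L = B @ B.T + 0.5 * np.eye(2)
A = np.array([[1.3, 0.4], [-0.2, 0.8]])

# M_L = A^T M_R A  =>  M_R = A^{-T} M_L A^{-1}
A_inv = np.linalg.inv(A)
M_R = A_inv.T @ M_L @ A_inv

# Sample points on the ellipse x^T M_L x = 1 and map them with x_R = A x_L.
theta = np.linspace(0, 2 * np.pi, 100)
circle = np.stack([np.cos(theta), np.sin(theta)])          # unit circle
C = np.linalg.cholesky(M_L)                                # M_L = C C^T
ell_L = np.linalg.inv(C.T) @ circle                        # x^T M_L x = 1
ell_R = A @ ell_L

# Every mapped point satisfies x_R^T M_R x_R = 1: the elliptical region in one
# frame is carried exactly to the corresponding elliptical region in the other.
vals = np.sum(ell_R * (M_R @ ell_R), axis=0)
print(np.allclose(vals, 1.0))  # True
```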
If we continue with this formulation and enforce that : \begin{align} \Sigma_{I,L} &= \sigma_I M_L^{-1} \\ \Sigma_{D,L} &= \sigma_D M_L^{-1} \end{align} where \sigma_I and \sigma_D are scalar factors, one can show that the covariance matrices for the related point are similarly related: : \begin{align} \Sigma_{I,R} &= \sigma_I M_R^{-1} \\ \Sigma_{D,R} &= \sigma_D M_R^{-1} \end{align} By requiring the covariance matrices to satisfy these conditions, several nice properties arise. One of these properties is that the square root of the second-moment matrix, M^{\tfrac{1}{2}}, transforms the original anisotropic region into an isotropic region, and the resulting isotropic regions are related simply through a pure rotation matrix R. These new isotropic regions can be thought of as a normalized reference frame. The following equations formulate the relation between the normalized points x_R' and x_L': : \begin{align} A &= M_R^{-\tfrac{1}{2}} R M_L^{\tfrac{1}{2}} \\ x_R' &= M_R^{\tfrac{1}{2}}x_R \\ x_L' &= M_L^{\tfrac{1}{2}}x_L \\ x_R' &= R x_L'\\ \end{align} The
rotation matrix can be recovered using gradient methods like those in the
SIFT descriptor. As discussed with the Harris detector, the
eigenvalues and eigenvectors of the second-moment matrix M = \mu(\mathbf{x}, \Sigma_I, \Sigma_D) characterize the curvature and shape of the pixel intensities. That is, the eigenvector associated with the largest eigenvalue indicates the direction of largest change, and the eigenvector associated with the smallest eigenvalue defines the direction of least change. In the 2D case, the eigenvectors and eigenvalues define an ellipse. For an isotropic region, the region should be circular in shape, not elliptical; this is the case when the two eigenvalues have the same magnitude. Thus a measure of the isotropy of a local region is defined as: : \mathcal{Q} = \frac{\lambda_{\min}(M)}{\lambda_{\max}(M)} where \lambda_{\min} and \lambda_{\max} denote the smallest and largest eigenvalues of M. This measure has the range [0 \dots 1], where a value of 1 corresponds to perfect isotropy.
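The normalization property can be verified numerically: with M_L = A^T M_R A, rearranging A = M_R^{-\tfrac{1}{2}} R M_L^{\tfrac{1}{2}} gives R = M_R^{\tfrac{1}{2}} A M_L^{-\tfrac{1}{2}}, which is a pure rotation, and in the normalized frame the second-moment matrix becomes the identity, so \mathcal{Q} = 1. A sketch with arbitrary test matrices (the specific values are illustrative):

```python
import numpy as np

def sqrtm_spd(M):
    # Symmetric square root of a symmetric positive-definite matrix
    # via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
B = rng.normal(size=(2, 2))
M_R = B @ B.T + 0.5 * np.eye(2)          # arbitrary SPD second-moment matrix
A = np.array([[1.2, 0.5], [0.1, 0.9]])   # arbitrary affine transformation
M_L = A.T @ M_R @ A                      # M_L = A^T M_R A

# R = M_R^{1/2} A M_L^{-1/2} is orthogonal: R^T R = M_L^{-1/2} M_L M_L^{-1/2} = I.
R = sqrtm_spd(M_R) @ A @ np.linalg.inv(sqrtm_spd(M_L))
print(np.allclose(R @ R.T, np.eye(2)))   # True: R is a pure rotation

def isotropy(M):
    # Q = lambda_min / lambda_max, in [0, 1]; 1 means perfect isotropy.
    w = np.linalg.eigvalsh(M)
    return w[0] / w[-1]

# The normalized second-moment matrix M_L^{-1/2} M_L M_L^{-1/2} is the identity.
Mn = np.linalg.inv(sqrtm_spd(M_L)) @ M_L @ np.linalg.inv(sqrtm_spd(M_L))
print(round(isotropy(Mn), 6))  # 1.0
```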
Iterative algorithm Using this mathematical framework, the Harris affine detector algorithm iteratively discovers the second-moment matrix that transforms the anisotropic region into a normalized region in which the isotropic measure is sufficiently close to one. The algorithm uses this
shape adaptation matrix, U, to transform the image into a normalized reference frame. In this normalized space, the interest point's parameters (spatial location, integration scale and differentiation scale) are refined using methods similar to the Harris–Laplace detector. The second-moment matrix is computed in this normalized reference frame and should have an isotropic measure close to one at the final iteration. At every iteration k, each interest region is defined by several parameters that the algorithm must discover: the matrix U^{(k)}, position \mathbf{x}^{(k)}, integration scale \sigma_I^{(k)} and differentiation scale \sigma_D^{(k)}. Because the detector computes the second-moment matrix in the transformed domain, it is convenient to denote this transformed position as \mathbf{x}_w^{(k)}, where U^{(k)}\mathbf{x}_w^{(k)} = \mathbf{x}^{(k)}. The algorithm proceeds as follows: 1. Initialize: U^{(0)} = \mathit{identity}, and \mathbf{x}^{(0)}, \sigma_D^{(0)}, and \sigma_I^{(0)} are those from the Harris–Laplace detector. 2. Select the integration scale: : \sigma_I^{(k)} = \underset{{\sigma_I = t\sigma_I^{(k-1)}\atop t \in [0.7, \dots, 1.4]}}{\operatorname{argmax}} \, \sigma_I^2 \left| L_{xx}(\mathbf{x}, \sigma_I) + L_{yy}(\mathbf{x},\sigma_I) \right| Note that the integration scale in the U-normalized space differs significantly from that in the non-normalized space; therefore, it is necessary to search for the integration scale rather than reusing the scale from the non-normalized space. 3. Select the differentiation scale: : \sigma_D^{(k)} = \underset{\sigma_D = s\sigma_I^{(k)},\; s \in [0.5, \dots, 0.75]}{\operatorname{argmax}} \, \frac{\lambda_{\min}(\mu(\mathbf{x}_w^{(k)}, \sigma_I^{(k)}, \sigma_D))}{\lambda_{\max}(\mu(\mathbf{x}_w^{(k)}, \sigma_I^{(k)}, \sigma_D))} where \mu(\mathbf{x}_w^{(k)}, \sigma_I^{(k)}, \sigma_D) is the second-moment matrix evaluated in the normalized reference frame. This maximization process causes the eigenvalues to converge toward the same value.
4. Localize spatially: : \mathbf{x}_w^{(k)} = \underset{\mathbf{x}_w \in W(\mathbf{x}_w^{(k-1)})}{\operatorname{argmax}} \, \det(\mu(\mathbf{x}_w, \sigma_I^{(k)}, \sigma_D^{(k)})) - \alpha \operatorname{trace}^2(\mu(\mathbf{x}_w, \sigma_I^{(k)}, \sigma_D^{(k)})) where \mu is the second-moment matrix as defined above. The window W(\mathbf{x}_w^{(k-1)}) is the set of 8-nearest neighbors of the previous iteration's point in the normalized reference frame. Because the spatial localization was done in the U-normalized reference frame, the newly chosen point must be transformed back to the original reference frame. This is achieved by transforming a displacement vector and adding it to the previous point: : \mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} + U^{(k-1)}\cdot (\mathbf{x}_w^{(k)} - \mathbf{x}_w^{(k-1)}) 5. Update the shape adaptation matrix using the second-moment matrix computed in the normalized frame, \mu_i^{(k)} = (\mu^{-\tfrac{1}{2}})^{(k)}, so that after k iterations : U = \prod_{k} \mu_i^{(k)} \cdot U^{(0)} = \prod_{k} (\mu^{-\tfrac{1}{2}})^{(k)} \cdot U^{(0)} 6. Stop when the region is sufficiently isotropic, that is, when : 1 - \frac{\lambda_{\min}(\mu_i^{(k)})}{\lambda_{\max}(\mu_i^{(k)})} < \epsilon_C for a small threshold \epsilon_C; otherwise return to step 2. Mikolajczyk and Schmid (2004) had good success with \epsilon_C = 0.05 .

== Computation and implementation ==
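The core of the iteration, the shape-adaptation update U \leftarrow \mu^{-\tfrac{1}{2}} U, can be illustrated with a toy sketch. Here a fixed matrix M0 stands in for the anisotropic structure; the second-moment matrix measured in the U-normalized frame is modeled as U^T M_0 U (an idealization: a real implementation re-estimates \mu from the warped image at every step, and also refines position and scales as described above).

```python
import numpy as np

def sqrtm_spd(M):
    # Symmetric square root of an SPD matrix via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

def isotropy(M):
    w = np.linalg.eigvalsh(M)
    return w[0] / w[-1]   # Q = lambda_min / lambda_max

# Idealized anisotropic structure: its second-moment matrix in the original frame.
M0 = np.array([[4.0, 1.0], [1.0, 1.0]])

U = np.eye(2)    # U^(0) = identity
eps_c = 0.05     # stopping threshold from Mikolajczyk and Schmid (2004)
for k in range(10):
    # Toy model: second-moment matrix measured in the U-normalized frame.
    mu = U.T @ M0 @ U
    if 1.0 - isotropy(mu) < eps_c:
        break
    # Shape-adaptation update U^(k) = mu^{-1/2} U^(k-1), rescaled so the
    # largest singular value of U is 1 (a stand-in for lambda_max(U) = 1).
    U = np.linalg.inv(sqrtm_spd(mu)) @ U
    U /= np.linalg.eigvalsh(U.T @ U).max() ** 0.5

Q = isotropy(U.T @ M0 @ U)
print(round(Q, 6))  # close to 1: the adapted frame is isotropic
```

In this noiseless toy the loop converges in a single update, because \mu^{-\tfrac{1}{2}} \mu \mu^{-\tfrac{1}{2}} = I exactly; on real image data the measured \mu changes with the warp, so several iterations are typically needed.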