In the field of information theory, Shannon entropy quantifies the complexity of a distribution p as -\sum p \log p. Higher entropy therefore means p is more complex, and hence more unpredictable.

To measure the complexity of an image region \{x,R\} around point x with shape R, a descriptor D that takes values \{d_1,\dots,d_r\} (e.g., in an 8-bit grey-level image, D would range from 0 to 255 for each pixel) is defined so that P_{D}(d_i,x,R), the probability that descriptor value d_i occurs in region \{x,R\}, can be computed. The entropy of the image region can then be computed as

H_{D}(x,R) = -\sum_{i \in (1\dots r)} P_{D}(d_i,x,R) \log P_{D}(d_i,x,R).

Using this equation, H_{D}(x,R) can be calculated for every point x and region shape R. A more complex region, such as an eye region, has a more complex intensity distribution and hence higher entropy, so H_{D}(x,R) is a good measure of local complexity. However, entropy captures only the statistics of the local attribute, not its spatial arrangement, and regions of equal entropy are not equally discriminative under scale change. This observation is used to define measures of discrimination in the following subsections, which discuss different methods for selecting regions with high local complexity and strong discrimination between regions.
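As an illustration, with pixel intensity as the descriptor D, the entropy of a region reduces to the entropy of its normalised grey-level histogram. A minimal sketch (function and variable names are illustrative, not from the original paper):

```python
import numpy as np

def region_entropy(patch, n_bins=256):
    """Shannon entropy H_D of the grey-level distribution in a patch.

    The descriptor D is pixel intensity, so P_D(d_i, x, R) is the
    normalised intensity histogram of the region.
    """
    hist, _ = np.histogram(patch, bins=n_bins, range=(0, 256))
    p = hist / hist.sum()        # P_D(d_i, x, R)
    p = p[p > 0]                 # treat 0 * log 0 as 0
    return -np.sum(p * np.log(p))

# A uniform patch is fully predictable; a noisy patch is not.
flat = np.full((16, 16), 128)
noisy = np.random.default_rng(0).integers(0, 256, size=(16, 16))
print(region_entropy(flat))    # zero entropy
print(region_entropy(noisy))   # high entropy, bounded by log 256 ≈ 5.55
```

The noisy patch approaches the upper bound log 256, while any region of constant intensity scores exactly zero, matching the intuition that complex regions are unpredictable.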
===Similarity-invariant saliency===
The first version of the Kadir–Brady saliency detector[10] finds only salient regions invariant under similarity transformations. The algorithm finds circular regions at different scales; in other words, given H_{D}(x,s), where s is the scale parameter of a circular region R, the algorithm selects a set of circular regions \{x_i,s_i; i=1\dots N\}. The method consists of three steps:
• Calculate the Shannon entropy of local image attributes for each x over a range of scales: H_{D}(x,s) = -\sum_{i \in (1\dots r)} P_{D}(d_i,x,s) \log P_{D}(d_i,x,s);
• Select the scales at which the entropy-over-scale function exhibits a peak: s_p;
• Calculate the magnitude change of the PDF as a function of scale at each peak: W_D(x,s) = s \sum_{i \in (1\dots r)} \left| \frac{\partial}{\partial s} P_{D}(d_i,x,s) \right|.
The final saliency Y_D(x,s_p) is the product of H_D(x,s_p) and W_D(x,s_p). For each x the method picks a scale s_p and calculates the saliency score Y_D(x,s_p). By comparing Y_D(x,s_p) across different points x, the detector can rank the saliency of points and pick the most representative ones.
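The three steps above can be sketched as follows, approximating the circular regions with discrete pixel masks and the scale derivative with finite differences between adjacent scales; the function names, the 16-bin intensity histogram, and the peak test are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def histogram_pdf(img, x, y, s, n_bins=16):
    """P_D(., x, s): normalised grey-level histogram of the circular
    region of radius s centred at (x, y)."""
    yy, xx = np.ogrid[-s:s + 1, -s:s + 1]
    mask = xx**2 + yy**2 <= s**2
    patch = img[y - s:y + s + 1, x - s:x + s + 1][mask]
    hist, _ = np.histogram(patch, bins=n_bins, range=(0, 256))
    return hist / hist.sum()

def scale_saliency(img, x, y, scales):
    """Return (s_p, Y_D) pairs for one point x: entropy peaks over
    scale, each weighted by the inter-scale PDF change W_D."""
    pdfs = np.array([histogram_pdf(img, x, y, s) for s in scales])
    with np.errstate(divide="ignore"):
        logs = np.where(pdfs > 0, np.log(pdfs), 0.0)
    H = -np.sum(pdfs * logs, axis=1)                 # H_D(x, s)
    dP = np.abs(np.diff(pdfs, axis=0)).sum(axis=1)   # sum_i |dP/ds|, finite differences
    W = np.array(scales[1:]) * dP                    # W_D(x, s) = s * sum_i |dP/ds|
    peaks = []
    for k in range(1, len(scales) - 1):              # interior peaks of H over scale
        if H[k] > H[k - 1] and H[k] > H[k + 1]:
            peaks.append((scales[k], H[k] * W[k - 1]))   # Y_D = H_D * W_D
    return peaks
```

On a synthetic image of a uniform bright disc, for example, the entropy peaks at the scale where the circular window straddles the disc boundary, since the mixed bright/dark histogram there is the most unpredictable.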
===Affine-invariant saliency===
The previous method is invariant to the similarity group of geometric transformations and to photometric shifts. However, as mentioned in the opening remarks, the ideal detector should detect regions invariant up to viewpoint change. Several detectors can find affine-invariant regions, which are a better approximation of viewpoint change than similarity transformations. To detect affine-invariant regions, the detector must find ellipses, as in figure 4. R is now parameterized by three parameters (s, ρ, θ), where ρ is the axis ratio and θ the orientation of the ellipse. This modification enlarges the search space of the previous algorithm from one scale to a set of three parameters, so the complexity of the affine-invariant saliency detector increases accordingly. In practice the affine-invariant saliency detector starts with the set of points and scales generated by the similarity-invariant saliency detector, then iteratively refines them towards the locally optimal ellipse parameters.
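One plausible way to realise the (s, ρ, θ) parameterization as a pixel mask is sketched below. The axis convention, semi-axes s·ρ and s/ρ chosen so the area stays close to πs² for any axis ratio, is an assumption for illustration and not necessarily the paper's exact convention:

```python
import numpy as np

def ellipse_mask(s, rho, theta, shape, cx, cy):
    """Boolean mask of the elliptical region parameterized by
    (s, rho, theta), centred at (cx, cy).

    The semi-axes are taken as s*rho and s/rho, so the area stays
    close to pi*s**2 for any axis ratio, and rho = 1 recovers the
    circular region of the similarity-invariant detector.
    """
    yy, xx = np.indices(shape)
    dx, dy = xx - cx, yy - cy
    # Rotate coordinates into the ellipse's principal axes.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return (u / rho)**2 + (v * rho)**2 <= s**2
```

With ρ = 1 the mask reduces to the circular region used before; the entropy H_D is then computed from the histogram of the masked pixels exactly as in the similarity-invariant case, and the search simply ranges over the two extra parameters.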
===Comparison===
Although the similarity-invariant saliency detector is faster than the affine-invariant one, it has the drawback of favouring isotropic structure, since the discriminative measure W_D is measured over isotropic scale. To summarize: the affine-invariant saliency detector is invariant to affine transformations and able to detect a more general class of salient regions.
===Salient volume===
It is intuitive to pick points directly in order of decreasing saliency score and stop once a threshold on the number of points, or on the saliency score itself, is satisfied. However, natural images contain noise and motion blur, both of which act as randomisers and generally increase entropy, affecting previously low-entropy values more than high-entropy ones. A more robust method is to pick regions, rather than points, in entropy space. Although the individual pixels within a salient region may be affected by noise at any given instant, the noise is unlikely to affect all of them in such a way that the region as a whole becomes non-salient. It is also necessary to analyze the whole saliency space so that each salient feature is represented: a global-threshold approach would let highly salient features in one part of the image dominate the rest, while a local-threshold approach would require setting yet another scale parameter. A simple clustering algorithm that meets these two requirements is therefore used at the end of the algorithm. It works by selecting highly salient points that have local support, i.e. nearby points with similar saliency and scale; each region must be sufficiently distant from all others (in R3) to qualify as a separate entity. For robustness, the representation includes all of the points in a selected region. The method works as follows:
1. Apply a global threshold.
2. Choose the highest salient point in saliency-space (Y).
3. Find the K nearest neighbours (K is a pre-set constant).
4. Test the support of these using the variance of the centre points.
5. Find the distance, D, in R3 from salient regions already clustered.
6. Accept if D is greater than the mean scale of the region and if the points are sufficiently clustered (variance less than a pre-set threshold V_th).
7. Store as the mean scale and spatial location of the K points.
8. Repeat from step 2 with the next highest salient point.
The algorithm is implemented as GreedyCluster1.m in MATLAB by Dr. Timor Kadir.

==Performance evaluation==