OPTIMOL (automatic Online Picture collection via Incremental MOdel Learning) approaches the problem of learning object categories from online image searches by addressing model learning and searching simultaneously. It is an iterative framework that updates its model of the target object category while concurrently retrieving more relevant images.
=== General framework ===
OPTIMOL was presented as a general iterative framework that is independent of the specific model used for category learning. The algorithm is as follows:
• Download a large set of images from the Internet by searching for a keyword
• Initialize the dataset with seed images
• While more images are needed in the dataset:
  • Learn the model with the most recently added dataset images
  • Classify the downloaded images using the updated model
  • Add accepted images to the dataset
Note that only the most recently added images are used in each round of learning. This allows the algorithm to run on an arbitrarily large number of input images.
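The loop above can be sketched in Python as a hypothetical skeleton, with the model-specific pieces abstracted behind caller-supplied `learn` and `classify` callables (these names, and the function itself, are illustrative assumptions, not code from the paper):

```python
def optimol_loop(seed_images, downloaded_images, learn, classify, target_size):
    """Iteratively grow a dataset, training only on the newest batch.

    learn(batch, model) -> model     (incremental model update)
    classify(model, image) -> bool   (accept/reject a candidate image)
    """
    dataset = list(seed_images)
    new_batch = list(seed_images)       # most recently added images
    model = None
    remaining = list(downloaded_images)
    while len(dataset) < target_size and remaining and new_batch:
        model = learn(new_batch, model)   # train on the newest images only
        accepted = [im for im in remaining if classify(model, im)]
        remaining = [im for im in remaining if im not in accepted]
        dataset.extend(accepted)
        new_batch = accepted              # next round trains on these
    return dataset, model
```

Because each round trains only on `new_batch` rather than the whole dataset, memory and per-iteration cost do not grow with the number of images collected so far.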
=== Model ===
The two categories (target object and background) are modeled as hierarchical Dirichlet processes (HDPs). As in the pLSA approach, it is assumed that the images can be described with the bag-of-words model. The HDP models the distribution of an unspecified number of topics across the images in a category, and across categories. The distribution of topics among the images of a single category is modeled as a Dirichlet process (a type of non-parametric probability distribution). To allow the sharing of topics across classes, each of these Dirichlet processes is modeled as a sample from another "parent" Dirichlet process. HDPs were first described by Teh et al. in 2005.
=== Implementation ===
==== Initialization ====
The dataset must be initialized, or seeded, with an original batch of images that serve as good exemplars of the object category to be learned. These can be gathered automatically, using the first page or so of images returned by the search engine (which tend to be better than subsequent results). Alternatively, the initial images can be gathered by hand.
==== Model learning ====
To learn the various parameters of the HDP incrementally, Gibbs sampling is used over the latent variables. It is carried out each time a new set of images is incorporated into the dataset. Gibbs sampling involves repeatedly sampling from a set of random variables in order to approximate their joint distribution. Each sample is drawn for one random variable at a time, conditioned on the current values of the other random variables on which it depends. Given sufficient samples, a reasonable approximation of the distribution can be achieved.
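As a toy illustration of the general technique (not OPTIMOL's sampler), the following draws from a bivariate normal with correlation `rho` by alternately sampling each coordinate conditioned on the current value of the other:

```python
import random

def gibbs_bivariate_normal(rho, n_samples, rng=random.Random(0)):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho.

    The conditionals are x|y ~ N(rho*y, 1-rho^2) and y|x ~ N(rho*x, 1-rho^2),
    so each step only needs the other variable's current value.
    """
    sd = (1.0 - rho * rho) ** 0.5     # conditional standard deviation
    x = y = 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)    # sample x given the current y
        y = rng.gauss(rho * x, sd)    # sample y given the new x
        samples.append((x, y))
    return samples
```

After discarding an initial burn-in, the empirical correlation of the samples approaches `rho`, showing how repeated conditional sampling recovers the joint distribution.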
==== Classification ====
At each iteration, <math>P(z|c)</math> and <math>P(x|z,c)</math> can be obtained from the model learned in the previous round of Gibbs sampling, where <math>z</math> is a topic, <math>c</math> is a category, and <math>x</math> is a single visual word. The likelihood of an image <math>I</math> belonging to a given class is then:

<math>P(I|c) = \prod_i \sum_j P(x_i|z_j,c)\,P(z_j|c)</math>

This is computed for each new candidate image at every iteration, and the image is classified as belonging to the category with the highest likelihood.
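The classification rule can be sketched as follows, assuming the learned quantities are available as probability tables (the table layout here, lists of topic weights and per-topic word dictionaries, is an assumption for illustration):

```python
import math

def log_likelihood(words, p_topic, p_word):
    """log P(I|c) = sum_i log sum_j P(x_i|z_j,c) P(z_j|c) for one category.

    p_topic[j] = P(z_j|c); p_word[j] maps a visual word x to P(x|z_j,c).
    Working in log space avoids underflow when the product runs over
    many visual words.
    """
    total = 0.0
    for x in words:
        total += math.log(sum(pz * pw.get(x, 1e-12)
                              for pz, pw in zip(p_topic, p_word)))
    return total

def classify_image(words, models):
    """Assign a bag of visual words to the maximum-likelihood category."""
    return max(models, key=lambda c: log_likelihood(words, *models[c]))
```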
==== Addition to the dataset and the "cache set" ====
In order to qualify for incorporation into the dataset, however, an image must satisfy a stronger condition:

<math>\frac{P(I|c_f)}{P(I|c_b)} > \frac{\lambda_{Ac_b} - \lambda_{Rc_b}}{\lambda_{Rc_f} - \lambda_{Ac_f}} \cdot \frac{P(c_b)}{P(c_f)}</math>

where <math>c_f</math> and <math>c_b</math> are the foreground (object) and background categories, respectively, and the ratio of constants describes the risks of accepting false positives and false negatives. These constants are adjusted automatically at every iteration, with the cost of a false positive set higher than that of a false negative, which ensures that a cleaner dataset is collected. Once an image has been accepted by meeting the above criterion and incorporated into the dataset, however, it must meet another criterion before it is incorporated into the "cache set": the set of images to be used for training. This set is intended to be a diverse subset of the set of accepted images. If the model were trained on all accepted images, it might become more and more highly specialized, only accepting images very similar to previous ones.
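The acceptance test can be sketched in log space as follows (the `lam_*` parameter names are hypothetical stand-ins for the risk costs <math>\lambda_{Ac}</math> and <math>\lambda_{Rc}</math> of accepting or rejecting an image of category <math>c</math>):

```python
import math

def accept_image(loglik_fg, loglik_bg, prior_fg, prior_bg,
                 lam_accept_bg, lam_reject_bg, lam_accept_fg, lam_reject_fg):
    """Likelihood-ratio acceptance test, evaluated in log space.

    Accepts when log P(I|c_f) - log P(I|c_b) exceeds the log of the
    risk-weighted prior-odds threshold from the criterion above.
    """
    threshold = ((lam_accept_bg - lam_reject_bg) /
                 (lam_reject_fg - lam_accept_fg)) * (prior_bg / prior_fg)
    return loglik_fg - loglik_bg > math.log(threshold)
```

Raising the cost of accepting a background image (`lam_accept_bg`) raises the threshold, making the test stricter, which is how a higher false-positive cost keeps the dataset clean.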
=== Performance ===
The performance of the OPTIMOL method is characterized by three factors:
• Ability to collect images: OPTIMOL can automatically collect large numbers of good images from the web. The size of the OPTIMOL-retrieved image sets surpasses that of large human-labeled image sets for the same categories, such as those found in Caltech 101.
• Classification accuracy: Classification accuracy was compared to that of the classifier yielded by the pLSA methods discussed earlier. OPTIMOL achieved slightly higher accuracy, obtaining 74.8% on 7 object categories, compared to 72.0%.
• Comparison with batch learning: An important question is whether OPTIMOL's incremental learning gives it an advantage over traditional batch learning methods when everything else about the model is held constant. When the classifier learns incrementally, selecting the next images based on what it learned from the previous ones, three important results are observed:
  • Incremental learning allows OPTIMOL to collect a better dataset
  • Incremental learning allows OPTIMOL to learn faster (by discarding irrelevant images)
  • Incremental learning does not negatively affect the ROC curve of the classifier; in fact, it yields an improvement

== Object categorization in content-based image retrieval ==