{{update|date=September 2019}}
In [[computer vision]], the problem of '''object categorization from image search''' is the problem of training a [[Statistical classification|classifier]] to recognize categories of objects using only [[image search]] results, i.e., images retrieved automatically with an Internet [[search engine]]. Ideally, automatic image collection would allow classifiers to be trained with nothing but the category names as input. This problem is closely related to that of [[content-based image retrieval]] (CBIR), where the goal is to return better image search results rather than to train a classifier for image recognition.
 
Traditionally, classifiers are trained using sets of images that are labeled by hand. Collecting such a set of images is often a very time-consuming and laborious process. The use of Internet search engines to automate the process of acquiring large sets of labeled images has been described as a potential way of greatly facilitating computer vision research.<ref name = "fergus">
{{cite conference
| last = Fergus
| first = R. |author2=Fei-Fei, L. |author3=Perona, P. |author4=Zisserman, A.
| title = Learning Object Categories from Google's Image Search
| book-title = Proc. IEEE International Conference on Computer Vision
| url = http://vision.cs.princeton.edu/documents/FergusFei-FeiPeronaZisserman_ICCV05.pdf
| year = 2005}}
 
== Challenges ==
 
=== Unrelated images ===
One problem with using Internet image search results as a training set for a classifier is the high percentage of unrelated images within the results. It has been estimated that, when a search engine such as Google Images is queried with the name of an object category (such as ''airplane''), up to 85% of the returned images are unrelated to the category.<ref name = "fergus"/>
 
=== Intra-class variability ===
 
== pLSA approach ==
In a 2005 paper by Fergus et al.,<ref name = "fergus"/> [[pLSA]] (probabilistic latent semantic analysis) and extensions of this model were applied to the problem of object categorization from image search. pLSA was originally developed for [[document classification]], but has since been applied to [[computer vision]]. It makes the assumption that images are documents that fit the [[bag of words model]].
 
=== Model ===
Each image is treated as a document <math>\displaystyle d</math>, and the probability of observing a visual word <math>\displaystyle w</math> in it is modelled as a mixture over latent topics <math>\displaystyle z</math>:

<math>\displaystyle P(w|d) = \sum_{z=1}^Z P(w|z)P(z|d)</math>
 
An important assumption made in this model is that <math>\displaystyle w</math> and <math>\displaystyle d</math> are conditionally independent given <math>\displaystyle z</math>. Given a topic, the probability of a certain word appearing as part of that topic is independent of the rest of the image.<ref name = "hofmann">{{cite conference
| first = Thomas
| last = Hofmann
| title = Probabilistic Latent Semantic Analysis
| book-title = Uncertainty in Artificial Intelligence
| year = 1999
| url = https://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf
| url-status = dead
| archive-url = https://web.archive.org/web/20070710083034/http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf
| archive-date = 2007-07-10
}}</ref>
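This conditional independence is what produces the mixture decomposition above: marginalizing over the topic gives <math>\displaystyle P(w|d) = \sum_{z=1}^Z P(w|z,d)P(z|d)</math>, which reduces to the expression above once <math>\displaystyle P(w|z,d) = P(w|z)</math>.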
 
Training this model involves finding <math>\displaystyle P(w|z)</math> and <math>\displaystyle P(z|d)</math> that maximize the likelihood of the observed words in each document. To do this, the [[expectation maximization]] algorithm is used, with the following [[objective function]]:
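
<math>\displaystyle L = \prod_{d=1}^D \prod_{w=1}^W P(w|d)^{n(w,d)}</math>

where <math>\displaystyle n(w,d)</math> is the number of times word <math>\displaystyle w</math> occurs in document <math>\displaystyle d</math>.

The update equations are given in full in the cited papers; the following is only a minimal sketch of the standard EM procedure for pLSA in the sense of Hofmann's formulation (function and array names are illustrative assumptions, not taken from the sources):

<syntaxhighlight lang="python">
import numpy as np

def plsa_em(n_wd, num_topics, num_iters=100, seed=0):
    """Fit pLSA by EM on a (W, D) matrix of word counts n_wd.

    Returns P(w|z) as a (W, Z) matrix and P(z|d) as a (Z, D) matrix.
    """
    rng = np.random.default_rng(seed)
    W, D = n_wd.shape
    p_w_z = rng.random((W, num_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)  # normalise over words
    p_z_d = rng.random((num_topics, D))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)  # normalise over topics

    for _ in range(num_iters):
        # E-step: posterior P(z|w,d) is proportional to P(w|z) P(z|d)
        joint = p_w_z.T[:, :, None] * p_z_d[:, None, :]   # (Z, W, D)
        posterior = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)

        # M-step: re-estimate both factors from expected counts
        expected = n_wd[None, :, :] * posterior           # (Z, W, D)
        p_w_z = expected.sum(axis=2).T                    # (W, Z)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=1)                      # (Z, D)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12

    return p_w_z, p_z_d
</syntaxhighlight>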
 
=== Application ===
 
==== ABS-pLSA ====
Absolute position pLSA (ABS-pLSA) attaches ___location information to each visual word by localizing it to one of X "bins" in the image. Here, <math>\displaystyle x</math> represents which of the bins the visual word falls into. The new equation is:
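
<math>\displaystyle P(w,x|d) = \sum_{z=1}^Z P(w,x|z)P(z|d)</math>

This mirrors the pLSA decomposition above, with the joint density over word and ___location replacing the density over the word alone. For illustration, a bin index can be computed from a feature's normalized image coordinates; the uniform grid below is an assumption, since any fixed partition of the image into X bins works the same way:

<syntaxhighlight lang="python">
def location_bin(x_norm, y_norm, grid_cols=3, grid_rows=2):
    """Map normalized image coordinates in [0, 1] to one of
    grid_cols * grid_rows ___location bins (illustrative layout)."""
    col = min(int(x_norm * grid_cols), grid_cols - 1)
    row = min(int(y_norm * grid_rows), grid_rows - 1)
    return row * grid_cols + col
</syntaxhighlight>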
 
=== Implementation ===
 
==== Selecting words ====
Words in an image were selected using four different feature detectors:<ref name = "fergus"/>
* [[Kadir–Brady saliency detector]]
* [[Corner detection|Multi-scale Harris detector]]
* [[Difference of Gaussians]]
* Edge-based operator, described in the study
Using these four detectors, approximately 700 features were detected per image. These features were then encoded as [[SIFT|Scale-invariant feature transform]] descriptors and vector quantized to match one of 350 words contained in a codebook. The codebook was precomputed from features extracted from a large number of images spanning numerous object categories.
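
For illustration, the quantization step can be sketched as a nearest-neighbour assignment against the precomputed codebook (the function and array names are hypothetical):

<syntaxhighlight lang="python">
import numpy as np

def quantize_descriptors(descriptors, codebook):
    """Assign each SIFT descriptor to its nearest codebook word.

    descriptors: (N, 128) array of descriptors from one image.
    codebook:    (350, 128) array of precomputed cluster centres.
    Returns an array of N visual-word indices.
    """
    # squared Euclidean distance from every descriptor to every word
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
</syntaxhighlight>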
 
==== Possible object locations ====
== OPTIMOL approach ==
{{cite conference
| last = Li
| first = Li-Jia |author2=Wang, Gang |author3=Fei-Fei, Li
| title = OPTIMOL: automatic Online Picture collection via Incremental MOdel Learning
| book-title = Proc. IEEE Conference on Computer Vision and Pattern Recognition
| year = 2007
| url = http://vision.cs.princeton.edu/documents/LiWangFei-Fei_CVPR2007.pdf}}
{{cite journal
| last = Teh
| first = Yw |author2=Jordan, MI |author3=Beal, MJ |author4=Blei, David
| title = Hierarchical Dirichlet Processes
| journal = Journal of the American Statistical Association
| year = 2006
| doi = 10.1198/016214506000000302
| volume = 101
| issue = 476
| page = 1566
| citeseerx = 10.1.1.5.9094 | s2cid = 7934949
}}</ref>
 
=== Implementation ===
 
==== Initialization ====
The dataset must be initialized, or seeded, with an original batch of images which serve as good exemplars of the object category to be learned. These can be gathered automatically, using the first page or so of images returned by the search engine (which tend to be better than the subsequent images). Alternatively, the initial images can be gathered by hand.
 
* ''Ability to collect images'': OPTIMOL was found to collect large numbers of good images from the web automatically. The OPTIMOL-retrieved image sets surpass in size large human-labeled image sets for the same categories, such as those found in [[Caltech 101]].
 
* ''Classification accuracy'': Classification accuracy was compared with that of the classifier produced by the pLSA methods discussed earlier. OPTIMOL achieved slightly higher accuracy, obtaining 74.8% on 7 object categories as compared to 72.0%.
 
* ''Comparison with batch learning'': An important question to address is whether OPTIMOL's incremental learning gives it an advantage over traditional batch learning methods, when everything else about the model is held constant. When the classifier learns incrementally, by selecting the next images based on what it learned from the previous ones, three important results are observed:
** Incremental learning allows OPTIMOL to collect a better dataset
* Fergus et al., 2004 <ref>
{{cite conference
| last = Fergus
| first = R. |author2=Perona, P. |author3=Zisserman, A.
| title = A visual category filter for Google images
| book-title = Proc. 8th European Conf. on Computer Vision
| year = 2004
| url = http://www.robots.ox.ac.uk/~fergus/papers/Fergus_ECCV4.pdf
}}</ref>
* Berg and Forsyth, 2006 <ref>
{{cite conference
| last = Berg
| first = T.
|author2=Forsyth, D.
| title = Animals on the web
| book-title = Proc. Computer Vision and Pattern Recognition
| year = 2006
| doi = 10.1109/CVPR.2006.57
| url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1640929
}}</ref>
* Yanai and Barnard, 2006 <ref>
{{cite conference
| last = Yanai
| first = K
|author2=Barnard, K.
| title = Probabilistic web image gathering
| book-title = ACM SIGMM workshop on Multimedia information retrieval
| year = 2005
| url = http://portal.acm.org/citation.cfm?id=1101838
}}</ref>
 
== References ==
<references/>
 
== External links ==
{{Empty section|date=July 2010}}
== See also ==
* [[Probabilistic latent semantic analysis]]
 
[[Category:Object recognition and categorization]]
[[Category:Image search]]