{{update|date=September 2019}}
In [[computer vision]], '''object categorization from image search''' is the problem of training a classifier to recognize categories of objects using only images retrieved automatically from an Internet search engine.
Traditionally, classifiers are trained using sets of images that are labeled by hand. Collecting such a set of images is often a very time-consuming and laborious process. The use of Internet search engines to automate the process of acquiring large sets of labeled images has been described as a potential way of greatly facilitating computer vision research.<ref name = "fergus">
{{cite conference
| last = Fergus
| first = R. |author2=Fei-Fei, L. |author3=Perona, P. |author4=Zisserman, A.
| title = Learning Object Categories from Google's Image Search
| book-title = Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV)
| url = http://vision.cs.princeton.edu/documents/FergusFei-FeiPeronaZisserman_ICCV05.pdf
| year = 2005}}
== Challenges ==
=== Unrelated images ===
One problem with using Internet image search results as a training set for a classifier is the high percentage of unrelated images within the results. It has been estimated that, when a search engine such as Google Images is queried with the name of an object category (such as ''airplane''), up to 85% of the returned images are unrelated to the category.
=== Intra-class variability ===
Another challenge is the high intra-class variability of the retrieved images: pictures of the same object category can differ widely in viewpoint, scale, lighting, and background clutter, which makes learning a consistent appearance model difficult.
== pLSA approach ==
In a 2005 paper by Fergus et al.,<ref name = "fergus"/> pLSA ([[probabilistic latent semantic analysis]]) and extensions of this model were applied to the problem of object categorization from image search.
=== Model ===
<math>\displaystyle P(w|d) = \sum_{z=1}^Z P(w|z)P(z|d)</math>
An important assumption made in this model is that <math>\displaystyle w</math> and <math>\displaystyle d</math> are conditionally independent given <math>\displaystyle z</math>. Given a topic, the probability of a certain word appearing as part of that topic is independent of the rest of the image.<ref name = "hofmann">
{{cite conference
|last = Hofmann
|first = Thomas
|title = Probabilistic Latent Semantic Analysis
|book-title = Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI)
|year = 1999
|url = http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf
|url-status = dead
|archive-url = https://web.archive.org/web/20070710083034/http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf
|archive-date = 2007-07-10
}}</ref>
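The decomposition itself is just a matrix product of the two factors. The following sketch illustrates it in NumPy; the matrix shapes and variable names are illustrative assumptions, not taken from the paper:

<syntaxhighlight lang="python">
# Minimal sketch of the pLSA decomposition P(w|d) = sum_z P(w|z) P(z|d).
# Vocabulary size, topic count and document count are arbitrary choices.
import numpy as np

W, Z, D = 1000, 8, 500
rng = np.random.default_rng(0)

# P(w|z): each column is one topic's distribution over visual words.
p_w_given_z = rng.random((W, Z))
p_w_given_z /= p_w_given_z.sum(axis=0, keepdims=True)

# P(z|d): each column is one document's (image's) distribution over topics.
p_z_given_d = rng.random((Z, D))
p_z_given_d /= p_z_given_d.sum(axis=0, keepdims=True)

# P(w|d) = sum over topics z of P(w|z) P(z|d): a (W, D) matrix product.
p_w_given_d = p_w_given_z @ p_z_given_d
assert np.allclose(p_w_given_d.sum(axis=0), 1.0)  # each column sums to 1
</syntaxhighlight>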
Training this model involves finding <math>\displaystyle P(w|z)</math> and <math>\displaystyle P(z|d)</math> that maximize the likelihood of the observed words in each document. To do this, the [[expectation maximization]] algorithm is used, with the following [[objective function]]:
<math>\displaystyle L = \prod_{d=1}^D \prod_{w=1}^W P(w|d)^{n(w,d)}</math>

where <math>\displaystyle n(w,d)</math> is the number of occurrences of word <math>\displaystyle w</math> in document <math>\displaystyle d</math>.
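A compact version of the resulting EM updates is sketched below. The E-step computes the responsibilities <math>\displaystyle P(z|w,d)</math>, and the M-step re-estimates both factors from the expected counts. This is an illustrative sketch under assumed array shapes, not the authors' implementation:

<syntaxhighlight lang="python">
# EM for pLSA on a (W, D) word-count matrix n, where n[w, d] = n(w, d).
import numpy as np

def plsa_em(n, Z=8, iters=50, seed=0):
    W, D = n.shape
    rng = np.random.default_rng(seed)
    p_w_z = rng.random((W, Z)); p_w_z /= p_w_z.sum(0, keepdims=True)  # P(w|z)
    p_z_d = rng.random((Z, D)); p_z_d /= p_z_d.sum(0, keepdims=True)  # P(z|d)
    for _ in range(iters):
        # E-step: responsibilities P(z|w,d), normalized over z.
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]      # shape (W, Z, D)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: expected counts n(w,d) * P(z|w,d), summed out per factor.
        weighted = n[:, None, :] * joint                   # shape (W, Z, D)
        p_w_z = weighted.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
    return p_w_z, p_z_d
</syntaxhighlight>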
=== Application ===
==== ABS-pLSA ====
Absolute position pLSA (ABS-pLSA) attaches ___location information to each visual word by localizing it to one of ''X'' "bins" in the image. Here, <math>\displaystyle x</math> represents which of the bins the visual word falls into. The new equation is:
<math>\displaystyle P(w,x|d) = \sum_{z=1}^Z P(w,x|z)P(z|d)</math>
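The binning step can be sketched as follows, assuming a uniform grid layout for the bins (the grid itself is an assumption; the paper's bin layout may differ):

<syntaxhighlight lang="python">
# Map a visual word's pixel ___location to one of X spatial bins.
# A hypothetical uniform 3x3 grid is used here, giving X = 9 bins.
def location_bin(px, py, width, height, grid=(3, 3)):
    """Return a bin index in [0, grid_x * grid_y) for point (px, py)."""
    gx, gy = grid
    col = min(int(px / width * gx), gx - 1)
    row = min(int(py / height * gy), gy - 1)
    return row * gx + col

x = location_bin(120, 45, width=640, height=480)  # bin index for this word
</syntaxhighlight>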
=== Implementation ===
==== Selecting words ====
Words in an image were selected using four different feature detectors (an illustrative detection sketch follows the list):<ref name = "fergus"/>
* [[Kadir–Brady saliency detector]]
* [[Corner detection|Multi-scale Harris detector]]
* [[Difference of Gaussians]]
* Edge based operator
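The sketch below uses off-the-shelf OpenCV detectors; it is not the authors' implementation, and the Kadir–Brady saliency operator in particular has no stock OpenCV equivalent. SIFT keypoints are found on a difference-of-Gaussians scale space, and <code>cornerHarris</code> gives a single-scale Harris response:

<syntaxhighlight lang="python">
# Illustrative keypoint detection with OpenCV (approximating two of the
# four detectors above). The image filename is a hypothetical placeholder.
import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

# Difference-of-Gaussians interest points via the SIFT detector.
sift = cv2.SIFT_create()
dog_keypoints = sift.detect(img, None)

# Harris corner response map; thresholding it yields corner locations.
harris = cv2.cornerHarris(img.astype("float32"), blockSize=2, ksize=3, k=0.04)
corners = (harris > 0.01 * harris.max()).nonzero()
</syntaxhighlight>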
{{cite conference
| last = Li
| first = Li-Jia |author2=Wang, Gang |author3=Fei-Fei, Li
| title = OPTIMOL: automatic Online Picture collection via Incremental MOdel Learning
| book-title = Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
| year = 2007
| url = http://vision.cs.princeton.edu/documents/LiWangFei-Fei_CVPR2007.pdf}}
{{cite journal
| last = Teh
| first = Y. W. |author2=Jordan, M. I. |author3=Beal, M. J. |author4=Blei, D. M.
| title = Hierarchical Dirichlet Processes
| journal = Journal of the American Statistical Association
| issue = 476
| pages = 1566–1581
| citeseerx = 10.1.1.5.9094 | s2cid = 7934949 }}
</ref>
=== Implementation ===
==== Initialization ====
The dataset must be initialized, or seeded with an original batch of images which serve as good exemplars of the object category to be learned. These can be gathered automatically, using the first page or so of images returned by the search engine (which tend to be better than the subsequent images). Alternatively, the initial images can be gathered by hand.
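The overall seed-then-grow loop can be sketched as follows; the helper functions are hypothetical placeholders rather than the authors' code, and the acceptance threshold is an assumption, not a value from the paper:

<syntaxhighlight lang="python">
# OPTIMOL-style seeding and incremental collection (simplified sketch).
def fetch_search_results(query):
    # Placeholder: ranked image-search hits for the category name.
    return [f"{query}_{i}.jpg" for i in range(100)]

def train_model(dataset):
    # Placeholder: fit a category model on the accepted images.
    return {"size": len(dataset)}

def classify(model, image):
    # Placeholder: score how well an image fits the learned category.
    return 0.5

def collect_category(query, n_seed=15, threshold=0.9, rounds=10):
    candidates = fetch_search_results(query)
    dataset = list(candidates[:n_seed])    # seed: first page of results
    remaining = candidates[n_seed:]
    for _ in range(rounds):
        model = train_model(dataset)       # update the category model
        accepted = [im for im in remaining if classify(model, im) > threshold]
        if not accepted:                   # nothing confident left to add
            break
        dataset += accepted                # grow the dataset incrementally
        remaining = [im for im in remaining if im not in accepted]
    return dataset
</syntaxhighlight>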
* ''Ability to collect images'': OPTIMOL, it is found, can automatically collect large numbers of good images from the web. The sizes of the OPTIMOL-retrieved image sets surpass those of large human-labeled image sets for the same categories, such as those found in [[Caltech 101]].
* ''Classification accuracy'': Classification accuracy was compared to that of the classifier yielded by the pLSA methods discussed earlier. OPTIMOL achieved slightly higher accuracy, obtaining 74.8% accuracy on 7 object categories, compared to 72.0% for the pLSA-based classifier.
* ''Comparison with batch learning'': An important question to address is whether OPTIMOL's incremental learning gives it an advantage over traditional batch learning methods, when everything else about the model is held constant. When the classifier learns incrementally, by selecting the next images based on what it learned from the previous ones, three important results are observed:
** Incremental learning allows OPTIMOL to collect a better dataset
{{cite conference
| last = Fergus
| first = R. |author2=Perona, P. |author3=Zisserman, A.
| title = A visual category filter for Google images
| book-title = Proceedings of the 8th European Conference on Computer Vision (ECCV)
| year = 2004
| url = http://www.robots.ox.ac.uk/~fergus/papers/Fergus_ECCV4.pdf
| last = Berg
| first = T.
|author2=Forsyth, D.
| title = Animals on the web
| book-title = Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
| year = 2006
| doi = 10.1109/CVPR.2006.57
}}</ref>
* Yanai and Barnard, 2006 <ref>
| last = Yanai
| first = K.
|author2=Barnard, K.
| title = Probabilistic web image gathering
| book-title = Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR '05)
| year = 2005
| url = http://portal.acm.org/citation.cfm?id=1101838
}}</ref>
== References ==
<references/>
== See also ==
* [[Probabilistic latent semantic analysis]]
[[Category:Object recognition and categorization]]
[[Category:Image search]]