Object categorization from image search: Difference between revisions

Revision as of 14:04, 8 May 2016 by 108.30.32.22 (no edit summary)
Latest revision as of 09:28, 20 August 2025 by Bender the Bot (m →Model: HTTP to HTTPS for Brown University; tag: AWB)
(20 intermediate revisions by 14 users not shown)
Line 1:
{{update|date=September 2019}}
In [[computer vision]], ~~the problem of~~ '''object categorization from image search''' is the problem of training a [[Statistical classification|classifier]] to recognize categories of objects, using only ~~the~~ [[image search]], i.e., images retrieved automatically with an Internet [[search engine]]. Ideally, automatic image collection would allow classifiers to be trained with nothing but the category names as input. This problem is closely related to that of [[content-based image retrieval]] (CBIR), where the goal is to return better image search results rather than to train a classifier for image recognition. Traditionally, classifiers are trained using sets of images that are labeled by hand. Collecting such a set of images is often a very time-consuming and laborious process. The use of Internet search engines to automate the process of acquiring large sets of labeled images has been described as a potential way of greatly facilitating computer vision research.<ref name = "fergus"> {{cite conference | last = Fergus | first = R. |author2=Fei-Fei, L. |author3=Perona, P. |author4=Zisserman, A. | title = Learning Object Categories from Google's Image Search | ~~booktitle~~book-title = Proc. IEEE International Conference on Computer Vision | url = http://vision.cs.princeton.edu/documents/FergusFei-FeiPeronaZisserman_ICCV05.pdf | year = 2005}}

Line 14 ⟶ 15:
=== Unrelated images ===
One problem with using Internet image search results as a training set for a classifier is the high percentage of unrelated images within the results.
It has been estimated that, when a search engine such as Google Images is queried with the name of an object category (such as ''airplane''), up to 85% of the returned images are unrelated to the category.<ref name = "fergus"/>

=== Intra-class variability ===

Line 29 ⟶ 30:
<math>\displaystyle P(w|d) = \sum_{z=1}^Z P(w|z)P(z|d)</math>

An important assumption made in this model is that <math>\displaystyle w</math> and <math>\displaystyle d</math> are conditionally independent given <math>\displaystyle z</math>. Given a topic, the probability of a certain word appearing as part of that topic is independent of the rest of the image.<ref name = "hofmann">{{cite conference | first = Thomas | last = Hofmann | title = Probabilistic Latent Semantic Analysis | ~~booktitle~~book-title = Uncertainty in Artificial Intelligence | year = 1999 | url = ~~http~~https://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf~~}}</ref>~~ |url-status = dead |archive-url = https://web.archive.org/web/20070710083034/http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf |archive-date = 2007-07-10 }}</ref>

Training this model involves finding the <math>\displaystyle P(w|z)</math> and <math>\displaystyle P(z|d)</math> that maximize the likelihood of the observed words in each document. To do this, the [[expectation maximization]] algorithm is used, with the following [[objective function]]:

Line 63 ⟶ 68:
==== Selecting words ====
Words in an image were selected using 4 different feature detectors:<ref name = "fergus"/>
* [[~~Kadir-Brady~~Kadir–Brady saliency detector]]
* [[Corner detection|Multi-scale Harris detector]]
* [[Difference of Gaussians]]

Line 83 ⟶ 88:
| first = Li-Jia |author2=Wang, Gang |author3=Fei-Fei, Li | title = OPTIMOL: automatic Online Picture collection via Incremental MOdel Learning | ~~booktitle~~book-title = Proc.
IEEE Conference on Computer Vision and Pattern Recognition | year = 2007 | url = http://vision.cs.princeton.edu/documents/LiWangFei-Fei_CVPR2007.pdf}}

Line 104 ⟶ 109:
{{cite journal | last = Teh | first = Yw |author2=Jordan, MI |author3=Beal, MJ |author4=Blei, David | title = Hierarchical Dirichlet Processes | journal = Journal of the American Statistical Association

Line 113 ⟶ 118:
| issue = 476 | page = 1566 | citeseerx = 10.1.1.5.9094 | s2cid = 7934949 }} }} </ref>

Line 144 ⟶ 149:
* ''Ability to collect images'': OPTIMOL was found to collect large numbers of good images from the web automatically. The size of the OPTIMOL-retrieved image sets surpasses that of large human-labeled image sets for the same categories, such as those found in [[Caltech 101]].
* ''Classification accuracy'': Classification accuracy was compared to that of the classifier yielded by the pLSA methods discussed earlier. OPTIMOL achieved slightly higher accuracy, obtaining 74.8% on 7 object categories, as compared to 72.0%.
* ''Comparison with batch learning'': An important question is whether OPTIMOL's incremental learning gives it an advantage over traditional batch learning methods when everything else about the model is held constant. When the classifier learns incrementally, by selecting the next images based on what it learned from the previous ones, three important results are observed:
** Incremental learning allows OPTIMOL to collect a better dataset

Line 159 ⟶ 162:
{{cite conference | last = Fergus | first = R. |author2=Perona, P. |author3=Zisserman, A. | title = A visual category filter for Google images | ~~booktitle~~book-title = Proc. 8th European Conf. on Computer Vision | year = 2004 | url = http://www.robots.ox.ac.uk/~fergus/papers/Fergus_ECCV4.pdf

Line 171 ⟶ 174:
|author2=Forsyth, D. | title = Animals on the web | ~~booktitle~~book-title = Proc.
Computer Vision and Pattern Recognition | year = 2006 | doi = 10.1109/CVPR.2006.57 ~~| url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1640929~~ }}</ref>
* Yanai and Barnard, 2006 <ref>

Line 181 ⟶ 184:
|author2=Barnard, K. | title = Probabilistic web image gathering | ~~booktitle~~book-title = ACM SIGMM workshop on Multimedia information retrieval | year = 2005 | url = http://portal.acm.org/citation.cfm?id=1101838

Line 188 ⟶ 191:
== References ==
<references/>
~~== External links ==~~
~~{{Empty section|date=July 2010}}~~
== See also ==

Line 200:
[[Category:Object recognition and categorization]]
[[Category:Image search]]
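
Illustrative note on the "Line 29 ⟶ 30" hunk: the mixture P(w|d) = Σ_z P(w|z)P(z|d) and its training by expectation maximization can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the code used in the cited papers. The function names (`plsa`, `word_given_doc`, `log_likelihood`), the random initialisation, the fixed iteration count and the toy count matrix are all hypothetical. The objective function itself is cut off in the diff, so the sketch assumes the standard pLSA log-likelihood, Σ_d Σ_w n(w,d) log P(w|d), where n(w,d) is the count of visual word w in image d.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. counts[d, w] = n(w, d), the count of word w in document d."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random starting points, normalised so every row is a distribution.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z | w, d) is proportional to P(w | z) P(z | d); shape (D, Z, W).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both factors from the expected counts.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_w_z, p_z_d


def word_given_doc(p_w_z, p_z_d):
    """P(w | d) = sum_z P(w | z) P(z | d), returned as a (D, W) matrix."""
    return p_z_d @ p_w_z


def log_likelihood(counts, p_w_z, p_z_d):
    """Assumed objective: sum over d, w of n(w, d) * log P(w | d)."""
    return float(np.sum(counts * np.log(word_given_doc(p_w_z, p_z_d) + 1e-12)))


# Toy data (hypothetical): 4 "images" over a 4-word visual vocabulary.
toy_counts = np.array([[5, 3, 0, 0],
                       [4, 4, 0, 0],
                       [0, 0, 6, 2],
                       [0, 0, 3, 5]])
topics, mixing = plsa(toy_counts, n_topics=2)
print(word_given_doc(topics, mixing).round(3))
```

Each row of the printed matrix is one image's distribution over visual words. The dense (D, Z, W) posterior array keeps the sketch short; a real vocabulary and image set would need a sparse or batched E-step.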