Content-based image retrieval: Difference between revisions


Revision as of 00:03, 29 July 2021 by Citation bot (edit summary: Add: series. | Suggested by Headbomb | #UCB_toolbar)
Latest revision as of 14:51, 15 September 2024 by Neutronstar2 (edit summary: m →Texture)
(32 intermediate revisions by 21 users not shown)
Line 1:
{{Short description|Method of image retrieval}}
[[File:Principe cbir.png|thumb|General scheme of content-based image retrieval]]
'''Content-based image retrieval''', also known as '''query by image content''' ('''[[#QBIC|QBIC]]''') and '''content-based visual information retrieval''' ('''CBVIR'''), is the application of [[computer vision]] techniques to the [[image retrieval]] problem, that is, the problem of searching for [[digital image]]s in large [[database]]s (see this survey<ref name="Survey">''[http://www.ugmode.com/prior_art/lew2006cbm.pdf Content-based Multimedia Information Retrieval: State of the Art and Challenges]''

Line 7 ⟶ 8:
"Content-based" means that the search analyzes the contents of the image rather than the [[Metadata (computing)|metadata]] such as keywords, tags, or descriptions associated with the image. The term "content" in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself. CBIR is desirable because searches that rely purely on metadata are dependent on [[Automatic image annotation|annotation]] quality and completeness.

==Comparison with metadata searching==
An [[image meta search]] requires humans to have manually annotated images by entering keywords or metadata in a large database, which can be time-consuming and may not capture the keywords desired to describe the image. The evaluation of the effectiveness of keyword image search is subjective and has not been well-defined.
In the same regard, CBIR systems have similar challenges in defining success.<ref name="Eakins">{{cite web |url=http://www.jisc.ac.uk/uploaded_documents/jtap-039.doc |title=Content-based Image Retrieval |author=Eakins, John |author2=Graham, Margaret |publisher=University of Northumbria at Newcastle |access-date=2014-03-10 |url-status=dead |archive-url=https://web.archive.org/web/20120205153636/http://www.jisc.ac.uk/uploaded_documents/jtap-039.doc |archive-date=2012-02-05 }}</ref> "Keywords also limit the scope of queries to the set of predetermined criteria." and, "having been set up" are less reliable than using the content itself.<ref name=IW.1996/>

==History==
The term "content-based image retrieval" seems to have originated in 1992 when it was used by Japanese [[Electrotechnical Laboratory]] engineer Toshikazu Kato to describe experiments into automatic retrieval of images from a database, based on the colors and shapes present.<ref name="Eakins"/><ref>{{cite journal |last1=Kato |first1=Toshikazu |editor-first1=Albert A. |editor-first2=Carlton W. |editor-last1=Jamberdino |editor-last2=Niblack |title=Database architecture for content-based image retrieval |journal=Image Storage and Retrieval Systems |date=April 1992 |volume=1662 |pages=112–123 |doi=10.1117/12.58497 |bibcode=1992SPIE.1662..112K |publisher=International Society for Optics and Photonics|s2cid=14342247 }}</ref> Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features.
The techniques, tools, and algorithms that are used originate from fields such as statistics, pattern recognition, signal processing, and computer vision.<ref name="Survey" />

{{anchor|Content-based video browsing}}Content-based [[video browsing]] was introduced by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at [[Siemens]], and it was presented at the [[Association for Computing Machinery|ACM International Conference]] in August 1993.<ref>{{cite journal |last1=Arman |first1=Farshid |last2=Hsu |first2=Arding |last3=Chiu |first3=Ming-Yee |title=Image Processing on Compressed Data for Large Video Databases |journal=Proceedings of the First ACM International Conference on Multimedia |series=Multimedia '93 |date=August 1993 |pages=267–272 |doi=10.1145/166266.166297 |isbn=0897915968 |url=https://dl.acm.org/citation.cfm?id=166297 |publisher=[[Association for Computing Machinery]]|s2cid=10392157 }}</ref><ref name="Arman1994">{{cite conference |last1=Arman |first1=Farshid |last2=Depommier |first2=Remi |last3=Hsu |first3=Arding |last4=Chiu |first4=Ming-Yee |title=Content-based Browsing of Video Sequences |book-title=Proceedings of the Second ACM International Conference on Multimedia |date=October 1994 |pages=97–103 |doi=10.1145/192593.192630 |citeseerx=10.1.1.476.7139 |isbn=0897916867 |url=https://dl.acm.org/citation.cfm?id=192630 |publisher=[[Association for Computing Machinery]]|s2cid=1360834 }}</ref> They described a [[shot detection]] algorithm for [[compressed video]] that was originally encoded with [[discrete cosine transform]] (DCT) [[video coding standards]] such as [[JPEG]], [[MPEG]] and [[H.26x]]. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect the differences between video frames.
In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as [[motion vector]] representation for the frame. By operating on compressed DCT representations, the algorithm significantly reduces the computational requirements for decompression and enables effective video browsing.<ref>{{cite book |last1=Zhang |first1=HongJiang |chapter=Content-Based Video Browsing And Retrieval |editor-last1=Furht |editor-first1=Borko |title=Handbook of Internet and Multimedia Systems and Applications |date=1998 |publisher=[[CRC Press]] |isbn=9780849318580 |pages=[https://archive.org/details/handbookofintern0000unse_a3l0/page/83 83–108 (89)] |chapter-url=https://books.google.com/books?id=5zfC1wI0wzUC&pg=PA89 |url=https://archive.org/details/handbookofintern0000unse_a3l0/page/83 }}</ref> The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents.<ref>{{cite journal |last1=Steele |first1=Michael |last2=Hearst |first2=Marti A. |last3=Lawrence |first3=A. Rowe |s2cid=18212394 |title=The Video Workbench: a direct manipulation interface for digital media editing by amateur videographers |journal=[[Semantic Scholar]] |date=1998 |pages=1–19 (14) }}</ref>

==={{Visible anchor|QBIC}} - Query By Image Content===

Line 45:
|issue=9 |pages=23–32 }}</ref><ref name="Rui">{{cite journal|last1=Rui|first1=Yong|last2=Huang|first2=Thomas S.|last3=Chang|first3=Shih-Fu|title=Image Retrieval: Current Techniques, Promising Directions, and Open Issues|journal=Journal of Visual Communication and Image Representation|date=1999|volume=10|pages=39–62|doi=10.1006/jvci.1999.0413|citeseerx=10.1.1.32.7819|s2cid=2910032 }}{{dead link|date=September 2017 |bot=InternetArchiveBot |fix-attempted=yes }}</ref> Recent network- and graph-based approaches have presented a simple and attractive alternative to existing methods.<ref name="Banerjee">{{cite journal|last1=Banerjee, S. J.|display-authors=et al|title=Using complex networks towards information retrieval and diagnostics in multidimensional imaging|journal=Scientific Reports|date=2015|volume=5|pages=17271|doi=10.1038/srep17271|arxiv=1506.02602|pmid=26626047|pmc=4667282|bibcode=2015NatSR...517271B}}</ref>

While the storing of multiple images as part of a single entity preceded the term [[Object storage|BLOB]] ('''B'''inary '''L'''arge '''OB'''ject),<ref>{{cite web |url=http://www.cvalde.net/misc/blob_true_history.htm |archive-url=https://web.archive.org/web/20110723065224/http://www.cvalde.net/misc/blob_true_history.htm |url-status=dead |archive-date=2011-07-23 |title=The true story of BLOBs}}</ref> the ability to fully search by content, rather than by description, had to await IBM's QBIC.<ref name=IW.1996>{{cite magazine |magazine=[[InformationWeek|Information Week]] (OnLine-reprinted in Silicon Investor's Stock Discussion Forums (Aug. 6, 1996) |page=69 (IW) |author=Julie Anderson |date=April 29, 1996 |title=Search Images / Object Design Inc - Bargain of the year Stock Discussion Forums (Aug. 6, 1996) |url=https://www.siliconinvestor.com/readmsgs.aspx?subjectid=6903%26msgnum=17%26batchsize=10%26batchtype=Previous |quote=At DB Expo in San Francisco earlier this month ... }}{{Dead link|date=July 2019 |bot=InternetArchiveBot |fix-attempted=yes }}</ref>

=== VisualRank ===
{{excerpt|VisualRank}}

==Technical progress==

Line 60 ⟶ 62:
==Techniques==
Many CBIR systems have been developed, but {{as of|2006|lc=y}}, the problem of retrieving images on the basis of their pixel content remains largely unsolved.<ref name="Survey"/>{{update inline|date=January 2020}}

Different query techniques and implementations of CBIR make use of different types of user queries.

Line 77 ⟶ 79:
===Semantic retrieval===
''Semantic'' retrieval starts with a user making a request like "find pictures of Abraham Lincoln". This type of open-ended task is very difficult for computers to perform - Lincoln may not always be facing the camera or in the same [[pose (computer vision)|pose]]. Many CBIR systems therefore generally make use of lower-level features like texture, color, and shape. These features are either used in combination with interfaces that allow easier input of the criteria or with databases that have already been trained to match features (such as faces, fingerprints, or shape matching). However, in general, image retrieval requires human feedback in order to identify higher-level concepts.<ref name="Rui" />

===Relevance feedback (human interaction)===
Combining CBIR search techniques available with the wide range of potential users and their intent can be a difficult task.
An aspect of making CBIR successful relies entirely on the ability to understand the user intent.<ref name="Ddata">{{cite journal | last=Datta | first=Ritendra |author2=Dhiraj Joshi |author3=Jia Li|author3-link=Jia Li |author4=James Z. Wang | title=Image Retrieval: Ideas, Influences, and Trends of the New Age | journal=ACM Computing Surveys | url=http://infolab.stanford.edu/~wangz/project/imsearch/review/JOUR/ | year=2008 | doi=10.1145/1348246.1348248 | volume=40 | issue=2 | pages=1–60| s2cid=7060187 }}</ref> CBIR systems can make use of ''[[relevance feedback]]'', where the user progressively refines the search results by marking images in the results as "relevant", "not relevant", or "neutral" to the search query, then repeating the search with the new information. Examples of this type of interface have been developed.<ref name="Bird"/>

===Iterative/machine learning===

Line 86 ⟶ 88:
===Other query methods===
Other query methods include browsing for example images, navigating customized/hierarchical categories, querying by image region (rather than the entire image), querying by multiple example images, querying by visual sketch, querying by direct specification of image features, and [[multimodal interaction|multimodal]] queries (e.g. combining touch, voice, etc.)<ref name="Mayron">{{cite web|url=http://mayron.net/liam/pub/mayron_dissertation.pdf |title=Image Retrieval Using Visual Attention |author=Liam M. Mayron |publisher=Mayron.net |access-date=2012-10-18}}</ref>

==Content comparison using image distance measures==
The most common method for comparing two images in content-based image retrieval (typically an example image and an image from the database) is using an image distance measure. An image distance measure compares the [[similarity measure|similarity]] of two images in various dimensions such as color, texture, shape, and others.
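As a rough illustration of such a measure, the following Python sketch compares images by a coarse color histogram and ranks a small database by distance to a query image. It is only one possible distance among many; the function names, the representation of an image as a list of RGB tuples, and the choice of four quantization levels per channel are assumptions made for this example and do not come from any particular CBIR system.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized histogram of RGB pixels, quantized to `bins` levels per channel.

    `pixels` is assumed to be a non-empty list of (r, g, b) tuples in 0..255.
    """
    counts = Counter(
        (r * bins // 256, g * bins // 256, b * bins // 256) for r, g, b in pixels
    )
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms.

    0 means the histograms are identical; 2 means they share no bucket.
    """
    buckets = set(h1) | set(h2)
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in buckets)


def rank_by_distance(query_pixels, database):
    """Return (name, distance) pairs sorted from smallest to largest distance."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (name, histogram_distance(query_hist, color_histogram(pixels)))
        for name, pixels in database.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical toy data: three tiny "images" given as flat pixel lists.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
mixed = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

ranking = rank_by_distance(red, {"blue": blue, "mixed": mixed, "red_copy": red})
```

Because only color is compared, two images with the same colors in different arrangements would be at distance 0 here, which is why practical systems combine several feature dimensions.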
For example, a distance of 0 signifies an exact match with the query, with respect to the dimensions that were considered. A value greater than 0 indicates a lesser degree of similarity, with larger distances corresponding to less similar images. Search results then can be sorted based on their distance to the queried image.<ref name="Shapiro2001" /> Many measures of image distance (Similarity Models) have been developed.<ref>Eidenberger, Horst (2011). "Fundamental Media Understanding", atpress. {{ISBN|978-3-8423-7917-6}}.</ref>

===Color===

Line 97 ⟶ 99:
[[Image texture|Texture]] measures look for visual patterns in images and how they are spatially defined. Textures are represented by [[Texel (graphics)|texels]] which are then placed into a number of sets, depending on how many textures are detected in the image. These sets not only define the texture, but also where in the image the texture is located.<ref name="Shapiro2001"/>

Texture is a difficult concept to represent. The identification of specific textures in an image is achieved primarily by modeling texture as a two-dimensional gray level variation. The relative brightness of pairs of pixels is computed such that degree of contrast, regularity, coarseness and directionality may be estimated.<ref name="Rui"/><ref name="Tamura">{{cite journal | last=Tamura| first=Hideyuki |author2=Mori, Shunji |author3=Yamawaki, Takashi | title=Textural Features Corresponding to Visual Perception | journal=IEEE Transactions on Systems, Man, and Cybernetics| year=1978|volume=8|issue=6|pages=460, 473 | doi=10.1109/tsmc.1978.4309999| s2cid=32197839 }}</ref> The problem is in identifying patterns of co-pixel variation and associating them with particular classes of textures such as ''silky'', or ''rough''.
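The idea of estimating contrast from the relative brightness of pixel pairs can be sketched with a gray-level co-occurrence matrix. The Python example below is a minimal illustration rather than the method of any cited paper: the function names are hypothetical, the image is assumed to be a list of rows of integer gray levels in the range 0 to `levels - 1`, and contrast is taken as the co-occurrence-weighted squared difference between paired gray levels.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs separated by (dx, dy).

    Entry [i][j] is the fraction of pairs whose first pixel has gray level i
    and whose second pixel, offset by (dx, dy), has gray level j.
    """
    height, width = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                counts[image[y][x]][image[ny][nx]] += 1
                pairs += 1
    if pairs == 0:
        raise ValueError("image is too small for the requested offset")
    return [[c / pairs for c in row] for row in counts]


def contrast(glcm):
    """Sum of (i - j)^2 weighted by co-occurrence probability."""
    return sum(
        p * (i - j) ** 2 for i, row in enumerate(glcm) for j, p in enumerate(row)
    )


# Hypothetical toy images with two gray levels (0 and 1).
flat = [[1, 1, 1, 1], [1, 1, 1, 1]]
stripes = [[0, 1, 0, 1], [0, 1, 0, 1]]  # vertical stripes

flat_contrast = contrast(cooccurrence_matrix(flat, levels=2))
horizontal_contrast = contrast(cooccurrence_matrix(stripes, levels=2, dx=1, dy=0))
vertical_contrast = contrast(cooccurrence_matrix(stripes, levels=2, dx=0, dy=1))
```

Comparing the horizontal and vertical offsets on the striped image gives a crude sense of directionality: contrast is high across the stripes and zero along them.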
Other methods of classifying textures include:
* [[Image texture#Co-occurrence Matrices|Co-occurrence matrix]]
* [[Image texture#Laws Texture Energy Measures|Laws texture energy]]
* [[Wavelet transform]]
* [[Orthogonal transform]]s (discrete Chebyshev moments)

===Shape===
Shape does not refer to the shape of an image but to the shape of a particular region that is being sought out. Shapes will often be determined by first applying [[Segmentation (image processing)|segmentation]] or [[edge detection]] to an image. Other methods use shape filters to identify given shapes of an image.<ref>{{cite book | last=Tushabe | first=F. |author2=M.H.F. Wilkinson | title=Advances in Multilingual and Multimodal Information Retrieval | chapter=Content-Based Image Retrieval Using Combined 2D Attribute Pattern Spectra | volume=5152 | pages=554–561 | year=2008| doi=10.1007/978-3-540-85760-0_69 | series=Lecture Notes in Computer Science | isbn=978-3-540-85759-4 | s2cid=18566543 | url=https://pure.rug.nl/ws/files/2720522/2008LNCSTushabe.pdf }}</ref> Shape descriptors may also need to be invariant to translation, rotation, and scale.<ref name="Rui"/>

Some shape descriptors include:<ref name="Rui"/>

Line 114 ⟶ 116:
== Vulnerabilities, attacks and defenses ==
Like other tasks in [[computer vision]] such as recognition and detection, recent neural network based retrieval algorithms are susceptible to [[generative adversarial network|adversarial attacks]], both as candidate attacks and as query attacks.<ref name="Zhou Niu Wang Zhang 2020">{{cite arXiv | last1=Zhou | first1=Mo | last2=Niu | first2=Zhenxing | last3=Wang | first3=Le | last4=Zhang | first4=Qilin | last5=Hua | first5=Gang | title=Adversarial Ranking Attack and Defense | year=2020 | class=cs.CV | eprint=2002.11293v2 }}</ref> It has been shown that the retrieved ranking can be dramatically altered with only small perturbations imperceptible to human beings.
In addition, model-agnostic transferable adversarial examples are also possible, which enable black-box adversarial attacks on deep ranking systems without requiring access to their underlying implementations.<ref name="Zhou Niu Wang Zhang 2020"/><ref name="Li Ji Liu Hong pp. 4899–4908">{{cite arXiv | last1=Li | first1=Jie | last2=Ji | first2=Rongrong | last3=Liu | first3=Hong | last4=Hong | first4=Xiaopeng | last5=Gao | first5=Yue | last6=Tian | first6=Qi | title=Universal Perturbation Attack Against Image Retrieval <!-- | website=International Conference on Computer Vision (ICCV 2019) --> | year=2019 | pages=4899–4908| class=cs.CV | eprint=1812.00552 }}</ref>

Conversely, the resistance to such attacks can be improved via adversarial defenses such as the Madry defense.<ref name="Madry Makelov Schmidt Tsipras 2017">{{cite arXiv | last1=Madry | first1=Aleksander | last2=Makelov | first2=Aleksandar | last3=Schmidt | first3=Ludwig | last4=Tsipras | first4=Dimitris | last5=Vladu | first5=Adrian | title=Towards Deep Learning Models Resistant to Adversarial Attacks | date=2017-06-19 | class=stat.ML | eprint=1706.06083v4 }}</ref>

Line 135 ⟶ 137:
* Photograph archives
* Retail catalogs
* Nudity-detection filters<ref>{{cite journal | last=Wang |first = James Ze |author2=Jia Li |author2-link=Jia Li|author3=Gio Wiederhold |author4=Oscar Firschein|title=System for Screening Objectionable Images|journal=Computer Communications|year = 1998|volume=21|issue=15|pages=1355–1360|doi=10.1016/s0140-3664(98)00203-5|citeseerx = 10.1.1.78.7689 }}</ref>
* [[Facial recognition system|Face Finding]]
* Textiles Industry<ref name="Bird">{{cite conference | last=Bird | first=C.L. | author2=P.J. Elliott |author3=E. Griffiths | title=User interfaces for content-based image retrieval |book-title=IEE Colloquium on Intelligent Image Databases |publisher=IET |doi=10.1049/ic:19960746 |date=1996}}</ref>

Line 171 ⟶ 173:
==Further reading==
{{external links|date=November 2022}}
===Relevant research papers===
* ''[http://doi.ieeecomputersociety.org/10.1109/2.410146 Query by Image and Video Content: The QBIC System]'', (Flickner, 1995)

Line 187 ⟶ 191:
* ''[https://doi.org/10.1007%2F3-540-45479-9_17 FACERET: An Interactive Face Retrieval System Based on Self-Organizing Maps]'' (Ruiz-del-Solar et al., 2002)
* ''[http://www-db.stanford.edu/~wangz/project/imsearch/ALIP/PAMI03/ Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach]'' (Li and Wang, 2003)
* ''[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1238663 Video google: A text retrieval approach to object matching in videos]'' (Sivic & Zisserman, 2003)
* ''[http://www.svcl.ucsd.edu/publications/journal/2004/sp04/sp04.pdf Minimum Probability of Error Image Retrieval]'' (Vasconcelos, 2004)
* ''[http://www.svcl.ucsd.edu/publications/journal/2004/it04/it04.pdf On the Efficient Evaluation of Probabilistic Similarity Functions for Image Retrieval]'' (Vasconcelos, 2004)

Line 214 ⟶ 218:
* ''[http://www-db.deis.unibo.it/research/papers/SIGMAP11.pdf The Windsurf Library for the Efficient Retrieval of Multimedia Hierarchical Data]'' (Bartolini, Patella, and Stromei, 2011)
* "[https://web.archive.org/web/20141129085237/http://identify.plantnet-project.org/en/ Pl@ntNet: Interactive plant identification based on social image data]" (Joly, Alexis et al.)
* ''[https://link.springer.com/book/10.1007%2F978-981-10-6759-4 Content based Image Retrieval]'' (Tyagi, Vipin, 2017)
* ''[https://dx.doi.org/10.1145/2578726.2578741 Superimage: Packing Semantic-Relevant Images for Indexing and Retrieval]'' (Luo, Zhang, Huang, Gao, Tian, 2014)
* ''[https://dx.doi.org/10.1145/2461466.2461470 Indexing and searching 100M images with Map-Reduce]'' (Moise, Shestakov, Gudmundsson, and Amsaleg, 2013)

Line 220 ⟶ 224:
==External links==
* {{cite journal | last=Alkhazraj | first=Huthaefa | title=study for constant-based image relative :A Review | journal=IET Image Processing | volume=IEEE | issue=image processing | date=2017-08-09 | issn=1751-9659 | url=https://www.researchgate.net/publication/319007558 | access-date=2019-01-22}} - the original article
* [http://cbir.info/articles/ cbir.info] CBIR-related articles
* [https://www.springer.com/13735 IJMIR] many CBIR-related articles
* [http://www.sepham.com/ Search by Drawing]
* [https://web.archive.org/web/20120518124442/http://pixolution.does-it.net/fileadmin/template/visual_web_demo.html Demonstration of a visual search engine for images. (Search by example image or colors)]

{{DEFAULTSORT:Content-Based Image Retrieval}}
[[Category:Applications of computer vision]]
[[Category:Applications of artificial intelligence]]
[[Category:Image search]]