Content-based image retrieval: Difference between revisions

m clean up, replaced: | journal=Springer → | publisher=Springer
The term "content-based image retrieval" seems to have originated in 1992 when it was used by Japanese [[Electrotechnical Laboratory]] engineer Toshikazu Kato to describe experiments into automatic retrieval of images from a database, based on the colors and shapes present.<ref name="Eakins"/><ref>{{cite journal |last1=Kato |first1=Toshikazu |title=Database architecture for content-based image retrieval |journal=Image Storage and Retrieval Systems |date=April 1992 |volume=1662 |pages=112–123 |doi=10.1117/12.58497 |bibcode=1992SPIE.1662..112K |publisher=International Society for Optics and Photonics|s2cid=14342247 }}</ref> Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features. The techniques, tools, and algorithms that are used originate from fields such as statistics, pattern recognition, signal processing, and computer vision.<ref name="Survey" />
 
{{anchor|Content-based video browsing}}Content-based [[video browsing]] was introduced by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at [[Siemens]], and it was presented at the [[Association for Computing Machinery|ACM International Conference]] in August 1993.<ref>{{cite journal |last1=Arman |first1=Farshid |last2=Hsu |first2=Arding |last3=Chiu |first3=Ming-Yee |title=Image Processing on Compressed Data for Large Video Databases |journal=Proceedings of the First ACM International Conference on Multimedia |date=August 1993 |pages=267–272 |doi=10.1145/166266.166297 |isbn=0897915968 |url=https://dl.acm.org/citation.cfm?id=166297 |publisher=[[Association for Computing Machinery]]|s2cid=10392157 }}</ref><ref name="Arman1994">{{cite journal |last1=Arman |first1=Farshid |last2=Depommier |first2=Remi |last3=Hsu |first3=Arding |last4=Chiu |first4=Ming-Yee |title=Content-based Browsing of Video Sequences |journal=Proceedings of the Second ACM International Conference on Multimedia |date=October 1994 |pages=97–103 |doi=10.1145/192593.192630 |citeseerx=10.1.1.476.7139 |isbn=0897916867 |url=https://dl.acm.org/citation.cfm?id=192630 |publisher=[[Association for Computing Machinery]]|s2cid=1360834 }}</ref> They described a [[shot detection]] algorithm for [[compressed video]] that was originally encoded with [[discrete cosine transform]] (DCT) [[video coding standards]] such as [[JPEG]], [[MPEG]] and [[H.26x]]. The basic idea was that, since the DCT coefficients are mathematically related to the spatial ___domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as [[motion vector]] representation for the frame. 
By operating directly on the compressed DCT representation, the algorithm avoids the computational cost of full decompression and enables effective video browsing.<ref>{{cite book |last1=Zhang |first1=HongJiang |chapter=Content-Based Video Browsing And Retrieval |editor-last1=Furht |editor-first1=Borko |title=Handbook of Internet and Multimedia Systems and Applications |date=1998 |publisher=[[CRC Press]] |isbn=9780849318580 |pages=[https://archive.org/details/handbookofintern0000unse_a3l0/page/83 83–108 (89)] |chapter-url=https://books.google.com/books?id=5zfC1wI0wzUC&pg=PA89 |url=https://archive.org/details/handbookofintern0000unse_a3l0/page/83 }}</ref> The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents.<ref>{{cite journal |last1=Steele |first1=Michael |last2=Hearst |first2=Marti A. |last3=Lawrence |first3=A. Rowe |s2cid=18212394 |title=The Video Workbench: a direct manipulation interface for digital media editing by amateur videographers |journal=[[Semantic Scholar]] |date=1998 |pages=1–19 (14) }}</ref>
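The frame-differencing idea described above can be sketched as follows. This is an illustrative toy, not the published implementation: the block size, the choice of low-frequency coefficient subset, and the cut threshold are assumptions, and a real compressed-___domain system would read the DCT coefficients directly from the JPEG/MPEG bitstream rather than recomputing them.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix, as used for the 8x8 blocks of JPEG/MPEG.
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_dct_features(frame, n_coeffs=4, block=8):
    # Keep only the top-left (low-frequency) coefficients of each block,
    # i.e. a small subset of the data already present in the compressed stream.
    d = dct_matrix(block)
    h, w = frame.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coeffs = d @ frame[i:i + block, j:j + block] @ d.T
            feats.append(coeffs[:n_coeffs, :n_coeffs].ravel())
    return np.concatenate(feats)

def shot_boundaries(frames, threshold=0.5):
    # Flag a shot cut wherever the normalized distance between the DCT
    # features of consecutive frames exceeds the (assumed) threshold.
    cuts = []
    prev = block_dct_features(frames[0])
    for t in range(1, len(frames)):
        cur = block_dct_features(frames[t])
        dist = np.linalg.norm(cur - prev) / (np.linalg.norm(prev) + 1e-9)
        if dist > threshold:
            cuts.append(t)
        prev = cur
    return cuts
```

On a synthetic sequence whose content changes abruptly at one frame, the function reports a single cut at that frame; no pixel-___domain decompression beyond the DCT itself is needed.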
 
==={{Visible anchor|QBIC}} - Query By Image Content===
 
===Shape===
Shape does not refer to the shape of an image but to the shape of a particular region that is being sought out. Shapes will often be determined by first applying [[Segmentation (image processing)|segmentation]] or [[edge detection]] to an image. Other methods use shape filters to identify given shapes of an image.<ref>{{cite book | last=Tushabe | first=F. |author2=M.H.F. Wilkinson | title=Content-based Image Retrieval Using Combined 2D Attribute Pattern Spectra | publisher=Springer | volume=5152 | pages=554–561 | year=2008| doi=10.1007/978-3-540-85760-0_69 | series=Lecture Notes in Computer Science | isbn=978-3-540-85759-4 | url=https://pure.rug.nl/ws/files/2720522/2008LNCSTushabe.pdf }}</ref> Shape descriptors may also need to be invariant to translation, rotation, and scale.<ref name="Rui"/>
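The invariance requirement can be made concrete with a small sketch, which is not taken from the cited works: after a naive global-threshold segmentation (an assumption; real systems use far stronger segmentation), normalized central moments give a descriptor that is invariant to translation and, up to discretization, to scale.

```python
import numpy as np

def segment(img, thresh=0.5):
    # Naive segmentation: foreground = pixels above a global threshold.
    return img > thresh

def normalized_central_moments(mask, max_order=3):
    # Central moments are translation-invariant; dividing by mu00**gamma
    # adds scale invariance (the normalization used for Hu's moments).
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return {}
    cx, cy = xs.mean(), ys.mean()
    mu00 = float(len(xs))
    eta = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            if p + q < 2:
                continue  # orders 0 and 1 carry only position/size
            mu = ((xs - cx) ** p * (ys - cy) ** q).sum()
            gamma = (p + q) / 2.0 + 1.0
            eta[(p, q)] = mu / mu00 ** gamma
    return eta
```

Shifting the same region elsewhere in the image leaves the descriptor unchanged, so two images containing the same shape at different positions compare as equal.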
 
Some shape descriptors include:<ref name="Rui"/>
== Vulnerabilities, attacks and defenses ==
 
Like other tasks in [[computer vision]] such as recognition and detection, recent neural network-based retrieval algorithms are susceptible to [[adversarial machine learning|adversarial attacks]], both as candidate attacks and as query attacks.<ref name="Zhou Niu Wang Zhang 2020">{{cite arXiv | last1=Zhou | first1=Mo | last2=Niu | first2=Zhenxing | last3=Wang | first3=Le | last4=Zhang | first4=Qilin | last5=Hua | first5=Gang | title=Adversarial Ranking Attack and Defense | year=2020 | class=cs.CV | eprint=2002.11293v2 }}</ref> It has been shown that the retrieved ranking can be dramatically altered by small perturbations imperceptible to human beings. In addition, model-agnostic, transferable adversarial examples are possible, enabling black-box adversarial attacks on deep ranking systems without access to their underlying implementations.<ref name="Zhou Niu Wang Zhang 2020"/><ref name="Li Ji Liu Hong pp. 4899–4908">{{cite web | last1=Li | first1=Jie | last2=Ji | first2=Rongrong | last3=Liu | first3=Hong | last4=Hong | first4=Xiaopeng | last5=Gao | first5=Yue | last6=Tian | first6=Qi | title=Universal Perturbation Attack Against Image Retrieval | website=International Conference on Computer Vision (ICCV 2019) | url=https://openaccess.thecvf.com/content_ICCV_2019/html/Li_Universal_Perturbation_Attack_Against_Image_Retrieval_ICCV_2019_paper.html | pages=4899–4908}}</ref>
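The fragility of a ranking to tiny query perturbations can be illustrated with a deliberately contrived toy example; this is not the gradient-based attack of the cited papers, and the two-dimensional "embeddings" below are pure assumptions. When two gallery items are nearly equidistant from the query, a minuscule change to the query flips their order.

```python
import numpy as np

def rank(query, gallery):
    # Rank gallery items by cosine similarity to the query, best first.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

# Two candidates almost symmetric about the query direction (toy embeddings).
gallery = np.array([[1.0, 0.02],
                    [1.0, -0.02]])
query = np.array([1.0, 0.001])
perturbed = query + np.array([0.0, -0.002])  # tiny, "imperceptible" change

print(rank(query, gallery), rank(perturbed, gallery))  # ranking flips
```

A real attack optimizes the perturbation against the network's embedding, but the failure mode is the same: the ranking, not just a single label, is the attack surface.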
 
Conversely, the resistance to such attacks can be improved via adversarial defenses such as the Madry defense.<ref name="Madry Makelov Schmidt Tsipras 2017">{{cite arXiv | last1=Madry | first1=Aleksander | last2=Makelov | first2=Aleksandar | last3=Schmidt | first3=Ludwig | last4=Tsipras | first4=Dimitris | last5=Vladu | first5=Adrian | title=Towards Deep Learning Models Resistant to Adversarial Attacks | date=2017-06-19 | class=stat.ML | eprint=1706.06083v4 }}</ref>
* Retail catalogs
* Nudity-detection filters<ref>{{cite journal | last=Wang |first = James Ze |author2=Jia Li |author3=Gio Wiederhold |author4=Oscar Firschein|title=System for Screening Objectionable Images|journal=Computer Communications|year = 1998|volume=21|issue=15|pages=1355–1360|doi=10.1016/s0140-3664(98)00203-5|citeseerx = 10.1.1.78.7689 }}</ref>
* [[Facial recognition system|Face Finding]]
* Textiles Industry<ref name="Bird">{{cite journal | last=Bird | first=C.L. | author2=P.J. Elliott, Griffiths | title=User interfaces for content-based image retrieval | year=1996}}</ref>
 
* "[https://web.archive.org/web/20141129085237/http://identify.plantnet-project.org/en/ Pl@ntNet: Interactive plant identification based on social image data]" (Joly, Alexis et al.)
* ''[https://link.springer.com/book/10.1007%2F978-981-10-6759-4 Content based Image Retrieval]'' (Tyagi, V, 2017)
 
* ''[https://dx.doi.org/10.1145/2578726.2578741 Superimage: Packing Semantic-Relevant Images for Indexing and Retrieval]'' (Luo, Zhang, Huang, Gao, Tian, 2014)
* ''[https://dx.doi.org/10.1145/2461466.2461470 Indexing and searching 100M images with Map-Reduce]'' (Moise, Shestakov, Gudmundsson, and Amsaleg, 2013)