Content-based image retrieval: Difference between revisions

m clean up, replaced: | journal=Springer → | publisher=Springer
The term "content-based image retrieval" seems to have originated in 1992 when it was used by Japanese [[Electrotechnical Laboratory]] engineer Toshikazu Kato to describe experiments into automatic retrieval of images from a database, based on the colors and shapes present.<ref name="Eakins"/><ref>{{cite journal |last1=Kato |first1=Toshikazu |title=Database architecture for content-based image retrieval |journal=Image Storage and Retrieval Systems |date=April 1992 |volume=1662 |pages=112–123 |doi=10.1117/12.58497 |bibcode=1992SPIE.1662..112K |publisher=International Society for Optics and Photonics|s2cid=14342247 }}</ref> Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features. The techniques, tools, and algorithms that are used originate from fields such as statistics, pattern recognition, signal processing, and computer vision.<ref name="Survey" />
 
{{anchor|Content-based video browsing}}Content-based [[video browsing]] was introduced by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at [[Siemens]], and it was presented at the [[Association for Computing Machinery|ACM International Conference]] in August 1993.<ref>{{cite journal |last1=Arman |first1=Farshid |last2=Hsu |first2=Arding |last3=Chiu |first3=Ming-Yee |title=Image Processing on Compressed Data for Large Video Databases |journal=Proceedings of the First ACM International Conference on Multimedia |date=August 1993 |pages=267–272 |doi=10.1145/166266.166297 |isbn=0897915968 |url=https://dl.acm.org/citation.cfm?id=166297 |publisher=[[Association for Computing Machinery]]|s2cid=10392157 }}</ref><ref name="Arman1994">{{cite journal |last1=Arman |first1=Farshid |last2=Depommier |first2=Remi |last3=Hsu |first3=Arding |last4=Chiu |first4=Ming-Yee |title=Content-based Browsing of Video Sequences |journal=Proceedings of the Second ACM International Conference on Multimedia |date=October 1994 |pages=97–103 |doi=10.1145/192593.192630 |citeseerx=10.1.1.476.7139 |isbn=0897916867 |url=https://dl.acm.org/citation.cfm?id=192630 |publisher=[[Association for Computing Machinery]]|s2cid=1360834 }}</ref> They described a [[shot detection]] algorithm for [[compressed video]] that was originally encoded with [[discrete cosine transform]] (DCT) [[video coding standards]] such as [[JPEG]], [[MPEG]] and [[H.26x]]. The basic idea was that, since the DCT coefficients are mathematically related to the spatial ___domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as [[motion vector]] representation for the frame. 
By operating directly on the compressed DCT representation, the algorithm avoids the computational cost of full decompression and enables effective video browsing.<ref>{{cite book |last1=Zhang |first1=HongJiang |chapter=Content-Based Video Browsing And Retrieval |editor-last1=Furht |editor-first1=Borko |title=Handbook of Internet and Multimedia Systems and Applications |date=1998 |publisher=[[CRC Press]] |isbn=9780849318580 |pages=[https://archive.org/details/handbookofintern0000unse_a3l0/page/83 83–108 (89)] |chapter-url=https://books.google.com/books?id=5zfC1wI0wzUC&pg=PA89 |url=https://archive.org/details/handbookofintern0000unse_a3l0/page/83 }}</ref> The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents.<ref>{{cite journal |last1=Steele |first1=Michael |last2=Hearst |first2=Marti A. |last3=Lawrence |first3=A. Rowe |s2cid=18212394 |title=The Video Workbench: a direct manipulation interface for digital media editing by amateur videographers |journal=[[Semantic Scholar]] |date=1998 |pages=1–19 (14) }}</ref>
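The frame-differencing idea described above can be sketched as follows. This is an illustrative toy, not the published implementation: the block size, the choice of low-frequency coefficient subset, and the cut threshold are assumptions, and a real compressed-___domain system would read the DCT coefficients directly from the JPEG/MPEG bitstream rather than recomputing them.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix, as used for the 8x8 blocks of JPEG/MPEG.
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_dct_features(frame, n_coeffs=4, block=8):
    # Keep only the top-left (low-frequency) coefficients of each block,
    # i.e. a small subset of the data already present in the compressed stream.
    d = dct_matrix(block)
    h, w = frame.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coeffs = d @ frame[i:i + block, j:j + block] @ d.T
            feats.append(coeffs[:n_coeffs, :n_coeffs].ravel())
    return np.concatenate(feats)

def shot_boundaries(frames, threshold=0.5):
    # Flag a shot cut wherever the normalized distance between the DCT
    # features of consecutive frames exceeds the (assumed) threshold.
    cuts = []
    prev = block_dct_features(frames[0])
    for t in range(1, len(frames)):
        cur = block_dct_features(frames[t])
        dist = np.linalg.norm(cur - prev) / (np.linalg.norm(prev) + 1e-9)
        if dist > threshold:
            cuts.append(t)
        prev = cur
    return cuts
```

On a synthetic sequence whose content changes abruptly at one frame, the function reports a single cut at that frame; no pixel-___domain decompression beyond the DCT itself is needed.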
 
==={{Visible anchor|QBIC}} - Query By Image Content===
 
===Shape===
Shape does not refer to the shape of an image but to the shape of a particular region that is being sought out. Shapes will often be determined by first applying [[Segmentation (image processing)|segmentation]] or [[edge detection]] to an image. Other methods use shape filters to identify given shapes of an image.<ref>{{cite book | last=Tushabe | first=F. |author2=M.H.F. Wilkinson | title=Content-based Image Retrieval Using Combined 2D Attribute Pattern Spectra | publisher=Springer | volume=5152 | pages=554–561 | year=2008| doi=10.1007/978-3-540-85760-0_69 | series=Lecture Notes in Computer Science | isbn=978-3-540-85759-4 | url=https://pure.rug.nl/ws/files/2720522/2008LNCSTushabe.pdf }}</ref> Shape descriptors may also need to be invariant to translation, rotation, and scale.<ref name="Rui"/>
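The invariance requirement can be made concrete with a small sketch, which is not taken from the cited works: after a naive global-threshold segmentation (an assumption; real systems use far stronger segmentation), normalized central moments give a descriptor that is invariant to translation and, up to discretization, to scale.

```python
import numpy as np

def segment(img, thresh=0.5):
    # Naive segmentation: foreground = pixels above a global threshold.
    return img > thresh

def normalized_central_moments(mask, max_order=3):
    # Central moments are translation-invariant; dividing by mu00**gamma
    # adds scale invariance (the normalization used for Hu's moments).
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return {}
    cx, cy = xs.mean(), ys.mean()
    mu00 = float(len(xs))
    eta = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            if p + q < 2:
                continue  # orders 0 and 1 carry only position/size
            mu = ((xs - cx) ** p * (ys - cy) ** q).sum()
            gamma = (p + q) / 2.0 + 1.0
            eta[(p, q)] = mu / mu00 ** gamma
    return eta
```

Shifting the same region elsewhere in the image leaves the descriptor unchanged, so two images containing the same shape at different positions compare as equal.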
 
Some shape descriptors include:<ref name="Rui"/>
== Vulnerabilities, attacks and defenses ==
 
Like other tasks in [[computer vision]] such as recognition and detection, recent neural network-based retrieval algorithms are susceptible to [[adversarial machine learning|adversarial attacks]], both as candidate attacks and as query attacks.<ref name="Zhou Niu Wang Zhang 2020">{{cite arXiv | last1=Zhou | first1=Mo | last2=Niu | first2=Zhenxing | last3=Wang | first3=Le | last4=Zhang | first4=Qilin | last5=Hua | first5=Gang | title=Adversarial Ranking Attack and Defense | year=2020 | class=cs.CV | eprint=2002.11293v2 }}</ref> It has been shown that the retrieved ranking can be dramatically altered by small perturbations imperceptible to human beings. In addition, model-agnostic, transferable adversarial examples are possible, enabling black-box adversarial attacks on deep ranking systems without access to their underlying implementations.<ref name="Zhou Niu Wang Zhang 2020"/><ref name="Li Ji Liu Hong pp. 4899–4908">{{cite web | last1=Li | first1=Jie | last2=Ji | first2=Rongrong | last3=Liu | first3=Hong | last4=Hong | first4=Xiaopeng | last5=Gao | first5=Yue | last6=Tian | first6=Qi | title=Universal Perturbation Attack Against Image Retrieval | website=International Conference on Computer Vision (ICCV 2019) | url=https://openaccess.thecvf.com/content_ICCV_2019/html/Li_Universal_Perturbation_Attack_Against_Image_Retrieval_ICCV_2019_paper.html | pages=4899–4908}}</ref>
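The fragility of a ranking to tiny query perturbations can be illustrated with a deliberately contrived toy example; this is not the gradient-based attack of the cited papers, and the two-dimensional "embeddings" below are pure assumptions. When two gallery items are nearly equidistant from the query, a minuscule change to the query flips their order.

```python
import numpy as np

def rank(query, gallery):
    # Rank gallery items by cosine similarity to the query, best first.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

# Two candidates almost symmetric about the query direction (toy embeddings).
gallery = np.array([[1.0, 0.02],
                    [1.0, -0.02]])
query = np.array([1.0, 0.001])
perturbed = query + np.array([0.0, -0.002])  # tiny, "imperceptible" change

print(rank(query, gallery), rank(perturbed, gallery))  # ranking flips
```

A real attack optimizes the perturbation against the network's embedding, but the failure mode is the same: the ranking, not just a single label, is the attack surface.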
 
Conversely, the resistance to such attacks can be improved via adversarial defenses such as the Madry defense.<ref name="Madry Makelov Schmidt Tsipras 2017">{{cite arXiv | last1=Madry | first1=Aleksander | last2=Makelov | first2=Aleksandar | last3=Schmidt | first3=Ludwig | last4=Tsipras | first4=Dimitris | last5=Vladu | first5=Adrian | title=Towards Deep Learning Models Resistant to Adversarial Attacks | date=2017-06-19 | class=stat.ML | eprint=1706.06083v4 }}</ref>
* Retail catalogs
* Nudity-detection filters<ref>{{cite journal | last=Wang |first = James Ze |author2=Jia Li |author3=Gio Wiederhold |author4=Oscar Firschein|title=System for Screening Objectionable Images|journal=Computer Communications|year = 1998|volume=21|issue=15|pages=1355–1360|doi=10.1016/s0140-3664(98)00203-5|citeseerx = 10.1.1.78.7689 }}</ref>
* [[Facial recognition system|Face Finding]]
* Textiles Industry<ref name="Bird">{{cite journal | last=Bird | first=C.L. | author2=P.J. Elliott, Griffiths | title=User interfaces for content-based image retrieval | year=1996}}</ref>
 
* "[https://web.archive.org/web/20141129085237/http://identify.plantnet-project.org/en/ Pl@ntNet: Interactive plant identification based on social image data]" (Joly, Alexis et al.)
* ''[https://link.springer.com/book/10.1007%2F978-981-10-6759-4 Content based Image Retrieval]'' (Tyagi, V, 2017)
 
* ''[https://dx.doi.org/10.1145/2578726.2578741 Superimage: Packing Semantic-Relevant Images for Indexing and Retrieval]'' (Luo, Zhang, Huang, Gao, Tian, 2014)
* ''[https://dx.doi.org/10.1145/2461466.2461470 Indexing and searching 100M images with Map-Reduce]'' (Moise, Shestakov, Gudmundsson, and Amsaleg, 2013)