Content-based image retrieval: Difference between revisions

The term "content-based image retrieval" seems to have originated in 1992 when it was used by Japanese [[Electrotechnical Laboratory]] engineer Toshikazu Kato to describe experiments into automatic retrieval of images from a database, based on the colors and shapes present.<ref name="Eakins"/><ref>{{cite journal |last1=Kato |first1=Toshikazu |title=Database architecture for content-based image retrieval |journal=Image Storage and Retrieval Systems |date=April 1992 |volume=1662 |pages=112–123 |doi=10.1117/12.58497 |bibcode=1992SPIE.1662..112K |publisher=International Society for Optics and Photonics|s2cid=14342247 }}</ref> Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features. The techniques, tools, and algorithms that are used originate from fields such as statistics, pattern recognition, signal processing, and computer vision.<ref name="Survey" />
 
{{anchor|Content-based video browsing}}Content-based [[video browsing]] was introduced by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu while they were working at [[Siemens]]; it was presented at the [[Association for Computing Machinery|ACM International Conference]] in August 1993.<ref>{{cite journal |last1=Arman |first1=Farshid |last2=Hsu |first2=Arding |last3=Chiu |first3=Ming-Yee |title=Image Processing on Compressed Data for Large Video Databases |journal=Proceedings of the First ACM International Conference on Multimedia |series=Multimedia '93 |date=August 1993 |pages=267–272 |doi=10.1145/166266.166297 |isbn=0897915968 |url=https://dl.acm.org/citation.cfm?id=166297 |publisher=[[Association for Computing Machinery]]|s2cid=10392157 }}</ref><ref name="Arman1994">{{cite conference |last1=Arman |first1=Farshid |last2=Depommier |first2=Remi |last3=Hsu |first3=Arding |last4=Chiu |first4=Ming-Yee |title=Content-based Browsing of Video Sequences |book-title=Proceedings of the Second ACM International Conference on Multimedia |date=October 1994 |pages=97–103 |doi=10.1145/192593.192630 |citeseerx=10.1.1.476.7139 |isbn=0897916867 |url=https://dl.acm.org/citation.cfm?id=192630 |publisher=[[Association for Computing Machinery]]|s2cid=1360834 }}</ref> They described a [[shot detection]] algorithm for [[compressed video]] that had originally been encoded with [[discrete cosine transform]] (DCT) [[video coding standards]] such as [[JPEG]], [[MPEG]] and [[H.26x]]. The basic idea was that, since the DCT coefficients are mathematically related to the spatial ___domain and represent the content of each frame, they can be used to detect differences between video frames. In the algorithm, a subset of the blocks in a frame and a subset of the DCT coefficients of each block serve as the [[motion vector]] representation of the frame. By operating directly on the compressed DCT representation, the algorithm largely avoids the cost of decompression and enables efficient video browsing.<ref>{{cite book |last1=Zhang |first1=HongJiang |chapter=Content-Based Video Browsing And Retrieval |editor-last1=Furht |editor-first1=Borko |title=Handbook of Internet and Multimedia Systems and Applications |date=1998 |publisher=[[CRC Press]] |isbn=9780849318580 |pages=[https://archive.org/details/handbookofintern0000unse_a3l0/page/83 83–108 (89)] |chapter-url=https://books.google.com/books?id=5zfC1wI0wzUC&pg=PA89 |url=https://archive.org/details/handbookofintern0000unse_a3l0/page/83 }}</ref> The algorithm represents each shot of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents.<ref>{{cite journal |last1=Steele |first1=Michael |last2=Hearst |first2=Marti A. |last3=Lawrence |first3=A. Rowe |s2cid=18212394 |title=The Video Workbench: a direct manipulation interface for digital media editing by amateur videographers |journal=[[Semantic Scholar]] |date=1998 |pages=1–19 (14) }}</ref>
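The following sketch illustrates the general idea in Python: a signature built from low-frequency DCT coefficients of a sparse subset of blocks is computed for each frame, and a large change between consecutive signatures is flagged as a candidate shot boundary. The block subset, coefficient subset, and threshold used here are illustrative assumptions rather than the parameters of the published algorithm.

<syntaxhighlight lang="python">
import numpy as np
from scipy.fftpack import dct


def block_dct_signature(frame, block_size=8, n_coeffs=3, block_stride=4):
    """Low-frequency DCT coefficients from a sparse subset of blocks.

    Sampling every ``block_stride``-th block and keeping the top-left
    ``n_coeffs`` x ``n_coeffs`` coefficients is an illustrative choice,
    not the selection published by Arman et al.
    """
    h, w = frame.shape
    parts = []
    step = block_size * block_stride
    for by in range(0, h - block_size + 1, step):
        for bx in range(0, w - block_size + 1, step):
            block = frame[by:by + block_size, bx:bx + block_size]
            # Separable 2-D DCT of the block (rows, then columns).
            coeffs = dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')
            parts.append(coeffs[:n_coeffs, :n_coeffs].ravel())
    return np.concatenate(parts)


def detect_shot_boundaries(frames, threshold=0.3):
    """Flag frame indices where consecutive signatures differ sharply.

    ``threshold`` is an assumed value chosen for this toy example.
    """
    boundaries = []
    prev = block_dct_signature(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = block_dct_signature(frame)
        change = np.abs(cur - prev).sum() / (np.abs(prev).sum() + 1e-9)
        if change > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


# Toy sequence: five dark frames followed by five bright frames.
frames = [np.full((64, 64), 50.0)] * 5 + [np.full((64, 64), 200.0)] * 5
print(detect_shot_boundaries(frames))  # [5]
</syntaxhighlight>

In a true compressed-___domain implementation, the DCT coefficients would be read directly from the JPEG or MPEG bitstream rather than recomputed from decoded pixels, which is what makes full decompression unnecessary.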
 
==={{Visible anchor|QBIC}} - Query By Image Content===