Vector quantization

{{Missing information|something|date=February 2009}}
{{Original research|date=November 2016}}
{{Technical|date=October 2023}}
'''Vector quantization''' ('''VQ''') is a classical [[Quantization (signal processing)|quantization]] technique from [[signal processing]] that allows the modeling of [[probability density functions]] by the distribution of prototype vectors. Developed in the early 1980s by [[Robert M. Gray]], it was originally used for [[data compression]]. It works by dividing a large set of points ([[coordinate vector|vector]]s) into groups having approximately the same number of points closest to them. Each group is represented by its [[centroid]] point, as in [[k-means]] and some other [[Cluster analysis|clustering]] algorithms. In simpler terms, vector quantization chooses a set of points to represent a larger set of points.
 
The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for [[lossy data compression]]. It can also be used for lossy data correction and [[density estimation]].
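The encode/decode cycle described above — replace each vector by the index of its nearest centroid, then reconstruct it as that centroid — can be sketched in a few lines of NumPy. The codebook here is a toy one drawn from the data itself, and the function names are illustrative rather than from any particular library:

```python
import numpy as np

def vq_encode(points, codebook):
    """Map each point to the index of its nearest codebook centroid."""
    # Squared Euclidean distance from every point to every centroid.
    dists = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def vq_decode(indices, codebook):
    """Reconstruct points as their assigned centroids (lossy)."""
    return codebook[indices]

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))                     # points to compress
codebook = data[rng.choice(1000, 8, replace=False)]   # 8 prototype vectors (toy choice)

codes = vq_encode(data, codebook)   # 1000 small integers instead of 2000 floats
recon = vq_decode(codes, codebook)
mse = ((data - recon) ** 2).mean()  # reconstruction error of the lossy code
```

Commonly occurring points lie near some centroid and reconstruct with low error; outliers land far from every centroid and reconstruct poorly, which is the density-matching behavior described above.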
</ref>
* [[Cinepak]]
* [[Daala]] is transform-based but uses [[pyramid vector quantization]] on transformed coefficients<ref>{{cite IETF |title= Pyramid Vector Quantization for Video Coding | first1= JM. |last1= Valin | draft=draft-valin-videocodec-pvq-00 | date=October 2012 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=2013-12-17 |url=https://tools.ietf.org/html/draft-valin-videocodec-pvq-00}} See also arXiv:1602.05209</ref>
* [[Digital Video Interactive]]: Production-Level Video and Real-Time Video
* [[Indeo]]
* [[AMR-WB+]]
* [[CELP]]
* [[CELT]] (now part of [[Opus (codec)|Opus]]) is transform-based but uses [[pyramid vector quantization]] on transformed coefficients
* [[Codec 2]]
* [[DTS Coherent Acoustics|DTS]]
| publisher = Xiph.org
| date = 2007-03-09
| url = https://xiph.org/vorbis/doc/Vorbis_I_spec.html
| access-date = 2007-03-09 }}
</ref>
* [[Opus (codec)|Opus]] is transform-based but uses [[pyramid vector quantization]] on transformed coefficients
* [[TwinVQ]]
 
=== Use in pattern recognition ===
VQ was also used in the 1980s for speech<ref>{{cite book|last=Burton|first=D. K.|author2=Shore, J. E. |author3=Buck, J. T. |title=ICASSP '83. IEEE International Conference on Acoustics, Speech, and Signal Processing |chapter=A generalization of isolated word recognition using vector quantization |volume=8|year=1983|pages=1021–1024|doi=10.1109/ICASSP.1983.1171915}}</ref> and [[speaker recognition]].<ref>{{cite book|last=Soong|first=F.|author2=A. Rosenberg |author3=L. Rabiner |author4=B. Juang |title=ICASSP '85. IEEE Proceedings International Conference on Acoustics, Speech, and Signal Processing |chapter=A vector quantization approach to speaker recognition |year=1985|volume=1|pages=387–390|doi=10.1109/ICASSP.1985.1168412|s2cid=8970593|url=https://www.semanticscholar.org/paper/9e1d50d98ae09c15354dbcb126609e337d3dc6fb}}</ref>
Recently it has also been used for efficient [[nearest neighbor search]]
<ref>{{cite journal|author=H. Jegou |author2=M. Douze |author3=C. Schmid|title=Product Quantization for Nearest Neighbor Search|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2011|volume=33|issue=1|pages=117–128|doi=10.1109/TPAMI.2010.57|pmid=21088323 |url=http://hal.archives-ouvertes.fr/docs/00/51/44/62/PDF/paper_hal.pdf |archive-url=https://web.archive.org/web/20111217142048/http://hal.archives-ouvertes.fr/docs/00/51/44/62/PDF/paper_hal.pdf |archive-date=2011-12-17 |url-status=live|citeseerx=10.1.1.470.8573 |s2cid=5850884 }}</ref>
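The idea behind product quantization for nearest-neighbor search — split each vector into subvectors and quantize each subspace with its own small codebook, so a long vector compresses to a few small indices — can be sketched as follows. The codebooks here are random toy values and all names are illustrative:

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode one vector as one index per sub-codebook."""
    subs = np.split(x, len(codebooks))
    return [int(((cb - s) ** 2).sum(axis=1).argmin())
            for s, cb in zip(subs, codebooks)]

def pq_distance(q, codes, codebooks):
    """Approximate squared distance between a query q and an encoded vector,
    summed subspace by subspace from the stored indices."""
    subs = np.split(q, len(codebooks))
    return sum(((cb[c] - s) ** 2).sum()
               for s, c, cb in zip(subs, codes, codebooks))

rng = np.random.default_rng(1)
dim, m, k = 8, 4, 16                       # 8-d vectors, 4 subspaces, 16 centroids each
codebooks = [rng.normal(size=(k, dim // m)) for _ in range(m)]

x = rng.normal(size=dim)
codes = pq_encode(x, codebooks)            # 4 indices in [0, 16): 16 bits per vector
approx = pq_distance(x, codes, codebooks)  # distance of x to its own reconstruction
```

In a real system the per-subspace codebooks are trained with k-means and the per-subspace distances are precomputed into lookup tables, which is what makes the search fast.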
In [[pattern recognition]] applications, one codebook is constructed for each class (each class being a user in biometric applications) using the acoustic vectors of that user. In the testing phase, the quantization distortion of the test signal is computed against the whole set of codebooks obtained in the training phase. The codebook that yields the smallest quantization distortion identifies the user.
 
The main advantage of VQ in [[pattern recognition]] is its low computational burden compared with other techniques such as [[dynamic time warping]] (DTW) and [[hidden Markov model]]s (HMM). Its main drawback compared to DTW and HMM is that it does not take into account the temporal evolution of the signals (speech, signature, etc.) because all the vectors are mixed up. To overcome this problem, a multi-section codebook approach has been proposed.<ref>{{cite journal|last=Faundez-Zanuy|first=Marcos|author2=Juan Manuel Pascual-Gaspar |title=Efficient On-line signature recognition based on Multi-section VQ|journal=Pattern Analysis and Applications|year=2011|volume=14|issue=1|pages=37–45|doi=10.1007/s10044-010-0176-8|s2cid=24868914|url=https://www.semanticscholar.org/paper/acf19e33b76ca5520e85e5c1be54c9920aa590b1}}</ref> The multi-section approach models the signal with several sections (for instance, one codebook for the initial part of the signal, another for the center, and a last one for the ending part).
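The per-class codebook scheme can be sketched roughly as follows, assuming plain k-means training and synthetic two-user data; every name and parameter here is illustrative, not from the cited work:

```python
import numpy as np

def train_codebook(vectors, k, iters=20, seed=0):
    """Plain k-means codebook for one class (one user)."""
    rng = np.random.default_rng(seed)
    cb = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        d = ((vectors[:, None] - cb[None]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                cb[j] = vectors[labels == j].mean(axis=0)
    return cb

def distortion(vectors, cb):
    """Mean quantization error of a test signal against one codebook."""
    d = ((vectors[:, None] - cb[None]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def identify(test_vectors, codebooks):
    """Return the class whose codebook quantizes the signal with least distortion."""
    return min(codebooks, key=lambda c: distortion(test_vectors, codebooks[c]))

rng = np.random.default_rng(2)
user_a = rng.normal(0.0, 1.0, size=(200, 3))   # toy feature vectors for user A
user_b = rng.normal(4.0, 1.0, size=(200, 3))   # toy feature vectors for user B
books = {"A": train_codebook(user_a, 8), "B": train_codebook(user_b, 8)}

probe = rng.normal(4.0, 1.0, size=(50, 3))     # unseen signal, drawn like user B
who = identify(probe, books)
```

Note that `distortion` pools all test vectors regardless of order, which is exactly the loss of temporal information discussed above.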
 
=== Use as clustering algorithm ===
Since VQ seeks centroids as density points of nearby samples, it can also be used directly as a prototype-based clustering method: each centroid is then associated with one prototype.
By aiming to minimize the expected squared quantization error<ref>{{cite journal|last=Gray|first=R.M.|title=Vector Quantization|journal=IEEE ASSP Magazine|year=1984|volume=1|issue=2|pages=4–29|doi=10.1109/massp.1984.1162229|hdl=2060/19890012969|hdl-access=free}}</ref> and introducing a decreasing learning gain fulfilling the Robbins–Monro conditions, multiple iterations over the whole data set with a fixed number of prototypes converge to the solution of the [[k-means]] clustering algorithm in an incremental manner.
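A minimal sketch of this incremental scheme: each sample pulls its winning centroid toward it with a per-prototype gain 1/n_j (a decreasing gain satisfying the Robbins–Monro conditions), so each centroid tracks the running mean of its assigned samples. The two-cluster data and the deterministic initialization are toy choices:

```python
import numpy as np

def online_vq(data, k, epochs=5, seed=0):
    """Incremental VQ / online k-means with gain 1/n_j per prototype."""
    rng = np.random.default_rng(seed)
    cb = data[[0, -1]].astype(float)   # crude init: first and last sample
    counts = np.zeros(k)
    for _ in range(epochs):
        for x in rng.permutation(data):
            j = ((cb - x) ** 2).sum(axis=1).argmin()   # winning prototype
            counts[j] += 1
            cb[j] += (x - cb[j]) / counts[j]           # decreasing gain 1/n_j
    return cb

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-3, 0.3, (300, 2)),   # toy cluster 1
                       rng.normal(+3, 0.3, (300, 2))])  # toy cluster 2
cb = online_vq(data, k=2)   # prototypes settle near the two cluster means
```

With the gain 1/n_j, each update replaces a centroid by the exact running average of the samples it has won so far, which is why repeated passes converge to the k-means solution incrementally.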
 
=== Generative Adversarial Networks (GAN) ===
 
== See also ==
'''Subtopics'''
{{col div|colwidth=40em}}
* [[Linde–Buzo–Gray algorithm]] (LBG)
* [[Learning vector quantization]]
* [[Lloyd's algorithm]]
* [[Neural gas|Growing Neural Gas]], a neural network-like system for vector quantization
{{colend}}
 
'''Related topics'''
{{col div|colwidth=40em}}
* [[Speech coding]]
* [[Rate-distortion function]]
* [[Data clustering]]
* [[Centroidal Voronoi tessellation]]
* [[Image segmentation]]
* [[K-means clustering]]
* [[Autoencoder]]
 
==External links==
* http://www.data-compression.com/vq.html {{Webarchive|url=https://web.archive.org/web/20171210201342/http://www.data-compression.com/vq.html |date=2017-12-10 }}
* [https://qccpack.sourceforge.net QccPack — Quantization, Compression, and Coding Library (open source)]
* [https://dl.acm.org/citation.cfm?id=1535126 VQ Indexes Compression and Information Hiding Using Hybrid Lossless Index Coding], Wen-Jan Chen and Wen-Tsung Huang