Vector quantization: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 19:09, 6 May 2021 edit Jarble (talk \| contribs) Autopatrolled, Extended confirmed users 150,084 edits m link speaker recognition using Find link ← Previous edit		Latest revision as of 03:44, 11 August 2025 edit undo OAbot (talk \| contribs) Bots 643,717 edits m Open access bot: hdl updated in citation with #oabot.
(26 intermediate revisions by 17 users not shown)
Line 1: {{short description\|Classical quantization technique from signal processing}} {{Multiple issues\| {{Missing information\|something\|date=February 2009}} {{Original research\|date=November 2016}} {{Technical\|date=October 2023}} }} '''Vector quantization''' ('''VQ''') is a classical [[Quantization (signal processing)\|quantization]] technique from [[signal processing]] that allows the modeling of [[probability density functions]] by the distribution of prototype vectors. ItDeveloped in the early 1980s by [[Robert M. Gray]], it was originally used for [[data compression]]. It works by dividing a large set of points ([[coordinate vector\|vector]]s) into groups having approximately the same number of points closest to them. Each group is represented by its [[centroid]] point, as in [[k-means]] and some other [[Cluster analysis\|clustering]] algorithms. In simpler terms, vector quantization chooses a set of points to represent a larger set of points. The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for [[lossy data compression]]. It can also be used for lossy data correction and [[density estimation]]. Line 65 ⟶ 67: </ref> * [[Cinepak]] * [[Daala]] is transform-based but uses [[pyramid vector quantization]] on transformed coefficients<ref>{{cite IETF \|title= Pyramid Vector Quantization for Video Coding \| first1= JM. \|last1= Valin \| draft=draft-valin-videocodec-pvq-00 \| date=October 2012 \|publisher=[[Internet Engineering Task Force\|IETF]] \|access-date=2013-12-17 \|url=~~http~~https://tools.ietf.org/html/draft-valin-videocodec-pvq-00}} See also arXiv:1602.05209</ref> * [[Digital Video Interactive]]: Production-Level Video and Real-Time Video * [[Indeo]] Line 79 ⟶ 81: * [[AMR-WB+]] * [[CELP]] * [[CELT]] (now part of [[Opus (codec)\|Opus]] ) is transform-based but uses [[pyramid vector quantization]] on transformed coefficients▼ * [[Codec 2]] * [[DTS Coherent Acoustics\|DTS]] * [[G.729]] * [[iLBC]] * [[Ogg Vorbis]] <ref> {{cite web \| title = Vorbis I Specification \| publisher = Xiph.org \| date = 2007-03-09 \| url = ~~http~~https://xiph.org/vorbis/doc/Vorbis_I_spec.html \| access-date = 2007-03-09 }} </ref> ▲* [[Opus (codec)\|Opus]] is transform-based but uses [[pyramid vector quantization]] on transformed coefficients * [[TwinVQ]] === Use in pattern recognition === VQ was also used in the eighties for speech<ref>{{cite ~~journal~~book\|last=Burton\|first=D. K.\|author2=Shore, J. E. \|author3=Buck, J. T. \|title=AICASSP ~~generalization~~'83. ~~of isolated word recognition using vector quantization\|journal=~~IEEE International Conference on Acoustics, Speech, and Signal Processing ~~ICASSP~~\|chapter=A generalization of isolated word recognition using vector quantization \|volume=8\|year=1983\|pages=1021–1024\|doi=10.1109/ICASSP.1983.1171915}}</ref> and [[speaker recognition]].<ref>{{cite ~~journal~~book\|last=Soong\|first=F.\|author2=A. Rosenberg \|author3=L. Rabiner \|author4=B. Juang \|title=AICASSP ~~vector~~'85. ~~Quantization approach to Speaker Recognition\|journal=~~IEEE ~~Proceedings~~ International Conference on Acoustics, Speech, and Signal Processing ~~ICASSP~~\|chapter=A vector quantization approach to speaker recognition \|year=1985\|volume=1\|pages=387–390\|doi=10.1109/ICASSP.1985.1168412\|s2cid=8970593~~\|url=https://www.semanticscholar.org/paper/9e1d50d98ae09c15354dbcb126609e337d3dc6fb~~}}</ref> Recently it has also been used for efficient [[nearest neighbor search]] <ref>{{cite journal\|author=H. Jegou \|author2=M. Douze \|author3=C. Schmid\|title=Product Quantization for Nearest Neighbor Search\|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence\|year=2011\|volume=33\|issue=1\|pages=117–128\|doi=10.1109/TPAMI.2010.57\|pmid=21088323 \|url=http://hal.archives-ouvertes.fr/docs/00/51/44/62/PDF/paper_hal.pdf \|archive-url=https://web.archive.org/web/20111217142048/http://hal.archives-ouvertes.fr/docs/00/51/44/62/PDF/paper_hal.pdf \|archive-date=2011-12-17 \|url-status=live\|citeseerx=10.1.1.470.8573 \|s2cid=5850884 }}</ref> and on-line signature recognition.<ref>{{cite journal\|last=Faundez-Zanuy\|first=Marcos\|title=offline and On-line signature recognition based on VQ-DTW\|journal=Pattern Recognition\|year=2007\|volume=40\|issue=3\|pages=981–992\|doi=10.1016/j.patcog.2006.06.007}}</ref> In [[pattern recognition]] applications, one codebook is constructed for each class (each class being a user in biometric applications) using acoustic vectors of this user. In the testing phase the quantization distortion of a testing signal is worked out with the whole set of codebooks obtained in the training phase. The codebook that provides the smallest vector quantization distortion indicates the identified user. The main advantage of VQ in [[pattern recognition]] is its low computational burden when compared with other techniques such as [[dynamic time warping]] (DTW) and [[hidden Markov model]] (HMM). The main drawback when compared to DTW and HMM is that it does not take into account the temporal evolution of the signals (speech, signature, etc.) because all the vectors are mixed up. In order to overcome this problem a multi-section codebook approach has been proposed.<ref>{{cite journal\|last=Faundez-Zanuy\|first=Marcos\|author2=Juan Manuel Pascual-Gaspar \|title=Efficient On-line signature recognition based on Multi-section VQ\|journal=Pattern Analysis and Applications\|year=2011\|volume=14\|issue=1\|pages=37–45\|doi=10.1007/s10044-010-0176-8\|s2cid=24868914~~\|url=https://www.semanticscholar.org/paper/acf19e33b76ca5520e85e5c1be54c9920aa590b1~~}}</ref> The multi-section approach consists of modelling the signal with several sections (for instance, one codebook for the initial part, another one for the center and a last codebook for the ending part). === Use as clustering algorithm === As VQ is seeking for centroids as density points of nearby lying samples, it can be also directly used as a prototype-based clustering method: each centroid is then associated with one prototype. By aiming to minimize the expected squared quantization error<ref>{{cite journal\|last=Gray\|first=R.M.\|title=Vector Quantization\|journal=IEEE ASSP Magazine\|year=1984\|volume=1\|issue=2\|pages=4–29\|doi=10.1109/massp.1984.1162229\|hdl=2060/19890012969\|hdl-access=free}}</ref> and introducing a decreasing learning gain fulfilling the Robbins-Monro conditions, multiple iterations over the whole data set with a concrete but fixed number of prototypes converges to the solution of [[k-means]] clustering algorithm in an incremental manner. === Generative Adversarial Networks (GAN) === VQ has been used to quantize a feature representation layer in the discriminator of ~~GANs~~[[Generative adversarial network]]s. The feature quantization (FQ) technique performs implicit feature matching.<ref>Feature Quantization Improves GAN Training https://arxiv.org/abs/2004.02088</ref> It improves the GAN training, and yields an improved performance on a variety of popular GAN models: BigGAN for image generation, StyleGAN for face synthesis, and U-GAT-IT for unsupervised image-to-image translation. == See also == '''Subtopics''' {{col div\|colwidth=40em}} * [[Linde–Buzo–Gray algorithm~~\|Linde,Buzo,Gray Algorithm~~]] (LBG)]]▼ * [[Learning vector quantization]]▼ * [[Lloyd's algorithm]]▼ * [[Neural gas\|Growing Neural Gas]], a neural network-like system for vector quantization▼ {{colend}} '''Related topics''' {{col div\|colwidth=40em}} * [[Speech coding]] Line 117 ⟶ 128: * [[Rate-distortion function]] * [[Data clustering]] ▲* [[Learning vector quantization]] * [[Centroidal Voronoi tessellation]] ▲* [[Neural gas\|Growing Neural Gas]], a neural network-like system for vector quantization * [[Image segmentation]] ▲* [[Lloyd's algorithm]] ▲* [[Linde–Buzo–Gray algorithm\|Linde,Buzo,Gray Algorithm (LBG)]] * [[K-means clustering]] * [[Autoencoder]] Line 134 ⟶ 141: ==External links== * http://www.data-compression.com/vq.html {{Webarchive\|url=https://web.archive.org/web/20171210201342/http://www.data-compression.com/vq.html \|date=2017-12-10 }} * [~~http~~https://qccpack.sourceforge.net QccPack — Quantization, Compression, and Coding Library (open source)] * [https://dl.acm.org/citation.cfm?id=1535126 VQ Indexes Compression and Information Hiding Using Hybrid Lossless Index Coding], Wen-Jan Chen and Wen-Tsung Huang