In [[computer science]], '''learning vector quantization''' ('''LVQ''') is a [[prototype|prototype-based]] [[supervised learning|supervised]] [[Statistical classification|classification]] [[algorithm]]. LVQ is the supervised counterpart of [[vector quantization]] systems.
== Overview ==
LVQ can be understood as a special case of an [[artificial neural network]], more precisely, it applies a [[winner-take-all (computing)|winner-take-all]] [[Hebbian learning]]-based approach. It is a precursor to [[self-organizing map]]s (SOM) and related to [[neural gas]], and to the [[k-nearest neighbor algorithm]] (k-NN). LVQ was invented by [[Teuvo Kohonen]].<ref>T. Kohonen. Self-Organizing Maps. Springer, Berlin, 1997.</ref>
== Definition ==
An LVQ system is represented by prototypes <math>W=(w(1),\dots,w(n))</math> which are defined in the [[feature space]] of observed data. In winner-take-all training algorithms one determines, for each data point, the prototype which is closest to the input according to a given distance measure. The position of this so-called winner prototype is then adapted, i.e. the winner is moved closer if it correctly classifies the data point or moved away if it classifies the data point incorrectly.
LVQ systems can be applied to [[multi-class classification]] problems in a natural way.
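At classification time a data point is assigned the label of its nearest prototype. The following is a minimal sketch of this winner-take-all classification rule, assuming a squared Euclidean distance; the function and array names are illustrative only.

<syntaxhighlight lang="python">
import numpy as np

def lvq_classify(x, prototypes, prototype_labels):
    """Assign to x the label of its nearest prototype (winner-take-all rule)."""
    distances = np.sum((prototypes - x) ** 2, axis=1)  # squared Euclidean distance to each prototype
    winner = np.argmin(distances)                      # index of the closest ("winner") prototype
    return prototype_labels[winner]
</syntaxhighlight>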
A key issue in LVQ is the choice of an appropriate measure of distance or similarity for training and classification. Techniques have been developed which adapt a parameterized distance measure in the course of training the system, see e.g. (Schneider, Biehl, and Hammer, 2009)<ref>{{cite journal|author1=P. Schneider |author2=M. Biehl |author3=B. Hammer |title=Adaptive Relevance Matrices in Learning Vector Quantization|journal=Neural Computation|volume=21|issue=10|pages=3532–3561|year=2009|doi=10.1162/neco.2009.10-08-892|pmid=19635012|citeseerx=10.1.1.216.1183|s2cid=17306078}}</ref> and references therein.
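One common form of such an adaptive measure, used in matrix relevance LVQ, is the parameterized quadratic form <math>d_\Lambda(x, w) = (x-w)^\top \Lambda (x-w)</math> with <math>\Lambda = \Omega^\top \Omega</math>. The sketch below shows how such a distance can be evaluated; the matrix <code>omega</code> stands in for parameters that would be adapted during training.

<syntaxhighlight lang="python">
import numpy as np

def adaptive_distance(x, w, omega):
    """Quadratic-form distance (x - w)^T Omega^T Omega (x - w).

    omega parameterizes the metric; in matrix relevance LVQ it is
    adapted during training alongside the prototypes.
    """
    diff = omega @ (x - w)      # project the difference vector
    return float(diff @ diff)   # squared norm in the transformed space
</syntaxhighlight>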
LVQ has also been applied to the classification of text documents.{{Citation needed|date=December 2019|reason=removed citation to predatory publisher content}}
== Algorithm ==
The algorithms are presented as in Kohonen's original description.<ref>{{Citation |last=Kohonen |first=Teuvo |title=Learning Vector Quantization |date=2001 |work=Self-Organizing Maps |volume=30 |pages=245–261 |url=http://link.springer.com/10.1007/978-3-642-56927-2_6 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-56927-2_6 |isbn=978-3-540-67921-9|url-access=subscription }}</ref>
Below follows an informal description.

Set up:
* Let the data be denoted by <math>x_i \in \R^D</math>, and their corresponding labels by <math>y_i \in \{1, 2, \dots, C\}</math>. The complete dataset (the training set, whose labels are already known) is <math>\{(x_i, y_i)\}_{i=1}^N</math>.
* The set of code vectors (the "neurons" of the system) is <math>w_j \in \R^D</math>; each code vector carries a class label. In the simplest case the number of code vectors equals the number of classes <math>C</math>.
* Distances in the feature space are measured with a chosen metric <math>d</math> ([[Euclidean distance|Euclidean]], etc.).
* The learning rate at iteration step <math>t</math> is denoted by <math>\alpha_t</math>.
* The hyperparameters <math>w</math> and <math>\epsilon</math> are used by LVQ2 and LVQ3. The original paper suggests <math>\epsilon \in [0.1, 0.5]</math> and <math>w \in [0.2, 0.3]</math>.
=== LVQ1 ===
Initialize several code vectors per label. Iterate until a convergence criterion is reached; a minimal sketch of the resulting training loop is given after the steps below.
# Sample a datum <math>x_i</math>, and find the code vector <math>w_j</math> such that <math>x_i</math> falls within the [[Voronoi diagram|Voronoi cell]] of <math>w_j</math> (i.e. <math>w_j</math> is the closest code vector under the chosen metric).
# If its label <math>y_i</math> is the same as that of <math>w_j</math>, then <math>w_j \leftarrow w_j + \alpha_t(x_i - w_j)</math>, otherwise, <math>w_j \leftarrow w_j - \alpha_t(x_i - w_j)</math>.
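The following is a minimal NumPy sketch of this LVQ1 loop. The prototype initialization, the random sampling, and the linearly decaying learning-rate schedule are illustrative choices rather than part of the original formulation.

<syntaxhighlight lang="python">
import numpy as np

def train_lvq1(X, y, prototypes, prototype_labels, n_steps=1000, alpha0=0.1):
    """LVQ1: move the winning prototype toward correctly classified samples
    and away from misclassified ones."""
    prototypes = prototypes.copy()
    rng = np.random.default_rng(0)
    for t in range(n_steps):
        alpha_t = alpha0 * (1.0 - t / n_steps)                  # one possible decaying learning rate
        i = rng.integers(len(X))                                # sample a datum x_i
        distances = np.sum((prototypes - X[i]) ** 2, axis=1)
        j = np.argmin(distances)                                # winning code vector w_j
        if prototype_labels[j] == y[i]:
            prototypes[j] += alpha_t * (X[i] - prototypes[j])   # same label: move closer
        else:
            prototypes[j] -= alpha_t * (X[i] - prototypes[j])   # different label: move away
    return prototypes
</syntaxhighlight>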
=== LVQ2 ===
LVQ2 is the same as LVQ3, but with this sentence removed: "If <math>w_j</math>, <math>w_k</math> and <math>x_i</math> have the same class, then <math>w_j \leftarrow w_j + \epsilon\alpha_t(x_i - w_j)</math> and <math>w_k \leftarrow w_k + \epsilon\alpha_t(x_i - w_k)</math>.". That is, if the two closest code vectors and <math>x_i</math> all have the same class, then nothing happens.
=== LVQ3 ===
[[File:Apollonian_circles.svg|thumb|Some Apollonian circles. Every blue circle intersects every red circle at a right angle. Every red circle passes through the two points ''{{mvar|C, D}}'', and every blue circle separates the two points.]]
Initialize several code vectors per label. Iterate until a convergence criterion is reached.
# Sample a datum <math>x_i</math>, and find the two code vectors <math>w_j, w_k</math> closest to it.
# Let <math>d_j := \|x_i - w_j\|, d_k := \|x_i - w_k\|</math>.
# If <math>\min \left(\frac{d_j}{d_k}, \frac{d_k}{d_j}\right)>s </math>, where <math>s=\frac{1-w}{1+w}</math>, then
#* If <math>w_j</math> and <math>x_i</math> have the same class, and <math>w_k</math> and <math>x_i</math> have different classes, then <math>w_j \leftarrow w_j + \alpha_t(x_i - w_j)</math> and <math>w_k \leftarrow w_k - \alpha_t(x_i - w_k)</math>.
#* If <math>w_k</math> and <math>x_i</math> have the same class, and <math>w_j</math> and <math>x_i</math> have different classes, then <math>w_j \leftarrow w_j - \alpha_t(x_i - w_j)</math> and <math>w_k \leftarrow w_k + \alpha_t(x_i - w_k)</math>.
#* If <math>w_j</math>, <math>w_k</math> and <math>x_i</math> have the same class, then <math>w_j \leftarrow w_j + \epsilon\alpha_t(x_i - w_j)</math> and <math>w_k \leftarrow w_k + \epsilon\alpha_t(x_i - w_k)</math>.
#* If <math>w_j</math> and <math>x_i</math> have different classes, and <math>w_k</math> and <math>x_i</math> have different classes, then the original paper does not specify this case; presumably nothing happens.
# Otherwise, skip.

Note that the condition <math>\min \left(\frac{d_j}{d_k}, \frac{d_k}{d_j}\right)>s </math>, where <math>s=\frac{1-w}{1+w}</math>, precisely means that the point <math>x_i</math> falls between two [[Apollonian circles|Apollonian spheres]]. A sketch of a single LVQ3 update step is given below.
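The following is a minimal NumPy sketch of one LVQ3 update step following the rules above; the Euclidean metric and the default parameter values are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def lvq3_step(x, y, prototypes, prototype_labels, alpha_t, w=0.25, eps=0.3):
    """One LVQ3 update: adjust the two code vectors closest to x when x
    falls inside the window defined by s = (1 - w) / (1 + w)."""
    d = np.sqrt(np.sum((prototypes - x) ** 2, axis=1))
    j, k = np.argsort(d)[:2]                       # indices of the two closest code vectors
    s = (1 - w) / (1 + w)
    if min(d[j] / d[k], d[k] / d[j]) > s:          # window condition
        same_j = prototype_labels[j] == y
        same_k = prototype_labels[k] == y
        if same_j and not same_k:
            prototypes[j] += alpha_t * (x - prototypes[j])
            prototypes[k] -= alpha_t * (x - prototypes[k])
        elif same_k and not same_j:
            prototypes[j] -= alpha_t * (x - prototypes[j])
            prototypes[k] += alpha_t * (x - prototypes[k])
        elif same_j and same_k:
            prototypes[j] += eps * alpha_t * (x - prototypes[j])
            prototypes[k] += eps * alpha_t * (x - prototypes[k])
        # if both closest code vectors have a different class, nothing happens
    return prototypes
</syntaxhighlight>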
== Variants ==
* Generalized LVQ (GLVQ)<ref>{{cite web |url=https://papers.nips.cc/paper/1995/file/9c3b1830513cc3b8fc4b76635d32e692-Paper.pdf |title=Generalized Learning Vector Quantization |website=nips.cc |publisher=NeurIPS Proceedings |access-date=25 August 2022}}</ref>
* Generalized Relevance LVQ (GRLVQ)<ref name="HammerVillmann2002">{{cite journal | last1 = Hammer | first1 = Barbara | last2 = Villmann | first2 = Thomas | title = Generalized relevance learning vector quantization | journal = Neural Networks | date = October 2002 | volume = 15 | issue = 8–9 | pages = 1059–1068 | issn = 0893-6080 | doi = 10.1016/S0893-6080(02)00079-5 }}</ref>
* Generalized Matrix LVQ (GMLVQ)<ref name="SchneiderBiehlHammer2009">{{cite journal | last1 = Schneider | first1 = Petra | last2 = Biehl | first2 = Michael | last3 = Hammer | first3 = Barbara | title = Adaptive Relevance Matrices in Learning Vector Quantization | journal = Neural Computation | date = December 2009 | volume = 21 | issue = 12 | pages = 3532–3561 | issn = 0899-7667 | eissn = 1530-888X | doi = 10.1162/neco.2009.11-08-908 | pmid = 19764875 }}</ref>
* Relational GLVQ (RGLVQ)<ref name="HammerSchleifZhu2011">{{cite book | title = Neural Information Processing | last1 = Hammer | first1 = Barbara | last2 = Schleif | first2 = Frank-Michael | last3 = Zhu | first3 = Xibin | chapter = Relational Extensions of Learning Vector Quantization | date = 2011 | pages = 481–489 | publisher = Springer Berlin Heidelberg | issn = 0302-9743 | eissn = 1611-3349 | doi = 10.1007/978-3-642-24958-7_56 }}</ref>
* Median GLVQ (MGLVQ)<ref name="NebelHammerFrohberg2015">{{cite journal | last1 = Nebel | first1 = David | last2 = Hammer | first2 = Barbara | last3 = Frohberg | first3 = Kathleen | last4 = Villmann | first4 = Thomas | title = Median variants of learning vector quantization for learning of dissimilarity data | journal = Neurocomputing | date = December 2015 | volume = 169 | pages = 295–305 | issn = 0925-2312 | doi = 10.1016/j.neucom.2014.12.096 }}</ref>
* Limited Rank Matrix LVQ (LiRaMLVQ)<ref name="BunteSchneiderHammer2012">{{cite journal | last1 = Bunte | first1 = Kerstin | last2 = Schneider | first2 = Petra | last3 = Hammer | first3 = Barbara | last4 = Schleif | first4 = Frank-Michael | last5 = Villmann | first5 = Thomas | last6 = Biehl | first6 = Michael | title = Limited Rank Matrix Learning, discriminative dimension reduction and visualization | journal = Neural Networks | date = February 2012 | volume = 26 | pages = 159–173 | issn = 0893-6080 | doi = 10.1016/j.neunet.2011.10.001 | pmid = 22041220 }}</ref>
* Robust soft LVQ (RSLVQ)<ref name="SeoObermayer2003">{{cite journal | last1 = Seo | first1 = Sambu | last2 = Obermayer | first2 = Klaus | title = Soft Learning Vector Quantization | journal = Neural Computation | date = 1 July 2003 | volume = 15 | issue = 7 | pages = 1589–1604 | issn = 0899-7667 | eissn = 1530-888X | doi = 10.1162/089976603321891819 | pmid = 12816567 }}</ref>
* Border-sensitive GLVQ (BSGLVQ)<ref name="KadenRiedelHermann2014">{{cite journal | last1 = Kaden | first1 = Marika | last2 = Riedel | first2 = Martin | last3 = Hermann | first3 = Wieland | last4 = Villmann | first4 = Thomas | title = Border-sensitive learning in generalized learning vector quantization: an alternative to support vector machines | journal = Soft Computing | date = 20 November 2014 | volume = 19 | issue = 9 | pages = 2423–2434 | issn = 1432-7643 | eissn = 1433-7479 | doi = 10.1007/s00500-014-1496-1 }}</ref>
== References ==
{{Reflist}}
== Further reading ==
* {{cite journal |last1=Somervuo |first1=Panu |last2=Kohonen |first2=Teuvo |date=1999 |title=Self-organizing maps and learning vector quantization for feature sequences |journal=Neural Processing Letters |volume=10 |issue=2 |pages=151–159 |doi=10.1023/A:1018741720065}}
* [http://www.cis.hut.fi/panus/papers/dtwsom.pdf Self-Organizing Maps and Learning Vector Quantization for Feature Sequences, Somervuo and Kohonen, 1999] (pdf)
* {{Cite journal |last=Nova |first=David |last2=Estévez |first2=Pablo A. |date=2014-09-01 |title=A review of learning vector quantization classifiers |url=https://link.springer.com/article/10.1007/s00521-013-1535-3 |journal=Neural Computing and Applications |language=en |volume=25 |issue=3 |pages=511–524 |doi=10.1007/s00521-013-1535-3 |issn=1433-3058|arxiv=1509.07093 }}