Learning vector quantization

In [[computer science]], '''learning vector quantization''' ('''LVQ''') is a [[prototype|prototype-based]] [[supervised learning|supervised]] [[Statistical classification|classification]] [[algorithm]]. LVQ is the supervised counterpart of [[vector quantization]] systems.
 
== Overview ==
LVQ can be understood as a special case of an [[artificial neural network]], more precisely, it applies a [[winner-take-all (computing)|winner-take-all]] [[Hebbian learning]]-based approach. It is a precursor to [[self-organizing map]]s (SOM) and related to [[neural gas]], and to the [[k-nearest neighbor algorithm]] (k-NN). LVQ was invented by [[Teuvo Kohonen]].<ref>T. Kohonen. Self-Organizing Maps. Springer, Berlin, 1997.</ref>
 
== Definition ==
An LVQ system is represented by prototypes <math>W=(w(1),...,w(n))</math> which are defined in the [[feature space]] of observed data. In winner-take-all training algorithms one determines, for each data point, the prototype which is closest to the input according to a given distance measure. The position of this so-called winner prototype is then adapted: the winner is moved closer to the input if it classifies the data point correctly, or moved away if it classifies it incorrectly.
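
Schematically, if <math>w^{*}</math> denotes the winner prototype for an input <math>x</math> and <math>\alpha</math> a learning rate (precise, per-iteration versions are given in the Algorithm section below), the adaptation reads
:<math>w^{*} \leftarrow w^{*} + \alpha\,(x - w^{*})</math> if <math>w^{*}</math> and <math>x</math> carry the same class label, and
:<math>w^{*} \leftarrow w^{*} - \alpha\,(x - w^{*})</math> otherwise.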
 
LVQ systems can be applied to [[multi-class classification]] problems in a natural way.
 
A key issue in LVQ is the choice of an appropriate measure of distance or similarity for training and classification. Techniques have been developed which adapt a parameterized distance measure in the course of training the system; see e.g. (Schneider, Biehl, and Hammer, 2009)<ref>{{cite journal|author1=P. Schneider |author2=M. Biehl |author3=B. Hammer |title=Adaptive Relevance Matrices in Learning Vector Quantization|journal= Neural Computation|volume=21|issue=10|pages=3532–3561|year=2009|doi=10.1162/neco.2009.10-08-892|pmid=19635012|citeseerx=10.1.1.216.1183|s2cid=17306078}}</ref> and references therein.
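
For example, in the relevance-matrix approach of Schneider, Biehl, and Hammer (2009), the squared Euclidean distance is replaced by a parameterized form
:<math>d_{\Lambda}(x, w) = (x - w)^{\mathsf T}\, \Lambda\, (x - w), \qquad \Lambda = \Omega^{\mathsf T} \Omega ,</math>
where the matrix <math>\Omega</math>, and hence the positive semi-definite relevance matrix <math>\Lambda</math>, is adapted during training alongside the prototypes.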
 
LVQ has also been applied to the classification of text documents.{{Citation needed|date=December 2019|reason=removed citation to predatory publisher content}}
 
== Algorithm ==
The algorithms are presented as in Kohonen (2001).<ref>{{Citation |last=Kohonen |first=Teuvo |title=Learning Vector Quantization |date=2001 |work=Self-Organizing Maps |volume=30 |pages=245–261 |url=http://link.springer.com/10.1007/978-3-642-56927-2_6 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-56927-2_6 |isbn=978-3-540-67921-9 |url-access=subscription}}</ref> Below follows an informal description.
 
The following notation is used:
* The data are denoted by <math>x_i \in \R^D</math>, and their corresponding labels by <math>y_i \in \{1, 2, \dots, C\}</math>.
* The complete dataset is <math>\{(x_i, y_i)\}_{i=1}^N</math>.
* The code vectors are denoted by <math>w_j \in \R^D</math>; each code vector is assigned a class label.
* The learning rate at iteration step <math>t</math> is denoted by <math>\alpha_t</math>.
* The hyperparameters <math>w</math> and <math>\epsilon</math> are used by LVQ2.1 and LVQ3. The original paper suggests <math>\epsilon \in [0.1, 0.5]</math> and <math>w \in [0.2, 0.3]</math>.
 
=== LVQ1 ===
Initialize several code vectors per label. Iterate until a convergence criterion is reached.
 
# Sample a datum <math>x_i</math> and find the code vector <math>w_j</math> such that <math>x_i</math> falls within the [[Voronoi diagram|Voronoi cell]] of <math>w_j</math>, i.e. <math>w_j</math> is the nearest code vector.
# If its label <math>y_i</math> is the same as that of <math>w_j</math>, then <math>w_j \leftarrow w_j + \alpha_t(x_i - w_j)</math>, otherwise, <math>w_j \leftarrow w_j - \alpha_t(x_i - w_j)</math>.
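
For illustration, the LVQ1 loop can be sketched in a few lines of Python (an illustrative sketch only, not reference code: the function name <code>lvq1_fit</code>, the constant learning rate in place of a decreasing schedule <math>\alpha_t</math>, and the synthetic two-class data are arbitrary choices):

<syntaxhighlight lang="python">
import numpy as np

def lvq1_fit(X, y, prototypes, proto_labels, alpha=0.1, n_iter=1000, seed=0):
    """LVQ1: move the nearest code vector toward a correctly classified sample
    and away from an incorrectly classified one."""
    rng = np.random.default_rng(seed)
    W = np.array(prototypes, dtype=float)
    for _ in range(n_iter):
        i = rng.integers(len(X))                            # sample a datum x_i
        j = np.argmin(np.linalg.norm(W - X[i], axis=1))     # winner: x_i lies in its Voronoi cell
        sign = 1.0 if proto_labels[j] == y[i] else -1.0
        W[j] += sign * alpha * (X[i] - W[j])                 # attract if labels agree, repel otherwise
    return W

# Tiny synthetic example: two Gaussian classes, one code vector per class,
# initialized at the class means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
classes = np.unique(y)
prototypes = [X[y == c].mean(axis=0) for c in classes]
W = lvq1_fit(X, y, prototypes, classes)
</syntaxhighlight>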
 
=== LVQ2.1 ===
LVQ2.1 is the same as LVQ3 below, except that the update for the case in which <math>w_j</math>, <math>w_k</math>, and <math>x_i</math> all have the same class is omitted; in LVQ2.1, nothing happens in that case.
 
=== LVQ3 ===
[[File:Apollonian_circles.svg|thumb|Some Apollonian circles. Every blue circle intersects every red circle at a right angle. Every red circle passes through the two points ''{{mvar|C, D}}'', and every blue circle separates the two points.]]
Initialize several code vectors per label. Iterate until a convergence criterion is reached.

# Sample a datum <math>x_i</math>, and find the two code vectors <math>w_j, w_k</math> closest to it.
# Let <math>d_j := \|x_i - w_j\|, d_k := \|x_i - w_k\|</math>.
# If <math>\min \left(\frac{d_j}{d_k}, \frac{d_k}{d_j}\right)>s </math>, where <math>s=\frac{1-w}{1+w}</math>, then
#* If <math>w_j</math> and <math>x_i</math> have the same class, and <math>w_k</math> and <math>x_i</math> have different classes, then <math>w_j \leftarrow w_j + \alpha_t(x_i - w_j)</math> and <math>w_k \leftarrow w_k - \alpha_t(x_i - w_k)</math>.
#* If <math>w_k</math> and <math>x_i</math> have the same class, and <math>w_j</math> and <math>x_i</math> have different classes, then <math>w_j \leftarrow w_j - \alpha_t(x_i - w_j)</math> and <math>w_k \leftarrow w_k + \alpha_t(x_i - w_k)</math>.
#* If <math>w_j</math>, <math>w_k</math>, and <math>x_i</math> all have the same class, then <math>w_j \leftarrow w_j + \epsilon\alpha_t(x_i - w_j)</math> and <math>w_k \leftarrow w_k + \epsilon\alpha_t(x_i - w_k)</math>.
#* If both <math>w_j</math> and <math>w_k</math> have classes different from that of <math>x_i</math>, the original paper does not specify an update; presumably nothing happens.
# Otherwise, skip.

Note that the condition <math>\min \left(\frac{d_j}{d_k}, \frac{d_k}{d_j}\right)>s </math>, where <math>s=\frac{1-w}{1+w}</math>, means precisely that the point <math>x_i</math> falls between two [[Apollonian circles|Apollonian spheres]]. A schematic implementation of this windowed update is sketched below, after the list of variants.

== Variants ==
* LVQ2.1
* Generalized LVQ (GLVQ) <ref>{{cite web |url=https://papers.nips.cc/paper/1995/file/9c3b1830513cc3b8fc4b76635d32e692-Paper.pdf |title=Generalized Learning Vector Quantization |website=nips.cc |publisher=NeurIPS Proceedings |access-date=25 August 2022}}</ref>
* Generalized Relevance LVQ (GRLVQ) <ref name="HammerVillmann2002">{{cite journal | last1 = Hammer | first1 = Barbara | last2 = Villmann | first2 = Thomas | title = Generalized relevance learning vector quantization | journal = Neural Networks | date = October 2002 | volume = 15 | issue = 8–9 | pages = 1059–1068 | issn = 0893-6080 | doi = 10.1016/S0893-6080(02)00079-5 }}</ref>
* Generalized Matrix LVQ (GMLVQ) <ref name="SchneiderBiehlHammer2009">{{cite journal | last1 = Schneider | first1 = Petra | last2 = Biehl | first2 = Michael | last3 = Hammer | first3 = Barbara | title = Adaptive Relevance Matrices in Learning Vector Quantization | journal = Neural Computation | date = December 2009 | volume = 21 | issue = 12 | pages = 3532–3561 | issn = 0899-7667 | eissn = 1530-888X | doi = 10.1162/neco.2009.11-08-908 | pmid = 19764875 }}</ref>
* Relational GLVQ (RGLVQ) <ref name="HammerSchleifZhu2011">{{cite book | title = Neural Information Processing | last1 = Hammer | first1 = Barbara | last2 = Schleif | first2 = Frank-Michael | last3 = Zhu | first3 = Xibin | chapter = Relational Extensions of Learning Vector Quantization | date = 2011 | pages = 481–489 | publisher = Springer Berlin Heidelberg | issn = 0302-9743 | eissn = 1611-3349 | doi = 10.1007/978-3-642-24958-7_56 }}</ref>
* Median GLVQ (MGLVQ) <ref name="NebelHammerFrohberg2015">{{cite journal | last1 = Nebel | first1 = David | last2 = Hammer | first2 = Barbara | last3 = Frohberg | first3 = Kathleen | last4 = Villmann | first4 = Thomas | title = Median variants of learning vector quantization for learning of dissimilarity data | journal = Neurocomputing | date = December 2015 | volume = 169 | pages = 295–305 | issn = 0925-2312 | doi = 10.1016/j.neucom.2014.12.096 }}</ref>
* Limited Rank Matrix LVQ (LiRaMLVQ) <ref name="BunteSchneiderHammer2012">{{cite journal | last1 = Bunte | first1 = Kerstin | last2 = Schneider | first2 = Petra | last3 = Hammer | first3 = Barbara | last4 = Schleif | first4 = Frank-Michael | last5 = Villmann | first5 = Thomas | last6 = Biehl | first6 = Michael | title = Limited Rank Matrix Learning, discriminative dimension reduction and visualization | journal = Neural Networks | date = February 2012 | volume = 26 | pages = 159–173 | issn = 0893-6080 | doi = 10.1016/j.neunet.2011.10.001 | pmid = 22041220 }}</ref>
* Robust soft LVQ (RSLVQ) <ref name="SeoObermayer2003">{{cite journal | last1 = Seo | first1 = Sambu | last2 = Obermayer | first2 = Klaus | title = Soft Learning Vector Quantization | journal = Neural Computation | date = 1 July 2003 | volume = 15 | issue = 7 | pages = 1589–1604 | issn = 0899-7667 | eissn = 1530-888X | doi = 10.1162/089976603321891819 | pmid = 12816567 | url = }}</ref>
* Border-sensitive GLVQ (BSGLVQ) <ref name="KadenRiedelHermann2014">{{cite journal | last1 = Kaden | first1 = Marika | last2 = Riedel | first2 = Martin | last3 = Hermann | first3 = Wieland | last4 = Villmann | first4 = Thomas | title = Border-sensitive learning in generalized learning vector quantization: an alternative to support vector machines | journal = Soft Computing | date = 20 November 2014 | volume = 19 | issue = 9 | pages = 2423–2434 | issn = 1432-7643 | eissn = 1433-7479 | doi = 10.1007/s00500-014-1496-1 | pmid = | url = }}</ref>
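
As referenced above, the windowed update shared by LVQ2.1 and LVQ3 can be sketched in Python (an illustrative sketch only, not reference code: the function name <code>lvq3_step</code>, the constant learning rate standing in for <math>\alpha_t</math>, and the chosen hyperparameter values are arbitrary, with <math>\epsilon</math> and <math>w</math> taken from the ranges suggested above):

<syntaxhighlight lang="python">
import numpy as np

def lvq3_step(x, y, W, W_labels, alpha=0.05, w=0.25, eps=0.3):
    """One LVQ3 update for a single sample (x, y).

    W is a float array of code vectors, W_labels their class labels;
    alpha stands in for alpha_t (held constant here for simplicity)."""
    d = np.linalg.norm(W - x, axis=1)
    j, k = np.argsort(d)[:2]                      # two closest code vectors
    dj, dk = d[j], d[k]
    s = (1 - w) / (1 + w)
    if min(dj / dk, dk / dj) > s:                 # x falls inside the window
        same_j = (W_labels[j] == y)
        same_k = (W_labels[k] == y)
        if same_j and not same_k:
            W[j] += alpha * (x - W[j])
            W[k] -= alpha * (x - W[k])
        elif same_k and not same_j:
            W[j] -= alpha * (x - W[j])
            W[k] += alpha * (x - W[k])
        elif same_j and same_k:                   # LVQ3 only; omitted in LVQ2.1
            W[j] += eps * alpha * (x - W[j])
            W[k] += eps * alpha * (x - W[k])
        # both labels wrong: no update (unspecified in the original paper)
    return W

# Usage sketch: with W a float array of code vectors and W_labels their classes,
# repeatedly call  W = lvq3_step(X[i], y[i], W, W_labels)  for randomly drawn i.
</syntaxhighlight>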
 
== References ==
{{Reflist}}
 
== Further reading ==
* {{cite journal |last1=Somervuo |first1=Panu |last2=Kohonen |first2=Teuvo |date=1999 |title=Self-organizing maps and learning vector quantization for feature sequences |journal=Neural Processing Letters |volume=10 |issue=2 |pages=151–159 |doi=10.1023/A:1018741720065 |url=http://www.cis.hut.fi/panus/papers/dtwsom.pdf}}
* {{Cite journal |last=Nova |first=David |last2=Estévez |first2=Pablo A. |date=2014-09-01 |title=A review of learning vector quantization classifiers |url=https://link.springer.com/article/10.1007/s00521-013-1535-3 |journal=Neural Computing and Applications |language=en |volume=25 |issue=3 |pages=511–524 |doi=10.1007/s00521-013-1535-3 |issn=1433-3058|arxiv=1509.07093 }}
 
== External links ==