{{Short description|
{{Machine learning|Paradigms}}
'''Unsupervised learning''' is a framework in [[machine learning]] where, in contrast to [[supervised learning]], algorithms learn patterns exclusively from unlabeled data.<ref name="WeiWu">{{Cite web |last=Wu |first=Wei |title=Unsupervised Learning |url=https://na.uni-tuebingen.de/ex/ml_seminar_ss2022/Unsupervised_Learning%20Final.pdf |access-date=26 April 2024 |archive-date=14 April 2024 |archive-url=https://web.archive.org/web/20240414213810/https://na.uni-tuebingen.de/ex/ml_seminar_ss2022/Unsupervised_Learning%20Final.pdf |url-status=live }}</ref> Other frameworks in the spectrum of supervisions include [[
Conceptually, unsupervised learning can be divided into four aspects: data, training, algorithm, and downstream applications. Typically, the dataset is harvested cheaply "in the wild", such as a massive [[text corpus]] obtained by [[Web crawler|web crawling]] with only minor filtering (such as [[Common Crawl]]). This compares favorably to supervised learning, where the dataset (such as [[ImageNet|ImageNet1000]]) is typically constructed manually, which is much more expensive.
Some algorithms were designed specifically for unsupervised learning, such as [[Cluster analysis|clustering algorithms]] like [[K-means clustering|k-means]], [[dimensionality reduction]] techniques like [[Principal component analysis|principal component analysis (PCA)]], [[Boltzmann machine|Boltzmann machine learning]], and [[
Sometimes a trained model can be used as-is, but more often it is modified for downstream applications. For example, the generative pre-training method trains a model to generate a textual dataset before fine-tuning it for other applications, such as text classification.<ref name="gpt1paper">{{cite web |last1=Radford |first1=Alec |last2=Narasimhan |first2=Karthik |last3=Salimans |first3=Tim |last4=Sutskever |first4=Ilya |date=11 June 2018 |title=Improving Language Understanding by Generative Pre-Training |url=https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |url-status=live |archive-url=https://web.archive.org/web/20210126024542/https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |archive-date=26 January 2021 |access-date=23 January 2021 |publisher=[[OpenAI]] |page=12}}</ref><ref>{{Cite journal |
== Tasks ==
[[File:Task-guidance.png|thumb|left|300px|Tendency for a task to employ supervised vs. unsupervised methods. Task names intentionally straddle circle boundaries, showing that the classical assignment of imaginative tasks (left) to unsupervised methods is blurred in today's learning schemes.]]Tasks are often categorized as [[Discriminative model|discriminative]] (recognition) or [[Generative model|generative]] (imagination). Often but not always, discriminative tasks use supervised methods and generative tasks use unsupervised methods (see [[Venn diagram]]); however, the separation is very hazy. For example, object recognition favors supervised learning, but unsupervised learning can also cluster objects into groups. Furthermore, as the field progresses, some tasks employ both methods, and some tasks swing from one to the other. For example, image recognition started off as heavily supervised, became hybrid by employing unsupervised pre-training, and then moved towards supervision again with the advent of [[
A typical generative task is as follows. At each step, a datapoint is sampled from the dataset, part of it is removed, and the model must infer the removed part. This is particularly clear for [[Autoencoder|denoising autoencoders]] and [[BERT (language model)|BERT]].
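The sample, remove, and infer loop described above can be illustrated with a minimal sketch. It is not how denoising autoencoders or BERT are implemented: the toy dataset, the linear model, the learning rate, and all variable names are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (an assumption for illustration): 500 points in 4 dimensions
# whose last coordinate is a fixed linear function of the first three.
X = rng.normal(size=(500, 3))
true_w = np.array([1.0, -2.0, 0.5])
data = np.column_stack([X, X @ true_w])

w = np.zeros(3)  # parameters of a linear "infer the removed part" model
lr = 0.05        # learning rate (hypothetical choice)

for step in range(2000):
    point = data[rng.integers(len(data))]   # sample a datapoint
    visible, removed = point[:3], point[3]  # remove the last coordinate
    pred = visible @ w                      # model infers the removed part
    w -= lr * (pred - removed) * visible    # gradient step on squared error

# Mean squared error of the inferred coordinate over the whole dataset.
reconstruction_error = float(np.mean((data[:, :3] @ w - data[:, 3]) ** 2))
```

No labels are used anywhere: the training signal comes entirely from the part of each datapoint that was hidden from the model.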
=== Energy ===
An energy function is a macroscopic measure of a network's activation state. In Boltzmann machines, it plays the role of the cost function. This analogy with physics is inspired by Ludwig Boltzmann's analysis of a gas's macroscopic energy from the microscopic probabilities of particle motion <math>p \propto e^{-E/kT}</math>, where ''k'' is the Boltzmann constant and ''T'' is the temperature. In the [[
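As a minimal sketch of the relation <math>p \propto e^{-E/kT}</math>, the following turns a list of state energies into normalized probabilities. The function name, the example energies, and the choice to pass the product ''kT'' as a single parameter are assumptions made for illustration.

```python
import numpy as np

def boltzmann_probabilities(energies, kT=1.0):
    """Return p_i proportional to exp(-E_i / kT), normalized to sum to 1."""
    energies = np.asarray(energies, dtype=float)
    # Subtracting the minimum energy leaves the ratios unchanged
    # and avoids overflow in exp.
    weights = np.exp(-(energies - energies.min()) / kT)
    return weights / weights.sum()

energies = [0.0, 1.0, 2.0]  # hypothetical state energies
p_cold = boltzmann_probabilities(energies, kT=0.5)
p_hot = boltzmann_probabilities(energies, kT=5.0)
```

Low-energy states are always the most probable, and lowering the temperature concentrates more probability on them, while a high temperature pushes the distribution towards uniform.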
=== Networks ===
|| [[File:Boltzmannexamplev1.png |thumb|Network is separated into 2 layers (hidden vs. visible), but still using symmetric 2-way weights. Following Boltzmann's thermodynamics, individual probabilities give rise to macroscopic energies.]]
|| [[File:Restricted Boltzmann machine.svg|thumb|Restricted Boltzmann Machine. This is a Boltzmann machine where lateral connections within a layer are prohibited to make analysis tractable.]]
|| [[File:Stacked-boltzmann.png|thumb|
|}
|| [[File:Helmholtz Machine.png |thumb|Instead of the bidirectional symmetric connection of the stacked Boltzmann machines, we have separate one-way connections to form a loop. It does both generation and discrimination.]]
|| [[File:Autoencoder_schema.png |thumb|A feedforward network that aims to find a good middle-layer representation of its input world. This network is deterministic, so it is not as robust as its successor, the VAE.]]
|| [[File:VAE blocks.png |thumb|Applies variational inference to the autoencoder. The middle layer is a set of means and variances for Gaussian distributions. The stochastic nature allows for more robust imagination than the deterministic autoencoder.
|}
| 1974 || Ising magnetic model proposed by {{ill|William A. Little (physicist)|lt=WA Little|de|William A. Little}} for cognition
|-
| 1980 || [[Kunihiko Fukushima]] introduces the [[neocognitron]], which is later called a [[convolutional neural network]]. It is mostly used in supervised learning, but deserves a mention here.
|-
| 1982 || Ising variant Hopfield net described as [[Content-
|-
| 1983 || Ising variant Boltzmann machine with probabilistic neurons described by [[Geoffrey Hinton|Hinton]] & [[Terry Sejnowski|Sejnowski]] following Sherrington & Kirkpatrick's 1975 work.
|-
| 1986 || [[Paul Smolensky]] publishes Harmony Theory, which is an RBM with practically the same Boltzmann energy function. Smolensky did not give a practical training scheme; Hinton did so in the mid-2000s.
|-
| 1995 || Schmidhuber introduces the [[
|-
| 1995 || Dayan & Hinton introduce the Helmholtz machine
=== Specific Networks ===
Here, we highlight some characteristics of select networks. The details of each are given in the comparison table below.
{{glossary}}
{{term |1=[[Helmholtz machine]]}}
{{defn |1=This is an early inspiration for the variational autoencoder. Its two networks are combined into one: the forward weights perform recognition and the backward weights implement imagination. It is perhaps the first network to do both. Helmholtz did not work in machine learning, but he inspired the view of a "statistical inference engine whose function is to infer probable causes of sensory input".<ref name=
{{term |1=[[Variational autoencoder]]}}
== Probabilistic methods ==
Two of the main methods used in unsupervised learning are [[principal component analysis]] and [[cluster analysis]].
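Principal component analysis can be sketched in a few lines using the singular value decomposition of the centered data. This is one common way to compute it, not the only one; the function name and the synthetic data below are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Sample variance of the data along each retained direction.
    explained_variance = S[:n_components] ** 2 / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance

rng = np.random.default_rng(0)
# Hypothetical data: points stretched along the direction (1, 1).
t = rng.normal(size=200)
X = np.column_stack([t, t]) + 0.05 * rng.normal(size=(200, 2))
scores, components, variance = pca(X, n_components=1)
```

On this data the first principal direction should align closely with (1, 1), up to sign, since that is the direction of greatest variance.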
A central application of unsupervised learning is in the field of [[density estimation]] in [[statistics]],<ref name="JordanBishop2004" /> though unsupervised learning encompasses many other domains involving summarizing and explaining data features. It can be contrasted with supervised learning as follows: whereas supervised learning intends to infer a [[conditional probability distribution]] conditioned on the label of input data, unsupervised learning intends to infer an [[a priori probability]] distribution.
* [[Data clustering|Clustering]] methods include: [[hierarchical clustering]],<ref name="Hastie" /> [[k-means]],<ref name="tds-kmeans" /> [[mixture models]], [[model-based clustering]], [[DBSCAN]], and [[OPTICS algorithm]]
* [[Anomaly detection]] methods include: [[Local Outlier Factor]] and [[Isolation Forest]]
* Approaches for learning [[latent variable model]]s include the [[Expectation–maximization algorithm]] (EM), the [[Method of moments (statistics)|method of moments]], and [[Blind signal separation]] techniques (
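Of the clustering methods listed above, k-means is the simplest to sketch. The following is a plain implementation of Lloyd's algorithm, alternating between assigning points to the nearest centroid and recomputing centroids. The function name, the random initialization from data points, the iteration cap, and the two synthetic blobs are all assumptions for illustration.

```python
import numpy as np

def k_means(X, k, n_iter=50, seed=0):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of each cluster (keep the old centroid if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs; no labels are given to the algorithm.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(100, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(100, 2))
X = np.vstack([blob_a, blob_b])
labels, centroids = k_means(X, k=2)
```

The algorithm only finds a local optimum, so results on less clearly separated data can depend on the initialization.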
=== Method of moments ===
<ref name="Carpenter" >{{cite journal|author1=Carpenter, G.A.|author2=Grossberg, S.|name-list-style=amp|year=1988|title=The ART of adaptive pattern recognition by a self-organizing neural network|journal=Computer|volume=21|issue=3|pages=77–88|url=http://www.cns.bu.edu/Profiles/Grossberg/CarGro1988Computer.pdf|doi=10.1109/2.33|s2cid=14625094|access-date=2013-09-16|archive-date=2018-05-16|archive-url=https://web.archive.org/web/20180516131553/http://www.cns.bu.edu/Profiles/Grossberg/CarGro1988Computer.pdf|url-status=dead}}</ref>
<ref name="Hinton2010" >{{cite book |last=Hinton |first=G. |date=2012 |chapter=A Practical Guide to Training Restricted Boltzmann Machines |chapter-url=http://www.cs.utoronto.ca/~hinton/absps/guideTR.pdf |publisher=Springer |title=Neural Networks: Tricks of the Trade |series=Lecture Notes in Computer Science |volume=7700 |pages=599–619 |doi=10.1007/978-3-642-35289-8_32 |isbn=978-3-642-35289-8 |access-date=2022-11-03 |archive-date=2022-09-03 |archive-url=https://web.archive.org/web/20220903215809/http://www.cs.utoronto.ca/~hinton/absps/guideTR.pdf |url-status=live }}</ref>
<ref name="HintonMlss2009" >{{cite web
}}