 
===Relationship with principal component analysis (PCA)===
If linear activations are used, or the network has only a single sigmoid hidden layer, then the optimal solution to an autoencoder is strongly related to [[principal component analysis]] (PCA).<ref>{{Cite journal| last1 = Bourlard | first1 = H. | last2 = Kamp | first2 = Y. | doi = 10.1007/BF00332918 | title = Auto-association by multilayer perceptrons and singular value decomposition | journal = Biological Cybernetics | volume = 59 | issue = 4–5 | pages = 291–294 | year = 1988 | pmid = 3196773| pmc = }}</ref><ref>{{cite book|doi=10.1145/2649387.2649442|chapter=Deep autoencoder neural networks for gene ontology annotation predictions|title=Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '14|pages=533|year=2014|last1=Chicco|first1=Davide|last2=Sadowski|first2=Peter|last3=Baldi|first3=Pierre|isbn=9781450328944}}</ref> The weights of an autoencoder with a single hidden layer of size <math>p</math> (where <math>p</math> is less than the size of the input) span the same vector subspace as the one spanned by the first <math>p</math> principal components, and the output of the autoencoder is an orthogonal projection onto this subspace. The autoencoder weights are not equal to the principal components, and are generally not orthogonal; nevertheless, the principal components may be recovered from them using the [[singular value decomposition]].<ref>{{cite arxiv|last1=Plaut|first1=E|title=From Principal Subspaces to Principal Components with Linear Autoencoders|eprint=1804.10253|date=2018|class=stat.ML}}</ref>
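A minimal numerical sketch of this relationship is shown below (the NumPy training loop, the variable names such as <code>W1</code> and <code>W2</code>, and the hyper-parameters are illustrative assumptions, not taken from the cited sources): it trains a linear autoencoder by gradient descent on synthetic centred data and checks that an orthonormal basis of the decoder's column space spans the same subspace as the first <math>p</math> principal components.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch (assumed setup, not from the cited papers): a linear
# autoencoder with hidden size p trained on centred data should span the
# same subspace as the first p principal components.
rng = np.random.default_rng(0)
n, d, p = 500, 10, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, d)) + 0.01 * rng.normal(size=(n, d))
X -= X.mean(axis=0)                   # centre the data, as PCA assumes

# Linear autoencoder x -> W2 (W1 x), trained by plain gradient descent on
# the mean squared reconstruction error (hyper-parameters are arbitrary).
W1 = 0.01 * rng.normal(size=(p, d))   # encoder weights
W2 = 0.01 * rng.normal(size=(d, p))   # decoder weights
lr = 5e-3
for _ in range(5000):
    H = X @ W1.T                      # hidden codes, shape (n, p)
    R = H @ W2.T - X                  # reconstruction residual, shape (n, d)
    gW2 = R.T @ H / n                 # gradient w.r.t. decoder weights
    gW1 = W2.T @ R.T @ X / n          # gradient w.r.t. encoder weights
    W1 -= lr * gW1
    W2 -= lr * gW2

# First p principal directions from the SVD of the centred data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_p = Vt[:p].T                        # shape (d, p)

# Orthonormalise the decoder columns (QR gives an orthonormal basis of their span).
Q, _ = np.linalg.qr(W2)

# The orthogonal projectors onto the two subspaces should (nearly) coincide,
# even though the columns of W2 differ from the principal directions themselves.
print("max projector difference:", np.abs(Q @ Q.T - V_p @ V_p.T).max())
</syntaxhighlight>

Because both projectors map onto the same <math>p</math>-dimensional subspace, their difference should become small once training converges; the individual decoder columns, however, remain neither orthogonal nor equal to the principal directions, as stated above.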
 
==Training==