Autoencoder: Difference between revisions

Line 66:
Sparsity may be achieved by adding regularization terms to the [[loss function]] during training (for example, by penalizing the [[Kullback–Leibler divergence]] between the average activation of each hidden unit and a small target value),<ref>{{citation|title=sparse autoencoders|url=https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf}}</ref> or by manually zeroing all but the few strongest hidden unit activations (referred to as a ''k-sparse autoencoder'').<ref>{{citation|title=k-sparse autoencoder|arxiv=1312.5663}}</ref>
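
A minimal sketch of both sparsity mechanisms in Python is shown below. The function names, the NumPy implementation, and the target activation <code>rho</code> are illustrative assumptions, not details taken from the cited papers.

<syntaxhighlight lang="python">
import numpy as np

def kl_sparsity_penalty(h, rho=0.05):
    """KL-divergence sparsity term for sigmoid activations h in [0, 1]:
    penalizes the divergence between each hidden unit's mean activation
    over the batch and the small target value rho."""
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)  # avoid log(0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def k_sparse(h, k):
    """k-sparse step: zero all but the k strongest activations per example."""
    mask = np.zeros_like(h)
    top_k = np.argsort(h, axis=1)[:, -k:]  # indices of the k largest units
    np.put_along_axis(mask, top_k, 1.0, axis=1)
    return h * mask
</syntaxhighlight>

The first term is added to the reconstruction loss during training, while the second is applied in the forward pass itself, so the decoder only ever sees <code>k</code> active units per example.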
 
====Variational autoencoder (VAE)====
 
Variational autoencoder models inherit the autoencoder architecture, but make strong assumptions concerning the distribution of latent variables. They use a [[Variational Bayesian methods|variational approach]] for latent representation learning, which results in an additional loss component and a specific training algorithm called ''Stochastic Gradient Variational Bayes (SGVB)''.<ref name="VAE" /> The model assumes that the data is generated by a directed [[graphical model]] <math>p_{\theta}(\mathbf{x}|\mathbf{z})</math> and that the encoder learns an approximation <math>q_{\phi}(\mathbf{z}|\mathbf{x})</math> to the [[Posterior probability|posterior distribution]] <math>p_{\theta}(\mathbf{z}|\mathbf{x})</math>, where <math>\mathbf{\phi}</math> and <math>\mathbf{\theta}</math> denote the parameters of the encoder (recognition model) and decoder (generative model) respectively. The objective of the variational autoencoder then has the following form:
:<math>\mathcal{L}(\mathbf{\phi},\mathbf{\theta},\mathbf{x})=D_{KL}(q_{\phi}(\mathbf{z}|\mathbf{x})||p_{\theta}(\mathbf{z}))-\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}\big(\log p_{\theta}(\mathbf{x}|\mathbf{z})\big)</math>
Here, <math>D_{KL}</math> stands for the [[Kullback–Leibler divergence]]. The prior over the latent variables is usually set to be the centred isotropic multivariate Gaussian <math>p_{\theta}(\mathbf{z})=\mathcal{N}(\mathbf{0},\mathbf{I})</math>; however, alternative configurations have also recently been considered.<ref>Harris Partaourides and Sotirios P. Chatzis, "Asymmetric Deep Generative Models," Neurocomputing, vol. 241, pp. 90–96, June 2017. [http://www.sciencedirect.com/science/article/pii/S0925231217302989]</ref>
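
As a concrete illustration, the following is a minimal sketch in PyTorch of a VAE trained with SGVB under the standard Gaussian prior <math>\mathcal{N}(\mathbf{0},\mathbf{I})</math>. The layer sizes and the Bernoulli (binary cross-entropy) decoder likelihood are illustrative assumptions rather than part of the objective itself.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):  # sizes are illustrative
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)      # mean of q_phi(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)  # log-variance of q_phi(z|x)
        self.dec1 = nn.Linear(z_dim, h_dim)
        self.dec2 = nn.Linear(h_dim, x_dim)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # which keeps the sample differentiable with respect to phi (SGVB).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_hat = torch.sigmoid(self.dec2(F.relu(self.dec1(z))))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    """Negative ELBO: -E_q[log p_theta(x|z)] as Bernoulli cross-entropy,
    plus D_KL(q_phi(z|x) || N(0, I)) in closed form."""
    rec = F.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
</syntaxhighlight>

Minimizing <code>vae_loss</code> on mini-batches with stochastic gradient descent is the SGVB procedure: because the Gaussian prior and posterior admit a closed-form KL term, only the reconstruction expectation has to be estimated by sampling.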
 
==== Contractive autoencoder (CAE) ====