Talk:Variational autoencoder: Difference between revisions

Revision as of 19:30, 26 December 2024 by 46.199.5.20 (talk) (→Rating this article C-class)
Latest revision as of 18:57, 27 December 2024 by 82.102.110.228 (talk) (→The image shows just a normal autoencoder, not a variational autoencoder: Reply)
(6 intermediate revisions by one other user not shown)
Line 57:

:I also found this incredibly confusing, as the prior on z is usually fixed and doesn't depend on any parameter. [[User:EitanPorat|EitanPorat]] ([[User talk:EitanPorat|talk]]) 00:16, 19 March 2023 (UTC)

::I see the confusion. p(z) is a probability distribution, but sometimes the same notation is combined with a parameter set to indicate that it is actually a parameterized function! The article should be cleaned up: the encoder should be called q_phi everywhere and the decoder should be called p_theta. The reason is that, to optimize the encoder, you take the derivative of the free energy with respect to the encoder parameters phi, and those gradients update only the encoder parameters. The KL divergence term contributes gradients for the encoder only, but the encoder also receives gradients from the reconstruction term of the decoder (parameterized by theta), passed back through the sampled z! [[Special:Contributions/46.199.5.20|46.199.5.20]] ([[User talk:46.199.5.20|talk]]) 19:47, 26 December 2024 (UTC)

== The image shows just a normal autoencoder, not a variational autoencoder ==

Line 65 ⟶ 66:

I'm not sure whether that image should just be removed, or whether it makes sense in the section anyway. [[User:Volker Siegel|Volker Siegel]] ([[User talk:Volker Siegel|talk]]) 14:18, 24 January 2022 (UTC)

:Just to make this point clear: the reparameterization trick is for the gradients! The trick moves the source of randomness into a separate node of the DAG (directed acyclic graph) that does not have any parameters, so that we can propagate gradients through the rest of the DAG, which is now a deterministic function. [[Special:Contributions/82.102.110.228|82.102.110.228]] ([[User talk:82.102.110.228|talk]]) 18:57, 27 December 2024 (UTC)

== This is a highly technical topic ==

Line 73 ⟶ 76:

The architecture section is filled with unclear phrases and undefined terms. For example, "noise distribution", "q-distributions or variational posteriors", "p-distributions", "amortized approach", "which is usually intractable" (what is intractable?), "free energy expression". None of these are defined. It is unclear whether this section of the article is useful to anyone who is not already familiar with how variational autoencoders work. [[User:Joshuame13|Joshuame13]] ([[User talk:Joshuame13|talk]]) 15:14, 31 January 2023 (UTC)

:I've fixed most of those. The free energy really needs its own section. It is a lower bound that is obtained by applying Jensen's inequality to the log-likelihood. However, I don't think that Jensen's inequality is within the scope of this article. [[Special:Contributions/46.199.5.20|46.199.5.20]] ([[User talk:46.199.5.20|talk]]) 19:50, 26 December 2024 (UTC)

== The ELBO section needs more derivation ==

Line 81 ⟶ 86:

:I agree p_theta(z) doesn't make sense. [[User:EitanPorat|EitanPorat]] ([[User talk:EitanPorat|talk]]) 00:17, 19 March 2023 (UTC)

::Agreed. It should be p_phi(z) or even better q_phi(z). [[Special:Contributions/46.199.5.20|46.199.5.20]] ([[User talk:46.199.5.20|talk]]) 20:22, 26 December 2024 (UTC)

== Rating this article C-class ==

Line 91 ⟶ 97:

I hope that this clears things up. We have four variables: the mean and standard deviation of the encoder, and the mean and standard deviation of the decoder. These variables can be multidimensional for the multivariate Gaussian, but they are still four variables. Here are some equations to help you understand:

 z  = mu(x) + sigma(x) * epsilon      # reparameterization trick
 x' = MU(z) + SIGMA(z) * epsilon'

And here is the legend:

* x: input
* z: sample from the latent distribution, i.e. a sample from the encoder: the output of mu(x) plus the output of sigma(x) scaled by random noise
* mu, sigma: encoder neural networks
* MU, SIGMA: decoder neural networks
* epsilon, epsilon': independent draws of standard normal noise
* x': output

At the end of the day, people have to juggle the interaction of two probability distributions. I doubt that it can be simplified enough for a general audience at this time. [[Special:Contributions/46.199.5.20|46.199.5.20]] ([[User talk:46.199.5.20|talk]]) 19:34, 26 December 2024 (UTC)
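
For readers who find code easier to follow than notation, the two sampling equations in the comment above, and the point made earlier in this thread that the reparameterization trick exists for the gradients, can be sketched in plain Python. This is only an illustration under stated assumptions: the "networks" `enc_mu`, `enc_sigma`, `dec_mu` and `dec_sigma` are hypothetical one-parameter toy functions standing in for real neural networks, everything is scalar, and the two noise draws are assumed to be independent standard normals.

```python
import math
import random

# Hypothetical toy stand-ins for the four networks (assumptions, not a real VAE).
def enc_mu(x, w=0.5):
    """Encoder mean mu(x); w plays the role of an encoder parameter."""
    return w * x

def enc_sigma(x, s=0.1):
    """Encoder standard deviation sigma(x); softplus keeps it positive."""
    return math.log1p(math.exp(s * x))

def dec_mu(z, v=2.0):
    """Decoder mean MU(z); v plays the role of a decoder parameter."""
    return v * z

def dec_sigma(z):
    """Decoder standard deviation SIGMA(z); held constant here for simplicity."""
    return 0.05

def forward(x, rng):
    """Sample z and x' using the two equations from the comment above."""
    eps_z = rng.gauss(0.0, 1.0)  # noise node for the latent sample (no parameters)
    eps_x = rng.gauss(0.0, 1.0)  # independent noise for the output sample
    z = enc_mu(x) + enc_sigma(x) * eps_z      # reparameterization trick
    x_out = dec_mu(z) + dec_sigma(z) * eps_x
    return z, x_out, eps_z

def dz_dw(x, eps_z, w=0.5, h=1e-6):
    """Central finite difference of z with respect to w, with the noise held fixed.

    Once eps_z is fixed, z is a deterministic function of w, so the derivative
    is well defined; for this toy encoder it equals x analytically.
    """
    z_plus = enc_mu(x, w + h) + enc_sigma(x) * eps_z
    z_minus = enc_mu(x, w - h) + enc_sigma(x) * eps_z
    return (z_plus - z_minus) / (2 * h)

rng = random.Random(0)
z, x_out, eps_z = forward(1.5, rng)
grad = dz_dw(1.5, eps_z)  # approximately 1.5, i.e. dz/dw = x
```

The design point is that all randomness enters through `eps_z`, which has no parameters, so the derivative of `z` with respect to the encoder parameter can be taken as for any deterministic function. A real implementation would use automatic differentiation rather than finite differences.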