Flow-based generative model

: <math>x = F(z_0) = z_T = z_0 + \int_0^T f(z_t, t) dt</math>
 
where <math>f</math> is an arbitrary function and can be modeled with e.g. neural networks.
 
The inverse function is then naturally:<ref name="ffjord" />
: <math>z_0 = F^{-1}(x) = x + \int_T^0 f(z_t, t) dt = x - \int_0^T f(z_t, t) dt</math>

The log-likelihood of <math>x</math> can then be found as:
 
: <math>\log(p(x)) = \log(p(z_0)) - \int_0^T \text{Tr}\left[\frac{\partial f}{\partial z_t}\right] dt</math>
 
Since the trace depends only on the diagonal of the Jacobian <math>\partial_{z_t} f</math>, this allows a "free-form" Jacobian.<ref>{{Cite journal |last=Grathwohl |first=Will |last2=Chen |first2=Ricky T. Q. |last3=Bettencourt |first3=Jesse |last4=Sutskever |first4=Ilya |last5=Duvenaud |first5=David |date=2018-10-22 |title=FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models |url=http://arxiv.org/abs/1810.01367 |journal=arXiv:1810.01367 [cs, stat]}}</ref> Here, "free-form" means that no restriction is placed on the Jacobian's form. This contrasts with earlier discrete normalizing flow models, where the Jacobian is carefully designed to be upper- or lower-triangular, so that its determinant can be evaluated efficiently.
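As a concrete illustration of the log-likelihood formula above, the following is a minimal numerical sketch (the names, step count, and the choice of a linear vector field <math>f(z, t) = Az</math> are illustrative, not taken from any of the cited papers). For a linear field the Jacobian trace is the constant <math>\operatorname{Tr}(A)</math>, so the result can be checked against <math>\log p(z_0) - T\operatorname{Tr}(A)</math>.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative linear vector field f(z, t) = A z; its Jacobian is A for every (z, t).
A = np.array([[0.3, 0.1],
              [-0.2, 0.5]])

def f(z, t):
    return A @ z

def jacobian_trace(z, t):
    # For a general neural-network f this would be Tr(df/dz) at (z, t),
    # obtained e.g. by automatic differentiation.
    return np.trace(A)

def forward_with_log_density(z0, log_p_z0, T=1.0, n_steps=1000):
    """Euler-integrate dz/dt = f(z, t) and d(log p)/dt = -Tr(df/dz) from t = 0 to T."""
    dt = T / n_steps
    z, log_p = z0.copy(), log_p_z0
    for i in range(n_steps):
        t = i * dt
        log_p -= jacobian_trace(z, t) * dt
        z = z + f(z, t) * dt
    return z, log_p

z0 = np.array([1.0, -1.0])
log_p_z0 = -2.0                                  # placeholder value of log p(z_0)
x, log_p_x = forward_with_log_density(z0, log_p_z0)
print(x, log_p_x, log_p_z0 - 1.0 * np.trace(A))  # the last two values agree
</syntaxhighlight>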
 
The trace can be estimated by "Hutchinson's trick":<ref name="finlay2020">{{Cite journal |last=Finlay |first=Chris |last2=Jacobsen |first2=Joern-Henrik |last3=Nurbekyan |first3=Levon |last4=Oberman |first4=Adam |date=2020-11-21 |title=How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization |url=https://proceedings.mlr.press/v119/finlay20a.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |pages=3154–3164}}</ref><ref>{{Cite journal |last=Hutchinson |first=M.F. |date=January 1989 |title=A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines |url=http://www.tandfonline.com/doi/abs/10.1080/03610918908812806 |journal=Communications in Statistics - Simulation and Computation |language=en |volume=18 |issue=3 |pages=1059–1076 |doi=10.1080/03610918908812806 |issn=0361-0918}}</ref><blockquote>Given any matrix <math>W\in \R^{n\times n}</math>, and any random <math>u\in \R^n</math> with <math>E[uu^T] = I</math>, we have <math>E[u^T W u] = \operatorname{tr}(W)</math>. (Proof: <math>E[u^T W u] = E[\operatorname{tr}(W u u^T)] = \operatorname{tr}(W E[u u^T]) = \operatorname{tr}(W)</math>.)</blockquote>Usually, the random vector is sampled from <math>N(0, I)</math> (the standard normal distribution) or from <math>\{\pm 1\}^n</math> with independent uniform entries (the [[Rademacher distribution]]).
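A short self-contained demonstration of the estimator (matrix size and probe count are arbitrary): probe vectors with independent <math>\pm 1</math> entries give an unbiased estimate of the trace, which can be compared with the exact value.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 50
W = rng.normal(size=(n, n))          # any square matrix

def hutchinson_trace(W, num_probes=10000, kind="rademacher"):
    """Unbiased trace estimate: E[u^T W u] = tr(W) whenever E[u u^T] = I."""
    n = W.shape[0]
    if kind == "rademacher":
        u = rng.choice([-1.0, 1.0], size=(num_probes, n))   # entries +/- 1
    else:
        u = rng.normal(size=(num_probes, n))                 # standard normal entries
    return np.mean(np.einsum("pi,ij,pj->p", u, W, u))        # average of u^T W u

print(np.trace(W))                            # exact trace
print(hutchinson_trace(W, kind="rademacher"))
print(hutchinson_trace(W, kind="normal"))
</syntaxhighlight>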
 
When <math>f</math> is implemented as a neural network, [[neural ODE]] methods<ref>{{cite arXiv | eprint=1806.07366| last1=Chen| first1=Ricky T. Q.| last2=Rubanova| first2=Yulia| last3=Bettencourt| first3=Jesse| last4=Duvenaud| first4=David| title=Neural Ordinary Differential Equations| year=2018| class=cs.LG}}</ref> are needed to compute the above integrals in practice. Indeed, CNF was first proposed in the same paper that proposed neural ODE.
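In practice the integrals above are evaluated with an ODE solver. The sketch below uses SciPy's generic adaptive solver as a stand-in for the solvers used in the neural ODE literature, again with the illustrative linear field <math>f(z, t) = Az</math> from the earlier sketch; the state is augmented with the running change in log-density so that both integrals are computed in one solver call.

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.3, 0.1],
              [-0.2, 0.5]])

def augmented_dynamics(t, state):
    """First two entries: z_t.  Last entry: accumulated change in log-density."""
    z = state[:2]
    dz = A @ z                     # dz/dt = f(z, t)
    dlogp = -np.trace(A)           # d(log p)/dt = -Tr(df/dz)
    return np.concatenate([dz, [dlogp]])

z0 = np.array([1.0, -1.0])
sol = solve_ivp(augmented_dynamics, (0.0, 1.0),
                np.concatenate([z0, [0.0]]), rtol=1e-8, atol=1e-8)
x, delta_logp = sol.y[:2, -1], sol.y[2, -1]
print(x, delta_logp)               # delta_logp is approximately -Tr(A) for T = 1
</syntaxhighlight>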
 
There are two main deficiencies of CNF. One is that a continuous flow must be a [[homeomorphism]], thus preserving orientation and [[ambient isotopy]] (for example, it is impossible to flip a left hand into a right hand by a continuous deformation of space, and it is impossible to [[Sphere eversion|turn a sphere inside out]] or to undo a knot). The other is that the learned flow <math>f</math> might be ill-behaved, due to degeneracy: there are an infinite number of possible <math>f</math> that all solve the same problem.
 
By adding extra dimensions, the CNF gains enough freedom to reverse orientation and go beyond ambient isotopy (just as one can pick up a polygon from a desk and flip it over in 3-space, or unknot a knot in 4-space), yielding the "augmented neural ODE".<ref>{{Cite journal |last=Dupont |first=Emilien |last2=Doucet |first2=Arnaud |last3=Teh |first3=Yee Whye |date=2019 |title=Augmented Neural ODEs |url=https://proceedings.neurips.cc/paper/2019/hash/21be9a4bd4f81549a9d1d241981cec3c-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=32}}</ref>
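A minimal numerical illustration of this point (the matrices and dimensions are chosen purely for illustration and are not taken from the cited paper): a reflection of the plane, which no two-dimensional flow can realize, becomes an ordinary rotation once the state is zero-padded into three dimensions.

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import solve_ivp

# Generator of rotations about the first axis in the augmented 3D space:
# dz1/dt = 0, dz2/dt = -z3, dz3/dt = z2.
G = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def f_aug(t, z):
    return G @ z

z0 = np.array([1.0, 2.0])                 # original 2D point
z0_aug = np.concatenate([z0, [0.0]])      # zero-pad into R^3
sol = solve_ivp(f_aug, (0.0, np.pi), z0_aug, rtol=1e-9, atol=1e-9)
print(sol.y[:2, -1])                      # approximately [1.0, -2.0]: the 2D point is reflected
</syntaxhighlight>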
 
To regularize the flow <math>f</math>, one can impose regularization losses on <math>\nabla_z f(z, t)</math>. Finlay et al.<ref name="finlay2020" /> proposed a regularization loss based on [[Optimal transport|optimal transport theory]], which penalizes both the kinetic energy of the flow, <math>\int_0^T \|f(z_t, t)\|^2 dt</math>, and the Frobenius norm of its Jacobian, <math>\int_0^T \|\nabla_z f(z_t, t)\|_F^2 dt</math>.
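The sketch below shows the general shape of such a penalty, under the assumption that it combines a kinetic-energy term with a Jacobian Frobenius-norm term as stated above; the weights, the Euler discretization, and the reuse of the illustrative linear field from the earlier sketches are assumptions, and the stochastic estimator used in the paper for the Frobenius norm is not shown.

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[0.3, 0.1],
              [-0.2, 0.5]])

def f(z, t):
    return A @ z

def jacobian(z, t):
    return A       # for a neural-network f, obtained by automatic differentiation

def transport_regularizer(z0, T=1.0, n_steps=200, lam_K=0.1, lam_J=0.1):
    """Accumulate kinetic-energy and Jacobian-norm penalties along the flow."""
    dt = T / n_steps
    z, penalty = z0.copy(), 0.0
    for i in range(n_steps):
        t = i * dt
        v = f(z, t)
        J = jacobian(z, t)
        penalty += (lam_K * np.dot(v, v) + lam_J * np.sum(J * J)) * dt
        z = z + v * dt
    return penalty

print(transport_regularizer(np.array([1.0, -1.0])))
</syntaxhighlight>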
 
== Applications ==