Flow-based generative model

: <math>x = F(z_0) = z_T = z_0 + \int_0^T f(z_t, t) dt</math>
 
where <math>f</math> is an arbitrary function and can be modeled with e.g. neural networks.
 
The inverse function is then naturally:<ref name="ffjord" />
: <math>z_0 = F^{-1}(x) = x + \int_T^0 f(z_t, t) dt = x - \int_0^T f(z_t, t) dt</math>

The log-likelihood of <math>x</math> can then be found as:
 
: <math>\log(p(x)) = \log(p(z_0)) - \int_0^T \text{Tr}\left[\frac{\partial f}{\partial z_t}\right] dt</math>
 
Since the trace depends only on the diagonal of the Jacobian <math>\partial_{z_t} f</math>, this allows a "free-form" Jacobian.<ref>{{Cite journal |last=Grathwohl |first=Will |last2=Chen |first2=Ricky T. Q. |last3=Bettencourt |first3=Jesse |last4=Sutskever |first4=Ilya |last5=Duvenaud |first5=David |date=2018-10-22 |title=FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models |url=http://arxiv.org/abs/1810.01367 |journal=arXiv:1810.01367 [cs, stat]}}</ref> Here, "free-form" means that no restriction is placed on the Jacobian's form. This contrasts with earlier discrete normalizing flow models, where the Jacobian is carefully designed to be upper- or lower-triangular, so that its determinant can be evaluated efficiently.
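As a concrete illustration of the log-likelihood formula above, the following is a minimal numerical sketch (the names, step count, and the choice of a linear vector field <math>f(z, t) = Az</math> are illustrative, not taken from any of the cited papers). For a linear field the Jacobian trace is the constant <math>\operatorname{Tr}(A)</math>, so the result can be checked against <math>\log p(z_0) - T\operatorname{Tr}(A)</math>.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative linear vector field f(z, t) = A z; its Jacobian is A for every (z, t).
A = np.array([[0.3, 0.1],
              [-0.2, 0.5]])

def f(z, t):
    return A @ z

def jacobian_trace(z, t):
    # For a general neural-network f this would be Tr(df/dz) at (z, t),
    # obtained e.g. by automatic differentiation.
    return np.trace(A)

def forward_with_log_density(z0, log_p_z0, T=1.0, n_steps=1000):
    """Euler-integrate dz/dt = f(z, t) and d(log p)/dt = -Tr(df/dz) from t = 0 to T."""
    dt = T / n_steps
    z, log_p = z0.copy(), log_p_z0
    for i in range(n_steps):
        t = i * dt
        log_p -= jacobian_trace(z, t) * dt
        z = z + f(z, t) * dt
    return z, log_p

z0 = np.array([1.0, -1.0])
log_p_z0 = -2.0                                  # placeholder value of log p(z_0)
x, log_p_x = forward_with_log_density(z0, log_p_z0)
print(x, log_p_x, log_p_z0 - 1.0 * np.trace(A))  # the last two values agree
</syntaxhighlight>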
 
The trace can be estimated by "Hutchinson's trick":<ref name="finlay2020">{{Cite journal |last=Finlay |first=Chris |last2=Jacobsen |first2=Joern-Henrik |last3=Nurbekyan |first3=Levon |last4=Oberman |first4=Adam |date=2020-11-21 |title=How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization |url=https://proceedings.mlr.press/v119/finlay20a.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |pages=3154–3164}}</ref><ref>{{Cite journal |last=Hutchinson |first=M.F. |date=January 1989 |title=A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines |url=http://www.tandfonline.com/doi/abs/10.1080/03610918908812806 |journal=Communications in Statistics - Simulation and Computation |language=en |volume=18 |issue=3 |pages=1059–1076 |doi=10.1080/03610918908812806 |issn=0361-0918}}</ref><blockquote>Given any matrix <math>W\in \R^{n\times n}</math>, and any random <math>u\in \R^n</math> with <math>E[uu^T] = I</math>, we have <math>E[u^T W u] = \operatorname{tr}(W)</math>. (Proof: <math>E[u^T W u] = E[\operatorname{tr}(W u u^T)] = \operatorname{tr}(W E[u u^T]) = \operatorname{tr}(W)</math>.)</blockquote>Usually, the random vector is sampled from <math>N(0, I)</math> (the standard normal distribution) or from <math>\{\pm 1\}^n</math> with independent uniform entries (the [[Rademacher distribution]]).
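A short self-contained demonstration of the estimator (matrix size and probe count are arbitrary): probe vectors with independent <math>\pm 1</math> entries give an unbiased estimate of the trace, which can be compared with the exact value.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 50
W = rng.normal(size=(n, n))          # any square matrix

def hutchinson_trace(W, num_probes=10000, kind="rademacher"):
    """Unbiased trace estimate: E[u^T W u] = tr(W) whenever E[u u^T] = I."""
    n = W.shape[0]
    if kind == "rademacher":
        u = rng.choice([-1.0, 1.0], size=(num_probes, n))   # entries +/- 1
    else:
        u = rng.normal(size=(num_probes, n))                 # standard normal entries
    return np.mean(np.einsum("pi,ij,pj->p", u, W, u))        # average of u^T W u

print(np.trace(W))                            # exact trace
print(hutchinson_trace(W, kind="rademacher"))
print(hutchinson_trace(W, kind="normal"))
</syntaxhighlight>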
 
When <math>f</math> is implemented as a neural network, [[neural ODE]] methods<ref>{{cite arXiv | eprint=1806.07366| last1=Chen| first1=Ricky T. Q.| last2=Rubanova| first2=Yulia| last3=Bettencourt| first3=Jesse| last4=Duvenaud| first4=David| title=Neural Ordinary Differential Equations| year=2018| class=cs.LG}}</ref> are needed to compute the above integrals in practice. Indeed, CNF was first proposed in the same paper that proposed neural ODE.
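In practice the integrals above are evaluated with an ODE solver. The sketch below uses SciPy's generic adaptive solver as a stand-in for the solvers used in the neural ODE literature, again with the illustrative linear field <math>f(z, t) = Az</math> from the earlier sketch; the state is augmented with the running change in log-density so that both integrals are computed in one solver call.

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.3, 0.1],
              [-0.2, 0.5]])

def augmented_dynamics(t, state):
    """First two entries: z_t.  Last entry: accumulated change in log-density."""
    z = state[:2]
    dz = A @ z                     # dz/dt = f(z, t)
    dlogp = -np.trace(A)           # d(log p)/dt = -Tr(df/dz)
    return np.concatenate([dz, [dlogp]])

z0 = np.array([1.0, -1.0])
sol = solve_ivp(augmented_dynamics, (0.0, 1.0),
                np.concatenate([z0, [0.0]]), rtol=1e-8, atol=1e-8)
x, delta_logp = sol.y[:2, -1], sol.y[2, -1]
print(x, delta_logp)               # delta_logp is approximately -Tr(A) for T = 1
</syntaxhighlight>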
 
There are two main deficiencies of CNF. One is that a continuous flow must be a [[homeomorphism]], thus preserving orientation and [[ambient isotopy]] (for example, it is impossible to flip a left hand into a right hand by a continuous deformation of space, and it is impossible to [[Sphere eversion|turn a sphere inside out]] or to undo a knot). The other is that the learned flow <math>f</math> might be ill-behaved, due to degeneracy: there are an infinite number of possible <math>f</math> that all solve the same problem.
 
By adding extra dimensions, the CNF gains enough freedom to reverse orientation and go beyond ambient isotopy (just as one can pick up a polygon from a desk and flip it over in 3-space, or unknot a knot in 4-space), yielding the "augmented neural ODE".<ref>{{Cite journal |last=Dupont |first=Emilien |last2=Doucet |first2=Arnaud |last3=Teh |first3=Yee Whye |date=2019 |title=Augmented Neural ODEs |url=https://proceedings.neurips.cc/paper/2019/hash/21be9a4bd4f81549a9d1d241981cec3c-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=32}}</ref>
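A minimal numerical illustration of this point (the matrices and dimensions are chosen purely for illustration and are not taken from the cited paper): a reflection of the plane, which no two-dimensional flow can realize, becomes an ordinary rotation once the state is zero-padded into three dimensions.

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import solve_ivp

# Generator of rotations about the first axis in the augmented 3D space:
# dz1/dt = 0, dz2/dt = -z3, dz3/dt = z2.
G = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def f_aug(t, z):
    return G @ z

z0 = np.array([1.0, 2.0])                 # original 2D point
z0_aug = np.concatenate([z0, [0.0]])      # zero-pad into R^3
sol = solve_ivp(f_aug, (0.0, np.pi), z0_aug, rtol=1e-9, atol=1e-9)
print(sol.y[:2, -1])                      # approximately [1.0, -2.0]: the 2D point is reflected
</syntaxhighlight>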
 
To regularize the flow <math>f</math>, one can impose regularization losses on <math>\nabla_z f(z, t)</math>. Finlay et al.<ref name="finlay2020" /> proposed a regularization loss based on [[Optimal transport|optimal transport theory]], which penalizes both the kinetic energy of the flow, <math>\int_0^T \|f(z_t, t)\|^2 dt</math>, and the Frobenius norm of its Jacobian, <math>\int_0^T \|\nabla_z f(z_t, t)\|_F^2 dt</math>.
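The sketch below shows the general shape of such a penalty, under the assumption that it combines a kinetic-energy term with a Jacobian Frobenius-norm term as stated above; the weights, the Euler discretization, and the reuse of the illustrative linear field from the earlier sketches are assumptions, and the stochastic estimator used in the paper for the Frobenius norm is not shown.

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[0.3, 0.1],
              [-0.2, 0.5]])

def f(z, t):
    return A @ z

def jacobian(z, t):
    return A       # for a neural-network f, obtained by automatic differentiation

def transport_regularizer(z0, T=1.0, n_steps=200, lam_K=0.1, lam_J=0.1):
    """Accumulate kinetic-energy and Jacobian-norm penalties along the flow."""
    dt = T / n_steps
    z, penalty = z0.copy(), 0.0
    for i in range(n_steps):
        t = i * dt
        v = f(z, t)
        J = jacobian(z, t)
        penalty += (lam_K * np.dot(v, v) + lam_J * np.sum(J * J)) * dt
        z = z + v * dt
    return penalty

print(transport_regularizer(np.array([1.0, -1.0])))
</syntaxhighlight>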
 
== Applications ==