Flow-based generative model

A flow-based generative model is a generative model used in machine learning that explicitly models a probability distribution by leveraging normalizing flow,[1] which is a statistical method using the change-of-variable law of probabilities to transform a simple distribution into a complex one.

The direct modeling of likelihood provides many advantages. For example, the negative log-likelihood can be directly computed and minimized as the loss function. Additionally, novel samples can be generated by sampling from the initial distribution, and applying the flow transformation.

In contrast, many alternative generative modeling methods such as variational autoencoder (VAE) and generative adversarial network do not explicitly represent the likelihood function.

Method

 
[Figure: Scheme for normalizing flows]

Let $z_0$ be a (possibly multivariate) random variable with distribution $p_0(z_0)$.

For $i = 1, \ldots, K$, let $z_i = f_i(z_{i-1})$ be a sequence of random variables transformed from $z_0$. The functions $f_1, \ldots, f_K$ should be invertible, i.e. the inverse function $f_i^{-1}$ exists. The final output $z_K$ models the target distribution.

The log likelihood of $z_K$ is (see derivation):

$$\log p_K(z_K) = \log p_0(z_0) - \sum_{i=1}^{K} \log \left| \det \frac{d f_i(z_{i-1})}{d z_{i-1}} \right|$$

To efficiently compute the log likelihood, the functions $f_1, \ldots, f_K$ should be (1) easy to invert and (2) have Jacobians whose determinants are easy to compute. In practice, the functions $f_i$ are modeled using deep neural networks, and are trained to minimize the negative log-likelihood of data samples from the target distribution. These architectures are usually designed such that only the forward pass of the neural network is required in both the inverse and the Jacobian determinant calculations. Examples of such architectures include NICE,[2] RealNVP,[3] and Glow.[4]
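The coupling layers used by NICE and RealNVP illustrate how both requirements can be met at once: one half of the input passes through unchanged, while the other half is scaled and shifted element-wise by values computed from the first half, so the Jacobian is triangular and its log-determinant is a simple sum. The following is a minimal NumPy sketch of such an affine coupling layer; the scale and translation "networks" are stand-in linear maps chosen here for illustration rather than the deep networks used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "networks": in real models s(.) and t(.) are deep neural networks.
W_s, b_s = 0.1 * rng.normal(size=(2, 2)), np.zeros(2)
W_t, b_t = 0.1 * rng.normal(size=(2, 2)), np.zeros(2)

def s(x1):  # scale network, acting only on the first half of the input
    return np.tanh(x1 @ W_s + b_s)

def t(x1):  # translation network
    return x1 @ W_t + b_t

def coupling_forward(x):
    """y1 = x1;  y2 = x2 * exp(s(x1)) + t(x1).  Returns y and log|det J|."""
    x1, x2 = x[:2], x[2:]
    log_scale = s(x1)
    y2 = x2 * np.exp(log_scale) + t(x1)
    # The Jacobian is triangular, so its log-determinant is just sum(s(x1)).
    return np.concatenate([x1, y2]), log_scale.sum()

def coupling_inverse(y):
    """Inversion never requires inverting s or t: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    y1, y2 = y[:2], y[2:]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.concatenate([y1, x2])

x = rng.normal(size=4)
y, logdet = coupling_forward(x)
assert np.allclose(coupling_inverse(y), x)   # exact inversion using only forward passes of s and t
```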

Derivation of log likelihood

Consider random variables $z_1$ and $z_2$ with $z_2 = f_2(z_1)$. Note that $z_1 = f_2^{-1}(z_2)$.

By the change of variable formula, the distribution of $z_2$ is:

$$p_2(z_2) = p_1(z_1) \left| \det \frac{d f_2^{-1}(z_2)}{d z_2} \right|$$

where $\det \frac{d f_2^{-1}(z_2)}{d z_2}$ is the determinant of the Jacobian matrix of $f_2^{-1}$.

By the inverse function theorem:

$$p_2(z_2) = p_1(z_1) \left| \det \left( \frac{d f_2(z_1)}{d z_1} \right)^{-1} \right|$$

By the identity $\det(A^{-1}) = \det(A)^{-1}$ (where $A$ is an invertible matrix), we have:

$$p_2(z_2) = p_1(z_1) \left| \det \frac{d f_2(z_1)}{d z_1} \right|^{-1}$$

The log likelihood is thus:

$$\log p_2(z_2) = \log p_1(z_1) - \log \left| \det \frac{d f_2(z_1)}{d z_1} \right|$$

In general, the above applies to any pair $z_i$ and $z_{i-1}$. Since $\log p_i(z_i)$ equals $\log p_{i-1}(z_{i-1})$ minus a non-recursive term, we can infer by induction that:

$$\log p_K(z_K) = \log p_0(z_0) - \sum_{i=1}^{K} \log \left| \det \frac{d f_i(z_{i-1})}{d z_{i-1}} \right|$$
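As a concrete check of this identity, the following snippet evaluates the change-of-variables log-likelihood for a single one-dimensional transform $f(z) = e^z$ applied to a standard-normal base variable and compares it with the closed-form log-normal density; the particular transform and base distribution are chosen here purely for illustration.

```python
import numpy as np

def log_p0(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

# One invertible transform f(z) = exp(z); its 1x1 Jacobian is exp(z).
z0 = 0.7
z1 = np.exp(z0)

# Change of variables: log p1(z1) = log p0(z0) - log|det df/dz|.
log_p1 = log_p0(z0) - np.log(np.abs(np.exp(z0)))

# Cross-check against the closed-form log-normal density, the exact law of exp(Z) for Z ~ N(0, 1).
log_p1_exact = -np.log(z1) - 0.5 * np.log(2 * np.pi) - 0.5 * np.log(z1) ** 2
assert np.isclose(log_p1, log_p1_exact)
```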

Training method

Flow-based models are generally trained by maximum likelihood. Pseudocode for the training procedure is as follows (a concrete sketch is given below the pseudocode):[5]

  • INPUT. dataset $x_{1:n}$, normalizing flow model $f_\theta(\cdot)$ with base distribution $p_0$.
  • SOLVE. $\max_\theta \sum_j \ln p_\theta(x_j)$ by gradient descent
  • RETURN. $\hat\theta$
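The following PyTorch sketch carries out this procedure for a toy one-dimensional case: a single learnable affine bijection $f_\theta(z) = a z + b$ is fit to synthetic Gaussian data by minimizing the negative log-likelihood. The dataset, parametrization, and optimizer settings are illustrative assumptions, not part of the cited algorithm.

```python
import torch

# Toy data drawn from N(3, 0.5^2); the trained flow should map N(0, 1) onto it.
x = 3.0 + 0.5 * torch.randn(1000)

# One learnable affine bijection f_theta(z) = a*z + b, so f^{-1}(x) = (x - b) / a.
log_a = torch.zeros(1, requires_grad=True)   # parametrize a = exp(log_a) > 0
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([log_a, b], lr=1e-2)

base = torch.distributions.Normal(0.0, 1.0)  # base distribution p_0

for step in range(2000):
    a = log_a.exp()
    z = (x - b) / a                          # inverse pass: data -> base space
    # Change of variables: log p_theta(x) = log p_0(f^{-1}(x)) - log|a|.
    log_px = base.log_prob(z) - log_a
    loss = -log_px.mean()                    # negative log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()

print(log_a.exp().item(), b.item())          # should approach 0.5 and 3.0
```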

Variants

Continuous Normalizing Flow (CNF)

Instead of constructing flow by function composition, another approach is to formulate the flow as a continuous-time dynamic.[6] Let $z_0$ be the latent variable with distribution $p(z_0)$. Map this latent variable to data space with the following flow function:

$$x = F(z_0) = z_T = z_0 + \int_0^T f(z_t, t)\, dt$$

where $f$ is an arbitrary function and can be modeled with e.g. neural networks.

The inverse function is then naturally:[6]

$$z_0 = F^{-1}(x) = x + \int_T^0 f(z_t, t)\, dt = x - \int_0^T f(z_t, t)\, dt$$

And the log-likelihood of $x$ can be found as:[6]

$$\log p(x) = \log p(z_0) - \int_0^T \operatorname{Tr}\left[ \frac{\partial f}{\partial z_t} \right] dt$$

Because the transformation and its log-likelihood involve integrals, techniques such as neural ODEs[7] may be needed in practice.
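The following NumPy sketch illustrates the continuous formulation by integrating both the state $z_t$ and the accumulated trace term with forward Euler steps. The linear vector field $f(z, t) = Az$ and the standard-normal base are assumptions made here because they admit an exact cross-check; they are not part of the cited method, which uses neural vector fields and stochastic trace estimators.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(2, 2))        # illustrative linear vector field f(z, t) = A @ z
T, steps = 1.0, 1000
dt = T / steps

def f(z, t):
    return A @ z

def log_std_normal(z):
    return -0.5 * z @ z - np.log(2 * np.pi)

z0 = rng.normal(size=2)                  # sample from the base distribution p(z_0)
z, trace_integral = z0.copy(), 0.0
for k in range(steps):                   # forward Euler integration of the augmented ODE
    z = z + dt * f(z, k * dt)
    # For this linear field the Jacobian df/dz is A, so Tr[df/dz] = trace(A).
    trace_integral += dt * np.trace(A)

log_px_flow = log_std_normal(z0) - trace_integral   # log p(x) from the CNF formula

# Cross-check: here x = expm(A*T) @ z0 exactly, so x is Gaussian with covariance M @ M.T.
M = expm(A * T)
cov = M @ M.T
x_exact = M @ z0
log_px_exact = (-0.5 * x_exact @ np.linalg.solve(cov, x_exact)
                - np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(cov)))
print(log_px_flow, log_px_exact)         # agree, since the trace integral is exact for a linear field
```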

Applications

Flow-based generative models have been applied to a variety of modeling tasks, including:

  • Audio generation[8]
  • Image generation[4]
  • Molecular graph generation[9]
  • Point-cloud modeling[10]
  • Video generation[11]

References

  1. ^ Danilo Jimenez Rezende; Mohamed, Shakir (2015). "Variational Inference with Normalizing Flows". arXiv:1505.05770 [stat.ML].
  2. ^ Dinh, Laurent; Krueger, David; Bengio, Yoshua (2014). "NICE: Non-linear Independent Components Estimation". arXiv:1410.8516 [cs.LG].
  3. ^ Dinh, Laurent; Sohl-Dickstein, Jascha; Bengio, Samy (2016). "Density estimation using Real NVP". arXiv:1605.08803 [cs.LG].
  4. ^ a b Kingma, Diederik P.; Dhariwal, Prafulla (2018). "Glow: Generative Flow with Invertible 1x1 Convolutions". arXiv:1807.03039 [stat.ML].
  5. ^ Kobyzev, Ivan; Prince, Simon J.D.; Brubaker, Marcus A. (November 2021). "Normalizing Flows: An Introduction and Review of Current Methods". IEEE Transactions on Pattern Analysis and Machine Intelligence. 43 (11): 3964–3979. doi:10.1109/TPAMI.2020.2992934. ISSN 1939-3539.
  6. ^ a b c Grathwohl, Will; Chen, Ricky T. Q.; Bettencourt, Jesse; Sutskever, Ilya; Duvenaud, David (2018). "FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models". arXiv:1810.01367 [cs.LG].
  7. ^ Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". arXiv:1806.07366 [cs.LG].
  8. ^ Ping, Wei; Peng, Kainan; Zhao, Kexin; Song, Zhao (2019). "WaveFlow: A Compact Flow-based Model for Raw Audio". arXiv:1912.01219 [cs.SD].
  9. ^ Shi, Chence; Xu, Minkai; Zhu, Zhaocheng; Zhang, Weinan; Zhang, Ming; Tang, Jian (2020). "GraphAF: A Flow-based Autoregressive Model for Molecular Graph Generation". arXiv:2001.09382 [cs.LG].
  10. ^ Yang, Guandao; Huang, Xun; Hao, Zekun; Liu, Ming-Yu; Belongie, Serge; Hariharan, Bharath (2019). "PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows". arXiv:1906.12320 [cs.CV].
  11. ^ Kumar, Manoj; Babaeizadeh, Mohammad; Erhan, Dumitru; Finn, Chelsea; Levine, Sergey; Dinh, Laurent; Kingma, Durk (2019). "VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation". arXiv:1903.01434 [cs.CV].