=== Rectified flow ===
Rectified flow is a method for learning transport maps between two distributions, and it offers a new perspective on diffusion models and their ODE variants. Unlike SDE-based diffusion models, rectified flow is purely ODE-based, giving a simpler, unified framework for generative modeling and data transfer. Among the infinitely many ODEs/SDEs that transport one distribution to another, rectified flow specifically advocates for ODEs whose solution paths are straight lines. Learning straight flows provides a principled way to obtain ODEs with fast inference, effectively training one-step models with ODEs as an intermediate step. The rectified flow (RF) formulation is employed in Stable Diffusion 3.<ref>{{Cite web |title=Stable Diffusion 3: Research Paper |url=https://stability.ai/news/stable-diffusion-3-research-paper |access-date=2024-03-22 |website=Stability AI |language=en-GB}}</ref>
In standard diffusion modeling, the forward process turns the dataset distribution into white noise by adding a little bit of white noise at a time, and the backward process turns white noise back to the dataset distribution by removing a little bit of white noise at a time.
 
If the forward process is well-behaved, then the backward process can also be well-behaved. Rectified flow is one such well-behaved forward process, and it is used in Stable Diffusion 3.
 
Rescale the time interval to <math>[0, 1]</math> and let the starting point <math>x_0</math> be an image sampled from the natural image distribution. The forward process is parametrized by a neural network velocity field <math>v(x_t, t)</math>, such that integrating <math display="block">dx_t = v(x_t, t) \, dt</math> yields <math>x_1</math>, a white-noise image.
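In practice the ODE has to be simulated numerically. A minimal sketch in Python of forward Euler integration, assuming a hypothetical learned velocity function <code>v(x, t)</code> that maps a NumPy array and a scalar time to an array of the same shape (the names are illustrative, not from any specific implementation):

<syntaxhighlight lang="python">
import numpy as np

def integrate_ode(x0, v, num_steps=100):
    """Forward-Euler simulation of dx_t = v(x_t, t) dt over t in [0, 1].

    x0: starting sample (e.g. an image as a NumPy array).
    v:  learned velocity field, called as v(x, t) -> array of x's shape.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + v(x, t) * dt  # one Euler step along the ODE
    return x  # approximates x_1
</syntaxhighlight>

With a well-trained velocity field, fewer Euler steps suffice; for perfectly straight paths a single step is exact (see below).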
 
Given two distributions <math>\pi_0</math> and <math>\pi_1</math>, probability flow models implicitly learn the transport map by constructing an ODE driven by a drift field on <math>\mathbb R^d \times [0,1]</math>: <math display="block">\mathrm d \mathbf Z_t = \mathbf v(\mathbf Z_t , t) \, \mathrm dt, \quad t \in [0,1], \quad \text{starting from } \mathbf Z_0 \sim \pi_0</math> such that <math>\mathbf Z_1 \sim \pi_1</math> when following the ODE starting from <math>\mathbf Z_0 \sim \pi_0</math>. Generally, for any time-differentiable process <math>\mathbf X_t</math>, <math>\mathbf v</math> can be estimated by solving: <math display="block">\min_{\mathbf v} \int_0^1 \mathbb{E}\left [\lVert{\dot{\mathbf X}_t - \mathbf v(\mathbf X_t, t)}\rVert^2\right] \,\mathrm{d}t.</math>
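The minimizer of this least-squares problem is, at each point, the conditional average of the instantaneous velocity of <math>\mathbf X_t</math>, which is what makes the learned drift preserve the marginal distributions of <math>\mathbf X_t</math>: <math display="block">\mathbf v^*(\mathbf x, t) = \mathbb{E}\left[\dot{\mathbf X}_t \mid \mathbf X_t = \mathbf x\right].</math>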
 
By injecting the strong prior that intermediate trajectories are straight, rectified flow achieves both theoretical relevance to optimal transport<ref>{{Citation |last=Liu |first=Qiang |title=Rectified Flow: A Marginal Preserving Approach to Optimal Transport |date=2022-09-29 |url=http://arxiv.org/abs/2209.14577 |access-date=2024-03-22 |doi=10.48550/arXiv.2209.14577}}</ref> and computational efficiency, as ODEs with straight paths can be simulated precisely without time discretization.
Given a probabilistic coupling over pairs <math>(x_0, x_1)</math>, we can train a velocity field <math>v_\theta</math> on the space of all images by minimizing the expectation of <math>\int_0^1 \|(x_1 - x_0 ) - v_\theta(x_t, t) \|^2 dt</math>, where <math>x_t = t x_1 + (1-t) x_0</math> is the linear interpolation between the two endpoints. Intuitively, the velocity field tries to guide each noisy image toward a natural-looking image along a path that is as straight as possible; at points where the outcome is ambiguous, it instead points toward the average of the several natural-looking images the path might end up at.
 
Specifically, rectified flow seeks to match an ODE with the marginal distributions of the '''linear interpolation''' between points from distributions <math>\pi_0</math> and <math>\pi_1</math>.<ref name=":70">{{Citation |last=Liu |first=Xingchao |title=Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow |date=2022-09-07 |url=http://arxiv.org/abs/2209.03003 |access-date=2024-03-22 |doi=10.48550/arXiv.2209.03003 |last2=Gong |first2=Chengyue |last3=Liu |first3=Qiang}}</ref><ref name=":8">{{Cite web |title=Rectified Flow |url=https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html |access-date=2024-03-06 |website=www.cs.utexas.edu}}</ref> Given observations <math>\mathbf{X}_0 \sim \pi_0</math> and <math>\mathbf{X}_1 \sim \pi_1</math>, the canonical linear interpolation <math>\mathbf{X}_t = t\mathbf{X}_1 + (1-t)\mathbf{X}_0,\ t\in[0,1]</math> yields the trivial case <math>\dot{\mathbf X}_t = \mathbf X_1 - \mathbf X_0</math>, which cannot be causally simulated without knowing <math>\mathbf{X}_1</math> in advance. To address this, <math>\mathbf{X}_t</math> is "projected" into a space of causally simulatable ODEs, expressed as <math>\mathrm{d}\mathbf{Z}_t = \mathbf{v}(\mathbf{Z}_t, t)\,\mathrm{d}t</math>, by minimizing the least squares loss with respect to the direction <math>\mathbf{X}_1 - \mathbf{X}_0</math>: <math display="block">\min_{\mathbf v} \int_0^1 \mathbb{E}\left [\lVert{(\mathbf X_1-\mathbf X_0) - \mathbf v(\mathbf X_t, t)}\rVert^2\right] \,\mathrm{d}t.</math>
 
The data pair <math>(\mathbf{X}_0, \mathbf{X}_1)</math> can be any coupling of <math>\pi_0</math> and <math>\pi_1</math>, typically an independent coupling (i.e., <math>(\mathbf{X}_0,\mathbf{X}_1) \sim \pi_0 \times \pi_1</math>) obtained by randomly pairing observations from <math>\pi_0</math> and <math>\pi_1</math>. This process ensures that the <math>\mathbf{Z}_t</math> trajectories closely mirror the density map of the <math>\mathbf{X}_t</math> trajectories, but ''reroute'' at intersections to ensure causality. This rectifying process is also referred to as Flow Matching, Stochastic Interpolation, and Alpha-Blending.
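As an illustration, one training step for this least-squares objective with an independent coupling could look as follows. This is a sketch only; the network <code>v_theta</code>, its call signature <code>v_theta(x_t, t)</code>, and the batch shapes are assumptions made for the example rather than details from the cited papers.

<syntaxhighlight lang="python">
import torch

def rectified_flow_loss(v_theta, x0, x1):
    """Monte Carlo estimate of the rectified flow / flow matching loss.

    x0: batch sampled from pi_0, shape (B, ...).
    x1: batch sampled from pi_1, paired independently with x0.
    v_theta: network predicting a velocity, called as v_theta(x_t, t).
    """
    b = x0.shape[0]
    # t ~ Uniform(0, 1), shaped (B, 1, ..., 1) so it broadcasts over x0/x1
    t = torch.rand(b, *([1] * (x0.dim() - 1)), device=x0.device)
    x_t = t * x1 + (1.0 - t) * x0        # linear interpolation X_t
    target = x1 - x0                     # dX_t/dt along the straight path
    pred = v_theta(x_t, t.reshape(b))    # predicted velocity at (x_t, t)
    return torch.mean((target - pred) ** 2)
</syntaxhighlight>

The loss is then minimized over the parameters of <code>v_theta</code> with a standard stochastic gradient optimizer.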
 
A distinctive aspect of rectified flow is its capability for "'''reflow'''", which straightens the trajectories of the ODE paths. Denote the rectified flow <math>\boldsymbol{Z}^0 = \{\mathbf{Z}_t: t\in[0,1]\}</math> induced from <math>(\mathbf{X}_0,\mathbf{X}_1)</math> as <math>\boldsymbol{Z}^0 = \mathsf{Rectflow}((\mathbf{X}_0,\mathbf{X}_1))</math>. Recursively applying the <math>\mathsf{Rectflow}(\cdot)</math> operator generates a series of rectified flows <math>\boldsymbol{Z}^{k+1} = \mathsf{Rectflow}((\mathbf{Z}_0^k, \mathbf{Z}_1^k))</math>, starting with <math>(\mathbf{Z}_0^0,\mathbf{Z}_1^0)=(\mathbf{X}_0,\mathbf{X}_1)</math>, where <math>\boldsymbol{Z}^k</math> is the <math>k</math>-th rectified flow induced from <math>(\mathbf{X}_0,\mathbf{X}_1)</math>. Each reflow step not only reduces the transport cost but also straightens the paths, so the paths of <math>\boldsymbol{Z}^k</math> become straighter as <math>k</math> increases.
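A sketch of the reflow recursion under these definitions, reusing the hypothetical helpers from the earlier sketches (<code>integrate_ode</code>, plus an assumed <code>train_velocity_field</code> that fits a velocity field to a given coupling; neither is code from the cited work):

<syntaxhighlight lang="python">
def reflow(x0_samples, x1_samples, train_velocity_field, integrate_ode, num_rounds=2):
    """Iterate Z^{k+1} = Rectflow((Z_0^k, Z_1^k)), starting from (X_0, X_1).

    Each round fits a velocity field to the current coupling, then builds the
    next coupling by transporting z0 along the learned ODE.
    """
    z0, z1 = x0_samples, x1_samples
    flows = []
    for _ in range(num_rounds):
        v = train_velocity_field(z0, z1)  # regress v onto the directions z1 - z0
        z1 = integrate_ode(z0, v)         # new endpoints Z_1^{k+1} from the ODE
        flows.append(v)                   # paths of later flows are straighter
    return flows
</syntaxhighlight>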
 
Rectified flow also admits a nonlinear extension in which the linear interpolation <math>\mathbf{X}_t</math> is replaced by any time-differentiable curve connecting <math>\mathbf{X}_0</math> and <math>\mathbf{X}_1</math>, given by <math>\mathbf{X}_t = \alpha_t \mathbf{X}_1 + \beta_t \mathbf{X}_0</math>. This framework encompasses DDIM and probability flow ODEs as special cases, corresponding to particular choices of <math>\alpha_t</math> and <math>\beta_t</math>. However, when the path of <math>\mathbf{X}_t</math> is not straight, the pair <math>(\mathbf{Z}_0, \mathbf{Z}_1)</math> no longer guarantees a reduction in convex transport costs, and the reflow process no longer straightens the paths of <math>\mathbf{Z}_t</math>.<ref name=":0" />
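For concreteness, the interpolation and the regression target it induces can be written for two schedules: the linear (rectified flow) case and, as one illustrative curved choice that is not prescribed by the source, a trigonometric schedule:

<syntaxhighlight lang="python">
import numpy as np

def interpolant(x0, x1, t, schedule="linear"):
    """Return X_t = alpha_t * x1 + beta_t * x0 and its time derivative dX_t/dt."""
    if schedule == "linear":          # rectified flow: alpha_t = t, beta_t = 1 - t
        a, b = t, 1.0 - t
        da, db = 1.0, -1.0
    elif schedule == "trig":          # example curved schedule (illustrative only)
        a, b = np.sin(0.5 * np.pi * t), np.cos(0.5 * np.pi * t)
        da = 0.5 * np.pi * np.cos(0.5 * np.pi * t)
        db = -0.5 * np.pi * np.sin(0.5 * np.pi * t)
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    x_t = a * x1 + b * x0
    dx_t = da * x1 + db * x0          # target direction for the drift regression
    return x_t, dx_t
</syntaxhighlight>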
 
Flows with nearly straight paths offer a significant computational benefit by minimizing time-discretization errors in numerical simulation. Specifically, if an ODE <math>\mathrm{d} \mathbf{Z}_t = \mathbf{v}(\mathbf{Z}_t,t)\; \mathrm{d}t</math> follows perfectly straight paths, it simplifies to <math>\mathbf{Z}_t = \mathbf{Z}_0 + t \cdot \mathbf{v}(\mathbf{Z}_0, 0)</math>, allowing for exact solutions with a ''single Euler step''. This addresses the key bottleneck of slow inference in ODE/SDE-based models. Consequently, the reflow/straightening procedure offers a way to train one-step generative models, comparable in sampling cost to GANs and VAEs, using ODEs as an intermediate mechanism.
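In code, sampling from a perfectly straightened flow reduces to one application of the velocity field (a sketch reusing the hypothetical <code>v</code> interface from above):

<syntaxhighlight lang="python">
def one_step_sample(z0, v):
    """Exact ODE solution when the paths are perfectly straight:
    Z_t = Z_0 + t * v(Z_0, 0), evaluated at t = 1."""
    return z0 + v(z0, 0.0)
</syntaxhighlight>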
 
== Choice of architecture ==