== Estimation ==
 
Sequences of ordered images allow the estimation of motion as either instantaneous image velocities or discrete image displacements.<ref name="S. S. Beauchemin, J. L. Barron 1995" /> Fleet and Weiss provide a tutorial introduction to gradient based optical flow.<ref>{{Cite book |title=Handbook of Mathematical Models in Computer Vision |last1=Fleet |first1=David J. |last2=Weiss |first2=Yair |publisher=Springer |year=2006 |isbn=978-0-387-26371-7 |editor-last=Paragios |editor-first=Nikos |pages=237–257 |chapter=Optical Flow Estimation |editor-last2=Chen |editor-first2=Yunmei |editor-last3=Faugeras |editor-first3=Olivier D. |chapter-url=http://www.cs.toronto.edu/~fleet/research/Papers/flowChapter05.pdf}}</ref>
John L. Barron, David J. Fleet, and Steven Beauchemin provide a performance analysis of a number of optical flow techniques, emphasizing the accuracy and density of the measurements.<ref>{{Cite journal |last1=Barron |first1=John L. |last2=Fleet |first2=David J. |last3=Beauchemin |first3=Steven |name-list-style=amp |year=1994 |title=Performance of optical flow techniques |url=http://www.cs.toronto.edu/~fleet/research/Papers/ijcv-94.pdf |journal=International Journal of Computer Vision |volume=12 |pages=43–77 |citeseerx=10.1.1.173.481 |doi=10.1007/bf01420984|s2cid=1290100 }}</ref>
 
Optical flow can be estimated in a number of ways. Broadly, optical flow estimation approaches can be divided into machine learning based models (sometimes called data-driven models), classical models (sometimes called knowledge-driven models), which do not use machine learning, and hybrid models, which use aspects of both learning based and classical models.<ref name="Zhai_Survey_2021">{{cite journal |last1=Zhai |first1=Mingliang |last2=Xiang |first2=Xuezhi |last3=Lv |first3=Ning |last4=Kong |first4=Xiangdong |title=Optical flow and scene flow estimation: A survey |journal=Pattern Recognition |date=2021 |volume=114 |pages=107861 |doi=10.1016/j.patcog.2021.107861 |url=https://www.sciencedirect.com/science/article/pii/S0031320321000480}}</ref>
Optical flow methods calculate the motion between two image frames taken at times <math>t</math> and <math>t+\Delta t</math> at every [[voxel]] position. Methods based on local [[Taylor series]] approximations of the image signal, using partial derivatives with respect to the spatial and temporal coordinates, are called differential methods.
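As a concrete illustration, these partial derivatives can be approximated with simple finite differences. The following minimal sketch (an illustrative assumption, not a method from the cited literature) uses NumPy, with <code>frame1</code> and <code>frame2</code> standing for two consecutive grayscale frames:

<syntaxhighlight lang="python">
import numpy as np

def image_derivatives(frame1: np.ndarray, frame2: np.ndarray):
    """Approximate the partial derivatives I_x, I_y and I_t.

    frame1, frame2: consecutive grayscale frames as 2-D float arrays.
    """
    # Central differences for the spatial derivatives of the first frame.
    Ix = np.gradient(frame1, axis=1)  # derivative along x (columns)
    Iy = np.gradient(frame1, axis=0)  # derivative along y (rows)
    # Forward difference in time between the two frames.
    It = frame2 - frame1
    return Ix, Iy, It
</syntaxhighlight>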
 
===Classical Models===
Many classical models use the intuitive assumption of ''brightness constancy''; that even if a point moves between frames, its brightness stays constant.<ref name="Fortun_Survey_2015">{{cite journal |last1=Fortun |first1=Denis |last2=Bouthemy |first2=Patrick |last3=Kervrann |first3=Charles |title=Optical flow modeling and computation: A survey |journal=Computer Vision and Image Understanding |date=2015-05-01 |volume=134 |pages=1–21 |doi=10.1016/j.cviu.2015.02.008 |url=https://www.sciencedirect.com/science/article/pii/S1077314215000429 |access-date=2024-12-23}}</ref>
To formalise this intuitive assumption, consider two consecutive frames from a video sequence, with intensity <math>I(x, y, t)</math>, where <math>(x, y)</math> refer to pixel coordinates and <math>t</math> refers to time.
In this case, the brightness constancy constraint is
:<math>I(x + u, y + v, t + 1) - I(x, y, t) = 0,</math>
where <math>\mathbf{w} := (u, v)</math> is the displacement vector between a point in the first frame and the corresponding point in the second frame.
By itself, the brightness constancy constraint cannot be solved for <math>u</math> and <math>v</math> at each pixel, since there is only one equation and two unknowns.
This is known as the ''[[Motion perception#The aperture problem|aperture problem]]''.
Therefore, additional constraints must be imposed to estimate the flow field.<ref name="Brox_2004">{{cite conference |url=http://link.springer.com/10.1007/978-3-540-24673-2_3 |title=High Accuracy Optical Flow Estimation Based on a Theory for Warping |last1=Brox |first1=Thomas |last2=Bruhn |first2=Andrés |last3=Papenberg |first3=Nils |last4=Weickert |first4=Joachim |date=2004 |publisher=Springer Berlin Heidelberg |book-title=Computer Vision - ECCV 2004 |pages=25–36 |___location=Berlin, Heidelberg |conference=ECCV 2004}}</ref><ref name="Baker_2011">{{cite journal |last1=Baker |first1=Simon |last2=Scharstein |first2=Daniel |last3=Lewis |first3=J. P. |last4=Roth |first4=Stefan |last5=Black |first5=Michael J. |last6=Szeliski |first6=Richard |title=A Database and Evaluation Methodology for Optical Flow |journal=International Journal of Computer Vision |date=1 March 2011 |volume=92 |issue=1 |pages=1–31 |doi=10.1007/s11263-010-0390-2 |url=https://link.springer.com/article/10.1007/s11263-010-0390-2 |access-date=25 Dec 2024 |language=en |issn=1573-1405}}</ref>

Classical methods differ in the additional constraints they impose and in how they search for the flow. Notable families include:
*[[Phase correlation]] – inverse of normalized [[cross-power spectrum]]
*Block-based methods – minimizing sum of squared differences or [[sum of absolute differences]], or maximizing normalized [[cross-correlation]]
*Differential methods of estimating optical flow, based on partial derivatives of the image signal and/or the sought flow field and higher-order partial derivatives, such as:
**[[Lucas–Kanade method]] – based on image patches and an affine model for the flow field<ref name="Zhang2018">{{Cite journal |last1=Zhang |first1=G. |last2=Chanson |first2=H. |author-link2=Hubert Chanson |year=2018 |title=Application of Local Optical Flow Methods to High-Velocity Free-surface Flows: Validation and Application to Stepped Chutes |url=http://staff.civil.uq.edu.au/h.chanson/reprints/Zhang_Chanson_etfs_2018.pdf |journal=Experimental Thermal and Fluid Science |volume=90 |pages=186–199 |doi=10.1016/j.expthermflusci.2017.09.010|bibcode=2018ETFS...90..186Z }}</ref>
**[[Horn–Schunck method]] – optimizing a functional based on residuals from the brightness constancy constraint, and a particular regularization term expressing the expected smoothness of the flow field<ref name="Zhang2018" />
**[[Buxton–Buxton method]] – based on a model of the motion of edges in image sequences<ref>{{Cite book |url=https://books.google.com/books?id=NiQXkMbx-lUC&q=optical-flow+Buxton-and-Buxton&pg=PA107 |title=Visual Cognition |last=Glyn W. Humphreys and [[Vicki Bruce]] |publisher=Psychology Press |year=1989 |isbn=978-0-86377-124-8}}</ref>
**[[Black–Jepson method]] – coarse optical flow via correlation<ref name="S. S. Beauchemin, J. L. Barron 1995" />
**General [[variational methods]] – a range of modifications/extensions of Horn–Schunck, using other data terms and other smoothness terms
*Discrete optimization methods – the search space is quantized, and then image matching is addressed through label assignment at every pixel, such that the corresponding deformation minimizes the distance between the source and the target image<ref>{{Cite book |url=http://vision.mas.ecp.fr/pub/mian08.pdf |title=Dense Image Registration through MRFs and Efficient Linear Programming |last1=B. Glocker |last2=N. Komodakis |last3=G. Tziritas |last4=N. Navab |last5=N. Paragios |publisher=Medical Image Analysis Journal |year=2008}}</ref>

Many of these methods, in addition to the current state-of-the-art algorithms, are evaluated on the Middlebury Benchmark Dataset.<ref name="Baker_2011" /><ref>{{Cite web |url=http://vision.middlebury.edu/flow/ |title=Optical Flow |last1=Baker |first1=Simon |last2=Scharstein |first2=Daniel |last3=Lewis |first3=J. P. |last4=Roth |first4=Stefan |last5=Black |first5=Michael J. |last6=Szeliski |first6=Richard |website=vision.middlebury.edu |access-date=2019-10-18}}</ref> Other popular benchmark datasets are [[KITTI]] and [[Sintel]].
 
==== Regularized Models ====
Perhaps the most natural approach to addressing the aperture problem is to apply a smoothness constraint or a ''regularization constraint'' to the flow field.
One can combine both of these constraints to formulate the estimation of optical flow as an [[Optimization problem|optimization problem]], where the goal is to minimize a cost function of the form,
:<math>E = \iint_\Omega \left[ \Psi(I(x + u, y + v, t + 1) - I(x, y, t)) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) \right] dx \, dy, </math>
where <math>\Omega</math> is the extent of the images <math>I(x, y)</math>, <math>\nabla</math> is the gradient operator, <math>\alpha</math> is a constant, and <math>\Psi()</math> is a [[loss function]].<ref name="Fortun_Survey_2015" /><ref name="Brox_2004" />
 
This optimisation problem is difficult to solve owing to its non-linearity.
To address this issue, one can use a ''variational approach'' and linearise the brightness constancy constraint using a first order [[Taylor series]] approximation. Specifically, the brightness constancy constraint is approximated as,
:<math>I(x + u, y + v, t + 1) = I(x, y, t) + \frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial y}\,v + \frac{\partial I}{\partial t}.</math>
For convenience, the derivatives of the image, <math>\tfrac{\partial I}{\partial x}</math>, <math>\tfrac{\partial I}{\partial y}</math> and <math>\tfrac{\partial I}{\partial t}</math>, are often condensed to <math>I_x</math>, <math>I_y</math> and <math>I_t</math>.
Doing so allows one to rewrite the linearised brightness constancy constraint as,<ref name="Baker_2011"/>
:<math>I_x u + I_y v + I_t = 0.</math>
The optimization problem can now be rewritten as
:<math>E = \iint_\Omega \left[ \Psi(I_x u + I_y v + I_t) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) \right] dx \, dy. </math>
For the choice of <math>\Psi(x) = x^2</math>, this method is the same as the [[Horn–Schunck method]].<ref name="Horn_1980"/>
Other choices of cost function have been used, such as <math>\Psi(x) = \sqrt{x^2 + \epsilon^2}</math>, which is a differentiable variant of the [[Taxicab geometry|<math>L^1</math> norm]].<ref name="Fortun_Survey_2015" /><ref>{{cite conference |url=https://ieeexplore.ieee.org/abstract/document/5539939 |title=Secrets of optical flow estimation and their principles |last1=Sun |first1=Deqing |last2=Roth |first2=Stefan |last3=Black |first3=Michael J. |date=2010 |publisher=IEEE |book-title=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |pages=2432–2439 |___location=San Francisco, CA, USA |conference=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition}}</ref>
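To make the role of <math>\Psi</math> concrete, the two penalties above are easy to compare numerically; the sketch below (a minimal illustration, with <math>\epsilon</math> chosen arbitrarily) implements both:

<syntaxhighlight lang="python">
import numpy as np

def quadratic_penalty(x):
    """Psi(x) = x^2, the choice that recovers the Horn–Schunck method."""
    return x ** 2

def charbonnier_penalty(x, eps=1e-3):
    """Psi(x) = sqrt(x^2 + eps^2), a differentiable variant of |x|.

    It grows only linearly for large residuals, so outliers
    (e.g. at motion boundaries) are penalised less than by x^2.
    """
    return np.sqrt(x ** 2 + eps ** 2)
</syntaxhighlight>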
 
To solve the aforementioned optimization problem, one can use the [[Euler-Lagrange equations]] to provide a system of partial differential equations for each point in <math>I(x, y, t)</math>. In the simplest case of using <math>\Psi(x) = x^2</math>, these equations are,
:<math> I_x(I_x u + I_y v + I_t) - \alpha \Delta u = 0,</math>
:<math> I_y(I_x u + I_y v + I_t) - \alpha \Delta v = 0,</math>
where <math>\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}</math> denotes the [[Laplace operator]].
Since the image data is made up of discrete pixels, these equations are discretised.
Doing so yields a system of linear equations which can be solved for <math>(u, v)</math> at each pixel, using an iterative scheme such as [[Gauss–Seidel method|Gauss–Seidel]].<ref name="Horn_1980" />
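The sketch below illustrates such an iterative solver in Python/NumPy, using a Jacobi-style variant of the update (a minimal sketch, assuming the derivative arrays <code>Ix</code>, <code>Iy</code>, <code>It</code> are given; it is not the exact discretisation of the original paper):

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(Ix, Iy, It, alpha=10.0, n_iter=100):
    """Minimal Horn–Schunck-style solver for the flow (u, v).

    Ix, Iy, It: image derivatives as 2-D float arrays.
    alpha: regularization weight; larger values give smoother flow.
    """
    # Kernel averaging the four direct neighbours (discrete smoothing).
    avg = np.array([[0.0,  0.25, 0.0],
                    [0.25, 0.0,  0.25],
                    [0.0,  0.25, 0.0]])
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    for _ in range(n_iter):
        u_bar = convolve(u, avg)
        v_bar = convolve(v, avg)
        # Shared factor from the linearised brightness constancy term.
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
</syntaxhighlight>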
An alternate approach is to discretize the optimisation problem and then perform a search of the possible <math>(u, v)</math> values without linearising it.<ref name="Steinbrucker_2009">{{cite conference |url=https://ieeexplore.ieee.org/document/5459364 |title=Large Displacement Optical Flow Computation without Warping |last1=Steinbrücker |first1=Frank |last2=Pock |first2=Thomas |last3=Cremers |first3=Daniel |date=2009 |publisher=IEEE |book-title=2009 IEEE 12th International Conference on Computer Vision |pages=1609–1614 |conference=2009 IEEE 12th International Conference on Computer Vision}}</ref>
This search is often performed using [[Max-flow min-cut theorem]] algorithms, linear programming or [[belief propagation]] methods.
==== Parametric Models ====
 
Instead of applying the regularization constraint on a point-by-point basis as in a regularized model, one can group pixels into regions and estimate the motion of these regions.
This is known as a ''parametric model'', since the motion of these regions is [[parameter|parameterized]].
In formulating optical flow estimation in this way, one makes the assumption that the motion field in each region is fully characterised by a set of parameters.
Therefore, the goal of a parametric model is to estimate the motion parameters that minimise a loss function which can be written as,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} g(x, y) \rho(x, y, I_1, I_2, u_{\boldsymbol{\alpha}}, v_{\boldsymbol{\alpha}}),
</math>
where <math>{\boldsymbol{\alpha}}</math> is the set of parameters determining the motion in the region <math>\mathcal{R}</math>, <math>\rho()</math> is the data cost term, <math>g()</math> is a weighting function that determines the influence of pixel <math>(x, y)</math> on the total cost, and <math>I_1</math> and <math>I_2</math> are frames 1 and 2 from a pair of consecutive frames.
<ref name="Fortun_Survey_2015" />
 
The simplest parametric model is the [[Lucas–Kanade method]]. This uses rectangular regions and parameterises the motion as purely translational. The Lucas–Kanade method uses the original brightness constancy constraint as the data cost term and selects <math>g(x, y) = 1</math>.
This yields the local loss function,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} | I(x + u_{\boldsymbol{\alpha}}, y + v_{\boldsymbol{\alpha}}, t + 1) - I(x, y, t)| .
</math>
Other possible local loss functions include the negative normalized [[cross-correlation]] between the two frames.<ref>{{cite conference |last=Lucas |first=Bruce D. |last2=Kanade |first2=Takeo |date=1981-08-24 |title=An iterative image registration technique with an application to stereo vision |url=https://dl.acm.org/doi/10.5555/1623264.1623280 |journal=Proceedings of the 7th International Joint Conference on Artificial intelligence - Volume 2 |series=IJCAI'81 |___location=San Francisco, CA, USA |publisher=Morgan Kaufmann Publishers Inc. |pages=674–679}}</ref>
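In practice, the minimisation is usually carried out on the linearised constraint, which reduces to a small least-squares problem per window. A minimal sketch (assuming the derivative arrays <code>Ix</code>, <code>Iy</code>, <code>It</code> have already been restricted to one rectangular window):

<syntaxhighlight lang="python">
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Estimate one translational flow vector (u, v) for a window.

    Ix, Iy, It: derivative values inside the window (any shape).
    Solves the least-squares problem A @ [u, v] = b built from the
    linearised brightness constancy constraint I_x u + I_y v + I_t = 0.
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # shape (N, 2)
    b = -It.ravel()                                 # shape (N,)
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
</syntaxhighlight>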
 
===Learning Based Models===
 
Instead of seeking to model optical flow directly, one can train a [[machine learning]] system to estimate optical flow. Since 2015, when FlowNet<ref>{{Cite conference |last=Dosovitskiy |first=Alexey |last2=Fischer |first2=Philipp |last3=Ilg |first3=Eddy |last4=Hausser |first4=Philip |last5=Hazirbas |first5=Caner |last6=Golkov |first6=Vladimir |last7=Smagt |first7=Patrick van der |last8=Cremers |first8=Daniel |last9=Brox |first9=Thomas |date=2015 |title=FlowNet: Learning Optical Flow with Convolutional Networks |url=https://ieeexplore.ieee.org/document/7410673/ |publisher=IEEE |pages=2758–2766 |doi=10.1109/ICCV.2015.316 |isbn=978-1-4673-8391-2 | conference=2015 IEEE International Conference on Computer Vision (ICCV)}}</ref> was proposed, learning based models have been applied to optical flow and have gained prominence. Initially, these approaches were based on [[Convolutional neural network|Convolutional Neural Networks]] arranged in a [[U-Net]] architecture. However, with the advent of the [[Transformer (deep learning architecture)|transformer architecture]] in 2017, transformer based models have gained popularity.<ref>{{Cite journal |last=Alfarano |first=Andrea |last2=Maiano |first2=Luca |last3=Papa |first3=Lorenzo |last4=Amerini |first4=Irene |date=2024 |title=Estimating optical flow: A comprehensive review of the state of the art |url=https://linkinghub.elsevier.com/retrieve/pii/S1077314224002418 |journal=Computer Vision and Image Understanding |language=en |volume=249 |pages=104160 |doi=10.1016/j.cviu.2024.104160}}</ref>
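As an illustration of running a pretrained learning based model, the sketch below uses the RAFT implementation shipped with torchvision (one convenient open-source option, assuming torchvision ≥ 0.12; the random tensors are placeholders for real video frames):

<syntaxhighlight lang="python">
import torch
from torchvision.models.optical_flow import raft_small, Raft_Small_Weights

weights = Raft_Small_Weights.DEFAULT
model = raft_small(weights=weights).eval()

# Placeholder frames: batches of shape (N, 3, H, W) with H and W
# divisible by 8, as RAFT requires; real frames would come from a video.
frame1 = torch.rand(1, 3, 224, 224)
frame2 = torch.rand(1, 3, 224, 224)

# The weights ship with their matching preprocessing transforms.
frame1, frame2 = weights.transforms()(frame1, frame2)

with torch.no_grad():
    # RAFT returns a list of iteratively refined flow fields;
    # the last element is the final estimate of shape (N, 2, H, W).
    flows = model(frame1, frame2)
flow = flows[-1]
</syntaxhighlight>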
 
== Uses ==