== Estimation ==
Optical flow can be estimated in a number of ways. Broadly, optical flow estimation approaches can be divided into machine learning based models (sometimes called data-driven models), classical models (sometimes called knowledge-driven models), which do not use machine learning, and hybrid models, which combine aspects of both learning based and classical models.<ref name="Zhai_Survey_2021">{{cite journal |last1=Zhai |first1=Mingliang |last2=Xiang |first2=Xuezhi |last3=Lv |first3=Ning |last4=Kong |first4=Xiangdong |title=Optical flow and scene flow estimation: A survey |journal=Pattern Recognition |date=2021 |volume=114 |pages=107861 |doi=10.1016/j.patcog.2021.107861 |url=https://www.sciencedirect.com/science/article/pii/S0031320321000480}}</ref>
===Classical Models===
Many classical models use the intuitive assumption of ''brightness constancy''; that even if a point moves between frames, its brightness stays constant.<ref name="Fortun_Survey_2015">{{cite journal |last1=Fortun |first1=Denis |last2=Bouthemy |first2=Patrick |last3=Kervrann |first3=Charles|title=Optical flow modeling and computation: A survey |journal=Computer Vision and Image Understanding |date=2015-05-01 |volume=134 |pages=1-21 |doi=10.1016/j.cviu.2015.02.008 |url=https://www.sciencedirect.com/science/article/pii/S1077314215000429 |access-date=2024-12-23}}</ref>
To formalise this intuitive assumption, consider two consecutive frames from a video sequence, with intensity <math>I(x, y, t)</math>, where <math>(x, y)</math> refer to pixel coordinates and <math>t</math> refers to time.
In this case, the brightness constancy constraint is
:<math>I(x + u, y + v, t + 1) = I(x, y, t),</math>
where <math>\mathbf{w} := (u, v)</math> is the displacement vector between a point in the first frame and the corresponding point in the second frame.
By itself, the brightness constancy constraint cannot be solved for <math>u</math> and <math>v</math> at each pixel, since there is only one equation and two unknowns.
This is known as the ''[[Motion perception#The aperture problem|aperture problem]]''.
Therefore, additional constraints must be imposed to estimate the flow field.<ref name="Brox_2004">{{cite conference |url=http://link.springer.com/10.1007/978-3-540-24673-2_3 |title=High Accuracy Optical Flow Estimation Based on a Theory for Warping |last1=Brox |first1=Thomas |last2=Bruhn |first2=Andrés |last3=Papenberg |first3=Nils |last4=Weickert |first4=Joachim |date=2004 |publisher=Springer Berlin Heidelberg |book-title=Computer Vision - ECCV 2004 |pages=25-36 |___location=Berlin, Heidelberg |conference=ECCV 2004}}</ref><ref name="Baker_2011">{{cite journal |last1=Baker |first1=Simon |last2=Scharstein |first2=Daniel |last3=Lewis |first3=J. P. |last4=Roth |first4=Stefan |last5=Black |first5=Michael J. |last6=Szeliski |first6=Richard |title=A Database and Evaluation Methodology for Optical Flow |journal=International Journal of Computer Vision |date=1 March 2011 |volume=92 |issue=1 |pages=1–31 |doi=10.1007/s11263-010-0390-2 |url=https://link.springer.com/article/10.1007/s11263-010-0390-2 |access-date=25 Dec 2024 |language=en |issn=1573-1405}}</ref>
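The aperture problem can be illustrated numerically. The following sketch (assuming NumPy; the test image is purely illustrative) constructs two different motions that produce the identical second frame, so the brightness constancy constraint alone cannot distinguish between them:

```python
import numpy as np

# A pure horizontal ramp: intensity depends only on x, so the image
# carries no information about vertical motion.
I1 = np.tile(np.arange(8.0), (8, 1))

# Two candidate motions: a shift right by one pixel, and the same shift
# combined with a vertical shift of three pixels.
I2_horizontal = np.roll(I1, 1, axis=1)
I2_diagonal = np.roll(I2_horizontal, 3, axis=0)

# Both motions yield the identical second frame, so brightness constancy
# cannot distinguish the flows (1, 0) and (1, 3).
same = np.array_equal(I2_horizontal, I2_diagonal)
```

Because the image gradient points purely along <math>x</math>, only the component of motion parallel to the gradient is constrained; the component along the edge is unobservable.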
==== Regularized Models ====
Perhaps the most natural approach to addressing the aperture problem is to apply a smoothness constraint or a ''regularization constraint'' to the flow field.
One can combine both of these constraints to formulate estimating optical flow as an [[Optimization problem|optimization problem]], where the goal is to minimize the cost function of the form,
:<math>E = \iint_\Omega \Psi(I(x + u, y + v, t + 1) - I(x, y, t)) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy, </math>
where <math>\Omega</math> is the extent of the images <math>I(x, y)</math>, <math>\nabla</math> is the gradient operator, <math>\alpha</math> is a constant, and <math>\Psi()</math> is a [[loss function]].
<ref name="Fortun_Survey_2015" /><ref name="Brox_2004" />
This optimisation problem is difficult to solve owing to its non-linearity.
To address this issue, one can use a ''variational approach'' and linearise the brightness constancy constraint using a first order [[Taylor series]] approximation. Specifically, the brightness constancy constraint is approximated as,
:<math>I(x + u, y + v, t + 1) \approx I(x, y, t) + \frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial y}\,v + \frac{\partial I}{\partial t}.</math>
Doing so allows one to rewrite the linearised brightness constancy constraint as,<ref name="Baker_2011"/>
:<math>I_x u + I_y v + I_t = 0,</math>
where <math>I_x := \tfrac{\partial I}{\partial x}</math>, <math>I_y := \tfrac{\partial I}{\partial y}</math> and <math>I_t := \tfrac{\partial I}{\partial t}</math> denote the partial derivatives of the image.
The optimization problem can now be rewritten as
:<math>E = \iint_\Omega \Psi(I_x u + I_y v + I_t) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy. </math>
For the choice of <math>\Psi(x) = x^2</math>, this method is the same as the [[Horn-Schunck method]].
<ref name="Horn_1980"/>
Other choices of cost function have also been used, such as <math>\Psi(x) = \sqrt{x^2 + \epsilon^2}</math>, which is a differentiable variant of the [[Taxicab geometry|<math>L^1</math> norm]].<ref name="Fortun_Survey_2015" />
<ref>
{{cite conference |url=https://ieeexplore.ieee.org/abstract/document/5539939 |title=Secrets of optical flow estimation and their principles |last1=Sun |first1=Deqing |last2=Roth |first2=Stefan |last3=Black |first3=Michael J. |date=2010 |publisher=IEEE |book-title=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |pages=2432-2439 |___location=San Francisco, CA, USA |conference=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition}}</ref>
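The quantity that the penalty <math>\Psi</math> is applied to in the linearised data term can be computed directly from a pair of frames. A minimal sketch (assuming NumPy; the function name and test frames are illustrative, with derivatives taken by finite differences):

```python
import numpy as np

def linearised_residual(I1, I2, u, v):
    """Evaluate the linearised data term I_x*u + I_y*v + I_t per pixel."""
    Iy, Ix = np.gradient(I1)   # spatial derivatives (np.gradient is row-major)
    It = I2 - I1               # temporal derivative between the frames
    return Ix * u + Iy * v + It

# A horizontal ramp I(x, y) = x translated right by one pixel; the true
# flow (u, v) = (1, 0) should drive the residual to zero everywhere.
I1 = np.tile(np.arange(8.0), (8, 1))
I2 = I1 - 1.0
r = linearised_residual(I1, I2, np.ones_like(I1), np.zeros_like(I1))
```

For a linear intensity profile the first-order Taylor approximation is exact, which is why the residual vanishes here; for real images it only holds approximately for small displacements.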
To solve the aforementioned optimization problem, one can use the [[Euler-Lagrange equations]] to provide a system of partial differential equations for each point in <math>I(x, y, t)</math>. In the simplest case of using <math>\Psi(x) = x^2</math>, these equations are,
:<math> I_x(I_xu+I_yv+I_t) - \alpha \Delta u = 0,</math>
:<math> I_y(I_xu+I_yv+I_t) - \alpha \Delta v = 0,</math>
where <math>\Delta</math> denotes the [[Laplace operator]].
Since the image data is made up of discrete pixels, these equations are discretised.
Doing so yields a system of linear equations which can be solved for <math>(u, v)</math> at each pixel, using an iterative scheme such as [[Gauss-Seidel]].<ref name="Horn_1980" />
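The iterative scheme above can be sketched compactly. The following is a minimal illustration (assuming NumPy) of a Jacobi-style variant rather than true Gauss-Seidel, for the choice <math>\Psi(x) = x^2</math>; the discretised Laplacian appears as a local neighbourhood average, and <math>\alpha</math> enters the denominator of the update (some formulations write <math>\alpha^2</math>):

```python
import numpy as np

def local_mean(f):
    # 4-neighbour average with replicated borders; this is the local
    # average that arises when the Laplacians in the Euler-Lagrange
    # equations are discretised.
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(Ix, Iy, It, alpha=1.0, n_iter=50):
    """Jacobi-style Horn-Schunck iteration, assuming precomputed derivatives."""
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    for _ in range(n_iter):
        u_bar, v_bar = local_mean(u), local_mean(v)
        # Shared term from the linearised brightness constancy constraint.
        resid = (Ix * u_bar + Iy * v_bar + It) / (alpha + Ix**2 + Iy**2)
        u = u_bar - Ix * resid
        v = v_bar - Iy * resid
    return u, v

# A horizontal ramp translated right by one pixel: Ix = 1, Iy = 0, It = -1.
Ix = np.ones((8, 8)); Iy = np.zeros((8, 8)); It = -np.ones((8, 8))
u, v = horn_schunck(Ix, Iy, It)
```

On this synthetic input the iteration converges to the constant flow <math>(u, v) = (1, 0)</math>, which satisfies both the data term and the smoothness term exactly.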
An alternate approach is to discretize the optimisation problem and then perform a search of the possible <math>(u, v)</math> values without linearising it.<ref name="Steinbrucker_2009">{{cite conference |url=https://ieeexplore.ieee.org/document/5459364 |title=Large Displacement Optical Flow Computation without Warping |last1=Steinbrücker |first1=Frank |last2=Pock |first2=Thomas |last3=Cremers |first3=Daniel |last4=Weickert |first4=Joachim |date=2009 |publisher=IEEE |book-title=2009 IEEE 12th International Conference on Computer Vision |pages=1609-1614 |conference=2009 IEEE 12th International Conference on Computer Vision}}</ref>
This search is often performed using [[Max-flow min-cut theorem|max-flow min-cut]] algorithms, [[linear programming]] or [[belief propagation]] methods.
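The discrete search idea can be illustrated with a brute-force sketch (assuming NumPy; names are illustrative). Only the discretised data term is minimised here, with wrap-around borders for brevity; practical methods add a smoothness term and optimise the combined cost with the solvers mentioned above:

```python
import numpy as np

def discrete_flow_search(I1, I2, max_disp=2):
    """Exhaustive per-pixel search over integer displacements (data term only)."""
    best = np.full(I1.shape, np.inf)
    u = np.zeros(I1.shape)
    v = np.zeros(I1.shape)
    for du in range(-max_disp, max_disp + 1):
        for dv in range(-max_disp, max_disp + 1):
            # Warp I2 back by the candidate displacement and score the
            # brightness constancy violation at every pixel.
            warped = np.roll(np.roll(I2, -dv, axis=0), -du, axis=1)
            cost = np.abs(warped - I1)
            better = cost < best
            best[better] = cost[better]
            u[better], v[better] = du, dv
    return u, v

# A textured frame with all-distinct values, translated right by one pixel.
I1 = np.arange(36.0).reshape(6, 6)
I2 = np.roll(I1, 1, axis=1)
u, v = discrete_flow_search(I1, I2)
```

Because every pixel value in the test frame is distinct, only the true displacement <math>(1, 0)</math> achieves zero data cost, and the search recovers it at every pixel.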
==== Parametric Models ====
Instead of applying the regularization constraint on a point-by-point basis as in a regularized model, one can group pixels into regions and estimate the motion of these regions.
This is known as a ''parametric model'', since the motion of these regions is [[parameter|parameterized]].
In formulating optical flow estimation in this way, one makes the assumption that the motion field in each region can be fully characterised by a set of parameters.
Therefore, the goal of a parametric model is to estimate the motion parameters that minimise a loss function which can be written as,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} g(x, y) \rho(x, y, I_1, I_2, u_{\boldsymbol{\alpha}}, v_{\boldsymbol{\alpha}}),
</math>
where <math>{\boldsymbol{\alpha}}</math> is the set of parameters determining the motion in the region <math>\mathcal{R}</math>, <math>\rho()</math> is the data cost term, <math>g()</math> is a weighting function that determines the influence of pixel <math>(x, y)</math> on the total cost, and <math>I_1</math> and <math>I_2</math> are frames 1 and 2 from a pair of consecutive frames.
<ref name="Fortun_Survey_2015" />
The simplest parametric model is the [[Lucas-Kanade method]]. This uses rectangular regions and parameterises the motion as purely translational. The Lucas-Kanade method uses the original brightness constancy constraint as the data cost term and selects <math>g(x, y) = 1</math>.
This yields the local loss function,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} | I(x + u_{\boldsymbol{\alpha}}, y + v_{\boldsymbol{\alpha}}, t + 1) - I(x, y, t)| .
</math>
Other possible local loss functions include the negative normalized [[cross-correlation]] between the two frames.<ref>{{cite conference |last=Lucas |first=Bruce D. |last2=Kanade |first2=Takeo |date=1981-08-24 |title=An iterative image registration technique with an application to stereo vision |url=https://dl.acm.org/doi/10.5555/1623264.1623280 |journal=Proceedings of the 7th International Joint Conference on Artificial intelligence - Volume 2 |series=IJCAI'81 |___location=San Francisco, CA, USA |publisher=Morgan Kaufmann Publishers Inc. |pages=674–679}}</ref>
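When the linearised data term is used with a quadratic penalty, the Lucas-Kanade estimate reduces to solving a 2×2 system of normal equations per region. A minimal sketch (assuming NumPy; the second frame is manufactured so that the linearised constraint holds exactly for a known flow, which real frames would only satisfy approximately):

```python
import numpy as np

def lucas_kanade_region(I1, I2):
    """Single translational (u, v) for a whole region, via least squares
    on the linearised brightness constancy with g(x, y) = 1."""
    Iy, Ix = np.gradient(I1)   # spatial derivatives (np.gradient is row-major)
    It = I2 - I1               # temporal derivative
    # Normal equations of sum((Ix*u + Iy*v + It)^2) over the region.
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)

# A quadratic bowl has gradients in both directions, so A is invertible
# (a pure ramp would make A singular: the aperture problem again).
y, x = np.mgrid[0:16, 0:16].astype(float)
I1 = x**2 + y**2
Iy0, Ix0 = np.gradient(I1)
u0, v0 = 0.3, -0.2
I2 = I1 - (Ix0 * u0 + Iy0 * v0)   # satisfies I_x*u0 + I_y*v0 + I_t = 0
u, v = lucas_kanade_region(I1, I2)
```

The 2×2 matrix <math>A</math> is invertible only when the region contains gradients in more than one direction, which is exactly the condition under which the aperture problem is resolved locally.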
===Learning Based Models===
Instead of seeking to model optical flow directly, one can train a [[machine learning]] system to estimate optical flow. Since the introduction of FlowNet<ref>{{Cite conference |last=Dosovitskiy |first=Alexey |last2=Fischer |first2=Philipp |last3=Ilg |first3=Eddy |last4=Hausser |first4=Philip |last5=Hazirbas |first5=Caner |last6=Golkov |first6=Vladimir |last7=Smagt |first7=Patrick van der |last8=Cremers |first8=Daniel |last9=Brox |first9=Thomas |date=2015 |title=FlowNet: Learning Optical Flow with Convolutional Networks |url=https://ieeexplore.ieee.org/document/7410673/ |publisher=IEEE |pages=2758–2766 |doi=10.1109/ICCV.2015.316 |isbn=978-1-4673-8391-2 | conference=2015 IEEE International Conference on Computer Vision (ICCV)}}</ref> in 2015, learning based models have been widely applied to optical flow estimation. Initially, these approaches were based on [[Convolutional neural network|Convolutional Neural Networks]] arranged in a [[U-Net]] architecture. However, since the advent of the [[Transformer (deep learning architecture)|transformer architecture]] in 2017, transformer based models have gained prominence.<ref>{{Cite journal |last=Alfarano |first=Andrea |last2=Maiano |first2=Luca |last3=Papa |first3=Lorenzo |last4=Amerini |first4=Irene |date=2024 |title=Estimating optical flow: A comprehensive review of the state of the art |url=https://linkinghub.elsevier.com/retrieve/pii/S1077314224002418 |journal=Computer Vision and Image Understanding |language=en |volume=249 |pages=104160 |doi=10.1016/j.cviu.2024.104160}}</ref>
== Uses ==