<!-- EDIT BELOW THIS LINE -->
 
This is currently a working draft of some changes for the [[Optical_flow|Optical flow]] page.
 
==Optical Flow==
[[Image:Opticfloweg.png|thumb|right|400px|The optic flow experienced by a rotating observer (in this case a fly). The direction and magnitude of optic flow at each ___location is represented by the direction and length of each arrow.]]
 
'''Optical flow''' or '''optic flow''' is the pattern of apparent [[motion (physics)|motion]] of objects, surfaces, and edges in a visual scene caused by the [[relative motion]] between an observer and a scene.<ref>{{Cite book |url={{google books|plainurl=yes|id=CSgOAAAAQAAJ|pg=PA77|text=optical flow}} |title=Thinking in Perspective: Critical Essays in the Study of Thought Processes |last1=Burton |first1=Andrew |last2=Radford |first2=John |publisher=Routledge |year=1978 |isbn=978-0-416-85840-2}}</ref><ref>{{Cite book |url={{google books|plainurl=yes|id=-I_Hazgqx8QC|pg=PA414|text=optical flow}} |title=Electronic Spatial Sensing for the Blind: Contributions from Perception |last1=Warren |first1=David H. |last2=Strelow |first2=Edward R. |publisher=Springer |year=1985 |isbn=978-90-247-2689-9}}</ref> Optical flow can also be defined as the distribution of apparent velocities of movement of brightness pattern in an image.<ref name="Horn_1980">{{Cite journal |last1=Horn |first1=Berthold K.P. |last2=Schunck |first2=Brian G. |date=August 1981 |title=Determining optical flow |url=http://image.diku.dk/imagecanon/material/HornSchunckOptical_Flow.pdf |journal=Artificial Intelligence |language=en |volume=17 |issue=1–3 |pages=185–203 |doi=10.1016/0004-3702(81)90024-2|hdl=1721.1/6337 }}</ref>
 
The concept of optical flow was introduced by the American psychologist [[James J. Gibson]] in the 1940s to describe the visual stimulus provided to animals moving through the world.<ref>{{Cite book |title=The Perception of the Visual World |last=Gibson |first=J.J. |publisher=Houghton Mifflin |year=1950}}</ref> Gibson stressed the importance of optic flow for [[Affordance|affordance perception]], the ability to discern possibilities for action within the environment. Followers of Gibson and his [[Ecological Psychology|ecological approach]] to psychology have further demonstrated the role of the optical flow stimulus for the perception of movement by the observer in the world; perception of the shape, distance and movement of objects in the world; and the control of [[Animal locomotion|locomotion]].<ref>{{Cite journal |last1=Royden |first1=C. S. |last2=Moore |first2=K. D. |year=2012 |title=Use of speed cues in the detection of moving objects by moving observers |journal=Vision Research |volume=59 |pages=17–24 |doi=10.1016/j.visres.2012.02.006|pmid=22406544 |s2cid=52847487 |doi-access=free }}</ref>
 
The term optical flow is also used by roboticists, encompassing related techniques from image processing and control of navigation including [[motion detection]], [[Image segmentation|object segmentation]], time-to-contact information, focus of expansion calculations, luminance, [[motion compensation|motion compensated]] encoding, and stereo disparity measurement.<ref name="Kelson R. T. Aires, Andre M. Santana, Adelardo A. D. Medeiros 2008">{{Cite book |url=http://www.dca.ufrn.br/~adelardo/artigos/SAC08.pdf |title=Optical Flow Using Color Information |last1=Aires |first1=Kelson R. T. |last2=Santana |first2=Andre M. |last3=Medeiros |first3=Adelardo A. D. |publisher=ACM New York, NY, USA |year=2008 |isbn=978-1-59593-753-7}}</ref><ref name="S. S. Beauchemin, J. L. Barron 1995">{{Cite journal |url=http://portal.acm.org/ft_gateway.cfm?id=212141&type=pdf&coll=GUIDE&dl=GUIDE&CFID=72158298&CFTOKEN=85078203 |title=The computation of optical flow |last1=Beauchemin |first1=S. S. |last2=Barron |first2=J. L. |journal=ACM Computing Surveys |publisher=ACM New York, USA |year=1995|volume=27 |issue=3 |pages=433–466 |doi=10.1145/212094.212141 |s2cid=1334552 |doi-access=free }}</ref>
 
== Estimation ==
 
Optical flow can be estimated in a number of ways. Broadly, optical flow estimation approaches can be divided into machine learning based models (sometimes called data-driven models), classical models (sometimes called knowledge-driven models) which do not use machine learning and hybrid models which use aspects of both learning based models and classical models.<ref name="Zhai_Survey_2021">{{cite journal |last1=Zhai |first1=Mingliang |last2=Xiang |first2=Xuezhi |last3=Lv |first3=Ning |last4=Kong |first4=Xiangdong |title=Optical flow and scene flow estimation: A survey |journal=Pattern Recognition |date=2021 |volume=114 |pages=107861 |doi=10.1016/j.patcog.2021.107861 |url=https://www.sciencedirect.com/science/article/pii/S0031320321000480}}</ref>
 
===Classical Models===
 
Many classical models use the intuitive assumption of ''brightness constancy'': that even if a point moves between frames, its brightness stays constant.<ref name="Fortun_Survey_2015">{{cite journal |last1=Fortun |first1=Denis |last2=Bouthemy |first2=Patrick |last3=Kervrann |first3=Charles|title=Optical flow modeling and computation: A survey |journal=Computer Vision and Image Understanding |date=2015-05-01 |volume=134 |pages=1-21 |doi=10.1016/j.cviu.2015.02.008 |url=https://www.sciencedirect.com/science/article/pii/S1077314215000429 |access-date=2024-12-23}}</ref>
To formalise this intuitive assumption, consider two consecutive frames from a video sequence, with intensity <math>I(x, y, t)</math>, where <math>(x, y)</math> refer to pixel coordinates and <math>t</math> refers to time.
In this case, the brightness constancy constraint is
:<math>
I(x, y, t) - I(x + u, y + v, t + 1) = 0,
</math>
where <math>\mathbf{w}:= (u, v)</math> is the displacement vector between a point in the first frame and the corresponding point in the second frame.
By itself, the brightness constancy constraint cannot be solved for <math>u</math> and <math>v</math> at each pixel, since there is only one equation and two unknowns.
This is known as the ''[[Motion perception#The aperture problem|aperture problem]]''.
Therefore, additional constraints must be imposed to estimate the flow field.<ref name="Brox_2004">{{cite conference |url=http://link.springer.com/10.1007/978-3-540-24673-2_3 |title=High Accuracy Optical Flow Estimation Based on a Theory for Warping |last1=Brox |first1=Thomas |last2=Bruhn |first2=Andrés |last3=Papenberg |first3=Nils |last4=Weickert |first4=Joachim |date=2004 |publisher=Springer Berlin Heidelberg |book-title=Computer Vision - ECCV 2004 |pages=25-36 |___location=Berlin, Heidelberg |conference=ECCV 2004}}</ref><ref name="Baker_2011">{{cite journal |last1=Baker |first1=Simon |last2=Scharstein |first2=Daniel |last3=Lewis |first3=J. P. |last4=Roth |first4=Stefan |last5=Black |first5=Michael J. |last6=Szeliski |first6=Richard |title=A Database and Evaluation Methodology for Optical Flow |journal=International Journal of Computer Vision |date=1 March 2011 |volume=92 |issue=1 |pages=1–31 |doi=10.1007/s11263-010-0390-2 |url=https://link.springer.com/article/10.1007/s11263-010-0390-2 |access-date=25 Dec 2024 |language=en |issn=1573-1405}}</ref>
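The aperture problem can be illustrated with a small, self-contained numerical example (not drawn from the cited sources; the frame values and shift below are purely hypothetical). For a frame pair whose brightness varies only horizontally, brightness constancy is satisfied by many different vertical displacements:

```python
import numpy as np

# Hypothetical frame pair with purely vertical structure: I1 depends only
# on the x coordinate, and frame 2 is frame 1 shifted right by one pixel.
x = np.arange(16)
I1 = np.tile(np.sin(x / 3.0), (16, 1))   # constant along y
I2 = np.roll(I1, 1, axis=1)              # shifted right by 1 px (wraps around)

def max_residual(u, v):
    # largest brightness constancy violation |I1(x, y) - I2(x + u, y + v)|
    # over the image, for integer displacements (u, v)
    shifted = np.roll(np.roll(I2, -u, axis=1), -v, axis=0)
    return float(np.abs(I1 - shifted).max())

# (u, v) = (1, 0) satisfies brightness constancy exactly, but so does
# (1, 5): with no vertical image structure, v is left unconstrained.
```

Because the image has no gradient along <math>y</math>, any choice of <math>v</math> leaves the constraint satisfied, which is exactly why an additional constraint on the flow field is needed.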
 
==== Regularized Models ====
Perhaps the most natural approach to addressing the aperture problem is to apply a smoothness constraint or a ''regularization constraint'' to the flow field.
One can combine both of these constraints to formulate optical flow estimation as an [[Optimization problem|optimization problem]], where the goal is to minimize a cost function of the form,
:<math>E = \iint_\Omega \Psi(I(x + u, y + v, t + 1) - I(x, y, t)) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy, </math>
where <math>\Omega</math> is the extent of the images <math>I(x, y)</math>, <math>\nabla</math> is the gradient operator, <math>\alpha</math> is a constant, and <math>\Psi()</math> is a [[loss function|loss function]].<ref name="Fortun_Survey_2015" /><ref name="Brox_2004" />
 
This optimisation problem is difficult to solve owing to its non-linearity.
To simplify the problem, the brightness constancy constraint is commonly linearised using a first-order [[Taylor series|Taylor expansion]], which gives the constraint
:<math>\frac{\partial I}{\partial x}u+\frac{\partial I}{\partial y}v+\frac{\partial I}{\partial t} = 0.</math>
For convenience, the derivatives of the image, <math>\tfrac{\partial I}{\partial x}</math>, <math>\tfrac{\partial I}{\partial y}</math> and <math>\tfrac{\partial I}{\partial t}</math> are often condensed to become <math>I_x</math>, <math>I_y</math> and <math> I_t</math>.
Doing so allows one to rewrite the linearised brightness constancy constraint as<ref name="Baker_2011" />
:<math>I_x u + I_y v+ I_t = 0.</math>
The optimization problem can now be rewritten as
:<math>E = \iint_\Omega \Psi(I_x u + I_y v + I_t) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) dx dy. </math>
For the choice of <math>\Psi(x) = x^2</math>, this method is the same as the [[Horn-Schunck method]].<ref name="Horn_1980" />
Other choices of cost function have also been used, such as <math>\Psi(x) = \sqrt{x^2 + \epsilon^2}</math>, which is a differentiable variant of the [[Taxicab geometry |<math>L^1</math> norm]].<ref name="Fortun_Survey_2015" /><ref>{{cite conference |url=https://ieeexplore.ieee.org/abstract/document/5539939 |title=Secrets of optical flow estimation and their principles |last1=Sun |first1=Deqing |last2=Roth |first2=Stefan |last3=Black |first3=Michael J. |date=2010 |publisher=IEEE |book-title=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |pages=2432-2439 |___location=San Francisco, CA, USA |conference=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition}}</ref>
 
To solve the aforementioned optimization problem, one can use the [[Euler-Lagrange equations]] to provide a system of partial differential equations for each point in <math>I(x, y, t)</math>. In the simplest case of using <math>\Psi(x) = x^2</math>, these equations are,
:<math> I_x(I_xu+I_yv+I_t) - \alpha \Delta u = 0,</math>
:<math> I_y(I_xu+I_yv+I_t) - \alpha \Delta v = 0,</math>
where <math>\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} </math> denotes the [[Laplace operator]].
Since the image data is made up of discrete pixels, these equations are discretised.
Doing so yields a system of linear equations which can be solved for <math>(u, v)</math> at each pixel using an iterative scheme such as [[Gauss-Seidel]].<ref name="Horn_1980" />
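As an illustrative sketch (not taken from the cited sources), the discretised update for <math>\Psi(x) = x^2</math> can be written in a few lines. Here a Jacobi-style iteration stands in for Gauss-Seidel, and the derivative estimates, parameter values and stopping rule are all simplifying assumptions:

```python
import numpy as np

def horn_schunck(I1, I2, alpha=0.1, n_iter=500):
    """Minimal Horn-Schunck sketch with the quadratic penalty Psi(x) = x^2
    and a Jacobi-style update (a stand-in for Gauss-Seidel)."""
    I1, I2 = I1.astype(float), I2.astype(float)
    Ix = np.gradient(I1, axis=1)   # spatial derivative I_x
    Iy = np.gradient(I1, axis=0)   # spatial derivative I_y
    It = I2 - I1                   # temporal derivative I_t
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def nbr_mean(f):
        # mean of the 4-neighbourhood, with edge replication at the border
        p = np.pad(f, 1, mode='edge')
        return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

    for _ in range(n_iter):
        ubar, vbar = nbr_mean(u), nbr_mean(v)
        # classic Horn-Schunck update: project the neighbourhood mean back
        # onto the linearised brightness constancy constraint
        common = (Ix * ubar + Iy * vbar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = ubar - Ix * common
        v = vbar - Iy * common
    return u, v
```

In textured regions the iterate approaches the true displacement; in flat regions the data term vanishes and the flow fills in slowly by diffusion from the smoothness term.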
 
Although linearising the brightness constancy constraint simplifies the optimisation problem significantly, the linearisation is only valid for small displacements and/or smooth images. To avoid this problem, a multi-scale or coarse-to-fine approach is often used. In such a scheme, the images are initially [[downsampling|downsampled]] and the linearised Euler-Lagrange equations are solved at the reduced resolution. The estimated flow field at this scale is then used to initialise the process at the next scale.<ref>{{cite journal |last1=Meinhardt-Llopis |first1=Enric |last2=Pérez |first2=Javier Sánchez |last3=Kondermann |first3=Daniel |title=Horn-Schunck Optical Flow with a Multi-Scale Strategy |journal=Image Processing On Line |date=19 July 2013 |volume=3 |pages=151–172 |doi=10.5201/ipol.2013.20}}</ref> This initialisation process is often performed by [[image warping|warping]] one frame using the current estimate of the flow field so that it is as similar to the other frame as possible.<ref name="Brox_2004" /><ref>{{cite journal |last1=Black |first1=Michael J. |last2=Anandan |first2=P. |title=The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields |journal=Computer Vision and Image Understanding |date=1 January 1996 |volume=63 |issue=1 |pages=75–104 |doi=10.1006/cviu.1996.0006 |issn=1077-3142}}</ref>
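The coarse-to-fine strategy can be sketched as a skeleton that wraps any single-scale estimator. The downsampling and upsampling choices below (block averaging and nearest-neighbour) are simplified stand-ins rather than any particular published scheme, and `estimate_flow` is a hypothetical callable:

```python
import numpy as np

def downsample(img):
    # 2x2 block average, a simple stand-in for proper low-pass downsampling
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_flow(u, v):
    # nearest-neighbour upsampling; displacements double with resolution
    return (2 * u.repeat(2, axis=0).repeat(2, axis=1),
            2 * v.repeat(2, axis=0).repeat(2, axis=1))

def coarse_to_fine(I1, I2, estimate_flow, n_levels=3):
    """Skeleton of the multi-scale strategy: estimate at the coarsest level,
    then refine at each finer level. estimate_flow(J1, J2, u0, v0) is any
    single-scale estimator that accepts an initial flow (u0, v0)."""
    pyramid = [(I1, I2)]
    for _ in range(n_levels - 1):
        pyramid.append((downsample(pyramid[-1][0]), downsample(pyramid[-1][1])))
    u = np.zeros_like(pyramid[-1][0])
    v = np.zeros_like(pyramid[-1][0])
    for J1, J2 in reversed(pyramid):       # coarsest level first
        if u.shape != J1.shape:
            u, v = upsample_flow(u, v)     # carry the estimate up a level
        u, v = estimate_flow(J1, J2, u, v)
    return u, v
```

A full implementation would also warp the second frame by the current flow before each refinement, as described above.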
 
An alternate approach is to discretize the optimisation problem and then perform a search of the possible <math>(u, v)</math> values without linearising it.<ref name="Steinbrucker_2009">{{cite conference |url=https://ieeexplore.ieee.org/document/5459364 |title=Large Displacement Optical Flow Computation without Warping |last1=Steinbrücker |first1=Frank |last2=Pock |first2=Thomas |last3=Cremers |first3=Daniel |last4=Weickert |first4=Joachim |date=2009 |publisher=IEEE |book-title=2009 IEEE 12th International Conference on Computer Vision |pages=1609-1614 |conference=2009 IEEE 12th International Conference on Computer Vision}}</ref>
This search is often performed using [[Max-flow min-cut theorem|max-flow min-cut]] algorithms, [[linear programming]] or [[belief propagation]] methods.
 
==== Parametric Models ====
 
Instead of applying the regularization constraint on a point-by-point basis as in a regularized model, one can group pixels into regions and estimate the motion of these regions.
This is known as a ''parametric model'', since the motion of these regions is [[parameter|parameterized]].
In formulating optical flow estimation in this way, one assumes that the motion field in each region can be fully characterised by a set of parameters.
Therefore, the goal of a parametric model is to estimate the motion parameters that minimise a loss function which can be written as,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} g(x, y) \rho(x, y, I_1, I_2, u_{\boldsymbol{\alpha}}, v_{\boldsymbol{\alpha}}),
</math>
where <math>{\boldsymbol{\alpha}}</math> is the set of parameters determining the motion in the region <math>\mathcal{R}</math>, <math>\rho()</math> is the data cost term, <math>g()</math> is a weighting function that determines the influence of pixel <math>(x, y)</math> on the total cost, and <math>I_1</math> and <math>I_2</math> are frames 1 and 2 from a pair of consecutive frames.<ref name="Fortun_Survey_2015" />
 
The simplest parametric model is the [[Lucas-Kanade method]]. This uses rectangular regions and parameterises the motion as purely translational. The Lucas-Kanade method uses the original brightness constancy constraint as the data cost term and selects <math>g(x, y) = 1</math>.
This yields the local loss function,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} | I(x + u_{\boldsymbol{\alpha}}, y + v_{\boldsymbol{\alpha}}, t + 1) - I(x, y, t)| .
</math>
Other possible local loss functions include the negative normalized [[cross-correlation]] between the two frames.<ref>{{cite conference |last=Lucas |first=Bruce D. |last2=Kanade |first2=Takeo |date=1981-08-24 |title=An iterative image registration technique with an application to stereo vision |url=https://dl.acm.org/doi/10.5555/1623264.1623280 |journal=Proceedings of the 7th International Joint Conference on Artificial intelligence - Volume 2 |series=IJCAI'81 |___location=San Francisco, CA, USA |publisher=Morgan Kaufmann Publishers Inc. |pages=674–679}}</ref>
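A minimal sketch of the Lucas-Kanade estimate for a single window, using the linearised brightness constancy constraint and linear least squares (not from the cited sources; the derivative estimates and window size are simplifying assumptions):

```python
import numpy as np

def lucas_kanade(I1, I2, x, y, win=15):
    """Translational flow for one window centred at column x, row y,
    solved by least squares on the linearised constraint Ix*u + Iy*v = -It."""
    I1, I2 = I1.astype(float), I2.astype(float)
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)  # one row per pixel
    b = -It[sl].ravel()
    # well-posed only when the window contains gradients in more than one
    # direction; otherwise this hits the aperture problem again
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

The matrix <math>A^\top A</math> formed here is singular for windows whose gradients all point one way, which is the aperture problem reappearing at the window level.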
 
===Learning-Based Models===
 
Instead of seeking to model optical flow directly, one can train a [[machine learning]] system to estimate optical flow. Since 2015, when FlowNet<ref>{{Cite conference |last=Dosovitskiy |first=Alexey |last2=Fischer |first2=Philipp |last3=Ilg |first3=Eddy |last4=Hausser |first4=Philip |last5=Hazirbas |first5=Caner |last6=Golkov |first6=Vladimir |last7=Smagt |first7=Patrick van der |last8=Cremers |first8=Daniel |last9=Brox |first9=Thomas |date=2015 |title=FlowNet: Learning Optical Flow with Convolutional Networks |url=https://ieeexplore.ieee.org/document/7410673/ |publisher=IEEE |pages=2758–2766 |doi=10.1109/ICCV.2015.316 |isbn=978-1-4673-8391-2 | conference=2015 IEEE International Conference on Computer Vision (ICCV)}}</ref> was proposed, learning-based models have been applied to optical flow and have gained prominence. Initially, these approaches were based on [[Convolutional neural network|convolutional neural networks]] arranged in a [[U-Net]] architecture. However, with the advent of the [[Transformer (deep learning architecture)|transformer architecture]] in 2017, transformer-based models have become increasingly popular.<ref>{{Cite journal |last=Alfarano |first=Andrea |last2=Maiano |first2=Luca |last3=Papa |first3=Lorenzo |last4=Amerini |first4=Irene |date=2024 |title=Estimating optical flow: A comprehensive review of the state of the art |url=https://linkinghub.elsevier.com/retrieve/pii/S1077314224002418 |journal=Computer Vision and Image Understanding |language=en |volume=249 |pages=104160 |doi=10.1016/j.cviu.2024.104160}}</ref>
 
Most learning-based approaches to optical flow use [[supervised learning]]. In this case, many frame pairs of video data and their corresponding [[ground truth|ground-truth]] flow fields are used to optimise the parameters of the learning-based model to accurately estimate optical flow. This process often relies on vast training datasets due to the number of parameters involved.<ref>{{cite journal |last1=Tu |first1=Zhigang |last2=Xie |first2=Wei |last3=Zhang |first3=Dejun |last4=Poppe |first4=Ronald |last5=Veltkamp |first5=Remco C. |last6=Li |first6=Baoxin |last7=Yuan |first7=Junsong |title=A survey of variational and CNN-based optical flow techniques |journal=Signal Processing: Image Communication |date=1 March 2019 |volume=72 |pages=9–24 |doi=10.1016/j.image.2018.12.002}}</ref>
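A common choice of supervised loss and evaluation metric is the average endpoint error between predicted and ground-truth flow fields. A minimal sketch (the array layout, with flow stored as an <math>H \times W \times 2</math> array, is an assumption for illustration):

```python
import numpy as np

def average_endpoint_error(flow_pred, flow_gt):
    """Average endpoint error (AEPE): the mean Euclidean distance between
    predicted and ground-truth flow vectors over all pixels.
    Both arrays have shape (H, W, 2), holding (u, v) per pixel."""
    diff = flow_pred - flow_gt
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())
```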
 
== Uses ==
 
{{distinguish|Optical flowmeter}}
 
Various configurations of optical flow sensors exist. One configuration is an image sensor chip connected to a processor programmed to run an optical flow algorithm. Another configuration uses a vision chip, which is an integrated circuit having both the [[image sensor]] and the processor on the same die, allowing for a compact implementation.<ref>{{Cite book |title=Vision Chips |last=Moini |first=Alireza |date=2000 |publisher=Springer US |isbn=9781461552673 |___location=Boston, MA |oclc=851803922}}</ref><ref>{{Cite book |title=Analog VLSI and neural systems |last=Mead |first=Carver |date=1989 |publisher=Addison-Wesley |isbn=0201059924 |___location=Reading, Mass. |oclc=17954003 |url-access=registration |url=https://archive.org/details/analogvlsineural00mead }}</ref> An example of this is the sensor used in an [[optical mouse]]. In some cases, the processing circuitry may be implemented using analog or mixed-signal circuits to enable fast optical flow computation using minimal current consumption.<!--Optical flow sensors are used as the main sensing component for measuring the motion of the mouse across a surface. (from a short paragraph related to optical mice. It's best to merge the two together somehow to make the paragraph more focused)-->
 
One area of contemporary research is the use of [[neuromorphic engineering]] techniques to implement circuits that respond to optical flow, and thus may be appropriate for use in an optical flow sensor.<ref>{{Cite book |title=Analog VLSI circuits for the perception of visual motion |last=Stocker |first=Alan A. |date=2006 |publisher=John Wiley & Sons |isbn=0470034882 |___location=Chichester, England |oclc=71521689}}</ref> Such circuits may draw inspiration from biological neural circuitry that similarly responds to optical flow.
 
Optical flow sensors are used extensively in computer [[optical mouse|optical mice]], as the main sensing component for measuring the motion of the mouse across a surface.
 
Optical flow sensors are also being used in [[robotics]] applications, primarily where there is a need to measure visual motion or relative motion between the robot and other objects in the vicinity of the robot. The use of optical flow sensors in [[unmanned aerial vehicle| unmanned aerial vehicles (UAVs)]], for stability and obstacle avoidance, is also an area of current research.<ref>{{Cite book |title=Flying insects and robots |date=2009 |publisher=Springer |isbn=9783540893936 |editor-last=Floreano |editor-first=Dario |___location=Heidelberg |oclc=495477442 |editor-last2=Zufferey |editor-first2=Jean-Christophe |editor-last3=Srinivasan |editor-first3=Mandyam V. |editor-last4=Ellington |editor-first4=Charlie}}</ref>
== References ==
 
{{reflist|1=2}}
 
<!--