Optical flow

== Estimation ==
 
Optical flow can be estimated in a number of ways. Broadly, approaches can be divided into machine-learning-based models (sometimes called data-driven models), classical models (sometimes called knowledge-driven models), which do not use machine learning, and hybrid models, which combine aspects of both.<ref>{{cite journal |last1=Zhai |first1=Mingliang |last2=Xiang |first2=Xuezhi |last3=Lv |first3=Ning |last4=Kong |first4=Xiangdong |title=Optical flow and scene flow estimation: A survey |journal=Pattern Recognition |date=2021 |volume=114 |pages=107861 |doi=10.1016/j.patcog.2021.107861 |bibcode=2021PatRe.11407861Z |url=https://www.sciencedirect.com/science/article/pii/S0031320321000480}}</ref>
 
===Classical Models===
 
Many classical models use the intuitive assumption of ''brightness constancy'': that even if a point moves between frames, its brightness stays constant.<ref name="Fortun_Survey_2015">{{cite journal |last1=Fortun |first1=Denis |last2=Bouthemy |first2=Patrick |last3=Kervrann |first3=Charles |title=Optical flow modeling and computation: A survey |journal=Computer Vision and Image Understanding |date=2015-05-01 |volume=134 |pages=1–21 |doi=10.1016/j.cviu.2015.02.008 |url=https://www.sciencedirect.com/science/article/pii/S1077314215000429 |access-date=2024-12-23}}</ref>
To formalise this intuitive assumption, consider two consecutive frames from a video sequence, with intensity <math>I(x, y, t)</math>, where <math>(x, y)</math> refer to pixel coordinates and <math>t</math> refers to time.
In this case, the brightness constancy constraint is
:<math>I(x, y, t) = I(x + u, y + v, t + 1),</math>
where <math>(u, v)</math> is the flow vector at that pixel. A first-order [[Taylor series|Taylor expansion]] of this constraint gives its linearised form,
:<math>I_x u + I_y v + I_t = 0,</math>
where <math>I_x</math>, <math>I_y</math> and <math>I_t</math> are the partial derivatives of the image intensity with respect to <math>x</math>, <math>y</math> and <math>t</math>.
By itself, the brightness constancy constraint cannot be solved for <math>u</math> and <math>v</math> at each pixel, since there is only one equation and two unknowns.
This is known as the ''[[Motion perception#The aperture problem|aperture problem]]''.
Therefore, additional constraints must be imposed to estimate the flow field.<ref name="Brox_2004">{{cite conference |url=http://link.springer.com/10.1007/978-3-540-24673-2_3 |title=High Accuracy Optical Flow Estimation Based on a Theory for Warping |last1=Brox |first1=Thomas |last2=Bruhn |first2=Andrés |last3=Papenberg |first3=Nils |last4=Weickert |first4=Joachim |date=2004 |publisher=Springer Berlin Heidelberg |book-title=Computer Vision - ECCV 2004 |pages=25–36 |___location=Berlin, Heidelberg |doi=10.1007/978-3-540-24673-2_3 |conference=ECCV 2004}}</ref><ref name="Baker_2011">{{cite journal |last1=Baker |first1=Simon |last2=Scharstein |first2=Daniel |last3=Lewis |first3=J. P. |last4=Roth |first4=Stefan |last5=Black |first5=Michael J. |last6=Szeliski |first6=Richard |title=A Database and Evaluation Methodology for Optical Flow |journal=International Journal of Computer Vision |date=1 March 2011 |volume=92 |issue=1 |pages=1–31 |doi=10.1007/s11263-010-0390-2 |url=https://link.springer.com/article/10.1007/s11263-010-0390-2 |access-date=25 Dec 2024 |language=en |issn=1573-1405}}</ref>
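The under-determination can be checked numerically. The sketch below (an illustrative example with hypothetical derivative values, not from any cited source) uses the standard linearised form of the brightness constancy constraint, <math>I_x u + I_y v + I_t = 0</math>, and shows that a single pixel's constraint admits infinitely many flow vectors:

```python
# Illustrative sketch: at one pixel, the linearised brightness constancy
# constraint Ix*u + Iy*v + It = 0 is a single equation in two unknowns,
# so the flow component parallel to the image edge is unobservable.
Ix, Iy, It = 1.0, 0.0, -2.0   # hypothetical image derivatives at one pixel

def satisfies_constraint(u, v, tol=1e-9):
    return abs(Ix * u + Iy * v + It) < tol

# The true motion (2, 0) satisfies the constraint...
print(satisfies_constraint(2.0, 0.0))   # True
# ...but so does any vector with the same component along the gradient,
# e.g. (2, 5): this ambiguity is the aperture problem.
print(satisfies_constraint(2.0, 5.0))   # True
```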
 
==== Regularized Models ====
Regularised models estimate the flow field by minimising an energy functional of the form
:<math>E = \iint_\Omega \Psi(I_x u + I_y v + I_t) + \alpha \Psi(|\nabla u|) + \alpha \Psi(|\nabla v|) \, dx \, dy. </math>
For the choice of <math>\Psi(x) = x^2</math>, this method is the same as the [[Horn-Schunck method]].<ref name="Horn_1980" />
Other penalty functions have also been used, such as <math>\Psi(x) = \sqrt{x^2 + \epsilon^2}</math>, a differentiable approximation of the [[Taxicab geometry|<math>L^1</math> norm]].<ref name="Fortun_Survey_2015" /><ref>{{cite conference |url=https://ieeexplore.ieee.org/abstract/document/5539939 |title=Secrets of optical flow estimation and their principles |last1=Sun |first1=Deqing |last2=Roth |first2=Stefan |last3=Black |first3=Michael J. |date=2010 |publisher=IEEE |book-title=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |pages=2432–2439 |___location=San Francisco, CA, USA |conference=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition}}</ref>
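The two penalty functions mentioned above can be compared directly. The sketch below (illustrative, with hypothetical function names) implements the quadratic penalty and the differentiable <math>L^1</math> approximation, often called the Charbonnier penalty:

```python
import numpy as np

# Illustrative sketch of two penalty functions Psi(x): the quadratic
# penalty, under which outliers dominate the energy, and the Charbonnier
# penalty, a smooth approximation of |x| that is robust to outliers.
def psi_quadratic(x):
    return x ** 2

def psi_charbonnier(x, eps=1e-3):
    return np.sqrt(x ** 2 + eps ** 2)

x = np.linspace(-2.0, 2.0, 5)
print(psi_quadratic(x))     # grows quadratically with the residual
print(psi_charbonnier(x))   # grows roughly linearly for |x| >> eps
```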
 
To solve the aforementioned optimization problem, one can use the [[Euler-Lagrange equations]] to provide a system of partial differential equations for each point in <math>I(x, y, t)</math>. In the simplest case of using <math>\Psi(x) = x^2</math>, these equations are,
:<math>I_x (I_x u + I_y v + I_t) - \alpha \Delta u = 0,</math>
:<math>I_y (I_x u + I_y v + I_t) - \alpha \Delta v = 0,</math>
where <math>\Delta</math> denotes the [[Laplace operator]]. In practice, the Laplacian is approximated with [[finite difference]]s, often as the difference between a pixel's flow value and the average of its neighbours' values.
Doing so yields a system of linear equations which can be solved for <math>(u, v)</math> at each pixel, using an iterative scheme such as [[Gauss-Seidel]].<ref name="Horn_1980" />
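The iterative scheme can be sketched as follows. This is an illustrative Horn–Schunck-style implementation (a Jacobi-type relaxation with periodic boundaries for brevity), not a reference implementation; `Ix`, `Iy`, `It` are precomputed image derivatives and `alpha` weights the smoothness term:

```python
import numpy as np

# Minimal Horn-Schunck-style iteration sketch. Each sweep replaces the
# flow at every pixel using the neighbour average (u_bar, v_bar) and the
# brightness constancy residual, relaxing the linear system that the
# Euler-Lagrange equations yield.
def horn_schunck(Ix, Iy, It, alpha=1.0, n_iter=200):
    u = np.zeros_like(Ix, dtype=float)
    v = np.zeros_like(Ix, dtype=float)
    for _ in range(n_iter):
        # Average of the four neighbours (periodic boundaries for brevity).
        u_bar = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_bar = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                 np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```

For a pure horizontal translation of a linear ramp (`Ix = 1`, `Iy = 0`, `It = -1` everywhere), the iteration converges to the correct uniform flow `u = 1, v = 0`.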
 
Although linearising the brightness constancy constraint simplifies the optimisation problem significantly, the linearisation is only valid for small displacements and/or smooth images. To avoid this problem, a multi-scale or coarse-to-fine approach is often used. In such a scheme, the images are initially [[downsampling|downsampled]] and the linearised Euler-Lagrange equations are solved at the reduced resolution. The estimated flow field at this scale is then used to initialise the process at the next scale.<ref>{{cite journal |last1=Meinhardt-Llopis |first1=Enric |last2=Pérez |first2=Javier Sánchez |last3=Kondermann |first3=Daniel |title=Horn-Schunck Optical Flow with a Multi-Scale Strategy |journal=Image Processing On Line |date=19 July 2013 |volume=3 |pages=151–172 |doi=10.5201/ipol.2013.20}}</ref> This initialisation is often performed by [[image warping|warping]] one frame using the current estimate of the flow field so that it is as similar to the other frame as possible.<ref name="Brox_2004" /><ref>{{cite journal |last1=Black |first1=Michael J. |last2=Anandan |first2=P. |title=The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields |journal=Computer Vision and Image Understanding |date=1 January 1996 |volume=63 |issue=1 |pages=75–104 |doi=10.1006/cviu.1996.0006 |issn=1077-3142}}</ref>
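The coarse-to-fine structure can be sketched independently of the single-scale solver. In the illustrative skeleton below, `estimate_flow` is a hypothetical stand-in for any single-scale method; the essential idea is that flow estimated at a coarse level is upsampled (with its values doubled, since displacements scale with resolution) to initialise the next finer level:

```python
import numpy as np

# Illustrative coarse-to-fine pyramid skeleton. `estimate_flow(I1, I2, u0, v0)`
# is a placeholder for any single-scale solver that refines an initial flow.
def downsample(img):
    # 2x2 block averaging (crop odd rows/cols for simplicity).
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_flow(f):
    # Double both the resolution and the magnitude of the displacements.
    return 2.0 * np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

def coarse_to_fine(I1, I2, estimate_flow, n_levels=3):
    pyr1, pyr2 = [I1], [I2]
    for _ in range(n_levels - 1):
        pyr1.append(downsample(pyr1[-1]))
        pyr2.append(downsample(pyr2[-1]))
    u = np.zeros_like(pyr1[-1])
    v = np.zeros_like(pyr1[-1])
    for lvl in reversed(range(n_levels)):
        u, v = estimate_flow(pyr1[lvl], pyr2[lvl], u, v)
        if lvl > 0:
            u, v = upsample_flow(u), upsample_flow(v)
    return u, v
```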
 
An alternative approach is to discretise the optimisation problem and then search over the possible <math>(u, v)</math> values directly, without linearising the constraint.<ref>{{cite conference |url=https://ieeexplore.ieee.org/document/5459364 |title=Large Displacement Optical Flow Computation without Warping |last1=Steinbrücker |first1=Frank |last2=Pock |first2=Thomas |last3=Cremers |first3=Daniel |last4=Weickert |first4=Joachim |date=2009 |publisher=IEEE |book-title=2009 IEEE 12th International Conference on Computer Vision |pages=1609–1614 |conference=2009 IEEE 12th International Conference on Computer Vision}}</ref>
This search is often performed using algorithms based on the [[max-flow min-cut theorem]], [[linear programming]], or [[belief propagation]].
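The discrete search idea is easiest to see with an exhaustive version at a single pixel. The sketch below (illustrative only; practical methods use the graph-based or message-passing algorithms named above rather than brute force) evaluates the brightness constancy cost for every integer candidate displacement in a window:

```python
import numpy as np

# Illustrative brute-force discrete search for the flow at pixel (x, y):
# evaluate |I2(x+u, y+v) - I1(x, y)| for every integer (u, v) candidate
# in a window and keep the best. Images are indexed as I[y, x].
def best_integer_flow(I1, I2, x, y, max_disp=3):
    best_cost, best_uv = np.inf, (0, 0)
    for u in range(-max_disp, max_disp + 1):
        for v in range(-max_disp, max_disp + 1):
            xn, yn = x + u, y + v
            if 0 <= xn < I2.shape[1] and 0 <= yn < I2.shape[0]:
                cost = abs(float(I2[yn, xn]) - float(I1[y, x]))
                if cost < best_cost:
                    best_cost, best_uv = cost, (u, v)
    return best_uv
```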
 
In local, parametric models, the flow over a small region <math>\mathcal{R}</math> of the image is described by a parameter vector <math>\boldsymbol{\alpha}</math>, which can be estimated by minimising a local loss such as the sum of absolute differences,
:<math>
\hat{\boldsymbol{\alpha}} = \arg \min_{\boldsymbol{\alpha}} \sum_{(x, y) \in \mathcal{R}} \left| I(x + u_{\boldsymbol{\alpha}}, y + v_{\boldsymbol{\alpha}}, t + 1) - I(x, y, t) \right| .
</math>
Other possible local loss functions include the negative normalized [[cross-correlation]] between the two frames.<ref>{{cite conference |last1=Lucas |first1=Bruce D. |last2=Kanade |first2=Takeo |date=1981-08-24 |title=An iterative image registration technique with an application to stereo vision |url=https://dl.acm.org/doi/10.5555/1623264.1623280 |book-title=Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2 |series=IJCAI'81 |___location=San Francisco, CA, USA |publisher=Morgan Kaufmann Publishers Inc. |pages=674–679}}</ref>
 
===Learning-Based Models===
 
Instead of seeking to model optical flow directly, one can train a [[machine learning]] system to estimate optical flow. Since 2015, when FlowNet<ref>{{Cite conference |last1=Dosovitskiy |first1=Alexey |last2=Fischer |first2=Philipp |last3=Ilg |first3=Eddy |last4=Hausser |first4=Philip |last5=Hazirbas |first5=Caner |last6=Golkov |first6=Vladimir |last7=Smagt |first7=Patrick van der |last8=Cremers |first8=Daniel |last9=Brox |first9=Thomas |date=2015 |title=FlowNet: Learning Optical Flow with Convolutional Networks |url=https://ieeexplore.ieee.org/document/7410673/ |publisher=IEEE |pages=2758–2766 |doi=10.1109/ICCV.2015.316 |isbn=978-1-4673-8391-2 |conference=2015 IEEE International Conference on Computer Vision (ICCV)}}</ref> was proposed, learning-based models have been applied to optical flow and have gained prominence. Initially, these approaches were based on [[Convolutional neural network|convolutional neural networks]] arranged in a [[U-Net]] architecture. However, since the advent of the [[Transformer (deep learning architecture)|transformer architecture]] in 2017, transformer-based models have become increasingly popular.<ref>{{Cite journal |last1=Alfarano |first1=Andrea |last2=Maiano |first2=Luca |last3=Papa |first3=Lorenzo |last4=Amerini |first4=Irene |date=2024 |title=Estimating optical flow: A comprehensive review of the state of the art |url=https://linkinghub.elsevier.com/retrieve/pii/S1077314224002418 |journal=Computer Vision and Image Understanding |language=en |volume=249 |pages=104160 |doi=10.1016/j.cviu.2024.104160}}</ref>
 
Most learning-based approaches to optical flow use [[supervised learning]]. In this case, many frame pairs of video data and their corresponding [[ground truth|ground-truth]] flow fields are used to optimise the parameters of the learning-based model to accurately estimate optical flow. This process often relies on vast training datasets due to the number of parameters involved.<ref>{{cite journal |last1=Tu |first1=Zhigang |last2=Xie |first2=Wei |last3=Zhang |first3=Dejun |last4=Poppe |first4=Ronald |last5=Veltkamp |first5=Remco C. |last6=Li |first6=Baoxin |last7=Yuan |first7=Junsong |title=A survey of variational and CNN-based optical flow techniques |journal=Signal Processing: Image Communication |date=1 March 2019 |volume=72 |pages=9–24 |doi=10.1016/j.image.2018.12.002}}</ref>
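A common supervised training and evaluation loss for optical flow is the average endpoint error (EPE): the mean Euclidean distance between predicted and ground-truth flow vectors. The sketch below is an illustrative implementation:

```python
import numpy as np

# Illustrative average endpoint error (EPE): the mean Euclidean distance
# between predicted and ground-truth flow vectors over all pixels.
def average_epe(flow_pred, flow_gt):
    # Flow arrays have shape (H, W, 2), storing (u, v) per pixel.
    return np.sqrt(np.sum((flow_pred - flow_gt) ** 2, axis=-1)).mean()

pred = np.zeros((4, 4, 2))          # predict zero flow everywhere
gt = np.zeros((4, 4, 2))
gt[..., 0] = 3.0                    # ground-truth flow is (3, 4) everywhere
gt[..., 1] = 4.0
print(average_epe(pred, gt))        # 5.0: each vector is off by sqrt(3^2 + 4^2)
```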