Graph cuts in computer vision: Difference between revisions

Content deleted Content added
top: make statement more precise + provide citation
Bender the Bot (talk | contribs)
m History: HTTP to HTTPS for Brown University
 
(46 intermediate revisions by 23 users not shown)
Line 1:
As applied in the field of [[computer vision]], '''[[Cut (graph theory)|graphcut cutsoptimization]]''' can be employed to [[Polynomial time|efficiently]] solve a wide variety of low-level computer vision problems (''early vision''<ref>Adelson, Edward H., and James R. Bergen (1991), "[http://persci.mit.edu/pub_pdfs/elements91.pdf The plenoptic function and the elements of early vision]", Computational models of visual processing 1.2 (1991).</ref>), such as image [[image smoothing]], the stereo [[correspondence problem]], [[image segmentation]], [[object co-segmentation]], and many other computer vision problems that can be formulated in terms of [[energy minimization]]. Many of these energy minimization problems can be approximated by solving a [[maximum flow problem]] in a [[Graph (discrete mathematics)|graph]]<ref>Boykov, Y., Veksler, O., and Zabih, R. (2001), "Fast approximate energy minimization via graph cuts," ''IEEE Trans. Pattern Analysis and Machine Intelligence,'' 23(11): 1222-1239.</ref> (and thus, by the [[max-flow min-cut theorem]], define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the [[MAP estimate|maximum a posteriori estimate]] of a solution. Although many computer vision algorithms involve cutting a graph (e.g., normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as [[graph partitioning]] algorithms).
 
Many of these energy minimization problems can be approximated by solving a [[maximum flow problem]] in a [[Graph (discrete mathematics)|graph]]<ref>Boykov, Y., Veksler, O., and Zabih, R. (2001), "[https://www.cs.cornell.edu/rdz/Papers/BVZ-pami01-final.pdf Fast approximate energy minimization via graph cuts]," ''IEEE Transactions on Pattern Analysis and Machine Intelligence,'' 23(11): 1222-1239.</ref> (and thus, by the [[max-flow min-cut theorem]], define a minimal [[cut (graph theory)|cut]] of the graph).
"Binary" problems (such as denoising a [[binary image]]) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a [[grayscale]] image) cannot be solved exactly, but solutions produced are usually near the [[global optimum]].
 
Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the [[Maximum a posteriori estimation|maximum a posteriori estimate]] of a solution.
 
Although many computer vision algorithms involve cutting a graph (e.g., normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as [[graph partition]]ing algorithms).
 
"Binary" problems (such as [[denoising]] a [[binary image]]) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a [[grayscale]] image) cannot be solved exactly, but solutions produced are usually near the [[global optimum]].
 
== History ==
The foundational theory of [[Cut (graph theory)|graph cuts]] was first applied in [[computer vision]] in the seminal paper by Greig, Porteous and Seheult<ref name="D.M. Greig, B.T 1989">D.M. Greig, B.T. Porteous and A.H. Seheult (1989), ''[https://classes.cs.uchicago.edu/archive/2006/fall/35040-1/gps.pdf Exact maximum a posteriori estimation for binary images]'', Journal of the Royal Statistical Society, Series B, '''51''', 271–279.</ref> of [[Durham University]]. Allan Seheult and Bruce Porteous were members of Durham's lauded statistics group of the time, led by [[Julian Besag]] and [[Peter Green (statistician)|Peter Green]], with the optimisation expert [[Margaret Greig]] notable as the first ever female member of staff of the Durham Mathematical Sciences Department.

In the [[Bayesian statistics|Bayesian]] statistical context of [[smoothing]] noisy (or corrupted) images, they showed how the [[MAP estimate|maximum a posteriori estimate]] of a [[binary image]] can be obtained ''exactly'' by maximizing the [[Flow network|flow]] through an associated image network, involving the introduction of a ''source'' and ''sink''. The problem was therefore shown to be efficiently solvable. Prior to this result, ''approximate'' techniques such as [[simulated annealing]] (as proposed by the [[Donald Geman|Geman brothers]]),<ref>D. Geman and S. Geman (1984), ''[https://www.dam.brown.edu/people/documents/stochasticrelaxation.pdf Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images]'', IEEE Trans. Pattern Anal. Mach. Intell., '''6''', 721–741.</ref>), or [[iterated conditional modes]] (a type of [[greedy algorithm]] as suggested by [[Julian Besag]])<ref>J.E. Besag (1986), ''On the statistical analysis of dirty pictures (with discussion)'', [[Journal of the Royal Statistical Society]] Series B, '''48''', 259–302</ref> were used to solve such image smoothing problems.
 
Although the general [[Graph coloring|<math>k</math>[[Graph coloring|-colour problem]] remainsis NP unsolvedhard for <math>k > 2,</math> the approach of Greig, Porteous and Seheult<ref name="D.M. Greig, B.T 1989" /> has turned out<ref>Y. Boykov, O. Veksler and R. Zabih (1998), "[httphttps://www.cs.cornell.edu/~rdz/Papers/BVZ-cvpr98.pdf Markov Random Fields with Efficient Approximations]", ''International Conference on Computer Vision and Pattern Recognition (CVPR)''.</ref><ref name="boykov2001fast">Y. Boykov, O. Veksler and R. Zabih (2001), "[httphttps://www.cs.cornell.edu/~rdz/Papers/BVZ-pami01-final.pdf Fast approximate energy minimisation via graph cuts]", ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', '''29''', 1222–1239.</ref> to have wide applicability in general computer vision problems. For general problems, Greig, Porteous and Seheult's approachesapproach areis often applied iteratively to a sequencesequences of related binary problems, usually yielding near optimal solutions.
 
In 2011, C. Couprie ''et al''.<ref>Camille Couprie, Leo Grady, Laurent Najman and Hugues Talbot, "[http://leogrady.net/wp-content/uploads/2017/01/couprie2011power.pdf Power Watersheds: A Unifying Graph-Based Optimization Framework]”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 7, pp. 1384-1399, July 2011</ref> proposed a general image segmentation framework, called the "Power Watershed", that minimized a real-valued [[indicator function]] from [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is optimized with respect to an exponent <math>p</math>. When <math>p=1</math>, the Power Watershed is optimized by graph cuts, when <math>p=0</math> the Power Watershed is optimized by shortest paths, <math>p=2</math> is optimized by the [[random walker algorithm]] and <math>p=\infty</math> is optimized by the [[Watershed (image processing)|watershed]] algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms.
Although the general <math>k</math>[[Graph coloring|-colour problem]] remains unsolved for <math>k > 2,</math> the approach of Greig, Porteous and Seheult<ref name="D.M. Greig, B.T 1989"/> has turned out<ref>Y. Boykov, O. Veksler and R. Zabih (1998), "[http://www.cs.cornell.edu/~rdz/Papers/BVZ-cvpr98.pdf Markov Random Fields with Efficient Approximations]", ''International Conference on Computer Vision and Pattern Recognition (CVPR)''.</ref><ref name="boykov2001fast">Y. Boykov, O. Veksler and R. Zabih (2001), "[http://www.cs.cornell.edu/~rdz/Papers/BVZ-pami01-final.pdf Fast approximate energy minimisation via graph cuts]", ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', '''29''', 1222–1239.</ref> to have wide applicability in general computer vision problems. Greig, Porteous and Seheult approaches are often applied iteratively to a sequence of binary problems, usually yielding near optimal solutions.
 
==Binary segmentation of images==
Line 26 ⟶ 36:
 
=== Energy function ===
<big>: <math>\Pr(x|\mid S) = K</math><sup><math>(^{-E)}</math></sup></big>
where the energy <math>E</math> is composed of 2two different models (<math>E_{\rm color}</math> and <math>E_{\rm coherence}</math>):
 
==== Likelihood / Color model / Regional term ====
<math>E_{\rm color}</math> — unary term describing the likelihood of each color.
* This term can be modeled using different local (e.g. {{not a typo|texons}}) or global (e.g. histograms, GMMs, Adaboost likelihood) approaches that are described below.
 
===== Histogram =====
Line 37 ⟶ 47:
* Then, we use these histograms to set the regional penalties as negative log-likelihoods.
 
===== GMM (Gaussian Mixturemixture Modelmodel) =====
* We usually use 2two distributions: toone modelfor background modelling and another for foreground pixels.
* Use a Gaussian mixture model (with 5–8 components) to model those 2 distributions.
* Goal: Try to pull apart those 2two distributions.
 
===== Texon =====
 
* A {{not a typo|texon}} (or {{not a typo|texton}}) is a set of pixels that has certain characteristics and is repeated in an image.
* Steps:
# Determine a good natural scale for the texture elements.
# Compute non-parametric statistics of the model-interior {{not a typo|texons}}, either on intensity or on Gabor filter responses.
 
* Examples:
** [https://web.archive.org/web/20110605231530/http://www.research.rutgers.edu/~xiaolei/EMMCVPR_paper.pdf Deformable-model based Textured Object Segmentation]
** [http://www.cs.berkeley.edu/~malik/papers/MalikBLS.pdf Contour and Texture Analysis for Image Segmentation]
 
Line 58 ⟶ 68:
* Costs can be based on local intensity gradient, Laplacian zero-crossing, gradient direction, color mixture model,...
* Different energy functions have been defined:
** Standard [[Markov random field]] (MRF): Associate a penalty to disagreeing pixels by evaluating the difference between their segmentation label (crude measure of the length of the boundaries). See Boykov and Kolmogorov ICCV 2003
** [[Conditional random field]] (CRF): If the color is very different, it might be a good place to put a boundary. See Lafferty et al. 2001; Kumar and Hebert 2003
 
== Criticism ==
 
Graph cuts methods have become popular alternatives to the level set-based approaches for optimizing the ___location of a contour (see<ref>Leo Grady and Christopher Alvino (2009), "[http://wwwleogrady.cns.bu.edunet/wp-content/uploads/2017/~lgrady01/grady2009combinatorial.pdf The Piecewise Smooth Mumford-Shah Functional on an Arbitrary Graph]", IEEE Trans. on Image Processing, pp. 2547–2561</ref> for an extensive comparison). However, graph cut approaches have been criticized in the literature for several issues:
* Metrication artifacts: When an image is represented by a 4-connected lattice, graph cuts methods can exhibit unwanted "blockiness" artifacts. Various methods have been proposed for addressing this issue, such as using additional edges<ref>Yuri Boykov and Vladimir Kolmogorov (2003), "[http://www.csd.uwo.ca/~yuri/Papers/iccv03.pdf Computing Geodesics and Minimal Surfaces via Graph Cuts]", Proc. of ICCV</ref> or by formulating the max-flow problem in continuous space.<ref>Ben Appleton and Hugues Talbot (2006), "[https://ai.google/research/pubs/pub32799.pdf Globally Minimal Surfaces by Continuous Maximal Flows]", IEEE Trans.Transactions on Pattern Analysis and Machine Intelligence, pp. 106–118</ref>
* Shrinking bias: Since graph cuts finds a minimum cut, the algorithm can be biased toward producing a small contour.<ref>Ali Kemal Sinop and Leo Grady, "[http://wwwciteseerx.cnsist.bupsu.edu/~lgradyviewdoc/sinop2007linfdownload?doi=10.1.1.93.6438&rep=rep1&type=pdf A Seeded Image Segmentation Framework Unifying Graph Cuts and Random Walker Which Yields A New Algorithm]", Proc. of ICCV, 2007</ref> For example, the algorithm is not well-suited for segmentation of thin objects like blood vessels (see<ref>Vladimir Kolmogorov and Yuri Boykov (2005), "[http://pub.ist.ac.at/~vnk/papers/KB-ICCV05.pdf What Metrics Can Be Approximated by Geo-Cuts, or Global Optimization of Length/Area and Flux]", Proc. of ICCV pp. 564–571</ref> for a proposed fix).
* Multiple labels: Graph cuts is only able to find a global optimum for binary labeling (i.e., two labels) problems, such as foreground/background image segmentation. Extensions have been proposed that can find approximate solutions for multilabel graph cuts problems.<ref name="boykov2001fast" />
* Memory: the memory usage of graph cuts increaseincreases quickly as the image size increaseincreases. As an illustration, the Boykov-Kolmogorov max-flow algorithm v2.2 allocates <math>24n+14m</math> bytes (<math>n</math> and <math>m</math> are respectively the number of nodes and edges in the graph). Nevertheless, some amount of work has been recently done in this direction for reducing the graphs before the maximum-flow computation.<ref>Nicolas Lermé, François Malgouyres and Lucas Létocart (2010), "[http://lipn.fr/~lerme/docs/reducing_graphs_in_graph_cut_segmentation.pdf Reducing graphs in graph cut segmentation] {{Webarchive|url=https://web.archive.org/web/20120327014313/http://lipn.fr/~lerme/docs/reducing_graphs_in_graph_cut_segmentation.pdf |date=2012-03-27 }}", Proc. of ICIP, pp. 3045–3048</ref><ref>Herve Lombaert, Yiyong Sun, Leo Grady, Chenyang Xu (2005), "[http://step.polymtl.ca/~rv101/bandedgraphcuts.pdf A Multilevel Banded Graph Cuts Method for Fast Image Segmentation]", Proc. of ICCV, pp. 259–265</ref><ref>Yin Li, Jian Sun, Chi-Keung Tang, and Heung-Yeung Shum (2004), "[http://research.microsoft.com/pubs/69040/lazysnapping_siggraph04.pdf Lazy Snapping]", ACM Transactions on Graphics, pp. 303–308</ref>
 
== Algorithm ==
{{see also|Graph cut optimization}}
* Minimization is done using a standard minimum cut algorithm.
* Due to the [[Maxmax-flow min-cut theorem]] we can solve energy minimization by maximizing the flow over the network. The Max Flow[[max-flow problem]] consists of a directed graph with edges labeled with capacities, and there are two distinct nodes: the source and the sink. Intuitively, it's is easy to see that the maximum flow is determined by the bottleneck.
 
=== Implementation (exact) ===
{{Wikibooks|Algorithm Implementation|Graphs/Maximum flow/Boykov & Kolmogorov}}
 
The Boykov-Kolmogorov algorithm <ref>Yuri Boykov, Vladimir Kolmogorov: [http://discovery.ucl.ac.uk/13383/1/13383.pdf An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision]. IEEE Trans. Pattern Anal. Mach. Intell. 26(9): 1124–1137 (2004)</ref> is an efficient way to compute the max-flow for computer vision -related graphgraphs.
 
=== Implementation (approximation) ===
 
The Sim Cut algorithm <ref>P.J. Yim: "[https://patentimages.storage.googleapis.com/2b/1e/e9/5834a9cc3312a0/US9214029.pdf Method and System for Image Segmentation]," United States Patent US8929636, January 6, 2016 </ref> approximates the minimum graph cut. byThe thealgorithm implements a solution by simulation of an electrical network. This is the setapproach ofsuggested simultaneousby non-linear[[Cederbaum's equationsmaximum basedflow ontheorem]].<ref>{{Cite thejournal|last=Cederbaum|first=I.|date=1962-08-01|title=On analogoptimal electricaloperation modelof communication nets|journal=Journal of the flowFranklin networkInstitute|volume=274|issue=2|pages=130–141|doi=10.1016/0016-0032(62)90401-5|issn=0016-0032}}</ref><ref> I.T. Frisch, "On Electrical analogs for flow networks," Proceedings of IEEE, 57:2, pp. 209-210, 1969 </ref> Acceleration of the algorithm is possible through [[parallel computing]].
 
== Software ==
Line 88 ⟶ 99:
* [http://virtualscalpel.com/ http://virtualscalpel.com/] — An implementation of the '''Sim Cut'''; an algorithm for computing an approximate solution of the minimum s-t cut in a massively parallel manner.
 
== References ==
{{reflist}}
 
Line 95 ⟶ 106:
[[Category:Computer vision]]
[[Category:Computational problems in graph theory]]
[[Category:Image segmentation]]