Graph cuts in computer vision: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 08:37, 8 April 2007 edit Bruceporteous (talk \| contribs) 96 edits No edit summary ← Previous edit		Latest revision as of 08:06, 20 August 2025 edit undo Bender the Bot (talk \| contribs) Bots 1,064,377 edits m →History: HTTP to HTTPS for Brown University Tag: AWB
(196 intermediate revisions by 87 users not shown)
Line 1: As applied in the field of [[computer vision]], '''[[graph cut optimization]]''' can be employed to [[Polynomial time\|efficiently]] solve a wide variety of low-level computer vision problems (''early vision''<ref>Adelson, Edward H., and James R. Bergen (1991), "[http://persci.mit.edu/pub_pdfs/elements91.pdf The plenoptic function and the elements of early vision]", Computational models of visual processing 1.2 (1991).</ref>), such as [[image smoothing]], the stereo [[correspondence problem]], [[image segmentation]], [[object co-segmentation]], and many other computer vision problems that can be formulated in terms of [[energy minimization]]. ~~The theory of graph cuts, [[Cut (graph theory)]], was first applied in [[Computer vision]] in the now "classic" 1989 paper by Greig, Porteous and Seheult of Durham University, UK, as referenced below.~~ Many of these energy minimization problems can be approximated by solving a [[maximum flow problem]] in a [[Graph (discrete mathematics)\|graph]]<ref>Boykov, Y., Veksler, O., and Zabih, R. (2001), "[https://www.cs.cornell.edu/rdz/Papers/BVZ-pami01-final.pdf Fast approximate energy minimization via graph cuts]," ''IEEE Transactions on Pattern Analysis and Machine Intelligence,'' 23(11): 1222-1239.</ref> (and thus, by the [[max-flow min-cut theorem]], define a minimal [[cut (graph theory)\|cut]] of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the [[Maximum a posteriori estimation\|maximum a posteriori estimate]] of a solution. In the Bayesian statistical context of smoothing noisy, or corrupted, images, Greig, Porteous and Seheult showed how the maximum a posteriori estimate of a binary image can be obtained ''exactly'' by maximising the flow through an associated image network. The problem was therefore converted into a NP hard problem which could be solved using known efficient algorithms. Although many computer vision algorithms involve cutting a graph (e.g., normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as [[graph partition]]ing algorithms). "Binary" problems (such as [[denoising]] a [[binary image]]) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a [[grayscale]] image) cannot be solved exactly, but solutions produced are usually near the global optimum. Prior to this result ''approximate'', although more general techniques such as simulated annealing, as proposed by the Geman brothers, or iterated conditional modes, as suggested by Julian Besag, were used to solve these types of problems. See the references below. == History == The foundational theory of [[Cut (graph theory)\|graph cuts]] was first applied in [[computer vision]] in the seminal paper by Greig, Porteous and Seheult<ref name="D.M. Greig, B.T 1989">D.M. Greig, B.T. Porteous and A.H. Seheult (1989), ''[https://classes.cs.uchicago.edu/archive/2006/fall/35040-1/gps.pdf Exact maximum a posteriori estimation for binary images]'', Journal of the Royal Statistical Society, Series B, '''51''', 271–279.</ref> of [[Durham University]]. Allan Seheult and Bruce Porteous were members of Durham's lauded statistics group of the time, led by [[Julian Besag]] and [[Peter Green (statistician)\|Peter Green]], with the optimisation expert [[Margaret Greig]] notable as the first ever female member of staff of the Durham Mathematical Sciences Department. In the [[Bayesian statistics\|Bayesian]] statistical context of smoothing noisy (or corrupted) images, they showed how the [[MAP estimate\|maximum a posteriori estimate]] of a [[binary image]] can be obtained ''exactly'' by maximizing the [[Flow network\|flow]] through an associated image network, involving the introduction of a ''source'' and ''sink''. The problem was therefore shown to be efficiently solvable. Prior to this result, ''approximate'' techniques such as [[simulated annealing]] (as proposed by the [[Donald Geman\|Geman brothers]]),<ref>D. Geman and S. Geman (1984), ''[https://www.dam.brown.edu/people/documents/stochasticrelaxation.pdf Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images]'', IEEE Trans. Pattern Anal. Mach. Intell., '''6''', 721–741.</ref> or [[iterated conditional modes]] (a type of [[greedy algorithm]] suggested by [[Julian Besag]])<ref>J.E. Besag (1986), ''On the statistical analysis of dirty pictures (with discussion)'', [[Journal of the Royal Statistical Society]] Series B, '''48''', 259–302</ref> were used to solve such image smoothing problems. Although the general k-colour problem remains unsolved for k > 2, the approach of Greig, Porteous and Seheult is now very commonly used in general computer vision problems, often by applying their approach iteratively to a sequence of binary images that it can be shown yield near optimal solutions. Although the general [[Graph coloring\|<math>k</math>-colour problem]] is NP hard for <math>k > 2,</math> the approach of Greig, Porteous and Seheult<ref name="D.M. Greig, B.T 1989" /> has turned out<ref>Y. Boykov, O. Veksler and R. Zabih (1998), "[https://www.cs.cornell.edu/~rdz/Papers/BVZ-cvpr98.pdf Markov Random Fields with Efficient Approximations]", ''International Conference on Computer Vision and Pattern Recognition (CVPR)''.</ref><ref name="boykov2001fast">Y. Boykov, O. Veksler and R. Zabih (2001), "[https://www.cs.cornell.edu/~rdz/Papers/BVZ-pami01-final.pdf Fast approximate energy minimisation via graph cuts]", ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', '''29''', 1222–1239.</ref> to have wide applicability in general computer vision problems. For general problems, Greig, Porteous and Seheult's approach is often applied iteratively to sequences of related binary problems, usually yielding near optimal solutions. In 2011, C. Couprie ''et al''.<ref>Camille Couprie, Leo Grady, Laurent Najman and Hugues Talbot, "[http://leogrady.net/wp-content/uploads/2017/01/couprie2011power.pdf Power Watersheds: A Unifying Graph-Based Optimization Framework]”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 7, pp. 1384-1399, July 2011</ref> proposed a general image segmentation framework, called the "Power Watershed", that minimized a real-valued [[indicator function]] from [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is optimized with respect to an exponent <math>p</math>. When <math>p=1</math>, the Power Watershed is optimized by graph cuts, when <math>p=0</math> the Power Watershed is optimized by shortest paths, <math>p=2</math> is optimized by the [[random walker algorithm]] and <math>p=\infty</math> is optimized by the [[Watershed (image processing)\|watershed]] algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms. ~~== References ==~~ J.E. Besag (1986), ''On the statistical analysis of dirty pictures (with discussion)'', Journal of the Royal Statistical Society Series B, '''48''', 259 - 302. D. Geman and S. Geman (1984), ''Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images'', IEEE Trans. Pattn Anal. Mach. Intell., '''6''', 721 - 741. D.M. Greig, B.T. Porteous and A.H. Seheult (1989), ''Exact maximum a posteriori estimation for binary images'', Journal of the Royal Statistical Society Series B, '''51''', 271 - 279. ==Binary segmentation of images== === Notation === Image: <math>x \in \{R,G,B\}^N</math> * Output: Segmentation (also called opacity) <math>S \in R^N</math> (soft segmentation). For hard segmentation <math>S \in \{0 \text{ for background}, 1 \text{ for foreground/object to be detected}\}^N</math> * [[Energy function]]: <math>E(x, S, C, \lambda)</math> where C is the color parameter and λ is the coherence parameter. * <math>E(x,S,C,\lambda)=E_{\rm color} + E_{\rm coherence}</math> * Optimization: The segmentation can be estimated as a global minimum over S: <math>{\arg\min}_S E(x, S, C, \lambda)</math> === Existing methods === * Standard Graph cuts: optimize energy function over the segmentation (unknown S value). * Iterated Graph cuts: # First step optimizes over the color parameters using K-means. # Second step performs the usual graph cuts algorithm. :These 2 steps are repeated recursively until convergence. * Dynamic graph cuts:<br />Allows to re-run the algorithm much faster after modifying the problem (e.g. after new seeds have been added by a user). === Energy function === : <math>\Pr(x\mid S) = K^{-E}</math> where the energy <math>E</math> is composed of two different models (<math>E_{\rm color}</math> and <math>E_{\rm coherence}</math>): ==== Likelihood / Color model / Regional term ==== <math>E_{\rm color}</math> — unary term describing the likelihood of each color. * This term can be modeled using different local (e.g. {{not a typo\|texons}}) or global (e.g. histograms, GMMs, Adaboost likelihood) approaches that are described below. ===== Histogram ===== * We use intensities of pixels marked as seeds to get histograms for object (foreground) and background intensity distributions: P(I\|O) and P(I\|B). * Then, we use these histograms to set the regional penalties as negative log-likelihoods. ===== GMM (Gaussian mixture model) ===== * We usually use two distributions: one for background modelling and another for foreground pixels. * Use a Gaussian mixture model (with 5–8 components) to model those 2 distributions. * Goal: Try to pull apart those two distributions. ===== Texon ===== * A {{not a typo\|texon}} (or {{not a typo\|texton}}) is a set of pixels that has certain characteristics and is repeated in an image. * Steps: # Determine a good natural scale for the texture elements. # Compute non-parametric statistics of the model-interior {{not a typo\|texons}}, either on intensity or on Gabor filter responses. * Examples: [https://web.archive.org/web/20110605231530/http://www.research.rutgers.edu/~xiaolei/EMMCVPR_paper.pdf Deformable-model based Textured Object Segmentation] [http://www.cs.berkeley.edu/~malik/papers/MalikBLS.pdf Contour and Texture Analysis for Image Segmentation] ==== Prior / Coherence model / Boundary term ==== <math>E_{\rm coherence}</math> — binary term describing the coherence between neighborhood pixels. * In practice, pixels are defined as neighbors if they are adjacent either horizontally, vertically or diagonally (4 way connectivity or 8 way connectivity for 2D images). * Costs can be based on local intensity gradient, Laplacian zero-crossing, gradient direction, color mixture model,... * Different energy functions have been defined: Standard [[Markov random field]]: Associate a penalty to disagreeing pixels by evaluating the difference between their segmentation label (crude measure of the length of the boundaries). See Boykov and Kolmogorov ICCV 2003 [[Conditional random field]]: If the color is very different, it might be a good place to put a boundary. See Lafferty et al. 2001; Kumar and Hebert 2003 == Criticism == Graph cuts methods have become popular alternatives to the level set-based approaches for optimizing the ___location of a contour (see<ref>Leo Grady and Christopher Alvino (2009), "[http://leogrady.net/wp-content/uploads/2017/01/grady2009combinatorial.pdf The Piecewise Smooth Mumford-Shah Functional on an Arbitrary Graph]", IEEE Trans. on Image Processing, pp. 2547–2561</ref> for an extensive comparison). However, graph cut approaches have been criticized in the literature for several issues: * Metrication artifacts: When an image is represented by a 4-connected lattice, graph cuts methods can exhibit unwanted "blockiness" artifacts. Various methods have been proposed for addressing this issue, such as using additional edges<ref>Yuri Boykov and Vladimir Kolmogorov (2003), "[http://www.csd.uwo.ca/~yuri/Papers/iccv03.pdf Computing Geodesics and Minimal Surfaces via Graph Cuts]", Proc. of ICCV</ref> or by formulating the max-flow problem in continuous space.<ref>Ben Appleton and Hugues Talbot (2006), "[https://ai.google/research/pubs/pub32799.pdf Globally Minimal Surfaces by Continuous Maximal Flows]", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 106–118</ref> * Shrinking bias: Since graph cuts finds a minimum cut, the algorithm can be biased toward producing a small contour.<ref>Ali Kemal Sinop and Leo Grady, "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6438&rep=rep1&type=pdf A Seeded Image Segmentation Framework Unifying Graph Cuts and Random Walker Which Yields A New Algorithm]", Proc. of ICCV, 2007</ref> For example, the algorithm is not well-suited for segmentation of thin objects like blood vessels (see<ref>Vladimir Kolmogorov and Yuri Boykov (2005), "[http://pub.ist.ac.at/~vnk/papers/KB-ICCV05.pdf What Metrics Can Be Approximated by Geo-Cuts, or Global Optimization of Length/Area and Flux]", Proc. of ICCV pp. 564–571</ref> for a proposed fix). * Multiple labels: Graph cuts is only able to find a global optimum for binary labeling (i.e., two labels) problems, such as foreground/background image segmentation. Extensions have been proposed that can find approximate solutions for multilabel graph cuts problems.<ref name="boykov2001fast" /> * Memory: the memory usage of graph cuts increases quickly as the image size increases. As an illustration, the Boykov-Kolmogorov max-flow algorithm v2.2 allocates <math>24n+14m</math> bytes (<math>n</math> and <math>m</math> are respectively the number of nodes and edges in the graph). Nevertheless, some amount of work has been recently done in this direction for reducing the graphs before the maximum-flow computation.<ref>Nicolas Lermé, François Malgouyres and Lucas Létocart (2010), "[http://lipn.fr/~lerme/docs/reducing_graphs_in_graph_cut_segmentation.pdf Reducing graphs in graph cut segmentation] {{Webarchive\|url=https://web.archive.org/web/20120327014313/http://lipn.fr/~lerme/docs/reducing_graphs_in_graph_cut_segmentation.pdf \|date=2012-03-27 }}", Proc. of ICIP, pp. 3045–3048</ref><ref>Herve Lombaert, Yiyong Sun, Leo Grady, Chenyang Xu (2005), "[http://step.polymtl.ca/~rv101/bandedgraphcuts.pdf A Multilevel Banded Graph Cuts Method for Fast Image Segmentation]", Proc. of ICCV, pp. 259–265</ref><ref>Yin Li, Jian Sun, Chi-Keung Tang, and Heung-Yeung Shum (2004), "[http://research.microsoft.com/pubs/69040/lazysnapping_siggraph04.pdf Lazy Snapping]", ACM Transactions on Graphics, pp. 303–308</ref> == Algorithm == {{see also\|Graph cut optimization}} * Minimization is done using a standard minimum cut algorithm. * Due to the [[max-flow min-cut theorem]] we can solve energy minimization by maximizing the flow over the network. The [[max-flow problem]] consists of a directed graph with edges labeled with capacities, and there are two distinct nodes: the source and the sink. Intuitively, it is easy to see that the maximum flow is determined by the bottleneck. === Implementation (exact) === {{Wikibooks\|Algorithm Implementation\|Graphs/Maximum flow/Boykov & Kolmogorov}} The Boykov-Kolmogorov algorithm<ref>Yuri Boykov, Vladimir Kolmogorov: [http://discovery.ucl.ac.uk/13383/1/13383.pdf An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision]. IEEE Trans. Pattern Anal. Mach. Intell. 26(9): 1124–1137 (2004)</ref> is an efficient way to compute the max-flow for computer vision-related graphs. === Implementation (approximation) === The Sim Cut algorithm<ref>P.J. Yim: "[https://patentimages.storage.googleapis.com/2b/1e/e9/5834a9cc3312a0/US9214029.pdf Method and System for Image Segmentation]," United States Patent US8929636, January 6, 2016</ref> approximates the minimum graph cut. The algorithm implements a solution by simulation of an electrical network. This is the approach suggested by [[Cederbaum's maximum flow theorem]].<ref>{{Cite journal\|last=Cederbaum\|first=I.\|date=1962-08-01\|title=On optimal operation of communication nets\|journal=Journal of the Franklin Institute\|volume=274\|issue=2\|pages=130–141\|doi=10.1016/0016-0032(62)90401-5\|issn=0016-0032}}</ref><ref>I.T. Frisch, "On Electrical analogs for flow networks," Proceedings of IEEE, 57:2, pp. 209-210, 1969</ref> Acceleration of the algorithm is possible through [[parallel computing]]. == Software == * [http://pub.ist.ac.at/~vnk/software.html http://pub.ist.ac.at/~vnk/software.html] — An implementation of the maxflow algorithm described in "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Computer Vision" by Vladimir Kolmogorov * [http://vision.csd.uwo.ca/code/ http://vision.csd.uwo.ca/code/] — some graph cut libraries and MATLAB wrappers * [http://gridcut.com/ http://gridcut.com/] — fast multi-core max-flow/min-cut solver optimized for grid-like graphs * [http://virtualscalpel.com/ http://virtualscalpel.com/] — An implementation of the '''Sim Cut'''; an algorithm for computing an approximate solution of the minimum s-t cut in a massively parallel manner. == References == {{reflist}} {{DEFAULTSORT:Graph Cuts In Computer Vision}} ~~[[Category: Bayesian statistics]]~~ [[Category:Bayesian ~~Computer vision~~statistics]] [[Category:Computer ~~Graph theory~~vision]] [[Category:Computational problems in graph theory]] [[Category:Image segmentation]]