The [[image segmentation]] problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation applying [[graph partitioning]] via [[minimum cut]] or [[maximum cut]]. '''Segmentation-based object categorization''' can be viewed as a specific case of [[spectral clustering]] applied to image segmentation.
 
<!-- Unsourced image removed: [[Image:aenep11.png|right|thumb|Figure 1: Input image.]] -->
<!-- Unsourced image removed: [[Image:aenep12.png|right|thumb|Figure 2: First partition.]] -->
<!-- Unsourced image removed: [[Image:aenep24.png|right|thumb|Figure 11: Segmentation results.]] -->
 
==Applications of image segmentation==
* '''Image compression'''
** Segment the image into homogeneous components, and use the most suitable compression algorithm for each component to improve compression.
* '''Medical diagnosis'''
** Automatic segmentation of MRI images for identification of cancerous regions.
* '''Mapping and measurement'''
** Automatic analysis of [[remote sensing]] data from satellites to identify and measure regions of interest.
* '''Transportation'''
** Partitioning a transportation network makes it possible to identify regions characterized by homogeneous traffic states.<ref>{{Cite journal|last1=Lopez|first1=Clélia|last2=Leclercq|first2=Ludovic|last3=Krishnakumari|first3=Panchamy|last4=Chiabaut|first4=Nicolas|last5=Van Lint|first5=Hans|date=25 October 2017|title=Revealing the day-to-day regularity of urban congestion patterns with 3D speed maps|journal=Scientific Reports|volume=7 |issue=14029|pages=14029|doi=10.1038/s41598-017-14237-8|pmid=29070859|pmc=5656590|bibcode=2017NatSR...714029L }}</ref>
 
==Segmentation using normalized cuts==
===Graph theoretic formulation===
The set of points in an arbitrary feature space can be represented as a weighted undirected complete graph G = (V, E), where the nodes of the graph are the points in the feature space. The weight <math>w_{ij}</math> of an edge <math>(i, j) \in E</math> is a function of the similarity between the nodes <math>i</math> and <math>j</math>. In this context, we can formulate the image segmentation problem as a graph partitioning problem that asks for a partition <math>V_1, \cdots, V_k</math> of the vertex set <math>V</math>, where, according to some measure, the vertices in any set <math>V_i</math> have high similarity, and the vertices in two different sets <math>V_i, V_j</math> have low similarity.
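For illustration, one common choice of weight function (assumed here, not prescribed by the formulation) is a Gaussian of the feature distance; a minimal Python sketch, with hypothetical names:

```python
import numpy as np

def similarity_matrix(features, sigma=1.0):
    """Affinity matrix with Gaussian weights w_ij = exp(-||f_i - f_j||^2 / (2 sigma^2))."""
    f = np.asarray(features, dtype=float)
    # Pairwise squared distances between all feature vectors.
    d2 = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Four 1-D feature points forming two clusters: within-cluster
# weights are near 1, between-cluster weights are near 0.
W = similarity_matrix([[0.0], [0.1], [5.0], [5.1]], sigma=0.5)
```

With this choice, nearby points in feature space get edge weights near 1 and distant points get weights near 0, matching the high/low similarity requirement of the partition.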
 
===Normalized cuts===
Let ''G'' = (''V'', ''E'', ''w'') be a weighted graph. Let <math>A</math> and <math>B</math> be two subsets of vertices.
 
Let:
 
: <math>w(A, B) = \sum \limits_{i \in A, j \in B} w_{ij}</math>
 
: <math>\operatorname{ncut}(A, B) = \frac{w(A, B)}{w(A, V)} + \frac{w(A, B)}{w(B, V)}</math>
 
: <math>\operatorname{nassoc}(A, B) = \frac{w(A, A)}{w(A, V)} + \frac{w(B, B)}{w(B, V)}</math>
 
In the normalized cuts approach,<ref>Jianbo Shi and [[Jitendra Malik]] (1997): "Normalized Cuts and Image Segmentation", IEEE Conference on Computer Vision and Pattern Recognition, pp. 731–737</ref> for any cut <math>(S, \overline{S})</math> in <math>G</math>, <math>\operatorname{ncut}(S, \overline{S})</math> measures the similarity between different parts, and <math>\operatorname{nassoc}(S, \overline{S})</math> measures the total similarity of vertices in the same part.
 
Since <math>\operatorname{ncut}(S, \overline{S}) = 2 - \operatorname{nassoc}(S, \overline{S})</math>, a cut <math>(S^{*}, {\overline{S}}^{*})</math> that minimizes <math>\operatorname{ncut}(S, \overline{S})</math> also maximizes <math>\operatorname{nassoc}(S, \overline{S})</math>.
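These quantities are straightforward to compute directly; the sketch below (helper names are illustrative) evaluates <math>w</math>, ncut, and nassoc on a small graph, which also confirms the identity numerically:

```python
import numpy as np

def w_sum(W, A, B):
    """w(A, B): total weight of edges from vertex set A to vertex set B."""
    return W[np.ix_(A, B)].sum()

def ncut(W, A, B, V):
    return w_sum(W, A, B) / w_sum(W, A, V) + w_sum(W, A, B) / w_sum(W, B, V)

def nassoc(W, A, B, V):
    return w_sum(W, A, A) / w_sum(W, A, V) + w_sum(W, B, B) / w_sum(W, B, V)

# Small symmetric weight matrix; the natural cut separates {0, 1} from {2, 3}.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.2, 0.1],
              [0.1, 0.2, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
V = [0, 1, 2, 3]
S, S_bar = [0, 1], [2, 3]
val = ncut(W, S, S_bar, V)   # ≈ 0.38 for this W
```

The natural cut gets a much smaller ncut value than a cut through the heavy edges, which is exactly what minimizing ncut rewards.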
 
Computing a cut <math>(S^{*}, {\overline{S}}^{*})</math> that minimizes <math>\operatorname{ncut}(S, \overline{S})</math> is an [[NP-hard]] problem. However, we can find in polynomial time a cut <math>(S, \overline{S})</math> of small normalized weight <math>\operatorname{ncut}(S, \overline{S})</math> using [[Spectral graph theory|spectral techniques]].
 
===The Ncut algorithm===
Let:
 
: <math>d(i) = \sum \limits_j w_{ij}</math>
 
Also, let ''D'' be an <math>n \times n</math> diagonal matrix with <math>d</math> on the diagonal, and let <math>W</math> be an <math>n \times n</math> symmetric matrix with <math>W_{ij} = w_{ij}</math>.
 
After some algebraic manipulations, we get:
 
: <math>\min \limits_{(S, \overline{S})} \operatorname{ncut}(S, \overline{S}) = \min \limits_y \frac{y^T (D - W) y}{y^T D y}</math>
 
subject to the constraints:
* <math>y_i \in \{1, -b\}</math> for some constant <math>b</math>
* <math>y^T D \mathbf{1} = 0</math>
 
Minimizing <math>\frac{y^T (D - W) y}{y^T D y}</math> subject to the constraints above is [[NP-hard]]. To make the problem tractable, we relax the constraints on <math>y</math> and allow it to take real values. The relaxed problem can be solved by computing the eigenvector of the generalized eigenvalue problem <math>(D - W)y = \lambda D y</math> associated with the second smallest generalized eigenvalue.
 
'''The partitioning algorithm:'''
# Given a set of features, set up a weighted graph <math>G = (V, E)</math>, compute the weight of each edge, and summarize the information in <math>D</math> and <math>W</math>.
# Solve <math>(D - W)y = \lambda D y</math> for the eigenvector with the second smallest eigenvalue.
# Use the eigenvector with the second smallest eigenvalue to bipartition the graph (e.g. grouping according to sign).
# Decide if the current partition should be subdivided.
# Recursively partition the segmented parts, if necessary.
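Steps 1–3 of one bipartition level can be sketched with a dense generalized symmetric eigensolver (function names are illustrative; production codes use sparse, matrix-free solvers):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Bipartition a graph with weight matrix W by the sign of the eigenvector
    for the second smallest eigenvalue of (D - W) y = lambda D y."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    # Generalized symmetric eigenproblem; eigenvalues come back in ascending order.
    _, vecs = eigh(D - W, D)
    return vecs[:, 1] >= 0        # boolean segment membership by sign

# Two tightly coupled pairs of nodes joined by weak cross-edges.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
labels = ncut_bipartition(W)      # separates {0, 1} from {2, 3}
```

Recursive application of this routine to each resulting segment, with a stopping test on the ncut value, yields the full partitioning algorithm.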
 
===Example===
Figures 1–7 exemplify the Ncut algorithm.

===Computational complexity===
Solving a standard eigenvalue problem for all eigenvectors (using the [[QR algorithm]], for instance) takes <math>O(n^3)</math> time. This is impractical for image segmentation applications where <math>n</math> is the number of pixels in the image.
 
Since only one eigenvector, corresponding to the second smallest generalized eigenvalue, is used by the Ncut algorithm, efficiency can be dramatically improved if the eigenvalue problem is solved in a [[Matrix-free methods|matrix-free fashion]], i.e., without explicitly manipulating or even computing the matrix W, as, e.g., in the [[Lanczos algorithm]]. [[Matrix-free methods]] require only a function that performs a matrix-vector product for a given vector, on every iteration. For image segmentation, the matrix W is typically sparse, with <math>O(n)</math> nonzero entries, so such a matrix-vector product takes <math>O(n)</math> time.
 
For high-resolution images, the second eigenvalue is often [[ill-conditioned]], leading to slow convergence of iterative eigenvalue solvers, such as the [[Lanczos algorithm]]. [[Preconditioner#Preconditioning for eigenvalue problems|Preconditioning]] is a key technology accelerating the convergence, e.g., in the matrix-free [[LOBPCG]] method. Computing the eigenvector using an optimally preconditioned matrix-free method takes <math>O(n)</math> time, which is the optimal complexity, since the eigenvector has <math>n</math> components.
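As an illustration of the sparse, matrix-free regime, the sketch below runs SciPy's LOBPCG on a path graph whose weight matrix has <math>O(n)</math> nonzeros (no preconditioner is used here; a production setup would add one, e.g., algebraic multigrid):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

# Sparse weight matrix of a path graph on n nodes: O(n) nonzeros, so each
# matrix-vector product inside the eigensolver costs O(n) time.
n = 60
W = sp.diags([np.ones(n - 1), np.ones(n - 1)], offsets=[-1, 1], format="csr")
d = np.asarray(W.sum(axis=1)).ravel()
D = sp.diags(d)
L = D - W

# LOBPCG approximates the smallest eigenpairs of (D - W) y = lambda D y
# using only matrix-vector products, never factoring or densifying L.
rng = np.random.default_rng(0)
X = rng.standard_normal((n, 4))               # random initial block of 4 vectors
vals, vecs = lobpcg(L, X, B=D, largest=False, tol=1e-8, maxiter=2000)
fiedler = vecs[:, np.argsort(vals)[1]]        # second smallest eigenvector
```

For a path graph the second eigenvector changes sign once along the path, so thresholding it at zero splits the path into its two halves.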
 
===Software implementations===
[[scikit-learn]]<ref>{{Cite web|url=https://scikit-learn.org/stable/modules/clustering.html#spectral-clustering|title=Spectral Clustering — scikit-learn documentation}}</ref> uses [[LOBPCG]] from [[SciPy]] with [[Multigrid method#Algebraic multigrid (AMG)|algebraic multigrid preconditioning]] for solving the [[eigenvalue]] problem for the [[graph Laplacian]] to perform [[image segmentation]] via spectral [[graph partitioning]], as first proposed in<ref>{{Cite conference | url = https://www.researchgate.net/publication/343531874 | title = Modern preconditioned eigensolvers for spectral image segmentation and graph bisection | conference = Clustering Large Data Sets; Third IEEE International Conference on Data Mining (ICDM 2003) Melbourne, Florida: IEEE Computer Society| editor = Boley| editor2 = Dhillon| editor3 = Ghosh| editor4 = Kogan | pages = 59–62| year = 2003| last1 = Knyazev| first1 = Andrew V.}}</ref> and tested in<ref>{{Cite conference | url = https://www.researchgate.net/publication/354448354 | title = Multiscale Spectral Image Segmentation Multiscale preconditioning for computing eigenvalues of graph Laplacians in image segmentation | conference = Fast Manifold Learning Workshop, WM Williamsburg, VA| year = 2006| last1 = Knyazev| first1 = Andrew V. | doi=10.13140/RG.2.2.35280.02565}}</ref><ref>{{Cite conference | url = https://www.researchgate.net/publication/343531874 | title = Multiscale Spectral Graph Partitioning and Image Segmentation | conference = Workshop on Algorithms for Modern Massive Datasets Stanford University and Yahoo! Research| year = 2006| last1 = Knyazev| first1 = Andrew V.}}</ref>
 
==OBJ CUT==
 
OBJ CUT<ref>M. P. Kumar, P. H. S. Torr, and A. Zisserman. Obj cut. In ''Proceedings of IEEE Conference on Computer Vision and Pattern Recognition'', San Diego, pages 18–25, 2005.</ref> is an efficient method that automatically segments an object. The OBJ CUT method is generic, and therefore applicable to any object category model.
Given an image D containing an instance of a known object category, e.g. cows, the OBJ CUT algorithm computes a segmentation of the object, that is, it infers a set of labels ''m''.
 
Let m be a set of binary labels, and let <math>\Theta</math> be a shape parameter (<math>\Theta</math> is a shape prior on the labels from a [[layered pictorial structure]] (LPS) model). An energy function <math>E(m, \Theta)</math> is defined as follows.
 
: <math>E(m, \Theta) = \sum_x \left( \phi_x(D|m_x) + \phi_x(m_x|\Theta) \right) + \sum_{x,y} \left( \Psi_{xy}(m_x, m_y) + \phi(D|m_x, m_y) \right)</math> (1)
 
The term <math>\phi_x(D|m_x) + \phi_x(m_x|\Theta)</math> is called a unary term, and the term <math>\Psi_{xy}(m_x, m_y) + \phi(D|m_x, m_y)</math> is called a pairwise term.
The unary term consists of the likelihood <math>\phi_x(D|m_x)</math> based on color and the unary potential <math>\phi_x(m_x|\Theta)</math> based on the distance from <math>\Theta</math>. The pairwise term consists of a prior <math>\Psi_{xy}(m_x, m_y)</math> and a contrast term <math>\phi(D|m_x, m_y)</math>.
 
The best labeling <math>m^{*}</math> minimizes <math>\sum \limits_i w_i E(m, \Theta_i)</math>, where <math>w_i</math> is the weight of the parameter <math>\Theta_i</math>.
 
: <math>m^{*} = \arg \min \limits_m \sum \limits_i w_i E(m, \Theta_i)</math> (2)
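On a toy scale, equation (2) can be minimized by exhaustive search. In the sketch below the potentials are hypothetical stand-ins for the color likelihood, shape term, and contrast term, with a single shape sample (<math>w_1 = 1</math>); brute force replaces the MINCUT step of the real algorithm:

```python
import itertools

# Hypothetical unary costs phi(D|m_x) + phi(m_x|Theta) for m_x = 0, 1
# on a 1-D "image" of 5 pixels.
unary = [(0.0, 2.0), (0.2, 1.5), (2.0, 0.1), (2.0, 0.1), (1.8, 0.3)]
pairwise_penalty = 1.0   # stand-in for Psi + contrast term, charged when m_x != m_y

def energy(m):
    """E(m, Theta) from equation (1), with neighbors (x, x+1)."""
    e = sum(unary[x][m[x]] for x in range(len(m)))
    e += sum(pairwise_penalty for x in range(len(m) - 1) if m[x] != m[x + 1])
    return e

# Equation (2) with one sample: exhaustive search over all 2^5 labelings.
best = min(itertools.product((0, 1), repeat=5), key=energy)   # -> (0, 0, 1, 1, 1)
```

The pairwise penalty discourages ragged label boundaries, so the minimizer places a single transition where the unary costs flip, which is the behavior a MINCUT solver recovers at scale.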
 
===The OBJ CUT algorithm===
# Given an image D, an object category is chosen, e.g. cows or horses.
# The corresponding LPS model is matched to D to obtain the samples <math>\Theta_1, \cdots, \Theta_s</math>.
# The objective function given by equation (2) is determined by computing <math>E(m, \Theta_i)</math> and using <math>w_i = g(\Theta_i|Z)</math>.
# The objective function is minimized using a single [[Max-flow min-cut theorem|MINCUT]] operation to obtain the segmentation '''m'''.
 
===Example===
Figures 8–11 exemplify the OBJ CUT algorithm.
 
==Other approaches==
* Jigsaw approach<ref>E. Borenstein, S. Ullman: [http://www.csd.uwo.ca/~olga/Courses/Fall2007/840/StudentPapers/BorensteinUllman2002.pdf Class-specific, top-down segmentation]. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages 109–124, 2002.</ref>
* Image parsing<ref>Z. Tu, X. Chen, A. L. Yuille, S. C. Zhu: [https://cloudfront.escholarship.org/dist/prd/content/qt8n57f107/qt8n57f107.pdf Image Parsing: Unifying Segmentation, Detection, and Recognition]. Toward Category-Level Object Recognition 2006: 545–576</ref>
* Interleaved segmentation<ref>B. Leibe, A. Leonardis, B. Schiele: [http://www.vision.ee.ethz.ch/en/publications/papers/bookchapters/eth_biwi_00421.pdf An Implicit Shape Model for Combined Object Categorization and Segmentation]. Toward Category-Level Object Recognition 2006: 508–524</ref>
* LOCUS<ref>J. Winn, N. Jojic. [http://people.eecs.berkeley.edu/~efros/courses/AP06/Papers/winn-iccv-05.pdf Locus: Learning object classes with unsupervised segmentation]. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, 2005.</ref>
* LayoutCRF<ref>J. M. Winn, J. Shotton: [http://www.wisdom.weizmann.ac.il/~/vision/courses/2007_2/files/LCRF.pdf The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects]. CVPR (1) 2006: 37–44</ref>
* [[Minimum_Spanning_Tree-based_Segmentation|Minimum spanning tree-based segmentation]]
 
==References==
{{reflist|32em}}
 
[[Category:Computer vision]]
[[Category:Object recognition and categorization]]
[[Category:Image segmentation]]