Segmentation-based object categorization

The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation.
 
<!-- Unsourced image removed: [[Image:aenep11.png|right|thumb|Figure 1: Input image.]] -->
<!-- Unsourced image removed: [[Image:aenep12.png|right|thumb|Figure 2: First partition.]] -->
<!-- Unsourced image removed: [[Image:aenep24.png|right|thumb|Figure 11: Segmentation results.]] -->
 
==Applications of image segmentation==
* '''Image Compression'''
** Segment the image into homogeneous components, and use the most suitable compression algorithm for each component to improve compression.
** Automatic analysis of remote sensing data from satellites to identify and measure regions of interest.
 
==Segmentation using normalized cuts==
===Graph theoretic formulation===
The set of points in an arbitrary feature space can be represented as a weighted undirected complete graph <math>G = (V, E)</math>, where the nodes of the graph are the points in the feature space. The weight <math>w_{ij}</math> of an edge <math>(i, j) \in E</math> is a function of the similarity between the nodes <math>i</math> and <math>j</math>. In this context, we can formulate the image segmentation problem as a graph partitioning problem that asks for a partition <math>V_1, \cdots, V_k</math> of the vertex set <math>V</math>, where, according to some measure, the vertices in any set <math>V_i</math> have high similarity, and the vertices in two different sets <math>V_i, V_j</math> have low similarity.
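As a concrete illustration, the sketch below builds such a weight matrix for a small grayscale image. The Gaussian similarity on intensity and pixel position follows the general form used by Shi and Malik, but the kernel widths, the connection radius, and the toy image are illustrative choices rather than values taken from the literature.

<syntaxhighlight lang="python">
import numpy as np

def pixel_similarity_graph(image, sigma_i=0.1, sigma_x=4.0, radius=5.0):
    """Weighted adjacency matrix for a grayscale image.

    Each pixel is a node; the weight between two pixels combines a
    Gaussian on intensity difference with a Gaussian on spatial
    distance, and is set to zero beyond a small spatial radius.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    vals = image.ravel().astype(float)
    n = h * w

    W = np.zeros((n, n))
    for i in range(n):
        d_space = np.linalg.norm(coords - coords[i], axis=1)
        d_int = np.abs(vals - vals[i])
        w_row = np.exp(-(d_int ** 2) / sigma_i ** 2) * np.exp(-(d_space ** 2) / sigma_x ** 2)
        w_row[d_space > radius] = 0.0   # keep only spatially nearby connections
        W[i] = w_row
    np.fill_diagonal(W, 0.0)            # no self-loops
    return W

# Toy example: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
W = pixel_similarity_graph(img)
print(W.shape)   # (64, 64) symmetric weight matrix
</syntaxhighlight>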
 
===Normalized cuts===
Let G = (V, E) be a weighted graph. Let <math>A</math> and <math>B</math> be two subsets of vertices.
 
Let:
 
: <math>w(A, B) = \sum \limits_{i \in A, j \in B} w_{ij}</math>
 
: <math>ncut(A, B) = \frac{w(A, B)}{w(A, V)} + \frac{w(A, B)}{w(B, V)}</math>
 
: <math>nassoc(A, B) = \frac{w(A, A)}{w(A, V)} + \frac{w(B, B)}{w(B, V)}</math>
 
In the normalized cuts approach,<ref>Jianbo Shi and [[Jitendra Malik]] (1997): "Normalized Cuts and Image Segmentation", IEEE Conference on Computer Vision and Pattern Recognition, pp. 731–737.</ref> for any cut <math>(S, \overline{S})</math> in <math>G</math>, <math>ncut(S, \overline{S})</math> measures the similarity between different parts, and <math>nassoc(S, \overline{S})</math> measures the total similarity of vertices in the same part.
 
Since <math>ncut(S, \overline{S}) = 2 - nassoc(S, \overline{S})</math>, a cut <math>(S^{*}, {\overline{S}}^{*})</math> that minimizes <math>ncut(S, \overline{S})</math> also maximizes <math>nassoc(S, \overline{S})</math>.
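The following sketch evaluates these quantities on a small hand-made weight matrix (the matrix itself is only an illustrative example) and checks the identity <math>ncut(S, \overline{S}) = 2 - nassoc(S, \overline{S})</math> numerically.

<syntaxhighlight lang="python">
import numpy as np

def w_between(W, A, B):
    """Total edge weight between vertex sets A and B: w(A, B)."""
    return W[np.ix_(A, B)].sum()

def ncut(W, A, B):
    V = list(range(W.shape[0]))
    return w_between(W, A, B) / w_between(W, A, V) + w_between(W, A, B) / w_between(W, B, V)

def nassoc(W, A, B):
    V = list(range(W.shape[0]))
    return w_between(W, A, A) / w_between(W, A, V) + w_between(W, B, B) / w_between(W, B, V)

# Small symmetric weight matrix with two loose clusters {0, 1} and {2, 3}.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
A, B = [0, 1], [2, 3]
print(ncut(W, A, B), nassoc(W, A, B))                       # small cut, large association
print(np.isclose(ncut(W, A, B) + nassoc(W, A, B), 2.0))     # ncut = 2 - nassoc
</syntaxhighlight>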
Computing a cut <math>(S^{*}, {\overline{S}}^{*})</math> that minimizes <math>ncut(S, \overline{S})</math> is an [[NP-hard]] problem. However, we can find in polynomial time a cut <math>(S, \overline{S})</math> of small normalized weight <math>ncut(S, \overline{S})</math> using [[Spectral graph theory|spectral techniques]].
 
===The Ncut algorithm===
Let <math>d_i = \sum \limits_j w_{ij}</math> be the total connection from node <math>i</math> to all other nodes, let <math>D</math> be the diagonal matrix with <math>d_1, \dots, d_n</math> on its diagonal, and let <math>W</math> be the symmetric matrix with <math>W_{ij} = w_{ij}</math>. Minimizing <math>ncut(S, \overline{S})</math> can then be relaxed to the generalized eigenvalue problem <math>(D - W)y = \lambda D y</math>: the eigenvector corresponding to the second smallest eigenvalue yields an approximate bipartition of the graph, and the procedure can be applied recursively to produce further parts.
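A minimal sketch of this spectral relaxation is given below. It solves the generalized eigenvalue problem through the symmetrically normalized Laplacian and splits the vertices by the sign of the second eigenvector; thresholding at zero is a simplification of the splitting-point search described by Shi and Malik, and the example weight matrix is again illustrative.

<syntaxhighlight lang="python">
import numpy as np

def ncut_bipartition(W):
    """Approximate the minimum normalized cut of a weighted graph.

    Solves the relaxed problem (D - W) y = lambda * D * y via the
    symmetrically normalized Laplacian, then thresholds the eigenvector
    with the second smallest eigenvalue to split the vertices in two.
    """
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Normalized Laplacian: D^{-1/2} (D - W) D^{-1/2}
    L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    y = D_inv_sqrt @ eigvecs[:, 1]                  # map back: y = D^{-1/2} z
    return y >= 0                                   # boolean mask: the two sides of the cut

W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
print(ncut_bipartition(W))   # e.g. [ True  True False False ]
</syntaxhighlight>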
 
==OBJ CUT==
 
OBJ CUT<ref>M. P. Kumar, P. H. S. Torr, and A. Zisserman. OBJ CUT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pages 18–25, 2005.</ref> is an efficient method that automatically segments an object. The method is generic and can therefore be used with any object category model.
Given an image D containing an instance of a known object category, e.g. cows, the OBJ CUT algorithm computes a segmentation of the object; that is, it infers a set of labels m.
 
Let m be a set of binary labels, and let <math>\Theta</math> be a shape parameter (<math>\Theta</math> is a shape prior on the labels from a [[Layered Pictorial Structure]] (LPS) model). We define an energy function <math>E(m, \Theta)</math> as follows.
 
: <math>E(m, \Theta) = \sum \limits_x \left( \phi_x(D|m_x) + \phi_x(m_x|\Theta) \right) + \sum \limits_{(x,y)} \left( \Psi_{xy}(m_x, m_y) + \phi(D|m_x, m_y) \right)</math> (1)
 
The term <math>\phi_x(D|m_x) + \phi_x(m_x|\Theta)</math> is called a unary term, and the term <math>\Psi_{xy}(m_x, m_y) + \phi(D|m_x, m_y)</math> is called a pairwise term.
The best labeling <math>m^{*}</math> minimizes <math>\sum \limits_i w_i E(m, \Theta_i)</math>, where <math>w_i</math> is the weight of the parameter <math>\Theta_i</math>.
 
: <math>m^{*} = \arg \min \limits_m \sum \limits_i w_i E(m, \Theta_i)</math> (2)
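The toy sketch below illustrates equations (1) and (2): it evaluates an energy made of unary and pairwise terms for every binary labeling of three pixels and picks the labeling that minimizes the weighted sum over two hypothetical shape samples. The potential tables, weights, and brute-force search are purely illustrative stand-ins; the actual OBJ CUT algorithm minimizes this energy efficiently with a single st-mincut (graph cut) over the image graph rather than by enumeration.

<syntaxhighlight lang="python">
import itertools

def energy(m, unary, pairs, pairwise):
    """Evaluate E(m, Theta) as a sum of unary and pairwise terms (eq. 1).

    unary[x][label] stands in for phi_x(D|m_x) + phi_x(m_x|Theta);
    pairwise[(m_x, m_y)] stands in for Psi_xy(m_x, m_y) + phi(D|m_x, m_y).
    Both tables are hypothetical, not the actual OBJ CUT potentials.
    """
    e = sum(unary[x][m[x]] for x in range(len(m)))
    e += sum(pairwise[(m[x], m[y])] for (x, y) in pairs)
    return e

# Three pixels in a chain, two shape samples Theta_1, Theta_2 with weights w_i.
pairs = [(0, 1), (1, 2)]
unaries = [                                   # one unary table per shape sample
    [[0.2, 1.0], [0.3, 0.6], [1.0, 0.1]],
    [[0.1, 1.2], [0.8, 0.4], [0.9, 0.2]],
]
weights = [0.6, 0.4]
pairwise = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 0.5, (1, 0): 0.5}   # Potts-style smoothness

# Eq. 2: choose the labeling minimizing the weighted sum of energies over samples.
best = min(itertools.product([0, 1], repeat=3),
           key=lambda m: sum(w * energy(m, u, pairs, pairwise)
                             for w, u in zip(weights, unaries)))
print(best)   # the labeling m* with the lowest weighted energy
</syntaxhighlight>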
 
===The OBJ CUT algorithm===
 
===Example===
Figures 8–11 exemplify the OBJ CUT algorithm.
 
==Other approaches==
* Jigsaw approach<ref>E. Borenstein, S. Ullman: Class-specific, top-down segmentation. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages 109–124, 2002.</ref>
* Image parsing <ref>Z. Tu, X. Chen, A. L. Yuille, S. C. Zhu: Image Parsing: Unifying Segmentation, Detection, and Recognition. Toward Category-Level Object Recognition 2006: 545–576</ref>
* Interleaved segmentation <ref>B. Leibe, A. Leonardis, B. Schiele: An Implicit Shape Model for Combined Object Categorization and Segmentation. Toward Category-Level Object Recognition 2006: 508–524</ref>
* LOCUS <ref>J. Winn, N. Jojic: LOCUS: Learning object classes with unsupervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, 2005.</ref>
* LayoutCRF <ref>J. M. Winn, J. Shotton: The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR (1) 2006: 37–44</ref>
* [[Minimum_Spanning_Tree-based_Segmentation|Minimum Spanning Tree-based segmentation]]