Image segmentation

[[File:Model of a segmented femur - journal.pone.0079004.g005.png|thumb|Model of a segmented left human [[femur]]. It shows the outer surface (red), the surface between compact bone and spongy bone (green) and the surface of the bone marrow (blue).]]
 
In [[digital image processing]] and [[computer vision]], '''image segmentation''' is the process of partitioning a [[digital image]] into multiple '''image segments''', also known as '''image regions''' or '''image objects''' ([[Set (mathematics)|sets]] of [[pixel]]s). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.<ref name="computervision">[[Linda Shapiro|Linda G. Shapiro]] and George C. Stockman (2001): "Computer Vision", pp 279–325, New Jersey, Prentice-Hall, {{ISBN|0-13-030796-3}}</ref><ref>Barghout, Lauren, and Lawrence W. Lee. "Perceptual information processing system." Paravue Inc. U.S. Patent Application 10/618,543, filed July 11, 2003.</ref> Image segmentation is typically used to locate objects and [[Boundary tracing|boundaries]] (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
 
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of [[Contour line|contour]]s extracted from the image (see [[edge detection]]). Each of the pixels in a region is similar with respect to some characteristic or computed property<ref>{{cite conference | last1=Nielsen | first1=Frank | last2=Nock | first2=Richard
 
== Applications ==
[[File:3D CT of thorax.jpg|thumb|Volume segmentation of a 3D-rendered [[CT scan]] of the [[thorax]]: The anterior thoracic wall, the airways and the pulmonary vessels anterior to the root of the lung have been digitally removed in order to visualize thoracic contents: <br />– <span style="color:blue;">blue</span>: [[pulmonary arteries]] <br />– <span style="color:red;">red</span>: [[pulmonary veins]] (and also the [[abdominal wall]])<br />– <span style="color:yellow;">yellow</span>: the [[mediastinum]] <br />– <span style="color:violet;">violet</span>: the [[Thoracic diaphragm|diaphragm]] ]]
 
Some of the practical applications of image segmentation are:
* '''Semantic segmentation''' is an approach that determines, for every pixel, the class to which it belongs.<ref>{{Cite journal|last1=Guo|first1=Dazhou|last2=Pei|first2=Yanting|last3=Zheng|first3=Kang|last4=Yu|first4=Hongkai|last5=Lu|first5=Yuhang|last6=Wang|first6=Song|date=2020|title=Degraded Image Semantic Segmentation With Dense-Gram Networks|journal=IEEE Transactions on Image Processing|volume=29|pages=782–795|doi=10.1109/TIP.2019.2936111|pmid=31449020|bibcode=2020ITIP...29..782G|s2cid=201753511|issn=1057-7149|doi-access=free}}</ref> For example, in an image with many people, all the pixels belonging to persons will have the same class ID, and the pixels in the background will be classified as background.
* '''Instance segmentation''' is an approach that identifies, for every pixel, the specific object instance to which it belongs. It detects each distinct object of interest in the image.<ref>{{Cite journal|last1=Yi|first1=Jingru|last2=Wu|first2=Pengxiang|last3=Jiang|first3=Menglin|last4=Huang|first4=Qiaoying|last5=Hoeppner|first5=Daniel J.|last6=Metaxas|first6=Dimitris N.|date=July 2019|title=Attentive neural cell instance segmentation|journal=Medical Image Analysis|language=en|volume=55|pages=228–240|doi=10.1016/j.media.2019.05.004|pmid=31103790|s2cid=159038604|doi-access=free}}</ref> For example, each person in an image is segmented as an individual object.
* '''Panoptic segmentation''' combines semantic and instance segmentation. Like semantic segmentation, it identifies, for every pixel, the class to which it belongs; like instance segmentation, it also distinguishes different instances of the same class.<ref name="Panoptic Segmentation">{{cite arXiv|author=Alexander Kirillov |author2=Kaiming He |author3=Ross Girshick |author4=Carsten Rother |author5=Piotr Dollár |title=Panoptic Segmentation|eprint=1801.00868|class=cs.CV|year=2018}}</ref>
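The difference between the three kinds of output can be illustrated on a toy binary "person" mask. The following is a minimal Python sketch under stated assumptions: the mask and the names <code>semantic_labels</code>, <code>instance_labels</code> and <code>panoptic_labels</code> are hypothetical, and instances are approximated by 4-connected components of the mask, which only works when separate people do not touch. Practical instance segmentation models must also separate touching objects.

```python
from collections import deque

PERSON = 1  # hypothetical class ID; 0 is background

# Toy mask: two separate "people" (hypothetical input).
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

def semantic_labels(mask):
    """Semantic output: one class ID per pixel (here simply a copy of the mask)."""
    return [row[:] for row in mask]

def instance_labels(mask):
    """Instance output (proxy): each 4-connected blob of PERSON pixels gets its
    own instance ID; 0 means 'no instance'. Assumes people do not touch."""
    h, w = len(mask), len(mask[0])
    inst = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] == PERSON and inst[y][x] == 0:
                next_id += 1
                inst[y][x] = next_id
                queue = deque([(y, x)])
                while queue:  # flood fill the blob
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == PERSON and inst[ny][nx] == 0:
                            inst[ny][nx] = next_id
                            queue.append((ny, nx))
    return inst

def panoptic_labels(mask):
    """Panoptic output: a (class ID, instance ID) pair for every pixel."""
    sem, inst = semantic_labels(mask), instance_labels(mask)
    return [list(zip(srow, irow)) for srow, irow in zip(sem, inst)]

print(panoptic_labels(mask))
```

Both people share class ID 1 in the semantic output, but receive instance IDs 1 and 2; the panoptic output keeps both pieces of information.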
 
== Thresholding ==
The key to this method is selecting the threshold value (or values, when multiple levels are selected). Several popular methods are used in industry, including the maximum entropy method, [[balanced histogram thresholding]], [[Otsu's method]] (maximum variance), and [[k-means clustering]].
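As an illustration of threshold selection, Otsu's method picks the threshold that maximizes the between-class variance of the two resulting pixel classes. The following is a minimal Python sketch, assuming 8-bit integer grayscale values and the convention that pixels at or below the threshold form class 0; the function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.
    Pixels with value <= t form class 0, the rest class 1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0     # number of pixels in class 0
    sum0 = 0   # intensity sum of class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: not a valid split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance w0*w1*(mu0-mu1)^2 / total^2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t

def apply_threshold(image, t):
    """Binary segmentation: 1 where the pixel is brighter than t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]

t = otsu_threshold([10, 12, 11, 200, 198, 202])
print(t, apply_threshold([[10, 200], [12, 198]], t))
```

For this bimodal sample every threshold between the two clusters scores equally, and the sketch keeps the first one found (12).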
 
Recently, methods have been developed for thresholding computed tomography (CT) images. The key idea is that, unlike Otsu's method, the thresholds are derived from the radiographs instead of the (reconstructed) image.<ref>{{cite journal |last1 = Batenburg |first1 = K J. |last2 = Sijbers |first2 = J. |year = 2009|title = Adaptive thresholding of tomograms by projection distance minimization |journal = Pattern Recognition |volume = 42 |issue = 10 |pages = 2297–2305 |doi = 10.1016/j.patcog.2008.11.027 |bibcode = 2009PatRe..42.2297B |citeseerx = 10.1.1.182.8483 }}</ref><ref>{{cite journal |first1 = K J. |last1 = Batenburg |first2 = J. |last2 = Sijbers |title = Optimal Threshold Selection for Tomogram Segmentation by Projection Distance Minimization |journal = IEEE Transactions on Medical Imaging |volume = 28 |issue = 5 |pages = 676–686 |date = June 2009 |url = http://www.visielab.ua.ac.be/publications/optimal-threshold-selection-tomogram-segmentation-projection-distance-minimization |format = PDF |doi = 10.1109/tmi.2008.2010437 |pmid = 19272989 |s2cid = 10994501 |access-date = 2012-07-31 |archive-url = https://web.archive.org/web/20130503171943/http://www.visielab.ua.ac.be/publications/optimal-threshold-selection-tomogram-segmentation-projection-distance-minimization |archive-date = 2013-05-03 |url-status = dead }}</ref>
 
Newer methods use multi-dimensional, fuzzy rule-based, non-linear thresholds. In these works, the decision about each pixel's membership in a segment is based on multi-dimensional rules derived from fuzzy logic and evolutionary algorithms, tailored to the image's lighting environment and the application.<ref>{{cite book |first1 = A. |last1 = Kashanipour |first2 = N |last2 = Milani |first3 = A. |last3 = Kashanipour |first4 = H. |last4 = Eghrary |title = 2008 Congress on Image and Signal Processing |chapter = Robust Color Classification Using Fuzzy Rule-Based Particle Swarm Optimization |publisher = IEEE Congress on Image and Signal Processing |volume = 2 |pages = 110–114 |date = May 2008 |doi = 10.1109/CISP.2008.770 |isbn = 978-0-7695-3119-9 |s2cid = 8422475 }}</ref>
| caption2 = Image after running ''k''-means with ''k = 16''. Note that a common technique to improve performance for large images is to downsample the image, compute the clusters, and then reassign the values to the larger image if necessary.
}}
The [[K-means algorithm]] is an [[iterative]] technique that is used to [[Cluster analysis|partition an image]] into ''K'' clusters.<ref>{{cite journal | last1 = Barghout | first1 = Lauren | last2 = Sheynin | first2 = Jacob | year = 2013 | title = Real-world scene perception and perceptual organization: Lessons from Computer Vision | journal = Journal of Vision | volume = 13 | issue = 9| page = 709 | doi=10.1167/13.9.709| doi-access = free }}</ref> The basic [[algorithm]] is:
 
# Pick ''K'' cluster centers, either [[random]]ly or based on some [[heuristic]] method, for example [[K-means++]]
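The usual k-means loop (assign each pixel to its nearest center, move each center to the mean of its pixels, repeat until the assignments stop changing) can be sketched in Python as follows. This is a minimal illustration, assuming pixels are given as a flat list of RGB tuples and that initial centers are distinct colors drawn at random rather than chosen by [[K-means++]]; the function names are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two colors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_segment(pixels, k, max_iter=100, seed=0):
    """Cluster pixel colors (tuples) into k groups; returns (labels, centers).
    Requires at least k distinct colors."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(pixels)), k)  # random distinct initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment: each pixel joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in pixels]
        if new_labels == labels:
            break  # converged: no pixel changed cluster
        labels = new_labels
        # Update: move each center to the mean color of its assigned pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers

pixels = [(0, 0, 0), (10, 10, 10), (250, 250, 250), (240, 240, 240)]
labels, centers = kmeans_segment(pixels, k=2)
print(labels, sorted(centers))
```

Replacing each pixel's color by its cluster center then yields the segmented (color-quantized) image; seeding with [[K-means++]] would only change how <code>centers</code> is initialized.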
== Compression-based methods ==
 
Compression-based methods postulate that the optimal segmentation is the one that minimizes, over all possible segmentations, the coding length of the data.<ref>{{cite journal |author1=Hossein Mobahi |author2=Shankar Rao |author3=Allen Yang |author4=Shankar Sastry |author5=Yi Ma. |url=http://perception.csl.illinois.edu/coding/papers/MobahiH2011-IJCV.pdf |title=Segmentation of Natural Images by Texture and Boundary Compression |journal=International Journal of Computer Vision |volume=95 |pages=86–98 |year=2011 |doi=10.1007/s11263-011-0444-0 |arxiv=1006.3679 |citeseerx=10.1.1.180.3579 |s2cid=11070572 |access-date=2011-05-08 |archive-url=https://web.archive.org/web/20170808173212/http://perception.csl.illinois.edu/coding//papers/MobahiH2011-IJCV.pdf |archive-date=2017-08-08 |url-status=dead }}</ref><ref>Shankar Rao, Hossein Mobahi, Allen Yang, Shankar Sastry and Yi Ma [http://perception.csl.illinois.edu/coding/papers/RaoS2009-ACCV.pdf Natural Image Segmentation with Adaptive Texture and Boundary Encoding] {{Webarchive|url=https://web.archive.org/web/20160519101956/http://perception.csl.illinois.edu/coding/papers/RaoS2009-ACCV.pdf |date=2016-05-19 }}, Proceedings of the Asian Conference on Computer Vision (ACCV) 2009, H. Zha, R.-i. Taniguchi, and S. Maybank (Eds.), Part I, LNCS 5994, pp. 135–146, Springer.</ref> The connection between these two concepts is that segmentation tries to find patterns in an image, and any regularity in the image can be used to compress it. The method describes each segment by its texture and boundary shape. Each of these components is modeled by a probability distribution function and its coding length is computed as follows:
 
# The boundary encoding leverages the fact that regions in natural images tend to have a smooth contour. This prior is used by [[Huffman coding]] to encode the difference [[chain code]] of the contours in an image. Thus, the smoother a boundary is, the shorter its coding length.
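The boundary term can be illustrated by computing the difference chain code of a closed contour and the number of bits a Huffman code needs for it. The following is a minimal Python sketch covering only this boundary term, not the texture term or the full method of the cited papers. It assumes an 8-direction Freeman code in the order listed in <code>DIRS</code> (a hypothetical convention) and, as a simplification, builds the Huffman code from the contour's own symbol frequencies instead of a prior learned from natural images.

```python
import heapq
from collections import Counter

# Assumed 8-connected direction convention: index = Freeman code.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(contour):
    """Freeman chain code of a closed contour of 8-adjacent (x, y) points."""
    return [DIRS.index((x1 - x0, y1 - y0))  # ValueError if points are not adjacent
            for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1])]

def difference_chain_code(codes):
    """Turn between consecutive moves, modulo 8 (the contour is cyclic)."""
    return [(codes[i] - codes[i - 1]) % 8 for i in range(len(codes))]

def huffman_bits(symbols):
    """Total bits needed to Huffman-code the sequence of symbols."""
    counts = list(Counter(symbols).values())
    if len(counts) == 1:
        return counts[0]  # single distinct symbol: assume 1 bit per symbol
    heapq.heapify(counts)
    total = 0
    while len(counts) > 1:
        a, b = heapq.heappop(counts), heapq.heappop(counts)
        total += a + b  # each merge adds one bit to every symbol below it
        heapq.heappush(counts, a + b)
    return total

# Two closed contours with 12 moves each: a square and a zigzag outline.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (3, 3), (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
jagged = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (4, 1),
          (4, 2), (3, 3), (2, 2), (1, 3), (0, 2), (0, 1)]
for name, c in (("square", square), ("jagged", jagged)):
    print(name, huffman_bits(difference_chain_code(chain_code(c))))
```

The square's turns are mostly 0 (straight ahead), so its difference code compresses to fewer bits than the zigzag's varied turns, matching the claim that smoother boundaries have shorter coding lengths.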
Another [[region-growing]] method is the unseeded region growing method. It is a modified algorithm that does not require explicit seeds. It starts with a single region <math>A_1</math>; the pixel chosen here does not markedly influence the final segmentation. At each iteration, it considers the neighboring pixels in the same way as seeded region growing. It differs from seeded region growing in that, if the minimum <math>\delta</math> is less than a predefined threshold <math>T</math>, the pixel is added to the respective region <math>A_j</math>. If not, the pixel is considered different from all current regions <math>A_i</math>, and a new region <math>A_{n+1}</math> is created with this pixel.
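The unseeded procedure can be sketched in Python for a small grayscale image. This is a minimal illustration under stated assumptions: 4-connectivity, <math>\delta</math> taken as the absolute difference between a pixel's intensity and the mean of an adjacent region, the top-left pixel as the start of <math>A_1</math>, and a deliberately simple (quadratic-time) search for the minimum <math>\delta</math>. The function name is hypothetical.

```python
def unseeded_region_growing(image, T):
    """Label a 2-D grayscale image (list of lists) by unseeded region growing.
    Returns a label grid with region IDs 1, 2, ..."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]       # 0 = not yet assigned
    sums, counts = {1: image[0][0]}, {1: 1}    # region ID -> intensity sum / size
    labels[0][0] = 1                           # region A_1 starts at an arbitrary pixel
    remaining = h * w - 1
    while remaining:
        best = None  # (delta, y, x, region) with the smallest delta so far
        for y in range(h):
            for x in range(w):
                if labels[y][x]:
                    continue
                # Only unassigned pixels touching an existing region are candidates.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx]:
                        r = labels[ny][nx]
                        delta = abs(image[y][x] - sums[r] / counts[r])
                        if best is None or delta < best[0]:
                            best = (delta, y, x, r)
        delta, y, x, r = best
        if delta >= T:
            # Too different from every current region: create region A_{n+1}.
            r = len(counts) + 1
            sums[r], counts[r] = 0, 0
        labels[y][x] = r
        sums[r] += image[y][x]   # keep the running mean of the region up to date
        counts[r] += 1
        remaining -= 1
    return labels

image = [[10, 11, 50],
         [12, 10, 52],
         [11, 49, 51]]
print(unseeded_region_growing(image, T=5))
```

The dark pixels grow into region 1; the first bright pixel reached exceeds the threshold, so it starts region 2, which then absorbs the remaining bright pixels.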
 
One variant of this technique, proposed by [[Haralick]] and Shapiro (1985),<ref name="computervision" /> is based on pixel [[Brightness|intensities]]. The [[Arithmetic mean|mean]] and [[Statistical dispersion|scatter]] of the region and the intensity of the candidate pixel are used to compute a test statistic. If the test statistic is sufficiently small, the pixel is added to the region, and the region's mean and scatter are recomputed. Otherwise, the pixel is rejected and used to form a new region.
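The accept/reject step can be sketched in Python with a t-test-style statistic. This is a minimal illustration, and the exact form of the statistic is an assumption here: <math>T = \sqrt{\frac{(N-1)N}{N+1}\cdot\frac{(y-\bar{X})^2}{S^2}}</math>, where <math>N</math> is the region size, <math>\bar{X}</math> its mean intensity, <math>S^2</math> its scatter (sum of squared deviations) and <math>y</math> the candidate intensity. The acceptance bound <code>t_max</code> and both function names are hypothetical.

```python
import math

def region_test_statistic(region, y):
    """Assumed test statistic for adding intensity y to a region (list of values):
    T = sqrt((N-1) * N / (N+1) * (y - mean)^2 / S2), S2 = scatter of the region."""
    n = len(region)
    mean = sum(region) / n
    s2 = sum((v - mean) ** 2 for v in region)  # scatter: sum of squared deviations
    if s2 == 0:
        # Perfectly uniform region (e.g. a single pixel): this sketch treats any
        # different value as infinitely far away.
        return 0.0 if y == mean else math.inf
    return math.sqrt((n - 1) * n / (n + 1) * (y - mean) ** 2 / s2)

def try_add(region, y, t_max):
    """Add y to the region if the statistic is small enough; return True if added.
    Mean and scatter are recomputed from the grown region on the next call."""
    if region_test_statistic(region, y) <= t_max:
        region.append(y)
        return True
    return False  # rejected: the caller would start a new region with y

region = [10, 12, 11, 13]
print(try_add(region, 12, t_max=2.0), try_add(region, 30, t_max=2.0), region)
```

A candidate close to the region mean yields a small statistic and is absorbed, while an outlier such as 30 is rejected.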
 
A special region-growing method is called <math>\lambda</math>-connected segmentation (see also [[lambda-connectedness]]). It is based on pixel [[Brightness|intensities]] and neighborhood-linking paths. A degree of connectivity (connectedness) is calculated based on a path that is formed by pixels. For a certain value of <math>\lambda</math>, two pixels are called <math>\lambda</math>-connected if there is a path linking those two pixels and the connectedness of this path is at least <math>\lambda</math>. <math>\lambda</math>-connectedness is an equivalence relation.<ref name="lambda-connectedness">L. Chen, H. D. Cheng, and J. Zhang, [https://www.sciencedirect.com/science/article/pii/1069011594900094 Fuzzy subfiber and its application to seismic lithology classification], Information Sciences: Applications, Vol 1, No 2, pp 77–95, 1994.</ref>
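Under the assumption that a path's connectedness is the minimum connectivity between consecutive pixels along it, two pixels are <math>\lambda</math>-connected exactly when some path joins them using only steps whose connectivity is at least <math>\lambda</math>. The classes can then be found by a flood fill over such steps. The following is a minimal Python sketch; the neighbor connectivity <math>1 - |I(p) - I(q)| / 255</math>, 4-connectivity, and the function names are all assumptions made for illustration.

```python
from collections import deque

def neighbor_connectivity(a, b, max_value=255):
    """Assumed similarity of two adjacent pixels in [0, 1]; 1 means identical."""
    return 1 - abs(a - b) / max_value

def lambda_connected_segments(image, lam, max_value=255):
    """Label the lambda-connected classes of a 2-D grayscale image."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx]:
                continue
            next_label += 1
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    # Follow only steps whose connectivity reaches lambda.
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and neighbor_connectivity(image[y][x], image[ny][nx], max_value) >= lam):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels

print(lambda_connected_segments([[100, 102, 200], [101, 103, 198]], lam=0.95))
print(lambda_connected_segments([[0, 10, 20, 30]], lam=0.95))
```

Because connectedness is symmetric and defined through paths, the classes found this way are equivalence classes. The second call shows a consequence of the path-based definition: a gentle intensity ramp links pixels (0 and 30) whose direct connectivity is below <math>\lambda</math>.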
An image segmentation [[neural network]] can process small areas of an image to extract simple features such as edges.<ref name="Transactions on Engineering, Computing and Technology">[[Mahinda Pathegama]] & Ö Göl (2004): "Edge-end pixel extraction for edge-based image segmentation", ''Transactions on Engineering, Computing and Technology,'' vol. 2, pp 213–216, ISSN 1305-5313</ref> Another neural network, or any decision-making mechanism, can then combine these features to label the areas of an image accordingly. A type of network designed this way is the [[Kohonen map]].
 
[[Pulse-coupled networks|Pulse-coupled neural networks (PCNNs)]] are neural models derived from a model of the cat's visual cortex and developed for high-performance [[biomimetic]] [[image processing]]. In 1989, Reinhard Eckhorn introduced a neural model to emulate the mechanism of the cat's visual cortex. The Eckhorn model provided a simple and effective tool for studying the visual cortex of small mammals, and was soon recognized as having significant application potential in image processing. In 1994, John L. Johnson adapted the Eckhorn model into an image processing algorithm, which he termed the Pulse-Coupled Neural Network.<ref>{{cite journal|last1=Johnson|first1=John L.|date=September 1994|title=Pulse-coupled neural nets: translation, rotation, scale, distortion, and intensity signal invariance for images|doi=10.1364/AO.33.006239|pmid=20936043|publisher=OSA|volume=33|journal=Applied Optics|number=26|pages=6239–6253|bibcode=1994ApOpt..33.6239J}}</ref> Over the past decade, PCNNs have been used for a variety of image processing applications, including image segmentation, feature generation, face extraction, motion detection, region growing, and noise reduction. A PCNN is a two-dimensional neural network. Each neuron in the network corresponds to one pixel in the input image and receives that pixel's color information (e.g. intensity) as an external stimulus. Each neuron also connects with its neighboring neurons, receiving local stimuli from them. The external and local stimuli are combined in an internal activation system, which accumulates the stimuli until the activation exceeds a dynamic threshold, resulting in a pulse output. Through iterative computation, PCNN neurons produce temporal series of pulse outputs. These temporal series contain information about the input image and can be used for various image processing applications, such as image segmentation and feature generation.
Compared with conventional image processing methods, PCNNs have several significant merits, including robustness against noise, independence from geometric variations in input patterns, and the ability to bridge minor intensity variations in input patterns.
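The pulse mechanism described above can be sketched with a simplified discrete PCNN, in which each neuron's internal activation is its stimulus modulated by the pulses of its 8 neighbors and is compared against a threshold that decays over time and jumps after each pulse. This Python sketch is an assumption-laden illustration: the update equations are one common simplification rather than the exact Eckhorn or Johnson formulation, and all parameter values (<code>beta</code>, <code>v_l</code>, <code>v_theta</code>, <code>alpha_theta</code>, the initial threshold) and the function name are hypothetical. Pixels are grouped into segments by the iteration at which they first pulse.

```python
import math

def pcnn_fire_times(image, beta=0.2, v_l=1.0, v_theta=20.0, alpha_theta=0.3, steps=20):
    """Simplified PCNN. image: 2-D list of stimuli scaled to [0, 1].
    Returns the first iteration at which each neuron pulsed (None if never)."""
    h, w = len(image), len(image[0])
    y = [[0] * w for _ in range(h)]            # pulse outputs of the previous iteration
    theta = [[1.0] * w for _ in range(h)]      # dynamic thresholds (assumed start value)
    fire = [[None] * w for _ in range(h)]
    decay = math.exp(-alpha_theta)
    for n in range(1, steps + 1):
        new_y = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                # Local stimulus: number of 8-neighbours that pulsed last iteration.
                link = sum(y[a][b]
                           for a in range(max(0, i - 1), min(h, i + 2))
                           for b in range(max(0, j - 1), min(w, j + 2))
                           if (a, b) != (i, j))
                # Internal activation: external stimulus modulated by local stimuli.
                u = image[i][j] * (1 + beta * v_l * link)
                if u > theta[i][j]:
                    new_y[i][j] = 1
                    if fire[i][j] is None:
                        fire[i][j] = n
        for i in range(h):
            for j in range(w):
                # Threshold decays each step and jumps by v_theta after a pulse.
                theta[i][j] = decay * theta[i][j] + v_theta * new_y[i][j]
        y = new_y
    return fire

image = [[0.9, 0.9, 0.3, 0.3],
         [0.9, 0.9, 0.3, 0.3]]
print(pcnn_fire_times(image))
```

With these settings the bright pixels first pulse at iteration 2 and the dark ones at iteration 6, so grouping pixels by first firing time separates the two regions.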
 
[[U-Net]] is a [[convolutional neural network]] that takes an image as input and outputs a label for each pixel.<ref>{{cite arXiv|last1=Ronneberger|first1=Olaf|last2=Fischer|first2=Philipp|last3=Brox|first3=Thomas|title=U-Net: Convolutional Networks for Biomedical Image Segmentation|eprint=1505.04597|date=2015|class=cs.CV}}</ref> U-Net was initially developed to detect cell boundaries in biomedical images. It follows a classical [[autoencoder]]-style architecture and therefore contains two sub-structures. The encoder structure follows the traditional stack of convolutional and max pooling layers, which increases the receptive field from layer to layer; it is used to capture the context in the image. The decoder structure uses transposed convolution layers for upsampling so that the output dimensions are close to those of the input image. Skip connections are placed between convolution and transposed convolution layers of the same shape in order to preserve details that would otherwise be lost.
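The data flow of a U-Net (convolution, max pooling, transposed-convolution upsampling, and a skip connection that concatenates encoder features with upsampled decoder features) can be traced on a toy single-channel network. This pure-Python sketch is illustrative only: it has one pooling level, untrained hand-set weights, and uses zero padding so the skip-connected maps have equal size, whereas the original paper uses unpadded convolutions and crops the encoder features. All function names and weights are hypothetical.

```python
def conv3x3(x, k):
    """Zero-padded 3x3 convolution (cross-correlation) of one channel, then ReLU."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(3):
                for b in range(3):
                    r, c = i + a - 1, j + b - 1
                    if 0 <= r < h and 0 <= c < w:
                        s += k[a][b] * x[r][c]
            out[i][j] = max(0.0, s)
    return out

def max_pool2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [[max(x[2 * i][2 * j], x[2 * i][2 * j + 1], x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1])
             for j in range(len(x[0]) // 2)] for i in range(len(x) // 2)]

def up_conv2(x, k):
    """2x2 transposed convolution, stride 2: each value paints a 2x2 output block."""
    return [[x[i // 2][j // 2] * k[i % 2][j % 2] for j in range(2 * len(x[0]))]
            for i in range(2 * len(x))]

def tiny_unet(image, k_enc, k_mid, k_up, w_skip, w_up, bias):
    enc = conv3x3(image, k_enc)           # encoder features at full resolution
    mid = conv3x3(max_pool2(enc), k_mid)  # coarser context after pooling
    up = up_conv2(mid, k_up)              # decoder: back to full resolution
    # Skip connection: concatenate [enc, up] as two channels, then a 1x1 convolution.
    logits = [[w_skip * e + w_up * u + bias for e, u in zip(er, ur)]
              for er, ur in zip(enc, up)]
    return [[1 if v > 0 else 0 for v in row] for row in logits]  # one label per pixel

ident = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # hand-set identity kernel
dot = [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(tiny_unet(dot, ident, ident, [[1, 1], [1, 1]], w_skip=1.0, w_up=0.5, bias=-1.0))
print(tiny_unet(dot, ident, ident, [[1, 1], [1, 1]], w_skip=0.0, w_up=1.0, bias=-0.5))
```

With the skip connection (first call), the single bright pixel is labeled exactly; with the skip weight set to zero (second call), the decoder alone can only reproduce the blocky 2x2 result of pooling, which is the loss of detail that skip connections are meant to prevent.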