Image segmentation

{{Short description|Partitioning a digital image into segments}}
{{Use dmy dates|date=November 2024}}
[[File:Model of a segmented femur - journal.pone.0079004.g005.png|thumb|Model of a segmented left human [[femur]]. It shows the outer surface (red), the surface between compact bone and spongy bone (green) and the surface of the bone marrow (blue).]]
 
In [[digital image processing]] and [[computer vision]], '''image segmentation''' is the process of partitioning a [[digital image]] into multiple '''image segments''', also known as '''image regions''' or '''image objects''' ([[Set (mathematics)|sets]] of [[pixel]]s). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.<ref name="computervision">[[Linda Shapiro|Linda G. Shapiro]] and George C. Stockman (2001): "Computer Vision", pp 279–325, New Jersey, Prentice-Hall, {{ISBN|0-13-030796-3}}</ref><ref>Barghout, Lauren, and Lawrence W. Lee. "Perceptual information processing system." Paravue Inc. U.S. Patent Application 10/618,543, filed 11 July 2003.</ref> Image segmentation is typically used to locate objects and [[Boundary tracing|boundaries]] (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
 
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of [[Contour line|contour]]s extracted from the image (see [[edge detection]]). Each of the pixels in a region is similar with respect to some characteristic or computed property,<ref>{{cite conference | last1=Nielsen | first1=Frank | last2=Nock | first2=Richard | title=2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings. | chapter=On region merging: The statistical soundness of fast sorting, with applications | publisher=IEEE | year=2003 | volume=2 | doi=10.1109/CVPR.2003.1211447 | pages=II:19–26 | isbn=0-7695-1900-8 }}</ref> such as [[color]], [[luminous intensity|intensity]], or [[Image texture|texture]]. Adjacent regions differ significantly with respect to the same characteristic(s).<ref name="computervision" /> When applied to a stack of images, typical in [[medical imaging]], the resulting contours after image segmentation can be used to create [[3D reconstruction]]s with the help of geometry reconstruction algorithms like [[marching cubes]].<ref>Zachow, Stefan, Michael Zilske, and Hans-Christian Hege. "[https://opus4.kobv.de/opus4-zib/files/1044/ZR_07_41.pdf 3D reconstruction of individual anatomy from medical image data: Segmentation and geometry processing]{{Dead link|date=August 2025 |bot=InternetArchiveBot |fix-attempted=yes }}." (2007).</ref>
 
== Applications ==
[[File:3D CT of thorax.jpg|thumb|Volume segmentation of a 3D-rendered [[CT scan]] of the [[thorax]]: The anterior thoracic wall, the airways and the pulmonary vessels anterior to the root of the lung have been digitally removed in order to visualize thoracic contents: <br />– <span style="color:blue;">blue</span>: [[pulmonary arteries]] <br />– <span style="color:red;">red</span>: [[pulmonary veins]] (and also the [[abdominal wall]])<br />– <span style="color:yellow;">yellow</span>: the [[mediastinum]] <br />– <span style="color:violet;">violet</span>: the [[Thoracic diaphragm|diaphragm]] ]]
 
Some of the practical applications of image segmentation are:
* [[Content-based image retrieval]]<ref>Belongie, Serge, et al. "[http://people.eecs.berkeley.edu/~malik/papers/blobworld98.pdf Color-and texture-based image segmentation using EM and its application to content-based image retrieval]." Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271). IEEE, 1998.</ref>
* [[Machine vision]]
* [[Medical imaging]],<ref>{{cite journal | last1 = Pham | first1 = Dzung L. | last2 = Xu | first2 = Chenyang | last3 = Prince | first3 = Jerry L. | year = 2000 | title = Current Methods in Medical Image Segmentation | journal = Annual Review of Biomedical Engineering | volume = 2 | pages = 315–337 | pmid = 11701515 | doi = 10.1146/annurev.bioeng.2.1.315 }}</ref><ref>{{cite journal | last1 = Forghani | first1 = M. | last2 = Forouzanfar | first2 = M. | last3 = Teshnehlab | first3 = M. | year = 2010 | title = Parameter optimization of improved fuzzy c-means clustering algorithm for brain MR image segmentation | journal = Engineering Applications of Artificial Intelligence | volume = 23 | issue = 2 | pages = 160–168 | doi = 10.1016/j.engappai.2009.10.002 }}</ref> and imaging studies in biomedical research, including [[volume rendering|volume rendered]] images from [[CT scan|computed tomography]] and [[magnetic resonance imaging]], as well as volume electron microscopy techniques such as FIB-SEM.<ref>{{Cite journal |last1=Reznikov |first1=Natalie |last2=Buss |first2=Dan J. |last3=Provencher |first3=Benjamin |last4=McKee |first4=Marc D. |last5=Piché |first5=Nicolas |date=October 2020 |title=Deep learning for 3D imaging and image analysis in biomineralization research |url=http://dx.doi.org/10.1016/j.jsb.2020.107598 |journal=Journal of Structural Biology |volume=212 |issue=1 |pages=107598 |doi=10.1016/j.jsb.2020.107598 |pmid=32783967 |s2cid=221126896 |issn=1047-8477|url-access=subscription }}</ref>
** Locate tumors and other pathologies<ref>{{cite journal | url=https://link.springer.com/article/10.1007/s11548-013-0922-7 | doi=10.1007/s11548-013-0922-7 | title=Brain tumor detection and segmentation in a CRF (Conditional random fields) framework with pixel-pairwise affinity and superpixel-level features | year=2014 | last1=Wu | first1=Wei | last2=Chen | first2=Albert Y. C. | last3=Zhao | first3=Liang | last4=Corso | first4=Jason J. | journal=International Journal of Computer Assisted Radiology and Surgery | volume=9 | issue=2 | pages=241–253 | pmid=23860630 | s2cid=13474403 | url-access=subscription }}</ref><ref>E. B. George and M. Karnan (2012): "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.411.7411&rep=rep1&type=pdf MR Brain image segmentation using Bacteria Foraging Optimization Algorithm]", ''International Journal of Engineering and Technology'', Vol. 4.</ref>
** Measure tissue volumes<ref>{{Cite journal |last1=Ye |first1=Run Zhou |last2=Noll |first2=Christophe |last3=Richard |first3=Gabriel |last4=Lepage |first4=Martin |last5=Turcotte |first5=Éric E. |last6=Carpentier |first6=André C. |date=February 2022 |title=DeepImageTranslator: A free, user-friendly graphical interface for image translation using deep-learning and its applications in 3D CT image analysis |journal=SLAS Technology |volume=27 |issue=1 |pages=76–84 |doi=10.1016/j.slast.2021.10.014 |pmid=35058205 |issn=2472-6303|doi-access=free }}</ref><ref>{{Cite journal |last1=Ye |first1=En Zhou |last2=Ye |first2=En Hui |last3=Bouthillier |first3=Maxime |last4=Ye |first4=Run Zhou |date=18 February 2022 |title=DeepImageTranslator V2: analysis of multimodal medical images using semantic segmentation maps generated through deep learning |language=en |biorxiv=10.1101/2021.10.12.464160 |doi=10.1101/2021.10.12.464160 |s2cid=239012446|doi-access=free }}</ref>
** Diagnosis, study of anatomical structure<ref>{{cite journal|last1=Kamalakannan|first1=Sridharan|last2=Gururajan|first2=Arunkumar|last3=Sari-Sarraf|first3=Hamed|last4=Rodney|first4=Long|last5=Antani|first5=Sameer|title=Double-Edge Detection of Radiographic Lumbar Vertebrae Images Using Pressurized Open DGVF Snakes|journal=IEEE Transactions on Biomedical Engineering|date=17 February 2010|volume=57|issue=6|pages=1325–1334|doi=10.1109/tbme.2010.2040082|pmid=20172792|s2cid=12766600}}</ref>
** Surgery planning
** Virtual surgery simulation
** Intra-surgery navigation
** Radiotherapy<ref>{{Cite arXiv |last1=Georgescu |first1=Mariana-Iuliana |last2=Ionescu |first2=Radu Tudor |last3=Miron |first3=Andreea-Iuliana |date=21 December 2022 |title=Diversity-Promoting Ensemble for Medical Image Segmentation |class=eess.IV |eprint=2210.12388 }}</ref>
* [[Object detection]]<ref>J. A. Delmerico, P. David and J. J. Corso (2011): "[http://www.jeffdelmerico.com/wp-content/papercite-data/pdf/delmerico2011building.pdf Building façade detection, segmentation and parameter estimation for mobile robot localization and guidance]", International Conference on Intelligent Robots and Systems, pp. 1632–1639.</ref>
** [[Pedestrian detection]]
** [[Fingerprint recognition]]
** [[Iris recognition]]
** Prohibited item detection at [[airport security]] checkpoints
* Traffic control systems
* [[Video surveillance]]
 
* '''Semantic segmentation''' is an approach that detects, for every pixel, the class to which it belongs.<ref>{{Cite journal|last1=Guo|first1=Dazhou|last2=Pei|first2=Yanting|last3=Zheng|first3=Kang|last4=Yu|first4=Hongkai|last5=Lu|first5=Yuhang|last6=Wang|first6=Song|date=2020|title=Degraded Image Semantic Segmentation With Dense-Gram Networks|journal=IEEE Transactions on Image Processing|volume=29|pages=782–795|doi=10.1109/TIP.2019.2936111|pmid=31449020|bibcode=2020ITIP...29..782G|s2cid=201753511|issn=1057-7149|doi-access=free}}</ref> For example, in a figure with many people, all the pixels belonging to persons will have the same class id, and the pixels in the background will be classified as background.
* '''Instance segmentation''' is an approach that identifies, for every pixel, the specific instance of the object to which it belongs. It detects each distinct object of interest in the image.<ref>{{Cite journal|last1=Yi|first1=Jingru|last2=Wu|first2=Pengxiang|last3=Jiang|first3=Menglin|last4=Huang|first4=Qiaoying|last5=Hoeppner|first5=Daniel J.|last6=Metaxas|first6=Dimitris N.|date=July 2019|title=Attentive neural cell instance segmentation|journal=Medical Image Analysis|language=en|volume=55|pages=228–240|doi=10.1016/j.media.2019.05.004|pmid=31103790|s2cid=159038604|doi-access=free}}</ref> For example, each person in a figure is segmented as an individual object.
* '''Panoptic segmentation''' combines both semantic and instance segmentation. Like semantic segmentation, panoptic segmentation identifies, for every pixel, the class to which it belongs; like instance segmentation, it distinguishes different instances of the same class.<ref name="Panoptic Segmentation">{{cite arXiv|author=Alexander Kirillov |author2=Kaiming He |author3=Ross Girshick |author4=Carsten Rother |author5=Piotr Dollár |title=Panoptic Segmentation|eprint=1801.00868|class=cs.CV|year=2018}}</ref>
 
== Thresholding ==
The key to this method is selecting the threshold value (or values, when multiple levels are selected). Several popular methods are used in industry, including the maximum entropy method, [[balanced histogram thresholding]], [[Otsu's method]] (maximum variance), and [[k-means clustering]].
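As an illustrative sketch (pure Python, assuming 8-bit grayscale input; not taken from any of this article's references), Otsu's method can be computed directly from the image histogram by maximizing the between-class variance over all candidate thresholds:

```python
def otsu_threshold(pixels):
    """Return the threshold maximizing between-class variance (Otsu's method).

    `pixels` is a flat iterable of integer gray levels in [0, 255].
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0.0
    for t in range(256):
        w_bg += hist[t]              # background weight: pixels <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg          # foreground weight
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated intensity populations: the threshold falls between them.
dark = [10, 12, 11, 13] * 25
bright = [200, 205, 199, 202] * 25
t = otsu_threshold(dark + bright)
```

Pixels at or below the returned threshold form one class; the rest form the other.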
 
Recently, methods have been developed for thresholding computed tomography (CT) images. The key idea is that, unlike Otsu's method, the thresholds are derived from the radiographs instead of the (reconstructed) image.<ref>{{cite journal |last1 = Batenburg |first1 = K J. |last2 = Sijbers |first2 = J. |year = 2009|title = Adaptive thresholding of tomograms by projection distance minimization |journal = Pattern Recognition |volume = 42 |issue = 10 |pages = 2297–2305 |doi = 10.1016/j.patcog.2008.11.027 |bibcode = 2009PatRe..42.2297B |citeseerx = 10.1.1.182.8483 }}</ref><ref>{{cite journal |first1 = K J. |last1 = Batenburg |first2 = J. |last2 = Sijbers |title = Optimal Threshold Selection for Tomogram Segmentation by Projection Distance Minimization |journal = IEEE Transactions on Medical Imaging |volume = 28 |issue = 5 |pages = 676–686 |date = June 2009 |url = http://www.visielab.ua.ac.be/publications/optimal-threshold-selection-tomogram-segmentation-projection-distance-minimization |format = PDF |doi = 10.1109/tmi.2008.2010437 |pmid = 19272989 |s2cid = 10994501 |access-date = 31 July 2012 |archive-url = https://web.archive.org/web/20130503171943/http://www.visielab.ua.ac.be/publications/optimal-threshold-selection-tomogram-segmentation-projection-distance-minimization |archive-date = 3 May 2013 |url-status = dead }}</ref>
 
New methods suggest the use of multi-dimensional, fuzzy rule-based, non-linear thresholds. In these approaches, the decision regarding each pixel's membership in a segment is based on multi-dimensional rules derived from fuzzy logic and evolutionary algorithms, considering factors such as image lighting, environment, and application.<ref>{{cite book |first1 = A. |last1 = Kashanipour |first2 = N |last2 = Milani |first3 = A. |last3 = Kashanipour |first4 = H. |last4 = Eghrary |title = 2008 Congress on Image and Signal Processing |chapter = Robust Color Classification Using Fuzzy Rule-Based Particle Swarm Optimization |publisher = IEEE |volume = 2 |pages = 110–114 |date = May 2008 |doi = 10.1109/CISP.2008.770 |isbn = 978-0-7695-3119-9 |s2cid = 8422475 }}</ref>
 
== Clustering methods ==
| direction = vertical
| width = 300
| image1 = Aurora borealis over Eielson Air Force Base, Alaska.jpg
| alt1 = Original image
| caption1 = Source image.
| caption2 = Image after running ''k''-means with ''k = 16''. Note that a common technique to improve performance for large images is to downsample the image, compute the clusters, and then reassign the values to the larger image if necessary.
}}
The [[K-means algorithm]] is an [[iterative]] technique that is used to [[Cluster analysis|partition an image]] into ''K'' clusters.<ref>{{cite journal | last1 = Barghout | first1 = Lauren | last2 = Sheynin | first2 = Jacob | year = 2013 | title = Real-world scene perception and perceptual organization: Lessons from Computer Vision | journal = Journal of Vision | volume = 13 | issue = 9| page = 709 | doi=10.1167/13.9.709| doi-access = free }}</ref> The basic [[algorithm]] is
 
# Pick ''K'' cluster centers, either [[random]]ly or based on some [[heuristic]] method, for example [[K-means++]]
== Compression-based methods ==
 
Compression-based methods postulate that the optimal segmentation is the one that minimizes, over all possible segmentations, the coding length of the data.<ref>{{cite journal |author1=Hossein Mobahi |author2=Shankar Rao |author3=Allen Yang |author4=Shankar Sastry |author5=Yi Ma |url=http://perception.csl.illinois.edu/coding/papers/MobahiH2011-IJCV.pdf |title=Segmentation of Natural Images by Texture and Boundary Compression |journal=International Journal of Computer Vision |volume=95 |pages=86–98 |year=2011 |doi=10.1007/s11263-011-0444-0 |arxiv=1006.3679 |citeseerx=10.1.1.180.3579 |s2cid=11070572 |access-date=8 May 2011 |archive-url=https://web.archive.org/web/20170808173212/http://perception.csl.illinois.edu/coding//papers/MobahiH2011-IJCV.pdf |archive-date=8 August 2017 |url-status=dead }}</ref><ref>Shankar Rao, Hossein Mobahi, Allen Yang, Shankar Sastry and Yi Ma [http://perception.csl.illinois.edu/coding/papers/RaoS2009-ACCV.pdf Natural Image Segmentation with Adaptive Texture and Boundary Encoding] {{Webarchive|url=https://web.archive.org/web/20160519101956/http://perception.csl.illinois.edu/coding/papers/RaoS2009-ACCV.pdf |date=19 May 2016 }}, Proceedings of the Asian Conference on Computer Vision (ACCV) 2009, H. Zha, R.-i. Taniguchi, and S. Maybank (Eds.), Part I, LNCS 5994, pp. 135–146, Springer.</ref> The connection between these two concepts is that segmentation tries to find patterns in an image, and any regularity in the image can be used to compress it. The method describes each segment by its texture and boundary shape. Each of these components is modeled by a probability distribution function and its coding length is computed as follows:
 
# The boundary encoding leverages the fact that regions in natural images tend to have a smooth contour. This prior is used by [[Huffman coding]] to encode the difference [[chain code]] of the contours in an image. Thus, the smoother a boundary is, the shorter coding length it attains.
 
Segmentation methods can also be applied to edges obtained from edge detectors. Lindeberg and Li<ref>{{cite journal | last1 = Lindeberg | first1 = T. | last2 = Li | first2 = M.-X. | year = 1997 | title = Segmentation and classification of edges using minimum description length approximation and complementary junction cues | url = http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A473385&dswid=-8029 | journal = Computer Vision and Image Understanding | volume = 67 | issue = 1| pages = 88–98 | doi=10.1006/cviu.1996.0510}}</ref> developed an integrated method that segments edges into straight and curved edge segments for parts-based object recognition, based on a minimum description length (MDL) criterion that is optimized by a split-and-merge-like method, with candidate breakpoints obtained from complementary junction cues yielding more likely points at which to consider partitions into different segments.
 
== Isolated point detection ==
 
The detection of isolated points in an image is a fundamental part of image segmentation. This process primarily depends on the second derivative, indicating the use of the Laplacian operator. The Laplacian of a function <math>f(x, y)</math> is given by:
 
:<math> \nabla^2 f(x,y) = \frac{{\partial^2 f}}{{\partial x^2}} + \frac{{\partial^2 f}}{{\partial y^2}} </math>
 
In the discrete domain, the Laplacian is computed with finite differences. The second partial derivatives of <math>f(x, y)</math> with respect to <math>x</math> and <math>y</math> are given by:
 
:<math> \frac{{\partial^2 f(x,y)}}{{\partial x^2}} = f(x+1,y) + f(x-1,y) - 2f(x,y) </math>
 
:<math> \frac{{\partial^2 f(x,y)}}{{\partial y^2}} = f(x,y+1) + f(x,y-1) - 2f(x,y) </math>
 
These partial derivatives are then used to compute the Laplacian as:
 
:<math> \nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y) </math>
 
This expression can be implemented by convolving the image with an appropriate mask. Extending the equation to three dimensions (''x'', ''y'', ''z''), the intensities at the pixel locations around a central pixel at (''x'', ''y'', ''z'') enter with their corresponding values. This formulation is particularly convenient when all pixels are assumed to have unit spacing along each axis.
 
A sphere mask has been developed for use with three-dimensional datasets. The sphere mask is designed to use only integer arithmetic during calculations, thereby eliminating the need for floating-point hardware or software.
 
When applying these concepts to actual images represented as arrays of numbers, we must also decide how to handle pixels at the image border, where the mask extends beyond the array. The thresholded output <math>g(x, y)</math> is defined as:
 
:<math> g(x, y) = \begin{cases} 1 & \text{if } |R(x, y)| \geq T \\ 0 & \text{otherwise} \end{cases} </math>
 
The above equation is used to determine whether a point in the image is an isolated point, based on the response magnitude <math>|R(x, y)|</math> and a threshold value <math>T</math>. If the response magnitude is greater than or equal to the threshold, the function returns 1, indicating the presence of an isolated point; otherwise, it returns 0. This helps in the effective detection and segmentation of isolated points in the image.<ref>Digital Image Processing (2007, Pearson) by Rafael C. Gonzalez, Richard E. Woods</ref>
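The detection rule above, the discrete Laplacian followed by the threshold test <math>g(x, y)</math>, can be sketched in a few lines of Python (an illustrative sketch, not from the cited textbook; border pixels are simply skipped rather than padded):

```python
def detect_isolated_points(img, T):
    """Flag pixels where the absolute Laplacian response is >= T.

    `img` is a list of rows of gray levels. Border pixels are skipped
    for simplicity, so the output is 0 there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of 4-neighbours minus 4x the centre.
            r = (img[y][x + 1] + img[y][x - 1]
                 + img[y + 1][x] + img[y - 1][x] - 4 * img[y][x])
            out[y][x] = 1 if abs(r) >= T else 0
    return out

# A flat background with one bright isolated pixel at (2, 2).
img = [[10] * 5 for _ in range(5)]
img[2][2] = 90
mask = detect_isolated_points(img, T=100)
```

Only the isolated pixel itself produces a response whose magnitude exceeds the threshold; its neighbours respond more weakly and are suppressed.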
 
== Application of isolated point detection in X-ray image processing ==
 
The detection of isolated points has significant applications in various fields, including X-ray image processing. For instance, an original X-ray image of a turbine blade can be examined pixel-by-pixel to detect porosity in the upper-right quadrant of the blade. Thresholding the point detector's response to this X-ray image isolates these defects, demonstrating the segmentation of isolated points in an image with the aid of single-pixel probes.<ref>Digital Image Processing (2007, Pearson) by Rafael C. Gonzalez, Richard E. Woods</ref>
 
== Dual clustering method ==
 
This method combines three characteristics of the image: a partition of the image based on histogram analysis is checked by high compactness of the clusters (objects) and high gradients of their borders. Two spaces are introduced for this purpose: one is the one-dimensional histogram of brightness ''H''&nbsp;=&nbsp;''H''(''B''); the second is the dual 3-dimensional space of the original image itself, ''B''&nbsp;=&nbsp;''B''(''x'',&nbsp;''y''). The first space allows one to measure how compactly the brightness of the image is distributed by calculating a minimal clustering ''k''<sub>min</sub>. The threshold brightness ''T'' corresponding to ''k''<sub>min</sub> defines the binary (black-and-white) image&nbsp;– the bitmap ''b''&nbsp;=&nbsp;''φ''(''x'',&nbsp;''y''), where ''φ''(''x'',&nbsp;''y'')&nbsp;=&nbsp;0 if ''B''(''x'',&nbsp;''y'')&nbsp;<&nbsp;''T'', and ''φ''(''x'',&nbsp;''y'')&nbsp;=&nbsp;1 if ''B''(''x'',&nbsp;''y'')&nbsp;≥&nbsp;''T''. The bitmap ''b'' is an object in dual space. On that bitmap a measure has to be defined reflecting how compactly black (or white) pixels are distributed. The goal is thus to find objects with good borders: for every ''T'' the measure ''M''<sub>DC</sub>&nbsp;=&nbsp;''G''/(''k''&nbsp;×&nbsp;''L'') is calculated (where ''k'' is the difference in brightness between the object and the background, ''L'' is the length of all borders, and ''G'' is the mean gradient on the borders). The maximum of ''M''<sub>DC</sub> defines the segmentation.<ref>[http://gth.krammerbuch.at/sites/default/files/articles/AHAH%20callback/01_Guberman_KORR.pdf] {{Webarchive|url=https://web.archive.org/web/20171013224758/http://gth.krammerbuch.at/sites/default/files/articles/AHAH%20callback/01_Guberman_KORR.pdf|date=13 October 2017}} [[Guberman Shelia (Shelija)|Shelia Guberman]], Vadim V. Maximov, Alex Pashintsev, "Gestalt and Image Understanding", ''Gestalt Theory'' 2012, Vol. 34, No. 2, 143–166.</ref>
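A rough Python sketch of this criterion (illustrative only, not from the cited paper; the minimal-clustering step for ''k''<sub>min</sub> is replaced here by an exhaustive scan over all brightness levels, and the border is approximated by 4-neighbour pixel pairs that the threshold separates):

```python
def dual_clustering_threshold(img):
    """Rough sketch of the dual-space criterion M_DC = G / (k * L).

    For each candidate threshold T the image is binarized; k is the mean
    brightness difference between object and background, L is the total
    border length (count of separated neighbour pairs), and G is the mean
    gradient across the border. The T maximizing M_DC is returned.
    """
    h, w = len(img), len(img[0])
    levels = sorted({v for row in img for v in row})
    best_t, best_m = None, -1.0
    for t in levels[1:]:
        obj = [v for row in img for v in row if v >= t]
        bg = [v for row in img for v in row if v < t]
        k = abs(sum(obj) / len(obj) - sum(bg) / len(bg))
        grads = []
        for y in range(h):
            for x in range(w):
                for y2, x2 in ((y + 1, x), (y, x + 1)):
                    if y2 < h and x2 < w and (img[y][x] >= t) != (img[y2][x2] >= t):
                        grads.append(abs(img[y][x] - img[y2][x2]))
        if not grads or k == 0:
            continue
        m = (sum(grads) / len(grads)) / (k * len(grads))   # G / (k * L)
        if m > best_m:
            best_m, best_t = m, t
    return best_t

# A bright 2x2 object on a dark background separates at T = 90.
img = [[10, 10, 10, 10],
       [10, 90, 90, 10],
       [10, 90, 90, 10],
       [10, 10, 10, 10]]
t_best = dual_clustering_threshold(img)
```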
 
== Region-growing methods ==
Another [[region-growing]] method is the unseeded region growing method. It is a modified algorithm that does not require explicit seeds. It starts with a single region <math>A_1</math>—the pixel chosen here does not markedly influence the final segmentation. At each iteration it considers the neighboring pixels in the same way as seeded region growing. It differs from seeded region growing in that if the minimum <math>\delta</math> is less than a predefined threshold <math>T</math> then it is added to the respective region <math>A_j</math>. If not, then the pixel is considered different from all current regions <math>A_i</math> and a new region <math>A_{n+1}</math> is created with this pixel.
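A minimal sketch of the unseeded scheme, simplified to a single raster scan in which each pixel is compared with its already-labeled neighbours (the published algorithm instead iterates over all region-adjacent pixels; the function name and threshold <math>T</math> here are illustrative):

```python
def unseeded_region_growing(img, T):
    """Sketch of unseeded region growing on a grayscale image.

    The first pixel seeds region 0. Each subsequent pixel joins the
    closest adjacent region if its intensity differs from that region's
    mean by less than T; otherwise it seeds a new region.
    """
    h, w = len(img), len(img[0])
    label = [[None] * w for _ in range(h)]
    regions = []  # per-region (intensity sum, pixel count)

    def mean(i):
        s, n = regions[i]
        return s / n

    for y in range(h):
        for x in range(w):
            candidates = []
            for y2, x2 in ((y - 1, x), (y, x - 1)):
                if y2 >= 0 and x2 >= 0 and label[y2][x2] is not None:
                    i = label[y2][x2]
                    candidates.append((abs(img[y][x] - mean(i)), i))
            if candidates:
                delta, i = min(candidates)
                if delta < T:           # similar enough: join region i
                    label[y][x] = i
                    s, n = regions[i]
                    regions[i] = (s + img[y][x], n + 1)
                    continue
            label[y][x] = len(regions)  # start a new region with this pixel
            regions.append((img[y][x], 1))
    return label

# Two horizontal bands of distinct intensity yield two regions.
bands = [[10] * 4, [10] * 4, [100] * 4, [100] * 4]
lab = unseeded_region_growing(bands, T=20)
```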
 
One variant of this technique, proposed by [[Haralick]] and Shapiro (1985),<ref name="computervision" /> is based on pixel [[Brightness|intensities]]. The [[Arithmetic mean|mean]] and [[Statistical dispersion|scatter]] of the region and the intensity of the candidate pixel are used to compute a test statistic. If the test statistic is sufficiently small, the pixel is added to the region, and the region's mean and scatter are recomputed. Otherwise, the pixel is rejected, and is used to form a new region.
 
A special region-growing method is called <math>\lambda</math>-connected segmentation (see also [[lambda-connectedness]]). It is based on pixel [[Brightness|intensities]] and neighborhood-linking paths. A degree of connectivity (connectedness) is calculated based on a path that is formed by pixels. For a certain value of <math>\lambda</math>, two pixels are called <math>\lambda</math>-connected if there is a path linking those two pixels and the connectedness of this path is at least <math>\lambda</math>. <math>\lambda</math>-connectedness is an equivalence relation.<ref name="lambda-connectedness">L. Chen, H. D. Cheng, and J. Zhang, [https://www.sciencedirect.com/science/article/pii/1069011594900094 Fuzzy subfiber and its application to seismic lithology classification], Information Sciences: Applications, Vol 1, No 2, pp 77–95, 1994.</ref>
[[Split and merge segmentation|Split-and-merge segmentation]] is based on a [[quadtree]] partition of an image. It is sometimes called quadtree segmentation.
 
This method starts at the root of the tree that represents the whole image. If it is found non-uniform (not homogeneous), then it is split into four child squares (the splitting process), and so on. If, in contrast, four child squares are homogeneous, they are merged as several connected components (the merging process). The node in the tree is a segmented node. This process continues recursively until no further splits or merges are possible.<ref name="split-and-merge1">S.L. Horowitz and T. Pavlidis, Picture Segmentation by a Directed Split and Merge Procedure, Proc. ICPR, 1974, Denmark, pp. 424–433.</ref><ref name="split-and-merge2">S.L. Horowitz and T. Pavlidis, Picture Segmentation by a Tree Traversal Algorithm, Journal of the ACM, 23 (1976), pp. 368–388.</ref> When a special data structure is involved in the implementation of the algorithm, its time complexity can reach <math>O(n\log n)</math>, which is optimal for this method.<ref name="split-and-merge3">L. Chen, [http://www.spclab.com/research/lambda/lambdaConn91.pdf The lambda-connected segmentation and the optimal algorithm for split-and-merge segmentation] {{Webarchive|url=https://web.archive.org/web/20160310054934/http://www.spclab.com/research/lambda/lambdaConn91.pdf |date=10 March 2016 }}, Chinese J. Computers, 14(1991), pp 321–331</ref>
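The splitting half of the procedure can be sketched as a recursive quadtree traversal (a simplified illustration: the homogeneity test is an intensity-range check with tolerance `tol`, the merging step is omitted, and a square image with power-of-two side is assumed):

```python
def split_quadtree(img, y0, x0, size, tol, leaves):
    """Sketch of the splitting phase of split-and-merge segmentation.

    Recursively splits a square block until its intensity range is
    within `tol` (the homogeneity test), collecting homogeneous leaf
    blocks as (row, col, size) tuples.
    """
    vals = [img[y][x]
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)]
    if max(vals) - min(vals) <= tol or size == 1:
        leaves.append((y0, x0, size))   # homogeneous: stop splitting
        return
    half = size // 2
    for dy in (0, half):                # recurse into the four quadrants
        for dx in (0, half):
            split_quadtree(img, y0 + dy, x0 + dx, half, tol, leaves)

# 4x4 image: uniform except one bright quadrant, which is itself uniform,
# so splitting stops after one level with four homogeneous leaves.
img = [[10, 10, 50, 50],
       [10, 10, 50, 50],
       [10, 10, 10, 10],
       [10, 10, 10, 10]]
leaves = []
split_quadtree(img, 0, 0, 4, tol=5, leaves=leaves)
```

A full implementation would follow this with the merging pass that joins adjacent homogeneous leaves into connected components.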
 
== Partial differential equation-based methods ==
 
== Graph partitioning methods ==
[[Graph (data structure)|Graph]] partitioning methods are effective tools for image segmentation, since they model the impact of pixel neighborhoods on a given cluster of pixels or pixel, under the assumption of homogeneity in images. In these methods, the image is modeled as a weighted, [[undirected graph]]. Usually a pixel or a group of pixels is associated with [[Vertex (graph theory)|nodes]], and [[Glossary of graph theory#Basics|edge]] weights define the (dis)similarity between neighboring pixels. The graph (image) is then partitioned according to a criterion designed to model "good" clusters. Each partition of the nodes (pixels) output from these algorithms is considered an object segment in the image; see [[Segmentation-based object categorization]]. Some popular algorithms of this category are normalized cuts,<ref>Jianbo Shi and [[Jitendra Malik]] (2000): [https://www.cs.cmu.edu/~jshi/papers/pami_ncut.pdf "Normalized Cuts and Image Segmentation"], ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp 888–905, Vol. 22, No. 8</ref> [[random walker (computer vision)|random walker]],<ref>Leo Grady (2006): [http://vision.cse.psu.edu/people/chenpingY/paper/grady2006random.pdf "Random Walks for Image Segmentation"], ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp. 1768–1783, Vol. 28, No. 11</ref> minimum cut,<ref>Z. Wu and R. Leahy (1993): [ftp://sipi.usc.edu/pub/leahy/pdfs/MAP93.pdf "An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation"]{{dead link|date=May 2025|bot=medic}}{{cbignore|bot=medic}}, ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp. 1101–1113, Vol. 15, No. 11</ref> isoperimetric partitioning,<ref>Leo Grady and Eric L. Schwartz (2006): [http://www.cns.bu.edu/~lgrady/grady2006isoperimetric.pdf "Isoperimetric Graph Partitioning for Image Segmentation"] {{Webarchive|url=https://web.archive.org/web/20110719090404/http://www.cns.bu.edu/~lgrady/grady2006isoperimetric.pdf |date=19 July 2011 }}, ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', pp. 469–475, Vol. 28, No. 3</ref> [[minimum spanning tree-based segmentation]],<ref>C. T. Zahn (1971): [http://web.cse.msu.edu/~cse802/Papers/zahn.pdf "Graph-theoretical methods for detecting and describing gestalt clusters"], ''IEEE Transactions on Computers'', pp. 68–86, Vol. 20, No. 1</ref> and [[segmentation-based object categorization]].
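The minimum-cut idea can be illustrated with a small self-contained sketch (illustrative only, not from any of the cited papers: two seed pixels are assumed given, edge capacities are a Gaussian of the intensity difference with an assumed scale `sigma`, and the max-flow/min-cut between the seeds partitions the 4-connected pixel graph):

```python
import math
from collections import deque, defaultdict

def min_cut_segment(img, src, sink, sigma=40.0):
    """Seeded minimum-cut segmentation on a 4-connected pixel graph.

    Edge capacities decay with intensity difference, so the max-flow
    min-cut between the two seed pixels severs the weakest (most
    dissimilar) boundary. Returns the set of pixels on the src side.
    """
    h, w = len(img), len(img[0])
    cap = defaultdict(lambda: defaultdict(float))
    for y in range(h):
        for x in range(w):
            for y2, x2 in ((y + 1, x), (y, x + 1)):
                if y2 < h and x2 < w:
                    wgt = math.exp(-((img[y][x] - img[y2][x2]) ** 2)
                                   / (2.0 * sigma ** 2))
                    cap[(y, x)][(y2, x2)] = wgt
                    cap[(y2, x2)][(y, x)] = wgt

    # Edmonds–Karp max flow between the two seeds.
    while True:
        parent = {src: None}
        q = deque([src])
        while q and sink not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            break                      # no augmenting path left
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:              # push flow along the augmenting path
            cap[u][v] -= flow
            cap[v][u] += flow

    # Pixels still reachable from src in the residual graph form one side
    # of the minimum cut.
    seen, q = {src}, deque([src])
    while q:
        u = q.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in seen:
                seen.add(v)
                q.append(v)
    return seen

# Two homogeneous halves: the cut falls along the intensity step.
img = [[10, 10, 100, 100],
       [10, 10, 100, 100]]
seg = min_cut_segment(img, src=(0, 0), sink=(0, 3))
```

Practical graph-cut segmenters add terminal links from every pixel to the source and sink and use faster max-flow algorithms, but the cut structure is the same.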
 
=== Markov random fields ===
A range of other methods exist for solving simple as well as higher order MRFs. They include Maximization of Posterior Marginal, Multi-scale MAP estimation,<ref>A. Bouman and M. Shapiro (2002): "A multiscale Random field model for Bayesian image segmentation", IEEE Transactions on Image Processing, pp. 162–177, Vol. 3.</ref> Multiple Resolution segmentation<ref>J. Liu and Y. H. Yang (1994): "[https://ieeexplore.ieee.org/abstract/document/297949/ Multiresolution color image segmentation]", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 689–700, Vol. 16.</ref> and more. Apart from likelihood estimates, graph-cut using maximum flow<ref>S. Vicente, V. Kolmogorov and C. Rother (2008): "[http://www0.cs.ucl.ac.uk/staff/s.vicente/papers/connectedGC-CVPR08-TR.pdf Graph cut based image segmentation with connectivity priors]", CVPR</ref> and other highly constrained graph based methods<ref>Corso, Z. Tu, and A. Yuille (2008): "MRF Labelling with Graph-Shifts Algorithm", Proceedings of International workshop on combinatorial Image Analysis</ref><ref>B. J. Frey and D. MacKayan (1997): "[http://papers.nips.cc/paper/1467-a-revolution-belief-propagation-in-graphs-with-cycles.pdf A Revolution: Belief propagation in Graphs with Cycles]", Proceedings of Neural Information Processing Systems (NIPS)</ref> exist for solving MRFs.
 
==== Image segmentation using MAP and expectation–maximization ====
 
The [[expectation–maximization algorithm]] is utilized to iteratively estimate the posterior probabilities and distributions of labeling when no training data is available and no estimate of the segmentation model can be formed. A general approach is to use histograms to represent the features of an image and proceed as outlined briefly in this three-step algorithm:
Lindeberg<ref>[http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A472969&dswid=2693 Lindeberg, T.: Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention, International Journal of Computer Vision, 11(3), 283–318, 1993.]</ref><ref name=lin94>[http://www.csc.kth.se/~tony/book.html Lindeberg, Tony, Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, 1994], {{ISBN|0-7923-9418-6}}</ref> studied the problem of linking local extrema and saddle points over scales, and proposed an image representation called the scale-space primal sketch which makes explicit the relations between structures at different scales, and also makes explicit which image features are stable over large ranges of scale including locally appropriate scales for those. Bergholm proposed to detect edges at coarse scales in scale-space and then trace them back to finer scales with manual choice of both the coarse detection scale and the fine localization scale.
 
Gauch and Pizer<ref>[http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=628490 Gauch, J. and Pizer, S.: Multiresolution analysis of ridges and valleys in grey-scale images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:6 (June 1993), pages: 635–646, 1993.]</ref> studied the complementary problem of ridges and valleys at multiple scales and developed a tool for interactive image segmentation based on multi-scale watersheds. The use of multi-scale watershed with application to the gradient map has also been investigated by Olsen and Nielsen<ref>Olsen, O. and Nielsen, M.: [https://link.springer.com/content/pdf/10.1007/3-540-63507-6_178.pdf Multi-scale gradient magnitude watershed segmentation], Proc. of ICIAP 97, Florence, Italy, Lecture Notes in Computer Science, pages 6–13. Springer Verlag, September 1997.</ref> and been carried over to clinical use by Dam.<ref>Dam, E., Johansen, P., Olsen, O., Thomsen, A., Darvann, T., Dobrzeniecki, A., Hermann, N., Kitai, N., Kreiborg, S., Larsen, P., Nielsen, M.: "Interactive multi-scale segmentation in clinical use" in European Congress of Radiology 2000.</ref> Vincken et al.<ref>{{Cite journal |doi=10.1109/34.574787 |title=Probabilistic multiscale image segmentation |year=1997 |last1=Vincken |first1=K.L. |last2=Koster |first2=A.S.E. |last3=Viergever |first3=M.A. |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=19 |issue=2 |pages=109–120 }}</ref> proposed a hyperstack for defining probabilistic relations between image structures at different scales. The use of stable image structures over scales has been furthered by Ahuja<ref>[http://vision.ai.uiuc.edu/~msingh/segmen/seg/MSS.html M. Tabb and N. Ahuja, Unsupervised multiscale image segmentation by integrated edge and region detection, IEEE Transactions on Image Processing, Vol. 6, No. 5, 642–655, 1997.]
{{webarchive |url=https://web.archive.org/web/20110720084911/http://vision.ai.uiuc.edu/~msingh/segmen/seg/MSS.html |date=20 July 2011 }}</ref><ref>{{cite book | chapter-url=https://doi.org/10.1007%2F978-3-642-12307-8_12 | doi=10.1007/978-3-642-12307-8_12 | chapter=From Ramp Discontinuities to Segmentation Tree | title=Computer Vision – ACCV 2009 | series=Lecture Notes in Computer Science | year=2010 | last1=Akbas | first1=Emre | last2=Ahuja | first2=Narendra | volume=5994 | pages=123–134 | isbn=978-3-642-12306-1 }}</ref> and his co-workers into a fully automated system. A fully automatic brain segmentation algorithm based on closely related ideas of multi-scale watersheds has been presented by Undeman and Lindeberg<ref>[http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A451266&dswid=-4540 C. Undeman and T. Lindeberg (2003) "Fully Automatic Segmentation of MRI Brain Images using Probabilistic Anisotropic Diffusion and Multi-Scale Watersheds", Proc. Scale-Space'03, Isle of Skye, Scotland, Springer Lecture Notes in Computer Science, volume 2695, pages 641–656.]</ref> and been extensively tested in brain databases.
 
These ideas for multi-scale image segmentation by linking image structures over scales have also been picked up by Florack and Kuijper.<ref>Florack, L. and Kuijper, A.: The topological structure of scale-space images, Journal of Mathematical Imaging and Vision, 12:1, 65–79, 2000.</ref> Bijaoui and Rué<ref>{{cite journal | last1 = Bijaoui | first1 = A. | last2 = Rué | first2 = F. | year = 1995 | title = A Multiscale Vision Model | journal = Signal Processing | volume = 46 | issue = 3| page = 345 | doi=10.1016/0165-1684(95)00093-4}}</ref> associate structures detected in scale-space above a minimum noise threshold into an object tree which spans multiple scales and corresponds to a kind of feature in the original signal. Extracted features are accurately reconstructed using an iterative conjugate gradient matrix method.
An image segmentation [[neural network]] can process small areas of an image to extract simple features such as edges.<ref name="Transactions on Engineering, Computing and Technology">[[Mahinda Pathegama]] & Ö Göl (2004): "Edge-end pixel extraction for edge-based image segmentation", ''Transactions on Engineering, Computing and Technology,'' vol. 2, pp 213–216, ISSN 1305-5313</ref> Another neural network, or any decision-making mechanism, can then combine these features to label the areas of an image accordingly. A type of network designed this way is the [[Kohonen map]].
 
[[Pulse-coupled networks|Pulse-coupled neural networks (PCNNs)]] are neural models proposed by modeling a cat's visual cortex and developed for high-performance [[biomimetic]] [[image processing]]. In 1989, Reinhard Eckhorn introduced a neural model to emulate the mechanism of a cat's visual cortex. The Eckhorn model provided a simple and effective tool for studying the visual cortex of small mammals, and was soon recognized as having significant application potential in image processing. In 1994, the Eckhorn model was adapted to be an image processing algorithm by John L. Johnson, who termed this algorithm Pulse-Coupled Neural Network.<ref>{{cite journal|last1=Johnson|first1=John L.|date=September 1994|title=Pulse-coupled neural nets: translation, rotation, scale, distortion, and intensity signal invariance for images|doi=10.1364/AO.33.006239|pmid=20936043|publisher=OSA|volume=33|journal=Applied Optics|number=26|pages=6239–6253|bibcode=1994ApOpt..33.6239J}}</ref> Since then, PCNNs have been utilized for a variety of image processing applications, including image segmentation, feature generation, face extraction, motion detection, region growing and noise reduction. A PCNN is a two-dimensional neural network. Each neuron in the network corresponds to one pixel in an input image, receiving its corresponding pixel's color information (e.g. intensity) as an external stimulus. Each neuron also connects with its neighboring neurons, receiving local stimuli from them. The external and local stimuli are combined in an internal activation system, which accumulates the stimuli until it exceeds a dynamic threshold, resulting in a pulse output. Through iterative computation, PCNN neurons produce temporal series of pulse outputs. The temporal series of pulse outputs contain information of input images and can be utilized for various image processing applications, such as image segmentation and feature generation.
Compared with conventional image processing means, PCNNs have several significant merits, including robustness against noise, independence of geometric variations in input patterns, and the capability of bridging minor intensity variations in input patterns.
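A heavily simplified PCNN iteration can be sketched as follows. Only the linking field, internal activation and dynamic threshold described above are modelled; the 3×3 weight kernel, decay constant and threshold gain are illustrative values.

```python
import numpy as np
from scipy.ndimage import convolve

def pcnn_fire_times(img, steps=10, beta=0.2, v_e=20.0, a_e=0.7):
    """Simplified pulse-coupled neural network, one neuron per pixel.

    Returns the iteration at which each neuron first pulses; neurons
    in regions of similar intensity pulse together.
    """
    w = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])              # linking weights to neighbours
    feed = img.astype(float)                     # feeding input = pixel intensity
    thresh = np.ones_like(feed)                  # dynamic threshold
    pulses = np.zeros_like(feed)
    fired_at = np.zeros(feed.shape, dtype=int)
    for n in range(1, steps + 1):
        link = convolve(pulses, w, mode='constant')    # local stimuli from pulsing neighbours
        u = feed * (1 + beta * link)                   # internal activation
        pulses = (u > thresh).astype(float)            # pulse once activation beats the threshold
        thresh = np.exp(-a_e) * thresh + v_e * pulses  # threshold decays, then jumps after a pulse
        fired_at[(fired_at == 0) & (pulses > 0)] = n
    return fired_at

img = np.zeros((8, 8))
img[:, :4] = 0.9                                 # bright region pulses first
img[:, 4:] = 0.2                                 # dark region pulses later
fired = pcnn_fire_times(img)
```

The firing-time map is constant within each intensity region, so thresholding it yields a segmentation; the linking term also lets a pulsing region "capture" adjacent pixels one step early, which bridges minor intensity variations.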
 
In 2015, [[convolutional neural network|convolutional neural networks]] reached state of the art in semantic segmentation.<ref>{{Cite conference|last1=Long |first1=Jonathan |last2=Shelhamer |first2=Evan |last3=Darrell |first3=Trevor |date=2015 |title=Fully Convolutional Networks for Semantic Segmentation |url=https://openaccess.thecvf.com/content_cvpr_2015/html/Long_Fully_Convolutional_Networks_2015_CVPR_paper.html |conference=Proceedings of the IEEE conference on computer vision and pattern recognition|pages=3431–3440}}</ref> [[U-Net]] is an architecture which takes as input an image and outputs a label for each pixel.<ref>{{cite arXiv|last1=Ronneberger|first1=Olaf|last2=Fischer|first2=Philipp|last3=Brox|first3=Thomas|title=U-Net: Convolutional Networks for Biomedical Image Segmentation|eprint=1505.04597|date=2015|class=cs.CV}}</ref> U-Net was initially developed to detect cell boundaries in biomedical images. U-Net follows a classical [[autoencoder]] architecture and as such contains two sub-structures. The encoder structure follows the traditional stack of convolutional and max pooling layers to increase the receptive field as it goes through the layers; it is used to capture the context in the image. The decoder structure utilizes transposed convolution layers for upsampling so that the end dimensions are close to those of the input image. Skip connections are placed between convolution and transposed convolution layers of the same shape in order to preserve details that would otherwise have been lost.
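The encoder–decoder dataflow with skip connections can be traced with array shapes alone. In this sketch, average pooling, nearest-neighbour upsampling and an element-wise mean stand in for the learned convolution, transposed-convolution and concatenation-plus-convolution blocks of an actual U-Net:

```python
import numpy as np

def pool2(x):
    """2x2 average pooling (stand-in for U-Net's max-pool stage)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x):
    """Nearest-neighbour upsampling (stand-in for transposed convolution)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def unet_dataflow(img, depth=3):
    """Encoder halves the resolution at each level; the decoder restores it,
    merging the matching encoder output through a skip connection."""
    skips, x = [], img
    for _ in range(depth):                    # encoder path: capture context
        skips.append(x)
        x = pool2(x)
    for skip in reversed(skips):              # decoder path: recover resolution
        x = up2(x)
        x = np.stack([x, skip]).mean(axis=0)  # skip connection restores detail
    return x

out = unet_dataflow(np.ones((64, 64)))        # output resolution matches the input
```

Each decoder level upsamples back to the resolution of the corresponding encoder level, which is why the skip connections line up shape-for-shape.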
 
In addition to pixel-level semantic segmentation tasks which assign a given category to each pixel, modern segmentation applications include instance-level semantic segmentation tasks in which each individual in a given category must be uniquely identified, as well as panoptic segmentation tasks, which combine these two tasks to provide a more complete scene segmentation.<ref name="Panoptic Segmentation"/>
 
== See also ==
* {{annotated link|Object co-segmentation}}
* {{annotated link|Computer vision}}
* {{annotated link|Image-based meshing}}
* {{annotated link|Range image segmentation}}
* {{annotated link|Vector quantization}}
* {{annotated link|Image quantization}}
* {{annotated link|Color quantization}}
* {{annotated link|Object-based image analysis}}
* {{annotated link|List of manual image annotation tools}}
* {{annotated link|Rigid motion segmentation}}
* {{annotated link|Text segmentation}}
 
== Notes ==
<!-- {{reflist|2}} -->
{{reflist|refs=
<ref name="Wang Duan Zhang Niu p=1657">{{cite journal | last1=Wang | first1=Le | last2=Duan | first2=Xuhuan | last3=Zhang | first3=Qilin | last4=Niu | first4=Zhenxing | last5=Hua | first5=Gang | last6=Zheng | first6=Nanning | title=Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation | journal=Sensors | volume=18 | issue=5 | date=22 May 2018 | issn=1424-8220 | doi=10.3390/s18051657 | pmid=29789447 | pmc=5982167 | page=1657 | bibcode=2018Senso..18.1657W | url=https://qilin-zhang.github.io/_pages/pdfs/Segment-Tube_Spatio-Temporal_Action_Localization_in_Untrimmed_Videos_with_Per-Frame_Segmentation.pdf| doi-access=free }}</ref>
 
<ref name="Liu Wang Hua Zhang 2018 pp. 5840–5853">{{cite journal | last1=Liu | first1=Ziyi | last2=Wang | first2=Le | last3=Hua | first3=Gang | last4=Zhang | first4=Qilin | last5=Niu | first5=Zhenxing | last6=Wu | first6=Ying | last7=Zheng | first7=Nanning | title=Joint Video Object Discovery and Segmentation by Coupled Dynamic Markov Networks | journal=IEEE Transactions on Image Processing | volume=27 | issue=12 | year=2018 | issn=1057-7149 | doi=10.1109/tip.2018.2859622 | pmid=30059300 | bibcode=2018ITIP...27.5840L | pages=5840–5853 | s2cid=51867241 | url=https://qilin-zhang.github.io/_pages/pdfs/Joint_Video_Object_Discovery_and_Segmentation_by_Coupled_Dynamic_Markov_Networks.pdf| doi-access=free }}</ref>
* [https://web.archive.org/web/20100518124644/http://csc.fsksm.utm.my/syed/projects/image-processing.html Some sample code that performs basic segmentation], by Syed Zainudeen. University Technology of Malaysia.
* [https://rd.springer.com/article/10.1007/s11075-008-9183-x Generalized Fast Marching method] by Forcadel et al. [2008] for applications in image segmentation.
* {{usurped|1=[https://web.archive.org/web/20201228051352/http://www.iprg.co.in/ Image Processing Research Group]}} An Online Open Image Processing Research Community.
* [https://www.mathworks.com/discovery/image-segmentation.html Segmentation methods in image processing and analysis] and [https://blogs.mathworks.com/pick/2017/12/07/minimizing-energy-to-segment-images-or-cluster-data/ Minimizing energy to segment images] by Mathworks
* [http://disp.ee.ntu.edu.tw/meeting/%E6%98%B1%E7%BF%94/Segmentation%20tutorial.pdf More image segmentation methods with detailed algorithms] {{Webarchive|url=https://web.archive.org/web/20191101050028/http://disp.ee.ntu.edu.tw/meeting/%E6%98%B1%E7%BF%94/Segmentation%20tutorial.pdf |date=1 November 2019 }} by Yu-Hsiang Wang (王昱翔), National Taiwan University, Taipei, Taiwan, ROC
* [https://ipolcore.ipol.im/demo/clientApp/demo.html?id=295 Online demonstration of piecewise linear image segmentation] by IPOL Journal
{{Authority control}}