Hoshen–Kopelman algorithm: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 21:08, 25 May 2017 edit Marcocapelle (talk \| contribs) Extended confirmed users, Page movers 591,799 edits removed parent category of Category:Data clustering algorithms ← Previous edit		Latest revision as of 04:02, 25 May 2025 edit undo OAbot (talk \| contribs) Bots 643,717 edits m Open access bot: url-access updated in citation with #oabot.
(28 intermediate revisions by 23 users not shown)
Line 1: {{short description\|Algorithm for labeling clusters on a grid}} {{Machine learning bar}} The '''Hoshen–Kopelman algorithm''' is a simple and efficient [[algorithm]] for labeling [[Cluster analysis\|clusters]] on a grid, where the grid is a regular [[~~Artificial~~Network ~~neural network~~theory\|network]] of cells, with the cells being either occupied or unoccupied. This algorithm is based on a well-known [[Disjoint-set data structure\|union-finding algorithm]].<ref>{{Cite web \|title=Union-Find Algorithms \|url=https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf \|url-status=usurped \|archive-url=https://web.archive.org/web/20210530054627/https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf \|archive-date=2021-05-30 \|website=Princeton Computer Science}}</ref> The algorithm was originally described by J.Joseph Hoshen and R.[[Raoul Kopelman]] in their 1976 paper "Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm".<ref>{{cite journal\|url=http://link.aps.org/doi/10.1103/PhysRevB.14.3438\|title=Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm\|first1=J.\|last1=Hoshen\|first2=R.\|last2=Kopelman\|date=15 October 1976~~\|publisher=~~\|journal=Phys. Rev. B\|volume=14\|issue=8\|pages=3438–3445\|via=APS\|doi=10.1103/PhysRevB.14.3438\|bibcode=1976PhRvB..14.3438H\|url-access=subscription}}</ref> == Percolation theory == [[Percolation theory]] is the study of the behavior and [[statistics]] of [[cluster analysis\|clusters]] on [[Lattice graph\|lattices]]. Suppose we have a large square lattice where each cell can be occupied with the [[probability]] <code>p</code> and can be empty with the probability <code>1 – ''p''</code>. Each group of neighboring occupied cells forms a cluster. Neighbors are defined as cells having a common side but not those sharing only a corner i.e. we consider the [[Pixel connectivity\|~~4x4~~4-connected neighborhood]] that is top, bottom, left and right. Each occupied cell is independent of the status of its neighborhood. The number of clusters, the size of each cluster and their distribution are important topics in percolation theory. ~~<center>~~ {\| style="margin: auto;" {\| \|- \| <gallery> Occupied_Grids_P_%3D_0.24.png \| <{{center>\|'''Figure (a)'''~~</center>~~}} </gallery> \|\| <gallery> Occupied_Grids_P_%3D_0.64.png \| <{{center>\|'''Figure (b)'''~~</center>~~}} </gallery> \|\| Consider <code>5x5</code> grids in ~~figure~~figures (a) and (b).<br /> In figure (a), the probability of occupancy is <code>p = 6/25 = 0.24</code>.<br /> In figure (b), the probability of occupancy is <code>p = 16/25 = 0.64</code>. \|} ~~</center>~~ == Hoshen–Kopelman algorithm for cluster finding == In this algorithm, we scan through a grid looking for occupied cells and labeling them with cluster labels. The scanning process is called asa [[~~Raster~~raster scan~~\|Raster Scan~~]]. The algorithm begins with scanning the grid cell by cell and ~~check~~checking ifwhether the cell is occupied or not. If the cell is occupied, then it must be labeled with a cluster label. This cluster label is ~~decided~~assigned based on the neighbors of that cell. (For this we are going to use [[Union-find algorithm\|Union-Find Algorithm]] which is explained in the next section.) If the cell doesn’t have any occupied neighbors, then, a new label is assigned to the cell.<ref name=":0" /> == Union-find algorithm == This algorithm is aused ~~simple~~to ~~method~~represent ~~for~~disjoint ~~computing~~sets. ~~[[equivalence class]]es.~~ Calling the function <code>union(x,y)</code> ~~specifies that,~~places items <code>x</code> and <code>y</code> ~~are members of~~into the same ~~equivalence class~~set. ~~Because~~A ~~equivalence~~second ~~relations are [[Transitive relation\|transitive]]; all the items equivalent to~~function <code>find(x)</code> ~~are~~returns ~~equivalent~~a torepresentative ~~all~~member of the ~~items equivalent~~set to which <code>yx</code> belongs. ~~Thus~~ ~~for~~The ~~any~~representative ~~item~~member of the set containing <code>x</code>~~, there~~ is athe ~~set~~label ofwe ~~items~~will ~~which~~apply ~~are~~to ~~all~~the ~~equivalent~~cluster to which <code>x</code> belongs. ~~This~~ ~~set~~A iskey ~~the~~to ~~equivalence~~the ~~class~~efficiency of ~~which~~the ~~<code>x</code>~~[[Union-find isalgorithm\|Union-Find aAlgorithm]] ~~member.~~is Athat ~~second function~~the <code>find~~(x)~~</code> ~~returns~~operation aimproves ~~representative~~the ~~member~~underlying offorest ~~the~~data ~~equivalence~~structure ~~class~~that torepresents ~~which~~the sets, making future <code>xfind</code> ~~belongs~~queries more efficient. == ~~Pseudo-code~~Pseudocode == During the [[raster scan]] of the grid, whenever an occupied cell is encountered, neighboring cells are scanned to check whether any of them have already been scanned. If we find already scanned neighbors, the <code>union</code> operation is performed, to specify that these neighboring cells are in fact members of the same ~~equivalence class~~set. Then the<code>find</code> operation is performed to find a representative member of that ~~equivalence class~~set with which the current cell will be labeled. On the other hand, if the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way. Following [[~~Pseudocode\|pseudo-code~~pseudocode]] is referred from [https://www.ocf.berkeley.edu/~fricke/ Tobin Fricke's] implementation of the same algorithm.<ref name=":0">{{cite web\|url=https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html \|title=The Hoshen-Kopelman Algorithm for cluster identification\|first=Tobin \|last=Fricke \|website=~~Ocf~~ocf.berkeley.edu \|date=2004-04-21 \|~~accessdate~~access-date=2016-09-17}}</ref> On completion, the cluster labels may be found in <code>labels</code>. Not shown is the second raster scan of the grid needed to produce the example output. In that scan, the value at <code>label[x,y]</code> is replaced by <code>find(label[x,y])</code>. '''Raster Scan and Labeling on the Grid'''▼ largest_label = 0;▼ for x in 0 to n_columns {▼ ~~for y in 0 to n_rows {~~ if occupied[x,y] then▼ left = occupied[x-1,y];▼ above = occupied[x,y-1];▼ if (left == 0) and (above == 0) then /* Neither a label above nor to the left. /▼ largest_label = largest_label + 1; / Make a new, as-yet-unused cluster label. /▼ ~~label[x,y] = largest_label;~~ else if (left != 0) and (above == 0) then / One neighbor, to the left. /▼ ~~label[x,y] = find(left);~~ else if (left == 0) and (above != 0) then / One neighbor, above. /▼ ~~label[x,y] = find(above);~~ else / Neighbors BOTH to the left and above. /▼ union(left,above); / Link the left and above clusters. /▼ ~~label[x,y] = find(left);~~ }▼ }▼ ▲ '''Raster Scan and Labeling on the Grid''' '''Union'''▼ ▲ largest_label = 0; void union(int x, int y) {▼ label = zeros[n_columns, n_rows] labels[find(x)] = find(y);▼ labels = [0:n_columnsn_rows] /* Array containing integers from 0 to the size of the image. / } ▲ for x in 0 to n_columns { '''Find'''▼ ~~int~~ ~~find(int~~for x)y in 0 to n_rows { ~~int~~ ~~y =~~if occupied[x;, y] then ~~while~~ ~~(labels[y]~~ ! left = label[x-1, y)]; y above = ~~labels~~label[x, y-1]; ▲ if (left == 0) and (above == 0) then / Neither a label above nor to the left. / while (labels[x] != x) {▼ ▲ largest_label = largest_label + 1; / Make a new, as-yet-unused cluster label. / int z = labels[x];▼ ~~labels~~ label[x, y] = ylargest_label; ▲ else if (left != 0) and (above == 0) then / One neighbor, to the left. / x = z;▼ ▲ if ~~occupied~~ label[x, y] ~~then~~= find(left); }▼ ▲ else if (left == 0) and (above != 0) then / One neighbor, above. / return y;▼ ▲ ~~above~~ ~~= occupied~~label[x, y-1] = find(above); } ▲ else / Neighbors BOTH to the left and above. / ▲ union(left,above); / Link the left and above clusters. / ▲ ~~left~~ ~~= occupied~~label[x-1, y] = find(left); ▲ } } ▲ '''Union''' ▲ void union(int x, int y) { ▲ labels[find(x)] = find(y); } ▲ '''Find''' int find(int x) { ▲ int y = }x; while (labels[y] != y) y = labels[y]; ▲ while (labels[x] != x) { ▲ int z = labels[x]; labels[x] = y; ▲ x = z; ▲ } ▲ return y; } == Example == Consider the following example. The dark cells in the grid in ~~<code>figure~~Figure (ac)~~</code>~~ represent that they are occupied and the white ones are empty. So by running H–K algorithm on this input we would get the output as shown in ~~<code>figure~~Figure (bd)~~</code>~~ with all the clusters labeled. The algorithm processes the input grid, cell by cell, as follows: Let's say that grid is a two-dimensional array. Line 90 ⟶ 97: <code>grid[4][4]</code> is unoccupied so it is not labeled. * <code>grid[4][5]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>7</code>. * <code>grid[5][0]</code> , <code>grid[5][1]</code> , <code>grid[5][2]</code> and <code>grid[5][3]</code> are unoccupied so they are not labeled. * <code>grid[5][4]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>8</code>. * <code>grid[5][5]</code> is occupied so check cell to the left and above, both, cell to the left and above are occupied so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. <code>7</code>. (Merging using union algorithm will label all the cells with label <code>8</code> to <code>7</code>). ~~<center>~~ {\| style="margin: auto;" {\| \|- \| <gallery> H-K algorithm Input.png \| <{{center>\|'''Figure (ac)'''~~</center>~~}} </gallery> \|\| <gallery> H-K algorithm output.png \| <{{center>\|'''Figure (bd)'''~~</center>~~}} </gallery> \|\| Consider <code>6x6</code> grids in figure (ac) and (bd).<br /> Figure (ac)~~, This~~ is the input to the Hoshen–Kopelman algorithm.<br /> Figure (b)~~, This~~ is the output of the algorithm with all the clusters labeled. \|} ~~</center>~~ == Applications == * ~~Segmentation~~Determination ~~and~~of ~~Clustering~~Nodal ofDomain aArea ~~[[Binary~~and ~~image\|Binary~~Nodal Line ~~Image]]~~Lengths <ref>{{cite web\|url=~~http~~https://~~www~~webhome.~~scialert~~weizmann.~~net~~ac.il/home/feamit/nodalweek/~~qredirect~~c_joas_nodalweek.~~php?doi=jas.2008.2474.2479&linkid=~~pdf ~~\|format=PDF~~ \|title=~~Journal~~Introduction ofto ~~Applied~~the ~~Sciences~~Hoshen-Kopelman algorithm and its application to nodal ___domain statistics \|~~issn~~author=~~1812-5654~~Christian Joas \|website=~~Scialert~~Webhome.~~net~~weizmann.ac.il \|~~accessdate~~access-date=2016-09-17}}</ref> * Determination of Nodal Domain Area and Nodal Line Lengths <ref>{{cite web\|url=https://webhome.weizmann.ac.il/home/feamit/nodalweek/c_joas_nodalweek.pdf \|format=PDF \|title=Introduction to the Hoshen-Kopelman algorithm and its application to nodal ___domain statistics \|author=Christian Joas \|website=Webhome.weizmann.ac.il \|accessdate=2016-09-17}}</ref> * [[Connectivity (graph theory)\|Nodal Connectivity Information]] * Modeling of [[Electrical resistivity and conductivity\|electrical conduction]] Line 114 ⟶ 119: * [[Fuzzy clustering\|Fuzzy clustering algorithm]] * Gaussian ([[Expectation–maximization algorithm\|Expectation Maximization]]) clustering algorithm * Clustering Methods <ref>{{Cite web\|url=https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf\|title=Clustering~~\|last=\|first=\|date=\|website=\|publisher=\|access-date=~~}}</ref> * C-means Clustering Algorithm <ref>{{Cite web\|url=https://sites.google.com/site/dataclusteringalgorithms/fuzzy-c-means-clustering-algorithm\|title=Fuzzy c-means clustering~~\|last=\|first=\|date=\|website=\|publisher=\|access-date=~~}}</ref> * [[Connected-component labeling]] == References == Line 121 ⟶ 127: {{DEFAULTSORT:Hoshen-Kopelman algorithm}} [[Category:~~Data~~Cluster ~~clustering~~analysis algorithms]]