Hoshen–Kopelman algorithm: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: url-access updated in citation with #oabot.
 
(6 intermediate revisions by 4 users not shown)
Line 1:
{{short description|Algorithm for labeling clusters on a grid}}
{{Machine learning bar}}
The '''Hoshen–Kopelman algorithm''' is a simple and efficient [[algorithm]] for labeling [[Cluster analysis|clusters]] on a grid, where the grid is a regular [[ArtificialNetwork neural networktheory|network]] of cells, with the cells being either occupied or unoccupied. This algorithm is based on a well-known [[Disjoint-set data structure|union-finding algorithm]].<ref>{{Cite web |title=Union-Find Algorithms |url=https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf {{Bare|url-status=usurped URL|archive-url=https://web.archive.org/web/20210530054627/https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf PDF|archive-date=March2021-05-30 |website=Princeton Computer 2022Science}}</ref> The algorithm was originally described by Joseph Hoshen and [[Raoul Kopelman]] in their 1976 paper "Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm".<ref>{{cite journal|url=http://link.aps.org/doi/10.1103/PhysRevB.14.3438|title=Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm|first1=J.|last1=Hoshen|first2=R.|last2=Kopelman|date=15 October 1976|journal=Phys. Rev. B|volume=14|issue=8|pages=3438–3445|via=APS|doi=10.1103/PhysRevB.14.3438|bibcode=1976PhRvB..14.3438H|url-access=subscription}}</ref>
 
== Percolation theory ==
Line 12:
</gallery>||<gallery>
Occupied_Grids_P_%3D_0.64.png | {{center|'''Figure (b)'''}}
</gallery>|| Consider <code>5x5</code> grids in figurefigures (a) and (b).<br /> In figure (a), the probability of occupancy is <code>p = 6/25 = 0.24</code>.<br /> In figure (b), the probability of occupancy is <code>p = 16/25 = 0.64</code>.
|}
 
== Hoshen–Kopelman algorithm for cluster finding ==
In this algorithm, we scan through a grid looking for occupied cells and labeling them with cluster labels. The scanning process is called a [[Raster scan|raster scan]]. The algorithm begins with scanning the grid cell by cell and checking whether the cell is occupied or not. If the cell is occupied, then it must be labeled with a cluster label. This cluster label is assigned based on the neighbors of that cell. (For this we are going to use [[Union-find algorithm|Union-Find Algorithm]] which is explained in the next section.) If the cell doesn’t have any occupied neighbors, then a new label is assigned to the cell.<ref name=":0" />
 
== Union-find algorithm ==
This algorithm is aused simpleto methodrepresent fordisjoint computingsets. [[equivalence class]]es. Calling the function <code>union(x,y)</code> returns whetherplaces items <code>x</code> and <code>y</code> are members ofinto the same equivalence classset. BecauseA equivalencesecond relations are [[Transitive relation|transitive]], all the items equivalent tofunction <code>find(x)</code> arereturns equivalenta torepresentative allmember of the items equivalentset to which <code>yx</code> belongs. Thus forThe anyrepresentative member of the set itemcontaining <code>x</code>, there is athe setlabel ofwe itemswill whichapply areto allthe equivalentcluster to which <code>x</code> (calledbelongs. A key to the equivalenceefficiency class).of Athe second[[Union-find functionalgorithm|Union-Find Algorithm]] is that the <code>find(x)</code> returnsoperation aimproves representativethe memberunderlying offorest thedata equivalencestructure classthat torepresents whichthe sets, making future <code>xfind</code> belongsqueries more efficient.
 
== Pseudocode ==
During the [[raster scan]] of the grid, whenever an occupied cell is encountered, neighboring cells are scanned to check whether any of them have already been scanned. If we find already scanned neighbors, the <code>union</code> operation is performed, to specify that these neighboring cells are in fact members of the same equivalence classset. Then the<code>find</code> operation is performed to find a representative member of that equivalence classset with which the current cell will be labeled.
 
On the other hand, if the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way.
 
Following [[pseudocode]] is referred from [https://www.ocf.berkeley.edu/~fricke/ Tobin Fricke's] implementation of the same algorithm.<ref name=":0">{{cite web|url=https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html |title=The Hoshen-Kopelman Algorithm for cluster identification|first=Tobin |last=Fricke |website=ocf.berkeley.edu |date=2004-04-21 |access-date=2016-09-17}}</ref> On completion, the cluster labels may be found in <code>labels</code>. Not shown is the second raster scan of the grid needed to produce the example output. In that scan, the value at <code>label[x,y]</code> is replaced by <code>find(label[x,y])</code>.
 
'''Raster Scan and Labeling on the Grid'''
largest_label = 0;
Line 72 ⟶ 73:
 
== Example ==
Consider the following example. The dark cells in the grid in <code>figureFigure (ac)</code> represent that they are occupied and the white ones are empty. So by running H–K algorithm on this input we would get the output as shown in <code>figureFigure (bd)</code> with all the clusters labeled.
 
The algorithm processes the input grid, cell by cell, as follows: Let's say that grid is a two-dimensional array.
Line 103 ⟶ 104:
|-
| <gallery>
H-K algorithm Input.png | {{center|'''Figure (ac)'''}}
</gallery>||<gallery>
H-K algorithm output.png | {{center|'''Figure (bd)'''}}
</gallery>|| Consider <code>6x6</code> grids in figure (ac) and (bd).<br /> Figure (ac), This is the input to the Hoshen–Kopelman algorithm.<br /> Figure (b), This is the output of the algorithm with all the clusters labeled.
|}