Hoshen–Kopelman algorithm: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 07:47, 10 January 2023 edit Walkerhrh (talk \| contribs) 7 edits m →Union-find algorithm ← Previous edit		Latest revision as of 04:02, 25 May 2025 edit undo OAbot (talk \| contribs) Bots 643,717 edits m Open access bot: url-access updated in citation with #oabot.
(6 intermediate revisions by 4 users not shown)
Line 1: {{short description\|Algorithm for labeling clusters on a grid}} {{Machine learning bar}} The '''Hoshen–Kopelman algorithm''' is a simple and efficient [[algorithm]] for labeling [[Cluster analysis\|clusters]] on a grid, where the grid is a regular [[~~Artificial~~Network ~~neural network~~theory\|network]] of cells, with the cells being either occupied or unoccupied. This algorithm is based on a well-known [[Disjoint-set data structure\|union-finding algorithm]].<ref>{{Cite web \|title=Union-Find Algorithms \|url=https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf ~~{{Bare~~\|url-status=usurped ~~URL~~\|archive-url=https://web.archive.org/web/20210530054627/https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf ~~PDF~~\|archive-date=~~March~~2021-05-30 \|website=Princeton Computer ~~2022~~Science}}</ref> The algorithm was originally described by Joseph Hoshen and [[Raoul Kopelman]] in their 1976 paper "Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm".<ref>{{cite journal\|url=http://link.aps.org/doi/10.1103/PhysRevB.14.3438\|title=Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm\|first1=J.\|last1=Hoshen\|first2=R.\|last2=Kopelman\|date=15 October 1976\|journal=Phys. Rev. B\|volume=14\|issue=8\|pages=3438–3445\|via=APS\|doi=10.1103/PhysRevB.14.3438\|bibcode=1976PhRvB..14.3438H\|url-access=subscription}}</ref> == Percolation theory == Line 12: </gallery>\|\|<gallery> Occupied_Grids_P_%3D_0.64.png \| {{center\|'''Figure (b)'''}} </gallery>\|\| Consider <code>5x5</code> grids in ~~figure~~figures (a) and (b).<br /> In figure (a), the probability of occupancy is <code>p = 6/25 = 0.24</code>.<br /> In figure (b), the probability of occupancy is <code>p = 16/25 = 0.64</code>. \|} == Hoshen–Kopelman algorithm for cluster finding == In this algorithm, we scan through a grid looking for occupied cells and labeling them with cluster labels. The scanning process is called a [[~~Raster scan\|~~raster scan]]. The algorithm begins with scanning the grid cell by cell and checking whether the cell is occupied or not. If the cell is occupied, then it must be labeled with a cluster label. This cluster label is assigned based on the neighbors of that cell. (For this we are going to use [[Union-find algorithm\|Union-Find Algorithm]] which is explained in the next section.) If the cell doesn’t have any occupied neighbors, then a new label is assigned to the cell.<ref name=":0" /> == Union-find algorithm == This algorithm is aused ~~simple~~to ~~method~~represent ~~for~~disjoint ~~computing~~sets. ~~[[equivalence class]]es.~~ Calling the function <code>union(x,y)</code> ~~returns whether~~places items <code>x</code> and <code>y</code> ~~are members of~~into the same ~~equivalence class~~set. ~~Because~~A ~~equivalence~~second ~~relations are [[Transitive relation\|transitive]], all the items equivalent to~~function <code>find(x)</code> ~~are~~returns ~~equivalent~~a torepresentative ~~all~~member of the ~~items equivalent~~set to which <code>yx</code> belongs. ~~Thus~~ ~~for~~The ~~any~~representative member of the set ~~item~~containing <code>x</code>~~, there~~ is athe ~~set~~label ofwe ~~items~~will ~~which~~apply ~~are~~to ~~all~~the ~~equivalent~~cluster to which <code>x</code> ~~(called~~belongs. A key to the ~~equivalence~~efficiency ~~class).~~of Athe ~~second~~[[Union-find ~~function~~algorithm\|Union-Find Algorithm]] is that the <code>find~~(x)~~</code> ~~returns~~operation aimproves ~~representative~~the ~~member~~underlying offorest ~~the~~data ~~equivalence~~structure ~~class~~that torepresents ~~which~~the sets, making future <code>xfind</code> ~~belongs~~queries more efficient. == Pseudocode == During the [[raster scan]] of the grid, whenever an occupied cell is encountered, neighboring cells are scanned to check whether any of them have already been scanned. If we find already scanned neighbors, the <code>union</code> operation is performed, to specify that these neighboring cells are in fact members of the same ~~equivalence class~~set. Then the<code>find</code> operation is performed to find a representative member of that ~~equivalence class~~set with which the current cell will be labeled. On the other hand, if the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way. Following [[pseudocode]] is referred from [https://www.ocf.berkeley.edu/~fricke/ Tobin Fricke's] implementation of the same algorithm.<ref name=":0">{{cite web\|url=https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html \|title=The Hoshen-Kopelman Algorithm for cluster identification\|first=Tobin \|last=Fricke \|website=ocf.berkeley.edu \|date=2004-04-21 \|access-date=2016-09-17}}</ref> On completion, the cluster labels may be found in <code>labels</code>. Not shown is the second raster scan of the grid needed to produce the example output. In that scan, the value at <code>label[x,y]</code> is replaced by <code>find(label[x,y])</code>. '''Raster Scan and Labeling on the Grid''' largest_label = 0; Line 72 ⟶ 73: == Example == Consider the following example. The dark cells in the grid in ~~<code>figure~~Figure (ac)~~</code>~~ represent that they are occupied and the white ones are empty. So by running H–K algorithm on this input we would get the output as shown in ~~<code>figure~~Figure (bd)~~</code>~~ with all the clusters labeled. The algorithm processes the input grid, cell by cell, as follows: Let's say that grid is a two-dimensional array. Line 103 ⟶ 104: \|- \| <gallery> H-K algorithm Input.png \| {{center\|'''Figure (ac)'''}} </gallery>\|\|<gallery> H-K algorithm output.png \| {{center\|'''Figure (bd)'''}} </gallery>\|\| Consider <code>6x6</code> grids in figure (ac) and (bd).<br /> Figure (ac)~~, This~~ is the input to the Hoshen–Kopelman algorithm.<br /> Figure (b)~~, This~~ is the output of the algorithm with all the clusters labeled. \|}