Content deleted Content added
Meatsgains (talk | contribs) Filled in 2 bare reference(s) with reFill () |
HyperGaruda (talk | contribs) Rmv unnecessary justified alignment per MOS:MARKUP and inappropriate numbered list |
||
Line 3:
{{Machine learning bar}}
The '''Hoshen–Kopelman algorithm''' is simple and efficient [[algorithm]] for labeling [[Cluster analysis|clusters]] on a grid. Where the grid is a regular network of cells,with the cells being either occupied or <strong>unoccupied</strong>. This algorithm is based on well-known [[Disjoint-set data structure|union-finding algorithm]]. The algorithm was originally described in [http://journals.aps.org/prb/abstract/10.1103/PhysRevB.14.3438 Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm] by J. Hoshen and R. Kopelman.<ref>{{cite journal|url=http://link.aps.org/doi/10.1103/PhysRevB.14.3438|title=Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm|first1=J.|last1=Hoshen|first2=R.|last2=Kopelman|date=15 October 1976|publisher=|journal=Phys. Rev. B|volume=14|issue=8|pages=3438–3445|via=APS|doi=10.1103/PhysRevB.14.3438}}</ref>
== Percolation theory ==
[[Percolation theory]] is the study of the behavior and statistics of clusters on [[Lattice graph|lattices]]. Suppose we have a large square lattice where each cell can be occupied with the [[probability]] ''p'' and empty with probability 1 – ''p''. Each group of neighboring occupied cells forms a cluster. Neighbors are defined as cells having a common side but not those sharing only a corner i.e. we consider 4x4 neighborhood. (top, bottom, left, right). Each occupied cell is occupied independently of the status of its neighborhood. The number of clusters, a size of each cluster and their distribution are important topics in percolation theory.
<center>
{|
Line 23 ⟶ 19:
== Hoshen–Kopelman algorithm for cluster finding ==
In this algorithm we scan through a grid looking for occupied cells and labeling them with cluster labels. The algorithm begins with scanning the grid cell by cell and check if the cell is occupied, if the cell is occupied then this cell must be labelled with a cluster label. This cluster label is decided based on the neighbors of the cell which have been previously scanned and labelled, and if the cell doesn’t have any occupied neighbors then new label is assigned to the cell.
== Union-find algorithm ==
The union-find algorithm is a simple method for computing [[equivalence class]]es. Calling the function <code>union(x,y)</code> specifies that items <code>x</code> and <code>y</code> are members of the same equivalence class. Because equivalence relations are [[Transitive relation|transitive]]; all items equivalent to <code>x</code> are equivalent to all items equivalent to <code>y</code>. Thus for any item <code>x</code>, there is a set of items which are all equivalent to <code>x</code> are; this set is the equivalence class of which <code>x</code> is a member. A second function <code>find(x)</code> returns a representative member of the equivalence class to which <code>x</code> belongs.
== Pseudo-code ==
So in the raster scan of the grid in question, each time an occupied cell is encountered, a check is done to see whether this cell has any neighboring cells who have already been scanned. If so, first a <code>union</code> operation is performed, to specify that these neighboring cells are in fact members of the same equivalence class. Then a <code>find</code> operation is performed to find a representative member of that equivalence class with which to label the current cell. If on the other hand, the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way. The grid can be raster-scanned a second time, performing only <code>find</code> operations at each cell, to re-label the cells with their final assignment of a representative element.
Following pseudo-code is referred from one of the University of California Berkeley's projects.<ref>{{cite web|url=https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html|title=The Hoshen-Kopelman Algorithm|publisher=}}</ref>
<strong>Raster Scan and Labeling on the Grid</strong>
largest_label = 0;
Line 78 ⟶ 68:
== Example ==
Consider the following example. The dark cells in the grid in <code>figure (a)</code> represent that they are occupied and the white ones are empty. So by running H–K algorithm on this input we would get the output as shown in <code>figure (b)</code> with all the clusters labeled.
Line 106 ⟶ 95:
* <code>grid[5][4]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign new label <code>8</code>.
* <code>grid[5][5]</code> is occupied so check cell to the left and above, both, cell to the left and above are occupied so merge the two clusters and assign the cluster label of cell above to the cell on left and to this cell i.e. <code>7</code>. (Merging using union algorithm will label all the cells with label <code>8</code> to <code>7</code>).
<center>
{|
Line 119 ⟶ 107:
== Applications ==
== More clustering algorithms ==
== References ==
|