Hoshen–Kopelman algorithm: Difference between revisions

Content deleted Content added
Dssathe (talk | contribs)
added link
Inline external links are strongly discouraged per WP:ELPOINTS#2
Line 1:
{{Refimprove|date=September 2016}}
{{Machine learning bar}}
The '''Hoshen–Kopelman algorithm''' is a simple and efficient [[algorithm]] for labeling [[Cluster analysis|clusters]] on a grid. Where the grid is a regular [[network]] of cells,with the cells being either occupied or unoccupied. This algorithm is based on a well-known [[Disjoint-set data structure|union-finding algorithm]].<ref> https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf </ref> The algorithm was originally described in by J. Hoshen and R. Kopelman in their 1976 paper [http://journals.aps.org/prb/abstract/10.1103/PhysRevB.14.3438 "Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm]".<ref>{{cite journal|url=http://link.aps.org/doi/10.1103/PhysRevB.14.3438|title=Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm|first1=J.|last1=Hoshen|first2=R.|last2=Kopelman|date=15 October 1976|publisher=|journal=Phys. Rev. B|volume=14|issue=8|pages=3438–3445|via=APS|doi=10.1103/PhysRevB.14.3438}}</ref>
 
== Percolation theory ==
[[Percolation theory]] is the study of the behavior and [[statistics]] of [[Cluster|clusters]] on [[Lattice graph|lattices]]. Suppose we have a large square lattice where each cell can be occupied with the [[probability]] <code>''p</code>'' and can be empty with the probability <code>1&nbsp;–&nbsp;''p''</code>. Each group of neighboring occupied cells forms a cluster. Neighbors are defined as cells having a common side but not those sharing only a corner i.e. we consider [[Pixel connectivity|4x4 neighborhood]] that is. (top, bottom, left and, right). Each occupied cell is independentoccupied independently of the status of its neighborhood. The number of clusters, thea size of each cluster and their distribution are important topics in percolation theory.
<center>
{|
Line 16 ⟶ 17:
 
== Hoshen–Kopelman algorithm for cluster finding ==
In this algorithm, we scan through a grid looking for occupied cells and labeling them with cluster labels. The scanning process is called as [[Raster scan|Raster Scan]]. The algorithm begins with scanning the grid cell by cell and check if the cell is occupied, or not. Ifif the cell is occupied, then itthis cell must be labeledlabelled with a cluster label. This cluster label is decided based on the neighbors of thatthe cell. (Forwhich thishave webeen arepreviously goingscanned toand uselabelled, [[Union-findand algorithm|Union-Find Algorithm]] which is explained in the next section.) Ifif the cell doesn’t have any occupied neighbors then, a new label is assigned to the cell.<ref name=":0" />
 
== Union-find algorithm ==
ThisThe union-find algorithm is a simple method for computing [[equivalence class]]es. Calling the function <code>union(x,y)</code> specifies that, items <code>x</code> and <code>y</code> are members of the same equivalence class. Because equivalence relations are [[Transitive relation|transitive]]; all the items equivalent to <code>x</code> are equivalent to all the items equivalent to <code>y</code>. Thus for any item <code>x</code>, there is a set of items which are all equivalent to <code>x</code> .are; Thisthis set is the equivalence class of which <code>x</code> is a member. A second function <code>find(x)</code> returns a representative member of the equivalence class to which <code>x</code> belongs.
 
== Pseudo-code ==
DuringSo in the [[Rasterraster scan|Raster Scan]] of the grid in question, whenevereach time an occupied cell is encountered, neighboringa cellscheck areis scanneddone to checksee whether this cell has any ofneighboring themcells who have already been scanned. If we find already scanned neighborsso, thefirst a <code>union</code> operation is performed, to specify that these neighboring cells are in fact members of the same equivalence class. Then thea <code>find</code> operation is performed to find a representative member of that equivalence class with which to label the current cell. willIf beon labeledthe other hand, the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way. The grid can be raster-scanned a second time, performing only <code>find</code> operations at each cell, to re-label the cells with their final assignment of a representative element.
Following [[Pseudocode|pseudo-code]] is referred from [https://www.ocf.berkeley.edu/~fricke/one Tobinof Fricke's]the implementationUniversity of theCalifornia sameBerkeley's algorithmprojects.<ref name=":0">{{cite web|url=https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html |title=The Hoshen-Kopelman Algorithm |website=Ocf.berkeley.edu |date=2004-04-21 |accessdate=2016-09-17}}</ref>
 
On the other hand,if the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way.
 
Following [[Pseudocode|pseudo-code]] is referred from [https://www.ocf.berkeley.edu/~fricke/ Tobin Fricke's] implementation of the same algorithm.<ref name=":0">{{cite web|url=https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html |title=The Hoshen-Kopelman Algorithm |website=Ocf.berkeley.edu |date=2004-04-21 |accessdate=2016-09-17}}</ref>
<strong>Raster Scan and Labeling on the Grid</strong>
largest_label = 0;
Line 47 ⟶ 45:
}
 
<strong>Union</strong>
void union(int x, int y) {
labels[find(x)] = find(y);
}<br>
 
<strong>Find</strong>
int find(int x) {
Line 73:
* <code>grid[0][3]</code> is occupied so check cell to the left which is unoccupied so we increment the current label value and assign the label to the cell as <code>2</code>.
* <code>grid[0][4]</code>, <code>grid[0][5]</code> and <code>grid[1][0]</code> are unoccupied so they are not labeled.
* <code>grid[1][1]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>3</code>.
* <code>grid[1][2]</code> is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of a cell on the left to this cell <code>3</code>.
* <code>grid[1][3]</code> is occupied so check cell to the left and above, both the cells are occupied, so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. <code>2</code>. (Merging using union algorithm will label all the cells with label <code>3</code> to <code>2</code>)
* <code>grid[1][4]</code> is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of a cell on the left to this cell <code>2</code>.
* <code>grid[1][5]</code> , <code>grid[2][0]</code> and <code>grid[2][1]</code> are unoccupied so they are not labeled.
* <code>grid[2][2]</code> is occupied so check cell to the left and above, only cell above is occupied so assign the label of the cell above to this cell <code>2</code>.
* <code>grid[2][3]</code> , <code>grid[2][4]</code> and <code>grid[2][5]</code> are unoccupied so they are not labeled.
* <code>grid[3][0]</code> is occupied so check cell above which is unoccupied so we increment the current label value and assign the label to the cell as <code>4</code>.
* <code>grid[3][1]</code> is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of the cell on the left to this cell <code>4</code>.
* <code>grid[3][2]</code> is unoccupied so it is not labeled.
* <code>grid[3][3]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>5</code>.
* <code>grid[3][4]</code> is occupied so check cell to the left and above, only the cell to the left is occupied so assign the label of the cell on the left to this cell <code>5</code>.
* <code>grid[3][5]</code> , <code>grid[4][0]</code> and <code>grid[4][1]</code> are unoccupied so they are not labeled.
* <code>grid[4][2]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>6</code>.
* <code>grid[4][3]</code> is occupied so check cell to the left and above, both, cell to the left and above are occupied so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. <code>5</code>. (Merging using union algorithm will label all the cells with label <code>6</code> to <code>5</code>).
* <code>grid[4][4]</code> is unoccupied so it is not labeled.
* <code>grid[4][5]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>7</code>.
* <code>grid[5][0]</code> , <code>grid[5][1]</code> , <code>grid[5][2]</code> and <code>grid[5][3]</code> are unoccupied so they are not labeled.
* <code>grid[5][4]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>8</code>.
* <code>grid[5][5]</code> is occupied so check cell to the left and above, both, cell to the left and above are occupied so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. <code>7</code>. (Merging using union algorithm will label all the cells with label <code>8</code> to <code>7</code>).
<center>
{|
Line 105:
 
== Applications ==
* Segmentation and Clustering of a [[Binary image|Binary Image]] <ref>{{cite web|url=http://www.scialert.net/qredirect.php?doi=jas.2008.2474.2479&linkid=pdf |format=PDF |title=Journal of Applied Sciences |issn=1812-5654 |website=Scialert.net |accessdate=2016-09-17}}</ref>
* Determination of Nodal Domain Area and Nodal Line Lengths <ref>{{cite web|url=https://webhome.weizmann.ac.il/home/feamit/nodalweek/c_joas_nodalweek.pdf |format=PDF |title=Introduction to the Hoshen-Kopelman algorithm and its application to nodal ___domain statistics |author=Christian Joas |website=Webhome.weizmann.ac.il |accessdate=2016-09-17}}</ref>
* [[Connectivity (graph theory)|Nodal Connectivity Information]]
* Modeling of [[Electrical resistivity and conductivity|electrical conduction]].
 
== SeeMore Alsoclustering algorithms ==
* [[K-means clustering|K-means clustering algorithm]]
* [[Fuzzy clustering|Fuzzy clustering algorithm]]
* Gaussian ([[Expectation–maximization algorithm|Expectation Maximization]]EM) clustering algorithm
* Clustering Methods <ref>{{Cite web|url=https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf|title=Clustering|last=|first=|date=|website=|publisher=|access-date=}}</ref>
* C-means Clustering Algorithm <ref>{{Cite web|url=https://sites.google.com/site/dataclusteringalgorithms/fuzzy-c-means-clustering-algorithm|title=Fuzzy c-means clustering|last=|first=|date=|website=|publisher=|access-date=}}</ref>
 
== References ==