Content deleted Content added
tag with {{Bare URL PDF}} |
m Open access bot: url-access updated in citation with #oabot. |
||
(11 intermediate revisions by 7 users not shown) | |||
Line 1:
{{short description|Algorithm for labeling clusters on a grid}}
{{Machine learning bar}}
The '''Hoshen–Kopelman algorithm''' is a simple and efficient [[algorithm]] for labeling [[Cluster analysis|clusters]] on a grid, where the grid is a regular [[
== Percolation theory ==
[[Percolation theory]] is the study of the behavior and [[statistics]] of [[cluster analysis|clusters]] on [[Lattice graph|lattices]]. Suppose we have a large square lattice where each cell can be occupied with the [[probability]] <code>p</code> and can be empty with the probability <code>1 – ''p''</code>. Each group of neighboring occupied cells forms a cluster. Neighbors are defined as cells having a common side but not those sharing only a corner i.e. we consider the [[Pixel connectivity|4-connected neighborhood]] that is top, bottom, left and right. Each occupied cell is independent of the status of its neighborhood. The number of clusters, the size of each cluster and their distribution are important topics in percolation theory.
{| style="margin: auto;"
|-
| <gallery>
Occupied_Grids_P_%3D_0.24.png |
</gallery>||<gallery>
Occupied_Grids_P_%3D_0.64.png |
</gallery>|| Consider <code>5x5</code> grids in
|}
== Hoshen–Kopelman algorithm for cluster finding ==
In this algorithm, we scan through a grid looking for occupied cells and labeling them with cluster labels. The scanning process is called
== Union-find algorithm ==
This algorithm is
== Pseudocode ==
During the [[raster scan]] of the grid, whenever an occupied cell is encountered, neighboring cells are scanned to check whether any of them have already been scanned. If we find already scanned neighbors, the <code>union</code> operation is performed, to specify that these neighboring cells are in fact members of the same
On the other hand, if the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way.
Following [[pseudocode]] is referred from [https://www.ocf.berkeley.edu/~fricke/ Tobin Fricke's] implementation of the same algorithm.<ref name=":0">{{cite web|url=https://www.ocf.berkeley.edu/~fricke/projects/hoshenkopelman/hoshenkopelman.html |title=The Hoshen-Kopelman Algorithm for cluster identification|first=Tobin |last=Fricke |website=ocf.berkeley.edu |date=2004-04-21 |access-date=2016-09-17}}</ref> On completion, the cluster labels may be found in <code>labels</code>. Not shown is the second raster scan of the grid needed to produce the example output. In that scan, the value at <code>label[x,y]</code> is replaced by <code>find(label[x,y])</code>.
'''Raster Scan and Labeling on the Grid'''
largest_label = 0;
label = zeros[n_columns, n_rows]
labels = [0:n_columns*n_rows] /* Array containing integers from 0 to the size of the image. */
for x in 0 to n_columns {
for y in 0 to n_rows {
Line 51 ⟶ 52:
'''Union'''
void union(int x, int y)
labels[find(x)] = find(y);
}
'''Find'''
int find(int x)
int y = x;
while (labels[y] != y)
y = labels[y];
while (labels[x] != x) {
int z = labels[x];
Line 65 ⟶ 68:
x = z;
}
return y;
}
== Example ==
Consider the following example. The dark cells in the grid in
The algorithm processes the input grid, cell by cell, as follows: Let's say that grid is a two-dimensional array.
Line 96 ⟶ 100:
* <code>grid[5][4]</code> is occupied so check cell to the left and above, both the cells are unoccupied so assign a new label <code>8</code>.
* <code>grid[5][5]</code> is occupied so check cell to the left and above, both, cell to the left and above are occupied so merge the two clusters and assign the cluster label of the cell above to the cell on the left and to this cell i.e. <code>7</code>. (Merging using union algorithm will label all the cells with label <code>8</code> to <code>7</code>).
{| style="margin: auto;"
|-
| <gallery>
H-K algorithm Input.png |
</gallery>||<gallery>
H-K algorithm output.png |
</gallery>|| Consider <code>6x6</code> grids in figure (
|}
== Applications ==
|