Introduction
The Hoshen-Kopelman algorithm is a simple and efficient algorithm for labelling clusters on a grid, where the grid is a regular network of cells and each cell may be either “occupied” or “unoccupied”. It is based on the well-known union-find algorithm. The algorithm was originally described in “Percolation and cluster distribution. I. Cluster multiple labeling technique and critical concentration algorithm”[1] by J. Hoshen and R. Kopelman.
Percolation Theory
Percolation theory is the study of the behavior and statistics of clusters on lattices. Suppose we have a large square lattice where each cell is occupied with probability p and empty with probability 1 – p.
Each group of neighboring occupied cells forms a cluster. Neighbors are defined as cells sharing a common side, not those sharing only a corner; that is, we consider the 4-connected neighborhood (top, bottom, left, right). Each cell is occupied independently of the status of its neighbors. The number of clusters, the size of each cluster, and the distribution of cluster sizes are important topics in percolation theory.
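The setup described above can be sketched in a few lines of Python. This is only an illustration: the function names make_lattice and occupied_neighbors, the list-of-lists grid representation, and the optional seed parameter are assumptions made for this example and do not come from the original paper.

```python
import random


def make_lattice(rows, cols, p, seed=None):
    """Build a rows x cols lattice; each cell is occupied (1) with
    probability p, independently of every other cell, else empty (0)."""
    rng = random.Random(seed)  # seed is optional, for reproducible runs
    return [[1 if rng.random() < p else 0 for _ in range(cols)]
            for _ in range(rows)]


def occupied_neighbors(grid, r, c):
    """Return the coordinates of the occupied 4-connected neighbors
    (top, bottom, left, right) of cell (r, c). Diagonals are ignored."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc]:
            result.append((nr, nc))
    return result


if __name__ == "__main__":
    lattice = make_lattice(5, 5, p=0.6, seed=1)
    for row in lattice:
        print(row)
```

Cells that touch only at a corner are never reported as neighbors, so two diagonally adjacent occupied cells belong to different clusters unless some other path of side-sharing cells connects them.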
Hoshen-Kopelman Algorithm for cluster finding
In this algorithm we scan through the grid looking for occupied cells and label them with cluster labels. The scan proceeds cell by cell and checks whether each cell is occupied; if it is, the cell must be given a cluster label. The label is chosen based on the cell’s neighbors that have already been scanned and labelled. If the cell has no such occupied neighbors, a new label is assigned to it.
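The scan described above can be sketched as follows. This is a minimal illustration, not the authors’ original implementation. It assumes a row-by-row scan, so the previously scanned neighbors of a cell are the ones above it and to its left. The names find, union and hoshen_kopelman, and the list-based parent table, are choices made for this example. When the top and left neighbors carry different labels, the sketch merges them with union-find, which is the standard way of handling that case.

```python
def find(parent, x):
    """Return the root label of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the clusters of labels a and b; the smaller root wins."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def hoshen_kopelman(grid):
    """Label 4-connected clusters of occupied (truthy) cells.
    Returns a grid of labels in which 0 means unoccupied."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[0] is a placeholder; real labels start at 1

    # First pass: assign provisional labels from the top and left neighbors.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new label, its own root
                labels[r][c] = len(parent) - 1
            elif up and left:
                union(parent, up, left)     # both neighbors: same cluster
                labels[r][c] = find(parent, up)
            else:
                labels[r][c] = find(parent, up or left)

    # Second pass: replace every provisional label by its root label.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(parent, labels[r][c])
    return labels


if __name__ == "__main__":
    grid = [[1, 0, 1],
            [1, 1, 1],
            [0, 0, 0],
            [1, 0, 0]]
    for row in hoshen_kopelman(grid):
        print(row)
```

In this sketch the final labels are not renumbered, so they may contain gaps: in the example grid the two clusters end up labelled 1 and 3, because provisional label 2 was merged into label 1.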