Lloyd's algorithm: Difference between revisions

Content deleted Content added
Propose merge into k-means clustering.
Remove the merge tag and rewrite to clarify the distinction between this algorithm and k-means
Line 1:
{{Merge to|k-means clustering|date=August 2013}}
{{multiple image
| align = right
Line 28 ⟶ 27:
}}
 
In [[computer science]] and [[electrical engineering]], '''Lloyd's algorithm''', also known as '''Voronoi iteration''' or relaxation, is an algorithm named after Stuart P. Lloyd for groupingconstructing dataor pointsimproving partitions of subsets of [[Euclidean space]]s into awell-shaped givenand numberuniformly ofsized categories,convex usedcells.<ref forname="l82"/> Like the closely related [[k-means clustering|''k''-means clustering]] algorithm, it repeatedly finds the [[centroid]] of each set in the partition, and then re-partitions the input according to which of these centroids is closest. However, Lloyd's algorithm differs from ''k''-means clustering in that its input is a continuous subset of the input space, rather than a discrete set of points. Thus, when re-partitioning the input, Lloyd's algorithm uses [[Voronoi diagram]]s rather simply determining the nearest center to each of a finite set of points as the ''k''-means algorithm does.
 
Although the algorithm may be applied most directly to the [[Euclidean plane]], similar algorithms may also be applied to higher-dimensional spaces or to spaces with other [[Non-Euclidean geometry|non-Euclidean]] metrics. Lloyd's algorithm can be used to construct close approximations to [[centroidal Voronoi tessellation]]s of the input,<ref name="dfg99"/> which can be used for [[Quantization (signal processing)|quantization]], [[dithering]], and [[stippling]]. Other applications of Lloyd's algorithm include smoothing of [[triangle mesh]]es in the [[finite element method]].
Lloyd's algorithm is usually used in a [[Euclidean space]], so the distance function serves as a measure of similarity between points, and averaging of each dimension for the averaging, but this need not be the case.{{Clarify|date=May 2013}} For example,to get square-grid tiles other than hexagons, [[manhattan metric]] can be used. {{harv|Hausner|2001}}
 
==Algorithm description==
Lloyd's algorithm starts by partitioning the input points into k initial sets, either at random or using some [[heuristic]]. It then calculates the average point, or [[centroid]], of each set via some metric (usually averaging dimensions in [[Euclidean space]]). It constructs a new partition by associating each point with the closest centroid, usually using the [[Euclidean distance]] function. Then the centroids are recalculated for the new clusters, and algorithm repeated by alternate application of these two steps until convergence, which is obtained when the points no longer switch clusters (or alternatively centroids are no longer changed).
Lloyd's algorithm starts by an initial placement of some number ''k'' of point sites in the input ___domain. In mesh smoothing applications; these would be the vertices of the mesh to be smoothed; in other applications they may be placed at random, or by intersecting a uniform triangular mesh of the appropriate size with the input ___domain.
 
It then repeatedly executes the following relaxation step:
More formally:
* The [[Voronoi diagram]] of all the points''k'' sites is computed.
* Each cell of the Voronoi diagram is integrated and the centroid is computed.
* Each pointsite is then moved to the centroid of its Voronoi cell.
 
Because Voronoi diagram construction algorithms can be highly non-trivial, especially for inputs of dimension higher than two, the steps of calculating this diagram and finding the centroids of its cells may be approximated by a suitable discretization in which, for each cell of a fine grid, the closest site is determined, after which the centroid for a site's cell is approximated by averaging the centers of the grid cells assigned to it. Alternatively, [[Monte Carlo methods]] may be used, in which random sample points are generated according to some fixed underlying probability distribution, assigned to the closest site, and averaged to approximate the centroid for each site.
Lloyd's algorithm starts with an initial distribution of samples or points and consists of repeatedly executing one relaxation step:
 
* The [[Voronoi diagram]] of all the points is computed.
* Each cell of the Voronoi diagram is integrated and the centroid is computed.
* Each point is then moved to the centroid of its Voronoi cell.
 
==Convergence==
Obtaining the Voronoi cells and computing its centroid analytically is not a trivial task, and [[Monte Carlo methods]] are typically used for this purpose. In this method, random sample points are generated according to some fixed underlying probability distribution. These sample points are also known as training samples. In practice, the underlying probability distribution is often unknown and the training samples are obtained from raw field data. Each training sample is declared to belong to the Voronoi cell corresponding to the nearest seed point to it. The average of the training samples collected in each Voronoi cell gives the new centroids of Voronoi cell.
Each time a relaxation step is performed, the points are left in a slightly more even distribution: closely spaced points move furtherfarther apart, and widely spaced points move closer together. In one dimension, this algorithm has been shown to converge to a centroidal Voronoi diagram, also named a [[centroidal Voronoi tessellation]] {{harv|Du|Emelianenko|Ju|2006}}.<ref name="dej06"/> In higher dimensions, some slightly weaker convergence results are known {{harv|Sabin|1986}}, {{harv|Emelianenko|Ju|Rand|2009}}.<ref name="sg86"/><ref name="ejr09"/>
 
Since theThe algorithm converges slowly, andor, due to limitations in numerical precision, the algorithm will oftenmay not converge. Therefore, real-world applications of Lloyd's algorithm typically stop once the distribution is "good enough." One common termination criterion is to stop when the maximum distance amoved pointby movesany site in onean iteration isfalls below somea setpreset limitthreshold.
Each time a relaxation step is performed, the points are left in a slightly more even distribution: closely spaced points move further apart, and widely spaced points move closer together. In one dimension, this algorithm has been shown to converge to a centroidal Voronoi diagram, also named a [[centroidal Voronoi tessellation]] {{harv|Du|Emelianenko|Ju|2006}}. In higher dimensions, some slightly weaker convergence results are known {{harv|Sabin|1986}}, {{harv|Emelianenko|Ju|Rand|2009}}.
 
==Applications==
Since the algorithm converges slowly, and, due to limitations in numerical precision the algorithm will often not converge, real-world applications of Lloyd's algorithm stop once the distribution is "good enough." One common termination criterion is when the maximum distance a point moves in one iteration is below some set limit.
The Lloyd's method was originally used for scalar quantization, but it is clear that the method extends for vector quantization as well. As such, it is extensively used in [[data compression]] techniques in [[information theory]]. Lloyd's method is used in computer graphics because the resulting distribution has [[blue noise]] characteristics (see also [[Colors of noise]]), meaning there are few low-frequency components that could be interpreted as artifacts. It is particularly well-suited to picking sample positions for [[dithering]]. Lloyd's algorithm is also used to generate dot drawings in the style of [[stippling]].<ref name="dhos00"/> In this application, the centroids can be weighted based on a reference image to produce stipple illustrations matching an input image.<ref name="s02"/>
 
In the [[finite element method]], an input ___domain with a complex geometry is partioned into elements with simpler shapes; for instance, two-dimensional domains (either subsets of the Euclidean plane or surfaces in three dimensions) are often partitioned into triangles. It is important for the convergence of the finite element methods that these elements be well shaped; in the case of triangles, often elements that are nearly equilateral triangles are preferred. Lloyd's algorithm
The Lloyd's method was originally used for scalar quantization, but it is clear that the method extends for vector quantization as well. As such, it is extensively used in [[data compression]] techniques in [[information theory]]. Lloyd's method is used in computer graphics because the resulting distribution has [[blue noise]] characteristics (see also [[Colors of noise]]), meaning there are few low-frequency components that could be interpreted as artifacts. It is particularly well-suited to picking sample positions for [[dithering]].
can be used to smooth a mesh generated by some other algorithm, moving its vertices and changing the connection pattern among its elements in order to produce triangles that are more closely equilateral.<ref name="dg02"/> These applications typically use a smaller number of iterations of Lloyd's algorithm, stopping it to convergence, in order to preserve other features of the mesh such as differences in element size in different parts of the mesh. In contrast to a different smoothing method, [[Laplacian smoothing]] (in which mesh vertices are moved to the average of their neighbors' positions), Lloyd's algorithm can change the topology of the mesh, leading to more nearly-equilateral elements as well as avoiding the problems with tangling that can arise with Laplacian smoothing. However, Laplacian smoothing can be applied more generally to meshes with non-triangular elements.
 
==Different distances==
Lloyd's algorithm is also used to generate dot drawings in the style of [[stippling]] {{harv|Deussen|Hiller|van Overveld|Strothotte|2000}}. In this application, the centroids can be weighted based on a reference image {{harv|Secord|2002}} to produce stipple illustrations matching an input image.
Lloyd's algorithm is usually used in a [[Euclidean space]]. The Euclidean distance plays two roles in the algorithm: it is used to define the Voronoi cells, but it also corresponds to the choice of the centroid as the representative point of each cell, since the centroid is the point that minimizes the average squared Euclidean distance to the points in its cell. Alternative distances, and alternative central points than the centroid, may be used instead. For example,to get square-grid tiles other than hexagons, the [[Manhattan metric]] may be used.<ref name="h01"/>
 
== See also ==
* The [[Linde–Buzo–Gray algorithm]], which is a generalization of this algorithm.
* [[k-means clustering|''k''-means clustering]]
* [[Mean-shift]]
 
==References==
{{reflist|refs=
*{{citation
<ref name="dhos00">{{citation
| last1 = Deussen | first1 = Oliver
| last2 = Hiller | first2 = Stefan
Line 70 ⟶ 73:
| title = Floating points: a method for computing stipple drawings
| volume = 19
| year = 2000}}.</ref>
*<ref name="dej06">{{citation
| last1 = Du | first1 = Qiang
| last2 = Emelianenko | first2 = Maria
Line 80 ⟶ 83:
| title = Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations
| volume = 44
| year = 2006}}.</ref>
*<ref name="dfg99">{{citation
| last1 = Du | first1 = Qiang
| last2 = Faber | first2 = Vance
Line 91 ⟶ 94:
| title = Centroidal Voronoi tessellations: applications and algorithms
| volume = 41
| year = 1999}}.</ref>
*<ref name="dg02">{{citation
| last1 = Du | first1 = Qiang
| last2 = Gunzburger | first2 = Max
| doi = 10.1016/S0096-3003(01)00260-0
| issue = 2–3
| journal = Applied Mathematics and Computation
| pages = 591–607
| title = Grid generation and optimization based on centroidal Voronoi tessellations
| volume = 133
| year = 2002}}.</ref>
<ref name="ejr09">{{citation
| last1 = Emelianenko | first1 = Maria
| last2 = Ju | first2 = Lili
Line 100 ⟶ 113:
| title = Nondegeneracy and Weak Global Convergence of the Lloyd Algorithm in $R^d$
| volume = 46
| year = 2009}}.</ref>
*<ref name="l82">{{citation
| last = Lloyd | first = Stuart P.
| doi = 10.1109/TIT.1982.1056489
Line 109 ⟶ 122:
| title = Least squares quantization in PCM
| volume = 28
| year = 1982}}.</ref>
*<ref name="sg86">{{citation
| last1 = Sabin | first1 = M. J.
| last2 = Gray | first2 = R. M.
Line 119 ⟶ 132:
| title = Global convergence and empirical consistency of the generalized Lloyd algorithm
| volume = 32
| year = 1986}}.</ref>
*<ref name="s02">{{citation
| last = Secord | first = Adrian
| contribution = Weighted Voronoi stippling
Line 127 ⟶ 140:
| publisher = [[ACM SIGGRAPH]]
| title = Proceedings of the Symposium on Non-Photorealistic Animation and Rendering (NPAR)
| year = 2002}}.</ref>
*<ref name="h01">{{citation
| last = Hausner| first = Alejo
| doi = 10.1145/383259.383327
Line 135 ⟶ 148:
| pages = 573-580
| contribution = Simulating decorative mosaics
| year = 2001}}.</ref>
}}
 
[[Category:Data clustering algorithms]]
[[Category:Geometric algorithms]]