Revision as of 14:22, 26 August 2013 edit Chire (talk \| contribs) Extended confirmed users 1,803 edits Propose merge into k-means clustering. ← Previous edit		Revision as of 22:08, 5 September 2013 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,617 edits Remove the merge tag and rewrite to clarify the distinction between this algorithm and k-means Next edit →
Line 1: ~~{{Merge to\|k-means clustering\|date=August 2013}}~~ {{multiple image \| align = right Line 28 ⟶ 27: }} In [[computer science]] and [[electrical engineering]], '''Lloyd's algorithm''', also known as '''Voronoi iteration''' or relaxation, is an algorithm named after Stuart P. Lloyd for ~~grouping~~constructing ~~data~~or ~~points~~improving partitions of subsets of [[Euclidean space]]s into awell-shaped ~~given~~and ~~number~~uniformly ofsized ~~categories,~~convex ~~used~~cells.<ref ~~for~~name="l82"/> Like the closely related [[k-means clustering\|''k''-means clustering]] algorithm, it repeatedly finds the [[centroid]] of each set in the partition, and then re-partitions the input according to which of these centroids is closest. However, Lloyd's algorithm differs from ''k''-means clustering in that its input is a continuous subset of the input space, rather than a discrete set of points. Thus, when re-partitioning the input, Lloyd's algorithm uses [[Voronoi diagram]]s rather simply determining the nearest center to each of a finite set of points as the ''k''-means algorithm does. Although the algorithm may be applied most directly to the [[Euclidean plane]], similar algorithms may also be applied to higher-dimensional spaces or to spaces with other [[Non-Euclidean geometry\|non-Euclidean]] metrics. Lloyd's algorithm can be used to construct close approximations to [[centroidal Voronoi tessellation]]s of the input,<ref name="dfg99"/> which can be used for [[Quantization (signal processing)\|quantization]], [[dithering]], and [[stippling]]. Other applications of Lloyd's algorithm include smoothing of [[triangle mesh]]es in the [[finite element method]]. Lloyd's algorithm is usually used in a [[Euclidean space]], so the distance function serves as a measure of similarity between points, and averaging of each dimension for the averaging, but this need not be the case.{{Clarify\|date=May 2013}} For example,to get square-grid tiles other than hexagons, [[manhattan metric]] can be used. {{harv\|Hausner\|2001}} ==Algorithm description== Lloyd's algorithm starts by partitioning the input points into k initial sets, either at random or using some [[heuristic]]. It then calculates the average point, or [[centroid]], of each set via some metric (usually averaging dimensions in [[Euclidean space]]). It constructs a new partition by associating each point with the closest centroid, usually using the [[Euclidean distance]] function. Then the centroids are recalculated for the new clusters, and algorithm repeated by alternate application of these two steps until convergence, which is obtained when the points no longer switch clusters (or alternatively centroids are no longer changed). Lloyd's algorithm starts by an initial placement of some number ''k'' of point sites in the input ___domain. In mesh smoothing applications; these would be the vertices of the mesh to be smoothed; in other applications they may be placed at random, or by intersecting a uniform triangular mesh of the appropriate size with the input ___domain. It then repeatedly executes the following relaxation step: ~~More formally:~~ * The [[Voronoi diagram]] of ~~all~~ the ~~points~~''k'' sites is computed.▼ * Each cell of the Voronoi diagram is integrated and the centroid is computed.▼ * Each ~~point~~site is then moved to the centroid of its Voronoi cell.▼ Because Voronoi diagram construction algorithms can be highly non-trivial, especially for inputs of dimension higher than two, the steps of calculating this diagram and finding the centroids of its cells may be approximated by a suitable discretization in which, for each cell of a fine grid, the closest site is determined, after which the centroid for a site's cell is approximated by averaging the centers of the grid cells assigned to it. Alternatively, [[Monte Carlo methods]] may be used, in which random sample points are generated according to some fixed underlying probability distribution, assigned to the closest site, and averaged to approximate the centroid for each site. ~~Lloyd's algorithm starts with an initial distribution of samples or points and consists of repeatedly executing one relaxation step:~~ ▲* The [[Voronoi diagram]] of all the points is computed. ▲* Each cell of the Voronoi diagram is integrated and the centroid is computed. ▲* Each point is then moved to the centroid of its Voronoi cell. ==Convergence== Obtaining the Voronoi cells and computing its centroid analytically is not a trivial task, and [[Monte Carlo methods]] are typically used for this purpose. In this method, random sample points are generated according to some fixed underlying probability distribution. These sample points are also known as training samples. In practice, the underlying probability distribution is often unknown and the training samples are obtained from raw field data. Each training sample is declared to belong to the Voronoi cell corresponding to the nearest seed point to it. The average of the training samples collected in each Voronoi cell gives the new centroids of Voronoi cell. Each time a relaxation step is performed, the points are left in a slightly more even distribution: closely spaced points move ~~further~~farther apart, and widely spaced points move closer together. In one dimension, this algorithm has been shown to converge to a centroidal Voronoi diagram, also named a [[centroidal Voronoi tessellation]] ~~{{harv\|Du\|Emelianenko\|Ju\|2006}}~~.<ref name="dej06"/> In higher dimensions, some slightly weaker convergence results are known ~~{{harv\|Sabin\|1986}}, {{harv\|Emelianenko\|Ju\|Rand\|2009}}~~.<ref name="sg86"/><ref name="ejr09"/>▼ ~~Since the~~The algorithm converges slowly, ~~and~~or, due to limitations in numerical precision, ~~the algorithm will often~~may not converge. Therefore, real-world applications of Lloyd's algorithm typically stop once the distribution is "good enough." One common termination criterion is to stop when the maximum distance amoved ~~point~~by ~~moves~~any site in ~~one~~an iteration isfalls below ~~some~~a ~~set~~preset ~~limit~~threshold.▼ ▲Each time a relaxation step is performed, the points are left in a slightly more even distribution: closely spaced points move further apart, and widely spaced points move closer together. In one dimension, this algorithm has been shown to converge to a centroidal Voronoi diagram, also named a [[centroidal Voronoi tessellation]] {{harv\|Du\|Emelianenko\|Ju\|2006}}. In higher dimensions, some slightly weaker convergence results are known {{harv\|Sabin\|1986}}, {{harv\|Emelianenko\|Ju\|Rand\|2009}}. ==Applications== ▲Since the algorithm converges slowly, and, due to limitations in numerical precision the algorithm will often not converge, real-world applications of Lloyd's algorithm stop once the distribution is "good enough." One common termination criterion is when the maximum distance a point moves in one iteration is below some set limit. ~~The~~ Lloyd's method was originally used for scalar quantization, but it is clear that the method extends for vector quantization as well. As such, it is extensively used in [[data compression]] techniques in [[information theory]]. Lloyd's method is used in computer graphics because the resulting distribution has [[blue noise]] characteristics (see also [[Colors of noise]]), meaning there are few low-frequency components that could be interpreted as artifacts. It is particularly well-suited to picking sample positions for [[dithering]]. Lloyd's algorithm is also used to generate dot drawings in the style of [[stippling]].<ref name="dhos00"/> In this application, the centroids can be weighted based on a reference image to produce stipple illustrations matching an input image.<ref name="s02"/>▼ In the [[finite element method]], an input ___domain with a complex geometry is partioned into elements with simpler shapes; for instance, two-dimensional domains (either subsets of the Euclidean plane or surfaces in three dimensions) are often partitioned into triangles. It is important for the convergence of the finite element methods that these elements be well shaped; in the case of triangles, often elements that are nearly equilateral triangles are preferred. Lloyd's algorithm ▲The Lloyd's method was originally used for scalar quantization, but it is clear that the method extends for vector quantization as well. As such, it is extensively used in [[data compression]] techniques in [[information theory]]. Lloyd's method is used in computer graphics because the resulting distribution has [[blue noise]] characteristics (see also [[Colors of noise]]), meaning there are few low-frequency components that could be interpreted as artifacts. It is particularly well-suited to picking sample positions for [[dithering]]. can be used to smooth a mesh generated by some other algorithm, moving its vertices and changing the connection pattern among its elements in order to produce triangles that are more closely equilateral.<ref name="dg02"/> These applications typically use a smaller number of iterations of Lloyd's algorithm, stopping it to convergence, in order to preserve other features of the mesh such as differences in element size in different parts of the mesh. In contrast to a different smoothing method, [[Laplacian smoothing]] (in which mesh vertices are moved to the average of their neighbors' positions), Lloyd's algorithm can change the topology of the mesh, leading to more nearly-equilateral elements as well as avoiding the problems with tangling that can arise with Laplacian smoothing. However, Laplacian smoothing can be applied more generally to meshes with non-triangular elements. ==Different distances== Lloyd's algorithm is also used to generate dot drawings in the style of [[stippling]] {{harv\|Deussen\|Hiller\|van Overveld\|Strothotte\|2000}}. In this application, the centroids can be weighted based on a reference image {{harv\|Secord\|2002}} to produce stipple illustrations matching an input image. Lloyd's algorithm is usually used in a [[Euclidean space]]. The Euclidean distance plays two roles in the algorithm: it is used to define the Voronoi cells, but it also corresponds to the choice of the centroid as the representative point of each cell, since the centroid is the point that minimizes the average squared Euclidean distance to the points in its cell. Alternative distances, and alternative central points than the centroid, may be used instead. For example,to get square-grid tiles other than hexagons, the [[Manhattan metric]] may be used.<ref name="h01"/> == See also == * The [[Linde–Buzo–Gray algorithm]], ~~which is~~ a generalization of this algorithm. * [[k-means clustering\|''k''-means clustering]] * [[Mean-shift]] ==References== {{reflist\|refs= {{citation <ref name="dhos00">{{citation \| last1 = Deussen \| first1 = Oliver \| last2 = Hiller \| first2 = Stefan Line 70 ⟶ 73: \| title = Floating points: a method for computing stipple drawings \| volume = 19 \| year = 2000}}.</ref> <ref name="dej06">{{citation \| last1 = Du \| first1 = Qiang \| last2 = Emelianenko \| first2 = Maria Line 80 ⟶ 83: \| title = Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations \| volume = 44 \| year = 2006}}.</ref> <ref name="dfg99">{{citation \| last1 = Du \| first1 = Qiang \| last2 = Faber \| first2 = Vance Line 91 ⟶ 94: \| title = Centroidal Voronoi tessellations: applications and algorithms \| volume = 41 \| year = 1999}}.</ref> <ref name="dg02">{{citation \| last1 = Du \| first1 = Qiang \| last2 = Gunzburger \| first2 = Max \| doi = 10.1016/S0096-3003(01)00260-0 \| issue = 2–3 \| journal = Applied Mathematics and Computation \| pages = 591–607 \| title = Grid generation and optimization based on centroidal Voronoi tessellations \| volume = 133 \| year = 2002}}.</ref> <ref name="ejr09">{{citation \| last1 = Emelianenko \| first1 = Maria \| last2 = Ju \| first2 = Lili Line 100 ⟶ 113: \| title = Nondegeneracy and Weak Global Convergence of the Lloyd Algorithm in $R^d$ \| volume = 46 \| year = 2009}}.</ref> <ref name="l82">{{citation \| last = Lloyd \| first = Stuart P. \| doi = 10.1109/TIT.1982.1056489 Line 109 ⟶ 122: \| title = Least squares quantization in PCM \| volume = 28 \| year = 1982}}.</ref> <ref name="sg86">{{citation \| last1 = Sabin \| first1 = M. J. \| last2 = Gray \| first2 = R. M. Line 119 ⟶ 132: \| title = Global convergence and empirical consistency of the generalized Lloyd algorithm \| volume = 32 \| year = 1986}}.</ref> <ref name="s02">{{citation \| last = Secord \| first = Adrian \| contribution = Weighted Voronoi stippling Line 127 ⟶ 140: \| publisher = [[ACM SIGGRAPH]] \| title = Proceedings of the Symposium on Non-Photorealistic Animation and Rendering (NPAR) \| year = 2002}}.</ref> <ref name="h01">{{citation \| last = Hausner\| first = Alejo \| doi = 10.1145/383259.383327 Line 135 ⟶ 148: \| pages = 573-580 \| contribution = Simulating decorative mosaics \| year = 2001}}.</ref> }} [[Category:Data clustering algorithms]] [[Category:Geometric algorithms]]

Lloyd's algorithm: Difference between revisions