Talk:K-means clustering: Difference between revisions

Content deleted Content added
Merge proposal: new section
Line 156:
: From a quick look, the book you reference discusses the Lloyd algorithm for finding an ''approximative'' optimum. As the problem of finding the exact solution is proven to be NP-hard (it should be trivial to find references for this), O(n*k*i*m) obviously cannot be correct for the exact k-means solution.
: I would not use a wide-coverage book like this as reference. These books tend to be very imprecise on what they are talking about. Many of them call it "k-means", cite MacQueen, but give Lloyds algorithm, and do not discuss that this is just a heuristic. And exactly this happened in this NLP book. The given algorithm is the modern formulation of Lloyds algorithm (did you have a look at his article?); who not even used the term k-means. And it's just the most popular heuristic for finding a local optimum. However, the complexity section clearly states: " the k-means clustering problem", NOT "the lloyd algorithm". Btw, the same thing applies to hierarchical clustering. The naive matrix way can solve arbitrary cluster similarity measures in O(n^3); unless you use a really expensive cluster similarity measure. For special cases such as single-linkage, there exist variants that run in O(n^2). If you are only interested in a single cut through the dendrogram, DBSCAN with minPts=2 should give the desired result in O(n log n) with index support. Whenever talking about the complexity of an algorithm, you really need to be precise of what you are talking about. (Granted, the article could be a bit clearer on what it refers to, and emphasize this common misunderstanding). --[[Special:Contributions/160.39.199.43|160.39.199.43]] ([[User talk:160.39.199.43|talk]]) 03:54, 27 June 2013 (UTC)
 
== Merge proposal ==
 
Since [[Lloyd's algorithm]] is the standard algorithm for k-means these days, and essential to explaining k-means, we might just as well merge the two articles.
 
In fact, I'd even suggest to make [[Lloyd's algorithm]] simply a '''redirect''' to the appropriate section of k-means where it is explained.
 
However, there seems to be some other variant (not so much for clustering?) of Lloyd discussed in that other article, too. --[[User:Chire|Chire]] ([[User talk:Chire|talk]]) 14:28, 26 August 2013 (UTC)