| url = http://www3.interscience.wiley.com/cgi-bin/fulltext/17435/PDFSTART
| doi = 10.1002/(SICI)1097-0266(199606)17:6<441::AID-SMJ819>3.0.CO;2-G
| url-access = subscription
}}{{dead link|date=February 2019|bot=medic}}{{cbignore|bot=medic}}</ref>
This can even hold in cases where all other methods for [[determining the number of clusters in a data set]] (as mentioned in that article) agree on the number of clusters.
[[File:Elbow in Inertia on uniform data.png|thumb|alt=Plot of the sum of squared errors (SSE) as k increases, following a typical 1/k shape.|Example of the typical "elbow" pattern used for choosing the number of clusters, which emerges even on uniform data.]]
Even on uniform random data (with no meaningful clusters), the curve approximately follows ''1/k'', where ''k'' is the number of clusters, causing users to see an "elbow" and mistakenly choose some "optimal" number of clusters.<ref name=":0" />
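The effect can be reproduced with a minimal sketch. It assumes points drawn uniformly from the two-dimensional unit square and a plain Lloyd's-algorithm ''k''-means written with the Python standard library only. The function names <code>kmeans_sse</code> and <code>sse_curve</code>, the sample size, and the iteration count are illustrative choices and do not come from the cited source. If the curve behaves roughly like ''1/k'', the product of ''k'' and the SSE should stay within a small constant factor of the SSE at ''k'' = 1, even though the data contain no clusters.

```python
import random


def kmeans_sse(points, k, iters=20, seed=0):
    """Run plain Lloyd's k-means on 2-D points and return the final SSE."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iters):
        # Assignment step: accumulate coordinate sums and counts per nearest center.
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for x, y in points:
            j = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
            sums[j][0] += x
            sums[j][1] += y
            sums[j][2] += 1
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        centers = [
            (sx / n, sy / n) if n else centers[j]
            for j, (sx, sy, n) in enumerate(sums)
        ]
    # Sum of squared distances from each point to its nearest final center.
    return sum(
        min((x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers)
        for x, y in points
    )


def sse_curve(n=1000, k_max=8, seed=0):
    """Return {k: SSE} for k = 1..k_max on n uniform points in the unit square."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return {k: kmeans_sse(points, k, seed=seed) for k in range(1, k_max + 1)}


if __name__ == "__main__":
    curve = sse_curve()
    for k, sse in curve.items():
        # The last column is k * SSE(k) / SSE(1); a 1/k curve keeps it near constant.
        print(k, round(sse, 2), round(k * sse / curve[1], 2))
```

Plotting the first two columns gives a curve with an apparent "elbow", although every value of ''k'' is equally meaningless for this data set. The sketch uses two dimensions deliberately: the rate at which the SSE falls depends on the dimensionality of the data, so other dimensions would not be expected to show the same ''1/k'' shape.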
Because the two axes (the number of clusters and the remaining variance) have no semantic relationship, various attempts to capture the elbow by "slope" are ill-defined and sensitive to the parameter range.<ref name=":0">{{Cite journal |last=Schubert |first=Erich |date=2023-07-05 |title=Stop using the elbow criterion for k-means and how to choose the number of clusters instead |url=https://doi.org/10.1145/3606274.3606278 |journal=ACM SIGKDD Explorations Newsletter |volume=25 |issue=1 |pages=36–42 |doi=10.1145/3606274.3606278 |issn=1931-0145|arxiv=2212.12189 }}</ref> Increasing the maximum number of clusters can change the location of the perceived "elbow", and in many cases alternate heuristics such as the [[Calinski–Harabasz index|variance-
== Measures of variation ==