The Monti consensus clustering algorithm<ref>{{Cite journal|last1=Monti|first1=Stefano|last2=Tamayo|first2=Pablo|last3=Mesirov|first3=Jill|last4=Golub|first4=Todd|date=2003-07-01|title=Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data|journal=Machine Learning|language=en|volume=52|issue=1|pages=91–118|doi=10.1023/A:1023949509487|issn=1573-0565|doi-access=free}}</ref> is one of the most popular consensus clustering algorithms and is used to determine the number of clusters, <math>K</math>. Given a dataset of <math>N</math> points to cluster, the algorithm resamples and clusters the data for each candidate value of <math>K</math>, and computes an <math>N \times N</math> consensus matrix in which each element is the fraction of times two samples clustered together. A perfectly stable matrix would consist entirely of zeros and ones, meaning that every pair of samples either always or never clustered together across all resampling iterations. The relative stability of the consensus matrices can be used to infer the optimal <math>K</math>.
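More specifically, given a set of points to cluster, <math>D=\{e_1,e_2,\ldots,e_N\}</math>, let <math>D^1,D^2,\ldots,D^H</math> be the list of <math>H</math> perturbed (resampled) datasets of the original dataset <math>D</math>, and let <math>M^h</math> denote the <math>N \times N</math> connectivity matrix resulting from applying a clustering algorithm to the dataset <math>D^h</math>. The entries of <math>M^h</math> are defined as follows: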
<math>M^h(i,j)= \begin{cases} 1, & \text{if points } i \text{ and } j \text{ belong to the same cluster} \\ 0, & \text{otherwise} \end{cases}</math>
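The connectivity matrix above, and the consensus matrix built from it, can be illustrated with a minimal Python sketch. The function names and the representation of each clustering run as a dictionary from point index to cluster label are assumptions made for illustration, not part of the original description. The sketch also assumes that each pair's count is normalized by the number of runs in which both points were sampled, which is one reading of "the fraction of times two samples clustered together".

```python
def connectivity_matrix(labels, n_points):
    """Build M^h for one resampled dataset.

    labels: dict mapping point index -> cluster label (hypothetical format);
            points left out of this resample are simply absent.
    """
    m = [[0] * n_points for _ in range(n_points)]
    for i, label_i in labels.items():
        for j, label_j in labels.items():
            if label_i == label_j:
                m[i][j] = 1  # points i and j belong to the same cluster
    return m


def consensus_matrix(runs, n_points):
    """Fraction of runs in which each pair clustered together.

    Assumption: pairs are normalized by the number of runs in which
    both points were sampled; pairs never sampled together get 0.0.
    """
    together = [[0] * n_points for _ in range(n_points)]
    sampled = [[0] * n_points for _ in range(n_points)]
    for labels in runs:
        m = connectivity_matrix(labels, n_points)
        for i in labels:
            for j in labels:
                sampled[i][j] += 1
                together[i][j] += m[i][j]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_points)]
        for i in range(n_points)
    ]


# Three hypothetical resampled clusterings of N = 4 points.
runs = [
    {0: 0, 1: 0, 2: 1},  # point 3 not sampled
    {0: 0, 1: 1, 3: 1},  # point 2 not sampled
    {1: 0, 2: 1, 3: 1},  # point 0 not sampled
]
consensus = consensus_matrix(runs, 4)
```

In this toy input, points 0 and 1 are sampled together twice and share a cluster once, so their consensus entry is 0.5, an example of an unstable pair. Entries equal to 0 or 1 correspond to pairs that never or always clustered together.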