Consensus clustering: Difference between revisions

OAbot (talk | contribs)
m Open access bot: doi added to citation with #oabot.
Line 15:
The Monti consensus clustering algorithm<ref>{{Cite journal|last1=Monti|first1=Stefano|last2=Tamayo|first2=Pablo|last3=Mesirov|first3=Jill|last4=Golub|first4=Todd|date=2003-07-01|title=Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data|journal=Machine Learning|language=en|volume=52|issue=1|pages=91–118|doi=10.1023/A:1023949509487|issn=1573-0565|doi-access=free}}</ref> is one of the most popular consensus clustering algorithms and is used to determine the number of clusters, <math>K</math>. Given a dataset of <math>N</math> points to cluster, the algorithm repeatedly resamples and clusters the data for each candidate <math>K</math>, and computes an <math>N \times N</math> consensus matrix in which each element is the fraction of times two samples clustered together. A perfectly stable matrix would consist entirely of zeros and ones, meaning that every pair of samples either always or never clustered together over all resampling iterations. The relative stability of the consensus matrices can be used to infer the optimal <math>K</math>.
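The idea of a consensus matrix and its stability can be illustrated with a minimal Python sketch. It is not the reference implementation: the function names, the dictionary representation of each resampling run, and the choice to divide by the number of runs in which both points were sampled are assumptions made for illustration, and the cluster labels are supplied by hand rather than produced by a real clustering algorithm.

```python
def consensus_matrix(runs, n_points):
    """Fraction of runs in which each pair of points clustered together.

    `runs` is a list of dicts (a hypothetical representation): each dict maps
    the index of a point that was sampled in that run to its cluster label.
    Assumption: the fraction is taken over runs in which BOTH points were sampled.
    """
    together = [[0] * n_points for _ in range(n_points)]
    sampled = [[0] * n_points for _ in range(n_points)]
    for labels in runs:
        for i in labels:
            for j in labels:
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Pairs that were never sampled together get 0.0 by convention here.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_points)]
        for i in range(n_points)
    ]


def is_perfectly_stable(matrix):
    """True if every entry is exactly 0 or 1."""
    return all(value in (0.0, 1.0) for row in matrix for value in row)


# Three hand-written runs that always agree: {0, 1} together, {2, 3} together.
stable_runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {1: 0, 2: 1, 3: 1},
]
# A fourth run that disagrees (0 with 3, 1 with 2) breaks the stability.
unstable_runs = stable_runs + [{0: 0, 1: 1, 2: 1, 3: 0}]

print(is_perfectly_stable(consensus_matrix(stable_runs, 4)))
print(is_perfectly_stable(consensus_matrix(unstable_runs, 4)))
```

In the unstable case, points 0 and 1 are sampled together in three runs and cluster together in two of them, so their consensus entry is 2/3 rather than 0 or 1.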
 
More specifically, given a set of points to cluster, <math>D=\{e_1,e_2,\ldots,e_N\}</math>, let <math>D^1,D^2,\ldots,D^H</math> be the list of <math>H</math> perturbed (resampled) datasets of the original dataset <math>D</math>, and let <math>M^h</math> denote the <math>N \times N</math> connectivity matrix resulting from applying a clustering algorithm to the dataset <math>D^h</math>. The entries of <math>M^h</math> are defined as follows:
 
<math>M^h(i,j)= \begin{cases} 1, & \text{if points } i \text{ and } j \text{ belong to the same cluster} \\ 0, & \text{otherwise} \end{cases}</math>
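As an illustration of this definition, the following Python sketch builds a connectivity matrix from a list of cluster labels for a single resampled dataset. The function name and the use of <code>None</code> to mark points that were left out of the resampled dataset are assumptions made for this example; they are not part of the original formulation.

```python
def connectivity_matrix(labels):
    """Build the N x N connectivity matrix for one clustering run.

    `labels[i]` is the cluster label of point i, or None if point i was
    not included in the resampled dataset (an assumed convention).
    Entry (i, j) is 1 if both points were clustered and share a label, else 0.
    """
    n = len(labels)
    return [
        [1 if labels[i] is not None and labels[i] == labels[j] else 0
         for j in range(n)]
        for i in range(n)
    ]


# Four points: 0 and 1 in cluster "a", 2 in cluster "b", 3 not sampled.
for row in connectivity_matrix(["a", "a", "b", None]):
    print(row)
```

Under this convention a point that was not sampled has an all-zero row and column, including its diagonal entry.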