Revision as of 07:30, 16 July 2024 by 666-Bandera Mouse (no edit summary) → Revision as of 15:36, 28 November 2024 by Citation bot (edit summary: Altered bibcode. Suggested by Dominic3203. Category:NP-complete problems. #UCB_Category 5/181)
Line 28:
==Over-interpretation potential of the Monti consensus clustering algorithm==
[[File:PACexplained.png|400px|thumb|PAC measure (proportion of ambiguous clustering) explained. Optimal K is the K with lowest PAC value.]]
Monti consensus clustering can be a powerful tool for identifying clusters, but it needs to be applied with caution, as shown by Şenbabaoğlu ''et al.''<ref name="SenbabaogluSREP" /> The Monti consensus clustering algorithm has been shown to report apparent stability for chance partitionings of null datasets drawn from a unimodal distribution, and thus has the potential to lead to over-interpretation of cluster stability in a real study.<ref name=SenbabaogluSREP>{{cite journal|last=Şenbabaoğlu|first=Y.|author2=Michailidis, G. |author3=Li, J. Z. |title=Critical limitations of consensus clustering in class discovery|journal=Scientific Reports|date=2014|doi=10.1038/srep06207|volume=4|pages=6207|pmid=25158761|pmc=4145288|bibcode=2014NatSR...4.6207.}}</ref><ref name=SenbabaogluRXV>{{cite bioRxiv|last=Şenbabaoğlu|first=Y.|author2=Michailidis, G. |author3=Li, J. Z. |title=A reassessment of consensus clustering for class discovery|date=Feb 2014|biorxiv=10.1101/002642}}</ref> If clusters are not well separated, consensus clustering could lead one to infer apparent structure where there is none, or to declare cluster stability when it is only subtle. Identifying false-positive clusters is a common problem throughout cluster research,<ref name=":0">{{Cite journal|last1=Liu|first1=Yufeng|last2=Hayes|first2=David Neil|last3=Nobel|first3=Andrew|last4=Marron|first4=J. S.|date=2008-09-01|title=Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data|journal=Journal of the American Statistical Association|volume=103|issue=483|pages=1281–1293|doi=10.1198/016214508000000454|s2cid=120819441|issn=0162-1459}}</ref> and has been addressed by methods such as SigClust<ref name=":0" /> and the gap statistic.<ref>{{Cite journal|last1=Tibshirani|first1=Robert|last2=Walther|first2=Guenther|last3=Hastie|first3=Trevor|date=2001|title=Estimating the number of clusters in a data set via the gap statistic|journal=Journal of the Royal Statistical Society, Series B (Statistical Methodology)|language=en|volume=63|issue=2|pages=411–423|doi=10.1111/1467-9868.00293|s2cid=59738652 |issn=1467-9868|doi-access=free}}</ref> However, these methods rely on certain assumptions for the null model that may not always be appropriate. Şenbabaoğlu ''et al.''<ref name="SenbabaogluSREP" /> demonstrated that the original delta K metric used to choose <math>K</math> in the Monti algorithm performed poorly, and proposed a new metric that measures the stability of consensus matrices using their CDF curves. In the CDF curve of a consensus matrix, the lower left portion represents sample pairs rarely clustered together, the upper right portion represents those almost always clustered together, and the middle segment represents those with ambiguous assignments in different clustering runs. The proportion of ambiguous clustering (PAC) score quantifies this middle segment and is defined as the fraction of sample pairs with consensus indices falling in the interval (u<sub>1</sub>, u<sub>2</sub>) ⊂ [0, 1], where u<sub>1</sub> is a value close to 0 and u<sub>2</sub> is a value close to 1 (for instance u<sub>1</sub>=0.1 and u<sub>2</sub>=0.9). A low value of PAC indicates a flat middle segment and a low rate of discordant assignments across permuted clustering runs.
One can therefore infer the optimal number of clusters as the <math>K</math> value with the lowest PAC.<ref name="SenbabaogluSREP" /><ref name="SenbabaogluRXV" />

Line 60:
#* Compete for Objects
#'''{{Proper name|sHBGF}}''': represents the ensemble as a [[bipartite graph]] with clusters and instances as nodes, and edges between the instances and the clusters they belong to.<ref>Solving cluster ensemble problems by bipartite graph partitioning, Xiaoli Zhang Fern and [[Carla Brodley]], Proceedings of the twenty-first international conference on Machine learning</ref> This approach can be trivially adapted to consider soft ensembles, since the graph partitioning algorithm METIS accepts weights on the edges of the graph to be partitioned. In sHBGF, the graph has ''n'' + ''t'' vertices, where ''n'' is the number of instances and ''t'' is the total number of underlying clusters.
#'''Bayesian consensus clustering (BCC)''': defines a fully [[Bayesian probability|Bayesian]] model for soft consensus clustering in which multiple source clusterings, defined by different input data or different probability models, are assumed to adhere loosely to a consensus clustering.<ref name=LockBCC>{{cite journal|last=Lock|first=E.F.|author2=Dunson, D.B. |title=Bayesian consensus clustering|journal=Bioinformatics|date=2013|doi=10.1093/bioinformatics/btt425|pmid=23990412|pmc=3789539|volume=29|number=20|pages=2610–2616|arxiv=1302.7280|bibcode=2013Bioin..29.2610L}}</ref> The full posteriors for the separate clusterings and for the consensus clustering are inferred simultaneously via [[Gibbs sampling]].
#'''Ensemble Clustering Fuzzification Means (ECF-Means)''': a clustering algorithm that combines the clustering results of an ensemble, obtained from different runs of a chosen algorithm ([[k-means]]), into a single final clustering configuration.<ref name=ZazzECF>{{cite journal|last=Zazzaro|first=Gaetano|author2=Martone, Angelo |title=ECF-means - Ensemble Clustering Fuzzification Means. A novel algorithm for clustering aggregation, fuzzification, and optimization |journal=IMMM 2018: The Eighth International Conference on Advances in Information Mining and Management|date=2018}} [https://www.thinkmind.org/articles/immm_2018_2_10_50010.pdf]</ref>
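
As an illustration of the PAC score defined in the section on over-interpretation, the following is a minimal sketch and not the published implementation. It assumes a consensus matrix is given as a square list of lists with entries in [0, 1], counts each unordered sample pair once, and treats the interval (u<sub>1</sub>, u<sub>2</sub>) as open at both ends. The function names <code>pac_score</code> and <code>lowest_pac_k</code> and the toy matrices are hypothetical.

```python
from itertools import combinations


def pac_score(consensus, u1=0.1, u2=0.9):
    """Fraction of unordered sample pairs whose consensus index lies in (u1, u2).

    Assumption: `consensus` is a square, symmetric matrix (list of lists)
    of consensus indices in [0, 1]; the diagonal is ignored.
    """
    n = len(consensus)
    pairs = list(combinations(range(n), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    ambiguous = sum(1 for i, j in pairs if u1 < consensus[i][j] < u2)
    return ambiguous / len(pairs)


def lowest_pac_k(consensus_by_k, u1=0.1, u2=0.9):
    """Return the K with the lowest PAC, plus the PAC score of every K."""
    scores = {k: pac_score(m, u1, u2) for k, m in consensus_by_k.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy consensus matrices (made up for illustration only).
# `clean`: every pair is either always or never clustered together.
clean = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# `noisy`: 5 of the 6 pairs have consensus indices inside (0.1, 0.9).
noisy = [
    [1.0, 0.5, 0.5, 0.0],
    [0.5, 1.0, 0.4, 0.6],
    [0.5, 0.4, 1.0, 0.5],
    [0.0, 0.6, 0.5, 1.0],
]

best_k, pac_by_k = lowest_pac_k({2: clean, 3: noisy})
print(best_k, pac_by_k)
```

In this toy input the matrix stored under K = 2 has no ambiguous pairs, so it is the one selected. With real data, each matrix would come from repeated clustering runs on resampled data at the corresponding K.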

Consensus clustering: Difference between revisions