==Over-interpretation potential of the Monti consensus clustering algorithm==
[[File:PACexplained.png|400px|thumb|PAC measure (proportion of ambiguous clustering) explained. Optimal K is the K with lowest PAC value.]]
Monti consensus clustering can be a powerful tool for identifying clusters, but it needs to be applied with caution, as shown by Şenbabaoğlu ''et al.''<ref name="SenbabaogluSREP" /> The algorithm can report apparent stability for chance partitionings of null datasets drawn from a unimodal distribution, and can therefore lead to over-interpretation of cluster stability in a real study.<ref name=SenbabaogluSREP>{{cite journal|last=Şenbabaoğlu|first=Y.|author2=Michailidis, G. |author3=Li, J. Z. |title=Critical limitations of consensus clustering in class discovery|journal=Scientific Reports|date=2014|doi=10.1038/srep06207|volume=4|pages=6207|pmid=25158761|pmc=4145288|bibcode=2014NatSR...
Şenbabaoğlu ''et al.''<ref name="SenbabaogluSREP" /> demonstrated that the original delta K metric for choosing <math>K</math> in the Monti algorithm performed poorly, and proposed a new, superior metric that measures the stability of consensus matrices using their CDF curves. In the CDF curve of a consensus matrix, the lower left portion represents sample pairs rarely clustered together, the upper right portion represents those almost always clustered together, and the middle segment represents those with ambiguous assignments in different clustering runs. The proportion of ambiguous clustering (PAC) score quantifies this middle segment. It is defined as the fraction of sample pairs with consensus indices falling in the interval (u<sub>1</sub>, u<sub>2</sub>) ⊂ [0, 1], where u<sub>1</sub> is a value close to 0 and u<sub>2</sub> is a value close to 1 (for instance u<sub>1</sub>=0.1 and u<sub>2</sub>=0.9). A low PAC value indicates a flat middle segment and a low rate of discordant assignments across permuted clustering runs. One can therefore infer the optimal number of clusters as the <math>K</math> value with the lowest PAC.<ref name="SenbabaogluSREP" /><ref name="SenbabaogluRXV" />
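
The PAC definition above can be illustrated with a minimal Python sketch. It assumes the consensus matrix for each candidate <math>K</math> is already available as a symmetric array of consensus indices in [0, 1]. The function names <code>pac_score</code> and <code>best_k</code> are hypothetical, and the open interval (u<sub>1</sub>, u<sub>2</sub>) follows the wording of this section rather than any particular published implementation.

```python
import numpy as np

def pac_score(consensus, u1=0.1, u2=0.9):
    """Fraction of sample pairs whose consensus index lies strictly in (u1, u2)."""
    consensus = np.asarray(consensus, dtype=float)
    n = consensus.shape[0]
    if consensus.ndim != 2 or consensus.shape[1] != n or n < 2:
        raise ValueError("consensus must be a square matrix with at least 2 samples")
    # Use each unordered pair once: strict upper triangle, diagonal excluded.
    values = consensus[np.triu_indices(n, k=1)]
    ambiguous = (values > u1) & (values < u2)
    return float(ambiguous.mean())

def best_k(consensus_by_k, u1=0.1, u2=0.9):
    """Return the K with the lowest PAC, plus the PAC score for every K."""
    scores = {k: pac_score(m, u1, u2) for k, m in consensus_by_k.items()}
    return min(scores, key=scores.get), scores

# Illustrative (made-up) consensus matrices for 4 samples.
clean = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])      # every pair is always or never co-clustered
noisy = np.full((4, 4), 0.5)
np.fill_diagonal(noisy, 1.0)          # every pair is co-clustered half the time

k, scores = best_k({2: clean, 3: noisy})
```

In this toy input every pair in <code>clean</code> has a consensus index of exactly 0 or 1, so its PAC is 0, while every pair in <code>noisy</code> falls inside the ambiguous interval, so its PAC is 1. The sketch therefore selects the <math>K</math> associated with <code>clean</code>.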
#* Compete for Objects
#'''{{Proper name|sHBGF}}''': represents the ensemble as a [[bipartite graph]] with clusters and instances as nodes, and edges between the instances and the clusters they belong to.<ref>Xiaoli Zhang Fern and [[Carla Brodley]], "Solving cluster ensemble problems by bipartite graph partitioning", Proceedings of the Twenty-First International Conference on Machine Learning</ref> This approach can be trivially adapted to consider soft ensembles, since the graph partitioning algorithm METIS accepts weights on the edges of the graph to be partitioned. In sHBGF, the graph has ''n'' + ''t'' vertices, where ''n'' is the number of instances and ''t'' is the total number of underlying clusters.
#'''Bayesian consensus clustering (BCC)''': defines a fully [[Bayesian probability|Bayesian]] model for soft consensus clustering in which multiple source clusterings, defined by different input data or different probability models, are assumed to adhere loosely to a consensus clustering.<ref name=LockBCC>{{cite journal|last=Lock|first=E.F.|author2=Dunson, D.B. |title=Bayesian consensus clustering|journal=Bioinformatics|date=2013|doi=10.1093/bioinformatics/btt425|pmid=23990412|pmc=3789539|volume=29|number=20|pages=2610–2616|arxiv=1302.7280|bibcode=
#'''Ensemble Clustering Fuzzification Means (ECF-Means)''': ECF-Means is a clustering algorithm that combines the clustering results obtained from different runs of a chosen algorithm ([[k-means]]) into a single final clustering configuration.<ref name=ZazzECF>{{cite journal|last=Zazzaro|first=Gaetano|author2=Martone, Angelo |title=ECF-means - Ensemble Clustering Fuzzification Means. A novel algorithm for clustering aggregation, fuzzification, and optimization |journal=IMMM 2018: The Eighth International Conference on Advances in Information Mining and Management|date=2018}} [https://www.thinkmind.org/articles/immm_2018_2_10_50010.pdf]</ref>
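
The bipartite graph described for sHBGF above can be sketched in Python as a weighted adjacency matrix. This is only an illustration under stated assumptions: each base clustering is given as an ''n'' × ''k'' soft membership matrix, the function name <code>bipartite_adjacency</code> is hypothetical, and the partitioning step itself (for example with METIS) is left out.

```python
import numpy as np

def bipartite_adjacency(memberships):
    """Build the (n + t) x (n + t) weighted adjacency matrix of the
    instance/cluster bipartite graph from soft membership matrices."""
    blocks = [np.asarray(m, dtype=float) for m in memberships]
    n = blocks[0].shape[0]
    if any(b.ndim != 2 or b.shape[0] != n for b in blocks):
        raise ValueError("every membership matrix must have shape (n, k_i)")
    a = np.hstack(blocks)              # n x t: instance-to-cluster edge weights
    t = a.shape[1]                     # total number of underlying clusters
    w = np.zeros((n + t, n + t))
    w[:n, n:] = a                      # instance -> cluster edges
    w[n:, :n] = a.T                    # cluster -> instance edges (undirected graph)
    return w

# Two made-up base clusterings of 3 instances, with 2 clusters each (t = 4).
m1 = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # soft memberships
m2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # hard memberships as a special case
w = bipartite_adjacency([m1, m2])
```

The instance-instance and cluster-cluster blocks stay zero, which is what makes the graph bipartite. The edge weights carry the soft memberships that a weighted partitioner would use.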