Revision as of 05:10, 16 August 2018 by 134.76.223.13: Merged Linclust and MMseqs2 list items; removed one older reference to MMseqs; moved "Virus Orthologous Clusters" from "clustering tools or packages" to "databases"; removed MultiNetClust because (1) it is not a sequence clustering tool, i.e., it does not perform sequence comparisons itself, and (2) it accrued only 3 Google citations in the 8 years since 2010, two of which are self-citations. Tag: references removed
Revision as of 05:20, 16 August 2018 by 134.76.223.13: Small correction to MMseqs2 description; moved most-cited tools (CD-HIT, USEARCH) to the top of the list of tools; removed nrdb90.pl (very slow and outdated Perl script). Tag: references removed
Line 7:
== Sequence clustering algorithms and packages ==
* CD-HIT<ref name=cdhit/>▼
* [[UCLUST]] in USEARCH<ref name=usearch/>▼
* Starcode:<ref>{{cite web|url=https://github.com/gui11aume/starcode|title=Starcode repository}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref>{{cite journal |title=Starcode: sequence clustering based on all-pairs search
Line 21 ⟶ 23:
|pmid=26243257 |doi=10.1186/s13059-015-0721-2 |pmc=4531804}}</ref>
▲* [[UCLUST]] in USEARCH<ref name=usearch/>
▲* CD-HIT<ref name=cdhit/>
* Linclust <ref>{{cite journal |title=Clustering huge protein sequence sets in linear time
Line 35:
|issue= |pages= |doi=10.1038/nbt.3988 |pmid= 29035372}}</ref> software suite for fast, ~~and~~ sensitive ~~deep~~ sequence searching and clustering of large ~~protein~~ sequence sets
* nrdb90.pl<ref name=rdb90>{{cite journal|pmid=9682055 ~~|journal=Bioinformatics~~ ~~| date=Jun 1998 |volume=14~~ ~~|issue=5~~ ~~|pages=423–9.~~ ~~|title=Removing near-neighbour redundancy from large protein sequence collections.~~ ~~|author=Holm L1, Sander C.~~ ~~|doi=10.1093/bioinformatics/14.5.423~~ ~~}}</ref>~~
* TribeMCL: a method for clustering proteins into related groups<ref>{{cite journal |title=An efficient algorithm for large-scale detection of protein families.

Sequence clustering: Difference between revisions