Merged Linclust and MMseqs2 list items; removed one older reference to MMseqs; moved "Virus Orthologous Clusters" from "clustering tools or packages" to "databases"; removed MultiNetClust because (1) it is not a *sequence* clustering tool, i.e., it does not perform sequence comparisons itself, and (2) it has accrued only 3 Google citations in the 8 years since 2010, two of which are self-citations. Tag: references removed
Small correction to the MMseqs2 description; moved the most-cited tools (CD-HIT, USEARCH) to the top of the list of tools; removed nrdb90.pl (a very slow and outdated Perl script). Tag: references removed
Line 7:
== Sequence clustering algorithms and packages ==
* CD-HIT<ref name=cdhit/>
* [[UCLUST]] in USEARCH<ref name=usearch/>
* Starcode:<ref>{{cite web|url=https://github.com/gui11aume/starcode|title=Starcode repository}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref>{{cite journal
|title=Starcode: sequence clustering based on all-pairs search
Line 21 ⟶ 23:
|pmid=26243257
|doi=10.1186/s13059-015-0721-2 |pmc=4531804}}</ref>
* Linclust <ref>{{cite journal
|title=Clustering huge protein sequence sets in linear time
Line 35:
|issue= |pages=
|doi=10.1038/nbt.3988
|pmid= 29035372}}</ref> software suite for fast,
* TribeMCL: a method for clustering proteins into related groups<ref>{{cite journal
|title=An efficient algorithm for large-scale detection of protein families.
|