Sequence clustering: Difference between revisions

Content deleted Content added
Merged Linclust and MMseqs2 list items; removed one older reference to MMseqs; moved "Virus Orthologous Clusters" from "clustering tools or packages" to "databases"; removed MultiNetClust because (1) it is not a *sequence* clustering tool, i.e., it does not perform sequence comparisons itself, and (2) it has accrued only 3 Google citations in the 8 years since 2010, two of which are self-citations.
Tag: references removed
small correction to mmseqs2 description; move most cited tools (cd-hit, usearch) to the top of list of tools; removed nrdb90.pl (very slow and outdated perl script)
Tag: references removed
Line 7:
 
== Sequence clustering algorithms and packages ==
* CD-HIT<ref name=cdhit/>
* [[UCLUST]] in USEARCH<ref name=usearch/>
* Starcode:<ref>{{cite web|url=https://github.com/gui11aume/starcode|title=Starcode repository}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref>{{cite journal
|title=Starcode: sequence clustering based on all-pairs search
Line 21 ⟶ 23:
|pmid=26243257
|doi=10.1186/s13059-015-0721-2 |pmc=4531804}}</ref>
* [[UCLUST]] in USEARCH<ref name=usearch/>
* CD-HIT<ref name=cdhit/>
* Linclust <ref>{{cite journal
|title=Clustering huge protein sequence sets in linear time
Line 35:
|issue= |pages=
|doi=10.1038/nbt.3988
|pmid= 29035372}}</ref> software suite for fast and sensitive deep sequence searching and clustering of large protein sequence sets
* nrdb90.pl<ref name=rdb90>{{cite journal|pmid=9682055
|journal=Bioinformatics
| date=Jun 1998 |volume=14
|issue=5
|pages=423–9.
|title=Removing near-neighbour redundancy from large protein sequence collections.
|author=Holm L, Sander C.
|doi=10.1093/bioinformatics/14.5.423
}}</ref>
* TribeMCL: a method for clustering proteins into related groups<ref>{{cite journal
|title=An efficient algorithm for large-scale detection of protein families.