Sequence clustering: Difference between revisions

No edit summary
Merged Linclust and MMseqs2 list items; removed one older reference to MMseqs; moved "Virus Orthologous Clusters" from "clustering tools or packages" to "databases"; removed Multi-netclust because (1) it is not a *sequence* clustering tool, i.e., it does not perform sequence comparisons itself, and (2) it accrued only 3 Google citations in the 8 years since 2010, two of which are self-citations.
Tag: references removed
Line 23:
* [[UCLUST]] in USEARCH<ref name=usearch/>
* CD-HIT<ref name=cdhit/>
* Linclust: clustering protein sequences in linear time<ref>{{Cite journal|last=Steinegger|first=Martin|last2=Soeding|first2=Johannes|date=2018-06-29|title=Clustering huge protein sequence sets in linear time|journal= Nature Communications|doi=10.1038/s41467-018-04964-5}}</ref>
* Linclust <ref>{{cite journal
|title=Clustering huge protein sequence sets in linear time
|author1=Steinegger M. |author2=Söding J. |journal=Nature Communications
|date=June 2018 |volume=9
|pages=2542
|doi=10.1038/s41467-018-04964-5
|pmid= 29959318}}</ref>: first algorithm whose runtime scales linearly with input set size, very fast, part of [http://mmseqs.org/ MMseqs2] <ref>{{cite journal
|title=MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets
|author1=Steinegger M. |author2=Söding J. |journal=Nature Biotechnology
|date=Oct 16, 2017 |volume=
|issue= |pages=
|doi=10.1038/nbt.3988
|pmid= 29035372}}</ref> software suite for fast and deep clustering and searching of large protein sequence sets
* nrdb90.pl<ref name=rdb90>{{cite journal|pmid=9682055
|journal=Bioinformatics
Line 33 ⟶ 45:
|doi=10.1093/bioinformatics/14.5.423
}}</ref>
* MMseqs2: software suite for fast and deep clustering of large protein sequence sets <ref>{{cite journal
|title=MMseqs software suite for fast and deep clustering and searching of large protein sequence sets
|author1=Hauser M. |author2=Steinegger M. |author3=Söding J. |journal=Bioinformatics
|date=Jan 2016 |volume=32
|issue=9 |pages=1323–1330
|doi=10.1093/bioinformatics/btw006
|pmid= 26743509}}</ref> <ref>{{cite journal
|title=MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets
|author1=Steinegger M. |author2=Söding J. |journal=Nature Biotechnology
|date=Oct 16, 2017 |volume=
|issue= |pages=
|doi=10.1038/nbt.3988
|pmid= 29035372}}</ref>
* TribeMCL: a method for clustering proteins into related groups<ref>{{cite journal
|title=An efficient algorithm for large-scale detection of protein families.
Line 58 ⟶ 57:
* UICluster:<ref>http://ratest.eng.uiowa.edu/pubsoft/clustering/</ref> Parallel Clustering of EST (Gene) Sequences
* BLASTClust: single-linkage clustering with BLAST<ref>{{cite web|url=https://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html|title=NCBI News: Spring 2004-BLASTLab|work=nih.gov}}</ref>
* (Multi)netclust:<ref>{{cite web|url=http://www.bioinformatics.nl/netclust/|title=WUR Multi-netclust web server|work=bioinformatics.nl}}</ref> fast and memory-efficient detection of connected clusters in (multi-parametric) data networks<ref>{{cite journal
|title=Multi-netclust: an efficient tool for finding connected clusters in multi-parametric networks
|author=Kuzniar, A., Dhir, S., Nijveen, H., Pongor, S. and Leunissen, J. A. M.
|journal=Bioinformatics
|date=Oct 2010
|volume=26
|issue=19
|pages=2482–2483
|pmid=20679333
|doi=10.1093/bioinformatics/btq435
|pmc=2944197}}</ref>
* Clusterer:<ref>{{cite web|url=http://bugaco.com/bioinf/clusterer/|title=Clusterer: extendable java application for sequence grouping and cluster analyses|work=bugaco.com}}</ref> extendable Java application for sequence grouping and cluster analyses
* PATDB: a program for rapidly identifying perfect substrings
Line 74 ⟶ 62:
* CluSTr:<ref>{{cite web |url=http://www.ebi.ac.uk/clustr/ |title=Archived copy |accessdate=2006-11-23 |deadurl=yes |archiveurl=https://web.archive.org/web/20060924012903/http://www.ebi.ac.uk/clustr/ |archivedate=2006-09-24 |df= }}</ref> A single-linkage protein sequence clustering database built from Smith-Waterman sequence similarities; covers over 7 million sequences, including UniProt and IPI
* ICAtools<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/icatools/|title=Introduction to the ICAtools|work=littlest.co.uk}}</ref> - original (ancient) DNA clustering package with many algorithms useful for artifact discovery or EST clustering
* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity
* Skipredundant EMBOSS tool<ref>{{cite web|url=http://bioweb2.pasteur.fr/docs/EMBOSS/skipredundant.html|title=EMBOSS: skipredundant|work=pasteur.fr}}</ref> to remove redundant sequences from a set
* CLUSS Algorithm<ref>{{cite web|url=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-286|title=CLUSS Algorithm : Clustering non-alignable protein sequences|work=prospectus.usherbrooke.ca}}</ref> to identify groups of structurally, functionally, or evolutionarily related hard-to-align protein sequences. CLUSS webserver <ref>http://prospectus.usherbrooke.ca/CLUSS/</ref>
Line 91 ⟶ 78:
|issue=D1 |pages= D170–D176
|doi= 10.1093/nar/gkw1081}}</ref>
* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity
 
==See also==