Revision as of 19:38, 30 June 2018: 147.47.240.149 (talk), no edit summary
Revision as of 05:10, 16 August 2018: 134.76.223.13 (talk). Merged Linclust and MMseqs2 list items; removed one older reference to MMseqs; moved "Virus Orthologous Clusters" from "clustering tools or packages" to "databases"; removed Multi-netclust because (1) it is not a sequence clustering tool, i.e., it does not perform sequence comparisons itself, and (2) it accrued only 3 Google citations in the 8 years since 2010, two of which are self-citations. Tag: references removed
Line 23:
* [[UCLUST]] in USEARCH<ref name=usearch/>
* CD-HIT<ref name=cdhit/>
~~|pmid=~~* ~~26743509}}</ref>~~Linclust <ref>{{cite journal▼
* Linclust: clustering protein sequences in linear time<ref>{{Cite journal|last=Steinegger|first=Martin|last2=Soeding|first2=Johannes|date=2018-06-29|title=Clustering huge protein sequence sets in linear time|journal= Nature Communications|doi=10.1038/s41467-018-04964-5}}</ref>
|title=Clustering huge protein sequence sets in linear time
|author1=Steinegger M. |author2=Söding J. |journal=Nature ~~Biotechnology~~Communications▼
|date=~~Jan~~June ~~2016~~2018 |volume=329▼
|pages=2542
|doi=10.1038/~~nbt.3988~~s41467-018-04964-5▼
|pmid= 29959318}}</ref>: first algorithm whose runtime scales linearly with input set size, very fast, part of [http://mmseqs.org/ MMseqs2] <ref>{{cite journal
|title=MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets▼
|author1=~~Hauser~~Steinegger M. |author2~~=Steinegger M. |author3~~=Söding J. |journal=~~Bioinformatics~~Nature Biotechnology▼
|date=Oct 16, 2017 |volume=▼
|issue= |pages=▼
|doi=10.1038/nbt.3988
|~~title~~pmid=~~MMseqs~~ 29035372}}</ref> software suite for fast and deep clustering ~~and searching~~ of large protein sequence sets ▼
* nrdb90.pl<ref name=rdb90>{{cite journal|pmid=9682055
|journal=Bioinformatics

Line 33 ⟶ 45:
|doi=10.1093/bioinformatics/14.5.423
}}</ref>
* MMseqs2: software suite for fast and deep clustering of large protein sequence sets <ref>{{cite journal
▲|title=MMseqs software suite for fast and deep clustering and searching of large protein sequence sets
▲|author1=Hauser M. |author2=Steinegger M. |author3=Söding J. |journal=Bioinformatics
▲|date=Jan 2016 |volume=32
~~|issue=9 |pages=1323–1330~~
~~|doi=10.1093/bioinformatics/btw006~~
▲|pmid= 26743509}}</ref> <ref>{{cite journal
▲|title=MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets
▲|author1=Steinegger M. |author2=Söding J. |journal=Nature Biotechnology
▲|date=Oct 16, 2017 |volume=
▲|issue= |pages=
▲|doi=10.1038/nbt.3988
~~|pmid= 29035372}}</ref>~~
* TribeMCL: a method for clustering proteins into related groups<ref>{{cite journal
|title=An efficient algorithm for large-scale detection of protein families.

Line 58 ⟶ 57:
* UICluster:<ref>http://ratest.eng.uiowa.edu/pubsoft/clustering/</ref> Parallel Clustering of EST (Gene) Sequences
* BLASTClust single-linkage clustering with BLAST<ref>{{cite web|url=https://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html|title=NCBI News: Spring 2004-BLASTLab|work=nih.gov}}</ref>
* (Multi)netclust:<ref>{{cite web|url=http://www.bioinformatics.nl/netclust/|title=WUR Multi-netclust web server|work=bioinformatics.nl}}</ref> fast and memory-efficient detection of connected clusters in (multi-parametric) data networks<ref>{{cite journal
~~|title=Multi-netclust: an efficient tool for finding connected clusters in multi-parametric networks~~
~~|author=Kuzniar, A., Dhir, S., Nijveen, H., Pongor, S. and Leunissen, J. A. M.~~
~~|journal=Bioinformatics~~
~~|date=Oct 2010~~
~~|volume=26~~
~~|issue=19~~
~~|pages=2482–2483~~
~~|pmid=20679333~~
~~|doi=10.1093/bioinformatics/btq435~~
~~|pmc=2944197}}</ref>~~
* Clusterer:<ref>{{cite web|url=http://bugaco.com/bioinf/clusterer/|title=Clusterer: extendable java application for sequence grouping and cluster analyses|work=bugaco.com}}</ref> extendable Java application for sequence grouping and cluster analyses
* PATDB: a program for rapidly identifying perfect substrings

Line 74 ⟶ 62:
* CluSTr:<ref>{{cite web |url=http://www.ebi.ac.uk/clustr/ |title=Archived copy |accessdate=2006-11-23 |deadurl=yes |archiveurl=https://web.archive.org/web/20060924012903/http://www.ebi.ac.uk/clustr/ |archivedate=2006-09-24 |df= }}</ref> A single-linkage protein sequence clustering database from Smith-Waterman sequence similarities; covers over 7 million sequences including UniProt and IPI
* ICAtools<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/icatools/|title=Introduction to the ICAtools|work=littlest.co.uk}}</ref> - original (ancient) DNA clustering package with many algorithms useful for artifact discovery or EST clustering
* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity▼
* Skipredundant EMBOSS tool<ref>{{cite web|url=http://bioweb2.pasteur.fr/docs/EMBOSS/skipredundant.html|title=EMBOSS: skipredundant|work=pasteur.fr}}</ref> to remove redundant sequences from a set
* CLUSS Algorithm<ref>{{cite web|url=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-286|title=CLUSS Algorithm : Clustering non-alignable protein sequences|work=prospectus.usherbrooke.ca}}</ref> to identify groups of structurally, functionally, or evolutionarily related hard-to-align protein sequences. CLUSS webserver <ref>http://prospectus.usherbrooke.ca/CLUSS/</ref>

Line 91 ⟶ 78:
|issue=D1
|pages= D170–D176
|doi= 10.1093/nar/gkw1081}}</ref>
▲* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity
==See also==

Sequence clustering: Difference between revisions