Revision as of 00:24, 30 September 2018 by Evolution and evolvability (Extended confirmed users; 24,414 edits): + link to seq homology. Tag: Visual edit
Revision as of 01:25, 28 November 2018 by Citation bot (Bots; 5,863,738 edits): m Alter: volume, issue, pages, template type. Add: doi-broken-date, year, pmid, doi, volume, journal, title, pmc, issue, pages, date, author pars. 1-4. Converted bare reference to cite template. Removed parameters. Formatted dashes. User-activated; Category:Bioinformatics.
Line 10:
* CD-HIT<ref name=cdhit/>
* [[UCLUST]] in USEARCH<ref name=usearch/>
* Starcode:<ref>{{cite web|url=https://github.com/gui11aume/starcode|title=Starcode repository|date=2018-10-11}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref>{{cite journal |title=Starcode: sequence clustering based on all-pairs search |author1=Zorita E |author2=Cuscó P |author3=Filion GJ. |journal=Bioinformatics

Line 22:
| date=Aug 2015 |volume=16 |issue=157 |pages=157 |pmid=26243257 |doi=10.1186/s13059-015-0721-2 |pmc=4531804}}</ref>
* Linclust:<ref>{{cite journal

Line 28:
|author1=Steinegger M. |author2=Söding J. |journal=Nature Communications |date=June 2018 |volume=9 |issue=1 |pages=2542 |doi=10.1038/s41467-018-04964-5 |pmid= 29959318|pmc=6026198 }}</ref> first algorithm whose runtime scales linearly with input set size, very fast, part of [http://mmseqs.org/ MMseqs2] <ref>{{cite journal |title=MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets |author1=Steinegger M. |author2=Söding J. |journal=Nature Biotechnology |date=Oct 16, 2017 |volume=35 |issue= 11|pages=1026–1028 |doi=10.1038/nbt.3988 |pmid= 29035372}}</ref> software suite for fast, sensitive sequence searching and clustering of large sequence sets

Line 51:
* Clusterer:<ref>{{cite web|url=http://bugaco.com/bioinf/clusterer/|title=Clusterer: extendable java application for sequence grouping and cluster analyses|work=bugaco.com}}</ref> extendable java application for sequence grouping and cluster analyses
* PATDB: a program for rapidly identifying perfect substrings
* nrdb:<ref>{{Cite web | url=https://web.archive.org/web/20080101032917/http://blast.wustl.edu/pub/nrdb/ | title=Index of /pub/nrdb}}</ref> a program for merging trivially redundant (identical) sequences
* CluSTr:<ref>{{cite web |url=http://www.ebi.ac.uk/clustr/ |title=Archived copy |accessdate=2006-11-23 |deadurl=yes |archiveurl=https://web.archive.org/web/20060924012903/http://www.ebi.ac.uk/clustr/ |archivedate=2006-09-24 |df= }}</ref> A single-linkage protein sequence clustering database from Smith-Waterman sequence similarities; covers over 7 mln sequences including UniProt and IPI
* ICAtools<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/icatools/|title=Introduction to the ICAtools|work=littlest.co.uk}}</ref> - original (ancient) DNA clustering package with many algorithms useful for artifact discovery or EST clustering
* Skipredudant EMBOSS tool<ref>{{cite web|url=http://bioweb2.pasteur.fr/docs/EMBOSS/skipredundant.html|title=EMBOSS: skipredundant|work=pasteur.fr}}</ref> to remove redundant sequences from a set
* CLUSS Algorithm<ref>~~{{cite web|url=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-286|title=CLUSS Algorithm : Clustering non-alignable protein sequences|work=prospectus.usherbrooke.ca}}~~{{cite journal|title=CLUSS Algorithm : Clustering non-alignable protein sequences|journal=Prospectus.usherbrooke.ca|volume=8|pages=286|doi=10.1186/1471-2105-8-286|pmid=17683581|pmc=1976428|year=2007|last1=Kelil|first1=Abdellali|last2=Wang|first2=Shengrui|last3=Brzezinski|first3=Ryszard|last4=Fleury|first4=Alain}}</ref> to identify groups of structurally, functionally, or evolutionarily related hard-to-align protein sequences. CLUSS webserver <ref name="prospectus.usherbrooke.ca">{{Cite web | url=http://prospectus.usherbrooke.ca/CLUSS/ | title=CLUSS Home Page}}</ref>
* CLUSS2 Algorithm<ref>~~{{cite web|url=https://www.inderscienceonline.com/doi/abs/10.1504/IJCBDD.2008.02019|title=CLUSS2 : Alignment-independent algorithm for clustering protein families with multiple biological functions|work=www.inderscienceonline.com}}~~{{cite journal|url=https://www.inderscienceonline.com/doi/abs/10.1504/IJCBDD.2008.02019|title=CLUSS2 : Alignment-independent algorithm for clustering protein families with multiple biological functions|issue=2|pages=122–140|journal=International Journal of Computational Biology and Drug Design|volume=1|doi=10.1504/IJCBDD.2008.02019|date=January 2008|last1=Kelil|first1=Abdellali|last2=Wang|first2=Shengrui|last3=Brzezinski|first3=Ryszard|doi-broken-date=2018-11-28}}</ref> for clustering families of hard-to-align protein sequences with multiple biological functions. CLUSS2 webserver <ref name="prospectus.usherbrooke.ca"/>
<!-- Lets try the above (although both are wobbly) -->
<!-- * [http://bio.cc/RSDB RSDB] broken link -->

Line 66:
| date=Jun 1998 |volume=14 |issue=5 |pages=423–9. |title=Removing near-neighbour redundancy from large protein sequence collections. |author=Holm L1, Sander C.

Line 77:
|date= Nov 2016 |volume=45 |issue=D1 |pages= D170–D176 |doi= 10.1093/nar/gkw1081|pmid=27899574 |pmc=5614098 }}</ref>
* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity

Sequence clustering: Difference between revisions