+ link to seq homology |
Citation bot (talk | contribs) m Alter: volume, issue, pages, template type. Add: doi-broken-date, year, pmid, doi, volume, journal, title, pmc, issue, pages, date, author pars. 1-4. Converted bare reference to cite template. Removed parameters. Formatted dashes. You can use this bot yourself. Report bugs here. | User-activated; Category:Bioinformatics. |
Line 10:
* CD-HIT<ref name=cdhit/>
* [[UCLUST]] in USEARCH<ref name=usearch/>
* Starcode:<ref>{{cite web|url=https://github.com/gui11aume/starcode|title=Starcode repository|date=2018-10-11}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref>{{cite journal
|title=Starcode: sequence clustering based on all-pairs search
|author1=Zorita E |author2=Cuscó P |author3=Filion GJ. |journal=Bioinformatics
Line 22:
| date=Aug 2015 |volume=16
|issue=157
|pages=157 |pmid=26243257
|doi=10.1186/s13059-015-0721-2 |pmc=4531804}}</ref>
* Linclust:<ref>{{cite journal
Line 28:
|author1=Steinegger M. |author2=Söding J. |journal=Nature Communications
|date=June 2018 |volume=9
|issue=1 |pages=2542
|doi=10.1038/s41467-018-04964-5
|pmid= 29959318|pmc=6026198 }}</ref> the first algorithm whose runtime scales linearly with input set size; part of the [http://mmseqs.org/ MMseqs2]<ref>{{cite journal
|title=MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets
|author1=Steinegger M. |author2=Söding J. |journal=Nature Biotechnology
|date=Oct 16, 2017 |volume=35
|issue= 11|pages=1026–1028
|doi=10.1038/nbt.3988
|pmid= 29035372}}</ref> software suite for fast, sensitive sequence searching and clustering of large sequence sets
Line 51:
* Clusterer:<ref>{{cite web|url=http://bugaco.com/bioinf/clusterer/|title=Clusterer: extendable java application for sequence grouping and cluster analyses|work=bugaco.com}}</ref> extendable Java application for sequence grouping and cluster analyses
* PATDB: a program for rapidly identifying perfect substrings
* nrdb:<ref>{{Cite web | url=https://web.archive.org/web/20080101032917/http://blast.wustl.edu/pub/nrdb/ | title=Index of /pub/nrdb}}</ref> a program for merging trivially redundant (identical) sequences
* CluSTr:<ref>{{cite web |url=http://www.ebi.ac.uk/clustr/ |title=Archived copy |accessdate=2006-11-23 |deadurl=yes |archiveurl=https://web.archive.org/web/20060924012903/http://www.ebi.ac.uk/clustr/ |archivedate=2006-09-24 |df= }}</ref> A single-linkage protein sequence clustering database built from Smith-Waterman sequence similarities; covers over 7 million sequences, including UniProt and IPI
* ICAtools:<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/icatools/|title=Introduction to the ICAtools|work=littlest.co.uk}}</ref> original (ancient) DNA clustering package with many algorithms useful for artifact discovery or EST clustering
* Skipredundant EMBOSS tool<ref>{{cite web|url=http://bioweb2.pasteur.fr/docs/EMBOSS/skipredundant.html|title=EMBOSS: skipredundant|work=pasteur.fr}}</ref> to remove redundant sequences from a set
* CLUSS Algorithm<ref>{{cite
* CLUSS2 Algorithm<ref>{{cite
<!-- Let's try the above (although both are wobbly) -->
<!-- * [http://bio.cc/RSDB RSDB] broken link -->
Line 66:
| date=Jun 1998 |volume=14
|issue=5
|pages=423–9
|title=Removing near-neighbour redundancy from large protein sequence collections.
|author=Holm L, Sander C.
Line 77:
|date= Nov 2016 |volume=45
|issue=D1 |pages= D170–D176
|doi= 10.1093/nar/gkw1081|pmid=27899574 |pmc=5614098 }}</ref>
* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity
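
Several entries above group sequences by pairwise similarity; CluSTr, for example, is described as single-linkage. The following is a minimal sketch of single-linkage clustering, not the method of any listed tool. The function names (<code>identity</code>, <code>single_linkage</code>), the toy position-by-position similarity measure, and the 0.85 threshold are all assumptions made for illustration; real tools use alignment scores such as Smith-Waterman or BLASTP.

```python
from __future__ import annotations

from itertools import combinations


def identity(a: str, b: str) -> float:
    """Toy similarity (assumption): fraction of matching positions,
    compared without alignment and divided by the longer length."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longer


def single_linkage(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Group sequence names so that any two names joined by a chain of
    pairs with similarity >= threshold end up in the same cluster."""
    # Union-find forest: every sequence starts as its own cluster.
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-pairs comparison: quadratic in the number of sequences.
    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names for a stable result.
    return sorted(clusters.values(), key=lambda c: sorted(c))


example = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAACGA",
    "s4": "TTTTTTTT",
}
clusters = single_linkage(example, threshold=0.85)
```

In this example <code>s1</code> and <code>s3</code> fall below the threshold when compared directly, but both are close to <code>s2</code>, so single linkage chains all three into one cluster while <code>s4</code> stays alone. This chaining behaviour is the main design trade-off of single linkage compared with representative-based approaches.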