Revision as of 01:11, 1 December 2020 by Citation bot: Add: bibcode. \| You can use this bot yourself. Report bugs here. \| Suggested by Abductive \| via #UCB_webform 311/497
Revision as of 05:38, 12 December 2020 by Monkbot: m Task 18 (cosmetic): eval 26 templates: del empty params (2×); hyphenate params (6×); Tag: AWB
Line 11:
* [[UCLUST]] in USEARCH<ref name=usearch/>
* Starcode:<ref>{{cite web\|url=https://github.com/gui11aume/starcode\|title=Starcode repository\|date=2018-10-11}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref name="pmid25638815">{{cite journal \| vauthors = Zorita E, Cuscó P, Filion GJ \| title = Starcode: sequence clustering based on all-pairs search \| journal = Bioinformatics (Oxford, England) \| volume = 31 \| issue = 12 \| pages = 1913–9 \| date = June 2015 \| pmid = 25638815 \| pmc = 4765884 \| doi = 10.1093/bioinformatics/btv053 }}</ref>
* OrthoFinder:<ref>{{cite web\|url=http://www.stevekellylab.com/software/orthofinder\|title=OrthoFinder\|work=Steve Kelly Lab}}</ref> a fast, scalable and accurate method for clustering proteins into gene families (orthogroups)<ref name="pmid26243257">{{cite journal \| vauthors = Emms DM, Kelly S \| title = OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy \| journal = Genome Biology \| volume = 16 ~~\| issue =~~ \| pages = 157 \| date = August 2015 \| pmid = 26243257 \| pmc = 4531804 \| doi = 10.1186/s13059-015-0721-2 }}</ref><ref name="pmid31727128">{{cite journal \| vauthors = Emms DM, Kelly S \| title = OrthoFinder: phylogenetic orthology inference for comparative genomics \| journal = Genome Biology \| volume = 20 \| issue = 1 \| pages = 238 \| date = November 2019 \| pmid = 31727128 \| pmc = 6857279 \| doi = 10.1186/s13059-019-1832-y }}</ref>
* Linclust:<ref name="pmid29959318">{{cite journal \| vauthors = Steinegger M, Söding J \| title = Clustering huge protein sequence sets in linear time \| journal = Nature Communications \| volume = 9 \| issue = 1 \| pages = 2542 \| date = June 2018 \| pmid = 29959318 \| pmc = 6026198 \| doi = 10.1038/s41467-018-04964-5 \| bibcode = 2018NatCo...9.2542S }}</ref> first algorithm whose runtime scales linearly with input set size, very fast, part of [http://mmseqs.org/ MMseqs2]<ref name="pmid29035372">{{cite journal \| vauthors = Steinegger M, Söding J \| title = MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets \| journal = Nature Biotechnology \| volume = 35 \| issue = 11 \| pages = 1026–1028 \| date = November 2017 \| pmid = 29035372 \| doi = 10.1038/nbt.3988 \| hdl = 11858/00-001M-0000-002E-1967-3 \| s2cid = 402352 \| hdl-access = free }}</ref> software suite for fast, sensitive sequence searching and clustering of large sequence sets
* TribeMCL: a method for clustering proteins into related groups<ref name="pmid11917018">{{cite journal \| vauthors = Enright AJ, Van Dongen S, Ouzounis CA \| title = An efficient algorithm for large-scale detection of protein families \| journal = Nucleic Acids Research \| volume = 30 \| issue = 7 \| pages = 1575–84 \| date = April 2002 \| pmid = 11917018 \| pmc = 101833 \| doi = 10.1093/nar/30.7.1575 }}</ref>
* BAG: a graph theoretic sequence clustering algorithm<ref>{{cite web \|url=http://bio.informatics.indiana.edu/sunkim/BAG/ \|title=Archived copy \|~~accessdate~~access-date=2004-02-19 \|url-status=dead \|~~archiveurl~~archive-url=https://web.archive.org/web/20031206172749/http://bio.informatics.indiana.edu/sunkim/BAG/ \|~~archivedate~~archive-date=2003-12-06 }}</ref>
* JESAM:<ref>{{cite web\|url=http://www.littlest.co.uk/software/bioinf/old_packages/jesam/jesam_paper.html\|title=Bioinformatics Paper: JESAM: CORBA software components for EST alignments and clusters\|work=littlest.co.uk}}</ref> Open source parallel scalable DNA alignment engine with optional clustering software component
* UICluster:<ref>http://ratest.eng.uiowa.edu/pubsoft/clustering/</ref> Parallel Clustering of EST (Gene) Sequences
Line 21:
* PATDB: a program for rapidly identifying perfect substrings
* nrdb:<ref>{{Cite web \| url=http://blast.wustl.edu/pub/nrdb/ \| title=Index of /pub/nrdb\| archive-url=https://web.archive.org/web/20080101032917/http://blast.wustl.edu/pub/nrdb/\| archive-date=2008-01-01}}</ref> a program for merging trivially redundant (identical) sequences
* CluSTr:<ref>{{cite web \|url=http://www.ebi.ac.uk/clustr/ \|title=Archived copy \|~~accessdate~~access-date=2006-11-23 \|url-status=dead \|~~archiveurl~~archive-url=https://web.archive.org/web/20060924012903/http://www.ebi.ac.uk/clustr/ \|~~archivedate~~archive-date=2006-09-24 }}</ref> A single-linkage protein sequence clustering database from Smith-Waterman sequence similarities; covers over 7 million sequences including UniProt and IPI
* ICAtools<ref>{{cite web\|url=http://www.littlest.co.uk/software/bioinf/old_packages/icatools/\|title=Introduction to the ICAtools\|work=littlest.co.uk}}</ref> - original (ancient) DNA clustering package with many algorithms useful for artifact discovery or EST clustering
* Skipredundant EMBOSS tool<ref>{{cite web\|url=http://bioweb2.pasteur.fr/docs/EMBOSS/skipredundant.html\|title=EMBOSS: skipredundant\|work=pasteur.fr}}</ref> to remove redundant sequences from a set
* CLUSS Algorithm<ref name="pmid17683581">{{cite journal \| vauthors = Kelil A, Wang S, Brzezinski R, Fleury A \| title = CLUSS: clustering of protein sequences based on a new similarity measure \| journal = BMC Bioinformatics \| volume = 8 ~~\| issue =~~ \| pages = 286 \| date = August 2007 \| pmid = 17683581 \| pmc = 1976428 \| doi = 10.1186/1471-2105-8-286 }}</ref> to identify groups of structurally, functionally, or evolutionarily related hard-to-align protein sequences. CLUSS webserver <ref name="prospectus.usherbrooke.ca">{{Cite web \| url=http://prospectus.usherbrooke.ca/CLUSS/ \| title=CLUSS Home Page}}</ref>
* CLUSS2 Algorithm<ref name="pmid20058485">{{cite journal \| vauthors = Kelil A, Wang S, Brzezinski R \| title = CLUSS2: an alignment-independent algorithm for clustering protein families with multiple biological functions \| journal = International Journal of Computational Biology and Drug Design \| volume = 1 \| issue = 2 \| pages = 122–40 \| date = 2008 \| pmid = 20058485 \| doi = 10.1504/ijcbdd.2008.020190 }}</ref> for clustering families of hard-to-align protein sequences with multiple biological functions. CLUSS2 webserver <ref name="prospectus.usherbrooke.ca"/>
<!-- Let's try the above (although both are wobbly) -->

Sequence clustering: Difference between revisions