* CD-HIT<ref name=cdhit/>
* [[UCLUST]] in USEARCH<ref name=usearch/>
* Starcode:<ref>{{cite web|url=https://github.com/gui11aume/starcode|title=Starcode repository|date=2018-10-11}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref name="pmid25638815">{{cite journal | vauthors = Zorita E, Cuscó P, Filion GJ | title = Starcode: sequence clustering based on all-pairs search | journal = Bioinformatics (Oxford, England) | volume = 31 | issue = 12 | pages = 1913–9 | date = June 2015 | pmid = 25638815 | pmc = 4765884 | doi = 10.1093/bioinformatics/btv053 }}</ref>
* OrthoFinder:<ref>{{cite web|url=http://www.stevekellylab.com/software/orthofinder|title=OrthoFinder|work=Steve Kelly Lab}}</ref> a fast, scalable and accurate method for clustering proteins into gene families (orthogroups)<ref name="pmid26243257">{{cite journal | vauthors = Emms DM, Kelly S | title = OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy | journal = Genome Biology | volume = 16 | issue = | pages = 157 | date = August 2015 | pmid = 26243257 | pmc = 4531804 | doi = 10.1186/s13059-015-0721-2 }}</ref><ref name="pmid31727128">{{cite journal | vauthors = Emms DM, Kelly S | title = OrthoFinder: phylogenetic orthology inference for comparative genomics | journal = Genome Biology | volume = 20 | issue = 1 | pages = 238 | date = November 2019 | pmid = 31727128 | pmc = 6857279 | doi = 10.1186/s13059-019-1832-y }}</ref>
* Linclust:<ref name="pmid29959318">{{cite journal | vauthors = Steinegger M, Söding J | title = Clustering huge protein sequence sets in linear time | journal = Nature Communications | volume = 9 | issue = 1 | pages = 2542 | date = June 2018 | pmid = 29959318 | pmc = 6026198 | doi = 10.1038/s41467-018-04964-5 }}</ref> the first algorithm whose runtime scales linearly with input set size; very fast and part of the [http://mmseqs.org/ MMseqs2]<ref name="pmid29035372">{{cite journal | vauthors = Steinegger M, Söding J | title = MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets | journal = Nature Biotechnology | volume = 35 | issue = 11 | pages = 1026–1028 | date = November 2017 | pmid = 29035372 | doi = 10.1038/nbt.3988 }}</ref> software suite for fast, sensitive searching and clustering of large sequence sets
* TribeMCL: a method for clustering proteins into related groups<ref name="pmid11917018">{{cite journal | vauthors = Enright AJ, Van Dongen S, Ouzounis CA | title = An efficient algorithm for large-scale detection of protein families | journal = Nucleic Acids Research | volume = 30 | issue = 7 | pages = 1575–84 | date = April 2002 | pmid = 11917018 | pmc = 101833 | doi = 10.1093/nar/30.7.1575 }}</ref>
* BAG: a graph-theoretic sequence clustering algorithm<ref>{{cite web |url=http://bio.informatics.indiana.edu/sunkim/BAG/ |title=Archived copy |accessdate=2004-02-19 |url-status=dead |archiveurl=https://web.archive.org/web/20031206172749/http://bio.informatics.indiana.edu/sunkim/BAG/ |archivedate=2003-12-06 }}</ref>
* JESAM:<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/jesam/jesam_paper.html|title=Bioinformatics Paper: JESAM: CORBA software components for EST alignments and clusters|work=littlest.co.uk}}</ref> an open-source, parallel, scalable DNA alignment engine with an optional clustering software component
* ICAtools<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/icatools/|title=Introduction to the ICAtools|work=littlest.co.uk}}</ref>: an early DNA clustering package with many algorithms useful for artifact discovery or EST clustering
* EMBOSS skipredundant<ref>{{cite web|url=http://bioweb2.pasteur.fr/docs/EMBOSS/skipredundant.html|title=EMBOSS: skipredundant|work=pasteur.fr}}</ref>: a tool to remove redundant sequences from a set
* CLUSS Algorithm<ref name="pmid17683581">{{cite journal |
* CLUSS2 Algorithm<ref name="pmid20058485">{{cite journal |
<!-- Let's try the above (although both are wobbly) -->
<!-- * [http://bio.cc/RSDB RSDB] broken link -->
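Several of the tools above (for example Starcode, which is based on all-pairs search) group sequences that lie within a small distance of one another. The following is a minimal sketch of that general idea only, assuming Levenshtein (edit) distance and single-linkage grouping via union-find; it is not Starcode's or any listed tool's actual implementation, and the names <code>levenshtein</code>, <code>cluster_all_pairs</code> and <code>max_dist</code> are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def cluster_all_pairs(seqs: list[str], max_dist: int) -> list[list[str]]:
    """Hypothetical helper: compare every pair of sequences and merge any pair
    within max_dist edits (single linkage, tracked with union-find)."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Exhaustive O(n^2) pairwise comparison.
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if levenshtein(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)

    # Collect members under their union-find root, preserving input order.
    groups: dict[int, list[str]] = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


print(cluster_all_pairs(["ACGT", "ACGA", "TTTT", "TTTA", "GGGG"], max_dist=1))
# [['ACGT', 'ACGA'], ['TTTT', 'TTTA'], ['GGGG']]
```

The quadratic number of comparisons in this naive version is what makes very large sequence sets expensive to cluster, and is the cost that faster methods such as Linclust are designed to avoid.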
== Non-redundant sequence databases ==
* PISCES: A Protein Sequence Culling Server<ref>{{cite web|url=http://dunbrack.fccc.edu/pisces/|title=Dunbrack Lab|work=fccc.edu}}</ref>
* RDB90<ref name=rdb90>{{cite journal | vauthors = Holm L, Sander C | title = Removing near-neighbour redundancy from large protein sequence collections | journal = Bioinformatics (Oxford, England) | volume = 14 | issue = 5 | pages = 423–9 | date = June 1998 | pmid = 9682055 | doi = 10.1093/bioinformatics/14.5.423 }}</ref>
* UniRef: A non-redundant [[UniProt]] sequence database<ref>{{cite web|url=https://www.uniprot.org/database/DBDescription.shtml#uniref|title=About UniProt|work=uniprot.org}}</ref>
* Uniclust: Clustered UniProtKB sequences at 90%, 50% and 30% pairwise sequence identity.<ref name="pmid27899574">{{cite journal | vauthors = Mirdita M, von den Driesch L, Galiez C, Martin MJ, Söding J, Steinegger M | title = Uniclust databases of clustered and deeply annotated protein sequences and alignments | journal = Nucleic Acids Research | volume = 45 | issue = D1 | pages = D170–D176 | date = January 2017 | pmid = 27899574 | pmc = 5614098 | doi = 10.1093/nar/gkw1081 }}</ref>
* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity
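Databases such as RDB90 and Uniclust reduce redundancy by keeping only sequences that fall below a pairwise identity threshold. The sketch below illustrates a simplified greedy version of that idea, loosely modelled on greedy incremental clustering (longer sequences considered first, each kept only if it is not too similar to an already-kept representative). It assumes a crude identity proxy of 1 minus edit distance divided by the longer length, rather than the alignment-based identity real tools compute; the names <code>identity</code> and <code>greedy_nonredundant</code> are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity proxy (not alignment-based):
    1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_nonredundant(seqs: list[str], threshold: float) -> list[str]:
    """Hypothetical helper: visit sequences longest first (ties keep input order)
    and keep one only if its identity to every kept representative is below threshold."""
    representatives: list[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(identity(s, r) < threshold for r in representatives):
            representatives.append(s)
    return representatives


print(greedy_nonredundant(["MKTAYIAK", "MKTAYIAR", "MKTAYI", "GGSGGSGG"], threshold=0.7))
# ['MKTAYIAK', 'GGSGGSGG']
```

Lowering the threshold (for example from 0.9 to 0.5 to 0.3) merges more distant sequences and yields a smaller representative set, which is the same trade-off behind offering a database at several identity levels.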