Sequence clustering: Difference between revisions

Yensaa (talk | contribs)
m See also: Added Social sequence analysis
Citation bot (talk | contribs)
Add: website. | Use this bot. Report bugs. | Suggested by BrownHairedGirl | Linked from User:BrownHairedGirl/Articles_with_bare_links | #UCB_webform_linked 1723/2189
Line 10:
* CD-HIT<ref name=cdhit/>
* [[UCLUST]] in USEARCH<ref name=usearch/>
* Starcode:<ref>{{cite web|url=https://github.com/gui11aume/starcode|title=Starcode repository|website=[[GitHub]]|date=2018-10-11}}</ref> a fast sequence clustering algorithm based on exact all-pairs search.<ref name="pmid25638815">{{cite journal | vauthors = Zorita E, Cuscó P, Filion GJ | title = Starcode: sequence clustering based on all-pairs search | journal = Bioinformatics | volume = 31 | issue = 12 | pages = 1913–9 | date = June 2015 | pmid = 25638815 | pmc = 4765884 | doi = 10.1093/bioinformatics/btv053 }}</ref>
* OrthoFinder:<ref>{{cite web|url=http://www.stevekellylab.com/software/orthofinder|title=OrthoFinder|work=Steve Kelly Lab}}</ref> a fast, scalable and accurate method for clustering proteins into gene families (orthogroups)<ref name="pmid26243257">{{cite journal | vauthors = Emms DM, Kelly S | title = OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy | journal = Genome Biology | volume = 16 | pages = 157 | date = August 2015 | pmid = 26243257 | pmc = 4531804 | doi = 10.1186/s13059-015-0721-2 }}</ref><ref name="pmid31727128">{{cite journal | vauthors = Emms DM, Kelly S | title = OrthoFinder: phylogenetic orthology inference for comparative genomics | journal = Genome Biology | volume = 20 | issue = 1 | pages = 238 | date = November 2019 | pmid = 31727128 | pmc = 6857279 | doi = 10.1186/s13059-019-1832-y }}</ref>
* Linclust:<ref name="pmid29959318">{{cite journal | vauthors = Steinegger M, Söding J | title = Clustering huge protein sequence sets in linear time | journal = Nature Communications | volume = 9 | issue = 1 | pages = 2542 | date = June 2018 | pmid = 29959318 | pmc = 6026198 | doi = 10.1038/s41467-018-04964-5 | bibcode = 2018NatCo...9.2542S }}</ref> the first clustering algorithm whose runtime scales linearly with the size of the input set; part of the [http://mmseqs.org/ MMseqs2]<ref name="pmid29035372">{{cite journal | vauthors = Steinegger M, Söding J | title = MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets | journal = Nature Biotechnology | volume = 35 | issue = 11 | pages = 1026–1028 | date = November 2017 | pmid = 29035372 | doi = 10.1038/nbt.3988 | hdl = 11858/00-001M-0000-002E-1967-3 | s2cid = 402352 | hdl-access = free }}</ref> software suite for fast, sensitive searching and clustering of large sequence sets