m Filled in 12 bare reference(s) with reFill () |
Line 2:
For proteins, [[Homology (biology)|homologous]] sequences are typically grouped into [[protein family|families]]. For EST data, clustering is important to group sequences originating from the same [[gene]] before the ESTs are [[sequence assembly|assembled]] to reconstruct the original [[mRNA]].
Some clustering algorithms use [[single-linkage clustering]], constructing a [[transitive closure]] of sequences whose [[sequence similarity|similarity]] exceeds a given threshold. UCLUST<ref name=usearch>
Sequence clusters often correspond closely to [[protein family|protein families]], although the two are not identical. Determining a representative [[tertiary structure]] for each sequence cluster is the aim of many [[structural genomics]] initiatives.
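The single-linkage approach described above can be illustrated with a minimal sketch. It is not the implementation used by any of the packages listed below. The <code>similarity</code> function is a hypothetical stand-in (fraction of identical positions) for a real alignment-based score, and the function names and the threshold value are assumptions chosen for illustration. Pairs of sequences at or above the threshold are merged with a union-find structure, which yields the transitive closure of the "similar" relation.

```python
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Toy score (assumption): fraction of identical positions,
    normalised by the longer sequence. A real tool would use an
    alignment-based measure instead."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def single_linkage_clusters(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Group sequence names into connected components of the graph
    whose edges join pairs with similarity >= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair once; merge the two components if similar enough.
    for a, b in combinations(seqs, 2):
        if similarity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    example = {
        "s1": "AAAAAAAAAA",
        "s2": "AAAAAAAATT",
        "s3": "AAAAAATTTT",
        "s4": "CCCCCCCCCC",
    }
    for cluster in single_linkage_clusters(example, threshold=0.75):
        print(sorted(cluster))
```

In this example s1 and s3 fall below the threshold when compared directly, but both are similar to s2, so all three end up in one cluster. This chaining is characteristic of single-linkage clustering. The all-against-all comparison is quadratic in the number of sequences, so the sketch is only suitable for small inputs.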
== Sequence clustering algorithms and packages ==
* OrthoFinder:<ref>{{cite web|url=http://www.stevekellylab.com/software/orthofinder|title=OrthoFinder|work=Steve Kelly Lab}}</ref> a fast, scalable and accurate method for clustering proteins into gene families (orthogroups)<ref>{{cite journal
|title=OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy.
|author=Emms DM, Kelly S.
Line 36:
|doi=10.1093/nar/30.7.1575}}</ref>
* BAG: a graph theoretic sequence clustering algorithm<ref>http://bio.informatics.indiana.edu/sunkim/BAG/</ref>
* JESAM:<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/jesam/jesam_paper.html|title=Bioinformatics Paper: JESAM: CORBA software components for EST alignments and clusters|work=littlest.co.uk}}</ref> Open source parallel scalable DNA alignment engine with optional clustering software component
* UICluster:<ref>http://ratest.eng.uiowa.edu/pubsoft/clustering/</ref> Parallel Clustering of EST (Gene) Sequences
* BLASTClust: single-linkage clustering with BLAST<ref>{{cite web|url=http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html|title=NCBI News: Spring 2004-BLASTLab|work=nih.gov}}</ref>
* (Multi)netclust:<ref>{{cite web|url=http://www.bioinformatics.nl/netclust/|title=WUR Multi-netclust web server|work=bioinformatics.nl}}</ref> fast and memory-efficient detection of connected clusters in (multi-parametric) data networks<ref>{{cite journal
|title=Multi-netclust: an efficient tool for finding connected clusters in multi-parametric networks
|author=Kuzniar, A., Dhir, S., Nijveen, H., Pongor, S. and Leunissen, J. A. M.
Line 49:
|pmid=20679333
|doi=10.1093/bioinformatics/btq435}}</ref>
* Clusterer:<ref>{{cite web|url=http://bugaco.com/bioinf/clusterer/|title=Clusterer: extendable java application for sequence grouping and cluster analyses|work=bugaco.com}}</ref> extendable Java application for sequence grouping and cluster analyses
* PATDB: a program for rapidly identifying perfect substrings
* nrdb:<ref>http://web.archive.org/web/20080101032917/http://blast.wustl.edu/pub/nrdb/</ref> a program for merging trivially redundant (identical) sequences
* CluSTr:<ref>http://www.ebi.ac.uk/clustr/</ref> A single-linkage protein sequence clustering database built from Smith–Waterman sequence similarities; covers over 7 million sequences, including UniProt and IPI
* ICAtools:<ref>{{cite web|url=http://www.littlest.co.uk/software/bioinf/old_packages/icatools/|title=Introduction to the ICAtools|work=littlest.co.uk}}</ref> an early DNA clustering package with many algorithms useful for artifact discovery or EST clustering
* Virus Orthologous Clusters:<ref>{{cite web|url=http://athena.bioc.uvic.ca/tools/VOCS|title=VOCS - Viral Bioinformatics Resource Center|work=uvic.ca}}</ref> A viral protein sequence clustering database; contains all predicted genes from eleven virus families organized into ortholog groups by BLASTP similarity
* Skipredundant: EMBOSS tool<ref>{{cite web|url=http://bioweb2.pasteur.fr/docs/EMBOSS/skipredundant.html|title=EMBOSS: skipredundant|work=pasteur.fr}}</ref> to remove redundant sequences from a set
<!-- Lets try the above (although both are wobbly) -->
<!-- * [http://bio.cc/RSDB RSDB] broken link -->
== Non-redundant sequence databases ==
* PISCES: A Protein Sequence Culling Server<ref>{{cite web|url=http://dunbrack.fccc.edu/pisces/|title=Dunbrack Lab|work=fccc.edu}}</ref>
* RDB90<ref name=rdb90/>
* UniRef: A non-redundant [[UniProt]] sequence database<ref>
==See also==