Sequence clustering: Difference between revisions

WatsonCN (talk | contribs)
Sequence clustering packages: added the Viral Orthologous Clusters at VBRC
No edit summary
Line 2:
For proteins, [[Homology (biology)|homologous]] sequences are typically grouped into [[protein family|families]]. For EST data, clustering is important to group sequences originating from the same [[gene]] before the ESTs are [[sequence assembly|assembled]] to reconstruct the original [[mRNA]].
 
Most sequence clustering algorithms use [[single-linkage clustering]], constructing a [[transitive closure]] of sequences whose similarity exceeds a given threshold. The similarity score is often based on [[sequence alignment]].
Sequence clustering is often used to produce a [[Non redundant sequence|non-redundant]] set of [[representative sequences]].
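
The threshold-based single-linkage procedure described above can be illustrated with a minimal sketch. It is not taken from any of the packages listed in this article. The function names (<code>identity</code>, <code>single_linkage_clusters</code>, <code>representatives</code>) are hypothetical, the position-by-position identity score is only a crude stand-in for an alignment-based similarity, and picking the longest member as the representative is an assumption made for illustration.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions; a crude stand-in for an alignment score."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def single_linkage_clusters(seqs, threshold, similarity=identity):
    """Group sequences by the transitive closure of 'similarity >= threshold'."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair that passes the threshold (all-against-all comparison).
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect members that share a root into one cluster.
    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


def representatives(clusters):
    """Pick one sequence per cluster (here: the longest) as a non-redundant set."""
    return [max(cluster, key=len) for cluster in clusters]


if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "AATT", "GGGG"]
    clusters = single_linkage_clusters(seqs, threshold=0.75)
    print(clusters)
    print(representatives(clusters))
```

In this example "AAAA" and "AATT" share only half of their positions, yet they fall into the same cluster because each is linked to "AAAT". This chaining is the transitive-closure behaviour characteristic of single-linkage clustering. The all-against-all comparison is quadratic in the number of sequences, so tools intended for large datasets generally need faster strategies.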
 
Line 15:
* [http://skyrah.bio.cc/RSDB/ RSDB: Representative Sequences DataBase project]
* [http://ratest.eng.uiowa.edu/pubsoft/clustering/ UICluster: Parallel Clustering of EST (Gene) Sequences]
* [http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html BLASTClust: single-linkage clustering with BLAST]
* [http://web.mit.edu/polz/clusterer Clusterer: extendable java application for sequence grouping and cluster analyses]
* [http://blast.wustl.edu/blast/README.html#Manifest PATDB: a program for rapidly identifying perfect substrings]