Revision as of 20:22, 4 July 2008 by WatsonCN (22 edits): →Sequence clustering packages: added the Viral Orthologous Clusters at VBRC
Revision as of 13:53, 1 August 2008 by Michael Hardy (Administrators, 210,596 edits): No edit summary
Line 2:
For proteins, [[Homology (biology)|homologous]] sequences are typically grouped into [[protein family|families]]. For EST data, clustering is important to group sequences originating from the same [[gene]] before the ESTs are [[sequence assembly|assembled]] to reconstruct the original [[mRNA]].

Generally, the clustering algorithms are [[single-linkage clustering]], constructing a [[transitive closure]] of sequences with a similarity over a particular threshold. The similarity score is often based on [[sequence alignment]]. Sequence clustering is often used to make a [[Non redundant sequence|non-redundant]] set of [[representative sequences]].

Line 15:
* [http://skyrah.bio.cc/RSDB/ RSDB: Representative Sequences DataBase project]
* [http://ratest.eng.uiowa.edu/pubsoft/clustering/ UICluster: Parallel Clustering of EST (Gene) Sequences]
* [http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html BLASTClust single-linkage clustering with BLAST]
* [http://web.mit.edu/polz/clusterer Clusterer: extendable java application for sequence grouping and cluster analyses]
* [http://blast.wustl.edu/blast/README.html#Manifest PATDB: a program for rapidly identifying perfect substrings]
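
The single-linkage idea described in the text (a transitive closure over pairs whose similarity exceeds a threshold) can be illustrated with a minimal Python sketch. This is not the method of any package listed above. The function names `identity` and `single_linkage_clusters`, the toy position-wise identity score (a stand-in for a real alignment-based score), and the example sequences are all assumptions made for illustration.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Toy similarity (assumption): fraction of matching positions.

    A real pipeline would typically use an alignment-based score instead.
    """
    n = max(len(a), len(b))
    if n == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / n


def single_linkage_clusters(seqs, threshold, similarity=identity):
    """Group sequences by the transitive closure of 'similarity >= threshold'."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair that passes the threshold; chains of such links
    # merge into a single cluster.
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


# Hypothetical example data.
example = ["AAAAAAAA", "AAAAAATT", "AAAATTTT", "GGGGGGGG"]
result = single_linkage_clusters(example, threshold=0.75)
```

In this example the first and third sequences share only half of their positions, yet they fall into the same cluster because each is at least 75% identical to the second. That chaining is what the transitive closure means in practice. Keeping one member of each cluster would give a simple non-redundant representative set, and the all-pairs comparison shown here is quadratic in the number of sequences.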

Sequence clustering: Difference between revisions