Sequence clustering: Difference between revisions

sequences that are somehow related. The sequences can be of genomic, transcriptomic ([[EST (biology)|ESTs]]) or [[protein]] origin.
 
For proteins, [[Homology (biology)|homologous]] sequences are typically grouped into [[protein family|families]]. For EST data, clustering is important for grouping sequences that originate from the same [[gene]] before the ESTs are assembled to reconstruct the original [[mRNA]].
 
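Generally, the clustering algorithms perform single-linkage clustering: they construct a [[transitive closure]] of sequences whose pairwise similarity exceeds a particular threshold, so two sequences fall into the same cluster whenever a chain of sufficiently similar pairs connects them. The similarity score is often based on [[sequence alignment]].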
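
The threshold-based single-linkage idea can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names `identity` and `single_linkage_clusters` are hypothetical, and `identity` is a toy ungapped comparison standing in for a real alignment-based score. The transitive closure is computed with a union-find structure, which is one possible choice rather than the method of any particular tool. Pairs scoring at or above the threshold are treated as linked.

```python
from itertools import combinations


def identity(a, b):
    """Toy similarity (assumption): fraction of matching positions when the
    two sequences are compared left-aligned without gaps. A real pipeline
    would typically use an alignment-based score instead."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def single_linkage_clusters(seqs, threshold, similarity=identity):
    """Group sequences so that any pair scoring >= threshold shares a
    cluster, then close the relation transitively via union-find."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; merge the two clusters if they are similar.
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect the members of each root into a cluster.
    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


# Example input (made up): the first and third sequences score only 0.75
# against each other, but both score 0.875 against the second one.
example = ["ACGTACGT", "ACGTACGA", "ACGTACAA", "TTTTCCCC"]
example_clusters = single_linkage_clusters(example, threshold=0.8)
```

In this example the first three sequences end up in one cluster even though the first and third are below the threshold when compared directly, which is the chaining behaviour characteristic of single linkage. The all-pairs loop is quadratic in the number of sequences, so this sketch is only suitable for small inputs.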