In [[bioinformatics]], '''sequence clustering''' [[algorithm]]s attempt to group sequences that are related in some way. The sequences can be of genomic, "transcriptomic" ([[expressed sequence tag|ESTs]]) or [[protein]] origin.
For proteins, [[Homology (biology)|homologous]] sequences are typically grouped into [[protein family|families]]. For EST data, clustering is important to group sequences originating from the same [[gene]] before the ESTs are [[sequence assembly|assembled]] to reconstruct the original [[mRNA]].
Generally, sequence clustering algorithms use [[single linkage clustering]], constructing a [[transitive closure]] of sequences whose similarity exceeds a particular threshold. The similarity score is often based on [[sequence alignment]].
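The single-linkage idea can be illustrated with a minimal Python sketch. It is not taken from any particular tool: the <code>similarity</code> function is a hypothetical stand-in (fraction of identical positions) for an alignment-based score, and the names <code>single_linkage_clusters</code> and <code>threshold</code> are assumptions made for illustration. Pairs scoring above the threshold are linked, and a union-find structure collects the transitive closure of those links.

```python
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Toy score (assumption): fraction of identical positions.

    A real pipeline would typically derive this from a sequence alignment.
    """
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def single_linkage_clusters(seqs, threshold):
    """Group sequences by the transitive closure of 'similarity > threshold'."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair whose similarity exceeds the threshold.
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) > threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect members by root; each root identifies one cluster.
    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


clusters = single_linkage_clusters(["AAAA", "AAAT", "AATT", "GGGG"], 0.7)
```

In this example "AAAA" and "AATT" score only 0.5 against each other, yet they fall into the same cluster because each is linked to "AAAT". This chaining is characteristic of single linkage. The all-against-all comparison is quadratic in the number of sequences, so the sketch is suitable only for small inputs.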
Sequence clustering is often used to make a [[Non redundant sequences|non-redundant]] set of sequences.
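One simple way to derive such a set, shown here only as a sketch, is to keep a single representative from each cluster. Choosing the longest member is an assumption made for illustration, and the function name <code>non_redundant_set</code> is hypothetical. Real tools may select representatives differently.

```python
def non_redundant_set(clusters):
    """Return one representative per cluster.

    Assumption: the longest sequence represents its cluster; on ties the
    first member encountered is kept. Empty clusters are skipped.
    """
    return [max(cluster, key=len) for cluster in clusters if cluster]


representatives = non_redundant_set([["AAAA", "AAATT", "AATT"], ["GGGG"]])
```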
== External links ==
=== Sequence
* [http://www.ebi.ac.uk/~holm/nrdb90 RDB90 and nrdb90.pl: a nonredundant sequence database]
* [http://www.ebi.ac.uk/research/cgg/tribe/ TribeMCL: a method for clustering proteins into related groups]
<!-- * [http://bio.cc/RSDB RSDB] broken link -->
=== Non-
* [http://www.fccc.edu/research/labs/dunbrack/pisces/ PISCES: A Protein Sequence Culling Server]