Substitution matrix: Difference between revisions

Content deleted Content added
Line 49:
 
=== BLOSUM ===
Dayhoff's methodology of comparing closely related species turned out not to work very well for aligning evolutionarily divergent sequences. Sequence changes over long evolutionary time scales are not well approximated by compounding small changes that occur over short time scales. The [[BLOSUM]] ''(BLOck SUbstitution Matrix)'' series of matrices rectifies this problem. [[Steven Henikoff|Henikoff]] & Henikoff constructed these matrices using multiple alignments of evolutionarily divergent proteins. The probabilities used in the matrix calculation are computed by looking at "blocks" of conserved sequences found in multiple protein alignments. These conserved sequences are assumed to be of functional importance within related proteins and will therefore have lower substitution rates than less conserved regions. To reduce bias from closely related sequences on substitution rates, segments in a block with a sequence identity above a certain threshold were clustered, reducing the weight of each such cluster (Henikoff and Henikoff). For the BLOSUM62 matrix, this threshold was set at 62%. Pairs frequencies were then counted between clusters, hence pairs were only counted between segments less than 62% identical. One would use a higher numbered BLOSUM matrix for aligning two closely related sequences and a lower number for more divergent sequences.
 
It turns out that the BLOSUM62 matrix does an excellent job detecting similarities in distant sequences, and this is the matrix used by default in most recent alignment applications such as [[BLAST]].