Substitution matrix: Difference between revisions

Citation bot (talk | contribs)
Alter: volume. Add: doi-access, issue. Removed proxy/dead URL that duplicated identifier. | Use this bot. Report bugs. | #UCB_CommandLine
m Disambiguating links to Blast (link changed to BLAST (biotechnology); link changed to BLAST (biotechnology)) using DisamAssist.
Line 53:
Dayhoff's methodology of comparing closely related species proved less effective for aligning evolutionarily divergent sequences. Sequence changes over long evolutionary time scales are not well approximated by compounding small changes that occur over short time scales. The [[BLOSUM]] ''(BLOck SUbstitution Matrix)'' series of matrices addresses this problem. [[Steven Henikoff|Henikoff]] and Henikoff constructed these matrices using multiple alignments of evolutionarily divergent proteins. The probabilities used in the matrix calculation are computed from "blocks" of conserved sequences found in multiple protein alignments. These conserved sequences are assumed to be functionally important within related proteins and therefore to have lower substitution rates than less conserved regions. To reduce the bias that closely related sequences introduce into substitution rates, segments in a block whose sequence identity exceeds a certain threshold were clustered, and the weight of each such cluster was reduced (Henikoff and Henikoff). For the BLOSUM62 matrix, this threshold was set at 62%. Pair frequencies were then counted between clusters, so pairs were only counted between segments less than 62% identical. A higher-numbered BLOSUM matrix is suited to aligning two closely related sequences, and a lower-numbered matrix to more divergent sequences.
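
The clustering and pair-counting steps described above can be illustrated with a minimal Python sketch. It is not the original implementation: the function names, the toy block, the single-linkage clustering rule, and the use of "at least the threshold" as the clustering criterion are assumptions made for illustration. The final step uses the log-odds score in half-bit units, <code>2·log2(observed/expected)</code>, rounded to the nearest integer.

```python
from collections import defaultdict
from itertools import combinations
import math


def identity(a, b):
    """Fraction of identical positions between two equal-length segments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_segments(block, threshold):
    """Group segment indices by single linkage (an assumption of this sketch):
    two segments share a cluster if their identity is >= threshold."""
    parent = list(range(len(block)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(block)), 2):
        if identity(block[i], block[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(block)):
        clusters[find(i)].append(i)
    return list(clusters.values())


def pair_counts(block, threshold=0.62):
    """Count residue pairs between clusters only. Each segment is weighted by
    1/(size of its cluster), so every cluster counts as a single sequence."""
    counts = defaultdict(float)
    for c1, c2 in combinations(cluster_segments(block, threshold), 2):
        weight = 1.0 / (len(c1) * len(c2))
        for i in c1:
            for j in c2:
                for a, b in zip(block[i], block[j]):
                    counts[tuple(sorted((a, b)))] += weight
    return counts


def log_odds(counts):
    """Convert weighted pair counts to rounded half-bit log-odds scores."""
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}
    # Background frequency of each residue, derived from the pair frequencies.
    p = defaultdict(float)
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Toy block (hypothetical data): the first three segments are at least 75%
# identical to one another, so they form one cluster at the 62% threshold.
toy_block = ["ACDA", "ACDA", "ACEA", "GCEG"]
toy_counts = pair_counts(toy_block, threshold=0.62)
toy_scores = log_odds(toy_counts)
```

In this toy block the three similar segments together contribute as much as one sequence, so the pairs they form with the fourth segment are each weighted by 1/3. A real matrix would accumulate such counts over many blocks before the log-odds step.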
 
The BLOSUM62 matrix performs well at detecting similarities between distant sequences, and it is the default matrix in many alignment applications, such as [[BLAST (biotechnology)|BLAST]].
 
=== Differences between PAM and BLOSUM ===
Line 67:
 
== Specialized substitution matrices and their extensions ==
Many specialized substitution matrices have been developed that describe the amino acid substitution rates in specific structural or sequence contexts, such as in transmembrane alpha helices,<ref>{{cite journal |pmid=11473008 |year=2001 |last1=Müller |first1=T |last2=Rahmann |last3=Rehmsmeier |title=Non-symmetric score matrices and the detection of homologous transmembrane proteins |volume=17 |pages=S182–9 |journal=Bioinformatics |first2=S |first3=M |issue=Suppl 1 |doi=10.1093/bioinformatics/17.suppl_1.s182|doi-access=free }}</ref> for combinations of secondary structure states and solvent accessibility states,<ref>{{cite journal |pmid=9135128 |year=1997 |last1=Rice |first1=DW |last2=Eisenberg |title=A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence |volume=267 |issue=4 |pages=1026–38 |doi=10.1006/jmbi.1997.0924 |journal=Journal of Molecular Biology |first2=D|citeseerx=10.1.1.44.1143 }}</ref><ref>{{cite journal |pmid=18833291 |year=2008 |last1=Gong |first1=Sungsam |last2=Blundell |first2=Tom L. 
|title=Discarding functional residues from the substitution table improves predictions of active sites within three-dimensional structures |volume=4 |issue=10 |pages=e1000179 |doi=10.1371/journal.pcbi.1000179 |journal=PLOS Computational Biology |pmc=2527532 |bibcode=2008PLSCB...4E0179G |editor1-last=Levitt |editor1-first=Michael |doi-access=free }}</ref><ref>{{cite journal |pmid=18004781 |year=2008 |last1=Goonesekere |first1=NC |last2=Lee |title=Context-specific amino acid substitution matrices and their use in the detection of protein homologs |volume=71 |issue=2 |pages=910–9 |doi=10.1002/prot.21775 |journal=Proteins |first2=B|s2cid=27443393 }}</ref> or for local sequence-structure contexts.<ref>{{cite journal |pmid=16352653 |year=2006 |last1=Huang |first1=YM |last2=Bystroff |title=Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions |volume=22 |issue=4 |pages=413–22 |doi=10.1093/bioinformatics/bti828 |journal=Bioinformatics |first2=C|doi-access=free }}</ref> These context-specific substitution matrices generally improve alignment quality at some cost in speed, but are not yet widely used. Sequence context-specific amino acid similarities have also been derived that do not require substitution matrices and instead rely on a library of sequence contexts. Using this idea, [[CS-BLAST]], a context-specific extension of the [[BLAST (biotechnology)|BLAST]] program, has been demonstrated to achieve a twofold sensitivity improvement over BLAST for remotely related sequences at similar speeds.
 
== Terminology ==