Substitution matrix

It turns out that the BLOSUM62 matrix does an excellent job of detecting similarities between distantly related sequences, and it is the default matrix in most recent alignment applications such as [[BLAST (biotechnology)|BLAST]].
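As a minimal illustration of how a substitution matrix scores an alignment, the Python sketch below sums BLOSUM62 scores over the aligned residue pairs of an ungapped alignment. It is a simplified sketch, not how BLAST itself is implemented: only a handful of BLOSUM62 entries are included, the names `BLOSUM62_SUBSET`, `pair_score` and `score_ungapped` are hypothetical, and real aligners also use the full matrix and gap penalties.

```python
# Partial BLOSUM62 table (a few entries only, for illustration).
# The matrix is symmetric, so each unordered pair is stored once.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("D", "D"): 6, ("E", "E"): 5,
    ("W", "W"): 11,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("K", "R"): 2, ("D", "E"): 2, ("A", "L"): -1, ("A", "W"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up the score for residues a and b, in either order."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    if (b, a) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this partial table")


def score_ungapped(seq1: str, seq2: str) -> int:
    """Sum substitution scores over an ungapped, equal-length alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment requires equal lengths")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(score_ungapped("KIL", "RVL"))  # K/R=2, I/V=3, L/L=4 -> 9
```

Conservative replacements such as K/R or I/V score positively even though the residues are not identical, which is what lets such a matrix pick up similarity between sequences that share few identical positions.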
 
It also turns out that the BLOSUM computer code written by Henikoff and Henikoff does not exactly match the description in their paper. Surprisingly, this commonly used "wrong" version has better search performance.<ref name=article>{{cite journal |last1=Styczynski |first1=Mark P |last2=Jensen |first2=Kyle L |last3=Rigoutsos |first3=Isidore |last4=Stephanopoulos |first4=Gregory |title=BLOSUM62 miscalculations improve search performance |journal=Nature Biotechnology |date=March 2008 |volume=26 |issue=3 |pages=274–275 |doi=10.1038/nbt0308-274 | pmid=18327232 |s2cid=205266180 }}</ref>
 
=== Differences between PAM and BLOSUM ===
* PMB (Probability Matrix from Blocks, 2003), a set of "true" substitution frequencies estimated from the observed frequencies of BLOSUM, taking into account the possibility of a later substitution masking a previous one. It thus creates an evolutionary model in which distances have a theoretical meaning (BLOSUM lacks this feature, unlike PAM, WAG, and most later matrices, and is therefore ''not'' recommended for phylogeny by IQ-TREE).<ref>{{cite journal |last1=Veerassamy |first1=Shalini |last2=Smith |first2=Andrew |last3=Tillier |first3=Elisabeth R. M. |title=A Transition Probability Model for Amino Acid Substitutions from Blocks |journal=Journal of Computational Biology |date=December 2003 |volume=10 |issue=6 |pages=997–1010 |doi=10.1089/106652703322756195|pmid=14980022 }}</ref>
* LG (2008), which uses a larger (Pfam-based) dataset than WAG. It uses an extension of the WAG algorithm, with a new PhyML (WAG+&Gamma;4) model that takes into account sites with different evolutionary rates.<ref>{{cite journal |last1=Le |first1=S. Q. |last2=Gascuel |first2=O. |title=An Improved General Amino Acid Replacement Matrix |journal=Molecular Biology and Evolution |date=3 April 2008 |volume=25 |issue=7 |pages=1307–1320 |doi=10.1093/molbev/msn067|pmid=18367465 }}</ref>
* QMaker and nQMaker (2021, 2022), programs that can quickly estimate time-reversible and nonreversible matrices, respectively, from very large datasets. Each provides a general matrix and 5 specialized matrices, for a total of 12 precalculated substitution matrices.<ref>{{cite journal |last1=Minh |first1=Bui Quang |last2=Dang |first2=Cuong Cao |last3=Vinh |first3=Le Sy |last4=Lanfear |first4=Robert |title=QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution |journal=Systematic Biology |date=11 August 2021 |volume=70 |issue=5 |pages=1046–1060 |doi=10.1093/sysbio/syab010|pmid=33616668 |pmc=8357343 }}</ref><ref>{{cite journal |last1=Dang |first1=Cuong Cao |last2=Minh |first2=Bui Quang |last3=McShea |first3=Hanon |last4=Masel |first4=Joanna |last5=James |first5=Jennifer Eleanor |last6=Vinh |first6=Le Sy |last7=Lanfear |first7=Robert |title=nQMaker: Estimating Time Nonreversible Amino Acid Substitution Models |journal=Systematic Biology |date=10 August 2022 |volume=71 |issue=5 |pages=1110–1123 |doi=10.1093/sysbio/syac007|pmid=35139203 |pmc=9366462 }}</ref>
* Matrices built from proteins selected for their structural relatedness, as proposed by Benner et al. (1994), Fan (2004), and Steven et al. (2004).<ref name="pmid32954566"/>
* Matrices built from structural alignments of proteins instead of plain sequence alignments (6 separate publications).<ref name="pmid32954566"/>
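The idea behind PMB's "true" frequencies, that a later substitution can mask an earlier one, can be seen in the simplest possible model. The Python sketch below assumes a hypothetical equal-rates model in which all 19 replacements of an amino acid are equally likely, scaled to one expected substitution per site per unit time; it is ''not'' PMB, PAM, WAG or LG, which all estimate unequal rates from data. Under this assumption the probability that a site still looks unchanged after time ''t'' is <math>P_{ii}(t) = \tfrac{1}{20} + \tfrac{19}{20} e^{-20t/19}</math>, so the observed fraction of differing sites saturates at 19/20, and inverting the formula recovers the actual number of substitutions. The function names are illustrative only.

```python
import math

N_STATES = 20  # amino acids; the equal-rates model is an illustrative assumption


def observed_difference(t: float) -> float:
    """Expected fraction of sites that differ after t substitutions per site.

    Equal-rates model: every off-diagonal rate is 1/19, so the total
    substitution rate out of each state is 1.
    """
    k = N_STATES
    return (k - 1) / k * (1.0 - math.exp(-k * t / (k - 1)))


def corrected_distance(p: float) -> float:
    """Invert observed_difference: substitutions per site from observed p."""
    k = N_STATES
    if not 0.0 <= p < (k - 1) / k:
        raise ValueError("p must lie in [0, 19/20) for a finite distance")
    return -(k - 1) / k * math.log(1.0 - k * p / (k - 1))


for t in (0.1, 0.5, 1.0, 2.0):
    p = observed_difference(t)
    print(f"t={t:.1f}  observed p={p:.3f}  recovered d={corrected_distance(p):.3f}")
```

The observed difference ''p'' is always smaller than ''t'' because repeated changes at the same site are invisible; a distance with a theoretical meaning has to undo that loss, which raw BLOSUM scores do not attempt.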
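QMaker estimates time-reversible models, whereas nQMaker also allows nonreversible ones. A model with rate matrix ''Q'' and stationary frequencies ''π'' is time-reversible when it satisfies detailed balance, <math>\pi_i Q_{ij} = \pi_j Q_{ji}</math> for all ''i'', ''j''. The Python sketch below checks that condition on two toy 3-state matrices; the helper names and numbers are hypothetical and far smaller than the 20×20 amino-acid matrices these programs actually estimate.

```python
from itertools import combinations


def build_reversible_q(exchange, pi):
    """Build a reversible rate matrix Q[i][j] = exchange[i][j] * pi[j].

    `exchange` must be symmetric; diagonals are set so each row sums to 0.
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchange[i][j] * pi[j]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def is_reversible(q, pi, tol=1e-12):
    """Check detailed balance: pi[i] * q[i][j] == pi[j] * q[j][i] for i < j."""
    return all(
        abs(pi[i] * q[i][j] - pi[j] * q[j][i]) <= tol
        for i, j in combinations(range(len(pi)), 2)
    )


pi = [0.5, 0.3, 0.2]
exchange = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]
reversible_q = build_reversible_q(exchange, pi)

# A cyclic model (0 -> 1 -> 2 -> 0): its stationary distribution is uniform,
# but it violates detailed balance, so it is nonreversible.
cyclic_q = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(is_reversible(reversible_q, pi))    # True
print(is_reversible(cyclic_q, uniform))   # False
```

Writing ''Q'' as a symmetric exchangeability matrix multiplied by the stationary frequencies, as `build_reversible_q` does, guarantees detailed balance by construction; a nonreversible model drops that constraint and so has more free parameters to estimate.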