Substitution matrix: Difference between revisions

ce
Line 69:
A number of newer substitution matrices have been proposed to deal with inadequacies in earlier designs.
* JTT (1992). Published in the same year as BLOSUM, it also performs clustering and uses an implicit model. This may help reduce the systematic error from maximum parsimony (MP), but it also wastes sequence information.<ref name="WAG original paper"/>
* VTML (2001), a PAM-like matrix based on the alignments in the SYSTERS database, iteratively improved using a maximum likelihood estimator starting from the 1970s Dayhoff PAM model.<ref name="pmid32954566">{{cite journal |last1=Trivedi |first1=R |last2=Nagarajaram |first2=HA |title=Substitution scoring matrices for proteins - An overview. |journal=Protein science : a publication of the Protein Society |date=November 2020 |volume=29 |issue=11 |pages=2150-2163 |doi=10.1002/pro.3954 |pmid=32954566 |pmc=7586916}}</ref>
* WAG (Whelan and Goldman, 2001) uses a [[maximum likelihood]] estimating procedure instead of any form of MP over a "BRKALN" dataset. The substitution scores are calculated based on the likelihood of a change considering multiple tree topologies derived using [[neighbor-joining]]. The scores correspond to a [[substitution model]] which also includes amino-acid stationary frequencies and a scaling factor in the similarity scoring. There are two versions of the matrix: the WAG matrix, based on the assumption of the same amino-acid stationary frequencies across all the compared proteins, and the WAG* matrix, with different frequencies for each of the included [[protein family|protein families]].<ref name="WAG original paper">{{cite journal |last1=Whelan |first1=Simon |last2=Goldman |first2=Nick |title=A General Empirical Model of Protein Evolution Derived from Multiple Protein Families Using a Maximum-Likelihood Approach |journal=Molecular Biology and Evolution |date=1 May 2001 |volume=18 |issue=5 |pages=691–699 |doi=10.1093/oxfordjournals.molbev.a003851 |pmid=11319253 |issn=0737-4038|doi-access=free }}</ref>
* PMB (Probability Matrix from Blocks, 2003), a set of "true" substitution frequencies estimated from the observed frequencies of BLOSUM, taking into account the possibility of a later substitution masking a previous one. It thus creates an evolutionary model in which the distances have theoretical meaning (BLOSUM does not have this feature, unlike PAM, WAG, and most other later matrices, and hence is ''not'' recommended for phylogeny by IQ-TREE).<ref>{{cite journal |last1=Veerassamy |first1=Shalini |last2=Smith |first2=Andrew |last3=Tillier |first3=Elisabeth R. M. |title=A Transition Probability Model for Amino Acid Substitutions from Blocks |journal=Journal of Computational Biology |date=December 2003 |volume=10 |issue=6 |pages=997–1010 |doi=10.1089/106652703322756195}}</ref>
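
The link between a substitution model with stationary frequencies and a similarity score matrix, as mentioned for WAG and PMB above, can be illustrated with a minimal sketch. It assumes a toy three-letter alphabet rather than the 20 amino acids, and the exchangeabilities, frequencies, scaling factor, and all function names (<code>build_rate_matrix</code>, <code>mat_exp</code>, <code>log_odds</code>) are made up for illustration; it is not the estimation procedure of any of the published matrices. Under these assumptions, a reversible rate matrix is built, the transition probabilities after evolutionary distance ''t'' are obtained by matrix exponentiation (which accounts for later substitutions masking earlier ones), and each score is a scaled log-odds ratio of the transition probability to the target frequency.

```python
import math

# Hypothetical toy values: symmetric exchangeabilities and stationary frequencies.
EXCH = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 0.5],
        [2.0, 0.5, 0.0]]
FREQS = [0.5, 0.3, 0.2]


def build_rate_matrix(exch, freqs):
    """Reversible rate matrix: Q[i][j] = exch[i][j] * freqs[j] for i != j.

    Rows sum to zero, and Q is rescaled so that one unit of time
    corresponds to one expected substitution per site.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    small = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def log_odds(q, freqs, t, scale=2.0):
    """Score S[i][j] = scale * log2(P_ij(t) / freqs[j]) at distance t."""
    p = mat_exp([[x * t for x in row] for row in q])
    n = len(freqs)
    return [[scale * math.log2(p[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]


Q = build_rate_matrix(EXCH, FREQS)
for row in log_odds(Q, FREQS, t=0.5):
    print(["%.2f" % s for s in row])
```

Because the toy model is reversible, the resulting score matrix is symmetric, and the scores shrink toward zero as ''t'' grows, which is the sense in which distances in such a model have a theoretical meaning.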