=== Newer matrices ===
A number of newer substitution matrices have been proposed to deal with inadequacies in earlier designs.
* MD (1992), an updated version of PAM using a larger dataset.<ref name="pmid32954566">{{cite journal |last1=Trivedi |first1=R |last2=Nagarajaram |first2=HA |title=Substitution scoring matrices for proteins - An overview. |journal=Protein science : a publication of the Protein Society |date=November 2020 |volume=29 |issue=11 |pages=2150-2163 |doi=10.1002/pro.3954 |pmid=32954566 |pmc=7586916}}</ref>
* JTT (1994), published in the same year as BLOSUM, also performs clustering and uses an implicit model. This may help reduce the systematic error from maximum parsimony (MP), but also wastes sequence information.<ref name="WAG original paper"/>
* WAG (Whelan and Goldman), published in 2001, uses a [[maximum likelihood]] estimation procedure instead of any form of MP. The substitution scores are calculated from the likelihood of a change, considering multiple tree topologies derived using [[neighbor-joining]]. The scores correspond to a [[substitution model]] that also includes amino-acid stationary frequencies and a scaling factor in the similarity scoring. There are two versions of the matrix: the WAG matrix, which assumes the same amino-acid stationary frequencies across all the compared proteins, and the WAG* matrix, which uses different frequencies for each of the included [[protein family|protein families]].<ref name="WAG original paper">{{cite journal |last1=Whelan |first1=Simon |last2=Goldman |first2=Nick |title=A General Empirical Model of Protein Evolution Derived from Multiple Protein Families Using a Maximum-Likelihood Approach |journal=Molecular Biology and Evolution |date=1 May 2001 |volume=18 |issue=5 |pages=691–699 |doi=10.1093/oxfordjournals.molbev.a003851 |pmid=11319253 |issn=0737-4038|doi-access=free }}</ref>
* VTML (2001), a PAM-like matrix based on the alignments in the SYSTERS database, iteratively improved using a maximum likelihood estimator starting from the 1970s Dayhoff PAM model.<ref name="pmid32954566"/>
* LG (2008), which uses a larger dataset (Pfam-based) than WAG.<ref>{{cite journal |last1=Le |first1=S. Q. |last2=Gascuel |first2=O. |title=An Improved General Amino Acid Replacement Matrix |journal=Molecular Biology and Evolution |date=3 April 2008 |volume=25 |issue=7 |pages=1307–1320 |doi=10.1093/molbev/msn067}}</ref>
* Matrices using a selection of proteins based on structural relatedness, as proposed by Benner et al. (1994), Fan (2004), and Steven et al. (2004).<ref name="pmid32954566"/>
* Matrices using structural alignments of proteins instead of simple sequence alignment (6 separate publications).<ref name="pmid32954566"/>
* Matrices using known physicochemical parameters of amino acid residues (5 separate publications).<ref name="pmid32954566"/>
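
The link between a substitution model with stationary frequencies and a scaled similarity score, as described for WAG-style matrices above, can be illustrated with a minimal sketch. It assumes a time-reversible model in which the rate from residue ''i'' to residue ''j'' is the product of a symmetric exchangeability and the stationary frequency of ''j'', and it uses a made-up four-letter alphabet rather than real amino-acid data. The function names, the toy values, and the half-bit scaling are illustrative assumptions, not values from any published matrix.

```python
import numpy as np

def rate_matrix(S, pi):
    """Build a reversible rate matrix with Q[i, j] = S[i, j] * pi[j] for i != j."""
    S = np.asarray(S, dtype=float)
    pi = np.asarray(pi, dtype=float)
    Q = S * pi[np.newaxis, :]
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))      # each row sums to zero
    mean_rate = -np.dot(pi, np.diag(Q))      # expected substitutions per unit time
    return Q / mean_rate                     # normalise to one substitution per unit time

def transition_probs(Q, pi, t):
    """Compute P(t) = exp(Q t) using the symmetric form available for reversible models."""
    Q = np.asarray(Q, dtype=float)
    sqrt_pi = np.sqrt(np.asarray(pi, dtype=float))
    A = (sqrt_pi[:, None] * Q) / sqrt_pi[None, :]   # A = D^(1/2) Q D^(-1/2), symmetric
    A = (A + A.T) / 2.0                              # remove rounding asymmetry
    w, V = np.linalg.eigh(A)
    exp_A = (V * np.exp(w * t)) @ V.T
    return (exp_A / sqrt_pi[:, None]) * sqrt_pi[None, :]

def log_odds_scores(Q, pi, t, scale=2.0):
    """Scaled log-odds score: scale * log2(P[i, j](t) / pi[j])."""
    pi = np.asarray(pi, dtype=float)
    P = transition_probs(Q, pi, t)
    return scale * np.log2(P / pi[None, :])

# Toy (hypothetical) exchangeabilities and stationary frequencies.
S_toy = np.array([[0.0, 1.0, 0.5, 0.2],
                  [1.0, 0.0, 0.3, 0.8],
                  [0.5, 0.3, 0.0, 1.5],
                  [0.2, 0.8, 1.5, 0.0]])
pi_toy = np.array([0.4, 0.3, 0.2, 0.1])

Q_toy = rate_matrix(S_toy, pi_toy)
scores_toy = log_odds_scores(Q_toy, pi_toy, t=0.5)
print(np.round(scores_toy, 2))
```

Because the toy model is reversible, the resulting score matrix is symmetric. Replacing <code>pi_toy</code> with frequencies estimated for one protein family while keeping <code>S_toy</code> fixed mimics, in spirit only, the distinction between using shared and family-specific frequencies.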
Since the 2000s, an increasing number of matrices have been defined for subsets of proteins that are not optimally aligned by traditional "general-purpose" matrices. These include:<ref name="pmid32954566"/>
* PfSSM (2008), CBM and CCF (2008) for ''Plasmodium'' proteins, which have a different amino acid evolutionary bias due to the low [[GC content]] of the genome.
* Matrices for transmembrane proteins. JTT transmembrane (1994) was the first of this class. Later work includes:
** For alpha-helical transmembrane proteins, PHAT (2000) and SLIM (2001).
** For beta-barrel transmembrane proteins, bbTM (2008).
* Matrices for a specific protein family, including GPCRtm (2015) for the transmembrane (mostly helical) regions of [[GPCR]]s.
* Matrices for proteins with a specific role, including Hubsm (2017) for "hub proteins" in protein‐protein interaction networks.
* Matrices for [[intrinsically disordered protein]]s, including DUNMat (2002), MidicMat (2009), Disorder (2010), and EDSSMat (2019).
For a list of more models (including irreversible, i.e. asymmetric, ones), see the documentation for recent bioinformatics software, including IQ-TREE,<ref>{{cite web |title=Substitution Models |url=https://iqtree.github.io/doc/Substitution-Models |website=iqtree.github.io |language=en}}</ref> PhyML,<ref>{{cite web |title=phyml/doc/phyml-manual.pdf at master · stephaneguindon/phyml |url=https://github.com/stephaneguindon/phyml/blob/master/doc/phyml-manual.pdf |website=GitHub |language=en}}</ref> and RAxML.<ref>{{cite web |last1=Stamatakis |first1=Alexandros |title=The RAxML v8.2.X Manual |url=https://cme.h-its.org/exelixis/resource/download/NewManual.pdf#page=31 |date=July 20, 2016}}</ref>
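
Whether a model is reversible can be checked numerically through the detailed-balance condition, under which the flux from residue ''i'' to ''j'' at stationarity equals the flux from ''j'' to ''i''. The following minimal sketch uses two made-up three-state rate matrices. The helper name <code>is_reversible</code> and all values are hypothetical, and the code does not reflect how any of the programs named above store or test their models.

```python
import numpy as np

def is_reversible(Q, pi, tol=1e-9):
    """Return True if pi[i] * Q[i, j] == pi[j] * Q[j, i] for all i, j (detailed balance)."""
    Q = np.asarray(Q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    flux = pi[:, None] * Q
    return bool(np.allclose(flux, flux.T, atol=tol))

# Reversible toy model: symmetric exchangeabilities times stationary frequencies.
pi_rev = np.array([0.5, 0.3, 0.2])
S_rev = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 3.0],
                  [2.0, 3.0, 0.0]])
Q_rev = S_rev * pi_rev[None, :]
np.fill_diagonal(Q_rev, -Q_rev.sum(axis=1))

# Irreversible toy model: substitutions only flow around a cycle 0 -> 1 -> 2 -> 0.
pi_cyc = np.array([1.0, 1.0, 1.0]) / 3.0
Q_cyc = np.array([[-1.0, 1.0, 0.0],
                  [0.0, -1.0, 1.0],
                  [1.0, 0.0, -1.0]])

print(is_reversible(Q_rev, pi_rev))
print(is_reversible(Q_cyc, pi_cyc))
```

The cyclic example has a valid stationary distribution but an asymmetric flux matrix, which is the sense in which an irreversible model is asymmetric.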
== Specialized substitution matrices and their extensions ==