{{No footnotes|date=June 2010}}
[[Graphical model]]s have become powerful frameworks for [[protein structure prediction]], [[protein–protein interaction]], and [[Thermodynamic free energy|free energy]] calculations for protein structures. Using a graphical model to represent the protein structure allows the solution of many problems including secondary structure prediction, protein
There are two main approaches to
==Discrete graphical models for protein structure==
:<math>p(X = x|\Theta) = p(X_b = x_b)p(X_s = x_s|X_b,\Theta), \,</math>
where <math>\Theta</math> represents any parameters used to describe this model, including sequence information and temperature. Frequently the backbone is assumed to be rigid with a known conformation, and the problem is then reduced to a side-chain placement problem. The structure of the graph is also encoded in <math>\Theta</math>. This structure specifies which pairs of variables are conditionally independent. For example, the side-chain angles of two residues that are far apart can be independent given all other angles in the protein. To extract this structure, researchers use a distance threshold: only pairs of residues within that threshold are considered connected (i.e. have an edge between them).
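The distance-threshold construction described above can be illustrated with a minimal sketch. It assumes each residue is summarised by a single representative 3D point (for instance a C-alpha atom or a side-chain centroid). The function name <code>contact_edges</code>, the toy coordinates, and the 8.0 Å cutoff are illustrative assumptions, not values taken from any particular method.

```python
import math
from itertools import combinations

def contact_edges(coords, threshold):
    """Return the set of residue index pairs (i, j), i < j, whose
    representative points lie within `threshold` of each other."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) <= threshold:
            edges.add((i, j))
    return edges

# Hypothetical coordinates in angstroms, one point per residue.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (30.0, 0.0, 0.0),  # far from the other three residues
]

# Assumed cutoff; the choice of threshold is a modelling decision.
edges = contact_edges(coords, threshold=8.0)
```

In practice the coordinates would be read from a structure file, and residues left without any edge (such as the last one here) are treated as independent of the rest.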
Given this representation, the probability of a particular side chain conformation ''x''<sub>''s''</sub> given the backbone conformation ''x''<sub>''b''</sub> can be expressed as
:<math>p(X_s = x_s|X_b = x_b) = \frac{1}{Z} \prod_{c\in C(G)}\Phi_c (x_s^c,x_b^c)</math>
where ''C''(''G'') is the set of all cliques in ''G'', <math>\Phi</math> is a [[function (mathematics)|potential function]]
To completely characterize the MRF, it is necessary to define the potential function <math>\Phi</math>. To simplify, the cliques of a graph are usually restricted to only the cliques of size 2, which means the potential function is only defined over pairs of variables. In [[Goblin System]],
:<math>\Phi(x_s^{i_p},x_b^{j_q}) = \exp ( -E(x_s^{i_p},x_b^{j_q})/k_BT)</math>
where <math>E(x_s^{i_p},x_b^{j_q})</math> is the energy of interaction between rotamer state ''p'' of residue <math>X_i^s</math> and rotamer state ''q'' of residue <math>X_j^s</math>, <math>k_B</math> is the [[Boltzmann constant]], and <math>T</math> is the temperature.
Using a PDB file, this model can be built over the protein structure. From this model, free energy can be calculated.
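As a rough illustration of how pairwise Boltzmann potentials define a distribution over rotamer assignments, the sketch below builds a toy pairwise model and computes its partition function by exhaustive enumeration. It then applies the standard statistical-mechanics relation ''F'' = −''k''<sub>B</sub>''T'' ln ''Z''. The energy tables, the three-residue chain, and all function names are hypothetical. Enumeration is used here only because the system is tiny; it is a stand-in for, not an implementation of, approximate inference methods such as belief propagation.

```python
import math
from itertools import product

# Boltzmann constant per mole (the gas constant), in kcal/(mol*K).
K_B = 0.0019872041

def potential(energy, temperature):
    """Pairwise potential Phi = exp(-E / (k_B * T))."""
    return math.exp(-energy / (K_B * temperature))

def partition_function(n_states, pair_energy, temperature):
    """Brute-force Z: sum over every joint rotamer assignment of the
    product of the edge potentials. Cost grows exponentially with size."""
    z = 0.0
    for assignment in product(*(range(n) for n in n_states)):
        weight = 1.0
        for (i, j), table in pair_energy.items():
            weight *= potential(table[assignment[i]][assignment[j]], temperature)
        z += weight
    return z

def free_energy(z, temperature):
    """Standard relation F = -k_B * T * ln(Z)."""
    return -K_B * temperature * math.log(z)

# Hypothetical toy system: three residues, two rotamers each,
# with edges 0-1 and 1-2. Energies are made-up values in kcal/mol.
n_states = [2, 2, 2]
pair_energy = {
    (0, 1): [[-1.0, 0.5], [0.2, -0.3]],
    (1, 2): [[0.0, -0.8], [0.4, 0.1]],
}
T = 300.0  # assumed temperature in kelvin

Z = partition_function(n_states, pair_energy, T)
F = free_energy(Z, T)
```

For a real protein the number of joint assignments is far too large to enumerate, which is why approximate inference is needed.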
===Free energy calculation: belief propagation===
==Continuous graphical models for protein structures==
Graphical models can still be used when the variables of choice are continuous. In these cases, the probability distribution is represented as a [[multivariate probability distribution]] over continuous variables. Each family of distributions then imposes certain properties on the graphical model. The [[multivariate Gaussian distribution]] is one of the most convenient distributions for this problem. The simple form of the probability
===Gaussian graphical models of protein structures===
Where <math>Z = (2\pi)^{n/2}|\Sigma|^{1/2}</math> is the closed form of the [[Partition function (mathematics)|partition function]]. The parameters of this distribution are <math>\mu</math> and <math>\Sigma</math>: <math>\mu</math> is the vector of [[mean values]] of the variables, and <math>\Sigma</math> is the [[covariance matrix]], whose inverse <math>\Sigma^{-1}</math> is known as the [[precision matrix]]. The precision matrix contains the pairwise dependencies between the variables: a zero entry in <math>\Sigma^{-1}</math> means that, conditioned on the values of the other variables, the two corresponding variables are independent of each other.
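The link between zeros in the precision matrix and conditional independence can be seen in a small worked example. The sketch below assumes a hypothetical three-variable chain (variable 0 linked to 1, and 1 linked to 2). It shows that the covariance between the two end variables is non-zero even though their precision entry, and hence their partial correlation, is zero. The matrix values and helper names are illustrative only.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlation(precision, i, j):
    """Correlation of variables i and j given all the others:
    -K_ij / sqrt(K_ii * K_jj), where K is the precision matrix."""
    return -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])

# Hypothetical precision matrix for a chain 0 - 1 - 2:
# the (0, 2) entry is zero, so there is no edge between 0 and 2.
precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

covariance = invert(precision)
marginal_cov_02 = covariance[0][2]                # non-zero: 0 and 2 are correlated
partial_02 = partial_correlation(precision, 0, 2)  # zero: independent given 1
partial_01 = partial_correlation(precision, 0, 1)  # non-zero: direct edge
```

This is why the graph is read off the precision matrix rather than the covariance matrix: the covariance mixes direct and indirect dependencies.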
To learn the graph structure of a multivariate Gaussian graphical model, either [[L-1 regularization]] or [[neighborhood selection]] algorithms can be used. These algorithms simultaneously learn the graph structure and the edge strengths of the connected nodes. An edge strength corresponds to the potential function defined on the corresponding two-node [[Clique (graph theory)|clique]]. A training set of PDB structures is used to learn <math>\mu</math> and <math>\Sigma^{-1}</math>.
Once the model is learned, the same step as in the discrete case can be repeated to obtain the density functions at each node, and the analytical form can be used to calculate the free energy. Here, the [[Partition function (mathematics)|partition function]] already has a [[Closed-form expression|closed form]], so [[inference]], at least for Gaussian graphical models, is trivial. If an analytical form of the partition function is not available, [[particle filtering]] or [[expectation propagation]] can be used to approximate ''Z'', after which inference can be performed and the free energy calculated.
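To make the closed form concrete, the following sketch evaluates ln ''Z'' = (''n''/2) ln 2π + ½ ln |Σ| for a given covariance matrix. It assumes Σ is symmetric positive definite and obtains the log-determinant from a Cholesky factorisation, a common numerical choice rather than anything prescribed by the model. The function names and example matrix are hypothetical.

```python
import math

def log_det_spd(sigma):
    """Log-determinant of a symmetric positive-definite matrix,
    computed from its Cholesky factor L (sigma = L * L^T)."""
    n = len(sigma)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    # det(sigma) = prod(diag(L))^2, so log det = 2 * sum(log diag(L)).
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def log_partition(sigma):
    """ln Z for a multivariate Gaussian: (n/2) ln(2*pi) + (1/2) ln|Sigma|."""
    n = len(sigma)
    return 0.5 * n * math.log(2.0 * math.pi) + 0.5 * log_det_spd(sigma)

# Hypothetical covariance matrix for three continuous variables.
sigma = [
    [0.75, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 0.75],
]
log_z = log_partition(sigma)
```

Working with ln ''Z'' rather than ''Z'' avoids overflow or underflow when the number of variables is large.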
==External links==
* http://www.liebertonline.com/doi/pdf/10.1089/cmb.2007.0131
* https://web.archive.org/web/20110724225908/http://www.learningtheory.org/colt2008/81-Zhou.pdf
* {{cite journal|author1= Liu Y |author2= Carbonell J |author3= Gopalakrishnan V |year=2009|title= Conditional graphical models for protein structural motif recognition
|journal= J. Comput. Biol. | volume=16|pages= 639–57 |doi=10.1089/cmb.2008.0176 |pmid=19432536 |issue=5|hdl= 1721.1/62177 |s2cid= 7035106 |hdl-access=free }}
* [https://www.cs.cmu.edu/~jgc/publication/Predicting_Protein_Folds_ICML_2005.pdf Predicting Protein Folds with Structural Repeats Using a Chain Graph Model]
{{DEFAULTSORT:Graphical Models For Protein Structure}}