[[Graphical model]]s have become powerful frameworks for [[protein structure prediction]], [[protein–protein interaction]], and [[free energy]] calculations for protein structures. Using a graphical model to represent the protein structure allows many problems to be addressed, including secondary structure prediction, protein–protein interactions, protein–drug interactions, and free energy calculations.

There are two main approaches to using graphical models in protein structure modeling. The first approach uses discrete variables to represent the coordinates or [[dihedral angle]]s of the protein structure. The variables are originally continuous values; to transform them into discrete values, a discretization process is typically applied. The second approach uses continuous variables for the coordinates or dihedral angles.

==Discrete graphical models for protein structure==
[[Markov random field]]s, also known as undirected graphical models, are a common representation for this problem. Given an [[undirected graph]] ''G'' = (''V'', ''E''), a set of [[random variable]]s ''X'' = (''X''<sub>''v''</sub>)<sub>''v'' ∈ ''V''</sub> indexed by ''V'' form a Markov random field with respect to ''G'' if they satisfy the pairwise Markov property:

Any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables:

::<math>X_u \perp\!\!\!\perp X_v \mid X_{V \setminus \{u,v\}} \quad \text{if } \{u,v\} \notin E</math>

In the discrete model, the continuous variables are discretized into a set of favorable discrete values. If the variables of choice are [[dihedral angle]]s, the discretization is typically done by mapping each value to the corresponding [[rotamer]] conformation.
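As an illustrative sketch of this discretization step, the snippet below maps a continuous dihedral angle to its nearest discrete state. The three well centers are the canonical ''gauche+''/''trans''/''gauche−'' wells; treating them as the full rotamer set is a simplification, since real rotamer libraries are residue-specific and far richer.

```python
# Illustrative sketch of dihedral-angle discretization. The three well
# centers below are the canonical side-chain wells; a real rotamer
# library would be residue-specific and contain many more states.
ROTAMER_CENTERS = {"gauche+": -60.0, "trans": 180.0, "gauche-": 60.0}

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def to_rotamer(chi):
    """Map a continuous dihedral angle to the nearest discrete rotamer state."""
    return min(ROTAMER_CENTERS, key=lambda r: angular_distance(chi, ROTAMER_CENTERS[r]))

print(to_rotamer(-55.0))  # gauche+
print(to_rotamer(170.0))  # trans
```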
===Model===
Let ''X'' = {''X''<sub>''b''</sub>, ''X''<sub>''s''</sub>} be the random variables representing the entire protein structure. ''X''<sub>''b''</sub> can be represented by a set of 3-D coordinates of the [[Backbone chain|backbone]] atoms, or equivalently, by a sequence of [[bond length]]s and [[dihedral angle]]s. The probability of a particular [[Protein structure|conformation]] ''x'' can then be written as:

::<math>p(X = x|\Theta) = p(X_b = x_b)p(X_s = x_s|X_b,\Theta)</math>

where <math>\Theta</math> represents any parameters used to describe this model, including sequence information, temperature, etc. Frequently the backbone is assumed to be rigid with a known conformation, and the problem is then transformed into a side-chain placement problem. The structure of the graph is also encoded in <math>\Theta</math>; this structure shows which pairs of variables are conditionally independent. For example, the side-chain angles of two residues far apart can be independent given all other angles in the protein. To extract this structure, researchers use a distance threshold, and only pairs of residues within that threshold are considered connected (i.e., have an edge between them).
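A minimal sketch of this thresholding step, with made-up coordinates and an assumed 8&nbsp;Å cutoff (the threshold value is a modeling choice, not fixed by the method):

```python
import math

# Made-up example coordinates of one representative atom per residue.
coords = [
    (0.0, 0.0, 0.0),   # residue 0
    (5.0, 0.0, 0.0),   # residue 1: 5 A from residue 0 -> connected
    (20.0, 0.0, 0.0),  # residue 2: far from residues 0 and 1
    (22.0, 4.0, 0.0),  # residue 3: ~4.5 A from residue 2 -> connected
]

def contact_edges(coords, threshold=8.0):
    """Return edges (i, j), i < j, between residues closer than threshold."""
    return [
        (i, j)
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
        if math.dist(coords[i], coords[j]) <= threshold
    ]

print(contact_edges(coords))  # [(0, 1), (2, 3)]
```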
Given this representation, the probability of a particular side-chain conformation ''x''<sub>''s''</sub> given the backbone conformation ''x''<sub>''b''</sub> can be expressed as

:<math>p(X_s = x_s|X_b = x_b) = \frac{1}{Z} \prod_{c\in C(G)}\Phi_c (x_s^c,x_b^c)</math>
where ''C''(''G'') is the set of all cliques in ''G'', <math>\Phi</math> is a [[function (mathematics)|potential function]] defined over the variables, and ''Z'' is the [[Partition function (mathematics)|partition function]].
To completely characterize the MRF, it is necessary to define the potential function <math>\Phi</math>. To simplify, the cliques of a graph are usually restricted to cliques of size 2, which means the potential function is only defined over pairs of variables. In the [[Goblin System]], these pairwise functions are defined as

:<math>\Phi(x_s^{i_p},x_b^{j_q}) = \exp\left(-E(x_s^{i_p},x_b^{j_q})/k_BT\right)</math>
where <math>E(x_s^{i_p},x_b^{j_q})</math> is the energy of interaction between rotamer state ''p'' of residue <math>X_i^s</math> and rotamer state ''q'' of residue <math>X_j^s</math>, and <math>k_B</math> is the [[Boltzmann constant]].
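A toy numeric sketch of this construction, for two residues with two rotamer states each and made-up interaction energies (in units of ''k''<sub>''B''</sub>''T''), enumerating the partition function exactly — something only feasible for very small state spaces:

```python
import itertools
import math

# Made-up pairwise interaction energies E(p, q) in units of k_B*T = 1.
energy = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.5}

def phi(p, q):
    """Boltzmann pairwise potential: Phi = exp(-E / (k_B * T))."""
    return math.exp(-energy[(p, q)])

states = list(itertools.product((0, 1), repeat=2))
Z = sum(phi(p, q) for p, q in states)  # exact partition function

def prob(p, q):
    """Normalized probability of the joint rotamer assignment (p, q)."""
    return phi(p, q) / Z

assert abs(sum(prob(p, q) for p, q in states) - 1.0) < 1e-12
print(max(states, key=lambda s: prob(*s)))  # the lowest-energy state
```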
Using a PDB file, this model can be built over the protein structure. From this model, free energy can be calculated.
===Free energy calculation: belief propagation===
It has been shown that the free energy of a system is calculated as

:<math>G=E-TS</math>
where ''E'' is the enthalpy of the system, ''T'' the temperature, and ''S'' the entropy. Now, if we associate a probability ''p''(''x'') with each conformation ''x'' and use the Gibbs entropy <math>S = -k_B\sum_x p(x)\ln p(x)</math> (in units where <math>k_B = 1</math>), ''G'' can be rewritten as

:<math>G=\sum_{x}p(x)E(x)+T\sum_x p(x)\ln p(x)</math>
Calculating ''p''(''x'') on discrete graphs is done with the [[generalized belief propagation]] algorithm. This algorithm calculates an [[approximation]] to the probabilities, and it is not guaranteed to converge to a final value set. However, in practice, it has been shown to converge successfully in many cases.
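On graphs small enough to enumerate (unlike realistic proteins, where generalized belief propagation is needed), ''p''(''x'') and the free energy can be computed exactly. The sketch below uses made-up energies and checks that, for the Boltzmann distribution, the free-energy expression reduces to ''G'' = −''T'' ln ''Z'':

```python
import math

# Exact free energy of a tiny discrete system by enumeration.
# Energies are made-up values in units where k_B = 1.
energies = [0.0, 1.0, 2.0]
T = 1.0

Z = sum(math.exp(-e / T) for e in energies)          # partition function
p = [math.exp(-e / T) / Z for e in energies]          # Boltzmann probabilities

mean_E = sum(pi * ei for pi, ei in zip(p, energies))  # <E>
entropy = -sum(pi * math.log(pi) for pi in p)         # Gibbs entropy
G = mean_E - T * entropy                              # G = <E> - T*S

# For the Boltzmann distribution, G equals -T * ln(Z).
assert abs(G - (-T * math.log(Z))) < 1e-12
print(round(G, 4))
```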
==Continuous graphical models for protein structures==
Graphical models can still be used when the variables of choice are continuous. In these cases, the probability distribution is represented as a multivariate probability distribution over continuous variables. Each family of distributions then imposes certain properties on the graphical model. A [[multivariate Gaussian distribution]] is one of the most convenient choices, due to the simple form of its density and its direct relation to the corresponding graphical model.

===Gaussian graphical models of protein structures===
[[Gaussian graphical model]]s are multivariate probability distributions encoding a network of dependencies among variables. Let <math>\Theta = [\theta_1, \theta_2, \dots, \theta_n]</math> be a set of variables, such as [[dihedral angle]]s, and let <math>f(\Theta = D)</math> be the value of the [[probability density function]] at a particular value ''D''. A multivariate Gaussian graphical model defines this probability as follows:
:<math>f(\Theta=D) = \frac{1}{Z} \exp\left\{-\frac{1}{2}(D-\mu)^T\Sigma^{-1}(D-\mu)\right\}</math>
where <math>Z = (2\pi)^{n/2}|\Sigma|^{1/2}</math> is the closed form for the [[Partition function (mathematics)|partition function]], <math>\mu</math> is the mean, and <math>\Sigma</math> is the covariance matrix of the distribution.
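A small sketch evaluating this density for ''n'' = 2, with arbitrary example values for <math>\mu</math> and <math>\Sigma</math>, using the closed-form ''Z'':

```python
import math

# Arbitrary example mean and covariance (not fitted to any protein data).
mu = (0.0, 1.0)
Sigma = ((1.0, 0.3), (0.3, 2.0))

def density(D, mu, Sigma):
    """Bivariate Gaussian density with closed-form Z = (2*pi)^(n/2)*|Sigma|^(1/2)."""
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # 2x2 matrix inverse
    dx = (D[0] - mu[0], D[1] - mu[1])
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    Z = (2 * math.pi) ** (len(mu) / 2) * math.sqrt(det)
    return math.exp(-0.5 * quad) / Z

# The density peaks at the mean, where it equals 1/Z.
print(density(mu, mu, Sigma))
```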
To learn the graph structure as a multivariate Gaussian graphical model, we can use either [[L-1 regularization]] or neighborhood-selection algorithms. These algorithms simultaneously learn a graph structure and the edge strengths of the connected nodes.
{{No footnotes|date=August 2010}}
Once the model is learned, we can repeat the same step as in the discrete case to get the density functions at each node and use the analytical form to calculate the free energy. Here, the [[Partition function (mathematics)|partition function]] already has a [[Closed-form expression|closed form]], so the [[Statistical inference|inference]], at least for Gaussian graphical models, is trivial. If the analytical form of the partition function is not available, [[particle filter]]ing or [[expectation propagation]] can be used to approximate ''Z'' before performing the inference and calculating the free energy.
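As a sketch of why inference is easy here: with an arbitrary example covariance, both ln&nbsp;''Z'' and the differential entropy of the Gaussian have closed forms, so no iterative approximation algorithm is required:

```python
import math

# Arbitrary 2x2 example covariance matrix.
Sigma = ((1.0, 0.3), (0.3, 2.0))
n = len(Sigma)
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]

# Closed-form log partition function: ln Z = (n/2) ln(2*pi) + (1/2) ln|Sigma|.
log_Z = 0.5 * n * math.log(2 * math.pi) + 0.5 * math.log(det)

# Closed-form differential entropy: S = (1/2) ln((2*pi*e)^n * |Sigma|).
entropy = 0.5 * math.log((2 * math.pi * math.e) ** n * det)

# For a Gaussian, entropy = ln(Z) + n/2, a useful consistency check.
assert abs(entropy - (log_Z + n / 2)) < 1e-12
print(round(log_Z, 4), round(entropy, 4))
```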
==References==
<!--- See http://en.wikipedia.org/wiki/Wikipedia:Footnotes on how to create references using <ref></ref> tags which will then appear here automatically -->
* Free Energy Estimates of All-atom Protein Structures Using Generalized Belief Propagation, Hetunandan Kamisetty, Eric P. Xing, Christopher J. Langmead, RECOMB 2008
==External links==
* http://www.liebertonline.com/doi/pdf/10.1089/cmb.2007.0131
* https://web.archive.org/web/20110724225908/http://www.learningtheory.org/colt2008/81-Zhou.pdf
* {{cite journal|author1= Liu Y |author2= Carbonell J |author3= Gopalakrishnan V |year=2009|title= Conditional graphical models for protein structural motif recognition
|journal= J. Comput. Biol. | volume=16|pages= 639–57 |doi=10.1089/cmb.2008.0176 |pmid=19432536 |issue=5|hdl= 1721.1/62177 |s2cid= 7035106 |hdl-access=free }}
* [https://www.cs.cmu.edu/~jgc/publication/Predicting_Protein_Folds_ICML_2005.pdf Predicting Protein Folds with Structural Repeats Using a Chain Graph Model]
[[Category:Protein methods]]
[[Category:Computational chemistry]]