m Date maintenance tags and general fixes: build 473:, removed orphan tag
m Tracking category removed (listified)
Line 1:
{{No footnotes|date=June 2010}}
[[Graphical model]]s have become powerful frameworks for [[protein structure prediction]], [[protein–protein interaction]]s, and [[free energy]] calculations for protein structures. Representing a protein structure as a graphical model makes it possible to address many problems, including secondary structure prediction, protein–protein interactions, protein–drug interactions, and free energy calculations.
There are two main approaches to using graphical models in protein structure modeling. The first uses [[discrete]] variables to represent the coordinates or [[dihedral angle]]s of the protein structure. Because these quantities are inherently continuous, a discretization step is typically applied to map them to discrete values. The second approach uses [[continuous]] variables for the coordinates or dihedral angles.
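The discretization step in the first approach can be illustrated with a minimal sketch. The uniform grid of 12 bins, the function names, and the sample angles are all assumptions made for illustration. The text does not say which discretization scheme is used in practice, and a uniform grid is only one possibility.

```python
def discretize_angle(angle_deg, n_bins=12):
    """Map a dihedral angle in degrees to a bin index in [0, n_bins).

    Assumption: uniform bins covering [-180, 180); n_bins=12 is arbitrary.
    """
    width = 360.0 / n_bins
    # Shift and wrap so that equivalent angles (e.g. 190 and -170) share a bin.
    shifted = (angle_deg + 180.0) % 360.0
    return min(int(shifted // width), n_bins - 1)


def bin_center(index, n_bins=12):
    """Return the representative angle (bin midpoint) for a bin index."""
    width = 360.0 / n_bins
    return -180.0 + (index + 0.5) * width


# Hypothetical side-chain dihedral angles in degrees.
chi_angles = [-65.0, 178.0, 62.5, -179.9]
states = [discretize_angle(a) for a in chi_angles]
```

Each continuous angle is replaced by a bin index, and `bin_center` recovers a representative angle for that discrete state.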
==Discrete graphical models for protein structure==
[[Markov random field]]s, also known as undirected graphical models, are common representations for this problem. Given an [[undirected graph]] ''G'' = (''V'', ''E''), a set of [[random variable]]s ''X'' = (''X''<sub>''v''</sub>)<sub>''v'' ∈ ''V''</sub> indexed by ''V'' forms a Markov random field with respect to ''G'' if the variables satisfy the pairwise Markov property:
*any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables:
Line 19 ⟶ 18:
:<math>p(X = x|\Theta) = p(X_b = x_b)p(X_s = x_s|X_b,\Theta), \,</math>
where <math>\Theta</math> represents any parameters used to describe this model, including sequence information, temperature, etc. Frequently the backbone is assumed to be rigid with a known conformation, and the problem then reduces to a side-chain placement problem. The structure of the graph is also encoded in <math>\Theta</math>. This structure specifies which pairs of variables are conditionally independent. For example, the side-chain angles of two residues that are far apart can be independent given all other angles in the protein. To extract this structure, researchers use a distance threshold, and only pairs of residues that lie within that threshold are considered connected (i.e. have an edge between them).
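The distance-threshold construction can be sketched as follows. The coordinates, the 8.0 cutoff, the use of a single representative point per residue, and the function name `contact_graph` are illustrative assumptions; the text does not specify any of them.

```python
import math
from itertools import combinations


def contact_graph(coords, threshold):
    """Return the set of edges (i, j), i < j, between residues whose
    representative points lie within `threshold` of each other."""
    edges = set()
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            edges.add((i, j))
    return edges


# Hypothetical coordinates, one representative point per residue.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (20.0, 0.0, 0.0),
]
edges = contact_graph(coords, threshold=8.0)
```

Here the fourth residue lies beyond the cutoff from all others, so it has no edges, and its variables are treated as independent of the rest in the resulting model.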
Given this representation, the probability of a particular side chain conformation ''x''<sub>''s''</sub> given the backbone conformation ''x''<sub>''b''</sub> can be expressed as
Line 27 ⟶ 26:
where ''C''(''G'') is the set of all cliques in ''G'', <math>\Phi</math> is a [[potential function]] defined over the variables, and ''Z'' is the [[partition function]].
To completely characterize the MRF, it is necessary to define the potential function <math>\Phi</math>. For simplicity, the cliques of the graph are usually restricted to cliques of size 2, which means that the potential function is defined only over pairs of variables. In [[Goblin System]], these pairwise functions are defined as
:<math>\Phi(x_s^{i_p},x_b^{j_q}) = \exp ( -E(x_s^{i_p},x_b^{j_q})/K_BT)</math>
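A pairwise potential of this Boltzmann form can be evaluated as in the sketch below. The energy values, the choice of kcal/mol units, the default temperature, and the function name are assumptions for illustration, not part of the system described above.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B = 0.0019872041


def pairwise_potential(energy, temperature=300.0):
    """Return exp(-E / (k_B * T)) for an interaction energy E in kcal/mol."""
    return math.exp(-energy / (K_B * temperature))


# Hypothetical interaction energies for pairs of discrete states (p, q).
energies = {(0, 0): -1.2, (0, 1): 0.5, (1, 0): 0.0, (1, 1): 2.0}
potentials = {pair: pairwise_potential(e) for pair, e in energies.items()}
```

Favorable (negative) energies give potentials greater than 1 and unfavorable energies give potentials between 0 and 1, so low-energy state pairs receive more probability mass.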
Line 36 ⟶ 35:
===Free energy calculation: belief propagation===
The free energy of a system is given by
:<math>G=E-TS</math>
where ''E'' is the enthalpy of the system, ''T'' the temperature, and ''S'' the entropy. If a probability ''p''(''x'') is associated with each state (conformation) ''x'' of the system, and the entropy is written as <math>S=-\sum_x p(x)\ln(p(x))</math> (in units where the Boltzmann constant is 1), ''G'' can be rewritten as
:<math>G=\sum_{x}p(x)E(x)+T\sum_xp(x)\ln(p(x)) \,</math>
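For a very small state space, the free energy can be evaluated directly from ''G'' = ''E'' − ''TS'' by enumerating every state. The sketch below does this by brute force; it is not belief propagation, which is what makes the calculation tractable for real proteins. It assumes the Gibbs entropy ''S'' = −Σ ''p'' ln ''p'' with the Boltzmann constant set to 1, and the three energies are hypothetical.

```python
import math


def boltzmann_distribution(energies, temperature):
    """Return (probabilities, partition function Z), with k_B = 1."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z


def free_energy(probs, energies, temperature):
    """Evaluate G = E - T*S with E = sum p(x) E(x) and S = -sum p(x) ln p(x)."""
    average_energy = sum(p * e for p, e in zip(probs, energies))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return average_energy - temperature * entropy


# Hypothetical energies of three conformations, in units where k_B = 1.
energies = [0.0, 1.0, 2.5]
T = 1.0
probs, z = boltzmann_distribution(energies, T)
g = free_energy(probs, energies, T)
```

When ''p'' is the Boltzmann distribution, the result agrees with −''T'' ln ''Z'', and any other distribution over the same states gives a value at least as large.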
Line 48 ⟶ 46:
==Continuous graphical models for protein structures==
Graphical models can still be used when the variables of choice are continuous. In these cases, the probability distribution is represented as a [[multivariate probability distribution]] over continuous variables. Each family of distributions then imposes certain properties on the graphical model. The [[multivariate Gaussian distribution]] is one of the most convenient distributions for this problem. Its simple probability density and its direct relation to the corresponding graphical model make it a popular choice among researchers.
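The direct relation between a multivariate Gaussian and its graph can be illustrated with a small sketch. For a Gaussian, two variables are conditionally independent given all the others exactly when the corresponding entry of the inverse covariance (precision) matrix is zero, so the nonzero off-diagonal entries give the edges. The 3×3 covariance matrix, the tolerance, and the helper names below are assumptions for illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial pivoting (assumes the matrix is non-singular)."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [float(i == j) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edges_from_precision(precision, tol=1e-9):
    """Edges (i, j), i < j, where the precision entry is nonzero."""
    n = len(precision)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(precision[i][j]) > tol
    }


# Hypothetical covariance of three variables arranged in a chain 0 - 1 - 2.
covariance = [
    [0.75, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 0.75],
]
precision = invert(covariance)
edges = edges_from_precision(precision)
```

Variables 0 and 2 are correlated (their covariance is nonzero), yet the precision entry between them vanishes, so the graph has no edge between them: they are independent once variable 1 is given.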
Line 63 ⟶ 60:
{{No footnotes|date=August 2010}}
==
<!--- See http://en.wikipedia.org/wiki/Wikipedia:Footnotes on how to create references using <ref></ref> tags which will then appear here automatically -->
Line 69 ⟶ 67:
* Hetunandan Kamisetty, Eric P. Xing, Christopher J. Langmead. Free Energy Estimates of All-atom Protein Structures Using Generalized Belief Propagation. RECOMB 2008.
==
* http://www.liebertonline.com/doi/pdf/10.1089/cmb.2007.0131
* http://www.learningtheory.org/colt2008/81-Zhou.pdf
Line 76 ⟶ 74:
{{DEFAULTSORT:Graphical Models For Protein Structure}}
[[Category:Graphical models]]
[[Category:Protein methods]]