{{nofootnotes|date=June 2010}}
{{orphan|date=June 2010}}
[[Graphical model]]s have become powerful frameworks for [[protein structure prediction]], [[protein–protein interaction]] modeling, and free energy calculations for protein structures.
There are two main approaches to using graphical models in protein structure modeling: the first represents the structure with discrete variables, such as rotamer states, while the second uses continuous random variables, such as dihedral angles.
==Discrete graphical models for protein structure==
A [[Markov random field]], also known as an undirected graphical model, is a common representation for this problem. Given an [[undirected graph]] ''G'' = (''V'', ''E''), a set of [[random variable]]s ''X'' = (''X''<sub>''v''</sub>)<sub>''v'' ∈ ''V''</sub> indexed by ''V'' form a Markov random field with respect to ''G'' if they satisfy the pairwise Markov property:
Any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables:
:<math>X_u \perp X_v \mid X_{V \setminus \{u, v\}} \quad \text{if } \{u, v\} \notin E.</math>
In the discrete model, the continuous variables are discretized into a set of favorable discrete values. If the variables of choice are [[dihedral angle]]s, the discretization is typically done by mapping each value to the corresponding [[rotamer]] conformation.
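The discretization step above can be sketched as follows. This is a minimal illustration, not any particular rotamer library: the three representative χ<sub>1</sub> centers and the function name are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical rotamer centers: the three staggered chi1 conformations
# (degrees) often used to discretize a side-chain dihedral angle.
ROTAMER_CENTERS = np.array([-60.0, 60.0, 180.0])  # gauche-, gauche+, trans

def discretize_dihedral(angle_deg):
    """Map a continuous dihedral angle to the index of the nearest rotamer,
    accounting for the 360-degree periodicity of angles."""
    diff = np.abs((ROTAMER_CENTERS - angle_deg + 180.0) % 360.0 - 180.0)
    return int(np.argmin(diff))
```

The periodic difference ensures that, for example, −170° is mapped to the trans rotamer at 180° rather than treated as far from it.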
===Model===
Let ''X'' = {''X''<sub>''b''</sub>, ''X''<sub>''s''</sub>} be the random variables representing the entire protein structure. ''X''<sub>''b''</sub> can be represented by a set of 3-dimensional coordinates of the [[backbone]] atoms, or equivalently, by a sequence of [[bond length]]s and [[dihedral angle]]s. The probability of a particular [[conformational isomerism|conformation]] ''x'' can then be written as:
:<math>p(X = x \mid \Theta),</math>
where <math>\Theta</math> represents any parameters used to describe this model, including sequence information, temperature, etc. Frequently the backbone is assumed to be rigid with a known conformation, and the problem is then transformed into a side-chain placement problem. The structure of the graph is also encoded in <math>\Theta</math>; this structure shows which pairs of variables are conditionally independent. For example, the side-chain angles of two residues far apart can be independent given all other angles in the protein. To extract this structure, researchers use a distance [[threshold]], and only pairs of residues within that threshold are considered connected (i.e., have an edge between them).
Given this representation, the probability of a particular side chain conformation ''x''<sub>''s''</sub> given the backbone conformation ''x''<sub>''b''</sub> can be expressed as
:<math>p(X_s = x_s|X_b = x_b) = \frac{1}{Z} \prod_{c\in C(G)}\Phi_c (x_s^c,x_b^c)</math>
where ''C''(''G'') is the set of all cliques in ''G'', <math>\Phi</math> is a [[potential function]] defined over the variables, and ''Z'' is the [[partition function (statistical mechanics)|partition function]].
To completely characterize the MRF, it is necessary to define the potential function <math>\Phi</math>. To simplify, the cliques of a graph are usually restricted to cliques of size 2, which means the potential function is defined only over pairs of variables. In the [[Goblin System]], these pairwise functions are defined as
:<math>\Phi(x_s^{i_p},x_s^{j_q}) = \exp(-E(x_s^{i_p},x_s^{j_q})/k_B T)</math>
where <math>E(x_s^{i_p},x_s^{j_q})</math> is the energy of interaction between rotamer state ''p'' of residue <math>X_s^i</math> and rotamer state ''q'' of residue <math>X_s^j</math>, <math>k_B</math> is the [[Boltzmann constant]], and ''T'' is the temperature.
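The pairwise Boltzmann potential and the resulting normalized distribution can be illustrated on a toy system. The two-residue model, the interaction-energy table, and the unit choice (kcal/mol) below are assumptions made for the sketch, not values from any real force field.

```python
import itertools
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K), a common unit choice

def pairwise_potential(energy, temperature=300.0):
    """Phi = exp(-E / (k_B * T)) for one pair of rotamer states."""
    return np.exp(-energy / (K_B * temperature))

# Toy model: two residues with two rotamers each, and a hypothetical
# interaction-energy table E[p, q] in kcal/mol.
E = np.array([[0.0, 1.0],
              [1.0, 0.5]])

# Unnormalized weights over all joint rotamer assignments, then the
# partition function Z and the normalized probabilities.
weights = {(p, q): pairwise_potential(E[p, q])
           for p, q in itertools.product(range(2), range(2))}
Z = sum(weights.values())
prob = {state: w / Z for state, w in weights.items()}
```

Lower-energy joint assignments receive higher probability, and dividing by ''Z'' makes the probabilities sum to one, mirroring the factorized form above.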
Using a [[Protein Data Bank (file format)|PDB file]], this model can be built over the protein structure. From this model, the free energy can be calculated.
===Free energy calculation: belief propagation===
The free energy of a system is given by
:<math>G=E-TS</math>
where ''E'' is the enthalpy of the system, ''T'' the temperature, and ''S'' the entropy. Now, if we associate a probability with each state of the system (''p''(''x'') for each conformation value ''x''), ''G'' can be rewritten as
:<math>G=\sum_{x}p(x)E(x)-T\sum_xp(x)\ln(p(x)) \,</math>
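The rewritten expression for ''G'' is straightforward to evaluate once a discrete distribution is in hand; a minimal sketch (the function name and the example distribution are assumptions):

```python
import numpy as np

def free_energy(p, E, T):
    """G = sum_x p(x) E(x) - T * sum_x p(x) ln p(x):
    the mean energy minus the temperature times the Shannon entropy."""
    p = np.asarray(p, dtype=float)
    E = np.asarray(E, dtype=float)
    mean_energy = np.sum(p * E)
    entropy = -np.sum(p * np.log(p))
    return mean_energy - T * entropy
```

For a uniform two-state system with equal energies ''E''(''x'') = 1, this gives ''G'' = 1 − ''T'' ln 2: at positive temperature the entropy term lowers the free energy below the mean energy.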
Calculating ''p''(''x'') on discrete graphs is done with the [[belief propagation|generalized belief propagation]] algorithm, which computes an approximation to the marginal probabilities.
==Continuous graphical models for protein structures==
Graphical models can still be used when the variables of choice are continuous. In these cases, the probability distribution is represented as a [[multivariate probability distribution]] over continuous variables. Each family of distributions then imposes certain properties on the graphical model; the [[multivariate Gaussian distribution]] is one of the most convenient choices because of its simple form and its direct relation to the corresponding graphical model.
===Gaussian graphical models of protein structures===
Gaussian graphical models are multivariate probability distributions encoding a network of dependencies among variables. Let <math>\Theta=[\theta_1, \theta_2, \ldots, \theta_n]</math> be a set of <math>n</math> variables, such as <math>n</math> [[dihedral angle]]s, and let <math>f(\Theta=D)</math> be the value of the [[probability density function]] at a particular value ''D''. A multivariate Gaussian distribution over these variables is defined as:
:<math>f(\Theta=D) = \frac{1}{Z} \exp\{-\frac{1}{2}(D-\mu)^T\Sigma^{-1}(D-\mu)\}</math>
where <math>Z = (2\pi)^{n/2}|\Sigma|^{1/2}</math> is the closed form of the [[partition function (mathematics)|partition function]]. The parameters of this distribution are <math>\mu</math> and <math>\Sigma</math>: <math>\mu</math> is the vector of [[mean]] values of each variable, and <math>\Sigma^{-1}</math>, the inverse of the [[covariance matrix]], is also known as the [[precision matrix]]. The precision matrix contains the pairwise dependencies between the variables: a zero value in <math>\Sigma^{-1}</math> means that, conditioned on the values of the other variables, the two corresponding variables are independent of each other.
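The correspondence between zeros in the precision matrix and conditional independence can be checked numerically. The 3-variable chain model below is a hypothetical example (the precision matrix and function name are assumptions), with θ<sub>1</sub> and θ<sub>3</sub> conditionally independent given θ<sub>2</sub>:

```python
import numpy as np

def gaussian_density(d, mu, precision):
    """Evaluate the multivariate Gaussian density f(Theta = d), working
    directly with the precision matrix Sigma^{-1}."""
    n = len(mu)
    diff = np.asarray(d, dtype=float) - np.asarray(mu, dtype=float)
    # |Sigma|^{-1/2} = |Sigma^{-1}|^{1/2}, so 1/Z in terms of the precision:
    norm = np.sqrt(np.linalg.det(precision)) / (2.0 * np.pi) ** (n / 2.0)
    return norm * np.exp(-0.5 * diff @ precision @ diff)

# Hypothetical chain model theta1 - theta2 - theta3: the zero at entry
# (0, 2) encodes conditional independence of theta1 and theta3 given theta2.
P = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
mu = np.zeros(3)
```

Note that the covariance matrix Σ = P<sup>−1</sup> has a nonzero (1, 3) entry even though P does not: θ<sub>1</sub> and θ<sub>3</sub> are marginally dependent, but independent once θ<sub>2</sub> is given.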
To learn the graph structure as a multivariate Gaussian graphical model, we can use either [[Lasso (statistics)|''L''<sub>1</sub> regularization]] or neighborhood selection algorithms. These algorithms simultaneously learn a graph structure and the edge strengths of the connected nodes; an edge strength corresponds to the potential function defined on the corresponding two-node [[clique (graph theory)|clique]]. A training set of PDB structures is used to learn <math>\mu</math> and <math>\Sigma^{-1}</math>.
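The estimation step can be sketched on synthetic data. This is not the ''L''<sub>1</sub>-regularized estimator itself: inverting the empirical covariance and thresholding small precision entries is a crude stand-in for it, and the chain-structured "true" model, sample size, and threshold below are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: samples from a chain-structured Gaussian,
# standing in for dihedral angles measured across many PDB structures.
true_precision = np.array([[ 2.0, -1.0,  0.0],
                           [-1.0,  2.0, -1.0],
                           [ 0.0, -1.0,  2.0]])
cov = np.linalg.inv(true_precision)
X = rng.multivariate_normal(np.zeros(3), cov, size=5000)

# Learn mu and Sigma^{-1} from the data; thresholding small entries of the
# empirical precision is a crude substitute for graphical-lasso sparsity.
mu_hat = X.mean(axis=0)
precision_hat = np.linalg.inv(np.cov(X, rowvar=False))
edges = np.abs(precision_hat) > 0.3  # hypothetical sparsity threshold
```

With enough samples, the recovered edge set matches the chain structure: edges θ<sub>1</sub>–θ<sub>2</sub> and θ<sub>2</sub>–θ<sub>3</sub> are kept, while the θ<sub>1</sub>–θ<sub>3</sub> entry stays below the threshold.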
Once the model is learned, we can repeat the same steps as in the discrete case to obtain the density functions at each node, and use the analytical form to calculate the free energy.
== References ==
<!--- Categories --->
[[Category:Articles created via the Article Wizard]]
[[Category:Graphical models]]
[[Category:Protein methods]]
[[Category:Computational chemistry]]