Graphical models for protein structure

[[Graphical model]]s have become powerful frameworks for [[protein structure prediction]], [[protein–protein interaction]] and [[Thermodynamic free energy|free energy]] calculations for protein structures. Using a graphical model to represent the protein structure allows the solution of many problems, including secondary structure prediction, protein–protein interactions, protein–drug interactions, and free energy calculations.
 
There are two main approaches to using graphical models in protein structure modeling. The first approach uses [[Discrete mathematics|discrete]] variables to represent the coordinates or [[dihedral angle]]s of the protein structure. These variables are all continuous in origin, so a discretization process is typically applied to transform them into discrete values. The second approach uses [[continuous]] variables for the coordinates or dihedral angles.
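The discretization step in the first approach can be sketched as follows. This is a minimal illustration with made-up angle values and an assumed bin width of 10°; real models choose bin counts and boundaries to suit the data.

```python
import numpy as np

# Hypothetical backbone dihedral angles (degrees) in the range [-180, 180)
dihedrals = np.array([-57.8, -47.0, 120.5, -135.2])

def discretize(angles_deg, n_bins=36):
    """Map continuous angles to integer bin labels (10-degree bins here)."""
    edges = np.linspace(-180.0, 180.0, n_bins + 1)
    # np.digitize returns 1-based bin indices; shift to 0-based labels
    return np.digitize(angles_deg, edges) - 1

labels = discretize(dihedrals)  # each angle becomes a discrete state
```

Each continuous angle is now a discrete random variable with 36 possible states, which is what the discrete graphical models below operate on.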
 
==Discrete graphical models for protein structure==
:<math>p(X_s = x_s|X_b = x_b) = \frac{1}{Z} \prod_{c\in C(G)}\Phi_c (x_s^c,x_b^c)</math>
 
where ''C''(''G'') is the set of all cliques in ''G'', <math>\Phi</math> is a [[potential function]] defined over the variables, and ''Z'' is the [[partition function (mathematics)|partition function]].
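The factorization above can be made concrete on a toy discrete MRF. This sketch uses a 3-node chain with binary variables and made-up pairwise potential tables; ''Z'' is computed by brute-force summation, which is only feasible for tiny graphs.

```python
import itertools
import numpy as np

# Made-up pairwise potentials on the two cliques of a 3-node chain 1-2-3
phi_12 = np.array([[2.0, 1.0], [1.0, 3.0]])  # potential on clique {1,2}
phi_23 = np.array([[1.0, 2.0], [2.0, 1.0]])  # potential on clique {2,3}

def unnormalized(x):
    """Product of clique potentials for one joint assignment x."""
    x1, x2, x3 = x
    return phi_12[x1, x2] * phi_23[x2, x3]

# Partition function Z: sum of the clique-potential product over all states
states = list(itertools.product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)
p = {x: unnormalized(x) / Z for x in states}  # normalized distribution
```

Dividing by ''Z'' makes the potential products sum to one, which is exactly the role the partition function plays in the equation above.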
 
To characterize the MRF completely, it is necessary to define the potential function <math>\Phi</math>. To simplify, the cliques of a graph are usually restricted to cliques of size 2, so that the potential function is defined only over pairs of variables. In the [[Goblin System]], these pairwise functions are defined as
To learn the graph structure as a multivariate Gaussian graphical model, we can use either [[L-1 regularization]] or [[neighborhood selection]] algorithms. These algorithms simultaneously learn a graph structure and the edge strengths of the connected nodes; an edge strength corresponds to the potential function defined on the corresponding two-node [[clique]]. We use a training set of PDB structures to learn <math>\mu</math> and <math>\Sigma^{-1}</math>.
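The estimation step can be sketched on synthetic data. This is a simplified stand-in: it inverts the empirical covariance directly, whereas the L-1 regularized (graphical lasso) or neighborhood selection approaches described above would instead produce a sparse precision matrix whose nonzero entries define the graph edges; the data here are random, not real PDB structures.

```python
import numpy as np

# Synthetic "training set": 500 samples of 4 features (stand-in for
# structural variables extracted from PDB entries)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))

mu = X.mean(axis=0)                # empirical mean vector
cov = np.cov(X, rowvar=False)      # empirical covariance Sigma
precision = np.linalg.inv(cov)     # Sigma^{-1}; entry (i,j) ~ edge strength
```

In a sparse estimate, a zero entry in <math>\Sigma^{-1}</math> means the two variables are conditionally independent given the rest, i.e. no edge between those nodes.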
 
Once the model is learned, we can repeat the same steps as in the discrete case to obtain the density functions at each node and use the analytical form to calculate the free energy. Here the [[Partition function (mathematics)|partition function]] already has a [[Closed-form expression|closed form]], so [[inference]], at least for Gaussian graphical models, is trivial. If the analytical form of the partition function is not available, [[particle filtering]] or [[expectation propagation]] can be used to approximate ''Z'', and then perform inference and calculate the free energy.
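The closed form referred to above is the standard Gaussian normalization constant: for a zero-mean Gaussian with precision matrix <math>\Lambda = \Sigma^{-1}</math> in ''d'' dimensions, <math>\log Z = \tfrac{d}{2}\log(2\pi) - \tfrac{1}{2}\log\det\Lambda</math>. A minimal numerical check, with a made-up 2×2 precision matrix:

```python
import numpy as np

# Toy precision matrix Lambda = Sigma^{-1} (values are illustrative)
Lambda = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
d = Lambda.shape[0]

# Closed-form log partition function of the Gaussian:
# log Z = (d/2) log(2*pi) - (1/2) log det(Lambda)
sign, logdet = np.linalg.slogdet(Lambda)  # numerically stable log-determinant
log_Z = 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet
```

Because <math>Z</math> is available exactly, free-energy quantities that depend on <math>\log Z</math> can be evaluated without the sampling-based approximations needed in the discrete case.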
 