{{short description|Probabilistic model}}
{{about|the representation of probability distributions using graphs|the computer graphics journal|Graphical Models{{!}}''Graphical Models''}}
{{Machine learning|Structured prediction}}
{{More footnotes|date=May 2017}}
A '''graphical model''', '''probabilistic graphical model''' ('''PGM'''), or '''structured probabilistic model''' is a [[probabilistic model]] for which a [[Graph (discrete mathematics)|graph]] expresses the [[conditional dependence]] structure between [[random variable]]s. Graphical models are commonly used in [[probability theory]], [[statistics]]—particularly [[Bayesian statistics]]—and [[machine learning]].
 
==Types of graphical models==
Generally, probabilistic graphical models use a graph-based representation as the foundation for encoding a distribution over a multi-dimensional space; the graph serves as a compact or [[Factor graph|factorized]] representation of the set of independences that hold in the specific distribution. Two branches of graphical representations of distributions are commonly used, namely, [[Bayesian network]]s and [[Markov random field]]s. Both families encompass the properties of factorization and independences, but they differ in the set of independences they can encode and the factorization of the distribution that they induce.<ref name=koller09>{{cite book
|author=Koller, D.
|author2=Friedman, N.
|author-link=Daphne Koller
|author2-link=Nir Friedman
|title=Probabilistic Graphical Models: Principles and Techniques
|publisher=MIT Press
|year=2009
|pages=1208
|isbn=978-0-262-01319-2
}}</ref>
 
===Undirected graphical model===
 
[[File:Examples of an Undirected Graph.svg|thumb|alt=An undirected graph with four vertices.|An undirected graph with four vertices]]
 
The undirected graph shown may have one of several interpretations; the common feature is that the presence of an edge implies some sort of dependence between the corresponding random variables. From this graph, we might deduce that B, C, and D are all [[Conditional independence|conditionally independent]] given A. This means that if the value of A is known, then the values of B, C, and D provide no further information about each other. Equivalently (in this case), the joint probability distribution can be factorized as:
 
:<math>P[A,B,C,D] = f_{AB}[A,B] \cdot f_{AC}[A,C] \cdot f_{AD}[A,D]</math>
 
for some non-negative functions <math>f_{AB}, f_{AC}, f_{AD}</math>.
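Indeed, conditioning on <math>A</math> splits this factorization into independent pieces:

:<math>P[B,C,D \mid A] = \frac{f_{AB}[A,B]\,f_{AC}[A,C]\,f_{AD}[A,D]}{\left(\sum_b f_{AB}[A,b]\right)\left(\sum_c f_{AC}[A,c]\right)\left(\sum_d f_{AD}[A,d]\right)},</math>

which, for each fixed value of <math>A</math>, is a product of a function of <math>B</math> alone, a function of <math>C</math> alone, and a function of <math>D</math> alone; this is exactly the statement that <math>B</math>, <math>C</math>, and <math>D</math> are conditionally independent given <math>A</math>.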
 
===Bayesian network===
{{main|Bayesian network}}
 
[[File:Example of a Directed Graph.svg|thumb|alt=Example of a directed acyclic graph on four vertices.|Example of a directed acyclic graph on four vertices]]
 
 
If the network structure of the model is a [[directed acyclic graph]], the model represents a factorization of the joint [[probability]] of all random variables. More precisely, if the random variables are <math>X_1,\ldots,X_n</math>, then the joint probability satisfies
:<math>P[X_1,\ldots,X_n]=\prod_{i=1}^nP[X_i|\text{pa}(X_i)]</math>
 
where <math>\text{pa}(X_i)</math> is the set of parents of node <math>X_i</math> (nodes with edges directed towards <math>X_i</math>). In other words, the [[joint distribution]] factors into a product of conditional distributions. For example, for the directed acyclic graph shown in the Figure this factorization would be

:<math>P[A,B,C,D] = P[A]\cdot P[B | A]\cdot P[C | A] \cdot P[D|A,C]</math>.
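As a concrete illustration, the following minimal sketch (plain Python; the conditional-probability values are hypothetical, chosen only for the example) evaluates this factorization for four binary variables:

<syntaxhighlight lang="python">
from itertools import product

def bern(p_true, x):
    """Probability that a binary variable with P[X=1] = p_true takes value x."""
    return p_true if x else 1.0 - p_true

# Hypothetical conditional probability tables for the DAG A->B, A->C, A->D, C->D.
p_a = 0.3                                   # P[A=1]
p_b_given_a = {1: 0.8, 0: 0.3}              # P[B=1 | A=a]
p_c_given_a = {1: 0.6, 0: 0.1}              # P[C=1 | A=a]
p_d_given_ac = {(1, 1): 0.9, (1, 0): 0.5,   # P[D=1 | A=a, C=c]
                (0, 1): 0.4, (0, 0): 0.05}

def joint(a, b, c, d):
    """P[A=a, B=b, C=c, D=d] as the product of the four conditionals."""
    return (bern(p_a, a)
            * bern(p_b_given_a[a], b)
            * bern(p_c_given_a[a], c)
            * bern(p_d_given_ac[(a, c)], d))

# Because every factor is a normalized conditional distribution, the
# product sums to one over all 16 joint assignments.
assert abs(sum(joint(*v) for v in product((0, 1), repeat=4)) - 1.0) < 1e-12
</syntaxhighlight>

Storing the four small tables requires <math>1+2+2+4 = 9</math> numbers, whereas an explicit joint table over four binary variables needs <math>2^4 - 1 = 15</math>; the gap grows exponentially with the number of variables, which is the practical point of the factorization.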
 
Any two nodes are [[Conditional independence|conditionally independent]] given the values of their parents. In general, any two sets of nodes are conditionally independent given a third set if a criterion called [[d-separation|''d''-separation]] holds in the graph. For example, in the directed acyclic graph of the Figure, <math>B</math> and <math>C</math> are ''d''-separated by <math>A</math>, so <math>B</math> and <math>C</math> are conditionally independent given <math>A</math>. Local independences and global independences are equivalent in Bayesian networks.
 
This type of graphical model is known as a directed graphical model, [[Bayesian network]], or belief network. Classic machine learning models such as [[hidden Markov models]] and [[Artificial neural network|neural networks]], and newer models such as [[variable-order Markov model]]s, can be considered special cases of Bayesian networks.
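For example, a [[hidden Markov model]] with hidden states <math>Z_1,\ldots,Z_T</math> and observations <math>X_1,\ldots,X_T</math> is the Bayesian network whose joint distribution factors as

:<math>P[Z_{1:T}, X_{1:T}] = P[Z_1]\prod_{t=2}^T P[Z_t \mid Z_{t-1}] \prod_{t=1}^T P[X_t \mid Z_t],</math>

a chain in which each state has its predecessor as its only parent and each observation has its state as its only parent.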
 
One of the simplest Bayesian networks is the [[naive Bayes classifier]].
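With class variable <math>Y</math> as the single root and features <math>X_1,\ldots,X_n</math> as its children, the general factorization above reduces to

:<math>P[Y, X_1, \ldots, X_n] = P[Y]\prod_{i=1}^n P[X_i \mid Y],</math>

since each feature's only parent is the class.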
 
===Cyclic directed graphical models===
 
[[File:Graph model.svg|thumb|right|alt=An example of a directed graphical model.| An example of a directed, cyclic graphical model. Each arrow indicates a dependency. In this example: D depends on A, B, and C; and C depends on B and D; whereas A and B are each independent.]]
 
The figure depicts a graphical model with a cycle. This may be interpreted in terms of each variable 'depending' on the values of its parents in some manner. The particular graph shown suggests a joint probability density that factors as
:<math>P[A,B,C,D] = P[A]\cdot P[B]\cdot P[C,D|A,B]</math>,
but other interpretations are possible.<ref name=richardson95causal>{{cite book
|first=Thomas |last=Richardson
|chapter=A discovery algorithm for directed cyclic graphs
|title=Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence
|year=1996
|publisher=Morgan Kaufmann Pub.
|isbn=978-1-55860-412-4
}}
</ref>
 
===Other types===
*[[Naive Bayes classifier]], which uses a tree with a single root
*[[Dependency network (graphical model)|Dependency network]], in which cycles are allowed
*Tree-augmented classifier or '''TAN model'''
[[File:Tan corral.png|thumb| TAN model for "corral dataset"]]
*Targeted Bayesian network learning (TBNL) [[File:Tbnl corral.jpg|thumb|TBNL model for "corral dataset"]]
*A [[factor graph]] is an undirected [[bipartite graph]] connecting variables and factors. Each factor represents a function over the variables it is connected to. This is a helpful representation for understanding and implementing [[belief propagation]] (a minimal sketch follows this list).
* A [[clique tree]] or junction tree is a [[tree (graph theory)|tree]] of [[clique (graph theory)|cliques]], used in the [[junction tree algorithm]].
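To make the factor-graph picture concrete, here is a minimal sketch of sum-product belief propagation on a chain-shaped factor graph (plain Python; the factor tables and variable names are hypothetical, chosen only for illustration). On a tree-structured graph such messages give exact marginals:

<syntaxhighlight lang="python">
from itertools import product

# A chain-shaped factor graph X1 -- f_a -- X2 -- f_b -- X3 with binary
# variables. The factor values below are hypothetical toy numbers.
VALS = (0, 1)
f_a = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}  # factor on (X1, X2)
f_b = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.5}  # factor on (X2, X3)

def message_to_right(factor):
    """Sum-product message from a pairwise factor to its right variable."""
    return {xr: sum(factor[(xl, xr)] for xl in VALS) for xr in VALS}

def message_to_left(factor):
    """Sum-product message from a pairwise factor to its left variable."""
    return {xl: sum(factor[(xl, xr)] for xr in VALS) for xl in VALS}

# Belief at X2: the product of its two incoming factor messages, normalized.
m_a = message_to_right(f_a)   # message f_a -> X2
m_b = message_to_left(f_b)    # message f_b -> X2
belief = {x: m_a[x] * m_b[x] for x in VALS}
z = sum(belief.values())
marginal_bp = {x: belief[x] / z for x in VALS}

# Sanity check against brute-force marginalization of the full joint.
joint = {(x1, x2, x3): f_a[(x1, x2)] * f_b[(x2, x3)]
         for x1, x2, x3 in product(VALS, repeat=3)}
z_joint = sum(joint.values())
marginal_bf = {x: sum(p for (x1, x2, x3), p in joint.items() if x2 == x) / z_joint
               for x in VALS}
assert all(abs(marginal_bp[x] - marginal_bf[x]) < 1e-12 for x in VALS)
print(marginal_bp)  # e.g. {0: 0.5454..., 1: 0.4545...}
</syntaxhighlight>

The final assertion compares the message-passing result with brute-force marginalization of the full joint; on graphs with cycles the same message updates can be iterated ("loopy" belief propagation), but the result is then only approximate.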
===Books and book chapters===
* {{Cite book
| last = Bishop
| first = Christopher M.
| author-link = Christopher Bishop
| title = Pattern Recognition and Machine Learning
| publisher = Springer
| isbn=978-0-387-31073-2
| chapter= Chapter 8. Graphical Models
| chapter-url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/05/Bishop-PRML-sample.pdf
| pages=359–422
| mr=2247587
}}
* {{cite book
|author=Cowell, Robert G.
|author2=Dawid, A. Philip |author3=Lauritzen, Steffen L. |author4-link=David Spiegelhalter |author4=Spiegelhalter, David J.
|title=Probabilistic networks and expert systems |publisher=Springer |___location=Berlin |year=1999 |isbn=978-0-387-98767-5 |mr=1697175 |ref=cowell |author2-link=Philip Dawid }} A more advanced and statistically oriented book.
* {{cite book |author=Jensen, Finn |title=An introduction to Bayesian networks |publisher=Springer |___location=Berlin |year=1996 |isbn=978-0-387-91502-9}}
* {{Cite book
|first=Judea |last=Pearl |author-link = Judea Pearl
| year = 1988
| title = Probabilistic Reasoning in Intelligent Systems
|url=https://archive.org/details/probabilisticrea00pear |url-access=registration | edition = 2nd revised
| ___location = San Mateo, CA
| publisher = [[Morgan Kaufmann]]
}}
===Journal articles===
* {{Cite journal
| author = Edoardo M. Airoldi |author-link1=Edoardo Airoldi
| title = Getting Started in Probabilistic Graphical Models
| journal = [[PLOS Computational Biology]]
| volume = 3
| issue = 12
| pmid = 18069887
| pmc = 2134967
|arxiv=0706.2040
|bibcode=2007PLSCB...3..252A
|doi-access=free
}}
*{{Cite journal | last1 = Jordan | first1 = M. I. | author-link=Michael I. Jordan | doi = 10.1214/088342304000000026 | title = Graphical Models | journal = Statistical Science | volume = 19 | pages = 140–155 | year = 2004 | doi-access=free }}
*{{Cite journal |last=Ghahramani |first=Zoubin |date=May 2015 |title=Probabilistic machine learning and artificial intelligence |journal=Nature |language=en |volume=521 |issue=7553 |pages=452–459 |doi=10.1038/nature14541 |pmid=26017444 |bibcode=2015Natur.521..452G |s2cid=216356 |url=https://www.repository.cam.ac.uk/handle/1810/248538 |doi-access=free}}
 
===Other===