{{Short description|Set of random variables}}
[[File:Markov random field example.png|thumb|An example of a Markov random field. Each edge represents a dependency: in this example, A depends on B and D; B depends on A and D; D depends on A, B, and E; E depends on D and C; and C depends on E.]]
In the ___domain of [[physics]] and [[probability]], a '''Markov random field''' ('''MRF'''), '''Markov network''' or '''undirected [[graphical model]]''' is a set of [[random variable]]s having a [[Markov property]] described by an [[undirected graph]].
A Markov network or MRF is similar to a [[Bayesian network]] in its representation of dependencies; the differences being that Bayesian networks are [[directed acyclic graph|directed and acyclic]], whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies {{Explain|date=July 2018}}); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies {{Explain|date=July 2018}}). The underlying graph of a Markov random field may be finite or infinite.
When the [[joint probability distribution|joint probability density]] of the random variables is strictly positive, it is also referred to as a '''Gibbs random field''', because, according to the [[Hammersley–Clifford theorem]], it can then be represented by a [[Gibbs measure]] for an appropriate (locally defined) energy function. The prototypical Markov random field is the [[Ising model]]; indeed, the Markov random field was introduced as the general setting for the Ising model.<ref name="Kindermann-Snell80">{{cite book
|first1=Ross
|last1=Kindermann |first2=J. Laurie
|last2=Snell |url=http://www.cmap.polytechnique.fr/~rama/ehess/mrfbook.pdf
|title=Markov Random Fields and Their Applications
|year=1980
|publisher=American Mathematical Society
|isbn=978-0-8218-5001-5
|mr=0620955
|access-date=2012-04-09
|archive-date=2017-08-10
|archive-url=https://web.archive.org/web/20170810092327/http://www.cmap.polytechnique.fr/%7Erama/ehess/mrfbook.pdf
|url-status=dead
}}</ref> In the ___domain of [[artificial intelligence]], a Markov random field is used to model various low- to mid-level tasks in [[image processing]] and [[computer vision]].<ref>{{cite book
|first1=S. Z. |last1=Li
|title=Markov Random Field Modeling in Image Analysis
|year=2009
|publisher=Springer
|url=https://books.google.com/books?id=rDsObhDkCIAC
|isbn=9781848002791
}}</ref>
== Definition ==
Given an undirected graph <math>G=(V,E)</math>, a set of random variables <math>X = (X_v)_{v\in V}</math> indexed by <math>V</math> form a Markov random field with respect to <math>G</math> if they satisfy the local Markov properties:
:'''Pairwise Markov property''': Any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables:
::<math>X_u \perp\!\!\!\perp X_v \mid X_{V \smallsetminus \{u,v\}}</math>
:'''Local Markov property''': A variable is conditionally independent of all other variables given its neighbors:
::<math>X_v \perp\!\!\!\perp X_{V \smallsetminus \operatorname{N}[v]} \mid X_{\operatorname{N}(v)}</math>
:where <math display="inline">\operatorname{N}(v)</math> is the set of neighbors of <math>v</math>, and <math>\operatorname{N}[v] = \{v\} \cup \operatorname{N}(v)</math> is the [[Neighborhood (graph theory)|closed neighbourhood]] of <math>v</math>.
:'''Global Markov property''': Any two subsets of variables are conditionally independent given a separating subset:
::<math>X_A \perp\!\!\!\perp X_B \mid X_S</math>
:where every path from a node in <math>A</math> to a node in <math>B</math> passes through <math>S</math>.
The relation between the three Markov properties is particularly clear in the following formulation:
* Pairwise: For any <math>i, j \in V</math> not equal or adjacent, <math>X_i \perp\!\!\!\perp X_j | X_{V \smallsetminus \{i, j\}}</math>.
* Local: For any <math>i\in V</math> and <math>J\subset V</math> not containing or adjacent to <math>i</math>, <math>X_i \perp\!\!\!\perp X_J | X_{V \smallsetminus (\{i\}\cup J)}</math>.
* Global: For any <math>I, J\subset V</math> not intersecting or adjacent, <math>X_I \perp\!\!\!\perp X_J | X_{V \smallsetminus (I\cup J)}</math>.
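These properties can be checked numerically on a small example. The following Python sketch (a minimal illustration, with arbitrarily chosen positive edge potentials rather than values from any source) builds the joint distribution of a three-node path graph A-B-C and verifies the pairwise property: the non-adjacent endpoints A and C are conditionally independent given B.

```python
from itertools import product

# Hypothetical pairwise potentials on the path graph A - B - C
# (any strictly positive tables would do for this check).
phi_AB = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_BC = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

# The joint distribution factorizes over the cliques (edges) of the path.
joint = {}
for a, b, c in product([0, 1], repeat=3):
    joint[(a, b, c)] = phi_AB[(a, b)] * phi_BC[(b, c)]
Z = sum(joint.values())
joint = {k: v / Z for k, v in joint.items()}

# Pairwise Markov property: A and C are non-adjacent, so
# P(a, c | b) must equal P(a | b) * P(c | b) for every b.
for b in (0, 1):
    p_b = sum(v for (a2, b2, c2), v in joint.items() if b2 == b)
    for a in (0, 1):
        for c in (0, 1):
            p_abc = joint[(a, b, c)]
            p_ab = sum(v for (a2, b2, c2), v in joint.items() if a2 == a and b2 == b)
            p_cb = sum(v for (a2, b2, c2), v in joint.items() if c2 == c and b2 == b)
            assert abs(p_abc / p_b - (p_ab / p_b) * (p_cb / p_b)) < 1e-12
print("A _||_ C | B holds")
```

Since every path from A to C passes through B, the same computation also illustrates the global property with separating set <math>S = \{B\}</math>.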
== Clique factorization ==
As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the cliques of the graph.
Given a set of random variables <math>X = (X_v)_{v\in V}</math>, let <math>P(X=x)</math> be the [[Probability density function|probability]] of a particular field configuration <math>x</math> in <math>X</math>; that is, <math>P(X=x)</math> is the probability of finding that the random variables <math>X</math> take on the particular value <math>x</math>.
If this joint density can be factorized over the cliques of <math>G</math> as
:<math>P(X=x) = \prod_{C \in \operatorname{cl}(G)} \varphi_C (x_C),</math>
then <math>X</math> forms a Markov random field with respect to <math>G</math>. Here, <math>\operatorname{cl}(G)</math> is the set of cliques of <math>G</math>. The definition is equivalent if only maximal cliques are used. The functions <math>\varphi_C</math> are sometimes referred to as ''factor potentials'' or ''clique potentials''.
Some MRFs do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities,<ref>{{cite journal
|last=Moussouris |first=John
|year=1974
|title=Gibbs and Markov random systems with constraints
|journal=Journal of Statistical Physics
|volume=10 |issue=1 |pages=11–33
|doi=10.1007/BF01011714 |mr=0432132
|hdl=10338.dmlcz/135184
|bibcode=1974JSP....10...11M
|s2cid=121299906
|hdl-access=free}}</ref> even if one, more appropriately, allows the infinite energies to act on the complete graph on <math>V</math>.<ref>{{cite journal
| last1 = Gandolfi | first1 = Alberto
| last2 = Lenarda | first2 = Pietro
|title= A note on Gibbs and Markov Random Fields with constraints and their moments
|journal=Mathematics and Mechanics of Complex Systems
|volume=4 |issue=3–4 |pages=407–422
|year=2016
|doi=10.2140/memocs.2016.4.407
|doi-access=free }}</ref>
MRFs factorize if at least one of the following conditions is fulfilled:
* the density is strictly positive (by the [[Hammersley–Clifford theorem]])
* the graph is [[Chordal graph|chordal]] (by equivalence to a [[Bayesian network]])
== Exponential family ==
Any positive Markov random field can be written as an exponential family in canonical form with feature functions <math>f_k</math> such that the full-joint distribution can be written as
:<math> P(X=x) = \frac{1}{Z} \exp \left( \sum_{k} w_k^{\top} f_k (x_{ \{ k \}}) \right)</math>
where <math>Z</math> is the [[Partition function (mathematics)|partition function]]:
:<math> Z = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} })\right).</math>
Here, <math>\mathcal{X}</math> denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions <math>f_{k,i}</math> are defined such that they are [[indicator function|indicators]] of the clique's configuration, ''i.e.'' <math>f_{k,i}(x_{\{k\}}) = 1</math> if <math>x_{\{k\}}</math> corresponds to the ''i''-th possible configuration of the ''k''-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if <math>N_k=|\operatorname{dom}(C_k)|</math> is the cardinality of the clique, and the weight of a feature <math>f_{k,i}</math> corresponds to the logarithm of the corresponding clique factor, ''i.e.'' <math>w_{k,i} = \log \varphi(c_{k,i})</math>, where <math>c_{k,i}</math> is the ''i''-th possible configuration of the ''k''-th clique.
The probability ''P'' is often called the Gibbs measure. This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, ''i.e.'' if none of the elements of <math>\mathcal{X}</math> are assigned a probability of 0. This allows techniques from matrix algebra to be applied, ''e.g.'' that the [[trace (linear algebra)|trace]] of the [[matrix logarithm|logarithm]] of a matrix equals the log of its [[determinant]], with the matrix representation of a graph arising from the graph's [[incidence matrix]].
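The equivalence between the clique-factor and log-linear forms can be seen concretely. The Python sketch below (using the same kind of hypothetical positive edge potentials as above, chosen only for illustration) sets one indicator feature per clique configuration with weight <math>w = \log \varphi</math> and confirms that the exponential-family form reproduces the product of clique factors.

```python
import math
from itertools import product

# Hypothetical edge potentials for the path graph A - B - C
# (all strictly positive, as required for the log-linear representation).
phi = {
    ("AB", (0, 0)): 2.0, ("AB", (0, 1)): 1.0, ("AB", (1, 0)): 1.0, ("AB", (1, 1)): 3.0,
    ("BC", (0, 0)): 1.0, ("BC", (0, 1)): 4.0, ("BC", (1, 0)): 2.0, ("BC", (1, 1)): 1.0,
}
# One indicator feature per clique configuration, with weight w = log(phi).
w = {key: math.log(v) for key, v in phi.items()}

def unnormalized(a, b, c):
    # exp of the sum of weights of the active indicator features
    return math.exp(w[("AB", (a, b))] + w[("BC", (b, c))])

Z = sum(unnormalized(a, b, c) for a, b, c in product([0, 1], repeat=3))

# The log-linear form matches the clique factorization pointwise.
for a, b, c in product([0, 1], repeat=3):
    clique_form = phi[("AB", (a, b))] * phi[("BC", (b, c))]
    assert abs(unnormalized(a, b, c) - clique_form) < 1e-12
```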
[[Correlation function]]s are computed likewise; the two-point correlation is:
:<math>C[X_u, X_v] = \frac{1}{Z} \left.\frac{\partial^2 Z[J]}{\partial J_u \,\partial J_v}\right|_{J_u=0, J_v=0}.</math>
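This derivative identity can be checked numerically. The following sketch (a two-spin toy system with an Ising-like coupling, chosen purely for illustration) computes the two-point correlation both directly under the Gibbs measure and as the mixed second derivative of the source-field partition function <math>Z[J]</math>, approximated by central finite differences.

```python
import math
from itertools import product

# Two spins x_u, x_v in {-1, +1} with energy E(x) = -x_u * x_v
# (a hypothetical Ising-like coupling used only as a test case).
states = list(product([-1, 1], repeat=2))

def Z_of_J(J_u, J_v):
    """Partition function with source fields J coupled linearly to the spins."""
    return sum(math.exp(x_u * x_v + J_u * x_u + J_v * x_v) for x_u, x_v in states)

Z = Z_of_J(0.0, 0.0)

# Direct two-point expectation E[X_u X_v] under the Gibbs measure.
direct = sum(x_u * x_v * math.exp(x_u * x_v) for x_u, x_v in states) / Z

# The same quantity from the mixed second derivative of Z[J] at J = 0,
# approximated by a central finite-difference stencil.
h = 1e-4
mixed = (Z_of_J(h, h) - Z_of_J(h, -h) - Z_of_J(-h, h) + Z_of_J(-h, -h)) / (4 * h * h)
assert abs(mixed / Z - direct) < 1e-6
```

For this particular system the correlation equals <math>\tanh(1)</math>, so the two computations agree to the accuracy of the finite-difference stencil.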
Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see [[#Inference|Inference]] below).
== Examples ==
|title=Gaussian Markov random fields: theory and applications
|publisher=CRC Press |year=2005
|isbn=978-1-58488-432-3
}}</ref>
== Inference ==
As in a [[Bayesian network]], one may calculate the [[conditional distribution]] of a set of nodes <math> V' = \{ v_1 ,\ldots, v_i \} </math> given the values of another set of nodes <math> W' = \{ w_1 ,\ldots, w_j \} </math> in the Markov random field by summing over all possible assignments to <math>u \notin V',W'</math>; this is called [[exact inference]]. However, exact inference is a [[Sharp-P-complete|#P-complete]] problem, and thus computationally intractable in the general case. Approximation techniques such as [[Markov chain Monte Carlo]] and loopy [[belief propagation]] are often more feasible in practice. Some particular subclasses of MRFs, such as trees (see [[Chow–Liu tree]]), have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient [[Maximum a posteriori|MAP]], or most likely assignment, inference; examples of these include associative networks.<ref>{{citation
| last1 = Taskar | first1 = Benjamin
| last2 = Chatalbashev | first2 = Vassil
| last3 = Koller | first3 = Daphne | author3-link = Daphne Koller
| editor-last = Brodley | editor-first = Carla E. | editor-link = Carla Brodley
| contribution = Learning associative Markov networks
| doi = 10.1145/1015330.1015444
| publisher = [[Association for Computing Machinery]]
| series = ACM International Conference Proceeding Series
| title = Proceedings of the Twenty-first International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004
| volume = 69
| pages = 102
| year = 2004
| title-link = International Conference on Machine Learning
| isbn = 978-1581138283
| citeseerx = 10.1.1.157.329
| s2cid = 11312524}}.</ref><ref>{{citation
| last1 = Duchi | first1 = John C.
| last2 = Tarlow | first2 = Daniel
| series = [[Conference on Neural Information Processing Systems|Advances in Neural Information Processing Systems]] | volume = 19
| title = Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006
| year = 2006}}.</ref> Another interesting sub-class is the one of decomposable models (when the graph is [[Chordal graph|chordal]]): having a closed-form for the [[Maximum likelihood estimate|MLE]], it is possible to discover a consistent structure for hundreds of variables.<ref name="Petitjean">{{cite conference
|first1=F. |last1=Petitjean
|first2=G.I. |last2=Webb
|first3=A.E. |last3=Nicholson
|year=2013
|title=Scaling log-linear analysis to high-dimensional data
|conference=International Conference on Data Mining
|publisher=IEEE
}}</ref>
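The brute-force nature of exact inference can be made concrete. The Python sketch below (a hypothetical 4-node cycle with a simple attractive edge potential, not drawn from any source) computes an exact conditional probability by summing clique-factor weights over all assignments; the sums range over every configuration of the unobserved variables, which is what makes this approach exponential in the number of nodes.

```python
from itertools import product

# Hypothetical 4-cycle MRF over binary variables with one potential per edge.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def phi(x_i, x_j):
    # A simple attractive potential: agreeing neighbours are more likely.
    return 2.0 if x_i == x_j else 1.0

def weight(x):
    # Unnormalized probability: product of the edge (clique) potentials.
    w = 1.0
    for i, j in edges:
        w *= phi(x[i], x[j])
    return w

# Exact conditional P(X_0 = 1 | X_2 = 0) by summing over all assignments
# consistent with the evidence; the normalizing constant cancels.
num = sum(weight(x) for x in product([0, 1], repeat=4) if x[0] == 1 and x[2] == 0)
den = sum(weight(x) for x in product([0, 1], repeat=4) if x[2] == 0)
print(num / den)
```

With these potentials the conditioning node pulls its opposite corner toward the same value, so the conditional probability of disagreement comes out below one half.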
== Conditional random fields ==
{{Main|Conditional random field}}
One notable variant of a Markov random field is a '''[[conditional random field]]''', in which each random variable may also be conditioned upon a set of global observations <math>o</math>. In this model, each function <math>\varphi_k</math> is a mapping from all assignments to both the clique ''k'' and the observations <math>o</math> to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing [[discriminative model|discriminative classifiers]], which do not model the distribution over the observations.
== Varied applications ==
Markov random fields find application in a variety of fields, ranging from [[computer graphics (computer science)|computer graphics]] to computer vision,<ref>{{Cite book |last1=Banf |first1=Michael |last2=Blanz |first2=Volker |chapter=Man made structure detection and verification of object recognition in images for the visually impaired |date=2013-06-06 |title=Proceedings of the 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications |chapter-url=https://dl.acm.org/doi/10.1145/2466715.2466732 |series=MIRAGE '13 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1–8 |doi=10.1145/2466715.2466732 |isbn=978-1-4503-2023-8}}</ref> [[machine learning]], and [[computational biology]].
== See also ==
{{Div col|colwidth=20em}}
* [[Constraint composite graph]]
* [[Graphical model]]
* [[Dependency network (graphical model)]]
* [[Hammersley–Clifford theorem]]
* [[Hopfield network]]
* [[Markov logic network]]
* [[Maximum entropy method]]
* [[Stochastic cellular automaton]]
{{Div col end}}
==References==
{{reflist}}
{{Stochastic processes}}