{{Short description|Set of random variables}}
[[File:Markov random field example.png|thumb|An example of a Markov random field. Each edge represents a dependency: in this example, A depends on B and D; B depends on A and D; D depends on A, B, and E; E depends on D and C; and C depends on E.]]
In the ___domain of [[physics]] and [[probability]], a '''Markov random field''' ('''MRF'''), '''Markov network''' or '''undirected [[graphical model]]''' is a set of [[random variable]]s having a [[Markov property]] described by an [[undirected graph]].
A Markov network or MRF is similar to a [[Bayesian network]] in its representation of dependencies; the differences being that Bayesian networks are [[directed acyclic graph|directed and acyclic]], whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies {{Explain|date=July 2018}}); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies {{Explain|date=July 2018}}). The underlying graph of a Markov random field may be finite or infinite.
When the [[joint probability distribution|joint probability density]] of the random variables is strictly positive, it is also referred to as a '''Gibbs random field''', because, according to the [[Hammersley–Clifford theorem]], it can then be represented by a [[Gibbs measure]] for an appropriate (locally defined) energy function. The prototypical Markov random field is the [[Ising model]]; indeed, the Markov random field was introduced as the general setting for the Ising model.<ref name="Kindermann-Snell80">{{cite book
|first1=Ross
|last1=Kindermann |first2=J. Laurie
|last2=Snell |url=http://www.cmap.polytechnique.fr/~rama/ehess/mrfbook.pdf
|title=Markov Random Fields and Their Applications
|year=1980
|publisher=American Mathematical Society
|isbn=978-0-8218-5001-5
|mr=0620955
|access-date=2012-04-09
|archive-date=2017-08-10
|archive-url=https://web.archive.org/web/20170810092327/http://www.cmap.polytechnique.fr/%7Erama/ehess/mrfbook.pdf
|url-status=dead
}}</ref> In the ___domain of [[artificial intelligence]], a Markov random field is used to model various low- to mid-level tasks in [[image processing]] and [[computer vision]].<ref>{{cite book
|first1=S. Z. |last1=Li
|title=Markov Random Field Modeling in Image Analysis
|year=2009
|publisher=Springer
|url=https://books.google.com/books?id=rDsObhDkCIAC
|isbn=9781848002791
}}</ref>
== Definition ==
Given an undirected graph <math>G=(V,E)</math>, a set of random variables <math>X = (X_v)_{v\in V}</math> indexed by <math>V</math> form a Markov random field with respect to <math>G</math> if they satisfy the local Markov properties:
:'''Pairwise Markov property''': Any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables:
::<math>X_u \perp\!\!\!\perp X_v \mid X_{V \smallsetminus \{u,v\}}</math>
:'''Local Markov property''': A variable is conditionally independent of all other variables given its neighbors:
::<math>X_v \perp\!\!\!\perp X_{V \smallsetminus \operatorname{N}[v]} \mid X_{\operatorname{N}(v)}</math>
:where <math display="inline">\operatorname{N}(v)</math> is the set of neighbors of <math>v</math>, and <math>\operatorname{N}[v] = \{v\} \cup \operatorname{N}(v)</math> is the [[Neighborhood (graph theory)|closed neighbourhood]] of <math>v</math>.
:'''Global Markov property''': Any two subsets of variables are conditionally independent given a separating subset:
::<math>X_A \perp\!\!\!\perp X_B \mid X_S</math>
:where every path from a node in <math>A</math> to a node in <math>B</math> passes through <math>S</math>.
The relation between the three Markov properties is particularly clear in the following formulation:
* Pairwise: For any <math>i, j \in V</math> not equal or adjacent, <math>X_i \perp\!\!\!\perp X_j | X_{V \smallsetminus \{i, j\}}</math>.
* Local: For any <math>i\in V</math> and <math>J\subset V</math> not containing or adjacent to <math>i</math>, <math>X_i \perp\!\!\!\perp X_J | X_{V \smallsetminus (\{i\}\cup J)}</math>.
* Global: For any <math>I, J\subset V</math> not intersecting or adjacent, <math>X_I \perp\!\!\!\perp X_J | X_{V \smallsetminus (I\cup J)}</math>.
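These properties can be checked numerically on a small example. The following Python sketch (a minimal illustration, with arbitrarily chosen positive edge potentials rather than values from any source) builds the joint distribution of a three-node path graph A-B-C and verifies the pairwise property: the non-adjacent endpoints A and C are conditionally independent given B.

```python
from itertools import product

# Hypothetical pairwise potentials on the path graph A - B - C
# (any strictly positive tables would do for this check).
phi_AB = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_BC = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

# The joint distribution factorizes over the cliques (edges) of the path.
joint = {}
for a, b, c in product([0, 1], repeat=3):
    joint[(a, b, c)] = phi_AB[(a, b)] * phi_BC[(b, c)]
Z = sum(joint.values())
joint = {k: v / Z for k, v in joint.items()}

# Pairwise Markov property: A and C are non-adjacent, so
# P(a, c | b) must equal P(a | b) * P(c | b) for every b.
for b in (0, 1):
    p_b = sum(v for (a2, b2, c2), v in joint.items() if b2 == b)
    for a in (0, 1):
        for c in (0, 1):
            p_abc = joint[(a, b, c)]
            p_ab = sum(v for (a2, b2, c2), v in joint.items() if a2 == a and b2 == b)
            p_cb = sum(v for (a2, b2, c2), v in joint.items() if c2 == c and b2 == b)
            assert abs(p_abc / p_b - (p_ab / p_b) * (p_cb / p_b)) < 1e-12
print("A _||_ C | B holds")
```

Since every path from A to C passes through B, the same computation also illustrates the global property with separating set <math>S = \{B\}</math>.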
== Clique factorization ==
As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the cliques of the graph.
Given a set of random variables <math>X = (X_v)_{v\in V}</math>, let <math>P(X=x)</math> be the [[Probability density function|probability]] of a particular field configuration <math>x</math> in <math>X</math>; that is, <math>P(X=x)</math> is the probability of finding that the random variables <math>X</math> take on the particular value <math>x</math>.
If this joint density can be factorized over the cliques of <math>G</math> as
:<math>P(X=x) = \prod_{C \in \operatorname{cl}(G)} \varphi_C (x_C),</math>
then <math>X</math> forms a Markov random field with respect to <math>G</math>. Here, <math>\operatorname{cl}(G)</math> is the set of cliques of <math>G</math>. The definition is equivalent if only maximal cliques are used. The functions <math>\varphi_C</math> are sometimes referred to as ''factor potentials'' or ''clique potentials''.
Some MRFs do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities,<ref>{{cite journal
|last=Moussouris |first=John
|year=1974
|title=Gibbs and Markov random systems with constraints
|journal=Journal of Statistical Physics
|volume=10 |issue=1 |pages=11–33
|doi=10.1007/BF01011714 |mr=0432132
|hdl=10338.dmlcz/135184
|bibcode=1974JSP....10...11M
|s2cid=121299906
|hdl-access=free}}</ref> even if one, more appropriately, allows the infinite energies to act on the complete graph on <math>V</math>.<ref>{{cite journal
| last1 = Gandolfi | first1 = Alberto
| last2 = Lenarda | first2 = Pietro
|title= A note on Gibbs and Markov Random Fields with constraints and their moments
|journal=Mathematics and Mechanics of Complex Systems
|volume=4 |issue=3–4 |pages=407–422
|year=2016
|doi=10.2140/memocs.2016.4.407
|doi-access=free }}</ref>
MRFs factorize if at least one of the following conditions is fulfilled:
* the density is strictly positive (by the [[Hammersley–Clifford theorem]])
* the graph is [[Chordal graph|chordal]] (by equivalence to a [[Bayesian network]])
== Exponential family ==
Any positive Markov random field can be written as an exponential family in canonical form with feature functions <math>f_k</math> such that the full-joint distribution can be written as
:<math> P(X=x) = \frac{1}{Z} \exp \left( \sum_{k} w_k^{\top} f_k (x_{ \{ k \}}) \right)</math>
where <math>Z</math> is the [[Partition function (mathematics)|partition function]]:
:<math> Z = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} })\right).</math>
Here, <math>\mathcal{X}</math> denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions <math>f_{k,i}</math> are defined such that they are [[indicator function|indicators]] of the clique's configuration, ''i.e.'' <math>f_{k,i}(x_{\{k\}}) = 1</math> if <math>x_{\{k\}}</math> corresponds to the ''i''-th possible configuration of the ''k''-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if <math>N_k=|\operatorname{dom}(C_k)|</math> is the cardinality of the clique, and the weight of a feature <math>f_{k,i}</math> corresponds to the logarithm of the corresponding clique factor, ''i.e.'' <math>w_{k,i} = \log \varphi(c_{k,i})</math>, where <math>c_{k,i}</math> is the ''i''-th possible configuration of the ''k''-th clique.
The probability ''P'' is often called the Gibbs measure. This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, ''i.e.'' if none of the elements of <math>\mathcal{X}</math> are assigned a probability of 0. This allows techniques from matrix algebra to be applied, ''e.g.'' that the [[trace (linear algebra)|trace]] of the [[matrix logarithm|logarithm]] of a matrix equals the log of its [[determinant]], with the matrix representation of a graph arising from the graph's [[incidence matrix]].
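The equivalence between the clique-factor and log-linear forms can be seen concretely. The Python sketch below (using the same kind of hypothetical positive edge potentials as above, chosen only for illustration) sets one indicator feature per clique configuration with weight <math>w = \log \varphi</math> and confirms that the exponential-family form reproduces the product of clique factors.

```python
import math
from itertools import product

# Hypothetical edge potentials for the path graph A - B - C
# (all strictly positive, as required for the log-linear representation).
phi = {
    ("AB", (0, 0)): 2.0, ("AB", (0, 1)): 1.0, ("AB", (1, 0)): 1.0, ("AB", (1, 1)): 3.0,
    ("BC", (0, 0)): 1.0, ("BC", (0, 1)): 4.0, ("BC", (1, 0)): 2.0, ("BC", (1, 1)): 1.0,
}
# One indicator feature per clique configuration, with weight w = log(phi).
w = {key: math.log(v) for key, v in phi.items()}

def unnormalized(a, b, c):
    # exp of the sum of weights of the active indicator features
    return math.exp(w[("AB", (a, b))] + w[("BC", (b, c))])

Z = sum(unnormalized(a, b, c) for a, b, c in product([0, 1], repeat=3))

# The log-linear form matches the clique factorization pointwise.
for a, b, c in product([0, 1], repeat=3):
    clique_form = phi[("AB", (a, b))] * phi[("BC", (b, c))]
    assert abs(unnormalized(a, b, c) - clique_form) < 1e-12
```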
[[Correlation function]]s are computed likewise; the two-point correlation is:
:<math>C[X_u, X_v] = \frac{1}{Z} \left.\frac{\partial^2 Z[J]}{\partial J_u \,\partial J_v}\right|_{J_u=0, J_v=0}.</math>
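This derivative identity can be checked numerically. The following sketch (a two-spin toy system with an Ising-like coupling, chosen purely for illustration) computes the two-point correlation both directly under the Gibbs measure and as the mixed second derivative of the source-field partition function <math>Z[J]</math>, approximated by central finite differences.

```python
import math
from itertools import product

# Two spins x_u, x_v in {-1, +1} with energy E(x) = -x_u * x_v
# (a hypothetical Ising-like coupling used only as a test case).
states = list(product([-1, 1], repeat=2))

def Z_of_J(J_u, J_v):
    """Partition function with source fields J coupled linearly to the spins."""
    return sum(math.exp(x_u * x_v + J_u * x_u + J_v * x_v) for x_u, x_v in states)

Z = Z_of_J(0.0, 0.0)

# Direct two-point expectation E[X_u X_v] under the Gibbs measure.
direct = sum(x_u * x_v * math.exp(x_u * x_v) for x_u, x_v in states) / Z

# The same quantity from the mixed second derivative of Z[J] at J = 0,
# approximated by a central finite-difference stencil.
h = 1e-4
mixed = (Z_of_J(h, h) - Z_of_J(h, -h) - Z_of_J(-h, h) + Z_of_J(-h, -h)) / (4 * h * h)
assert abs(mixed / Z - direct) < 1e-6
```

For this particular system the correlation equals <math>\tanh(1)</math>, so the two computations agree to the accuracy of the finite-difference stencil.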
Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see [[#Inference|Inference]] below).
== Examples ==
|title=Gaussian Markov random fields: theory and applications
|publisher=CRC Press |year=2005
|isbn=978-1-58488-432-3
}}</ref>
== Inference ==
As in a [[Bayesian network]], one may calculate the [[conditional distribution]] of a set of nodes <math> V' = \{ v_1 ,\ldots, v_i \} </math> given the values of another set of nodes <math> W' = \{ w_1 ,\ldots, w_j \} </math> in the Markov random field by summing over all possible assignments to <math>u \notin V',W'</math>; this is called [[exact inference]]. However, exact inference is a [[Sharp-P-complete|#P-complete]] problem, and thus computationally intractable in the general case. Approximation techniques such as [[Markov chain Monte Carlo]] and loopy [[belief propagation]] are often more feasible in practice. Some particular subclasses of MRFs, such as trees (see [[Chow–Liu tree]]), have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient [[Maximum a posteriori|MAP]], or most likely assignment, inference; examples of these include associative networks.<ref>{{citation
| last1 = Taskar | first1 = Benjamin
| last2 = Chatalbashev | first2 = Vassil
| last3 = Koller | first3 = Daphne | author3-link = Daphne Koller
| editor-last = Brodley | editor-first = Carla E. | editor-link = Carla Brodley
| contribution = Learning associative Markov networks
| doi = 10.1145/1015330.1015444
| publisher = [[Association for Computing Machinery]]
| series = ACM International Conference Proceeding Series
| title = Proceedings of the Twenty-first International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004
| volume = 69
| pages = 102
| year = 2004
| title-link = International Conference on Machine Learning
| isbn = 978-1581138283
| citeseerx = 10.1.1.157.329
| s2cid = 11312524}}.</ref><ref>{{citation
| last1 = Duchi | first1 = John C.
| last2 = Tarlow | first2 = Daniel
| series = [[Conference on Neural Information Processing Systems|Advances in Neural Information Processing Systems]] | volume = 19
| title = Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006
| year = 2006}}.</ref> Another interesting sub-class is the one of decomposable models (when the graph is [[Chordal graph|chordal]]): having a closed-form for the [[Maximum likelihood estimate|MLE]], it is possible to discover a consistent structure for hundreds of variables.<ref name="Petitjean">{{cite conference
|first1=F. |last1=Petitjean
|first2=G.I. |last2=Webb
|first3=A.E. |last3=Nicholson
|year=2013
|title=Scaling log-linear analysis to high-dimensional data
|conference=International Conference on Data Mining
|publisher=IEEE
}}</ref>
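The brute-force nature of exact inference can be made concrete. The Python sketch below (a hypothetical 4-node cycle with a simple attractive edge potential, not drawn from any source) computes an exact conditional probability by summing clique-factor weights over all assignments; the sums range over every configuration of the unobserved variables, which is what makes this approach exponential in the number of nodes.

```python
from itertools import product

# Hypothetical 4-cycle MRF over binary variables with one potential per edge.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def phi(x_i, x_j):
    # A simple attractive potential: agreeing neighbours are more likely.
    return 2.0 if x_i == x_j else 1.0

def weight(x):
    # Unnormalized probability: product of the edge (clique) potentials.
    w = 1.0
    for i, j in edges:
        w *= phi(x[i], x[j])
    return w

# Exact conditional P(X_0 = 1 | X_2 = 0) by summing over all assignments
# consistent with the evidence; the normalizing constant cancels.
num = sum(weight(x) for x in product([0, 1], repeat=4) if x[0] == 1 and x[2] == 0)
den = sum(weight(x) for x in product([0, 1], repeat=4) if x[2] == 0)
print(num / den)
```

With these potentials the conditioning node pulls its opposite corner toward the same value, so the conditional probability of disagreement comes out below one half.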
== Conditional random fields ==
{{Main|Conditional random field}}
One notable variant of a Markov random field is a '''[[conditional random field]]''', in which each random variable may also be conditioned upon a set of global observations <math>o</math>. In this model, each function <math>\varphi_k</math> is a mapping from all assignments to both the clique ''k'' and the observations <math>o</math> to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing [[discriminative model|discriminative classifiers]], which do not model the distribution over the observations.
== Varied applications ==
Markov random fields find application in a variety of fields, ranging from [[computer graphics (computer science)|computer graphics]] to computer vision,<ref>{{Cite book |last1=Banf |first1=Michael |last2=Blanz |first2=Volker |chapter=Man made structure detection and verification of object recognition in images for the visually impaired |date=2013-06-06 |title=Proceedings of the 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications |chapter-url=https://dl.acm.org/doi/10.1145/2466715.2466732 |series=MIRAGE '13 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1–8 |doi=10.1145/2466715.2466732 |isbn=978-1-4503-2023-8}}</ref> [[machine learning]], and [[computational biology]].
== See also ==
{{Div col|colwidth=20em}}
* [[Constraint composite graph]]
* [[Graphical model]]
* [[Dependency network (graphical model)]]
* [[Hammersley–Clifford theorem]]
* [[Hopfield network]]
* [[Markov logic network]]
* [[Maximum entropy method]]
* [[Stochastic cellular automaton]]
{{Div col end}}
==References==
{{reflist}}
{{Stochastic processes}}