{{Use dmy dates|date=October 2017}}
'''Maximal entropy random walk''' ('''MERW''') is a popular type of [[biased random walk on a graph]], in which transition probabilities are chosen according to the [[principle of maximum entropy]], which says that the [[probability distribution]] which best represents the current state of knowledge is the one with largest entropy. While a standard [[random walk]] chooses, for every vertex, a uniform probability distribution among its outgoing edges, locally maximizing the [[entropy rate]], MERW maximizes the entropy rate globally (the average [[entropy production]]) by assuming a uniform probability distribution among all paths in a given graph.
{{Mcn|date=May 2025}}
MERW is used in various fields of science. A direct application is choosing probabilities to maximize transmission rate through a constrained channel, analogously to [[Fibonacci coding]]. Its properties also made it useful for example in analysis of complex networks,<ref name="SinatraGómez-Gardeñes2011">{{cite journal|last1=Sinatra|first1=Roberta|last2=Gómez-Gardeñes|first2=Jesús|last3=Lambiotte|first3=Renaud|last4=Nicosia|first4=Vincenzo|last5=Latora|first5=Vito|title=Maximal-entropy random walks in complex networks with limited information|journal=Physical Review E|volume=83|issue=3|pages=030103|year=2011|issn=1539-3755|doi=10.1103/PhysRevE.83.030103|pmid=21517435|url=http://www.robertasinatra.com/pdf/sinatra_maximal_entropy.pdf|bibcode=2011PhRvE..83c0103S|arxiv=1007.4936|s2cid=6984660 }}</ref> like [[link prediction]],<ref name="LiYu2011">{{cite conference|last1=Li|first1=Rong-Hua|last2=Yu|first2=Jeffrey Xu|last3=Liu|first3=Jianquan|title=Link prediction: the power of maximal entropy random walk|year=2011|pages=1147|doi=10.1145/2063576.2063741|s2cid=15309519 |url=https://pdfs.semanticscholar.org/f185/bc2499be95b4312169a7e722bac570c2d509.pdf|archive-url=https://web.archive.org/web/20170212090812/https://pdfs.semanticscholar.org/f185/bc2499be95b4312169a7e722bac570c2d509.pdf|url-status=dead|archive-date=2017-02-12|conference=Association for Computing Machinery Conference on Information and Knowledge Management|conference-url=http://www.cikm2011.org/}}</ref> community detection,<ref name="OchabBurda2013">{{cite journal|last1=Ochab|first1=J.K.|last2=Burda|first2=Z.|title=Maximal entropy random walk in community detection|journal=The European Physical Journal Special Topics|volume=216|issue=1|year=2013|pages=73–81|issn=1951-6355|doi=10.1140/epjst/e2013-01730-6|arxiv=1208.3688|bibcode=2013EPJST.216...73O|s2cid=56409069 }}</ref>
robust transport over networks<ref name="CGPT2016">{{cite journal|last1=Chen|first1=Y.|last2=Georgiou|first2=T.T.|last3=Pavon|first3=M.|last4=Tannenbaum|first4=A.|title=Robust transport over networks|journal=IEEE Transactions on Automatic Control|volume=62|issue=9|year=2016|pages=4675–4682|doi=10.1109/TAC.2016.2626796|pmid=28924302|pmc=5600536|arxiv=1603.08129|bibcode=2016arXiv160308129C}}</ref> and [[centrality]] measures.<ref name="DelvenneLibert2011">{{cite journal|last1=Delvenne|first1=Jean-Charles|last2=Libert|first2=Anne-Sophie|title=Centrality measures and thermodynamic formalism for complex networks|journal=Physical Review E|volume=83|issue=4|pages=046117|year=2011|issn=1539-3755|doi=10.1103/PhysRevE.83.046117|pmid=21599250|arxiv=0710.3972|bibcode=2011PhRvE..83d6117D|s2cid=25816198 }}</ref>
Additionally, it recreates some properties of [[quantum mechanics]], suggesting a way to repair the discrepancy between [[diffusion]] models and quantum predictions, like [[Anderson localization]].<ref name=prl>{{cite journal | last1=Burda | first1=Z. | last2=Duda | first2=J. | last3=Luck | first3=J. M. | last4=Waclaw | first4=B. | title=Localization of the Maximal Entropy Random Walk | journal=Physical Review Letters | volume=102 | issue=16 | date=2009-04-23 | issn=0031-9007 | doi=10.1103/physrevlett.102.160602 | pmid=19518691 | page=160602| bibcode=2009PhRvL.102p0602B | arxiv=0810.4113 | s2cid=32134048 }}</ref>
==Basic model==
[[Image:General picture for Maximal Entropy Random Walk.png|thumb|upright=2.65|right|'''Left:''' basic concept of the generic random walk (GRW) and maximal entropy random walk (MERW)<br>'''Right:''' example of their evolution on the same inhomogeneous 2D lattice with cyclic boundary conditions – probability density after 10, 100 and 1000 steps while starting from the same vertex. The small boxes represent defects: all vertices but the marked ones have an additional [[self-loop]] (edge to itself). For regular lattices (no defects), GRW and MERW are identical. While defects do not strongly affect the local behavior, they lead to a completely different global stationary probability here. While GRW (and standard [[diffusion]] based on it) leads to nearly uniform stationary density, MERW has a strong localization property, imprisoning the walkers in entropic wells in analogy to electrons in the defective lattice of a [[semiconductor]].]]
Consider a [[Graph (discrete mathematics)|graph]] with <math>n</math> vertices, defined by an [[adjacency matrix]] <math>A \in \left\{0, 1\right\}^{n \times n}</math>: <math>A_{ij}=1</math> if there is an edge from vertex <math>i</math> to <math>j</math>, 0 otherwise. For simplicity, assume it is an [[Graph (discrete mathematics)#Undirected graph|undirected graph]], which corresponds to a symmetric <math>A</math>; however, MERW can also be generalized to directed and weighted graphs.
We would like to choose a random walk as a [[Markov process]] on this graph: for every vertex <math>i</math> and its outgoing edge to <math>j</math>, choose the probability <math>S_{ij}</math> that the walker uses this edge after visiting <math>i</math>. Formally, find a [[stochastic matrix]] <math>S</math> (containing the transition probabilities of a Markov chain) such that
The generic random walk (GRW) chooses a uniform probability distribution among the outgoing edges of each vertex: <math>S_{ij} = \frac{A_{ij}}{k_i}</math>, where <math>k_i = \sum_{j=1}^n A_{ij}</math> is the degree of vertex <math>i</math>.
It locally maximizes entropy production (uncertainty) for every vertex, but usually leads to a suboptimal averaged global entropy rate <math>H(S)</math>.
MERW chooses the stochastic matrix which maximizes <math>H(S)</math>, or equivalently assumes uniform probability distribution among all paths in a given graph. Its formula is obtained by first calculating the dominant [[eigenvalue]] <math>\lambda</math> and corresponding [[eigenvector]] <math>\psi</math> of the adjacency matrix, i.e. the largest <math>\lambda \in \mathbb{R}</math> with corresponding <math>\psi \in \mathbb{R}^n</math> such that <math>\psi A = \lambda \psi</math>. Then the stochastic matrix and stationary probability distribution are given by
:<math>S_{ij} = \frac{A_{ij}}{\lambda} \frac{\psi_j}{\psi_i}</math>
for which every possible path of length <math>l</math> from the <math>i</math>-th to <math>j</math>-th vertex has probability
:<math>\frac{1}{\lambda^l} \frac{\psi_j}{\psi_i}</math>,
since the intermediate <math>\psi</math> factors cancel in the product of transition probabilities along the path. Its stationary probability distribution is
:<math>\rho_i = \frac{\psi_i^2}{\|\psi\|_2^2}</math>.
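The construction above can be sketched numerically with NumPy. The graph used here (a 4-cycle) is an arbitrary illustrative choice, not one from the article; since a cycle is regular, it also illustrates the statement that MERW and GRW coincide on regular graphs.

```python
import numpy as np

def merw(A):
    """MERW transition matrix S and stationary distribution rho
    for a symmetric 0/1 adjacency matrix A of a connected graph."""
    eigvals, eigvecs = np.linalg.eigh(A)  # symmetric A: real spectrum, ascending order
    lam = eigvals[-1]                     # dominant eigenvalue
    psi = np.abs(eigvecs[:, -1])          # Perron eigenvector, entries made positive
    S = (A / lam) * np.outer(1.0 / psi, psi)  # S_ij = (A_ij / lam) * (psi_j / psi_i)
    rho = psi**2 / np.sum(psi**2)             # rho_i = psi_i^2 / ||psi||_2^2
    return S, rho

# Example: a 4-cycle (2-regular), for which MERW reduces to GRW with S = A/2
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
S, rho = merw(A)
```

The rows of <code>S</code> sum to 1, and <code>rho</code> is stationary under it.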
In contrast to GRW, the MERW transition probabilities generally depend on the structure of the entire graph (they are nonlocal).
=== Sketch of derivation ===
Assume for simplicity that the considered graph is undirected, connected and aperiodic, so that by the [[Perron–Frobenius theorem]] the dominant eigenvalue is unique and its eigenvector <math>\psi</math> can be chosen to have all positive entries and unit norm, <math>\|\psi\|_2 = 1</math>.
MERW requires a uniform distribution along paths. The number <math>m_{il}</math> of paths with length <math>2l</math> and vertex <math>i</math> in the center is
:<math>m_{il} = \sum_{j=1}^n \sum_{k=1}^n \left(A^l\right)_{ji} \left(A^l\right)_{ik} \approx \sum_{j=1}^n \sum_{k=1}^n \left(\lambda^l \psi \psi^\top\right)_{ji} \left(\lambda^l \psi \psi^\top\right)_{ik} = \sum_{j=1}^n \sum_{k=1}^n \lambda^{2l} \psi_j \psi_i \psi_i \psi_k = \lambda^{2l} \psi_i^2 \underbrace{\sum_{j=1}^n \psi_j \sum_{k=1}^n \psi_k}_{=: b}</math>,
hence for all <math>i</math>,
:<math>\rho_i = \lim_{l \to \infty} \frac{m_{il}}{\sum_{j=1}^n m_{jl}} = \frac{\psi_i^2}{\sum_{j=1}^n \psi_j^2} = \frac{\psi_i^2}{\|\psi\|_2^2}</math>.
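This path-counting argument can be verified numerically: the fraction of long paths passing through each vertex converges to <math>\psi_i^2</math>. The small graph below is an arbitrary illustrative choice (it contains a triangle, so it is aperiodic).

```python
import numpy as np

# Arbitrary small connected, aperiodic graph (triangle on vertices 1, 2, 3)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
w, V = np.linalg.eigh(A)
psi = np.abs(V[:, -1])          # unit-norm dominant eigenvector

l = 50
Al = np.linalg.matrix_power(A, l)
# m_il = sum_j (A^l)_ji * sum_k (A^l)_ik: number of length-2l paths centred at i
m = Al.sum(axis=0) * Al.sum(axis=1)
rho_paths = m / m.sum()         # fraction of such paths centred at each vertex
rho_merw = psi**2               # psi has unit norm, so rho_i = psi_i^2
```

The two vectors agree up to corrections of order <math>(|\lambda_2|/\lambda)^l</math>, which is negligible here.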
== Examples ==
[[Image:Examples of using MERW, Fibonacci coding(left) and 1D defected lattice (right).png|thumb|320px|right|Left: choosing the optimal probability after symbol 0 in [[Fibonacci coding]]. Right: a one-dimensional defected lattice and its stationary probability density for MERW.]]
Let us first look at a simple nontrivial situation: Fibonacci coding, where we want to transmit a message as a sequence of 0s and 1s, but without using two successive 1s: after a 1 there has to be a 0. To maximize the amount of information transmitted in such a sequence, we should assume a uniform probability distribution over the space of all possible sequences fulfilling this constraint.
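This constraint corresponds to a walk on a two-vertex graph: state 0 ("last symbol was 0", may emit either symbol) and state 1 ("last symbol was 1", must emit 0), with a self-loop only at state 0. A minimal numerical sketch, using the generic MERW formulas from above:

```python
import numpy as np

# States: 0 = last symbol was 0 (may emit 0 or 1), 1 = last symbol was 1 (must emit 0)
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
w, V = np.linalg.eigh(A)
lam = w[-1]                       # dominant eigenvalue: the golden ratio phi
psi = np.abs(V[:, -1])            # proportional to (phi, 1)
S = (A / lam) * np.outer(1.0 / psi, psi)

phi = (1 + 5**0.5) / 2
# Optimal probability of emitting 1 after a 0 is S[0,1] = 1/phi^2 ~ 0.382 (not 1/2);
# the resulting entropy rate log2(phi) ~ 0.694 bits/symbol is the channel capacity.
```

The biased choice 1/φ² after a 0 is exactly what makes every allowed sequence of a given length equally likely.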
A more complex example is a defected one-dimensional cyclic lattice: every vertex is connected to its two neighbours and, except for a few marked defect vertices, also has a [[self-loop]]. Encoding the defects by a potential <math>V_x</math>, equal to 0 everywhere except at the defects where <math>V_x = 1</math>, the dominant eigenvector equation becomes
<math>(\lambda \psi)_x = (A\psi)_x = \psi_{x-1}+ (1-V_x) \psi_x +\psi_{x+1}</math>
which can be rewritten as
<math>E \psi_x =-(\psi_{x-1} -2 \psi_x +\psi_{x+1}) + V_x \psi_x</math>
where <math>E = 3-\lambda</math> is minimized now, becoming the analog of energy. The formula inside the bracket is the [[discrete Laplace operator]], making this equation a discrete analogue of the stationary [[Schrödinger equation]].
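The resulting localization can be reproduced numerically; the lattice size and defect positions below are arbitrary illustrative choices. The MERW stationary density concentrates in the defect-free regions, while the GRW density (proportional to vertex degree) stays nearly uniform.

```python
import numpy as np

N = 200
defects = (50, 120)                # arbitrary defect positions (unequal regions)
A = np.eye(N)                      # self-loop at every vertex ...
for d in defects:
    A[d, d] = 0.0                  # ... except at the defects (V_x = 1 there)
for i in range(N):                 # cyclic lattice: edges to both neighbours
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0

w, V = np.linalg.eigh(A)
psi = np.abs(V[:, -1])             # ground-state-like dominant eigenvector
rho = psi**2 / np.sum(psi**2)      # MERW stationary density: localized away from defects

deg = A.sum(axis=1)                # GRW stationary density ~ degree: nearly uniform
rho_grw = deg / deg.sum()
```

Plotting <code>rho</code> against <code>rho_grw</code> reproduces the qualitative contrast shown in the figures: entropic wells for MERW, near-uniformity for GRW.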
==See also==