Maximal entropy random walk: Difference between revisions

Content deleted Content added
Script-assisted fixes: per MOS:NUM, MOS:CAPS, MOS:LINK
Line 1:
{{Use dmy dates|date=October 2017}}
'''Maximal Entropyentropy Randomrandom Walkwalk''' ('''MERW''') is a popular type of [[biased random walk on a graph]], in which transition probabilities are chosen accordingly to the [[principle of maximum entropy]], which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While standard [[random walk]] chooses for every vertex uniform probability distribution among its outgoing edges, locally maximizing [[entropy rate]], MERW maximizes it globally (average entropy production) by assuming uniform probability distribution among all paths in a given graph.
 
MERW is used in various fields of science. A direct application is choosing probabilities to maximize transmission rate through a constrained channel, analogously to [[Fibonacci coding]]. Its properties also made it useful for example in analysis of complex networks,<ref name=Sinatra>[http://www.robertasinatra.com/pdf/sinatra_maximal_entropy.pdf R. Sinatra, J. Gómez-Gardenes, R. Lambiotte, V. Nicosia, ''Maximal-entropy random walks in complex networks with limited information''], Phys. Rev. E, 2011.</ref> like link prediction,<ref name=link>[https://pdfs.semanticscholar.org/f185/bc2499be95b4312169a7e722bac570c2d509.pdf R.H. Li, J.X. Yu, J. Liu, ''Link Prediction: the Power of Maximal Entropy Random Walk''], CIKM '11, 2011.</ref> community detection<ref name=community>[https://link.springer.com/article/10.1140/epjst/e2013-01730-6 J. Ochab, Z. Burda, ''Maximal entropy random walk in community detection''], Z. Eur. Phys. J., 2013.</ref> and [[centrality]] measures.<ref name=central>[https://journals.aps.org/pre/abstract/10.1103/PhysRevE.83.046117 J.C. Delvenne, A.S. Libert, ''Centrality measures and thermodynamic formalism for complex networks''], Phys. Rev. E, 2011.</ref> Also in [[image analysis]], for example for detecting visual saliency regions,<ref name=saliency>[http://ieeexplore.ieee.org/abstract/document/6678551/ J.G. Yu, J. Zhao, J. Tian, Y. Tan, ''Maximal entropy random walk for region-based visual saliency''], IEEE Transactions on Cybernetics, 2014.</ref> object localization,<ref name=local>[http://ieeexplore.ieee.org/abstract/document/6678551/ L. Wang, J. Zhao, X. Hu, J. Lu, ''Weakly supervised object localization via maximal entropy random walk''], ICIP, 2014.</ref> tampering detection <ref name=tamp>[http://ieeexplore.ieee.org/abstract/document/7353154/ P. Korus, J. Huang, ''Improved Tampering Localization in Digital Image Forensics Based on Maximal Entropy Random Walk''], IEEE Signal Processing Letters, 2016.</ref> or [[tractography]] problem.<ref name=trac>[http://ieeexplore.ieee.org/abstract/document/6985622/ V.L. Galinsky, L.R. Frank, ''Simultaneous multi-scale diffusion estimation and tractography guided by entropy spectrum pathways''], IEEE Transactions on Medical Imaging, 2015.</ref>
 
Additionally, it recreates some properties of [[quantum mechanics]], suggesting a way to repair the discrepancy between [[diffusion]] models and quantum predictions, like [[Anderson localization]].<ref name=prl>[http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.102.160602 Z. Burda, J. Duda, J. M. Luck, and B. Waclaw, ''Localization of the Maximal Entropy Random Walk''], Phys. Rev. Lett., 2009.</ref>
 
== Basic model ==
[[Image:General picture for Maximal Entropy Random Walk.png|thumb|480px|right|Left: the basic concept of the standard random walk (GRW) and Maximalmaximal Entropyentropy Randomrandom Walkwalk (MERW). Right: example of their evolution on the same inhomogeneous 2D lattice with cyclic boundary conditions - probability density after 10, 100 and 1000 steps while starting from the same vertex. The small boxes represent defects: all vertices but the marked ones have additional [[self-loop]] (edge to itself). For a regular lattice (no defects), GRW and MERW are identical. While defects do not strongly affect the local behavior, they lead to a completely different global stationary probability here. While GRW (and based on it standard [[diffusion]]) leads to nearly uniform stationary density, MERW has strong localization property, imprisoning the walkers in entropic wells in analogy to electrons in defected lattice of [[semi-conductor]].]]
 
Imagine there is a [[Graph (discrete mathematics)|graph]] given by [[adjacency matrix]]: <math>A_{ij}=1</math> if there is an edge from vertex <math>i</math> to <math>j</math>, 0 otherwise. For simplicity assume it is an [[Graph (discrete mathematics)#Undirected graph|undirected graph]], what corresponds to symmetric <math>A</math>, however, MERW can be also generalized for directed and [[Graph (discrete mathematics)#Weighted graph|weighted graphs]] (getting [[Boltzmann distribution]] among paths instead of uniform).
Line 18 ⟶ 19:
This definition turns out to be equivalent with asymptotic average entropy (per length) of the probability distribution in the space of paths for this stochastic process.
 
In the standard random walk, referred here as Genericgeneric Randomrandom Walkwalk (GRW), we naturally choose that each outgoing edge is equally probable: <math>S_{ij} = A_{ij}/\sum_k A_{ik}</math>. For a symmetric <math>A</math> it leads to <math>\rho_i \propto \sum_j A_{ij}</math> stationary probability distribution. It locally maximizes entropy production (uncertainty) for every vertex, but usually leads to a suboptimal averaged global entropy rate <math>H(S)</math>.
 
MERW chooses the stochastic matrix which maximizes <math>H(S)</math>, or equivalently assumes uniform probability distribution among all paths in a given graph. Its formula is obtained by first calculating the dominant [[eigenvalue]] and corresponding [[eigenvector]] of the adjacency matrix: <math>\max_{|\lambda|}\ \psi A = \lambda \psi</math>. Then stochastic matrix and stationary probability distribution are given by:
Line 26 ⟶ 27:
for which every length <math>l</math> possible path from <math>i</math> to <math>j</math> has probability <math>\frac{1}{\lambda^l} \frac{\psi_j}{\psi_i}</math>. Its entropy rate is <math> \log(\lambda) </math>.
 
In contrast to GRW, the MERW transition probabilities generally depend on situation of the entire graph (are nonlocal). Hence, they rather should not be imagined as directly applied by the walker - if heshe performs randomly looking decisions based on local situation, like a person, the GRW approach is rather more appropriate. MERW is based on the [[principle of maximum entropy]], making it the safest assumption when we don't have any additional knowledge about the system. For example to model our knowledge about an object performing some complex dynamics - not necessarily random, like a particle.
 
=== Sketch of derivation ===
Line 36 ⟶ 37:
 
== Examples ==
[[Image:Examples of using MERW, Fibonacci coding(left) and 1D defected lattice (right).png|thumb|320px|right|Left: choosing the optimal probability after symbol 0 in [[Fibonacci coding]]. Right: one-dimensional defected lattice and its stationary density for length 1000 cycle (it has 3three defects). While in standard random walk the stationary density is proportional to degree of a vertex, leading to 3/2 difference here, in MERW density is nearly completely localized in the largest defect-free region, analogous to the [[ground state]] predicted by [[quantum mechanics]].]]
 
Let us first look at probably the simplest nontrivial situation: [[Fibonacci coding]], where we want to transmit a message as a sequence of 0/1, but not using two successive 1s - after 1 there has to be used 0. To maximize the amount of information transmitted in such sequence, we should assume uniform probability distribution in the space of all possible sequences fulfilling this constraint. To practically use such long sequences, after 1 we have to use 0, but there remains a freedom of choosing the probability of 0 after 0. Let us denote this probability by <math>q</math>, then [[entropy coding]] would allow to encode a message using this chosen probability distribution. The stationary probability distribution of symbols for a given <math>q</math> turns out to be <math>\rho=(1/(2-q),1-1/(2-q)) </math>. Hence, entropy production is <math>H(S)=\rho_0 \left(q\log(1/q)+(1-q)\log(1/(1-q))\right)</math>, which is maximized for <math>q=(\sqrt{5}-1)/2\approx 0.618</math>, known from [[golden ratio]]. In contrast, standard random walk would choose suboptimal <math>q=0.5</math>. While choosing larger <math>q</math> reduces the amount of information produced after 0, it also reduces frequency of 1, after which we cannot write any information.
 
A more complex example is defected one-dimensional cyclic lattice: let say 1000 nodes connected in a ring, for which all nodes but the defects have self-loop (edge to itself). In standard random walk (GRW) the stationary probability distribution would have defect probability being 2/3 of probability of the remaining vertices - there is nearly no localization, also analogously for standard [[diffusion]], which is infinitesimal limit of GRW. For MERW we have to first find the dominant eigenvector of the adjacency matrix - maximizing <math>\lambda</math> in:
 
<math>(\lambda \psi)_x = (A\psi)_x = \psi_{x-1}+ (1-V_x) \psi_x +\psi_{x+1}</math>
 
for all positions <math>x</math>, where <math>V_x=1</math> for defects, 0 otherwise. Substituting <math>3\psi_x</math> and multiplying the equation by -1−1 we get:
 
<math>E \psi_x =-(\psi_{x-1} -2 \psi_x +\psi_{x+1}) + V_x \psi_x</math>