Data processing inequality: Difference between revisions

Revision as of 13:08, 27 May 2016 by 31.210.187.117; latest revision as of 16:21, 22 August 2024 by MichaelMaggs (edit summary: Adding short description: "Concept in information processing")
(24 intermediate revisions by 17 users not shown)
{{Short description|Concept in information processing}}
The '''data processing inequality''' is an [[information theory|information-theoretic]] concept that states that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.<ref name=BeaudryArxiv>{{citation |journal=Quantum Information & Computation |volume=12 |issue=5–6 |pages=432–441 |last1=Beaudry |first1=Normand |title=An intuitive proof of the data processing inequality |date=2012 |doi=10.26421/QIC12.5-6-4 |arxiv=1107.0740 |bibcode=2011arXiv1107.0740B |s2cid=9531510}}</ref> As explained by ''Kinney and Atwal'', the data processing inequality (DPI) means that information is generally lost (never gained) when transmitted through a noisy channel.<ref>{{cite journal |pmid=24550517 |doi=10.1073/pnas.1309933111 |volume=111 |title=Equitability, mutual information, and the maximal information coefficient |date=Mar 2014 |journal=Proc Natl Acad Sci U S A |pages=3354–9}}</ref>

==Statement==
Let three random variables form the [[Markov chain]] <math>X \rightarrow Y \rightarrow Z</math>, implying that the conditional distribution of <math>Z</math> depends only on <math>Y</math> and is [[Conditional independence|conditionally independent]] of <math>X</math> given <math>Y</math>. Specifically, we have such a Markov chain if the joint probability mass function can be written as

:<math>p(x,y,z) = p(x)p(y|x)p(z|y)=p(y)p(x|y)p(z|y)</math>

In this setting, no processing of <math>Y</math>, deterministic or random, can increase the information that <math>Y</math> contains about <math>X</math>.
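As an illustrative numerical check (a minimal sketch, not part of the cited sources), the following Python snippet builds a joint distribution from the Markov factorization above and compares how much information <math>Y</math> and <math>Z</math> each carry about <math>X</math>, measured by mutual information in bits. The binary alphabets, the probability values, and the helper names <code>mutual_information</code> and <code>marginal</code> are assumptions chosen only for this example.

```python
import math
from itertools import product


def mutual_information(joint):
    """Mutual information in bits of a joint pmf given as {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def marginal(joint3, keep):
    """Marginalize a 3-variable pmf onto the two coordinate indices in keep."""
    out = {}
    for key, p in joint3.items():
        k = (key[keep[0]], key[keep[1]])
        out[k] = out.get(k, 0.0) + p
    return out


# Hypothetical example distributions (assumed values, not from the article).
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # channel X -> Y
p_z_given_y = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}}  # channel Y -> Z

# Joint pmf from the Markov factorization p(x,y,z) = p(x) p(y|x) p(z|y).
p_xyz = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}

i_xy = mutual_information(marginal(p_xyz, (0, 1)))  # I(X;Y)
i_xz = mutual_information(marginal(p_xyz, (0, 2)))  # I(X;Z)
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
```

For any choice of the three tables, the printed value of <math>I(X;Z)</math> should not exceed <math>I(X;Y)</math>, up to floating-point rounding. Changing <code>p_z_given_y</code> to an invertible deterministic map (for example, the identity) makes the two values coincide.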
Using the [[mutual information]], the data processing inequality can be written as

:<math> I(X;Y) \geqslant I(X;Z),</math>

with equality <math>I(X;Y) = I(X;Z) </math> if and only if <math> I(X;Y\mid Z)=0 </math>. That is, <math>Z</math> and <math>Y</math> contain the same information about <math>X</math>, and <math>X \rightarrow Z \rightarrow Y</math> also forms a Markov chain.<ref>{{cite book |title=Elements of information theory |last1=Cover |last2=Thomas |date=2012 |publisher=John Wiley & Sons}}</ref>

==Proof==
One can apply the [[Conditional mutual information#Chain rule for mutual information|chain rule for mutual information]] to obtain two different decompositions of <math>I(X;Y,Z)</math>:

:<math> I(X;Z) + I(X;Y\mid Z) = I(X;Y,Z) = I(X;Y) + I(X;Z\mid Y) </math>

By the relationship <math>X \rightarrow Y \rightarrow Z</math>, we know that <math>X</math> and <math>Z</math> are conditionally independent given <math>Y</math>, which means that the [[conditional mutual information]] vanishes: <math>I(X;Z\mid Y)=0</math>. Substituting this into the decomposition gives <math>I(X;Y) = I(X;Z) + I(X;Y\mid Z)</math>, and the data processing inequality then follows from the non-negativity of conditional mutual information, <math>I(X;Y\mid Z)\ge0</math>.

==See also==