Chain rule for Kolmogorov complexity: Difference between revisions

{{Short description|Lower bound for size of software program}}
The [[chain rule]] for [[Kolmogorov complexity]] is an analogue of the chain rule for [[information entropy]], which states:
 
:<math>
H(X,Y) = H(X) + H(Y|X)
</math>
 
That is, the combined [[randomness]] of two sequences ''X'' and ''Y'' is the sum of the randomness of ''X'' plus whatever randomness is left in ''Y'' once we know ''X''.
This follows immediately from the definitions of [[conditional entropy|conditional]] and [[joint entropy]], and the fact from [[probability theory]] that the [[joint probability]] is the product of the [[marginal probability|marginal]] and [[conditional probability]]:
 
:<math>
P(X,Y) = P(X) P(Y|X)
</math>
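
Taking logarithms of both sides and then expectations over the joint distribution turns this product into the entropy chain rule; the one-line derivation is sketched here for completeness:

:<math>
\operatorname{E}\left[-\log P(X,Y)\right] = \operatorname{E}\left[-\log P(X)\right] + \operatorname{E}\left[-\log P(Y|X)\right],
</math>

and the three expectations are, by definition, <math>H(X,Y)</math>, <math>H(X)</math>, and <math>H(Y|X)</math>.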
 
:<math>
\Rightarrow \log P(X,Y) = \log P(X) + \log P(Y|X)
</math>

The equivalent statement for Kolmogorov complexity does not hold exactly; it is true only up to a [[logarithm]]ic term:

:<math>
K(X,Y) = K(X) + K(Y|X) + O(\log(K(X,Y)))
</math>
 
(An exact version, {{math|1=''KP''(''x'', ''y'') = ''KP''(''x'') + ''KP''(''y''{{!}}''x''{{sup|&lowast;}}) + ''O''(1)}}, holds for the prefix complexity ''KP'', where {{math|''x''{{sup|&lowast;}}}} is a shortest program for ''x''.)

It states that the shortest program printing ''X'' and ''Y'' is obtained by concatenating a shortest program printing ''X'' with a program printing ''Y'' given ''X'', plus [[Big-O notation|at most]] a logarithmic factor. The result implies that [[Mutual information#Absolute mutual information|algorithmic mutual information]], an analogue of mutual information for Kolmogorov complexity, is symmetric: {{tmath|1=I(x:y) = I(y:x) + O(\log K(x,y))}} for all ''x'', ''y''.
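
A real-world compressor gives a rough, computable illustration of the same inequality. The sketch below is an illustration only: it substitutes zlib's compressed length for the uncomputable <math>K</math>, and shows that compressing a dependent pair jointly beats compressing the parts separately:

```python
import zlib

def c(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for K."""
    return len(zlib.compress(data, 9))

x = b"abracadabra" * 50
y = x[5:] + x[:5]   # a rotation of x, so a program for y given x is tiny

# Joint compression can exploit the dependence between x and y; separate
# compression cannot -- mirroring K(x,y) <= K(x) + K(y|x) + O(log K(x,y)).
print(c(x + y), "<", c(x) + c(y))
```

The gap widens as the dependence between the two strings grows; for independent random strings the two sides come out roughly equal, again matching the chain rule.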
 
==Proof==
 
The <math>\le</math> direction is obvious: we can write a program to produce <math>x</math> and <math>y</math>
by concatenating a program to produce <math>x</math>, a program to produce <math>y</math> given
access to <math>x</math>, and (whence the log term) the length of one of the programs, so
that we know where to separate the two programs for <math>x</math> and <math>y|x</math>
(<math>\log(K(x,y))</math> upper bounds this length).
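
The concatenation argument can be made concrete. In the sketch below (hypothetical helper names; "programs" modelled as Python strings of '0'/'1' bits), the first program is prefixed with a self-delimiting encoding of its length, which is exactly the <math>O(\log)</math> overhead the statement allows:

```python
def pack(p1: str, p2: str) -> str:
    """Join two bit-string 'programs' so the boundary can be recovered.

    The header is len(p1) in binary, preceded by the header's own length
    in unary ('1'*k then '0'). Its cost is O(log len(p1)) bits -- the
    logarithmic term in the chain rule's upper bound.
    """
    header = format(len(p1), "b")
    return "1" * len(header) + "0" + header + p1 + p2

def unpack(s: str) -> tuple[str, str]:
    """Invert pack: read the self-delimiting length, then split."""
    k = s.index("0")                  # k = number of bits in the header
    n = int(s[k + 1 : 2 * k + 1], 2)  # header: len(p1) in binary
    body = s[2 * k + 1 :]
    return body[:n], body[n:]
```

In the proof the length being encoded is at most <math>K(x,y)</math>, so the header costs <math>O(\log(K(x,y)))</math> bits.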
 
The <math>\ge</math> direction is rather more difficult. The key to the proof is the
construction of the set <math>A = \{(u,z) : K(u,z) \le K(x,y)\}</math>; that is, the
construction of the set of all pairs <math>(u,z)</math> such that the shortest input (for
a [[universal Turing machine]]) that produces <math>(u,z)</math> (and some way to distinguish
<math>u</math> from <math>z</math>) is no longer than the shortest program producing <math>(x,y)</math>. We will also
need <math>A_x = \{z: K(x,z) \le K(x,y)\}</math>, the subset of <math>A</math> where <math>x = u</math>.
[[recursively enumerable|Enumerating]] <math>A</math> is not hard (although not fast!). In parallel, simulate all
<math>2^{K(x,y)}</math> programs with length <math>\le K(x,y)</math>. Output each <math>u,z</math> as the
simulated program terminates; eventually, any particular such <math>u,z</math> will be
produced. We can restrict to <math>A_x</math> simply by ignoring any output where <math>u \ne
x</math>. We will assume that the inequality does not hold and derive a contradiction
by finding a short description of <math>x</math> in terms of its index in the
enumeration of a subset of <math>A</math>.
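
The parallel simulation described above is usually called dovetailing. The toy sketch below is an illustration only, with "programs" modelled as Python generators that yield None while still running and finally yield their output; it shows why the enumeration succeeds even though some programs never halt:

```python
def dovetail(programs):
    """Round-robin stepping of possibly non-terminating computations.

    Every program that halts is stepped often enough to finish, so its
    output is eventually produced: the enumerated set is recursively
    enumerable even though halting is undecidable.
    """
    active = list(programs)
    while active:
        still_running = []
        for prog in active:
            try:
                result = next(prog)
            except StopIteration:
                continue
            if result is None:
                still_running.append(prog)  # not done: step it again next round
            else:
                yield result                # halted: emit its output pair
        active = still_running

def toy(steps, out):
    """A stand-in 'program' that runs for `steps` steps, then outputs."""
    for _ in range(steps):
        yield None
    yield out

pairs = list(dovetail([toy(3, ("u1", "z1")), toy(1, ("u2", "z2"))]))
```

Restricting to <math>A_x</math> corresponds to filtering the yielded pairs on their first component.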
 
In particular, we can describe <math>y</math> by giving the index at which <math>y</math> is output
when enumerating <math>A_x</math> (which is less than or equal to <math>|A_x|</math>) along with <math>x</math>
and the value of <math>K(x,y)</math>. Thus,
 
:<math> K(y|x) \le \log(|A_x|) + O(\log(K(x,y))) + O(1) </math>
 
where the <math>O(1)</math> term is from the overhead of the enumerator, which does not
depend on <math>x</math> or <math>y</math>.
 
Assume by way of contradiction
 
:<math> K(y|x) > K(x,y) - K(x) + O(\log(K(x,y))) </math>
 
for some <math>x,y</math>.
 
Combining this with the above, we have
 
:<math>
\log(|A_x|) + O(\log(K(x,y))) + O(1) \ge K(y|x) > K(x,y) - K(x) + O(\log(K(x,y)))
</math>
 
So for any constant <math>c</math> and some <math>x,y</math>,

:<math> \log(|A_x|) \ge K(x,y) - K(x) + c \log(K(x,y)) </math>
 
Let <math>e = K(x,y) - K(x) + c \log K(x,y)</math>; <math>|A_x| \ge 2^e</math>.
 
Now, we can enumerate a set of possible values for <math>x</math> by finding all <math>u</math> such
that <math>|A_u| > 2^e</math>; list only the <math>u</math> such that <math>y</math> is output after more than
<math>2^e</math> other values when enumerating <math>A_u = \{ z: K(u,z) \le K(x,y) \}</math>. <math>x</math> is
one such <math>u</math>. Let <math>U</math> be the set of all such <math>u</math>. We can enumerate <math>U</math> by
enumerating each <math>A_u</math> (in parallel, as usual) and outputting a <math>u</math> only when
<math>y</math> would be output for <math>A_u</math> and more than <math>2^e</math> other values for the <math>A_u</math>
would be output. Note that given <math>K(x,y)</math>, <math>e</math>, and the index of <math>x</math> in <math>U</math> we
can enumerate <math>U</math> to recover <math>x</math>.
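
Recovering <math>x</math> from its index is then mechanical: run the enumeration and stop at the given position. A minimal sketch, generic over any enumeration rather than tied to a particular universal machine:

```python
from itertools import islice

def recover_by_index(enumeration, i):
    """Return the i-th element (0-based) produced by an enumeration.

    In the proof, the enumeration of U is computable from K(x,y) and e,
    so the triple (K(x,y), e, index) is a complete description of x.
    """
    return next(islice(enumeration, i, None))

witness = recover_by_index(iter(["u0", "u1", "x"]), 2)
```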
 
Note that <math>\{(u,z) : u \in U, z \in A_u\}</math> is a subset of <math>A</math>, and <math>A</math> has at
most <math>2^{K(x,y)}</math> elements, since there are only that many programs of length
<math>K(x,y)</math>. So we have:

:<math> |U| < \frac{|A|}{2^e} \le \frac{2^{K(x,y)}}{2^e} </math>

where the first inequality follows since each element of <math>U</math> corresponds to an
<math>A_u</math> with strictly more than <math>2^e</math> elements.

But this leads to a contradiction:
 
:<math>
K(x) < \log(K(x,y)) + \log(e) + \log(|U|) \le \log(K(x,y)) + \log(e) + K(x,y) - e
</math>

which when we substitute in <math>e</math> gives
 
:<math>
K(x) < \log(K(x,y)) + \log(K(x,y) - K(x) + c \log(K(x,y)))
+ K(x,y) - K(x,y) + K(x) - c \log K(x,y)
</math>
 
which for large enough <math>c</math> gives <math>K(x) < K(x)</math>.
 
==References==
* {{cite book
| last = Li
| first = Ming
| author2 = Vitányi, Paul
| title = An introduction to Kolmogorov complexity and its applications
| ___location = New York
| publisher = [[Springer-Verlag]]
| date = February 1997
| isbn = 0-387-94868-6 }}
* {{cite journal | last=Kolmogorov | first=A. | title=Logical basis for information theory and probability theory | journal=IEEE Transactions on Information Theory | publisher=Institute of Electrical and Electronics Engineers (IEEE) | volume=14 | issue=5 | year=1968 | issn=0018-9448 | doi=10.1109/tit.1968.1054210 | pages=662–664| s2cid=11402549 }}
 
* {{cite journal | last1=Zvonkin | first1=A K | last2=Levin | first2=L A | title=The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms | journal=Russian Mathematical Surveys | publisher=IOP Publishing | volume=25 | issue=6 | date=1970-12-31 | issn=0036-0279 | doi=10.1070/rm1970v025n06abeh001269 | pages=83–124| bibcode=1970RuMaS..25...83Z | s2cid=250850390 }}
 
[[Category:Algorithmic information theory|*]]
[[Category:Information theory|*]]
[[Category:Computability theory]]
[[Category:Theory of computation]]
[[Category:Articles containing proofs]]