Chain rule for Kolmogorov complexity


Introduction

The chain rule for Kolmogorov complexity is an analogue of the chain rule for information entropy, which states:

H(X,Y) = H(X) + H(Y|X)

That is, the combined randomness of two sequences X and Y is the sum of the randomness of X plus whatever randomness is left in Y once we know X. This follows immediately from the definitions of conditional and joint entropy, and the fact from probability theory that the joint probability is the product of the marginal and conditional probability; taking the expectation of the negative logarithm of both sides yields the entropy identity:

P(X,Y) = P(X)·P(Y|X)
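As an illustration (not part of the original argument), the entropy identity can be checked numerically for any small joint distribution; the 2×2 distribution below is arbitrary:

```python
import math

# An arbitrary 2x2 joint distribution p(x, y); any small example works.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy, in bits, of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

# Marginal distribution of X.
px = {x: sum(q for (u, _), q in p.items() if u == x) for x in (0, 1)}

# Conditional entropy H(Y|X) = sum over x of p(x) * H(Y | X = x).
H_y_given_x = sum(
    px[x] * H({y: p[(x, y)] / px[x] for y in (0, 1)}) for x in (0, 1)
)

# Chain rule: H(X,Y) = H(X) + H(Y|X), up to floating-point error.
assert abs(H(p) - (H(px) + H_y_given_x)) < 1e-12
```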

The equivalent statement for Kolmogorov complexity does not hold exactly; it is only true up to an additive logarithmic term:

K(x,y) = K(x) + K(y|x) + O(log(K(x,y)))

It states that a shortest program producing x and y is no longer than a shortest program producing x plus a shortest program producing y given x, up to a logarithmic additive term. Using this statement one can define an analogue of mutual information for Kolmogorov complexity.
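Kolmogorov complexity is uncomputable, so the statement cannot be tested directly. A common heuristic illustration (a stand-in assumption of this sketch, not something the chain rule itself licenses) is to substitute a real compressor for K; with zlib, compressing two similar strings together costs far less than compressing them separately, mirroring the idea that little randomness is left in y once x is known:

```python
import zlib

def C(data: bytes) -> int:
    """Compressed length in bytes -- a crude, computable stand-in for K."""
    return len(zlib.compress(data, 9))

x = b"the quick brown fox jumps over the lazy dog " * 50
y = b"the quick brown fox jumps over the lazy cat " * 50  # similar to x

print(C(x) + C(y))      # compressing the parts separately
print(C(x + y))         # compressing the pair: much smaller here
print(C(x + y) - C(x))  # heuristic proxy for K(y|x): small, since y ~ x
```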

Proof

The ≤ direction is obvious: we can write a program to produce x and y by concatenating a program to produce x, a program to produce y given access to x, and (whence the log term) the length of one of the programs, so that we know where to separate the two programs for x and y (log(K(x,y)) upper bounds this length).
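A minimal sketch of the encoding behind this direction, with hypothetical byte strings standing in for the two programs; the length header plays the role of the O(log K(x,y)) term:

```python
def pack(p: bytes, q: bytes) -> bytes:
    """Concatenate p (a program for x) and q (a program for y given x),
    prefixed with len(p) so a decoder knows where to split.  The header,
    O(log |p|) bits in principle (fixed at 4 bytes here), is the source
    of the logarithmic term in this direction of the chain rule."""
    return len(p).to_bytes(4, "big") + p + q

def unpack(blob: bytes) -> tuple[bytes, bytes]:
    n = int.from_bytes(blob[:4], "big")
    return blob[4:4 + n], blob[4 + n:]

p, q = b"PROGRAM-FOR-x", b"PROGRAM-FOR-y-GIVEN-x"
assert unpack(pack(p, q)) == (p, q)
```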

The ≥ direction is rather more difficult. The key to the proof is the construction of the set A = {(u,z) : K(u,z) ≤ K(x,y)}; that is, the set of all pairs (u,z) such that the shortest input (for a universal Turing machine) that produces u and z (together with some way to distinguish u from z) is no longer than the shortest input producing x and y. We will also need A_x = {(u,z) ∈ A : u = x}, the subset of A where u = x. Enumerating A is not hard (although not fast!): in parallel, simulate all of the fewer than 2^{K(x,y)+1} programs of length at most K(x,y), and output each pair (u,z) as its simulated program terminates; eventually, any particular such pair will be produced. We can restrict to A_x simply by ignoring any output where u ≠ x. We will assume that the claimed inequality does not hold and derive a contradiction by finding a short description for x in terms of its index in the enumeration of a subset of A.
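The parallel simulation is the usual dovetailing construction. Below is a minimal sketch; run(prog, steps) is an assumed interface to a universal machine that returns the output pair, or None if prog has not halted within the given number of steps:

```python
from itertools import count, product

def enumerate_A(n, run):
    """Dovetail all binary programs of length <= n.  At stage t, each
    program gets t simulation steps; a pair (u, z) is yielded the first
    time its program is seen to halt.  `run(prog, steps)` is an assumed
    interface to a universal machine: it returns the machine's output
    pair, or None if `prog` has not halted within `steps` steps."""
    programs = ["".join(b) for k in range(n + 1) for b in product("01", repeat=k)]
    halted = set()
    for t in count(1):
        for prog in programs:
            if prog not in halted:
                out = run(prog, t)
                if out is not None:
                    halted.add(prog)
                    yield out  # some pair (u, z) with K(u, z) <= n
        if len(halted) == len(programs):
            return
```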

In particular, we can describe y by giving the index at which it is output when enumerating A_x (a number at most |A_x|), along with x (which is given) and the value of K(x,y). Thus,

K(y|x) ≤ log(|A_x|) + O(log(K(x,y)))

where the O(log(K(x,y))) term comes from encoding the value of K(x,y) in self-delimiting form and from the constant overhead of the enumerator, which does not depend on x or y.
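Spelled out, the bit accounting behind this bound is (my breakdown of the terms above):

```latex
K(y|x) \le
\underbrace{\lceil \log |A_x| \rceil}_{\text{index of } y \text{ in the enumeration of } A_x}
+ \underbrace{O(\log K(x,y))}_{\text{self-delimiting code for } K(x,y)}
+ \underbrace{O(1)}_{\text{the enumerator itself}}
```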

Assume, by way of contradiction, that the inequality fails. Then for every constant c there exist x and y with

K(y|x) > K(x,y) - K(x) + c·log(K(x,y))

Fix such x and y; c will be taken large enough below. (None of the constants hidden in the O-terms depend on x, y, or c.)

Combining this with the above, we have

log(|A_x|) + O(log(K(x,y))) ≥ K(y|x) > K(x,y) - K(x) + c·log(K(x,y))

So for any constant c' and sufficiently large c,

log(|A_x|) > K(x,y) - K(x) + c'·log(K(x,y))

Let e = K(x,y) - K(x) + c'·log(K(x,y)); then |A_x| > 2^e.

Now, we can enumerate a set of possible values for x by finding all u such that |A_u| > 2^e -- that is, list only the u for which some value is output after more than 2^e other values when enumerating A_u. x is one such u. Let U be the set of all such u. We can enumerate U by enumerating each A_u (in parallel, as usual) and outputting u only when a value would be output for A_u after more than 2^e other values for that A_u have already appeared. Note that given K(x,y), e, and the index of x in U, we can run this enumeration of U to recover x.
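Computationally, selecting U is just the enumeration of A filtered through a per-u counter. A minimal sketch, assuming pairs is a generator of (u,z) pairs such as enumerate_A above:

```python
from collections import Counter

def enumerate_U(pairs, e):
    """Filter an enumeration of A down to U = {u : |A_u| > 2**e}.
    `pairs` is assumed to yield (u, z) as enumerate_A above does;
    u is emitted as soon as more than 2**e distinct z have appeared
    for it.  The index of x in this enumeration describes x."""
    seen = set()
    counts = Counter()
    emitted = set()
    for u, z in pairs:
        if (u, z) in seen:
            continue          # skip duplicate outputs of the dovetailer
        seen.add((u, z))
        counts[u] += 1
        if counts[u] > 2 ** e and u not in emitted:
            emitted.add(u)
            yield u
```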

Note that each A_u is a subset of A, the A_u for distinct u are disjoint, and A has fewer than 2^{K(x,y)+1} elements, since there are only that many programs of length at most K(x,y). So we have:

|U|·2^e < |A| < 2^{K(x,y)+1}, and so |U| < 2^{K(x,y)+1-e}

where the first inequality follows since each element of U corresponds to a distinct A_u with strictly more than 2^e elements.

But this leads to a contradiction:

K(x) ≤ log(|U|) + O(log(K(x,y))) ≤ K(x,y) + 1 - e + O(log(K(x,y)))

which, when we substitute in e, gives

K(x) ≤ K(x,y) + 1 - (K(x,y) - K(x) + c'·log(K(x,y))) + O(log(K(x,y))) = K(x) - c'·log(K(x,y)) + O(log(K(x,y))) + 1

which for large enough c' gives K(x) < K(x), a contradiction.

Reference

  • Li, Ming; Vitányi, Paul (1997). An Introduction to Kolmogorov Complexity and Its Applications. New York: Springer-Verlag. ISBN 0387948686.