Kolmogorov complexity: Difference between revisions

one typo, and reformat of (formerly) monospaced quote from comp.compression FAQ
+ example of a binary encoding, significance of the constant C, and a straightforward way to get upper bound on I(s)
 
 
To formalize the above definition of complexity, one has to specify exactly what types of programs are allowed.

Fortunately, it doesn't really matter: one could take a particular notation for [[Turing machine]]s, or [[lambda calculus]], or [[recursive function]]s, or [[LISP]] programs, or [[Pascal programming language|Pascal]] programs, or [[Java virtual machine]] bytecode.

If we agree to measure the lengths of all objects consistently in [[bit]]s, then the resulting notions of complexity will only differ by a constant: if ''I''<sub>''T''</sub>(''s'') is the complexity of the string ''s'' when Turing machines are used for the definition, and ''I''<sub>''L''</sub>(''s'') is the complexity of the string ''s'' when LISP programs are used, then there is a constant ''C'' such that for all strings ''s'':
 
:''I''<sub>''L''</sub>(''s'') - ''C'' &le; ''I''<sub>''T''</sub>(''s'') &le; ''I''<sub>''L''</sub>(s) + ''C''
 
''C'' is roughly the length of an interpreter for one language written in the other, and it does not depend on ''s''.
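The role of the constant can be seen concretely: a program in one language becomes a program in another by prefixing a fixed interpreter, so the overhead is the same for every string. A minimal sketch in Python, where the interpreter is a stand-in string rather than a real LISP implementation:

```python
# Illustration (not a real interpreter): why C does not depend on s.
# A LISP program p describing s becomes a Turing-machine-language
# program by prefixing a fixed interpreter; the overhead is always C.
LISP_INTERPRETER = "<fixed LISP interpreter source, written once>"  # stand-in
C = len(LISP_INTERPRETER)

def translate(lisp_program: str) -> str:
    """Hypothetical translation: fixed interpreter + original program."""
    return LISP_INTERPRETER + lisp_program

# The overhead is exactly C for every program, hence for every s:
for p in ['(print "abab")', '(print (make-string 1000 #\\a))']:
    assert len(translate(p)) - len(p) == C
```

The same argument in the other direction gives the lower bound, which is why the two complexity measures are squeezed within ''C'' of each other.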
 
 
 
The first surprising result is that ''I''(''s'') cannot be computed: there is no general algorithm that takes a string ''s'' as input and produces the number ''I''(''s'') as output. The proof is a formalization of the amusing [[Berry paradox]]: "Let ''n'' be the smallest number that cannot be defined in less than twenty English words. Well, I just defined it in less than twenty English words."
 
However, it is straightforward to produce an upper bound on ''I''(''s''): simply implement the [[gzip|gunzip]] program in the chosen language, concatenate it with gzip's output when given ''s'' as input, and measure the length of the resulting string.
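This bound is easy to compute in practice. A hedged sketch in Python, using the zlib library in place of gzip (the decompression stub below is a stand-in for a full self-contained decompressor, and lengths are measured in bytes rather than bits):

```python
import os
import zlib

# Sketch of an upper bound on I(s): one valid "program" for s is a fixed
# decompression stub together with the compressed form of s.  The stub's
# length is a constant, independent of s.
DECOMPRESS_STUB = b"import zlib,sys;sys.stdout.buffer.write(zlib.decompress(DATA))"

def complexity_upper_bound(s: bytes) -> int:
    """Length of stub plus compressed data: an upper bound on I(s), up to C."""
    return len(DECOMPRESS_STUB) + len(zlib.compress(s, 9))

regular = b"ab" * 500       # highly patterned: compresses to a few bytes
random_ = os.urandom(1000)  # incompressible with overwhelming probability

print(complexity_upper_bound(regular))   # far below len(regular) == 1000
print(complexity_upper_bound(random_))   # above len(random_) == 1000
```

Note that this only bounds ''I''(''s'') from above: a string that gzip fails to compress may still have a short description that the compressor simply does not find.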