{{short description|Measure of algorithmic complexity}}
[[Image:Mandelpart2 red.png|right|thumb|upright=1.4|This image illustrates part of the [[Mandelbrot set]] [[fractal]]. Simply storing the 24-bit color of each pixel in this image would require 23 million bytes<!--3200 × 2400 × 3 = 23.04e6-->, but a small computer program can reproduce these 23 MB using the definition of the Mandelbrot set and the corner coordinates of the image. Thus, the Kolmogorov complexity of this image is much less than 23 MB.]]
In [[algorithmic information theory]] (a subfield of [[computer science]] and [[mathematics]]), the '''Kolmogorov complexity''' of an object, such as a piece of text, is the length of a shortest [[computer program]] (in a predetermined [[programming language]]) that produces the object as output. It is a measure of the [[computation]]al resources needed to specify the object, and is also known as '''algorithmic complexity''', '''Solomonoff–Kolmogorov–Chaitin complexity''', '''program-size complexity''', '''descriptive complexity''', or '''algorithmic entropy'''. It is named after [[Andrey Kolmogorov]], who first published on the subject in 1963<ref>{{cite journal |author-link=Andrey Kolmogorov |first=Andrey|last=Kolmogorov |date=Dec 1963 |title=On Tables of Random Numbers| journal=Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) |volume=25 |issue=4 |pages=369–375 |mr=178484 |jstor=25049284 |issn=0581-572X }}</ref><ref>{{cite journal|author-link=Andrey Kolmogorov|first=Andrey|last=Kolmogorov|year=1998|title=On Tables of Random Numbers| journal=Theoretical Computer Science|volume=207|issue=2|pages=387–395|doi=10.1016/S0304-3975(98)00075-9 |mr=1643414|doi-access=free}}</ref> and is a generalization of classical information theory.
The notion of Kolmogorov complexity can be used to state and [[Proof of impossibility|prove impossibility]] results akin to [[Cantor's diagonal argument]], [[Gödel's incompleteness theorem]], and [[halting problem|Turing's halting problem]].
In particular, no program ''P'' computing a [[lower bound]] for each text's Kolmogorov complexity can return a value essentially larger than ''P''<nowiki>'</nowiki>s own length (see section {{slink||Chaitin's incompleteness theorem}}); hence no single program can compute the exact Kolmogorov complexity for infinitely many texts.
==Definition==
:''K''(''s'') = |''d''(''s'')|.
The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the ''invariance theorem'', see [[#invariance theorem|below]]).
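To illustrate, one can take ordinary Python expressions as a toy description language, with <code>eval</code> as the decoder ''d''. The following sketch is only illustrative; the particular strings are arbitrary, and a short description exists only for the highly regular one:

<syntaxhighlight lang="python">
def output_of(description: str) -> str:
    # Toy description language: a description is a Python expression,
    # and the decoder d is simply eval.
    return eval(description)

s = "ab" * 16                      # "ababab...": 32 characters
d = "'ab' * 16"                    # a 9-character description of s
assert output_of(d) == s and len(d) < len(s)

# A random-looking string of the same length typically has no
# description much shorter than quoting it outright:
r = "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"
assert output_of(repr(r)) == r     # trivial description: the string, quoted
</syntaxhighlight>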
=== Plain Kolmogorov complexity ''C'' ===
There are two definitions of Kolmogorov complexity: ''plain'' and ''prefix-free''. The plain complexity is the minimal description length of any program, denoted <math>C(x)</math>; the prefix-free complexity is the minimal description length of any program encoded in a [[prefix code|prefix-free code]], denoted <math>K(x)</math>. The plain complexity is more intuitive, but the prefix-free complexity is easier to study.
By default, all equations hold only up to an additive constant. For example, <math>f(x) = g(x)</math> really means that <math>f(x) = g(x) + O(1)</math>, that is, <math>\exists c, \forall x, |f(x) - g(x)| \leq c</math>.
Let <math>U: 2^* \to 2^*</math> be a computable function mapping finite binary strings to binary strings. It is a universal function if, and only if, for any computable <math>f: 2^* \to 2^*</math>, we can encode the function in a "program" <math>s_f</math>, such that <math>\forall x \in 2^*, U(s_fx) = f(x) </math>. We can think of <math>U</math> as a program interpreter, which takes in an initial segment describing the program, followed by data that the program should process.
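As a loose illustration (not the formal construction), such an interpreter can be sketched in Python. Here the "program" <math>s_f</math> is Python source defining <math>f</math>, and for readability an explicit separator marks where <math>s_f</math> ends, whereas a genuine universal machine relies on <math>s_f</math> being self-delimiting:

<syntaxhighlight lang="python">
def U(program_and_input: str) -> str:
    # Toy stand-in for a universal function: the "program" s_f is Python
    # source defining f, followed by a NUL separator and the data x.
    # (A real universal machine needs no separator symbol.)
    s_f, _, x = program_and_input.partition("\0")
    scope: dict = {}
    exec(s_f, scope)              # load the encoded function f
    return scope["f"](x)          # U(s_f x) = f(x)

print(U("def f(x): return x[::-1]\0hello"))   # -> "olleh"
</syntaxhighlight>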
One problem with plain complexity is that <math>C(xy) \not < C(x) + C(y)</math>, because intuitively speaking, there is no general way to tell where to divide an output string just by looking at the concatenated string. We can divide it by specifying the length of <math>x</math> or <math>y</math>, but that would take <math>O(\min(\ln x, \ln y))</math> extra symbols. Indeed, for any <math>c > 0</math> there exist <math>x, y</math> such that <math>C(xy) \geq C(x) + C(y) + c</math>.<ref>(Downey and Hirschfeldt, 2010), Theorem 3.1.4</ref>
Typically, inequalities with plain complexity have a term like <math>O(\min(\ln x, \ln y))</math> on one side, whereas the same inequalities with prefix-free complexity have only <math>O(1)</math>.
The prefix-free complexity of a string <math>x</math> is the length of the shortest prefix-free program that makes the machine output <math>x</math>:<math display="block">K(x) := \min\{|c| : c \in S, U(c) = x\}</math>
==Invariance theorem==
}}</ref>
Andrey Kolmogorov later [[multiple discovery|independently published]] this theorem in ''Problems Inform. Transmission'' in 1965.<ref>{{cite journal|volume=1 |issue=1 |year=1965 |pages=1–7 |title=Three Approaches to the Quantitative Definition of Information |url=http://www.ece.umd.edu/~abarg/ppi/contents/1-65-abstracts.html#1-65.2 |journal=Problems Inform. Transmission |first=A.N. |last=Kolmogorov |url-status=dead |archive-url=https://web.archive.org/web/20110928032821/http://www.ece.umd.edu/~abarg/ppi/contents/1-65-abstracts.html |archive-date=September 28, 2011 }}</ref>
The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm up to an additive constant that depends on the algorithms, but not on the strings themselves. Solomonoff used this algorithm and the code lengths it allows to define a "universal probability" of a string on which inductive inference of the subsequent digits of the string can be based. Kolmogorov used this theorem to define several functions of strings, including complexity, randomness, and information.
When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority.<ref>{{cite journal | last1=Kolmogorov | first1=A. | title=Logical basis for information theory and probability theory | journal=IEEE Transactions on Information Theory | volume=14|issue=5 | pages=662–664 | year=1968 | doi =10.1109/TIT.1968.1054210 | s2cid=11402549 }}</ref> For several years, Solomonoff's work was better known in the Soviet Union than in the Western world.
There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on [[self-delimiting program]]s, and is mainly due to [[Leonid Levin]] (1974).
An axiomatic approach to Kolmogorov complexity based on [[Blum axioms]] (Blum 1967) was introduced by Mark Burgin in the paper presented for publication by Andrey Kolmogorov.<ref name=Burgin1982>{{cite journal |last=Burgin |first=M. |year=1982 |title=Generalized Kolmogorov complexity and duality in theory of computations |journal=Notices of the Russian Academy of Sciences |volume=25 |issue=3 |pages=19–23 |url=https://www.mathnet.ru/eng/dan45265}}</ref>
In the late 1990s and early 2000s, methods developed to approximate Kolmogorov complexity relied on popular compression algorithms like LZW,<ref name="zenil20202">{{cite journal |last1=Zenil |first1=Hector |year=2020 |title=A Review of Methods for Estimating Algorithmic Complexity: Options, Challenges, and New Directions |journal=Entropy |volume=22 |issue=6 |pages=612 |doi=10.3390/e22060612 |doi-access=free |pmid=33286384 |pmc=7517143 }}</ref> which made it difficult or impossible to provide any estimates for short strings until a method based on [[algorithmic probability]] was introduced,<ref>{{cite journal |last1=Delahaye |first1=Jean-Paul |last2=Zenil |first2=Hector |title=Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness |journal=Applied Mathematics and Computation |volume=219 |issue=1 |pages=63–77 |year=2012 |doi=10.1016/j.amc.2011.09.020 |url=https://www.sciencedirect.com/science/article/pii/S0096300311012367 }}</ref><ref>{{cite journal |last1=Soler-Toscano |first1=Fernando |last2=Zenil |first2=Hector |last3=Delahaye |first3=Jean-Paul |last4=Gauvrit |first4=Nicolas |title=Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines |journal=PLOS ONE |volume=9 |issue=5 |pages=74–85 |year=2014 |doi=10.1371/journal.pone.0096223 |doi-access=free |pmid=24787763 |pmc=4013017 |bibcode=2014PLoSO...996223S }}</ref> offering the only alternative to compression-based methods.<ref>Zenil, Hector; Soler Toscano, Fernando; Gauvrit, Nicolas (2022). "Methods and Applications of Algorithmic Complexity: Beyond Statistical Lossless Compression". ''Emergence, Complexity and Computation''. Springer Berlin, Heidelberg. doi:10.1007/978-3-662-64985-5. ISBN 978-3-662-64983-1.
</ref>
==Basic results==
We omit additive factors of <math>O(1)</math> throughout this section.<ref name=":0" />
'''Theorem.''' <math>C(x) \leq K(x) \leq C(x) + 2\log_2 C(x)</math>
'''Proof.''' Take any program for the universal Turing machine used to define plain complexity, and convert it to a prefix-free program by first coding the length of the program in binary, then converting that length to a prefix-free code. For example, suppose the program has length 9; then we can convert it as follows:<math display="block">9 \mapsto 1001 \mapsto 11-00-00-11-\color{red}{01} </math>where we double each digit, then add a termination code. The prefix-free universal Turing machine can then read in any program for the other machine as follows:<math display="block">[\text{code for simulating the other machine}][\text{coded length of the program}][\text{the program}]</math>The first part programs the machine to simulate the other machine, and is a constant overhead <math>O(1)</math>. The second part has length <math>\leq 2 \log_2 C(x) + 3</math>. The third part has length <math>C(x)</math>.
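The doubling-plus-terminator length code used in this proof is easy to make concrete. The following Python sketch (function names are ours) encodes a length and reads it back off the front of a bit stream:

<syntaxhighlight lang="python">
def prefix_free_length_code(n: int) -> str:
    """Encode n by doubling each bit of its binary form, then a '01' terminator."""
    return "".join(2 * b for b in format(n, "b")) + "01"

def decode_length(stream: str) -> tuple[int, str]:
    """Read one length code off the front of a bit stream; return (n, rest)."""
    bits, i = "", 0
    while stream[i:i + 2] != "01":   # doubled bits are '00' or '11', never '01'
        bits += stream[i]
        i += 2
    return int(bits, 2), stream[i + 2:]

assert prefix_free_length_code(9) == "1100001101"   # 9 -> 1001 -> 11 00 00 11 01
n, rest = decode_length(prefix_free_length_code(9) + "101")
assert n == 9 and rest == "101"
</syntaxhighlight>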
* <math>K(x | y) \leq K(x) \leq K(x, y) \leq \max( K(x|y) + K(y), K(y|x) + K(x)) \leq K(x) + K(y) </math>
* <math>K(xy) \leq K(x, y)</math>
Note that there is no general way to compare <math>K(xy)</math> with <math>K(x|y)</math>, <math>K(x)</math>, <math>K(y|x)</math>, or <math>K(y)</math>: there are strings such that the whole string <math>xy</math> is easy to describe, but its substrings are very hard to describe.
'''Theorem. (symmetry of information)''' <math>K(x, y) = K(x | y, K(y)) + K(y) = K(y, x)</math>.
'''Proof.''' One side is simple. For the other side with <math>K(x, y) \geq K(x | y, K(y)) + K(y)</math>, we need to use a counting argument (page 38 <ref>{{Cite book |last=Hutter |first=Marcus |title=Universal artificial intelligence: sequential decisions based on algorithmic probability |date=2005 |publisher=Springer |isbn=978-3-540-26877-2 |series=Texts in theoretical computer science |___location=Berlin New York}}</ref>).
'''Theorem. (information non-increase)''' For any computable function <math>f</math>, we have <math>K(f(x)) \leq K(x) + K(f)</math>.
It is straightforward to compute upper bounds for ''K''(''s'') – simply [[data compression|compress]] the string ''s'' with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the length of the resulting string – concretely, the size of a [[self-extracting archive]] in the given language.
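For example, a rough upper-bound estimate can be computed with an off-the-shelf compressor such as Python's <code>zlib</code>. The fixed length of the decompressor is ignored here (absorbed into the additive constant), so the numbers are heuristic estimates rather than true bounds, especially for short strings:

<syntaxhighlight lang="python">
import os
import zlib

def complexity_upper_bound_bits(s: bytes) -> int:
    # Size in bits of the compressed form; the fixed decompressor length
    # is ignored, i.e. absorbed into the additive O(1) constant.
    return 8 * len(zlib.compress(s, 9))

print(complexity_upper_bound_bits(b"ab" * 10_000))      # highly compressible
print(complexity_upper_bound_bits(os.urandom(20_000)))  # barely compressible
</syntaxhighlight>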
A string ''s'' is compressible by a number ''c'' if it has a description whose length does not exceed |''s''| − ''c'' bits. This is equivalent to saying that {{math|''K''(''s'') ≤ {{abs|''s''}} − ''c''}}. Otherwise, ''s'' is incompressible by ''c''.
For the same reason, most strings are complex in the sense that they cannot be significantly compressed – their ''K''(''s'') is not much smaller than |''s''|, the length of ''s'' in bits. To make this precise, fix a value of ''n''. There are 2<sup>''n''</sup> bitstrings of length ''n''. The [[Uniform distribution (discrete)|uniform]] [[probability]] distribution on the space of these bitstrings assigns exactly equal weight 2<sup>−''n''</sup> to each string of length ''n''.
'''Theorem''': With the uniform probability distribution on the space of bitstrings of length ''n'', the probability that a string is incompressible by ''c'' is at least {{math|1 − 2<sup>−''c''+1</sup> + 2<sup>−''n''</sup>}}.
To prove the theorem, note that the number of descriptions of length not exceeding ''n'' − ''c'' is given by the geometric series:
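Summing this series gives
<math display="block">1 + 2 + 2^2 + \cdots + 2^{n-c} = 2^{n-c+1} - 1,</math>
so at most <math>2^{n-c+1} - 1</math> of the <math>2^n</math> strings of length ''n'' are compressible by ''c''; the probability that a string is compressible by ''c'' is therefore at most <math>2^{-c+1} - 2^{-n}</math>, which is the complement of the bound stated in the theorem.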
If we have a halting oracle, then the Kolmogorov complexity of a string can be computed by simply trying every halting program, in lexicographic order, until one of them outputs the string.
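A sketch of this direction in Python, where <code>halts</code> and <code>run</code> are assumed oracle and interpreter stubs for some fixed universal machine (neither is computable; they are named here only for illustration):

<syntaxhighlight lang="python">
from itertools import count, product

def K_with_halting_oracle(s: str, halts, run) -> int:
    # halts(p): assumed halting oracle for a fixed universal machine;
    # run(p):   runs a program already known to halt, returns its output.
    # Enumerate programs in length-lexicographic order; the first program
    # that halts with output s is a shortest one, so its length is K(s).
    for n in count(0):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            if halts(p) and run(p) == s:
                return n
</syntaxhighlight>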
The other direction is much more involved.<ref>{{Cite journal |last1=Chaitin |first1=G. |last2=Arslanov |first2=A. |last3=Calude |first3=Cristian S. |date=1995-09-01 |title=Program-size Complexity Computes the Halting Problem |journal=Bull. EATCS|s2cid=39718973 }}</ref><ref>{{Cite book |last1=Li |first1=Ming |last2=Vitányi |first2=Paul |date=2008 |title=An Introduction to Kolmogorov Complexity and Its Applications |url=https://link.springer.com/book/10.1007/978-0-387-49820-1 |publisher=Springer |isbn=978-0-387-49820-1}}</ref>
Consider this program <math display="inline">p_K</math>, which takes <math display="inline">n</math> as input and uses <math display="inline">K</math>; a sketch in code follows the list.
* List all strings of length <math display="inline">\leq 2n + 1</math>.
* For each such string <math display="inline">x</math>, enumerate all (prefix-free) programs of length <math>K(x)</math> until one of them outputs <math display="inline">x</math>. Record its runtime <math display="inline">n_x</math>.
* Output the largest <math display="inline">n_x</math>.
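A sketch of <math display="inline">p_K</math> in Python, with the oracle <math display="inline">K</math> and an interpreter <code>run_and_time</code> passed in as assumed, non-computable stubs:

<syntaxhighlight lang="python">
from itertools import product

def p_K(n: int, K, run_and_time) -> int:
    # Sketch of the program p_K described above. K is the assumed oracle
    # for prefix-free complexity; run_and_time(p, x) is assumed to return
    # the runtime of program p if it halts with output x, else None.
    # (The actual proof dovetails the candidate programs, since programs
    # other than a shortest one for x may never halt.)
    longest = 0
    for m in range(2 * n + 2):                 # all strings of length <= 2n + 1
        for xbits in product("01", repeat=m):
            x = "".join(xbits)
            for pbits in product("01", repeat=K(x)):   # programs of length K(x)
                t = run_and_time("".join(pbits), x)
                if t is not None:              # found a shortest program for x
                    longest = max(longest, t)  # record its runtime n_x
                    break
    return longest                             # output the largest n_x
</syntaxhighlight>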
* Run the program <math display="inline">p_{n}</math>, and record its runtime <math display="inline">BB(n)</math>.
* Generate all programs with length <math display="inline">\leq 2n</math>. Run every one of them for up to <math display="inline">BB(n)</math> steps. Note the outputs of those that have halted.
* Output the string with the lowest lexicographic order that has not been output by any of those.
Let the string output by the program be <math display="inline">x</math>.
The program has length <math display="inline">\leq n + 2\log_2 n + O(1)</math>, where <math>n</math> comes from the length of the Busy Beaver program <math display="inline">p_{n}</math>, <math>2\log_2 n</math> comes from using the (prefix-free) [[Elias delta code]] for the number <math>n</math>, and <math>O(1)</math> comes from the rest of the program. Therefore,<math display="block">K(x) \leq n + 2\log_2 n + O(1) \leq 2n</math>for all sufficiently large <math display="inline">n</math>. Further, since there are fewer than <math display="inline">2^{2n+1}</math> programs of length <math display="inline">\leq 2n</math> but <math display="inline">2^{2n+2}-1</math> strings of length <math display="inline">\leq 2n+1</math>, we have <math display="inline">l(x) \leq 2n + 1</math> by the [[pigeonhole principle]].
== Universal probability ==
Fix a universal Turing machine <math>U</math>, the same one used to define the (prefix-free) Kolmogorov complexity. Define the (prefix-free) universal probability of a string <math>x</math> to be<math display="block">P(x) = \sum_{U(p) = x} 2^{-l(p)}</math>In other words, it is the probability that, given a uniformly random binary stream as input, the universal Turing machine would halt after reading a certain prefix of the stream, and output <math>x</math>.
Note. <math>U(p) = x</math> does not mean that the input stream is <math>p000\cdots</math>, but that the universal Turing machine would halt at some point after reading the initial segment <math>p</math>, without reading any further input, and that, when it halts, it has written <math>x</math> to the output tape.
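The definition can be made concrete on a toy prefix-free machine given by a finite program table. The table below is our own illustrative assumption (it is obviously not universal; a real universal machine has infinitely many programs, and the sum becomes an infinite series):

<syntaxhighlight lang="python">
# Toy finite prefix-free machine: a table from programs to outputs.
machine = {"0": "x", "10": "x", "110": "y"}   # ___domain is prefix-free

def universal_probability(s: str) -> float:
    # P(s) = sum of 2^(-|p|) over all programs p with U(p) = s
    return sum(2.0 ** -len(p) for p, out in machine.items() if out == s)

print(universal_probability("x"))   # 0.75  (= 1/2 + 1/4)
print(universal_probability("y"))   # 0.125 (= 1/8)
</syntaxhighlight>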
'''Theorem.''' (Theorem 14.11.1<ref name=":1" />) <math>\log \frac{1}{P(x)} = K(x) + O(1)</math>
==Implications in biology==
Kolmogorov complexity has been used in the context of biology to argue that the symmetries and modular arrangements observed in multiple species emerge from the tendency of evolution to prefer minimal Kolmogorov complexity.<ref>{{Cite journal |last1=Johnston |first1=Iain G. |last2=Dingle |first2=Kamaludin |last3=Greenbury |first3=Sam F. |last4=Camargo |first4=Chico Q. |last5=Doye |first5=Jonathan P. K. |last6=Ahnert |first6=Sebastian E. |last7=Louis |first7=Ard A. |date=2022-03-15 |title=Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution |journal=Proceedings of the National Academy of Sciences |volume=119 |issue=11 |pages=e2113883119 |doi=10.1073/pnas.2113883119 |doi-access=free |pmc=8931234 |pmid=35275794|bibcode=2022PNAS..11913883J }}</ref> Considering the genome as a program that must solve a task or implement a series of functions, shorter programs would be preferred on the basis that they are easier to find by the mechanisms of evolution.<ref>{{Cite journal |last=Alon |first=Uri |date=Mar 2007 |title=Simplicity in biology |url=https://www.nature.com/articles/446497a |journal=Nature |language=en |volume=446 |issue=7135 |pages=497 |doi=10.1038/446497a |pmid=17392770 |bibcode=2007Natur.446..497A |issn=1476-4687}}</ref> An example of this approach is the eight-fold symmetry of the compass circuit found across insect species, which corresponds to the circuit that is both functional and requires the minimum Kolmogorov complexity to be generated from self-replicating units.<ref>{{Cite journal |last1=Vilimelis Aceituno |first1=Pau |last2=Dall'Osto |first2=Dominic |last3=Pisokas |first3=Ioannis |date=2024-05-30 |editor-last=Colgin |editor-first=Laura L |editor2-last=Vafidis |editor2-first=Pantelis |title=Theoretical principles explain the structure of the insect head direction circuit |journal=eLife |volume=13 |pages=e91533 |doi=10.7554/eLife.91533 |doi-access=free |pmid=38814703 |pmc=11139481 |issn=2050-084X}}</ref>
==Conditional versions==
{{expand section|date=July 2014}}
The conditional Kolmogorov complexity of two strings <math>K(x|y)</math> is, roughly speaking, defined as the Kolmogorov complexity of ''x'' given ''y'' as an auxiliary input to the procedure.<ref name="Rissanen2007">{{cite book |author=Jorma Rissanen |url=https://link.springer.com/book/10.1007/978-0-387-68812-1 |title=Information and Complexity in Statistical Modeling |year=2007 |publisher=Springer |isbn=978-0-387-68812-1}}</ref>
There is also a length-conditional complexity <math>K(x|L(x))</math>, which is the complexity of ''x'' given the length of ''x'' as known/input.<ref>{{cite book|author1=Ming Li|author2=Paul M.B. Vitányi|title=An Introduction to Kolmogorov Complexity and Its Applications|url=https://archive.org/details/introductiontoko00limi_695|url-access=limited|year=2009|publisher=Springer |isbn=978-0-387-49820-1|page=[https://archive.org/details/introductiontoko00limi_695/page/n141 119]}}</ref><ref>{{cite journal |doi=10.1016/j.tcs.2013.07.009 |title=Conditional Kolmogorov complexity and universal probability |journal=Theoretical Computer Science |volume=501 |pages=93–100 |year=2013 |last1=Vitányi |first1=Paul M.B. |url=https://ir.cwi.nl/pub/26818 |arxiv=1206.0983 |s2cid=12085503 }}</ref>
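In practice, <math>K(x|y)</math> is sometimes approximated by compressing ''x'' with ''y'' available as side information. The following Python sketch uses <code>zlib</code>'s preset-dictionary mechanism and is a heuristic stand-in for the definition, not the definition itself:

<syntaxhighlight lang="python">
import zlib

def cond_complexity_estimate(x: bytes, y: bytes) -> int:
    # Heuristic estimate of K(x|y) in bits: compress x with y as a
    # preset dictionary, so material shared with y costs very little.
    comp = zlib.compressobj(level=9, zdict=y)
    return 8 * len(comp.compress(x) + comp.flush())

x = b"the quick brown fox jumps over the lazy dog" * 4
print(cond_complexity_estimate(x, x))               # small: x given x
plain = zlib.compressobj(level=9)                   # no side information
print(8 * len(plain.compress(x) + plain.flush()))   # larger
</syntaxhighlight>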
== Time-bounded complexity ==
Time-bounded Kolmogorov complexity is a modified version of Kolmogorov complexity where the space of programs to be searched for a solution is confined to programs that run within some pre-defined number of steps.
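A brute-force sketch of the time-bounded variant, with a hypothetical interpreter <code>run_for(p, t)</code> that executes program ''p'' on some fixed universal machine for at most ''t'' steps and returns its output if it halted. Unlike plain <math>K</math>, this search terminates, because every program run is cut off after ''t'' steps:

<syntaxhighlight lang="python">
from itertools import product
from typing import Optional

def time_bounded_K(s: str, run_for, max_len: int, t: int) -> Optional[int]:
    # Length of the shortest program (up to max_len) that outputs s within
    # t steps; run_for(p, t) is a hypothetical step-bounded interpreter.
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            if run_for("".join(bits), t) == s:
                return n
    return None   # no program of length <= max_len outputs s in t steps
</syntaxhighlight>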
==See also==
* [[Kolmogorov structure function]]
* [[Levenshtein distance]]
* [[Manifold hypothesis]]
* [[Solomonoff's theory of inductive inference]]
* [[Sample entropy]]
[[Category:Measures of complexity]]
[[Category:Computational complexity theory]]
[[Category:Data compression]]