Content deleted Content added
Deltahedron (talk | contribs) cite Allouche & Shallit (2003) |
Citation bot (talk | contribs) Altered title. | Use this bot. Report bugs. | Suggested by Dominic3203 | Category:Theoretical computer science | #UCB_Category 99/137 |
||
(36 intermediate revisions by 14 users not shown) | |||
Line 1:
{{Short description|Function that counts distinct factors of a string}}
In [[computer science]], the '''complexity function''' of a string, a finite or infinite sequence ''u'' of letters from some alphabet, is the function ''p''<sub>''u''</sub>(''n'') of a positive integer ''n'' that counts the number of different factors (consecutive substrings) of length ''n'' from the string ''u''.<ref name=L7>Lothaire (2011) p.7</ref><ref name=L46/><ref name=PF3>Pytheas Fogg (2002) p.3</ref><ref name=BLRS82>Berstel et al (2009) p.82</ref> ▼
In [[computer science]], the '''complexity function''' of a ''word'' or ''[[string (computer science)|string]]'' (a finite or infinite sequence of symbols from some [[alphabet (formal languages)|alphabet]]) is the function that counts the number of distinct ''factors'' (substrings of consecutive symbols) of that string. More generally, the complexity function of a [[formal language]] (a set of finite strings) counts the number of distinct words of given length.
==Complexity function of a word==
Let ''u'' be a (possibly infinite) sequence of symbols from an alphabet. Define the function
▲
For a string ''u'' of length at least ''n'' over an alphabet of size ''k'' we clearly have
:<math> 1 \le p_u(n) \le k^n \ , </math>
the bounds being achieved by the constant word and a [[disjunctive word]],<ref name=Bug91>Bugeaud (2012) p.91</ref> for example, the [[Champernowne word]] respectively.<ref name=CN165>Cassaigne & Nicolas (2010) p.165</ref> For infinite words ''u'', we have ''p''<sub>''u''</sub>(''n'') bounded if ''u'' is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle). Conversely, if ''p''<sub>''u''</sub>(''n'') ≤ ''n'' for some ''n'', then ''u'' is ultimately periodic.<ref name=PF3/><ref name=AS302>Allouche & Shallit (2003) p.302</ref>
An '''aperiodic sequence''' is one which is not ultimately periodic. An aperiodic sequence has strictly increasing complexity function (this is the '''[[Morse–Hedlund theorem]]'''),<ref name=BR166/><ref name=CN166>Cassaigne & Nicolas (2010) p.166</ref> so ''p''(''n'') is at least ''n''+1.<ref name=L22>Lothaire (2011) p.22</ref>
A set ''S'' of finite binary words is ''balanced'' if for each ''n'' the subset ''S''<sub>''n''</sub> of words of length ''n'' has the property that the [[Hamming weight]] of the words in ''S''<sub>''n''</sub> takes at most two distinct values. A '''balanced sequence''' is one for which the set of factors is balanced.<ref name=AS313>Allouche & Shallit (2003) p.313</ref> A balanced sequence has complexity
A [[Sturmian word]] over a binary alphabet is one with complexity function ''n'' + 1.<ref name=PF6>Pytheas Fogg (2002) p.6</ref> A sequence is Sturmian [[if and only if]] it is balanced and aperiodic.<ref name=L46>Lothaire (2011) p.46</ref><ref name=AS318>Allouche & Shallit (2003) p.318</ref> An example is the [[Fibonacci word]].<ref name=PF6/><ref name=deluca1995>{{cite journal | journal=Information Processing Letters | year=1995 | pages=307–312 | doi=10.1016/0020-0190(95)00067-M | title=A division property of the Fibonacci word | first=Aldo | last=de Luca | volume=54 | issue=6 }}</ref> More generally, a Sturmian word over an
For [[recurrent word]]s, those in which each factor appears infinitely often, the complexity function
==Complexity function of a language==
Let ''L'' be a language over an alphabet and define the function ''p''<sub>''L''</sub>(''n'') of a positive integer ''n'' to be the number of different words of length ''n'' in ''L''<ref name=BR166>Berthé & Rigo (2010) p.166</ref> The complexity function of a word is thus the complexity function of the language consisting of the factors of that word.
The complexity function of a language is less constrained than that of a word. For example, it may be bounded but not eventually constant: the complexity function of the [[regular language]] <math>a(bb)^*a</math> takes values 3 and 4 on odd and even ''n''≥2 respectively. There is an analogue of the Morse–Hedlund theorem: if the complexity of ''L'' satisfies ''p''<sub>''L''</sub>(''n'') ≤ ''n'' for some ''n'', then ''p''<sub>''L''</sub> is bounded and there is a finite language ''F'' such that<ref name=BR166/>
:<math>L \subseteq \{ x y^k z : x,y,z \in F,\ k \in \mathbb{N} \} \ . </math>
A '''polynomial''' or '''[[sparse language]]''' is one for which the complexity function ''p''(''n'') is bounded by a fixed power of ''n''. A [[regular language]] which is not polynomial is ''exponential'': there are infinitely many ''n'' for which ''p''(''n'') is greater than ''k''<sup>''n''</sup> for some fixed ''k'' > 1.<ref name=BR136>Berthé & Rigo (2010) p.136</ref>
==Related concepts==
The ''[[topological entropy]]'' of an infinite sequence ''u'' is defined by
:<math>H_{\mathrm{top}}(u) = \lim_{n \rightarrow \infty} \frac{\log p_u(n)}{n \log k} \ . </math>
The limit exists as the logarithm of the complexity function is [[Subadditivity|subadditive]].<ref name=PF4>Pytheas Fogg (2002) p.4</ref><ref name=AS303>Allouche & Shallit (2003) p.303</ref> Every [[real number]] between 0 and 1 occurs as the topological
For ''x'' a real number and ''b'' an [[integer]] ≥ 2 then the complexity function of ''x'' in base ''b'' is the complexity function ''p''(''x'',''b'',''n'') of the sequence of digits of ''x'' written in base ''b''.
If ''x'' is an [[irrational number]] then ''p''(''x'',''b'',''n'') ≥ ''n''+1; if ''x'' is [[rational number|rational]] then ''p''(''x'',''b'',''n'') ≤ ''C'' for some constant ''C'' depending on ''x'' and ''b''.<ref name=Bug91>Bugeaud (2012) p.91</ref> It is conjectured that for algebraic irrational ''x'' the complexity is ''b''<sup>''n''</sup> (which would follow if all such numbers were [[Normal number|normal]]) but all that is known in this case is that ''p'' grows faster than any linear function of ''n''.<ref name=BR414>Berthé & Rigo (2010) p.414</ref>
The '''abelian complexity function''' ''p''<sup>ab</sup>(''n'') similarly counts the number of occurrences of distinct factors of given length ''n'', where now we identify factors that differ only by a permutation of the positions. Clearly ''p''<sup>ab</sup>(''n'') ≤ ''p''(''n''). The abelian complexity of a Sturmian sequence satisfies ''p''<sup>ab</sup>(''n'') = 2.<ref>{{cite book | chapter=On the Asymptotic Abelian Complexity of Morphic Words | title=Developments in Language Theory. Proceedings, 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013 | pages=94–105 | year=2013 | doi=10.1007/978-3-642-38771-5_10 | isbn=978-3-642-38770-8 | series=Lecture Notes in Computer Science | volume=7907 | issn=0302-9743 | publisher=[[Springer-Verlag]] | ___location=Berlin, Heidelberg | editor1-first=Marie-Pierre | editor1-last=Béal | editor2-first=Olivier | editor2-last=Carton | author1-first=Francine | author1-last=Blanchet-Sadri | author2-first=Nathan | author2-last=Fox }}</ref>
==References==
{{reflist}}
*{{cite book | last1 = Allouche | first1 = Jean-Paul | last2 = Shallit | first2 = Jeffrey | author2-link = Jeffrey Shallit | isbn = 978-0-521-82332-6 | publisher = [[Cambridge University Press]] | title = Automatic Sequences: Theory, Applications, Generalizations | year = 2003 | zbl=1086.11015 }}
* {{cite book | last1=Berstel | first1=Jean | last2=Lauve | first2=Aaron | last3=Reutenauer | first3=Christophe | last4=Saliola | first4=Franco V. | title=Combinatorics on words. Christoffel words and repetitions in words | series=CRM Monograph Series | volume=27 | ___location=Providence, RI | publisher=[[American Mathematical Society]] | year=2009 | isbn=978-0-8218-4480-9 | url=
* {{cite book | editor1-last=
*{{cite book | last=Bugeaud | first=Yann | title=Distribution modulo one and Diophantine approximation | series=Cambridge Tracts in Mathematics | volume=193 | ___location=Cambridge | publisher=[[Cambridge University Press]] | year=2012 | isbn=978-0-521-11169-0 | zbl=1260.11001}}
* {{cite book | last1=Cassaigne | first1=Julien | last2=Nicolas | first2=François | chapter=Factor complexity | pages=163–247 | editor1-last=Berthé | editor1-first=Valérie|editor1-link= Valérie Berthé | editor2-last=Rigo | editor2-first=Michel | title=Combinatorics, automata, and number theory | series=Encyclopedia of Mathematics and its Applications | volume=135 | ___location=Cambridge | publisher=[[Cambridge University Press]] | year=2010 | isbn=978-0-521-51597-9 | zbl=1216.68204 }} * {{cite book | last=Lothaire | first=M. | authorlink=M. Lothaire | title=Algebraic combinatorics on words | others=With preface by Jean Berstel and Dominique Perrin | edition=Reprint of the 2002 hardback | series=Encyclopedia of Mathematics and Its Applications | volume=90| publisher=Cambridge University Press | year=2011 | isbn=978-0-521-18071-9 | zbl=1221.68183 }}
* {{cite book | last=Pytheas Fogg | first=N. |
[[Category:Theoretical computer science]]
|