Complexity function: Difference between revisions

Content deleted Content added
Citation bot (talk | contribs)
Altered title. | Use this bot. Report bugs. | Suggested by Dominic3203 | Category:Theoretical computer science | #UCB_Category 99/137
 
(17 intermediate revisions by 13 users not shown)
Line 1:
{{Short description|Function that counts distinct factors of a string}}
In [[computer science]], the '''complexity function''' of a ''word'' or ''[[string, (computer science)|string]]'' (a finite or infinite sequence of letterssymbols from some [[alphabet, (formal languages)|alphabet]]) is the function that counts the number of distinct ''factors'' (substrings of consecutive symbols) fromof that string. More generally, the complexity function of a [[formal language,]] (a set of finite words over an alphabet,strings) counts the number of distinct words of given length.
 
==Complexity function of a word==
Line 9 ⟶ 10:
the bounds being achieved by the constant word and a [[disjunctive word]],<ref name=Bug91>Bugeaud (2012) p.91</ref> for example, the [[Champernowne word]] respectively.<ref name=CN165>Cassaigne & Nicolas (2010) p.165</ref> For infinite words ''u'', we have ''p''<sub>''u''</sub>(''n'') bounded if ''u'' is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle). Conversely, if ''p''<sub>''u''</sub>(''n'') ≤ ''n'' for some ''n'', then ''u'' is ultimately periodic.<ref name=PF3/><ref name=AS302>Allouche & Shallit (2003) p.302</ref>
 
An '''aperiodic sequence''' is one which is not ultimately periodic. An aperiodic sequence has strictly increasing complexity function (this is the '''[[Morse–Hedlund theorem]]'''),<ref name=BR166/><ref name=CN166>Cassaigne & Nicolas (2010) p.166</ref> so ''p''(''n'') is at least ''n''+1.<ref name=L22>Lothaire (2011) p.22</ref>
 
A set ''S'' of finite binary words is ''balanced'' if for each ''n'' the subset ''S''<sub>''n''</sub> of words of length ''n'' has the property that the [[Hamming weight]] of the words in ''S''<sub>''n''</sub> takes at most two distinct values. A '''balanced sequence''' is one for which the set of factors is balanced.<ref name=AS313>Allouche & Shallit (2003) p.313</ref> A balanced sequence has complexity function at most ''n''+1.<ref name=L48>Lothaire (2011) p.48</ref>
 
A [[Sturmian word]] over a binary alphabet is one with complexity function ''n''&nbsp;+&nbsp;1.<ref name=PF6>Pytheas Fogg (2002) p.6</ref> A sequence is Sturmian [[if and only if]] it is balanced and aperiodic.<ref name=L46>Lothaire (2011) p.46</ref><ref name=AS318>Allouche & Shallit (2003) p.318</ref> An example is the [[Fibonacci word]].<ref name=PF6/><ref name=deluca1995>{{cite journal | journal=Information Processing Letters | year=1995 | pages=307–312 | doi=10.1016/0020-0190(95)00067-M | title=A division property of the Fibonacci word | first=Aldo | last=de Luca | volume=54 | issue=6 }}</ref> More generally, a Sturmian word over an alphaberalphabet of size ''k'' is one with complexity ''n''+''k''−1. An Arnoux-Rauzy word over a ternary alphabet has complexity 2''n''&nbsp;+&nbsp;1:<ref name=PF6/> an example is the [[Tribonacci word]].<ref name=PF368>Pytheas Fogg (2002) p.368</ref>
 
For [[recurrent word]]s, those in which each factor appears infinitely often, the complexity function almost characterises the set of factors: if ''s'' is a recurrent word with the same complexity function as ''t'' are then ''s'' has the same set of factors as ''t'' or δ''t'' where δ denotes the letter doubling morphism ''a'' → ''aa''.<ref name=BLRS84>Berstel et al (2009) p.84</ref>
 
==Complexity function of a language==
Let ''L'' be a language over an alphaetalphabet and define the function ''p''<sub>''L''</sub>(''n'') of a positive integer ''n'' to be the number of different words of length ''n'' in ''L''<ref name=BR166>Berthé & Rigo (2010) p.166</ref> The complexity function of a word is thus the complexity function of the language consisting of the factors of that word.
 
The complexity function of a language is less constrained than that of a word. For example, it may be bounded but not eventually constant: the complexity function of the [[regular language]] <math>a(bb)^*a</math> takes values 3 and 4 on odd and even ''n''≥2 respectively. There is an analogue of the Morse–Hedlund theorem: if the complexity of ''L'' satisfies ''p''<sub>''L''</sub>(''n'') ≤ ''n'' for some ''n'', then ''p''<sub>''L''</sub> is bounded and there is a finite language ''F'' such that<ref name=BR166/>
 
:<math>L \subseteq \{ x y^k z : x,y,z \in F,\ k \in \mathbb{N} \} \ . </math>
 
A '''polynomial''' or '''[[sparse language]]''' is one for which the complexity function ''p''(''n'') is bounded by a fixed power of ''n''. A [[regular language]] which is not polynomial is ''exponential'': there are infinitely many ''n'' for which ''p''(''n'') is greater than ''k''<sup>''n''</sup> for some fixed ''k'' > 1.<ref name=BR136>Berthé & Rigo (2010) p.136</ref>
 
==Related concepts==
Line 29 ⟶ 32:
:<math>H_{\mathrm{top}}(u) = \lim_{n \rightarrow \infty} \frac{\log p_u(n)}{n \log k} \ . </math>
 
The limit exists as the logarithm of the complexity function is [[Subadditivity|subadditive]].<ref name=PF4>Pytheas Fogg (2002) p.4</ref><ref name=AS303>Allouche & Shallit (2003) p.303</ref> Every [[real number]] between 0 and 1 occurs as the topological entryentropy of some sequence is applicable,<ref name=CN169>Cassaigne & Nicolas (2010) p.169</ref> which may be taken to be [[Uniformly recurrent word|uniformly recurrent]]<ref name=BR391>Berthé & Rigo (2010) p.391</ref> or even uniquely ergodic.<ref name=BR169>Berthé & Rigo (2010) p.169</ref>
 
For ''x'' a real number and ''b'' an [[integer]] ≥ 2 then the complexity function of ''x'' in base ''b'' is the complexity function ''p''(''x'',''b'',''n'') of the sequence of digits of ''x'' written in base ''b''.
If ''x'' is an [[irrational number]] then ''p''(''x'',''b'',''n'') ≥ ''n''+1; if ''x'' is [[rational number|rational]] then ''p''(''x'',''b'',''n'') ≤ ''C'' for some constant ''C'' depending on ''x'' and ''b''.<ref name=Bug91>Bugeaud (2012) p.91</ref> It is conjectured that for algebraic irrational ''x'' the complexity is ''b''<sup>''n''</sup> (which would follow if all such numbers were [[Normal number|normal]]) but all that is known in this case is that ''p'' grows faster than any linear function of ''n''.<ref name=BR414>Berthé & Rigo (2010) p.414</ref>
 
The '''abelian complexity function''' ''p''<sup>ab</sup>(''n'') similarly counts the number of occurrences of distinct factors of given length ''n'', where now we identify factors that differ only by a permutation of the positions. Clearly ''p''<sup>ab</sup>(''n'') ≤ ''p''(''n''). The abelian complexity of a Sturmian sequence satisfies ''p''<sup>ab</sup>(''n'') = 2.<ref>{{cite book | chapter=On the Asymptotic Abelian Complexity of Morphic Words | title=Developments in Language Theory. Proceedings, 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013 | pages=94-10594–105 | year=2013 | doi=10.1007/978-3-642-38771-5_10 | isbn=978-3-642-38770-8 | series=Lecture Notes in Computer Science | volume=7907 | issn=0302-9743 | publisher=[[Springer-Verlag]] | ___location=Berlin, Heidelberg | | editor1-first=Marie-Pierre | editor1-last=Béal | editor2-first=Olivier | editor2-last=Carton | author1-first=Francine | author1-last=Blanchet-Sadri | author2-first=Nathan | author2-last=Fox }}</ref>
 
==References==
{{reflist}}
*{{cite book | last1 = Allouche | first1 = Jean-Paul | last2 = Shallit | first2 = Jeffrey | author2-link = Jeffrey Shallit | isbn = 978-0-521-82332-6 | publisher = [[Cambridge University Press]] | title = Automatic Sequences: Theory, Applications, Generalizations | year = 2003 | zbl=1086.11015 }}
* {{cite book | last1=Berstel | first1=Jean | last2=Lauve | first2=Aaron | last3=Reutenauer | first3=Christophe | last4=Saliola | first4=Franco V. | title=Combinatorics on words. Christoffel words and repetitions in words | series=CRM Monograph Series | volume=27 | ___location=Providence, RI | publisher=[[American Mathematical Society]] | year=2009 | isbn=978-0-8218-4480-9 | url=httphttps://www.ams.org/bookpages/crmm-27 | zbl=1161.68043 }}
* {{cite book | editor1-last=Berthé | editor1-first=Valérie|editor1-link= Valérie Berthé | editor2-last=Rigo | editor2-first=Michel | title=Combinatorics, automata, and number theory | series=Encyclopedia of Mathematics and its Applications | volume=135 | ___location=Cambridge | publisher=[[Cambridge University Press]] | year=2010 | isbn=978-0-521-51597-9 | zbl=1197.68006 }}
*{{cite book | last=Bugeaud | first=Yann | title=Distribution modulo one and Diophantine approximation | series=Cambridge Tracts in Mathematics | volume=193 | ___location=Cambridge | publisher=[[Cambridge University Press]] | year=2012 | isbn=978-0-521-11169-0 | zbl=pre06066616 1260.11001}}
* {{cite book | last1=Cassaigne | first1=Julien | last2=Nicolas | first2=François | chapter=Factor complexity | pages=163–247 | editor1-last=Berthé | editor1-first=Valérie|editor1-link= Valérie Berthé | editor2-last=Rigo | editor2-first=Michel | title=Combinatorics, automata, and number theory | series=Encyclopedia of Mathematics and its Applications | volume=135 | ___location=Cambridge | publisher=[[Cambridge University Press]] | year=2010 | isbn=978-0-521-51597-9 | zbl=1216.68204 }}
* {{cite book | last=Lothaire | first=M. | authorlink=M. Lothaire | title=Algebraic combinatorics on words | others=With preface by Jean Berstel and Dominique Perrin | edition=Reprint of the 2002 hardback | series=Encyclopedia of Mathematics and Its Applications | volume=90| publisher=Cambridge University Press | year=2011 | isbn=978-0-521-18071-9 | zbl=1221.68183 }}
* {{cite book | last=Pytheas Fogg | first=N. | otherseditor1=Editors Berthé, Valérie;|editor1-link=Valérie Berthé|editor2=Ferenczi, Sébastien; |editor3=Mauduit, Christian; |editor4=Siegel, A. | title=Substitutions in dynamics, arithmetics and combinatorics | series=Lecture Notes in Mathematics | volume=1794 | ___location=Berlin | publisher=[[Springer-Verlag]] | year=2002 | isbn=3-540-44141-7 | zbl=1014.11015 }}
 
[[Category:Theoretical computer science]]