Content deleted Content added
add red link to page on morse-hedlund theorem |
Citation bot (talk | contribs) Altered title. | Use this bot. Report bugs. | Suggested by Dominic3203 | Category:Theoretical computer science | #UCB_Category 99/137 |
||
(3 intermediate revisions by 3 users not shown) | |||
Line 1:
{{Short description|Function that counts distinct factors of a string}}
In [[computer science]], the '''complexity function''' of a ''word'' or ''[[string (computer science)|string]]'' (a finite or infinite sequence of symbols from some [[alphabet (formal languages)|alphabet]]) is the function that counts the number of distinct ''factors'' (substrings of consecutive symbols) of that string. More generally, the complexity function of a [[formal language]] (a set of finite strings) counts the number of distinct words of given length.
Line 13 ⟶ 14:
A set ''S'' of finite binary words is ''balanced'' if for each ''n'' the subset ''S''<sub>''n''</sub> of words of length ''n'' has the property that the [[Hamming weight]] of the words in ''S''<sub>''n''</sub> takes at most two distinct values. A '''balanced sequence''' is one for which the set of factors is balanced.<ref name=AS313>Allouche & Shallit (2003) p.313</ref> A balanced sequence has complexity function at most ''n''+1.<ref name=L48>Lothaire (2011) p.48</ref>
A [[Sturmian word]] over a binary alphabet is one with complexity function ''n'' + 1.<ref name=PF6>Pytheas Fogg (2002) p.6</ref> A sequence is Sturmian [[if and only if]] it is balanced and aperiodic.<ref name=L46>Lothaire (2011) p.46</ref><ref name=AS318>Allouche & Shallit (2003) p.318</ref> An example is the [[Fibonacci word]].<ref name=PF6/><ref name=deluca1995>{{cite journal | journal=Information Processing Letters | year=1995 | pages=307–312 | doi=10.1016/0020-0190(95)00067-M | title=A division property of the Fibonacci word | first=Aldo | last=de Luca | volume=54 | issue=6 }}</ref> More generally, a Sturmian word over an alphabet of size ''k'' is one with complexity ''n''+''k''−1. An Arnoux-Rauzy word over a ternary alphabet has complexity 2''n'' + 1:<ref name=PF6/> an example is the [[Tribonacci word]].<ref name=PF368>Pytheas Fogg (2002) p.368</ref>
For [[recurrent word]]s, those in which each factor appears infinitely often, the complexity function almost characterises the set of factors: if ''s'' is a recurrent word with the same complexity function as ''t'' are then ''s'' has the same set of factors as ''t'' or δ''t'' where δ denotes the letter doubling morphism ''a'' → ''aa''.<ref name=BLRS84>Berstel et al (2009) p.84</ref>
Line 31 ⟶ 32:
:<math>H_{\mathrm{top}}(u) = \lim_{n \rightarrow \infty} \frac{\log p_u(n)}{n \log k} \ . </math>
The limit exists as the logarithm of the complexity function is [[Subadditivity|subadditive]].<ref name=PF4>Pytheas Fogg (2002) p.4</ref><ref name=AS303>Allouche & Shallit (2003) p.303</ref> Every [[real number]] between 0 and 1 occurs as the topological entropy of some sequence is applicable,<ref name=CN169>Cassaigne & Nicolas (2010) p.169</ref> which may be taken to be [[Uniformly recurrent word|uniformly recurrent]]<ref name=BR391>Berthé & Rigo (2010) p.391</ref> or even uniquely ergodic.<ref name=BR169>Berthé & Rigo (2010) p.169</ref>
For ''x'' a real number and ''b'' an [[integer]] ≥ 2 then the complexity function of ''x'' in base ''b'' is the complexity function ''p''(''x'',''b'',''n'') of the sequence of digits of ''x'' written in base ''b''.
If ''x'' is an [[irrational number]] then ''p''(''x'',''b'',''n'') ≥ ''n''+1; if ''x'' is [[rational number|rational]] then ''p''(''x'',''b'',''n'') ≤ ''C'' for some constant ''C'' depending on ''x'' and ''b''.<ref name=Bug91>Bugeaud (2012) p.91</ref> It is conjectured that for algebraic irrational ''x'' the complexity is ''b''<sup>''n''</sup> (which would follow if all such numbers were [[Normal number|normal]]) but all that is known in this case is that ''p'' grows faster than any linear function of ''n''.<ref name=BR414>Berthé & Rigo (2010) p.414</ref>
The '''abelian complexity function''' ''p''<sup>ab</sup>(''n'') similarly counts the number of occurrences of distinct factors of given length ''n'', where now we identify factors that differ only by a permutation of the positions. Clearly ''p''<sup>ab</sup>(''n'') ≤ ''p''(''n''). The abelian complexity of a Sturmian sequence satisfies ''p''<sup>ab</sup>(''n'') = 2.<ref>{{cite book | chapter=On the Asymptotic Abelian Complexity of Morphic Words | title=Developments in Language Theory.
==References==
|