In [[computer science]], the '''lexicographically minimal string rotation''' (LMSR) or '''lexicographically least circular substring''' is the problem of finding the [[String_(computer_science)#Rotations|rotation of a string]] possessing the lowest [[lexicographical order]] of all such rotations. For example, the lexicographically minimal rotation of "bbaaccaadd" would be "aaccaaddbb". LMSR is widely used in [[Graph isomorphism|equality checking of graphs]], polygons, automata and chemical structures.<ref name="Wang"/>

It is possible for a string to have multiple LMSRs, but for most applications this does not matter as the rotations must be equivalent. Finding the lexicographically minimal rotation is useful as a way of [[Text normalization|normalizing]] strings. If the strings represent potentially [[isomorphism|isomorphic]] structures such as [[Graph (discrete mathematics)|graphs]], normalizing in this way allows for simple equality checking.<ref>{{cite journal
| author = Kellogg S. Booth
| last2 = Colbourn | first2 = Charles J.
| url = http://epubs.siam.org/sicomp/resource/1/smjcat/v10/i1/p203_s
| doi = 10.1137/0210015
| issn = 0097-5397
| url-access = subscription }}
</ref>
A common implementation trick when dealing with circular strings is to concatenate the string to itself instead of having to perform [[modular arithmetic]] on the string indices.
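As a minimal sketch of this trick (the function name is illustrative), every rotation of a string of length {{mvar|n}} appears as a length-{{mvar|n}} window of the doubled string, so no index ever wraps around:

<syntaxhighlight lang="python">
def rotations(s: str) -> list[str]:
    """All rotations of s, read off as length-n windows of s + s."""
    doubled = s + s
    n = len(s)
    # The window starting at i equals the rotation beginning at index i.
    return [doubled[i:i + n] for i in range(n)]
</syntaxhighlight>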
==Algorithms==
 
===The naive algorithm===
The naive algorithm for finding the lexicographically minimal rotation of a string is to iterate through successive rotations while keeping track of the most lexicographically minimal rotation encountered. If the string is of length {{mvar|n}}, this algorithm runs in {{math|''O''(''n''<sup>2</sup>)}} time in the worst case.
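Using the concatenation trick described above, the naive algorithm can be sketched in a few lines (the function name is illustrative):

<syntaxhighlight lang="python">
def least_rotation_naive(s: str) -> str:
    """Return the lexicographically minimal rotation by trying all n candidates."""
    doubled = s + s
    n = len(s)
    # min() examines n rotations, and each string comparison costs up to O(n),
    # giving O(n^2) time in the worst case.
    return min(doubled[i:i + n] for i in range(n))
</syntaxhighlight>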
 
===Booth's algorithm===
An efficient algorithm was proposed by Booth (1980).<ref>{{cite journal
| author = Kellogg S. Booth
| publisher = Elsevier
| volume = 10
| number = 4–5
| pages = 240–242
| year = 1980
| url = http://www.sciencedirect.com/science/article/pii/0020019080901490
| doi = 10.1016/0020-0190(80)90149-0
| issn = 0020-0190 }}
</ref>
The algorithm uses a modified preprocessing function from the [[Knuth–Morris–Pratt algorithm|Knuth–Morris–Pratt string search algorithm]]. The failure function for the string is computed as normal, but the string is rotated during the computation, so some indices must be computed more than once as they wrap around. Once all indices of the failure function have been computed without the string rotating again, the minimal lexicographical rotation has been found and its starting index is returned. The correctness of the algorithm is somewhat difficult to understand, but it is easy to implement.
 
<syntaxhighlight lang="python">
def least_rotation(s: str) -> int:
    """Booth's lexicographically minimal string rotation algorithm."""
    n = len(s)
    f = [-1] * (2 * n)  # Failure function
    k = 0  # Least rotation of string found so far
    for j in range(1, 2 * n):
        i = f[j - k - 1]
        while i != -1 and s[j % n] != s[(k + i + 1) % n]:
            if s[j % n] < s[(k + i + 1) % n]:
                k = j - i - 1
            i = f[i]
        if i == -1 and s[j % n] != s[(k + i + 1) % n]:
            if s[j % n] < s[(k + i + 1) % n]:
                k = j
            f[j - k] = -1
        else:
            f[j - k] = i + 1
    return k
</syntaxhighlight>
Of interest is that removing all lines of code which modify the value of {{mvar|k}} results in the original Knuth–Morris–Pratt preprocessing function, as {{mvar|k}} (representing the rotation) will remain zero. Booth's algorithm runs in {{tmath|O(n)}} time, where {{mvar|n}} is the length of the string. The algorithm performs at most {{tmath|3n}} comparisons in the worst case and requires auxiliary memory of length {{mvar|n}} to hold the failure function table.

===Shiloach's fast canonization algorithm===
Shiloach (1981)<ref>{{cite journal
| title = Fast canonization of circular strings
| issn = 0196-6774
| doi = 10.1016/0196-6774(81)90013-4
| url = http://www.sciencedirect.com/science/article/pii/0196677481900134
| author = Yossi Shiloach }}
</ref>
proposed an algorithm improving on Booth's result in terms of performance. It was observed that if there are {{mvar|q}} equivalent lexicographically minimal rotations of a string of length {{mvar|n}}, then the string must consist of {{mvar|q}} equal substrings of length {{tmath|1=d=n/q}}. The algorithm requires only {{tmath|n + d/2}} comparisons and constant space in the worst case.
 
The algorithm is divided into two phases. The first phase is a quick sieve which rules out indices that are obviously not starting locations for the lexicographically minimal rotation. The second phase then finds the lexicographically minimal rotation start index from the indices which remain.
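The structural observation underlying the algorithm can be checked directly with a brute-force helper (illustrative names; this is a demonstration of the property, not Shiloach's procedure):

<syntaxhighlight lang="python">
def minimal_rotation_starts(s: str) -> list[int]:
    """Indices at which a lexicographically minimal rotation of s begins."""
    n = len(s)
    doubled = s + s
    best = min(doubled[i:i + n] for i in range(n))
    return [i for i in range(n) if doubled[i:i + n] == best]

# "abab" has q = 2 minimal rotations (indices 0 and 2), so it consists of
# q = 2 copies of a block of length d = n/q = 2, namely "ab".
</syntaxhighlight>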
 
===Duval's Lyndon factorization algorithm===
Duval (1983)<ref>{{cite journal
| title = Factorizing words over an ordered alphabet
Line 88 ⟶ 90:
| issn = 0196-6774
| doi = 10.1016/0196-6774(83)90017-2
| url = http://www.sciencedirect.com/science/article/pii/0196677483900172
| author = Jean Pierre Duval }}
</ref>
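A closely related linear-time routine, often derived from the Lyndon-word viewpoint, can be sketched as follows (identifier names are illustrative, and this two-pointer formulation is a sketch rather than Duval's original presentation):

<syntaxhighlight lang="python">
def least_rotation_index(s: str) -> int:
    """Index of a lexicographically minimal rotation, via a two-pointer scan."""
    t = s + s
    n, i, ans = len(t), 0, 0
    while i < n // 2:
        ans = i              # Candidate start of a minimal rotation
        j, k = i + 1, i
        while j < n and t[k] <= t[j]:
            if t[k] < t[j]:
                k = i        # Restart matching from the candidate start
            else:
                k += 1
            j += 1
        while i <= k:
            i += j - k       # Skip starts that cannot begin a smaller rotation
    return ans
</syntaxhighlight>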
| pages = 236–238
| year = 1979
| url = http://www.sciencedirect.com/science/article/pii/0020019079901145
| doi = 10.1016/0020-0190(79)90114-5
| issn = 0020-0190 }}
</ref>
proposed an algorithm to efficiently compare two circular strings for equality without a normalization requirement. An additional application which arises from the algorithm is the fast generation of certain chemical structures without repetitions.
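A simple baseline for such equality checking (a sketch, not necessarily the cited algorithm) is the doubled-string substring test: two strings of equal length represent the same circular string exactly when one occurs as a substring of the other concatenated with itself.

<syntaxhighlight lang="python">
def circular_equal(s: str, t: str) -> bool:
    """Check whether t is some rotation of s, without normalizing either string.

    With a linear-time substring search, this runs in O(n).
    """
    return len(s) == len(t) and t in s + s
</syntaxhighlight>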
 
A variant for [[quantum computing]] was proposed by Wang & Ying (2024).<ref name="Wang">{{cite journal
| author = Wang, Q., Ying, M.
| title = Quantum Algorithm for Lexicographically Minimal String Rotation
| journal = Theory Comput Syst
| volume = 68
| pages = 29–74
| year = 2024
| doi = 10.1007/s00224-023-10146-8
| arxiv = 2012.09376
}}
</ref> They show that the quantum algorithm outperforms any classical randomized algorithm in both the worst and average cases.
 
==See also==
* [[Lyndon word]]
* [[Knuth–Morris–Pratt algorithm]]
 
==References==
[[Category:Problems on strings]]
[[Category:Lexicography]]
[[Category:Articles with example code]]