In [[computer science]], the '''lexicographically minimal string rotation''' (LMSR) or '''lexicographically least circular substring''' is the problem of finding the [[String_(computer_science)#Rotations|rotation of a string]] possessing the lowest [[lexicographical order]] of all such rotations. For example, the lexicographically minimal rotation of "bbaaccaadd" would be "aaccaaddbb". LMSR is widely used in [[Graph isomorphism|equality checking of graphs]], polygons, automata and chemical structures.<ref name="Wang"/>

It is possible for a string to have multiple LMSRs, but for most applications this does not matter as the rotations must be equivalent. Finding the lexicographically minimal rotation is useful as a way of [[Text normalization|normalizing]] strings. If the strings represent potentially [[isomorphism|isomorphic]] structures such as [[Graph (discrete mathematics)|graphs]], normalizing in this way allows for simple equality checking.<ref>{{cite journal
| author = Kellogg S. Booth
| last2 = Colbourn | first2 = Charles J.
| url = http://epubs.siam.org/sicomp/resource/1/smjcat/v10/i1/p203_s
| doi = 10.1137/0210015
| issn = 0097-5397
| url-access = subscription }}
</ref>
A common implementation trick when dealing with circular strings is to concatenate the string to itself instead of having to perform [[modular arithmetic]] on the string indices.
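As a minimal sketch of this trick (the function name is illustrative), every rotation of a string of length {{mvar|n}} appears as a length-{{mvar|n}} window of the doubled string, so no index ever wraps around:

<syntaxhighlight lang="python">
def rotations(s: str) -> list[str]:
    """All rotations of s, read off as length-n windows of s + s."""
    doubled = s + s
    n = len(s)
    # The window starting at i equals the rotation beginning at index i.
    return [doubled[i:i + n] for i in range(n)]
</syntaxhighlight>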
==Algorithms==
 
===The naive algorithm===
The naive algorithm for finding the lexicographically minimal rotation of a string is to iterate through successive rotations while keeping track of the most lexicographically minimal rotation encountered. If the string is of length {{mvar|n}}, this algorithm runs in {{math|''O''(''n''<sup>2</sup>)}} time in the worst case.
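Using the concatenation trick described above, the naive algorithm can be sketched in a few lines (the function name is illustrative):

<syntaxhighlight lang="python">
def least_rotation_naive(s: str) -> str:
    """Return the lexicographically minimal rotation by trying all n candidates."""
    doubled = s + s
    n = len(s)
    # min() examines n rotations, and each string comparison costs up to O(n),
    # giving O(n^2) time in the worst case.
    return min(doubled[i:i + n] for i in range(n))
</syntaxhighlight>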
 
===Booth's algorithm===
An efficient algorithm was proposed by Booth (1980).<ref>{{cite journal
| author = Kellogg S. Booth
| publisher = Elsevier
| volume = 10
| number = 4–5
| pages = 240–242
| year = 1980
| url = http://www.sciencedirect.com/science/article/pii/0020019080901490
| doi = 10.1016/0020-0190(80)90149-0
| issn = 0020-0190 }}
</ref>
The algorithm uses a modified preprocessing function from the [[Knuth–Morris–Pratt algorithm|Knuth–Morris–Pratt string search algorithm]]. The failure function for the string is computed as normal, but the string is rotated during the computation, so some indices must be computed more than once as they wrap around. Once all indices of the failure function have been computed without the string rotating again, the minimal lexicographical rotation has been found and its starting index is returned. The correctness of the algorithm is somewhat difficult to understand, but it is easy to implement.
 
<syntaxhighlight lang="python">
def least_rotation(s: str) -> int:
    """Booth's lexicographically minimal string rotation algorithm."""
    n = len(s)
    f = [-1] * (2 * n)  # Failure function
    k = 0  # Least rotation of string found so far
    for j in range(1, 2 * n):
        i = f[j - k - 1]
        while i != -1 and s[j % n] != s[(k + i + 1) % n]:
            if s[j % n] < s[(k + i + 1) % n]:
                k = j - i - 1
            i = f[i]
        if i == -1 and s[j % n] != s[(k + i + 1) % n]:
            if s[j % n] < s[(k + i + 1) % n]:
                k = j
            f[j - k] = -1
        else:
            f[j - k] = i + 1
    return k
</syntaxhighlight>
Of interest is that removing all lines of code which modify the value of {{mvar|k}} results in the original Knuth–Morris–Pratt preprocessing function, as {{mvar|k}} (representing the rotation) will remain zero. Booth's algorithm runs in {{tmath|O(n)}} time, where {{mvar|n}} is the length of the string. The algorithm performs at most {{tmath|3n}} comparisons in the worst case and requires auxiliary memory of length {{mvar|n}} to hold the failure function table.

===Shiloach's fast canonization algorithm===
Shiloach (1981)<ref>{{cite journal
| title = Fast canonization of circular strings
| issn = 0196-6774
| doi = 10.1016/0196-6774(81)90013-4
| url = http://www.sciencedirect.com/science/article/pii/0196677481900134
| author = Yossi Shiloach }}
</ref>
proposed an algorithm improving on Booth's result in terms of performance. It was observed that if there are {{mvar|q}} equivalent lexicographically minimal rotations of a string of length {{mvar|n}}, then the string must consist of {{mvar|q}} equal substrings of length {{tmath|1=d=n/q}}. The algorithm requires only {{tmath|n + d/2}} comparisons and constant space in the worst case.
 
The algorithm is divided into two phases. The first phase is a quick sieve which rules out indices that are obviously not starting locations for the lexicographically minimal rotation. The second phase then finds the lexicographically minimal rotation start index from the indices which remain.
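The structural observation underlying the algorithm can be checked directly with a brute-force helper (illustrative names; this is a demonstration of the property, not Shiloach's procedure):

<syntaxhighlight lang="python">
def minimal_rotation_starts(s: str) -> list[int]:
    """Indices at which a lexicographically minimal rotation of s begins."""
    n = len(s)
    doubled = s + s
    best = min(doubled[i:i + n] for i in range(n))
    return [i for i in range(n) if doubled[i:i + n] == best]

# "abab" has q = 2 minimal rotations (indices 0 and 2), so it consists of
# q = 2 copies of a block of length d = n/q = 2, namely "ab".
</syntaxhighlight>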
 
===Duval's Lyndon factorization algorithm===
Duval (1983)<ref>{{cite journal
| title = Factorizing words over an ordered alphabet
Line 88 ⟶ 90:
| issn = 0196-6774
| doi = 10.1016/0196-6774(83)90017-2
| url = http://www.sciencedirect.com/science/article/pii/0196677483900172
| author = Jean Pierre Duval }}
</ref>
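A closely related linear-time routine, often derived from the Lyndon-word viewpoint, can be sketched as follows (identifier names are illustrative, and this two-pointer formulation is a sketch rather than Duval's original presentation):

<syntaxhighlight lang="python">
def least_rotation_index(s: str) -> int:
    """Index of a lexicographically minimal rotation, via a two-pointer scan."""
    t = s + s
    n, i, ans = len(t), 0, 0
    while i < n // 2:
        ans = i              # Candidate start of a minimal rotation
        j, k = i + 1, i
        while j < n and t[k] <= t[j]:
            if t[k] < t[j]:
                k = i        # Restart matching from the candidate start
            else:
                k += 1
            j += 1
        while i <= k:
            i += j - k       # Skip starts that cannot begin a smaller rotation
    return ans
</syntaxhighlight>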
| pages = 236–238
| year = 1979
| url = http://www.sciencedirect.com/science/article/pii/0020019079901145
| doi = 10.1016/0020-0190(79)90114-5
| issn = 0020-0190 }}
</ref>
proposed an algorithm to efficiently compare two circular strings for equality without a normalization requirement. An additional application which arises from the algorithm is the fast generation of certain chemical structures without repetitions.
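A simple baseline for such equality checking (a sketch, not necessarily the cited algorithm) is the doubled-string substring test: two strings of equal length represent the same circular string exactly when one occurs as a substring of the other concatenated with itself.

<syntaxhighlight lang="python">
def circular_equal(s: str, t: str) -> bool:
    """Check whether t is some rotation of s, without normalizing either string.

    With a linear-time substring search, this runs in O(n).
    """
    return len(s) == len(t) and t in s + s
</syntaxhighlight>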
 
A variant for [[quantum computing]] was proposed by Wang & Ying (2024).<ref name="Wang">{{cite journal
| author = Wang, Q., Ying, M.
| title = Quantum Algorithm for Lexicographically Minimal String Rotation
| journal = Theory Comput Syst
| volume = 68
| pages = 29–74
| year = 2024
| doi = 10.1007/s00224-023-10146-8
| arxiv = 2012.09376
}}
</ref> They show that the quantum algorithm outperforms any classical randomized algorithm in both the worst and average cases.
 
==See also==
* [[Lyndon word]]
* [[Knuth–Morris–Pratt algorithm]]
 
==References==
[[Category:Problems on strings]]
[[Category:Lexicography]]
[[Category:Articles with example code]]