Lexicographically minimal string rotation: Difference between revisions

Content deleted Content added
Booth's Algorithm: add docstring
mNo edit summary
Line 1:
In [[computer science]], the '''lexicographically minimal string rotation''' or '''lexicographically least circular substring''' is the problem of finding the [[String_(computer_science)#Rotations|rotation of a string]] possessing the lowest [[lexicographical order]] of all such rotations. For example, the lexicographically minimal rotation of "bbaaccaadd" would be "aaccaaddbb". It is possible for a string to have multiple lexicographically minimal rotations, but for most applications this does not matter as the rotations must be equivalent. Finding the lexicographically minimal rotation is useful as a way of [[Text normalization|normalizing]] strings. If the strings represent potentially [[Isomorphismisomorphism|isomorphic]] structures such as [[Graph (discrete mathematics)|graphs]], normalizing in this way allows for simple equality checking.<ref>{{cite journal
| author = Kellogg S. Booth
| last2 = Colbourn | first2 = Charles J.
Line 34:
| issn = 0020-0190 }}
</ref>
The algorithm uses a modified preprocessing function from the [[Knuth-Morris-PrattKnuth–Morris–Pratt algorithm|Knuth-Morris-PrattKnuth–Morris–Pratt string search algorithm]]. The failure function for the string is computed as normal, but the string is rotated during the computation so some indices must be computed more than once as they wrap around. Once all indices of the failure function have been successfully computed without the string rotating again, the minimal lexicographical rotation is known to be found and its starting index is returned. The correctness of the algorithm is somewhat difficult to understand, but it is easy to implement.
 
<syntaxhighlight lang="python">
Line 56:
return k
</syntaxhighlight>
 
Of interest is that removing all lines of code which modify the value of {{mvar|k}} results in the original Knuth-Morris-Pratt preprocessing function, as {{mvar|k}} (representing the rotation) will remain zero. Booth's algorithm runs in {{tmath|O(n)}} time, where {{mvar|n}} is the length of the string. The algorithm performs at most {{tmath|3n}} comparisons in the worst case, and requires auxiliary memory of length {{mvar|n}} to hold the failure function table.
 
===Shiloach's Fast Canonization Algorithm===
 
Shiloach (1981)<ref>{{cite journal
| title = Fast canonization of circular strings
Line 108:
==See also==
* [[Lyndon word]]
* [[Knuth-Morris-PrattKnuth–Morris–Pratt algorithm]]
 
==References==