Longest palindromic substring: Difference between revisions

Content deleted Content added
m Expand the explanation of why S' is needed in the Manacher's algorithm
Tag: Reverted
Change inconsistent highlighted source code blocks (the first one was highlighted, the second one was not) to use syntaxhighlight. Also rename the variable S' to R.
 
(10 intermediate revisions by 9 users not shown)
Line 2:
In [[computer science]], the '''longest palindromic substring''' or '''longest symmetric factor''' problem is the problem of finding a maximum-length contiguous [[substring]] of a given string that is also a [[palindrome]]. For example, the longest palindromic substring of "bananas" is "anana". The longest palindromic substring is not guaranteed to be unique; for example, in the string "abracadabra", there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, "aca" and "ada". In some applications it may be necessary to return all maximal palindromic substrings (that is, all substrings that are themselves palindromes and cannot be extended to larger palindromic substrings) rather than returning only one substring or returning the maximum length of a palindromic substring.
 
{{harvtxt|Manacher|1975}} invented an <math>O(n)</math>-time algorithm for listing all the palindromes that appear at the start of a given string of length <math>n</math>. However, as observed e.g., by {{harvtxt|Apostolico|Breslauer|Galil|1995}}, the same algorithm can also be used to find all maximal palindromic substrings anywhere within the input string, again in <math>O(n)</math> time. Therefore, it provides an <math>O(n)</math>-time solution to the longest palindromic substring problem. Alternative <math>O(n)</math>-time solutions were provided by {{harvtxt|Jeuring|1994}}, and by {{harvtxt|Gusfield|1997}}, who described a solution based on [[suffix tree]]s. A faster algorithm can be achieved in the [[word RAM]] model of computation if the size <math>\sigma</math> of the input alphabet is in <math>2^{o(\log n)}</math>. In particular, this algorithm runs in <math>O(n\log\sigma/\log n)</math> time using <math>O(n\log\sigma/\log n)</math> space.<ref name="CPR22">{{cite conference | last1 = Charalampopoulos | first1 = Panagiotis | last2 = Pissis | first2 = Solon P. | last3 = Radoszewski | first3 = Jakub | date = Jun 2022 | title = Longest Palindromic Substring in Sublinear Time | conference = Combinatorial Pattern Matching | editor1-last=Bannai|editor1-first=Hideo|editor2-last=Holub|editor2-first=Jan | publisher=Schloss Dagstuhl | series=Leibniz International Proceedings in Informatics (LIPIcs) | volume=223 | doi = 10.4230/LIPIcs.CPM.2022.20 | doi-access = free }} Here: Theorem 1, p.20:2.</ref> Efficient [[parallel algorithm]]s are also known for the problem.<ref>{{harvtxt|Crochemore|Rytter|1991}}, {{harvtxt|Apostolico|Breslauer|Galil|1995}}.</ref>
 
The longest palindromic substring problem should not be confused with the different problem of finding the longest palindromic [[subsequence]].
 
==SlowSlower algorithm==
 
This algorithm is slower than Manacher's algorithm, but is a good stepping stone for understanding Manacher's algorithm. It looks at each character as the center of a palindrome and loops to determine the largest palindrome with that center.
 
The loop at the center of the function only works for palindromes where the length is an odd number. The function works for even-length palindromes by modifying the input string. The character '|' is inserted between every character in the inputs string, and at both ends. So the input "book" becomes "|b|o|o|k|". The even-length palindrome "oo" in "book" becomes the odd-length palindrome "|o|o|".
 
<syntaxhighlight lang="C" line>
// C pseudocode.
Longest_Palindrome_SLOW(string S, string R) {
// R == S with a bogus character (eg. '|') inserted
// between each character (including outer boundaries)
 
// The radius of the longest palindrome centered on each place in R
// note: length(R) = length(PalindromeRadii) = 2 × length(S) + 1
 
array PalindromeRadii = [0,...,0]
 
Center = 0
 
while Center < length(R) {
// Determine the longest palindrome starting
// at Center-Radius and going to Center+Radius
Radius = 0
 
while Center-(Radius + 1) >= 0 and
Center+(Radius + 1) < length(R) and
R[Center-(Radius + 1)] = R[Center+(Radius + 1)] {
Radius = Radius + 1
}
 
// Save the radius of the longest palindrome in the array
 
PalindromeRadii[Center] = Radius
 
Center = Center + 1
}
 
// One can show that longest_palindrome_in_S is max(PalindromeRadii).
// if R[i] == '|', PalindromeRadii[i] is even, otherwise you could increase PalindromeRadii[i] by 1,
// which is equivalent to inserting an extra '|' in each border.
// Remember that a palindrome centered in an '|' in R corresponds to an even palindrome in S.
// if R[i] != '|', PalindromeRadii[i] is odd (same argument), and corresponds to an odd palindrome.
// In this case, the length of the palindrome
// centered at that character is also x=PalindromeRadii[i], as we have (x-1)/2 characters on each side,
// plus the extra middle one ((x-1)/2*2+1=x)
longest_palindrome_in_S = max(PalindromeRadii)
 
return longest_palindrome_in_S
}
</syntaxhighlight>
 
Longest_Palindrome_SLOW(string S, string S') {
// S' == S with a bogus character (eg. '|') inserted
// between each character (including outer boundaries)
// The radius of the longest palindrome centered on each place in S'
// note: length(S') = length(PalindromeRadii) = 2 × length(S) + 1
array PalindromeRadii = [0,...,0]
Center = 0
'''while''' Center < length(S') '''{'''
// Determine the longest palindrome starting
// at Center-Radius and going to Center+Radius
Radius = 0
'''while''' Center-(Radius + 1) >= 0 '''and'''
Center+(Radius + 1) < length(S') '''and'''
S'[Center-(Radius + 1)] = S'[Center+(Radius + 1)] '''{'''
Radius = Radius + 1
'''}'''
// Save the radius of the longest palindrome in the array
PalindromeRadii[Center] = Radius
Center = Center + 1
'''}'''
// One can show that longest_palindrome_in_S is max(PalindromeRadii).
// if S'[i] == '|', PalindromeRadii[i] is even, otherwise you could increase PalindromeRadii[i] by 1,
// which is equivalent to inserting an extra '|' in each border.
// Remember that a palindrome centered in an '|' in S' corresponds to an even palindrome in S.
// if S'[i] != '|', PalindromeRaii[i] is odd (same argument), and corresponds to an odd palindrome.
// In this case, the length of the palindrome
// centered at that character is also x=PalindromeRaii[i], as we have (x-1)/2 characters on each side,
// plus the extra middle one ((x-1)/2*2+1=x)
longest_palindrome_in_S = max(PalindromeRadii)
'''return''' longest_palindrome_in_S
}
 
The runtime of this algorithm is <math>O(n^2)</math>. The outer loop runs <math>n</math> times and the inner loop can run up to <math>n/2</math> times.
Line 62 ⟶ 65:
For example, consider the input string "abacaba". By the time it gets to the "c", Manacher's algorithm will have identified the length of every palindrome centered on the letters before the "c". At the "c", it runs a loop to identify the largest palindrome centered on the "c": "abacaba". With that knowledge, everything after the "c" looks like the reflection of everything before the "c". The "a" after the "c" has the same longest palindrome as the "a" before the "c". Similarly, the "b" after the "c" has a longest palindrome that is ''at least'' the length of the longest palindrome centered on the "b" before the "c". There are some special cases to consider, but that trick speeds up the computation dramatically.{{cn|date=March 2022}}
 
<syntaxhighlight lang="C" line>
Longest_Palindrome(string S, string S') {
// C pseudocode.
// The algorithm requires an odd-length string.
Longest_Palindrome(string S, string R) {
// It can be shown that without loss of generality
// R == S with //a anbogus odd-lengthcharacter string(eg. S'|') can be created from Sinserted
// bybetween inserting a distincteach character (e.g.including outer '|'boundaries)
 
// at the start, at the end as well as between all
// The radius of the longest palindrome centered on each place in R
// characters of S:
// note: - odd-length(R) S= getslength(PalindromeRadii) even= number2 of '|', thus× length(S') remains+ odd-length1
 
// - even-length S gets odd number of '|', thus S' becomes odd-length
array PalindromeRadii = [0,...,0]
// A palindrome in S remains a palindrome in S' and
 
// extra characters in S' can be trivially accounted for.
Center = 0
Radius = 0
// The radius of the longest palindrome centered on each place in S'
 
// note: length(S') = length(PalindromeRadii) = 2 × length(S) + 1
while Center < length(R) {
// At the start of the loop, Radius is already set to a lower-bound
array PalindromeRadii = [0,...,0]
// for the longest radius. In the first iteration, Radius is 0, but
// Centerit =can 0be higher on later iterations.
 
Radius = 0
// Determine the longest palindrome starting at Center-Radius and
// whilegoing to Center < length(S') {+Radius
 
// At the start of the loop, Radius is already set to a lower-bound
// for the longest radius. In the first iteration,while Center-(Radius+1) is>= 0, butand
Center+(Radius+1) < length(R) and
// it can be higher on later iterations.
R[Center-(Radius+1)] = R[Center+(Radius+1)] {
// Determine the longest palindrome starting at Center-Radius and= Radius+1
// going to Center+Radius}
 
// Save the radius of whilethe Center-(Radius+1)longest >=palindrome 0in andthe array
PalindromeRadii[Center] = Center+(Radius+1) < length(S') and
 
S'[Center-(Radius+1)] = S'[Center+(Radius+1)] {
// Below, Center is Radius = Radius+1incremented.
// If any precomputed values } can be reused, they are.
// Also, Radius may be set to a value greater than 0
 
// Save the radius of the longest palindrome in the array
OldCenter = PalindromeRadii[Center] = Radius
OldRadius = Radius
Center // Below,= Center is incremented.+1
 
// If any precomputed values can be reused, they are.
// Radius' default value will //be Also0, Radiusif maywe bereach setthe toend aof value greater than 0the
// following loop.
OldCenterRadius = Center0
 
OldRadius = Radius
while Center <= OldCenter + CenterOldRadius = Center+1{
 
// Radius' default value will be 0, if// weBecause reachCenter lies inside the endold ofpalindrome theand every
// character inside a palindrome has a "mirrored" character
// following loop.
// Radiusreflected =across 0its center, we can use the data that was
// precomputed for the Center's mirrored point.
 
while Center <= OldCenter + OldRadius {
MirroredCenter = OldCenter - (Center - OldCenter)
MaxMirroredRadius = OldCenter + OldRadius - Center
// Because Center lies inside the old palindrome and every
 
// character inside a palindrome has a "mirrored" character
if PalindromeRadii[MirroredCenter] < MaxMirroredRadius {
// reflected across its center, we can use the data that was
 
// precomputed for the Center's mirrored point.
// The palindrome centered at MirroredCenter is entirely
// MirroredCentercontained =in OldCenterthe -palindrome (Centercentered -at OldCenter). So,
// MaxMirroredRadiusMirroredCenter =and OldCenterCenter +have OldRadiusthe -same Centersized palindrome
 
PalindromeRadii[Center] if= PalindromeRadii[MirroredCenter] < MaxMirroredRadius {
Center = Center+1
// The palindrome centered at } else if PalindromeRadii[MirroredCenter] > isMaxMirroredRadius entirely{
 
// contained in the palindrome centered at OldCenter So,
// MirroredCenter and Center have the same sized // The palindrome at MirroredCenter extends beyond the
// palindrome at OldCenter The palindrome at Center must
// end at the edge PalindromeRadii[Center]of =the PalindromeRadii[MirroredCenter]OldCenter palindrome. Otherwise,
// the palindrome at OldCenter Centerwould =be Center+1bigger
 
} else if PalindromeRadii[MirroredCenter] > MaxMirroredRadius {
PalindromeRadii[Center] = MaxMirroredRadius
Center = Center+1
// The palindrome at MirroredCenter extends beyond the
} else { // PalindromeRadii[MirroredCenter] = MaxMirroredRadius
// palindrome at OldCenter The palindrome at Center must
 
// end at the edge of the OldCenter palindrome Otherwise,
// Since the palindrome at OldCenterMirroredCenter wouldends beexactly biggerat
// the edge of the palindrome centered at OldCenter, the
// palindrome at Center might PalindromeRadii[Center]be bigger. Set Radius =to MaxMirroredRadiusthe
// minimum size of the Centerpalindrome =at Center+1 so it doesn't
} else { // PalindromeRadii[MirroredCenter] =recheck MaxMirroredRadiusunnecessarily
 
Radius = MaxMirroredRadius
// Since the palindrome at MirroredCenter ends exactly at
// the edge of the palindrome centered at OldCenter, the break
}
// palindrome at Center might be bigger Set Radius to the
}
// minimum size of the palindrome at Center so it doesn't
}
// recheck unnecessarily
 
// A palindrome's size is equal to its radius * 2. However, since our
Radius = MaxMirroredRadius
// variable Radius considers our bogus characters to the side of the
break
// center, the size of its corresponding palindrome is actually 2 *
}
// (Radius / 2), which means a palindrome's size } is equal to its
// corresponding Radius value in PalindromeRadii
}
 
longest_palindrome_in_S = max(PalindromeRadii)
// A palindrome's size is equal to its radius * 2. However, since our
return longest_palindrome_in_S
// variable Radius considers our bogus characters to the side of the
}
// center, the size of its corresponding palindrome is actually 2 *
</syntaxhighlight>
// (Radius / 2), which means a palindrome's size is equal to its
// corresponding Radius value in PalindromeRadii
longest_palindrome_in_S = max(PalindromeRadii)
return longest_palindrome_in_S
}
 
===Special cases===
Line 178 ⟶ 176:
==Notes==
{{reflist}}
 
== See also ==
<!-- New links in alphabetical order please -->
* [[Palindrome tree]]
 
==References==
Line 191 ⟶ 193:
| doi = 10.1016/0304-3975(94)00083-U
| pages = 163–173| url = https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2068&context=cstech
| doi-access = free
}}.
*{{citation
Line 201 ⟶ 204:
| year = 2022
| doi = 10.4230/LIPIcs.CPM.2022.20
| pages = 20:1--201–20:9| urldoi-access = https://doi.org/10.4230/LIPIcs.CPM.2022.20free
}}.
*{{citation
Line 258 ⟶ 261:
| volume = 22
| year = 1975| s2cid = 10615419
| doi-access = free
}}.
*{{Creative Commons text attribution notice|cc=by3|url=https://web.archive.org/web/20180629091030/http://wcipeg.com/wiki/index.php?title=Longest_palindromic_substring Longest palindromic substring}}