Content deleted Content added
Getabyte1432 (talk | contribs) m change "Since the second loop can run at most <math>n</math> times and the value for Radius cannot exceed <math>n/2</math>," to "Since the second inner loop can run at most <math>n</math> times and the value for Radius cannot exceed <math>n/2</math>," |
Change inconsistent highlighted source code blocks (the first one was highlighted, the second one was not) to use syntaxhighlight. Also rename the variable S' to R. |
||
(45 intermediate revisions by 31 users not shown) | |||
Line 1:
{{Short description|Computer science problem}}
In [[computer science]], the '''longest palindromic substring''' or '''longest symmetric factor''' problem is the problem of finding a maximum-length contiguous [[substring]] of a given string that is also a [[palindrome]]. For example, the longest palindromic substring of "bananas" is "anana". The longest palindromic substring is not guaranteed to be unique; for example, in the string "abracadabra", there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, "aca" and "ada". In some applications it may be necessary to return all maximal palindromic substrings (that is, all substrings that are themselves palindromes and cannot be extended to larger palindromic substrings) rather than returning only one substring or returning the maximum length of a palindromic substring.
{{harvtxt|Manacher|1975}} invented
The longest palindromic substring problem should not be confused with the different problem of finding the longest palindromic [[subsequence]].
==
This algorithm is slower than Manacher's algorithm, but is a good stepping stone for understanding Manacher's algorithm. It looks at each character as the center of a palindrome and loops to determine the largest palindrome with that center.
The loop at the center of the function only works for palindromes where the length is an odd number. The function works for even-length palindromes by modifying the input string. The character '|' is inserted between every character in the inputs string, and at both ends. So the input "book" becomes "|b|o|o|k|". The even-length palindrome "oo" in "book" becomes the odd-length palindrome "|o|o|".
<syntaxhighlight lang="C" line>
// C pseudocode.
Longest_Palindrome_SLOW(string S, string R) {
// R == S with a bogus character (eg. '|') inserted
// between each character (including outer boundaries)
// The radius of the longest palindrome centered on each place in R
// note: length(R) = length(PalindromeRadii) = 2 × length(S) + 1
array PalindromeRadii = [0,...,0]
Center = 0
while Center < length(R) {
// Determine the longest palindrome starting
// at Center-Radius and going to Center+Radius
Radius = 0
while Center-(Radius + 1) >= 0 and
Center+(Radius + 1) < length(R) and
R[Center-(Radius + 1)] = R[Center+(Radius + 1)] {
Radius = Radius + 1
}
// Save the radius of the longest palindrome in the array
PalindromeRadii[Center] = Radius
Center = Center + 1
}
// One can show that longest_palindrome_in_S is max(PalindromeRadii).
// if R[i] == '|', PalindromeRadii[i] is even, otherwise you could increase PalindromeRadii[i] by 1,
// which is equivalent to inserting an extra '|' in each border.
// Remember that a palindrome centered in an '|' in R corresponds to an even palindrome in S.
// if R[i] != '|', PalindromeRadii[i] is odd (same argument), and corresponds to an odd palindrome.
// In this case, the length of the palindrome
// centered at that character is also x=PalindromeRadii[i], as we have (x-1)/2 characters on each side,
// plus the extra middle one ((x-1)/2*2+1=x)
longest_palindrome_in_S = max(PalindromeRadii)
return longest_palindrome_in_S
}
</syntaxhighlight>
The runtime of this algorithm is <math>O(n^2)</math>. The outer loop runs <math>n</math> times and the inner loop can run up to <math>n/2</math> times.
==Manacher's algorithm==
Line 41 ⟶ 63:
Below is the pseudocode for Manacher's algorithm. The algorithm is faster than the previous algorithm because it exploits when a palindrome happens inside another palindrome.
For example, consider the input string "abacaba". By the time it gets to the "c", Manacher's algorithm will have identified the length of every palindrome centered on the letters before the "c". At the "c", it runs a loop to identify the largest palindrome centered on the "c": "abacaba". With that knowledge, everything after the "c" looks like the reflection of everything before the "c". The "a" after the "c" has the same longest palindrome as the "a" before the "c". Similarly, the "b" after the "c" has a longest palindrome that is ''at least'' the length of the longest palindrome centered on the "b" before the "c". There are some special cases to consider, but that trick speeds up the computation dramatically.{{cn|date=March
<syntaxhighlight lang="C" line>
// C pseudocode.
Longest_Palindrome(string S, string R) {
// R == S with a bogus character (eg. '|') inserted
// between each character (including outer boundaries)
// The radius of the longest palindrome centered on each place in R
// note: length(R) = length(PalindromeRadii) = 2 × length(S) + 1
array PalindromeRadii = [0,...,0]
Center = 0
Radius = 0
while Center < length(R) {
// At the start of the loop, Radius is already set to a lower-bound
// for the longest radius. In the first iteration, Radius is 0, but
// it can be higher on later iterations.
// Determine the longest palindrome starting at Center-Radius and
// going to Center+Radius
while Center-(Radius+1) >= 0 and
Center+(Radius+1) < length(R) and
R[Center-(Radius+1)] = R[Center+(Radius+1)] {
Radius = Radius+1
}
// Save the radius of the longest palindrome in the array
PalindromeRadii[Center] = Radius
// Below, Center is incremented.
// If any precomputed values can be reused, they are.
// Also, Radius may be set to a value greater than 0
OldCenter = Center
OldRadius = Radius
Center = Center+1
// Radius' default value will be 0, if we reach the end of the
// following loop.
Radius = 0
while Center <= OldCenter + OldRadius {
// Because Center lies inside the old palindrome and every
// character inside a palindrome has a "mirrored" character
// reflected across its center, we can use the data that was
// precomputed for the Center's mirrored point.
MirroredCenter = OldCenter - (Center - OldCenter)
MaxMirroredRadius = OldCenter + OldRadius - Center
if PalindromeRadii[MirroredCenter] < MaxMirroredRadius {
// The palindrome centered at MirroredCenter is entirely
// contained in the palindrome centered at OldCenter. So,
// MirroredCenter and Center have the same sized palindrome
PalindromeRadii[Center] = PalindromeRadii[MirroredCenter]
Center = Center+1
} else if PalindromeRadii[MirroredCenter] > MaxMirroredRadius {
// The palindrome at MirroredCenter extends beyond the
// palindrome at OldCenter The palindrome at Center must
// end at the edge of the OldCenter palindrome. Otherwise,
// the palindrome at OldCenter would be bigger
PalindromeRadii[Center] = MaxMirroredRadius
Center = Center+1
} else { // PalindromeRadii[MirroredCenter] = MaxMirroredRadius
// Since the palindrome at MirroredCenter ends exactly at
// the edge of the palindrome centered at OldCenter, the
// palindrome at Center might be bigger. Set Radius to the
// minimum size of the palindrome at Center so it doesn't
// recheck unnecessarily
Radius = MaxMirroredRadius
break
}
}
}
// A palindrome's size is equal to its radius * 2. However, since our
// variable Radius considers our bogus characters to the side of the
// center, the size of its corresponding palindrome is actually 2 *
// (Radius / 2), which means a palindrome's size is equal to its
// corresponding Radius value in PalindromeRadii
longest_palindrome_in_S = max(PalindromeRadii)
return longest_palindrome_in_S
}
</syntaxhighlight>
===Special cases===
Manacher's algorithm is faster because it reuses precomputed data when a palindrome exists inside another palindrome. There are 3 cases of this. They are represented by the "if / else if / else" statement in the pseudocode.
The first case is when the palindrome at <code>MirroredCenter</code> lies completely inside the "Old" palindrome. In this situation, the palindrome at <code>Center</code> will have the same length as the one at <code>MirroredCenter</code>. For example, if the "Old" palindrome is "abcbpbcba", we can see that the palindrome centered on "c" after the "p" must have the same length as the palindrome centered on the "c" before the "p".
The second case is when the palindrome at <code>MirroredCenter</code> extends outside the "Old" palindrome. That is, it extends "to the left" (or, contains characters with a lower index than any inside the "Old" palindrome). Because the "Old" palindrome is the largest possible palindrome centered on <code>OldCenter</code>, we know the characters before and after it are different. Thus, the palindrome at <code>Center</code> will run exactly up to the border of the "Old" palindrome, because the next character will be different than the one inside the palindrome at <code>MirroredCenter</code>. For example, if the string was "ababc", the "Old" palindrome could be "bab" with the <code>Center</code> being the second "b" and the <code>MirroredCenter</code> being the first "b". Since the palindrome at the
The third and last case is when the palindrome at <code>MirroredCenter</code> extends exactly up to the border of the "Old" palindrome. In this case, we don't know if the character after the "Old" palindrome might make the palindrome at <code>Center</code> longer than the one at <code>MirroredCenter</code>. But we do know that the palindrome at <code>Center</code> is ''at least'' as long as the one at <code>MirroredCenter</code>. In this case, <code>Radius</code> is initialized to the radius of the palindrome at <code>MirroredCenter</code> and the search starts from there. An example string would be "abcbpbcbp" where the "Old" palindrome is "bcbpbcb" and the <code>Center</code> is on the second "c". The <code>MirroredCenter</code> is the first "c" and it has a longest palindrome of "bcb". The longest palindrome at the <code>Center</code> on the second "c" has to be at least that long and, in this case, is longer.
===Runtime===
The algorithm runs in linear time.
==Notes==
{{reflist}}
== See also ==
<!-- New links in alphabetical order please -->
* [[Palindrome tree]]
==References==
Line 127 ⟶ 193:
| doi = 10.1016/0304-3975(94)00083-U
| pages = 163–173| url = https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2068&context=cstech
| doi-access = free
}}.
*{{citation
| last1 = Charalampopoulos | first1 = Panagiotis
| last2 = Pissis | first2 = Solon P.
| last3 = Radoszewski | first3 = Jakub
| title = Longest Palindromic Substring in Sublinear Time
| journal = 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, June 27-29, 2022, Prague, Czech Republic
| volume = 223
| year = 2022
| doi = 10.4230/LIPIcs.CPM.2022.20
| pages = 20:1–20:9| doi-access = free
}}.
*{{citation
Line 138 ⟶ 216:
| title = Usefulness of the Karp–Miller–Rosenberg algorithm in parallel computations on strings and arrays
| volume = 88
| year = 1991
}}.
*{{citation
| last1 = Crochemore | first1 = Maxime
Line 182 ⟶ 261:
| volume = 22
| year = 1975| s2cid = 10615419
| doi-access = free
}}.
*{{Creative Commons text attribution notice|cc=by3|url=https://web.archive.org/web/20180629091030/http://wcipeg.com/wiki/index.php?title=Longest_palindromic_substring Longest palindromic substring}}
==External links==
Line 188 ⟶ 269:
* {{citation|first=Fred|last=Akalin|url=https://www.akalin.com/longest-palindrome-linear-time|title=Finding the longest palindromic substring in linear time|date=2007-11-28|accessdate=2016-10-01}}. An explanation and Python implementation of Manacher's linear-time algorithm.
* {{citation|first=Johan|last=Jeuring|url=http://www.staff.science.uu.nl/~jeuri101/homepage/palindromes/index.html|title=Palindromes|date=2007–2010|accessdate=2011-11-22}}. Haskell implementation of Jeuring's linear-time algorithm.
* {{citation|url=https://github.com/vvikas/palindromes/blob/master/src/LongestPalindromicSubString.java|title=Palindromes}} (deadlink). Java implementation of Manacher's linear-time algorithm.
[[Category:Problems on strings]]
[[Category:Palindromes]]
|