Suffix array: Difference between revisions

Content deleted Content added
m removed unnecessary word
Applications: Clarify search code
Line 186:
<syntaxhighlight lang="python">
n = len(S)
def search(P: str) -> Tuple[int, int]:
"""
Return indices (s, r) such that the interval A[s:r] (including the end
index) represents all suffixes of S that start with the pattern P.
"""
# Find starting position of interval
l = 0
r = n + 1
while l < r:
mid = (l + r) // 2 # division rounding down to nearest integer
# suffixAt(A[i]) is the ith smallest suffix
if P > suffixAt(A[mid]):
l = mid + 1
Line 196 ⟶ 202:
r = mid
s = l
# Find ending position of interval
r = n + 1
while l < r:
mid = (l + r) // 2
if P not a prefix of suffixAt(A[mid]).startswith(P):
r = mid
else:
l = mid + 1
else:
r = mid
return (s, r)
</syntaxhighlight>
Line 209 ⟶ 217:
Suffix sorting algorithms can be used to compute the [[Burrows–Wheeler transform|Burrows–Wheeler transform (BWT)]]. The [[Burrows–Wheeler transform|BWT]] requires sorting of all cyclic permutations of a string. If this string ends in a special end-of-string character that is lexicographically smaller than all other character (i.e., $), then the order of the sorted rotated [[Burrows–Wheeler transform|BWT]] matrix corresponds to the order of suffixes in a suffix array. The [[Burrows–Wheeler transform|BWT]] can therefore be computed in linear time by first constructing a suffix array of the text and then deducing the [[Burrows–Wheeler transform|BWT]] string: <math>BWT[i] = S[A[i]-1]</math>.
 
Suffix arrays can also be used to look up substrings in [[Exampleexample-Basedbased Machinemachine Translationtranslation]], demanding much less storage than a full [[phrase table]] as used in [[Statistical machine translation]].
 
Many additional applications of the suffix array require the [[LCP array]]. Some of these are detailed in the [[LCP array#Applications|application section]] of the latter.