Suffix array: Difference between revisions

Content deleted Content added
+ja
Applications: explain o(m+log n) algorithm
Line 27:
==Applications==
 
The suffix array of a string can be used as an [[index (information technology)|index]] to quickly locate every occurrence of a substring within the string. Finding every occurrence of the substring is equivalent to finding every suffix that begins with the substring. Thanks to the lexicographical ordering, these suffixes will be grouped together in the suffix array, and can be found efficiently with a [[binary search]]. If implemented straightforwardly, this binary search takes <math>O(m \log n)</math> time, where <math>m</math> is the length of the substring.

To avoid redoing comparisons, extrait datais structurespossible givingto informationperform aboutup the longest common prefixes (LCPs) of suffixes are constructed, givingto <math>O(m + \log n)2m</math> searchbinary searches on time.progressively
narrower ranges, obtaining <math>O(m + \log n)</math> worst-case time. Each pair of binary searches restricts the range to suffixes that match one more character of the requested substring. When searching for ''abracad'', for example, the following steps would be followed:
* Find the first suffix starting with ''a'' (which is ''a'' itself)
* Find the last suffix starting with ''a'' (''adabra'')
* Find the first suffix lying between ''a'' and ''adabra'' whose second character is ''b'' (''abra'')
* Find the last suffix lying between ''a'' and ''adabra'' whose second character is ''b'' (''abracadabra'')
* Find the first suffix lying between ''abra'' and ''abracadabra'' whose third character is ''r''
* Find the first suffix lying between ''abra'' and ''abracadabra'' whose third character is ''r''
* Keep doing the same until the range restricts to a single suffix (in this case at ''abrac'')
* Check the remaining characters of the suffix (''adabra'') match the remaining characters of the requested string (''ad'')
 
Suffix sorting algorithms can be used to perform the [[Burrows-Wheeler transform]] (BWT). Technically the BWT requires sorting [[cyclic permutation]]s of a string, not suffixes. We can fix this by appending to the string a special end-of-string character which sorts lexicographically before every other character. Sorting cyclic permutations is then equivalent to sorting suffixes.