Revision as of 13:50, 28 September 2006 edit 222.146.69.70 (talk) +ja ← Previous edit		Revision as of 14:22, 4 October 2006 edit undo 195.176.176.226 (talk) →Applications: explain o(m+log n) algorithm Next edit →
Line 27: ==Applications== The suffix array of a string can be used as an [[index (information technology)\|index]] to quickly locate every occurrence of a substring within the string. Finding every occurrence of the substring is equivalent to finding every suffix that begins with the substring. Thanks to the lexicographical ordering, these suffixes will be grouped together in the suffix array, and can be found efficiently with a [[binary search]]. If implemented straightforwardly, this binary search takes <math>O(m \log n)</math> time, where <math>m</math> is the length of the substring. To avoid redoing comparisons, ~~extra~~it ~~data~~is ~~structures~~possible ~~giving~~to ~~information~~perform ~~about~~up ~~the longest common prefixes (LCPs) of suffixes are constructed, giving~~to <math>~~O(m + \log n)~~2m</math> ~~search~~binary searches on ~~time.~~progressively narrower ranges, obtaining <math>O(m + \log n)</math> worst-case time. Each pair of binary searches restricts the range to suffixes that match one more character of the requested substring. When searching for ''abracad'', for example, the following steps would be followed: * Find the first suffix starting with ''a'' (which is ''a'' itself) * Find the last suffix starting with ''a'' (''adabra'') * Find the first suffix lying between ''a'' and ''adabra'' whose second character is ''b'' (''abra'') * Find the last suffix lying between ''a'' and ''adabra'' whose second character is ''b'' (''abracadabra'') * Find the first suffix lying between ''abra'' and ''abracadabra'' whose third character is ''r'' * Find the first suffix lying between ''abra'' and ''abracadabra'' whose third character is ''r'' * Keep doing the same until the range restricts to a single suffix (in this case at ''abrac'') * Check the remaining characters of the suffix (''adabra'') match the remaining characters of the requested string (''ad'') Suffix sorting algorithms can be used to perform the [[Burrows-Wheeler transform]] (BWT). Technically the BWT requires sorting [[cyclic permutation]]s of a string, not suffixes. We can fix this by appending to the string a special end-of-string character which sorts lexicographically before every other character. Sorting cyclic permutations is then equivalent to sorting suffixes.

Suffix array: Difference between revisions