Suffix array: Difference between revisions

Content deleted Content added
Line 3:
==Details==
 
Consider the string
Consider the string "abracadabra", of length 11. It has eleven suffixes: "abracadabra", "bracadabra", "racadabra", and so on down to "a". Sorted into lexicographical order, these suffixes are
 
{| class="wikitable" border="1"
a
|-
abra
! 1
abracadabra
! 2
acadabra
! 3
adabra
! 4
bra
! 5
bracadabra
! 6
cadabra
! 7
dabra
! 8
ra
! 9
racadabra
! 10
! 11
! 12
|-
| a
| b
| r
| a
| c
| a
| d
| a
| b
| r
| a
| $
|}
 
of length 12, that ends with a sentinel letter $, appearing only once and less than any other letter in the string.
If the original string is available, each suffix can be completely specified by the index of its first character. The suffix array is the array of these indices in lexicographical order. For the string "abracadabra", using [[1-based array|one-based]] indexing, the suffix array is {11,8,1,4,6,9,2,5,7,10,3}, because the suffix "a" begins at the 11th character, "abra" begins at the 8th character, and so forth.
 
Consider the string "abracadabra", of length 11. It has eleventwelve suffixes: "abracadabra$", "bracadabra$", "racadabra$", and so on down to "a$". Sortedand into"$" lexicographicalthat order,can these suffixesbe aresorted
One may also treat the string "abracadabra" as having a twelfth suffix, the empty string (with index 12). But since the empty string will always sort lexicographically before every other suffix, we lose no information by omitting it.
into lexicographical order to obtain:
 
{| class="wikitable" border="1"
|-
! index
! sorted suffix
|-
| 12
| $
|-
| 11
| a$
|-
| 8
| abra$
|-
| 1
| abracadabra$
|-
| 4
| acadabra$
|-
| 6
| adabra$
|-
| 9
| bra$
|-
| 2
| bracadabra$
|-
| 5
| cadabra$
|-
| 7
| dabra$
|-
| 10
| ra$
|-
| 3
| racadabra$
|}
 
 
If the original string is available, each suffix can be completely specified by the index of its first character. The suffix array is the array of thesethe indices of suffixes sorted in lexicographical order. For the string "abracadabra$", using [[1-based array|one-based]] indexing, the suffix array is {12,11,8,1,4,6,9,2,5,7,10,3}, because the suffix "a$" begins at the 11th character, "abra$" begins at the 8th character, and so forth.
 
==Algorithms==