Suffix array: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: doi added to citation with #oabot.
Eliminate duplicate cite. Mores SFNs.
Line 29:
{{harvtxt|Li|Li|Huo|2016}} gave the first in-place <math>\mathcal{O}(n)</math> time suffix array construction algorithm that is optimal both in time and space, where ''in-place'' means that the algorithm only needs <math>\mathcal{O}(1)</math> additional space beyond the input string and the output suffix array.
 
[[Enhanced suffix arrays]] (ESAs) are suffix arrays with additional tables that reproduce the full functionality of suffix trees preserving the same time and memory complexity.<ref>{{Cite journalsfn|last1=Abouelhoda|first1=Mohamed Ibrahim|last2=Kurtz|first2=Stefan|last3=Ohlebusch|first3=Enno|date=March 2004|title=Replacing suffix trees with enhanced suffix arrays|journal=Journal of Discrete Algorithms|volume=2|issue=1|pages=53–86|doi=10.1016/s1570-8667(03)00065-0|issn=1570-8667|doi-access=free}}</ref>
The suffix array for a subset of all suffixes of a string is called [[sparse suffix array]].<ref>{{Citationsfn|last1=Kärkkäinen|first1=JuhaUkkonen|title=Sparse suffix trees|date=1996|work=Lecture Notes in Computer Science|pages=219–230|publisher=Springer Berlin Heidelberg|doi=10.1007/3-540-61332-3_155|isbn=9783540613329|last2=Ukkonen|first2=Esko}}</ref> Multiple probabilistic algorithms have been developed to minimize the additional memory usage including an optimal time and memory algorithm.<ref>{{Cite journalsfn|last1=Gawrychowski|first1=Paweł|last2=Kociumaka|first2=Tomasz|date=January 2017|title=Sparse Suffix Tree Construction in Optimal Time and Space|journal=Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms|___location=Philadelphia, PA|publisher=Society for Industrial and Applied Mathematics|pages=425–439|isbn=9781611974782|arxiv=1608.00865|doi=10.1137/1.9781611974782.27|s2cid=6608776}}</ref>
 
== Definition ==
Line 182:
== Generalized Suffix Array ==
 
The concept of a suffix array can be extended to more than one string. This is called a generalized suffix array (or GSA), a suffix array that contains all suffixes for a set of strings (for example, <math>S=S_1,S_2,S_3,...,S_k</math> and is lexicographically sorted with all suffixes of each string.<ref>{{Citationsfn|last1=Shi|first1=Fei|title=Suffix arrays for multiple strings: A method for on-line multiple string searches.|date=1996|series=Lecture Notes in Computer Science|publisher=Springer Berlin Heidelberg|volume=1179|pages=11–22|doi=10.1007/BFb0027775|isbn=978-3-540-62031-0}}</ref>
 
== Applications ==
Line 219:
return (s, r)
</syntaxhighlight>
Finding the substring pattern <math>P</math> of length <math>m</math> in the string <math>S</math> of length <math>n</math> takes <math>\mathcal{O}(m \log n)</math> time, given that a single suffix comparison needs to compare <math>m</math> characters. {{harvtxt|Manber|Myers|1990}} describe how this bound can be improved to <math>\mathcal{O}(m + \log n)</math> time using [[LCP array|LCP]] information. The idea is that a pattern comparison does not need to re-compare certain characters, when it is already known that these are part of the longest common prefix of the pattern and the current search interval. {{harvtxt|Abouelhoda|Kurtz|Ohlebusch|2004}} improve the bound even further and achieve a search time of <math>\mathcal{O}(m)</math> as known from [[suffix tree]]s.
 
Suffix sorting algorithms can be used to compute the [[Burrows–Wheeler transform|Burrows–Wheeler transform (BWT)]]. The [[Burrows–Wheeler transform|BWT]] requires sorting of all cyclic permutations of a string. If this string ends in a special end-of-string character that is lexicographically smaller than all other character (i.e., $), then the order of the sorted rotated [[Burrows–Wheeler transform|BWT]] matrix corresponds to the order of suffixes in a suffix array. The [[Burrows–Wheeler transform|BWT]] can therefore be computed in linear time by first constructing a suffix array of the text and then deducing the [[Burrows–Wheeler transform|BWT]] string: <math>BWT[i] = S[A[i]-1]</math>.
Line 253:
| author2-link = Gene_Myers
}}
* {{Cite journal|last1=Gawrychowski|first1=Paweł|last2=Kociumaka|first2=Tomasz|date=January 2017 |title=Sparse Suffix Tree Construction in Optimal Time and Space|journal=Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms|___location=Philadelphia, PA|publisher=Society for Industrial and Applied Mathematics|pages=425–439|isbn=9781611974782|arxiv=1608.00865|doi=10.1137/1.9781611974782.27 |s2cid=6608776}}
* {{cite conference |last1=Li|first1=Zhize|last2=Li|first2=Jian|last3=Huo|first3=Hongwei
|doi = 10.1007/978-3-030-00479-8_22
|title = Optimal In-Place Suffix Sorting
Line 264 ⟶ 265:
|volume = 11147
|pages = 268–284
}}
|last1=Li|first1=Zhize}}
* {{cite conference|last1=Shi|first1=Fei|title=Suffix arrays for multiple strings: A method for on-line multiple string searches|date=1996|series=Lecture Notes in Computer Science|volume=1179 |publisher=Springer Berlin Heidelberg |pages=11–22|doi=10.1007/BFb0027775|isbn=978-3-540-62031-0}}
* {{cite conference
| doi = 10.1007/3-540-45784-4_35|title=The Enhanced Suffix Array and Its Applications to Genome Analysis |conference=Algorithms in Bioinformatics|series=[[Lecture Notes in Computer Science]]|year=2002 |last1=Abouelhoda|first1=Mohamed Ibrahim|last2=Kurtz|first2=Stefan|last3=Ohlebusch|first3=Enno|isbn=978-3-540-44211-0|volume=2452|pages=449}}
* {{cite journal
| doi=10.1016/S1570-8667(03)00065-0
| title=Replacing suffix trees with enhanced suffix arrays
| yeardate=March 2004
| last1=Abouelhoda | first1=Mohamed Ibrahim
| last2=Kurtz | first2=Stefan
Line 276 ⟶ 280:
| issue=1
| pages=53–86| doi-access=free
| issn=1570-8667
}}
* {{cite journal
Line 298 ⟶ 303:
| hdl-access = free
}}
* {{cite conference
| doi = 10.1007/3-540-45784-4_35|title=The Enhanced Suffix Array and Its Applications to Genome Analysis|conference=Algorithms in Bioinformatics|series=[[Lecture Notes in Computer Science]]|year=2002|last1=Abouelhoda|first1=Mohamed Ibrahim|last2=Kurtz|first2=Stefan|last3=Ohlebusch|first3=Enno|isbn=978-3-540-44211-0|volume=2452|pages=449
}}
* {{cite journal
| doi = 10.1145/1242471.1242472|title=A taxonomy of suffix array construction algorithms|year=2007|last1=Puglisi|first1=Simon J.|last2=Smyth|first2=W. F.|last3=Turpin|first3=Andrew H.|journal=ACM Computing Surveys|volume=39|issue=2|pages=4
Line 317 ⟶ 319:
}}
* {{cite conference
| doi = 10.1145/800152.804905|title=Rapid identification of repeated patterns in strings, trees and arrays |conference=Proceedings of the fourth annual ACM symposium on Theory of computing - STOC '72|year=1972 |last1=Karp|first1=Richard M.|last2=Miller|first2=Raymond E.|last3=Rosenberg|first3=Arnold L.|pages=125
}}
* {{cite conference
| doi = 10.1109/SFCS.1997.646102|title=Optimal suffix tree construction with large alphabets |conference=Proceedings 38th Annual Symposium on Foundations of Computer Science|year=1997 |last1=Farach|first1=M.|isbn=0-8186-8197-7|pages=137
}}
* {{cite conference|last1=Kärkkäinen|first1=Juha|last2=Ukkonen|first2=Esko|title=Sparse suffix trees |date=1996|series=Lecture Notes in Computer Science |volume=1090 |pages=219–230 |publisher=Springer Berlin Heidelberg |doi=10.1007/3-540-61332-3_155 |isbn=978-3-540-61332-9}}
* {{cite conference
| doi = 10.1007/3-540-45061-0_73|title=Simple Linear Work Suffix Array Construction|conference=Automata, Languages and Programming|series=Lecture Notes in Computer Science|year=2003 |last1=Kärkkäinen|first1=Juha|last2=Sanders|first2=Peter|author2-link=Peter Sanders (computer scientist)|isbn=978-3-540-40493-4|volume=2719|pages=943
}}
* {{cite journal
| doi = 10.1145/1227161.1402296|title=Better external memory suffix array construction|year=2008 |last1=Dementiev|first1=Roman|last2=Kärkkäinen|first2=Juha|last3=Mehnert|first3=Jens|last4=Sanders |first4=Peter|author4-link=Peter Sanders (computer scientist)|journal=Journal of Experimental Algorithmics |volume=12|pages=1–24| s2cid=12296500| url=https://publikationen.bibliothek.kit.edu/1000009446}}
| s2cid = 12296500| url = https://publikationen.bibliothek.kit.edu/1000009446}}
* {{cite journal
| doi = 10.1016/j.parco.2007.06.004|title=Scalable parallel suffix array construction|year=2007 |last1=Kulla|first1=Fabian|last2=Sanders|first2=Peter|author2-link=Peter Sanders (computer scientist) |journal=Parallel Computing |volume=33|issue=9|pages=605
}}