this chapter looks like a good general reference |
source word suffix tree |
Line 14:
| year = 2012}}</ref>
These data structures typically treat their text and pattern as [[string (computer science)|strings]] over a fixed alphabet, and search for locations where the pattern occurs as a substring of the text. The symbols of the alphabet may be characters (for instance, in [[Unicode]]), but in practical applications for [[text retrieval]] it may be preferable to treat the ([[Stemming|stemmed]]) words of a document as the symbols of its alphabet, because this reduces the lengths of both the text and the pattern, measured in symbols of the alphabet.<ref>{{citation
| last = Risvik | first = Knut Magne
| editor-last = Farach-Colton | editor-first = Martin | editor-link = Martin Farach-Colton
| contribution = Approximate word sequence matching over sparse suffix trees
| doi = 10.1007/BFB0030781
| pages = 65–79
| publisher = Springer
| series = Lecture Notes in Computer Science
| title = Combinatorial Pattern Matching, 9th Annual Symposium, CPM 98, Piscataway, New Jersey, USA, July 20–22, 1998, Proceedings
| volume = 1448
| year = 1998}}</ref>
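
The effect of the choice of alphabet can be illustrated with a minimal Python sketch. It is not a substring index: it uses a naive linear scan, and the names <code>find_occurrences</code> and <code>to_words</code> are hypothetical helpers introduced only for this illustration. Stemming is omitted; words are merely lower-cased and split on whitespace, which is an assumption of the sketch rather than a method taken from the cited source.

```python
def find_occurrences(text, pattern):
    """Return every start position at which `pattern` occurs as a
    contiguous run of symbols in `text` (naive scan, not an index)."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def to_words(s):
    """Hypothetical tokenizer: lower-case and split on whitespace.
    A real system would typically also stem each word."""
    return tuple(s.lower().split())


document = "the quick brown fox jumps over the lazy dog"
query = "the lazy"

# Alphabet = characters: positions are character offsets.
char_hits = find_occurrences(document, query)

# Alphabet = words: positions are word offsets.
word_text = to_words(document)
word_pattern = to_words(query)
word_hits = find_occurrences(word_text, word_pattern)

print(len(document), len(query), char_hits)
print(len(word_text), len(word_pattern), word_hits)
```

Over the character alphabet the text has 43 symbols and the pattern 8; over the word alphabet the same text has 9 symbols and the pattern 2, and the single match is reported at a word offset instead of a character offset.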
The phrase '''full-text index''' is often used for substring indexes, but it is ambiguous: it is also used for regular word indexes such as [[inverted file]]s, and in [[document retrieval]]. See [[full text search]].