Revision as of 08:24, 12 December 2024 by David Eppstein (edit summary: this chapter looks like a good general reference)
Revision as of 08:33, 12 December 2024 by David Eppstein (edit summary: source word suffix tree)
Line 14: | year = 2012}}</ref> These data structures typically treat their text and pattern as [[string (computer science)|strings]] over a fixed alphabet, and search for locations where the pattern occurs as a substring of the text. The symbols of the alphabet may be characters (for instance in [[Unicode]]), but in practical applications for [[text retrieval]] it may be preferable to treat the ([[Stemming|stemmed]]) words of a document as the symbols of its alphabet, because this reduces the lengths of both the text and the pattern, as measured in symbols of their alphabet.<ref>{{citation | last = Risvik | first = Knut Magne | editor-last = Farach-Colton | editor-first = Martin | editor-link = Martin Farach-Colton | contribution = Approximate word sequence matching over sparse suffix trees | doi = 10.1007/BFB0030781 | pages = 65–79 | publisher = Springer | series = Lecture Notes in Computer Science | title = Combinatorial Pattern Matching, 9th Annual Symposium, CPM 98, Piscataway, New Jersey, USA, July 20–22, 1998, Proceedings | volume = 1448 | year = 1998}}</ref> The phrase '''full-text index''' is often used for substring indexes, but it is ambiguous, because it is also used for regular word indexes such as [[inverted file]]s and [[document retrieval]]; see [[full text search]].
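The effect of the choice of alphabet can be illustrated with a minimal sketch. It is not taken from the cited sources: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the index is a naively built suffix array rather than a suffix tree, and stemming is omitted, so the word symbols are simply whitespace-separated tokens. The same code indexes a text once as a string of characters and once as a tuple of words.

```python
def build_suffix_array(symbols):
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting slices; fine for illustration only.
    """
    return sorted(range(len(symbols)), key=lambda i: symbols[i:])


def find_occurrences(symbols, suffix_array, pattern):
    """Return the sorted positions where `pattern` occurs contiguously.

    `symbols` and `pattern` must be the same sequence type (both str,
    or both tuples of words) so that slices compare with the pattern.
    """
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return symbols[start:start + m]

    # First rank whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First rank whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


text = "the cat sat on the mat"

# Character alphabet: every character is a symbol.
char_symbols = text
char_index = build_suffix_array(char_symbols)

# Word alphabet (assumption: no stemming, tokens split on whitespace).
word_symbols = tuple(text.split())
word_index = build_suffix_array(word_symbols)

print(len(char_symbols), len(word_symbols))
print(find_occurrences(char_symbols, char_index, "the"))
print(find_occurrences(word_symbols, word_index, ("the", "mat")))
```

In this sketch the word-level text has 6 symbols instead of 22, and positions returned by the word-level search are word offsets rather than character offsets. A word-level index of this kind cannot find a pattern that starts or ends inside a word.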

Substring index: Difference between revisions