Revision as of 08:12, 12 December 2024 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,660 edits notation unneeded ← Previous edit		Revision as of 08:14, 12 December 2024 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,660 edits ce Next edit →
Line 3: In [[computer science]], a '''substring index''' is a [[data structure]] which gives [[substring]] search in a text or text collection in [[sublinear]] time. Once constructed from a document or set of documents, a substring index can be used to locate all occurrences of a pattern in time linear or near-linear in the pattern size, with no dependence or only logarithmic dependence on the document size. These data structures typically treat their text and pattern as [[string (computer science)\|strings]] over a fixed alphabet, and search for locations where the pattern occurs as a substring of the text. The symbols of the alphabet may ~~consist of~~be characters (for instance in [[Unicode]]) but in practical applications for [[text retrieval]] it may be preferable to treat the ([[Stemming\|stemmed]]) words of a document as the ~~elements~~symbols of ~~the~~its alphabet, because doing this reduces the lengths of both the text and pattern as measured in letters of their alphabet. The phrase '''full-text index''' is often used for substring indexes. But this is ambiguous, as it is also used for regular word indexes such as [[inverted file]]s and [[document retrieval]]. See [[full text search]]. Specific data structures that can be used as substring indexes include: * The [[suffix tree]], a [[radix tree]] of the suffixes of the string, allowing substring search to be performed ~~character~~symbol-by-~~character~~symbol<ref name=gv/> * The [[suffix array]], a sorted array of the starting positions of suffixes of the string, allowing substring search to be performed by [[binary search]]<ref name=gv/> * The [[compressed suffix array]], a data structure that combines [[data compression]] with the suffix array, allowing the structure to be stored in space sublinear in the text length<ref name=gv>{{citation

Substring index: Difference between revisions