Revision as of 08:11, 12 December 2024 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,633 edits +source for FM-index ← Previous edit		Revision as of 08:12, 12 December 2024 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,633 edits notation unneeded Next edit →
Line 1: {{Short description\|Data structure}} {{more references\|date=December 2021}} In [[computer science]], a '''substring index''' is a [[data structure]] which gives [[substring]] search in a text or text collection in [[sublinear]] time. IfOnce ~~you~~constructed ~~have~~from a document ~~<math>S</math> of length <math>n</math>,~~ or a set of documents ~~<math>D=\{S^1,S^2, \dots, S^d\}</math>~~, a substring index can be used to locate all occurrences of a pattern in time linear or near-linear in the pattern size, with no dependence or only logarithmic dependence on the document size. These data structures typically treat their text and pattern as [[string (computer science)\|strings]] over a fixed alphabet, and search for locations where the pattern occurs as a substring of the text. The alphabet may consist of characters (for instance in [[Unicode]]) but in practical applications for [[text retrieval]] it may be preferable to treat the ([[Stemming\|stemmed]]) words of a document as the elements of the alphabet, because doing this reduces the lengths of both the text and pattern as measured in letters of their alphabet.

Substring index: Difference between revisions