String-searching algorithm: Difference between revisions

Content deleted Content added
Line 88:
[[Knuth–Morris–Pratt algorithm|Knuth–Morris–Pratt]] computes a [[deterministic finite automaton|DFA]] that recognizes inputs with the string to search for as a suffix, [[Boyer–Moore string search algorithm|Boyer–Moore]] starts searching from the end of the needle, so it can usually jump ahead a whole needle-length at each step. Baeza–Yates keeps track of whether the previous ''j'' characters were a prefix of the search string, and is therefore adaptable to [[fuzzy string searching]]. The [[bitap algorithm]] is an application of Baeza–Yates' approach. Alpha Skip Search builds logM sized keys (log base is alphabet size), raising the effective alphabet size to M, for small alphabets or long patterns.
 
On average, Alpha Skip Search is the fastest of these algorithms on small alphabets. The Alpha Skip Search algorithm builds a start position lookup table with logM sized keys of the pattern, in time MlogM, with log base alphabet size. With keys of size logM, the effective alphabet size is M, even for small alphabets. The main search then skips through the text by size M-logM+1, trying to match keys of size logM in the start position lookup table. When a key is found at text position k, a naive pattern match is tried at text position k - key start position. The expected run time is O( (N/M)logM ) even for alphabet size two, and so is much faster than Boyer-Moore on average, for small alphabets. See Combinatorial Pattern Matching 1998 for a more thorough discussion.
 
=== Index methods ===