Aho–Corasick algorithm: Difference between revisions

Content deleted Content added
No edit summary
Undid revision 512204188 by 195.235.15.243 (talk)
Line 1:
In [[computer science]], the '''Aho–Corasick string matching algorithm''' is a [[string searching algorithm]] invented by [[Alfred V. Aho]] and [[Margaret J. Corasick]]. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all patterns simultaneously. The [[Computational complexity theory|complexity]] of the algorithm is linear in the length of the patterns plus the length of the searched text plus the number of output matches. Note that because all matches are found, there can be a quadratic number of matches if every substring matches (e.g. dictionary = <code>a</code>, <code>aa</code>, <code>aaa</code>, <code>aaaa</code> and input string is <code>aaaa</code>).
 
Informally, the algorithm constructs a [[finite state machine]] that resembles a [[trie]] with additional links between the various internal nodes. These extra internal links allow fast transitions between failed pattern matches (e.g. a search for <code>cat</code> in a trie that does not contain <code>cat</code>, but contains <code>cart</code>, and thus would fail at the node prefixed by <code>ca</code>), to other branches of the trie that share a common prefix (e.g., in the previous case, a branch for <code>attribute</code> might be the best lateral transition). This allows the automaton to transition between pattern matches without the need for backtracking.