Aho–Corasick algorithm: Difference between revisions

Content deleted Content added
More explanation of construction
Describe dictionary suffix link
Line 10:
 
The data structure has one node for every prefix of every string in the dictionary. So if (bca) is in the dictionary, then there will be nodes for (bca), (bc), (b), and (). There is a black directed "child" arc from each node to a node whose name is found by appending one character. So there is a black arc from (bc) to (bca). There is a blue directed "suffix" arc from each node to the node that is the longest possible strict suffix of it in the graph. For example, for node (caa), its strict suffixes are (aa) and (a) and (). The longest of these that exists in the graph is (a). So there is a blue arc from (caa) to (a).
 
The "dictionary suffix link" is an additional arc from a node to the first dictionary entry that can be reached from that node by following blue arrows (if any). The graph below doesn't show the dictionary suffix links, but there would be one from (bca) to (a), if those links were shown. The table next to the graph does show the dictionary suffix link.
 
[[File:Aho Corasick Concept.PNG|left]]
Line 47 ⟶ 49:
ending in the root node if nothing's seen before.
 
When the algorithm reaches a node, isit reachedoutputs that is inall the dictionary, then the algorithm will
outputentries that nodeend asat athe "found"current dictionarycharacter word.position in Atthe thatinput point,text. it shouldThis alsois outputdone
by printing every node reached by following the dictionary suffix links, starting
all the other dictionary words that are found along the path from that node
tofrom thethat rootnode, followingand thecontinuing blueuntil arrows.it reaches Thesea arenode allwith theno dictionary wordssuffix link. that
In addition, the node itself is printed, if it is a dictionary entry.
end at the current point in the input text.
 
Execution on input string <code>abccab</code> yields the following steps: