Aho–Corasick algorithm: Difference between revisions

Content deleted Content added
Describe dictionary suffix link
Added dictionary suffix arcs to the diagram, changed text appropriately
Line 9:
The graph below is the Aho–Corasick data structure constructed from the specified dictionary, with each row in the table representing a node in the trie, with the column path indicating the (unique) sequence of characters from the root to the node.
 
The data structure has one node for every prefix of every string in the dictionary. So if (bca) is in the dictionary, then there will be nodes for (bca), (bc), (b), and (). There is a black directed "child" arc from each node to a node whose name is found by appending one character. So there is a black arc from (bc) to (bca). There is a blue directed "suffix" arc from each node to the node that is the longest possible strict suffix of it in the graph. For example, for node (caa), its strict suffixes are (aa) and (a) and (). The longest of these that exists in the graph is (a). So there is a blue arc from (caa) to (a). There is a green "dictionary suffix" arc from each node to the next node in the dictionary that can be reached by following blue arcs. For example, there is a green arc from (bca) to (a) because (a) is the first node in the dictionary (i.e. a white node) that is reached when following the blue arcs to (ca) and then on to (a).
 
[[File:A diagram of the Aho-Corasick string search algorithm.svg|thumb|left]]
The "dictionary suffix link" is an additional arc from a node to the first dictionary entry that can be reached from that node by following blue arrows (if any). The graph below doesn't show the dictionary suffix links, but there would be one from (bca) to (a), if those links were shown. The table next to the graph does show the dictionary suffix link.
 
[[File:Aho Corasick Concept.PNG|left]]
 
{| class="wikitable"