Knuth–Morris–Pratt algorithm: Difference between revisions

Content deleted Content added
Efficiency of the table-building algorithm: The complexity analysis doesn't consider the inner while loop, which makes the analysis useless. Removed until this problem is resolved.
Undid revision 965275578 by 119.18.2.71 (talk)
Line 185:
 
=="Partial match" table (also known as "failure function")==
The goal of the table is to allow the algorithm not to match any character of <code>S</code> more than once. The key observation about the nature of a linear search that allows this to happen is that in having checked some segment of the main string against an ''initial segment'' of the pattern, we know exactly at which places a new potential match which could continue to the current position could begin prior to the current position. In other words, we "pre-search" the pattern itself and compile a list of all possible fallback positions that bypass a maximum of hopeless characters while not sacrificing any potential matches in doing so.
 
We want to be able to look up, for each position in <code>W</code>, the length of the longest possible initial segment of <code>W</code> leading up to (but not including) that position, other than the full segment starting at <code>W[0]</code> that just failed to match; this is how far we have to backtrack in finding the next match. Hence <code>T[i]</code> is exactly the length of the longest possible ''proper'' initial segment of <code>W</code> which is also a segment of the substring ending at <code>W[i - 1]</code>. We use the convention that the empty string has length 0. Since a mismatch at the very start of the pattern is a special case (there is no possibility of backtracking), we set <code>T[0] = -1</code>, as discussed [[#Description of pseudocode for the table-building algorithm|below]].