Input enhancement (computer science): Difference between revisions

Content deleted Content added
m Replace magic links with templates per local RfC and MediaWiki RfC
m WP:GenFixes on, typo(s) fixed: … → ... (3), doesn’t → doesn't (4), Horspool’s → Horspool's (7)
Line 46:
 
The [[brute-force search|brute-force]] algorithm for this problem would perform as follows:
When presented with a string of ''n'' characters, often called the key or pattern, the string would be compared to every single character of a longer string ''m'', often called the text. If a matched character occurs, it checks the second character of the key to see if it matches. If it does, the next character is checked and so on until the string matches or the subsequent character doesn’tdoesn't match and the entire key shifts a single character. This continues until the key is found or until the text is exhausted.
 
This algorithm is extremely inefficient. The maximum number of check trials would be ''m''-''n''+1 trials, making the big-O efficiency at worst case O(''mn''). On average case, the maximum number of check trials would never be reached and only a few would be executed, resulting in an average time efficiency of O(''m''+''n'').
Line 53:
 
===Horspool's Algorithm===
As a demonstration of input enhancement in string matching, one should examine a simplified version of the Boyer-Moore algorithm, [[Boyer–Moore–Horspool algorithm|Horspool’sHorspool's]] algorithm. The algorithm starts at the ''n''th character of the text ''m'' and compares the character. Let’sLet's call this character ''x''. There are 4 possible cases of what can happen next.
 
'''Case 1''':
Line 62:
 
'''Case 3''':
The third possible case is that the character ''x'' matches with the last character in the key but the other characters don’tdon't fully match the key and ''x'' doesn’tdoesn't occur again in the key. If this occurs, the entire key can be shifted the length of the key.
 
'''Case 4''':
The fourth and last possible case is that character ''x'' matches the key but the other characters don’tdon't fully match the key and ''x'' does occur again in the key. If this occurs, the key is shifted to align the rightmost occurrence if the character ''x''.
 
This may seem like it is not more efficient than the brute-force algorithm since it has to check all of the characters on every check. However, this is not the case. Horspool’sHorspool's algorithm utilizes a shift table to store the number of characters the algorithm should shift if it runs into a specific character. The input is precomputed into a table with every possible character that can be encountered in the text. The shift size is computed with two options: one, if the character is in not in the key, then the shift size is ''n'', the length of the key; or two, if the character appears in the key, then its shift value is the distance from the rightmost occurrence of the character in the first ''n''-1 characters in the key. The algorithm for the shift table generator is given the key and an alphabet of possible characters that could appear in the string (K[0...''n''-1]) as input and returns the shift table (T[0...''s''-1]). Pseudocode for the shift table generator and an example of the shift table for the string ‘POTATO’ is displayed below:
 
'''algorithm''' shiftTableGenerator(K[0...''n''-1])
Line 79:
|+ Shift Table for 'POTATO'
! character ''x''
| A || B || C || ... || O || P || ... || T || ... || Z || _
|-
! shift value
Line 85:
|}
 
After the shift table is constructed in the input enhancement stage, the algorithm lines up the key and starts executing. The algorithm executes until a matching [[substring]] of text ''m'' is found or the key overlaps the last characters of text ''m''. If the algorithm encounters a pair of characters that do not match, it accesses the table for the character’scharacter's shift value and shifts accordingly. Horspool’sHorspool's algorithm takes the key (K[0...''n''-1]) and the text (M[0...''m''-1]) and outputs either the index of the matching substring or the string “Key not found” depending on the result. Pseudocode for Horspool’sHorspool's algorithm is presented below:
 
'''algorithm''' HorspoolsAlgorithm(K[0...''n''-1]), M[0...''m''-1])
Line 100:
'''return''' “Key not found”
 
Although it may not be evident, the worst case runtime efficiency of this algorithm is O(''mn''). Fortunately, on texts that are random, the runtime efficiency is linear, O(''n/m''). This places Horspool’sHorspool's algorithm, which utilizes input enhancement, in a much faster class than the brute-force algorithm for this problem.
 
== Related Concepts ==