Input enhancement (computer science): Difference between revisions

Content deleted Content added
m WP:GenFixes on, typo(s) fixed: … → ... (3), doesn’t → doesn't (4), Horspool’s → Horspool's (7)
top: ce, rm orphan tag (Query 38614); ► Wikiproject Orphanage: You can help!
 
(5 intermediate revisions by 3 users not shown)
Line 1:
{{Orphan|date=January 2016}}
 
In [[computer science]], '''input enhancement''' is the principle that processing a given input to a problem and altering it in a specific way will increase [[Time complexity|runtime efficiency]] or [[DSPACE|space efficiency]], or both. The altered input is usually stored and accessed to simplify the problem. By exploiting the structure and properties of the inputs, input enhancement creates various speed-ups in the efficiency of the [[algorithm]].
 
Line 13 ⟶ 11:
A simple example of the benefits of presorting can be seen with an algorithm that checks an array for unique elements: If an array of ''n'' elements is given, return true if every element in the array is unique, otherwise return false. The [[pseudocode]] is presented below:
 
'''algorithm''' uniqueElementSearch(A[0...''n'']) '''is'''
'''for''' ''i'' := 0 '''to''' ''n'' – 1 '''do'''
'''for''' ''j'' := ''i'' + 1 '''to''' ''n'' '''do'''
'''if''' A[''i''] = A[''j''] '''then'''
'''return false'''
'''return true'''
 
Line 24 ⟶ 22:
Now compare this to a similar algorithm that utilizes presorting. This algorithm sorts the inputted array, and then checks each pair of elements for a duplicate. The pseudocode is presented below:
 
'''algorithm''' presortUniqueElementSearch(A[0...''n'']) '''is'''
sort(A[0...''n''])
'''for''' ''i'' := 0 '''to''' ''n'' – 1 '''do'''
'''if''' A[''i''] = A[''i'' + 1] '''then'''
'''return false'''
'''return true'''
 
Line 35 ⟶ 33:
This simple example demonstrates what is capable with an input enhancement technique such as presorting. The algorithm went from [[Time complexity#Polynomial time|quadratic]] runtime to [[Time complexity#Linearithmic time|linearithmic]] runtime which will result in speed-ups for large inputs.
 
=== In Treestrees ===
Creating data structures to more efficiently search through data is also a form of input enhancement. Placing data into a tree to store and search through inputs is another popular technique. [[Tree (data structure)|Trees]] are used throughout computer science and many different types of trees -{{snd}} [[binary search treestree]]s, [[AVL treestree]]s, [[red-blackred–black treestree]]s, and [[2-32–3 tree]]s to name just a small few -{{snd}} have been developed to properly store, access, and manipulate data while maintaining their structure. Trees are a principal data structure for dictionary implementation.
 
The benefits of putting data in a tree are great, especially if the data is being manipulated or repeatedly searched through. Binary search trees are the most simplest, yet most common type of tree for this implementation. The insertion, deletion, and searching of items in a tree are all worst case O(''n''), but are most often executed in O(log ''n''). This makes the repeated searching of elements even quicker for large inputs. There are many different types of binary search trees that work more efficiently and even self-balance upon addition and removal of items, like the AVL tree which has a worst case O(log ''n'') for all searching, inserting, and deletion.
Line 42 ⟶ 40:
Taking the time to put the inputted data into such a structure will have great speed-ups for repeated searching of elements, as opposed to searching through the data that hasn't enhanced.
 
== String Matchingmatching ==
[[String matching]] is a complex issue in the world of [[Computer programming|programming]] now that [[Web search engine|search engines]] are the forefront of the internet and the online world. When given a keyword or a string that needs to be searched among millions upon millions of words, it would take an unbelievable amount of time to match this string character per character. Input enhancement allows an input to be altered to make this process that much faster.
 
Line 50 ⟶ 48:
This algorithm is extremely inefficient. The maximum number of check trials would be ''m''-''n''+1 trials, making the big-O efficiency at worst case O(''mn''). On average case, the maximum number of check trials would never be reached and only a few would be executed, resulting in an average time efficiency of O(''m''+''n'').
 
Because of the necessity of more efficient string matching algorithms, several faster algorithms have been developed, with most of them utilizing the idea of input enhancement. The key is preprocessed to gather information about what to look for in the text and that information is stored in order to refer back to them when necessary. The accessing of this information is constant time and greatly increases the runtime efficiency of the algorithms that use it, most famously the [[Knuth-Morris-PrattKnuth–Morris–Pratt algorithm]] algorithm and the [[Boyer-MooreBoyer–Moore string -search algorithm|Boyer-MooreBoyer–Moore algorithm]] algorithm. These algorithms, for the most part, use the same methods to obtain its efficiency with the main difference being on how the key is composed.
 
=== Horspool's Algorithmalgorithm ===
As a demonstration of input enhancement in string matching, one should examine a simplified version of the Boyer-Moore algorithm, [[Boyer–Moore–Horspool algorithm|Horspool's]] algorithm. The algorithm starts at the ''n''th character of the text ''m'' and compares the character. Let's call this character ''x''. There are 4 possible cases of what can happen next.
 
Line 67 ⟶ 65:
The fourth and last possible case is that character ''x'' matches the key but the other characters don't fully match the key and ''x'' does occur again in the key. If this occurs, the key is shifted to align the rightmost occurrence if the character ''x''.
 
This may seem like it is not more efficient than the brute-force algorithm since it has to check all of the characters on every check. However, this is not the case. Horspool's algorithm utilizes a shift table to store the number of characters the algorithm should shift if it runs into a specific character. The input is precomputed into a table with every possible character that can be encountered in the text. The shift size is computed with two options: one, if the character is in not in the key, then the shift size is ''n'', the length of the key; or two, if the character appears in the key, then its shift value is the distance from the rightmost occurrence of the character in the first ''n''-1 characters in the key. The algorithm for the shift table generator is given the key and an alphabet of possible characters that could appear in the string (K[0...''n''-1]) as input and returns the shift table (T[0...''s''-1]). Pseudocode for the shift table generator and an example of the shift table for the string ‘POTATO’ is displayed below:
 
'''algorithm''' shiftTableGenerator(K[0...''n''-1]) '''is'''
'''for''' ''i'' = 0 '''to''' ''s'' – 1 '''do'''
T[''i''] := ''m''
'''for''' ''j'' := 0 '''to''' ''n'' – 2 '''do'''
T[P[''j'']] := ''n'' – 1 – ''j''
'''return''' T
 
Line 87 ⟶ 85:
After the shift table is constructed in the input enhancement stage, the algorithm lines up the key and starts executing. The algorithm executes until a matching [[substring]] of text ''m'' is found or the key overlaps the last characters of text ''m''. If the algorithm encounters a pair of characters that do not match, it accesses the table for the character's shift value and shifts accordingly. Horspool's algorithm takes the key (K[0...''n''-1]) and the text (M[0...''m''-1]) and outputs either the index of the matching substring or the string “Key not found” depending on the result. Pseudocode for Horspool's algorithm is presented below:
 
'''algorithm''' HorspoolsAlgorithm(K[0...''n''-1]), M[0...''m''-1]) '''is'''
shiftTableGenerator(K[0...''n''-1])
''i'' := ''n'' – 1
'''while''' ''i'' ≤ ''m'' – 1 '''do'''
''k'' := 0
'''while''' ''k'' ≤ ''m'' – 1 and K[''n'' – 1 – ''k''] = M[''i'' – ''k''] '''do'''
''k'' := ''k'' + 1
'''if''' ''k'' = ''m'' '''then'''
'''return''' ''i'' – ''n'' + 1
'''else'''
''i'' = ''i'' + T[M[''i'']]
'''return''' “Key not found”
 
Although it may not be evident, the worst case runtime efficiency of this algorithm is O(''mn''). Fortunately, on texts that are random, the runtime efficiency is linear, O(''n/m''). This places Horspool's algorithm, which utilizes input enhancement, in a much faster class than the brute-force algorithm for this problem.
 
== Related Conceptsconcepts ==
Input enhancement is often used interchangeably with [[precomputation]] and [[Data preprocessing|preprocessing]]. Although they are related, there are several important differences that must be noted.
 
* Precomputing and input enhancement can sometimes be used synonymously. More specifically, precomputation is the calculation of a given input before anything else is done to the input. Oftentimes a table is generated to be looked back on during the actual execution of the algorithm. Input enhancement that calculates values and assigns them to elements of the input can be classified as precomputation, but the similarities stop there. There are sections of input enhancement that do not utilize precomputing and the terms should not be mutually used.
* When speaking about altering inputs, preprocessing is often misused. In computer science, a preprocessor and preprocessing are entirely different. When preprocessing is used in context, the usual intention is to portray the concept of input enhancement, and not that of utilizing a preprocessor. Implementing a preprocessor is the concept in which a program takes an input and processes it into an output to be used by another program entirely. This sounds like input enhancement, but the application of preprocessor applies to the generic program that processes the source input to be outputted in a format that a compiler can read and can then be compiled.
 
== References ==