Content deleted Content added
Altering some phrasing; Reorganization of sections; Addition of Related Concepts and uses of Trees in Presorting; Correction of Horspool's runtime |
→top: ce, rm orphan tag (Query 38614); ► Wikiproject Orphanage: You can help! ● |
||
(11 intermediate revisions by 9 users not shown) | |||
Line 2:
== Searching ==
Input enhancement when searching has been an essential component of the algorithm world for some time in computer science. The main idea behind this principle is that the efficiency of a search is much faster when the time is taken to create or sort a [[data structure]] of the given input before attempting to search for the element in said data structure.
=== Presorting ===
Line 11:
A simple example of the benefits of presorting can be seen with an algorithm that checks an array for unique elements: If an array of ''n'' elements is given, return true if every element in the array is unique, otherwise return false. The [[pseudocode]] is presented below:
'''algorithm''' uniqueElementSearch(A[0...''n'']) '''is'''
'''for''' ''i'' := 0 '''to''' ''n'' – 1 '''do'''
'''for''' ''j'' := ''i'' + 1 '''to''' ''n'' '''do'''
'''if''' A[''i''] = A[''j''] '''then'''
'''return false'''
'''return true'''
Without a presort, at worst case, this algorithm would require every element to be checked against every other element with two possible outcomes: either there is no duplicate element in the array, or the last two elements in the array are the duplicates. This results in
Now compare this to a similar algorithm that utilizes presorting. This algorithm sorts the inputted array, and then checks each pair of elements for a duplicate. The pseudocode is presented below:
'''algorithm''' presortUniqueElementSearch(A[0...''n'']) '''is'''
sort(A[0...''n''])
'''for''' ''i'' := 0 '''to''' ''n'' – 1 '''do'''
'''if''' A[''i''] = A[''i'' + 1] '''then'''
'''return false'''
'''return true'''
As previously stated, the least efficient part of this algorithm is the sorting of the array, which, if an efficient sort is selected, would run in O(''n'' log ''n''). But after the array is sorted, the array only needs to be traversed once, which would run in O(''n''). This results in
This simple example demonstrates what is capable with an input enhancement technique such as presorting. The algorithm went from [[Time complexity
=== In
Creating data structures to more efficiently search through data is also a form of input enhancement. Placing data into a tree to store and search through inputs is another popular technique. [[Tree (data structure)
The benefits of putting data in a tree are great, especially if the data is being manipulated or repeatedly searched through. Binary search trees are the most simplest, yet most common type of tree for this implementation. The insertion, deletion, and searching of items in a tree are all worst case O(''n''), but are most often executed in O(log ''n''). This makes the repeated searching of elements even quicker for large inputs. There are many different types of binary search trees that work more efficiently and even self-balance upon addition and removal of items, like the AVL tree which has a worst case O(log ''n'') for all searching, inserting, and deletion.
Taking the time to put the inputted data into such a structure will have great speed-ups for repeated searching of elements, as opposed to searching through the data that hasn't enhanced.
== String
[[String matching]]
The [[brute-force search|brute-force]] algorithm for this problem would perform as follows:
When presented with a string of ''n'' characters, often called the key or pattern, the string would be compared to every single character of a longer string ''m'', often called the text. If a matched character occurs, it checks the second character of the key to see if it matches. If it does, the next character is checked and so on until the string matches or the subsequent character
This algorithm is extremely inefficient. The maximum number of check trials would be ''m''-''n''+1 trials, making the big-O efficiency at worst case O(''mn''). On average case, the maximum number of check trials would never be reached and only a few would be executed, resulting in an average time efficiency of O(''m''+''n'').
Because of the necessity of more efficient string matching algorithms, several faster algorithms have been developed, with most of them utilizing the idea of input enhancement. The key is preprocessed to gather information about what to look for in the text and that information is stored in order to refer back to them when necessary. The accessing of this information is constant time and greatly increases the runtime efficiency of the algorithms that use it, most famously the [[
=== Horspool's
As a demonstration of input enhancement in string matching, one should examine a simplified version of the Boyer-Moore algorithm, [[Boyer–Moore–Horspool algorithm|
'''Case 1''':
Line 60:
'''Case 3''':
The third possible case is that the character ''x'' matches with the last character in the key but the other characters
'''Case 4''':
The fourth and last possible case is that character ''x'' matches the key but the other characters
This may seem like it is not more efficient than the brute-force algorithm since it has to check all of the characters on every check. However, this is not the case.
'''algorithm''' shiftTableGenerator(K[0...''n''-1]) '''is'''
'''for''' ''i'' = 0 '''to''' ''s'' – 1 '''do'''
T[''i''] := ''m''
'''for''' ''j'' := 0 '''to''' ''n'' – 2 '''do'''
T[P[''j'']] := ''n'' – 1 – ''j''
'''return''' T
Line 77:
|+ Shift Table for 'POTATO'
! character ''x''
| A || B || C ||
|-
! shift value
Line 83:
|}
After the shift table is constructed in the input enhancement stage, the algorithm lines up the key and starts executing. The algorithm executes until a matching [[substring]] of text ''m'' is found or the key overlaps the last characters of text ''m''. If the algorithm encounters a pair of characters that do not match, it accesses the table for the
'''algorithm''' HorspoolsAlgorithm(K[0...''n''-1]), M[0...''m''-1]) '''is'''
shiftTableGenerator(K[0...''n''-1])
''i'' := ''n'' – 1
'''while''' ''i'' ≤ ''m'' – 1 '''do'''
''k'' := 0
'''while''' ''k'' ≤ ''m'' – 1 and K[''n'' – 1 – ''k''] = M[''i'' – ''k''] '''do'''
''k'' := ''k'' + 1
'''if''' ''k'' = ''m'' '''then'''
'''return''' ''i'' – ''n'' + 1
'''else'''
''i'' = ''i'' + T[M[''i'']]
'''return''' “Key not found”
Although it may not be evident, the worst case runtime efficiency of this algorithm is O(''mn''). Fortunately, on texts that are random, the runtime efficiency is linear, O(''n/m''). This places
== Related
Input enhancement is often used interchangeably with [[precomputation]] and [[Data preprocessing|preprocessing]]. Although they are related, there are several important differences that must be noted.
* Precomputing and input enhancement can sometimes be used synonymously. More specifically, precomputation is the calculation of a given input before anything else is done to the input.
* When speaking about altering inputs, preprocessing is often misused. In computer science, a preprocessor and preprocessing are entirely different. When preprocessing is used in context, the usual intention is to portray the concept of input enhancement, and not that of utilizing a preprocessor. Implementing a preprocessor is the concept in which a program takes an input and processes it into an output to be used by another program entirely. This sounds like input enhancement, but the application of preprocessor applies to the generic program that processes the source input to be outputted in a format that a compiler can read and can then be compiled.▼
▲*When speaking about altering inputs, preprocessing is often misused. In computer science, a preprocessor and preprocessing are entirely different. When preprocessing is used in context, the usual intention is to portray the concept of input enhancement, and not that of utilizing a preprocessor. Implementing a preprocessor is the concept in which a program takes an input and processes it into an output to be used by another program entirely. This sounds like input enhancement, but the application of preprocessor applies to the generic program that processes the source input to be outputted in a format that a compiler can read and can then be compiled.
== References ==
* Levitin, Anany (2012). ''Introduction to The Design & Analysis of Algorithms'' (Third Edition). Pearson. {{ISBN
* Sebesta, Robert W. (2012). ''Concepts of Programming Languages'' (Tenth Edition). Pearson. {{ISBN
▲* Sebesta, Robert W. (2012). ''Concepts of Programming Languages'' (Tenth Edition). Pearson. ISBN 978-0-13-139531-2
[[Category:
|