Search algorithm: Difference between revisions

{{short description|Any algorithm which solves the search problem}}
{{Multiple issues|
{{Expert-subject|computer science|talk=Calling for expert attention|reason=longstanding subpar state of article structure and content, which is currently list based, dated, and non-encyclopedic, and without external sourcing |date=December 2014}}
{{specific|date=December 2014}}
{{More citations needed|date=April 2016}}
}}
[[File:Hash table 3 1 1 0 1 0 0 SP.svg|thumb|upright=1.2|Visual representation of a [[hash table]], a [[data structure]] that allows for fast retrieval of information.]]
In [[computer science]], a '''search algorithm''' is an [[algorithm]] designed to solve a [[search problem]]. Search algorithms work to retrieve information stored within a particular [[data structure]], or calculated in the [[Feasible region|search space]] of a [[problem domain]], with [[Continuous or discrete variable|either discrete or continuous values]].
 
Although [[Search engine (computing)|search engines]] use search algorithms, they belong to the study of [[information retrieval]], not algorithmics.
 
The appropriate search algorithm to use often depends on the data structure being searched, and may also include prior knowledge about the data. Search algorithms can be made faster or more efficient by specially constructed database structures, such as [[search tree]]s, [[hash map]]s, and [[database index]]es.{{Sfn|Beame|Fich|2002|p=39}}{{Sfn|Knuth|1998|loc=§6.5 ("Retrieval on Secondary Keys")}}
 
Search algorithms can be classified based on their mechanism of searching into three types of algorithms: linear, binary, and hashing. [[Linear search]] algorithms check every record for the one associated with a target key in a linear fashion.{{Sfn|Knuth|1998|loc=§6.1 ("Sequential Searching")}} [[Binary search algorithm|Binary, or half-interval, searches]] repeatedly target the center of the search structure and divide the search space in half. Comparison search algorithms improve on linear searching by successively eliminating records based on comparisons of the keys until the target record is found, and can be applied on data structures with a defined order.{{Sfn|Knuth|1998|loc=§6.2 ("Searching by Comparison of Keys")}} Digital search algorithms work based on the properties of digits in data structures by using numerical keys.{{Sfn|Knuth|1998|loc=§6.3 ("Digital Searching")}} Finally, [[Hash table|hashing]] directly maps keys to records based on a [[hash function]].{{Sfn|Knuth|1998|loc=§6.4 ("Hashing")}} Unlike linear search, these other methods require that the data be organized in some way, for example sorted by key or placed according to a hash value.
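The three mechanisms named above can be illustrated with a minimal Python sketch. The function names and sample records are illustrative assumptions rather than part of any cited source; the binary search relies on the standard-library <code>bisect</code> module, and hashing is represented by Python's built-in dictionary.

```python
from bisect import bisect_left

def linear_search(records, key):
    """Check every record in turn; works on unsorted data."""
    for index, value in enumerate(records):
        if value == key:
            return index
    return -1

def binary_search(sorted_records, key):
    """Halve the search space at each step; requires sorted input."""
    index = bisect_left(sorted_records, key)
    if index < len(sorted_records) and sorted_records[index] == key:
        return index
    return -1

records = [42, 7, 19, 3, 25]                 # illustrative data
sorted_records = sorted(records)             # [3, 7, 19, 25, 42]
# Hashing: a dict maps each key directly to its position in `records`.
hash_index = {value: i for i, value in enumerate(records)}

print(linear_search(records, 19))            # 2
print(binary_search(sorted_records, 19))     # 2
print(hash_index.get(19, -1))                # 2
```

The linear version needs no preparation, whereas the other two depend on work done beforehand: sorting for the binary search and building the dictionary for the hash lookup.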
 
Algorithms are often evaluated by their [[computational complexity]], or maximum theoretical run time. Binary search functions, for example, have a maximum complexity of {{math|''O''(log ''n'')}}, or logarithmic time. In simple terms, the maximum number of operations needed to find the search target is a logarithmic function of the size of the search space.
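The logarithmic bound can be checked empirically. The following Python sketch, in which all names are illustrative, counts how many elements a hand-written binary search inspects and compares that count with ⌊log<sub>2</sub> ''n''⌋ + 1, the usual worst-case number of probes for ''n'' sorted items.

```python
import math

def binary_search_steps(sorted_items, target):
    """Return (index or -1, number of elements inspected)."""
    low, high, probes = 0, len(sorted_items) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            low = mid + 1       # discard the lower half
        else:
            high = mid - 1      # discard the upper half
    return -1, probes

n = 1_000_000
items = list(range(n))
# Probe a few present and absent targets and keep the largest probe count.
worst = max(binary_search_steps(items, t)[1] for t in (0, n - 1, n // 2, -1, n))
bound = math.floor(math.log2(n)) + 1
print(worst, bound)
```

For one million items the bound is 20 probes, compared with up to one million for a linear scan.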
 
== Applications of search algorithms ==
Specific applications of search algorithms include:
 
*Problems in [[combinatorial optimization]], such as:
** The [[vehicle routing problem]], a form of [[shortest path problem]]
** The [[knapsack problem]]: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
** The [[nurse scheduling problem]]
* Problems in [[constraint satisfaction]], such as:
** The [[map coloring problem]]
** Filling in a [[sudoku]] or [[crossword puzzle]]
* In [[game theory]] and especially [[combinatorial game theory]], choosing the best move to make next (such as with the [[minimax]] algorithm)
* Finding a combination or password from the whole set of possibilities
* [[Factorization|Factoring]] an integer (an important problem in [[cryptography]])
* Search engine optimization (SEO) and content optimization for web crawlers
* Optimizing an industrial process, such as a [[chemical reaction]], by changing the parameters of the process (like temperature, pressure, and pH)
* Retrieving a record from a [[database]]
* Finding the maximum or minimum value in a [[List (abstract data type)|list]] or [[Array data structure|array]]
* Checking to see if a given value is present in a set of values
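
As an illustration of one application from the list above, the following Python sketch solves a small knapsack instance by exhaustive search. It assumes the simplified 0/1 variant, in which each item is either taken once or left out, and the item data and function name are hypothetical.

```python
from itertools import combinations

def knapsack_brute_force(items, capacity):
    """Try every subset of (weight, value) items; return (best_value, best_subset)."""
    best_value, best_subset = 0, ()
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            weight = sum(w for w, _ in subset)
            value = sum(v for _, v in subset)
            if weight <= capacity and value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset

items = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]   # (weight, value), illustrative
best_value, best_subset = knapsack_brute_force(items, capacity=15)
print(best_value, best_subset)
```

The search space contains 2<sup>''n''</sup> subsets, so this approach is practical only for very small inputs.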
 
==Classes==
 
===For virtual search spaces===
{{see also|Solver}}
 
Algorithms for searching virtual spaces are used in the [[constraint satisfaction problem]], where the goal is to find a set of value assignments to certain variables that will satisfy specific mathematical [[equation]]s and [[inequation]]s. They are also used when the goal is to find a variable assignment that will [[discrete optimization|maximize or minimize]] a certain function of those variables. Algorithms for these problems include the basic [[brute-force search]] (also called "naïve" or "uninformed" search), and a variety of [[heuristic function|heuristic]]s that try to exploit partial knowledge about the structure of this space, such as linear relaxation, constraint generation, and [[Local consistency|constraint propagation]].
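
A brute-force search over a constraint satisfaction problem can be sketched in Python as follows. The example colors a small hypothetical map so that no two bordering regions share a color; the region names, borders, and the <code>color_map</code> function are assumptions made for illustration.

```python
from itertools import product

def color_map(regions, borders, colors):
    """Brute-force search: try every assignment until no neighbours share a color."""
    for assignment in product(colors, repeat=len(regions)):
        coloring = dict(zip(regions, assignment))
        if all(coloring[a] != coloring[b] for a, b in borders):
            return coloring
    return None   # no assignment satisfies every constraint

regions = ["A", "B", "C", "D"]
borders = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
solution = color_map(regions, borders, ["red", "green", "blue"])
print(solution)
```

Heuristic methods such as constraint propagation aim to avoid enumerating most of these assignments, which number <code>len(colors) ** len(regions)</code> in the worst case.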
 
An important subclass is the [[Local search (optimization)|local search]] methods, which view the elements of the search space as the [[vertex (graph theory)|vertices]] of a graph, with edges defined by a set of heuristics applicable to the case, and scan the space by moving from item to item along the edges, for example according to the [[gradient descent|steepest descent]] or [[best-first search|best-first]] criterion, or in a [[Stochastic optimization|stochastic search]]. This category includes a great variety of general [[metaheuristic]] methods, such as [[simulated annealing]], [[tabu search]], [[A-teams]],<ref>{{Cite journal |last=Talukdar |first=Sarosh |last2=Baerentzen |first2=Lars |last3=Gove |first3=Andrew |last4=De Souza |first4=Pedro |date=1998-12-01 |title=Asynchronous Teams: Cooperation Schemes for Autonomous Agents |url=https://doi.org/10.1023/A:1009669824615 |journal=Journal of Heuristics |language=en |volume=4 |issue=4 |pages=295–321 |doi=10.1023/A:1009669824615 |issn=1572-9397|url-access=subscription }}</ref> and [[genetic programming]], that combine arbitrary heuristics in specific ways. In contrast to local search, global search methods are applicable when the search space is not limited and all aspects of the given network are available to the entity running the search algorithm.<ref>{{Cite journal|last1=Hunter|first1=A.H.|last2=Pippenger|first2=Nicholas|date=4 July 2013|title=Local versus global search in channel graphs|journal=Networks: An International Journal|arxiv=1004.2526}}</ref>
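
The idea of moving from item to item along edges can be sketched with a simple hill-climbing routine in Python. This is a minimal steepest-ascent variant (the mirror image of steepest descent, maximizing rather than minimizing); the objective function and neighbourhood below are hypothetical.

```python
def hill_climb(score, start, neighbours, max_steps=1000):
    """Steepest-ascent local search: move to the best neighbour until none improves."""
    current = start
    for _ in range(max_steps):
        best = max(neighbours(current), key=score)
        if score(best) <= score(current):
            return current          # local optimum reached
        current = best
    return current

# Hypothetical objective with a single peak at x = 7.
def score(x):
    return -(x - 7) ** 2

# The "edges" of the search graph: each integer is linked to its two neighbours.
def neighbours(x):
    return [x - 1, x + 1]

print(hill_climb(score, start=0, neighbours=neighbours))   # 7
```

On objectives with several peaks such a search can stop at a local optimum, which is one motivation for metaheuristics such as simulated annealing.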
 
This class also includes various [[Tree traversal|tree search algorithm]]s, that view the elements as vertices of a [[tree (graph theory)|tree]], and traverse that tree in some special order. Examples of the latter include the exhaustive methods such as [[depth-first search]] and [[breadth-first search]], as well as various heuristic-based [[Pruning (decision trees)|search tree pruning]] methods such as [[backtracking]] and [[branch and bound]]. Unlike general metaheuristics, which at best work only in a probabilistic sense, many of these tree-search methods are guaranteed to find the exact or optimal solution, if given enough time. This is called "[[Completeness (logic)|completeness]]".
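
The two exhaustive traversal orders mentioned above can be compared on a small tree. In this Python sketch the tree is stored as a dictionary from each node to its list of children; the node labels and function names are illustrative.

```python
from collections import deque

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}

def depth_first(tree, root):
    """Visit a node, then fully explore each child before its siblings."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))   # so the leftmost child is popped first
    return order

def breadth_first(tree, root):
    """Visit all nodes at one depth before moving to the next depth."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(depth_first(tree, "A"))     # ['A', 'B', 'D', 'E', 'C', 'F']
print(breadth_first(tree, "A"))   # ['A', 'B', 'C', 'D', 'E', 'F']
```

The only structural difference is the container: a stack gives depth-first order and a queue gives breadth-first order.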
 
Another important subclass consists of algorithms for exploring the [[game tree]] of multiple-player games, such as [[chess]] or [[backgammon]], whose nodes consist of all possible game situations that could result from the current situation. The goal in these problems is to find the move that provides the best chance of a win, taking into account all possible moves of the opponent(s). Similar problems occur when humans or machines have to make successive decisions whose outcomes are not entirely under one's control, such as in [[robot]] guidance or in [[marketing]], [[finance|financial]], or [[military]] strategy planning. This kind of problem, [[combinatorial search]], has been extensively studied in the context of [[artificial intelligence]]. Examples of algorithms for this class are the [[Minimax|minimax algorithm]], [[alpha–beta pruning]], and the [[A* search algorithm|A* algorithm]] and its variants.
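
A minimal Python sketch of the minimax idea follows. It assumes a two-player, zero-sum game whose tree is given as nested lists with numeric payoffs at the leaves; this representation and the sample tree are illustrative choices, not a standard interface.

```python
def minimax(node, maximizing=True):
    """Return the value of a game tree given as nested lists with numeric leaves."""
    if isinstance(node, (int, float)):
        return node                     # leaf: payoff for the maximizing player
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The maximizer picks one of three moves; the minimizer then picks a reply.
game_tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(game_tree))   # 3
```

The first move guarantees at least 3 whatever the opponent replies, while the other moves guarantee only 2 and 0. Alpha–beta pruning computes the same value while skipping branches that cannot affect the result.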
 
===For sub-structures of a given structure===
The name "combinatorial search" is generally used for algorithms that look for a specific sub-structure of a given [[Discrete mathematics|discrete structure]], such as a graph, a [[string (computer science)|string]], a finite [[group (mathematics)|group]], and so on. The term [[combinatorial optimization]] is typically used when the goal is to find a sub-structure with a maximum (or minimum) value of some parameter. (Since the sub-structure is usually represented in the computer by a set of integer variables with constraints, these problems can be viewed as special cases of constraint satisfaction or discrete optimization; but they are usually formulated and solved in a more abstract setting where the internal representation is not explicitly mentioned.)
 
An important and extensively studied subclass is the [[List of algorithms#Graph algorithms|graph algorithm]]s, in particular [[graph traversal]] algorithms, for finding specific sub-structures in a given graph, such as [[Glossary of graph theory#Subgraphs|subgraphs]], [[path (graph theory)|paths]], circuits, and so on. Examples include [[Dijkstra's algorithm]], [[Kruskal's algorithm]], the [[nearest neighbour algorithm]], and [[Prim's algorithm]].
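
Dijkstra's algorithm, one of the examples above, can be sketched in Python with the standard-library <code>heapq</code> priority queue. The graph representation (a dictionary mapping each node to its weighted neighbours) and the sample graph are assumptions for illustration, and edge weights are assumed to be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest distances from source; graph maps node -> {neighbour: weight}."""
    distances = {source: 0}
    frontier = [(0, source)]            # priority queue of (distance, node)
    while frontier:
        dist, node = heapq.heappop(frontier)
        if dist > distances.get(node, float("inf")):
            continue                    # stale entry: a shorter path was already found
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(frontier, (candidate, neighbour))
    return distances

graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(dijkstra(graph, "A"))   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```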
 
Another important subclass of this category is the [[string searching algorithm]]s, which search for patterns within strings. Two famous examples are the [[Boyer–Moore string-search algorithm|Boyer–Moore]] and [[Knuth–Morris–Pratt algorithm]]s; several other algorithms are based on the [[suffix tree]] data structure.
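
The Knuth–Morris–Pratt approach can be sketched in Python as follows. This is a compact illustrative version that returns only the first match; the function and variable names are assumptions.

```python
def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, never moving backwards in it.
    k = 0
    for i, char in enumerate(text):
        while k and char != pattern[k]:
            k = failure[k - 1]          # reuse the part of the match that still fits
        if char == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

print(kmp_search("abracadabra", "cad"))   # 4
```

Because the failure table records how much of a partial match can be reused after a mismatch, each character of the text is examined without backtracking.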
 
===Search for the maximum of a function===
 
===For quantum computers===
There are also search methods designed for [[Quantum computing|quantum computer]]s, like [[Grover's algorithm]], that are theoretically faster than linear or brute-force search even without the help of data structures or heuristics. While the ideas and applications behind quantum computers are still entirely theoretical, studies have been conducted with algorithms like Grover's that accurately replicate the hypothetical physical versions of quantum computing systems.<ref>{{Cite journal|last1=López|first1=G V|last2=Gorin|first2=T|last3=Lara|first3=L|date=26 February 2008|title=Simulation of Grover's quantum search algorithm in an Ising-nuclear-spin-chain quantum computer with first- and second-nearest-neighbour couplings|journal=Journal of Physics B: Atomic, Molecular and Optical Physics|volume=41|issue=5|page=055504|doi=10.1088/0953-4075/41/5/055504|arxiv=0710.3196|bibcode=2008JPhB...41e5504L|s2cid=18796310}}</ref>
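
The behaviour of Grover's algorithm can be illustrated with a small classical simulation in Python. This sketch tracks one real amplitude per item, so it takes more time than a linear scan and offers no speed-up itself; it only shows that roughly (π/4)·√''N'' iterations concentrate almost all of the probability on the marked item. The function name and parameters are illustrative.

```python
import math

def grover_search(num_items, marked):
    """Classically simulate Grover's algorithm on a list of real amplitudes."""
    amplitudes = [1 / math.sqrt(num_items)] * num_items     # uniform superposition
    iterations = math.floor(math.pi / 4 * math.sqrt(num_items))
    for _ in range(iterations):
        amplitudes[marked] *= -1                            # oracle: flip the marked sign
        mean = sum(amplitudes) / num_items
        amplitudes = [2 * mean - a for a in amplitudes]     # inversion about the mean
    probabilities = [a * a for a in amplitudes]
    return iterations, probabilities

iterations, probabilities = grover_search(num_items=64, marked=42)
print(iterations, round(probabilities[42], 3))
```

With 64 items the simulation uses 6 iterations, whereas a classical linear search would inspect 32 items on average.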
 
==See also==
 
* {{annotated link|Backward induction}}
* {{annotated link|Content-addressable memory}} hardware
* {{annotated link|Dual-phase evolution}}
* {{annotated link|Linear search problem}}
* {{annotated link|No free lunch in search and optimization}}
* {{annotated link|Recommender system}}, also use statistical methods to rank results in very large data sets
* {{annotated link|Search engine (computing)}}
* {{annotated link|Search game}}
* {{annotated link|Selection algorithm}}
* {{annotated link|Solver}}
* {{annotated link|Sorting algorithm}}, necessary for executing certain search algorithms
* {{annotated link|Web search engine}}
Categories:
* [[:Category:Search algorithms]]
 
==References==
===Bibliography===
====Books====
{{sfn whitelist|CITEREFKnuth1998}}
*{{TAOCP|volume=3|edition=2|harv=y}}
 
====Articles====
*{{cite journal|last1=Beame|first1=Paul|last2=Fich|first2=Faith E.|author2-link=Faith Ellen|title=Optimal Bounds for the Predecessor Problem and Related Problems|url=http://www.sciencedirect.com/science/article/pii/S0022000002918222|journal=[[Journal of Computer and System Sciences]]|volume=65|issue=1|date=August 2002|pages=38–72|doi=10.1006/jcss.2002.1822|s2cid=1991980|doi-access=free|ref=harv}}
 
==External links==
{{Refbegin}}
*[[Wikiversity:Uninformed Search Project|Uninformed Search Project]] at the [[Wikiversity]].
*[http://sites.google.com/site/hantarto/quantum-computing/unsorted Unsorted Data Searching Using Modulated Database].
{{Refend}}
 
{{Algorithmic paradigms}}
 
[[Category:Internet search algorithms|Web search algorithms]]
[[Category:Ranking functions|ranking algorithms]]
[[Category:Search algorithms| ]]