Search algorithm: Difference between revisions

OAbot (talk | contribs)
m Open access bot: url-access=subscription updated in citation with #oabot.
 
{{short description|Any algorithm which solves the search problem}}
In [[computer science]], a '''search algorithm''', broadly speaking, is an [[algorithm]] that takes a problem as [[input]] and returns a solution to the problem. Most of the algorithms studied by computer scientists that solve problems are kinds of search algorithms. The set of all possible solutions to a problem is called the [[search space]]. [[Brute-force search]] or "naïve"/uninformed search algorithms use the simplest, most intuitive method of searching through the search space, whereas informed search algorithms use [[heuristics (computer science)|heuristics]] to apply knowledge about the structure of the [[search space]] to try to reduce the amount of time spent searching.
{{Multiple issues|
{{specific|date=December 2014}}
{{More citations needed|date=April 2016}}
}}
[[File:Hash table 3 1 1 0 1 0 0 SP.svg|thumb|upright=1.2|Visual representation of a [[hash table]], a [[data structure]] that allows for fast retrieval of information]]
In [[computer science]], a '''search algorithm''' is an [[algorithm]] designed to solve a [[search problem]]. Search algorithms work to retrieve information stored within a particular [[data structure]], or calculated in the [[Feasible region|search space]] of a problem domain, with [[Continuous or discrete variable|either discrete or continuous values]].
 
Although [[Search engine (computing)|search engines]] use search algorithms, they belong to the study of [[information retrieval]], not algorithmics.
== Uninformed search ==
An uninformed search algorithm does not take into account the specific nature of the problem. As such, it can be implemented in general form, and the same [[implementation]] can then be used in a wide range of problems thanks to [[Abstraction (computer science)|abstraction]]. The drawback is that most [[search space]]s are extremely large, and an uninformed search (especially of a tree) will only take a reasonable amount of time for small examples. To speed up the process, sometimes only an informed search will do.
 
The appropriate search algorithm to use often depends on the data structure being searched, and may also draw on prior knowledge about the data. Search algorithms can be made faster or more efficient by specially constructed database structures, such as [[search tree]]s, [[hash map]]s, and [[database index]]es.{{Sfn|Beame|Fich|2002|p=39}}{{Sfn|Knuth|1998|loc=§6.5 ("Retrieval on Secondary Keys")}}
=== List search ===
List search algorithms are perhaps the most basic kind of search algorithm. The goal is to find one element of a set by some key (perhaps containing other information related to the key). As this is a common problem in [[computer science]], the [[computational complexity]] of these algorithms has been well studied. The simplest such algorithm is [[linear search]], which examines each element of the list in order. It runs in [[big O notation|O]](''n'') time, where ''n'' is the number of items in the list, but can be used directly on any unprocessed list. A more sophisticated list search algorithm is [[binary search]], which runs in [[big O notation|O]](log ''n'') time. This is significantly better than [[linear search]] for large lists of data, but it requires that the list be sorted before searching (see [[sort algorithm]]) and also be [[random access]]. [[Interpolation search]] is better than binary search for very large sorted lists with fairly even distributions. [[Grover's algorithm]] is a [[quantum computer|quantum algorithm]] that offers a quadratic speedup over classical linear search for unsorted lists.
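The two basic list searches described above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function names are chosen here for the example, and integer keys are assumed for simplicity.

```python
from typing import Optional, Sequence


def linear_search(items: Sequence[int], key: int) -> Optional[int]:
    """Examine each element in order; works on any list, sorted or not."""
    for index, value in enumerate(items):
        if value == key:
            return index
    return None


def binary_search(sorted_items: Sequence[int], key: int) -> Optional[int]:
    """Halve the candidate range each step; needs sorted, random-access input."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == key:
            return mid
        if sorted_items[mid] < key:
            low = mid + 1   # key can only be in the upper half
        else:
            high = mid - 1  # key can only be in the lower half
    return None
```

Both functions return the index of a match or <code>None</code>; only the second depends on the input being sorted.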
 
Search algorithms can be classified based on their mechanism of searching into three types of algorithms: linear, binary, and hashing. [[Linear search]] algorithms check every record for the one associated with a target key in a linear fashion.{{Sfn|Knuth|1998|loc=§6.1 ("Sequential Searching")}} [[Binary search algorithm|Binary, or half-interval, searches]] repeatedly target the center of the search structure and divide the search space in half. Comparison search algorithms improve on linear searching by successively eliminating records based on comparisons of the keys until the target record is found, and can be applied on data structures with a defined order.{{Sfn|Knuth|1998|loc=§6.2 ("Searching by Comparison of Keys")}} Digital search algorithms work based on the properties of digits in data structures by using numerical keys.{{Sfn|Knuth|1998|loc=§6.3 ("Digital Searching")}} Finally, [[Hash table|hashing]] directly maps keys to records based on a [[hash function]].{{Sfn|Knuth|1998|loc=§6.4 ("Hashing")}}
[[Hash table]]s are also used for list search, requiring only constant time for search in the average case, but more space overhead and terrible O(''n'') worst-case search time. Another search based on specialized data structures uses [[self-balancing binary search tree]]s and requires O(log ''n'') time to search; these can be seen as extending the main ideas of binary search to allow fast insertion and removal. See [[associative array]] for more discussion of list search data structures.
 
Algorithms are often evaluated by their [[computational complexity]], or maximum theoretical run time. Binary search functions, for example, have a maximum complexity of {{math|''O''(log ''n'')}}, or logarithmic time. In simple terms, the maximum number of operations needed to find the search target is a logarithmic function of the size of the search space.
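The logarithmic bound for binary search can be derived in a few lines. The sketch below assumes the standard variant in which each comparison discards at least half of the remaining candidates; the symbols <math>n</math> (list size) and <math>k</math> (number of comparisons) are introduced here for illustration.

```latex
% After k comparisons at most \lfloor n / 2^k \rfloor candidates remain.
% The search must stop once no candidates are left:
\left\lfloor \frac{n}{2^{k}} \right\rfloor = 0
\iff \frac{n}{2^{k}} < 1
\iff k > \log_2 n .
% The smallest such integer k is the worst-case number of comparisons:
k_{\max} = \lfloor \log_2 n \rfloor + 1 = O(\log n).
```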
Most list search algorithms, such as linear search, binary search, and self-balancing binary search trees, can be extended with little additional cost to find all values less than or greater than a given key, an operation called ''range search''. The glaring exception is hash tables, which cannot perform such a search efficiently.
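A range search over a sorted list can be sketched with two binary searches, one for each end of the range. The example below uses Python's standard <code>bisect</code> module; the function name <code>range_search</code> and the inclusive bounds are assumptions made for this illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List


def range_search(sorted_items: List[int], low: int, high: int) -> List[int]:
    """Return every value v with low <= v <= high from a sorted list."""
    start = bisect_left(sorted_items, low)    # first index with value >= low
    stop = bisect_right(sorted_items, high)   # first index with value > high
    return sorted_items[start:stop]
```

Locating the two boundaries takes logarithmic time. A hash table offers no comparable shortcut because hashing does not preserve key order.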
 
=== Tree search ===
[[Tree search algorithm]]s are the heart of searching techniques. They search the nodes of [[tree (graph theory)|tree]]s, whether the tree is explicit or implicit (generated on the fly). The basic principle is that a [[node (computer science)|node]] is taken from a [[data structure]], and its successors are examined and added to the data structure. By manipulating the data structure, the tree is explored in different orders, for instance level by level ([[Breadth-first search]]) or reaching a [[leaf node]] first and backtracking ([[Depth-first search]]). Other examples of tree searches include [[Iterative deepening depth-first search|Iterative-deepening search]], [[Depth-limited search]], [[Bidirectional search]] and [[Uniform-cost search]].

=== Graph search ===
Many of the problems in [[graph theory]] can be solved using search algorithms, such as [[Dijkstra's algorithm]], [[Kruskal's algorithm]], the [[nearest neighbour algorithm]], and [[Prim's algorithm]]. These can be seen as extensions of the tree-search algorithms.

== Applications of search algorithms ==
Specific applications of search algorithms include:

* Problems in [[combinatorial optimization]], such as:
** The [[vehicle routing problem]], a form of [[shortest path problem]]
** The [[knapsack problem]]: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
** The [[nurse scheduling problem]]
* Problems in [[constraint satisfaction]], such as:
** The [[map coloring problem]]
** Filling in a [[sudoku]] or [[crossword puzzle]]
* In [[game theory]] and especially [[combinatorial game theory]], choosing the best move to make next (such as with the [[minimax]] algorithm)
* Finding a combination or password from the whole set of possibilities
* [[Factorization|Factoring]] an integer (an important problem in [[cryptography]])
* Search engine optimization (SEO) and content optimization for web crawlers
* Optimizing an industrial process, such as a [[chemical reaction]], by changing the parameters of the process (like temperature, pressure, and pH)
* Retrieving a record from a [[database]]
* Finding the maximum or minimum value in a [[List (abstract data type)|list]] or [[Array data structure|array]]
* Checking to see if a given value is present in a set of values
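As an illustration of treating an optimization problem as a search, the knapsack problem from the list above can be solved by exhaustively searching all subsets of items. The sketch assumes the 0/1 variant (each item taken at most once) and represents items as hypothetical <code>(weight, value)</code> pairs; it is practical only for small inputs, since the search space doubles with every item.

```python
from itertools import combinations
from typing import List, Tuple


def knapsack_brute_force(
    items: List[Tuple[int, int]], limit: int
) -> Tuple[int, Tuple[int, ...]]:
    """Try every subset of item indices; keep the most valuable one that fits."""
    best_value, best_choice = 0, ()
    for size in range(len(items) + 1):
        for choice in combinations(range(len(items)), size):
            weight = sum(items[i][0] for i in choice)
            value = sum(items[i][1] for i in choice)
            if weight <= limit and value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice
```

The function returns the best total value together with the indices of the chosen items.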
 
==Classes==
== Informed search ==
In an informed search, a [[heuristic]] that is specific to the problem is used as a guide. A good heuristic can make an informed search far outperform an uninformed one.
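One well-known informed search is A*, sketched below on a small grid. The grid encoding (strings with <code>#</code> for walls), the unit step cost, and the Manhattan-distance heuristic are all assumptions chosen for this example rather than part of any particular library.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def a_star(grid: List[str], start: Cell, goal: Cell) -> Optional[int]:
    """Length of a shortest 4-connected path on a grid; '#' cells are walls."""

    def heuristic(cell: Cell) -> int:
        # Manhattan distance never overestimates on a unit-cost 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_cost: Dict[Cell, int] = {start: 0}
    frontier: List[Tuple[int, int, Cell]] = [(heuristic(start), 0, start)]
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was found later
        row, col = cell
        for r, c in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get((r, c), float("inf")):
                    best_cost[(r, c)] = new_cost
                    priority = new_cost + heuristic((r, c))
                    heapq.heappush(frontier, (priority, new_cost, (r, c)))
    return None  # goal unreachable
```

The heuristic only changes the order in which cells are expanded. With a heuristic that always returns zero, the same code behaves like an uninformed uniform-cost search.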
 
===For virtual search spaces===
There are few prominent informed list-search algorithms. A possible member of that category is a hash table with a hashing function that is a heuristic based on the problem at hand. Most informed search algorithms explore trees. These include [[Best-first search|best-first search]] and [[A Star Search Algorithm|A*]]. Like the uninformed algorithms, they can be extended to work for graphs as well.
{{see also|Solver}}
 
Algorithms for searching virtual spaces are used in the [[constraint satisfaction problem]], where the goal is to find a set of value assignments to certain variables that will satisfy specific mathematical [[equation]]s and [[inequation]]s. They are also used when the goal is to find a variable assignment that will [[discrete optimization|maximize or minimize]] a certain function of those variables. Algorithms for these problems include the basic [[brute-force search]] (also called "naïve" or "uninformed" search), and a variety of [[heuristic function|heuristic]]s that try to exploit partial knowledge about the structure of this space, such as linear relaxation, constraint generation, and [[Local consistency|constraint propagation]].
== Adversarial search ==
Game-playing computer programs and other forms of [[artificial intelligence]] like [[machine planning]] often use search algorithms like the [[Minimax algorithm]], [[search tree pruning]], and [[alpha-beta pruning]].
 
An important subclass is the [[Local search (optimization)|local search]] methods, which view the elements of the search space as the [[vertex (graph theory)|vertices]] of a graph, with edges defined by a set of heuristics applicable to the case, and scan the space by moving from item to item along the edges, for example according to the [[gradient descent|steepest descent]] or [[best-first search|best-first]] criterion, or in a [[Stochastic optimization|stochastic search]]. This category includes a great variety of general [[metaheuristic]] methods, such as [[simulated annealing]], [[tabu search]], [[A-teams]],<ref>{{Cite journal |last=Talukdar |first=Sarosh |last2=Baerentzen |first2=Lars |last3=Gove |first3=Andrew |last4=De Souza |first4=Pedro |date=1998-12-01 |title=Asynchronous Teams: Cooperation Schemes for Autonomous Agents |url=https://doi.org/10.1023/A:1009669824615 |journal=Journal of Heuristics |language=en |volume=4 |issue=4 |pages=295–321 |doi=10.1023/A:1009669824615 |issn=1572-9397|url-access=subscription }}</ref> and [[genetic programming]], which combine arbitrary heuristics in specific ways. Global search methods are the opposite of local search. They are applicable when the search space is not limited and all aspects of the given network are available to the entity running the search algorithm.<ref>{{Cite journal|last1=Hunter|first1=A.H.|last2=Pippenger|first2=Nicholas|date=4 July 2013|title=Local versus global search in channel graphs|journal=Networks: An International Journal|arxiv=1004.2526}}</ref>
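A steepest-descent local search can be sketched in a few lines. The function below is a generic illustration: the names <code>hill_climb</code>, <code>neighbours</code>, and <code>cost</code> are hypothetical, and states are assumed to be integers for simplicity.

```python
from typing import Callable, Iterable


def hill_climb(
    start: int,
    neighbours: Callable[[int], Iterable[int]],
    cost: Callable[[int], float],
) -> int:
    """Steepest descent: move to the cheapest neighbour until none improves."""
    current = start
    while True:
        best = min(neighbours(current), key=cost, default=current)
        if cost(best) >= cost(current):
            return current  # local minimum: no neighbour is strictly better
        current = best
```

Because the search only ever looks at adjacent states, it can stop at a local minimum that is not the global one. This is the weakness that metaheuristics such as simulated annealing and tabu search try to address.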
== Constraint satisfaction ==
This is a type of search which solves [[constraint satisfaction problem]]s where, rather than looking for a path, the solution is simply a set of values assigned to a set of variables. Because the variables can be processed in any order, the usual tree search algorithms are too inefficient. Methods of solving constraint problems include [[combinatorial search]] and [[backtracking]], both of which take advantage of the freedom associated with constraint problems.
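Backtracking on a constraint satisfaction problem can be sketched with map colouring, where adjacent regions must receive different colours. The adjacency-dictionary input and the function name <code>colour_map</code> are assumptions made for this example.

```python
from typing import Dict, List, Optional


def colour_map(
    neighbours: Dict[str, List[str]], colours: List[str]
) -> Optional[Dict[str, str]]:
    """Backtracking search: assign one region at a time, undoing dead ends."""
    regions = list(neighbours)
    assignment: Dict[str, str] = {}

    def backtrack(index: int) -> bool:
        if index == len(regions):
            return True  # every region has a colour
        region = regions[index]
        for colour in colours:
            # Constraint: no already-coloured neighbour shares this colour.
            if all(assignment.get(other) != colour for other in neighbours[region]):
                assignment[region] = colour
                if backtrack(index + 1):
                    return True
                del assignment[region]  # undo and try the next colour
        return False

    return assignment if backtrack(0) else None
```

The result is a complete assignment of values to variables, not a path, and <code>None</code> signals that no assignment satisfies the constraints.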
 
This class also includes various [[Tree traversal|tree search algorithm]]s, which view the elements as vertices of a [[tree (graph theory)|tree]] and traverse that tree in some special order. Examples of the latter include the exhaustive methods such as [[depth-first search]] and [[breadth-first search]], as well as various heuristic-based [[Pruning (decision trees)|search tree pruning]] methods such as [[backtracking]] and [[branch and bound]]. Unlike general metaheuristics, which at best work only in a probabilistic sense, many of these tree-search methods are guaranteed to find the exact or optimal solution, if given enough time. This property is called "[[Completeness (logic)|completeness]]".
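The two exhaustive traversals differ only in the data structure that holds the nodes waiting to be visited: a queue gives breadth-first order, a stack gives depth-first order. The sketch below assumes an explicit tree stored as a dictionary from each node to its list of children; the names are chosen for this example.

```python
from collections import deque
from typing import Dict, List

Tree = Dict[str, List[str]]


def breadth_first(tree: Tree, root: str) -> List[str]:
    """Visit nodes level by level using a FIFO queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


def depth_first(tree: Tree, root: str) -> List[str]:
    """Follow each branch to a leaf before backtracking, using a LIFO stack."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return order
```

For a tree with root A, children B and C, where B has children D and E and C has child F, breadth-first order is A, B, C, D, E, F and depth-first order is A, B, D, E, C, F.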
== Other types ==
* [[String searching algorithm]]s search for patterns within [[string]]s; one popular data structure that makes this more efficient is the [[suffix tree]].
* [[Genetic algorithms]] use ideas from [[evolution]] as heuristics for reducing the search space.
* [[Simulated annealing]] is a [[probabilistic]] search algorithm.
* [[Taboo search]] is a technique that keeps discrete searches from getting stuck in local minima.
 
Another important sub-class consists of algorithms for exploring the [[game tree]] of multiple-player games, such as [[chess]] or [[backgammon]], whose nodes consist of all possible game situations that could result from the current situation. The goal in these problems is to find the move that provides the best chance of a win, taking into account all possible moves of the opponent(s). Similar problems occur when humans or machines have to make successive decisions whose outcomes are not entirely under one's control, such as in [[robot]] guidance or in [[marketing]], [[finance|financial]], or [[military]] strategy planning. This kind of problem — [[combinatorial search]] — has been extensively studied in the context of [[artificial intelligence]]. Examples of algorithms for this class are the [[Minimax|minimax algorithm]], [[alpha–beta pruning]], and the [[A* search algorithm|A* algorithm]] and its variants.
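The minimax idea can be sketched on a tiny two-player game tree. In this illustration a tree is either an integer leaf score (from the first player's point of view) or a list of subtrees; that encoding and the function name are assumptions for the example, and pruning is omitted.

```python
from typing import List, Union

GameTree = Union[int, List["GameTree"]]


def minimax(node: GameTree, maximizing: bool = True) -> int:
    """Best score reachable when both players choose optimally."""
    if isinstance(node, int):
        return node  # leaf: the game is over, return its score
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)
```

Alpha–beta pruning returns the same value while skipping branches that cannot affect the result.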
== Related articles ==
 
===For sub-structures of a given structure===
* [[No-Free-Lunch theorems]] relate to the generality of search algorithms and the need for domain knowledge.
* [[Secretary problem]] is an online (i.e., sequentially presented) search problem with imperfect information, and a statistically optimal strategy.
 
An important and extensively studied subclass is the [[List of algorithms#Graph algorithms|graph algorithm]]s, in particular [[graph traversal]] algorithms, for finding specific sub-structures in a given graph, such as [[Glossary of graph theory#Subgraphs|subgraphs]], [[path (graph theory)|paths]], circuits, and so on. Examples include [[Dijkstra's algorithm]], [[Kruskal's algorithm]], the [[nearest neighbour algorithm]], and [[Prim's algorithm]].
[[de:Suchverfahren]]
 
[[it:Algoritmo di ricerca]]
Another important subclass of this category is the [[string searching algorithm]]s, which search for patterns within strings. Two famous examples are the [[Boyer–Moore string-search algorithm|Boyer–Moore]] and [[Knuth–Morris–Pratt algorithm]]s, and several algorithms based on the [[suffix tree]] data structure.
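A compact sketch of the Knuth–Morris–Pratt approach is shown below. The function names are chosen for this example, and returning an empty list for an empty pattern is an assumption, not a convention from any library.

```python
from typing import List


def build_failure(pattern: str) -> List[int]:
    """failure[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find_all(text: str, pattern: str) -> List[int]:
    """Start indices of every (possibly overlapping) occurrence of pattern."""
    if not pattern:
        return []  # assumption: an empty pattern matches nowhere
    failure = build_failure(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]
    return matches
```

The failure table lets the scan move through the text without ever stepping backwards, which is what distinguishes this approach from a naive character-by-character comparison at every position.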
[[ja:検索]]
 
[[fi:Hakualgoritmi]]
===Search for the maximum of a function===
[[Category:Search_algorithms]]
In 1953, American [[statistics|statistician]] [[Jack Kiefer (statistician)|Jack Kiefer]] devised [[Fibonacci search technique|Fibonacci search]], which can be used to find the maximum of a unimodal function and has many other applications in computer science.
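The idea of narrowing an interval around the maximum of a unimodal function can be illustrated with golden-section search, a closely related technique that is simpler to write down than Fibonacci search itself. The sketch below is therefore not Kiefer's algorithm; the function name and the default tolerance are assumptions for this example.

```python
import math
from typing import Callable


def golden_section_max(
    f: Callable[[float], float], low: float, high: float, tolerance: float = 1e-8
) -> float:
    """Approximate the maximiser of a unimodal function on [low, high]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = low, high
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tolerance:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation is needed per step.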
 
===For quantum computers===
There are also search methods designed for [[Quantum computing|quantum computer]]s, like [[Grover's algorithm]], that are theoretically faster than linear or brute-force search even without the help of data structures or heuristics. While the ideas and applications behind quantum computers are still entirely theoretical, studies have been conducted with algorithms like Grover's that accurately replicate the hypothetical physical versions of quantum computing systems.<ref>{{Cite journal|last1=López|first1=G V|last2=Gorin|first2=T|last3=Lara|first3=L|date=26 February 2008|title=Simulation of Grover's quantum search algorithm in an Ising-nuclear-spin-chain quantum computer with first- and second-nearest-neighbour couplings|journal=Journal of Physics B: Atomic, Molecular and Optical Physics|volume=41|issue=5|page=055504|doi=10.1088/0953-4075/41/5/055504|arxiv=0710.3196|bibcode=2008JPhB...41e5504L|s2cid=18796310}}</ref>
 
==See also==
 
* {{annotated link|Backward induction}}
* {{annotated link|Content-addressable memory}} hardware
* {{annotated link|Dual-phase evolution}}
* {{annotated link|Linear search problem}}
* {{annotated link|No free lunch in search and optimization}}
* {{annotated link|Recommender system}}, which also use statistical methods to rank results in very large data sets
* {{annotated link|Search engine (computing)}}
* {{annotated link|Search game}}
* {{annotated link|Selection algorithm}}
* {{annotated link|Solver}}
* {{annotated link|Sorting algorithm}}, necessary for executing certain search algorithms
* {{annotated link|Web search engine}}
Categories:
* [[:Category:Search algorithms]]
 
==References==
===Citations===
{{Reflist|30em}}
 
===Bibliography===
====Books====
{{sfn whitelist|CITEREFKnuth1998}}
*{{TAOCP|volume=3|edition=2}}
 
====Articles====
*{{cite journal|last1=Beame|first1=Paul|last2=Fich|first2=Faith|author2-link=Faith Ellen|title = Optimal Bounds for the Predecessor Problem and Related Problems|journal=[[Journal of Computer and System Sciences]]|volume=65|issue=1|date=August 2002|pages=38–72|doi=10.1006/jcss.2002.1822|s2cid=1991980 |doi-access=free}}
 
==External links==
{{Refbegin}}
*[[Wikiversity:Uninformed Search Project|Uninformed Search Project]] at the [[Wikiversity]].
{{Refend}}
 
{{Algorithmic paradigms}}
 
[[Category:Internet search algorithms|Web search algorithms]]
[[Category:Ranking functions|ranking algorithms]]
[[Category:Search algorithms| ]]