Search data structure: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 04:13, 30 December 2015 edit 99.88.191.26 (talk) →Asymptotic amortized worst-case analysis: edit trie asymptotics ← Previous edit		Latest revision as of 14:42, 27 October 2023 edit undo WikiCleanerBot (talk \| contribs) Bots 1,007,735 edits m v2.05b - Bot T5 CW#16 - Fix errors for CW project (Unicode control characters) Tag: WPCleaner
(24 intermediate revisions by 19 users not shown)
Line 1: In [[computer science]], a '''search data structure'''{{citation needed\|date=May 2016}} is any [[data structure]] that allows the efficient [[data retrieval\|retrieval]] of specific items from a [[set (mathematics)\|set]] of items, such as a specific [[record (computer science)\|record]] from a [[database]]. The simplest, most general, and least efficient search structure is merely an unordered sequential [[list (computing)\|list]] of all the items. Locating the desired item in such a list, by the [[linear search]] method, inevitably requires a number of operations proportional to the number ''n'' of items, in the [[worst case complexity\|worst case]] as well as in the [[average case complexity\|average case]]. Useful search data structures allow faster retrieval; however, they are limited to queries of some specific kind. Moreover, since the cost of building such structures is at least proportional to ''n'', they only pay off if several queries are to be performed on the same database (or on a database that changes little between queries). '''Static''' search structures are designed for answering many [[Information retrieval\|queries]] on a fixed database; '''dynamic''' structures also allow insertion, deletion, or [[Update (SQL)\|modification]] of items between successive queries. In the dynamic case, one must also consider the cost of fixing the search structure to account for the changes in the database. ==Classification== Line 9: The simplest kind of query is to locate a record that has a specific field (the ''key'') equal to a specified value ''v''. Other common kinds of query are "find the item with smallest (or largest) key value", "find the item with largest key value not exceeding ''v''", "find all items with key values between specified bounds ''v''<sub>min</sub> and ''v''<sub>max</sub>". In certain databases the key values may be points in some [[dimension (mathematics)\|multi-dimensional space]]. For example, the key may be a geographic position ([[latitude]] and [[longitude]]) on the [[Earth]]. In that case, common kinds of queries are ''"find the record with a key closest to a given point ''v''", or "find all items whose key lies at a given distance from ''v''", or "find all items within a specified region ''R'' of the space". A common special case of the latter are simultaneous range queries on two or more simple keys, such as "find all employee records with salary between 50,000 and 100,000 and hired between 1995 and 2007". Line 23: *[[Heap (data structure)\|Heap]] ====Asymptotic ~~amortized~~ worst-case analysis==== In this table, the [[asymptotic analysis\|asymptotic]] [[big-O notation\|notation ''O''(''f''(''n''))]] means "not exceeding some fixed multiple of ''f''(''n'') in the worst case." Line 29: {\| class="wikitable" \|- ! Data Structure ! ! Insert ! Delete Line 40: \|- \| Unsorted [[Array data structure\|array]] \| [[Constant time\|''O''(1)]]<br /><sup>(see note)</sup> \| ''O''(1)<br /><sup>(see note)</sup> \| N/A \| ''O''(1) Line 58: \| ''O''(1) \| ''O''(''n'') \|- \| [[Stack (abstract data type)\|Stack]] \|''O''(1) \|''O''(1) \| \| \|''O''(''n'') \| \| \|''O''(''n'') \|- \| [[Queue (abstract data type)\|Queue]] \|''O''(1) \|''O''(1) \| \| \|''O''(''n'') \| \| \|''O''(''n'') \|- \| Unsorted [[linked list]] \| ''O''(1) \| ''O''(1)<ref name="listdelete">{{cite book \|title=[[Introduction to Algorithms]] \|publisher=The College of Information Sciences and Technology at Penn State\|isbn=~~9780262530910~~978-0-262-53091-0 \|~~authors~~author=Thomas H. Cormen, \|author2=Charles E. Leiserson, \|author3=Ronald L. Rivest \|year=1990 \|quote=LIST-DELETE runs in ''O''(1) time, but if ~~we wish~~ to delete an element with a given key, Θ(n) time is required in the worst case because we must first call LIST-SEARCH.}}</ref> \| N/A \| ''O''(''n'') Line 79 ⟶ 99: \| ''O''(''n'') \|- \| [[~~Self-balancing~~Skip ~~binary tree~~list]] \| \| \| \| \| \| \| \| \|- \| [[Self-balancing binary search tree]] \| ''O''(log ''n'') \| ''O''(log ''n'') Line 95 ⟶ 125: \| N/A \| ''O''(''n'') \| ''O''(1) for a ''min-heap''<br />''O''(''n'') for a ''max-heap''<ref name="heap">{{cite book \|title=Introduction to Algorithms\|publisher=The College of Information Sciences and Technology at Penn State\|isbn=~~9780262530910~~978-0-262-53091-0 \|~~authors~~author=Thomas H. Cormen, \|author2=Charles E. Leiserson, \|author3=Ronald L. Rivest \|year=1990 \|quote=There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds, the values in the nodes satisfy a '''heap property'''... the largest element in a max-heap is stored at the root... The smallest element in a min-heap is at the root... The operation HEAP-MAXIMUM returns the maximum heap element in Θ(1) time by simply returning the value ''A''[1] in the heap.}}</ref> \| ''O''(1) for a ''max-heap''<br />''O''(''n'') for a ''min-heap''<ref name="heap" /> \| ''O''(''n'') \|- Line 118 ⟶ 148: \| ''O''(''k'') \| ''O''(''k'' ''n'') \|- \| [[Cartesian tree]] \| \| \| \| \| \| \| \| \|- \| [[B-tree]] \| ''O''(log ''n'') \| ''O''(log ''n'') \| ''O''(log ''n'') \| N/A \| ''O''(log ''n'') \| ''O''(log ''n'') \| ''O''(log ''n'') \| ''O''(''n'') \|- \| [[Red–black tree]] \|''O''(log ''n'') \|''O''(log ''n'') \| \| \|''O''(log ''n'') \| \| \|''O''(''n'') \|- \| [[Splay tree]] \| \| \| \| \| \| \| \| \|- \| [[AVL tree]] \| \| \| \| \| ''O''(log ''n'') \| \| \| \|- \| [[k-d tree]] \| \| \| \| \| \| \| \| \|} ''Note'': Insert on an unsorted array is sometimes quoted as being ''O''(''n'') due to the assumption that the element to be inserted must be inserted at one particular ___location of the array, which would require shifting all the subsequent elements by one position. However, in a classic array, the array is used to store arbitrary unsorted elements, and hence the exact position of any given element is of no consequence, and insert is carried out by increasing the array size by 1 and storing the element at the end of the array, which is a ''O''(1) operation.<ref name="games">{{cite book\|isbn=~~9781584506638~~978-1-58450-663-8\|title=Data Structures and Algorithms for Game Developers\|author=Allen Sherrod\|publisher=Cengage Learning\|year=2007\|quote=The insertion of an item into an unordered array does not depend on anything other than placing the new item at the end of the list. This gives the insertion into an unordered array of ''O''(1).}}</ref><ref>{{cite book \|title=[[Introduction to Algorithms]]\|publisher=The College of Information Sciences and Technology at Penn State\|isbn=~~9780262530910~~978-0-262-53091-0 \|~~authors~~author=Thomas H. Cormen, \|author2=Charles E. Leiserson, \|author3=Ronald L. Rivest \|year=1990 }}</ref> Likewise, the deletion operation is sometimes quoted as being ''O''(''n'') due to the assumption that subsequent elements must be shifted, but in a classic unsorted array the order is unimportant (though elements are implicitly ordered by insert-time), so deletion can be carried out by swapping the element to be deleted with the last element in the array and then decrementing the array size by 1, which is a ''O''(1) operation.<ref>{{cite web\|url=~~http~~https://stackoverflow.com/questions/9358481/algorithm-the-time-complexity-of-deletion-in-a-unsorted-array#9358634\|title=Algorithm - the time complexity of deletion in a unsorted array\|quote=Finding the element with a given value is linear. Since the array isn't sorted anyway, you can do the deletion itself in constant time. First swap the element you want to delete to the end of the array, then reduce the array size by one element.}}</ref> This table is only an approximate summary; for each data structure there are special situations and variants that may lead to different costs. Also two or more data structures can be combined to obtain lower costs.