Selection algorithm: Difference between revisions

When data is already organized into a [[data structure]], it may be possible to perform selection in an amount of time that is sublinear in the number of values. As a simple case of this, for data already sorted into an array, selecting the {{nowrap|<math>k</math>th}} element may be performed by a single array lookup, in constant time. For values organized into a two-dimensional array of {{nowrap|size <math>m\times n</math>,}} with sorted rows and columns, selection may be performed in time {{nowrap|<math>O\bigl(m\log(2n/m)\bigr)</math>,}} or faster when <math>k</math> is small relative to the array dimensions.{{r|frejoh}}
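As an illustration of selection in a two-dimensional array with sorted rows and columns, the following Python sketch merges the rows through a small priority queue. This is a simpler technique than the one behind the <math>O\bigl(m\log(2n/m)\bigr)</math> bound cited above and does not achieve that bound: it takes roughly <math>O(k\log\min(m,k))</math> time, which is sublinear in <math>mn</math> only when <math>k</math> is small. The function name <code>select_sorted_matrix</code> and the 1-indexed convention for <math>k</math> are assumptions made for the example.

```python
import heapq


def select_sorted_matrix(matrix, k):
    """Return the k-th smallest value (1-indexed) of a matrix whose rows
    and columns are each sorted in ascending order.

    Illustrative sketch only: a heap-based merge of the rows, not the
    faster algorithm referenced in the text.
    """
    m = len(matrix)
    n = len(matrix[0]) if m else 0
    if not 1 <= k <= m * n:
        raise IndexError("k out of range")

    # Because columns are sorted, every element in row i has at least i
    # elements no larger than it, so only the first min(m, k) rows can
    # contain the answer.
    frontier = [(matrix[i][0], i, 0) for i in range(min(m, k))]
    heapq.heapify(frontier)

    # Discard the k - 1 smallest values; each discarded value is replaced
    # by its right-hand neighbor in the same row, if one exists.
    for _ in range(k - 1):
        _, i, j = heapq.heappop(frontier)
        if j + 1 < n:
            heapq.heappush(frontier, (matrix[i][j + 1], i, j + 1))

    return frontier[0][0]
```

For example, calling <code>select_sorted_matrix([[1, 3, 5], [2, 4, 6], [7, 8, 9]], 4)</code> inspects only a few entries near the top-left corner of the array before returning the fourth smallest value.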
 
Selection from data in a [[binary heap]] takes {{nowrap|time <math>O(k)</math>.}} This is independent of the size <math>n</math> of the heap, and faster than the <math>O(k\log n)</math> time bound that would be obtained from {{nowrap|[[best-first search]].{{r|frederickson}}}} This same method can be applied more generally to data organized as any kind of heap-ordered tree (a tree in which each node stores one value and in which the parent of each non-root node has a smaller value than its child). This method of performing selection in a heap has been applied to problems of listing multiple solutions to combinatorial optimization problems, such as finding the [[k shortest path routing|{{mvar|k}} shortest paths]] in a weighted graph, by defining a [[State space (computer science)|state space]] of solutions in the form of an [[implicit graph|implicitly defined]] heap-ordered tree, and then applying this selection algorithm to this {{nowrap|tree.{{r|kpaths}}}} In the other direction, linear time selection algorithms have been used as a subroutine in a [[priority queue]] data structure related to the heap, improving the time for extracting its {{nowrap|<math>k</math>th}} item from <math>O(\log n)</math> to {{nowrap|<math>O(\log^* n+\log k)</math>;}} here <math>\log^* n</math> is the {{nowrap|[[iterated logarithm]].{{r|bks}}}}
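The best-first search baseline mentioned above can be sketched in Python as follows. This is not the <math>O(k)</math> algorithm, which is considerably more involved. It is the simpler approach of exploring the heap from the root while keeping the unexplored children in an auxiliary priority queue. The sketch assumes an array-based binary min-heap in which the children of position <math>i</math> are stored at positions <math>2i+1</math> and <math>2i+2</math>; the function name <code>select_in_min_heap</code> is hypothetical.

```python
import heapq


def select_in_min_heap(heap, k):
    """Return the k-th smallest value (1-indexed) stored in an array-based
    binary min-heap, leaving the heap itself unmodified.

    Illustrative best-first search: the auxiliary queue never holds more
    than k + 1 entries, so its cost depends on k rather than on len(heap).
    """
    if not 1 <= k <= len(heap):
        raise IndexError("k out of range")

    # Candidates are (value, position) pairs; the root is the only node
    # that can hold the smallest value.
    candidates = [(heap[0], 0)]

    for _ in range(k - 1):
        _, i = heapq.heappop(candidates)
        # Once a node has been output, its children become eligible.
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(candidates, (heap[child], child))

    return candidates[0][0]
```

Because each step removes one candidate and adds at most two, the auxiliary queue stays small, and the same idea extends to any heap-ordered tree for which the children of a node can be listed.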
 
For a collection of data values undergoing dynamic insertions and deletions, the [[order statistic tree]] augments a [[self-balancing binary search tree]] structure with a constant amount of additional information per tree node, allowing insertions, deletions, and selection queries that ask for the {{nowrap|<math>k</math>th}} element in the current set to all be performed in <math>O(\log n)</math> time per {{nowrap|operation.{{r|clrs}}}} Going beyond the comparison model of computation, faster times per operation are possible for values that are small integers, on which binary arithmetic operations are {{nowrap|allowed.{{r|pattho}}}} It is not possible for a [[streaming algorithms|streaming algorithm]] with memory sublinear in both <math>n</math> and <math>k</math> to solve selection queries exactly for dynamic data, but the [[count–min sketch]] can be used to solve selection queries approximately, by finding a value whose position in the ordering of the elements (if it were added to them) would be within <math>\varepsilon n</math> steps of <math>k</math>, for a sketch whose size is within logarithmic factors of <math>1/\varepsilon</math>.{{r|cormut}}
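The size-augmentation idea behind the order statistic tree can be sketched in Python as follows. For brevity this example uses a plain binary search tree without rebalancing or deletion, so it does not guarantee <math>O(\log n)</math> time per operation; a full implementation would maintain the same <code>size</code> field through the rotations of a self-balancing tree. The names <code>Node</code>, <code>insert</code>, and <code>select</code> are assumptions made for the example.

```python
class Node:
    """A binary search tree node augmented with the size of its subtree."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def subtree_size(node):
    return node.size if node is not None else 0


def insert(node, key):
    """Insert key into the subtree rooted at node and return its root.

    Simplification: no rebalancing is performed.
    """
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size += 1  # exactly one new node was added below this one
    return node


def select(node, k):
    """Return the k-th smallest key (1-indexed) in the subtree at node."""
    while node is not None:
        left = subtree_size(node.left)
        if k <= left:
            node = node.left          # answer lies in the left subtree
        elif k == left + 1:
            return node.key           # this node is the k-th smallest
        else:
            k -= left + 1             # skip the left subtree and this node
            node = node.right
    raise IndexError("k out of range")
```

Each selection query follows a single root-to-leaf path, so its cost is proportional to the height of the tree, which is where balancing matters.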