Range query (computer science)


In data structures, a range query consists of preprocessing some input data into a data structure so that any number of queries on any subset of the input can be answered efficiently. In particular, there is an extensively studied group of problems where the input is an array of unsorted numbers and a query consists of computing some function, such as the minimum, on a specific range of the array.

Definition

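A range query $q_f(A,i,j)$ on an array $A=[a_1,a_2,\ldots,a_n]$ of n elements of some set S, denoted $A[1,n]$, takes two indices $1 \le i \le j \le n$ and a function f defined over arrays of elements of S, and outputs $f(A[i,j]) = f(a_i,\ldots,a_j)$.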

For example, for $f = \sum$ and $A[1,n]$ an array of numbers, the range query $\sum_{i,j} A$ computes $\sum A[i,j] = a_i + \ldots + a_j$, for any $1 \le i \le j \le n$. These queries can be answered in constant time using $O(n)$ extra space by calculating the sums of the first i elements of A and storing them in an auxiliary array B, such that $B[i]$ contains the sum of the first i elements of A for every $0 \le i \le n$. Any query can then be answered by computing $\sum A[i,j] = B[j] - B[i-1]$.
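The prefix-sum strategy can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the names `build_prefix_sums` and `range_sum` are chosen here for the example, and indices are 1-based to match the notation above.

```python
from itertools import accumulate


def build_prefix_sums(a):
    """Return B with B[i] = sum of the first i elements of a, so B[0] = 0."""
    return [0] + list(accumulate(a))


def range_sum(b, i, j):
    """Sum of a_i..a_j for 1-based inclusive indices 1 <= i <= j <= n."""
    return b[j] - b[i - 1]


A = [3, 1, 4, 1, 5, 9, 2, 6]
B = build_prefix_sums(A)      # O(n) preprocessing time and extra space
print(range_sum(B, 3, 6))     # 4 + 1 + 5 + 9 = 19
```

Each query costs one subtraction and two array reads, independent of the length of the range.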

This strategy can be extended to every group operator f, where the inverse $f^{-1}$ is well defined and easily computable.[1] The solution can also be extended to two-dimensional arrays with similar preprocessing.[2]
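Both extensions can be sketched briefly. In the sketch below, which is illustrative only, `build_prefix` and `range_query` take the group operation and its inverse as parameters (bitwise XOR is used as an example group, where every element is its own inverse), and `build_prefix_sums_2d` and `rect_sum` apply the same idea to a matrix using inclusion-exclusion. All function names are assumptions made for this example.

```python
from operator import xor


def build_prefix(a, op, identity):
    """B[i] = a_1 op ... op a_i, with B[0] the identity of the group."""
    b = [identity]
    for x in a:
        b.append(op(b[-1], x))
    return b


def range_query(b, i, j, op, inverse):
    """a_i op ... op a_j as inverse(B[i-1]) op B[j] (order matters if op is not commutative)."""
    return op(inverse(b[i - 1]), b[j])


def build_prefix_sums_2d(m):
    """P[r][c] = sum of the first r rows and first c columns of m."""
    rows, cols = len(m), len(m[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            p[r + 1][c + 1] = m[r][c] + p[r][c + 1] + p[r + 1][c] - p[r][c]
    return p


def rect_sum(p, r1, c1, r2, c2):
    """Sum over rows r1..r2 and columns c1..c2 (1-based, inclusive)."""
    return p[r2][c2] - p[r1 - 1][c2] - p[r2][c1 - 1] + p[r1 - 1][c1 - 1]


X = [5, 3, 6, 2]
BX = build_prefix(X, xor, 0)
print(range_query(BX, 2, 3, xor, lambda v: v))  # 3 xor 6 = 5

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
P = build_prefix_sums_2d(M)
print(rect_sum(P, 2, 2, 3, 3))                  # 5 + 6 + 8 + 9 = 28
```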

Examples

Semigroup operators

 
Figure: Range minimum query reduced to the lowest common ancestor problem.

When the function of interest in a range query is a semigroup operator, the inverse $f^{-1}$ is not always defined, so the strategy in the previous section does not work. Andrew Yao showed[3] that there exists an efficient solution for range queries that involve semigroup operators. He proved that, for any constant c, preprocessing of time and space $\Theta(c \cdot n)$ allows range queries on lists where f is a semigroup operator to be answered in $O(\alpha_c(n))$ time, where $\alpha_c$ is a certain functional inverse of the Ackermann function.
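Yao's structure is intricate. As a simpler point of comparison, the sketch below uses a segment tree, which is a different and weaker technique: it needs only associativity (no inverse and no identity element), uses linear space, and answers a query in $O(\log n)$ time rather than $O(\alpha_c(n))$. The class name `SemigroupSegmentTree` is an assumption made for this example, and the array is assumed to be non-empty.

```python
class SemigroupSegmentTree:
    """Range queries for any associative operator op on a non-empty list."""

    def __init__(self, a, op):
        self.n = len(a)
        self.op = op
        self.tree = [None] * (4 * self.n)
        self._build(a, 1, 0, self.n - 1)

    def _build(self, a, node, lo, hi):
        if lo == hi:
            self.tree[node] = a[lo]
            return
        mid = (lo + hi) // 2
        self._build(a, 2 * node, lo, mid)
        self._build(a, 2 * node + 1, mid + 1, hi)
        self.tree[node] = self.op(self.tree[2 * node], self.tree[2 * node + 1])

    def query(self, i, j):
        """f(a_i, ..., a_j) for 1-based inclusive indices 1 <= i <= j <= n."""
        return self._query(1, 0, self.n - 1, i - 1, j - 1)

    def _query(self, node, lo, hi, l, r):
        if l <= lo and hi <= r:
            return self.tree[node]          # node range lies inside the query
        mid = (lo + hi) // 2
        if r <= mid:
            return self._query(2 * node, lo, mid, l, r)
        if l > mid:
            return self._query(2 * node + 1, mid + 1, hi, l, r)
        # The query straddles both children: combine left part, then right part.
        left = self._query(2 * node, lo, mid, l, r)
        right = self._query(2 * node + 1, mid + 1, hi, l, r)
        return self.op(left, right)


st_max = SemigroupSegmentTree([3, 1, 4, 1, 5, 9, 2, 6], max)
print(st_max.query(2, 4))   # max(1, 4, 1) = 4

st_cat = SemigroupSegmentTree(list("range"), lambda x, y: x + y)
print(st_cat.query(2, 4))   # "a" + "n" + "g" = "ang"
```

String concatenation is included because it is associative but not commutative, so the example also checks that operands are combined in left-to-right order.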

Some semigroup operators admit slightly better solutions, for instance when $f \in \{\max, \min\}$. Assume $f = \min$; then $\min(A[1,n])$ returns the index of the minimum element of $A[1,n]$, and $\min_{i,j}(A)$ denotes the corresponding range minimum query. Several data structures answer a range minimum query in $O(1)$ time using preprocessing of time and space $O(n)$. One such solution is based on the equivalence between this problem and the lowest common ancestor problem.
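A well-known, simpler alternative with constant query time is the sparse table, sketched below. It is not one of the linear-space structures mentioned above: preprocessing takes $O(n \log n)$ time and space. It relies on min being idempotent, so two overlapping power-of-two windows may be combined. The function names are assumptions made for this example.

```python
def build_sparse_table(a):
    """table[k][s] = 0-based index of a minimum of a[s : s + 2**k]."""
    n = len(a)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for s in range(n - (1 << k) + 1):
            left, right = prev[s], prev[s + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        k += 1
    return table


def range_min_index(a, table, i, j):
    """1-based index of a minimum of a_i..a_j, for 1 <= i <= j <= n."""
    lo, hi = i - 1, j - 1
    k = (hi - lo + 1).bit_length() - 1          # largest k with 2**k <= range length
    left = table[k][lo]                         # window starting at lo
    right = table[k][hi - (1 << k) + 1]         # window ending at hi
    return (left if a[left] <= a[right] else right) + 1


A = [3, 1, 4, 1, 5, 9, 2, 6]
T = build_sparse_table(A)
print(range_min_index(A, T, 3, 6))  # a_3..a_6 = 4, 1, 5, 9, so the answer is index 4
```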

The Cartesian tree $T_A$ of an array $A[1,n]$ has as root $a_i = \min\{a_1, a_2, \ldots, a_n\}$ and as left and right subtrees the Cartesian tree of $A[1,i-1]$ and the Cartesian tree of $A[i+1,n]$ respectively. A range minimum query $\min_{i,j}(A)$ is the lowest common ancestor in $T_A$ of $a_i$ and $a_j$. Because the lowest common ancestor problem can be solved in constant time using preprocessing of time and space $O(n)$, the range minimum query problem can as well. The solution when $f = \max$ is analogous. Cartesian trees can be constructed in linear time.
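The reduction can be illustrated with the following sketch. The Cartesian tree is built in linear time with the standard stack-based method; the lowest common ancestor, however, is found here by simply walking up parent pointers, which takes time proportional to the tree depth and is only a stand-in for a constant-time LCA structure. Function names are assumptions made for this example, and ties are broken towards the leftmost minimum.

```python
def build_cartesian_tree(a):
    """Parent array of the min-Cartesian tree of a (0-based; the root has parent -1)."""
    parent = [-1] * len(a)
    stack = []                      # indices on the rightmost path of the tree
    for i in range(len(a)):
        last = -1
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        if last != -1:
            parent[last] = i        # popped chain becomes the left subtree of i
        if stack:
            parent[i] = stack[-1]   # i becomes the right child of the stack top
        stack.append(i)
    return parent


def node_depths(parent):
    """Depth of every node, computed in linear total time."""
    depth = [-1] * len(parent)
    for v in range(len(parent)):
        path, u = [], v
        while u != -1 and depth[u] == -1:
            path.append(u)
            u = parent[u]
        d = depth[u] if u != -1 else -1
        for w in reversed(path):
            d += 1
            depth[w] = d
    return depth


def range_min_index(parent, depth, i, j):
    """1-based index of a minimum of a_i..a_j: the LCA of nodes i and j."""
    u, v = i - 1, j - 1
    while u != v:                   # naive LCA: lift the deeper node
        if depth[u] >= depth[v]:
            u = parent[u]
        else:
            v = parent[v]
    return u + 1


A = [3, 1, 4, 1, 5, 9, 2, 6]
PARENT = build_cartesian_tree(A)
DEPTH = node_depths(PARENT)
print(range_min_index(PARENT, DEPTH, 3, 6))  # a_3..a_6 = 4, 1, 5, 9, so the answer is index 4
```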

Mode

The mode of an array A is the element that appears most often in A. For instance, the mode of $A=[4,5,6,7,4]$ is 4. In case of ties, any of the most frequent elements may be picked as the mode. A range mode query consists of preprocessing $A[1,n]$ such that the mode of any range of $A[1,n]$ can be found. Several data structures have been devised to solve this problem; some of the results are summarized in the following table.[1]

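Range mode queries
Space | Query time | Restrictions
$O(n^{2-2\epsilon})$ | $O(n^{\epsilon} \log n)$ | $0 \le \epsilon \le 1/2$
$O(n^2 \log\log n / \log n)$ | $O(1)$ | none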

Greve, Jørgensen, Larsen and Truelsen proved a lower bound in the cell-probe model of $\Omega\left(\frac{\log n}{\log (Sw/n)}\right)$ query time for any data structure that uses S cells of w bits.[4]
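To make the space/time trade-off for range mode queries concrete, the sketch below shows the naive extreme: a table holding a mode for every range, which takes $O(n^2)$ time and space to build and answers queries in $O(1)$. It is a baseline for illustration only, not one of the data structures from the literature, and the function names are assumptions made for this example. Ties are resolved in favour of the element that first reaches the highest count.

```python
def build_mode_table(a):
    """table[s][e] = a mode of a[s..e] (0-based, inclusive), for all s <= e."""
    n = len(a)
    table = [[None] * n for _ in range(n)]
    for s in range(n):
        counts = {}
        best, best_count = None, 0
        for e in range(s, n):               # extend the range one element at a time
            c = counts.get(a[e], 0) + 1
            counts[a[e]] = c
            if c > best_count:
                best, best_count = a[e], c
            table[s][e] = best
    return table


def range_mode(table, i, j):
    """A mode of a_i..a_j for 1-based inclusive indices 1 <= i <= j <= n."""
    return table[i - 1][j - 1]


A = [4, 5, 6, 7, 4]
T = build_mode_table(A)
print(range_mode(T, 1, 5))  # 4 appears twice in the whole array
```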

Median

This particular case is of special interest since finding the median has several applications.[5] The median problem, a special case of the selection problem, is solvable in O(n) time using the median of medians algorithm.[6] Its generalization to range median queries, however, is more recent.[7] A range median query $\operatorname{median}(A,i,j)$, where A, i and j have the usual meanings, returns the median element of $A[i,j]$. Equivalently, $\operatorname{median}(A,i,j)$ should return the element of $A[i,j]$ of rank $\lfloor (j-i)/2 \rfloor$, counting ranks from 0. Range median queries cannot be solved by any of the methods discussed above, including Yao's approach for semigroup operators.[8]

Two variants of this problem have been studied: the offline version, where all k queries of interest are given in a batch, and the online version, where all the preprocessing is done up front. The offline version can be solved in $O(n \log k + k \log n)$ time and $O(n \log k)$ space.

The following pseudocode, based on the quickselect algorithm, shows how to find the element of rank r in $A[i,j]$, where A is an unsorted array of distinct elements. To find the range median, set $r = \lfloor (j-i)/2 \rfloor$, counting ranks from 0.[7]

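rangeMedian(A, i, j, r) {
    // returns the element of rank r (counted from 0) in A[i, j]
    if A.length() == 1
        return A[1]

    if A.low is undefined then
        A.m    = median(A)
        A.low  = [e in A | e <= A.m]   // relative order of A is preserved
        A.high = [e in A | e > A.m ]

    calculate p the number of elements of A[1, i-1] that belong to A.low
    calculate q the number of elements of A[1, j] that belong to A.low
    t = q - p   // number of elements of A[i, j] that belong to A.low

    if r < t then
        return rangeMedian(A.low, p+1, q, r)      // range translated into A.low
    else
        return rangeMedian(A.high, i-p, j-q, r-t) // range translated into A.high
}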

Procedure rangeMedian partitions A, using A's median, into two arrays A.low and A.high, where the former contains the elements of A that are less than or equal to the median and the latter the rest of the elements of A. If the number of elements of $A[i,j]$ that end up in A.low is t and this number is greater than r, then the search for the element of rank r continues in A.low; otherwise the search continues for the element of rank $r-t$ in A.high. In both cases the recursive call must use the positions that the range occupies within the chosen subarray. To find t, it is enough to find the largest index $u \le i-1$ such that $a_u$ is in A.low and the largest index $v \le j$ such that $a_v$ is in A.low; t is then the difference between the positions of $a_v$ and $a_u$ within A.low. The total cost of any query, without considering the partitioning part, is $O(\log n)$, since at most $O(\log n)$ recursive calls are made and only a constant number of operations are performed in each of them (to obtain the value of t in constant time, fractional cascading should be used). If a linear-time algorithm is used to find the medians, the total cost of preprocessing for k range median queries is $O(n \log k)$. The algorithm can also be modified to solve the online version of the problem.[7]
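The procedure can be turned into a short runnable Python sketch. It makes several simplifying assumptions that differ from the algorithm as analysed above: t is obtained from a stored array of prefix counts rather than by fractional cascading, medians are found with `statistics.median_low` (which sorts) rather than a linear-time selection algorithm, ranks are counted from 0, and the elements are assumed to be distinct. The names `Node`, `range_select` and `range_median` are chosen for this example.

```python
from statistics import median_low


class Node:
    """A subarray together with its lazily computed low/high partition."""

    def __init__(self, values):
        self.values = values        # elements in their original relative order
        self.low = None
        self.high = None
        self.low_count = None       # low_count[p] = how many of values[:p] went to low

    def split(self):
        m = median_low(self.values)
        low, high = [], []
        self.low_count = [0]
        for e in self.values:
            (low if e <= m else high).append(e)
            self.low_count.append(len(low))
        self.low, self.high = Node(low), Node(high)


def range_select(node, i, j, r):
    """Element of 0-based rank r among node.values[i-1:j] (i, j 1-based, inclusive)."""
    if len(node.values) == 1:
        return node.values[0]
    if node.low is None:
        node.split()                # partition only when a query first needs it
    p = node.low_count[i - 1]       # elements before position i that are in low
    q = node.low_count[j]           # elements up to position j that are in low
    t = q - p                       # elements of the range that are in low
    if r < t:
        return range_select(node.low, p + 1, q, r)
    return range_select(node.high, i - p, j - q, r - t)


def range_median(root, i, j):
    """Lower median of a_i..a_j."""
    return range_select(root, i, j, (j - i) // 2)


A = [3, 1, 4, 7, 5, 9, 2, 6]
ROOT = Node(A)
print(range_median(ROOT, 3, 6))  # a_3..a_6 = 4, 7, 5, 9; the lower median is 5
```

Because the partition is computed lazily and stored in the nodes, later queries reuse the work done by earlier ones.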

All the problems described above have also been studied in higher dimensions and in dynamic versions. Range queries can also be extended to other data structures, such as trees,[8] where the level ancestor problem is one example. A similar family of problems is orthogonal range queries, also known as counting queries.

References

  1. Krizanc, Danny; Morin, Pat; Smid, Michiel H. M. (2003). "Range Mode and Range Median Queries on Lists and Trees". ISAAC: 517–526. arXiv:cs/0307034.
  2. He, Meng; Munro, J. Ian; Nicholson, Patrick K. (2011). "Dynamic Range Selection in Linear Space". ISAAC: 160–169.
  3. Yao, A. C. (1982). "Space-Time Tradeoff for Answering Range Queries". 14th Annual ACM Symposium on the Theory of Computing: 128–136.
  4. Greve, M.; Jørgensen, A.; Larsen, K.; Truelsen, J. (2010). "Cell probe lower bounds and approximations for range mode". Automata, Languages and Programming: 605–616.
  5. Har-Peled, Sariel; Muthukrishnan, S. (2008). "Range Medians". ESA: 503–514.
  6. Blum, M.; Floyd, R. W.; Pratt, V. R.; Rivest, R. L.; Tarjan, R. E. (August 1973). "Time bounds for selection". Journal of Computer and System Sciences. 7 (4): 448–461. doi:10.1016/S0022-0000(73)80033-9.
  7. Gfeller, Beat; Sanders, Peter (2009). "Towards Optimal Range Medians". ICALP (1): 475–486.
  8. Bose, P.; Kranakis, E.; Morin, P.; Tang, Y. (2005). "Approximate range mode and range median queries". In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS 2005), volume 3404 of Lecture Notes in Computer Science: 377–388.