Range query (computer science)

In data structures, a range query consists of preprocessing some input data into a data structure to efficiently answer any number of queries on any subset of the input. Particularly, there is a group of problems that have been extensively studied where the input is an array of unsorted numbers and a query consists of computing some function, such as the minimum, on a specific range of the array.

Definition

A range query $q_{f}(A,i,j)$ on an array $A=[a_{1},a_{2},..,a_{n}]$ of n elements of some set $S$ , denoted $A[1,n]$ , takes two indices $1\leq i\leq j\leq n$ , a function $f$ defined over arrays of elements of $S$ and outputs $f(A[i,j])=f(a_{i},\ldots ,a_{j})$ .

For example, for $f=\sum$ and $A[1,n]$ an array of numbers, the range query $\sum _{i,j}A$ computes $\sum A[i,j]=(a_{i}+\ldots +a_{j})$ , for any $1\leq i\leq j\leq n$ . These queries may be answered in constant time and using $O(n)$ extra space by calculating the sums of the first $i$ elements of $A$ and storing them into an auxiliary array $B$ , such that $B[i]$ contains the sum of the first $i$ elements of $A$ for every $0\leq i\leq n$ . Therefore, any query might be answered by doing $\sum A[i,j]=B[j]-B[i-1]$ .

This strategy may be extended for every group operator $f$ where the notion of $f^{-1}$ is well defined and easily computable.^[1] Finally, this solution can be extended to two-dimensional arrays with a similar preprocessing.^[2]

Examples

Semigroup operators

Range minimum query reduced to the lowest common ancestor problem.

When the function of interest in a range query is a semigroup operator, the notion of $f^{-1}$ is not always defined, so the strategy in the previous section does not work. Andrew Yao showed^[3] that there exists an efficient solution for range queries that involve semigroup operators. He proved that for any constant $c$ , a preprocessing of time and space $\theta (c\cdot n)$ allows to answer range queries on lists where $f$ is a semigroup operator in $\theta (\alpha _{c}(n))$ time, where $\alpha _{k}$ is a certain functional inverse of the Ackermann function.

There are some semigroup operators that admit slightly better solutions. For instance when $f\in \{\max ,\min \}$ . Assume $f=\min$ then $\min(A[1..n])$ returns the index of the minimum element of $A[1..n]$ . Then $\min _{i,j}(A)$ denotes the corresponding minimum range query. There are several data structures that allow to answer a range minimum query in $O(1)$ time using a preprocessing of time and space $O(n)$ . One such solution is based on the equivalence between this problem and the lowest common ancestor problem.

The Cartesian tree $T_{A}$ of an array $A[1,n]$ has as root $a_{i}=\min\{a_{1},a_{2},\ldots ,a_{n}\}$ and as left and right subtrees the Cartesian tree of $A[1,i-1]$ and the Cartesian tree of $A[i+1,n]$ respectively. A range minimum query $\min _{i,j}(A)$ is the lowest common ancestor in $T_{A}$ of $a_{i}$ and $a_{j}$ . Because the lowest common ancestor can be solved in constant time using a preprocessing of time and space $O(n)$ , range minimum query can as well. The solution when $f=\max$ is analogous. Cartesian trees can be constructed in linear time.

Mode

The mode of an array A is the element that appears the most in A. For instance the mode of $A=[4,5,6,7,4]$ is 4. In case of ties any of the most frequent elements might be picked as mode. A range mode query consists in preprocessing $A[1,n]$ such that we can find the mode in any range of $A[1,n]$ . Several data structures have been devised to solve this problem, we summarize some of the results in the following table.^[1]

Range Mode Queries
Space	Query Time	Restrictions
$O(n^{2-2\epsilon })$	$O(n^{\epsilon }\log n)$	$0\leq \epsilon \leq {\frac {1}{2}}$
$O\left({\frac {n^{2}\log \log n}{\log n}}\right)$	$O(1)$

Recently Jørgensen et al. proved a lower bound on the cell-probe model of $\Omega \left({\tfrac {\log n}{\log(Sw/n)}}\right)$ for any data structure that uses $S$ cells.^[4]

Median

This particular case is of special interest since finding the median has several applications.^[5] On the other hand, the median problem, a special case of the selection problem, is solvable in O(n), using the median of medians algorithm.^[6] However its generalization through range median queries is recent.^[7] A range median query $\operatorname {median} (A,i,j)$ where A,i and j have the usual meanings returns the median element of $A[i,j]$ . Equivalently, $\operatorname {median} (A,i,j)$ should return the element of $A[i,j]$ of rank ${\frac {j-i}{2}}$ . Range median queries cannot be solved by following any of the previous methods discussed above including Yao's approach for semigroup operators.^[8]

There have been studied two variants of this problem, the offline version, where all the k queries of interest are given in a batch, and a version where all the preprocessing is done up front. The offline version can be solved with $O(n\log k+k\log n)$ time and $O(n\log k)$ space.

The following pseudocode of the quickselect algorithm shows how to find the element of rank $r$ in $A[i,j]$ an unsorted array of distinct elements, to find the range medians we set $r={\frac {j-i}{2}}$ .^[7]

rangeMedian(A, i, j, r) {
    if A.length() == 1
        return A[1]

    if A.low is undefined then
        m = median(A)
        A.low  = [e in A | e <= m]
        A.high = [e in A | e > m ]

    calculate t the number of elements of A[i, j] that belong to A.low

    if r <= t then
        return rangeMedian(A.low, i, j, r)
    else
        return rangeMedian(A.high, i, j, r-t)
}

Procedure rangeMedian partitions A, using A's median, into two arrays A.low and A.high, where the former contains the elements of A that are less than or equal to the median m and the latter the rest of the elements of A. If we know that the number of elements of $A[i,j]$ that end up in A.low is t and this number is bigger than r then we should keep looking for the element of rank r in A.low; otherwise we should look for the element of rank $(r-t)$ in A.high. To find $t$ , it is enough to find the maximum index $m\leq i-1$ such that $a_{m}$ is in A.low and the maximum index $l\leq j$ such that $a_{l}$ is in A.high. Then $t=l-m$ . The total cost for any query, without considering the partitioning part, is $\log n$ since at most $\log n$ recursion calls are done and only a constant number of operations are performed in each of them (to get the value of $t$ fractional cascading should be used). If a linear algorithm to find the medians is used, the total cost of preprocessing for $k$ range median queries is $n\log k$ . The algorithm can also be modified to solve the online version of the problem.^[7]

References

^ ^a ^b Krizanc, Danny; Morin, Pat; Smid, Michiel H. M. (2003). "Range Mode and Range Median Queries on Lists and Trees". ISAAC: 517–526. arXiv:cs/0307034.
^ Meng, He; Munro, J. Ian; Nicholson, Patrick K. (2011). "Dynamic Range Selection in Linear Space". ISAAC: 160–169.
^ Yao, A. C (1982). "Space-Time Tradeoff for Answering Range Queries". e 14th Annual ACM Symposium on the Theory of Computing: 128–136.
^ Greve, M; J{\o}rgensen, A.; Larsen, K.; Truelsen, J. (2010). "Cell probe lower bounds and approximations for range mode". Automata, Languages and Programming: 605–616.
^ Har-Peled, Sariel; Muthukrishnan, S. (2008). "Range Medians". ESA: 503–514.
^ Blum, M.; Floyd, R. W.; Pratt, V. R.; Rivest, R. L.; Tarjan, R. E. (August 1973). "Time bounds for selection" (PDF). Journal of Computer and System Sciences. 7 (4): 448–461. doi:10.1016/S0022-0000(73)80033-9. {{cite journal}}: Invalid |ref=harv (help)
^ ^a ^b ^c Beat, Gfeller; Sanders, Peter (2009). "Towards Optimal Range Medians". ICALP (1): 475–486.
^ ^a ^b Bose, P; Kranakis, E.; Morin, P.; Tang, Y. (2005). "Approximate range mode and range median queries". In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS 2005), volume 3404 of Lecture Notes in ComputerScience: 377–388.

External links

[morin-1] Krizanc, Danny; Morin, Pat; Smid, Michiel H. M. (2003). "Range Mode and Range Median Queries on Lists and Trees". ISAAC: 517–526. arXiv:cs/0307034.

[menhe-2] Meng, He; Munro, J. Ian; Nicholson, Patrick K. (2011). "Dynamic Range Selection in Linear Space". ISAAC: 160–169.

[yao-3] Yao, A. C (1982). "Space-Time Tradeoff for Answering Range Queries". e 14th Annual ACM Symposium on the Theory of Computing: 128–136.

[jorgensen-4] Greve, M; J{\o}rgensen, A.; Larsen, K.; Truelsen, J. (2010). "Cell probe lower bounds and approximations for range mode". Automata, Languages and Programming: 605–616.

[heriel-5] Har-Peled, Sariel; Muthukrishnan, S. (2008). "Range Medians". ESA: 503–514.

[tarjanmedian-6] Blum, M.; Floyd, R. W.; Pratt, V. R.; Rivest, R. L.; Tarjan, R. E. (August 1973). "Time bounds for selection" (PDF). Journal of Computer and System Sciences. 7 (4): 448–461. doi:10.1016/S0022-0000(73)80033-9. {{cite journal}}: Invalid |ref=harv (help)

[ethpaper-7] Beat, Gfeller; Sanders, Peter (2009). "Towards Optimal Range Medians". ICALP (1): 475–486.

[morin_kranakis-8] Bose, P; Kranakis, E.; Morin, P.; Tang, Y. (2005). "Approximate range mode and range median queries". In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS 2005), volume 3404 of Lecture Notes in ComputerScience: 377–388.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]