m Typo/quotemark fixes, replaced: ’s → 's, horizonal → horizontal
(34 intermediate revisions by 13 users not shown)
Line 2:
{{tone|date=December 2017}}
In [[
==Definition==
Given a [[function (programming)|function]] <math>f</math> that accepts an array, a range query <math>f_q(l, r)</math> on an array <math>a=[a_1,\ldots,a_n]</math> takes two indices <math>l</math> and <math>r</math> and returns the result of <math>f</math> when applied to the subarray <math>[a_l, \ldots, a_r]</math>. For example, for a function <math>\operatorname{sum}</math> that returns the sum of all values in an array, the range query <math>\operatorname{sum}_q(l, r)</math> returns the sum of all values in the range <math>[l, r]</math>.{{Citation needed|date=October 2023}}
==Solutions==
===Prefix sum array===
{{Main|Prefix sum}}
Range sum queries may be answered in [[constant time]] and [[space complexity|linear space]] by pre-computing an array {{mvar|p}} of the same length as the input such that for every index {{mvar|i}}, the element {{mvar|p<sub>i</sub>}} is the sum of the first {{mvar|i}} elements of {{mvar|a}}. With the convention <math>p_0 = 0</math>, any query may then be computed as follows: <math display="block">\operatorname{sum}_q(l, r) = p_r - p_{l-1}.</math>
This strategy may be extended for every [[Group (mathematics)|group]] operator {{mvar|f}} where the notion of <math>f^{-1}</math> is well defined and easily computable.<ref name="morin">{{cite journal|first1=Danny|last1=Krizanc|first2=Pat|last2=Morin|author2-link= Pat Morin |first3=Michiel H. M.|last3=Smid|title=Range Mode and Range Median Queries on Lists and Trees|journal=ISAAC|year=2003|pages=517–526|url=http://cg.scs.carleton.ca/~morin/publications/|arxiv=cs/0307034}}</ref> Finally, this solution can be extended to two-dimensional arrays with a similar preprocessing.<ref name=menhe>{{cite journal|last1=Meng|first1=He|first2=J. Ian|last2=Munro|first3=Patrick K.|last3=Nicholson|title=Dynamic Range Selection in Linear Space|journal=ISAAC|year=2011|pages=160–169}}</ref>
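The prefix sum strategy can be illustrated with a minimal Python sketch. The function names <code>build_prefix</code> and <code>range_sum</code> are illustrative, and the sketch assumes the array {{mvar|p}} carries one extra leading entry <math>p_0 = 0</math> so that the formula also works for <math>l = 1</math>.

```python
from itertools import accumulate


def build_prefix(a):
    """Return p with p[0] = 0 and p[i] = a_1 + ... + a_i (1-based view of a)."""
    return [0] + list(accumulate(a))


def range_sum(p, l, r):
    """Sum of a_l..a_r for 1-based inclusive indices, in constant time."""
    return p[r] - p[l - 1]


a = [3, 1, 4, 1, 5, 9]
p = build_prefix(a)          # linear-time, linear-space preprocessing
total = range_sum(p, 2, 4)   # a_2 + a_3 + a_4
```

The same shape carries over to any group operator by replacing addition with the operator and subtraction with its inverse.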
===Dynamic range queries===
A more difficult variant of the problem consists of executing range queries on dynamic data, that is, data that may change between queries. To update array values efficiently, more sophisticated data structures such as the [[segment tree]] or [[Fenwick tree]] are necessary.{{Citation needed|date=October 2023}}
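As an illustration of the dynamic case, the following is a minimal sketch of a Fenwick tree for range sums with point updates, both in <math>O(\log n)</math> time. The class and method names are illustrative and indices are assumed to be 1-based, as in the definition above.

```python
class FenwickTree:
    """Fenwick (binary indexed) tree over 1-based positions 1..n."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.add(i, v)

    def add(self, i, delta):
        """Add delta to the element at position i."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node that covers position i

    def prefix_sum(self, i):
        """Sum of the elements at positions 1..i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

    def range_sum(self, l, r):
        """Sum of the elements at positions l..r (inclusive)."""
        return self.prefix_sum(r) - self.prefix_sum(l - 1)


ft = FenwickTree([3, 1, 4, 1, 5, 9])
before = ft.range_sum(2, 4)
ft.add(3, 10)                # the third element changes from 4 to 14
after = ft.range_sum(2, 4)
```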
==Examples==
Line 18 ⟶ 25:
{{main|Range minimum query}}
When the function of interest in a range query is a [[semigroup]] operator, the notion of <math>f^{-1}</math> is not always defined, so the strategy in the previous section does not work. [[Andrew Yao]] showed<ref name="yao">{{cite
Some semigroup operators admit slightly better solutions, for instance when <math>f\in \{\max,\min\}</math>. Assume <math> f = \min</math>; then <math>\min(A[1..n])</math> returns the index of the [[minimum]] element of <math>A[1..n]</math>, and <math display="inline">\min_{i,j}(A)</math> denotes the corresponding range minimum query. Several data structures can answer a range minimum query in <math>O(1)</math> time using a
The [[Cartesian tree]] <math>T_A</math> of an array <math>A[1,n]</math> has as root <math>a_i = \min\{a_1,a_2,\ldots,a_n\}</math> and as left and right subtrees the Cartesian tree of <math>A[1,i-1]</math> and the Cartesian tree of <math>A[i+1,n]</math> respectively. A range minimum query <math display="inline">\min_{i,j}(A)</math> is the [[lowest common ancestor]] in <math>T_A</math> of <math>a_i</math> and <math>a_j</math>. Because the lowest common ancestor can be found in [[constant time]] using a
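The correspondence between range minimum queries and lowest common ancestors can be sketched as follows. This is an illustration under stated assumptions, not the constant-time structure: the Cartesian tree is built with a standard linear-time stack construction, the lowest common ancestor is found by naively walking parent pointers, indices are 0-based, and the function names are illustrative.

```python
def build_cartesian_tree(a):
    """Return (root, parent) for the min-rooted Cartesian tree of a."""
    parent = [-1] * len(a)
    stack = []  # indices on the rightmost path, top of tree first
    for i in range(len(a)):
        last = -1
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        if last != -1:
            parent[last] = i       # the popped chain becomes i's left subtree
        if stack:
            parent[i] = stack[-1]  # i becomes the right child of the stack top
        stack.append(i)
    root = stack[0] if stack else -1
    return root, parent


def range_min_index(parent, i, j):
    """Index of the minimum of a[i..j], found as the LCA of i and j (naive walk)."""
    ancestors = set()
    node = i
    while node != -1:
        ancestors.add(node)
        node = parent[node]
    node = j
    while node not in ancestors:
        node = parent[node]
    return node


a = [5, 2, 7, 1, 8]
root, parent = build_cartesian_tree(a)
```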
===Mode===
{{Main|Range mode query}}
The
{| class="wikitable"
Line 41 ⟶ 48:
|}
Jørgensen et al. proved a lower bound in the [[cell-probe model]] of <math>\Omega\left(\tfrac{\log n}{\log (S w/n)}\right)</math> on the query time of any data structure that uses {{mvar|S}} cells.<ref name=jorgensen>{{cite
===Median===
This particular case is of special interest since finding the [[median]] has several applications.<ref name=heriel>{{cite
Two variants of this problem have been studied: the [[offline algorithm|offline]] version, where all the ''k'' queries of interest are given in a batch, and a version where all the
The following pseudocode of the [[Quickselect|quickselect algorithm]] shows how to find the element of rank {{mvar|r}} in <math>A[i,j]</math>, an unsorted array of distinct elements. To find the range median, we set <math>r=\frac{j-i}{2}</math>.<ref name="ethpaper">{{cite
rangeMedian(A, i, j, r) {
Line 72 ⟶ 79:
end up in {{code|A.low}} is {{code|t}} and this number is bigger than {{code|r}} then we should keep looking for the element of rank {{code|r}} in {{code|A.low}}; otherwise we should look for the element of rank <math>(r-t)</math> in {{code|A.high}}. To find {{mvar|t}}, it is enough to find the maximum index <math>m\leq i-1</math> such that <math>a_m</math> is in {{code|A.low}} and the maximum index <math>l\leq j</math> such that <math>a_l</math>
is in {{code|A.high}}. Then <math>t=l-m</math>. The total cost for any query, without considering the partitioning part, is <math>O(\log n)</math>, since at most <math>\log n</math> recursive calls are made and only a constant number of operations are performed in each of them (to get the value of {{mvar|t}}, [[fractional cascading]] should be used).
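For comparison, the underlying rank selection can be sketched in Python without any preprocessing. This is plain quickselect applied to a copy of the queried subarray, not the preprocessed structure described above; it assumes distinct elements, 0-based indices and ranks, and a random pivot, and the function names are illustrative.

```python
import random


def quickselect(items, r):
    """Return the element of 0-based rank r among distinct items."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        low = [x for x in items if x < pivot]
        high = [x for x in items if x > pivot]
        if r < len(low):
            items = low            # the answer lies among the smaller elements
        elif r == len(low):
            return pivot           # exactly len(low) elements are smaller
        else:
            r -= len(low) + 1      # skip the low part and the pivot
            items = high


def range_median(a, i, j):
    """Lower median of a[i..j] (inclusive), using rank (j - i) // 2."""
    return quickselect(a[i:j + 1], (j - i) // 2)


a = [7, 2, 9, 4, 1, 8, 3]
m = range_median(a, 1, 5)  # median of [2, 9, 4, 1, 8]
```

Each query in this sketch takes expected time linear in the length of the range, which is the cost the preprocessing described above is designed to avoid.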
If a linear algorithm to find the medians is used, the total cost of
===Majority===
Finding frequent elements in a given set of items is an important task in data mining. It can be difficult when most items have similar frequencies, so it is often more useful to detect only the items whose frequency exceeds some threshold of significance. One of the best-known algorithms for finding the majority element of an array was proposed by Boyer and Moore<ref>{{
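A minimal sketch of the Boyer–Moore majority vote idea is given below, assuming the usual definition of a majority as an element occurring in more than half of the positions. The function names are illustrative, and the second pass is needed because the first pass only yields a candidate.

```python
def majority_candidate(items):
    """First pass: return the only element that could be a majority."""
    candidate, count = None, 0
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1  # pair off one occurrence of the candidate with x
    return candidate


def majority(items):
    """Return the majority element of items, or None if there is none."""
    items = list(items)
    candidate = majority_candidate(items)
    if items and 2 * items.count(candidate) > len(items):
        return candidate
    return None


result = majority([1, 2, 1, 1, 3, 1, 1])
```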
==== Two-dimensional arrays ====
Gagie et al.
<math>\beta=2^{-i}, \;\; i\in \left \{ 1,\dots,\log \left (\frac{1}{\alpha} \right ) \right \}
</math>
where <math>\beta</math> is the
==== One-dimensional arrays ====
Chan et al.
Chan et al.
Using this structure, a range <math>\tau</math>-majority query <math>A[i..j]</math> on <math>A[0..n-1]</math> with <math>0\leq i\leq j \leq n-1</math> is answered as follows. First, the [[lowest common ancestor]] (LCA) of leaf nodes <math>i</math> and <math>j</math> is found in constant time. Note that there exists a data structure requiring <math>O(n)</math> bits of space that is capable of answering the LCA queries in <math>O(1)</math> time.<ref>{{Cite
==== Tree paths ====
Gagie et al.
To construct this data structure, first <math>{O}(\tau n)</math> nodes are ''marked''. This can be done by marking any node whose height (its distance from the bottom of the tree) is at least <math>\lceil 1 / \tau\rceil</math> and whose depth is divisible by <math>\lceil 1 / \tau\rceil</math>. After doing this, it can be observed that the distance between each node and its nearest marked ancestor is less than <math>2\lceil 1 / \tau\rceil</math>. For a marked node <math>x</math>, <math>\log(depth(x))</math> different sequences (paths towards the root) <math>P_i(x)</math> are stored,
<math>P_{i}(x)=\left\langle \operatorname{label}(x), \operatorname{par}(x), \operatorname{par}^{2}(x), \ldots, \operatorname{par}^{2^i}(x)\right\rangle
</math>
for <math>0\leq i \leq \log(depth(x))</math> where <math>\operatorname{par}(x)</math> returns the label of the direct parent of node <math>x</math>. Put another way, for each marked node, the set of all paths with a power of two length (plus one for the node itself) towards the root is stored. Moreover, for each <math>P_i(x)</math>, the set of all majority ''candidates'' <math>C_i(x)</math> is stored. More specifically, <math>C_i(x)</math> contains the set of all <math>(\tau/2)</math>-majorities in <math>P_i(x)</math>, that is, labels that appear more than <math>(\tau/2)\cdot(2^i+1)</math> times in <math>P_i(x)</math>. It is easy to see that the set of candidates <math>C_i(x)</math> can have at most <math>2/\tau</math> distinct labels for each <math>i</math>. Gagie et al.
Each query between two nodes <math>u</math> and <math>v</math> can be answered by using the decomposability property (as explained above) of range <math>\tau</math>-majority queries and by breaking the query path between <math>u</math> and <math>v</math> into four subpaths. Let <math>z</math> be the lowest common ancestor of <math>u</math> and <math>v</math>, with <math>x</math> and <math>y</math> being the nearest marked ancestors of <math>u</math> and <math>v</math> respectively. The path from <math>u</math> to <math>v</math> is decomposed into the paths from <math>u</math> and <math>v</math> to <math>x</math> and <math>y</math> respectively (these paths are shorter than <math>2\lceil 1 / \tau\rceil</math> by definition, and all of their labels are considered as candidates), and the paths from <math>x</math> and <math>y</math> to <math>z</math> (by finding the suitable <math>C_i(x)</math> as explained above and considering all of its labels as candidates). Note that boundary nodes have to be handled so that all of these subpaths are disjoint and a set of <math>O(1/\tau)</math> candidates is derived from them. Each of these candidates is then verified using a combination of the <math>labelanc (x, \ell)</math> query, which returns the lowest ancestor of node <math>x</math> that has label <math>\ell</math>, and the <math>count(x)</math> fields of each node. On a <math>w</math>-bit RAM and an alphabet of size <math>\sigma</math>, the <math>labelanc (x, \ell)</math> query can be answered in <math>O\left(\log \log _{w} \sigma\right) </math> time while using linear space.<ref>{{Cite journal|
==Related problems==
All the problems described above have been studied for higher dimensions as well as their dynamic versions. On the other hand, range queries might be extended to other data structures like [[Tree (data structure)|trees]],<ref name="morin kranakis">{{cite
== See also ==
Line 115 ⟶ 122:
==External links==
*[
*[
{{CS-Trees}}