Content deleted Content added
m fix typo |
→Examples: I added range majority queries as examples of range queries and discussed range tau-majority queries on 2D arrays. I will add more in the coming days. |
||
Line 73:
is in {{code|A.high}}. Then <math>t=l-m</math>. The total cost for any query, without considering the partitioning part, is <math>\log n</math> since at most <math>\log n</math> recursion calls are done and only a constant number of operations are performed in each of them (to get the value of {{mvar|t}} [[fractional cascading]] should be used).
If a linear algorithm to find the medians is used, the total cost of preprocessing for {{mvar|k}} range median queries is <math> n\log k</math>. The algorithm can also be modified to solve the [[online algorithm|online]] version of the problem.<ref name=ethpaper />
===Majority===
Finding frequent elements in a given set of items is one of the most important tasks in data mining. Finding frequent elements might be a difficult task to achieve when most items have similar frequencies. Therefore, it might be more beneficial if some threshold of significance was used for detecting such items. One of the most famous algorithms for finding the majority of an array was proposed by Boyer and Moore <ref>{{Citation|last=Boyer|first=Robert S.|title=MJRTY—A Fast Majority Vote Algorithm|date=1991|url=http://dx.doi.org/10.1007/978-94-011-3488-0_5|work=Automated Reasoning Series|pages=105–117|place=Dordrecht|publisher=Springer Netherlands|access-date=2021-12-18|last2=Moore|first2=J. Strother}}</ref> which is also known as the [[Boyer–Moore majority vote algorithm]]. Boyer and Moore proposed an algorithm to find the majority element of a string (if it has one) in <math>O(n)</math> time and using <math>O(1)</math> space. In the context of Boyer and Moore’s work and generally speaking, a majority element in a set of items (for example string or an array) is one whose number of instances is more than half of the size of that set. Few years later, Misra and Gries <ref>{{Cite journal|last=Misra|first=J.|last2=Gries|first2=David|date=1982-11|title=Finding repeated elements|url=http://dx.doi.org/10.1016/0167-6423(82)90012-0|journal=Science of Computer Programming|volume=2|issue=2|pages=143–152|doi=10.1016/0167-6423(82)90012-0|issn=0167-6423}}</ref> proposed a more general version of Boyer and Moore's algorithm using <math>O(n log (1 / \tau))</math> comparisons to find all items in an array whose relative frequencies are greater than some threshold <math>0<\tau<1</math>. A range <math>\tau</math>-majority query is one that, given a subrange of a data structure (for example an array) of size <math>|R|</math>, returns the set of all distinct items that appear more than (or in some publications equal to) <math>\tau |R|</math> times in that given range. In different structures that support range <math>\tau</math>-majority queries, <math>\tau </math> can be either static (specified during preprocessing) or dynamic (specified at query time). Many of such approaches are based on the fact that, regardless of the size of the range, for a given <math>\tau</math> there could be at most <math>O(1/\tau)</math> distinct ''candidates'' with relative frequencies at least <math>\tau</math>. By verifying each of these candidates in constant time, <math>O(1/\tau)</math> query time is achieved.
===== Range Majority Queries on Two-Dimensional Arrays =====
Gagie et al. <ref>{{Citation|last=Gagie|first=Travis|title=Finding Frequent Elements in Compressed 2D Arrays and Strings|date=2011|url=http://dx.doi.org/10.1007/978-3-642-24583-1_29|work=String Processing and Information Retrieval|pages=295–300|place=Berlin, Heidelberg|publisher=Springer Berlin Heidelberg|isbn=978-3-642-24582-4|access-date=2021-12-18|last2=He|first2=Meng|last3=Munro|first3=J. Ian|last4=Nicholson|first4=Patrick K.}}</ref> proposed a data structure that supports range <math>\tau</math>-majority queries queries on an <math>m\times n</math> array <math>A</math>. For each query in this data structure a threshold <math>0<\tau<1</math> and a rectangular range are specified, and the set of all elements that have relative frequencies (inside that rectangular range) greater than or equal to <math>\tau</math> are returned as the output. This data structure supports dynamic thresholds (specified at query time) and a preprocessing threshold <math>\alpha</math> based on which it is constructed. During the preprocessing, a set of ''vertical'' and ''horizontal'' intervals are built on the <math>m \times n</math> array. Together, a vertical and a horizontal interval form a ''block.'' Each block is part of a ''superblock'' nine times bigger than itself (three times the size of the block's horizontal interval and three times the size of its vertical one). For each block a set of candidates (with <math>\frac{9}{\alpha}</math> elements at most) is stored which consists of elements that have relative frequencies at least <math>\alpha</math> (the preprocessing threshold as mentioned above) in its respective superblock. These elements are stored in non-increasing order according to their frequencies and it is easy to see that, any element that has a relative frequency at least <math>\alpha</math> in a block must appear its set of candidates. Each <math>\tau</math>-majority query is first answered by finding the ''query block,'' or the biggest block that is contained in the provided query rectangle in <math>O(1)</math> time. For the obtained query block, the first <math>\frac{9}{\tau}</math> candidates are returned (without being verified) in <math>O(1/\tau)</math> time, so this process might return some false positives. Many other data structures (as discussed below) have proposed methods for verifying each candidate in constant time and thus maintaining the <math>O(1/\tau)</math> query time while returning no false positives. The cases in which the query block is smaller than <math>1/\alpha</math> are handled by storing <math>log(1/\alpha)</math> different instances of this data structure of the following form:
<math>\beta=2^{-i}, \;\; i\in \{1,\dots,log(\frac{1}{\alpha})\}
</math>
where <math>\beta</math> is the preprocessing threshold of the <math>i</math>-th instance. Thus, for query blocks smaller than <math>1/\alpha</math> the <math>\lceil\log (1 / \tau)\rceil</math>-th instance is queried. As mentioned above, this data structure has query time <math>O(1/\tau)</math> and requires <math>\mathcal{O}(m n(H+1) \log^2 (1 / \alpha))</math> bits of space by storing a Huffman-encoded copy of it (note the <math>log(1/\alpha)</math> factor and also see [[Huffman coding]]).
==Related problems==
|