Revision as of 16:30, 16 October 2024 edit Headbomb (talk \| contribs) Edit filter managers, Autopatrolled, Extended confirmed users, Page movers, File movers, New page reviewers, Pending changes reviewers, Rollbackers, Template editors 473,523 edits ce ← Previous edit		Revision as of 16:31, 16 October 2024 edit undo Headbomb (talk \| contribs) Edit filter managers, Autopatrolled, Extended confirmed users, Page movers, File movers, New page reviewers, Pending changes reviewers, Rollbackers, Template editors 473,523 edits →Majority: ce Next edit →
Line 85: ==== Two-dimensional arrays ==== Gagie et al.<ref>{{~~Citation~~cite book \|last1=Gagie\|first1=Travis~~\|title=Finding~~ ~~Frequent Elements in Compressed 2D Arrays and Strings~~\|date=2011\|url=http://dx.doi.org/10.1007/978-3-642-24583-1_29~~\|work=String Processing and Information~~ ~~Retrieval~~\|pages=295–300\|place=Berlin, Heidelberg\|publisher=Springer Berlin Heidelberg\|isbn=978-3-642-24582-4\|access-date=2021-12-18\|last2=He\|first2=Meng\|last3=Munro\|first3=J. Ian\|last4=Nicholson\|first4=Patrick K.\|series=Lecture Notes in Computer Science \|volume=7024 \|doi=10.1007/978-3-642-24583-1_29 }}</ref> proposed a data structure that supports range <math>\tau</math>-majority queries on an <math>m\times n</math> array <math>A</math>. For each query <math>\operatorname{Q}=(\operatorname{R}, \tau)</math> in this data structure a threshold <math>0<\tau<1</math> and a rectangular range <math>\operatorname{R}</math> are specified, and the set of all elements that have relative frequencies (inside that rectangular range) greater than or equal to <math>\tau</math> are returned as the output. This data structure supports dynamic thresholds (specified at query time) and a pre-processing threshold <math>\alpha</math> based on which it is constructed. During the pre-processing, a set of ''vertical'' and ''horizontal'' intervals are built on the <math>m \times n</math> array. Together, a vertical and a horizontal interval form a ''block.'' Each block is part of a ''superblock'' nine times bigger than itself (three times the size of the block's horizontal interval and three times the size of its vertical one). For each block a set of candidates (with <math>\frac{9}{\alpha}</math> elements at most) is stored which consists of elements that have relative frequencies at least <math>\frac{\alpha}{9}</math> (the pre-processing threshold as mentioned above) in its respective superblock. These elements are stored in non-increasing order according to their frequencies and it is easy to see that, any element that has a relative frequency at least <math>\alpha</math> in a block must appear its set of candidates. Each <math>\tau</math>-majority query is first answered by finding the ''query block,'' or the biggest block that is contained in the provided query rectangle in <math>O(1)</math> time. For the obtained query block, the first <math>\frac{9}{\tau}</math> candidates are returned (without being verified) in <math>O(1/\tau)</math> time, so this process might return some false positives. Many other data structures (as discussed below) have proposed methods for verifying each candidate in constant time and thus maintaining the <math>O(1/\tau)</math> query time while returning no false positives. The cases in which the query block is smaller than <math>1/\alpha</math> are handled by storing <math>\log \left ( \frac{1}{\alpha} \right )</math> different instances of this data structure of the following form: <math>\beta=2^{-i}, \;\; i\in \left \{ 1,\dots,\log \left (\frac{1}{\alpha} \right ) \right \} Line 93: ==== One-dimensional arrays ==== Chan et al.<ref name=":0">{{~~Citation~~cite book \|last1=Chan\|first1=Timothy M.~~\|title=Linear-Space Data Structures for Range Minority Query in~~ ~~Arrays~~\|date=2012\|url=http://dx.doi.org/10.1007/978-3-642-31155-0_26~~\|work=Algorithm~~ ~~Theory – SWAT 2012~~\|pages=295–306\|place=Berlin, Heidelberg\|publisher=Springer Berlin Heidelberg\|isbn=978-3-642-31154-3\|access-date=2021-12-20\|last2=Durocher\|first2=Stephane\|last3=Skala\|first3=Matthew\|last4=Wilkinson\|first4=Bryan T.\|series=Lecture Notes in Computer Science \|volume=7357 \|doi=10.1007/978-3-642-31155-0_26 }}</ref> proposed a data structure that given a one-dimensional array<math>A</math>, a subrange <math>R</math> of <math>A</math> (specified at query time) and a threshold <math>\tau</math> (specified at query time), is able to return the list of all <math>\tau</math>-majorities in <math>O(1/\tau)</math> time requiring <math>O(n \log n)</math> words of space. To answer such queries, Chan et al.<ref name=":0" /> begin by noting that there exists a data structure capable of returning the ''top-k'' most frequent items in a range in <math>O(k)</math> time requiring <math>O(n)</math> words of space. For a one-dimensional array <math>A[0,..,n-1]</math>, let a one-sided top-k range query to be of form <math>A[0..i] \text { for } 0 \leq i \leq n-1</math>. For a maximal range of ranges <math>A[0..i] \text { through } A[0..j]</math> in which the frequency of a distinct element <math>e</math> in <math>A</math> remains unchanged (and equal to <math>f</math>), a horizontal line segment is constructed. The <math>x</math>-interval of this line segment corresponds to <math>[i,j]</math> and it has a <math>y</math>-value equal to <math>f</math>. Since adding each element to <math>A</math> changes the frequency of exactly one distinct element, the aforementioned process creates <math>O(n)</math> line segments. Moreover, for a vertical line <math>x=i</math> all horizonal line segments intersecting it are sorted according to their frequencies. Note that, each horizontal line segment with <math>x</math>-interval <math>[\ell,r]</math> corresponds to exactly one distinct element <math>e</math> in <math>A</math>, such that <math>A[\ell]=e</math>. A top-k query can then be answered by shooting a vertical ray <math>x=i</math> and reporting the first <math>k</math> horizontal line segments that intersect it (remember from above that these line segments are already sorted according to their frequencies) in <math>O(k)</math> time. Chan et al.<ref name=":0" /> first construct a [[range tree]] in which each branching node stores one copy of the data structure described above for one-sided range top-k queries and each leaf represents an element from <math>A</math>. The top-k data structure at each node is constructed based on the values existing in the subtrees of that node and is meant to answer one-sided range top-k queries. Please note that for a one-dimensional array <math>A</math>, a range tree can be constructed by dividing <math>A</math> into two halves and recursing on both halves; therefore, each node of the resulting range tree represents a range. It can also be seen that this range tree requires <math>O(n \log n)</math> words of space, because there are <math>O(\log n)</math> levels and each level <math>\ell</math> has <math>2^{\ell}</math> nodes. Moreover, since at each level <math>\ell</math> of a range tree all nodes have a total of <math>n</math> elements of <math>A</math> at their subtrees and since there are <math>O(\log n)</math> levels, the space complexity of this range tree is <math>O(n \log n)</math>. Using this structure, a range <math>\tau</math>-majority query <math>A[i..j]</math> on <math>A[0..n-1]</math> with <math>0\leq i\leq j \leq n</math> is answered as follows. First, the [[lowest common ancestor]] (LCA) of leaf nodes <math>i</math> and <math>j</math> is found in constant time. Note that there exists a data structure requiring <math>O(n)</math> bits of space that is capable of answering the LCA queries in <math>O(1)</math> time.<ref>{{Cite journal\|last1=Sadakane\|first1=Kunihiko\|last2=Navarro\|first2=Gonzalo\|date=2010-01-17~~\|title=Fully-Functional~~ ~~Succinct Trees\|journal=Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms~~\|pages=134–149 \|___location=Philadelphia, PA\|publisher=Society for Industrial and Applied Mathematics\|doi=10.1137/1.9781611973075.13\|isbn=978-0-89871-701-3 \|s2cid=3189222 \|doi-access=free}}</ref> Let <math>z</math> denote the LCA of <math>i </math> and <math>j</math>, using <math>z</math> and according to the decomposability of range <math>\tau</math>-majority queries (as described above and in <ref name=":1" />), the two-sided range query <math>A[i..j]</math> can be converted into two one-sided range top-k queries (from <math>z</math> to <math>i</math> and <math>j</math>). These two one-sided range top-k queries return the top-(<math>1/\tau</math>) most frequent elements in each of their respective ranges in <math>O(1/\tau)</math> time. These frequent elements make up the set of ''candidates'' for <math>\tau</math>-majorities in <math>A[i..j]</math> in which there are <math>O(1/\tau)</math> candidates some of which might be false positives. Each candidate is then assessed in constant time using a linear-space data structure (as described in Lemma 3 in <ref>{{Cite journal\|last1=Chan\|first1=Timothy M.\|last2=Durocher\|first2=Stephane\|last3=Larsen\|first3=Kasper Green\|last4=Morrison\|first4=Jason\|last5=Wilkinson\|first5=Bryan T.\|date=2013-03-08\|title=Linear-Space Data Structures for Range Mode Query in Arrays\|url=http://dx.doi.org/10.1007/s00224-013-9455-2\|journal=Theory of Computing Systems\|volume=55\|issue=4\|pages=719–741\|doi=10.1007/s00224-013-9455-2\|s2cid=253747004 \|issn=1432-4350}}</ref>) that is able to determine in <math>O(1)</math> time whether or not a given subrange of an array <math>A</math> contains at least <math>q</math> instances of a particular element <math>e</math>. ==== Tree paths ====

Range query (computer science): Difference between revisions