Content deleted Content added
m Open access bot: url-access=subscription updated in citation with #oabot. |
m Typo/quotemark fixes, replaced: ’s → 's, horizonal → horizontal |
||
Line 82:
===Majority===
Finding frequent elements in a given set of items is one of the most important tasks in data mining. Finding frequent elements might be a difficult task to achieve when most items have similar frequencies. Therefore, it might be more beneficial if some threshold of significance was used for detecting such items. One of the most famous algorithms for finding the majority of an array was proposed by Boyer and Moore <ref>{{cite book |last1=Boyer|first1=Robert S.|date=1991|chapter-url=http://dx.doi.org/10.1007/978-94-011-3488-0_5|pages=105–117|place=Dordrecht|publisher=Springer Netherlands|access-date=2021-12-18|last2=Moore|first2=J. Strother|title=Automated Reasoning |chapter=MJRTY—A Fast Majority Vote Algorithm |series=Automated Reasoning Series |volume=1 |doi=10.1007/978-94-011-3488-0_5 |isbn=978-94-010-5542-0 }}</ref> which is also known as the [[Boyer–Moore majority vote algorithm]]. Boyer and Moore proposed an algorithm to find the majority element of a string (if it has one) in <math>O(n)</math> time and using <math>O(1)</math> space. In the context of Boyer and
==== Two-dimensional arrays ====
Line 93:
==== One-dimensional arrays ====
Chan et al.<ref name=":0">{{cite book |last1=Chan|first1=Timothy M. |date=2012|chapter-url=http://dx.doi.org/10.1007/978-3-642-31155-0_26 |pages=295–306|place=Berlin, Heidelberg|publisher=Springer Berlin Heidelberg|isbn=978-3-642-31154-3|access-date=2021-12-20|last2=Durocher|first2=Stephane|last3=Skala|first3=Matthew|last4=Wilkinson|first4=Bryan T.|title=Algorithm Theory – SWAT 2012 |chapter=Linear-Space Data Structures for Range Minority Query in Arrays |series=Lecture Notes in Computer Science |volume=7357 |doi=10.1007/978-3-642-31155-0_26 }}</ref> proposed a data structure that given a one-dimensional array<math>A</math>, a subrange <math>R</math> of <math>A</math> (specified at query time) and a threshold <math>\tau</math> (specified at query time), is able to return the list of all <math>\tau</math>-majorities in <math>O(1/\tau)</math> time requiring <math>O(n \log n)</math> words of space. To answer such queries, Chan et al.<ref name=":0" /> begin by noting that there exists a data structure capable of returning the ''top-k'' most frequent items in a range in <math>O(k)</math> time requiring <math>O(n)</math> words of space. For a one-dimensional array <math>A[0,..,n-1]</math>, let a one-sided top-k range query to be of form <math>A[0..i] \text { for } 0 \leq i \leq n-1</math>. For a maximal range of ranges <math>A[0..i] \text { through } A[0..j]</math> in which the frequency of a distinct element <math>e</math> in <math>A</math> remains unchanged (and equal to <math>f</math>), a horizontal line segment is constructed. The <math>x</math>-interval of this line segment corresponds to <math>[i,j]</math> and it has a <math>y</math>-value equal to <math>f</math>. Since adding each element to <math>A</math> changes the frequency of exactly one distinct element, the aforementioned process creates <math>O(n)</math> line segments. Moreover, for a vertical line <math>x=i</math> all
Chan et al.<ref name=":0" /> first construct a [[range tree]] in which each branching node stores one copy of the data structure described above for one-sided range top-k queries and each leaf represents an element from <math>A</math>. The top-k data structure at each node is constructed based on the values existing in the subtrees of that node and is meant to answer one-sided range top-k queries. Please note that for a one-dimensional array <math>A</math>, a range tree can be constructed by dividing <math>A</math> into two halves and recursing on both halves; therefore, each node of the resulting range tree represents a range. It can also be seen that this range tree requires <math>O(n \log n)</math> words of space, because there are <math>O(\log n)</math> levels and each level <math>\ell</math> has <math>2^{\ell}</math> nodes. Moreover, since at each level <math>\ell</math> of a range tree all nodes have a total of <math>n</math> elements of <math>A</math> at their subtrees and since there are <math>O(\log n)</math> levels, the space complexity of this range tree is <math>O(n \log n)</math>.
|