Content deleted Content added
m Open access bot: doi added to citation with #oabot. |
m →Majority: Removed/fixed incorrect author parameter(s), performed general fixes |
||
Line 75:
===Majority===
Finding frequent elements in a given set of items is one of the most important tasks in data mining. This can be difficult when most items have similar frequencies, so it is often more useful to detect only the items whose frequency exceeds some threshold of significance. One of the most famous algorithms for finding the majority of an array was proposed by Boyer and Moore,<ref>{{Citation|last1=Boyer|first1=Robert S.|title=MJRTY—A Fast Majority Vote Algorithm|date=1991|url=http://dx.doi.org/10.1007/978-94-011-3488-0_5|pages=105–117|place=Dordrecht|publisher=Springer Netherlands|access-date=2021-12-18|last2=Moore|first2=J. Strother|series=Automated Reasoning Series |volume=1 |doi=10.1007/978-94-011-3488-0_5 |isbn=978-94-010-5542-0 }}</ref> and is known as the [[Boyer–Moore majority vote algorithm]]. It finds the majority element of a string (if it has one) in <math>O(n)</math> time using <math>O(1)</math> space. In the context of Boyer and Moore’s work, and generally speaking, a majority element in a set of items (for example, a string or an array) is one whose number of instances is more than half of the size of that set. A few years later, Misra and Gries<ref>{{Cite journal|last1=Misra|first1=J.|last2=Gries|first2=David|date=November 1982|title=Finding repeated elements|url=http://dx.doi.org/10.1016/0167-6423(82)90012-0|journal=Science of Computer Programming|volume=2|issue=2|pages=143–152|doi=10.1016/0167-6423(82)90012-0|issn=0167-6423|doi-access=free}}</ref> proposed a more general version of Boyer and Moore's algorithm that uses <math>O \left ( n \log \left ( \frac{1}{\tau} \right ) \right )</math> comparisons to find all items in an array whose relative frequencies are greater than some threshold <math>0<\tau<1</math>.
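The two ideas above can be illustrated with a minimal Python sketch. The function names <code>boyer_moore_majority</code> and <code>misra_gries_frequent</code> are hypothetical and chosen only for this example. The second function keeps its counters in a hash table and uses <math>\lfloor 1/\tau \rfloor</math> of them, which is an assumption made for simplicity: it follows the counter-based idea usually attributed to Misra and Gries, but it is not the comparison-based structure behind the <math>O \left ( n \log \left ( \frac{1}{\tau} \right ) \right )</math> bound.

```python
from collections import Counter


def boyer_moore_majority(items):
    """Return the majority element of a sequence, or None if there is none."""
    candidate, count = None, 0
    # First pass: cancel out pairs of differing elements.
    # If a majority element exists, it is the surviving candidate.
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    # Second pass: the survivor is only a candidate, so verify it.
    if count > 0 and sum(1 for x in items if x == candidate) > len(items) / 2:
        return candidate
    return None


def misra_gries_frequent(items, tau):
    """Return the set of items whose relative frequency is greater than tau."""
    if not 0 < tau < 1:
        raise ValueError("tau must satisfy 0 < tau < 1")
    # Assumption: floor(1/tau) counters. With m counters, every item that
    # occurs more than n/(m+1) times is guaranteed to survive the first pass.
    m = int(1 / tau)
    counters = {}
    for x in items:
        if x in counters:
            counters[x] += 1
        elif len(counters) < m:
            counters[x] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    # Second pass: count the surviving candidates exactly and filter them.
    exact = Counter(x for x in items if x in counters)
    threshold = tau * len(items)
    return {x for x, c in exact.items() if c > threshold}
```

For <math>\tau = 1/2</math> the second function reports the same element as the first whenever a majority exists. Both functions make a second pass over the input because the first pass only narrows the input down to candidates and does not establish their exact frequencies.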
A range <math>\tau</math>-majority query is one that, given a subrange of a data structure (for example, an array) of size <math>|R|</math>, returns the set of all distinct items that appear more than (or, in some publications, at least) <math>\tau |R|</math> times in that range. In different structures that support range <math>\tau</math>-majority queries, <math>\tau </math> can be either static (specified during preprocessing) or dynamic (specified at query time). Many such approaches are based on the fact that, regardless of the size of the range, for a given <math>\tau</math> there can be at most <math>O(1/\tau)</math> distinct ''candidates'' with relative frequencies at least <math>\tau</math>. Verifying each of these candidates in constant time gives <math>O(1/\tau)</math> query time. A range <math>\tau</math>-majority query is decomposable <ref name=":1">{{Cite book|
==== Two-dimensional arrays ====
|