Content deleted Content added
Shreevatsa (talk | contribs) replace "et al" with a link to the article about the paper/algorithm |
Shreevatsa (talk | contribs) m remove spaces before ref, and wikilink |
||
Line 34:
=== Min/max sketches ===
Min/max sketches<ref name=cosma2011>{{cite journal | last1=Cosma | first1=Ioana A. | last2 = Clifford | first2 = Peter | year=2011 | title=A statistical analysis of probabilistic counting algorithms | journal=Scandinavian Journal of Statistics| arxiv=0801.3552 }} </ref><ref>{{cite book | last1=Giroire | first1=Frederic | title=2007 Proceedings of the Fourth Workshop on Analytic Algorithmics and Combinatorics (ANALCO) | pages=223–231 | last2 = Fusy | first2 = Eric | year=2007 |citeseerx=10.1.1.214.270 |doi=10.1137/1.9781611972979.9| isbn=978-1-61197-297-9 }} </ref> store only the minimum/maximum hashed values. Examples of known min/max sketch estimators: Chassaing et al.
The intuition behind such estimators is that each sketch carries information about the desired quantity. For example, when every element <math> e_j </math> is associated with a uniform [[Random variable|RV]], <math> h(e_j) \sim U(0,1) </math>, the expected minimum value of <math> h(e_1),h(e_2), \ldots, h(e_n) </math> is <math> 1/(n+1) </math>. The hash function guarantees that <math> h(e_j) </math> is identical for all the appearances of <math> e_j </math>. Thus, the existence of duplicates does not affect the value of the extreme order statistics.
There are other estimation techniques other than min/max sketches. The first paper on count-distinct estimation<ref>{{cite journal | last1=Flajolet | first1=Philippe|author1-link=Philippe Flajolet | last2 = Martin | first2 = G. Nigel | year=1985 | title=Probabilistic counting algorithms for data base applications | journal=J. Comput. Syst. Sci. | volume=31| issue=2| pages=182–209| doi=10.1016/0022-0000(85)90041-8| url=https://hal.inria.fr/inria-00076244/file/RR-0313.pdf}} </ref> describes the [[Flajolet–Martin algorithm]], a bit pattern sketch. In this case, the elements are hashed into a bit vector and the sketch holds the logical OR of all hashed values. The first asymptotically space- and time-optimal algorithm for this problem was given by [[Daniel Kane (mathematician)|Daniel M. Kane]], [[Jelani Nelson]], and David P. Woodruff.<ref name=optimalf0>{{cite journal | last1=Kane | first1=Daniel M. | last2 = Nelson | first2 = Jelani | last3=Woodruff | first3=David P. | year=2010 | authorlink1=Daniel_Kane_(mathematician) | authorlink2=Jelani_Nelson | title=An Optimal Algorithm for the Distinct Elements Problem | journal=Proceedings of the 29th Annual ACM Symposium on Principles of Database Systems (PODS)|url=https://dash.harvard.edu/bitstream/handle/1/13820438/f0.pdf;sequence=1}} </ref>
===Bottom-''m'' sketches===
Line 66:
.<ref>{{cite journal | last1=Cohen |author1-link=Edith Cohen| first1=Reuven | last2 = Katzir | first2 = Liran | last3=Yehezkel | first3=Aviv | year=2014| title=A Unified Scheme for Generalizing Cardinality Estimators to Sum Aggregation | journal=Information Processing Letters|doi=10.1016/j.ipl.2014.10.009 | volume=115 |issue=2| pages=336–342}}</ref>
For example, the weighted estimator proposed by Cohen et al.<ref name=edithCohen /> can be obtained when the continuous max sketches estimator is extended to solve the weighted problem.
In particular, the [[HyperLogLog]] algorithm
==See also==
|