Count-distinct problem: Difference between revisions

Content deleted Content added
Simplified the description of CVM algorithm and also cited the source for epsilon-delta guarantees.
Added URL to Knuth's note.
Line 52:
 
=== CVM Algorithm ===
Compared to other approximation algorithms for the count-distinct problem the CVM Algorithm uses sampling instead of hashing. The CVM Algorithm provides an unbiased estimator for the number of distinct elements in a stream<ref name=":0" />, in addition to the standard (ε-δ) guarantees<ref>{{Cite journal |last=Chakraborty |first=Sourav |last2=Vinodchandran |first2=N. V. |last3=Meel |first3=Kuldeep S. |date=2022 |title=Distinct Elements in Streams: An Algorithm for the (Text) Book |url=http://arxiv.org/abs/2301.10191 |pages=6 pages, 727571 bytes |doi=10.4230/LIPIcs.ESA.2022.34 |issn=1868-8969}}</ref>. Here is the algorithm:<ref name=":0">{{cite journal |last1=Knuth |first1=Donald |date=May 2023 |title=The CVM Algorithm for Estimating Distinct Elements in Streams |dateurl=Mayhttps://cs.stanford.edu/~knuth/papers/cvm-note.pdf 2023|journal=}}</ref>
 
Initialize a counter, {{mvar|t}}, to zero, {{nowrap|<math> t \leftarrow 0 </math>.}}