Count-distinct problem: Difference between revisions

Content deleted Content added
Simplified to the version that's easy to understand
WikiCleanerBot (talk | contribs)
m v2.05b - Bot T20 CW#61 - Fix errors for CW project (Reference before punctuation - Link equal to linktext)
Line 52:
 
=== CVM Algorithm ===
Compared to other approximation algorithms for the count-distinct problem the CVM Algorithm<ref>{{Cite journal |last=Chakraborty |first=Sourav |last2=Vinodchandran |first2=N. V. |last3=Meel |first3=Kuldeep S. |date=2022 |title=Distinct Elements in Streams: An Algorithm for the (Text) Book |url=http://arxiv.org/abs/2301.10191 |pages=6 pages, 727571 bytes |doi=10.4230/LIPIcs.ESA.2022.34 |issn=1868-8969}}</ref> (named by [[Donald Knuth | Donald Knuth]] after the initials of Sourav Chakraborty, N. V. Vinodchandran, and Kuldeep S. Meel) uses sampling instead of hashing. The CVM Algorithm provides an unbiased estimator for the number of distinct elements in a stream,<ref name=":0" />, in addition to the standard (ε-δ) guarantees. Below is the modification of CVM algorithm, proposed by Donald Knuth, that maintains a buffer of maximum size s<ref name=":0">{{cite journal |last1=Knuth |first1=Donald |date=May 2023 |title=The CVM Algorithm for Estimating Distinct Elements in Streams |url=https://cs.stanford.edu/~knuth/papers/cvm-note.pdf |journal=}}</ref>
 
{{nowrap|Initialize <math> p \leftarrow 1 </math>}}