{{Short description|Probability distribution}}
{{Probability distribution
| name = Poisson binomial
| entropy =
| mgf = <math>\prod\limits_{j=1}^n (1-p_j+p_j e^t)</math>
| cf = <math>\prod\limits_{j=1}^n (1-p_j+p_j e^{it})</math>
| pgf = <math>\prod\limits_{j=1}^n (1-p_j+p_j z)</math>
}}
In [[probability theory]] and [[statistics]], the '''Poisson binomial distribution''' is the [[discrete probability distribution]] of a sum of [[statistical independence|independent]] [[Bernoulli trial]]s that are not necessarily identically distributed. The concept is named after [[Siméon Denis Poisson]].
In other words, it is the probability distribution of the number of successes in a collection of ''n'' [[statistical independence|independent]] yes/no experiments with success [[probability|probabilities]] <math>p_1, p_2, \dots , p_n</math>. The ordinary [[binomial distribution]] is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is <math>p_1 = p_2 = \cdots = p_n</math>.
 
==Definitions==
 
===Probability mass function===
 
The probability of having ''k'' successful trials out of a total of ''n'' can be written as the sum
<ref name=":1">
{{Cite journal
| volume = 3
Line 50 ⟶ 44:
:<math>\Pr(K=k) = \sum\limits_{A\in F_k} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j) </math>
 
where <math>F_k</math> is the set of all subsets of ''k'' integers that can be selected from <math>\{1,2,3,\ldots,n\}</math>. For example, if ''n''&nbsp;=&nbsp;3, then <math>F_2=\left\{ \{1,2\},\{1,3\},\{2,3\} \right\}</math>. <math>A^c</math> is the [[Complement (set theory)|complement]] of <math>A</math>, i.e. <math>A^c =\{1,2,3,\ldots,n\}\smallsetminus A</math>.
 
<math>F_k</math> contains <math>n!/((n-k)!k!)</math> elements, so the sum is infeasible to compute in practice unless the number of trials ''n'' is small (e.g. if ''n''&nbsp;=&nbsp;30, <math>F_{15}</math> contains over 10<sup>8</sup> elements). However, there are other, more efficient ways to calculate <math>\Pr(K=k)</math>.
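For small ''n'', the defining sum can be evaluated literally. The following is a minimal Python sketch of this enumeration (the function name <code>pb_pmf_enum</code> is chosen here for illustration, not taken from any library):

<syntaxhighlight lang="python">
from itertools import combinations
from math import prod

# Sum over every k-subset A of {0, ..., n-1}, per the formula above.
# Practical only for small n, since F_k has n-choose-k elements.
def pb_pmf_enum(k, p):
    n = len(p)
    total = 0.0
    for A in combinations(range(n), k):
        Ac = set(range(n)) - set(A)              # complement of A
        total += prod(p[i] for i in A) * prod(1 - p[j] for j in Ac)
    return total
</syntaxhighlight>

For example, <code>pb_pmf_enum(2, [0.1, 0.5, 0.9])</code> sums over the three subsets in <math>F_2</math>.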
As long as none of the success probabilities are equal to one, the probability of ''k'' successes can be calculated using the recursive formula

: <math> \Pr (K=k)= \begin{cases}
\prod\limits_{i=1}^n (1-p_i) & k=0 \\
\frac{1}{k} \sum\limits_{i=1}^k (-1)^{i-1}\Pr (K=k-i)T(i) & k>0 \\
\end{cases} </math>
 
where
: <math> T(i)=\sum\limits_{j=1}^n \left( \frac{p_j}{1-p_j} \right)^i.</math>
 
The recursive formula is not [[numerically stable]], and should be avoided if <math>n</math> is greater than approximately 20.
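For illustration, a direct Python transcription of this recursion (a minimal sketch; the function name is ours, and every <math>p_i</math> must be strictly less than one):

<syntaxhighlight lang="python">
from math import prod

# Recursive formula: Pr(K=k) = (1/k) * sum_{i=1}^{k} (-1)^(i-1) Pr(K=k-i) T(i),
# with T(i) = sum_j (p_j/(1-p_j))^i. Numerically unstable for large n.
def pb_pmf_recursive(p):
    n = len(p)
    odds = [pi / (1 - pi) for pi in p]
    T = [None] + [sum(o**i for o in odds) for i in range(1, n + 1)]
    pmf = [prod(1 - pi for pi in p)] + [0.0] * n          # pmf[0] = Pr(K = 0)
    for k in range(1, n + 1):
        pmf[k] = sum((-1)**(i - 1) * pmf[k - i] * T[i]
                     for i in range(1, k + 1)) / k
    return pmf
</syntaxhighlight>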
 
An alternative is to use a [[divide-and-conquer algorithm]]: if we assume <math>n = 2^b</math> is a power of two, and denote by <math>f(p_{i:j})</math> the probability mass function of the Poisson binomial distribution of <math>p_i, \dots, p_j</math> and by <math>*</math> the [[Convolution of probability distributions|convolution]] operator, then <math>f(p_{1:2^b}) = f(p_{1:2^{b-1}})*f(p_{2^{b-1}+1:2^b})</math>.
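A minimal Python sketch of this divide-and-conquer idea (the function name is ours; unlike the notation above, it does not require <math>n</math> to be a power of two):

<syntaxhighlight lang="python">
import numpy as np

# The PMF of the whole sum is the convolution of the PMFs of the two halves;
# a single Bernoulli variable has PMF [1-p, p].
def pb_pmf_divide(p):
    if len(p) == 1:
        return np.array([1 - p[0], p[0]])
    mid = len(p) // 2
    return np.convolve(pb_pmf_divide(p[:mid]), pb_pmf_divide(p[mid:]))
</syntaxhighlight>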
 
More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors <math>P_1, \dots, P_n</math> where <math>P_i=[1-p_i,p_i]</math>. This observation leads to the direct convolution (DC) algorithm for computing <math>\Pr (K=0) </math> through <math>\Pr (K=n) </math>:
<small>''// PMF and nextPMF begin at index 0''
 '''function''' DC(<math>p_1, \dots, p_n</math>) '''is'''
     declare new PMF array of size 1
     PMF[0] = 1
     '''for''' i = 1 '''to''' <math>n</math> '''do'''
         declare new nextPMF array of size i + 1
         nextPMF[0] = (1 - <math>p_i</math>) * PMF[0]
         nextPMF[i] = <math>p_i</math> * PMF[i - 1]
         '''for''' k = 1 '''to''' i - 1 '''do'''
             nextPMF[k] = <math>p_i</math> * PMF[k - 1] + (1 - <math>p_i</math>) * PMF[k]
         '''repeat'''
         PMF = nextPMF
     '''repeat'''
     '''return''' PMF
 '''end function'''</small>
<math>\Pr (K=k)</math> will be found in PMF[k]. DC is numerically stable, exact, and, when implemented as a software routine, exceptionally fast for <math>n \leq 2000</math>. It can also be quite fast for larger <math>n</math>, depending on the distribution of the <math>p_i</math>.<ref name=":0" />
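The pseudocode translates directly to Python. The following minimal sketch (the function name is ours) expresses each step of DC as a convolution of the running PMF with <math>[1-p_i, p_i]</math>:

<syntaxhighlight lang="python">
# pmf[k] holds Pr(K = k) after all of p has been processed.
def pb_pmf_dc(p):
    pmf = [1.0]                                  # empty sum: Pr(K = 0) = 1
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):           # convolve pmf with [1-pi, pi]
            nxt[k] += (1 - pi) * mass
            nxt[k + 1] += pi * mass
        pmf = nxt
    return pmf
</syntaxhighlight>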
 
Another possibility is using the [[discrete Fourier transform]].<ref>
{{Cite journal
| volume = 46
| issue = 2
| pages = 803–817
| last = Fernández
| first = M.
| author2 = S. Williams
| title = Closed-Form Expression for the Poisson-Binomial Probability Density Function
| journal = IEEE Transactions on Aerospace and Electronic Systems
| year = 2010
| doi = 10.1109/TAES.2010.5461658
| bibcode = 2010ITAES..46..803F
| s2cid = 1456258
}}
</ref>
 
:<math>\Pr (K=k)=\frac{1}{n+1} \sum_{\ell=0}^n C^{-\ell k} \prod_{m=1}^n \left( 1+(C^\ell-1) p_m \right) </math>
 
where <math>C=\exp \left( \frac{2i\pi }{n+1} \right)</math> and <math>i=\sqrt{-1}</math>.
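A literal Python transcription of this formula (a sketch; the function name is ours) is shown below. This naive version costs <math>O(n^3)</math> operations; practical implementations evaluate the inner products once per <math>\ell</math> and use a fast Fourier transform:

<syntaxhighlight lang="python">
import cmath

def pb_pmf_dft(p):
    n = len(p)
    C = cmath.exp(2j * cmath.pi / (n + 1))
    pmf = []
    for k in range(n + 1):
        s = 0j
        for l in range(n + 1):
            term = C**(-l * k)                   # C^{-lk}
            for pm in p:
                term *= 1 + (C**l - 1) * pm      # product over m
            s += term
        pmf.append((s / (n + 1)).real)           # imaginary parts cancel
    return pmf
</syntaxhighlight>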
 
Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu<ref>{{Cite journal
| volume = 7
| pages = 875–892
| last = Chen
| first = S. X.
|author2=J. S. Liu
| title = Statistical Applications of the Poisson–Binomial and conditional Bernoulli distributions
| journal = Statistica Sinica
| year = 1997
| url = http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=7&num=4&art=4
}}</ref> and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.<ref name=":0">{{Cite journal |last1=Biscarri |first1=William |last2=Zhao |first2=Sihai Dave |last3=Brunner |first3=Robert J. |date=2018-06-01 |title=A simple and fast method for computing the Poisson binomial distribution function |url=https://www.sciencedirect.com/science/article/pii/S0167947318300082 |journal=Computational Statistics & Data Analysis |volume=122 |pages=92–100 |doi=10.1016/j.csda.2018.01.007 |issn=0167-9473}}</ref>
 
=== Cumulative distribution function ===
The [[cumulative distribution function]] (CDF) can be expressed as:
 
: <math>\Pr(K \leq k) = \sum^{k}_{\ell=0} \sum_{A\in F_\ell} \prod_{i\in A} p_i \prod_{j\in A^c} (1-p_j), </math>
 
where <math>F_\ell</math> is the set of all subsets of size <math>\ell</math> that can be selected from <math>\{1,2,3,\ldots,n\}</math>.
 
It can be computed by invoking the DC function above, and then adding elements <math>0</math> through <math>k</math> of the returned PMF array.
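In terms of the <code>pb_pmf_dc</code> sketch given earlier, this is a prefix sum:

<syntaxhighlight lang="python">
# Pr(K <= k), reusing the pb_pmf_dc sketch above.
def pb_cdf_dc(k, p):
    return sum(pb_pmf_dc(p)[:k + 1])
</syntaxhighlight>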
 
== Properties ==
===Mean and variance===
 
Since a Poisson binomial distributed variable is a sum of ''n'' independent Bernoulli distributed variables, its mean and variance are simply the sums of the means and variances of the ''n'' Bernoulli distributions:

: <math>\mu = \sum\limits_{i=1}^n p_i</math>

: <math>\sigma^2 =\sum\limits_{i=1}^n (1-p_i) p_i</math>

For fixed values of the mean (<math>\mu</math>) and size (''n''), the variance is maximal when all success probabilities are equal and we have a binomial distribution. When the mean is fixed, the variance is bounded from above by the variance of the Poisson distribution with the same mean, which is attained asymptotically{{citation needed|date=July 2019}} as ''n'' tends to infinity.
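These formulas can be checked numerically against any of the PMF sketches given earlier, for example <code>pb_pmf_dc</code>:

<syntaxhighlight lang="python">
# Example success probabilities; any values in [0, 1) work here.
p = [0.1, 0.4, 0.7, 0.9]
pmf = pb_pmf_dc(p)
mu = sum(k * m for k, m in enumerate(pmf))
var = sum((k - mu)**2 * m for k, m in enumerate(pmf))
assert abs(mu - sum(p)) < 1e-12
assert abs(var - sum(pi * (1 - pi) for pi in p)) < 1e-12
</syntaxhighlight>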
 
===Entropy===
There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number of trials and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.<ref>{{Cite journal
| volume = 47 | issue = 5
| pages = 2039–2041
| last = Harremoës
| first = P.
| title = Binomial and Poisson distributions as maximum entropy distributions
| journal = IEEE Transactions on Information Theory
| year = 2001
| url = https://web.archive.org/web/20160116154755id_/http://yaroslavvb.com/papers/harremoes-binomial.pdf
| doi=10.1109/18.930936
}}</ref>
The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities <math>p_1, p_2, \dots , p_n</math>.<ref>{{Cite book
|last1 = Shepp
|first1 = Lawrence
|last2 = Olkin
|first2 = Ingram
|chapter = Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution
|title = Contributions to Probability: A Collection of Papers Dedicated to Eugene Lukacs
|publisher = Academic Press
|location = New York
|year = 1981
|mr = 0618689
|url = https://books.google.com/books?id=9qziBQAAQBAJ
|isbn = 0-12-274460-8
}}</ref> This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015.<ref>{{Cite journal |last1=Hillion|first1=Erwan|last2=Johnson|first2=Oliver|date=2015-03-05|title=A proof of the Shepp–Olkin entropy concavity conjecture |journal=Bernoulli|volume = 23|issue=4B|pages = 3638–3649 | arxiv=1503.01570 |doi=10.3150/16-BEJ860|s2cid=8358662}}</ref> The Shepp–Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in <math>p_i</math>, if all <math>p_i \leq 1/2</math>. This conjecture was also proved by Hillion and Johnson, in 2019.<ref>{{Cite journal |last1=Hillion|first1=Erwan|last2=Johnson|first2=Oliver|date=2019-11-09|title=A proof of the Shepp–Olkin entropy monotonicity conjecture |journal=Electronic Journal of Probability| volume = 24 |number=126 | pages = 1–14 |doi=10.1214/19-EJP380|doi-access=free|arxiv=1810.09791}}</ref>
 
===Chernoff bound===
 
The probability that a Poisson binomial distributed variable gets large can be bounded using its moment generating function as follows (valid when <math>s \geq \mu</math> and for any <math>t>0</math>):
 
: <math>
\begin{align}
\Pr(K \geq s) & \leq \exp(-st) \operatorname{E}\left[e^{tK}\right] \\
& = \exp(-st) \prod_{i=1}^n \left(1-p_i+p_i e^t\right) \\
& \leq \exp(-st) \prod_{i=1}^n \exp\left(p_i\left(e^t-1\right)\right) \\
& = \exp\left(-st+\mu\left(e^t-1\right)\right) \\
& = \exp\left(s-\mu-s\log \frac{s}{\mu}\right)
\end{align}
</math>
 
where we used <math display="inline">1+x \leq e^x</math> in the third line and took <math display="inline">t=\log\left(s/\mu\right)</math>. This is similar to the [[Binomial distribution#Tail bounds|tail bounds of a binomial distribution]].
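The bound can be checked numerically, reusing the <code>pb_pmf_dc</code> sketch from above (the example probabilities are arbitrary):

<syntaxhighlight lang="python">
import math

p = [0.2, 0.3, 0.5, 0.8]
mu = sum(p)
pmf = pb_pmf_dc(p)
for s in range(math.ceil(mu), len(p) + 1):       # the bound holds for s >= mu
    exact = sum(pmf[s:])                         # Pr(K >= s)
    bound = math.exp(s - mu - s * math.log(s / mu))
    assert exact <= bound + 1e-12
</syntaxhighlight>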
 
== Related distributions ==
=== Approximation by binomial distribution ===
A Poisson binomial distribution <math>PB</math> can be approximated by a binomial distribution <math>B</math> where <math>\mu</math>, the mean of the <math>p_i</math>, is the success probability of <math>B</math>. The variances of <math>PB</math> and <math>B</math> are related by the formula
 
: <math> \operatorname{Var}(PB)=\operatorname{Var}(B)- \sum_{i=1}^n (p_i-\mu)^2</math>
 
As can be seen, the closer the <math>p_i</math> are to <math>\mu</math>, that is, the more the <math>p_i</math> tend to homogeneity, the larger <math>PB</math>'s variance. When all the <math>p_i</math> are equal to <math>\mu</math>, <math>PB</math> becomes <math>B</math>, <math>\operatorname{Var}(PB)=\operatorname{Var}(B)</math>, and the variance is at its maximum.<ref name=":1" />
 
Ehm has determined bounds for the [[Total variation distance of probability measures|total variation distance]] of <math>PB</math> and <math>B</math>, in effect providing bounds on the error introduced when approximating <math>PB</math> with <math>B</math>. Let <math>\nu = 1 - \mu</math> and <math>d(PB,B)</math> be the total variation distance of <math>PB</math> and <math>B</math>. Then
 
: <math>d(PB,B)\le(1-\mu^{n+1}-\nu^{n+1}) \frac{\sum_{i=1}^n (p_i-\mu)^2}{(n+1)\mu\nu}</math>
 
: <math>d(PB,B)\ge C\min\left\{\,1,\frac{1}{n\mu\nu}\,\right\} \sum_{i=1}^n (p_i-\mu)^2</math>
 
where <math>C\ge\frac{1}{124}</math>.
 
<math>d(PB,B)</math> tends to 0 if and only if <math>\operatorname{Var}(PB)/\operatorname{Var}(B)</math> tends to 1.<ref>{{Cite journal |last=Ehm |first=Werner |date=1991-01-01 |title=Binomial approximation to the Poisson binomial distribution |url=https://dx.doi.org/10.1016/0167-7152%2891%2990170-V |journal=Statistics & Probability Letters |volume=11 |issue=1 |pages=7–16 |doi=10.1016/0167-7152(91)90170-V |issn=0167-7152|url-access=subscription }}</ref>
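Both the variance relation and Ehm's upper bound can be illustrated numerically, reusing the <code>pb_pmf_dc</code> sketch from above and taking <math>d</math> as half the <math>L^1</math> distance between the PMFs:

<syntaxhighlight lang="python">
import math

p = [0.2, 0.3, 0.5, 0.8]                          # arbitrary example
n, mu = len(p), sum(p) / len(p)
nu = 1 - mu
binom = [math.comb(n, k) * mu**k * nu**(n - k) for k in range(n + 1)]
pb = pb_pmf_dc(p)

# Var(PB) = Var(B) - sum (p_i - mu)^2, with Var(B) = n*mu*nu
assert abs(sum(pi * (1 - pi) for pi in p)
           - (n * mu * nu - sum((pi - mu)**2 for pi in p))) < 1e-12

d = 0.5 * sum(abs(a - b) for a, b in zip(pb, binom))
upper = ((1 - mu**(n + 1) - nu**(n + 1))
         * sum((pi - mu)**2 for pi in p) / ((n + 1) * mu * nu))
assert d <= upper + 1e-12
</syntaxhighlight>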
 
=== Approximation by Poisson distribution ===
A Poisson binomial distribution <math>PB</math> can also be approximated by a [[Poisson distribution]] <math>Po</math> with mean <math>\lambda=\sum_{i=1}^n p_i
</math>. Barbour and Hall have shown that
 
: <math>\frac{1}{32}\min\left\{\,\frac{1}{\lambda},1\,\right\} \sum_{i=1}^n p_i^2\le d(PB, Po)\le\frac{1-e^{-\lambda}}{\lambda} \sum_{i=1}^n p_i^2</math>
 
where <math>d(PB,Po)</math> is the total variation distance of <math>PB</math> and <math>Po</math>.<ref>{{Cite journal
| last1=Barbour | first1=A. D.
| last2=Hall | first2=Peter
| date=1984
| title=On the Rate of Poisson Convergence | url=https://www.zora.uzh.ch/id/eprint/23054/11/Barbour_1984V.pdf
| journal=[[Mathematical Proceedings of the Cambridge Philosophical Society]]
| volume=95
| issue=3
| pages=473–480
| doi=10.1017/S0305004100061806}}</ref> It can be seen that the smaller the <math>p_i</math>, the better <math>Po</math> approximates <math>PB</math>.
 
As <math>\operatorname{Var}(Po)=\lambda=\sum_{i=1}^n p_i</math> and <math>\operatorname{Var}(PB) =\sum\limits_{i=1}^n p_i-\sum\limits_{i=1}^n p_i^2</math>, we have <math>\operatorname{Var}(Po)>\operatorname{Var}(PB)</math>; so a Poisson binomial distribution's variance is bounded above by that of the Poisson distribution with <math>\lambda=\sum_{i=1}^n p_i</math>, and the smaller the <math>p_i</math>, the closer <math>\operatorname{Var}(Po)</math> will be to <math>\operatorname{Var}(PB)</math>.
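The Barbour–Hall upper bound can likewise be illustrated numerically, reusing the <code>pb_pmf_dc</code> sketch from above (small <math>p_i</math> are chosen so the Poisson approximation is tight):

<syntaxhighlight lang="python">
import math

p = [0.02, 0.05, 0.1, 0.08]
lam = sum(p)
pb = pb_pmf_dc(p)
po = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(len(pb))]

# Half the L1 distance, plus the Poisson mass beyond n (where PB has none).
d = 0.5 * (sum(abs(a - b) for a, b in zip(pb, po)) + (1 - sum(po)))
upper = (1 - math.exp(-lam)) / lam * sum(pi**2 for pi in p)
assert d <= upper + 1e-12
</syntaxhighlight>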
 
== Computational methods ==
 
Hong (2013)<ref name="hong2013">
{{cite journal |last1=Hong |first1=Yili |title=On computing the distribution function for the Poisson binomial distribution |journal=Computational Statistics & Data Analysis |date=March 2013 |volume=59 |pages=41–51 | doi = 10.1016/j.csda.2012.10.006}}
</ref> discusses techniques for evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:
* An R package [https://CRAN.R-project.org/package=poibin ''poibin''] was provided along with the paper,<ref name="hong2013"/> and computes the CDF, PMF, quantile function, and random numbers of the Poisson binomial distribution. For the PMF, either a DFT algorithm or a recursive algorithm can be selected to compute the exact values, and approximation methods using the normal and Poisson distributions are also available.
* [https://github.com/tsakim/poibin ''poibin'' – Python implementation] – computes the PMF and CDF using the DFT method described in the paper.
 
==See also==