{{Short description|Probability distribution}}
{{Probability distribution
| name = Poisson binomial
| entropy =
| mgf = <math>\prod\limits_{j=1}^n (1-p_j+p_j e^t)</math>
| cf = <math>\prod\limits_{j=1}^n (1-p_j+p_j e^{it})</math>
| pgf = <math>\prod\limits_{j=1}^n (1-p_j+p_j z)</math>
}}
In [[probability theory]] and [[statistics]], the '''Poisson binomial distribution''' is the [[discrete probability distribution]] of a sum of [[statistical independence|independent]] [[Bernoulli trial]]s that are not necessarily identically distributed. The concept is named after [[Siméon Denis Poisson]].
In other words, it is the probability distribution of the number of successes in a collection of ''n'' [[statistical independence|independent]] yes/no experiments with success [[probability|probabilities]] <math>p_1, p_2, \dots , p_n</math>. The ordinary [[binomial distribution]] is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is <math>p_1 = p_2 = \cdots = p_n</math>.
 
==Definitions==
 
===Probability mass function===
 
The probability of having ''k'' successful trials out of a total of ''n'' can be written as the sum
<ref name=":1">
{{Cite journal
| volume = 3
Line 50 ⟶ 44:
:<math>\Pr(K=k) = \sum\limits_{A\in F_k} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j) </math>
 
where <math>F_k</math> is the set of all subsets of ''k'' integers that can be selected from <math>\{1,2,3,\ldots,n\}</math>. For example, if ''n''&nbsp;=&nbsp;3, then <math>F_2=\left\{ \{1,2\},\{1,3\},\{2,3\} \right\}</math>. <math>A^c</math> is the [[Complement (set theory)|complement]] of <math>A</math>, i.e. <math>A^c =\{1,2,3,\ldots,n\}\smallsetminus A</math>.
 
<math>F_k</math> contains <math>n!/((n-k)!k!)</math> elements, so the sum is infeasible to compute in practice unless the number of trials ''n'' is small (e.g. if ''n''&nbsp;=&nbsp;30, <math>F_{15}</math> contains over 10<sup>8</sup> elements). However, there are other, more efficient ways to calculate <math>\Pr(K=k)</math>.
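For small ''n'', the defining sum can be evaluated literally. The following is a minimal Python sketch of this enumeration (the function name <code>pb_pmf_enum</code> is chosen here for illustration, not taken from any library):

<syntaxhighlight lang="python">
from itertools import combinations
from math import prod

# Sum over every k-subset A of {0, ..., n-1}, per the formula above.
# Practical only for small n, since F_k has n-choose-k elements.
def pb_pmf_enum(k, p):
    n = len(p)
    total = 0.0
    for A in combinations(range(n), k):
        Ac = set(range(n)) - set(A)              # complement of A
        total += prod(p[i] for i in A) * prod(1 - p[j] for j in Ac)
    return total
</syntaxhighlight>

For example, <code>pb_pmf_enum(2, [0.1, 0.5, 0.9])</code> sums over the three subsets in <math>F_2</math>.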
As long as none of the success probabilities are equal to one, the probability of ''k'' successes can be calculated using the recursive formula

: <math> \Pr (K=k)= \begin{cases}
\prod\limits_{i=1}^n (1-p_i) & k=0 \\
\frac{1}{k} \sum\limits_{i=1}^k (-1)^{i-1}\Pr (K=k-i)T(i) & k>0 \\
\end{cases} </math>
 
where
: <math> T(i)=\sum\limits_{j=1}^n \left( \frac{p_j}{1-p_j} \right)^i.</math>
 
The recursive formula is not [[numerically stable]], and should be avoided if <math>n</math> is greater than approximately 20.
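For illustration, a direct Python transcription of this recursion (a minimal sketch; the function name is ours, and every <math>p_i</math> must be strictly less than one):

<syntaxhighlight lang="python">
from math import prod

# Recursive formula: Pr(K=k) = (1/k) * sum_{i=1}^{k} (-1)^(i-1) Pr(K=k-i) T(i),
# with T(i) = sum_j (p_j/(1-p_j))^i. Numerically unstable for large n.
def pb_pmf_recursive(p):
    n = len(p)
    odds = [pi / (1 - pi) for pi in p]
    T = [None] + [sum(o**i for o in odds) for i in range(1, n + 1)]
    pmf = [prod(1 - pi for pi in p)] + [0.0] * n          # pmf[0] = Pr(K = 0)
    for k in range(1, n + 1):
        pmf[k] = sum((-1)**(i - 1) * pmf[k - i] * T[i]
                     for i in range(1, k + 1)) / k
    return pmf
</syntaxhighlight>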
 
An alternative is to use a [[divide-and-conquer algorithm]]: if we assume <math>n = 2^b</math> is a power of two, and denote by <math>f(p_{i:j})</math> the probability mass function of the Poisson binomial distribution of <math>p_i, \dots, p_j</math> and by <math>*</math> the [[Convolution of probability distributions|convolution]] operator, then <math>f(p_{1:2^b}) = f(p_{1:2^{b-1}})*f(p_{2^{b-1}+1:2^b})</math>.
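A minimal Python sketch of this divide-and-conquer idea (the function name is ours; unlike the notation above, it does not require <math>n</math> to be a power of two):

<syntaxhighlight lang="python">
import numpy as np

# The PMF of the whole sum is the convolution of the PMFs of the two halves;
# a single Bernoulli variable has PMF [1-p, p].
def pb_pmf_divide(p):
    if len(p) == 1:
        return np.array([1 - p[0], p[0]])
    mid = len(p) // 2
    return np.convolve(pb_pmf_divide(p[:mid]), pb_pmf_divide(p[mid:]))
</syntaxhighlight>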
 
More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors <math>P_1, \dots, P_n</math> where <math>P_i=[1-p_i,p_i]</math>. This observation leads to the direct convolution (DC) algorithm for computing <math>\Pr (K=0) </math> through <math>\Pr (K=n) </math>:
<small>''// PMF and nextPMF begin at index 0''
 '''function''' DC(<math>p_1, \dots, p_n</math>) '''is'''
     declare new PMF array of size 1
     PMF[0] = 1
     '''for''' i = 1 '''to''' <math>n</math> '''do'''
         declare new nextPMF array of size i + 1
         nextPMF[0] = (1 - <math>p_i</math>) * PMF[0]
         nextPMF[i] = <math>p_i</math> * PMF[i - 1]
         '''for''' k = 1 '''to''' i - 1 '''do'''
             nextPMF[k] = <math>p_i</math> * PMF[k - 1] + (1 - <math>p_i</math>) * PMF[k]
         '''repeat'''
         PMF = nextPMF
     '''repeat'''
     '''return''' PMF
 '''end function'''</small>
<math>\Pr (K=k)</math> will be found in PMF[k]. DC is numerically stable, exact, and, when implemented as a software routine, exceptionally fast for <math>n \leq 2000</math>. It can also be quite fast for larger <math>n</math>, depending on the distribution of the <math>p_i</math>.<ref name=":0" />
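The pseudocode translates directly to Python. The following minimal sketch (the function name is ours) expresses each step of DC as a convolution of the running PMF with <math>[1-p_i, p_i]</math>:

<syntaxhighlight lang="python">
# pmf[k] holds Pr(K = k) after all of p has been processed.
def pb_pmf_dc(p):
    pmf = [1.0]                                  # empty sum: Pr(K = 0) = 1
    for pi in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):           # convolve pmf with [1-pi, pi]
            nxt[k] += (1 - pi) * mass
            nxt[k + 1] += pi * mass
        pmf = nxt
    return pmf
</syntaxhighlight>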
 
Another possibility is using the [[discrete Fourier transform]].<ref>
{{Cite journal
| volume = 46
| issue = 2
| pages = 803–817
| last = Fernández
| first = M.
| author2 = S. Williams
| title = Closed-Form Expression for the Poisson-Binomial Probability Density Function
| journal = IEEE Transactions on Aerospace and Electronic Systems
| year = 2010
| doi = 10.1109/TAES.2010.5461658
| bibcode = 2010ITAES..46..803F
| s2cid = 1456258
}}
</ref>
 
:<math>\Pr (K=k)=\frac{1}{n+1} \sum_{\ell=0}^n C^{-\ell k} \prod_{m=1}^n \left( 1+(C^\ell-1) p_m \right) </math>
 
where <math>C=\exp \left( \frac{2i\pi }{n+1} \right)</math> and <math>i=\sqrt{-1}</math>.
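A literal Python transcription of this formula (a sketch; the function name is ours) is shown below. This naive version costs <math>O(n^3)</math> operations; practical implementations evaluate the inner products once per <math>\ell</math> and use a fast Fourier transform:

<syntaxhighlight lang="python">
import cmath

def pb_pmf_dft(p):
    n = len(p)
    C = cmath.exp(2j * cmath.pi / (n + 1))
    pmf = []
    for k in range(n + 1):
        s = 0j
        for l in range(n + 1):
            term = C**(-l * k)                   # C^{-lk}
            for pm in p:
                term *= 1 + (C**l - 1) * pm      # product over m
            s += term
        pmf.append((s / (n + 1)).real)           # imaginary parts cancel
    return pmf
</syntaxhighlight>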
 
Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu<ref>{{Cite journal
| volume = 7
| pages = 875–892
| last = Chen
| first = S. X.
|author2=J. S. Liu
| title = Statistical Applications of the Poisson–Binomial and conditional Bernoulli distributions
| journal = Statistica Sinica
| year = 1997
| url = http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=7&num=4&art=4
}}</ref> and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.<ref name=":0">{{Cite journal |last1=Biscarri |first1=William |last2=Zhao |first2=Sihai Dave |last3=Brunner |first3=Robert J. |date=2018-06-01 |title=A simple and fast method for computing the Poisson binomial distribution function |url=https://www.sciencedirect.com/science/article/pii/S0167947318300082 |journal=Computational Statistics & Data Analysis |volume=122 |pages=92–100 |doi=10.1016/j.csda.2018.01.007 |issn=0167-9473}}</ref>
 
=== Cumulative distribution function ===
The [[cumulative distribution function]] (CDF) can be expressed as:
 
: <math>\Pr(K \leq k) = \sum^{k}_{\ell=0} \sum_{A\in F_\ell} \prod_{i\in A} p_i \prod_{j\in A^c} (1-p_j), </math>
 
where <math>F_\ell</math> is the set of all subsets of size <math>\ell</math> that can be selected from <math>\{1,2,3,\ldots,n\}</math>.
 
It can be computed by invoking the DC function above, and then adding elements <math>0</math> through <math>k</math> of the returned PMF array.
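In terms of the <code>pb_pmf_dc</code> sketch given earlier, this is a prefix sum:

<syntaxhighlight lang="python">
# Pr(K <= k), reusing the pb_pmf_dc sketch above.
def pb_cdf_dc(k, p):
    return sum(pb_pmf_dc(p)[:k + 1])
</syntaxhighlight>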
 
== Properties ==
===Mean and variance===
 
Since a Poisson binomial distributed variable is a sum of ''n'' independent Bernoulli distributed variables, its mean and variance are simply the sums of the means and variances of the ''n'' Bernoulli distributions:

: <math>\mu = \sum\limits_{i=1}^n p_i</math>

: <math>\sigma^2 =\sum\limits_{i=1}^n (1-p_i) p_i</math>

For fixed values of the mean (<math>\mu</math>) and size (''n''), the variance is maximal when all success probabilities are equal and we have a binomial distribution. When the mean is fixed, the variance is bounded from above by the variance of the Poisson distribution with the same mean, which is attained asymptotically{{citation needed|date=July 2019}} as ''n'' tends to infinity.
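These formulas can be checked numerically against any of the PMF sketches given earlier, for example <code>pb_pmf_dc</code>:

<syntaxhighlight lang="python">
# Example success probabilities; any values in [0, 1) work here.
p = [0.1, 0.4, 0.7, 0.9]
pmf = pb_pmf_dc(p)
mu = sum(k * m for k, m in enumerate(pmf))
var = sum((k - mu)**2 * m for k, m in enumerate(pmf))
assert abs(mu - sum(p)) < 1e-12
assert abs(var - sum(pi * (1 - pi) for pi in p)) < 1e-12
</syntaxhighlight>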
 
===Entropy===
There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number of trials and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.<ref>{{Cite journal
| volume = 47 | issue = 5
| pages = 2039–2041
| last = Harremoës
| first = P.
| title = Binomial and Poisson distributions as maximum entropy distributions
| journal = IEEE Transactions on Information Theory
| year = 2001
| url = https://web.archive.org/web/20160116154755id_/http://yaroslavvb.com/papers/harremoes-binomial.pdf
| doi=10.1109/18.930936
}}</ref>
The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities <math>p_1, p_2, \dots , p_n</math>.<ref>{{Cite book
|last1 = Shepp
|first1 = Lawrence
|last2 = Olkin
|first2 = Ingram
|chapter = Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution
|title = Contributions to Probability: A Collection of Papers Dedicated to Eugene Lukacs
|publisher = Academic Press
|location = New York
|year = 1981
|mr = 0618689
|url = https://books.google.com/books?id=9qziBQAAQBAJ
|isbn = 0-12-274460-8
}}</ref> This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015.<ref>{{Cite journal |last1=Hillion|first1=Erwan|last2=Johnson|first2=Oliver|date=2015-03-05|title=A proof of the Shepp–Olkin entropy concavity conjecture |journal=Bernoulli|volume = 23|issue=4B|pages = 3638–3649 | arxiv=1503.01570 |doi=10.3150/16-BEJ860|s2cid=8358662}}</ref> The Shepp–Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in <math>p_i</math>, if all <math>p_i \leq 1/2</math>. This conjecture was also proved by Hillion and Johnson, in 2019.<ref>{{Cite journal |last1=Hillion|first1=Erwan|last2=Johnson|first2=Oliver|date=2019-11-09|title=A proof of the Shepp–Olkin entropy monotonicity conjecture |journal=Electronic Journal of Probability| volume = 24 |number=126 | pages = 1–14 |doi=10.1214/19-EJP380|doi-access=free|arxiv=1810.09791}}</ref>
 
===Chernoff bound===
 
The probability that a Poisson binomial distributed variable gets large can be bounded using its moment generating function as follows (valid when <math>s \geq \mu</math> and for any <math>t>0</math>):
 
: <math>
\begin{align}
\Pr(K \geq s) & \leq \exp(-st) \operatorname{E}\left[e^{tK}\right] \\
& = \exp(-st) \prod_{i=1}^n \left(1-p_i+p_i e^t\right) \\
& \leq \exp(-st) \prod_{i=1}^n \exp\left(p_i\left(e^t-1\right)\right) \\
& = \exp\left(-st+\mu\left(e^t-1\right)\right) \\
& = \exp\left(s-\mu-s\log \frac{s}{\mu}\right)
\end{align}
</math>
 
where we used <math display="inline">1+x \leq e^x</math> in the third line and took <math display="inline">t=\log\left(s/\mu\right)</math>. This is similar to the [[Binomial distribution#Tail bounds|tail bounds of a binomial distribution]].
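The bound can be checked numerically, reusing the <code>pb_pmf_dc</code> sketch from above (the example probabilities are arbitrary):

<syntaxhighlight lang="python">
import math

p = [0.2, 0.3, 0.5, 0.8]
mu = sum(p)
pmf = pb_pmf_dc(p)
for s in range(math.ceil(mu), len(p) + 1):       # the bound holds for s >= mu
    exact = sum(pmf[s:])                         # Pr(K >= s)
    bound = math.exp(s - mu - s * math.log(s / mu))
    assert exact <= bound + 1e-12
</syntaxhighlight>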
 
== Related distributions ==
=== Approximation by binomial distribution ===
A Poisson binomial distribution <math>PB</math> can be approximated by a binomial distribution <math>B</math> where <math>\mu</math>, the mean of the <math>p_i</math>, is the success probability of <math>B</math>. The variances of <math>PB</math> and <math>B</math> are related by the formula
 
: <math> \operatorname{Var}(PB)=\operatorname{Var}(B)- \sum_{i=1}^n (p_i-\mu)^2</math>
 
As can be seen, the closer the <math>p_i</math> are to <math>\mu</math>, that is, the more the <math>p_i</math> tend to homogeneity, the larger <math>PB</math>'s variance. When all the <math>p_i</math> are equal to <math>\mu</math>, <math>PB</math> becomes <math>B</math>, <math>\operatorname{Var}(PB)=\operatorname{Var}(B)</math>, and the variance is at its maximum.<ref name=":1" />
 
Ehm has determined bounds for the [[Total variation distance of probability measures|total variation distance]] of <math>PB</math> and <math>B</math>, in effect providing bounds on the error introduced when approximating <math>PB</math> with <math>B</math>. Let <math>\nu = 1 - \mu</math> and <math>d(PB,B)</math> be the total variation distance of <math>PB</math> and <math>B</math>. Then
 
: <math>d(PB,B)\le(1-\mu^{n+1}-\nu^{n+1}) \frac{\sum_{i=1}^n (p_i-\mu)^2}{(n+1)\mu\nu}</math>
 
: <math>d(PB,B)\ge C\min\left\{\,1,\frac{1}{n\mu\nu}\,\right\} \sum_{i=1}^n (p_i-\mu)^2</math>
 
where <math>C\ge\frac{1}{124}</math>.
 
<math>d(PB,B)</math> tends to 0 if and only if <math>\operatorname{Var}(PB)/\operatorname{Var}(B)</math> tends to 1.<ref>{{Cite journal |last=Ehm |first=Werner |date=1991-01-01 |title=Binomial approximation to the Poisson binomial distribution |url=https://dx.doi.org/10.1016/0167-7152%2891%2990170-V |journal=Statistics & Probability Letters |volume=11 |issue=1 |pages=7–16 |doi=10.1016/0167-7152(91)90170-V |issn=0167-7152|url-access=subscription }}</ref>
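Both the variance relation and Ehm's upper bound can be illustrated numerically, reusing the <code>pb_pmf_dc</code> sketch from above and taking <math>d</math> as half the <math>L^1</math> distance between the PMFs:

<syntaxhighlight lang="python">
import math

p = [0.2, 0.3, 0.5, 0.8]                          # arbitrary example
n, mu = len(p), sum(p) / len(p)
nu = 1 - mu
binom = [math.comb(n, k) * mu**k * nu**(n - k) for k in range(n + 1)]
pb = pb_pmf_dc(p)

# Var(PB) = Var(B) - sum (p_i - mu)^2, with Var(B) = n*mu*nu
assert abs(sum(pi * (1 - pi) for pi in p)
           - (n * mu * nu - sum((pi - mu)**2 for pi in p))) < 1e-12

d = 0.5 * sum(abs(a - b) for a, b in zip(pb, binom))
upper = ((1 - mu**(n + 1) - nu**(n + 1))
         * sum((pi - mu)**2 for pi in p) / ((n + 1) * mu * nu))
assert d <= upper + 1e-12
</syntaxhighlight>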
 
=== Approximation by Poisson distribution ===
A Poisson binomial distribution <math>PB</math> can also be approximated by a [[Poisson distribution]] <math>Po</math> with mean <math>\lambda=\sum_{i=1}^n p_i
</math>. Barbour and Hall have shown that
 
: <math>\frac{1}{32}\min\left\{\,\frac{1}{\lambda},1\,\right\} \sum_{i=1}^n p_i^2\le d(PB, Po)\le\frac{1-e^{-\lambda}}{\lambda} \sum_{i=1}^n p_i^2</math>
 
where <math>d(PB,Po)</math> is the total variation distance of <math>PB</math> and <math>Po</math>.<ref>{{Cite journal
| last1=Barbour | first1=A. D.
| last2=Hall | first2=Peter
| date=1984
| title=On the Rate of Poisson Convergence | url=https://www.zora.uzh.ch/id/eprint/23054/11/Barbour_1984V.pdf
| journal=[[Mathematical Proceedings of the Cambridge Philosophical Society]]
| volume=95
| issue=3
| pages=473–480
| doi=10.1017/S0305004100061806}}</ref> It can be seen that the smaller the <math>p_i</math>, the better <math>Po</math> approximates <math>PB</math>.
 
As <math>\operatorname{Var}(Po)=\lambda=\sum_{i=1}^n p_i</math> and <math>\operatorname{Var}(PB) =\sum\limits_{i=1}^n p_i-\sum\limits_{i=1}^n p_i^2</math>, we have <math>\operatorname{Var}(Po)>\operatorname{Var}(PB)</math>; so a Poisson binomial distribution's variance is bounded above by that of the Poisson distribution with <math>\lambda=\sum_{i=1}^n p_i</math>, and the smaller the <math>p_i</math>, the closer <math>\operatorname{Var}(Po)</math> will be to <math>\operatorname{Var}(PB)</math>.
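The Barbour–Hall upper bound can likewise be illustrated numerically, reusing the <code>pb_pmf_dc</code> sketch from above (small <math>p_i</math> are chosen so the Poisson approximation is tight):

<syntaxhighlight lang="python">
import math

p = [0.02, 0.05, 0.1, 0.08]
lam = sum(p)
pb = pb_pmf_dc(p)
po = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(len(pb))]

# Half the L1 distance, plus the Poisson mass beyond n (where PB has none).
d = 0.5 * (sum(abs(a - b) for a, b in zip(pb, po)) + (1 - sum(po)))
upper = (1 - math.exp(-lam)) / lam * sum(pi**2 for pi in p)
assert d <= upper + 1e-12
</syntaxhighlight>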
 
== Computational methods ==
 
Hong (2013)<ref name="hong2013">
{{cite journal |last1=Hong |first1=Yili |title=On computing the distribution function for the Poisson binomial distribution |journal=Computational Statistics & Data Analysis |date=March 2013 |volume=59 |pages=41–51 | doi = 10.1016/j.csda.2012.10.006}}
</ref> discusses techniques for evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:
* An R package [https://CRAN.R-project.org/package=poibin ''poibin''] was provided along with the paper,<ref name="hong2013"/> and computes the CDF, PMF, quantile function, and random numbers of the Poisson binomial distribution. For the PMF, either a DFT algorithm or a recursive algorithm can be selected to compute the exact values, and approximation methods using the normal and Poisson distributions are also available.
* [https://github.com/tsakim/poibin ''poibin'' – Python implementation] – computes the PMF and CDF using the DFT method described in the paper.
 
==See also==