Probability distribution

{{Probability fundamentals}}
 
In [[probability theory]] and [[statistics]], a '''probability distribution''' is a [[Function (mathematics)|function]] that gives the probabilities of occurrence of different possible '''events''' for an [[Experiment (probability theory)|experiment]].<ref name=":02">{{Cite book|title=The Cambridge dictionary of statistics|last=Everitt | first = Brian |date=2006|publisher=Cambridge University Press|isbn=978-0-511-24688-3 |edition=3rd|___location=Cambridge, UK|oclc=161828328}}</ref><ref>{{Cite book|title=Basic probability theory|last=Ash, Robert B.|date=2008|publisher=Dover Publications |isbn=978-0-486-46628-6 |edition=Dover |___location=Mineola, N.Y. |pages=66–69|oclc=190785258}}</ref> It is a mathematical description of a [[Randomness|random]] phenomenon in terms of its [[sample space]] and the [[Probability|probabilities]] of [[Event (probability theory)|events]] ([[subset]]s of the sample space).<ref name=":1">{{cite book|title=Probability and statistics: the science of uncertainty|last1=Evans |first1=Michael |date=2010|publisher=W.H. Freeman and Co|last2=Rosenthal |first2=Jeffrey S. |isbn=978-1-4292-2462-8 |edition=2nd|___location=New York|pages=38|oclc=473463742}}</ref>
 
For instance, if {{mvar|X}} is used to denote the outcome of a coin toss ("the experiment"), then the probability distribution of {{mvar|X}} would take the value 0.5 (1 in 2 or 1/2) for {{math|1=''X'' = heads}}, and 0.5 for {{math|1=''X'' = tails}} (assuming that [[fair coin|the coin is fair]]). More commonly, probability distributions are used to compare the relative occurrence of many different random values.
 
==Introduction==
A probability distribution is a mathematical description of the probabilities of events, subsets of the [[sample space]]. The sample space, often represented in notation by <math>\ \Omega\ ,</math> is the [[Set (mathematics)|set]] of all possible [[outcome (probability)|outcomes]] of a random phenomenon being observed. The sample space may be any set: a set of [[real numbers]], a set of descriptive labels, a set of [[vector (mathematics)|vectors]], a set of arbitrary non-numerical values, etc. For example, the sample space of a coin flip could be {{math|{{nobr|&ensp;Ω {{=}} {{big|<nowiki>{</nowiki>}} "heads", "tails" {{big|<nowiki>}</nowiki>}} }}.}}
 
To define probability distributions for the specific case of [[random variables]] (so the sample space can be seen as a numeric set), it is common to distinguish between '''discrete''' and '''absolutely continuous''' [[random variable]]s. In the discrete case, it is sufficient to specify a [[probability mass function]] <math>\ p\ </math> assigning a probability to each possible outcome (e.g. when throwing a fair [[dice|die]], each of the six digits {{math|“1”}} to {{math|“6”}}, corresponding to the number of dots on the die, has the probability <math>\ \tfrac{1}{6} ~).</math> The probability of an [[Event (probability theory)|event]] is then defined to be the sum of the probabilities of all outcomes that satisfy the event; for example, the probability of the event "the die rolls an even value" is
<math display="block">\ p(\text{“}2\text{”}) + p(\text{“}4\text{”}) + p(\text{“}6\text{”}) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2} ~.</math>
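The die computation above can be carried out mechanically; a minimal Python sketch (exact arithmetic via fractions is an implementation choice for illustration):

```python
from fractions import Fraction

# Probability mass function of a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(event, pmf):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(pmf[outcome] for outcome in event if outcome in pmf)

# Event "the die rolls an even value":
p_even = event_probability({2, 4, 6}, pmf)
print(p_even)  # 1/2
```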
In contrast, when a random variable takes values from a continuum, then by convention any individual outcome is assigned probability zero. For such '''continuous random variables''', only events that include infinitely many outcomes, such as intervals, have probability greater than 0.
 
For example, consider measuring the weight of a piece of ham in the supermarket, and assume the scale can provide arbitrarily many digits of precision. Then, the probability that it weighs ''exactly'' 500&nbsp;[[gram|g]] must be zero because no matter how high the level of precision chosen, it cannot be assumed that there are no non-zero digits beyond that precision.
 
However, for the same use case, it is possible to meet quality control requirements such as that a package of "500&nbsp;g" of ham must weigh between 490&nbsp;g and 510&nbsp;g with at least 98% probability. This is possible because this measurement does not require as much precision from the underlying equipment.
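To illustrate (the model below is an assumption, not from the article): if the scale's reading is modeled as, say, normally distributed around the nominal weight, the probability of an interval such as 490–510&nbsp;g follows from the cumulative distribution function:

```python
from statistics import NormalDist

# Hypothetical model: measured weight ~ Normal(mean=500 g, std dev=4 g).
# Both parameters are assumed values for this sketch.
weight = NormalDist(mu=500, sigma=4)

# P(490 g <= weight <= 510 g): an interval event with non-zero probability,
# even though any single exact value (e.g. exactly 500 g) has probability zero.
p_in_spec = weight.cdf(510) - weight.cdf(490)
print(round(p_in_spec, 4))
```

Under these assumed parameters the interval probability exceeds the 98% requirement, even though the probability of any exact weight is zero.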
 
[[File:Combined Cumulative Distribution Graphs.png|thumb|455x455px| Figure 1: The left graph shows a probability density function. The right graph shows the cumulative distribution function. The value at {{font color|#ED1C24|'''a'''}} in the cumulative distribution equals the area under the probability density curve up to the point {{font color|#ED1C24|'''a'''}}.]]
 
Absolutely continuous probability distributions can be described in several ways. The [[probability density function]] describes the [[infinitesimal]] probability of any given value, and the probability that the outcome lies in a given interval can be computed by [[Integration (mathematics)|integrating]] the probability density function over that interval.<ref name=":3"/> An alternative description of the distribution is by means of the [[cumulative distribution function]], which describes the probability that the random variable is no larger than a given value (i.e., {{math|''P''(''X'' ≤ ''x'')}} for some {{mvar|x}}). The cumulative distribution function is the area under the [[probability density function]] from {{math|−∞}} to {{mvar|x}}, as shown in figure 1.<ref name='dekking'>{{cite book |last=Dekking |first=Michel (1946–) |year=2005 |title=A Modern Introduction to Probability and Statistics : Understanding why and how |publisher=Springer |isbn=978-1-85233-896-1 |___location=London, UK |oclc=262680588}}</ref>
 
Most continuous probability distributions encountered in practice are not only continuous but also [[absolutely continuous]]. Such distributions can be described by their [[probability density function]]. Informally, the probability density <math>f</math> of a random variable <math>X</math> describes the [[infinitesimal]] probability that <math>X</math> takes a value near <math>x</math>; that is, <math>P(x \leq X < x + \Delta x) \approx f(x) \, \Delta x</math> as <math>\Delta x > 0</math> becomes arbitrarily small. The probability that <math>X</math> lies in a given interval can be computed rigorously by [[Integration (mathematics)|integrating]] the probability density function over that interval.<ref name=":3"/>
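The approximation <math>P(x \leq X < x + \Delta x) \approx f(x)\,\Delta x</math> can be checked numerically. A Python sketch, using an exponential density as an assumed example (rate <math>\lambda = 2</math> chosen arbitrarily):

```python
import math

lam = 2.0   # rate parameter of the exponential distribution (assumed)
x = 0.5
dx = 1e-6   # a small interval width

def cdf(t):
    """Exponential CDF: P(X <= t) = 1 - exp(-lam * t) for t >= 0."""
    return 1.0 - math.exp(-lam * t)

def pdf(t):
    """Exponential density: f(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t)

exact = cdf(x + dx) - cdf(x)   # P(x <= X < x + dx), via the CDF
approx = pdf(x) * dx           # f(x) * dx
print(exact, approx)           # nearly equal for small dx
```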
 
==General probability definition==
 
Let <math> (\Omega, \mathcal{F}, P) </math> be a [[probability space]], <math> (E, \mathcal{E}) </math> be a [[measurable space]], and <math> X : \Omega \to E </math> be a <math> (E, \mathcal{E}) </math>-valued random variable. Then the '''probability distribution''' of <math>X</math> is the [[pushforward measure]] of the probability measure <math>P</math> onto <math> (E, \mathcal{E}) </math> induced by <math>X</math>. Explicitly, this pushforward measure on <math> (E, \mathcal{E}) </math> is given by
<math display="block">X_{*} (P) (B) = P \left( X^{-1} (B) \right)</math> for <math>B \in \mathcal{E}.</math>
 
Any probability distribution is a [[probability measure]] on <math> (E, \mathcal{E}) </math> (in general different from <math>P</math>, unless <math>X</math> happens to be the identity map).{{cn|date=May 2025}}
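For a finite sample space the pushforward can be computed explicitly. A minimal Python sketch (the two-dice example is an assumed illustration): the distribution of <math>X</math> assigns to each value the total probability of its preimage.

```python
from fractions import Fraction
from collections import defaultdict

# Probability space: ordered pairs from two fair dice, each with probability 1/36.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

# Pushforward measure: X_*(P)({b}) = P(X^{-1}({b})),
# i.e. sum P over the preimage of each value b.
distribution = defaultdict(Fraction)
for w, p in P.items():
    distribution[X(w)] += p

print(distribution[7])  # 1/6
```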
 
A probability distribution can be described in various forms, such as by a probability mass function or a cumulative distribution function. One of the most general descriptions, which applies to absolutely continuous and discrete variables, is by means of a probability function <math>P \colon \mathcal{A} \to \Reals</math> whose '''input space''' <math>\mathcal{A}</math> is a [[σ-algebra]] and whose output is a [[real number]] '''probability''', specifically a number in <math>[0,1] \subseteq \Reals</math>.
 
The probability function <math>P</math> can take as argument subsets of the sample space itself, as in the coin toss example, where the function <math>P</math> was defined so that {{math|1=''P''(heads) = 0.5}} and {{math|1=''P''(tails) = 0.5}}. However, because of the widespread use of [[random variables]], which transform the sample space into a set of numbers (e.g., <math>\R</math>, <math>\N</math>), it is more common to study probability distributions whose arguments are subsets of these particular kinds of sets (number sets),<ref>{{cite book| last1 = Walpole | first1 = R.E. | last2 = Myers | first2 = R.H. | last3 = Myers | first3 = S.L. | last4 = Ye | first4 = K.|year=1999|title=Probability and statistics for engineers|publisher=Prentice Hall}}</ref> and all probability distributions discussed in this article are of this type. It is common to denote as <math>P(X \in E)</math> the probability that a certain value of the variable <math>X</math> belongs to a certain event <math>E</math>.<ref name='ross' /><ref name='degroot' />
 
=== Basic terms ===
*''[[Random variable]]'': takes values from a sample space; probabilities describe which values and sets of values are more likely to be taken.
*''[[Event (probability theory)|Event]]'': set of possible values (outcomes) of a random variable that occurs with a certain probability.
*''[[Probability measure|Probability function]]'' or ''probability measure'': describes the probability <math>P(X \in E)</math> that the event <math>E</math> occurs.<ref name='vapnik'>Chapters 1 and 2 of {{harvp|Vapnik|1998}}</ref>
Conversely, any function <math>F:\mathbb{R}\to\mathbb{R}</math> that satisfies the first four of the properties above is the cumulative distribution function of some probability distribution on the real numbers.<ref>{{Cite book|title=Probability and stochastics|last=Erhan|first=Çınlar|date=2011|publisher=Springer|isbn=9780387878584|___location=New York|pages=57}}</ref>
 
Any probability distribution can be decomposed as the [[mixture distribution|mixture]] of a [[Discrete probability distribution|discrete]], an [[Absolutely continuous probability distribution|absolutely continuous]] and a [[Singular distribution|singular continuous distribution]],<ref>see [[Lebesgue's decomposition theorem]]</ref> and thus any cumulative distribution function admits a decomposition as the [[convex sum]] of the three corresponding cumulative distribution functions.
 
==Discrete probability distribution==
Thus the cumulative distribution function has the form
<math display="block">F(x) = P(X \leq x) = \sum_{\omega \leq x} p(\omega).</math>
 
The points where the cdf jumps always form a countable set; this may be any countable set and thus may even be dense in the real numbers.
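The summation formula for <math>F</math> translates directly into code; a Python sketch using the fair-die mass function as an assumed example:

```python
from fractions import Fraction

# Probability mass function of a fair die (assumed example).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x, pmf):
    """F(x) = P(X <= x): sum of p(w) over all outcomes w with w <= x."""
    return sum(p for w, p in pmf.items() if w <= x)

# The CDF is a step function: it jumps by p(w) at each outcome w
# and is constant in between.
print(cdf(3.5, pmf))  # 1/2, since outcomes 1, 2, 3 lie at or below 3.5
```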
 
===Dirac delta representation===
A discrete probability distribution is often represented with [[Dirac measure]]s, also called one-point distributions (see below), which are the probability distributions of [[Degenerate distribution|deterministic random variable]]s. For any outcome <math>\omega</math>, let <math>\delta_\omega</math> be the Dirac measure concentrated at <math>\omega</math>. Given a discrete probability distribution, there is a countable set <math>A</math> with <math>P(X \in A) = 1</math> and a probability mass function <math>p</math>. If <math>E</math> is any event, then
<math display="block">P(X \in E) = \sum_{\omega \in A} p(\omega) \delta_\omega(E),</math>
or in short,
<math display="block">P_X = \sum_{\omega \in A} p(\omega) \delta_\omega.</math>
 
Similarly, discrete distributions can be represented with the [[Dirac delta function]] as a [[Generalized function|generalized]] [[probability density function]] <math>f</math>, where
<math display="block">f(x) = \sum_{\omega \in A} p(\omega) \delta(x - \omega),</math>
which means
<math display="block">P(X \in E) = \int_E f(x) \, dx = \sum_{\omega \in A} p(\omega) \int_E \delta(x - \omega) \, dx = \sum_{\omega \in A \cap E} p(\omega)</math>
for any event <math>E.</math><ref>{{Cite journal|last=Khuri|first=André I.|date=March 2004| title=Applications of Dirac's delta function in statistics|journal=International Journal of Mathematical Education in Science and Technology| language=en|volume=35|issue=2|pages=185–195| doi=10.1080/00207390310001638313|s2cid=122501973|issn=0020-739X}}</ref>
 
===Indicator-function representation===
For a discrete random variable <math>X</math>, let <math>u_0, u_1, \dots</math> be the values it can take with non-zero probability. Denote
 
<math display="block">\Omega_i=X^{-1}(u_i)= \{\omega: X(\omega)=u_i\},\, i=0, 1, 2, \dots</math>
 
These are [[disjoint set]]s, and for such sets
 
<math display="block">P\left(\bigcup_i \Omega_i\right)=\sum_i P(\Omega_i)=\sum_i P(X=u_i)=1.</math>
 
It follows that the probability that <math>X</math> takes any value except for <math>u_0, u_1, \dots</math> is zero, and thus one can write <math>X</math> as
 
<math display="block">X(\omega)=\sum_i u_i 1_{\Omega_i}(\omega)</math>
 
except on a set of probability zero, where <math>1_A</math> is the indicator function of <math>A</math>. This may serve as an alternative definition of discrete random variables.
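The representation <math>X(\omega)=\sum_i u_i 1_{\Omega_i}(\omega)</math> can be verified on a toy finite sample space; the data below are an assumed illustration:

```python
# Toy sample space and a discrete random variable on it (assumed example).
omega = ["a", "b", "c", "d"]
X = {"a": 0, "b": 1, "c": 1, "d": 2}

values = sorted(set(X.values()))  # the values u_0, u_1, ... taken by X
# Preimages Omega_i = X^{-1}(u_i); these sets are pairwise disjoint.
preimages = {u: {w for w in omega if X[w] == u} for u in values}

def indicator(A, w):
    """Indicator function 1_A(w): 1 if w is in A, else 0."""
    return 1 if w in A else 0

# X(w) equals the sum of u_i * 1_{Omega_i}(w) for every outcome w.
for w in omega:
    assert X[w] == sum(u * indicator(preimages[u], w) for u in values)
print("representation verified")
```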
 
===One-point distribution===
A special case is the discrete distribution of a random variable that can take on only one fixed value; in other words, it is a [[Dirac measure]]. Expressed formally, the random variable <math>X</math> has a one-point distribution if it has a possible outcome <math>x</math> such that <math>P(X{=}x)=1.</math><ref>{{cite book |title=Probability Theory and Mathematical Statistics |first=Marek |last=Fisz |edition=3rd |publisher=John Wiley & Sons |year=1963 |isbn=0-471-26250-1 |page=129}}</ref> All other possible outcomes then have probability 0. Its cumulative distribution function jumps immediately from 0 before <math>x</math> to 1 at <math>x</math>. It is closely related to a deterministic distribution, which cannot take on any other value, while a one-point distribution can take other values, though only with probability 0. For most practical purposes the two notions are equivalent.
 
== Absolutely continuous probability distribution==
Most algorithms are based on a [[pseudorandom number generator]] that produces numbers <math>X</math> that are uniformly distributed in the [[half-open interval]] {{closed-open|0, 1}}. These [[random variate]]s <math>X</math> are then transformed via some algorithm to create a new random variate having the required probability distribution. With this source of uniform pseudo-randomness, realizations of any random variable can be generated.<ref name=":0">{{Citation|last1=Dekking|first1=Frederik Michel| title=Why probability and statistics?|date=2005|work=A Modern Introduction to Probability and Statistics| pages=1–11| publisher =Springer London|isbn=978-1-85233-896-1|last2=Kraaikamp|first2=Cornelis| last3=Lopuhaä|first3=Hendrik Paul| last4=Meester| first4=Ludolf Erwin| doi=10.1007/1-84628-168-7_1}}</ref>
 
For example, suppose {{mvar|U}} has a uniform distribution between 0 and 1. To construct a random Bernoulli variable for some {{math|0 < ''p'' < 1}}, we define
<math display="block">X = \begin{cases}
1,& \text{if } U<p\\
0,& \text{if } U\geq p.
\end{cases}</math>
so that
<math display="block">P(X=1) = P(U<p) = p, \quad P(X=0) = P(U\geq p) = 1-p.</math>
Therefore, the random variable {{mvar|X}} has a Bernoulli distribution with parameter {{mvar|p}}.<ref name=":0"/> This is a transformation of a discrete random variable.
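A minimal Python sketch of this transformation (using the standard library's pseudorandom generator; the parameter {{math|''p'' {{=}} 0.3}} is an assumed example):

```python
import random

def bernoulli(p, u=None):
    """Transform a uniform variate U on [0, 1) into a Bernoulli(p) variate:
    X = 1 if U < p, else 0."""
    if u is None:
        u = random.random()  # pseudorandom uniform on [0, 1)
    return 1 if u < p else 0

# Empirical check: the fraction of ones approaches p.
random.seed(0)
samples = [bernoulli(0.3) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 0.3
```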
 
 
This method can be adapted to generate real-valued random variables with any distribution: for any cumulative distribution function {{mvar|F}}, let {{math|''F''{{sup|inv}}}} be the generalized left inverse of <math>F,</math> also known in this context as the ''[[quantile function]]'' or ''inverse distribution function'':
<math display="block">F^{\mathrm{inv}}(p) = \inf \{x \in \mathbb{R} : p \le F(x)\}.</math>
Then, {{math|''F''{{sup|inv}}(''p'') ≤ ''x''}} if and only if {{math|''p'' ≤ ''F''(''x'')}}. As a result, if {{mvar|U}} is uniformly distributed on {{math|[0, 1]}}, then the cumulative distribution function of {{math|''X'' {{=}} ''F''{{sup|inv}}(''U'')}} is {{mvar|F}}.
 
For example, suppose we want to generate a random variable having an exponential distribution with parameter <math>\lambda</math>; that is, with cumulative distribution function <math>F : x \mapsto 1 - e^{-\lambda x}.</math>
<math display="block">\begin{align}
F(x) = u &\Leftrightarrow 1-e^{-\lambda x} = u \\[2pt]
&\Leftrightarrow e^{-\lambda x} = 1-u \\[2pt]
&\Leftrightarrow x = \frac{-1}{\lambda}\ln(1-u)
\end{align}</math>
so <math>F^{\mathrm{inv}}(u) = -\tfrac{1}{\lambda}\ln(1-u)</math>, and if {{mvar|U}} has a uniform distribution on {{math|[0, 1)}}, then the random variable <math>X = F^{\mathrm{inv}}(U) = -\tfrac{1}{\lambda}\ln(1-U)</math> has an exponential distribution with parameter <math>\lambda</math>.<ref name=":0" />
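A Python sketch of this inverse-transform construction (rate <math>\lambda = 2</math> is an assumed example):

```python
import math
import random

def exponential_variate(lam, u=None):
    """Inverse-transform sampling: X = F_inv(U) = -ln(1 - U) / lam
    has an exponential distribution with rate lam."""
    if u is None:
        u = random.random()  # pseudorandom uniform on [0, 1)
    return -math.log(1.0 - u) / lam

# Empirical check: the sample mean approaches 1/lam.
random.seed(0)
xs = [exponential_variate(2.0) for _ in range(100_000)]
print(sum(xs) / len(xs))  # close to 0.5
```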
 
Although from a theoretical point of view this method always works, in practice the inverse distribution function is unknown and/or cannot be computed efficiently. In this case, other methods (such as the [[Monte Carlo method]]) are used.
 
== Common probability distributions and their applications ==