{{Probability fundamentals}}
In [[probability theory]] and [[statistics]], a '''probability distribution''' is a function that gives the probabilities of occurrence of possible events for an experiment.
For instance, if {{mvar|X}} is used to denote the outcome of a coin toss ("the experiment"), then the probability distribution of {{mvar|X}} would take the value 0.5 (1 in 2 or 1/2) for {{math|1=''X'' = heads}}, and 0.5 for {{math|1=''X'' = tails}} (assuming that [[fair coin|the coin is fair]]). More commonly, probability distributions are used to compare the relative occurrence of many different random values.
==Introduction==
A probability distribution is a mathematical description of the probabilities of events, subsets of the [[sample space]]. The sample space, often represented in notation by <math>\ \Omega\ ,</math> is the [[Set (mathematics)|set]] of all possible [[outcome (probability)|outcomes]] of a random phenomenon being observed.
To define probability distributions for the specific case of [[random variables]] (so the sample space can be seen as a numeric set), it is common to distinguish between '''discrete''' and '''absolutely continuous''' random variables. In the discrete case, it is sufficient to specify a [[probability mass function]] <math>p</math> assigning a probability to each possible outcome: for example, when throwing a fair [[dice|die]], each of the six values 1 to 6 has probability 1/6, and the probability of an event such as "the die rolls an even value" is
<math display="block">p(2) + p(4) + p(6) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2}.</math>
In contrast, when a random variable takes values from a continuum then, by convention, any individual outcome is assigned probability zero. For such '''continuous random variables''', only events that include infinitely many outcomes such as intervals have probability greater than 0.

For example, consider measuring the weight of a piece of ham in the supermarket, and assume the scale can provide arbitrarily many digits of precision. Then, the probability that it weighs ''exactly'' 500 [[gram|g]] must be zero because no matter how high the level of precision chosen, it cannot be assumed that there are no non-zero decimal digits in the remaining digits omitted by the precision level.
However, for the same use case, it is possible to meet quality control requirements such as that a package of "500 g" of ham must weigh between 490 g and 510 g with at least 98% probability. This is possible because this measurement does not require as much precision from the underlying equipment.
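The quality-control requirement above can be made concrete with a small calculation. The following sketch assumes, purely for illustration, that the package weight follows a normal distribution with mean 500 g and standard deviation 4 g (these parameters are not from the article); it shows that an individual exact value has probability zero while the interval [490 g, 510 g] carries more than 98% of the probability.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Normal(mu, sigma) variable."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Hypothetical model: package weight ~ Normal(mean=500 g, sd=4 g).
p_exact = normal_cdf(500, 500, 4) - normal_cdf(500, 500, 4)  # P(X == 500) is 0
p_interval = normal_cdf(510, 500, 4) - normal_cdf(490, 500, 4)
print(p_exact)                 # 0.0
print(round(p_interval, 4))    # 0.9876, i.e. above the 98% requirement
```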
[[File:Combined Cumulative Distribution Graphs.png|thumb|455x455px|Figure 1: The left graph shows a probability density function. The right graph shows the cumulative distribution function; the value of the cumulative distribution at a point equals the area under the density curve up to that point.]]
Most continuous probability distributions encountered in practice are not only continuous but also [[absolutely continuous]]. Such distributions can be described by their [[probability density function]]. Informally, the probability density <math>f</math> of a random variable <math>X</math> describes the [[infinitesimal]] probability that <math>X</math> takes any value <math>x</math> — that is, <math>P(x \leq X < x + \Delta x) \approx f(x) \, \Delta x</math> as <math>\Delta x</math> becomes arbitrarily small. The probability that <math>X</math> lies in a given interval can be computed rigorously by [[Integration (mathematics)|integrating]] the probability density function over that interval.<ref name=":3"/>
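The relation between density and interval probability can be checked numerically. This sketch uses an exponential density (rate chosen arbitrarily for illustration) and verifies that a Riemann-sum integral of the density over an interval matches the closed-form difference of the cumulative distribution function.

```python
import math

lam = 2.0  # rate of an illustrative exponential distribution

def f(x):
    """Probability density of Exponential(rate=lam)."""
    return lam * math.exp(-lam * x)

# P(a <= X <= b) by numerically integrating the density (midpoint rule) ...
a, b, n = 0.5, 1.5, 100_000
h = (b - a) / n
p_numeric = sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# ... agrees with the closed-form CDF difference F(b) - F(a).
p_exact = (1 - math.exp(-lam * b)) - (1 - math.exp(-lam * a))
print(abs(p_numeric - p_exact) < 1e-6)  # True
```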
==General probability definition==
Let <math> (\Omega, \mathcal{F}, P) </math> be a [[probability space]], <math> (E, \mathcal{E}) </math> be a [[measurable space]], and <math> X : \Omega \to E </math> be a <math> (E, \mathcal{E}) </math>-valued random variable. Then the '''probability distribution''' of <math>X</math> is the [[pushforward measure]] of the probability measure <math>P</math> onto <math> (E, \mathcal{E}) </math> induced by <math>X</math>. Explicitly, this pushforward measure on <math> (E, \mathcal{E}) </math> is given by
<math display="block">X_{*} (P) (B) = P \left( X^{-1} (B) \right)</math> for <math>B \in \mathcal{E}.</math>
Any probability distribution is a [[probability measure]] on <math> (E, \mathcal{E}) </math> (in general different from <math>P</math>, unless <math>X</math> happens to be the identity map).{{cn|date=May 2025}}
A probability distribution can be described in various forms, such as by a probability mass function or a cumulative distribution function. One of the most general descriptions, which applies for absolutely continuous and discrete variables, is by means of a probability function <math>P \colon \mathcal{A} \to \Reals</math> whose '''input space''' <math>\mathcal{A}</math> is a [[σ-algebra]], and gives a [[real number]] '''probability''' as its output, particularly, a number in <math>[0,1] \subseteq \Reals</math>.
The probability function <math>P</math> can take as argument subsets of the sample space itself, as in the coin toss example, where the function <math>P</math> was defined so that {{math|1=''P''(heads) = 0.5}} and {{math|1=''P''(tails) = 0.5}}. However, because of the widespread use of [[random variables]], which transform the sample space into a set of numbers (e.g., <math>\R</math>, <math>\N</math>), it is more common to study probability distributions whose argument are subsets of these particular kinds of sets (number sets),<ref>{{cite book| last1 = Walpole | first1 = R.E. | last2 = Myers | first2 = R.H. | last3 = Myers | first3 = S.L. | last4 = Ye | first4 = K.|year=1999|title=Probability and statistics for engineers|publisher=Prentice Hall}}</ref> and all probability distributions discussed in this article are of this type. It is common to denote as <math>P(X \in E)</math> the probability that a certain value of the variable <math>X</math> belongs to a certain event <math>E</math>.<ref name='ross' /><ref name='degroot' />
=== Basic terms ===
*''[[Random variable]]'': takes values from a sample space; probabilities describe which values and sets of values are more likely to occur.
*''[[Event (probability theory)|Event]]'': set of possible values (outcomes) of a random variable that occurs with a certain probability.
*''[[Probability measure|Probability function]]'' or ''probability measure'': describes the probability <math>P(X \in E)</math> that the event <math>E</math> occurs.<ref name='vapnik'>Chapters 1 and 2 of {{harvp|Vapnik|1998}}</ref>
Conversely, any function <math>F:\mathbb{R}\to\mathbb{R}</math> that satisfies the first four of the properties above is the cumulative distribution function of some probability distribution on the real numbers.<ref>{{Cite book|title=Probability and stochastics|last=Erhan|first=Çınlar|date=2011|publisher=Springer|isbn=9780387878584|___location=New York|pages=57}}</ref>
Any probability distribution can be decomposed as the [[mixture distribution|mixture]] of a [[Discrete probability distribution|discrete]], an [[Absolutely continuous probability distribution|absolutely continuous]] and a [[Singular distribution|singular continuous distribution]], and thus any cumulative distribution function admits a decomposition as the [[convex combination]] of the three corresponding cumulative distribution functions.
==Discrete probability distribution==
Thus the cumulative distribution function has the form
<math display="block">F(x) = P(X \leq x) = \sum_{\omega \leq x} p(\omega).</math>
The points where the cdf jumps always form a countable set; this may be any countable set and thus may even be dense in the real numbers.
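The step-function form of the discrete cumulative distribution function can be illustrated with a short sketch. The fair die below is an example distribution chosen for illustration; the cdf is computed exactly as the sum of the probability mass function over all outcomes at or below <math>x</math>, and it jumps only at the six outcomes.

```python
from fractions import Fraction

# PMF of a fair six-sided die (an illustrative discrete distribution)
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): sum the pmf over all outcomes omega <= x."""
    return sum(p for omega, p in pmf.items() if omega <= x)

print(cdf(3.5))        # 1/2 — flat between the jump points 3 and 4
print(cdf(0))          # 0
print(cdf(6) == 1)     # True — total probability mass
```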
===Dirac delta representation===
A discrete probability distribution is often represented with [[Dirac measure]]s, also called one-point distributions (see below), the probability distributions of [[Degenerate distribution|deterministic random variable]]s. For any outcome <math>\omega</math>, let <math>\delta_\omega</math> be the Dirac measure concentrated at <math>\omega</math>. Given a discrete probability distribution, there is a countable set <math>A</math> with <math>P(X \in A) = 1</math> and a probability mass function <math>p</math>. If <math>E</math> is any event, then
<math display="block">P(X \in E) = \sum_{\omega \in A} p(\omega) \delta_\omega(E),</math>
or in short, <math display="block">P_X = \sum_{\omega \in A} p(\omega) \delta_\omega.</math> Similarly, discrete distributions can be represented with the [[Dirac delta function]] as a [[Generalized function|generalized]] [[probability density function]] <math>f</math>, where
<math display="block">f(x) = \sum_{\omega \in A} p(\omega) \delta(x - \omega),</math> which means <math display="block">P(X \in E) = \int_E f(x) \, dx = \sum_{\omega \in A} p(\omega) \int_E \delta(x - \omega) \, dx = \sum_{\omega \in A \cap E} p(\omega)</math>
for any event <math>E.</math><ref>{{Cite journal|last=Khuri|first=André I.|date=March 2004| title=Applications of Dirac's delta function in statistics|journal=International Journal of Mathematical Education in Science and Technology| language=en|volume=35|issue=2|pages=185–195| doi=10.1080/00207390310001638313|s2cid=122501973|issn=0020-739X}}</ref>

===Indicator-function representation===
For a discrete random variable <math>X</math>, let <math>u_0, u_1, \dots</math> be the values it can take with non-zero probability. Denote
<math display="block">\Omega_i=X^{-1}(u_i)= \{\omega: X(\omega)=u_i\},\, i=0, 1, 2, \dots</math>
These are [[disjoint set]]s, and for such sets
<math display="block">P\left(\bigcup_i \Omega_i\right)=\sum_i P(\Omega_i)=\sum_i P(X=u_i)=1.</math>
It follows that the probability that <math>X</math> takes any value except for <math>u_0, u_1, \dots</math> is zero, and thus one can write <math>X</math> as
<math display="block">X(\omega)=\sum_i u_i 1_{\Omega_i}(\omega)</math>
except on a set of probability zero, where <math>1_A</math> is the indicator function of <math>A</math>. This may serve as an alternative definition of discrete random variables.
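The indicator-function representation can be verified on a toy example. The sample space and random variable below are illustrative (not from the article); the code partitions the sample space into the disjoint preimages <math>\Omega_i = X^{-1}(u_i)</math> and reconstructs <math>X(\omega)</math> as the sum <math>\sum_i u_i 1_{\Omega_i}(\omega)</math>.

```python
# Sketch of the representation X = sum_i u_i * 1_{Omega_i}.
# The sample space and variable here are illustrative toy data.
omega_space = ["a", "b", "c", "d"]
X = {"a": 10, "b": 10, "c": 20, "d": 30}   # a discrete random variable

values = sorted(set(X.values()))            # the distinct values u_0, u_1, ...
partition = {u: {w for w in omega_space if X[w] == u} for u in values}

def indicator(A, w):
    """1_A(w): 1 if w is in A, else 0."""
    return 1 if w in A else 0

# Reconstruct X(w) from the disjoint preimages Omega_i = X^{-1}(u_i)
for w in omega_space:
    reconstructed = sum(u * indicator(partition[u], w) for u in values)
    assert reconstructed == X[w]
print("representation verified")
```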
===One-point distribution===
A special case is the discrete distribution of a random variable that can take on only one fixed value, i.e., a [[degenerate distribution]]. Expressed formally, the random variable <math>X</math> has a one-point distribution if it has a possible outcome <math>x</math> such that <math>P(X = x) = 1.</math> All other possible outcomes then have probability 0.
== Absolutely continuous probability distribution==
Most algorithms are based on a [[pseudorandom number generator]] that produces numbers <math>X</math> that are uniformly distributed in the [[half-open interval]] {{closed-open|0, 1}}. These [[random variate]]s <math>X</math> are then transformed via some algorithm to create a new random variate having the required probability distribution. With this source of uniform pseudo-randomness, realizations of any random variable can be generated.<ref name=":0">{{Citation|last1=Dekking|first1=Frederik Michel| title=Why probability and statistics?|date=2005|work=A Modern Introduction to Probability and Statistics| pages=1–11| publisher =Springer London|isbn=978-1-85233-896-1|last2=Kraaikamp|first2=Cornelis| last3=Lopuhaä|first3=Hendrik Paul| last4=Meester| first4=Ludolf Erwin| doi=10.1007/1-84628-168-7_1}}</ref>
For example, suppose <math>U</math> has a uniform distribution between 0 and 1. To construct a random Bernoulli variable for some <math>0 < p < 1</math>, define
<math display="block">X = \begin{cases}
1, & \text{if } U < p \\
0, & \text{if } U \geq p
\end{cases}</math>
so that
<math display="block">\Pr(X = 1) = \Pr(U < p) = p, \quad \Pr(X = 0) = \Pr(U \geq p) = 1 - p.</math>
This random variable ''X'' has a Bernoulli distribution with parameter <math>p</math>.<ref name=":0"/> This is a transformation of a discrete random variable.
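The Bernoulli construction above translates directly into code. This is a minimal sketch using Python's standard pseudorandom generator; the threshold test on a uniform variate is exactly the case distinction in the definition, and an empirical mean over many draws approaches <math>p</math>.

```python
import random

def bernoulli(p, rng=random.random):
    """Return 1 with probability p, else 0, from a uniform variate U in [0, 1)."""
    u = rng()
    return 1 if u < p else 0

random.seed(0)                 # fixed seed so the check is reproducible
p = 0.3
n = 100_000
mean = sum(bernoulli(p) for _ in range(n)) / n
print(abs(mean - p) < 0.01)    # True: sample mean is close to p for large n
```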
This method can be adapted to generate real-valued random variables with any distribution: for any cumulative distribution function {{mvar|F}}, let {{math|''F''{{sup|inv}}}} be the generalized left inverse of <math>F,</math> also known in this context as the ''[[quantile function]]'' or ''inverse distribution function'':
<math display="block">F^{\mathit{inv}}(p) = \inf \{x \in \R : F(x) \geq p\}.</math>
Then, {{math|''F''{{sup|inv}}(''p'') ≤ ''x''}} if and only if {{math|''p'' ≤ ''F''(''x'')}}, so that the events coincide:
<math display="block">\{U \leq F(x)\} = \{F^{\mathit{inv}}(U) \leq x\}.</math>
As a result, if {{mvar|U}} is uniformly distributed on {{math|[0, 1]}}, then the cumulative distribution function of {{math|''X'' {{=}} ''F''{{sup|inv}}(''U'')}} is {{mvar|F}}.

For example, suppose a random variable that has an exponential distribution <math>F(x) = 1 - e^{-\lambda x}</math> must be constructed.
<math display="block">\begin{align}
F(x) = u &\Leftrightarrow 1-e^{-\lambda x} = u \\[2pt]
&\Leftrightarrow x = \frac{-1}{\lambda}\ln(1-u)
\end{align}</math>
so <math>F^{\mathit{inv}}(u) = \frac{-1}{\lambda}\ln(1-u)</math> has the required exponential distribution when <math>u</math> is drawn from the uniform distribution on <math>[0,1]</math>.
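The inverse-transform recipe for the exponential distribution can be sketched as follows, using Python's standard uniform generator. The sample mean of the generated variates is checked against the known mean <math>1/\lambda</math> of the exponential distribution.

```python
import math
import random

def exponential_variate(lam, rng=random.random):
    """Inverse-transform sampling: X = F_inv(U) = -ln(1 - U) / lam."""
    u = rng()
    return -math.log(1.0 - u) / lam

random.seed(1)                          # fixed seed for a reproducible check
lam = 2.0
samples = [exponential_variate(lam) for _ in range(200_000)]
mean = sum(samples) / len(samples)
print(abs(mean - 1 / lam) < 0.01)       # True: mean of Exponential(lam) is 1/lam
```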
Although from a theoretical point of view this method always works, in practice the inverse distribution function is unknown and/or cannot be computed efficiently. In this case, other methods (such as the [[Monte Carlo method]]) are used.
== Common probability distributions and their applications ==