Probability mass function: Difference between revisions

{{Short description|Discrete-variable probability distribution}}
[[Image:Discrete probability distrib.svg|right|thumb|The graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.]]
In [[probability theory|probability]] and [[statistics]], a '''probability mass function''' ('''PMF''')<ref name=":1">{{Cite web|date=2020-04-26|title=List of Probability and Statistics Symbols|url=https://mathvault.ca/hub/higher-math/math-symbols/probability-statistics-symbols/|access-date=2020-09-13|website=Math Vault|language=en-US}}</ref><ref name=":2">{{Cite web|title=Probability Mass Function {{!}} PMF|url=https://www.probabilitycourse.com/chapter3/3_1_3_pmf.php|access-date=2020-09-13|website=www.probabilitycourse.com}}</ref> is a function that gives the probability that a [[discrete random variable]] is exactly equal to some value.<ref>{{cite book|author=Stewart, William J.|title=Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling|publisher=Princeton University Press|year=2011|isbn=978-1-4008-3281-1|page=105|url=https://books.google.com/books?id=ZfRyBS1WbAQC&pg=PT105}}</ref> It is sometimes also known as the discrete density function. The probability mass function is often the primary means of defining a [[discrete probability distribution]], and such functions exist for either [[Scalar variable|scalar]] or [[multivariate random variable]]s whose [[Domain of a function|___domain]] is discrete.
 
A probability mass function differs from a [[probability density function]] (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be [[integration (mathematics)|integrated]] over an interval to yield a probability.<ref name=":0">{{Cite book|title=A modern introduction to probability and statistics : understanding why and how|date=2005|publisher=Springer|others=Dekking, Michel, 1946-|isbn=978-1-85233-896-1|___location=London|oclc=262680588}}</ref>
 
The value of the random variable with the largest probability mass is called the [[mode (statistics)|mode]].
 
==Formal definition==
 
A probability mass function is the probability distribution of a discrete random variable, and provides the possible values and their associated probabilities. It is the function <math>p: \mathbb{R} \to [0,1]</math> defined by<ref name=":2" /><ref>{{Cite web|date=2018-12-28|title=3.2: Probability Mass Functions (PMFs) and Cumulative Distribution Functions (CDFs) for Discrete Random Variables|url=https://stats.libretexts.org/Courses/Saint_Mary%27s_College%2C_Notre_Dame/MATH_345__-_Probability_(Kuter)/3%3A_Discrete_Random_Variables/3.2%3A_Probability_Mass_Functions_(PMFs)_and_Cumulative_Distribution_Functions_(CDFs)_for_Discrete_Random_Variables|access-date=2020-09-13|website=Statistics LibreTexts|language=en}}</ref>
{{Equation box 1
|indent =
|equation = <math>p_X(x) = P(X = x)</math>
|background colour=#F5FFFA}}
 
for <math>-\infty < x < \infty</math>,<ref name=":0" /> where <math>P</math> is a [[probability measure]]. <math>p_X(x)</math> can also be simplified as <math>p(x)</math>.<ref>{{Cite book|title=Engineering optimization : theory and practice|last=Rao, Singiresu S., 1944-|date=1996|publisher=Wiley|isbn=0-471-55034-5|edition=3rd|___location=New York|oclc=62080932}}</ref> Sometimes, <math>f_X(x)</math> is also used.<ref name=":1" />
 
The probabilities associated with each possible value must be positive and sum up to 1. For all other values, the probabilities are 0:
:<math>\sum_x p(x) = 1,</math>
:<math>p(x) = 0</math> for all other <math>x</math>.
 
Thinking of probability as mass helps to avoid mistakes, since the physical mass is conserved (as is the total probability for all hypothetical outcomes <math>x</math>).
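The two conditions above can be checked numerically. The following sketch (the helper name <code>is_valid_pmf</code> is illustrative, not standard) verifies non-negativity and unit total mass for a candidate PMF given as a value-to-probability mapping:

```python
def is_valid_pmf(p, tol=1e-12):
    """Check that a mapping of value -> probability is a valid PMF:
    every mass is non-negative and the masses sum to 1."""
    return all(mass >= 0 for mass in p.values()) and abs(sum(p.values()) - 1) < tol

# PMF of a fair six-sided die: p(x) = 1/6 for x in 1..6, zero elsewhere.
die = {x: 1 / 6 for x in range(1, 7)}
print(is_valid_pmf(die))                # -> True
print(is_valid_pmf({1: 0.5, 2: 0.4}))   # -> False (masses sum to 0.9)
```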
 
==Measure theoretic formulation==
 
Suppose that <math>(A, \mathcal A, P)</math> is a [[probability space]]
and that <math>(B, \mathcal B)</math> is a measurable space whose underlying [[sigma algebra|σ-algebra]] is discrete, so in particular contains singleton sets of <math>B</math>. In this setting, a random variable <math> X \colon A \to B</math> is discrete, provided that its image is countable.
The [[pushforward measure]] <math>X_{*}(P)</math>—called a distribution of <math>X</math> in this context—is a probability measure on <math>B</math> whose restriction to singleton sets induces a probability mass function <math>f_X \colon B \to \mathbb R</math>, since <math>f_X(b)=P(X^{-1}(b))=[X_*(P)](\{b\})</math> for each <math>b \in B</math>.
 
Now, suppose that <math>(B, \mathcal B, \mu)</math> is a [[measure space]] equipped with the counting measure <math>\mu</math>. The probability density function <math>f</math> of <math>X</math> with respect to the counting measure, if it exists, is the [[Radon–Nikodym derivative]] of the pushforward measure of <math>X</math> (with respect to the counting measure), so <math> f = d X_*P / d \mu</math> and <math>f</math> is a function from <math>B</math> to the non-negative reals. As a consequence, for any <math>b \in B</math>, we have
:<math>P(X=b) = P(X^{-1}( \{ b \} )) := \int_{X^{-1}(\{ b \})} \, dP = \int_{\{ b \}} f \, d\mu = f(b),</math>
 
demonstrating that <math>f</math> is in fact a probability mass function.
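As a concrete sketch of the construction above (all names are illustrative), the pushforward of a finite probability measure can be computed by summing the mass of each preimage:

```python
from collections import defaultdict

# A finite probability space: two independent fair coin flips.
P = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}

# A discrete random variable X on that space: the number of heads.
def X(omega):
    return omega.count("H")

# Pushforward: f_X(b) = P(X^{-1}({b})), i.e. sum the mass of each preimage.
f_X = defaultdict(float)
for omega, mass in P.items():
    f_X[X(omega)] += mass

print(dict(f_X))  # -> {2: 0.25, 1: 0.5, 0: 0.25}
```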
 
When there is a natural order among the potential outcomes <math>x</math>, it may be convenient to assign numerical values to them (or ''n''-tuples in case of a discrete [[multivariate random variable]]), and to consider also values not in the [[Image (mathematics)|image]] of <math>X</math>. That is, <math>f_X</math> may be defined for all [[real number]]s and <math>f_X(x)=0</math> for all <math>x \notin X(S)</math> as shown in the figure.
 
The image of <math>X</math> has a [[countable]] subset on which the total probability mass is one. Consequently, the probability mass function is zero for all but a countable number of values of <math>x</math>.
 
The discontinuity of probability mass functions is related to the fact that the [[cumulative distribution function]] of a discrete random variable is also discontinuous. If <math>X</math> is a discrete random variable, then <math> P(X = x) = 1</math> means that the event <math>(X = x)</math> is certain (it is true in 100% of the occurrences); on the contrary, <math>P(X = x) = 0</math> means that the event <math>(X = x)</math> is impossible. This statement isn't true for a [[continuous random variable]] <math>X</math>, for which <math>P(X = x) = 0</math> for any possible <math>x</math>: in fact, by definition, a continuous random variable can have an [[infinite set]] of possible values, and thus the probability that it takes a single particular value ''x'' is <math>\tfrac{1}{\infty} = 0</math>. [[Discretization of continuous features|Discretization]] is the process of converting a continuous random variable into a discrete one.
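Discretization can be sketched as follows (a toy example: a uniform continuous variable is binned by flooring, and the resulting PMF is estimated from samples; the bin choice is illustrative):

```python
import random
from collections import Counter

random.seed(0)

# A continuous uniform variable on [0, 3): each single point has probability 0.
samples = [random.uniform(0, 3) for _ in range(100_000)]

# Discretizing by flooring yields a discrete variable on {0, 1, 2}
# with a genuine PMF, each mass close to 1/3.
counts = Counter(int(s) for s in samples)
pmf_estimate = {k: counts[k] / len(samples) for k in sorted(counts)}
print(pmf_estimate)  # each of the three masses is approximately 1/3
```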
 
==Examples==
 
===Finite===
There are three major discrete distributions associated with a probability mass function: the [[Bernoulli distribution]], the [[binomial distribution]], and the [[geometric distribution]].
 
*[[Bernoulli distribution]], Ber(''p''),<ref name=":1" /> is used to model an experiment with only two possible outcomes. The two outcomes are often encoded as 1 and 0.
 
:<math>p_X(x) = \begin{cases} p, & \text{if }x\text{ is 1} \\ 1-p, & \text{if }x\text{ is 0} \end{cases}</math>
:An example of the Bernoulli distribution is tossing a fair coin, with heads encoded as 1 and tails as 0. The probability mass function is
::<math>p_X(x) = \begin{cases}\frac{1}{2}, &x \in \{0, 1\},\\0, &x \notin \{0, 1\}.\end{cases}</math>
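The Bernoulli PMF can be written directly from its definition (a minimal sketch; the function name is illustrative):

```python
def bernoulli_pmf(x, p):
    """PMF of Ber(p): mass p at x = 1, mass 1 - p at x = 0, zero elsewhere."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

# Fair coin (p = 1/2): both outcomes carry mass 1/2; any other value has mass 0.
print(bernoulli_pmf(1, 0.5), bernoulli_pmf(0, 0.5), bernoulli_pmf(2, 0.5))
# -> 0.5 0.5 0.0
```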
 
*[[Binomial distribution]], Bin(''n'', ''p''),<ref name=":1" /> models the number of successes when someone draws ''n'' times with replacement. Each draw or experiment is independent, with two possible outcomes. The associated probability mass function is <math>\binom{n}{k}p^k (1-p)^{n-k}</math>. [[Image:Fair dice probability distribution.svg|right|thumb|The probability mass function of a [[Dice|fair die]]. All the numbers on the {{dice}} have an equal chance of appearing on top when the die stops rolling.]]
 
:An example of the binomial distribution is the probability of getting exactly one 6 when someone rolls a fair die three times.
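The die example can be checked numerically with the binomial PMF (a sketch; the helper name is illustrative). Exactly one 6 in three rolls is Bin(3, 1/6) evaluated at ''k''&nbsp;=&nbsp;1:

```python
from math import comb

def binomial_pmf(k, n, p):
    """PMF of Bin(n, p): probability of exactly k successes in n
    independent trials, each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly one 6 in three rolls of a fair die: 75/216.
print(round(binomial_pmf(1, 3, 1 / 6), 4))  # -> 0.3472
```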
 
*[[Geometric distribution]], Geo(''p''),<ref name=":1" /> describes the number of trials needed to get one success. Its probability mass function is <math>p_X(k) = (1-p)^{k-1} p</math>.
 
:An example is tossing a coin until the first head appears.
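For the coin example, the geometric PMF gives the chance that the first head appears on toss ''k'' (a sketch assuming a fair coin; the function name is illustrative):

```python
def geometric_pmf(k, p):
    """PMF of Geo(p): probability that the first success occurs on trial k,
    i.e. k - 1 failures followed by a success."""
    return (1 - p) ** (k - 1) * p

# Fair coin (p = 1/2): first head on toss 1, 2, 3 has mass 1/2, 1/4, 1/8.
print([geometric_pmf(k, 0.5) for k in (1, 2, 3)])  # -> [0.5, 0.25, 0.125]
```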
 
Other distributions that can be modeled using a probability mass function are the [[categorical distribution]] (also known as the generalized Bernoulli distribution) and the [[multinomial distribution]].
 
* If the discrete distribution has two or more categories, one of which may occur on each trial, and there is only a single trial (draw), then (whether or not these categories have a natural ordering) this is a categorical distribution.
* An example of a [[Joint probability distribution|multivariate discrete distribution]], and of its probability mass function, is provided by the [[multinomial distribution]]. Here, the multiple random variables are the numbers of successes in each of the categories after a given number of trials, and each non-zero probability mass gives the probability of a certain combination of numbers of successes in the various categories.
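A multinomial PMF can be sketched as follows (the helper name is illustrative); it gives the probability of one particular combination of per-category counts:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial PMF: probability of the given per-category counts
    over sum(counts) independent trials with category probabilities probs."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p**c for p, c in zip(probs, counts))

# Six rolls of a fair die: probability that every face appears exactly once.
print(round(multinomial_pmf([1] * 6, [1 / 6] * 6), 4))  # -> 0.0154
```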
 
===Infinite<!-- Geometric distribution should be here instead. -->===
 
*The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes—all the positive integers:
::<math>\text{Pr}(X=i) = \frac{1}{2^i}, \qquad i = 1, 2, 3, \dots</math>
:Despite the infinite number of possible outcomes, the total probability mass is <math>\tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots = 1</math>, satisfying the unit total probability requirement of a probability distribution.
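One such distribution (a sketch assuming the mass function Pr(''X''&nbsp;=&nbsp;''i'') = 2<sup>−''i''</sup> on the positive integers) can be checked numerically: its partial sums approach 1.

```python
# Exponentially declining PMF on the positive integers: Pr(X = i) = 2**-i.
def pmf(i):
    return 0.5 ** i

# Partial sums of the masses approach 1, the required total probability.
partial = [round(sum(pmf(i) for i in range(1, n + 1)), 6) for n in (1, 2, 5, 20)]
print(partial)  # -> [0.5, 0.75, 0.96875, 0.999999]
```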
==Multivariate case==
{{Main|Joint probability distribution}}
 
Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables. For random variables <math>X_1, \ldots, X_n</math>, the joint probability mass function is sometimes denoted as <math>f(x_1, \ldots, x_n)</math>.<ref name=":1" />
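A joint PMF for two discrete variables can be tabulated directly (a toy sketch; the 0.7/0.3 bias is illustrative). A marginal PMF is recovered by summing the joint mass over the other variable:

```python
from collections import defaultdict

# Joint PMF of (X, Y): X a fair coin, Y an independent biased coin
# (bias 0.7/0.3 chosen for illustration).
joint = {(x, y): 0.5 * (0.7 if y == 0 else 0.3) for x in (0, 1) for y in (0, 1)}

# The marginal PMF of Y is recovered by summing the joint mass over X.
marginal_Y = defaultdict(float)
for (x, y), mass in joint.items():
    marginal_Y[y] += mass

print(dict(marginal_Y))  # -> {0: 0.7, 1: 0.3}
```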
 
==References==
{{Reflist}}