Probability mass function: Difference between revisions

whether to call it PMF or discrete PDF is a non-generally accepted convention. Hence, the differentiation must be against a continuous PDF and not against a PDF in general.
 
(24 intermediate revisions by 16 users not shown)
Line 1:
{{Short description|Discrete-variable probability distribution}}
[[Image:Discrete probability distrib.svg|right|thumb|The graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.]]
In [[probability theory|probability]] and [[statistics]], a '''probability mass function''' (sometimes called ''probability function'' or ''frequency function''<ref>[https://online.stat.psu.edu/stat414/lesson/7/7.2 7.2 - Probability Mass Functions | STAT 414 - PennState - Eberly College of Science]</ref>) is a function that gives the probability that a [[discrete random variable]] is exactly equal to some value.<ref>{{cite book|author=Stewart, William J.| title=Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling|publisher=Princeton University Press|year=2011|isbn=978-1-4008-3281-1|page=105|url=https://books.google.com/books?id=ZfRyBS1WbAQC&pg=PT105}}</ref> Sometimes it is also known as the '''discrete probability density function'''. The probability mass function is often the primary means of defining a [[discrete probability distribution]], and such functions exist for either [[Scalar variable|scalar]] or [[multivariate random variable]]s whose [[Domain of a function|domain]] is discrete.
 
A probability mass function differs from a [[probability density function|continuous probability density function]] (PDF) in that the latter is associated with continuous rather than discrete random variables. A continuous PDF must be [[integration (mathematics)|integrated]] over an interval to yield a probability.<ref name=":0">{{Cite book|title=A modern introduction to probability and statistics : understanding why and how|date=2005|publisher=Springer|others=Dekking, Michel, 1946-|isbn=978-1-85233-896-1|location=London|oclc=262680588}}</ref>
 
The value of the random variable having the largest probability mass is called the [[mode (statistics)|mode]].
 
==Formal definition==
A probability mass function is the probability distribution of a [[discrete random variable]]; it gives the possible values and their associated probabilities. It is the function <math>p: \R \to [0,1]</math> defined by
{{Equation box 1
|indent =
|title=
|equation = <math>p_X(x) = P(X = x) </math>
|cellpadding= 6
|border
Line 25 ⟶ 24:
<math display="block">\sum_x p_X(x) = 1 </math> and <math display="block"> p_X(x)\geq 0.</math>
 
Thinking of probability as mass helps to avoid mistakes, since physical mass is [[Conservation of mass|conserved]], as is the total probability over all hypothetical outcomes <math>x</math>.
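
The two conditions above (non-negativity and total mass equal to 1) can be checked numerically. The following is a minimal Python sketch, assuming the probability mass function is stored as a dictionary from values to probabilities; the helper name <code>is_valid_pmf</code> and the tolerance <code>tol</code> are illustrative assumptions, not part of any standard library.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if all probabilities are non-negative and sum to 1 (within tol)."""
    non_negative = all(p >= 0 for p in pmf.values())
    normalized = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and normalized

# Hypothetical example: a fair six-sided die, each face with mass 1/6.
fair_die = {face: 1 / 6 for face in range(1, 7)}
print(is_valid_pmf(fair_die))

# Total mass 1.1, so this is not a probability mass function.
print(is_valid_pmf({0: 0.5, 1: 0.6}))
```

The tolerance is needed only because floating-point sums of values such as 1/6 may differ from 1 by rounding error.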
 
==Measure theoretic formulation==
 
A probability mass function of a discrete random variable <math>X</math> can be seen as a special case of two more general measure theoretic constructions:
the [[probability distribution|distribution]] of <math>X</math> and the [[probability density function]] of <math>X</math> with respect to the [[counting measure]]. We make this more precise below.
Line 36 ⟶ 34:
The [[pushforward measure]] <math>X_{*}(P)</math>—called the distribution of <math>X</math> in this context—is a probability measure on <math>B</math> whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) <math>f_X \colon B \to \mathbb R</math> since <math>f_X(b)=P( X^{-1}( b ))=P(X=b)</math> for each <math>b \in B</math>.
 
Now suppose that <math>(B, \mathcal B, \mu)</math> is a [[measure space]] equipped with the counting measure <math>\mu</math>. The probability density function <math>f</math> of <math>X</math> with respect to the counting measure, if it exists, is the [[Radon–Nikodym derivative]] of the pushforward measure of <math>X</math> (with respect to the counting measure), so <math> f = d X_*P / d \mu</math> and <math>f</math> is a function from <math>B</math> to the non-negative reals. As a consequence, for any <math>b \in B</math> we have
<math display="block">P(X=b)=P( X^{-1}( b) ) = X_*(P)(b) = \int_{ b } f d \mu = f(b),</math>
 
demonstrating that <math>f</math> is in fact a probability mass function.
Line 45 ⟶ 43:
The image of <math>X</math> has a [[countable]] subset on which the values of the probability mass function <math>f_X(x)</math> sum to one. Consequently, the probability mass function is zero for all but a countable number of values of <math>x</math>.
 
The discontinuity of probability mass functions is related to the fact that the [[cumulative distribution function]] of a discrete random variable is also discontinuous. If <math>X</math> is a discrete random variable, then <math> P(X = x) = 1</math> means that the event <math>(X = x)</math> is certain (it occurs in 100% of cases); conversely, <math>P(X = x) = 0</math> means that the event <math>(X = x)</math> is impossible. This statement does not hold for a [[continuous random variable]] <math>X</math>, for which <math>P(X = x) = 0</math> for every possible <math>x</math>. [[Discretization of continuous features|Discretization]] is the process of converting a continuous random variable into a discrete one.
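
The relationship between the probability mass function and the jumps of the cumulative distribution function can be illustrated with a short Python sketch. It assumes a fair six-sided die as the example distribution and uses exact fractions; the function name <code>cdf</code> is a hypothetical helper introduced here for illustration.

```python
from fractions import Fraction

def cdf(pmf, x):
    """P(X <= x): sum of the probability masses at all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# Assumed example distribution: a fair six-sided die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# The CDF is constant between support points and jumps by p_X(x) at each one.
jump_at_3 = cdf(fair_die, 3) - cdf(fair_die, 2.999)
print(jump_at_3 == fair_die[3])
```

The size of the jump at a point equals the probability mass at that point, which is why the cumulative distribution function of a discrete random variable is a step function.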
 
==Examples==
Line 51 ⟶ 49:
 
===Finite===
Three major distributions are associated with probability mass functions: the [[Bernoulli distribution]], the [[binomial distribution]] and the [[geometric distribution]].
 
*Bernoulli distribution, '''ber(p)''', is used to model an experiment with only two possible outcomes. The two outcomes are often encoded as 1 and 0. <math display="block">p_X(x) = \begin{cases}
p, & \text{if }x\text{ is 1} \\
1-p, & \text{if }x\text{ is 0}
\end{cases}</math> An example of the Bernoulli distribution is tossing a coin. Suppose that <math>S</math> is the sample space of all outcomes of a single toss of a [[fair coin]], and <math>X</math> is the random variable defined on <math>S</math> assigning 0 to the category "tails" and 1 to the category "heads". Since the coin is fair, the probability mass function is <math display="block">p_X(x) = \begin{cases}
\frac{1}{2}, &x = 0,\\
\frac{1}{2}, &x = 1,\\
0, &x \notin \{0, 1\}.
\end{cases}</math>
* The [[binomial distribution]] models the number of successes when someone draws <math>n</math> times with replacement. Each draw or experiment is independent, with two possible outcomes. The associated probability mass function is <math display="inline">\binom{n}{k} p^k (1-p)^{n-k}</math>. [[Image:Fair dice probability distribution.svg|right|thumb|The probability mass function of a [[Dice|fair die]]. All the numbers on the die have an equal chance of appearing on top when the die stops rolling.]]{{pb}}An example of the binomial distribution is the probability of getting exactly one 6 when someone rolls a fair die three times.
* The geometric distribution describes the number of trials needed to get one success. Its probability mass function is <math display="inline">p_X(k) = (1-p)^{k-1} p</math>.{{pb}}An example is tossing a coin until the first "heads" appears. <math>p</math> denotes the probability of the outcome "heads", and <math>k</math> denotes the number of necessary coin tosses. {{pb}}Other distributions that can be modeled using a probability mass function are the [[categorical distribution]] (also known as the generalized Bernoulli distribution) and the [[multinomial distribution]].
* If a discrete distribution has two or more categories, exactly one of which occurs in a single trial (draw), it is a categorical distribution, whether or not the categories have a natural ordering.
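
The probability mass functions listed above can be evaluated directly. The following is a minimal Python sketch under the parameterizations given in the list; the function names are illustrative assumptions and do not refer to any particular library.

```python
from math import comb

def bernoulli_pmf(x, p):
    """p_X(x) for a Bernoulli variable: p at 1, 1 - p at 0, and 0 elsewhere."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

# Exactly one 6 in three rolls of a fair die: 3 * (1/6) * (5/6)^2 = 25/72.
print(binomial_pmf(1, 3, 1 / 6))

# First "heads" on the third toss of a fair coin: (1/2)^2 * (1/2) = 1/8.
print(geometric_pmf(3, 0.5))
```

Each function returns 0 outside the support of its distribution, matching the convention that a probability mass function is defined on all real numbers and vanishes off a countable set.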
Line 81 ⟶ 80:
 
{{Theory of probability distributions}}
{{Authority control}}
 
[[Category:Types of probability distributions]]