Probability mass function: Difference between revisions

Revision as of 21:38, 13 September 2020 by Miaumee (Tags: Reverted, Visual edit): + "," to break verbose phrases. Minor C/E fixes. + inline refs. - dup empty lines. "occurrencies" -> "occurrences". Typeset inline math properly. Lc "categorical distribution". + cm on geometric distribution.
Latest revision as of 19:51, 12 March 2025 by SFBB: whether to call it PMF or discrete PDF is a non-generally accepted convention. Hence, the differentiation must be against a continuous PDF and not against a PDF in general.
(44 intermediate revisions by 28 users not shown)
{{Short description|Discrete-variable probability distribution}}
[[Image:Discrete probability distrib.svg|right|thumb|The graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.]]

In [[probability theory|probability]] and [[statistics]], a '''probability mass function''' (sometimes called ''probability function'' or ''frequency function''<ref>[https://online.stat.psu.edu/stat414/lesson/7/7.2 7.2 - Probability Mass Functions | STAT 414 - PennState - Eberly College of Science]</ref>) is a function that gives the probability that a [[discrete random variable]] is exactly equal to some value.<ref>{{cite book|author=Stewart, William J.|title=Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling|publisher=Princeton University Press|year=2011|isbn=978-1-4008-3281-1|page=105|url=https://books.google.com/books?id=ZfRyBS1WbAQC&pg=PT105}}</ref> Sometimes, it is also known as the '''discrete probability density function'''. The probability mass function is often the primary means of defining a [[discrete probability distribution]], and such functions exist for either [[Scalar variable|scalar]] or [[multivariate random variable]]s whose [[Domain of a function|domain]] is discrete.

A probability mass function differs from a [[probability density function|continuous probability density function]] (PDF) in that the latter is associated with continuous rather than discrete random variables.
A continuous PDF must be [[integration (mathematics)|integrated]] over an interval to yield a probability.<ref name=":0">{{Cite book|title=A modern introduction to probability and statistics : understanding why and how|date=2005|publisher=Springer|others=Dekking, Michel, 1946-|isbn=978-1-85233-896-1|location=London|oclc=262680588}}</ref>

The value of the random variable having the largest probability mass is called the [[mode (statistics)|mode]].

==Formal definition==

The probability mass function is the probability distribution of a [[discrete random variable]]; it provides the possible values and their associated probabilities. It is the function <math>p: \R \to [0,1]</math> defined by

{{Equation box 1
|indent =
|title=
|equation = <math>p_X(x) = P(X = x)</math>
|cellpadding= 6
|border
|background colour=#F5FFFA}}

for <math>-\infin < x < \infin</math>,<ref name=":0" /> where <math>P</math> is a [[probability measure]].
<math>p_X(x)</math> can also be simplified as <math>p(x)</math>.<ref>{{Cite book|title=Engineering optimization : theory and practice|last=Rao|first=Singiresu S.|date=1996|publisher=Wiley|isbn=0-471-55034-5|edition=3rd|location=New York|oclc=62080932}}</ref>

The probabilities associated with all (hypothetical) values must be non-negative and sum up to 1,
<math display="block">\sum_x p_X(x) = 1</math>
and
<math display="block">p_X(x)\geq 0.</math>

Thinking of probability as mass helps to avoid mistakes, since the physical mass is [[Conservation of mass|conserved]] (as is the total probability for all hypothetical outcomes <math>x</math>).

==Measure theoretic formulation==

A probability mass function of a discrete random variable <math>X</math> can be seen as a special case of two more general measure theoretic constructions: the [[probability distribution|distribution]] of <math>X</math> and the [[probability density function]] of <math>X</math> with respect to the [[counting measure]]. We make this more precise below.

Suppose that <math>(A, \mathcal A, P)</math> is a [[probability space]] and that <math>(B, \mathcal B)</math> is a measurable space whose underlying [[sigma algebra|σ-algebra]] is discrete, so in particular contains singleton sets of <math>B</math>. In this setting, a random variable <math> X \colon A \to B</math> is discrete, provided its image is countable.
The [[pushforward measure]] <math>X_{*}(P)</math>, called the distribution of <math>X</math> in this context, is a probability measure on <math>B</math> whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) <math>f_X \colon B \to \mathbb R</math>, since <math>f_X(b)=P( X^{-1}( \{b\} ))=[X_{*}(P)](\{b\})</math> for each <math>b \in B</math>.

Now, suppose that <math>(B, \mathcal B, \mu)</math> is a [[measure space]] equipped with the counting measure <math>\mu</math>. The probability density function <math>f</math> of <math>X</math> with respect to the counting measure, if it exists, is the [[Radon–Nikodym derivative]] of the pushforward measure of <math>X</math> (with respect to the counting measure), so <math> f = d X_{*}P / d \mu</math> and <math>f</math> is a function from <math>B</math> to the non-negative reals. As a consequence, for any <math>b \in B</math>, we have
<math display="block">P(X=b)=P( X^{-1}( \{ b \} )) = X_{*}(P)(\{b \}) = \int_{ \{b \}} f \, d \mu = f(b),</math>
demonstrating that <math>f</math> is in fact a probability mass function.

When there is a natural order among the potential outcomes <math>x</math>, it may be convenient to assign numerical values to them (or ''n''-tuples in case of a discrete [[multivariate random variable]]), and to consider also values not in the [[Image (mathematics)|image]] of <math>X</math>. That is, <math>f_X</math> may be defined for all [[real number]]s, with <math>f_X(x)=0</math> for all <math>x \notin X(A)</math>, as shown in the figure.

The image of <math>X</math> has a [[countable]] subset on which the values of the probability mass function <math>f_X(x)</math> sum to one. Consequently, the probability mass function is zero for all but a countable number of values of <math>x</math>.
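The pushforward construction above can be sketched for a finite probability space. The following is a minimal illustration rather than a general implementation: the helper name <code>pushforward_pmf</code> and the two-dice example are assumptions made here for illustration and do not come from the cited sources.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def pushforward_pmf(prob, X):
    """Return f_X(b) = P(X^{-1}({b})) for a finite probability space.

    prob: dict mapping each outcome a in A to its probability P({a})
    X:    function from outcomes in A to values in B (hypothetical helper)
    """
    pmf = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for outcome, mass in prob.items():
        # Each outcome contributes its mass to the preimage of X(outcome).
        pmf[X(outcome)] += mass
    return dict(pmf)


# Illustrative assumption: A is the set of ordered results of two fair dice,
# P is uniform on A, and X is the sum of the two faces.
A = list(product(range(1, 7), repeat=2))
P = {a: Fraction(1, 36) for a in A}
f_X = pushforward_pmf(P, sum)

print(f_X[7])             # mass of the preimage {(1,6), (2,5), ..., (6,1)}
print(sum(f_X.values()))  # total probability mass
```

Exact fractions are used so that the total mass can be compared with 1 without floating-point error. Values outside the image of <math>X</math> simply do not appear as keys, which corresponds to <math>f_X(x) = 0</math> there.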
The discontinuity of probability mass functions is related to the fact that the [[cumulative distribution function]] of a discrete random variable is also discontinuous. If <math>X</math> is a discrete random variable, then <math> P(X = x) = 1</math> means that the event <math>(X = x)</math> is certain (it is true in 100% of the occurrences); on the contrary, <math>P(X = x) = 0</math> means that the event <math>(X = x)</math> is impossible. This statement is not true for a [[continuous random variable]] <math>X</math>, for which <math>P(X = x) = 0</math> for any possible <math>x</math>: probabilities for such a variable are obtained by integrating a density over an interval, and the integral over a single point ''x'' is 0. [[Discretization of continuous features|Discretization]] is the process of converting a continuous random variable into a discrete one.

==Examples==

===Finite===

Three major discrete distributions are the [[Bernoulli distribution]], the [[binomial distribution]] and the [[geometric distribution]].

The [[Bernoulli distribution]], '''ber(p)''', is used to model an experiment with only two possible outcomes. The two outcomes are often encoded as 1 and 0.
<math display="block">p_X(x) = \begin{cases}
p, & \text{if }x\text{ is 1} \\
1-p, & \text{if }x\text{ is 0}
\end{cases}</math>
An example of the Bernoulli distribution is tossing a coin. Suppose that <math>S</math> is the sample space of all outcomes of a single toss of a [[fair coin]], and <math>X</math> is the random variable defined on <math>S</math> assigning 0 to the category "tails" and 1 to the category "heads". Since the coin is fair, the probability mass function is
<math display="block">p_X(x) = \begin{cases}
\frac{1}{2}, &x = 0,\\
\frac{1}{2}, &x = 1,\\
0, &x \notin \{0, 1\}.
\end{cases}</math>

The [[binomial distribution]] models the number of successes when someone draws <math>n</math> times with replacement. Each draw or experiment is independent, with two possible outcomes. The associated probability mass function is <math display="inline">\binom{n}{k} p^k (1-p)^{n-k}</math>, where <math>k</math> is the number of successes and <math>p</math> is the probability of success in each draw.
[[Image:Fair dice probability distribution.svg|right|thumb|The probability mass function of a [[Dice|fair die]]. All the numbers on the die have an equal chance of appearing on top when the die stops rolling.]]{{pb}}An example of the binomial distribution is the probability of getting exactly one 6 when someone rolls a fair die three times.

The geometric distribution describes the number of trials needed to get the first success. Its probability mass function is <math display="inline">p_X(k) = (1-p)^{k-1} p</math>.{{pb}}An example is tossing a coin until the first "heads" appears. <math>p</math> denotes the probability of the outcome "heads", and <math>k</math> denotes the number of necessary coin tosses.

{{pb}}Other distributions that can be modeled using a probability mass function are the [[categorical distribution]] (also known as the generalized Bernoulli distribution) and the [[multinomial distribution]].
* If a discrete distribution has two or more categories, exactly one of which occurs in a single trial (draw), it is a categorical distribution, whether or not the categories have a natural ordering.
* An example of a [[Joint probability distribution|multivariate discrete distribution]], and of its probability mass function, is provided by the [[multinomial distribution]]. Here, the multiple random variables are the numbers of successes in each of the categories after a given number of trials, and each non-zero probability mass gives the probability of a certain combination of numbers of successes in the various categories.
{{clear}}

===Infinite===

The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes, namely all the positive integers:
<math display="block">\text{Pr}(X=i)= \frac{1}{2^i}\qquad \text{for } i=1, 2, 3, \dots </math>
This is the geometric distribution with <math>p = 1/2</math>. Despite the infinite number of possible outcomes, the total probability mass is 1/2 + 1/4 + 1/8 + ⋯ = 1, satisfying the unit total probability requirement for a probability distribution.

==Multivariate case==
{{Main|Joint probability distribution}}

Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables.
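As an illustration of a joint probability mass function, the sketch below tabulates one for two discrete random variables. The setup is assumed here for illustration only: two independent tosses of a fair coin, with <math>X</math> the result of the first toss (1 for "heads") and <math>Y</math> the total number of heads. The helper name <code>marginal</code> is likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

# Build the joint PMF as a dict {(x, y): probability}.
# Assumption for illustration: each of the four toss sequences has mass 1/4.
joint = {}
for t1, t2 in product((0, 1), repeat=2):
    key = (t1, t1 + t2)  # (X, Y) = (first toss, total number of heads)
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 4)


def marginal(joint_pmf, axis):
    """Sum the joint PMF over all coordinates except `axis`."""
    out = {}
    for values, mass in joint_pmf.items():
        out[values[axis]] = out.get(values[axis], Fraction(0)) + mass
    return out


p_X = marginal(joint, 0)  # PMF of X alone
p_Y = marginal(joint, 1)  # PMF of Y alone

print(joint)
print(p_X, p_Y)
```

Each key of <code>joint</code> is one combination of realizations, and the stored masses sum to 1, as for any probability mass function. Summing over one coordinate recovers the probability mass function of the other variable.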
==References==

{{Theory of probability distributions}}
{{Authority control}}

[[Category:Types of probability distributions]]