Conditional probability distribution: Difference between revisions

Revision as of 19:37, 2 January 2019 by Fvultier (edit summary: More about conditional CDFs.)
Latest revision as of 15:16, 3 August 2025 by RankASea (edit summary: Link suggestions feature: 3 links added.)
(48 intermediate revisions by 37 users not shown)
{{Short description|Probability theory and statistics concept}}
{{more citations needed|date=April 2013}}
In [[probability theory]] and [[statistics]], the conditional probability distribution is a probability distribution that describes the probability of an outcome given the occurrence of a particular event. Given two [[joint probability distribution|jointly distributed]] [[random variable]]s <math>X</math> and <math>Y</math>, the '''conditional probability distribution''' of <math>Y</math> given <math>X</math> is the [[probability distribution]] of <math>Y</math> when <math>X</math> is known to be a particular value; in some cases the conditional probabilities may be expressed as functions containing the unspecified value <math>x</math> of <math>X</math> as a parameter. When both <math>X</math> and <math>Y</math> are [[categorical variable]]s, a [[conditional probability table]] is typically used to represent the conditional probability. The conditional distribution contrasts with the [[marginal distribution]] of a random variable, which is its distribution without reference to the value of the other variable.

If the conditional distribution of <math>Y</math> given <math>X</math> is a [[continuous distribution]], then its [[probability density function]] is known as the '''conditional density function'''.{{sfnp|Ross|1993|pp=88–91}} The properties of a conditional distribution, such as the [[Moment (mathematics)|moments]], are often referred to by corresponding names such as the [[conditional mean]] and [[conditional variance]].

More generally, one can refer to the conditional distribution of a subset of a set of more than two variables; this conditional distribution is contingent on the values of all the remaining variables, and if more than one variable is included in the subset then this conditional distribution is the conditional [[joint distribution]] of the included variables.
==Conditional cumulative distribution==
Given a random variable <math>X</math> and a [[event (probability theory)|random event]] <math>A</math>, the conditional cumulative distribution of <math>X</math> given <math>A</math> is defined by{{sfnp|Park|2018|p=97}}
:<math>F_{X|A}(x) \triangleq \frac{P(\{X \leq x\} \cap A)}{P(A)}</math>
for <math>P(A) > 0</math>.

If another random variable is denoted by <math>Y</math>, it is possible to condition on the event <math>\{Y \leq y \}</math>. This yields
:<math>F_{X|Y \leq y}(x|y) = \frac{P(\{X \leq x\} \cap \{Y \leq y\})}{P(Y \leq y)}</math>
which can be written as
:<math>F_{X|Y \leq y}(x|y) = \frac{F_{X,Y}(x,y)}{F_Y(y)}</math>
where <math>F_{X,Y}(x,y)</math> denotes the joint cumulative distribution function of <math>X</math> and <math>Y</math> and <math>F_Y(y)</math> is the cumulative distribution function of <math>Y</math>.
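The definition of the conditional cumulative distribution given an event can be checked by direct enumeration on a small sample space. The following Python sketch is only an illustration: the fair six-sided die, the event "the roll is even", and the helper names <code>prob</code> and <code>conditional_cdf</code> are assumptions chosen for the example and do not come from the cited source.

```python
from fractions import Fraction

# Assumed setup: a fair six-sided die, each outcome with probability 1/6.
OUTCOMES = range(1, 7)
P = {d: Fraction(1, 6) for d in OUTCOMES}


def prob(event):
    """Probability of the set of outcomes for which the predicate `event` holds."""
    return sum(P[d] for d in OUTCOMES if event(d))


def conditional_cdf(x, event):
    """F_{X|A}(x) = P({X <= x} and A) / P(A), defined only when P(A) > 0."""
    p_a = prob(event)
    if p_a == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda d: d <= x and event(d)) / p_a


def is_even(d):
    return d % 2 == 0


# Conditional CDF of the roll X given the event A = "the roll is even".
cdf_given_even = {x: conditional_cdf(x, is_even) for x in OUTCOMES}
```

Exact fractions are used instead of floating-point numbers so that the values can be compared with hand calculations without rounding error.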
==Conditional discrete distributions==
For [[discrete random variable]]s, the conditional [[probability mass function]] of <math>Y</math> given <math>X=x</math> can be written according to its definition as:

{{Equation box 1
|indent =
|title=
|equation = :<math>p_{Y|X}(y \mid x) \triangleq P(Y = y \mid X = x) = \frac{P(\{X=x\} \cap \{Y=y\})}{P(X=x)}\qquad</math>
|cellpadding= 6
|border
|background colour=#F5FFFA}}

Due to the occurrence of <math>P(X=x)</math> in the denominator, this is defined only for non-zero (hence strictly positive) <math>P(X=x).</math>

The relation with the probability distribution of <math>X</math> given <math>Y</math> is:
:<math>P(Y=y \mid X=x) P(X=x) = P(\{X=x\} \cap \{Y=y\}) = P(X=x \mid Y=y)P(Y=y).</math>

===Example===
Consider the roll of a fair die and let <math>X=1</math> if the number is even (i.e., 2, 4, or 6) and <math>X=0</math> otherwise. Furthermore, let <math>Y=1</math> if the number is prime (i.e., 2, 3, or 5) and <math>Y=0</math> otherwise.

{| class="wikitable"
|-
! D !! 1 !! 2 !! 3 !! 4 !! 5 !! 6
|-
| X || 0 || 1 || 0 || 1 || 0 || 1
|-
| Y || 0 || 1 || 1 || 0 || 1 || 0
|}

Then the unconditional probability that <math>X=1</math> is 3/6 = 1/2 (since there are six possible rolls of the die, of which three are even), whereas the probability that <math>X=1</math> conditional on <math>Y=1</math> is 1/3 (since there are three possible [[prime number]] rolls, namely 2, 3, and 5, of which one is even).

==Conditional continuous distributions==
Similarly for [[continuous random variable]]s, the conditional [[probability density function]] of <math>Y</math> given the occurrence of the value <math>x</math> of <math>X</math> can be written as{{sfnp|Park|2018|p=99}}

{{Equation box 1
|indent =
|title=
|equation = :<math>f_{Y\mid X}(y \mid X=x) = \frac{f_{X, Y}(x, y)}{f_X(x)}\qquad</math>
|cellpadding= 6
|border}}

The relation with the probability distribution of <math>X</math> given <math>Y</math> is given by:
:<math>f_{Y\mid X}(y \mid X=x)f_X(x) = f_{X,Y}(x, y) = f_{X|Y}(x \mid Y=y)f_Y(y). </math>

The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: [[Borel's paradox]] shows that conditional probability density functions need not be invariant under coordinate transformations.

The graph shows a [[bivariate normal distribution|bivariate normal joint density]] for random variables <math>X</math> and <math>Y</math>. To see the distribution of <math>Y</math> conditional on <math>X=70</math>, one can first visualize the line <math>X=70</math> in the <math>X,Y</math> [[plane (geometry)|plane]], and then visualize the plane containing that line and perpendicular to the <math>X,Y</math> plane. The intersection of that plane with the joint normal density, once rescaled to give unit area under the intersection, is the relevant conditional density of <math>Y</math>.

:<math>Y\mid X=70 \ \sim\ \mathcal{N}\left(\mu_Y+\frac{\sigma_Y}{\sigma_X}\rho( 70 - \mu_X),\, (1-\rho^2)\sigma_Y^2\right).</math>

==Relation to independence==
Random variables <math>X</math>, <math>Y</math> are [[Statistical independence|independent]] [[if and only if]] the conditional distribution of <math>Y</math> given <math>X</math> is, for all possible realizations of <math>X</math>, equal to the unconditional distribution of <math>Y</math>. For discrete random variables this means <math>P(Y=y|X=x) = P(Y=y)</math> for all possible <math>y</math> and <math>x</math> with <math>P(X=x)>0</math>.
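The die example and the discrete independence criterion can be verified by enumerating the six outcomes. The Python sketch below is a minimal illustration under the assumptions of that example (a fair die, <math>X</math> indicating an even roll, <math>Y</math> indicating a prime roll); the function and variable names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

ROLLS = range(1, 7)


def x_of(d):
    """X = 1 if the roll is even, 0 otherwise."""
    return 1 if d % 2 == 0 else 0


def y_of(d):
    """Y = 1 if the roll is prime (2, 3 or 5), 0 otherwise."""
    return 1 if d in (2, 3, 5) else 0


# Joint probability mass function of (X, Y) for a fair die.
joint = {xy: Fraction(0) for xy in product((0, 1), repeat=2)}
for d in ROLLS:
    joint[(x_of(d), y_of(d))] += Fraction(1, 6)

# Marginal probability mass functions.
p_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}


def p_x_given_y(x, y):
    """P(X = x | Y = y), defined only when P(Y = y) > 0."""
    return joint[(x, y)] / p_y[y]


def p_y_given_x(y, x):
    """P(Y = y | X = x), defined only when P(X = x) > 0."""
    return joint[(x, y)] / p_x[x]


# Independence criterion: P(Y = y | X = x) = P(Y = y) for all y and all x
# with P(X = x) > 0.
independent = all(
    p_y_given_x(y, x) == p_y[y]
    for x in (0, 1) if p_x[x] > 0
    for y in (0, 1)
)
```

In this sketch <code>p_x_given_y(1, 1)</code> reproduces the value 1/3 from the example, and <code>independent</code> evaluates to <code>False</code>, since conditioning on <math>Y=1</math> changes the probability of <math>X=1</math> from 1/2 to 1/3.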
For continuous random variables <math>X</math> and <math>Y</math>, having a [[joint density function]], it means <math>f_Y(y|X=x) = f_Y(y)</math> for all possible <math>y</math> and <math>x</math> with <math>f_X(x)>0</math>.

==Properties==
Seen as a function of <math>y</math> for given <math>x</math>, <math>P(Y=y|X=x)</math> is a probability mass function and so the sum over all <math>y</math> (or integral if it is a conditional probability density) is 1. Seen as a function of <math>x</math> for given <math>y</math>, it is a [[likelihood function]], so that the sum (or integral) over all <math>x</math> need not be 1.

Additionally, a marginal of a joint distribution can be expressed as the expectation of the corresponding conditional distribution. For instance, <math> p_X(x) = E_{Y}[p_{X|Y}(x \mid Y)] </math>.

==Measure-theoretic formulation==
Let <math>(\Omega, \mathcal{F}, P)</math> be a [[probability space]] and <math>\mathcal{G} \subseteq \mathcal{F}</math> a <math>\sigma</math>-field in <math>\mathcal{F}</math>.
Given <math>A \in \mathcal{F}</math>, the [[Radon–Nikodym theorem]] implies that there is{{sfnp|Billingsley|1995|p=430}} a <math>\mathcal{G}</math>-measurable random variable <math>P(A\mid\mathcal{G}):\Omega\to \mathbb{R}</math>, called the [[conditional probability]], such that
<math display="block">\int_G P(A\mid\mathcal{G})(\omega) \, dP(\omega)=P(A\cap G)</math>
for every <math>G\in \mathcal{G}</math>, and such a random variable is uniquely defined up to sets of probability zero. A conditional probability is called [[Regular conditional probability|'''regular''']] if <math> \operatorname{P}(\cdot\mid \mathcal{G})(\omega) </math> is a [[probability measure]] on <math>(\Omega, \mathcal{F})</math> for almost every <math>\omega \in \Omega</math>.
Special cases:
* For the trivial sigma algebra <math>\mathcal G= \{\emptyset,\Omega\}</math>, the conditional probability is the constant function <math>\operatorname{P}\!\left( A\mid \{\emptyset,\Omega\} \right) = \operatorname{P}(A).</math>
* If <math>A\in \mathcal{G}</math>, then <math>\operatorname{P}(A\mid\mathcal{G})=1_A</math>, the indicator function (defined [[#Relation to conditional expectation|below]]).

Let <math>X : \Omega \to E</math> be a <math>(E, \mathcal{E})</math>-valued random variable. For each <math>B \in \mathcal{E}</math>, define
<math display="block">\mu_{X \, | \, \mathcal{G}} (B \, |\, \mathcal{G}) = \mathrm{P} (X^{-1}(B) \, | \, \mathcal{G}).</math>
For any <math>\omega \in \Omega</math>, the function <math>\mu_{X \, | \mathcal{G}}(\cdot \, | \mathcal{G}) (\omega) : \mathcal{E} \to \mathbb{R}</math> is called the '''conditional probability distribution''' of <math>X</math> given <math>\mathcal{G}</math>. If it is a probability measure on <math>(E, \mathcal{E})</math>, then it is called [[Regular conditional probability|'''regular''']].

For a real-valued random variable (with respect to the Borel <math>\sigma</math>-field <math>\mathcal{R}^1</math> on <math>\mathbb{R}</math>), a regular version of the conditional probability distribution always exists.{{sfnp|Billingsley|1995|p=439}} In this case, <math>E[X \mid \mathcal{G}] = \int_{-\infty}^\infty x \, \mu_{X \mid \mathcal{G}}(d x, \cdot)</math> almost surely.

=== Relation to conditional expectation ===
For any event <math>A \in \mathcal{F}</math>, define the [[indicator function]]:
:<math>\mathbf{1}_A (\omega) = \begin{cases} 1 \; &\text{if } \omega \in A, \\ 0 \; &\text{if } \omega \notin A, \end{cases}</math>
which is a random variable. Its expectation equals the probability of <math>A</math> itself:
:<math>\operatorname{E}(\mathbf{1}_A) = \operatorname{P}(A). \; </math>

Given a <math>\sigma</math>-field <math>\mathcal{G} \subseteq \mathcal{F}</math>, the conditional probability <math> \operatorname{P}(A\mid\mathcal{G})</math> is a version of the [[conditional expectation]] of the indicator function for <math>A</math>:
:<math>\operatorname{P}(A\mid\mathcal{G}) = \operatorname{E}(\mathbf{1}_A\mid\mathcal{G}) \; </math>

An expectation of a random variable with respect to a regular conditional probability is equal to its conditional expectation.

===Interpretation of conditioning on a sigma field===
Consider the probability space <math>(\Omega, \mathcal{F}, \mathbb{P})</math> and a sub-sigma field <math>\mathcal{A} \subset \mathcal{F}</math>. The sub-sigma field <math>\mathcal{A}</math> can be loosely interpreted as containing a subset of the information in <math>\mathcal{F}</math>. For example, we might think of <math>\mathbb{P}(B|\mathcal{A})</math> as the probability of the event <math>B</math> given the information in <math>\mathcal{A}</math>.

Also recall that an event <math>B</math> is independent of a sub-sigma field <math>\mathcal{A}</math> if <math>\mathbb{P}(B | A) = \mathbb{P}(B)</math> for all <math>A \in \mathcal{A}</math>. It would be incorrect, however, to conclude in general that the information in <math>\mathcal{A}</math> then tells us nothing about the probability of the event <math>B</math> occurring.
This can be shown with a counter-example:

Consider a probability space on the [[unit interval]], <math>\Omega = [0, 1]</math>, equipped with [[Lebesgue measure]]. Let <math>\mathcal{G}</math> be the sigma-field of all countable sets and sets whose complement is countable. So each set in <math>\mathcal{G}</math> has measure <math>0</math> or <math>1</math> and so is independent of each event in <math>\mathcal{F}</math>. However, notice that <math>\mathcal{G}</math> also contains all the singleton events in <math>\mathcal{F}</math> (those sets which contain only a single <math>\omega \in \Omega</math>). So knowing which of the events in <math>\mathcal{G}</math> occurred is equivalent to knowing exactly which <math>\omega \in \Omega</math> occurred. So in one sense, <math>\mathcal{G}</math> contains no information about <math>\mathcal{F}</math> (it is independent of it), and in another sense it contains all the information in <math>\mathcal{F}</math>.{{sfnp|Billingsley|2012}}{{Page needed|date=May 2025}}
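On a finite sample space none of the subtleties of the counter-example arise, and the conditional probability given a sigma-field generated by a partition can be computed directly. The Python sketch below illustrates the defining integral identity and the two special cases of the measure-theoretic formulation. It assumes a fair die as the probability space and the partition into even and odd rolls as the generator of <math>\mathcal{G}</math>; these choices and all function names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations

# Assumed finite probability space: a fair die with uniform probability.
OMEGA = frozenset(range(1, 7))
P_POINT = Fraction(1, 6)


def prob(event):
    """Probability of a set of outcomes under the uniform measure."""
    return P_POINT * len(event)


def generated_sigma_field(blocks):
    """All unions of partition blocks, i.e. the sigma-field they generate."""
    members = []
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            members.append(frozenset().union(*chosen))
    return members


def cond_prob(event, blocks, omega):
    """P(event | G)(omega): constant on the partition block containing omega."""
    block = next(b for b in blocks if omega in b)
    return prob(event & block) / prob(block)


def integral(event, blocks, g):
    """Integral of P(event | G) over the set g with respect to P."""
    return sum((cond_prob(event, blocks, w) * P_POINT for w in g), Fraction(0))


EVEN_ODD = [frozenset({2, 4, 6}), frozenset({1, 3, 5})]  # generates G
TRIVIAL = [OMEGA]                                        # generates {{}, Omega}
PRIME = frozenset({2, 3, 5})
EVEN = frozenset({2, 4, 6})

# Defining identity: the integral over G of P(A | G) equals P(A and G).
identity_holds = all(
    integral(PRIME, EVEN_ODD, g) == prob(PRIME & g)
    for g in generated_sigma_field(EVEN_ODD)
)

# Special case 1: for the trivial sigma-field, P(A | G) is the constant P(A).
trivial_is_constant = all(
    cond_prob(PRIME, TRIVIAL, w) == prob(PRIME) for w in OMEGA
)

# Special case 2: if A belongs to G, then P(A | G) is the indicator of A.
indicator_case = all(
    cond_prob(EVEN, EVEN_ODD, w) == (1 if w in EVEN else 0) for w in OMEGA
)
```

Because the sigma-field here is generated by a finite partition with blocks of positive probability, the conditional probability is unique everywhere and no choice of version is involved; in the general setting it is determined only up to sets of probability zero.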
== See also ==
* [[Conditioning (probability)]]
* [[Conditional probability]]
* [[Regular conditional probability]]
* [[Bayes' theorem]]

==References==
===Citations===
{{Reflist}}

===Sources===
{{refbegin}}
* {{cite book |last= Billingsley |first= Patrick |date= 1995 |title= Probability and Measure |edition= 3rd |publisher= John Wiley and Sons |location= New York |isbn= 0-471-00710-2 |author-link= Patrick Billingsley |url= https://books.google.com/books?id=a3gavZbxyJcC }}
* {{cite book |last= Billingsley |first= Patrick |date= 2012 |title= Probability and Measure |edition= Anniversary |publisher= Wiley |location= Hoboken, New Jersey |isbn= 978-1-118-12237-2 }}
* {{cite book |last= Park |first= Kun Il |date= 2018 |title= Fundamentals of Probability and Stochastic Processes with Applications to Communications |publisher= Springer |isbn= 978-3-319-68074-3}}
* {{cite book |last= Ross |first= Sheldon M. |date= 1993 |title= Introduction to Probability Models |edition= 5th |location= San Diego |publisher= Academic Press |isbn= 0-12-598455-3 |author-link= Sheldon M. Ross }}
{{refend}}

{{Authority control}}

[[Category:Theory of probability distributions]]
[[Category:Conditional probability]]