Regular conditional probability: Difference between revisions

dab-needed tag
 
(14 intermediate revisions by 8 users not shown)
Line 1:
In [[probability theory]], '''regular conditional probability''' is a concept that formalizes the notion of conditioning on the outcome of a [[random variable]]. The resulting '''conditional probability distribution''' is a parametrized family of probability measures called a [[Markov kernel]].
 
== Definition ==
 
=== Conditional probability distribution ===
Consider two random variables ''X'' and ''Y'', where ''X'' represents the roll of a die.
The conditional probability of ''Y'' being in a Borel set <math>A \subseteq \mathbb{R}</math> is given by
:<math>P(Y \in A | X = x) = \frac{P(Y \in A, X = x)}{P(X=x)}.</math>
Conditional probability forms a two-variable function <math>\nu:\mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]</math>,
:<math>\nu(x, A) = P(Y \in A | X = x).</math>
Note that when ''x'' is not a possible outcome of ''X'', the function is undefined: the roll of a die coming up 27 is a probability zero event. The function <math>\nu</math> is defined [[almost everywhere]] in ''x''.
 
Now consider two random variables <math>X, Y : \Omega \to \mathbb{R}</math>. The ''conditional probability distribution'' of ''Y'' given ''X'' is a two-variable function <math>\kappa_{Y\mid X}: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]</math>.
When ''X'' and ''Y'' are continuous with joint density <math>f_{X,Y}(x,y)</math>, the conditional probability of ''Y'' being in ''A'' is given by
:<math>P(Y \in A | X = x) = \frac{\int_A f_{X,Y}(x, y) \mathrm{d}y}{\int_\mathbb{R} f_{X,Y}(x, y) \mathrm{d}y}.</math>
Conditional probability is a two-variable function as before, undefined outside of the [[support]]{{dn|date=February 2021}} of the distribution of ''X''. Note that this is not the same as conditioning on the event <math>B = \{X = x\}</math>, but is rather a limit: see [[Conditional probability#Conditioning on an event of probability zero]].

If the random variable ''X'' is discrete, then
:<math>\kappa_{Y\mid X}(x, A) = P(Y \in A \mid X = x) = \begin{cases}
\frac{P(Y \in A, X = x)}{P(X=x)} & \text{ if } P(X = x) > 0 \\[3pt]
\text{arbitrary value} & \text{ otherwise}.
\end{cases}</math>
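
The discrete case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the joint probability table <code>joint</code> and the function name <code>kappa</code> are hypothetical and are not taken from any library, and the "arbitrary value" of the definition is exposed as a parameter.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), keyed by (x, y); the values are an assumption
# chosen for illustration and sum to 1.
joint = {
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
    (2, 0): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def kappa(x, A, arbitrary=Fraction(0)):
    """kappa_{Y|X}(x, A) for the finite joint pmf above; A is a set of y-values."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_x == 0:
        # P(X = x) = 0: the definition allows any value here.
        return arbitrary
    p_joint = sum(p for (xi, yi), p in joint.items() if xi == x and yi in A)
    return p_joint / p_x

print(kappa(1, {1}))     # P(Y = 1 | X = 1)
print(kappa(2, {0, 1}))  # total mass of kappa(2, .)
print(kappa(27, {1}))    # x = 27 is not a possible outcome of X
```

Exact fractions are used so that the ratio in the definition is not blurred by floating-point rounding.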
 
If the random variables ''X'', ''Y'' are continuous with density <math>f_{X,Y}(x,y)</math>, then
:<math>\kappa_{Y\mid X}(x, A) = \begin{cases}
\frac{\int_A f_{X,Y}(x, y) \, \mathrm{d}y}{\int_\mathbb{R} f_{X,Y}(x, y) \, \mathrm{d}y} &
\text{ if } \int_\mathbb{R} f_{X,Y}(x, y) \, \mathrm{d}y > 0 \\[3pt]
\text{arbitrary value} & \text{ otherwise}.
\end{cases}</math>
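
A numerical sketch of the continuous case follows. It assumes a hypothetical joint density <math>f(x, y) = x + y</math> on the unit square (zero elsewhere), takes ''A'' to be an interval <math>[a, b]</math>, and approximates the integrals with a midpoint rule; the function names are illustrative only.

```python
def f(x, y):
    """Hypothetical joint density on the unit square: f(x, y) = x + y."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def kappa(x, a, b, arbitrary=0.0):
    """Approximate kappa_{Y|X}(x, [a, b]) for the density f above."""
    # f vanishes outside [0, 1], so the integral over the real line reduces to [0, 1].
    marginal = integrate(lambda y: f(x, y), 0.0, 1.0)
    if marginal <= 0.0:
        # The marginal density of X vanishes at x: any value is allowed.
        return arbitrary
    return integrate(lambda y: f(x, y), a, b) / marginal

print(kappa(0.5, 0.0, 0.5))  # approximates (x/2 + 1/8) / (x + 1/2) at x = 0.5
print(kappa(2.0, 0.0, 0.5))  # x = 2 lies outside the support of X
```

For this particular density the exact value is <math>\kappa_{Y\mid X}(x, [0, 1/2]) = (x/2 + 1/8)/(x + 1/2)</math>, which equals 0.375 at <math>x = 0.5</math>.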
 
In probability theory, the theory of [[conditional expectation]] is developed before that of regular conditional distributions.<ref>{{cite book |last1=Durrett |first1=Richard |title=Probability : theory and examples |date=2010 |publisher=Cambridge University Press |location=Cambridge |isbn=9780521765398 |edition=4th}}</ref><ref>{{cite book |last1=Klenke |first1=Achim |title=Probability theory : a comprehensive course |location=London |isbn=978-1-4471-5361-0 |edition=Second}}</ref> A more general definition can be given in terms of conditional expectation. Consider a function <math> e_{Y \in A} : \mathbb{R} \to [0,1]</math> satisfying
:<math>e_{Y \in A}(X(\omega)) = \operatorname E[1_{Y \in A} \mid X](\omega)</math>
for almost all <math>\omega</math>.
Then the conditional probability distribution is given by
:<math>\kappa_{Y\mid X}(x, A) = e_{Y \in A}(x).</math>
 
As with conditional expectation, this can be further generalized to conditioning on a sigma algebra <math>\mathcal{F}</math>. In that case the conditional distribution is a function <math>\Omega \times \mathcal{B}(\mathbb{R}) \to [0, 1]</math>:
:<math> \kappa_{Y\mid\mathcal{F}}(\omega, A) = \operatorname E[1_{Y \in A} \mid \mathcal{F}](\omega)</math>
 
=== Regularity ===
 
For working with <math>\kappa_{Y\mid X}</math>, it is important that it be ''regular'', that is:
# For almost all ''x'', <math>A \mapsto \kappa_{Y\mid X}(x, A)</math> is a probability measure
# For all ''A'', <math>x \mapsto \kappa_{Y\mid X}(x, A)</math> is a measurable function
In other words <math>\kappa_{Y\mid X}</math> is a [[Markov kernel]].
 
The second condition holds trivially, but the proof of the first is more involved. It can be shown that if ''Y'' is a random element <math>\Omega \to S</math> in a [[Radon space]] ''S'', there exists a <math>\kappa_{Y\mid X}</math> that satisfies the first condition.<ref>{{cite book |last1=Klenke |first1=Achim |title=Probability theory : a comprehensive course |date=30 August 2013 |location=London |isbn=978-1-4471-5361-0 |edition=Second}}</ref> It is possible to construct more general spaces where a regular conditional probability distribution does not exist.<ref>Faden, A.M., 1985. The existence of regular conditional probabilities: necessary and sufficient conditions. ''The Annals of Probability'', 13(1), pp.&nbsp;288–298.</ref>
 
=== Relation to conditional expectation ===
For discrete and continuous random variables, the [[conditional expectation]] can be expressed as
:<math>
\begin{aligned}
\operatorname E[Y\mid X=x] &= \sum_y y \, P(Y=y\mid X=x) \\
\operatorname E[Y\mid X=x] &= \int y \, f_{Y\mid X}(x, y) \, \mathrm{d}y
\end{aligned}
</math>
where <math>f_{Y\mid X}(x, y)</math> is the [[conditional density]] of {{mvar|Y}} given {{mvar|X}}.
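
The discrete sum can be checked on a small example. The sketch below assumes a hypothetical joint probability table <code>joint</code> and computes the conditional expectation as a sum of values of ''Y'' weighted by the conditional probabilities; the helper names are illustrative and not part of any library.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), keyed by (x, y); values are an assumption.
joint = {
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
    (2, 0): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def p_x(x):
    """Marginal probability P(X = x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_prob(y, x):
    """P(Y = y | X = x); only meaningful when P(X = x) > 0."""
    return joint.get((x, y), Fraction(0)) / p_x(x)

def cond_expectation(x):
    """E[Y | X = x] as the sum over y of y * P(Y = y | X = x)."""
    y_values = {yi for (_, yi) in joint}
    return sum(y * cond_prob(y, x) for y in y_values)

# Averaging E[Y | X = x] over the distribution of X recovers E[Y].
x_values = {xi for (xi, _) in joint}
mean_y = sum(y * p for (_, y), p in joint.items())
tower = sum(p_x(x) * cond_expectation(x) for x in x_values)
print(cond_expectation(1), cond_expectation(2), mean_y, tower)
```

The last two printed values agree, which is the law of total expectation for this table.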
 
This result can be extended to measure theoretical conditional expectation using the regular conditional probability distribution:
:<math>\operatorname E[Y\mid X](\omega) = \int y \, \kappa_{Y\mid\sigma(X)}(\omega, \mathrm{d}y),</math>
where <math>\kappa_{Y\mid\sigma(X)} : \Omega \times \mathcal{B}(\mathbb{R}) \to [0,1]</math> is a family of measures parametrized by the outcome <math>\omega</math>.
 
Such a [[Markov kernel]] can be defined using conditional expectation:
:<math>\kappa_{Y\mid\sigma(X)}(\omega, A) = \operatorname E[1_{Y \in A} \mid X](\omega).</math>
It can be shown that for almost all <math>\omega</math>, this is a probability measure if <math>Y : \Omega \to \mathbb{R}</math>. There are, however, counterexamples when the random variable {{mvar|Y}} takes values in a more general space {{mvar|E}}: a space {{mvar|E}} can be constructed where <math>\kappa_{Y\mid\sigma(X)}</math> does not form a probability measure almost everywhere.

== Formal definition==
Let <math>(\Omega, \mathcal F, P)</math> be a [[probability space]], and let <math>T:\Omega\rightarrow E</math> be a [[random variable]], defined as a [[Borel measure|Borel-]][[measurable function]] from <math>\Omega</math> to its [[Probability space#Random variables|state space]] <math>(E, \mathcal E)</math>.
One should think of <math>T</math> as a way to "disintegrate" the sample space <math>\Omega</math> into <math>\{ T^{-1}(x) \}_{x \in E}</math>.
Line 46 ⟶ 60:
* For all <math>A\in\mathcal F</math>, <math>\nu(\cdot, A)</math> (a mapping <math>E \to [0,1]</math>) is <math>\mathcal E</math>-measurable, and
* For all <math>A\in\mathcal F</math> and all <math>B\in\mathcal E</math><ref>D. Leao Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>
:: <math>P\big(A\cap T^{-1}(B)\big) = \int_B \nu(x,A) \,(P\circ T^{-1})(\mathrm{d}x)</math>
where <math>P\circ T^{-1}</math> is the [[pushforward measure]] <math>T_*P</math>, that is, the distribution of the random element <math>T</math>, and
<math>x\in\operatorname{supp}\,T,</math> where <math>\operatorname{supp}\,T</math> denotes the [[Support (measure theory)|topological support]] of <math>T_* P</math>.
Specifically, if we take <math>B=E</math>, then <math>A \cap T^{-1}(E) = A</math>, and so
:<math>P(A) = \int_E \nu(x,A) \, (P\circ T^{-1})(\mathrm{d}x),</math>
where <math>\nu(x, A)</math> can be denoted, in more familiar terms, by <math>P(A \mid T=x)</math>.
This is ''defined'' to be the conditional probability of <math>A</math> given <math>T = x</math>, which
can be undefined in elementary constructions of conditional probability.
As can be seen from the integral above, the value of <math>\nu</math> for points ''x'' outside the support of the random variable is meaningless; its significance as a conditional probability is strictly limited to the support of ''T''.
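
On a finite sample space the integral reduces to a sum, so the defining identity can be verified directly. The following sketch assumes a hypothetical space of two fair coin flips and takes ''T'' to be the number of heads; all names are illustrative.

```python
from fractions import Fraction

# Hypothetical finite probability space: two fair coin flips.
omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}

def T(w):
    """Hypothetical random variable: the number of heads in the outcome w."""
    return w.count("H")

def prob(event):
    """P(event) for a set of outcomes."""
    return sum(P[w] for w in event)

def fibre(x):
    """The preimage T^{-1}({x})."""
    return {w for w in omega if T(w) == x}

def nu(x, A):
    """nu(x, A) = P(A | T = x); only defined for x in the support of T."""
    return prob(A & fibre(x)) / prob(fibre(x))

def disintegration_holds(A, B):
    """Check P(A ∩ T^{-1}(B)) == sum over x in B of nu(x, A) * P(T = x).

    B must be a subset of the support {0, 1, 2} of T in this sketch.
    """
    lhs = prob({w for w in A if T(w) in B})
    rhs = sum(nu(x, A) * prob(fibre(x)) for x in B)
    return lhs == rhs

first_flip_heads = {"HH", "HT"}
print(disintegration_holds(first_flip_heads, {1, 2}))
print(disintegration_holds(first_flip_heads, {0, 1, 2}))  # B = E gives P(A)
```

Calling <code>nu</code> with a value outside the support, such as <code>nu(3, first_flip_heads)</code>, raises a <code>ZeroDivisionError</code> in this sketch, mirroring the remark that <math>\nu</math> carries no meaning there.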
 
The [[measurable space]] <math>(\Omega, \mathcal F)</math> is said to have the '''regular conditional probability property''' if for all [[probability measure]]s <math>P</math> on <math>(\Omega, \mathcal F),</math> all [[random variable]]s on <math>(\Omega, \mathcal F, P)</math> admit a regular conditional probability. A [[Radon space]], in particular, has this property.
 
See also [[Conditional expectation#Definition of conditional probability|conditional probability]] and [[Conditional probability distribution#Measure-Theoretic Formulation|conditional probability distribution]].
 
==Alternate definition==
{{disputeabout|'''this way leads to irregular conditional probability'''|Non-regular conditional probability|date=September 2009}}
Consider a [[Radon space]] <math> \Omega </math> (that is a probability measure defined on a Radon space endowed with the Borel sigma-algebra) and a real-valued random variable ''T''. As discussed above, in this case there exists a regular conditional probability with respect to ''T''. Moreover, we can alternatively define the '''regular conditional probability''' for an event ''A'' given a particular value ''t'' of the random variable ''T'' in the following manner:
 
:<math> P (A|T=t) = \lim_{U\supset \{T= t\}} \frac {P(A\cap U)}{P(U)},</math>
 
where the [[Limit (mathematics)|limit]] is taken over the [[Net (mathematics)|net]] of [[Open set|open]] [[Neighbourhood (mathematics)|neighborhoods]] ''U'' of ''t'' as they become [[Subset|smaller with respect to set inclusion]]. This limit is defined if and only if the probability space is [[Radon space|Radon]], and only in the support of ''T'', as described in the article. This is the restriction of the transition probability to the support of ''T''. To describe this limiting process rigorously:
 
For every <math>\epsilon > 0,</math> there exists an open neighborhood ''U'' of the event {''T=t''}, such that for every open ''V'' with <math>\{T=t\} \subset V \subset U,</math>
:<math>\left|\frac {P(A\cap V)}{P(V)}-L\right| < \epsilon,</math>
where <math>L = P (A|T=t)</math> is the limit.
 
==Example==
To continue with our motivating example above, we consider a real-valued random variable ''X'' and write
:<math>P(A|X=x_0) = \nu(x_0,A) = \lim_{\epsilon\rightarrow 0+} \frac {P(A\cap\{x_0-\epsilon < X < x_0+\epsilon\})}{P(\{x_0-\epsilon < X < x_0+\epsilon\})},</math>
(where <math>x_0=2/3</math> for the example given.) This limit, if it exists, is a regular conditional probability for ''X'', restricted to <math>\mathrm{supp}\,X.</math>
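
The shrinking-interval ratio can be evaluated in closed form for a simple model. The sketch below is an assumption made for illustration, not the example referred to in the text: it takes a hypothetical joint density <math>f(x, y) = \tfrac{6}{5}(x^2 + y)</math> on the unit square and the event <math>A = \{Y \le 1/2\}</math>, for which both probabilities in the ratio are elementary integrals.

```python
def shrinking_ratio(x0, eps):
    """P(A ∩ {x0-eps < X < x0+eps}) / P({x0-eps < X < x0+eps}) for the
    hypothetical density f(x, y) = (6/5) * (x**2 + y) on the unit square
    and A = {Y <= 1/2}. Returns None when the interval has probability zero.
    """
    # Clip the interval to [0, 1], where the density is positive.
    a, b = max(x0 - eps, 0.0), min(x0 + eps, 1.0)
    if a >= b:
        return None  # zero-probability interval: the ratio is undefined
    cube = b**3 - a**3
    # The common factor 6/5 cancels between numerator and denominator.
    numerator = cube / 6 + (b - a) / 8    # integral of (x^2/2 + 1/8) over (a, b)
    denominator = cube / 3 + (b - a) / 2  # integral of (x^2 + 1/2) over (a, b)
    return numerator / denominator

for eps in (0.25, 0.05, 0.01, 0.0001):
    print(eps, shrinking_ratio(0.5, eps))
print(shrinking_ratio(1.5, 0.1))  # the interval misses [0, 1] entirely
```

As <math>\epsilon \to 0</math> the ratio at <math>x_0 = 1/2</math> approaches <math>(x_0^2/2 + 1/8)/(x_0^2 + 1/2) = 1/3</math>, which agrees with the density-based formula for this model.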
 
In any case, it is easy to see that this limit fails to exist for <math>x_0</math> outside the support of ''X'': since the support of a random variable is defined as the set of all points in its state space whose every [[Neighbourhood (mathematics)|neighborhood]] has positive probability, for every point <math>x_0</math> outside the support of ''X'' (by definition) there will be an <math>\epsilon > 0</math> such that <math>P(\{x_0-\epsilon < X < x_0+\epsilon\})=0.</math>
 
Thus if ''X'' is distributed uniformly on <math>[0,1],</math> it is truly meaningless to condition a probability on "<math>X=3/2</math>".
 
==See also==