Regular conditional probability

In [[probability theory]], '''regular conditional probability''' is a concept that formalizes the notion of conditioning on the outcome of a [[random variable]]. The resulting '''conditional probability distribution''' is a parametrized family of probability measures called a [[Markov kernel]].
 
== Definition ==
 
=== Conditional probability distribution ===
Consider two random variables ''X'' and ''Y'', where ''X'' represents the roll of a die.
The conditional probability of ''Y'' being in a Borel set <math>A \subseteq \mathbb{R}</math> is given by
:<math>P(Y \in A | X = x) = \frac{P(Y \in A, X = x)}{P(X=x)}.</math>
Conditional probability forms a two-variable function <math>\nu:\mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]</math>
:<math>\nu(x, A) = P(Y \in A | X = x).</math>
Note that when ''x'' is not a possible outcome of ''X'', the function is undefined: the roll of a die coming up 27 is a probability zero event. The function <math>\nu</math> is defined [[almost everywhere]] in ''x''.
 
Consider two random variables <math>X, Y : \Omega \to \mathbb{R}</math>. The ''conditional probability distribution'' of ''Y'' given ''X'' is a two-variable function <math>\kappa_{Y|X}: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]</math>.
 
If the random variable ''X'' is discrete,
:<math>\kappa_{Y|X}(x, A) = P(Y \in A | X = x) = \begin{cases}
\frac{P(Y \in A, X = x)}{P(X=x)} & \text{ if } P(X = x) > 0 \\
\text{arbitrary value} & \text{ otherwise}.
\end{cases}</math>

If the random variables ''X'', ''Y'' are continuous with density <math>f_{X,Y}(x,y)</math>,
:<math>\kappa_{Y|X}(x, A) = \begin{cases}
\frac{\int_A f_{X,Y}(x, y) \mathrm{d}y}{\int_\mathbb{R} f_{X,Y}(x, y) \mathrm{d}y} & \text{ if } \int_\mathbb{R} f_{X,Y}(x, y) \mathrm{d}y > 0 \\
\text{arbitrary value} & \text{ otherwise}.
\end{cases}</math>
Note that for outcomes <math>x</math> with <math>P(X = x) = 0</math>, this is not the same as conditioning on the event <math>B = \{X = x\}</math>, but is rather a limit: see [[Conditional probability#Conditioning on an event of probability zero]].
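As an illustrative sketch (a toy model assumed here, not part of the article: ''X'' is a fair die and ''Y'' is the sum of ''X'' and a second, independent fair die), the discrete ratio formula for <math>\kappa_{Y|X}</math> can be computed directly:

```python
from fractions import Fraction
from itertools import product

# Toy joint distribution: X is a fair die, Y = X + a second independent die.
# Each of the 36 equally likely roll pairs has probability 1/36.
outcomes = [(x, x + d) for x, d in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

def kappa(x, A):
    """kappa_{Y|X}(x, A) = P(Y in A, X = x) / P(X = x) when P(X = x) > 0."""
    p_x = sum(p for (xi, _) in outcomes if xi == x)
    if p_x == 0:
        return None  # arbitrary value: x lies outside the support of X
    p_joint = sum(p for (xi, yi) in outcomes if xi == x and yi in A)
    return p_joint / p_x

print(kappa(3, {4, 5, 6}))  # P(Y in {4,5,6} | X = 3) = 3/6 = 1/2
print(kappa(27, {4}))       # None: a die never shows 27, so P(X = 27) = 0
```

For each fixed <math>x</math> in the support, <math>A \mapsto \kappa(x, A)</math> is a probability measure, while outside the support any value may be assigned, exactly as in the "arbitrary value" branch above.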
 
A more general definition can be given in terms of [[conditional expectation]]. Consider a function <math> e_{Y \in A} : \mathbb{R} \to [0,1]</math> satisfying
:<math>e_{Y \in A}(X(\omega)) = \mathbb{E}[1_{Y \in A} | X](\omega)</math>
for almost all <math>\omega</math>.
Then the conditional probability distribution is given by
:<math>\kappa_{Y|X}(x, A) = e_{Y \in A}(x).</math>
 
As with conditional expectation, this can be further generalized to conditioning on a sigma algebra <math>\mathcal{F}</math>. In that case the conditional distribution is a function <math>\Omega \times \mathcal{B}(\mathbb{R}) \to [0, 1]</math>:
:<math>\kappa_{Y|\mathcal{F}}(\omega, A) = \mathbb{E}[1_{Y \in A} | \mathcal{F}](\omega).</math>
 
=== Regularity ===
 
For working with <math>\kappa_{Y|X}</math>, it is important that it be ''regular'', that is:
* For almost all ''x'', <math>A \mapsto \kappa_{Y|X}(x, A)</math> is a probability measure
* For all ''A'', <math>x \mapsto \kappa_{Y|X}(x, A)</math> is a measurable function
 
The latter follows from the definition of <math>\kappa_{Y|X}</math> in terms of conditional expectation, but the former is more involved. It can be shown that if ''Y'' is a random element <math>\Omega \to S</math> in a [[Radon space]] ''S'', there exists a <math>\kappa_{Y|X}</math> that satisfies the first condition.<ref>{{cite book |last1=Klenke |first1=Achim |title=Probability theory : a comprehensive course |___location=London |isbn=978-1-4471-5361-0 |edition=Second}}</ref> It is possible to construct more general spaces where a regular conditional probability distribution does not exist.<ref>Faden, A.M., 1985. The existence of regular conditional probabilities: necessary and sufficient conditions. ''The Annals of Probability'', 13(1), pp.288-298.</ref>
 
=== Relation to conditional expectation ===
 
In probability theory, the theory of [[conditional expectation]] is developed before that of regular conditional distributions.<ref>{{cite book |last1=Durrett |first1=Richard |title=Probability : theory and examples |date=2010 |publisher=Cambridge University Press |___location=Cambridge |isbn=9780521765398 |edition=4th}}</ref><ref>{{cite book |last1=Klenke |first1=Achim |title=Probability theory : a comprehensive course |___location=London |isbn=978-1-4471-5361-0 |edition=Second}}</ref>
 
For discrete and continuous random variables, the conditional expectation can be expressed as
:<math>
\begin{aligned}
\mathbb{E}[Y|X=x] &= \sum_y y \, P(Y=y|X=x)\\
\mathbb{E}[Y|X=x] &= \int y \, f_{Y|X}(x, y) \mathrm{d}y
\end{aligned}
</math>
where <math>f_{Y|X}(x, y)</math> is the [[conditional density]] of {{mvar|Y}} given {{mvar|X}}.
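Continuing the illustrative dice model assumed earlier (''X'' a fair die, ''Y'' the sum of ''X'' and a second independent die), the discrete formula <math>\mathbb{E}[Y|X=x] = \sum_y y \, P(Y=y|X=x)</math> can be evaluated exactly:

```python
from fractions import Fraction
from itertools import product

# Toy joint pmf: X a fair die, Y = X + a second independent fair die.
joint = {}
for x, d in product(range(1, 7), repeat=2):
    joint[(x, x + d)] = joint.get((x, x + d), 0) + Fraction(1, 36)

def cond_expectation(x):
    """E[Y | X = x] = sum_y y * P(Y = y | X = x)."""
    p_x = sum(pr for (xi, _), pr in joint.items() if xi == x)
    return sum(y * pr / p_x for (xi, y), pr in joint.items() if xi == x)

print(cond_expectation(3))  # 3 + E[second die] = 3 + 7/2 = 13/2
```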
 
This result can be extended to measure theoretical conditional expectation using the regular conditional probability distribution:
:<math>\mathbb{E}[Y|X](\omega) = \int y \, \kappa_{Y|\sigma(X)}(\omega, \mathrm{d}y).</math>
 
==Formal definition==
Let <math>(\Omega, \mathcal F, P)</math> be a [[probability space]], and let <math>T:\Omega\rightarrow E</math> be a [[random variable]], defined as a [[Borel measure|Borel-]][[measurable function]] from <math>\Omega</math> to its [[Probability space#Random variables|state space]] <math>(E, \mathcal E)</math>.
One should think of <math>T</math> as a way to "disintegrate" the sample space <math>\Omega</math> into <math>\{ T^{-1}(x) \}_{x \in E}</math>.
:<math>P\big(A\cap T^{-1}(B)\big) = \int_B \nu(x,A) \,P\big(T^{-1}(d x)\big).</math>
where <math>P\circ T^{-1}</math> is the [[pushforward measure]] <math>T_*P</math>, the distribution of the random element <math>T</math>, and
<math>x</math> ranges over <math>\mathrm{supp}\,T,</math> the [[Support (measure theory)|topological support]] of <math>T_* P</math>.
Specifically, if we take <math>B=E</math>, then <math>A \cap T^{-1}(E) = A</math>, and so
:<math>P(A) = \int_E \nu(x,A) \,P\big(T^{-1}(d x)\big)</math>,
where <math>\nu(x, A)</math> can be written in more familiar terms as <math>P(A\ |\ T=x)</math>. This is taken as the ''definition'' of the conditional probability of <math>A</math> given <math>T=x</math>, which can be left undefined in elementary constructions of conditional probability.
As can be seen from the integral above, the value of <math>\nu</math> for points ''x'' outside the support of the random variable is meaningless; its significance as a conditional probability is strictly limited to the support of ''T''.
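The disintegration identity above can be checked numerically in a concrete setting. The sketch below rests entirely on assumptions chosen for illustration: <math>\Omega</math> is the unit square with uniform <math>P</math>, <math>T(u,v) = u</math> is the first-coordinate projection, and for <math>A = \{(u,v) : v < u\}</math> the fiber measure is <math>\nu(x, A) = x</math>, so <math>P(A \cap T^{-1}(B)) = \int_B x \, \mathrm{d}x = 1/8</math> for <math>B = [0, 1/2]</math>:

```python
import random

random.seed(0)
N = 200_000

# Omega = unit square, P uniform, T(u, v) = u.
# A = {(u, v) : v < u}, B = [0, 1/2]; nu(x, A) = x on the fiber T = x.
hits = 0
for _ in range(N):
    u, v = random.random(), random.random()
    if v < u and u <= 0.5:  # omega lies in A ∩ T^{-1}(B)
        hits += 1

lhs = hits / N        # Monte Carlo estimate of P(A ∩ T^{-1}(B))
rhs = 0.5**2 / 2      # ∫_0^{1/2} x dx = 1/8, the disintegration integral
print(lhs, rhs)
```

With this sample size the two sides agree to within about a standard error of <math>10^{-3}</math>, illustrating that integrating the kernel <math>\nu(x, A)</math> against <math>T_*P</math> over <math>B</math> recovers the joint probability.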
 
The [[measurable space]] <math>(\Omega, \mathcal F)</math> is said to have the '''regular conditional probability property''' if for all [[probability measure]]s <math>P</math> on <math>(\Omega, \mathcal F),</math> all [[random variable]]s on <math>(\Omega, \mathcal F, P)</math> admit a regular conditional probability. A [[Radon space]], in particular, has this property.
 
See also [[Conditional expectation#Definition of conditional probability|conditional probability]] and [[Conditional probability distribution#Measure-Theoretic Formulation|conditional probability distribution]].
 
==Alternate definition==
:<math>\left|\frac {P(A\cap V)}{P(V)}-L\right| < \epsilon,</math>
where <math>L = P(A|T=t)</math> is the limit.
 
==Example==
To continue with our motivating example above, we consider a real-valued random variable ''X'' and write
:<math>P(A|X=x_0) = \nu(x_0,A) = \lim_{\epsilon\rightarrow 0+} \frac {P(A\cap\{x_0-\epsilon < X < x_0+\epsilon\})}{P(\{x_0-\epsilon < X < x_0+\epsilon\})},</math>
(where <math>x_0=2/3</math> for the example given.) This limit, if it exists, is a regular conditional probability for ''X'', restricted to <math>\mathrm{supp}\,X.</math>
 
In any case, it is easy to see that this limit fails to exist for <math>x_0</math> outside the support of ''X'': since the support of a random variable is defined as the set of all points in its state space whose every [[Neighbourhood (mathematics)|neighborhood]] has positive probability, for every point <math>x_0</math> outside the support of ''X'' (by definition) there will be an <math>\epsilon > 0</math> such that <math>P(\{x_0-\epsilon < X < x_0+\epsilon\})=0.</math>
 
Thus if ''X'' is distributed uniformly on <math>[0,1],</math> it is truly meaningless to condition a probability on "<math>X=3/2</math>".
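The limiting ratio can be made concrete with a small sketch (the choices here are illustrative assumptions, not from the article: ''X'' uniform on <math>[0,1]</math> and <math>A = \{X \le 1/2\}</math>, evaluated exactly via interval lengths rather than by sampling):

```python
def uniform_prob(a, b):
    """P(a < X < b) for X uniform on [0, 1]: length of the overlap with [0, 1]."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

def ratio(x0, eps):
    """P(X <= 1/2 and |X - x0| < eps) / P(|X - x0| < eps), if defined."""
    denom = uniform_prob(x0 - eps, x0 + eps)
    if denom == 0.0:
        return None  # x0 lies outside the support of X
    num = uniform_prob(x0 - eps, min(x0 + eps, 0.5))
    return num / denom

for eps in (0.1, 0.01, 0.001):
    print(ratio(1/3, eps), ratio(2/3, eps))  # tends to P(A|X=1/3)=1, P(A|X=2/3)=0
print(ratio(1.5, 0.1))  # None: every neighborhood of 3/2 has probability zero
```

The `None` branch mirrors the point made above: for <math>x_0</math> outside the support, the denominator vanishes for small <math>\epsilon</math> and the conditional probability is meaningless.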
 
==See also==