Regular conditional probability

==Motivation==
Normally we define the '''conditional probability''' of an event ''A'' given an event ''B'' as:
:<math>\mathfrak P(A|B)=\frac{\mathfrak P(A\cap B)}{\mathfrak P(B)}.</math>
The difficulty with this arises when the event ''B'' is too small to have a non-zero probability. For example, suppose we have a [[random variable]] ''X'' with a [[uniform distribution (continuous)|uniform distribution]] on <math>[0,1],</math> and ''B'' is the event that <math>X=2/3.</math> Clearly the probability of ''B'' in this case is <math>\mathfrak P(B)=0,</math> but nonetheless we would still like to assign meaning to a conditional probability such as <math>\mathfrak P(A|X=2/3).</math> To do so rigorously requires the definition of a regular conditional probability.
 
==Definition==
Let <math>(\Omega, \mathcal F, \mathfrak P)</math> be a [[probability space]], and let <math>T:\Omega\rightarrow E</math> be a [[random variable]], defined as a [[Borel measure|Borel-]][[measurable function]] from <math>\Omega</math> to its [[Probability_space#Random_variables|state space]] <math>(E, \mathcal E).</math> Then a '''regular conditional probability''' is defined as a function <math>\nu:E \times\mathcal F \rightarrow [0,1],</math> called a "transition probability", where <math>\nu(x,A)</math> is a probability measure (in its second argument) on <math>\mathcal F</math> for all <math>x\in E</math> and a measurable function on ''E'' (in its first argument) for all <math>A\in\mathcal F,</math> such that for all <math>A\in\mathcal F</math> and all <math>B\in\mathcal E</math><ref>D. Leao Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>
:<math>\mathfrak P\big(A\cap T^{-1}(B)\big) = \int_B \nu(x,A) \,\mathfrak P\big(T^{-1}(d x)\big).</math>
 
To express this in our more familiar notation:
:<math>\mathfrak P(A|T=x) = \nu(x,A),</math>
where <math>x\in\mathrm{supp}\,T,</math> i.e. the [[Support (measure theory)|topological support]] of the [[pushforward measure]] <math>T _* \mathfrak P = \mathfrak P\big(T^{-1}(\cdot)\big).</math> As can be seen from the integral above, the value of <math>\nu</math> for points ''x'' outside the support of the random variable is meaningless; its significance as a conditional probability is strictly limited to the support of ''T''.
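The defining identity can be checked by hand on a finite probability space, where the integral over ''B'' reduces to a finite sum. The following is a minimal sketch under assumptions that are not part of the article: the fair die, the parity variable <code>T</code>, and all function names are hypothetical choices made for illustration only.

```python
from fractions import Fraction

# Hypothetical finite example: Omega is the set of outcomes of a fair die.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}


def T(w):
    """Random variable with state space E = {0, 1}: parity of the outcome."""
    return w % 2


def measure(event):
    """P(event) for an event given as a list of outcomes."""
    return sum((prob[w] for w in event), Fraction(0))


def pushforward(x):
    """P(T^{-1}({x})), the law of T evaluated at the point x."""
    return measure([w for w in omega if T(w) == x])


def nu(x, A):
    """Candidate transition probability: P(A and T = x) / P(T = x)."""
    return measure([w for w in A if T(w) == x]) / pushforward(x)


def lhs(A, B):
    """Left-hand side of the identity: P(A intersected with T^{-1}(B))."""
    return measure([w for w in A if T(w) in B])


def rhs(A, B):
    """Right-hand side: nu(x, A) summed over x in B against the law of T."""
    return sum((nu(x, A) * pushforward(x) for x in B), Fraction(0))


A = [1, 2, 3]
for B in ([], [0], [1], [0, 1]):
    assert lhs(A, B) == rhs(A, B)

print(nu(1, A))  # conditional probability of A given an odd outcome
```

Here every point of ''E'' has positive probability, so the elementary quotient already gives a valid <math>\nu</math>; the measure-theoretic definition is needed precisely when that quotient is unavailable.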
 
The [[measurable space]] <math>(\Omega, \mathcal F)</math> is said to have the '''regular conditional probability property''' if for all [[probability measure]]s <math>\mathfrak P</math> on <math>(\Omega, \mathcal F),</math> all [[random variable]]s on <math>(\Omega, \mathcal F, \mathfrak P)</math> admit a regular conditional probability. A [[Radon space]], in particular, has this property.
 
See also [[Conditional_expectation#Definition_of_conditional_probability|conditional probability]] and [[Conditional_probability_distribution#Measure-Theoretic_Formulation|conditional probability distribution]].

Consider a Radon space <math> \Omega </math> (that is, a probability measure defined on a Radon space endowed with the Borel sigma-algebra) and a real-valued random variable ''T''. As discussed above, in this case there exists a regular conditional probability with respect to ''T''. Moreover, we can alternatively define the '''regular conditional probability''' for an event ''A'' given a particular value ''t'' of the random variable ''T'' in the following manner:
 
:<math> \mathfrak P (A|T=t) = \lim_{U\ni t} \frac {\mathfrak P\big(A\cap T^{-1}(U)\big)}{\mathfrak P\big(T^{-1}(U)\big)},</math>
 
where the [[Limit (mathematics)|limit]] is taken over the [[Net (mathematics)|net]] of [[Open set|open]] [[Neighbourhood (mathematics)|neighborhoods]] ''U'' of ''t'' as they become [[Subset|smaller with respect to set inclusion]]. This limit is defined if and only if the probability space is [[Radon space|Radon]], and only in the support of ''T'', as described in the article. This is the restriction of the transition probability to the support of ''T''. To describe this limiting process rigorously:
 
For every <math>\epsilon > 0,</math> there exists an open neighborhood ''U'' of ''t'', such that for every open ''V'' with <math>t \in V \subset U,</math>
:<math>\left|\frac {\mathfrak P\big(A\cap T^{-1}(V)\big)}{\mathfrak P\big(T^{-1}(V)\big)}-L\right| < \epsilon,</math>
where <math>L = \mathfrak P (A|T=t)</math> is the limit.
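The limiting process can be illustrated numerically. The sketch below rests on assumptions that are not in the article: the probability space is taken to be the unit square with the uniform measure, ''T'' is the first coordinate, and ''A'' is the hypothetical event <math>\{(x,y): y\le x^2\}</math>, so that both probabilities in the quotient have closed forms. It evaluates the quotient only on symmetric intervals around ''t'', which samples the net of open neighborhoods rather than exhausting it.

```python
from fractions import Fraction


def ratio(t, eps):
    """Quotient P(A and T in U) / P(T in U) for U = (t - eps, t + eps).

    Assumed setup (hypothetical): uniform measure on the unit square,
    T(x, y) = x, and A = {(x, y) : y <= x**2}.
    """
    # Only the part of U inside [0, 1] carries probability.
    a = max(Fraction(0), t - eps)
    b = min(Fraction(1), t + eps)
    joint = (b**3 - a**3) / 3   # integral of x**2 over (a, b)
    marginal = b - a            # probability that T lies in (a, b)
    return joint / marginal


t = Fraction(1, 2)
for k in range(1, 5):
    eps = Fraction(1, 10**k)
    print(eps, float(ratio(t, eps)))
```

For intervals contained in <math>[0,1]</math> the quotient equals <math>t^2+\epsilon^2/3,</math> so in this assumed setup the printed values approach <math>t^2 = 1/4</math> as the neighborhoods shrink.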
 
==Example==
To continue with our motivating example above, we consider a real-valued random variable ''X'' and write
:<math>\mathfrak P(A|X=x_0) = \nu(x_0,A) = \lim_{\epsilon\rightarrow 0+} \frac {\mathfrak P(A\cap\{x_0-\epsilon < X < x_0+\epsilon\})}{\mathfrak P(\{x_0-\epsilon < X < x_0+\epsilon\})},</math>
where <math>x_0=2/3</math> for the example given. This limit, if it exists, is a regular conditional probability for ''X'', restricted to <math>\mathrm{supp}\,X.</math>
 
In any case, this limit fails to exist for <math>x_0</math> outside the support of ''X''. The support of a random variable is defined as the set of all points in its state space whose every [[Neighbourhood (mathematics)|neighborhood]] has positive probability, so for every point <math>x_0</math> outside the support of ''X'' there is an <math>\epsilon > 0</math> such that <math>\mathfrak P(\{x_0-\epsilon < X < x_0+\epsilon\})=0.</math> For this and every smaller <math>\epsilon,</math> the quotient is the undefined expression <math>0/0.</math>
 
Thus if ''X'' is distributed uniformly on <math>[0,1],</math> it is truly meaningless to condition a probability on "<math>X=3/2</math>".
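The contrast between a point inside the support and a point outside it can be made concrete. The following sketch assumes, beyond what the article states, that <math>\Omega=[0,1]</math> with ''X'' the identity map and that ''A'' is the hypothetical event <math>\{X\le 3/4\};</math> the function names are illustrative only. It returns <code>None</code> whenever the denominator of the quotient vanishes.

```python
from fractions import Fraction


def overlap(lo, hi, a, b):
    """Length of the intersection of (lo, hi) with [a, b]."""
    return max(Fraction(0), min(hi, b) - max(lo, a))


def conditional_ratio(x0, eps, a_hi=Fraction(3, 4)):
    """Quotient P(A and |X - x0| < eps) / P(|X - x0| < eps).

    Assumed setup (hypothetical): X uniform on [0, 1], A = {X <= a_hi}.
    Returns None when the denominator is zero, i.e. the quotient is undefined.
    """
    lo, hi = x0 - eps, x0 + eps
    denom = overlap(lo, hi, Fraction(0), Fraction(1))
    if denom == 0:
        return None
    numer = overlap(lo, hi, Fraction(0), a_hi)
    return numer / denom


inside = Fraction(2, 3)    # a point in the support of X
outside = Fraction(3, 2)   # a point outside the support of X
for eps in (Fraction(1), Fraction(1, 4), Fraction(1, 24)):
    print(eps, conditional_ratio(inside, eps), conditional_ratio(outside, eps))
```

At <math>x_0=2/3</math> the quotient is defined for every <math>\epsilon>0,</math> whereas at <math>x_0=3/2</math> it is undefined as soon as <math>\epsilon\le 1/2,</math> because the interval no longer meets <math>[0,1].</math>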