Regular conditional probability: Difference between revisions

Revision as of 13:42, 12 March 2010 by Mild Bill Hiccup (m →Motivation: spelling)
Latest revision as of 21:14, 3 November 2024 by 84.169.251.219 (→Conditional probability distribution)
(47 intermediate revisions by 26 users not shown)
Line 1:

~~In [[probability theory]], '''Regular conditional probability''' is a concept that has developed to overcome certain difficulties in formally defining [[Conditional probability|conditional probabilities]] for [[continuous probability distribution]]s. It is defined as an alternative [[probability measure]] conditioned on a particular value of a [[random variable]].~~

~~==Motivation==~~

~~Normally we define the '''conditional probability''' of an event ''A'' given an event ''B'' as:~~

~~:<math>\mathfrak P(A|B)=\frac{\mathfrak P(A\cap B)}{\mathfrak P(B)}.</math>~~

~~The difficulty with this arises when the event ''B'' is too small to have a non-zero probability. For example, suppose we have a [[random variable]] ''X'' with a [[uniform distribution (continuous)|uniform distribution]] on <math>[0,1],</math> and ''B'' is the event that <math>X=2/3.</math> Clearly the probability of ''B'' in this case is <math>\mathfrak P(B)=0,</math> but nonetheless we would still like to assign meaning to a conditional probability such as <math>\mathfrak P(A|X=2/3).</math> To do so rigorously requires the definition of a regular conditional probability.~~

In [[probability theory]], '''regular conditional probability''' is a concept that formalizes the notion of conditioning on the outcome of a [[random variable]]. The resulting '''conditional probability distribution''' is a parametrized family of [[probability measure]]s called a [[Markov kernel]].

== Definition ==
~~==Definition==~~

~~Let <math>(\Omega, \mathcal F, \mathfrak P)</math> be a [[probability space]], and let <math>T:\Omega\rightarrow E</math> be a [[random variable]], defined as a [[Borel measure|Borel-]][[measurable function]] from <math>\Omega</math> to its [[state space]] <math>(E, \mathcal E).</math> Then a '''regular conditional probability''' is defined as a function <math>\nu:E \times\mathcal F \rightarrow [0,1],</math> called a "transition probability", where <math>\nu(x,A)</math> is a valid probability measure (in its second argument) on <math>\mathcal F</math> for all <math>x\in E</math> and a measurable function in ''E'' (in its first argument) for all <math>A\in\mathcal F,</math> such that for all <math>A\in\mathcal F</math> and all <math>B\in\mathcal E</math><ref>D. Leao Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>~~

~~:<math>\mathfrak P\big(A\cap T^{-1}(B)\big) = \int_B \nu(x,A) \,d\mathfrak P\big(T^{-1}(x)\big).</math>~~

~~To express this in our more familiar notation:~~

~~:<math>\mathfrak P(A|T=x) = \nu(x,A),</math>~~

~~where <math>x\in\mathrm{supp}\,T,</math> i.e. the [[Support (measure theory)|topological support]] of the [[pushforward measure]] <math>T _* \mathfrak P = \mathfrak P\big(T^{-1}(\cdot)\big).</math> As can be seen from the integral above, the value of <math>\nu</math> for points ''x'' outside the support of the random variable is meaningless; its significance as a conditional probability is strictly limited to the support of ''T''.~~

=== Conditional probability distribution ===

Consider two random variables <math>X, Y : \Omega \to \mathbb{R}</math>. The ''conditional probability distribution'' of ''Y'' given ''X'' is a two-variable function <math>\kappa_{Y\mid X}: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]</math>.
If the random variable ''X'' is discrete, then

:<math>\kappa_{Y\mid X}(x, A) = P(Y \in A \mid X = x) = \begin{cases} \frac{P(Y \in A, X = x)}{P(X=x)} & \text{ if } P(X = x) > 0 \\[3pt] \text{arbitrary value} & \text{ otherwise}. \end{cases}</math>

If the random variables ''X'', ''Y'' are continuous with density <math>f_{X,Y}(x,y)</math>, then

:<math>\kappa_{Y\mid X}(x, A) = \begin{cases} \frac{\int_A f_{X,Y}(x, y) \, \mathrm{d}y}{\int_\mathbb{R} f_{X,Y}(x, y) \, \mathrm{d}y} & \text{ if } \int_\mathbb{R} f_{X,Y}(x, y) \, \mathrm{d}y > 0 \\[3pt] \text{arbitrary value} & \text{ otherwise}. \end{cases}</math>

~~The [[measurable space]] <math>(\Omega, \mathcal F)</math> is said to have the '''regular conditional probability property''' if for all [[probability measure]]s <math>\mathfrak P</math> on <math>(\Omega, \mathcal F),</math> all [[random variable]]s on <math>(\Omega, \mathcal F, \mathfrak P)</math> admit a regular conditional probability. A [[Radon space]], in particular, has this property.~~

~~==Alternate definition==~~

~~{{disputeabout|'''this way leads to irregular conditional probability'''|Non-regular conditional probability|date=September 2009}}~~

~~We may also define a '''regular conditional probability''' for an event ''A'' given a particular value ''t'' of the random variable ''T'' in the following manner:~~

~~:<math> \mathfrak P (A|T=t) = \lim_{U\ni t} \frac {\mathfrak P(A\cap U)}{\mathfrak P(U)},</math>~~

A more general definition can be given in terms of [[conditional expectation]]. Consider a function <math> e_{Y \in A} : \mathbb{R} \to [0,1]</math> satisfying

:<math>e_{Y \in A}(X(\omega)) = \operatorname E[1_{Y \in A} \mid X](\omega)</math>

for almost all <math>\omega</math>. Then the conditional probability distribution is given by

:<math>\kappa_{Y\mid X}(x, A) = e_{Y \in A}(x).</math>

As with conditional expectation, this can be further generalized to conditioning on a sigma algebra <math>\mathcal{F}</math>.
In that case the conditional distribution is a function <math>\Omega \times \mathcal{B}(\mathbb{R}) \to [0, 1]</math>:

:<math> \kappa_{Y\mid\mathcal{F}}(\omega, A) = \operatorname E[1_{Y \in A} \mid \mathcal{F}](\omega)</math>

~~where the [[Limit (mathematics)|limit]] is taken over the [[Net (mathematics)|net]] of [[Open set|open]] [[Neighbourhood (mathematics)|neighborhoods]] ''U'' of ''t'' as they become [[Subset|smaller with respect to set inclusion]]. This limit is defined if and only if the probability space is [[Radon space|Radon]], and only in the support of ''T'', as described in the article. This is the restriction of the transition probability to the support of ''T''. To describe this limiting process rigorously:~~

~~For every <math>\epsilon > 0,</math> there exists an open neighborhood ''U'' of ''t'', such that for every open ''V'' with <math>t \in V \subset U,</math>~~

~~:<math>\left|\frac {\mathfrak P(A\cap V)}{\mathfrak P(V)}-L\right| < \epsilon,</math>~~

~~where <math>L = \mathfrak P (A|T=t)</math> is the limit.~~

~~==Example==~~

~~To continue with our motivating example above, where ''X'' is a real-valued random variable, we may write~~

~~:<math>\mathfrak P(A|X=x_0) = \nu(x_0,A) = \lim_{\epsilon\rightarrow 0+} \frac {\mathfrak P(A\cap\{x_0-\epsilon < X < x_0+\epsilon\})}{\mathfrak P(\{x_0-\epsilon < X < x_0+\epsilon\})},</math>~~

~~(where <math>x_0=2/3</math> for the example given.) This limit, if it exists, is a regular conditional probability for ''X'', restricted to <math>\mathrm{supp}\,X.</math>~~

=== Regularity ===

For working with <math>\kappa_{Y\mid X}</math>, it is important that it be ''regular'', that is:

# For almost all ''x'', <math>A \mapsto \kappa_{Y\mid X}(x, A)</math> is a probability measure
# For all ''A'', <math>x \mapsto \kappa_{Y\mid X}(x, A)</math> is a measurable function

In other words, <math>\kappa_{Y\mid X}</math> is a [[Markov kernel]].

The second condition holds trivially, but the proof of the first is more involved.
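
For a discrete pair the kernel can be tabulated directly. The following is a minimal sketch, not a reference implementation: the joint probabilities in <code>joint</code>, the name <code>kernel</code>, and the choice of a uniform distribution as the "arbitrary value" where <math>P(X = x) = 0</math> are all assumptions made for illustration. It evaluates <math>\kappa_{Y\mid X}(x, A)</math> from the discrete formula and checks that, for each ''x'', the map <math>A \mapsto \kappa_{Y\mid X}(x, A)</math> has total mass one and is additive on the two singletons.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); x = 2 is given probability zero on purpose.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
    (2, 0): Fraction(0),    (2, 1): Fraction(0),
}
ys = sorted({y for _, y in joint})  # values taken by Y


def kernel(x, A):
    """kappa_{Y|X}(x, A) = P(Y in A, X = x) / P(X = x) when P(X = x) > 0."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    if p_x == 0:
        # "Arbitrary value" branch: we assume the uniform distribution on ys,
        # so that kappa(x, .) is still a probability measure for this x.
        return Fraction(sum(1 for y in ys if y in A), len(ys))
    p_joint = sum(p for (xx, y), p in joint.items() if xx == x and y in A)
    return p_joint / p_x


# First regularity condition, checked for every x in this finite example.
total_mass_is_one = all(kernel(x, set(ys)) == 1 for x in (0, 1, 2))
additive = all(
    kernel(x, {0}) + kernel(x, {1}) == kernel(x, {0, 1}) for x in (0, 1, 2)
)
print(kernel(0, {1}), total_mass_is_one, additive)
```

Exact rational arithmetic via <code>fractions.Fraction</code> is used so that the checks are equalities rather than floating-point comparisons.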
It can be shown that if ''Y'' is a random element <math>\Omega \to S</math> in a [[Radon space]] ''S'', there exists a <math>\kappa_{Y\mid X}</math> that satisfies the first condition.<ref>{{cite book |last1=Klenke |first1=Achim |title=Probability theory : a comprehensive course |date=30 August 2013 |location=London |isbn=978-1-4471-5361-0 |edition=Second}}</ref> It is possible to construct more general spaces where a regular conditional probability distribution does not exist.<ref>Faden, A.M., 1985. The existence of regular conditional probabilities: necessary and sufficient conditions. ''The Annals of Probability'', 13(1), pp. 288–298.</ref>

=== Relation to conditional expectation ===

For discrete and continuous random variables, the [[conditional expectation]] can be expressed as

:<math> \begin{aligned} \operatorname E[Y\mid X=x] &= \sum_y y \, P(Y=y\mid X=x) \\ \operatorname E[Y\mid X=x] &= \int y \, f_{Y\mid X}(x, y) \, \mathrm{d}y \end{aligned} </math>

where <math>f_{Y\mid X}(x, y)</math> is the [[conditional density]] of {{mvar|Y}} given {{mvar|X}}.

This result can be extended to measure-theoretic conditional expectation using the regular conditional probability distribution:

:<math>\operatorname E[Y\mid X](\omega) = \int y \, \kappa_{Y\mid\sigma(X)}(\omega, \mathrm{d}y).</math>

== Formal definition ==

Let <math>(\Omega, \mathcal F, P)</math> be a [[probability space]], and let <math>T:\Omega\rightarrow E</math> be a [[random variable]], defined as a [[Borel measure|Borel-]][[measurable function]] from <math>\Omega</math> to its [[Probability space#Random variables|state space]] <math>(E, \mathcal E)</math>. One should think of <math>T</math> as a way to "disintegrate" the sample space <math>\Omega</math> into <math>\{ T^{-1}(x) \}_{x \in E}</math>. Using the [[disintegration theorem]] from measure theory, we can likewise "disintegrate" the measure <math>P</math> into a collection of measures, one for each <math>x \in E</math>.
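
The "disintegration" picture can be made concrete on a finite sample space. The sketch below is only an illustration under assumed data: a fair die as <math>\Omega</math>, the parity of the outcome as <math>T</math>, and the hypothetical helper names <code>fiber</code> and <code>nu</code>. It splits <math>\Omega</math> into the pieces <math>T^{-1}(x)</math>, builds one probability measure per <math>x</math> by normalizing <math>P</math> on each piece, and checks that weighting these measures by <math>P(T = x)</math> gives back <math>P(A)</math>.

```python
from fractions import Fraction

omega = range(1, 7)                      # outcomes of a fair die (assumed example)
P = {w: Fraction(1, 6) for w in omega}   # probability measure on Omega


def T(w):
    """Random variable T: Omega -> E = {0, 1}, the parity of the outcome."""
    return w % 2


def prob(event):
    """P(event) for a subset of Omega."""
    return sum(P[w] for w in event)


def fiber(x):
    """T^{-1}(x): the piece of Omega on which T equals x."""
    return {w for w in omega if T(w) == x}


def nu(x, A):
    """nu(x, A) = P(A ∩ T^{-1}(x)) / P(T^{-1}(x)); every fiber has positive mass here."""
    return prob(set(A) & fiber(x)) / prob(fiber(x))


A = {1, 2, 3}
# Reassemble P(A) from the per-x measures, weighted by the distribution of T.
reassembled = sum(nu(x, A) * prob(fiber(x)) for x in (0, 1))
print(nu(0, A), nu(1, A), reassembled, prob(A))
```

Because every fiber has positive probability in this example, no "arbitrary value" convention is needed; in the general setting the measures are only determined up to a set of <math>x</math> of measure zero.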
Formally, a '''regular conditional probability''' is defined as a function <math>\nu:E \times\mathcal F \rightarrow [0,1],</math> called a "transition probability", where:
* For every <math>x \in E</math>, <math>\nu(x, \cdot)</math> is a probability measure on <math>\mathcal F</math>. Thus we provide one measure for each <math>x \in E</math>.
* For all <math>A\in\mathcal F</math>, <math>\nu(\cdot, A)</math> (a mapping <math>E \to [0,1]</math>) is <math>\mathcal E</math>-measurable, and
* For all <math>A\in\mathcal F</math> and all <math>B\in\mathcal E</math><ref>D. Leao Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>
:: <math>P\big(A\cap T^{-1}(B)\big) = \int_B \nu(x,A) \,(P\circ T^{-1})(\mathrm{d}x)</math>
where <math>P\circ T^{-1}</math> is the [[pushforward measure]] <math>T_* P</math>, that is, the distribution of the random element <math>T</math>, and <math>x\in\operatorname{supp}T,</math> i.e. the [[Support (measure theory)|support]] of <math>T_* P</math>. Specifically, if we take <math>B=E</math>, then <math>A \cap T^{-1}(E) = A</math>, and so
:<math>P(A) = \int_E \nu(x,A) \, (P\circ T^{-1})(\mathrm{d}x),</math>
where <math>\nu(x, A)</math> can be denoted, in more familiar terms, by <math>P(A\ |\ T=x)</math>.

==Alternate definition==
{{disputeabout|'''this way leads to irregular conditional probability'''|Non-regular conditional probability|date=September 2009}}

Consider a [[Radon space]] <math> \Omega </math> (that is, a probability measure defined on a Radon space endowed with the Borel sigma-algebra) and a real-valued random variable ''T''. As discussed above, in this case there exists a regular conditional probability with respect to ''T''.
Moreover, we can alternatively define the '''regular conditional probability''' for an event ''A'' given a particular value ''t'' of the random variable ''T'' in the following manner:

:<math> P (A\mid T=t) = \lim_{U\supset \{T= t\}} \frac {P(A\cap U)}{P(U)},</math>

where the [[Limit (mathematics)|limit]] is taken over the [[Net (mathematics)|net]] of [[Open set|open]] [[Neighbourhood (mathematics)|neighborhoods]] ''U'' of ''t'' as they become [[Subset|smaller with respect to set inclusion]]. This limit is defined if and only if the probability space is [[Radon space|Radon]], and only in the support of ''T'', as described in the article. This is the restriction of the transition probability to the support of ''T''. To describe this limiting process rigorously:

For every <math>\varepsilon > 0,</math> there exists an open neighborhood ''U'' of the event {''T'' = ''t''}, such that for every open ''V'' with <math>\{T=t\} \subset V \subset U,</math>

:<math>\left|\frac {P(A\cap V)}{P(V)}-L\right| < \varepsilon,</math>

where <math>L = P (A\mid T=t)</math> is the limit.

~~In any case, it is easy to see that this limit fails to exist for <math>x_0</math> outside the support of ''X'': since the support of a random variable is defined as the set of all points in its state space whose every [[Neighbourhood (mathematics)|neighborhood]] has positive probability, for every point <math>x_0</math> outside the support of ''X'' (by definition) there will be an <math>\epsilon > 0</math> such that <math>\mathfrak P(\{x_0-\epsilon < X < x_0+\epsilon\})=0.</math>~~

~~Thus if ''X'' is distributed uniformly on <math>[0,1],</math> it is truly meaningless to condition a probability on "<math>X=3/2</math>".~~

~~==Regularity versus completeness==~~

~~{| border="1"~~
~~| '''[[Standard probability space]]'''~~
~~| '''[[Radon space]]'''~~
|-
~~| [[Lebesgue measure]]~~
~~| [[Borel measure]]~~
|-
~~| [[Complete measure]]~~
~~| [[Regular measure]]~~
|-
~~| [[Conditional probability]]~~
~~| [[Regular conditional probability]]~~
|-
~~| Extremely complicated and weak.~~
~~| Simple and powerful.~~
|-
~~| [[Pathological (mathematics)|Pathological]] cases.~~
~~| No pathological cases.~~
|-
~~| <math>\lambda(\mathbb Q \cap [0,1])=0.</math>~~
~~| <math>\mu(\mathbb Q \cap [0,1])</math> is undefined.~~
|-
~~| '''Probability is [[Sigma additivity|<math>\sigma</math>-additive]]'''~~
~~| '''except for sets with [[isolated point]]s.'''~~
|}

~~Note: In this article we use the [[Fraktur (script)|Fraktur]] <math>\mathfrak P</math> (whose shape is somewhat reminiscent of <math>\mathfrak B</math> for Borel) to indicate a probability based on a regular measure as opposed to one based on a complete measure. The notions of regularity and completeness are [[Mutually exclusive|incompatible]] in a [[separable space]].~~

==See also==

Line 75 ⟶ 89:

==External links==

~~* [http://planetmath.org/encyclopedia/ConditionalProbabilityMeasure.html Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]~~
* [http://planetmath.org/regularconditionalprobability Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]

~~[[Category:Probability theory]]~~
[[Category:Conditional probability]]
[[Category:Measure theory]]