Regular conditional probability

In [[probability theory]], '''regular conditional probability''' is a concept that formalizes the notion of conditioning on the outcome of a [[random variable]]. The resulting '''conditional probability distribution''' is a parametrized family of probability measures called a [[Markov kernel]].
 
== Definition ==
 
=== Conditional probability distribution ===
Consider two random variables ''X'' and ''Y'', where ''X'' represents the roll of a die.
The conditional probability of ''Y'' being in a Borel set <math>A \subseteq \mathbb{R}</math> is given by
:<math>P(Y \in A | X = x) = \frac{P(Y \in A, X = x)}{P(X=x)}.</math>
Conditional probability forms a two-variable function <math>\nu:\mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]</math>
:<math>\nu(x, A) = P(Y \in A | X = x).</math>
Note that when ''x'' is not a possible outcome of ''X'', the function is undefined: the roll of a die coming up 27 is a probability zero event. The function <math>\nu</math> is defined [[almost everywhere]] in ''x''.
 
Consider two random variables <math>X, Y : \Omega \to \mathbb{R}</math>. The ''conditional probability distribution'' of ''Y'' given ''X'' is a two-variable function <math>\kappa_{Y|X}: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]</math>.
 
If the random variable ''X'' is discrete,
:<math>\kappa_{Y|X}(x, A) = P(Y \in A | X = x) = \begin{cases}
\frac{P(Y \in A, X = x)}{P(X=x)} & \text{ if } P(X = x) > 0 \\
\text{arbitrary value} & \text{ otherwise}.
\end{cases}</math>

If the random variables ''X'', ''Y'' are continuous with density <math>f_{X,Y}(x,y)</math>,
:<math>\kappa_{Y|X}(x, A) = \begin{cases}
\frac{\int_A f_{X,Y}(x, y) \mathrm{d}y}{\int_\mathbb{R} f_{X,Y}(x, y) \mathrm{d}y} & \text{ if } \int_\mathbb{R} f_{X,Y}(x, y) \mathrm{d}y > 0 \\
\text{arbitrary value} & \text{ otherwise}.
\end{cases}</math>
Note that for outcomes <math>x</math> with <math>P(X = x) = 0</math>, this is not the same as conditioning on the event <math>B = \{X = x\}</math>, but is rather a limit: see [[Conditional probability#Conditioning on an event of probability zero]].
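As an illustrative sketch (a toy model assumed here, not part of the article: ''X'' is a fair die and ''Y'' is the sum of ''X'' and a second, independent fair die), the discrete ratio formula for <math>\kappa_{Y|X}</math> can be computed directly:

```python
from fractions import Fraction
from itertools import product

# Toy joint distribution: X is a fair die, Y = X + a second independent die.
# Each of the 36 equally likely roll pairs has probability 1/36.
outcomes = [(x, x + d) for x, d in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

def kappa(x, A):
    """kappa_{Y|X}(x, A) = P(Y in A, X = x) / P(X = x) when P(X = x) > 0."""
    p_x = sum(p for (xi, _) in outcomes if xi == x)
    if p_x == 0:
        return None  # arbitrary value: x lies outside the support of X
    p_joint = sum(p for (xi, yi) in outcomes if xi == x and yi in A)
    return p_joint / p_x

print(kappa(3, {4, 5, 6}))  # P(Y in {4,5,6} | X = 3) = 3/6 = 1/2
print(kappa(27, {4}))       # None: a die never shows 27, so P(X = 27) = 0
```

For each fixed <math>x</math> in the support, <math>A \mapsto \kappa(x, A)</math> is a probability measure, while outside the support any value may be assigned, exactly as in the "arbitrary value" branch above.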
 
A more general definition can be given in terms of [[conditional expectation]]. Consider a function <math> e_{Y \in A} : \mathbb{R} \to [0,1]</math> satisfying
:<math>e_{Y \in A}(X(\omega)) = \mathbb{E}[1_{Y \in A} | X](\omega)</math>
for almost all <math>\omega</math>.
Then the conditional probability distribution is given by
:<math>\kappa_{Y|X}(x, A) = e_{Y \in A}(x).</math>
 
As with conditional expectation, this can be further generalized to conditioning on a sigma algebra <math>\mathcal{F}</math>. In that case the conditional distribution is a function <math>\Omega \times \mathcal{B}(\mathbb{R}) \to [0, 1]</math>:
:<math>\kappa_{Y|\mathcal{F}}(\omega, A) = \mathbb{E}[1_{Y \in A} | \mathcal{F}](\omega).</math>
 
=== Regularity ===
 
For working with <math>\kappa_{Y|X}</math>, it is important that it be ''regular'', that is:
* For almost all ''x'', <math>A \mapsto \kappa_{Y|X}(x, A)</math> is a probability measure
* For all ''A'', <math>x \mapsto \kappa_{Y|X}(x, A)</math> is a measurable function
 
The latter follows from the definition of <math>\kappa_{Y|X}</math> in terms of conditional expectation, but the former is more involved. It can be shown that if ''Y'' is a random element <math>\Omega \to S</math> in a [[Radon space]] ''S'', there exists a <math>\kappa_{Y|X}</math> that satisfies the first condition.<ref>{{cite book |last1=Klenke |first1=Achim |title=Probability theory : a comprehensive course |___location=London |isbn=978-1-4471-5361-0 |edition=Second}}</ref> It is possible to construct more general spaces where a regular conditional probability distribution does not exist.<ref>Faden, A.M., 1985. The existence of regular conditional probabilities: necessary and sufficient conditions. ''The Annals of Probability'', 13(1), pp.288-298.</ref>
 
=== Relation to conditional expectation ===
 
In probability theory, the theory of [[conditional expectation]] is developed before that of regular conditional distributions.<ref>{{cite book |last1=Durrett |first1=Richard |title=Probability : theory and examples |date=2010 |publisher=Cambridge University Press |___location=Cambridge |isbn=9780521765398 |edition=4th}}</ref><ref>{{cite book |last1=Klenke |first1=Achim |title=Probability theory : a comprehensive course |___location=London |isbn=978-1-4471-5361-0 |edition=Second}}</ref>
 
For discrete and continuous random variables, the conditional expectation can be expressed as
:<math>
\begin{aligned}
\mathbb{E}[Y|X=x] &= \sum_y y \, P(Y=y|X=x)\\
\mathbb{E}[Y|X=x] &= \int y \, f_{Y|X}(x, y) \mathrm{d}y
\end{aligned}
</math>
where <math>f_{Y|X}(x, y)</math> is the [[conditional density]] of {{mvar|Y}} given {{mvar|X}}.
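Continuing the illustrative dice model assumed earlier (''X'' a fair die, ''Y'' the sum of ''X'' and a second independent die), the discrete formula <math>\mathbb{E}[Y|X=x] = \sum_y y \, P(Y=y|X=x)</math> can be evaluated exactly:

```python
from fractions import Fraction
from itertools import product

# Toy joint pmf: X a fair die, Y = X + a second independent fair die.
joint = {}
for x, d in product(range(1, 7), repeat=2):
    joint[(x, x + d)] = joint.get((x, x + d), 0) + Fraction(1, 36)

def cond_expectation(x):
    """E[Y | X = x] = sum_y y * P(Y = y | X = x)."""
    p_x = sum(pr for (xi, _), pr in joint.items() if xi == x)
    return sum(y * pr / p_x for (xi, y), pr in joint.items() if xi == x)

print(cond_expectation(3))  # 3 + E[second die] = 3 + 7/2 = 13/2
```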
 
This result can be extended to measure theoretical conditional expectation using the regular conditional probability distribution:
:<math>\mathbb{E}[Y|X](\omega) = \int y \, \kappa_{Y|\sigma(X)}(\omega, \mathrm{d}y).</math>
 
==Formal definition==
Let <math>(\Omega, \mathcal F, P)</math> be a [[probability space]], and let <math>T:\Omega\rightarrow E</math> be a [[random variable]], defined as a [[Borel measure|Borel-]][[measurable function]] from <math>\Omega</math> to its [[Probability space#Random variables|state space]] <math>(E, \mathcal E)</math>.
One should think of <math>T</math> as a way to "disintegrate" the sample space <math>\Omega</math> into <math>\{ T^{-1}(x) \}_{x \in E}</math>.
:<math>P\big(A\cap T^{-1}(B)\big) = \int_B \nu(x,A) \,P\big(T^{-1}(d x)\big).</math>
where <math>P\circ T^{-1}</math> is the [[pushforward measure]] <math>T_*P</math>, the distribution of the random element <math>T</math>, and
<math>x</math> ranges over <math>\mathrm{supp}\,T,</math> the [[Support (measure theory)|topological support]] of <math>T_* P</math>.
Specifically, if we take <math>B=E</math>, then <math>A \cap T^{-1}(E) = A</math>, and so
:<math>P(A) = \int_E \nu(x,A) \,P\big(T^{-1}(d x)\big)</math>,
where <math>\nu(x, A)</math> can be written in more familiar terms as <math>P(A\ |\ T=x)</math>. This is taken as the ''definition'' of the conditional probability of <math>A</math> given <math>T=x</math>, which can be left undefined in elementary constructions of conditional probability.
As can be seen from the integral above, the value of <math>\nu</math> for points ''x'' outside the support of the random variable is meaningless; its significance as a conditional probability is strictly limited to the support of ''T''.
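The disintegration identity above can be checked numerically in a concrete setting. The sketch below rests entirely on assumptions chosen for illustration: <math>\Omega</math> is the unit square with uniform <math>P</math>, <math>T(u,v) = u</math> is the first-coordinate projection, and for <math>A = \{(u,v) : v < u\}</math> the fiber measure is <math>\nu(x, A) = x</math>, so <math>P(A \cap T^{-1}(B)) = \int_B x \, \mathrm{d}x = 1/8</math> for <math>B = [0, 1/2]</math>:

```python
import random

random.seed(0)
N = 200_000

# Omega = unit square, P uniform, T(u, v) = u.
# A = {(u, v) : v < u}, B = [0, 1/2]; nu(x, A) = x on the fiber T = x.
hits = 0
for _ in range(N):
    u, v = random.random(), random.random()
    if v < u and u <= 0.5:  # omega lies in A ∩ T^{-1}(B)
        hits += 1

lhs = hits / N        # Monte Carlo estimate of P(A ∩ T^{-1}(B))
rhs = 0.5**2 / 2      # ∫_0^{1/2} x dx = 1/8, the disintegration integral
print(lhs, rhs)
```

With this sample size the two sides agree to within about a standard error of <math>10^{-3}</math>, illustrating that integrating the kernel <math>\nu(x, A)</math> against <math>T_*P</math> over <math>B</math> recovers the joint probability.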
 
The [[measurable space]] <math>(\Omega, \mathcal F)</math> is said to have the '''regular conditional probability property''' if for all [[probability measure]]s <math>P</math> on <math>(\Omega, \mathcal F),</math> all [[random variable]]s on <math>(\Omega, \mathcal F, P)</math> admit a regular conditional probability. A [[Radon space]], in particular, has this property.
 
See also [[Conditional expectation#Definition of conditional probability|conditional probability]] and [[Conditional probability distribution#Measure-Theoretic Formulation|conditional probability distribution]].
 
==Alternate definition==
:<math>\left|\frac {P(A\cap V)}{P(V)}-L\right| < \epsilon,</math>
where <math>L = P(A|T=t)</math> is the limit.
 
==Example==
To continue with our motivating example above, we consider a real-valued random variable ''X'' and write
:<math>P(A|X=x_0) = \nu(x_0,A) = \lim_{\epsilon\rightarrow 0+} \frac {P(A\cap\{x_0-\epsilon < X < x_0+\epsilon\})}{P(\{x_0-\epsilon < X < x_0+\epsilon\})},</math>
(where <math>x_0=2/3</math> for the example given.) This limit, if it exists, is a regular conditional probability for ''X'', restricted to <math>\mathrm{supp}\,X.</math>
 
In any case, it is easy to see that this limit fails to exist for <math>x_0</math> outside the support of ''X'': since the support of a random variable is defined as the set of all points in its state space whose every [[Neighbourhood (mathematics)|neighborhood]] has positive probability, for every point <math>x_0</math> outside the support of ''X'' (by definition) there will be an <math>\epsilon > 0</math> such that <math>P(\{x_0-\epsilon < X < x_0+\epsilon\})=0.</math>
 
Thus if ''X'' is distributed uniformly on <math>[0,1],</math> it is truly meaningless to condition a probability on "<math>X=3/2</math>".
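The limiting ratio can be made concrete with a small sketch (the choices here are illustrative assumptions, not from the article: ''X'' uniform on <math>[0,1]</math> and <math>A = \{X \le 1/2\}</math>, evaluated exactly via interval lengths rather than by sampling):

```python
def uniform_prob(a, b):
    """P(a < X < b) for X uniform on [0, 1]: length of the overlap with [0, 1]."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

def ratio(x0, eps):
    """P(X <= 1/2 and |X - x0| < eps) / P(|X - x0| < eps), if defined."""
    denom = uniform_prob(x0 - eps, x0 + eps)
    if denom == 0.0:
        return None  # x0 lies outside the support of X
    num = uniform_prob(x0 - eps, min(x0 + eps, 0.5))
    return num / denom

for eps in (0.1, 0.01, 0.001):
    print(ratio(1/3, eps), ratio(2/3, eps))  # tends to P(A|X=1/3)=1, P(A|X=2/3)=0
print(ratio(1.5, 0.1))  # None: every neighborhood of 3/2 has probability zero
```

The `None` branch mirrors the point made above: for <math>x_0</math> outside the support, the denominator vanishes for small <math>\epsilon</math> and the conditional probability is meaningless.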
 
==See also==