Conditional probability distribution: Difference between revisions

D'ohBot (talk | contribs)
Given two jointly distributed [[random variable]]s ''X'' and ''Y'', the '''conditional probability distribution''' of ''Y'' given ''X'' (written "''Y'' | ''X''") is the [[probability distribution]] of ''Y'' when ''X'' is known to take a particular value.
 
For [[discrete random variable]]s, the [[conditional probability]] mass function can be written as ''P''(''Y'' = ''y'' | ''X'' = ''x''). By the definition of conditional probability, for any ''x'' with ''P''(''X'' = ''x'') > 0 this is
 
:<math>P(Y = y \mid X = x) = \frac{P(X = x \cap Y = y)}{P(X = x)} = \frac{P(X = x \mid Y = y) P(Y = y)}{P(X = x)}.</math>
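
The two forms of the discrete formula can be checked numerically. The following is a minimal sketch, assuming a small hypothetical joint probability table (the values and the helper names <code>marginal_x</code>, <code>marginal_y</code>, <code>cond_y_given_x</code> and <code>cond_x_given_y</code> are invented for illustration); exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the values are illustrative only.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def marginal_x(x):
    """P(X = x): sum the joint pmf over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """P(Y = y): sum the joint pmf over all x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def cond_y_given_x(y, x):
    """P(Y = y | X = x) = P(X = x and Y = y) / P(X = x)."""
    p_x = marginal_x(x)
    if p_x == 0:
        raise ValueError("conditioning on an event of probability zero")
    return joint.get((x, y), Fraction(0)) / p_x

def cond_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y)."""
    p_y = marginal_y(y)
    if p_y == 0:
        raise ValueError("conditioning on an event of probability zero")
    return joint.get((x, y), Fraction(0)) / p_y

# First form: joint probability divided by the marginal.
direct = cond_y_given_x(1, 0)
# Second form: P(X = x | Y = y) P(Y = y) / P(X = x).
via_bayes = cond_x_given_y(0, 1) * marginal_y(1) / marginal_x(0)
```

For this table both <code>direct</code> and <code>via_bayes</code> evaluate to 3/4.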
 
Similarly, for [[continuous random variable]]s, the conditional [[probability density function]] can be written as ''f''<sub>''Y''|''X''</sub>(''y'' | ''X=x''), and this is
 
:<math>f_{Y \mid X}(y \mid X=x) = \frac{f_{X, Y}(x, y)}{f_X(x)} = \frac{f_{X \mid Y}(x \mid Y=y) f_Y(y)}{f_X(x)}, </math>
 
where ''f''<sub>''X'',''Y''</sub>(''x'', ''y'') gives the [[joint distribution|joint density]] of ''X'' and ''Y'', while ''f''<sub>''X''</sub>(''x'') gives the [[marginal distribution|marginal density]] for ''X''.
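
As a worked illustration, assume a hypothetical joint density ''f''<sub>''X'',''Y''</sub>(''x'', ''y'') = ''x'' + ''y'' on the unit square 0 ≤ ''x'', ''y'' ≤ 1 (this density is chosen only for the example). Integrating out ''y'' gives the marginal density of ''X'', and dividing the joint density by it gives the conditional density:

:<math>f_X(x) = \int_0^1 (x + y)\,dy = x + \tfrac{1}{2}, \qquad f_{Y \mid X}(y \mid X=x) = \frac{x + y}{x + \tfrac{1}{2}}, \quad 0 \le y \le 1.</math>

For each fixed ''x'' the conditional density integrates to 1 over ''y'', since the numerator integrates to ''x'' + 1/2.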
 
The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: [[Borel's paradox]] shows that conditional probability density functions need not be invariant under coordinate transformations.
 
If, for discrete random variables, ''P''(''Y'' = ''y'' | ''X'' = ''x'') = ''P''(''Y'' = ''y'') for all ''x'' and ''y'', or, for continuous random variables, ''f''<sub>''Y''|''X''</sub>(''y'' | ''X=x'') = ''f''<sub>''Y''</sub>(''y'') for all ''x'' and ''y'', then ''Y'' is said to be [[Statistical independence|independent]] of ''X'' (and this implies that ''X'' is also independent of ''Y'').
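
The discrete independence condition can be tested directly on a finite joint table. This is a rough sketch under stated assumptions: the function name <code>is_independent</code> and both example tables are hypothetical, and values of ''x'' with ''P''(''X'' = ''x'') = 0 are skipped because the conditional probability is undefined there.

```python
from fractions import Fraction

def is_independent(joint, xs, ys):
    """Return True if P(Y = y | X = x) == P(Y = y) for every x with P(X = x) > 0."""
    zero = Fraction(0)
    p_x = {x: sum((joint.get((x, y), zero) for y in ys), zero) for x in xs}
    p_y = {y: sum((joint.get((x, y), zero) for x in xs), zero) for y in ys}
    for x in xs:
        if p_x[x] == 0:
            continue  # conditional distribution undefined for this x
        for y in ys:
            if joint.get((x, y), zero) / p_x[x] != p_y[y]:
                return False
    return True

# Hypothetical table built as a product of marginals, so X and Y are independent.
independent_joint = {
    (x, y): px * py
    for x, px in [(0, Fraction(1, 4)), (1, Fraction(3, 4))]
    for y, py in [(0, Fraction(1, 3)), (1, Fraction(2, 3))]
}

# Hypothetical table in which the distribution of Y changes with x.
dependent_joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

independent_result = is_independent(independent_joint, [0, 1], [0, 1])
dependent_result = is_independent(dependent_joint, [0, 1], [0, 1])
```

Here <code>independent_result</code> is <code>True</code> and <code>dependent_result</code> is <code>False</code>.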
 
Seen as a function of ''y'' for given ''x'', ''P''(''Y'' = ''y'' | ''X'' = ''x'') is a probability mass function, so its sum over all ''y'' (or its integral, if it is a conditional probability density) is 1. Seen as a function of ''x'' for given ''y'', it is a [[likelihood function]], and its sum over all ''x'' need not be 1.
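
The contrast between the two readings can be seen on a small example. The sketch below assumes a hypothetical joint table (values invented for illustration) and a helper <code>cond</code> that computes ''P''(''Y'' = ''y'' | ''X'' = ''x''); it sums that quantity once over ''y'' with ''x'' fixed and once over ''x'' with ''y'' fixed.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the values are illustrative only.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def cond(y, x):
    """P(Y = y | X = x), computed from the joint table."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return joint[(x, y)] / p_x

# Fixed x = 0, summed over y: a probability mass function in y.
sum_over_y = sum(cond(y, 0) for y in (0, 1))

# Fixed y = 1, summed over x: a likelihood function in x.
sum_over_x = sum(cond(1, x) for x in (0, 1))
```

For this table <code>sum_over_y</code> equals 1, while <code>sum_over_x</code> equals 3/4 + 1/2 = 5/4.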