Conditional probability distribution


Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X (written "Y | X") is the probability distribution of Y when X is known to be a particular value.

For discrete random variables, the conditional probability mass function can be written as P(Y = y | X = x). From the definition of conditional probability, this is

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x),

provided that P(X = x) > 0.

Similarly for continuous random variables, the conditional probability density function can be written as pY|X(y | x), and this is

pY|X(y | x) = pX,Y(x, y) / pX(x),

where pX,Y(x, y) gives the joint density of X and Y, while pX(x) gives the marginal density for X (assumed to be positive at x).
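As an illustration of the discrete formula, here is a minimal Python sketch; the joint table values are hypothetical, made up for the example:

```python
# Minimal sketch of the discrete formula: P(Y = y | X = x) is the
# joint probability divided by the marginal probability of X.
# The joint table below is a hypothetical example.

joint = {
    (0, 0): 0.10, (0, 1): 0.30,  # keys are (x, y), values are P(X = x, Y = y)
    (1, 0): 0.20, (1, 1): 0.40,
}

def marginal_x(x):
    """P(X = x): sum the joint pmf over all values of y."""
    return sum(p for (xv, _), p in joint.items() if xv == x)

def cond_y_given_x(y, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    return joint[(x, y)] / marginal_x(x)

print(cond_y_given_x(1, 0))  # 0.30 / 0.40 = 0.75
```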

The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: Borel's paradox shows that conditional probability density functions need not be invariant under coordinate transformations.

If P(Y = y | X = x) = P(Y = y) for all x and y in the discrete case, or pY|X(y | x) = pY(y) for all x and y in the continuous case, then Y is said to be independent of X; since independence is symmetric, X is then also independent of Y.
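As a quick sketch of checking this condition on a finite joint table, the following uses hypothetical marginals and builds the joint pmf as their product, so the variables are independent by construction:

```python
# Sketch: verify the discrete independence condition
# P(Y = y | X = x) == P(Y = y) for all x and y, on a joint pmf
# built as the product of its marginals (independent by construction).

p_x = {0: 0.4, 1: 0.6}    # hypothetical marginal of X
p_y = {0: 0.25, 1: 0.75}  # hypothetical marginal of Y
joint = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

def cond_y_given_x(y, x):
    marginal = sum(joint[(x, yv)] for yv in p_y)  # P(X = x)
    return joint[(x, y)] / marginal

independent = all(
    abs(cond_y_given_x(y, x) - p_y[y]) < 1e-12
    for x in p_x for y in p_y
)
print(independent)  # True: conditioning on X leaves the law of Y unchanged
```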

Seen as a function of y for given x, P(Y = y | X = x) is a probability mass function, so the sum over all y (or the integral, in the density case) equals 1. Seen as a function of x for given y, it is a likelihood function, and the sum (or integral) over all x need not equal 1.
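A short sketch illustrating this contrast numerically, on the same kind of made-up joint table as above:

```python
# Sketch: for fixed x, the conditional pmf sums to 1 over y (it is a
# probability distribution); for fixed y, the same expression viewed as a
# function of x (a likelihood) need not sum to 1. Hypothetical joint table.

joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.20, (1, 1): 0.40,
}
xs = {0, 1}
ys = {0, 1}

def cond(y, x):
    p_x = sum(joint[(x, yv)] for yv in ys)  # marginal P(X = x)
    return joint[(x, y)] / p_x

print(sum(cond(y, 0) for y in ys))  # 1.0 -- a distribution in y
print(sum(cond(1, x) for x in xs))  # 0.75 + 2/3 = 1.4166... -- not 1
```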
