Markov kernel

In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that plays the role, in the general theory of Markov processes, that the transition matrix does in the theory of Markov processes with a finite state space.[1]

Formal definition

Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$ is a map $\kappa : \mathcal{B} \times X \to [0, 1]$ with the following properties:

  1. For every (fixed) $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is $\mathcal{A}$-measurable
  2. For every (fixed) $x \in X$, the map $B \mapsto \kappa(B, x)$ is a probability measure on $(Y, \mathcal{B})$

In other words it associates to each point $x \in X$ a probability measure $\kappa(\mathrm{d}y|x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B|x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.[2]
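
For instance, on the two-point spaces $X = Y = \{0, 1\}$ with their power set $\sigma$-algebras, the map

  $\kappa(B|x) = \tfrac{2}{3}\,\mathbf{1}_B(x) + \tfrac{1}{3}\,\mathbf{1}_B(1 - x), \qquad B \subseteq \{0, 1\},\ x \in \{0, 1\},$

is a Markov kernel: for each fixed $x$ it is the probability measure that keeps the value $x$ with probability $2/3$ and flips it with probability $1/3$, and for each fixed $B$ the map $x \mapsto \kappa(B|x)$ is trivially measurable since $X$ is finite.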

Examples

Simple random walk on the integers

Take $X = Y = \mathbb{Z}$ and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel is fully determined by the probability it assigns to the singleton sets $\{m\}$, $m \in Y = \mathbb{Z}$, for each $n \in X = \mathbb{Z}$:

  $\kappa(B|n) = \sum_{m \in B} \kappa(\{m\}|n), \qquad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$

Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1 - p$ is defined by

  $\kappa(\{m\}|n) = p\,\delta_{m, n+1} + (1 - p)\,\delta_{m, n-1}, \qquad \forall n, m \in \mathbb{Z},$

where $\delta$ is the Kronecker delta. The transition probabilities $P(m|n) = \kappa(\{m\}|n)$ for the random walk are equivalent to the Markov kernel.
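
For each fixed $n$ this is indeed a probability measure, since

  $\kappa(\mathbb{Z}|n) = \sum_{m \in \mathbb{Z}} \kappa(\{m\}|n) = p + (1 - p) = 1;$

taking $p = \tfrac{1}{2}$ gives the simple symmetric random walk.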

General Markov processes with countable state space

More generally take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$. Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$,

  $\kappa(B|i) = \sum_{j \in B} \kappa(\{j\}|i), \qquad \forall i \in X,\ \forall B \in \mathcal{B}.$

We define a Markov process by defining a transition probability $P(j|i) = K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.

  $K_{ji} \geq 0 \ \ \forall (j, i) \in Y \times X, \qquad \sum_{j \in Y} K_{ji} = 1 \ \ \forall i \in X.$

We then define

  $\kappa(B|i) = \sum_{j \in B} K_{ji} = \sum_{j \in B} P(j|i), \qquad \forall i \in X,\ \forall B \in \mathcal{B}.$

Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
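
For instance, on the two-element state space $X = Y = \{1, 2\}$ any such kernel is given by a column-stochastic matrix of the form

  $K = \begin{pmatrix} 1 - \alpha & \beta \\ \alpha & 1 - \beta \end{pmatrix}, \qquad \alpha, \beta \in [0, 1],$

and, e.g., $\kappa(\{2\}|1) = K_{21} = \alpha$ is the probability of jumping from state $1$ to state $2$.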

Markov kernel defined by a kernel function and a measure

If $\nu$ is a measure on $(Y, \mathcal{B})$ and $k : Y \times X \to [0, \infty]$ is a function that is measurable with respect to the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$ and such that

  $\int_Y k(y, x)\, \nu(\mathrm{d}y) = 1, \qquad \forall x \in X,$

then $\kappa(\mathrm{d}y|x) = k(y, x)\, \nu(\mathrm{d}y)$, i.e. the mapping

  $\kappa(B|x) = \int_B k(y, x)\, \nu(\mathrm{d}y), \qquad \forall B \in \mathcal{B},\ \forall x \in X,$

defines a Markov kernel.[3] This example generalises the countable Markov process example, where $\nu$ was the counting measure. Other important examples are the convolution kernels, e.g. the Gaussian kernel defined by the heat equation on $X = Y = \mathbb{R}$, with $\nu(\mathrm{d}y) = \mathrm{d}y$ the standard Lebesgue measure and

  $k_t(y, x) = \frac{1}{\sqrt{4\pi t}}\, e^{-(y - x)^2/(4t)}.$
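
Indeed, for every $x \in \mathbb{R}$ and $t > 0$ the Gaussian integral gives

  $\int_{\mathbb{R}} k_t(y, x)\, \mathrm{d}y = \frac{1}{\sqrt{4\pi t}} \int_{\mathbb{R}} e^{-(y - x)^2/(4t)}\, \mathrm{d}y = 1,$

so $\kappa_t(\mathrm{d}y|x) = k_t(y, x)\, \mathrm{d}y$ is the normal distribution with mean $x$ and variance $2t$.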

Measurable functions

Take $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ arbitrary measurable spaces, and let $f : X \to Y$ be a measurable function. Now define $\kappa(\mathrm{d}y|x) = \delta_{f(x)}(\mathrm{d}y)$, i.e.

  $\kappa(B|x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases} 1 & \text{if } f(x) \in B \\ 0 & \text{otherwise} \end{cases}$ for all $B \in \mathcal{B}$, $x \in X$.

Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ if and only if $f$ is measurable.

This example allows us to think of a Markov kernel as a generalised function whose value at a point is, in general, random rather than certain.
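
For example, with $X = Y = \mathbb{R}$ and $f(x) = x^2$, the associated kernel is $\kappa(B|x) = \mathbf{1}_B(x^2)$: starting from $x$, all of the mass sits on the single point $x^2$, so the value is in fact deterministic.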

As a less obvious example, take $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$, and $(Y, \mathcal{B})$ the real numbers $\mathbb{R}$ with the standard $\sigma$-algebra of Borel sets. Then

  $\kappa(B|n) = \begin{cases} \mathbf{1}_B(0) & \text{if } n = 0 \\ \Pr(\xi_1 + \cdots + \xi_n \in B) & \text{if } n \neq 0 \end{cases}$

with i.i.d. random variables $\xi_i$ (usually with mean $0$) and where $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
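
Concretely, for fair coin flips $\xi_i = \pm 1$ with probability $\tfrac{1}{2}$ each, the measure $\kappa(\cdot|n)$ is supported on $\{-n, -n + 2, \dots, n\}$ with

  $\kappa(\{n - 2k\}|n) = \binom{n}{k} 2^{-n}, \qquad k = 0, 1, \dots, n,$

the distribution of the horizontal displacement after $n$ rows of the board.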

Composition of Markov kernels and the Markov category

Given measurable spaces $(X, \mathcal{A})$, $(Y, \mathcal{B})$ and $(Z, \mathcal{C})$, and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by

  $(\lambda \circ \kappa)(\mathrm{d}z|x) = \int_Y \lambda(\mathrm{d}z|y)\, \kappa(\mathrm{d}y|x).$

The composition is associative by Tonelli's theorem, and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa_{\mathrm{id}}(\mathrm{d}x'|x) = \delta_x(\mathrm{d}x')$) is the unit for this composition.
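
When the spaces are countable and the kernels are given by stochastic matrices as above, say $K$ for $\kappa$ and $L$ for $\lambda$, the composition is simply matrix multiplication:

  $(\lambda \circ \kappa)(\{k\}|i) = \sum_{j \in Y} L_{kj} K_{ji} = (LK)_{ki};$

in particular, for $X = Y = Z$ composing a kernel with itself $n$ times gives the $n$-step transition probabilities of the corresponding Markov chain.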

This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere.[4] The category has the empty set as initial object and the one-point set $*$ as the terminal object. A probability measure on a measurable space $(X, \mathcal{A})$ is the same thing as a morphism $* \to (X, \mathcal{A})$ in this category. By composition, a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$ define a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$.
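
Concretely, the measure $P_Y = \kappa \circ P_X$ obtained by composition is

  $P_Y(B) = \int_X \kappa(B|x)\, P_X(\mathrm{d}x), \qquad B \in \mathcal{B},$

i.e. the distribution obtained by drawing a point from $P_X$ and then moving it according to $\kappa$.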

Properties

Semidirect product

Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that:

  $Q(A \times B) = \int_A \kappa(B|x)\, P(\mathrm{d}x), \qquad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$
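
In particular, taking $B = Y$ shows that the first marginal of $Q$ is $P$ itself, $Q(A \times Y) = P(A)$, while taking $A = X$ recovers the composite measure of the previous section, $Q(X \times B) = (\kappa \circ P)(B)$.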

Regular conditional distribution

Let $(X, \mathcal{B})$ be a Borel space, $X$ a $(X, \mathcal{B})$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(X, \mathcal{B})$ such that $\kappa(\cdot, B)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in \mathcal{B}$, i.e.

  $P(X \in B \mid \mathcal{G}) = \mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}] = \kappa(\cdot, B) \qquad P\text{-almost surely}.$

It is called the regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined.
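
For instance, if $\mathcal{G} = \{\emptyset, \Omega\}$ is the trivial sub-$\sigma$-algebra, then conditional expectations given $\mathcal{G}$ are constants and the kernel reduces to the (unconditional) distribution of $X$:

  $\kappa(\omega, B) = P(X \in B), \qquad \forall \omega \in \Omega,\ \forall B \in \mathcal{B}.$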

Generalizations

Transition kernels generalize Markov kernels in the sense that, for all $x \in X$, the map

  $B \mapsto \kappa(B|x)$

is not necessarily a probability measure but can be any type of (non-negative) measure.
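
For example, the map $\kappa(B|x) = 2\,\mathbf{1}_B(x)$, i.e. twice the Dirac measure at $x$, is a transition kernel but not a Markov kernel, since each $\kappa(\cdot|x)$ has total mass $2$; transition kernels with total mass at most $1$ (sub-probability kernels) arise, for example, for Markov processes that may be killed.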

References

  1. ^ Reiss, R. D. (1993). A Course on Point Processes. Springer Series in Statistics. Springer. doi:10.1007/978-1-4613-9308-5. ISBN 978-1-4613-9310-8.
  2. ^ Klenke, Achim. Probability Theory: A Comprehensive Course (2 ed.). Springer. p. 180. doi:10.1007/978-1-4471-5361-0.
  3. ^ Erhan, Cinlar (2011). Probability and Stochastics. New York: Springer. pp. 37–38. ISBN 978-0-387-87858-4.
  4. ^ F. W. Lawvere (1962). "The Category of Probabilistic Mappings" (PDF).