The Giry monad, like every monad, consists of three structures:
• A functorial assignment, which in this case assigns to a measurable space X a space of probability measures PX over it;
• A natural map \delta_X:X\to PX called the unit, which in this case assigns to each element of a space the Dirac measure over it;
• A natural map \mathcal{E}_X:PPX\to PX called the multiplication, which in this case assigns to each probability measure over probability measures its expected value.
===The space of probability measures===
Let (X, \mathcal{F}) be a
measurable space. Denote by PX the set of
probability measures over (X, \mathcal{F}). We equip the set PX with a
sigma-algebra as follows. First, for every measurable set A\in \mathcal{F}, define the map \varepsilon_A:PX\to\mathbb{R} by p\longmapsto p(A). We then define the sigma-algebra \mathcal{PF} on PX to be the smallest sigma-algebra which makes the maps \varepsilon_A measurable, for all A\in\mathcal{F} (where \mathbb{R} is equipped with the
Borel sigma-algebra). The assignment (X,\mathcal{F})\mapsto (PX,\mathcal {PF}) is part of an
endofunctor on the
category of measurable spaces, usually denoted again by P. Its action on
morphisms, i.e. on
measurable maps, is via the
pushforward of measures. Namely, given a measurable map f:(X,\mathcal{F})\to(Y,\mathcal{G}), one assigns to f the map f_*:(PX,\mathcal{PF})\to(PY,\mathcal{PG}) defined by
: (f_*p)(B)=p(f^{-1}(B))
for all p\in PX and all measurable sets B\in\mathcal{G}.
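For finitely supported measures the pushforward can be computed directly. The following is a minimal sketch, not a general implementation: it assumes a probability measure is represented as a Python dict mapping points to their masses, and the names `pushforward`, `die` and `parity` are illustrative choices rather than anything fixed by the construction above.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(f, p):
    """Return f_*p for a finitely supported p given as {point: probability}.

    Assumption: p has finite support, so every subset is measurable and
    the measure is determined by its point masses.
    """
    q = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, mass in p.items():
        # Each point x contributes its mass to the image point f(x),
        # so q({y}) = p(f^{-1}({y})).
        q[f(x)] += mass
    return dict(q)


# A fair six-sided die, pushed forward along the parity map k -> k mod 2.
die = {k: Fraction(1, 6) for k in range(1, 7)}
parity = pushforward(lambda k: k % 2, die)
```

Here `parity` assigns mass 1/2 to each of 0 and 1, which agrees with p(f^{-1}(B)) for B = {0} and B = {1}. Exact fractions are used so that the total mass remains exactly 1.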
===The Dirac delta map===
Given a measurable space (X,\mathcal{F}), the map \delta:(X,\mathcal{F})\to(PX,\mathcal{PF}) maps an element x\in X to the
Dirac measure \delta_x\in PX, defined on measurable subsets A\in\mathcal{F} by
: \delta_x(A) = 1_A(x) = \begin{cases} 1 & \text{if }x\in A, \\ 0 & \text{if }x\notin A. \end{cases}
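As an illustration, the Dirac measure and its evaluation on a set can be sketched for finitely supported measures. The dict representation and the helper names `dirac` and `measure_of` are assumptions made for this example only.

```python
from fractions import Fraction


def dirac(x):
    """Dirac measure at x, as a finitely supported measure {point: probability}."""
    return {x: Fraction(1)}


def measure_of(p, A):
    """Evaluate p(A) for a finitely supported p (a dict) and a set A of points."""
    return sum((mass for x, mass in p.items() if x in A), Fraction(0))


d = dirac(3)
inside = measure_of(d, {1, 2, 3})   # 3 lies in the set
outside = measure_of(d, {4, 5})     # 3 does not lie in the set
```

With these definitions `inside` equals 1 and `outside` equals 0, matching the indicator function 1_A(x) in the formula above.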
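===The expectation map===
Let \mu\in PPX, that is, let \mu be a probability measure on the space of probability measures over (X,\mathcal{F}). We define the probability measure \mathcal{E}\mu\in PX by
: \mathcal{E}\mu(A) = \int_{PX} p(A)\,\mu(dp)
for all measurable A\in\mathcal{F}. The integrand p\mapsto p(A) is the map \varepsilon_A, which is bounded and measurable by construction of \mathcal{PF}, so the integral is well defined. This gives a measurable,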
natural map \mathcal{E}:(PPX,\mathcal{PPF})\to(PX,\mathcal{PF}).
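When \mu is supported on finitely many measures, each of which is itself finitely supported, the integral defining \mathcal{E}\mu reduces to a finite weighted sum. The sketch below works under that assumption. The function name `expectation`, the representation of \mu as a list of (weight, measure) pairs, and the two coin distributions are hypothetical choices for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def expectation(mu):
    """Flatten a finitely supported measure on measures.

    mu is a list of (weight, p) pairs, where each p is a dict
    {point: probability}. Returns the dict representing E(mu), using
    E(mu)({x}) = sum_i weight_i * p_i({x}).
    """
    q = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for weight, p in mu:
        for x, mass in p.items():
            q[x] += weight * mass
    return dict(q)


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(9, 10), "T": Fraction(1, 10)}
# Choose the fair coin with probability 3/4 and the biased coin with 1/4.
mu = [(Fraction(3, 4), fair), (Fraction(1, 4), biased)]
q = expectation(mu)
```

The result `q` assigns mass 3/5 to "H" and 2/5 to "T". A list of pairs is used for \mu because Python dicts are not hashable and so cannot serve as dictionary keys.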
====Example: mixture distributions====
A
mixture distribution, or more generally a
compound distribution, can be seen as an application of the map \mathcal{E}. Consider the case of a finite mixture. Let p_1,\dots,p_n be probability measures on (X,\mathcal{F}), and consider the probability measure q given by the mixture
: q(A) = \sum_{i=1}^n w_i\,p_i(A)
for all measurable A\in\mathcal{F}, for some weights w_i\ge 0 satisfying w_1+\dots+w_n=1. We can view the mixture q as the average q=\mathcal{E}\mu, where the measure on measures \mu\in PPX, which in this case is discrete, is given by
: \mu = \sum_{i=1}^n w_i\,\delta_{p_i} .
Indeed, \mathcal{E}\mu(A) = \int_{PX} p(A)\,\mu(dp) = \sum_{i=1}^n w_i\,p_i(A) = q(A) for every measurable A\in\mathcal{F}. More generally, the map \mathcal{E}:PPX\to PX can be seen as the most general, non-parametric way to form arbitrary
mixture or
compound distributions.

The triple (P,\delta,\mathcal{E}) is called the
Giry monad.

==Relationship with Markov kernels==