Giry monad

The Giry monad, like every monad, consists of three structures:
• A functorial assignment, which in this case assigns to a measurable space X the space PX of probability measures over it;
• A natural map \delta_X:X\to PX called the unit, which in this case assigns to each element of a space the Dirac measure concentrated at it;
• A natural map \mathcal{E}_X:PPX\to PX called the multiplication, which in this case assigns to each probability measure over probability measures its expected value.

==The space of probability measures==
Let (X, \mathcal{F}) be a measurable space. Denote by PX the set of probability measures over (X, \mathcal{F}). We equip the set PX with a sigma-algebra as follows. For every measurable set A\in \mathcal{F}, define the map \varepsilon_A:PX\to\mathbb{R} by p\longmapsto p(A). We then define the sigma-algebra \mathcal{PF} on PX to be the smallest sigma-algebra which makes the maps \varepsilon_A measurable for all A\in\mathcal{F} (where \mathbb{R} is equipped with the Borel sigma-algebra).

The assignment (X,\mathcal{F})\mapsto (PX,\mathcal {PF}) is part of an endofunctor on the category of measurable spaces, usually denoted again by P. Its action on morphisms, i.e. on measurable maps, is via the pushforward of measures. Namely, given a measurable map f:(X,\mathcal{F})\to(Y,\mathcal{G}), one assigns to f the map f_*:(PX,\mathcal {PF})\to(PY,\mathcal {PG}) defined by
: f_*p\,(B)=p(f^{-1}(B))
for all p\in PX and all measurable sets B\in\mathcal{G}.

==The Dirac delta map==
Given a measurable space (X,\mathcal{F}), the map \delta:(X,\mathcal{F})\to(PX,\mathcal{PF}) maps an element x\in X to the Dirac measure \delta_x\in PX, defined on measurable subsets A\in\mathcal{F} by
: \delta_x(A) = 1_A(x) = \begin{cases} 1 & \text{if }x\in A, \\ 0 & \text{if }x\notin A. \end{cases}

==The expectation map==
Let \mu\in PPX, that is, let \mu be a probability measure over the space of probability measures over (X,\mathcal{F}).
We define the probability measure \mathcal{E}\mu\in PX by
: \mathcal{E}\mu(A) = \int_{PX} p(A)\,\mu(dp)
for all measurable A\in\mathcal{F}. This gives a measurable, natural map \mathcal{E}:(PPX,\mathcal{PPF})\to(PX,\mathcal{PF}).

==Example: mixture distributions==
A mixture distribution, or more generally a compound distribution, can be seen as an application of the map \mathcal{E}. Consider the case of a finite mixture. Let p_1,\dots,p_n be probability measures on (X,\mathcal{F}), and consider the probability measure q given by the mixture
: q(A) = \sum_{i=1}^n w_i\,p_i(A)
for all measurable A\in\mathcal{F}, for some weights w_i\ge 0 satisfying w_1+\dots+w_n=1. We can view the mixture q as the average q=\mathcal{E}\mu, where the measure on measures \mu\in PPX, which in this case is discrete, is given by
: \mu = \sum_{i=1}^n w_i\,\delta_{p_i} .
Indeed, integrating the map p\mapsto p(A) against this discrete \mu gives \mathcal{E}\mu(A)=\sum_{i=1}^n w_i\,p_i(A)=q(A). More generally, the map \mathcal{E}:PPX\to PX can be seen as the most general, non-parametric way to form arbitrary mixture or compound distributions.

The triple (P,\delta,\mathcal{E}) is called the Giry monad.

==Relationship with Markov kernels==
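The triple (P,\delta,\mathcal{E}) defined above can be made concrete in the special case of finitely supported probability measures, where every integral reduces to a finite sum. The following Python sketch is only an illustration under that assumption: the names dist, prob, unit, pushforward and expectation are hypothetical and do not come from any library, and measurability questions are ignored because every subset of a finite set is treated as measurable.

```python
from collections import defaultdict
from fractions import Fraction

# Assumption: a probability measure is finitely supported and stored as a
# frozenset of (outcome, weight) pairs. This makes it hashable, so a measure
# can itself be an outcome of another measure (an element of PPX).

def dist(pairs):
    """Merge repeated outcomes and drop zero weights (canonical form)."""
    acc = defaultdict(Fraction)
    for x, w in pairs:
        acc[x] += Fraction(w)
    return frozenset((x, w) for x, w in acc.items() if w != 0)

def prob(p, event):
    """p(A), with the event A given as a predicate on outcomes."""
    return sum((w for x, w in p if event(x)), Fraction(0))

def unit(x):
    """The Dirac measure delta_x."""
    return dist([(x, 1)])

def pushforward(f, p):
    """f_* p, which satisfies (f_* p)(B) = p(f^{-1}(B))."""
    return dist((f(x), w) for x, w in p)

def expectation(mu):
    """E mu, where (E mu)(A) is the mu-weighted sum of p(A) over all p."""
    return dist((x, w * v) for p, w in mu for x, v in p)

# Finite mixture example: two coins mixed with weights 1/4 and 3/4.
p1 = dist([("H", Fraction(1, 2)), ("T", Fraction(1, 2))])
p2 = dist([("H", Fraction(9, 10)), ("T", Fraction(1, 10))])
mu = dist([(p1, Fraction(1, 4)), (p2, Fraction(3, 4))])  # element of PPX
q = expectation(mu)                                      # the mixture in PX

print(prob(q, lambda x: x == "H"))  # 4/5
```

In this finite setting the unit laws of the monad can be checked directly: both expectation(unit(p)) and expectation(pushforward(unit, p)) return p. Exact rational weights are used so that such equalities hold without floating-point error.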