In some contexts, the value of the loss function itself is a random quantity because it depends on the outcome of a random variable X.
==Statistics==
Both frequentist and Bayesian statistical theory involve making a decision based on the expected value of the loss function; however, this quantity is defined differently under the two paradigms.
===Frequentist expected loss===
We first define the expected loss in the frequentist context. It is obtained by taking the expected value with respect to the probability distribution, P_\theta, of the observed data, X. This is also referred to as the risk function of the decision rule \delta and the parameter \theta. Here the decision rule depends on the outcome of X. The risk function is given by:

:R(\theta, \delta) = \operatorname{E}_\theta L\big(\theta, \delta(X)\big) = \int_{\mathcal{X}} L\big(\theta, \delta(x)\big) \, \mathrm{d} P_\theta(x).

Here, \theta is a fixed but possibly unknown state of nature, X is a vector of observations stochastically drawn from a population, \operatorname{E}_\theta is the expectation over all population values of X, \mathrm{d}P_\theta is a probability measure over the event space of X (parametrized by \theta), and the integral is evaluated over the entire support \mathcal{X} of X.
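As a concrete sketch, the risk of a decision rule can be approximated by Monte Carlo: draw many data sets from P_\theta, apply \delta to each, and average the loss. The particular setup below (a known \theta, Gaussian data, the sample mean as \delta, and squared error loss) is an illustrative assumption of ours, not something specified in the text.

```python
import numpy as np

# Illustrative setup (our own choices): delta(X) = sample mean,
# squared-error loss, data X ~ N(theta, sigma^2) i.i.d.
rng = np.random.default_rng(0)

theta = 2.0        # fixed (here known) state of nature
sigma = 1.0        # sampling standard deviation
n = 10             # observations per data set
trials = 200_000   # Monte Carlo replications

# Draw many data sets X ~ P_theta and apply the decision rule to each.
X = rng.normal(theta, sigma, size=(trials, n))
delta_X = X.mean(axis=1)                    # delta(X): the sample mean

# Average the loss: an estimate of R(theta, delta) = E_theta[L(theta, delta(X))].
risk_mc = np.mean((theta - delta_X) ** 2)

# For this rule the risk is known in closed form: sigma^2 / n.
print(risk_mc, sigma ** 2 / n)
```

For the sample mean the two numbers agree closely, since its frequentist risk under squared error loss is exactly the sampling variance \sigma^2/n.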
===Bayes Risk===
In a Bayesian approach, the expectation is calculated using the prior distribution \pi^* of the parameter \theta:

:\rho(\pi^*, a) = \int_\Theta \int_{\mathbf{X}} L(\theta, a(\mathbf{x})) \, \mathrm{d} P(\mathbf{x} \mid \theta) \, \mathrm{d}\pi^*(\theta) = \int_{\mathbf{X}} \int_\Theta L(\theta, a(\mathbf{x})) \, \mathrm{d}\pi^*(\theta \mid \mathbf{x}) \, \mathrm{d} M(\mathbf{x})

where M(\mathbf{x}) is known as the predictive likelihood, in which \theta has been "integrated out," \pi^*(\theta \mid \mathbf{x}) is the posterior distribution, and the order of integration has been changed. One then should choose the action a^* which minimises this expected loss, which is referred to as the Bayes Risk. In the latter equation, the inner integrand, \int_\Theta L(\theta, a(\mathbf{x})) \, \mathrm{d}\pi^*(\theta \mid \mathbf{x}), is known as the Posterior Risk, and minimising it with respect to the decision a also minimises the overall Bayes Risk. This optimal decision, a^*, is known as the Bayes (decision) Rule - it minimises the average loss over all possible states of nature \theta and over all possible (probability-weighted) data outcomes. One advantage of the Bayesian approach is that one need only choose the optimal action under the actual observed data to obtain a uniformly optimal one, whereas choosing the frequentist optimal decision rule as a function of all possible observations is a much more difficult problem. Equally important, the Bayes Rule reflects consideration of loss outcomes under different states of nature, \theta.
===Examples in statistics===
• For a scalar parameter \theta, a decision function whose output \hat\theta is an estimate of \theta, and a quadratic loss function (squared error loss) L(\theta, \hat\theta) = (\theta - \hat\theta)^2, the risk function becomes the mean squared error of the estimate, :R(\theta, \hat\theta) = \operatorname{E}_\theta\left[(\theta - \hat\theta)^2\right]. In the Bayesian setting, the estimator that minimises this expected loss under the posterior is the mean of the posterior distribution.
• In
density estimation, the unknown parameter is the probability density itself. The loss function is typically chosen to be a norm in an appropriate function space. For example, for the L^2 norm, :L(f, \hat f) = \|f - \hat f\|_2^2, the risk function becomes the mean integrated squared error, :R(f, \hat f) = \operatorname{E}\left[\|f - \hat f\|_2^2\right].
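The mean integrated squared error can itself be estimated by simulation. The sketch below does this for a simple histogram density estimate of a standard normal density; the choice of estimator, sample size, grid, and bin count are all illustrative assumptions of ours.

```python
import numpy as np

# Illustrative setup (our own choices): true density f = N(0, 1),
# estimated by a 20-bin histogram from n samples; MISE via Monte Carlo.
rng = np.random.default_rng(1)

grid = np.linspace(-5, 5, 1001)
dx = grid[1] - grid[0]
f = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)   # true density f

def hist_density(sample, bins=20):
    # Histogram density estimate \hat f, evaluated on the grid.
    heights, edges = np.histogram(sample, bins=bins, range=(-5, 5), density=True)
    idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, bins - 1)
    return heights[idx]

# Average the integrated squared error ||f - \hat f||_2^2 over replications.
n, reps = 200, 300
ise = []
for _ in range(reps):
    f_hat = hist_density(rng.normal(size=n))
    ise.append(np.sum((f - f_hat) ** 2) * dx)       # Riemann-sum integral
mise = np.mean(ise)
print(mise)
```

Each replication computes one integrated squared error \|f - \hat f\|_2^2; averaging over replications approximates the expectation that defines the MISE.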
==Economic choice under uncertainty==
In economics, decision-making under uncertainty is often modelled using the
von Neumann–Morgenstern utility function of the uncertain variable of interest, such as end-of-period wealth. Since the value of this variable is uncertain, so is the value of the utility function; it is the expected value of utility that is maximized.

==Decision rules==