==Admissibility==
Bayes rules having finite Bayes risk are typically admissible. The following are some specific examples of admissibility theorems.
* If a Bayes rule is unique then it is admissible. For example, as stated above, under mean squared error (MSE) the Bayes rule is unique and therefore admissible.
* If θ belongs to a discrete set, then all Bayes rules are admissible.
* If θ belongs to a continuous (non-discrete) set, and if the risk function R(θ,δ) is continuous in θ for every δ, then all Bayes rules are admissible.

By contrast, generalized Bayes rules often have undefined Bayes risk in the case of improper priors. These rules are often inadmissible, and verifying their admissibility can be difficult. For example, the generalized Bayes estimator of a location parameter θ based on Gaussian samples (described in the "Generalized Bayes estimator" section above) is inadmissible for p > 2; this is known as Stein's phenomenon.
==Asymptotic efficiency==
Let θ be an unknown random variable, and suppose that x_1, x_2, \ldots are iid samples with density f(x_i|\theta). Let \delta_n = \delta_n(x_1,\ldots,x_n) be a sequence of Bayes estimators of θ based on an increasing number of measurements. We are interested in analyzing the asymptotic performance of this sequence of estimators, i.e., the performance of \delta_n for large n.

To this end, it is customary to regard θ as a deterministic parameter whose true value is \theta_0. Under specific conditions, for large samples (large values of n), the posterior density of θ is approximately normal. In other words, for large n, the effect of the prior probability on the posterior is negligible. Moreover, if \delta_n is the Bayes estimator under MSE risk, then it is asymptotically unbiased and converges in distribution to the normal distribution:

: \sqrt{n}(\delta_n - \theta_0) \to N\left(0 , \frac{1}{I(\theta_0)}\right),

where I(\theta_0) is the Fisher information of \theta_0. It follows that the Bayes estimator \delta_n under MSE is asymptotically efficient.
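To illustrate this limit, the following is a minimal simulation sketch (not part of the original text). It assumes a normal model x_i ~ N(\theta_0, \sigma^2) with known σ and a N(0, 1) prior, for which the Bayes estimator under MSE is the posterior mean and I(\theta_0) = 1/\sigma^2; the true value θ₀ = 2 and the sample sizes are arbitrary choices. The empirical variance of \sqrt{n}(\delta_n - \theta_0) should approach 1/I(\theta_0) = \sigma^2 as n grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
theta0, sigma = 2.0, 1.5   # true location and known noise scale (arbitrary choices)
mu0, tau = 0.0, 1.0        # N(mu0, tau^2) prior on theta (arbitrary choice)
reps = 5000                # Monte Carlo replications per sample size

for n in (10, 100, 1000):
    x = rng.normal(theta0, sigma, size=(reps, n))
    # Posterior mean for a normal prior and a normal likelihood with known variance:
    delta_n = (mu0 / tau**2 + x.sum(axis=1) / sigma**2) / (1 / tau**2 + n / sigma**2)
    z = np.sqrt(n) * (delta_n - theta0)
    # Empirical variance of sqrt(n)*(delta_n - theta0) vs. the asymptotic value 1/I(theta0) = sigma^2
    print(f"n={n:5d}  var={z.var():.3f}  1/I(theta0)={sigma**2:.3f}")
</syntaxhighlight>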
Another estimator which is asymptotically normal and efficient is the maximum likelihood estimator (MLE). The relations between the maximum likelihood and Bayes estimators can be shown in the following simple example.
===Example: estimating p in a binomial distribution===
Consider the estimator of θ based on a binomial sample x ~ b(θ, n), where θ denotes the probability of success. Assuming θ is distributed according to the conjugate prior, which in this case is the Beta distribution B(a, b), the posterior distribution is known to be B(a+x, b+n−x). Thus, the Bayes estimator under MSE is

: \delta_n(x) = E[\theta|x] = \frac{a+x}{a+b+n}.

The MLE in this case is x/n, and so we get

: \delta_n(x) = \frac{a+b}{a+b+n}E[\theta] + \frac{n}{a+b+n}\delta_{\mathrm{MLE}}.

The last equation implies that, for n → ∞, the Bayes estimator (in the described problem) is close to the MLE. On the other hand, when n is small, the prior information is still relevant to the decision problem and affects the estimate.
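As a numerical sketch of these formulas (the prior parameters a = b = 2 and the true success probability 0.3 below are arbitrary choices, not from the text), the posterior mean (a+x)/(a+b+n) can be checked to equal the stated weighted average of the prior mean and the MLE, and the two estimates approach each other as n grows:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 2.0    # Beta(a, b) prior (arbitrary choice)
theta = 0.3        # true probability of success (arbitrary choice)

for n in (5, 50, 500, 5000):
    x = rng.binomial(n, theta)                   # one binomial sample with n trials
    bayes = (a + x) / (a + b + n)                # posterior mean of Beta(a+x, b+n-x)
    mle = x / n
    # The same estimate written as a weighted average of the prior mean and the MLE:
    weighted = (a + b) / (a + b + n) * (a / (a + b)) + n / (a + b + n) * mle
    print(f"n={n:5d}  Bayes={bayes:.4f}  MLE={mle:.4f}  difference of forms={abs(bayes - weighted):.2e}")
</syntaxhighlight>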
To see the relative weight of the prior information, assume that a = b; in this case each measurement brings in 1 new bit of information; the formula above shows that the prior information has the same weight as a+b bits of the new information. In applications, one often knows very little about the fine details of the prior distribution; in particular, there is no reason to assume that it coincides with B(a, b) exactly. In such a case, one possible interpretation of this calculation is: "there is a non-pathological prior distribution with the mean value 0.5 and the standard deviation d which gives the weight of prior information equal to 1/(4d²) − 1 bits of new information."
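As a consistency check (not in the original text), the stated figure agrees with the weight a+b found above: for the symmetric prior B(a, a), the mean is 0.5 and the Beta variance formula gives

: d^2 = \frac{a\cdot a}{(a+a)^2(a+a+1)} = \frac{1}{4(2a+1)},

so that

: \frac{1}{4d^2} - 1 = (2a+1) - 1 = 2a = a+b.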
Another example of the same phenomenon is the case when the prior estimate and a measurement are normally distributed. If the prior is centered at B with deviation Σ, and the measurement is centered at b with deviation σ, then the posterior is centered at

: \frac{\alpha}{\alpha+\beta}B + \frac{\beta}{\alpha+\beta}b,

with the weights in this weighted average being α = σ², β = Σ². Moreover, the squared posterior deviation is Σ²σ²/(Σ²+σ²). In other words, the prior is combined with the measurement in exactly the same way as if it were an extra measurement to take into account.

For example, if Σ = σ/2, then the deviation of 4 measurements combined together matches the deviation of the prior (assuming that errors of measurements are independent), and the weights α, β in the formula for the posterior match this: the weight of the prior is 4 times the weight of the measurement. Combining this prior with n measurements with average v results in the posterior centered at

: \frac{4}{4+n}B + \frac{n}{4+n}v;

in particular, the prior plays the same role as 4 measurements made in advance.
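The following sketch (with illustrative numbers, not from the text) carries out this combination with Σ = σ/2 and checks that the conjugate normal-normal posterior mean coincides with the formula obtained by counting the prior as (σ/Σ)² = 4 extra measurements:

<syntaxhighlight lang="python">
sigma = 1.0          # deviation of a single measurement (arbitrary choice)
Sigma = sigma / 2    # prior deviation, so the prior should weigh as (sigma/Sigma)^2 = 4 measurements
B = 0.0              # prior mean (arbitrary choice)
v, n = 3.0, 10       # average of the n measurements and their number (arbitrary choices)

# Conjugate normal-normal update: precisions add, and the posterior mean is precision-weighted.
post_precision = 1 / Sigma**2 + n / sigma**2
post_mean = (B / Sigma**2 + n * v / sigma**2) / post_precision

# The same mean with the prior counted as w = (sigma/Sigma)^2 extra measurements of value B:
w = (sigma / Sigma) ** 2
as_extra_measurements = (w * B + n * v) / (w + n)

print(post_mean, as_extra_measurements)   # identical up to rounding
print((1 / post_precision) ** 0.5)        # posterior deviation
</syntaxhighlight>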
In general, the prior has the weight of (σ/Σ)² measurements. Compare to the example of the binomial distribution: there the prior has the weight of (σ/Σ)² − 1 measurements. One can see that the exact weight does depend on the details of the distribution, but when σ ≫ Σ, the difference becomes small.

==Practical example of Bayes estimators==