An estimator is minimax when it is best in the worst case. Following this logic, a minimax estimator should be a Bayes estimator with respect to a least favorable prior distribution of \theta. To make this notion precise, denote the average risk of the Bayes estimator \delta_\pi with respect to a prior distribution \pi as
r_\pi = \int R(\theta,\delta_\pi) \, d\pi(\theta).
Definition: A prior distribution \pi is called least favorable if for every other distribution \pi' the average risk satisfies r_\pi \geq r_{\pi'}.
Theorem 1: If r_\pi = \sup_\theta R(\theta,\delta_\pi), then:
• \delta_\pi is minimax.
• If \delta_\pi is a unique Bayes estimator, it is also the unique minimax estimator.
• \pi is least favorable.
Corollary: If a Bayes estimator has constant risk, it is minimax. Note that this condition is sufficient but not necessary: a minimax estimator need not have constant risk.
Example 1: Unfair coin. Consider the problem of estimating the "success" rate of a binomial variable, x \sim B(n,\theta). This may be viewed as estimating the rate at which an unfair coin lands on "heads" or "tails". In this case the Bayes estimator with respect to a Beta-distributed prior, \theta \sim \text{Beta}(\sqrt{n}/2,\sqrt{n}/2), is
\delta^M=\frac{x+0.5\sqrt{n}}{n+\sqrt{n}},
with constant Bayes risk
r=\frac{1}{4(1+\sqrt{n})^2},
and, according to the Corollary, it is minimax.
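As a minimal numerical sketch (not part of the original example; the function names are illustrative), the risk of \delta^M can be computed exactly from the binomial pmf and compared with the maximum-likelihood estimator x/n:

```python
# A minimal sketch: exact squared-error risk of the Bayes estimator
# delta^M = (x + 0.5*sqrt(n)) / (n + sqrt(n)), computed from the
# Binomial(n, theta) pmf, compared with the MLE x/n.
from math import comb, sqrt

def risk(estimator, n, theta):
    """Exact mean squared error E[(estimator(x, n) - theta)^2] for x ~ Bin(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (estimator(x, n) - theta) ** 2
        for x in range(n + 1)
    )

def minimax_est(x, n):       # Bayes estimator under the Beta(sqrt(n)/2, sqrt(n)/2) prior
    return (x + 0.5 * sqrt(n)) / (n + sqrt(n))

def mle(x, n):               # maximum likelihood estimator
    return x / n

n = 25
print("constant risk 1/(4(1+sqrt(n))^2):", 1 / (4 * (1 + sqrt(n)) ** 2))
for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(theta, risk(minimax_est, n, theta), risk(mle, n, theta))
```

The risk of \delta^M is the same constant for every \theta, while the MLE does better near \theta = 0 and \theta = 1 but worse near \theta = 1/2; it is this worst-case trade-off that makes \delta^M minimax.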
Definition: A sequence of prior distributions \pi_n is called least favorable if for any other distribution \pi',
\lim_{n \rightarrow \infty} r_{\pi_n} \geq r_{\pi '}.
Theorem 2: If there is a sequence of priors \pi_n and an estimator \delta such that \sup_\theta R(\theta,\delta)=\lim_{n \rightarrow \infty} r_{\pi_n}, then:
• \delta is minimax.
• The sequence \pi_n is least favorable.
No uniqueness is guaranteed here. For example, the ML estimator of a normal mean (see Example 2 below) may be attained as the limit of Bayes estimators with respect to a uniform prior \pi_n \sim U[-n,n] with increasing support, and also with respect to a zero-mean normal prior \pi_n \sim N(0,n \sigma^2) with increasing variance. Thus neither the minimax estimator nor the least favorable prior is unique.
Example 2: Consider the problem of estimating the mean of a p-dimensional Gaussian random vector, x \sim N(\theta,I_p \sigma^2). The maximum likelihood (ML) estimator for \theta in this case is \delta_\text{ML}=x, and its risk is
R(\theta,\delta_\text{ML})=E\|\delta_\text{ML}-\theta\|^2=\sum_{i=1}^p E(x_i-\theta_i)^2=p \sigma^2.
The risk is constant, but the ML estimator is not a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence \pi_n \sim N(0,n \sigma^2), and is hence minimax by Theorem 2.
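To see the limit explicitly (a standard normal–normal posterior computation, included here as a sketch): under the prior \theta \sim N(0,n \sigma^2 I_p), the posterior mean of \theta given x is \delta_n(x)=\frac{n}{n+1}x with Bayes risk r_{\pi_n}=\frac{n}{n+1}p\sigma^2, so as n \rightarrow \infty one has \delta_n(x) \rightarrow x=\delta_\text{ML} and r_{\pi_n} \rightarrow p\sigma^2=\sup_\theta R(\theta,\delta_\text{ML}), which is precisely the condition of Theorem 2.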
Minimaxity does not always imply admissibility. In this example, the ML estimator is known to be inadmissible whenever p > 2. The James–Stein estimator dominates the ML estimator whenever p > 2: although both estimators have the same risk p \sigma^2 as \|\theta\| \rightarrow \infty, and both are minimax, the James–Stein estimator has smaller risk for any finite \|\theta\|.
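As a rough illustration (a Monte Carlo sketch under assumed values p = 5 and \sigma = 1; the shrinkage rule (1-(p-2)\sigma^2/\|x\|^2)\,x is the classical James–Stein form), the dominance can be checked by simulation:

```python
# A Monte Carlo sketch: risk of the ML estimator delta(x) = x versus the
# classical James-Stein estimator delta_JS(x) = (1 - (p-2)*sigma^2/||x||^2) * x
# for x ~ N(theta, sigma^2 I_p).
import numpy as np

rng = np.random.default_rng(0)
p, sigma, trials = 5, 1.0, 200_000

def mc_risks(theta):
    """Monte Carlo estimates of E||x - theta||^2 and E||delta_JS(x) - theta||^2."""
    x = theta + sigma * rng.standard_normal((trials, p))
    shrink = 1.0 - (p - 2) * sigma**2 / np.sum(x**2, axis=1, keepdims=True)
    js = shrink * x
    ml_risk = np.mean(np.sum((x - theta) ** 2, axis=1))   # close to p * sigma^2
    js_risk = np.mean(np.sum((js - theta) ** 2, axis=1))  # smaller than ML risk for p > 2
    return ml_risk, js_risk

for norm in (0.0, 2.0, 10.0):                  # small, moderate, large ||theta||
    theta = np.full(p, norm / np.sqrt(p))      # by symmetry, risk depends only on ||theta||
    print(norm, mc_risks(theta))
```

For \theta = 0 the James–Stein risk is near 2\sigma^2, well below p\sigma^2 = 5, and it approaches p\sigma^2 as \|\theta\| grows, matching the behaviour described above.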
==Examples==