The following is the simplest form of the paradox, the special case in which the number of observations is equal to the number of parameters to be estimated. Let \boldsymbol{\theta} be a vector consisting of n\geq 3 unknown parameters. To estimate these parameters, a single measurement X_i is performed for each parameter \theta_i, resulting in a vector \mathbf{X} of length n. Suppose the measurements are known to be independent Gaussian random variables, with mean \boldsymbol{\theta} and variance 1, i.e., \mathbf{X}\sim \mathcal{N}(\boldsymbol{\theta},\mathbf{I}_n). Thus, each parameter is estimated using a single noisy measurement, and each measurement is equally inaccurate. Under these conditions, it is intuitive and common to use each measurement as an estimate of its corresponding parameter. This so-called "ordinary" decision rule can be written as \hat{\boldsymbol{\theta}} = \mathbf{X}, which is the maximum likelihood estimator (MLE). The quality of such an estimator is measured by its risk function. A commonly used risk function is the mean squared error, defined as \mathbb{E}[\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2].
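For the "ordinary" rule this risk can be computed exactly: each coordinate X_i - \theta_i is standard normal, so

\mathbb{E}[\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2] = \sum_{i=1}^{n} \mathbb{E}[(X_i - \theta_i)^2] = n,

and the MLE therefore has constant risk n, whatever the true value of \boldsymbol{\theta}.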
Surprisingly, it turns out that the "ordinary" decision rule is suboptimal (inadmissible) in terms of mean squared error when n\geq 3. In other words, in the setting discussed here, there exist alternative estimators which always achieve lower mean squared error, no matter what the value of \boldsymbol{\theta} is. For a given \boldsymbol{\theta} one could obviously define a perfect "estimator" which is always just \boldsymbol{\theta}, but this estimator would be bad for other values of \boldsymbol{\theta}. The estimators of Stein's paradox are, for a given \boldsymbol{\theta}, better than the "ordinary" decision rule \mathbf{X} for some realizations of \mathbf{X} but necessarily worse for others; it is only on average, i.e., in risk, that they are better.
More accurately, an estimator \hat{\boldsymbol{\theta}}_1 is said to dominate another estimator \hat{\boldsymbol{\theta}}_2 if, for all values of \boldsymbol{\theta}, the risk of \hat{\boldsymbol{\theta}}_1 is lower than, or equal to, the risk of \hat{\boldsymbol{\theta}}_2, and if the inequality is strict for some \boldsymbol{\theta}.
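In symbols, writing R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}) = \mathbb{E}[\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|^2] for the risk, \hat{\boldsymbol{\theta}}_1 dominates \hat{\boldsymbol{\theta}}_2 when

R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_1) \leq R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_2) \text{ for all } \boldsymbol{\theta}, \qquad R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_1) < R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}_2) \text{ for at least one } \boldsymbol{\theta}.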
An estimator is said to be admissible if no other estimator dominates it; otherwise it is inadmissible. Thus, Stein's example can be simply stated as follows:
The "ordinary" decision rule of the mean of a multivariate Gaussian distribution is inadmissible under mean squared error risk. Many simple, practical estimators achieve better performance than the "ordinary" decision rule. The best-known example is the
James–Stein estimator, which shrinks
\mathbf{X} towards a particular point (such as the origin) by an amount inversely proportional to the distance of
\mathbf{X} from that point. For a sketch of the proof of this result, see
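For shrinkage towards the origin, the James–Stein estimator takes the explicit form

\hat{\boldsymbol{\theta}}_{JS} = \left(1 - \frac{n-2}{\|\mathbf{X}\|^2}\right)\mathbf{X},

so observations far from the origin are barely moved, while observations near it are pulled in strongly. The dominance is easy to check numerically; the following is a minimal Monte Carlo sketch (the function names and the choice n = 5 are illustrative, not from any particular source) comparing the empirical risk of the two estimators at several values of \boldsymbol{\theta}:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def james_stein(x):
    # Shrink each sample toward the origin by the factor 1 - (n-2)/||x||^2.
    n = x.shape[-1]
    return (1.0 - (n - 2) / np.sum(x**2, axis=-1, keepdims=True)) * x

def empirical_risk(theta, trials=200_000):
    # Draw X ~ N(theta, I_n) and average the squared error of each estimator.
    x = theta + rng.standard_normal((trials, theta.size))
    mse_mle = np.mean(np.sum((x - theta) ** 2, axis=1))
    mse_js = np.mean(np.sum((james_stein(x) - theta) ** 2, axis=1))
    return mse_mle, mse_js

# The MLE's risk is n for every theta; the James-Stein estimator's risk
# stays below n, with the largest gain when theta is near the origin.
for c in (0.0, 2.0, 10.0):
    theta = np.full(5, c)
    mle, js = empirical_risk(theta)
    print(f"theta = {c:>4.1f} * ones(5): MLE risk = {mle:.2f}, JS risk = {js:.2f}")
</syntaxhighlight>

With n = 5 the simulated MLE risk is about 5 at every \boldsymbol{\theta}, while the James–Stein risk is roughly 2 at the origin and approaches 5 from below as \|\boldsymbol{\theta}\| grows; for any individual sample, however, either estimator may be the closer one, in line with the "only on average" caveat above.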
For a sketch of the proof of this result, see Proof of Stein's example. An alternative proof is due to Larry Brown: he proved that the ordinary estimator for an n-dimensional multivariate normal mean vector is admissible if and only if the n-dimensional Brownian motion is recurrent. Since Brownian motion is not recurrent for n\geq 3, the MLE is not admissible for n\geq 3.

== An intuitive explanation ==