Method of moments (statistics)

In statistics, the method of moments is a method of estimation of population parameters. The same principle is used to derive higher moments like skewness and kurtosis.

Method

Suppose that the parameter \theta = (\theta_1, \theta_2, \dots, \theta_k) characterizes the distribution f_W(w; \theta) of the random variable W. Suppose the first k moments of the true distribution (the "population moments") can be expressed as functions of the \theta_j:

\begin{align} \mu_1 & \equiv \operatorname E[W] = g_1(\theta_1, \theta_2, \ldots, \theta_k) , \\[4pt] \mu_2 & \equiv \operatorname E[W^2] = g_2(\theta_1, \theta_2, \ldots, \theta_k), \\ & \,\,\, \vdots \\ \mu_k & \equiv \operatorname E[W^k] = g_k(\theta_1, \theta_2, \ldots, \theta_k). \end{align}

Suppose a sample of size n is drawn, resulting in the values w_1, \dots, w_n. For j=1,\dots,k, let

\hat\mu_j = \frac{1}{n} \sum_{i=1}^n w_i^j

be the j-th sample moment, an estimate of \mu_j. The method of moments estimator for \theta_1, \theta_2, \ldots, \theta_k, denoted by \hat\theta_1, \hat\theta_2, \dots, \hat\theta_k, is defined to be the solution (if one exists) to the equations:

\begin{align} \hat \mu_1 & = g_1(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k), \\[4pt] \hat \mu_2 & = g_2(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k), \\ & \,\,\, \vdots \\ \hat \mu_k & = g_k(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k). \end{align}

The method described here for single random variables generalizes in an obvious manner to multiple random variables, leading to multiple choices for moments to be used. Different choices generally lead to different solutions.
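The recipe above can be sketched in a few lines of Python. This is only a minimal illustration under an assumed model that the text does not discuss: a normal distribution with parameters (mu, sigma^2), for which g_1 = mu and g_2 = sigma^2 + mu^2 can be inverted by hand. The names `sample_moment` and `normal_mom` and the data are hypothetical.

```python
def sample_moment(values, j):
    """Return the j-th raw sample moment, (1/n) * sum(w_i ** j)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    return sum(w ** j for w in values) / n


def normal_mom(values):
    """Method-of-moments estimates (mean, variance) for an assumed normal model.

    Population moments: mu_1 = mu and mu_2 = sigma^2 + mu^2.
    Equating them to the sample moments and solving gives
    mu_hat = m1 and var_hat = m2 - m1^2.
    """
    m1 = sample_moment(values, 1)
    m2 = sample_moment(values, 2)
    return m1, m2 - m1 ** 2


# Hypothetical data, chosen so the arithmetic is easy to check by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = normal_mom(data)
print(mu_hat, var_hat)  # 5.0 4.0
```

For models where the g_j cannot be inverted in closed form, the same two steps apply, but the final solve would need a numerical root finder.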

Advantages and disadvantages

The method of moments is fairly simple and yields consistent estimators (under very weak assumptions), though these estimators are often biased. It is an alternative to the method of maximum likelihood. Estimates by the method of moments are not necessarily sufficient statistics, i.e., they sometimes fail to take into account all relevant information in the sample. When estimating other structural parameters (e.g., parameters of a utility function, instead of parameters of a known probability distribution), appropriate probability distributions may not be known, and moment-based estimates may be preferred to maximum likelihood estimation.
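As one illustration of the bias mentioned above, consider an assumed setting that is not taken from the text: independent, identically distributed observations with mean \mu and finite variance \sigma^2. Matching the first two moments gives the variance estimator \hat\sigma^2 = \hat\mu_2 - \hat\mu_1^2, and a short calculation shows that its expectation falls short of \sigma^2:

\begin{align} \operatorname E\left[\hat\sigma^2\right] &= \operatorname E\left[\hat\mu_2\right] - \operatorname E\left[\hat\mu_1^2\right] \\[1ex] &= \left(\sigma^2 + \mu^2\right) - \left(\frac{\sigma^2}{n} + \mu^2\right) \\[1ex] &= \frac{n-1}{n}\,\sigma^2 \end{align}

The estimator is therefore biased for every finite n, but the bias vanishes as n \to \infty, which is compatible with consistency.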

Alternative method of moments

The equations to be solved in the method of moments (MoM) are in general nonlinear, and there are no generally applicable guarantees that tractable solutions exist. There is, however, an alternative approach to using sample moments to estimate data model parameters in terms of the known dependence of model moments on these parameters, and this alternative requires the solution of only linear equations or, more generally, tensor equations. This alternative is referred to as the Bayesian-Like MoM (BL-MoM), and it differs from the classical MoM in that it uses optimally weighted sample moments.

Considering that the MoM is typically motivated by a lack of sufficient knowledge about the data model to determine likelihood functions and associated a posteriori probabilities of unknown or random parameters, it may seem odd that there exists a type of MoM that is Bayesian-Like. But the particular meaning of Bayesian-Like leads to a problem formulation in which required knowledge of a posteriori probabilities is replaced with required knowledge of only the dependence of model moments on unknown model parameters, which is exactly the knowledge required by the traditional MoM. The BL-MoM also uses knowledge of a priori probabilities of the parameters to be estimated, when available, but otherwise uses uniform priors.

The BL-MoM has been reported only in the applied statistics literature, in connection with parameter estimation and hypothesis testing using observations of stochastic processes for problems in information and communications theory and, in particular, communications receiver design in the absence of knowledge of likelihood functions or associated a posteriori probabilities. In addition, a restatement of this receiver design approach for stochastic process models as an alternative to the classical MoM for any type of multivariate data is available in tutorial form on a university website.

The reported applications demonstrate some important characteristics of this alternative to the classical MoM, and a detailed list of relative advantages and disadvantages has been compiled, but the literature is missing direct comparisons of the classical MoM and the BL-MoM in specific applications.

Examples

An example application of the method of moments is to estimate polynomial probability density distributions. In this case, an approximating polynomial of order N is defined on an interval [a,b]. The method of moments then yields a system of equations, whose solution involves the inversion of a Hankel matrix.

Proving the central limit theorem

Let X_1, X_2, \dots be independent random variables with mean 0 and variance 1, and let S_n := \frac{1}{\sqrt n}\sum_{i=1}^n X_i. We can compute the low-order moments of S_n as

\begin{align} \operatorname{E}\left[S_n^0\right] &= 1, & \operatorname{E}\left[S_n^1\right] &= 0, \\[0.5ex] \operatorname{E}\left[S_n^2\right] &= 1, & \operatorname{E}\left[S_n^3\right] &= 0, \dots \end{align}

Explicit expansion shows that, to leading order in n (the remaining terms involve higher moments of the X_i and vanish as n \to \infty when those moments are bounded),

\begin{align} \operatorname{E}\left[S_n^{2k+1}\right] &= 0; \\[1ex] \operatorname{E}\left[S_n^{2k}\right] &= \frac{\binom{n}{k}\frac{(2k)!}{2^k}}{n^k} \\[0.6ex] &= \frac{n(n-1)\cdots(n-k+1)}{n^k} (2k-1)!! \end{align}

where the numerator is the number of ways to select k distinct pairs of balls by picking one each from 2k buckets, each containing balls numbered from 1 to n. In the limit n \to \infty, all moments converge to those of a standard normal distribution. Further analysis then shows that this convergence in moments implies convergence in distribution. Essentially, this argument was published by Chebyshev in 1887.

Uniform distribution

Consider the uniform distribution on the interval [a,b], U(a,b). If W\sim U(a,b) then we have

\begin{align} \mu_1 &= \operatorname E\left[W\right] = \tfrac{1}{2}(a + b) \\[1ex] \mu_2 &= \operatorname E\left[W^2\right] = \tfrac{1}{3} \left(a^2 + ab + b^2\right) \end{align}

Solving these equations for a and b gives

\begin{align} \hat{a} &= \mu_1 - \sqrt{3 \left(\mu_2 - \mu_1^2\right)} \\ \hat{b} &= \mu_1 + \sqrt{3 \left(\mu_2 - \mu_1^2\right)} \end{align}

Given a set of samples \{w_i\}, we can use the sample moments \hat{\mu}_1 and \hat{\mu}_2 in these formulae in order to estimate a and b.

Note, however, that this method can produce estimates that are incompatible with the observed samples. For example, the set of samples \{0,0,0,0,1\} has \hat\mu_1 = \hat\mu_2 = \tfrac{1}{5} and results in the estimates \hat{a} = \frac{1}{5} \left(1 - 2 \sqrt{3}\right) \approx -0.4928 and \hat{b} = \frac{1}{5} \left(1 + 2 \sqrt{3}\right) \approx 0.8928. Since \hat{b} < 1, it is impossible for the set \{0,0,0,0,1\} to have been drawn from U(\hat{a},\hat{b}) in this case.
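The uniform-distribution estimates above can be checked numerically. The following is a minimal sketch; the function name `uniform_mom` is hypothetical, and the clamp at zero is an added safeguard against floating-point round-off rather than part of the method.

```python
import math


def uniform_mom(values):
    """Method-of-moments estimates (a_hat, b_hat) for U(a, b)."""
    n = len(values)
    m1 = sum(values) / n                  # first sample moment
    m2 = sum(w * w for w in values) / n   # second sample moment
    # m2 - m1^2 is non-negative in exact arithmetic; clamp to guard
    # against tiny negative values caused by rounding.
    half_width = math.sqrt(3.0 * max(m2 - m1 ** 2, 0.0))
    return m1 - half_width, m1 + half_width


samples = [0, 0, 0, 0, 1]
a_hat, b_hat = uniform_mom(samples)
print(round(a_hat, 4), round(b_hat, 4))  # -0.4928 0.8928
print(max(samples) > b_hat)              # True: the sample 1 lies outside [a_hat, b_hat]
```

The last line makes the problem explicit: the largest observation exceeds the estimated upper endpoint, so the fitted distribution assigns zero density to a value that was actually observed.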

Source: Wikipedia ↗
