If the regression errors \varepsilon_i are independent but have distinct variances \sigma_i^2, then \mathbf{\Sigma} = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2), which can be estimated with \widehat{\sigma}_i^2 = \widehat{\varepsilon}_i^2. This yields White's (1980) estimator, often referred to as HCE (heteroskedasticity-consistent estimator):

: \begin{align} \widehat{\mathbb{V}}_\text{HCE}\big[\widehat{\boldsymbol{\beta}}_\text{OLS}\big] &= \frac{1}{n} \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top}\bigg)^{-1} \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \widehat{\varepsilon}_i^2\bigg) \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top}\bigg)^{-1} \\ &= (\mathbf{X}^{\top}\mathbf{X})^{-1} \big(\mathbf{X}^{\top} \operatorname{diag}(\widehat{\varepsilon}_1^2, \ldots, \widehat{\varepsilon}_n^2)\, \mathbf{X}\big) (\mathbf{X}^{\top}\mathbf{X})^{-1}, \end{align}

where, as above, \mathbf{X} denotes the matrix of stacked rows \mathbf{x}_i^{\top} from the data. The estimator can be derived in terms of the generalized method of moments (GMM).

Also often discussed in the literature (including White's paper) is the estimator \widehat{\mathbf{\Omega}}_n of the covariance matrix \mathbf{\Omega} of the \sqrt{n}-consistent limiting distribution:

: \sqrt{n}\,(\widehat{\boldsymbol{\beta}}_n - \boldsymbol{\beta}) \, \xrightarrow{d} \, \mathcal{N}(\mathbf{0}, \mathbf{\Omega}),

where

: \mathbf{\Omega} = \mathbb{E}[\mathbf{x}_i \mathbf{x}_i^{\top}]^{-1}\, \mathbb{V}[\mathbf{x}_i \varepsilon_i]\, \mathbb{E}[\mathbf{x}_i \mathbf{x}_i^{\top}]^{-1},

and

: \begin{align} \widehat{\mathbf{\Omega}}_n &= \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top}\bigg)^{-1} \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \widehat{\varepsilon}_i^2\bigg) \bigg(\frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top}\bigg)^{-1} \\ &= n\,(\mathbf{X}^{\top}\mathbf{X})^{-1} \big(\mathbf{X}^{\top} \operatorname{diag}(\widehat{\varepsilon}_1^2, \ldots, \widehat{\varepsilon}_n^2)\, \mathbf{X}\big) (\mathbf{X}^{\top}\mathbf{X})^{-1}. \end{align}

Thus,

: \widehat{\mathbf{\Omega}}_n = n \cdot \widehat{\mathbb{V}}_\text{HCE}\big[\widehat{\boldsymbol{\beta}}_\text{OLS}\big]

and

: \widehat{\mathbb{V}}[\mathbf{x}_i \varepsilon_i] = \frac{1}{n} \sum_i \mathbf{x}_i \mathbf{x}_i^{\top} \widehat{\varepsilon}_i^2 = \frac{1}{n}\, \mathbf{X}^{\top} \operatorname{diag}(\widehat{\varepsilon}_1^2, \ldots, \widehat{\varepsilon}_n^2)\, \mathbf{X}.

Precisely which covariance matrix is of concern is a matter of context.
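For concreteness, the sandwich formula above can be evaluated directly. The following NumPy sketch (the function name, variable names and the simulated data are purely illustrative) computes the OLS coefficients, the residuals \widehat{\varepsilon}_i and \widehat{\mathbb{V}}_\text{HCE}; multiplying the result by n recovers \widehat{\mathbf{\Omega}}_n.

<syntaxhighlight lang="python">
import numpy as np

def hc0_covariance(X, y):
    """White's (1980) HC0 sandwich estimator of Var(beta_hat_OLS).

    X : (n, k) design matrix (with a constant column if an intercept is wanted)
    y : (n,) response vector
    Returns (beta_hat, V_hce) with V_hce = (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}.
    """
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    resid = y - X @ beta_hat                           # residuals e_i
    bread = np.linalg.inv(X.T @ X)                     # (X'X)^{-1}, the "bread"
    meat = X.T @ (X * resid[:, None] ** 2)             # X' diag(e_i^2) X, the "meat"
    return beta_hat, bread @ meat @ bread

# Purely illustrative data with error variance increasing in x
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + x, size=n)

beta_hat, V_hce = hc0_covariance(X, y)
robust_se = np.sqrt(np.diag(V_hce))   # heteroskedasticity-robust standard errors
Omega_n = n * V_hce                   # estimate of the sqrt(n)-limiting covariance
</syntaxhighlight>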
Alternative estimators have been proposed in MacKinnon & White (1985) that correct for unequal variances of regression residuals due to different leverage. Unlike White's asymptotic estimator, their estimators are unbiased when the data are homoscedastic. Of the four widely available variants, usually denoted HC0–HC3, the HC3 specification appears to work best: tests relying on the HC3 estimator have better power and stay closer to the targeted size, especially in small samples. The larger the sample, the smaller the differences among the estimators.
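A brief sketch of the leverage-based corrections: with h_i the i-th diagonal element of the hat matrix \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}, HC1 rescales the HC0 "meat" by n/(n-k), HC2 divides each squared residual by 1 - h_i, and HC3 divides it by (1 - h_i)^2. The function below only illustrates these weightings (its name and interface are invented for exposition, not taken from any particular package).

<syntaxhighlight lang="python">
import numpy as np

def hc_covariance(X, y, kind="HC3"):
    """Sandwich covariance of the OLS coefficients with the HC0-HC3 weightings."""
    n, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    bread = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, bread, X)      # leverages h_i = x_i'(X'X)^{-1}x_i
    if kind == "HC0":
        omega = resid ** 2                         # White's original weights
    elif kind == "HC1":
        omega = resid ** 2 * n / (n - k)           # degrees-of-freedom rescaling
    elif kind == "HC2":
        omega = resid ** 2 / (1.0 - h)             # unbiased under homoskedasticity
    elif kind == "HC3":
        omega = resid ** 2 / (1.0 - h) ** 2        # jackknife-like, best in small samples
    else:
        raise ValueError(f"unknown kind: {kind}")
    meat = X.T @ (X * omega[:, None])              # X' diag(omega) X
    return bread @ meat @ bread
</syntaxhighlight>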
An alternative to explicitly modelling the heteroskedasticity is to use a resampling method such as the wild bootstrap. Even then, heteroskedasticity-robust standard errors remain useful, because the studentized bootstrap, which standardizes the resampled statistic by its standard error, yields an asymptotic refinement.
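A minimal sketch of the wild bootstrap with Rademacher multipliers is given below (the function name, the number of replications and the random seed are illustrative choices, not part of the method). Studentizing each resampled coefficient by a heteroskedasticity-robust standard error, as computed above, would give the bootstrap-t variant just mentioned.

<syntaxhighlight lang="python">
import numpy as np

def wild_bootstrap(X, y, n_boot=999, seed=0):
    """Wild bootstrap of the OLS coefficients under heteroskedasticity.

    Each replication multiplies the fitted residuals by independent +/-1
    (Rademacher) draws, preserving the variance pattern of the errors.
    Returns an (n_boot, k) array of resampled coefficient vectors.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    betas = np.empty((n_boot, k))
    for b in range(n_boot):
        v = rng.choice([-1.0, 1.0], size=n)        # Rademacher multipliers
        y_star = X @ beta_hat + resid * v          # perturbed responses
        betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return betas
</syntaxhighlight>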
Instead of accounting for heteroskedastic errors, most linear models can be transformed to have homoskedastic error terms (unless the error term is heteroskedastic by construction, e.g. in a linear probability model). One way to do this is to use weighted least squares, which also has improved efficiency properties.

==See also==