Describing the statistical properties of the estimators from the simple linear regression requires the use of a statistical model. The following is based on assuming the validity of a model under which the estimates are optimal. It is also possible to evaluate the properties under other assumptions, such as inhomogeneity, but this is discussed elsewhere.
===Unbiasedness===
The estimators \widehat{\alpha} and \widehat{\beta} are unbiased. To formalize this assertion we must define a framework in which these estimators are random variables. We consider the residuals \varepsilon_i as random variables drawn independently from some distribution with mean zero. In other words, for each value of x, the corresponding value of y is generated as a mean response \alpha + \beta x plus an additional random variable \varepsilon called the error term, equal to zero on average. Under such an interpretation, the least-squares estimators \widehat\alpha and \widehat\beta will themselves be random variables whose means equal the "true values" \alpha and \beta. This is the definition of an unbiased estimator.
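This claim can be illustrated numerically. Below is a minimal simulation sketch (not part of the original text): data are repeatedly generated from a known line with zero-mean noise, and the least-squares estimates, averaged over many replications, come out close to the true coefficients. The chosen "true" values, sample size, and noise level are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, beta_true = 2.0, 0.5   # assumed "true" coefficients (illustrative)
x = np.linspace(0, 10, 30)         # fixed design points
n_rep = 10_000

estimates = np.empty((n_rep, 2))
for r in range(n_rep):
    # generate responses: mean response plus a zero-mean error term
    y = alpha_true + beta_true * x + rng.normal(0.0, 1.0, size=x.size)
    # least-squares estimates of slope and intercept
    beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha_hat = y.mean() - beta_hat * x.mean()
    estimates[r] = alpha_hat, beta_hat

print(estimates.mean(axis=0))  # averages should be close to (2.0, 0.5)
```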
===Variance of the mean response===
Since the data in this context is defined to be (x, y) pairs for every observation, the mean response at a given value of x, say x_d, is an estimate of the mean of the y values in the population at the x value of x_d, that is \hat{E}(y \mid x_d) \equiv \hat{y}_d. The variance of the mean response is given by
\operatorname{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) = \operatorname{Var}\left(\hat{\alpha}\right) + \left(\operatorname{Var} \hat{\beta}\right)x_d^2 + 2 x_d \operatorname{Cov}\left(\hat{\alpha}, \hat{\beta}\right).
This expression can be simplified to
\operatorname{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) = \sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right),
where m is the number of data points. To demonstrate this simplification, one can make use of the identity
\sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - \frac{1}{m}\left(\sum_i x_i\right)^2.
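The equivalence of the expanded covariance expression and the simplified form can be checked numerically. The following is a minimal sketch using the standard OLS formulas for \operatorname{Var}(\hat\alpha), \operatorname{Var}(\hat\beta), and \operatorname{Cov}(\hat\alpha, \hat\beta) with a known \sigma^2; the design points, \sigma^2, and x_d are illustrative assumptions, not values from the article.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])   # illustrative design points
sigma2 = 1.5                               # assumed error variance
m = x.size
Sxx = np.sum((x - x.mean()) ** 2)
x_d = 4.0                                  # point at which the mean response is estimated

# standard OLS variances/covariance with known sigma^2
var_alpha = sigma2 * np.sum(x ** 2) / (m * Sxx)
var_beta = sigma2 / Sxx
cov_ab = -sigma2 * x.mean() / Sxx

expanded = var_alpha + var_beta * x_d ** 2 + 2 * x_d * cov_ab
simplified = sigma2 * (1.0 / m + (x_d - x.mean()) ** 2 / Sxx)
print(expanded, simplified)  # the two expressions agree
```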
===Variance of the predicted response===
The predicted response distribution is the predicted distribution of the residuals at the given point x_d. So the variance is given by
\begin{align}
\operatorname{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta} x_d \right] \right) &= \operatorname{Var}(y_d) + \operatorname{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) - 2\operatorname{Cov}\left(y_d, \left[\hat{\alpha} + \hat{\beta} x_d \right]\right) \\
&= \operatorname{Var}(y_d) + \operatorname{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right).
\end{align}
The second line follows from the fact that \operatorname{Cov}\left(y_d, \left[\hat{\alpha} + \hat{\beta} x_d \right]\right) is zero because the new prediction point is independent of the data used to fit the model. Additionally, the term \operatorname{Var}\left(\hat{\alpha} + \hat{\beta}x_d\right) was calculated earlier for the mean response. Since \operatorname{Var}(y_d) = \sigma^2 (a fixed but unknown parameter that can be estimated), the variance of the predicted response is given by
\begin{align}
\operatorname{Var}\left(y_d - \left[\hat{\alpha} + \hat{\beta} x_d \right] \right) &= \sigma^2 + \sigma^2\left(\frac{1}{m} + \frac{\left(x_d - \bar{x}\right)^2}{\sum (x_i - \bar{x})^2}\right) \\
&= \sigma^2\left(1 + \frac{1}{m} + \frac{(x_d - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right).
\end{align}
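As a rough numerical illustration (not part of the original text), the sketch below computes the prediction variance by adding \sigma^2 to the mean-response variance; the design points, \sigma^2, and x_d are the same arbitrary assumptions used above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])   # illustrative design points
sigma2 = 1.5                               # assumed error variance
m = x.size
Sxx = np.sum((x - x.mean()) ** 2)
x_d = 4.0

var_mean_response = sigma2 * (1.0 / m + (x_d - x.mean()) ** 2 / Sxx)
# a new observation carries its own error term, so its variance sigma^2 is added on top
var_predicted_response = sigma2 + var_mean_response
print(var_mean_response, var_predicted_response)
```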
===Confidence intervals===
The formulas given in the previous section allow one to calculate the point estimates of \alpha and \beta, that is, the coefficients of the regression line for the given set of data. However, those formulas do not tell us how precise the estimates are, i.e., how much the estimators \widehat{\alpha} and \widehat{\beta} vary from sample to sample for the specified sample size. Confidence intervals were devised to give a plausible set of values to the estimates one might have if one repeated the experiment a very large number of times. The standard method of constructing confidence intervals for linear regression coefficients relies on the normality assumption, which is justified if either:
• the errors in the regression are normally distributed (the so-called classic regression assumption), or
• the number of observations n is sufficiently large, in which case the estimator is approximately normally distributed.
The latter case is justified by the central limit theorem.
====Normality assumption====
Under the first assumption above, that of the normality of the error terms, the estimator of the slope coefficient will itself be normally distributed with mean \beta and variance \sigma^2\left/\sum_i(x_i - \bar{x})^2\right., where \sigma^2 is the variance of the error terms (see Proofs involving ordinary least squares). At the same time the sum of squared residuals is distributed proportionally to \chi^2 with n - 2 degrees of freedom, and independently from \widehat{\beta}. This allows us to construct a t-value
t = \frac{\widehat\beta - \beta}{s_{\widehat\beta}}\ \sim\ t_{n - 2},
where
s_{\widehat\beta} = \sqrt{\frac{\frac{1}{n - 2}\sum_{i=1}^n \widehat{\varepsilon}_i^{\,2}}{\sum_{i=1}^n (x_i - \bar{x})^2}}
is the unbiased standard error estimator of \widehat{\beta}. This t-value has a Student's t-distribution with n - 2 degrees of freedom. Using it we can construct a confidence interval for \beta:
\beta \in \left[\widehat\beta - s_{\widehat\beta} t^*_{n - 2},\ \widehat\beta + s_{\widehat\beta} t^*_{n - 2}\right],
at confidence level (1 - \gamma), where t^*_{n - 2} is the \left(1 - \frac{\gamma}{2}\right)\text{-th} quantile of the t_{n - 2} distribution. For example, if \gamma = 0.05 then the confidence level is 95%. Similarly, the confidence interval for the intercept coefficient \alpha is given by
\alpha \in \left[\widehat\alpha - s_{\widehat\alpha} t^*_{n - 2},\ \widehat\alpha + s_{\widehat\alpha} t^*_{n - 2}\right],
at confidence level (1 - \gamma), where
s_{\widehat\alpha} = s_{\widehat\beta}\sqrt{\frac{1}{n} \sum_{i=1}^n x_i^2} = \sqrt{\frac{1}{n(n - 2)} \left(\sum_{i=1}^n \widehat{\varepsilon}_i^{\,2} \right) \frac{\sum_{i=1}^n x_i^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}.
The confidence intervals for \alpha and \beta give us the general idea where these regression coefficients are most likely to be.
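Before the worked Okun's law example below, the following is a minimal sketch (not from the original text) of how these intervals can be computed with the formulas above; the data values and the choice \gamma = 0.05 are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# illustrative data (not from the article)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.4, 4.2, 4.8, 5.3, 6.1, 6.7])
n = x.size

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - (alpha_hat + beta_hat * x)

Sxx = np.sum((x - x.mean()) ** 2)
s_beta = np.sqrt(np.sum(resid ** 2) / ((n - 2) * Sxx))
s_alpha = s_beta * np.sqrt(np.mean(x ** 2))

gamma = 0.05
t_star = stats.t.ppf(1 - gamma / 2, df=n - 2)   # (1 - gamma/2)-th quantile of t_{n-2}
print("beta :", beta_hat - s_beta * t_star, beta_hat + s_beta * t_star)
print("alpha:", alpha_hat - s_alpha * t_star, alpha_hat + s_alpha * t_star)
```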
For example, in the Okun's law regression shown here the point estimates are
\widehat{\alpha} = 0.859, \qquad \widehat{\beta} = -1.817.
The 95% confidence intervals for these estimates are
\alpha \in \left[\,0.76,\ 0.96\right], \qquad \beta \in \left[-2.06,\ -1.58\,\right].
In order to represent this information graphically, in the form of the confidence bands around the regression line, one has to proceed carefully and account for the joint distribution of the estimators. It can be shown that at confidence level (1 - \gamma) the confidence band has hyperbolic form given by the equation
(\alpha + \beta \xi) \in \left[\,\widehat{\alpha} + \widehat{\beta} \xi \pm t^*_{n - 2} \sqrt{\left(\frac{1}{n - 2} \sum\widehat{\varepsilon}_i^{\,2}\right) \cdot \left(\frac{1}{n} + \frac{(\xi - \bar{x})^2}{\sum(x_i - \bar{x})^2}\right)}\,\right].
When the model assumes that the intercept is fixed and equal to 0 (\alpha = 0), the standard error of the slope turns into
s_{\widehat\beta} = \sqrt{\frac{1}{n - 1} \frac{\sum_{i=1}^n \widehat{\varepsilon}_i^{\,2}}{\sum_{i=1}^n x_i^2}},
with \hat{\varepsilon}_i = y_i - \hat{y}_i.
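The hyperbolic band can be evaluated numerically over a grid of \xi values. The sketch below reuses the same illustrative data as above and a 95% level (\gamma = 0.05); it is a minimal implementation of the band formula, not code from the article.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.4, 4.2, 4.8, 5.3, 6.1, 6.7])
n = x.size

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - (alpha_hat + beta_hat * x)
s2 = np.sum(resid ** 2) / (n - 2)        # (1/(n-2)) * sum of squared residuals
Sxx = np.sum((x - x.mean()) ** 2)
t_star = stats.t.ppf(0.975, df=n - 2)    # 95% band (gamma = 0.05)

xi = np.linspace(x.min(), x.max(), 50)
half_width = t_star * np.sqrt(s2 * (1.0 / n + (xi - x.mean()) ** 2 / Sxx))
lower = alpha_hat + beta_hat * xi - half_width
upper = alpha_hat + beta_hat * xi + half_width
print(lower[:3], upper[:3])
```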
====Asymptotic assumption====
The alternative second assumption states that when the number of points in the dataset is "large enough", the law of large numbers and the central limit theorem become applicable, and then the distribution of the estimators is approximately normal. Under this assumption all formulas derived in the previous section remain valid, with the only exception that the quantile t^*_{n - 2} of Student's t distribution is replaced with the quantile q^* of the standard normal distribution. Occasionally the fraction \frac{1}{n - 2} is replaced with \frac{1}{n}. When n is large such a change does not alter the results appreciably.
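As a quick check of this approximation, the following sketch (not from the original text) compares the quantile t^*_{n - 2} with the standard normal quantile q^* for increasing n; the values of n and \gamma are chosen only for illustration.

```python
from scipy import stats

gamma = 0.05
q_star = stats.norm.ppf(1 - gamma / 2)          # standard normal quantile
for n in (10, 30, 100, 1000):
    t_star = stats.t.ppf(1 - gamma / 2, df=n - 2)
    print(n, round(t_star, 4), round(q_star, 4))  # t quantile approaches the normal one
```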
==Numerical example==