The general regression model with n observations and k explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is y = X \beta + e, where y is an n × 1 vector of dependent variable observations, each column of the n × k matrix X is a vector of observations on one of the k explanators, \beta is a k × 1 vector of true coefficients, and e is an n × 1 vector of the true underlying errors.
The ordinary least squares estimator for \beta minimizes the sum of squared residuals and therefore satisfies the normal equations:

\begin{align}
& X^\operatorname{T} X \hat \beta = X^\operatorname{T} y \\[1ex]
\iff {} & \hat \beta = \left(X^\operatorname{T} X\right)^{-1} X^\operatorname{T} y.
\end{align}

The residual vector is \hat e = y - X \hat \beta = y - X \left(X^\operatorname{T} X\right)^{-1} X^\operatorname{T} y, so the residual sum of squares is

\operatorname{RSS} = \hat e^\operatorname{T} \hat e = \left\| \hat e \right\|^2

(equivalent to the square of the Euclidean norm of the residuals).
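As a quick numerical illustration, the estimator and the residual sum of squares can be computed directly. The following is a minimal NumPy sketch; the simulated data, seed, and dimensions are illustrative assumptions, not part of the article.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative data: n observations, k explanators (first column is the constant).
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, 1.5, -0.5])                    # "true" coefficients (assumed)
y = X @ beta + rng.normal(scale=0.3, size=n)         # dependent variable with error e

# beta_hat solves the normal equations X'X beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e_hat = y - X @ beta_hat                             # residual vector
rss = e_hat @ e_hat                                  # RSS = squared norm of residuals
print(beta_hat, rss)
</syntaxhighlight>

Solving the normal equations with numpy.linalg.solve avoids forming the explicit inverse of X^\operatorname{T} X, which is numerically preferable to applying the closed-form inverse directly.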
In full:

\begin{align}
\operatorname{RSS} &= y^\operatorname{T} y - y^\operatorname{T} X \left(X^\operatorname{T} X\right)^{-1} X^\operatorname{T} y \\[1ex]
&= y^\operatorname{T} \left[I - X \left(X^\operatorname{T} X\right)^{-1} X^\operatorname{T}\right] y \\[1ex]
&= y^\operatorname{T} \left[I - H\right] y,
\end{align}

where H = X \left(X^\operatorname{T} X\right)^{-1} X^\operatorname{T} is the hat matrix, or the projection matrix in linear regression.
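The equivalence of this quadratic form with the direct definition of RSS, and the projection properties of H, can be checked numerically. Again a minimal, self-contained NumPy sketch with illustrative simulated data:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative check that y'(I - H)y equals e_hat' e_hat,
# where H = X (X'X)^{-1} X' is the hat (projection) matrix.
rng = np.random.default_rng(1)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)                # hat matrix
e_hat = y - H @ y                                    # residuals: (I - H) y
rss_direct = e_hat @ e_hat
rss_quadratic_form = y @ (np.eye(n) - H) @ y

print(np.allclose(rss_direct, rss_quadratic_form))   # True: the two forms agree
print(np.allclose(H, H @ H), np.allclose(H, H.T))    # H is idempotent and symmetric
</syntaxhighlight>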
== Relation with Pearson's product-moment correlation ==