In regression analysis, studentized residuals are a type of standardized residual that is particularly useful for identifying outliers and influential observations. In a typical linear regression model, the raw residuals (the differences between the observed values and the values predicted by the model) do not all have the same
variance, even if the underlying errors have equal variance. This occurs because the variance of each residual depends on the "leverage" of its corresponding data point: if the errors have common variance σ², the i-th residual has variance σ²(1 − h_ii), where h_ii is the leverage of the i-th observation. Points further from the mean of the independent variables have higher leverage and therefore smaller residual variance. To make residuals comparable and easier to interpret, statisticians use studentization, which divides each raw residual by an estimate of its standard deviation.

There are two main types of studentized residuals:

• Internally studentized residuals: These use a variance estimate based on the entire dataset, including the observation being tested. While useful, a major drawback is that an extreme outlier can "pull" the fitted model toward itself and inflate the global variance estimate. This is a form of "masking," where the outlier's own influence makes it appear less extreme than it actually is.
• Externally studentized residuals (also known as studentized deleted residuals): To overcome the masking effect, the variance for the i-th residual is estimated by fitting the model to the dataset excluding the i-th observation. This ensures that a single anomalous data point does not contaminate its own error estimate, making this method more sensitive for outlier detection.

The use of studentized residuals is a standard part of
regression diagnostics. By plotting these residuals against predicted values, researchers can check whether the assumptions of the linear model (such as homoscedasticity) hold or whether specific data points are distorting the results of the entire analysis.

== Studentized range ==
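Before turning to the studentized range, the two kinds of studentized residual described above can be illustrated numerically. The following is a minimal NumPy sketch, not a reference implementation. The function name studentized_residuals and the toy data are illustrative assumptions. The design matrix is assumed to include an intercept column and to have full column rank. The leave-one-out variance is obtained from the standard identity (n - p - 1) s_(i)^2 = (n - p) s^2 - e_i^2 / (1 - h_ii) rather than by refitting the model n times.

```python
import numpy as np


def studentized_residuals(X, y):
    """Return (internal, external) studentized residuals for OLS of y on X.

    Illustrative helper (hypothetical name). X is assumed to already
    contain an intercept column and to have full column rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Ordinary least-squares fit and raw residuals e_i = y_i - yhat_i.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta

    # Leverages h_ii: diagonal of the hat matrix H = X (X'X)^{-1} X'.
    # With a reduced QR factorization X = QR, H = Q Q'.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Internal: variance estimated from all n observations.
    s2 = e @ e / (n - p)
    t_int = e / np.sqrt(s2 * (1.0 - h))

    # External: variance estimated with observation i left out, using
    # (n - p - 1) * s_(i)^2 = (n - p) * s^2 - e_i^2 / (1 - h_ii).
    s2_loo = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
    t_ext = e / np.sqrt(s2_loo * (1.0 - h))

    return t_int, t_ext


# Toy data (made up for illustration): a straight line with small
# perturbations and one observation shifted upward to act as an outlier.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.0, 0.2, -0.1, 0.4, -0.3, 0.1])
y = 1.0 + 2.0 * x + noise
y[4] += 5.0

t_int, t_ext = studentized_residuals(X, y)
for i, (a, b) in enumerate(zip(t_int, t_ext)):
    print(f"obs {i}: internal = {a: .3f}, external = {b: .3f}")
```

Whenever an internally studentized residual exceeds 1 in absolute value, the corresponding externally studentized residual is larger still, which is why the external version separates the shifted observation more clearly from the rest.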