The goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable (or vector of independent variables) x. In simple linear regression, the model

: y = \beta_0 + \beta_1 x + \varepsilon

is used, where ε is an unobserved random error with mean zero conditioned on a scalar variable x. In this model, for each unit increase in the value of x, the conditional expectation of y increases by β1 units.
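Explicitly, since the error term has conditional mean zero, the model implies

: \operatorname{E}(y \mid x) = \beta_0 + \beta_1 x,

so increasing x by one unit changes this conditional expectation by exactly β1.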
In many settings, such a linear relationship may not hold. For example, if we are modeling the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, we may find that the yield improves by a different amount for each unit increase in temperature. Or we may find that the yield decreases with increasing temperature over one range of temperatures and increases with increasing temperature over another. In such cases, we might propose a quadratic model of the form

: y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon.

In this model, when the temperature is increased from x to x + 1 units, the expected yield changes by β1 + β2(2x + 1). (This can be seen by replacing x with x + 1 in the model and subtracting the original expression.) For infinitesimal changes in x, the effect on y is given by the total derivative with respect to x: β1 + 2β2x. The fact that the change in yield depends on x is what makes the relationship between x and y nonlinear even though the model is linear in the parameters to be estimated.
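For example, evaluating the quadratic regression function at x + 1 and at x and subtracting confirms the two expressions above:

: \operatorname{E}(y \mid x+1) - \operatorname{E}(y \mid x) = \left[\beta_0 + \beta_1 (x+1) + \beta_2 (x+1)^2\right] - \left[\beta_0 + \beta_1 x + \beta_2 x^2\right] = \beta_1 + \beta_2 (2x + 1),

while differentiating β0 + β1x + β2x² with respect to x gives the infinitesimal rate of change β1 + 2β2x.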
In general, we can model the expected value of y as an nth degree polynomial, yielding the general polynomial regression model

: y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_n x^n + \varepsilon.

Conveniently, these models are all linear from the point of view of estimation, since the regression function is linear in terms of the unknown parameters β0, β1, .... Therefore, for least squares analysis, the computational and inferential problems of polynomial regression can be completely addressed using the techniques of multiple regression. This is done by treating x, x², ... as being distinct independent variables in a multiple regression model.
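To illustrate this reduction to multiple regression, the sketch below (using NumPy, with purely synthetic data and illustrative coefficient values and variable names) builds the design matrix whose columns are 1, x, x², ..., xⁿ and computes ordinary least squares estimates of β0, ..., βn.

<syntaxhighlight lang="python">
import numpy as np

# Synthetic data for illustration only: a noisy quadratic trend in x.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x - 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)

degree = 2  # the n in the general polynomial regression model

# Treat x, x^2, ..., x^n as distinct regressors: build the design matrix
# with columns [1, x, x^2, ..., x^n].
X = np.vander(x, N=degree + 1, increasing=True)

# Ordinary least squares estimates of beta_0, beta_1, ..., beta_n.
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # coefficients ordered as beta_0, beta_1, ..., beta_n
</syntaxhighlight>

Because the columns 1, x, x², ... can be nearly collinear over a narrow range of x, centering x or using an orthogonal polynomial basis is often preferred in practice for numerical stability.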
==Matrix form and calculation of estimates==