Additive models are a class of non-parametric regression models of the form:

: Y_i = \alpha + \sum_{j=1}^p f_j(X_{ij}) + \epsilon_i

where each X_1, X_2, \ldots, X_p is a variable in our p-dimensional predictor X, and Y is our outcome variable. \epsilon represents our inherent error, which is assumed to have mean zero. The f_j represent unspecified smooth functions of a single X_j. Given the flexibility in the f_j, we typically do not have a unique solution: \alpha is left unidentifiable, since one can add any constant to one of the f_j and subtract the same constant from \alpha. It is common to rectify this by constraining

: \sum_{i = 1}^N f_j(X_{ij}) = 0 for all j,

leaving

: \alpha = 1/N \sum_{i = 1}^N y_i

necessarily.
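To see the indeterminacy concretely: for any constant c, shifting c between \alpha and a single component function leaves every fitted value unchanged,

: \alpha + \sum_{j=1}^p f_j(X_{ij}) = (\alpha - c) + \big(f_1(X_{i1}) + c\big) + \sum_{j=2}^p f_j(X_{ij}),

which is why the centering constraint above is imposed.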
The backfitting algorithm is then:

Initialize \hat{\alpha} = 1/N \sum_{i = 1}^N y_i, \hat{f_j} \equiv 0, \forall j

Do until the \hat{f_j} converge:
 For each predictor j:
  (a) \hat{f_j} \leftarrow \text{Smooth}[\lbrace y_i - \hat{\alpha} - \sum_{k \neq j} \hat{f_k}(x_{ik}) \rbrace_1^N ] (backfitting step)
  (b) \hat{f_j} \leftarrow \hat{f_j} - 1/N \sum_{i=1}^N \hat{f_j}(x_{ij}) (mean centering of estimated function)

where \text{Smooth} is our smoothing operator. This is typically chosen to be a cubic spline smoother, but can be any other appropriate fitting operation, such as:
• local polynomial regression
• kernel smoothing methods
• more complex operators, such as surface smoothers for second- and higher-order interactions

In theory, step (b) in the algorithm is not needed, as the function estimates are constrained to sum to zero. In practice, however, numerical error can cause the estimates to drift from this constraint, so the explicit centering step is retained.
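As a concrete illustration, the following is a minimal sketch of the iteration in Python. It uses a simple Nadaraya–Watson (Gaussian-kernel) smoother as the Smooth operator purely for self-containedness; the smoother, the bandwidth, and all function names here are illustrative choices, not part of any particular library, and a cubic spline smoother would be the more typical choice.

<syntaxhighlight lang="python">
import numpy as np

def kernel_smooth(x, r, bandwidth=0.5):
    """Nadaraya-Watson smoother: fit r against x with a Gaussian kernel.
    Returns fitted values at the observed x. Illustrative stand-in for Smooth."""
    # w[i, k] = K((x_i - x_k) / h); fitted value is a weighted average of r
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, n_iter=50, tol=1e-6):
    """Backfitting for y_i = alpha + sum_j f_j(X_ij) + eps_i.
    Returns alpha-hat and the fitted values f_j(x_ij) as an (N, p) array."""
    N, p = X.shape
    alpha = y.mean()            # alpha-hat = 1/N * sum(y_i), as in the initialization
    f = np.zeros((N, p))        # each f_j-hat starts at 0, evaluated at the data
    for _ in range(n_iter):
        f_old = f.copy()
        for j in range(p):
            # (a) smooth the partial residuals against the j-th predictor
            resid = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = kernel_smooth(X[:, j], resid)
            # (b) mean-center the estimate so it sums to zero over the data
            f[:, j] -= f[:, j].mean()
        if np.max(np.abs(f - f_old)) < tol:   # stop once the f_j-hat have converged
            break
    return alpha, f

# Toy usage: two additive effects plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = 1.0 + np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 200)
alpha_hat, f_hat = backfit(X, y)
</syntaxhighlight>

In this sketch \hat{\alpha} stays fixed at the mean of y, matching the initialization above, and each pass updates the \hat{f_j} in turn from the partial residuals.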
==Motivation==