==Local linear regression==
In the two previous sections we assumed that the underlying function Y(X) is locally constant, and therefore we were able to use a weighted average for the estimation. The idea of local linear regression is to fit locally a straight line (or a hyperplane for higher dimensions), rather than a constant (horizontal line). After fitting the line, the estimate \hat{Y}(X_{0}) is given by the value of this line at the point X_0. By repeating this procedure for each X_0, one obtains the estimation function \hat{Y}(X). As in the previous section, the window width is constant: h_\lambda (X_0)=\lambda = \text{constant}. Formally, local linear regression is computed by solving a weighted least squares problem. For one dimension (
p = 1):

: \min_{\alpha (X_0),\beta (X_0)} \sum\limits_{i=1}^N K_{h_{\lambda }}(X_0,X_i)\left( Y(X_i)-\alpha (X_0)-\beta (X_{0})X_i \right)^2

and the estimate is the fitted line evaluated at X_0:

: \hat{Y}(X_{0})=\alpha (X_{0})+\beta (X_{0})X_{0}

The closed-form solution is given by:

: \hat{Y}(X_0)=\left( 1,X_0 \right)\left( B^{T}W(X_0)B \right)^{-1}B^{T}W(X_0)y

where:
• y=\left( Y(X_1),\dots,Y(X_N) \right)^T
• W(X_0)= \operatorname{diag} \left( K_{h_{\lambda }}(X_0,X_i) \right)_{N\times N}
• B^{T}=\left( \begin{matrix} 1 & 1 & \dots & 1 \\ X_{1} & X_{2} & \dots & X_{N} \\ \end{matrix} \right)

The resulting function is smooth, and the bias at the boundary points is reduced. Local linear regression can be applied to spaces of any dimension, though the question of what constitutes a local neighborhood then becomes more complicated. It is common to fit the local linear regression to the k nearest training points of a test point. This can lead to high variance in the fitted function. To bound the variance, the set of training points should contain the test point in their convex hull (see the Gupta et al. reference).
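The closed-form solution above can be sketched in NumPy. This is a minimal illustration, assuming a Gaussian kernel K_{h_\lambda}(X_0, X_i) = \exp(-(X_0-X_i)^2 / 2\lambda^2) with a fixed bandwidth \lambda; the function name and interface are illustrative, not from the text.

```python
import numpy as np

def local_linear_regression(x0, X, Y, lam=1.0):
    """Estimate Y(x0) by fitting a weighted straight line around x0.

    Implements the closed-form weighted least squares solution
    (1, x0) (B^T W B)^{-1} B^T W y with a Gaussian kernel (assumed here).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.exp(-(x0 - X) ** 2 / (2.0 * lam ** 2))     # kernel weights K(x0, xi)
    B = np.column_stack([np.ones_like(X), X])         # design matrix B
    W = np.diag(w)                                    # W(x0)
    # Solve (B^T W B) beta = B^T W y for beta = (alpha, beta)
    beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ Y)
    return beta[0] + beta[1] * x0                     # line evaluated at x0
```

A useful sanity check of the formula: when the data lie exactly on a line, local linear regression reproduces that line for any positive weights, so there is no boundary bias on linear trends.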
==Local polynomial regression==
Instead of fitting locally linear functions, one can fit polynomial functions. For p=1, one should minimize:

: \min_{\alpha (X_{0}),\beta _{j}(X_{0}),\,j=1,\dots,d} \sum\limits_{i=1}^{N} K_{h_{\lambda }}(X_{0},X_{i})\left( Y(X_{i})-\alpha (X_{0})-\sum\limits_{j=1}^{d} \beta _{j}(X_{0})X_{i}^{j} \right)^{2}

with

: \hat{Y}(X_{0})=\alpha (X_{0})+\sum\limits_{j=1}^{d} \beta _{j}(X_{0})X_{0}^{j}

In the general case (p>1), one should minimize:

: \begin{align} & \hat{\beta }(X_{0})=\underset{\beta (X_{0})}{\arg \min} \sum\limits_{i=1}^{N} K_{h_{\lambda }}(X_{0},X_{i})\left( Y(X_{i})-b(X_{i})^{T}\beta (X_{0}) \right)^{2} \\ & b(X)=\left( 1,\ X_{1},\ X_{2},\dots,\ X_{1}^{2},\ X_{2}^{2},\dots,\ X_{1}X_{2},\ \dots \right) \\ & \hat{Y}(X_{0})=b(X_{0})^{T}\hat{\beta }(X_{0}) \end{align}

==See also==