A generic non-linear measurement error model takes the form
: \begin{cases} y_t = g(x^*_t) + \varepsilon_t, \\ x_t = x^*_t + \eta_t. \end{cases}
Here the function ''g'' can be either parametric or non-parametric. When it is parametric, it will be written as ''g''(''x*'', ''β''). For a general vector-valued regressor ''x*'' the conditions for model identifiability are not known. However, in the case of a scalar ''x*'' the model is identified unless the function ''g'' is of the "log-exponential" form
: g(x^*) = a + b \ln\big(e^{cx^*} + d\big)
and the latent regressor ''x*'' has density
: f_{x^*}(x) = \begin{cases} A e^{-Be^{Cx}+CDx}(e^{Cx}+E)^{-F}, & \text{if}\ d>0, \\ A e^{-Bx^2 + Cx}, & \text{if}\ d=0, \end{cases}
where the constants ''A'', ''B'', ''C'', ''D'', ''E'', ''F'' may depend on ''a'', ''b'', ''c'', ''d''. Despite this optimistic result, as of now no methods exist for estimating non-linear errors-in-variables models without any extraneous information. However, there are several techniques which make use of additional data: either instrumental variables or repeated observations.
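A minimal simulation sketch of the model above may help fix ideas. It assumes a scalar regressor, a hypothetical parametric choice ''g''(''x'', ''β'') = exp(''βx''), and normal errors; all values are illustrative:
<syntaxhighlight lang="python">
import numpy as np

# Simulate y_t = g(x*_t) + eps_t,  x_t = x*_t + eta_t
# with the hypothetical parametric choice g(x, beta) = exp(beta * x).
rng = np.random.default_rng(0)
T, beta = 10_000, 0.8

x_star = rng.normal(0.0, 1.0, size=T)       # latent regressor x*
x = x_star + rng.normal(0.0, 0.5, size=T)   # observed, error-ridden regressor
y = np.exp(beta * x_star) + rng.normal(0.0, 0.1, size=T)

# Naive non-linear least squares that plugs x in for x* (grid search keeps
# the sketch dependency-free); this is generally inconsistent for beta.
grid = np.linspace(0.1, 1.5, 281)
sse = [np.sum((y - np.exp(b * x)) ** 2) for b in grid]
print("naive beta estimate:", grid[int(np.argmin(sse))])
</syntaxhighlight>
Because ''g'' is non-linear, the naive fit does not converge to the true ''β'' even as the sample grows, which is what motivates the methods below.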
'''Instrumental variables methods'''
* '''Newey's simulated moments method''' for parametric models requires that there is an additional set of observed predictor variables ''z''<sub>''t''</sub>, such that the true regressor can be expressed as
: x^*_t = \pi_0'z_t + \sigma_0 \zeta_t,
where ''π''<sub>0</sub> and ''σ''<sub>0</sub> are (unknown) constant matrices, and ''ζ''<sub>''t''</sub> ⊥ ''z''<sub>''t''</sub>. The coefficient ''π''<sub>0</sub> can be estimated using standard least squares regression of ''x'' on ''z''. The distribution of ''ζ''<sub>''t''</sub> is unknown; however, we can model it as belonging to a flexible parametric family – the Edgeworth series:
: f_\zeta(v;\,\gamma) = \phi(v)\,\textstyle\sum_{j=1}^J \!\gamma_j v^j,
where ''ϕ'' is the density of the standard normal distribution. Simulated moments can be computed using the importance sampling algorithm: first we generate several random variables {''v''<sub>''ts''</sub> ~ ''ϕ'', ''s'' = 1,…,''S'', ''t'' = 1,…,''T''} from the standard normal distribution, then we compute the moments at the ''t''-th observation as
: m_t(\theta) = A(z_t) \frac{1}{S}\sum_{s=1}^S H(x_t,y_t,z_t,v_{ts};\theta) \sum_{j=1}^J\!\gamma_j v_{ts}^j,
where ''θ'' = (''β'', ''σ'', ''γ''), ''A'' is just some function of the instrumental variables ''z'', and ''H'' is a two-component vector of moments:
: \begin{align} & H_1(x_t,y_t,z_t,v_{ts};\theta) = y_t - g(\hat\pi'z_t + \sigma v_{ts}, \beta), \\ & H_2(x_t,y_t,z_t,v_{ts};\theta) = z_t y_t - (\hat\pi'z_t + \sigma v_{ts}) g(\hat\pi'z_t + \sigma v_{ts}, \beta). \end{align}
With moment functions ''m''<sub>''t''</sub> one can apply the standard GMM technique to estimate the unknown parameter ''θ'' (see the sketch after this list).
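A minimal numerical sketch of this simulated-moments construction, assuming a scalar regressor and instrument, a hypothetical ''g''(''x'', ''β'') = exp(''βx''), ''J'' = 2 Edgeworth terms, ''A''(''z'') equal to the identity, and an identity GMM weighting matrix; all names and parameter values are illustrative:
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def g(x, beta):
    # Hypothetical parametric regression function g(x*, beta).
    return np.exp(beta * x)

def simulated_moments(theta, x, y, z, v):
    # theta = (beta, sigma, gamma_1, gamma_2); v holds the draws v_ts, shape (T, S).
    beta, sigma, g1, g2 = theta
    pi_hat = np.sum(z * x) / np.sum(z * z)    # first-stage LS of x on z
    xs = pi_hat * z[:, None] + sigma * v      # simulated draws of x*_t
    wts = g1 * v + g2 * v**2                  # Edgeworth weights sum_j gamma_j v^j
    H1 = y[:, None] - g(xs, beta)
    H2 = z[:, None] * y[:, None] - xs * g(xs, beta)
    return np.column_stack([np.mean(H1 * wts, axis=1),   # m_t(theta), averaged
                            np.mean(H2 * wts, axis=1)])  # over the S draws

def gmm_objective(theta, x, y, z, v):
    mbar = simulated_moments(theta, x, y, z, v).mean(axis=0)
    return mbar @ mbar                        # identity weighting matrix

# Synthetic data satisfying x*_t = pi0 * z_t + sigma0 * zeta_t.
T, S = 500, 50
z = rng.normal(size=T)
x_star = 1.0 * z + 0.5 * rng.normal(size=T)
x = x_star + 0.3 * rng.normal(size=T)         # mismeasured regressor
y = g(x_star, 0.8) + 0.1 * rng.normal(size=T)
v = rng.normal(size=(T, S))                   # importance-sampling draws v_ts ~ phi

res = minimize(gmm_objective, x0=[0.5, 0.4, 1.0, 0.0],
               args=(x, y, z, v), method="Nelder-Mead")
print("theta_hat =", res.x)
</syntaxhighlight>
Holding the draws ''v''<sub>''ts''</sub> fixed across evaluations keeps the simulated objective a smooth function of ''θ'', so standard optimizers can be applied.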
'''Repeated observations'''
In this approach two (or maybe more) repeated observations of the regressor ''x*'' are available. Both observations contain their own measurement errors; however, those errors are required to be independent:
: \begin{cases} x_{1t} = x^*_t + \eta_{1t}, \\ x_{2t} = x^*_t + \eta_{2t}, \end{cases}
where ''x*'' ⊥ ''η''<sub>1</sub> ⊥ ''η''<sub>2</sub>. The variables ''η''<sub>1</sub>, ''η''<sub>2</sub> need not be identically distributed (although if they are, the efficiency of the estimator can be slightly improved). With only these two observations it is possible to consistently estimate the density function of ''x*'' using Kotlarski's deconvolution technique, sketched below.
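A brief numerical sketch of this deconvolution step, assuming a scalar latent regressor with a hypothetical gamma density and normal measurement errors; the sample size, integration grid, and trimming constant ''C'' are illustrative:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
T = 20_000

# Two independently contaminated measurements of the same latent x*.
x_star = rng.gamma(2.0, 1.0, size=T)           # hypothetical latent density
x1 = x_star + rng.normal(0.0, 0.4, size=T)     # x_1t = x*_t + eta_1t
x2 = x_star + rng.normal(0.0, 0.4, size=T)     # x_2t = x*_t + eta_2t

def phi_xstar(v_grid):
    # Kotlarski's identity: phi_x*(v) = exp( int_0^v E[i x1 e^{isx2}] / E[e^{isx2}] ds ),
    # evaluated by cumulative trapezoidal integration along v_grid (starting at 0).
    integrand = np.array([np.mean(1j * x1 * np.exp(1j * s * x2)) /
                          np.mean(np.exp(1j * s * x2)) for s in v_grid])
    log_phi = np.concatenate(([0.0],
        np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(v_grid))))
    return np.exp(log_phi)

C, n = 3.0, 301                                # trimming parameter C
v = np.linspace(0.0, C, n)
phi_pos = phi_xstar(v)

def f_xstar(x):
    # f(x) = (1/2pi) int_{-C}^{C} e^{-iux} phi(u) du; since phi(-u) = conj(phi(u)),
    # the u < 0 half of the integral is the conjugate of the u > 0 half.
    vals = np.exp(-1j * v * x) * phi_pos
    half = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(v))
    return (half + np.conj(half)).real / (2 * np.pi)

print("estimated density of x* at 2.0:", f_xstar(2.0))
</syntaxhighlight>
A larger ''C'' reduces the truncation bias of the inversion but amplifies the noise in the tails of the empirical characteristic function, which is why the trimming is needed for numerical stability.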
* '''Conditional density method''' for parametric models. The regression equation can be written in terms of the observable variables as
: \operatorname{E}[\,y_t|x_t\,] = \int g(x^*_t,\beta) f_{x^*|x}(x^*_t|x_t)\,dx^*_t,
where it would be possible to compute the integral if we knew the conditional density function ''ƒ''<sub>''x*''|''x''</sub>. If this function were known or could be estimated, the problem would turn into standard non-linear regression, which can be estimated for example using the NLLS method. Assuming for simplicity that ''η''<sub>1</sub>, ''η''<sub>2</sub> are identically distributed, this conditional density can be computed as
: \hat f_{x^*|x}(x^*|x) = \frac{\hat f_{x^*}(x^*)}{\hat f_{x}(x)} \prod_{j=1}^k \hat f_{\eta_{j}}\big( x_{j} - x^*_{j} \big),
where with slight abuse of notation ''x''<sub>''j''</sub> denotes the ''j''-th component of a vector. All densities in this formula can be estimated using inversion of the empirical characteristic functions, as in the sketch above. In particular,
: \begin{align} & \hat \varphi_{\eta_j}(v) = \frac{\hat\varphi_{x_j}(v,0)}{\hat\varphi_{x^*_j}(v)}, \quad \text{where } \hat\varphi_{x_j}(v_1,v_2) = \frac{1}{T}\sum_{t=1}^T e^{iv_1x_{1tj}+iv_2x_{2tj}}, \\ & \hat\varphi_{x^*_j}(v) = \exp \int_0^v \frac{\partial\hat\varphi_{x_j}(0,v_2)/\partial v_1}{\hat\varphi_{x_j}(0,v_2)}dv_2, \\ & \hat \varphi_x(u) = \frac{1}{2T}\sum_{t=1}^T \Big( e^{iu'x_{1t}} + e^{iu'x_{2t}} \Big), \quad \hat \varphi_{x^*}(u) = \frac{\hat\varphi_x(u)}{\prod_{j=1}^k \hat\varphi_{\eta_j}(u_j)}. \end{align}
To invert these characteristic functions one has to apply the inverse Fourier transform, with a trimming parameter ''C'' needed to ensure numerical stability. For example:
: \hat f_x(x) = \frac{1}{(2\pi)^k} \int_{-C}^{C}\cdots\int_{-C}^C e^{-iu'x}\, \hat\varphi_x(u)\, du.
* '''Estimator for a linear-in-parameters, nonlinear-in-variables model''' of the form
: \begin{cases} y_t = \textstyle \sum_{j=1}^k \beta_j g_j(x^*_t) + \sum_{j=1}^\ell \beta_{k+j}w_{jt} + \varepsilon_t, \\ x_{1t} = x^*_t + \eta_{1t}, \\ x_{2t} = x^*_t + \eta_{2t}, \end{cases}
where
''w''<sub>''t''</sub> represents variables measured without errors. The regressor ''x*'' here is scalar (the method can be extended to the case of vector ''x*'' as well). If not for the measurement errors, this would have been a standard linear model with the estimator
: \hat{\beta} = \big(\hat{\operatorname{E}}[\,\xi_t\xi_t'\,]\big)^{-1} \hat{\operatorname{E}}[\,\xi_t y_t\,],
where
: \xi_t' = (g_1(x^*_t), \cdots, g_k(x^*_t), w_{1t}, \cdots, w_{\ell t}).
It turns out that all the expected values in this formula are estimable using the same deconvolution trick. In particular, for a generic observable ''w''<sub>''t''</sub> (which could be 1, ''w''<sub>1''t''</sub>, …, ''w''<sub>''ℓt''</sub>, or ''y''<sub>''t''</sub>) and some function ''h'' (which could represent any ''g''<sub>''j''</sub> or ''g''<sub>''i''</sub>''g''<sub>''j''</sub>) we have
: \operatorname{E}[\,w_th(x^*_t)\,] = \frac{1}{2\pi} \int_{-\infty}^\infty \varphi_h(-u)\psi_w(u)\,du,
where ''φ''<sub>''h''</sub> is the Fourier transform of ''h''(''x*''), but using the same convention as for the characteristic functions,
: \varphi_h(u)=\int e^{iux}h(x)\,dx,
and
: \psi_w(u) = \operatorname{E}[\,w_te^{iux^*}\,] = \frac{\operatorname{E}[w_te^{iux_{1t}}]}{\operatorname{E}[e^{iux_{1t}}]} \exp \int_0^u i\frac{\operatorname{E}[x_{2t}e^{ivx_{1t}}]}{\operatorname{E}[e^{ivx_{1t}}]}dv.
The resulting estimator \scriptstyle\hat\beta is consistent and asymptotically normal; a numerical sketch follows the list.
* '''Nonparametric estimator'''. The standard Nadaraya–Watson estimator for a nonparametric model takes the form
: \hat{g}(x) = \frac{\hat{\operatorname{E}}[\,y_tK_h(x^*_t - x)\,]}{\hat{\operatorname{E}}[\,K_h(x^*_t - x)\,]},
for a suitable choice of the kernel ''K'' and the bandwidth ''h''. Both expectations here can be estimated using the same technique as in the previous method.
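A numerical sketch of the moment-estimation step above, assuming ''k'' = 1, no error-free regressors, and a single hypothetical function ''g''<sub>1</sub>(''x'') = exp(−''x''²/2), chosen so that the Fourier transforms of ''g''<sub>1</sub> and ''g''<sub>1</sub>² are available in closed form; the grids and trimming constant are illustrative:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
T = 10_000

# Data from y_t = beta * g1(x*_t) + eps_t with g1(x) = exp(-x^2/2),
# plus two independently contaminated measurements of the scalar x*.
beta_true = 2.0
x_star = rng.normal(0.0, 0.5, size=T)
x1 = x_star + rng.normal(0.0, 0.3, size=T)
x2 = x_star + rng.normal(0.0, 0.3, size=T)
y = beta_true * np.exp(-x_star**2 / 2) + rng.normal(0.0, 0.1, size=T)

def psi_w(w, u):
    # psi_w(u) = E[w e^{iux*}], built from observables as in the formula above.
    # u must be a symmetric grid of odd length, so that u[len(u)//2] == 0 and
    # the cumulative integral can be anchored at zero (int_0^u).
    ratio = np.array([1j * np.mean(x2 * np.exp(1j * s * x1)) /
                      np.mean(np.exp(1j * s * x1)) for s in u])
    cum = np.concatenate(([0.0],
        np.cumsum(0.5 * (ratio[1:] + ratio[:-1]) * np.diff(u))))
    cum -= cum[len(u) // 2]
    lead = np.array([np.mean(w * np.exp(1j * s * x1)) /
                     np.mean(np.exp(1j * s * x1)) for s in u])
    return lead * np.exp(cum)

def E_w_h(w, phi_h, C=4.0, n=401):
    # E[w h(x*)] = (1/2pi) int phi_h(-u) psi_w(u) du, trimmed to [-C, C].
    u = np.linspace(-C, C, n)
    vals = phi_h(-u) * psi_w(w, u)
    return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(u)).real / (2 * np.pi)

# Fourier transforms, in the e^{+iux} convention, of h = g1 and h = g1^2.
phi_g1   = lambda u: np.sqrt(2 * np.pi) * np.exp(-u**2 / 2)   # FT of e^{-x^2/2}
phi_g1sq = lambda u: np.sqrt(np.pi) * np.exp(-u**2 / 4)       # FT of e^{-x^2}

# beta_hat = E[g1(x*) y] / E[g1(x*)^2], each estimated by deconvolution.
beta_hat = E_w_h(y, phi_g1) / E_w_h(np.ones(T), phi_g1sq)
print("beta_hat =", beta_hat)   # should be near beta_true = 2.0
</syntaxhighlight>
In the general case the same routine estimates every entry of \scriptstyle\hat{\operatorname{E}}[\,\xi_t\xi_t'\,] and \scriptstyle\hat{\operatorname{E}}[\,\xi_t y_t\,], with ''h'' ranging over the ''g''<sub>''j''</sub> and their pairwise products; the same expectations also feed the Nadaraya–Watson estimator of the nonparametric item.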