Endogeneity (econometrics)

In econometrics, endogeneity broadly refers to situations in which an explanatory variable is correlated with the error term.

Definition of exogeneity
In a stochastic model, several notions of exogeneity can be defined: the usual exogeneity, sequential exogeneity, and strong/strict exogeneity. Exogeneity is always stated relative to a parameter: a variable or set of variables is exogenous for parameter \alpha. A variable that is exogenous for parameter \alpha may still be endogenous for parameter \beta. When the explanatory variables are not stochastic, they are strongly exogenous for all the parameters.

If the independent variable is correlated with the error term in a regression model, then the ordinary least squares (OLS) estimate of the regression coefficient is biased; however, if the correlation is not contemporaneous, the coefficient estimate may still be consistent. There are many methods of correcting the bias, including instrumental variable regression and Heckman selection correction.
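As a rough illustration of instrumental variable regression, the sketch below simulates a regressor x that shares an unobserved shock with the error term, and compares the OLS slope with the simple one-instrument IV estimator Cov(w, y) / Cov(w, x). The instrument name w, the shock c, the sample size, and all numerical values are assumptions made for this example only; they do not come from the text.

```python
import random

random.seed(0)
n = 20000
beta = 2.0  # true slope (hypothetical value chosen for this illustration)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


w = [random.gauss(0, 1) for _ in range(n)]   # instrument: shifts x, unrelated to the error
c = [random.gauss(0, 1) for _ in range(n)]   # unobserved shock shared by x and the error
x = [wi + ci + random.gauss(0, 1) for wi, ci in zip(w, c)]
eps = [ci + random.gauss(0, 1) for ci in c]  # error term, correlated with x through c
y = [1.0 + beta * xi + ei for xi, ei in zip(x, eps)]

beta_ols = cov(x, y) / cov(x, x)  # inconsistent: absorbs Cov(x, eps) / Var(x)
beta_iv = cov(w, y) / cov(w, x)   # simple IV estimator with a single instrument

print(f"OLS slope: {beta_ols:.3f}")
print(f"IV slope:  {beta_iv:.3f}")
```

In this design Var(x) = 3 and Cov(x, eps) = 1, so the OLS slope should settle near beta + 1/3, while the IV slope should be close to beta.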
Static models
The following are some common sources of endogeneity.

Omitted variable
In this case, the endogeneity comes from an uncontrolled confounding variable: a variable that is correlated with both the independent variable in the model and the error term. (Equivalently, the omitted variable affects the independent variable and separately affects the dependent variable.) Assume that the "true" model to be estimated is
: y_i = \alpha + \beta x_i + \gamma z_i + u_i
but z_i is omitted from the regression model (perhaps because there is no way to measure it directly). Then the model that is actually estimated is
: y_i = \alpha + \beta x_i + \varepsilon_i
where \varepsilon_i = \gamma z_i + u_i (thus, the z_i term has been absorbed into the error term). If the correlation of x and z is not 0 and z separately affects y (meaning \gamma \neq 0), then x is correlated with the error term \varepsilon. Here, x is not exogenous for \alpha and \beta, since, given x, the distribution of y depends not only on \alpha and \beta, but also on z and \gamma.

Measurement error
Suppose that a perfect measure of an independent variable is impossible. That is, instead of observing x^{*}_{i}, what is actually observed is x_i = x^{*}_{i} + \nu_i, where \nu_i is the measurement error or "noise". In this case, a model given by
: y_i = \alpha + \beta x^{*}_i + \varepsilon_i
can be written in terms of observables and error terms as
: \begin{align}
y_i & = \alpha + \beta(x_i - \nu_i) + \varepsilon_i \\[3pt]
y_i & = \alpha + \beta x_i + (\varepsilon_i - \beta\nu_i) \\[3pt]
y_i & = \alpha + \beta x_i + u_i \quad (\text{where } u_i = \varepsilon_i - \beta\nu_i)
\end{align}
Since both x_i and u_i depend on \nu_i, they are correlated, so the OLS estimate of \beta is biased. When \nu_i is uncorrelated with x^{*}_i and \varepsilon_i (classical measurement error), the bias is toward zero (attenuation bias). Measurement error in the dependent variable, y_i, does not cause endogeneity, though it does increase the variance of the error term.
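Both sources of bias described above can be checked numerically. The following is a minimal simulation sketch, not a definitive treatment: the parameter values, sample size, and the way x is tied to z are assumptions chosen for illustration. It uses the standard large-sample results that the short-regression slope tends to \beta + \gamma Cov(x, z) / Var(x), and that under classical measurement error the slope tends to \beta Var(x^{*}) / (Var(x^{*}) + Var(\nu)).

```python
import random

random.seed(0)
n = 20000
alpha, beta, gamma = 0.0, 1.0, 2.0  # hypothetical "true" parameters


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


# Omitted variable: z drives both x and y but is left out of the regression.
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + random.gauss(0, 1) for zi in z]  # Cov(x, z) = 0.5, Var(x) = 1.25
y = [alpha + beta * xi + gamma * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]
b_short = ols_slope(x, y)                 # regression that omits z
b_short_theory = beta + gamma * 0.5 / 1.25

# Measurement error: only x_obs = x_star + nu is observed.
x_star = [random.gauss(0, 1) for _ in range(n)]
x_obs = [xs + random.gauss(0, 1) for xs in x_star]  # Var(nu) = 1
y2 = [alpha + beta * xs + random.gauss(0, 1) for xs in x_star]
b_noisy = ols_slope(x_obs, y2)
b_noisy_theory = beta * 1.0 / (1.0 + 1.0)  # attenuation factor of 1/2

print(f"omitted variable:  slope {b_short:.3f} (large-sample value {b_short_theory:.3f})")
print(f"measurement error: slope {b_noisy:.3f} (large-sample value {b_noisy_theory:.3f})")
```

With these assumed values the first slope overshoots the true \beta = 1, while the second is pulled toward zero.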
Simultaneity
Suppose that two variables are codetermined, with each affecting the other according to the following "structural" equations:
: y_i = \beta_1 x_i + \gamma_1 z_i + u_i
: z_i = \beta_2 x_i + \gamma_2 y_i + v_i
Estimating either equation by itself results in endogeneity. In the case of the first structural equation, \operatorname E(z_i u_i) \neq 0. To see this, substitute the first equation into the second and solve for z_i, assuming that 1 - \gamma_1 \gamma_2 \neq 0:
: z_i = \frac{\beta_2 + \gamma_2 \beta_1}{1 - \gamma_1 \gamma_2} x_i + \frac{1}{1 - \gamma_1 \gamma_2} v_i + \frac{\gamma_2}{1 - \gamma_1 \gamma_2} u_i.
Assuming that x_i and v_i are uncorrelated with u_i, and that \gamma_2 \neq 0,
: \operatorname E(z_i u_i) = \frac{\gamma_2}{1 - \gamma_1 \gamma_2} \operatorname E(u_i u_i) \neq 0.
Therefore, attempts at estimating either structural equation will be hampered by endogeneity.
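The expression for E(z_i u_i) can be checked with a small simulation. This is a sketch under assumed values: the coefficients, the unit-variance normal draws for x, u and v, and the sample size are all chosen for illustration. Each draw is generated from the solved equation for z_i and then verified against the second structural equation.

```python
import random

random.seed(1)
n = 50000
beta1, gamma1 = 1.0, 0.5  # hypothetical coefficients of the first equation
beta2, gamma2 = 1.0, 0.5  # hypothetical coefficients of the second equation
d = 1.0 - gamma1 * gamma2  # must be non-zero for the system to be solvable

zu_sum = 0.0
max_resid = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    u = random.gauss(0, 1)  # Var(u) = 1
    v = random.gauss(0, 1)
    # Solved form for z, then y from the first structural equation.
    z = ((beta2 + gamma2 * beta1) * x + v + gamma2 * u) / d
    y = beta1 * x + gamma1 * z + u
    # Check that the second structural equation also holds for this draw.
    max_resid = max(max_resid, abs(z - (beta2 * x + gamma2 * y + v)))
    zu_sum += z * u

sample_zu = zu_sum / n        # sample analogue of E(z u)
theory_zu = gamma2 / d * 1.0  # gamma2 / (1 - gamma1 * gamma2) * Var(u)

print(f"sample E(z u): {sample_zu:.3f}")
print(f"theory E(z u): {theory_zu:.3f}")
```

A non-zero sample moment here is what makes z_i an endogenous regressor in the first equation.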
Dynamic models
The endogeneity problem is particularly relevant in the context of time series analysis of causal processes. It is common for some factors within a causal system to depend, for their value in period t, on the values of other factors in the causal system in period t − 1. Suppose that the level of pest infestation is independent of all other factors within a given period, but is influenced by the level of rainfall and fertilizer in the preceding period. In this instance it would be correct to say that infestation is exogenous within the period, but endogenous over time.

Strict exogeneity
Let the model be y = f(x, z) + u. If the variable x is sequentially exogenous for parameter \alpha, and y does not cause x in the Granger sense, then the variable x is strongly/strictly exogenous for the parameter \alpha.

Weak exogeneity
Weak exogeneity is an identifying assumption which requires that the structural error term has a zero conditional expectation given the present and past values of the regressors. It is used to determine whether statistical inference about parameters of interest can be validly drawn from a conditional probability model alone, without needing to analyze the marginal distribution of the explanatory variables. While strict exogeneity is often implausible in macroeconomic and financial data due to feedback effects, weak exogeneity is the standard identifying assumption employed in these fields. The concept was formalized by Jean-François Richard (1980) and further analyzed by Robert F. Engle, David F. Hendry, and Richard (1983) in Econometrica.
The variable z_t is weakly exogenous for a set of parameters of interest, denoted \psi, if the marginal density of z_t contains no useful information for estimating \psi. Writing \phi_1 for the parameters of the conditional model and \phi_2 for the parameters of the marginal model, this requires that:
• the parameters of interest \psi depend only on the parameters of the conditional model (\phi_1) and not on the parameters of the marginal model (\phi_2);
• the parameters \phi_1 and \phi_2 are variation-free, meaning that the permissible range of values for \phi_1 does not depend on the values taken by \phi_2.

In a linear regression framework defined by
: z_t = x_t^\top \beta + \epsilon_t
where z_t is the outcome variable, x_t are the regressors (potentially containing past values of z_t), and \epsilon_t is the structural error term, this implies that the errors are orthogonal to current and past regressors. This can be expressed by the following moment condition:
: \mathbb{E}[\epsilon_t \mid x_t, x_{t-1}, \dots] = 0
This condition allows the errors to be correlated with future realizations of the regressors, accommodating feedback mechanisms where an outcome variable in one period influences regressor values in future periods. This is in contrast to strict exogeneity, a more restrictive assumption which requires that the error term has a zero conditional expectation given the complete set of regressors, including past, present, and future values, that is
: \mathbb{E}[\epsilon_t \mid x_0, x_1, \dots, x_T] = 0
where T is the size of the sample. Equivalently, weak exogeneity requires regressors and lagged response variables to be predetermined, that is, determined prior to the current period.

A common example of a weakly exogenous variable is consumption in models with credit constraints and rational expectations. Here, consumption is predetermined but not strictly exogenous. An unpredictable negative income shock will be uncorrelated with past (and potentially current) consumption, but will surely be correlated with future consumption: the individual will be forced to adjust their future consumption to accommodate their poorer state, inducing correlation. If the shock affects current consumption, predeterminedness (defined now as lags only) provides potential instruments, namely lagged values of the variable. The presence of predetermined variables is a motivating factor in the Arellano–Bond estimator.
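The difference between predetermined and strictly exogenous regressors can be illustrated with the simplest case in which x_t contains a past value of the outcome. The sketch below assumes a first-order autoregression, z_t = \beta x_t + \epsilon_t with x_t = z_{t-1}; the coefficient, the unit-variance normal errors, and the series length are assumptions for this example. It estimates the covariance of \epsilon_t with the current regressor x_t and with the next-period regressor x_{t+1}.

```python
import random

random.seed(2)
T = 50000
beta = 0.5  # hypothetical autoregressive coefficient, |beta| < 1


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


eps, x_now, x_next = [], [], []
z_prev = 0.0
for _ in range(T):
    e = random.gauss(0, 1)     # structural error eps_t, Var = 1
    z_cur = beta * z_prev + e  # z_t = beta * x_t + eps_t, with x_t = z_{t-1}
    eps.append(e)
    x_now.append(z_prev)       # x_t: determined before eps_t is drawn
    x_next.append(z_cur)       # x_{t+1} = z_t: already contains eps_t
    z_prev = z_cur

cov_now = cov(eps, x_now)    # close to 0: the weak exogeneity condition holds
cov_next = cov(eps, x_next)  # close to Var(eps) = 1: strict exogeneity fails

print(f"Cov(eps_t, x_t):     {cov_now:.3f}")
print(f"Cov(eps_t, x_t+1):   {cov_next:.3f}")
```

The error is orthogonal to the current regressor but not to the future one, which is the feedback pattern that weak exogeneity permits and strict exogeneity rules out.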