The DID method can be implemented according to the table below, where the lower right cell is the DID estimator. Running a regression analysis gives the same result. Consider the OLS model

y ~=~ \beta_0 + \beta_1 T + \beta_2 S + \beta_3 (T \cdot S) + \varepsilon

where T is a dummy variable for the period, equal to 1 when t=2, and S is a dummy variable for group membership, equal to 1 when s=2. The composite variable (T \cdot S) is a dummy variable indicating when S=T=1. Although it is not shown rigorously here, this is a proper parametrization of the model given in the
formal definition. Furthermore, the group and period averages in that section relate to the model parameter estimates as follows:

\begin{align}
\hat{\beta}_0 & = \widehat{E}(y \mid T=0,~ S=0) \\[8pt]
\hat{\beta}_1 & = \widehat{E}(y \mid T=1,~ S=0) - \widehat{E}(y \mid T=0,~ S=0) \\[8pt]
\hat{\beta}_2 & = \widehat{E}(y \mid T=0,~ S=1) - \widehat{E}(y \mid T=0,~ S=0) \\[8pt]
\hat{\beta}_3 & = \big[\widehat{E}(y \mid T=1,~ S=1) - \widehat{E}(y \mid T=0,~ S=1)\big] \\
& \qquad {} - \big[\widehat{E}(y \mid T=1,~ S=0) - \widehat{E}(y \mid T=0,~ S=0)\big],
\end{align}

where \widehat{E}(\dots \mid \dots ) stands for conditional averages computed on the sample; for example, T=1 indicates the after period and S=0 indicates the control group. Note that \hat{\beta}_1 is an estimate of the counterfactual rather than the impact of the control group. The control group is often used as a proxy for the
counterfactual (see Synthetic control method for a deeper understanding of this point). Thereby, \hat{\beta}_1 can be interpreted as the change over time in the control group, which also serves as the counterfactual change for the treatment group had the intervention (treatment) not occurred. Similarly, under the parallel trend assumption, \hat{\beta}_2 is the differential between the treatment and control groups in T=0 and also the differential that would have held in T=1 absent the treatment. The above descriptions should not be construed to imply the (average) effect of only the control group, for \hat{\beta}_1, or only the difference of the treatment and control groups in the pre-period, for \hat{\beta}_2.

As in Card and Krueger, below, a first (time) difference of the outcome variable (\Delta Y_i = Y_{i,1} - Y_{i,0}) eliminates the need for the time trend (i.e., \hat{\beta}_1) to form an unbiased estimate of \hat{\beta}_3, implying that \hat{\beta}_1 is not actually conditional on the treatment or control group. Likewise, a difference between the treatment and control groups would eliminate the need for treatment differentials (i.e., \hat{\beta}_2) to form an unbiased estimate of \hat{\beta}_3. This nuance is important to understand when the user believes (weak) violations of parallel pre-trends exist, or when the assumptions behind the counterfactual approximation are violated because of non-common shocks or
confounding events.

To see the relation between this notation and the previous section, consider as above only one observation per time period for each group; then

\begin{align}
\widehat{E}(y \mid T=1,~ S=0) & = \widehat{E}(y \mid \text{after period, control}) \\[3pt]
& = \frac{ \widehat{E}(y \ I(\text{after period, control}) )}{ \widehat{P}(\text{after period, control})} \\[3pt]
& = \frac{ \sum_{i=1}^n y_{i,\text{after}} I(i \text{ in control}) } { n_{\text{control}} } = \overline{y}_{\text{control, after}} \\[3pt]
& = \overline{y}_{12}
\end{align}

and so on for other values of T and S, which is equivalent to

\hat{\beta}_3 ~=~ (y_{11} - y_{21}) - (y_{12} - y_{22}).

But this is the expression for the treatment effect that was given in the formal definition and in the above table.

Variants of difference-in-difference frameworks include ones for staggered implementation of treatment as well as an estimator introduced for multiple time periods and other variations by Brantly Callaway and Pedro H.C. Sant'Anna.

==Example==
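As a small numerical illustration of the regression formulation described in the preceding section, the following Python sketch (assuming NumPy is available) checks that the OLS coefficient on the interaction term T \cdot S coincides with the difference of differences of the four group-by-period sample means. The data values and the helper names `did_from_means` and `did_from_ols` are hypothetical, chosen only for illustration; they do not come from any study.

```python
import numpy as np


def did_from_means(y, T, S):
    """DID estimate computed directly from the four group-by-period sample means."""
    def cell_mean(t, s):
        return y[(T == t) & (S == s)].mean()

    treated_change = cell_mean(1, 1) - cell_mean(0, 1)
    control_change = cell_mean(1, 0) - cell_mean(0, 0)
    return treated_change - control_change


def did_from_ols(y, T, S):
    """Least-squares coefficients (b0, b1, b2, b3) of y ~ 1 + T + S + T*S."""
    X = np.column_stack([np.ones_like(y), T, S, T * S])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical data: two observations in each (period, group) cell.
T = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)  # 1 = after period
S = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # 1 = treatment group
y = np.array([10.0, 12.0, 13.0, 15.0, 14.0, 16.0, 21.0, 23.0])

beta = did_from_ols(y, T, S)
did = did_from_means(y, T, S)

print("OLS coefficients (b0, b1, b2, b3):", beta)
print("DID from cell means:", did)
```

Because the model with an intercept, both dummies, and their interaction is saturated, the fitted values reproduce the four cell means exactly, so the interaction coefficient and the means-based estimate agree up to floating-point error (both equal 4 for these made-up numbers).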