Difference in Differences II

ECON526

Paul Schrimpf

University of British Columbia

Difference in Differences

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Setup

  • Many periods, binary treatment in some periods
  • Potential outcomes \(\{y_{it}(0),y_{it}(1)\}_{t=1}^T\) for \(i=1,...,N\)
  • Treatment \(D_{it} \in \{0,1\}\),
    • \(D_{i0} = 0\) \(\forall i\)
    • \(D_{i1} = 1\) for some, \(0\) for others
  • Observe \(y_{it} = y_{it}(0)(1-D_{it}) + D_{it} y_{it}(1)\)

Identification

  • Same logic as before, \[ \begin{align*} ATT_{t,t-s} & = \Er[y_{it}(1) - \color{red}{y_{it}(0)} | D_{it} = 1, D_{it-s}=0] \\ & = \Er[y_{it}(1) - y_{it-s}(0) | D_{it} = 1, D_{it-s}=0] \\ & \;\; - \Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] \end{align*} \]

    • assume \(\Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] = \Er[y_{it}(0) - y_{it-s}(0) | D_{it}=0, D_{it-s}=0]\)

\[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \]

  • Similarly, can identify various other interpretable average treatment effects conditional on being treated at some times and not others
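As a quick numerical check of this identification argument, here is a minimal simulation (with made-up unit effects, a common time trend, and a constant treatment effect, so parallel trends holds by construction) comparing the naive post-period comparison to the difference-in-differences contrast:

import numpy as np

rng = np.random.default_rng(0)
n, att = 100_000, 2.0
alpha = rng.normal(size=n)                    # unit effects
d = alpha + rng.normal(size=n) > 0            # selection into treatment depends on alpha
y0_pre = alpha + rng.normal(size=n)           # y_{i,t-s}(0)
y0_post = alpha + 1.0 + rng.normal(size=n)    # y_{it}(0), common time trend of +1
y_pre = y0_pre                                # no one treated at t-s
y_post = np.where(d, y0_post + att, y0_post)  # treated receive the effect at t

naive = y_post[d].mean() - y_post[~d].mean()                           # biased by selection on alpha
did = (y_post[d] - y_pre[d]).mean() - (y_post[~d] - y_pre[~d]).mean()  # ≈ att
print(naive, did)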

Estimation

  • Plug-in: replace the conditional expectations in the identification result with sample averages

  • Fixed effects? \[ y_{it} = \beta D_{it} + \alpha_i + \delta_t + \epsilon_{it} \] When will \(\hat{\beta}^{FE}\) consistently estimate some interpretable conditional average of treatment effects?

Fixed Effects

  • As with matching, \[ \begin{align*} \hat{\beta} = & \sum_{i=1,t=1}^{n,T} y_{it} \overbrace{\frac{\tilde{D}_{it}}{ \sum_{i,t} \tilde{D}_{it}^2 }}^{\hat{\omega}_{it}} = \sum_{i=1,t=1}^{n,T} y_{it}(0) \hat{\omega}_{it} + \sum_{i=1,t=1}^{n,T} D_{it} (y_{it}(1) - y_{it}(0)) \hat{\omega}_{it} \end{align*} \] where \[ \begin{align*} \tilde{D}_{it} & = D_{it} - \frac{1}{n} \sum_{j=1}^n (D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{js}) - \frac{1}{T} \sum_{s=1}^T D_{is} \\ & = D_{it} - \frac{1}{n} \sum_{j=1}^n D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{is} + \frac{1}{nT} \sum_{j,s} D_{js} \end{align*} \]
imports
import pandas as pd
import numpy as np
from matplotlib import style
from matplotlib import pyplot as plt
style.use("fivethirtyeight")

Weights

def assigntreat(n, T, portiontreated):
    # treatment is an absorbing state: once treated, a unit stays treated;
    # in period t, untreated units become treated with probability portiontreated[t]
    treated = np.zeros((n, T), dtype=bool)
    for t in range(1, T):
        treated[:, t] = treated[:, t - 1]
        if portiontreated[t] > 0:
            treated[:, t] = np.logical_or(treated[:, t-1], np.random.rand(n) < portiontreated[t])
    return treated

def weights(D):
    # two-way (unit and time) demeaned treatment, normalized so that
    # sum(y * weights(D)) equals the two-way fixed effects estimate of β
    D̈ = D - np.mean(D, axis=0) - np.mean(D, axis=1)[:, np.newaxis] + np.mean(D)
    ω = D̈ / np.sum(D̈**2)
    return ω

n = 100
T = 9
pt = np.zeros(T)
pt[T//2 + 1] = 0.5
D = assigntreat(n, T,pt)
y = np.random.randn(n, T)
weighted_sum = np.sum(y * weights(D))
print(weighted_sum)
0.08147854863434313

Using a package

# check that it matches fixed effect estimate from a package
import pyfixest as pf

df = pd.DataFrame({
    'id': np.repeat(np.arange(1, n + 1), T),
    't': np.tile(np.arange(1, T + 1), n),
    'y': y.flatten(),
    'D': D.flatten()
})
result=pf.feols('y ~ D | id + t', df, vcov={"CRV1": "id"})
result.summary()
###

Estimation:  OLS
Dep. var.: y, Fixed effects: id+t
Inference:  CRV1
Observations:  900

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| D             |      0.081 |        0.133 |     0.610 |      0.543 | -0.183 |   0.346 |
---
RMSE: 0.953 R2: 0.109 R2 Within: 0.0 

Weights with Single Treatment Time

Code
def plotD(D,ax):
    n, T = D.shape
    ax.set(xlabel='time',ylabel='portiontreated')
    ax.plot(range(1,T+1),D.mean(axis=0))
    return ax

def plotweights(D, ax):
    n, T = D.shape
    ω = weights(D)
    groups = np.unique(D, axis=0)
    ax.set(xlabel='time', ylabel='weight')

    for g in groups:
        i = np.where(np.all(D == g, axis=1))[0][0]
        wt = ω[i, :]
        ax.plot(range(1, T+1), wt, marker='o', label=f'Treated {np.sum(g)} times')

    ax.legend()
    return ax

def plotwd(D):
    fig, ax = plt.subplots(2,1)
    ax[0]=plotD(D,ax[0])
    ax[1]=plotweights(D,ax[1])
    plt.show()

plotwd(D)

Weights with Early and Late Treated

Code
pt = np.zeros(T)
pt[1] = 0.3
pt[T-2] = 0.6
D = assigntreat(n,T,pt)
plotwd(D)

Sign Reversal

Code
dvals = np.unique(D,axis=0)
dvals.sort()
ATT = np.ones(T)
ATT[0] = 0.0
ATT[T-2:T] = 6.0
np.random.seed(6798)

def simulate(n,T,pt,ATT,sigma=1.0):
    D = assigntreat(n,T,pt)
    y = np.random.randn(n,T)*sigma + ATT[np.cumsum(D, axis=1)]
    df = pd.DataFrame({
        'id': np.repeat(np.arange(1, n + 1), T),
        't': np.tile(np.arange(1, T + 1), n),
        'y': y.flatten(),
        'D': D.flatten()
    })
    return(df)

df = simulate(n,T,pt,ATT)
result = pf.feols('y ~ D | id + t',  df, vcov={"CRV1": "id"})
result.summary()
###

Estimation:  OLS
Dep. var.: y, Fixed effects: id+t
Inference:  CRV1
Observations:  900

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| D             |     -0.526 |        0.249 |    -2.107 |      0.038 | -1.020 |  -0.031 |
---
RMSE: 1.287 R2: 0.51 R2 Within: 0.009 

When to worry

  • If multiple treatment times and treatment heterogeneity
  • Even if weights do not have wrong sign, the fixed effects estimate is hard to interpret
  • Same logic applies more generally – not just to time
    • E.g. if there are group effects, some treated units are in multiple groups, and \(E[y(1) - y(0) | group]\) varies

What to Do?

  • Follow identification \[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \] and estimate \[ \begin{align*} \widehat{ATT}_{t,t-s} = & \frac{\sum_i (y_{it} - y_{it-s}) D_{it}(1-D_{it-s})}{\sum_i D_{it}(1-D_{it-s})} \\ & - \frac{\sum_i (y_{it} - y_{it-s}) (1-D_{it})(1-D_{it-s})}{\sum_i (1-D_{it})(1-D_{it-s})} \end{align*} \] and perhaps report some average, e.g. (there are other reasonable weighted averages) \[ \sum_{t=1}^T \frac{\sum_i D_{it}}{\sum_{i,s} D_{is}} \frac{1}{t-1} \sum_{s=1}^{t-1} \widehat{ATT}_{t,t-s} \]
    • Code? Inference? Optimal? (could create it, but there’s an easier way)
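A rough sketch of the plug-in estimator above (not a production implementation; it takes n-by-T arrays like the simulated y and D, and ignores inference):

def att_plugin(y, D, t, s):
    # ATT_{t,t-s}: compare outcome changes of units treated at t but not at t-s
    # to units untreated at both dates (D is assumed boolean, periods 0-indexed)
    treat = D[:, t] & ~D[:, t - s]
    control = ~D[:, t] & ~D[:, t - s]
    dy = y[:, t] - y[:, t - s]
    return dy[treat].mean() - dy[control].mean()

Averaging these estimates over \(t\) and \(s\), and getting valid standard errors, is what the packages on the next slide automate.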

What to Do?

  • Option 1: Use an appropriate package
    • differences
    • pyfixest (Gardner’s 2-stage estimator or Dube et al. (2023) local projections; see the sketch after this list)
    • doubleml (see chapter 16 of Chernozhukov et al. (2024))
    • see https://asjadnaqvi.github.io/DiD/ for more options (but none are python)
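For example, a hedged sketch of Gardner’s two-stage estimator via pyfixest’s did2s on the simulated panel df (argument names follow the pyfixest documentation for recent versions and may need adjusting for the installed version):

df2 = df.assign(D=df['D'].astype(int))  # 0/1 treatment indicator, as a precaution
fit2s = pf.did2s(
    df2,
    yname='y',
    first_stage='~ 0 | id + t',   # unit and time fixed effects, no covariates
    second_stage='~ i(D)',        # treatment indicator in the second stage
    treatment='D',
    cluster='id',
)
fit2s.summary()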

What to Do?

  • Option 2: estimate a correctly specified fixed effects regression (my preferred approach)

  • Problem is possible correlation of \((y_{it}(1) - y_{it}(0))D_{it}\) with \(\tilde{D}_{it}\)

  • \(\tilde{D}_{it}\) is function of \(t\) and \((D_{i1}, ..., D_{iT})\)

  • Estimating a separate coefficient for each combination of \(t\) and \((D_{i1}, ..., D_{iT})\) eliminates this correlation by flexibly modeling treatment effect heterogeneity

What to Do?

  • Cohorts = unique sequences of \((D_{i1}, ..., D_{iT})\)
    • In current simulated example, three cohorts
      1. \((0, 0, 0, 0, 0, 0, 0, 0, 0)\)
      2. \((0, 0, 0, 0, 0, 0, 0, 1, 1)\)
      3. \((0, 1, 1, 1, 1, 1, 1, 1, 1)\)
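These cohorts can be read off the unique treatment sequences in the simulated panel; a quick check using a pandas pivot:

# reshape the simulated panel back to an n-by-T 0/1 treatment matrix and
# list its unique rows, i.e. the cohorts
dmat = df.pivot(index='id', columns='t', values='D').astype(int).to_numpy()
print(np.unique(dmat, axis=0))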

Regression with Cohort-time Interactions

  • Estimate: \[ y_{it} = \sum_{c=1}^C D_{it} 1\{C_i=c\} \beta_{ct} + \alpha_i + \delta_t + \epsilon_{it} \]

  • \(\hat{\beta}_{ct}\) consistently estimates \(\Er[y_{it}(1) - y_{it}(0) | C_{i}=c, D_{it}=1]\) assuming parallel trends holds for all periods \[ \Er[y_{it}(0) - y_{it-s}(0) | C_i=c] = \Er[y_{it}(0) - y_{it-s}(0) | C_i=c'] \] for all \(t, s, c, c'\)

Regression with Cohort Interactions

def definecohort(df):
    # convert dummies into categorical
    n = len(df.id.unique())
    T = len(df.t.unique())
    df = df.set_index(['id','t'])
    dmat = np.array(df.sort_index().D).reshape(n, T)
    # encode each unit's treatment sequence as an integer (binary digits)
    cohort = dmat.dot(1 << np.arange(dmat.shape[-1] - 1, -1, -1))
    cdf = pd.DataFrame({"id":np.array(df.index.levels[0]), "cohort":pd.Categorical(cohort)})
    cdf =cdf.set_index('id')
    df = df.reset_index().set_index('id')
    df=pd.merge(df, cdf, left_index=True, right_index=True)
    df=df.reset_index()
    return(df)

dfc = definecohort(df)

def defineinteractions(df):
    df['dct'] = df.apply(lambda x: f"t{x['t']},c{x['cohort']}" if x['D'] else "untreated", axis=1)
    return(df)

dfc = defineinteractions(dfc)

modc = pf.feols("y ~  C(dct, Treatment('untreated')) | id + t", dfc, vcov={"CRV1": "id"})
pf.etable([modc], type="md")

Regression with Cohort Interactions

index                                           est1
-----------------------------------------  ---------
depvar                                             y
----------------------------------------------------
C(dct, Treatment('untreated'))[T.t2,c255]     0.524
                                             (0.277)
C(dct, Treatment('untreated'))[T.t3,c255]    0.761*
                                             (0.320)
C(dct, Treatment('untreated'))[T.t4,c255]   0.820**
                                             (0.304)
C(dct, Treatment('untreated'))[T.t5,c255]   0.880**
                                             (0.329)
C(dct, Treatment('untreated'))[T.t6,c255]     0.458
                                             (0.285)
C(dct, Treatment('untreated'))[T.t7,c255]    0.624*
                                             (0.307)
C(dct, Treatment('untreated'))[T.t8,c255]  5.242***
                                             (0.408)
C(dct, Treatment('untreated'))[T.t8,c3]    1.121***
                                             (0.311)
C(dct, Treatment('untreated'))[T.t9,c255]  5.693***
                                             (0.354)
C(dct, Treatment('untreated'))[T.t9,c3]    1.007***
                                             (0.286)
----------------------------------------------------
t                                                  x
id                                                 x
----------------------------------------------------
Observations                                     900
S.E. type                                     by: id
R2                                             0.724
----------------------------------------------------

Regression with Cohort Interactions

import re
def plotcohortatt(modc):
    coef = modc.coef()
    ci = modc.confint()
    tcregex = re.compile(r".+t(\d+),c(\d+)]")
    catt = pd.DataFrame(index=coef.index, columns=['t','c','att','yerr'])
    for i in range(len(coef)):
        m = tcregex.match(coef.index[i])
        t,c = m.groups()
        t=int(t)
        c=int(c)
        catt.loc[coef.index[i]] = [t,c,coef.iloc[i],np.abs(ci.iloc[i][0]-coef.iloc[i])]

    catt.sort_values(['c','t'],inplace=True)
    fig, ax = plt.subplots()
    ax.set(xlabel='time', ylabel='ATT | cohort')
    for c, g in catt.groupby('c'):
        ax.errorbar(g['t'], g['att'], yerr=g['yerr'], fmt='o', label=f'cohort {c}')

    ax.legend()
    return(fig)

fig=plotcohortatt(modc)

Regression with Cohort Interactions

Regression with Cohort-Time Interactions

  • If we only want to assume parallel trends between treated and never-treated cohorts, i.e. \[ \Er[y_{it}(0) - y_{it-s}(0) | C_i=c] = \Er[y_{it}(0) - y_{it-s}(0) | C_i=c'] \] when \(c\) is treated at \(t\), untreated at \(t-s\), and \(c'\) is never treated

  • Estimate \[ y_{it} = \sum_{c=1}^C 1\{C_i=c\} \delta_{c,t} + \alpha_i + \epsilon_{it} \]

Regression with Cohort-Time Interactions

\[ y_{it} = \sum_{c=1}^C 1\{C_i=c\} \delta_{c,t} + \alpha_i + \epsilon_{it} \]

  • \(\hat{\delta}_{c,t} + \frac{\sum \alpha_i 1\{C_i=c\}}{\sum 1\{C_i = c\}}\) consistently estimates \(\Er[y_{it} | C_{i} = c]\)
  • \(\hat{\delta}_{c,t} -\hat{\delta}_{c,t-s}\) consistently estimates \(\Er[y_{it} - y_{i,t-s}| C_{i} = c]\)
  • If \(c\) is treated at \(t\) but not at \(t-s\), \(c'\) is not treated at either date, and parallel trends holds, \[ \hat{\delta}_{c,t} - \hat{\delta}_{c,t-s} - (\hat{\delta}_{c',t} -\hat{\delta}_{c',t-s}) \inprob \Er[y_{it}(1)-y_{it}(0)| C_i =c] \] (see the sketch after the regression output below)

Regression with Cohort-Time Interactions

modct = pf.feols("y ~ C(cohort)*C(t) | id + t",dfc, vcov={"CRV1":"id"})
pf.etable(modct, type='md')
index                            est1
--------------------------  ---------
depvar                              y
-------------------------------------
C(cohort)[T.3]:C(t)[T.2]       0.288
                              (0.355)
C(cohort)[T.255]:C(t)[T.2]    0.696*
                              (0.338)
C(cohort)[T.3]:C(t)[T.3]       0.248
                              (0.404)
C(cohort)[T.255]:C(t)[T.3]    0.909*
                              (0.404)
C(cohort)[T.3]:C(t)[T.4]       0.386
                              (0.343)
C(cohort)[T.255]:C(t)[T.4]   1.051**
                              (0.350)
C(cohort)[T.3]:C(t)[T.5]      -0.119
                              (0.375)
C(cohort)[T.255]:C(t)[T.5]    0.809*
                              (0.405)
C(cohort)[T.3]:C(t)[T.6]       0.480
                              (0.291)
C(cohort)[T.255]:C(t)[T.6]    0.745*
                              (0.338)
C(cohort)[T.3]:C(t)[T.7]       0.309
                              (0.360)
C(cohort)[T.255]:C(t)[T.7]    0.808*
                              (0.382)
C(cohort)[T.3]:C(t)[T.8]     1.349**
                              (0.398)
C(cohort)[T.255]:C(t)[T.8]  5.378***
                              (0.438)
C(cohort)[T.3]:C(t)[T.9]     1.235**
                              (0.383)
C(cohort)[T.255]:C(t)[T.9]  5.829***
                              (0.383)
-------------------------------------
t                                   x
id                                  x
-------------------------------------
Observations                      900
S.E. type                      by: id
R2                              0.725
-------------------------------------
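Following the differencing logic above, cohort-specific ATT estimates can be recovered from these interaction coefficients. A rough sketch (delta_hat is a hypothetical helper; the coefficient names match the pyfixest output above, and the never-treated cohort and period 1 are the omitted reference levels):

coefs = modct.coef()  # pandas Series indexed by coefficient name

def delta_hat(c, t):
    # interaction coefficient for cohort c in period t; omitted reference
    # levels (the never-treated cohort and t = 1) contribute zero
    return coefs.get(f"C(cohort)[T.{c}]:C(t)[T.{t}]", 0.0)

# the late cohort (coded 3) is first treated in period 8, so period 7 is its
# last untreated period; the early cohort (255) is untreated only in period 1
att_late_t8 = delta_hat(3, 8) - delta_hat(3, 7)
att_early_t9 = delta_hat(255, 9) - delta_hat(255, 1)
print(att_late_t8, att_early_t9)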

Covariates

Doubly Robust Difference in Differences

  • Including covariates linearly could lead to the same problems as with matching

  • Doubly robust estimator of Sant’Anna and Zhao (2020); a minimal sketch follows below

    • doubleml package implements it
    • see example notebook from chapter 16 of Chernozhukov et al. (2024)
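To fix ideas, here is a minimal hand-rolled sketch of the Sant’Anna and Zhao (2020) doubly robust ATT estimator for two periods with a never-treated comparison group. The covariates and data below are hypothetical, and in practice the packages above are the better choice:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def dr_did_panel(dy, d, X):
    # dy: post-minus-pre outcome change; d: 0/1 treatment; X: covariates
    ps = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]       # propensity score
    mu0 = LinearRegression().fit(X[d == 0], dy[d == 0]).predict(X)   # E[dy | X, untreated]
    w_treat = d / d.mean()
    w_control = ps * (1 - d) / (1 - ps)
    w_control = w_control / w_control.mean()
    return np.mean((w_treat - w_control) * (dy - mu0))

# hypothetical data where selection into treatment depends on X
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
d = (rng.random(2000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
dy = X @ np.array([0.5, -0.2]) + 1.0 * d + rng.normal(size=2000)  # true ATT = 1
print(dr_did_panel(dy, d, X))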

Sources and Further Reading

  • Facure (2022, chap. 1)
  • Huntington-Klein (2021, chap. 16)
  • Book: C. de Chaisemartin and D’Haultfœuille (2023)
  • Recent reviews: Roth et al. (2023), Clément de Chaisemartin and D’Haultfœuille (2022), Arkhangelsky and Imbens (2023)
  • Early work pointing to problems with fixed effects:
    • Laporte and Windmeijer (2005), Wooldridge (2005)
  • Explosion of papers written just before 2020, published just after:
    • Borusyak and Jaravel (2018)
    • Clément de Chaisemartin and D’Haultfœuille (2020)
    • Callaway and Sant’Anna (2021)
    • Goodman-Bacon (2021)
    • Sun and Abraham (2021)

References

Arkhangelsky, Dmitry, and Guido Imbens. 2023. “Causal Models for Longitudinal and Panel Data: A Survey.”
Borusyak, Kirill, and Xavier Jaravel. 2018. “Revisiting Event Study Designs.” https://scholar.harvard.edu/files/borusyak/files/borusyak_jaravel_event_studies.pdf.
Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230. https://doi.org/10.1016/j.jeconom.2020.12.001.
Chaisemartin, C de, and X D’Haultfœuille. 2023. Credible Answers to Hard Questions: Differences-in-Differences for Natural Experiments. https://dx.doi.org/10.2139/ssrn.4487202.
Chaisemartin, Clément de, and Xavier D’Haultfœuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96. https://doi.org/10.1257/aer.20181169.
———. 2022. “Two-way fixed effects and differences-in-differences with heterogeneous treatment effects: a survey.” The Econometrics Journal 26 (3): C1–30. https://doi.org/10.1093/ectj/utac017.
Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.
Facure, Matheus. 2022. Causal Inference for the Brave and True. https://matheusfacure.github.io/python-causality-handbook/landing-page.html.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77. https://doi.org/10.1016/j.jeconom.2021.03.014.
Huntington-Klein, Nick. 2021. The Effect: An Introduction to Research Design and Causality. CRC Press. https://theeffectbook.net/.
Laporte, Audrey, and Frank Windmeijer. 2005. “Estimation of Panel Data Models with Binary Indicators When Treatment Effects Are Not Constant over Time.” Economics Letters 88 (3): 389–96. https://doi.org/10.1016/j.econlet.2005.04.002.
Rambachan, Ashesh, and Jonathan Roth. 2023. “A More Credible Approach to Parallel Trends.” The Review of Economic Studies 90 (5): 2555–91. https://doi.org/10.1093/restud/rdad018.
Roth, Jonathan. 2022. “Pretest with Caution: Event-Study Estimates After Testing for Parallel Trends.” American Economic Review: Insights 4 (3): 305–22. https://doi.org/10.1257/aeri.20210236.
Roth, Jonathan, Pedro H. C. Sant’Anna, Alyssa Bilinski, and John Poe. 2023. “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics 235 (2): 2218–44. https://doi.org/10.1016/j.jeconom.2023.03.008.
Sant’Anna, Pedro H. C., and Jun Zhao. 2020. “Doubly Robust Difference-in-Differences Estimators.” Journal of Econometrics 219 (1): 101–22. https://doi.org/10.1016/j.jeconom.2020.06.003.
Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99. https://doi.org/10.1016/j.jeconom.2020.09.006.
Wooldridge, Jeffrey M. 2005. “Fixed-Effects and Related Estimators for Correlated Random-Coefficient and Treatment-Effect Panel Data Models.” The Review of Economics and Statistics 87 (2): 385–90. https://doi.org/10.1162/0034653053970320.