Difference in Differences

Paul Schrimpf

2024-11-06

Difference in Differences

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Setup

  • Two periods, binary treatment in second period
  • Potential outcomes \(\{y_{it}(0),y_{it}(1)\}_{t=0}^1\) for \(i=1,...,N\)
  • Treatment \(D_{it} \in \{0,1\}\),
    • \(D_{i0} = 0\) \(\forall i\)
    • \(D_{i1} = 1\) for some, \(0\) for others
  • Observe \(y_{it} = y_{it}(0)(1-D_{it}) + D_{it} y_{it}(1)\)

Identification

  • Average treatment effect on the treated: \[ \begin{align*} ATT & = \Er[y_{i1}(1) - \color{red}{y_{i1}(0)} | D_{i1} = 1] \\ & = \Er[y_{i1}(1) - y_{i0}(0) | D_{i1} = 1] - \Er[\color{red}{y_{i1}(0)} - y_{i0}(0) | D_{i1}=1] \\ & \text{ assume } \Er[\color{red}{y_{i1}(0)} - y_{i0}(0) | D_{i1}=1] = \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0] \\ & = \Er[y_{i1}(1) - y_{i0}(0) | D_{i1} = 1] - \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0] \\ & = \Er[y_{i1} - y_{i0} | D_{i1}=1, D_{i0}=0] - \Er[y_{i1} - y_{i0} | D_{i1}=0, D_{i0}=0] \end{align*} \]

Important Assumptions

  • No anticipation: \(D_{i1}=1\) does not affect \(y_{i0}\)
    • built into the potential outcomes notation we used; relaxing it would mean letting potential outcomes depend on the whole sequence of \(D\): \(y_{it}(D_{i0},D_{i1})\)
  • Parallel trends: \(\Er[\color{red}{y_{i1}(0)} - y_{i0}(0) |D_{i1}=1,D_{i0}=0] = \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0, D_{i0}=0]\)
    • not invariant to transformations of \(y\) (e.g. parallel trends in levels does not imply parallel trends in logs)
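
A small numeric illustration of the lack of invariance, as a sketch only. The group means below are made up for the example, and outcomes are treated as constant within each group so that the mean of a transformed outcome equals the transformed mean.

```julia
# Hypothetical untreated potential outcomes y_t(0) for t = 0, 1 (made-up numbers)
control = (y0 = 1.0, y1 = 2.0)   # never-treated group
treated = (y0 = 2.0, y1 = 3.0)   # treated group, had it not been treated

# Change in the untreated outcome after applying the transformation f
trend(g, f) = f(g.y1) - f(g.y0)

# Difference in trends between the two groups
levels_gap = trend(treated, identity) - trend(control, identity)  # zero: parallel in levels
logs_gap   = trend(treated, log) - trend(control, log)            # nonzero: not parallel in logs
```

Here both groups grow by one unit in levels, but the control group doubles while the treated group grows by half, so the log trends differ.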

Estimation

  • Plugin: \[ \widehat{ATT} = \frac{ \sum_{i=1}^n (y_{i1} - y_{i0})D_{i1}(1-D_{i0})}{\sum_{i=1}^n D_{i1}(1-D_{i0})} - \frac{ \sum_{i=1}^n (y_{i1} - y_{i0})(1-D_{i1})(1-D_{i0})}{\sum_{i=1}^n (1-D_{i1})(1-D_{i0})} \]

  • Regression: \[ y_{it} = \delta_t + \alpha 1\{D_{i1}=1\} + \beta D_{it} + \epsilon_{it} \] then \(\hat{\beta} = \widehat{ATT}\)

  • Fixed effects: \[ y_{it} = \delta_t + \alpha_i + \beta D_{it} + u_{it} \] then \(\hat{\beta} = \widehat{ATT}\)
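
The equality between the plug-in estimator and the regression coefficient can be checked numerically. The following is a minimal sketch on simulated data; the data generating process (a true ATT of 2, a period effect of 0.5, unit effects correlated with treatment) is an assumption made only for this illustration.

```julia
using Statistics, LinearAlgebra, Random

Random.seed!(1)
n = 500
D1 = rand(n) .< 0.4                       # treated in period 1 (nobody treated in period 0)
α = randn(n) .+ D1                        # unit effects, correlated with treatment
y0 = α .+ randn(n)                        # period 0 outcomes
y1 = α .+ 0.5 .+ 2.0 .* D1 .+ randn(n)    # period 1 outcomes, true ATT = 2

# Plug-in: difference of mean changes (D_{i0} = 0 for all i, so those factors drop out)
Δy = y1 .- y0
att_plugin = mean(Δy[D1]) - mean(Δy[.!D1])

# Regression y_it = δ_t + α 1{D_i1 = 1} + β D_it + ε_it, stacking t = 0 then t = 1
y = vcat(y0, y1)
X = hcat(ones(2n),                 # intercept
         vcat(zeros(n), ones(n)),  # period 1 dummy
         vcat(D1, D1),             # treated group dummy
         vcat(zeros(n), D1))       # D_it
β = (X \ y)[4]                     # coefficient on D_it
```

The regression is saturated in group and period, so `β` and `att_plugin` agree up to floating point error, not just asymptotically.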

Multiple Periods

Identification

  • Same logic as before, \[ \begin{align*} ATT_{t,t-s} & = \Er[y_{it}(1) - \color{red}{y_{it}(0)} | D_{it} = 1, D_{it-s}=0] \\ & = \Er[y_{it}(1) - y_{it-s}(0) | D_{it} = 1, D_{it-s}=0] \\ & \;\; - \Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] \end{align*} \]

    • assume \(\Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] = \Er[y_{it}(0) - y_{it-s}(0) | D_{it}=0, D_{it-s}=0]\)

\[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \]

  • Similarly, we can identify various other interpretable average treatment effects conditional on being treated at some times and not others

Estimation

  • Plugin

  • Fixed effects? \[ y_{it} = \beta D_{it} + \alpha_i + \delta_t + \epsilon_{it} \] When will \(\hat{\beta}^{FE}\) consistently estimate some interpretable conditional average of treatment effects?

Fixed Effects

  • As on problem set 6, \[ \begin{align*} \hat{\beta} = & \sum_{i=1,t=1}^{n,T} y_{it} \overbrace{\frac{\tilde{D}_{it}}{ \sum_{i,t} \tilde{D}_{it}^2 }}^{\hat{\omega}_{it}(D_{it})} \\ = & \sum_{i=1,t=1}^{n,T} y_{it}(0) \hat{\omega}_{it}(D_{it}) + \sum_{i=1,t=1}^{n,T} D_{it} (y_{it}(1) - y_{it}(0)) \hat{\omega}_{it}(D_{it}) \end{align*} \] where \[ \begin{align*} \tilde{D}_{it} & = D_{it} - \frac{1}{n} \sum_{j=1}^n (D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{js}) - \frac{1}{T} \sum_{s=1}^T D_{is} \\ & = D_{it} - \frac{1}{n} \sum_{j=1}^n D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{is} + \frac{1}{nT} \sum_{j,s} D_{js} \end{align*} \]
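
Two properties of the weights follow from the two-way demeaning: they sum to zero over all observations, and they sum to one over treated observations. The sketch below checks both numerically and confirms that the weighted sum of \(y\) equals the coefficient from regressing demeaned \(y\) on demeaned \(D\). The staggered design, the constant effect of 1.5, and the helper name `demean` are assumptions for this illustration.

```julia
using Statistics, Random

# Two-way demeaning of an n×T array: subtract unit and period means, add back the grand mean
demean(A) = A .- mean(A, dims=1) .- mean(A, dims=2) .+ mean(A)

Random.seed!(2)
n, T = 200, 6
# Hypothetical staggered adoption: unit i is treated from period start[i] on (T+1 = never)
start = rand(2:(T+1), n)
D = [t >= start[i] for i in 1:n, t in 1:T]

Dtilde = demean(D)
ω = Dtilde ./ sum(Dtilde .^ 2)

y = randn(n, T) .+ 1.5 .* D
β_weights = sum(y .* ω)
# Same estimate from regressing two-way demeaned y on two-way demeaned D
β_fwl = sum(demean(y) .* Dtilde) / sum(Dtilde .^ 2)

sum_all = sum(ω)            # approximately 0
sum_treated = sum(ω .* D)   # approximately 1
```

Because the weights on treated observations sum to one but need not all be positive, \(\hat{\beta}\) is a weighted sum of treatment effects that is not guaranteed to be a convex combination.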

Simulation

  • \(T\) periods
  • Once \(i\) is treated, it remains treated

Weights

Code
using Statistics
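# Assign absorbing treatment: in period t each not-yet-treated unit becomes
# treated with probability portiontreated[t], and stays treated afterwards.
function assigntreat(n,T;portiontreated=vcat(zeros(T ÷ 2), 0.5, zeros(T - (T ÷ 2) - 1)))
  treated = falses(n,T)
  for t=2:T
    treated[:,t] = treated[:,t-1]
    if (portiontreated[t]>0)
      treated[:,t] = (treated[:,t] .|| rand(n) .< portiontreated[t])
    end
  end
  return(treated)
end

# Fixed effects weights: two-way demeaned treatment divided by its sum of squares
function weights(D)
  D̃ = D .- mean(D,dims=1) .- mean(D,dims=2) .+ mean(D)
  ω = D̃./sum(D̃.^2)
end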

n = 1000
T = 9
D = assigntreat(n,T)
y = randn(n,T)
sum(y.*weights(D))
0.007077771753723077
Code
n,T = size(D)
using DataFrames, FixedEffectModels, RegressionTables
df = DataFrame(id = vec((1:n)*ones(Int,T)'), t = vec(ones(Int,n)*(1:T)'), y = vec(y), D=vec(D))
m=reg(df, @formula(y ~ D + fe(t) + fe(id)))
regtable(m, render=AsciiTable())

--------------------------
                      y   
--------------------------
D                    0.007
                   (0.042)
--------------------------
t Fixed Effects        Yes
id Fixed Effects       Yes
--------------------------
N                    9,000
R2                   0.112
Within-R2            0.000
--------------------------

Weights with Single Treatment Time

Code
using PlotlyLight

function plotp(D; width=900, height=300)
  n,T=size(D)
  plt=Plot()
  plt.layout=Config(xaxis=Config(title="time",tickvals=1:T),
                    yaxis=Config(title="Portion Treated"),
                    autosize=false,
                    width=width,
                    height=height)
  plt(x=1:T,y=vec(mean(D,dims=1)))
  plt()
end

pfig = plotp(D)
pfig

Weights with Single Treatment Time

Code
function plotweights(D; width=900, height=300)
  n,T = size(D)
  ω = weights(D)
  groups = unique(eachrow(D))
  plt = Plot()
  plt.layout=Config(xaxis=Config(title="time",tickvals=1:T),
                    yaxis=Config(title="weight"),
                    autosize=false,
                    width=width,
                    height=height)

  for g in groups
    i = findfirst(d == g for d in eachrow(D))
    wt = ω[i,:]
    plt(x=1:T,y=wt,name="Treated $(sum(g)) times", mode="markers",type="scatter")
    end
  fig=plt()
  return(fig)
end
fig = plotweights(D)
fig

Weights with Uniform Treatment Time

Code
D = assigntreat(n,T,portiontreated=vcat(0,fill(0.5/(T-1),T-1)))
pfig = plotp(D)
pfig

Weights with Uniform Treatment Time

Code
fig = plotweights(D)
fig

Weights with Early and Late Treated

Code
pt = zeros(T)
pt[2] = 1/3
pt[end-1]=1/3
D = assigntreat(n,T,portiontreated=pt)
pfig = plotp(D)
pfig

Weights with Early and Late Treated

Code
fig = plotweights(D)
fig

Sign Reversal with Fixed Effects

  • True Treatment Effects
Code
pt = zeros(T)
pt[2] = 1/3
pt[end-1]=1/3

# Simulate a panel in which the treatment effect after tt periods of treatment is ATT[tt]
function simulate(n,T,portiontreated, ATT, σ=0.01)
  D = assigntreat(n,T,portiontreated=portiontreated)
  y = randn(n,T)*σ
  for i in axes(y)[1]
    timetreated=cumsum(D[i,:])
    y[i,:] .+= (tt>0 ? ATT[tt] : 0.0 for tt in timetreated)
  end
  DataFrame(id = vec((1:n)*ones(Int,T)'), t = vec(ones(Int,n)*(1:T)'), y = vec(y), D=vec(D))
end

ATT =  vcat(ones(T-3),10*ones(3))
df = simulate(n,T, pt,ATT)

function plotGAT(ATT,D; width=900, height=300)
  n,T = size(D)
  groups = unique(eachrow(D))
  plt = Plot()
  plt.layout=Config(xaxis=Config(title="time",tickvals=1:T),
                    yaxis=Config(title="ATT"),
                    autosize=false,
                    width=width,
                    height=height)

  for g in groups
    t = findfirst(g)
    if (isnothing(t))
      t=T+1
    end
    ate = vcat(zeros(t-1), ATT[1:(T-t+1)])
    plt(x=1:T,y=ate,name="Treated $(sum(g)) times", mode="markers",type="scatter")
    end
  fig=plt()
  return(fig)
end

plotGAT(ATT,D)

Sign Reversal with Fixed Effects

  • Fixed Effects Estimate
m=reg(df, @formula(y ~ D + fe(t) + fe(id)))
regtable(m, render=AsciiTable())

----------------------------
                       y    
----------------------------
D                  -0.461***
                     (0.087)
----------------------------
t Fixed Effects          Yes
id Fixed Effects         Yes
----------------------------
N                      9,000
R2                     0.543
Within-R2              0.004
----------------------------

When to worry

  • When there are multiple treatment times and treatment effect heterogeneity
  • Even if weights do not have wrong sign, the fixed effects estimate is hard to interpret
  • Same logic applies more generally – not just to time
    • E.g. if there are group effects, treated units in multiple groups, and \(\Er[y(1) - y(0) | \text{group}]\) varies across groups
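
One way to check whether wrong-signed weights are a concern in a given design is to compute the share of treated observations with negative fixed effects weight. The sketch below does this for a hypothetical deterministic design with three equally sized cohorts (treated from period 2, treated from period 8, never treated); the cohort sizes and the helper names `feweights` and `cohort` are assumptions for this illustration.

```julia
using Statistics

# Fixed effects weights: two-way demeaned D divided by its sum of squares
function feweights(D)
  Dtilde = D .- mean(D, dims=1) .- mean(D, dims=2) .+ mean(D)
  Dtilde ./ sum(Dtilde .^ 2)
end

# Hypothetical design: T = 9 and three cohorts of 100 units each
T = 9
cohort(start) = [t >= start for t in 1:T]   # treated from period `start` on
rows = vcat(fill(cohort(2), 100), fill(cohort(8), 100), fill(cohort(T + 1), 100))
D = permutedims(reduce(hcat, rows))         # 300×9 Bool matrix

ω = feweights(D)
negshare = mean(ω[D] .< 0)   # share of treated observations with negative weight
```

In this design the early cohort receives negative weight in periods 8 and 9, once the late cohort is also treated, so `negshare` is 0.2 (2 of every 10 treated cells).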

What to Do?

Plug-in Estimator

  • Follow the identification result \[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \] and estimate \[ \begin{align*} \widehat{ATT}_{t,t-s} = & \frac{\sum_i (y_{it} - y_{it-s}) D_{it}(1-D_{it-s})}{\sum_i D_{it}(1-D_{it-s})} \\ & - \frac{\sum_i (y_{it} - y_{it-s}) (1-D_{it})(1-D_{it-s})}{\sum_i (1-D_{it})(1-D_{it-s})} \end{align*} \] and perhaps some average, e.g. (there are other reasonable weighted averages) \[ \sum_{t=1}^T \frac{\sum_i D_{it}}{\sum_{i,s} D_{i,s}} \frac{1}{t-1} \sum_{s=1}^{t-1} \widehat{ATT}_{t,t-s} \]
    • Inference? Optimal?
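
A minimal sketch of the plug-in estimator of \(ATT_{t,t-s}\), which compares the mean change \(y_{it} - y_{it-s}\) among units treated at \(t\) but not at \(t-s\) with the mean change among units untreated at both dates. The function name `atthat` and the noiseless toy data are assumptions for this illustration, not part of the original code.

```julia
using Statistics

# Plug-in estimate of ATT_{t,t-s} from n×T outcome and treatment matrices
function atthat(y, D, t, s)
  Δy = y[:, t] .- y[:, t-s]
  switchers = D[:, t] .& .!D[:, t-s]      # treated at t, untreated at t-s
  controls  = .!D[:, t] .& .!D[:, t-s]    # untreated at both t and t-s
  mean(Δy[switchers]) - mean(Δy[controls])
end

# Toy data without noise: 2 units treated from period 3 on, 2 never treated
D = Bool[0 0 1 1; 0 0 1 1; 0 0 0 0; 0 0 0 0]
trend = [0.0 1.0 2.0 3.0]          # common time effects
effect = [0.0 0.0 1.0 10.0]        # effect is 1 in period 3 and 10 in period 4
y = [5.0, 6.0, 7.0, 8.0] .+ trend .+ D .* effect

att_3_2 = atthat(y, D, 3, 1)   # 1.0
att_4_2 = atthat(y, D, 4, 2)   # 10.0
```

If no unit switches into treatment between \(t-s\) and \(t\), or no unit is untreated at both dates, the corresponding mean is taken over an empty set and the function returns `NaN`, so such \((t,s)\) pairs should be skipped when averaging.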

What to Do?

  • The problem is possible correlation of \((y_{it}(1) - y_{it}(0))D_{it}\) with \(\tilde{D}_{it}\)
    • \(\tilde{D}_{it}\) is a function of \(t\) and \((D_{i1}, ..., D_{iT})\)
    • Estimating a separate coefficient for each combination of \(t\) and \((D_{i1}, ..., D_{iT})\) will eliminate the correlation / flexibly model treatment effect heterogeneity

Cohorts

  • Cohorts = unique sequences of \((D_{i1}, ..., D_{iT})\)
    • In the last simulated example, there are three cohorts
      1. \((0, 0, 0, 0, 0, 0, 0, 0, 0)\)
      2. \((0, 0, 0, 0, 0, 0, 0, 1, 1)\)
      3. \((0, 1, 1, 1, 1, 1, 1, 1, 1)\)
Code
using CategoricalArrays

function createCohorts(df)
  n = length(unique(df.id))
  T = length(unique(df.t))
  sorted = sort(df, [:id, :t])
  D = reshape(sorted.D, T,n)'
  groups = sort(unique(eachrow(D)))
  cohorts = [findfirst(d == g for g in groups) for d in eachrow(D)]
  df=leftjoin(sorted, DataFrame(cohort=categorical(cohorts), id=unique(sorted.id)), on=:id)
  df.DCt .= "untreated"
  for r in 1:T
    for c in unique(df.cohort)
      dct = (df.t .== r) .& (df.cohort .== c) .& df.D
      if (any(dct))
        df.DCt[dct] .= "c$(c),t$(r)"
        df[!,"Dc$(c)t$(r)"] .= false
        df[!,"Dc$(c)t$(r)"][dct] .= true
      end
    end
  end
  df.ct = categorical(df.t)
  df
end

dfc = createCohorts(df);

Regression with Cohort Interactions

Code
m=reg(dfc, @formula(y ~ DCt + fe(id) + fe(t)))
regtable(m, render=AsciiTable())

----------------------------
                       y    
----------------------------
DCt: c2,t9             0.002
                     (0.001)
DCt: c3,t2            -0.000
                     (0.001)
DCt: c3,t3             0.001
                     (0.001)
DCt: c3,t4            -0.000
                     (0.001)
DCt: c3,t5            -0.000
                     (0.001)
DCt: c3,t6            -0.001
                     (0.001)
DCt: c3,t7            -0.000
                     (0.001)
DCt: c3,t8          9.000***
                     (0.001)
DCt: c3,t9          9.001***
                     (0.001)
DCt: untreated     -0.999***
                     (0.001)
----------------------------
id Fixed Effects         Yes
t Fixed Effects          Yes
----------------------------
N                      9,000
R2                     1.000
Within-R2              1.000
----------------------------

Regression with Cohort Interactions

Code
m=reg(dfc, @formula(y ~ -1 + cohort*ct + fe(id)), save=:fe)
regtable(m, render=AsciiTable())
[ Info: RHS-variable cohort: 2 is collinear with the fixed effects.
[ Info: RHS-variable cohort: 3 is collinear with the fixed effects.

-----------------------------
                        y    
-----------------------------
cohort: 2               0.000
                        (NaN)
cohort: 3               0.000
                        (NaN)
ct: 2                  0.002*
                      (0.001)
ct: 3                  -0.000
                      (0.001)
ct: 4                   0.000
                      (0.001)
ct: 5                   0.001
                      (0.001)
ct: 6                   0.001
                      (0.001)
ct: 7                   0.000
                      (0.001)
ct: 8                   0.001
                      (0.001)
ct: 9                  -0.000
                      (0.001)
cohort: 2 & ct: 2      -0.001
                      (0.001)
cohort: 3 & ct: 2    0.999***
                      (0.001)
cohort: 2 & ct: 3       0.001
                      (0.001)
cohort: 3 & ct: 3    1.000***
                      (0.001)
cohort: 2 & ct: 4       0.001
                      (0.001)
cohort: 3 & ct: 4    0.999***
                      (0.001)
cohort: 2 & ct: 5      -0.000
                      (0.001)
cohort: 3 & ct: 5    0.999***
                      (0.001)
cohort: 2 & ct: 6       0.000
                      (0.001)
cohort: 3 & ct: 6    0.998***
                      (0.001)
cohort: 2 & ct: 7      -0.000
                      (0.001)
cohort: 3 & ct: 7    0.999***
                      (0.001)
cohort: 2 & ct: 8    0.999***
                      (0.001)
cohort: 3 & ct: 8    9.999***
                      (0.001)
cohort: 2 & ct: 9    1.001***
                      (0.001)
cohort: 3 & ct: 9   10.000***
                      (0.001)
-----------------------------
id Fixed Effects          Yes
-----------------------------
N                       9,000
R2                      1.000
Within-R2               1.000
-----------------------------

What to Do?

  • Understand existing methods: read reviews Clément de Chaisemartin and D’Haultfœuille (2022), Roth et al. (2023), C. de Chaisemartin and D’Haultfœuille (2023)
  • Use an appropriate package:

Reading

  • Book: C. de Chaisemartin and D’Haultfœuille (2023)
  • Recent reviews: Roth et al. (2023), Clément de Chaisemartin and D’Haultfœuille (2022), Arkhangelsky and Imbens (2023)
  • Early work pointing to problems with fixed effects:
    • Laporte and Windmeijer (2005), Wooldridge (2005)
  • Explosion of papers written just before 2020, published just after:
    • Borusyak and Jaravel (2018)
    • Clément de Chaisemartin and D’Haultfœuille (2020)
    • Callaway and Sant’Anna (2021)
    • Goodman-Bacon (2021)
    • Sun and Abraham (2021)

References

Arkhangelsky, Dmitry, and Guido Imbens. 2023. “Causal Models for Longitudinal and Panel Data: A Survey.”
Borusyak, Kirill, and Xavier Jaravel. 2018. “Revisiting Event Study Designs.” https://scholar.harvard.edu/files/borusyak/files/borusyak_jaravel_event_studies.pdf.
Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230. https://doi.org/10.1016/j.jeconom.2020.12.001.
Chaisemartin, C de, and X D’Haultfœuille. 2023. Credible Answers to Hard Questions: Differences-in-Differences for Natural Experiments. https://dx.doi.org/10.2139/ssrn.4487202.
Chaisemartin, Clément de, and Xavier D’Haultfœuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96. https://doi.org/10.1257/aer.20181169.
———. 2022. “Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey.” The Econometrics Journal 26 (3): C1–30. https://doi.org/10.1093/ectj/utac017.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77. https://doi.org/10.1016/j.jeconom.2021.03.014.
Laporte, Audrey, and Frank Windmeijer. 2005. “Estimation of Panel Data Models with Binary Indicators When Treatment Effects Are Not Constant over Time.” Economics Letters 88 (3): 389–96. https://doi.org/10.1016/j.econlet.2005.04.002.
Rambachan, Ashesh, and Jonathan Roth. 2023. “A More Credible Approach to Parallel Trends.” The Review of Economic Studies 90 (5): 2555–91. https://doi.org/10.1093/restud/rdad018.
Roth, Jonathan. 2022. “Pretest with Caution: Event-Study Estimates After Testing for Parallel Trends.” American Economic Review: Insights 4 (3): 305–22. https://doi.org/10.1257/aeri.20210236.
Roth, Jonathan, Pedro H. C. Sant’Anna, Alyssa Bilinski, and John Poe. 2023. “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics 235 (2): 2218–44. https://doi.org/10.1016/j.jeconom.2023.03.008.
Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99. https://doi.org/10.1016/j.jeconom.2020.09.006.
Wooldridge, Jeffrey M. 2005. “Fixed-Effects and Related Estimators for Correlated Random-Coefficient and Treatment-Effect Panel Data Models.” The Review of Economics and Statistics 87 (2): 385–90. https://doi.org/10.1162/0034653053970320.