Difference in Differences

Paul Schrimpf


Difference in Differences

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]


  • Two periods, binary treatment in second period
  • Potential outcomes \(\{y_{it}(0),y_{it}(1)\}_{t=0}^1\) for \(i=1,...,n\)
  • Treatment \(D_{it} \in \{0,1\}\),
    • \(D_{i0} = 0\) \(\forall i\)
    • \(D_{i1} = 1\) for some, \(0\) for others
  • Observe \(y_{it} = y_{it}(0)(1-D_{it}) + D_{it} y_{it}(1)\)


  • Average treatment effect on the treated: \[ \begin{align*} ATT & = \Er[y_{i1}(1) - \color{red}{y_{i1}(0)} | D_{i1} = 1] \\ & = \Er[y_{i1}(1) - y_{i0}(0) | D_{i1} = 1] - \Er[\color{red}{y_{i1}(0)} - y_{i0}(0) | D_{i1}=1] \\ & \text{ assume } \Er[\color{red}{y_{i1}(0)} - y_{i0}(0) | D_{i1}=1] = \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0] \\ & = \Er[y_{i1}(1) - y_{i0}(0) | D_{i1} = 1] - \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0] \\ & = \Er[y_{i1} - y_{i0} | D_{i1}=1, D_{i0}=0] - \Er[y_{i1} - y_{i0} | D_{i1}=0, D_{i0}=0] \end{align*} \]

Important Assumptions

  • No anticipation: \(D_{i1}=1\) does not affect \(y_{i0}\)
    • built into the potential outcomes notation used here; relaxing it would mean letting potential outcomes depend on the whole sequence of \(D\), i.e. \(y_{it}(D_{i0},D_{i1})\)
  • Parallel trends: \(\Er[\color{red}{y_{i1}(0)} - y_{i0}(0) |D_{i1}=1,D_{i0}=0] = \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0, D_{i0}=0]\)
    • not invariant to nonlinear transformations of \(y\) (e.g. levels versus logs)
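
A small worked example of the sensitivity of parallel trends to the scale of \(y\). The numbers are hypothetical and chosen only for easy arithmetic: suppose untreated potential outcomes are constant within each group, moving from 10 to 20 in the comparison group and from 20 to 30 in the treated group. Then trends are parallel in levels but not in logs: \[ \begin{align*} \text{levels:} \quad & 30 - 20 = 10 = 20 - 10 \\ \text{logs:} \quad & \log 30 - \log 20 = \log 1.5 \approx 0.41 \neq 0.69 \approx \log 2 = \log 20 - \log 10 \end{align*} \] so a difference in differences in \(\log y\) would attribute the gap \(\log 1.5 - \log 2 \approx -0.29\) to treatment even when the true effect is zero.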


  • Plugin: \[ \widehat{ATT} = \frac{ \sum_{i=1}^n (y_{i1} - y_{i0})D_{i1}(1-D_{i0})}{\sum_{i=1}^n D_{i1}(1-D_{i0})} - \frac{ \sum_{i=1}^n (y_{i1} - y_{i0})(1-D_{i1})(1-D_{i0})}{\sum_{i=1}^n (1-D_{i1})(1-D_{i0})} \]

  • Regression: \[ y_{it} = \delta_t + \alpha 1\{D_{i1}=1\} + \beta D_{it} + \epsilon_{it} \] then \(\hat{\beta} = \widehat{ATT}\)

  • Fixed effects: \[ y_{it} = \delta_t + \alpha_i + \beta D_{it} + u_{it} \] then \(\hat{\beta} = \widehat{ATT}\)
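
A minimal numerical sketch of the equivalence between the plug-in estimator and the two regressions in the two-period case. The data generating process (a common trend of 0.5 and a treatment effect of 1.0), the sample size, and the seed are all hypothetical choices made for illustration. With two periods, the fixed effects regression reduces to a regression of first differences on a constant and \(D_{i1}\), which is what the sketch uses.

```julia
using Random, Statistics

Random.seed!(1)                          # arbitrary seed, only for reproducibility
n = 200
D1 = rand(n) .< 0.5                      # D_{i1}; D_{i0} = 0 for every unit
y0 = randn(n) .+ 2 .* D1                 # period 0 outcomes (levels may differ by group)
y1 = y0 .+ 0.5 .+ 1.0 .* D1 .+ randn(n)  # hypothetical common trend 0.5 and effect 1.0

Δy = y1 .- y0

# Plug-in: mean change among treated minus mean change among untreated
att_plugin = mean(Δy[D1]) - mean(Δy[.!D1])

# First-difference form of the fixed effects regression: Δy_i = δ + β D_{i1} + Δu_i
X = hcat(ones(n), D1)
β_fd = (X \ Δy)[2]

# Pooled regression y_it = δ_t + α 1{D_i1 = 1} + β D_it + ε_it
y = vcat(y0, y1)
post = vcat(zeros(n), ones(n))           # indicator for period 1
grp = vcat(D1, D1)                       # indicator for the eventually treated group
Z = hcat(ones(2n), post, grp, post .* grp)
β_pooled = (Z \ y)[4]

println((att_plugin, β_fd, β_pooled))    # three numerically identical estimates
```

All three numbers agree up to floating point error because each regression is saturated in the group and period indicators.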

Multiple Periods


  • Same logic as before, \[ \begin{align*} ATT_{t,t-s} & = \Er[y_{it}(1) - \color{red}{y_{it}(0)} | D_{it} = 1, D_{it-s}=0] \\ & = \Er[y_{it}(1) - y_{it-s}(0) | D_{it} = 1, D_{it-s}=0] \\ & \;\; - \Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] \end{align*} \]

    • assume \(\Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] = \Er[y_{it}(0) - y_{it-s}(0) | D_{it}=0, D_{it-s}=0]\)

\[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \]

  • Similarly, can identify various other interpretable average treatment effects conditional on being treated at some times and not others


  • Plugin

  • Fixed effects? \[ y_{it} = \beta D_{it} + \alpha_i + \delta_t + \epsilon_{it} \] When will \(\hat{\beta}^{FE}\) consistently estimate some interpretable conditional average of treatment effects?

Fixed Effects

  • As on problem set 6, \[ \begin{align*} \hat{\beta} = & \sum_{i=1,t=1}^{n,T} y_{it} \overbrace{\frac{\tilde{D}_{it}}{ \sum_{i,t} \tilde{D}_{it}^2 }}^{\hat{\omega}_{it}(D_{it})} \\ = & \sum_{i=1,t=1}^{n,T} y_{it}(0) \hat{\omega}_{it}(D_{it}) + \sum_{i=1,t=1}^{n,T} D_{it} (y_{it}(1) - y_{it}(0)) \hat{\omega}_{it}(D_{it}) \end{align*} \] where \[ \begin{align*} \tilde{D}_{it} & = D_{it} - \frac{1}{n} \sum_{j=1}^n (D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{js}) - \frac{1}{T} \sum_{s=1}^T D_{is} \\ & = D_{it} - \frac{1}{n} \sum_{j=1}^n D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{is} + \frac{1}{nT} \sum_{j,s} D_{js} \end{align*} \]
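
A quick numerical check of the weight representation, as a sketch under assumed inputs: the staggered adoption design, the sample size, and the seed are hypothetical. It compares \(\sum_{i,t} y_{it}\hat{\omega}_{it}\) with the coefficient on \(D_{it}\) from least squares with explicit unit and time dummies, and confirms two properties that the second line of the decomposition relies on: the weights sum to zero, and the weights on treated observations sum to one.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(2)                              # arbitrary seed
n, T = 30, 6
# Hypothetical staggered adoption: unit i is treated from period start[i] on;
# start[i] = T+1 means never treated within the sample
start = rand(2:T+1, n)
D = [t >= start[i] for i in 1:n, t in 1:T]   # n×T Bool matrix
y = randn(n, T) .+ 2.0 .* D

# Double-demeaned treatment and the implied weights
D̃ = D .- mean(D, dims=1) .- mean(D, dims=2) .+ mean(D)
ω = D̃ ./ sum(D̃ .^ 2)
β_weights = sum(y .* ω)

# Same coefficient from least squares with unit dummies and T-1 time dummies
id = repeat(1:n, outer=T)                    # matches the column-major order of vec(y)
period = repeat(1:T, inner=n)
X = Float64.(hcat(vec(D), [id .== i for i in 1:n]..., [period .== t for t in 2:T]...))
β_dummies = (X \ vec(y))[1]

println((β_weights, β_dummies))
println((sum(ω), sum(ω .* D)))               # approximately (0, 1)
```

Because the weights sum to zero, some observations necessarily receive negative weight; the question for interpretation is whether any of those are treated observations.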


using Statistics
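# Staggered, absorbing treatment: in period t each not-yet-treated unit starts
# treatment with probability portiontreated[t] and stays treated afterwards
function assigntreat(n,T;portiontreated=vcat(zeros(T ÷ 2), 0.5, zeros(T - (T ÷ 2) - 1)))
  treated = falses(n,T)
  for t=2:T
    treated[:,t] = treated[:,t-1]
    if (portiontreated[t]>0)
      treated[:,t] = (treated[:,t] .|| rand(n) .< portiontreated[t])
    end
  end
  return treated
end

# Fixed effects weights: double-demeaned D divided by its sum of squares
function weights(D)
  D̃ = D .- mean(D,dims=1) .- mean(D,dims=2) .+ mean(D)
  ω = D̃./sum(D̃.^2)
end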

n = 100
T = 9
D = assigntreat(n,T)
y = randn(n,T)
n,T = size(D)
using DataFrames, FixedEffectModels
df = DataFrame(id = vec((1:n)*ones(Int,T)'), t = vec(ones(Int,n)*(1:T)'), y = vec(y), D=vec(D))
reg(df, @formula(y ~ D + fe(t) + fe(id)))
Number of obs:              900   Converged:                 true
dof (model):                  1   dof (residuals):            791
R²:                       0.120   R² adjusted:             -0.002
F-statistic:           0.626217   P-value:                  0.429
R² within:                0.001   Iterations:                   2
   Estimate  Std. Error    t-stat  Pr(>|t|)  Lower 95%  Upper 95%
D  0.102789    0.129892  0.791339    0.4290  -0.152185   0.357763

Weights with Single Treatment Time

using PlotlyLight, Cobweb
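# Portion of units treated in each period.
# Most of this function's body is missing from the source; the trace and the
# x-axis title are an assumed reconstruction, only the y-axis title is original.
function plotp(D)
  n,T = size(D)
  plt = Plot()
  plt(x=1:T,y=vec(mean(D,dims=1)),mode="markers",type="scatter")
  plt.layout = Config(xaxis=Config(title="t"),
                    yaxis=Config(title="Portion Treated"))
  plt
end

# Fixed effects weights over time, one trace per distinct treatment sequence
function plotweights(D)
  n,T = size(D)
  ω = weights(D)
  groups = unique(eachrow(D))
  plt = Plot()
  for g in groups
    i = findfirst(d == g for d in eachrow(D))
    wt = ω[i,:]
    plt(x=1:T,y=wt,name="Treated $(sum(g)) times", mode="markers",type="scatter")
  end
  plt
end
pfig = plotp(D)
Cobweb.save(Page(pfig), "p-samet.html")
fig = plotweights(D)
Cobweb.save(Page(fig), "w-samet.html")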

Weights with Uniform Treatment Time

D = assigntreat(n,T,portiontreated=vcat(0,fill(0.5/(T-1),T-1)))
pfig = plotp(D)
Cobweb.save(Page(pfig), "p-unit.html")
fig = plotweights(D)
Cobweb.save(Page(fig), "w-unit.html")

Weights with Early and Late Treated

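pt = zeros(T)
pt[2] = 1/3
pt[8] = 0.5 # assumed: this line is missing from the source; the cohort list later shows a group first treated at t=8, but the share 0.5 is a guess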
D = assigntreat(n,T,portiontreated=pt)
pfig = plotp(D)
Cobweb.save(Page(pfig), "p-el.html")
fig = plotweights(D)
Cobweb.save(Page(fig), "w-el.html")

Sign Reversal with Fixed Effects

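pt = zeros(T)
pt[2] = 1/3
pt[8] = 0.5 # assumed: this line is missing from the source; the cohort list later shows a group first treated at t=8, but the share 0.5 is a guess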

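# Outcomes are noise plus a treatment effect ATT[k] in a unit's k-th treated period
function simulate(n,T,portiontreated, ATT, σ=0.01)
  D = assigntreat(n,T,portiontreated=portiontreated)
  y = randn(n,T)*σ
  for i in axes(y)[1]
    timetreated = cumsum(D[i,:]) # periods treated so far (restored line, implied by ATT[tt] below)
    y[i,:] .+= (tt>0 ? ATT[tt] : 0.0 for tt in timetreated)
  end
  DataFrame(id = vec((1:n)*ones(Int,T)'), t = vec(ones(Int,n)*(1:T)'), y = vec(y), D=vec(D))
end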

ATT =  vcat(ones(T-3),10*ones(3))
df = simulate(n,T, pt,ATT)
reg(df, @formula(y ~ D + fe(t) + fe(id)))
Number of obs:               900  Converged:                  true
dof (model):                   1  dof (residuals):             791
R²:                        0.547  R² adjusted:               0.485
F-statistic:             3.27459  P-value:                   0.071
R² within:                 0.004  Iterations:                    2
    Estimate  Std. Error    t-stat  Pr(>|t|)  Lower 95%  Upper 95%
D  -0.500218    0.276427  -1.80958    0.0707   -1.04284  0.0423998
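
The negative coefficient is not a sampling accident. The following noise-free sketch reproduces the sign reversal exactly, under the simplifying assumption of one unit per cohort (never treated, treated from \(t=8\), treated from \(t=2\)). Equal cohort sizes are an assumption made here, not the shares from the random draw above, so the coefficient differs from the simulated estimate.

```julia
using Statistics

T = 9
# One unit per cohort; equal cohort sizes are an assumption of this sketch
start = [T + 1, 8, 2]                    # never treated, treated from t=8, treated from t=2
D = [t >= s for s in start, t in 1:T]    # 3×T Bool matrix of treatment indicators

ATT = vcat(ones(T-3), 10*ones(3))        # effect in the k-th treated period, as in the simulation
exposure = cumsum(D, dims=2)             # number of periods treated so far
y = [k > 0 ? ATT[k] : 0.0 for k in exposure]  # noise-free outcomes with y_it(0) = 0

D̃ = D .- mean(D, dims=1) .- mean(D, dims=2) .+ mean(D)
ω = D̃ ./ sum(D̃ .^ 2)
β = sum(y .* ω)

println("fixed effects coefficient: ", β)
println("weight on the early cohort at t=9: ", ω[3, T])
```

Every treatment effect in this design is positive (1 or 10), yet the coefficient works out to -1.25: the early cohort's late periods carry the large effects and receive negative weight (-0.125 each), because by then the early cohort acts mostly as a comparison group for the late cohort.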

Sign Reversal with Fixed Effects

D = reshape(df.D, n,T)
y = reshape(df.y, n,T)
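# Plug-in estimates for every pair s < t that has both a switching group
# (treated at t, untreated at s) and a comparison group (untreated at both)
function plotATT(D, y)
  n,T = size(D)
  groups = unique(eachrow(D))
  ATT = zeros(0)
  tt = zeros(0)
  ts = zeros(0)
  for t ∈ 2:T
    for s ∈ 1:t-1
      treatts = D[:,t] .& .!D[:,s]
      controlts = .!D[:,t] .& .!D[:,s]
      if any(treatts) && any(controlts)
        push!(ATT, mean(y[treatts,t] - y[treatts,s]) - mean(y[controlts,t] - y[controlts,s]))
        push!(tt, t) # restored: plot coordinates, lost in the source
        push!(ts, s)
      end
    end
  end
  hiddenaxis = Config(visible=false) # assumed definition; not shown in the source
  plt = Plot()
  plt.layout = Config(xaxis=hiddenaxis,yaxis=hiddenaxis)
  fig=plt(x=tt,y=ts,z=ATT,name="ATTₜ,ₜ₋ₛ", mode="markers",type="scatter3d")
end
fig = plotATT(D, y)
Cobweb.save(Page(fig), "ATT.html")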

When to worry

  • With multiple treatment times and heterogeneous treatment effects
  • Even if no weight has the wrong sign, the fixed effects estimate is hard to interpret
  • The same logic applies more generally, not just to time
    • E.g. with group effects, treated units in multiple groups, and \(\Er[y(1) - y(0) | \text{group}]\) varying across groups

What to Do?

Plug-in Estimator

  • Follow identification \[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \] and estimate \[ \begin{align*} \widehat{ATT}_{t,t-s} = & \frac{\sum_i (y_{it} - y_{it-s}) D_{it}(1-D_{it-s})}{\sum_i D_{it}(1-D_{it-s})} \\ & - \frac{\sum_i (y_{it} - y_{it-s}) (1-D_{it})(1-D_{it-s})}{\sum_i (1-D_{it})(1-D_{it-s})} \end{align*} \] and perhaps some average, e.g. (there are other reasonable weighted averages) \[ \sum_{t=2}^T \frac{\sum_i D_{it}}{\sum_{i,s} D_{i,s}} \frac{1}{t-1} \sum_{s=1}^{t-1} \widehat{ATT}_{t,t-s} \]
    • Inference? Optimal? Code?
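
On the "Code?" question, a minimal sketch of the plug-in estimator and the weighted average. The function names `att_plugin` and `att_average`, the argument `base` (the comparison period \(t-s\)), and the toy data are hypothetical. One design choice is also an assumption: pairs with an empty switching or comparison group return `NaN` and are dropped from the inner average, so the period weights need not sum to one when some period has no identified pair. Inference and optimal weighting are not addressed.

```julia
using Statistics

# Plug-in estimate comparing period t with an earlier period base;
# D is an n×T Bool matrix, y an n×T matrix of outcomes
function att_plugin(D, y, t, base)
  treat = D[:, t] .& .!D[:, base]        # treated at t, untreated at base
  control = .!D[:, t] .& .!D[:, base]    # untreated in both periods
  (any(treat) && any(control)) || return NaN
  Δ = y[:, t] .- y[:, base]
  mean(Δ[treat]) - mean(Δ[control])
end

# Period t is weighted by its share of all treated observations;
# identified base periods 1,...,t-1 are averaged with equal weight
function att_average(D, y)
  n, T = size(D)
  total = 0.0
  for t in 2:T
    atts = filter(!isnan, [att_plugin(D, y, t, b) for b in 1:t-1])
    isempty(atts) && continue
    total += sum(D[:, t]) / sum(D) * mean(atts)
  end
  total
end

# Toy data: units 3 and 4 are treated from t=3 on, with a constant effect of 2
D = [i > 2 && t >= 3 for i in 1:4, t in 1:4]
y = [Float64(i) + 0.5t + 2.0 * D[i, t] for i in 1:4, t in 1:4]

println(att_plugin(D, y, 3, 1))   # identified: switchers and never-treated both present
println(att_plugin(D, y, 4, 3))   # NaN: nobody switches between t=3 and t=4
println(att_average(D, y))
```

In the toy data, unit effects and the common time trend difference out, so every identified pair and the weighted average recover the constant effect of 2.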

What to Do?

  • Problem is possible correlation of \((y_{it}(1) - y_{it}(0))D_{it}\) with \(\tilde{D}_{it}\)
    • \(\tilde{D}_{it}\) is function of \(t\) and \((D_{i1}, ..., D_{iT})\)
    • Estimating separate coefficient for each combination of \(t\) and \((D_{i1}, ..., D_{iT})\) will eliminate correlation / flexibly model treatment effect heterogeneity


  • Cohorts = unique sequences of \((D_{i1}, ..., D_{iT})\)
    • In last simulated example, three cohorts
      1. \((0, 0, 0, 0, 0, 0, 0, 0, 0)\)
      2. \((0, 0, 0, 0, 0, 0, 0, 1, 1)\)
      3. \((0, 1, 1, 1, 1, 1, 1, 1, 1)\)

Regression with Cohort Interactions

using CategoricalArrays

# Adds cohort labels (one per distinct treatment sequence), a cohort-by-time
# treatment label DCt, matching indicator columns, and a categorical time variable
function createCohorts(df)
  n = length(unique(df.id))
  T = length(unique(df.t))
  sorted = sort(df, [:id, :t])
  D = reshape(sorted.D, T,n)'
  groups = sort(unique(eachrow(D)))
  cohorts = [findfirst(d == g for g in groups) for d in eachrow(D)]
  df=leftjoin(sorted, DataFrame(cohort=categorical(cohorts), id=unique(sorted.id)), on=:id)
  df.DCt .= "untreated"
  for r in 1:T
    for c in unique(df.cohort)
      dct = (df.t .== r) .& (df.cohort .== c) .& df.D
      if (any(dct))
        df.DCt[dct] .= "c$(c),t$(r)"
        df[!,"Dc$(c)t$(r)"] .= false
        df[!,"Dc$(c)t$(r)"][dct] .= true
      end
    end
  end
  df.ct = categorical(df.t)
  return df
end

dfc = createCohorts(df)

reg(dfc, @formula(y ~ DCt + fe(id) + fe(t)))
Number of obs:                            900  Converged:                               true
dof (model):                               10  dof (residuals):                          781
R²:                                     1.000  R² adjusted:                            1.000
F-statistic:                        2.66348e6  P-value:                                0.000
R² within:                              1.000  Iterations:                                 2
                    Estimate  Std. Error         t-stat  Pr(>|t|)     Lower 95%    Upper 95%
DCt: c2,t9       0.0065784    0.00362649     1.81398       0.0701  -0.000540428   0.0136972
DCt: c3,t2       0.00284666   0.00408376     0.697067      0.4860  -0.00516979    0.0108631
DCt: c3,t3       0.000240736  0.00408376     0.0589495     0.9530  -0.00777572    0.00825719
DCt: c3,t4       0.00386895   0.00408376     0.947397      0.3437  -0.0041475     0.0118854
DCt: c3,t5       3.83001e-5   0.00408376     0.00937862    0.9925  -0.00797815    0.00805475
DCt: c3,t6       0.00399468   0.00408376     0.978186      0.3283  -0.00402177    0.0120111
DCt: c3,t7       0.00344132   0.00408376     0.842682      0.3997  -0.00457514    0.0114578
DCt: c3,t8       9.00292      0.00350106  2571.48          <1e-99   8.99604       9.00979
DCt: c3,t9       9.00422      0.00411927  2185.88          <1e-99   8.99613       9.01231
DCt: untreated  -0.994795     0.00274137  -362.882         <1e-99  -1.00018      -0.989414

Regression with Cohort Interactions

m=reg(dfc, @formula(y ~ -1 + cohort*ct + fe(id)), save=:fe)
Number of obs:                              900  Converged:                                 true
dof (model):                                 24  dof (residuals):                            777
R²:                                       1.000  R² adjusted:                              1.000
F-statistic:                          1.78724e6  P-value:                                  0.000
R² within:                                1.000  Iterations:                                   1
                       Estimate    Std. Error       t-stat  Pr(>|t|)     Lower 95%     Upper 95%
cohort: 2           0.0          NaN            NaN           NaN     NaN           NaN
cohort: 3           0.0          NaN            NaN           NaN     NaN           NaN
ct: 2              -0.00163546     0.00217367    -0.752399    0.4520   -0.00590241    0.00263149
ct: 3              -0.00118574     0.00217367    -0.545504    0.5856   -0.0054527     0.00308121
ct: 4              -0.00251173     0.00217367    -1.15553     0.2482   -0.00677869    0.00175522
ct: 5              -0.000430413    0.00217367    -0.198013    0.8431   -0.00469737    0.00383654
ct: 6               0.000297268    0.00217367     0.136759    0.8913   -0.00396968    0.00456422
ct: 7               0.00122247     0.00217367     0.562402    0.5740   -0.00304448    0.00548943
ct: 8              -0.000447986    0.00217367    -0.206097    0.8368   -0.00471494    0.00381897
ct: 9              -0.000322571    0.00217367    -0.148399    0.8821   -0.00458952    0.00394438
cohort: 2 & ct: 2   0.00135633     0.00363182     0.373457    0.7089   -0.00577301    0.00848567
cohort: 3 & ct: 2   0.998127       0.0032987    302.582       <1e-99    0.991652      1.0046
cohort: 2 & ct: 3   0.00514932     0.00363182     1.41784     0.1566   -0.00198002    0.0122787
cohort: 3 & ct: 3   0.99688        0.0032987    302.204       <1e-99    0.990405      1.00336
cohort: 2 & ct: 4   0.00244789     0.00363182     0.674013    0.5005   -0.00468145    0.00957724
cohort: 3 & ct: 4   0.999541       0.0032987    303.011       <1e-99    0.993065      1.00602
cohort: 2 & ct: 5   0.00260405     0.00363182     0.717009    0.4736   -0.0045253     0.00973339
cohort: 3 & ct: 5   0.995766       0.0032987    301.866       <1e-99    0.989291      1.00224
cohort: 2 & ct: 6   0.00069941     0.00363182     0.192578    0.8473   -0.00642993    0.00782875
cohort: 3 & ct: 6   0.99904        0.0032987    302.859       <1e-99    0.992565      1.00552
cohort: 2 & ct: 7   0.00081113     0.00363182     0.22334     0.8233   -0.00631821    0.00794047
cohort: 3 & ct: 7   0.998527       0.0032987    302.703       <1e-99    0.992051      1.005
cohort: 2 & ct: 8   0.996662       0.00363182   274.425       <1e-99    0.989532      1.00379
cohort: 3 & ct: 8   9.99838        0.0032987   3031.01        <1e-99    9.99191      10.0049
cohort: 2 & ct: 9   1.00324        0.00363182   276.236       <1e-99    0.996111      1.01037
cohort: 3 & ct: 9   9.99968        0.0032987   3031.4         <1e-99    9.99321      10.0062

What to Do?

  • Understand existing methods: read the reviews by Chaisemartin and D’Haultfœuille (2022) and Roth et al. (2023)
  • Use an appropriate package: partial list on https://asjadnaqvi.github.io/DiD/


  • Recent reviews: Roth et al. (2023), Chaisemartin and D’Haultfœuille (2022), Arkhangelsky and Imbens (2023)
  • Early work pointing to problems with fixed effects:
    • Laporte and Windmeijer (2005), Wooldridge (2005)
  • Explosion of papers written just before 2020, published just after:
    • Borusyak and Jaravel (2018)
    • Chaisemartin and D’Haultfœuille (2020)
    • Callaway and Sant’Anna (2021)
    • Goodman-Bacon (2021)
    • Sun and Abraham (2021)


