Difference in Differences

Paul Schrimpf

2025-11-17

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Setup

Two periods, binary treatment in second period
Potential outcomes \(\{y_{it}(0),y_{it}(1)\}_{t=0}^1\) for \(i=1,...,N\)
Treatment \(D_{it} \in \{0,1\}\),
- \(D_{i0} = 0\) \(\forall i\)
- \(D_{i1} = 1\) for some, \(0\) for others
Observe \(y_{it} = y_{it}(0)(1-D_{it}) + D_{it} y_{it}(1)\)

Identification

Average treatment effect on the treated: \[ \begin{align*} ATT & = \Er[y_{i1}(1) - \color{red}{y_{i1}(0)} | D_{i1} = 1] \\ & = \Er[y_{i1}(1) - y_{i0}(0) | D_{i1} = 1] - \Er[\color{red}{y_{i1}(0)} - y_{i0}(0) | D_{i1}=1] \\ & \text{ assume } \Er[\color{red}{y_{i1}(0)} - y_{i0}(0) | D_{i1}=1] = \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0] \\ & = \Er[y_{i1}(1) - y_{i0}(0) | D_{i1} = 1] - \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0] \\ & = \Er[y_{i1} - y_{i0} | D_{i1}=1, D_{i0}=0] - \Er[y_{i1} - y_{i0} | D_{i1}=0, D_{i0}=0] \end{align*} \]
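
A small numerical sketch of this argument, using made-up potential outcomes (the values of `α`, `τ`, and the common trend of 2 are illustrative assumptions, not from the lecture). Untreated outcomes change by the same amount in both groups, so the difference in differences equals the ATT, while a post-period comparison of treated and untreated units does not.

```julia
using Statistics

# Hypothetical potential outcomes for four units; units 3 and 4 are treated in period 1
α = [1.0, 2.0, 5.0, 6.0]         # unit-specific levels
d = [false, false, true, true]   # D_i1
τ = [0.5, 0.5, 1.0, 3.0]         # individual effects y_i1(1) - y_i1(0)

y0_untreated = α                 # y_i0(0)
y1_untreated = α .+ 2.0          # y_i1(0): same trend for everyone (parallel trends)
y1_treated = y1_untreated .+ τ   # y_i1(1)

# Observed outcomes: nobody is treated in period 0
y0 = y0_untreated
y1 = ifelse.(d, y1_treated, y1_untreated)

att = mean(τ[d])                       # true ATT
Δ = y1 .- y0
did = mean(Δ[d]) - mean(Δ[.!d])        # difference in differences
naive = mean(y1[d]) - mean(y1[.!d])    # post-period comparison only

println((att = att, did = did, naive = naive))
```

The post-period comparison mixes the treatment effect with the pre-existing level difference between groups, which the first difference removes.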

Important Assumptions

No anticipation: \(D_{i1}=1\) does not affect \(y_{i0}\)
- built into the potential outcomes notation we used
- relaxing it means letting potential outcomes depend on the whole sequence of \(D\), i.e. \(y_{it}(D_{i0},D_{i1})\) (this would require a different estimator and assumptions)
Parallel trends: \(\Er[\color{red}{y_{i1}(0)} - y_{i0}(0) |D_{i1}=1,D_{i0}=0] = \Er[y_{i1}(0) - y_{i0}(0) | D_{i1}=0, D_{i0}=0]\)
- not invariant to transformations of \(y\)
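
To see the lack of invariance concretely, here is a two-period check with illustrative untreated outcomes (10 to 11 for one group, 5 to 6 for the other; these numbers are assumptions for the example). Trends are exactly parallel in levels but not in logs.

```julia
# Illustrative untreated outcomes y_it(0) for t = 0, 1
control = [10.0, 11.0]
treated = [5.0, 6.0]

# Difference in trends in levels: zero, so parallel trends holds for y
gap_levels = (treated[2] - treated[1]) - (control[2] - control[1])

# Difference in trends in logs: log(6/5) - log(11/10) > 0, so it fails for log(y)
gap_logs = (log(treated[2]) - log(treated[1])) - (log(control[2]) - log(control[1]))

println("levels: ", gap_levels, "  logs: ", gap_logs)
```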

Parallel Trends in Levels

Code

using DataFrames, Statistics, Measures
import Plots

# Parameters
n_periods = 10
group_names = ["Control", "Treatment"]
initials = [10.0, 5.0]
trend = 1

data = DataFrame(group=String[], time=Int[], y=Float64[])
for (g, initial) in enumerate(initials)
    y = initial
    for t in 1:n_periods
        push!(data, (group_names[g], t, y))
        y = initial + trend * t
    end
end

data.log_y = log.(data.y)

# Compute growth rates (difference in logs)
data.growth_rate = vcat(missing, diff(data.log_y))
sameid = vcat(true, data.group[1:(end-1)] .== data.group[2:end])
data.growth_rate[.!sameid] .= missing

# Plot levels, logs, and growth rates side by side
function plot_trends(data)
    plt1 = Plots.plot(title="Levels", xlabel="Time", ylabel="y");
    for g in group_names
        idx = data.group .== g
        Plots.plot!(plt1, data.time[idx], data.y[idx], label=g, marker=:circle);
    end

    plt2 = Plots.plot(title="Logs", xlabel="Time", ylabel="log(y)");
    for g in group_names
        idx = data.group .== g
        Plots.plot!(plt2, data.time[idx], data.log_y[idx], label=g, marker=:circle);
    end

    plt3 = Plots.plot(title="Growth Rates", xlabel="Time", ylabel="Δlog(y)");
    for g in group_names
        idx = (data.group .== g) .& (.!ismissing.(data.growth_rate))
        Plots.plot!(plt3, data.time[idx], data.growth_rate[idx], label=g, marker=:circle);
    end
    # If growth rates are (nearly) constant, fix the y-axis range so that
    # automatic scaling does not magnify floating point noise
    lo, hi = extrema(skipmissing(data.growth_rate))
    if hi - lo < 1e-4
        Plots.ylims!(plt3, mean(skipmissing(data.growth_rate)) .+ (-0.01, 0.01))
    end

    Plots.plot(plt1, plt2, plt3, layout=(1,3), size=(900,300), margin=5mm)
end
plot_trends(data)

Parallel Trends in Logs

Code

using DataFrames
import Plots

# Parameters
n_periods = 10
group_names = ["Control", "Treatment"]
initials = [5.0, 6.0]
trend = -0.1

data = DataFrame(group=String[], time=Int[], log_y=Float64[])
for (g, initial) in enumerate(initials)
    log_y = initial
    for t in 1:n_periods
        push!(data, (group_names[g], t, log_y))
        log_y = initial + t*trend
    end
end

data.y = exp.(data.log_y)

# Compute growth rates (difference in logs)
data.growth_rate = vcat(missing, diff(data.log_y))
sameid = vcat(true, data.group[1:(end-1)] .== data.group[2:end])
data.growth_rate[.!sameid] .= missing

plot_trends(data)

Parallel Trends in Growth Rates

Code

using DataFrames
import Plots

# Parameters
n_periods = 10
group_names = ["Control", "Treatment"]
initials = [20.0, 19.0]
growth = [1.10, 1.07]
growth_trend = -0.004

data = DataFrame(group=String[], time=Int[], y=Float64[])
for (g, initial) in enumerate(initials)
    y = initial
    for t in 0:n_periods-1
        push!(data, (group_names[g], t, y))
        y *= growth[g] + growth_trend * t
    end
end

data.log_y = log.(data.y)

# Compute growth rates (difference in logs)
data.growth_rate = vcat(missing, diff(data.log_y))
sameid = vcat(true, data.group[1:(end-1)] .== data.group[2:end])
data.growth_rate[.!sameid] .= missing

plot_trends(data)

Estimation

Plugin: \[ \widehat{ATT} = \frac{ \sum_{i=1}^n (y_{i1} - y_{i0})D_{i1}(1-D_{i0})}{\sum_{i=1}^n D_{i1}(1-D_{i0})} - \frac{ \sum_{i=1}^n (y_{i1} - y_{i0})(1-D_{i1})(1-D_{i0})}{\sum_{i=1}^n (1-D_{i1})(1-D_{i0})} \]
Regression: \[ y_{it} = \delta_t + \alpha 1\{D_{i1}=1\} + \beta D_{it} + \epsilon_{it} \] then \(\hat{\beta} = \widehat{ATT}\)
Fixed effects: \[ y_{it} = \delta_t + \alpha_i + \beta D_{it} + u_{it} \] then \(\hat{\beta} = \widehat{ATT}\)
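
The three estimators can be checked against each other on simulated data. The sketch below uses only `LinearAlgebra` and `Statistics`; the data generating process is an arbitrary assumption, and the fixed effects regression is computed through its first-difference form, which is equivalent when there are two periods.

```julia
using LinearAlgebra, Statistics

n = 200
d = (1:n) .<= 80                            # D_i1: first 80 units treated in period 1
α = randn(n) .+ d                           # unit effects, correlated with treatment
y0 = α .+ randn(n)                          # period 0 outcomes
y1 = α .+ 1.0 .+ 2.0 .* d .+ randn(n)       # period 1: common trend 1, effect 2

# Plugin estimator
Δ = y1 .- y0
att_plugin = mean(Δ[d]) - mean(Δ[.!d])

# Pooled regression on a constant, period dummy, group dummy, and D_it
y = vcat(y0, y1)
post = vcat(falses(n), trues(n))
group = vcat(d, d)
X = hcat(ones(2n), post, group, post .& group)
att_reg = (X \ y)[4]

# Fixed effects with T = 2: regress Δy_i on a constant and D_i1
att_fe = (hcat(ones(n), d) \ Δ)[2]

println((att_plugin, att_reg, att_fe))
```

All three numbers agree up to floating point error because each regression is saturated in group and period.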

Multiple Periods

Identification

Same logic as before, \[ \begin{align*} ATT_{t,t-s} & = \Er[y_{it}(1) - \color{red}{y_{it}(0)} | D_{it} = 1, D_{it-s}=0] \\ & = \Er[y_{it}(1) - y_{it-s}(0) | D_{it} = 1, D_{it-s}=0] \\ & \;\; - \Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] \end{align*} \]
- assume \(\Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] = \Er[y_{it}(0) - y_{it-s}(0) | D_{it}=0, D_{it-s}=0]\)

\[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \]
- Similarly, can identify various other interpretable average treatment effects conditional on being treated at some times and not others

Estimation

Plugin
Fixed effects? \[ y_{it} = \beta D_{it} + \alpha_i + \delta_t + \epsilon_{it} \] When will \(\hat{\beta}^{FE}\) consistently estimate some interpretable conditional average of treatment effects?

Fixed Effects

As on problem set 6, \[ \begin{align*} \hat{\beta} = & \sum_{i=1,t=1}^{n,T} y_{it} \overbrace{\frac{\tilde{D}_{it}}{ \sum_{i,t} \tilde{D}_{it}^2 }}^{\hat{\omega}_{it}(D_{it})} \\ = & \sum_{i=1,t=1}^{n,T} y_{it}(0) \hat{\omega}_{it}(D_{it}) + \sum_{i=1,t=1}^{n,T} D_{it} (y_{it}(1) - y_{it}(0)) \hat{\omega}_{it}(D_{it}) \end{align*} \] where \[ \begin{align*} \tilde{D}_{it} & = D_{it} - \frac{1}{n} \sum_{j=1}^n (D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{js}) - \frac{1}{T} \sum_{s=1}^T D_{is} \\ & = D_{it} - \frac{1}{n} \sum_{j=1}^n D_{jt} - \frac{1}{T} \sum_{s=1}^T D_{is} + \frac{1}{nT} \sum_{j,s} D_{js} \end{align*} \]
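
Two properties of these weights help in reading the decomposition, and both can be checked numerically. In the sketch below the design (`first_treated`) and the additive untreated outcomes are hypothetical. The weights sum to zero within each unit and each period, so additive unit and time effects in \(y_{it}(0)\) contribute nothing to the first sum, and the weights on treated observations sum to one.

```julia
using Statistics

# Two-way fixed effects weights: double-demeaned treatment over its sum of squares
function twfe_weights(D)
    Dtilde = D .- mean(D, dims=1) .- mean(D, dims=2) .+ mean(D)
    return Dtilde ./ sum(abs2, Dtilde)
end

# Hypothetical staggered design: first treated period by unit (6 means never treated)
n, T = 6, 5
first_treated = [2, 3, 3, 5, 6, 6]
D = [t >= first_treated[i] for i in 1:n, t in 1:T]
ω = twfe_weights(D)

# Weights sum to zero over periods within a unit and over units within a period
unit_sums = sum(ω, dims=2)
period_sums = sum(ω, dims=1)

# Untreated outcomes with only additive unit and time effects drop out
y0 = randn(n) .+ randn(T)'
first_term = sum(y0 .* ω)

# Weights on treated observations sum to one
treated_total = sum(ω[D])

println((first_term = first_term, treated_total = treated_total))
```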

Simulation

\(T\) periods
Once \(i\) treated, remains treated

Weights

Code

using Statistics
function assigntreat(n,T;portiontreated=vcat(zeros(T ÷ 2), 0.5, zeros(T - (T ÷ 2) - 1)))
  treated = falses(n,T)
  for t=2:T
    treated[:,t] = treated[:,t-1]
    if (portiontreated[t]>0)
      treated[:,t] = (treated[:,t] .|| rand(n) .< portiontreated[t])
    end
  end
  return(treated)
end

function weights(D)
  D̃ = D .- mean(D,dims=1) .- mean(D,dims=2) .+ mean(D)
  ω = D̃./sum(D̃.^2)
end

n = 500
T = 9
D = assigntreat(n,T)
y = randn(n,T)
sum(y.*weights(D))

-0.079969454754528

Code

n,T = size(D)
using DataFrames, FixedEffectModels, RegressionTables
df = DataFrame(id = vec((1:n)*ones(Int,T)'), t = vec(ones(Int,n)*(1:T)'), y = vec(y), D=vec(D))
m=reg(df, @formula(y ~ D + fe(t) + fe(id)))
regtable(m, render=AsciiTable())


--------------------------
                      y   
--------------------------
D                   -0.080
                   (0.060)
--------------------------
t Fixed Effects        Yes
id Fixed Effects       Yes
--------------------------
N                    4,500
R2                   0.106
Within-R2            0.000
--------------------------

Portion Treated with Single Treatment Time

Code

using PlotlyLight

function plotp(D; width=900, height=300)
  n,T=size(D)
  plt=Plot()
  plt.layout=Config(xaxis=Config(title=Config(text="time"),tickvals=1:T),
                    yaxis=Config(title=Config(text="Portion Treated")),
                    autosize=false,
                    width=width,
                    height=height)
  plt(x=1:T,y=vec(mean(D,dims=1)))
  plt()
end

pfig = plotp(D)
pfig

Weights with Single Treatment Time

Code

import StatsPlots
function plotweights(D; width=900, height=300)
    n,T = size(D)
    ω = weights(D)
    groups = unique(eachrow(D))
    plt = Plot()
    plt.layout=Config(xaxis=Config(title=Config(text="time"),tickvals=1:T),
                      yaxis=Config(title=Config(text="weight")),
                      autosize=false,
                      width=width,
                      height=height)

    for g in groups
        i = findfirst(d == g for d in eachrow(D))
        wt = ω[i,:]
        plt(x=1:T,y=wt,name="Treated $(sum(g)) times", mode="markers",type="scatter")
    end
    fig=plt()

    evertreated = any(D, dims=2)*ones(Int,1,size(D,2)).==1
    aftertreatment = ones(Int,size(D,1))*any(D,dims=1).==1
    g=[(et ? "Treated," : "Control,") * (at ? "After" : "Before") for (et,at) in zip(evertreated, aftertreatment)]
    histo = StatsPlots.groupedhist(vec(ω),group=vec(g),
                                   title="Histogram of Weights",
                                   xlabel="Weight", ylabel="Frequency",
                                   bins=min(length(unique(ω)),20),
                                   size=(900,300), margin=5mm)
    Plots.vline!(histo, [0.0], line=(:black, :dash),label=:none, width=4)

    return(fig, histo)
end
fig,histo = plotweights(D)
fig

Weight Distribution with Single Treatment Time

Code

histo

Portion Treated with Uniform Treatment Time

Code

D = assigntreat(n,T,portiontreated=vcat(0,fill(0.5/(T-1),T-1)))
pfig = plotp(D)
pfig

Weights with Uniform Treatment Time

Code

fig,histo = plotweights(D)
fig

Distribution of Weights with Uniform Treatment Time

Code

histo

Portion Treated with Early and Late Treated

Code

pt = zeros(T)
pt[2] = 1/3
pt[end-1]=1/3
D = assigntreat(n,T,portiontreated=pt)
pfig = plotp(D)
pfig

Weights with Early and Late Treated

Code

fig,histo = plotweights(D)
fig

Distribution of Weights with Early and Late Treated

Code

histo

Sign Reversal with Fixed Effects

True Treatment Effects

Code

pt = zeros(T)
pt[2] = 1/3
pt[end-1]=1/3

function simulate(n,T,portiontreated, ATT, σ=0.25)
  D = assigntreat(n,T,portiontreated=portiontreated)
  y = randn(n,T)*σ
  for i in axes(y)[1]
    timetreated=cumsum(D[i,:])
    y[i,:] .+= (tt>0 ? ATT[tt] : 0.0 for tt in timetreated)
  end
  DataFrame(id = vec((1:n)*ones(Int,T)'), t = vec(ones(Int,n)*(1:T)'), y = vec(y), D=vec(D))
end

ATT =  vcat(ones(T-3),10*ones(3))
df = simulate(n,T, pt,ATT,1.0)

function plotGAT(ATT,D; width=900, height=300)
  n,T = size(D)
  groups = unique(eachrow(D))
  plt = Plot()
  plt.layout=Config(xaxis=Config(title=Config(text="time"),tickvals=1:T),
                    yaxis=Config(title=Config(text="ATT")),
                    autosize=false,
                    width=width,
                    height=height)

  for g in groups
    t = findfirst(g)
    if (isnothing(t))
      t=T+1
    end
    ate = vcat(zeros(t-1), ATT[1:(T-t+1)])
    plt(x=1:T,y=ate,name="Treated $(sum(g)) times", mode="markers",type="scatter")
  end
  fig=plt()
  return(fig)
end

plotGAT(ATT,D)

Sign Reversal with Fixed Effects

Fixed Effects Estimate

m=reg(df, @formula(y ~ D + fe(t) + fe(id)))
regtable(m, render=AsciiTable())


----------------------------
                       y    
----------------------------
D                  -0.534***
                     (0.142)
----------------------------
t Fixed Effects          Yes
id Fixed Effects         Yes
----------------------------
N                      4,500
R2                     0.488
Within-R2              0.004
----------------------------

When to worry

If multiple treatment times and treatment heterogeneity
Even if weights do not have wrong sign, the fixed effects estimate is hard to interpret
Same logic applies more generally – not just to time
- E.g. with group effects, treated units in several groups, and \(\Er[y(1) - y(0) | \mathrm{group}]\) varying across groups
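
The sign reversal can be reproduced without any noise. The sketch below keeps the early and late cohort structure and the effect profile used in the simulation, but assumes one unit per cohort (equal cohort sizes) and \(y_{it}(0) = 0\); both are simplifications for illustration.

```julia
using Statistics

# One unit per cohort: never treated, treated from t = 8, treated from t = 2
T = 9
D = falses(3, T)
D[2, 8:T] .= true
D[3, 2:T] .= true

# Effect by periods of exposure: 1 for the first six treated periods, 10 afterwards
effect = vcat(ones(T - 3), 10 * ones(3))
exposure = cumsum(D, dims=2)
y = [D[i, t] ? effect[exposure[i, t]] : 0.0 for i in 1:3, t in 1:T]   # y(0) = 0, no noise

# Fixed effects weights and the implied coefficient
Dtilde = D .- mean(D, dims=1) .- mean(D, dims=2) .+ mean(D)
ω = Dtilde ./ sum(abs2, Dtilde)
β = sum(y .* ω)

println("weights for early cohort: ", round.(ω[3, :], digits=3))
println("fixed effects coefficient: ", β)
```

The early cohort receives negative weight in periods 8 and 9, exactly where its treatment effect is largest, so the coefficient is negative even though every treatment effect is at least 1.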

What to Do?

Plug-in Estimator

Follow identification \[ \begin{align*} ATT_{t,t-s}& = \Er[y_{it} - y_{it-s} | D_{it}=1, D_{it-s}=0] - \Er[y_{it} - y_{it-s} | D_{it}=0, D_{it-s}=0] \end{align*} \] and estimate \[ \begin{align*} \widehat{ATT}_{t,t-s} = & \frac{\sum_i (y_{it} - y_{it-s}) D_{it}(1-D_{it-s})}{\sum_i D_{it}(1-D_{it-s})} \\ & - \frac{\sum_i (y_{it} - y_{it-s}) (1-D_{it})(1-D_{it-s})}{\sum_i (1-D_{it})(1-D_{it-s})} \end{align*} \] and perhaps some average, e.g. (there are other reasonable weighted averages) \[ \sum_{t=2}^T \frac{\sum_i D_{it}}{\sum_{i,r} D_{ir}} \frac{1}{t-1} \sum_{s=1}^{t-1} \widehat{ATT}_{t,t-s} \]
- Inference? Optimal?
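
A minimal sketch of the plug-in estimator for one pair \((t, t-s)\), taking an \(n \times T\) outcome matrix and a Boolean treatment matrix. The function name `att_plugin` and the noise-free example data (unit levels, a common linear trend, and the effect profile from the simulation) are assumptions for illustration.

```julia
using Statistics

# Plug-in ATT_{t,t-s}: units treated at t but not at t-s versus units untreated at both
function att_plugin(y, D, t, s)
    switchers = D[:, t] .& (.!D[:, t-s])
    controls = (.!D[:, t]) .& (.!D[:, t-s])
    Δ = y[:, t] .- y[:, t-s]
    return mean(Δ[switchers]) - mean(Δ[controls])
end

# Noise-free illustration with hypothetical unit effects and a common time trend
T = 9
D = falses(3, T)
D[2, 8:T] .= true                       # treated from t = 8
D[3, 2:T] .= true                       # treated from t = 2
effect = vcat(ones(T - 3), 10 * ones(3))
exposure = cumsum(D, dims=2)
τ = [D[i, t] ? effect[exposure[i, t]] : 0.0 for i in 1:3, t in 1:T]
y = [1.0, 2.0, 3.0] .+ 0.5 .* (1:T)' .+ τ

println(att_plugin(y, D, 8, 1))   # cohort first treated at 8 versus never treated
println(att_plugin(y, D, 8, 7))   # everyone treated by 8 versus never treated
```

The second call averages over both treated cohorts, so the choice of \(s\) changes which units enter the estimate as well as the comparison period.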

What to Do? Flexibly Model Conditional Expectation

Problem is possible correlation of \((y_{it}(1) - y_{it}(0))D_{it}\) with \(\tilde{D}_{it}\)
- \(\tilde{D}_{it}\) is function of \(t\) and \((D_{i1}, ..., D_{iT})\)
- Estimating separate coefficient for each combination of \(t\) and \((D_{i1}, ..., D_{iT})\) will eliminate correlation / flexibly model treatment effect heterogeneity

Cohorts

Cohorts = unique sequences of \((D_{i1}, ..., D_{iT})\)
- In last simulated example, three cohorts
  1. \((0, 0, 0, 0, 0, 0, 0, 0, 0)\)
  2. \((0, 0, 0, 0, 0, 0, 0, 1, 1)\)
  3. \((0, 1, 1, 1, 1, 1, 1, 1, 1)\)
Flexible conditional expectation \[ \Er[y_{it}|t, D_{i1}=d_1, ..., D_{iT}=d_T] = \sum_{c, s} \beta_{c,s} 1\{\mathrm{cohort}(i)=c, s=t\} \]

Code

using CategoricalArrays

function createCohorts(df)
  n = length(unique(df.id))
  T = length(unique(df.t))
  sorted = sort(df, [:id, :t])
  D = reshape(sorted.D, T,n)'
  groups = sort(unique(eachrow(D)))
  cohorts = [findfirst(d == g for g in groups) for d in eachrow(D)]
  df=leftjoin(sorted, DataFrame(cohort=categorical(cohorts), id=unique(sorted.id)), on=:id)
  df.DCt .= "untreated"
  for r in 1:T
    for c in unique(df.cohort)
      dct = (df.t .== r) .& (df.cohort .== c) .& df.D
      if (any(dct))
        df.DCt[dct] .= "c$(c),t$(r)"
        df[!,"Dc$(c)t$(r)"] .= false
        df[!,"Dc$(c)t$(r)"][dct] .= true
      end
    end
  end
  df.ct = categorical(df.t)
  df
end

dfc = createCohorts(df);

Regression with Cohort Interactions

Code

m=reg(dfc, @formula(y ~ -1 + cohort&ct), Vcov.cluster(:id))
regtable(m, render=AsciiTable())


-----------------------------
                        y    
-----------------------------
cohort: 1 & ct: 1      -0.080
                      (0.073)
cohort: 2 & ct: 1      -0.048
                      (0.102)
cohort: 3 & ct: 1       0.045
                      (0.077)
cohort: 1 & ct: 2      -0.054
                      (0.063)
cohort: 2 & ct: 2      -0.086
                      (0.092)
cohort: 3 & ct: 2    0.980***
                      (0.080)
cohort: 1 & ct: 3       0.041
                      (0.075)
cohort: 2 & ct: 3     -0.224*
                      (0.090)
cohort: 3 & ct: 3    0.922***
                      (0.074)
cohort: 1 & ct: 4       0.047
                      (0.069)
cohort: 2 & ct: 4      -0.058
                      (0.093)
cohort: 3 & ct: 4    0.949***
                      (0.074)
cohort: 1 & ct: 5       0.026
                      (0.068)
cohort: 2 & ct: 5       0.102
                      (0.109)
cohort: 3 & ct: 5    0.995***
                      (0.085)
cohort: 1 & ct: 6       0.085
                      (0.067)
cohort: 2 & ct: 6       0.107
                      (0.091)
cohort: 3 & ct: 6    1.066***
                      (0.083)
cohort: 1 & ct: 7       0.016
                      (0.068)
cohort: 2 & ct: 7       0.002
                      (0.094)
cohort: 3 & ct: 7    0.957***
                      (0.074)
cohort: 1 & ct: 8      -0.042
                      (0.066)
cohort: 2 & ct: 8    0.990***
                      (0.096)
cohort: 3 & ct: 8   10.107***
                      (0.077)
cohort: 1 & ct: 9       0.032
                      (0.067)
cohort: 2 & ct: 9    0.898***
                      (0.088)
cohort: 3 & ct: 9    9.914***
                      (0.084)
-----------------------------
N                       4,500
R2                      0.882
-----------------------------

Regression with Cohort Interactions: \(\hat{\Er}[y|\mathrm{cohort}, t]\)

Code

using Plots, LinearAlgebra
function plotEy(m)
    rms = match.(r"cohort: (\d+) & ct: (\d+)", coefnames(m))
    ct = [parse.(Int, r.captures) for r in rms]
    cohort = [c[1] for c in ct]
    time = [c[2] for c in ct]
    ey = coef(m)
    se = sqrt.(diag(vcov(m)))
    fig=Plots.plot(time, ey, ribbon=1.96*se, group=cohort, xlabel="time", ylabel="E[y|cohort,t]", legend=:topleft)
end
plotEy(m)

Differences in which Difference?

Time:
- For cohort 3, only one untreated period, so only possible difference across time is \(t\) versus \(1\)
- For cohort 2, many differences across time possible because many untreated periods
- Typical to report differences between \(t\) and last period before treatment in “event study” figure
Groups:
- For ATT for cohort 3 at time 2, should cohort 1 or cohort 2 or both be used as controls?

Event Study

Code

function ploteventstudy(m,dfc)
    rms = match.(r"cohort: (\d+) & ct: (\d+)", coefnames(m))
    ct = [parse.(Int, r.captures) for r in rms]
    cohort = [c[1] for c in ct]
    time = [c[2] for c in ct]
    ey = coef(m)
    ttdf = combine(groupby(dfc,:cohort),[:t,:D] => ((t,d)->(any(d) ? minimum(t[d]) : missing)) => :treattime)
    ttdict = Dict(ttdf.cohort .=> ttdf.treattime)
    treattime= [ttdict[c] for c in cohort]
    rtt = time .- treattime
    controlgroup = cohort[ismissing.(treattime)][1]
    ATT = zeros(length(ey))
    se = zeros(length(ey))
    for (i,(c,t)) in enumerate(zip(cohort, time))
        if c == controlgroup
            continue
        end
        baset = ttdict[c] - 1
        DiD = 1*((cohort .== c) .& (time .== t)) - 1*((cohort .== controlgroup) .& (time .== t)) -
            1*((cohort .== c) .& (time .== baset)) + 1*((cohort .== controlgroup) .& (time .== baset))
        ATT[i] = DiD'*ey
        se[i] = sqrt(DiD'*vcov(m)*DiD)
    end
    rtt = rtt[cohort.!=controlgroup]
    se = se[cohort.!=controlgroup]
    ATT = ATT[cohort.!=controlgroup]
    fig=Plots.plot(rtt, ATT, ribbon=1.96*se, xlabel="time relative to treatment", ylabel="E[y_t-y_{-1}|cohort]-E[y_t-y_{-1}|nevertreated] ", group=cohort[cohort.!=controlgroup], legend=true)
end
ploteventstudy(m,dfc)

Regression with Cohort Interactions

Code

m=reg(dfc, @formula(y ~ DCt + fe(id) + fe(t)), Vcov.cluster(:id), contrasts=Dict(:DCt=>DummyCoding(base="untreated")))
regtable(m, render=AsciiTable())


----------------------------
                       y    
----------------------------
DCt: c2,t8          1.073***
                     (0.121)
DCt: c2,t9          0.906***
                     (0.115)
DCt: c3,t2          0.931***
                     (0.131)
DCt: c3,t3          0.860***
                     (0.136)
DCt: c3,t4          0.826***
                     (0.133)
DCt: c3,t5          0.828***
                     (0.137)
DCt: c3,t6          0.860***
                     (0.140)
DCt: c3,t7          0.832***
                     (0.139)
DCt: c3,t8         10.050***
                     (0.141)
DCt: c3,t9          9.783***
                     (0.141)
----------------------------
id Fixed Effects         Yes
t Fixed Effects          Yes
----------------------------
N                      4,500
R2                     0.881
Within-R2              0.770
----------------------------

What to Do?

Understand existing methods: read the reviews by de Chaisemartin and D’Haultfœuille (2022), Roth et al. (2023), de Chaisemartin and D’Haultfœuille (2023), and Wing et al. (2024)
Use an appropriate package:
- partial list on https://asjadnaqvi.github.io/DiD/

Pre-trends

Parallel trends assumption

\[ \Er[\color{red}{y_{it}(0)} - y_{it-s}(0) | D_{it}=1, D_{it-s}=0] = \Er[y_{it}(0) - y_{it-s}(0) | D_{it}=0, D_{it-s}=0] \]

More plausible if there are parallel pre-trends

\[ \begin{align*} & \Er[y_{it-r}(0) - y_{it-s}(0) | D_{it}=1, D_{it-r}=0, D_{it-s}=0] = \\ & = \Er[y_{it-r}(0) - y_{it-s}(0) | D_{it}=0, D_{it-r}=0, D_{it-s}=0] \end{align*} \]

Always at least plot pre-trends

Testing for Pre-trends

Is it a good idea to test

\[ \begin{align*} H_0 : & \Er[y_{it-r} - y_{it-s} | D_{it}=1, D_{it-r}=0, D_{it-s}=0] = \\ & = \Er[y_{it-r} - y_{it-s} | D_{it}=0, D_{it-r}=0, D_{it-s}=0]? \end{align*} \]
- Even if not testing formally, we do it informally by plotting

Testing for Pre-trends

Assume: \((y_{i1}, ..., y_{iT}, D_{i1},..., D_{iT})\) i.i.d. across \(i\) with finite second moments
Let \[ \tau^{1t}_{r,s} = \Er[y_{ir} - y_{is}|D_{it}=1, D_{ir}=0, D_{is}=0] \] \[ \tau^{0t}_{r,s} = \Er[y_{ir} - y_{is}|D_{it}=0, D_{ir}=0, D_{is}=0] \]
Plugin estimators \[ \hat{\tau}^{1t}_{r,s} = \frac{\sum_i (y_{ir} - y_{is}) D_{it}(1-D_{ir})(1-D_{is})} {\sum_i D_{it}(1-D_{ir})(1-D_{is})} \] \[ \hat{\tau}^{0t}_{r,s} = \frac{\sum_i (y_{ir} - y_{is}) (1-D_{it})(1-D_{ir})(1-D_{is})} {\sum_i (1-D_{it})(1-D_{ir})(1-D_{is})} \]
- When \(r = t\), omit the condition \(D_{ir}=0\) and the factor \((1-D_{ir})\)

Testing for Pre-trends

Note: \[ \begin{align*} \hat{\tau}^{1t}_{r,s} - \tau^{1t}_{r,s} = & \frac{\frac{1}{n}\sum_i (y_{ir} - y_{is}) D_{it}(1-D_{ir})(1-D_{is}) - \Er[(y_{ir} - y_{is}) D_{it}(1-D_{ir})(1-D_{is})]}{\Er[D_{it}(1-D_{ir})(1-D_{is})]} \\ & - \frac{\Er[(y_{ir} - y_{is}) D_{it}(1-D_{ir})(1-D_{is})]}{\Er[D_{it}(1-D_{ir})(1-D_{is})]^2} \left(\frac{1}{n} \sum_i D_{it}(1-D_{ir})(1-D_{is}) - \Er[D_{it}(1-D_{ir})(1-D_{is})] \right) + o_p(n^{-1/2}) \end{align*} \] and similar expression for \(\hat{\tau}^{0t}_{r,s}\)
Use CLT to get joint asymptotic distribution of pre-trends and \(ATT_{t,t-s}\)
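
A sketch of how the linearization turns into a standard error, taking \(\tau\) to be the cell mean of \(y_{ir} - y_{is}\) as in the pre-trend null hypothesis. Collecting the two terms of the expansion gives the influence function \(\psi_i = w_i (\Delta_i - \tau) / \Er[w_i]\), where \(w_i\) is the cell indicator and \(\Delta_i = y_{ir} - y_{is}\). The function names and the simulated data are assumptions for illustration.

```julia
using Statistics

# Plug-in estimate of E[y_r - y_s | D_t = g, D_r = 0, D_s = 0] and its
# estimated influence function ψ_i = w_i (Δ_i - τ) / mean(w)
function tau_hat(y, D, g, t, r, s)
    w = (D[:, t] .== g) .& (.!D[:, r]) .& (.!D[:, s])
    Δ = y[:, r] .- y[:, s]
    τ = sum(Δ .* w) / sum(w)
    ψ = w .* (Δ .- τ) ./ mean(w)
    return τ, ψ
end

# Pre-trend difference and a standard error from the sample variance of ψ
function pretrend(y, D, t, r, s)
    τ1, ψ1 = tau_hat(y, D, true, t, r, s)
    τ0, ψ0 = tau_hat(y, D, false, t, r, s)
    se = std(ψ1 .- ψ0) / sqrt(size(y, 1))
    return τ1 - τ0, se
end

# Hypothetical data: T = 3, treatment only in period 3, parallel trends holds
n = 1000
D = falses(n, 3)
D[1:400, 3] .= true
y = randn(n) .+ [0.0 1.0 2.5] .+ 2.0 .* D .+ randn(n, 3)

est, se = pretrend(y, D, 3, 2, 1)
println("pre-trend difference: ", est, " (", se, ")")
```

Stacking the influence functions of the pre-trend and of \(\widehat{ATT}\) and taking their sample covariance would give an estimate of the joint asymptotic variance in the same way.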

Testing for Pre-trends

E.g. \(T=3\), \(D_{i3}=1\) for some \(i\), other \(D_{it}=0\) \[ \begin{align*} \sqrt{n} \left[ \begin{pmatrix} \hat{\tau}^{13}_{2,1} - \hat{\tau}^{03}_{2,1} \\ \underbrace{\hat{\tau}^{13}_{3,2} - \hat{\tau}^{03}_{3,2}}_{\widehat{ATT}_{3,2}} \end{pmatrix} - \begin{pmatrix} \tau^{13}_{2,1} - \tau^{03}_{2,1} \\ ATT_{3,2} \end{pmatrix} \right] \indist N(0, \Sigma) \end{align*} \]
Distribution of \(\widehat{ATT}_{3,2}\) conditional on failing to reject \(\tau^{13}_{2,1} - \tau^{03}_{2,1} = 0\) is a truncated normal
Roth (2022): the test can have low power, and under plausible violations of parallel trends, \(\widehat{ATT}_{3,2}\) conditional on failing to reject is biased

Bounds from Pre-trends

Let \(\Delta\) be violation of parallel trends \[ \Delta = \Er[\color{red}{y_{it}(0)} - y_{it-1}(0) | D_{it}=1, D_{it-1}=0] - \Er[y_{it}(0) - y_{it-1}(0) | D_{it}=0, D_{it-1}=0] \]
Assume \(\Delta\) is bounded by the largest deviation of pre-trends from parallel \[ |\Delta| \leq M \max_{r} \left\vert \tau^{1t}_{t-r,t-r-1} - \tau^{0t}_{t-r,t-r-1} \right\vert \] for some chosen \(M\)
See Rambachan and Roth (2023)
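
Since the difference in differences estimand equals \(ATT + \Delta\), the bound on \(|\Delta|\) translates into an interval for the ATT. The sketch below computes only this plug-in interval and ignores sampling uncertainty; it is not the inference procedure of Rambachan and Roth (2023). The function name and the numbers are illustrative assumptions.

```julia
# Plug-in bounds on the ATT when |Δ| ≤ M × (largest pre-period deviation)
function att_bounds(did, pretrend_gaps, M)
    halfwidth = M * maximum(abs, pretrend_gaps)
    return (did - halfwidth, did + halfwidth)
end

# Illustrative numbers: DiD estimate 2.0, pre-period gaps 0.25 and -0.5
for M in (0.0, 1.0, 2.0)
    println("M = ", M, ": ", att_bounds(2.0, [0.25, -0.5], M))
end
```

Setting \(M = 0\) imposes exact parallel trends and returns the point estimate; larger \(M\) widens the interval in proportion to the worst pre-period gap.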

Reading

Book: de Chaisemartin and D’Haultfœuille (2023)
Recent reviews: Roth et al. (2023), de Chaisemartin and D’Haultfœuille (2022), Arkhangelsky and Imbens (2023), Wing et al. (2024)
Early work pointing to problems with fixed effects:
- Laporte and Windmeijer (2005), Wooldridge (2005)
Explosion of papers written just before 2020, published just after:
- Borusyak and Jaravel (2018)
- de Chaisemartin and D’Haultfœuille (2020)
- Callaway and Sant’Anna (2021)
- Goodman-Bacon (2021)
- Sun and Abraham (2021)

References

Arkhangelsky, Dmitry, and Guido Imbens. 2023. “Causal Models for Longitudinal and Panel Data: A Survey.”

Borusyak, Kirill, and Xavier Jaravel. 2018. “Revisiting Event Study Designs.” https://scholar.harvard.edu/files/borusyak/files/borusyak_jaravel_event_studies.pdf.

Callaway, Brantly, and Pedro H. C. Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230. https://doi.org/10.1016/j.jeconom.2020.12.001.

Chaisemartin, Clément de, and Xavier D’Haultfœuille. 2023. Credible Answers to Hard Questions: Differences-in-Differences for Natural Experiments. https://dx.doi.org/10.2139/ssrn.4487202.

Chaisemartin, Clément de, and Xavier D’Haultfœuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96. https://doi.org/10.1257/aer.20181169.

———. 2022. “Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey.” The Econometrics Journal 26 (3): C1–30. https://doi.org/10.1093/ectj/utac017.

Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77. https://doi.org/10.1016/j.jeconom.2021.03.014.

Laporte, Audrey, and Frank Windmeijer. 2005. “Estimation of Panel Data Models with Binary Indicators When Treatment Effects Are Not Constant over Time.” Economics Letters 88 (3): 389–96. https://doi.org/10.1016/j.econlet.2005.04.002.

Rambachan, Ashesh, and Jonathan Roth. 2023. “A More Credible Approach to Parallel Trends.” The Review of Economic Studies 90 (5): 2555–91. https://doi.org/10.1093/restud/rdad018.

Roth, Jonathan. 2022. “Pretest with Caution: Event-Study Estimates After Testing for Parallel Trends.” American Economic Review: Insights 4 (3): 305–22. https://doi.org/10.1257/aeri.20210236.

Roth, Jonathan, Pedro H. C. Sant’Anna, Alyssa Bilinski, and John Poe. 2023. “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics 235 (2): 2218–44. https://doi.org/10.1016/j.jeconom.2023.03.008.

Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99. https://doi.org/10.1016/j.jeconom.2020.09.006.

Wing, Coady, Madeline Yozwiak, Alex Hollingsworth, Seth Freedman, and Kosali Simon. 2024. “Designing Difference-in-Difference Studies with Staggered Treatment Adoption: Key Concepts and Practical Guidelines.” Annual Review of Public Health 45: 485–505. https://doi.org/10.1146/annurev-publhealth-061022-050825.

Wooldridge, Jeffrey M. 2005. “Fixed-Effects and Related Estimators for Correlated Random-Coefficient and Treatment-Effect Panel Data Models.” The Review of Economics and Statistics 87 (2): 385–90. https://doi.org/10.1162/0034653053970320.