Instrumental Variables Estimation

Paul Schrimpf

2025-11-24

Reading

  • Required: Song (2021) chapter 12

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \def\indep{{\perp\!\!\!\perp}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Instrumental Variables

Model

\[ Y_i = \underbrace{X_i}_{\in \R^k}' \beta_0 + u_i \]

  • \(\Er[u_i] = 0\), but \(\Er[X_i u_i] \neq 0\)

  • Instrument \(Z_i \in \R^d\) s.t.

    1. Relevant: \(\rank(\Er[Z_i X_i']) = k\)

    2. Exogenous: \(\Er[Z_i u_i] = 0\)

Identification

  • Exogeneity implies \[ \Er[Z_i Y_i] = \Er[Z_i X_i']\beta_0 \]
  • If \(d=k\) (exactly identified), then relevance implies \(\Er[Z_i X_i']\) invertible, so \[ \beta_0 = \Er[Z_i X_i']^{-1} \Er[Z_i Y_i] \]
  • For \(d>k\), relevance implies \(\Er[Z_iX_i']'\Er[Z_iX_i']\) invertible, so \[ \beta_0 = (\Er[Z_i X_i']' \Er[Z_i X_i'])^{-1} \Er[Z_i X_i']' \Er[Z_i Y_i] \] (a sample-analogue sketch follows this list)
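As a quick illustration of the exactly identified case, here is a sample-analogue sketch in Julia (the data-generating process is invented for illustration):

using LinearAlgebra

n = 10_000
Z = randn(n, 2)                            # instruments, d = k = 2
u = randn(n)
X = Z * [1.0 0.3; 0.3 1.0] + 0.5 * [u u]   # endogenous: X correlated with u
beta0 = [1.0, -2.0]
y = X * beta0 + u

# sample analogue of E[Z_i X_i']^{-1} E[Z_i Y_i]; the 1/n factors cancel
betahat = (Z' * X) \ (Z' * y)              # ≈ beta0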

Estimation

Method of Moments Estimation

  • We assume \(\Er[Z_i u_i] = 0\), so \[ \Er[Z_i(Y_i - X_i'\beta_0)] = 0 \]
  • Estimate by replacing \(\Er\) with \(\frac{1}{n}\sum_{i=1}^n\)

Method of Moments Estimation

  • \(d\) equations, \(k\) unknowns with \(d \geq k\), so find \[ \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\hat{\beta}^{IV}) \approx 0 \] by solving \[ \begin{align*} \hat{\beta}^{IV} & = \mathrm{arg}\min_\beta \norm{ \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta) }_{W}^2 \\ & = \mathrm{arg}\min_\beta \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta) \right)' W \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right) \end{align*} \]

Method of Moments Estimation

\[ \hat{\beta}^{IV} = \mathrm{arg}\min_\beta \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right)' W \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right) \]

  • \(\hat{\beta}^{IV}_W = (X'Z W Z'X)^{-1}(X'Z W Z'y)\) (derivation sketched below)
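Filling in the step from the objective to this closed form (added for completeness): the first-order condition of the quadratic objective is \[ 0 = \frac{\partial}{\partial \beta} \left( \frac{1}{n}(y - X\beta)'Z \, W \, \frac{1}{n}Z'(y - X\beta) \right) = -\frac{2}{n^2} X'Z W Z'(y - X\beta), \] so \(X'Z W Z'X \hat{\beta}^{IV}_W = X'Z W Z'y\), which gives the formula above.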

Asymptotic Properties

Consistency

\[ \begin{align*} \hat{\beta}^{IV}_W - \beta_0 = & (X'Z W Z'X)^{-1}(X'Z W Z'u) \\ = & \left[ \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\right) \right]^{-1} \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i u_i\right) \end{align*} \]

  • Consistent if LLN applies to \(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\) and \(\frac{1}{n}\sum_{i=1}^n Z_i u_i\)
    • E.g. if i.i.d. with \(\Er[\norm{X_i}^4]\) and \(\Er[\norm{Z_i}^4]\) finite and \(\Er[u_i^2|Z_i=z] = \sigma^2\)

Asymptotic Normality

\[ \begin{align*} \hat{\beta}^{IV}_W - \beta_0 = & (X'Z W Z'X)^{-1}(X'Z W Z'u) \\ = & \left[ \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\right) \right]^{-1} \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i u_i\right) \end{align*} \]

  • \(\sqrt{n}(\hat{\beta}^{IV} - \beta_0) \indist N(0, V)\) if LLN applies to \(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\) and CLT to \(\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i u_i\)
    • E.g. if i.i.d. with \(\Er[\norm{X_i}^4]\) and \(\Er[\norm{Z_i}^4]\) finite and \(\Er[u_i^2|Z_i=z] = \sigma^2\)
    • then \(\frac{1}{\sqrt{n}} \sum Z_i u_i \indist N(0, \sigma^2 \Er[Z_iZ_i'])\)
    • \(V = \sigma^2 (\Er[Z_iX_i']' W \Er[Z_iX_i'])^{-1} (\Er[Z_iX_i']' W \Er[Z_i Z_i'] W \Er[Z_i X_i']) (\Er[Z_iX_i']' W \Er[Z_iX_i'])^{-1}\) (a plug-in estimator is sketched below)
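A plug-in estimate of \(V\) replaces the population moments with sample averages and \(\sigma^2\) with \(\frac{1}{n}\sum_i \hat{u}_i^2\). A minimal sketch (the function name avar_iv is mine, not from the text):

using LinearAlgebra

# plug-in sandwich estimate of V for the homoskedastic case
function avar_iv(y, X, Z, W, betahat)
    n = length(y)
    uhat = y - X * betahat
    s2 = sum(abs2, uhat) / n      # estimate of σ²
    Qzx = Z' * X / n              # estimate of E[Z_i X_i']
    Qzz = Z' * Z / n              # estimate of E[Z_i Z_i']
    A = inv(Qzx' * W * Qzx)
    s2 * A * (Qzx' * W * Qzz * W * Qzx) * A
end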

Optimal \(W\)

Theorem 2.1

\(W^* = \Er[Z_iZ_i']^{-1}\) minimizes the asymptotic variance of \(\hat{\beta}^{IV}_W\) (under homoskedasticity, \(\Er[u_i^2|Z_i] = \sigma^2\), as assumed above)

  • Estimate \(\hat{W}^* = \left(\frac{1}{n} Z'Z\right)^{-1}\) \[ \hat{\beta}^{IV} = (X'Z (Z'Z)^{-1} Z' X)^{-1} (X'Z(Z'Z)^{-1}Z'y) \]

Two Stage Least Squares

\[ \begin{align*} \hat{\beta}^{IV} & = (X'Z (Z'Z)^{-1} Z' X)^{-1} (X'Z(Z'Z)^{-1}Z'y) \\ & = (X'P_Z X)^{-1} (X' P_Z y) \\ & = ((P_Z X)'(P_Z X))^{-1} ((P_Z X)'y) \end{align*} \]

  1. Regress \(X\) on \(Z\), let \(\hat{X} = P_Z X\)
  2. Regress \(y\) on \(\hat{X}\) (a numerical check of the equivalence follows)
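A quick numerical check that the two-step procedure reproduces the one-shot formula (invented DGP with one endogenous regressor):

using LinearAlgebra

n, d = 1_000, 3
Z = randn(n, d)
u = randn(n)
X = Z * ones(d) + 0.5 * u + randn(n)   # one endogenous regressor
y = X + u                              # true β = 1

# one-shot formula
b1 = (X' * Z * ((Z' * Z) \ (Z' * X))) \ (X' * Z * ((Z' * Z) \ (Z' * y)))
# two stages: Xhat = P_Z X, then regress y on Xhat
Xhat = Z * ((Z' * Z) \ (Z' * X))
b2 = (Xhat' * Xhat) \ (Xhat' * y)
b1 ≈ b2                                # true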

Testing Overidentifying Restrictions

  • \(H_0: \Er[Z_i(Y_i - X_i'\beta_0)] = 0\)
  • \(k=d\), have \(\En[Z_i(Y_i - X_i'\hat{\beta}^{IV})] = 0\) exactly, and \(H_0\) is untestable
  • \(d>k\), can test
  • Test statistic \[ J = n \left(\frac{1}{n} Z'(y-X\hat{\beta}^{IV}) \right)' \hat{C} \left(\frac{1}{n} Z'(y-X\hat{\beta}^{IV}) \right) \]

Testing Overidentifying Restrictions

Theorem 2.3

Let \(\hat{C} = \left(\frac{1}{n} \sum_{i=1}^n Z_iZ_i' \hat{u}_i^2\right)^{-1}\). Assume:

  1. \(\Er[ \norm{X_i}^4] + \Er[\norm{Z_i}^4] < \infty\)

  2. \(\Er[u_i^2|Z_i] = \sigma^2\)

  3. \(\Er[Z_i Z_i']\) is positive definite

Then, \[ J \indist \chi^2_{d-k} \]

Over-identifying Test

  • Only has power when instruments have different covariances with \(u\)
Code
using Distributions, LinearAlgebra
import PlotlyLight

# simulate y = X*beta + u with instruments Z; EZu sets E[Z_i u_i], so a
# nonzero EZu makes the corresponding instruments invalid
function sim(n; d=3, EZu = zeros(d), Exu = 0.5, beta = 1, gamma = ones(d))
  zu = randn(n,d)
  Z = randn(n,d) + mapslices(x->x.*EZu, zu, dims=2)
  xu = randn(n)
  X = Z*gamma + xu*Exu    # X is endogenous through xu
  u = vec(sum(zu,dims=2) + xu + randn(n))
  y = X*beta + u
  return(y,X,Z)
end

# IV estimator with the optimal weight matrix (Z'Z)^{-1}
biv(y,X,Z) = (X'*Z*inv(Z'*Z)*Z'*X) \ (X'*Z*inv(Z'*Z)*Z'*y)

# J statistic for overidentifying restrictions
function J(y,X,Z)
  n = length(y)
  bhat = biv(y,X,Z)
  uhat = y - X*bhat
  C = inv(1/n*sum(z*z'*u^2 for (z,u) in zip(eachrow(Z),uhat)))
  Zu = Z'*uhat/n
  J = n*Zu'*C*Zu
end

S = 1_000
n = 100
j0s = [J(sim(n)...) for _ in 1:S]
j1s = [J(sim(n,EZu=[0.,0., 3.])...) for _ in 1:S]
j2s = [J(sim(n,EZu=[1.,1., 1.])...) for _ in 1:S]

plt = PlotlyLight.Plot()
plt(x=j0s, type="histogram", name="E[Zu] = 0")
plt(x=j1s, type="histogram", name="E[Zu] = [0,0,3]")
fig=plt(x=j2s, type="histogram", name="E[Zu] = [1,1,1]")

fig

Weak Instruments

Simulated Distribution of \(\hat{\beta}^{IV}\)

  • First stage \(X = Z\gamma + e\); simulation with \(\Er[Z_i Z_i'] = I\) and \(e \sim N(0,0.25)\), so the first-stage \(t \approx \sqrt{n}\gamma/0.5\)

  • Distribution of \(\hat{\beta}^{IV}\) with \(\gamma = 1\), \(\gamma=0.2\), and \(\gamma=0.1\)

Simulated Distribution of \(\hat{\beta}^{IV}\)

Code
# t-statistics for H0: β = b0 using the usual IV standard errors
function tiv(y,X,Z; b0 = ones(size(X,2)))
  b = biv(y,X,Z)
  u = y - X*b
  V = var(u)*inv(X'*Z*inv(Z'*Z)*Z'*X)
  (b - b0)./sqrt.(diag(V))
end
n = 100
S = 10_000
plt = PlotlyLight.Plot()
for g in [1, 0.2, 0.1]
  b = [tiv(sim(n,d=1,EZu=0,gamma=g)...)[1] for _ in 1:S]
  # crop outliers so figure looks okay
  b .= max.(b, -4)
  b .= min.(b, 4)
  plt(x=b, type="histogram",name="γ=$g")
end
fig=plt(x=randn(S), type="histogram", name="Normal")

fig

Weak Instruments

  • Lessons from simulation:
    • When \(\Er[Z_i X_i']\) is small, usual asymptotic distribution is a poor approximation for the finite sample distribution of \(\hat{\beta}^{IV}\)
    • The approximation can be poor even when \(H_0: \gamma = 0\) in \(X = Z\gamma + e\) would be rejected
  • Can we find a better approximation to the finite sample distribution when \(\Er[Z_i X_i']\) is small?

Irrelevant Instrument Asymptotics

  • Suppose \(\Er[Z_i X_i'] = 0\)
  • CLT \[ \frac{1}{\sqrt{n}} \begin{pmatrix} vec(Z'X) \\ Z'u \end{pmatrix} \indist \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} \sim N(0, \Sigma) \]
  • Then \[ \begin{align*} \hat{\beta}^{IV} - \beta_0 = & \left((Z'X)'(Z'Z)^{-1}(Z'X)\right)^{-1} (Z'X)'(Z'Z)^{-1}(Z'u) \\ \indist & \left(H' \Er[Z_i Z_i']^{-1} H\right)^{-1} \left(H' \Er[Z_i Z_i']^{-1} \zeta_2\right) \end{align*} \] where \(vec(H) = \zeta_1\)

Weak Instrument Asymptotics

  • Let \(\Er[Z_i X_i'] = \frac{1}{\sqrt{n}} \Gamma\)
  • Then \(\frac{1}{\sqrt{n}} Z' X \indist \Gamma + H\)
  • and \[ \hat{\beta}^{IV} - \beta_0 \indist \left((\Gamma + H)' \Er[Z_i Z_i']^{-1} (\Gamma + H)\right)^{-1} \left((\Gamma + H)' \Er[Z_i Z_i']^{-1} \zeta_2\right) \]
  • \(\Gamma\) cannot be estimated, but we can try to develop estimators and inference methods for \(\beta\) that work for any \(\Gamma\); the sketch below simulates this limit for a range of \(\Gamma\)
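To see how this limit behaves, here is a minimal sketch for the scalar case (\(d=k=1\), \(\Er[Z_i^2]=1\), \(\sigma^2=1\)), simulating \(\zeta_2/(\Gamma+H)\) with \((H,\zeta_2)\) jointly normal; the correlation \(\rho\) below stands in for the endogeneity of \(X\) and is chosen arbitrarily for illustration:

using Distributions, Statistics

ρ = 0.8                      # corr(H, ζ₂); illustrative, reflects endogeneity of X
dist = MvNormal(zeros(2), [1.0 ρ; ρ 1.0])

# in scalars with E[Z²]=1, ((Γ+H)'E[ZZ']⁻¹(Γ+H))⁻¹(Γ+H)'E[ZZ']⁻¹ζ₂ = ζ₂/(Γ+H)
function draw_limit(Γ)
    h, ζ2 = rand(dist)
    ζ2 / (Γ + h)
end

for Γ in [10.0, 1.0, 0.0]    # strong, weak, and irrelevant instrument
    draws = [draw_limit(Γ) for _ in 1:100_000]
    println("Γ = $Γ: median = ", round(median(draws), digits=3),
            ", IQR = ", round(quantile(draws, 0.75) - quantile(draws, 0.25), digits=3))
end

For large \(\Gamma\) the draws concentrate near zero, matching the usual normal approximation; for \(\Gamma\) near zero the limit is heavy tailed and its median moves toward \(\rho\), an OLS-like bias.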

Testing for Relevance

  • Model: assume \(\Er[W_i u_i] = 0\) and \(\Er[Z_i u_i] = 0\) \[ Y_i = X_i'\beta + W_i'\beta_W + u_i \]
  • First stage \[ X_i = Z_i' \pi_z + W_i' \pi_W + \nu_i \]
  • Can test \(H_0 : \pi_z = 0\) vs \(H_1 : \pi_z \neq 0\) using F-test
    • With one instrument, \(F = t^2\)
    • Rejecting \(H_0\) at the usual significance level is not enough for \(\hat{\beta}^{IV}\) to be well approximated by its asymptotic normal distribution

Testing for Relevance

Code
# t-statistics for β along with the first-stage t-statistic
function tiv(y,X,Z; b0 = ones(size(X,2)))
    b = biv(y,X,Z)
    u = y - X*b
    V = var(u)*inv(X'*Z*inv(Z'*Z)*Z'*X)
    π = Z \ X    # first stage: regress X on Z
    e = X - Z*π
    se = inv(Z'*Z)*sum(ei^2*z*z' for (ei,z) in zip(e,eachrow(Z)))*inv(Z'*Z)
    t1 = π[1,1]/sqrt(se[1,1])    # heteroskedasticity-robust first-stage t
    return((b - b0)./sqrt.(diag(V)), t1)
end
n = 100
S = 10_000
plt = PlotlyLight.Plot()
alpha = 0.001
for g in [1, 0.2, 0.1]
    b_t1 = [tiv(sim(n,d=1,EZu=0,gamma=g)...) for _ in 1:S]
    # crop outliers so figure looks okay
    b = [bt[1][1] for bt in b_t1 if abs(bt[2][1])>quantile(Normal(),1-alpha/2)]
    b .= max.(b, -4)
    b .= min.(b, 4)
    println("γ=$g: retained $(length(b)) / $S simulations")
    plt(x=b, type="histogram",name="γ=$g")
end
fig=plt(x=randn(S), type="histogram", name="Normal")

fig
γ=1.0: retained 10000 / 10000 simulations
γ=0.2: retained 9980 / 10000 simulations
γ=0.1: retained 8518 / 10000 simulations

Testing for Relevance

  • Stock and Yogo (2002) (table from Stock, Wright, and Yogo (2002)): a first-stage F above a threshold \(\approx 10\) implies \(Bias(\hat{\beta}^{IV}) < 10\% \cdot Bias(\hat{\beta}^{OLS})\) and the size of a nominal 5% test is below 15%

[Table 1 from Stock, Wright, and Yogo (2002)]

Testing for Relevance

  • Lee et al. (2022): \(F \gg 10\) is needed in practice

Identification Robust Inference

  • Opinion: always do identification-robust inference; then testing for relevance is not needed

  • Test \(H_0: \beta = \beta_0\) vs \(\beta \neq \beta_0\) with Anderson-Rubin test \[ AR(\beta) = n\left(\frac{1}{n} Z'(y-X\beta) \right)' \Sigma(\beta)^{-1} \left(\frac{1}{n} Z'(y - X\beta)\right) \] where \(\Sigma(\beta) = \frac{1}{n} \sum_{i=1}^n Z_iZ_i' (y_i - X_i'\beta)^2\) (a Julia sketch appears after this list)

  • \(AR(\beta) \indist \chi^2_d\) (under either weak instrument or usual asymptotics)

  • See my other notes for simulations and references
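A minimal Julia implementation of \(AR(\beta)\) with a grid-search confidence set (the DGP and grid are invented for illustration):

using Distributions, LinearAlgebra

# Anderson-Rubin statistic for H0: β = b
function AR(y, X, Z, b)
    n = length(y)
    e = y - X * b
    Σ = sum(z * z' * ei^2 for (z, ei) in zip(eachrow(Z), e)) / n
    Ze = Z' * e / n
    n * Ze' * (Σ \ Ze)
end

# one weak instrument
n = 200
z = randn(n); u = randn(n)
x = 0.2 .* z .+ 0.9 .* u .+ sqrt(0.19) .* randn(n)
y = x .+ u                            # true β = 1
Z = reshape(z, n, 1); X = reshape(x, n, 1)

# 95% confidence set {b : AR(b) ≤ χ²₁ critical value}; with weak instruments
# it can be wide, unbounded, or disconnected — this prints the grid hull
crit = quantile(Chisq(1), 0.95)
CS = [b for b in range(-4, 6, length=1001) if AR(y, X, Z, [b]) <= crit]
isempty(CS) || println("AR confidence set within [", minimum(CS), ", ", maximum(CS), "]")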

Identification Robust Inference

  • Two downsides of the AR test:
    1. The AR statistic is similar to the over-identifying test (\(AR(\hat{\beta}^{IV}) = J\))
       • small (even empty) confidence region if the model is misspecified
    2. Only gives a confidence region for all of \(\beta\), not confidence intervals for single coordinates
  • Kleibergen’s LM and Moreira’s CLR tests address 1; see my other notes for simulations and references

Identification Robust Inference

  • Various approaches to 2; see Andrews, Stock, and Sun (2019) for a review
    • Londschien and Bühlmann (2024) seems like a promising approach, implemented in ivmodels python package (assumes homoscedasticity)
    • Guggenberger, Kleibergen, and Mavroeidis (2024) and Tuvaandorj (2024) allow heteroscedasticity
  • If you want something close to the usual t-test and have 1 endogenous regressor and 1 instrument, use the tF test from Lee et al. (2022), or, better yet, the recently improved VtF test in Lee et al. (2023)

Further Reading

  • Recent reviews:
    • Andrews, Stock, and Sun (2019)
    • Keane and Neal (2023)

IV with Treatment Effect Heterogeneity

Model

  • \(Z_i \in \{0,1\}\)
  • \(D_i \in \{0,1\}\)
  • Potential treatments \(D_i(z)\)
  • Potential outcomes \(Y_i(d)\)
  • Instrument exogeneity: \(Y_i(0),Y_i(1), D_i(0), D_i(1) \indep Z_i\)

LATE

  • Wald estimator \[ \frac{\Er[Y_i | Z_i=1] - \Er[Y_i|Z_i=0]}{\Er[D_i|Z_i=1] - \Er[D_i|Z_i=0]} = \frac{\Er[Y_i(D_i(1))] - \Er[Y_i(D_i(0))]}{\Er[D_i(1)] - \Er[D_i(0)]} \]

\[ = \frac{\Er[Y_i(D_i(1)) - Y_i(D_i(0)) | D_i(1) \neq D_i(0)] P(D_i(1) \neq D_i(0))} { P(D_i(1) > D_i(0)) - P(D_i(1) < D_i(0))} \]

  • Assume monotonicity \(P(D_i(1)<D_i(0)) = 0\), then \[ \frac{\Er[Y_i | Z_i=1] - \Er[Y_i|Z_i=0]}{\Er[D_i|Z_i=1] - \Er[D_i|Z_i=0]} = \Er[Y_i(1) - Y_i(0) | D_i(1)=1, D_i(0) = 0 ] \]
  • the right-hand side is the local average treatment effect (LATE); a simulation check follows
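A quick simulation check of this result (DGP invented for illustration): under monotonicity, the Wald ratio matches the compliers' average effect, not the overall ATE:

using Statistics

n = 500_000
v = randn(n)                        # unobservable driving treatment take-up
Z = rand(n) .< 0.5                  # randomly assigned binary instrument
D1 = v .> -1.0                      # potential treatment if Z=1
D0 = v .> 0.0                       # potential treatment if Z=0; D1 ≥ D0 (monotonicity)
D = ifelse.(Z, D1, D0)
Y1 = 1.0 .+ v .+ randn(n)           # effect Y1 - Y0 is heterogeneous through v
Y0 = randn(n)
Y = ifelse.(D, Y1, Y0)

wald = (mean(Y[Z]) - mean(Y[.!Z])) / (mean(D[Z]) - mean(D[.!Z]))
late = mean((Y1 .- Y0)[D1 .& .!D0]) # average effect among compliers
ate = mean(Y1 .- Y0)
(wald, late, ate)                   # wald ≈ late ≈ 0.54, while ate = 1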

IV=LATE

  • With single binary \(Z\) and \(D\), \[ \begin{align*} \hat{\beta}^{IV} & = \frac{\sum Y_i(Z_i - \bar{Z})} {\sum D_i(Z_i - \bar{Z})} \\ & \inprob \Er[Y_i(1) - Y_i(0) | D_i(1)=1, D_i(0) = 0 ] \end{align*} \]
  • How general is this interpretation?
    • Multi-valued \(D\)?
    • Multi-valued or multiple \(Z\)?
    • Exogenous controls \(X\)?
  • Can salvage some LATE-like interpretation with multiple treatments or instruments, but the monotonicity assumption needs to be stronger
    • See Mogstad and Torgovitsky (2024) for a comprehensive review

Controls

  • Conditional exogeneity: \(Y_i(0),Y_i(1), D_i(0), D_i(1) \indep Z_i | X_i\)

  • Estimate \[ y_i = D_i \beta + X_i'\gamma + \epsilon_i \] by 2SLS

  • Partial out \(X\) to show \[ \hat{\beta}^{IV} = \frac{\sum y_i \tilde{Z}_i}{\sum D_i \tilde{Z}_i} \] where \(\tilde{Z}_i = Z_i - X_i' (X'X)^{-1} X'Z\); a numerical check of this identity follows
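A numerical check of this partialling-out identity (invented DGP; \(\Er[Z|X]\) is deliberately nonlinear, which matters for the interpretation on the next slides but not for this algebra):

using LinearAlgebra

n = 5_000
X = hcat(ones(n), randn(n))                    # controls, including a constant
z = rand(n) .< 0.3 .+ 0.4 .* (X[:,2] .> 0)     # instrument, P(z=1|X) nonlinear in X
u = randn(n)
D = (X[:,2] .+ 2.0 .* z .+ u .> 0)             # endogenous treatment
y = D .+ X * [0.5, 0.5] .+ u

# 2SLS of y on (D, X) with instruments (z, X)
Xall = hcat(D, X); Zall = hcat(z, X)
bhat = (Xall' * Zall * ((Zall' * Zall) \ (Zall' * Xall))) \
       (Xall' * Zall * ((Zall' * Zall) \ (Zall' * y)))

# partialled-out form: the coefficient on D equals Σᵢ yᵢZ̃ᵢ / Σᵢ DᵢZ̃ᵢ
Ztil = z - X * ((X' * X) \ (X' * z))
(bhat[1], dot(y, Ztil) / dot(D, Ztil))         # equal up to rounding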

2SLS with Controls

\[ \begin{align*} \hat{\beta}^{IV} = & \frac{\sum y_i \tilde{Z}_i}{\sum D_i \tilde{Z}_i} \\ \inprob & \frac{\Er[Y_i \tilde{Z}_i]}{\Er[D_i \tilde{Z}_i]} \\ = & \frac{\Er[\cov(Y_i, \tilde{Z}_i|X_i)] + \Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]]}{\Er[D_i \tilde{Z}_i]} \end{align*} \]

  • If \(\Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]] = 0\), we get an average of \(X\)-specific LATEs
  • But unless \(\Er[Z_i|X_i]\) is linear in \(X_i\), \(\Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]] \neq 0\)

2SLS with Controls is not LATE

  • Blandhol et al. (2025) show \[ \begin{align*} \beta^{IV} \inprob & \Er\left[\omega(cp,X)\Er[Y(1) - Y(0) |D(1)>D(0), X] \right] + \\ & + \Er\left[\omega(at,X)\Er[Y(1) - Y(0) |D(1)=D(0)=1, X] \right] \end{align*} \] with \[ \begin{align*} \omega(cp,X) = & \Er[Z|X](1 - L[Z|X])P(D(1)>D(0)|X)\Er[\tilde{Z}D]^{-1} \\ \omega(at,X) = & \Er[\tilde{Z}|X] P(D(1)=D(0)=1|X)\Er[\tilde{Z}D]^{-1} \end{align*} \]
    • \(\Er[\tilde{Z}] = 0\), so unless \(\Er[\tilde{Z}|X] = 0\) almost surely, \(\Er[\tilde{Z}|X]\) (and hence \(\omega(at,X)\)) will sometimes be negative

Simulation: Low Bias

using Plots, Statistics, Distributions, Printf

# simulate heterogeneous treatment effects with a binary instrument;
# the individual treatment effect is Δ(x) + de
function sim(n; ezx = x->cdf(Normal(),x), Δ = x->x^2, covde=1, vare=2)
    xd = randn(n)
    x = randn(n) + xd                 # control, correlated with treatment via xd
    de = randn(n)                     # heterogeneity in treatment effects
    z = rand(n) .< ezx.(x)            # binary instrument with P(Z=1|X=x) = ezx(x)
    derr = randn(n)
    d = (xd + derr + z + de .> 0)     # observed treatment
    d1 = (xd + derr .+ 1 + de .> 0)   # potential treatment when z=1
    d0 = (xd + derr .+ 0 + de .> 0)   # potential treatment when z=0 (so d1 ≥ d0)
    ϵ = de*covde + randn(n)*sqrt(vare-covde^2)  # error correlated with de
    y = (Δ.(x) + de).*d + ϵ

    return(y=y,x=x,z=z,d=d,Δ=(Δ.(x) + de), d0=d0, d1=d1)
end

function bols(y,d,x)
    n = length(y)
    X = hcat(ones(n), d, x)
    return((X'*X) \ X'*y)
end

function b2sls(y,d,x,z)
    n = length(y)
    Z = hcat(ones(n), z, x)
    X = hcat(ones(n), d, x)
    iZZ = inv(Z'*Z)
    XZ = X'*Z
    return((XZ*iZZ*XZ') \ (XZ*iZZ*(Z'*y)))
end


function plotTE(y,d,x,z,Δ,d0,d1; ezx=x->cdf(Normal(),x))
    te=scatter(x,Δ, group=[(t0,t1) for (t0,t1) in zip(d0,d1)], alpha=0.2, markersize=1,markerstrokewidth=0)
    xlabel!("x")
    ylabel!("Treatment Effect")
    title!("Treatment Effects")
    xy=scatter(x,y,group=d,markersize=1,markerstrokewidth=0, alpha=0.2)
    xlabel!("x")
    ylabel!("y")
    title!("Observed Data")
    xs = sort(x)
    pz=plot(xs,ezx.(xs), xlabel="x",ylabel="P(Z=1|X)",title="P(Z|X)",legend=:none)
    xlims!(pz, quantile(x,[0.01,0.99])...)

    n = length(z)
    X = hcat(ones(n),x)
    lzx = X*inv(X'*X)*X'*z
    scatter!(x,lzx,label="L[Z|X]",markersize=1,markerstrokewidth=0,alpha=0.5)

    bo = bols(y,d,x)[2]
    bi = b2sls(y,d,x,z)[2]
    LATE = mean(Δ[d1.>d0])
    numbers=plot(xlims=(0,1),ylims=(0,1), axis=([], false))
    annotate!([(0,0.8,(@sprintf("E[y1-y0|d1>d0] = %.2f",LATE),:left)),
               (0,0.6,(@sprintf("βols = %.2f",bo),:left)),
               (0,0.4,(@sprintf("βiv = %.2f",bi),:left))])


    plot(xy,te,pz,numbers)
end


Simulation: Low Bias (Linear \(E[Z|X]\))

ezx = x->cdf(Normal(),x/10)
y,x,z,d,Δ,d0,d1 = sim(25_000, Δ=x->(1+x^3/10), ezx = ezx)
plotTE(y,d,x,z,Δ,d0,d1,  ezx = ezx)

Simulation: Low Bias (Constant Treatment Effect)

ezx = x->cdf(Normal(),x)
y,x,z,d,Δ,d0,d1 = sim(25_000, Δ=x->1, ezx=ezx)
plotTE(y,d,x,z,Δ,d0,d1, ezx=ezx)

Simulation: High Bias

ezx = x->cdf(Normal(),x)
y,x,z,d,Δ,d0,d1 = sim(25_000, Δ=x->(1+x^3/10), ezx=ezx)
plotTE(y,d,x,z,Δ,d0,d1, ezx=ezx)

Observations

  • Nonlinearity in \(\Er[Z|X]\) and \(\Er[Y|X]\) can lead to substantial bias in 2SLS

What to do?

  • Flexibly control for \(X\)
  • If discrete, saturated regression
  • Otherwise, doubly robust estimator for average conditional LATE
    • Chernozhukov et al. (2024) chapter 13; DoubleML Python & R packages

Further Reading

  • Mogstad and Torgovitsky (2024)

References

Andrews, Isaiah, James H. Stock, and Liyang Sun. 2019. “Weak Instruments in Instrumental Variables Regression: Theory and Practice.” Annual Review of Economics 11 (1): 727–53. https://doi.org/10.1146/annurev-economics-080218-025643.
Blandhol, Christine, John Bonney, Magne Mogstad, and Alexander Torgovitsky. 2025. “When Is TSLS Actually LATE?” https://a-torgovitsky.github.io/tslslate.pdf.
Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.
Guggenberger, Patrik, Frank Kleibergen, and Sophocles Mavroeidis. 2024. “A Powerful Subvector Anderson–Rubin Test in Linear Instrumental Variables Regression with Conditional Heteroskedasticity.” Econometric Theory 40 (5): 957–1002. https://doi.org/10.1017/S0266466622000627.
Keane, Michael, and Timothy Neal. 2023. “Instrument Strength in IV Estimation and Inference: A Guide to Theory and Practice.” Journal of Econometrics 235 (2): 1625–53. https://doi.org/10.1016/j.jeconom.2022.12.009.
Lee, David S, Justin McCrary, Marcelo J. Moreira, and Jack Porter. 2022. “Valid t-Ratio Inference for IV.” American Economic Review 112 (10): 3260–90. https://doi.org/10.1257/aer.20211063.
Lee, David S, Justin McCrary, Marcelo J Moreira, Jack R Porter, and Luther Yap. 2023. “What to Do When You Can’t Use ’1.96’ Confidence Intervals for IV.” Working Paper 31893. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w31893.
Londschien, Malte, and Peter Bühlmann. 2024. “Weak-Instrument-Robust Subvector Inference in Instrumental Variables Regression: A Subvector Lagrange Multiplier Test and Properties of Subvector Anderson-Rubin Confidence Sets.” https://arxiv.org/abs/2407.15256.
Mogstad, Magne, and Alexander Torgovitsky. 2024. “Instrumental Variables with Unobserved Heterogeneity in Treatment Effects.” Working Paper 32927. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w32927.
Song, Kyungchul. 2021. “Introduction to Econometrics.”
Stock, James H, Jonathan H Wright, and Motohiro Yogo. 2002. “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments.” Journal of Business & Economic Statistics 20 (4): 518–29. https://doi.org/10.1198/073500102288618658.
Stock, James H, and Motohiro Yogo. 2002. “Testing for Weak Instruments in Linear IV Regression.” Working Paper 284. Technical Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/t0284.
Tuvaandorj, Purevdorj. 2024. “Robust Permutation Tests in Linear Instrumental Variables Regression.” Journal of the American Statistical Association 0 (ja): 1–24. https://doi.org/10.1080/01621459.2024.2412363.