Instrumental Variables Estimation

Paul Schrimpf

2024-12-14

Reading

  • Required: Song (2021) chapter 12

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \def\indep{{\perp\!\!\!\perp}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Instrumental Variables

Model

\[ Y_i = \underbrace{X_i}_{\in \R^k}' \beta_0 + u_i \]

  • \(\Er[u_i] = 0\), but \(\Er[X_i u_i] \neq 0\)

  • Instrument \(Z_i \in \R^d\) s.t.

    1. Relevant: \(\rank(\Er[Z_i X_i']) = k\)

    2. Exogenous: \(\Er[Z_i u_i] = 0\)

Identification

  • Exogeneity implies \[ \Er[Z_i Y_i] = \Er[Z_i X_i']\beta_0 \]
  • If \(d=k\) (exactly identified), then relevance implies \(\Er[Z_i X_i']\) invertible, so \[ \beta_0 = \Er[Z_i X_i']^{-1} \Er[Z_i Y_i] \]
  • For \(d>k\) (overidentified), relevance implies \(\Er[Z_iX_i']'\Er[Z_iX_i']\) invertible, so \[ \beta_0 = (\Er[Z_i X_i']' \Er[Z_i X_i'])^{-1} \Er[Z_i X_i']' \Er[Z_i Y_i] \]

Estimation

Method of Moments Estimation

  • We assume \(\Er[Z_i u_i] = 0\), so \[ \Er[Z_i(Y_i - X_i'\beta_0)] = 0 \]
  • Estimate by replacing \(\Er\) with \(\frac{1}{n}\sum_{i=1}^n\)

Method of Moments Estimation

  • \(d\) equations, \(k \leq d\) unknowns, so find \[ \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\hat{\beta}^{IV}) \approx 0 \] by solving \[ \begin{align*} \hat{\beta}^{IV} & = \mathrm{arg}\min_\beta \norm{ \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta) }_{W}^2 \\ & = \mathrm{arg}\min_\beta \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right)' W \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right) \end{align*} \]

Method of Moments Estimation

\[ \hat{\beta}^{IV} = \mathrm{arg}\min_\beta \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right)' W \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right) \]

  • \(\hat{\beta}^{IV}_W = (X'Z W Z'X)^{-1}(X'Z W Z'y)\)
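A minimal Julia sketch of this closed-form solution (the name `biv_W` and its argument list are ours, for illustration):

using LinearAlgebra

# Weighted IV: closed form of the minimization above; W is a d×d weighting matrix
function biv_W(y, X, Z, W)
  (X'*Z*W*Z'*X) \ (X'*Z*W*Z'*y)
end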

Asymptotic Properties

Consistency

\[ \begin{align*} \hat{\beta}^{IV}_W - \beta_0 = & (X'Z W Z'X)^{-1}(X'Z W Z'u) \\ = & \left[ \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\right) \right]^{-1} \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i u_i\right) \end{align*} \]

  • Consistent if LLN applies to \(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\) and \(\frac{1}{n}\sum_{i=1}^n Z_i u_i\)
    • E.g. if i.i.d. with \(\Er[\norm{X_i}^4]\) and \(\Er[\norm{Z_i}^4]\) finite and \(\Er[u_i^2|Z_i=z] = \sigma^2\)

Asymptotic Normality

\[ \begin{align*} \hat{\beta}^{IV}_W - \beta_0 = & (X'Z W Z'X)^{-1}(X'Z W Z'u) \\ = & \left[ \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\right) \right]^{-1} \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i u_i\right) \end{align*} \]

  • \(\sqrt{n}(\hat{\beta}^{IV} - \beta_0) \indist N(0, V)\) if LLN applies to \(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\) and CLT to \(\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i u_i\)
    • E.g. if i.i.d. with \(\Er[\norm{X_i}^4]\) and \(\Er[\norm{Z_i}^4]\) finite and \(\Er[u_i^2|Z_i=z] = \sigma^2\)
    • then \(\frac{1}{\sqrt{n}} \sum Z_i u_i \indist N(0, \sigma^2 \Er[Z_iZ_i'])\)
    • \(V = \sigma^2 (\Er[Z_iX_i']' W \Er[Z_iX_i'])^{-1} (\Er[Z_iX_i']' W \Er[Z_i Z_i'] W \Er[Z_i X_i']) (\Er[Z_iX_i']' W \Er[Z_iX_i'])^{-1}\)
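As a sketch, \(V\) can be estimated by plugging sample moments into the sandwich formula above (the helper name `Vhat` and its arguments are ours; `bhat` is any consistent estimate, e.g. from `biv_W`):

using LinearAlgebra

# Plug-in estimate of V above, with Q = E[Z_i X_i'] replaced by Z'X/n, etc.
function Vhat(y, X, Z, W, bhat)
  n = length(y)
  u = y - X*bhat
  s2 = sum(abs2, u)/n
  Qzx = Z'*X/n
  Qzz = Z'*Z/n
  A = inv(Qzx'*W*Qzx)
  s2 * A * (Qzx'*W*Qzz*W*Qzx) * A
end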

Optimal \(W\)

Theorem 2.1

\(W^* = \Er[Z_iZ_i']^{-1}\) minimizes the asymptotic variance of \(\hat{\beta}^{IV}_W\)

  • Estimate \(\hat{W}^* = \left(\frac{1}{n} Z'Z\right)^{-1}\) \[ \hat{\beta}^{IV} = (X'Z (Z'Z)^{-1} Z' X)^{-1} (X'Z(Z'Z)^{-1}Z'y) \]
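Continuing the `biv_W` sketch from above, plugging in \(\hat{W}^*\) gives this estimator (rescaling \(W\) by \(n\) leaves \(\hat{\beta}^{IV}\) unchanged):

# optimally weighted IV, equal to the formula above
biv_2sls(y, X, Z) = biv_W(y, X, Z, inv(Z'*Z/length(y)))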

Two Stage Least Squares

\[ \begin{align*} \hat{\beta}^{IV} & = (X'Z (Z'Z)^{-1} Z' X)^{-1} (X'Z(Z'Z)^{-1}Z'y) \\ & = (X'P_Z X)^{-1} (X' P_Z y) \\ & = ((P_Z X)'(P_Z X))^{-1} ((P_Z X)'y) \end{align*} \]

  1. Regress \(X\) on \(Z\), let \(\hat{X} = P_Z X\)
  2. Regress \(y\) on \(\hat{X}\)
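A sketch of this two-step computation (function name ours; numerically identical to \((X'P_Z X)^{-1} X'P_Z y\) above):

# 2SLS as two literal OLS regressions
function b2sls_twostep(y, X, Z)
  Xhat = Z*((Z'*Z) \ (Z'*X))     # first stage: X̂ = P_Z X
  (Xhat'*Xhat) \ (Xhat'*y)       # second stage: regress y on X̂
end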

Testing Overidentifying Restrictions

  • \(H_0: \Er[Z_i(Y_i - X_i'\beta_0)] = 0\)
  • \(d=k\): have \(\En[Z_i(Y_i - X_i'\hat{\beta}^{IV})] = 0\) exactly, so \(H_0\) is untestable
  • \(d>k\): can test
  • Test statistic \[ J = n \left(\frac{1}{n} Z'(y-X\hat{\beta}^{IV}) \right)' \hat{C} \left(\frac{1}{n} Z'(y-X\hat{\beta}^{IV}) \right) \]

Testing Overidentifying Restrictions

Theorem 2.3

Let \(\hat{C} = \left(\frac{1}{n} \sum_{i=1}^n Z_iZ_i' \hat{u}_i^2\right)^{-1}\). Assume:

  1. \(\Er[ \norm{X_i}^4] + \Er[\norm{Z_i}^4] < \infty\)

  2. \(\Er[u_i^2|Z_i] = \sigma^2\)

  3. \(\Er[Z_i Z_i']\) is positive definite

Then, \[ J \indist \chi^2_{d-k} \]
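So p-values for the \(J\) test come from the \(\chi^2_{d-k}\) tail; a one-line sketch (helper name ours):

using Distributions

# p-value of the overidentification test: d instruments, k regressors
pvalue_J(j, d, k) = ccdf(Chisq(d - k), j)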

Over-identifying Test

  • Only has power when instruments have different covariances with \(u\)
Code
using Distributions, LinearAlgebra
import PlotlyLight
# Simulate IV data: Z are d instruments, X is endogenous through xu,
# and EZu ≠ 0 makes the corresponding instruments invalid (E[Z_j u] ≠ 0)
function sim(n; d=3, EZu = zeros(d), Exu = 0.5, beta = 1, gamma = ones(d))
  zu = randn(n,d)
  Z = randn(n,d) + mapslices(x->x.*EZu, zu, dims=2)
  xu = randn(n)
  X = Z*gamma + xu*Exu
  u = vec(sum(zu,dims=2) + xu + randn(n))
  y = X*beta + u
  return(y,X,Z)
end

# 2SLS / optimally weighted IV estimator
biv(y,X,Z) = (X'*Z*inv(Z'*Z)*Z'*X) \ (X'*Z*inv(Z'*Z)*Z'*y)

function J(y,X,Z)
  n = length(y)
  bhat = biv(y,X,Z)
  uhat = y - X*bhat
  C = inv(1/n*sum(z*z'*u^2 for (z,u) in zip(eachrow(Z),uhat)))
  Zu = Z'*uhat/n
  J = n*Zu'*C*Zu
end

S = 1_000
n = 100
j0s = [J(sim(n)...) for _ in 1:S]
j1s = [J(sim(n,EZu=[0.,0., 3.])...) for _ in 1:S]
j2s = [J(sim(n,EZu=[1.,1., 1.])...) for _ in 1:S]

plt = PlotlyLight.Plot()
plt(x=j0s, type="histogram", name="E[Zu] = 0")
plt(x=j1s, type="histogram", name="E[Zu] = [0,0,3]")
fig=plt(x=j2s, type="histogram", name="E[Zu] = [1,1,1]")

fig

Weak Instruments

Simulated Distribution of \(\hat{\beta}^{IV}\)

  • First stage \(X = Z\gamma + e\), simulation with \(\Er[Z_i Z_i'] = I\) and \(e \sim N(0,0.25)\), so first stage \(t \approx \sqrt{n}\gamma/0.5\)

  • Distribution of \(\hat{\beta}^{IV}\) with \(\gamma = 1\), \(\gamma=0.2\), and \(\gamma=0.1\)

Simulated Distribution of \(\hat{\beta}^{IV}\)

Code
# t-statistics for H0: β = b0, using the homoskedastic 2SLS variance estimate
function tiv(y,X,Z; b0 = ones(size(X,2)))
  b = biv(y,X,Z)
  u = y - X*b
  V = var(u)*inv(X'*Z*inv(Z'*Z)*Z'*X)
  (b - b0)./sqrt.(diag(V))
end
n = 100
S = 10_000
plt = PlotlyLight.Plot()
for g in [1, 0.2, 0.1]
  b = [tiv(sim(n,d=1,EZu=0,gamma=g)...)[1] for _ in 1:S]
  # crop outliers so figure looks okay
  b .= max.(b, -4)
  b .= min.(b, 4)
  plt(x=b, type="histogram",name="γ=$g")
end
fig=plt(x=randn(S), type="histogram", name="Normal")

fig

Weak Instruments

  • Lessons from simulation:
    • When \(\Er[Z_i X_i']\) is small, usual asymptotic distribution is a poor approximation for the finite sample distribution of \(\hat{\beta}^{IV}\)
    • The approximation can be poor even when \(H_0: \gamma = 0\) in \(X = Z\gamma + e\) would be rejected
  • Can we find a better approximation to the finite sample distribution when \(\Er[Z_i X_i']\) is small?

Irrelevant Instrument Asymptotics

  • Suppose \(\Er[Z_i X_i'] = 0\)
  • CLT \[ \frac{1}{\sqrt{n}} \begin{pmatrix} vec(Z'X) \\ Z'u \end{pmatrix} \indist \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} \sim N(0, \Sigma) \]
  • Then \[ \begin{align*} \hat{\beta}^{IV} - \beta_0 = & \left((Z'X)'(Z'Z)^{-1}(Z'X)\right)^{-1} (Z'X)'(Z'Z)^{-1}(Z'u) \\ \indist & \left(H' \Er[Z_i Z_i']^{-1} H\right)^{-1} \left(H' \Er[Z_i Z_i']^{-1} \zeta_2\right) \end{align*} \] where \(vec(H) = \zeta_1\)

Weak Instrument Asymptotics

  • Let \(\Er[Z_i X_i'] = \frac{1}{\sqrt{n}} \Gamma\)
  • Then \(\frac{1}{\sqrt{n}} Z' X \indist \Gamma + H\)
  • and \[ \hat{\beta}^{IV} - \beta_0 \indist \left((\Gamma + H)' \Er[Z_i Z_i']^{-1} (\Gamma + H)\right)^{-1} \left((\Gamma + H)' \Er[Z_i Z_i']^{-1} \zeta_2\right) \]
  • \(\Gamma\) cannot be estimated, but we can try to develop estimators and inference methods for \(\beta\) that work for any \(\Gamma\)
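A small Monte Carlo sketch, reusing `sim` and `biv` from above: under the drifting parameterization \(\gamma_n = \Gamma/\sqrt{n}\), the spread of \(\hat{\beta}^{IV}\) stays roughly constant as \(n\) grows, consistent with the limit above.

using Statistics
Γ = 2.0
for n in (100, 1_000, 10_000)
  bs = [biv(sim(n; d=1, EZu=[0.0], gamma=[Γ/sqrt(n)])...) for _ in 1:1_000]
  # interquartile range does not shrink with n under weak-instrument asymptotics
  println("n = $n: IQR of β̂ ≈ ", round(quantile(bs, 0.75) - quantile(bs, 0.25), digits=2))
end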

Testing for Relevance

  • Model, assuming \(\Er[W_i u_i] = 0\) and \(\Er[Z_i u_i] = 0\): \[ Y_i = X_i'\beta + W_i'\beta_W + u_i \]
  • First stage \[ X_i = Z_i' \pi_z + W_i' \pi_W + \nu_i \]
  • Can test \(H_0 : \pi_z = 0\) vs \(H_1 : \pi_z \neq 0\) using an F-test (a sketch follows below)
    • With one instrument, \(F = t^2\)
    • Rejecting \(H_0\) at the usual significance level is not enough for \(\hat{\beta}^{IV}\) to be well approximated by its asymptotic normal distribution
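A sketch of the homoskedastic first-stage \(F\) statistic for \(H_0: \pi_z = 0\) (hypothetical helper: `x` is the endogenous regressor, `Z` the excluded instruments, `W` the included controls, which should contain a constant column):

# classical F comparing restricted (controls only) vs unrestricted first stage
function first_stage_F(x, Z, W)
  n = length(x)
  R = hcat(Z, W)
  e1 = x - R*((R'*R) \ (R'*x))    # unrestricted first-stage residuals
  e0 = x - W*((W'*W) \ (W'*x))    # restricted residuals (π_z = 0 imposed)
  dz = size(Z, 2)
  ((sum(abs2, e0) - sum(abs2, e1))/dz) / (sum(abs2, e1)/(n - size(R, 2)))
end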

Testing for Relevance

  • Stock and Yogo (2002) (table from Stock, Wright, and Yogo (2002)): first-stage \(F\) above a threshold \(\approx 10\) implies \(\mathrm{Bias}(\hat{\beta}^{IV}) < 10\%\, \mathrm{Bias}(\hat{\beta}^{OLS})\) and size of a nominal 5% test \(< 15\%\)

[Table 1 from Stock, Wright, and Yogo (2002)]

Testing for Relevance

  • Lee et al. (2022): \(F \gg 10\) is needed in practice

Identification Robust Inference

  • Opinion: always do this; testing for relevance is then unnecessary

  • Test \(H_0: \beta = \beta_0\) vs \(\beta \neq \beta_0\) with Anderson-Rubin test \[ AR(\beta) = n\left(\frac{1}{n} Z'(y-X\beta) \right)' \Sigma(\beta)^{-1} \left(\frac{1}{n} Z'(y - X\beta)\right) \] where \(\Sigma(\beta) = \frac{1}{n} \sum_{i=1}^n Z_iZ_i' (y_i - X_i'\beta)^2\)

  • Under \(H_0\), \(AR(\beta_0) \indist \chi^2_d\) (under either weak-instrument or usual asymptotics); a computational sketch follows below

  • See my other notes for simulations and references
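A minimal sketch of the \(AR\) statistic above (helper name ours); a confidence region collects the \(\beta\) with \(AR(\beta)\) below the \(\chi^2_d\) critical value:

using Distributions

function AR(y, X, Z, β)
  n = length(y)
  u = y - X*β
  Zu = Z'*u/n
  Σ = sum(z*z'*e^2 for (z, e) in zip(eachrow(Z), u))/n
  n*Zu'*(Σ \ Zu)
end

# 95% confidence region: {β : AR(β) ≤ quantile(Chisq(d), 0.95)}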

Identification Robust Inference

  • Two downsides of AR test:
    1. AR statistic is similar to the over-identifying restrictions test (\(AR(\hat{\beta}^{IV}) = J\))
       • Small (even empty) confidence region if model is misspecified
    2. Only gives a confidence region for all of \(\beta\), not confidence intervals for single coordinates
  • Kleibergen's LM and Moreira's CLR tests address 1; see my other notes for simulations and references

Identification Robust Inference

  • Various approaches to 2; see Andrews, Stock, and Sun (2019) for a review
    • Londschien and Bühlmann (2024) seems like a promising approach, implemented in ivmodels python package (assumes homoscedasticity)
    • Guggenberger, Kleibergen, and Mavroeidis (2024) and Tuvaandorj (2024) allow heteroscedasticity
  • If you want something close to the usual t-test and have 1 endogenous regressor and 1 instrument, use the tF test from Lee et al. (2022), or better yet, the recently improved VtF test in Lee et al. (2023)

Further Reading

  • Recent reviews:
    • Andrews, Stock, and Sun (2019)
    • Keane and Neal (2023)

IV with Treatment Effect Heterogeneity

Model

  • \(Z_i \in \{0,1\}\)
  • \(D_i \in \{0,1\}\)
  • Potential treatments \(D_i(z)\)
  • Potential outcomes \(Y_i(d)\)
  • Instrument exogeneity: \(Y_i(0),Y_i(1), D_i(0), D_i(1) \indep Z_i\)

LATE

  • Wald estimator \[ \frac{\Er[Y_i | Z_i=1] - \Er[Y_i|Z_i=0]}{\Er[D_i|Z_i=1] - \Er[D_i|Z_i=0]} = \frac{\Er[Y_i(D_i(1))] - \Er[Y_i(D_i(0))]}{\Er[D_i(1)] - \Er[D_i(0)]} \text{ (exogeneity)} \]

\[ = \frac{\Er[Y_i(D_i(1)) - Y_i(D_i(0)) | D_i(1) \neq D_i(0)] P(D_i(1) \neq D_i(0))} { P(D_i(1) > D_i(0)) - P(D_i(1) < D_i(0))} \]

  • Assume monotonicity \(P(D_i(1)<D_i(0)) = 0\), then \[ \frac{\Er[Y_i | Z_i=1] - \Er[Y_i|Z_i=0]}{\Er[D_i|Z_i=1] - \Er[D_i|Z_i=0]} = \Er[Y_i(1) - Y_i(0) | D_i(1)=1, D_i(0) = 0 ] \]
  • This is the local average treatment effect (LATE); a sample-analog sketch follows
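A sketch of the sample Wald estimator for binary \(Z_i\) and \(D_i\) (helper name ours):

using Statistics

# difference in mean outcomes divided by difference in treatment rates
wald(y, d, z) = (mean(y[z .== 1]) - mean(y[z .== 0])) /
                (mean(d[z .== 1]) - mean(d[z .== 0]))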

IV=LATE

  • With single binary \(Z\) and \(D\), \[ \begin{align*} \hat{\beta}^{IV} & = \frac{\sum Y_i(Z_i - \bar{Z})} {\sum D_i(Z_i - \bar{Z})} \\ & \inprob \Er[Y_i(1) - Y_i(0) | D_i(1)=1, D_i(0) = 0 ] \end{align*} \]
  • How general is this interpretation?
    • Multi-valued \(D\)?
    • Multi-valued or multiple \(Z\)?
    • Exogenous controls \(X\)?
  • Can salvage some LATE-like interpretation with multiple treatments or instruments, but the monotonicity assumption needs to be stronger
    • Mogstad and Torgovitsky (2024) for comprehensive review

Controls

  • Conditional exogeneity: \(Y_i(0),Y_i(1), D_i(0), D_i(1) \indep Z_i | X_i\)

  • Estimate \[ y_i = D_i \beta + X_i'\gamma + \epsilon_i \] by 2SLS

  • Partial out \(X\) to show \[ \hat{\beta}^{IV} = \frac{\sum y_i \tilde{Z}_i}{\sum D_i \tilde{Z}_i} \] where \(\tilde{Z}_i = Z_i - X_i' (X'X)^{-1} X'Z\)
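A sketch of this partialled-out form (helper name ours; the control matrix `x` should include a column of ones):

# 2SLS coefficient on d after partialling the controls x out of z
function biv_partial(y, d, x, z)
  zt = z - x*((x'*x) \ (x'*z))   # Z̃: residuals from regressing z on x
  sum(y .* zt) / sum(d .* zt)
end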

2SLS with Controls

\[ \begin{align*} \hat{\beta}^{IV} = & \frac{\sum y_i \tilde{Z}_i}{\sum D_i \tilde{Z}_i} \\ \inprob & \frac{\Er[Y_i \tilde{Z}_i]}{\Er[D_i \tilde{Z}_i]} \\ = & \frac{\Er[\cov(Y_i, \tilde{Z}_i|X_i)] + \Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]]}{\Er[D_i \tilde{Z}_i]} \end{align*} \]

  • If \(\Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]] = 0\), we get an average of \(X\)-specific LATEs
  • But unless \(\Er[Z_i|X_i]\) is linear in \(X_i\), \(\Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]] \neq 0\)

2SLS with Controls is not LATE

  • Blandhol et al. (2022) show \[ \begin{align*} \beta^{IV} \inprob & \Er\left[\omega(cp,X)\Er[Y(1) - Y(0) |D(1)>D(0), X] \right] + \\ & + \Er\left[\omega(at,X)\Er[Y(1) - Y(0) |D(1)=D(0)=1, X] \right] \end{align*} \] with \[ \begin{align*} \omega(cp,X) = & \Er[Z|X](1 - L[Z|X])P(D(1)>D(0)|X)\Er[\tilde{Z}D]^{-1} \\ \omega(at,X) = & \Er[\tilde{Z}|X] P(D(1)=D(0)=1|X)\Er[\tilde{Z}D]^{-1} \end{align*} \]
    • \(\Er[\tilde{Z}] = 0\), so unless \(\Er[\tilde{Z}|X] = 0\), the weight \(\omega(at,X)\) will sometimes be negative

Simulation: Low Bias

using Plots, Statistics, Distributions, Printf

# Simulate heterogeneous treatment effects (redefines sim from above):
# x exogenous control, z binary instrument with P(z=1|x) = ezx(x),
# d binary treatment, individual treatment effect Δ(x) + de
function sim(n; ezx = x->cdf(Normal(),x), Δ = x->x^2, covde=1, vare=2)
  xd = randn(n)
  x = randn(n) + xd
  de = randn(n)
  z = rand(n) .< ezx.(x)
  derr = randn(n)
  d = (xd + derr + z + de .> 0)
  d1 = (xd + derr .+ 1 + de .> 0)
  d0 = (xd + derr .+ 0 + de .> 0)
  ϵ = de*covde + randn(n)*sqrt(vare-covde^2)
  y = (Δ.(x) + de).*d + ϵ

  return(y=y,x=x,z=z,d=d,Δ=(Δ.(x) + de), d0=d0, d1=d1)
end

function bols(y,d,x)
  n = length(y)
  X = hcat(ones(n), d, x)
  return((X'*X) \ X'*y)
end

function b2sls(y,d,x,z)
  n = length(y)
  Z = hcat(ones(n), z, x)
  X = hcat(ones(n), d, x)
  iZZ = inv(Z'*Z)
  XZ = X'*Z
  return((XZ*iZZ*XZ') \ (XZ*iZZ*(Z'*y)))
end


# Plot observed data, treatment effects, P(Z=1|X) vs its linear projection,
# and OLS/2SLS estimates alongside the LATE
function plotTE(y,d,x,z,Δ,d0,d1; ezx=x->cdf(Normal(),x))
  te=scatter(x,Δ, group=[(t0,t1) for (t0,t1) in zip(d0,d1)], alpha=1.0, markersize=1,markerstrokewidth=0)
  xlabel!("x")
  ylabel!("Treatment Effect")
  title!("Treatment Effects")
  xy=scatter(x,y,group=d,markersize=1,markerstrokewidth=0)
  xlabel!("x")
  ylabel!("y")
  title!("Observed Data")
  xs = sort(x)
  pz=plot(xs,ezx.(xs), xlabel="x",ylabel="P(Z=1|X)",title="P(Z|X)",legend=:none)
  n = length(z)
  X = hcat(ones(n),x)
  lzx = X*inv(X'*X)*X'*z
  scatter!(x,lzx,label="L[Z|X]",markersize=1,markerstrokewidth=0,alpha=0.5)

  bo = bols(y,d,x)[2]
  bi = b2sls(y,d,x,z)[2]
  LATE = mean(Δ[d1.>d0])
  numbers=plot(xlims=(0,1),ylims=(0,1), axis=([], false))
  annotate!([(0,0.8,(@sprintf("E[y1-y0|d1>d0] = %.2f",LATE),:left)),
             (0,0.6,(@sprintf("βols = %.2f",bo),:left)),
             (0,0.4,(@sprintf("βiv = %.2f",bi),:left))])


  plot(xy,te,pz,numbers)
end

y,x,z,d,Δ,d0,d1 = sim(5_000, Δ=x->1)
plotTE(y,d,x,z,Δ,d0,d1)

Simulation: Low Bias

Simulation: Low Bias

ezx = x->cdf(Normal(),x/10)
y,x,z,d,Δ,d0,d1 = sim(5_000, Δ=x->1+x^3/10, ezx = ezx)
plotTE(y,d,x,z,Δ,d0,d1,  ezx = ezx)

Simulation: High Bias

y,x,z,d,Δ,d0,d1 = sim(5_000, Δ=x->1+x^3/10)
plotTE(y,d,x,z,Δ,d0,d1)

Observations

  • Nonlinearity in \(\Er[Z|X]\) and \(\Er[Y|X]\) can lead to substantial bias in 2SLS

What to do?

  • Flexibly control for \(X\)
  • If \(X\) discrete, saturated regression
  • Otherwise, doubly robust estimator for the average conditional LATE
    • Chernozhukov et al. (2024) chapter 13, DoubleML python & R packages

Further Reading

  • Mogstad and Torgovitsky (2024)

References

Andrews, Isaiah, James H. Stock, and Liyang Sun. 2019. “Weak Instruments in Instrumental Variables Regression: Theory and Practice.” Annual Review of Economics 11 (1): 727–53. https://doi.org/10.1146/annurev-economics-080218-025643.
Blandhol, Christine, John Bonney, Magne Mogstad, and Alexander Torgovitsky. 2022. “When Is TSLS Actually LATE?”
Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.
Guggenberger, Patrik, Frank Kleibergen, and Sophocles Mavroeidis. 2024. “A Powerful Subvector Anderson–Rubin Test in Linear Instrumental Variables Regression with Conditional Heteroskedasticity.” Econometric Theory 40 (5): 957–1002. https://doi.org/10.1017/S0266466622000627.
Keane, Michael, and Timothy Neal. 2023. “Instrument Strength in IV Estimation and Inference: A Guide to Theory and Practice.” Journal of Econometrics 235 (2): 1625–53. https://doi.org/10.1016/j.jeconom.2022.12.009.
Lee, David S, Justin McCrary, Marcelo J. Moreira, and Jack Porter. 2022. “Valid t-Ratio Inference for IV.” American Economic Review 112 (10): 3260–90. https://doi.org/10.1257/aer.20211063.
Lee, David S, Justin McCrary, Marcelo J Moreira, Jack R Porter, and Luther Yap. 2023. “What to Do When You Can’t Use ’1.96’ Confidence Intervals for IV.” Working Paper 31893. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w31893.
Londschien, Malte, and Peter Bühlmann. 2024. “Weak-Instrument-Robust Subvector Inference in Instrumental Variables Regression: A Subvector Lagrange Multiplier Test and Properties of Subvector Anderson-Rubin Confidence Sets.” https://arxiv.org/abs/2407.15256.
Mogstad, Magne, and Alexander Torgovitsky. 2024. “Instrumental Variables with Unobserved Heterogeneity in Treatment Effects.” Working Paper 32927. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w32927.
Song, Kyungchul. 2021. “Introduction to Econometrics.”
Stock, James H, Jonathan H Wright, and Motohiro Yogo. 2002. “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments.” Journal of Business & Economic Statistics 20 (4): 518–29. https://doi.org/10.1198/073500102288618658.
Stock, James H, and Motohiro Yogo. 2002. “Testing for Weak Instruments in Linear IV Regression.” Working Paper 284. Technical Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/t0284.
Tuvaandorj, Purevdorj. 2024. “Robust Permutation Tests in Linear Instrumental Variables Regression.” Journal of the American Statistical Association, forthcoming. https://doi.org/10.1080/01621459.2024.2412363.