Instrumental Variables Estimation

Paul Schrimpf

2025-11-24

Reading

  • Required: Song (2021) chapter 12

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \def\indep{{\perp\!\!\!\perp}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Instrumental Variables

Model

\[ Y_i = \underbrace{X_i}_{\in \R^k}' \beta_0 + u_i \]

  • \(\Er[u_i] = 0\), but \(\Er[X_i u_i] \neq 0\)

  • Instrument \(Z_i \in \R^d\) s.t.

    1. Relevant: \(\rank(\Er[Z_i X_i']) = k\)

    2. Exogenous: \(\Er[Z_i u_i] = 0\)

Identification

  • Exogeneity implies \[ \Er[Z_i Y_i] = \Er[Z_i X_i']\beta_0 \]
  • If \(d=k\) (exactly identified), then relevance implies \(\Er[Z_i X_i']\) invertible, so \[ \beta_0 = \Er[Z_i X_i']^{-1} \Er[Z_i Y_i] \]
  • For \(d>k\), relevance implies \(\Er[Z_iX_i']'\Er[Z_iX_i']\) invertible, so \[ \beta_0 = (\Er[Z_i X_i']' \Er[Z_i X_i'])^{-1} \Er[Z_i X_i']' \Er[Z_i Y_i] \] (a sample-analogue sketch follows this list)
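As a quick illustration of the exactly identified case, here is a sample-analogue sketch in Julia (the data-generating process is invented for illustration):

using LinearAlgebra

n = 10_000
Z = randn(n, 2)                            # instruments, d = k = 2
u = randn(n)
X = Z * [1.0 0.3; 0.3 1.0] + 0.5 * [u u]   # endogenous: X correlated with u
beta0 = [1.0, -2.0]
y = X * beta0 + u

# sample analogue of E[Z_i X_i']^{-1} E[Z_i Y_i]; the 1/n factors cancel
betahat = (Z' * X) \ (Z' * y)              # ≈ beta0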

Estimation

Method of Moments Estimation

  • We assume \(\Er[Z_i u_i] = 0\), so \[ \Er[Z_i(Y_i - X_i'\beta_0)] = 0 \]
  • Estimate by replacing \(\Er\) with \(\frac{1}{n}\sum_{i=1}^n\)

Method of Moments Estimation

  • \(d\) equations, \(k\) unknowns with \(d \geq k\), so find \[ \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\hat{\beta}^{IV}) \approx 0 \] by solving \[ \begin{align*} \hat{\beta}^{IV} & = \mathrm{arg}\min_\beta \norm{ \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta) }_{W}^2 \\ & = \mathrm{arg}\min_\beta \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta) \right)' W \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right) \end{align*} \]

Method of Moments Estimation

\[ \hat{\beta}^{IV} = \mathrm{arg}\min_\beta \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right)' W \left( \frac{1}{n} \sum_{i=1}^n Z_i(Y_i - X_i'\beta)\right) \]

  • \(\hat{\beta}^{IV}_W = (X'Z W Z'X)^{-1}(X'Z W Z'y)\) (derivation sketched below)
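Filling in the step from the objective to this closed form (added for completeness): the first-order condition of the quadratic objective is \[ 0 = \frac{\partial}{\partial \beta} \left( \frac{1}{n}(y - X\beta)'Z \, W \, \frac{1}{n}Z'(y - X\beta) \right) = -\frac{2}{n^2} X'Z W Z'(y - X\beta), \] so \(X'Z W Z'X \hat{\beta}^{IV}_W = X'Z W Z'y\), which gives the formula above.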

Asymptotic Properties

Consistency

\[ \begin{align*} \hat{\beta}^{IV}_W - \beta_0 = & (X'Z W Z'X)^{-1}(X'Z W Z'u) \\ = & \left[ \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\right) \right]^{-1} \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i u_i\right) \end{align*} \]

  • Consistent if LLN applies to \(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\) and \(\frac{1}{n}\sum_{i=1}^n Z_i u_i\)
    • E.g. if i.i.d. with \(\Er[\norm{X_i}^4]\) and \(\Er[\norm{Z_i}^4]\) finite and \(\Er[u_i^2|Z_i=z] = \sigma^2\)

Asymptotic Normality

\[ \begin{align*} \hat{\beta}^{IV}_W - \beta_0 = & (X'Z W Z'X)^{-1}(X'Z W Z'u) \\ = & \left[ \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\right) \right]^{-1} \left(\frac{1}{n}\sum_{i=1}^n X_i Z_i'\right) W \left(\frac{1}{n}\sum_{i=1}^n Z_i u_i\right) \end{align*} \]

  • \(\sqrt{n}(\hat{\beta}^{IV} - \beta_0) \indist N(0, V)\) if LLN applies to \(\frac{1}{n}\sum_{i=1}^n Z_i X_i'\) and CLT to \(\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i u_i\)
    • E.g. if i.i.d. with \(\Er[\norm{X_i}^4]\) and \(\Er[\norm{Z_i}^4]\) finite and \(\Er[u_i^2|Z_i=z] = \sigma^2\)
    • then \(\frac{1}{\sqrt{n}} \sum Z_i u_i \indist N(0, \sigma^2 \Er[Z_iZ_i'])\)
    • \(V = \sigma^2 (\Er[Z_iX_i']' W \Er[Z_iX_i'])^{-1} (\Er[Z_iX_i']' W \Er[Z_i Z_i'] W \Er[Z_i X_i']) (\Er[Z_iX_i']' W \Er[Z_iX_i'])^{-1}\) (a plug-in estimator is sketched below)
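A plug-in estimate of \(V\) replaces the population moments with sample averages and \(\sigma^2\) with \(\frac{1}{n}\sum_i \hat{u}_i^2\). A minimal sketch (the function name avar_iv is mine, not from the text):

using LinearAlgebra

# plug-in sandwich estimate of V for the homoskedastic case
function avar_iv(y, X, Z, W, betahat)
    n = length(y)
    uhat = y - X * betahat
    s2 = sum(abs2, uhat) / n      # estimate of σ²
    Qzx = Z' * X / n              # estimate of E[Z_i X_i']
    Qzz = Z' * Z / n              # estimate of E[Z_i Z_i']
    A = inv(Qzx' * W * Qzx)
    s2 * A * (Qzx' * W * Qzz * W * Qzx) * A
end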

Optimal \(W\)

Theorem 2.1

\(W^* = \Er[Z_iZ_i']^{-1}\) minimizes the asymptotic variance of \(\hat{\beta}^{IV}_W\) (under homoskedasticity, \(\Er[u_i^2|Z_i] = \sigma^2\), as assumed above)

  • Estimate \(\hat{W}^* = \left(\frac{1}{n} Z'Z\right)^{-1}\) \[ \hat{\beta}^{IV} = (X'Z (Z'Z)^{-1} Z' X)^{-1} (X'Z(Z'Z)^{-1}Z'y) \]

Two Stage Least Squares

\[ \begin{align*} \hat{\beta}^{IV} & = (X'Z (Z'Z)^{-1} Z' X)^{-1} (X'Z(Z'Z)^{-1}Z'y) \\ & = (X'P_Z X)^{-1} (X' P_Z y) \\ & = ((P_Z X)'(P_Z X))^{-1} ((P_Z X)'y) \end{align*} \]

  1. Regress \(X\) on \(Z\), let \(\hat{X} = P_Z X\)
  2. Regress \(y\) on \(\hat{X}\) (a numerical check of the equivalence follows)
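A quick numerical check that the two-step procedure reproduces the one-shot formula (invented DGP with one endogenous regressor):

using LinearAlgebra

n, d = 1_000, 3
Z = randn(n, d)
u = randn(n)
X = Z * ones(d) + 0.5 * u + randn(n)   # one endogenous regressor
y = X + u                              # true β = 1

# one-shot formula
b1 = (X' * Z * ((Z' * Z) \ (Z' * X))) \ (X' * Z * ((Z' * Z) \ (Z' * y)))
# two stages: Xhat = P_Z X, then regress y on Xhat
Xhat = Z * ((Z' * Z) \ (Z' * X))
b2 = (Xhat' * Xhat) \ (Xhat' * y)
b1 ≈ b2                                # true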

Testing Overidentifying Restrictions

  • \(H_0: \Er[Z_i(Y_i - X_i'\beta_0)] = 0\)
  • \(k=d\), have \(\En[Z_i(Y_i - X_i'\hat{\beta}^{IV})] = 0\) exactly, and \(H_0\) is untestable
  • \(d>k\), can test
  • Test statistic \[ J = n \left(\frac{1}{n} Z'(y-X\hat{\beta}^{IV}) \right)' \hat{C} \left(\frac{1}{n} Z'(y-X\hat{\beta}^{IV}) \right) \]

Testing Overidentifying Restrictions

Theorem 2.3

Let \(\hat{C} = \left(\frac{1}{n} \sum_{i=1}^n Z_iZ_i' \hat{u}_i^2\right)^{-1}\). Assume:

  1. \(\Er[ \norm{X_i}^4] + \Er[\norm{Z_i}^4] < \infty\)

  2. \(\Er[u_i^2|Z_i] = \sigma^2\)

  3. \(\Er[Z_i Z_i']\) is positive definite

Then, \[ J \indist \chi^2_{d-k} \]

Over-identifying Test

  • Only has power when instruments have different covariances with \(u\)
Code
using Distributions, LinearAlgebra
import PlotlyLight

# simulate y = X*beta + u with instruments Z; EZu sets E[Z_i u_i], so a
# nonzero EZu makes the corresponding instruments invalid
function sim(n; d=3, EZu = zeros(d), Exu = 0.5, beta = 1, gamma = ones(d))
  zu = randn(n,d)
  Z = randn(n,d) + mapslices(x->x.*EZu, zu, dims=2)
  xu = randn(n)
  X = Z*gamma + xu*Exu    # X is endogenous through xu
  u = vec(sum(zu,dims=2) + xu + randn(n))
  y = X*beta + u
  return(y,X,Z)
end

# IV estimator with the optimal weight matrix (Z'Z)^{-1}
biv(y,X,Z) = (X'*Z*inv(Z'*Z)*Z'*X) \ (X'*Z*inv(Z'*Z)*Z'*y)

# J statistic for overidentifying restrictions
function J(y,X,Z)
  n = length(y)
  bhat = biv(y,X,Z)
  uhat = y - X*bhat
  C = inv(1/n*sum(z*z'*u^2 for (z,u) in zip(eachrow(Z),uhat)))
  Zu = Z'*uhat/n
  J = n*Zu'*C*Zu
end

S = 1_000
n = 100
j0s = [J(sim(n)...) for _ in 1:S]
j1s = [J(sim(n,EZu=[0.,0., 3.])...) for _ in 1:S]
j2s = [J(sim(n,EZu=[1.,1., 1.])...) for _ in 1:S]

plt = PlotlyLight.Plot()
plt(x=j0s, type="histogram", name="E[Zu] = 0")
plt(x=j1s, type="histogram", name="E[Zu] = [0,0,3]")
fig=plt(x=j2s, type="histogram", name="E[Zu] = [1,1,1]")

fig

Weak Instruments

Simulated Distribution of \(\hat{\beta}^{IV}\)

  • First stage \(X = Z\gamma + e\); simulation with \(\Er[Z_i Z_i'] = I\) and \(e \sim N(0,0.25)\), so the first-stage \(t \approx \sqrt{n}\gamma/0.5\)

  • Distribution of \(\hat{\beta}^{IV}\) with \(\gamma = 1\), \(\gamma=0.2\), and \(\gamma=0.1\)

Simulated Distribution of \(\hat{\beta}^{IV}\)

Code
# t-statistics for H0: β = b0 using the usual IV standard errors
function tiv(y,X,Z; b0 = ones(size(X,2)))
  b = biv(y,X,Z)
  u = y - X*b
  V = var(u)*inv(X'*Z*inv(Z'*Z)*Z'*X)
  (b - b0)./sqrt.(diag(V))
end
n = 100
S = 10_000
plt = PlotlyLight.Plot()
for g in [1, 0.2, 0.1]
  b = [tiv(sim(n,d=1,EZu=0,gamma=g)...)[1] for _ in 1:S]
  # crop outliers so figure looks okay
  b .= max.(b, -4)
  b .= min.(b, 4)
  plt(x=b, type="histogram",name="γ=$g")
end
fig=plt(x=randn(S), type="histogram", name="Normal")

fig

Weak Instruments

  • Lessons from simulation:
    • When \(\Er[Z_i X_i']\) is small, usual asymptotic distribution is a poor approximation for the finite sample distribution of \(\hat{\beta}^{IV}\)
    • The approximation can be poor even when \(H_0: \gamma = 0\) in \(X = Z\gamma + e\) would be rejected
  • Can we find a better approximation to the finite sample distribution when \(\Er[Z_i X_i']\) is small?

Irrelevant Instrument Asymptotics

  • Suppose \(\Er[Z_i X_i'] = 0\)
  • CLT \[ \frac{1}{\sqrt{n}} \begin{pmatrix} vec(Z'X) \\ Z'u \end{pmatrix} \indist \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix} \sim N(0, \Sigma) \]
  • Then \[ \begin{align*} \hat{\beta}^{IV} - \beta_0 = & \left((Z'X)'(Z'Z)^{-1}(Z'X)\right)^{-1} (Z'X)'(Z'Z)^{-1}(Z'u) \\ \indist & \left(H' \Er[Z_i Z_i']^{-1} H\right)^{-1} \left(H' \Er[Z_i Z_i']^{-1} \zeta_2\right) \end{align*} \] where \(vec(H) = \zeta_1\)

Weak Instrument Asymptotics

  • Let \(\Er[Z_i X_i'] = \frac{1}{\sqrt{n}} \Gamma\)
  • Then \(\frac{1}{\sqrt{n}} Z' X \indist \Gamma + H\)
  • and \[ \hat{\beta}^{IV} - \beta_0 \indist \left((\Gamma + H)' \Er[Z_i Z_i']^{-1} (\Gamma + H)\right)^{-1} \left((\Gamma + H)' \Er[Z_i Z_i']^{-1} \zeta_2\right) \]
  • \(\Gamma\) cannot be estimated, but we can try to develop estimators and inference methods for \(\beta\) that work for any \(\Gamma\); the sketch below simulates this limit for a range of \(\Gamma\)
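To see how this limit behaves, here is a minimal sketch for the scalar case (\(d=k=1\), \(\Er[Z_i^2]=1\), \(\sigma^2=1\)), simulating \(\zeta_2/(\Gamma+H)\) with \((H,\zeta_2)\) jointly normal; the correlation \(\rho\) below stands in for the endogeneity of \(X\) and is chosen arbitrarily for illustration:

using Distributions, Statistics

ρ = 0.8                      # corr(H, ζ₂); illustrative, reflects endogeneity of X
dist = MvNormal(zeros(2), [1.0 ρ; ρ 1.0])

# in scalars with E[Z²]=1, ((Γ+H)'E[ZZ']⁻¹(Γ+H))⁻¹(Γ+H)'E[ZZ']⁻¹ζ₂ = ζ₂/(Γ+H)
function draw_limit(Γ)
    h, ζ2 = rand(dist)
    ζ2 / (Γ + h)
end

for Γ in [10.0, 1.0, 0.0]    # strong, weak, and irrelevant instrument
    draws = [draw_limit(Γ) for _ in 1:100_000]
    println("Γ = $Γ: median = ", round(median(draws), digits=3),
            ", IQR = ", round(quantile(draws, 0.75) - quantile(draws, 0.25), digits=3))
end

For large \(\Gamma\) the draws concentrate near zero, matching the usual normal approximation; for \(\Gamma\) near zero the limit is heavy tailed and its median moves toward \(\rho\), an OLS-like bias.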

Testing for Relevance

  • Model: assume \(\Er[W_i u_i] = 0\) and \(\Er[Z_i u_i] = 0\) \[ Y_i = X_i'\beta + W_i'\beta_W + u_i \]
  • First stage \[ X_i = Z_i' \pi_z + W_i' \pi_W + \nu_i \]
  • Can test \(H_0 : \pi_z = 0\) vs \(H_1 : \pi_z \neq 0\) using F-test
    • With one instrument, \(F = t^2\)
    • Rejecting \(H_0\) at the usual significance level is not enough for \(\hat{\beta}^{IV}\) to be well approximated by its asymptotic normal distribution

Testing for Relevance

Code
# t-statistics for β along with the first-stage t-statistic
function tiv(y,X,Z; b0 = ones(size(X,2)))
    b = biv(y,X,Z)
    u = y - X*b
    V = var(u)*inv(X'*Z*inv(Z'*Z)*Z'*X)
    π = Z \ X    # first stage: regress X on Z
    e = X - Z*π
    se = inv(Z'*Z)*sum(ei^2*z*z' for (ei,z) in zip(e,eachrow(Z)))*inv(Z'*Z)
    t1 = π[1,1]/sqrt(se[1,1])    # heteroskedasticity-robust first-stage t
    return((b - b0)./sqrt.(diag(V)), t1)
end
n = 100
S = 10_000
plt = PlotlyLight.Plot()
alpha = 0.001
for g in [1, 0.2, 0.1]
    b_t1 = [tiv(sim(n,d=1,EZu=0,gamma=g)...) for _ in 1:S]
    # crop outliers so figure looks okay
    b = [bt[1][1] for bt in b_t1 if abs(bt[2][1])>quantile(Normal(),1-alpha/2)]
    b .= max.(b, -4)
    b .= min.(b, 4)
    println("γ=$g: retained $(length(b)) / $S simulations")
    plt(x=b, type="histogram",name="γ=$g")
end
fig=plt(x=randn(S), type="histogram", name="Normal")

fig
γ=1.0: retained 10000 / 10000 simulations
γ=0.2: retained 9980 / 10000 simulations
γ=0.1: retained 8518 / 10000 simulations

Testing for Relevance

  • Stock and Yogo (2002) (table from Stock, Wright, and Yogo (2002)): a first-stage F above a threshold \(\approx 10\) implies \(Bias(\hat{\beta}^{IV}) < 10\% \cdot Bias(\hat{\beta}^{OLS})\) and the size of a nominal 5% test is below 15%

[Table 1 from Stock, Wright, and Yogo (2002)]

Testing for Relevance

  • Lee et al. (2022): \(F \gg 10\) is needed in practice

Identification Robust Inference

  • Opinion: always do identification-robust inference; then testing for relevance is not needed

  • Test \(H_0: \beta = \beta_0\) vs \(\beta \neq \beta_0\) with Anderson-Rubin test \[ AR(\beta) = n\left(\frac{1}{n} Z'(y-X\beta) \right)' \Sigma(\beta)^{-1} \left(\frac{1}{n} Z'(y - X\beta)\right) \] where \(\Sigma(\beta) = \frac{1}{n} \sum_{i=1}^n Z_iZ_i' (y_i - X_i'\beta)^2\) (a Julia sketch appears after this list)

  • \(AR(\beta) \indist \chi^2_d\) (under either weak instrument or usual asymptotics)

  • See my other notes for simulations and references
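A minimal Julia implementation of \(AR(\beta)\) with a grid-search confidence set (the DGP and grid are invented for illustration):

using Distributions, LinearAlgebra

# Anderson-Rubin statistic for H0: β = b
function AR(y, X, Z, b)
    n = length(y)
    e = y - X * b
    Σ = sum(z * z' * ei^2 for (z, ei) in zip(eachrow(Z), e)) / n
    Ze = Z' * e / n
    n * Ze' * (Σ \ Ze)
end

# one weak instrument
n = 200
z = randn(n); u = randn(n)
x = 0.2 .* z .+ 0.9 .* u .+ sqrt(0.19) .* randn(n)
y = x .+ u                            # true β = 1
Z = reshape(z, n, 1); X = reshape(x, n, 1)

# 95% confidence set {b : AR(b) ≤ χ²₁ critical value}; with weak instruments
# it can be wide, unbounded, or disconnected — this prints the grid hull
crit = quantile(Chisq(1), 0.95)
CS = [b for b in range(-4, 6, length=1001) if AR(y, X, Z, [b]) <= crit]
isempty(CS) || println("AR confidence set within [", minimum(CS), ", ", maximum(CS), "]")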

Identification Robust Inference

  • Two downsides of the AR test:
    1. The AR statistic is similar to the over-identifying test (\(AR(\hat{\beta}^{IV}) = J\))
       • small (even empty) confidence region if the model is misspecified
    2. Only gives a confidence region for all of \(\beta\), not confidence intervals for single coordinates
  • Kleibergen’s LM and Moreira’s CLR tests address 1; see my other notes for simulations and references

Identification Robust Inference

  • Various approaches to 2; see Andrews, Stock, and Sun (2019) for a review
    • Londschien and Bühlmann (2024) seems like a promising approach, implemented in ivmodels python package (assumes homoscedasticity)
    • Guggenberger, Kleibergen, and Mavroeidis (2024) and Tuvaandorj (2024) allow heteroscedasticity
  • If you want something close to the usual t-test and have 1 endogenous regressor and 1 instrument, use the tF test from Lee et al. (2022), or, better yet, the recently improved VtF test in Lee et al. (2023)

Further Reading

  • Recent reviews:
    • Andrews, Stock, and Sun (2019)
    • Keane and Neal (2023)

IV with Treatment Effect Heterogeneity

Model

  • \(Z_i \in \{0,1\}\)
  • \(D_i \in \{0,1\}\)
  • Potential treatments \(D_i(z)\)
  • Potential outcomes \(Y_i(d)\)
  • Instrument exogeneity: \(Y_i(0),Y_i(1), D_i(0), D_i(1) \indep Z_i\)

LATE

  • Wald estimator \[ \frac{\Er[Y_i | Z_i=1] - \Er[Y_i|Z_i=0]}{\Er[D_i|Z_i=1] - \Er[D_i|Z_i=0]} = \frac{\Er[Y_i(D_i(1))] - \Er[Y_i(D_i(0))]}{\Er[D_i(1)] - \Er[D_i(0)]} \]

\[ = \frac{\Er[Y_i(D_i(1)) - Y_i(D_i(0)) | D_i(1) \neq D_i(0)] P(D_i(1) \neq D_i(0))} { P(D_i(1) > D_i(0)) - P(D_i(1) < D_i(0))} \]

  • Assume monotonicity \(P(D_i(1)<D_i(0)) = 0\), then \[ \frac{\Er[Y_i | Z_i=1] - \Er[Y_i|Z_i=0]}{\Er[D_i|Z_i=1] - \Er[D_i|Z_i=0]} = \Er[Y_i(1) - Y_i(0) | D_i(1)=1, D_i(0) = 0 ] \]
  • the right-hand side is the local average treatment effect (LATE); a simulation check follows
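A quick simulation check of this result (DGP invented for illustration): under monotonicity, the Wald ratio matches the compliers' average effect, not the overall ATE:

using Statistics

n = 500_000
v = randn(n)                        # unobservable driving treatment take-up
Z = rand(n) .< 0.5                  # randomly assigned binary instrument
D1 = v .> -1.0                      # potential treatment if Z=1
D0 = v .> 0.0                       # potential treatment if Z=0; D1 ≥ D0 (monotonicity)
D = ifelse.(Z, D1, D0)
Y1 = 1.0 .+ v .+ randn(n)           # effect Y1 - Y0 is heterogeneous through v
Y0 = randn(n)
Y = ifelse.(D, Y1, Y0)

wald = (mean(Y[Z]) - mean(Y[.!Z])) / (mean(D[Z]) - mean(D[.!Z]))
late = mean((Y1 .- Y0)[D1 .& .!D0]) # average effect among compliers
ate = mean(Y1 .- Y0)
(wald, late, ate)                   # wald ≈ late ≈ 0.54, while ate = 1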

IV=LATE

  • With single binary \(Z\) and \(D\), \[ \begin{align*} \hat{\beta}^{IV} & = \frac{\sum Y_i(Z_i - \bar{Z})} {\sum D_i(Z_i - \bar{Z})} \\ & \inprob \Er[Y_i(1) - Y_i(0) | D_i(1)=1, D_i(0) = 0 ] \end{align*} \]
  • How general is this interpretation?
    • Multi-valued \(D\)?
    • Multi-valued or multiple \(Z\)?
    • Exogenous controls \(X\)?
  • Can salvage some LATE-like interpretation with multiple treatments or instruments, but the monotonicity assumption needs to be stronger
    • See Mogstad and Torgovitsky (2024) for a comprehensive review

Controls

  • Conditional exogeneity: \(Y_i(0),Y_i(1), D_i(0), D_i(1) \indep Z_i | X_i\)

  • Estimate \[ y_i = D_i \beta + X_i'\gamma + \epsilon_i \] by 2SLS

  • Partial out \(X\) to show \[ \hat{\beta}^{IV} = \frac{\sum y_i \tilde{Z}_i}{\sum D_i \tilde{Z}_i} \] where \(\tilde{Z}_i = Z_i - X_i' (X'X)^{-1} X'Z\); a numerical check of this identity follows
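A numerical check of this partialling-out identity (invented DGP; \(\Er[Z|X]\) is deliberately nonlinear, which matters for the interpretation on the next slides but not for this algebra):

using LinearAlgebra

n = 5_000
X = hcat(ones(n), randn(n))                    # controls, including a constant
z = rand(n) .< 0.3 .+ 0.4 .* (X[:,2] .> 0)     # instrument, P(z=1|X) nonlinear in X
u = randn(n)
D = (X[:,2] .+ 2.0 .* z .+ u .> 0)             # endogenous treatment
y = D .+ X * [0.5, 0.5] .+ u

# 2SLS of y on (D, X) with instruments (z, X)
Xall = hcat(D, X); Zall = hcat(z, X)
bhat = (Xall' * Zall * ((Zall' * Zall) \ (Zall' * Xall))) \
       (Xall' * Zall * ((Zall' * Zall) \ (Zall' * y)))

# partialled-out form: the coefficient on D equals Σᵢ yᵢZ̃ᵢ / Σᵢ DᵢZ̃ᵢ
Ztil = z - X * ((X' * X) \ (X' * z))
(bhat[1], dot(y, Ztil) / dot(D, Ztil))         # equal up to rounding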

2SLS with Controls

\[ \begin{align*} \hat{\beta}^{IV} = & \frac{\sum y_i \tilde{Z}_i}{\sum D_i \tilde{Z}_i} \\ \inprob & \frac{\Er[Y_i \tilde{Z}_i]}{\Er[D_i \tilde{Z}_i]} \\ = & \frac{\Er[\cov(Y_i, \tilde{Z}_i|X_i)] + \Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]]}{\Er[D_i \tilde{Z}_i]} \end{align*} \]

  • If \(\Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]] = 0\), we get an average of \(X\)-specific LATEs
  • But unless \(\Er[Z_i|X_i]\) is linear in \(X_i\), \(\Er[\Er[Y_i|X_i]\Er[\tilde{Z}_i|X_i]] \neq 0\)

2SLS with Controls is not LATE

  • Blandhol et al. (2025) show \[ \begin{align*} \beta^{IV} \inprob & \Er\left[\omega(cp,X)\Er[Y(1) - Y(0) |D(1)>D(0), X] \right] + \\ & + \Er\left[\omega(at,X)\Er[Y(1) - Y(0) |D(1)=D(0)=1, X] \right] \end{align*} \] with \[ \begin{align*} \omega(cp,X) = & \Er[Z|X](1 - L[Z|X])P(D(1)>D(0)|X)\Er[\tilde{Z}D]^{-1} \\ \omega(at,X) = & \Er[\tilde{Z}|X] P(D(1)=D(0)=1|X)\Er[\tilde{Z}D]^{-1} \end{align*} \]
    • \(\Er[\tilde{Z}] = 0\), so unless \(\Er[\tilde{Z}|X] = 0\) almost surely, \(\Er[\tilde{Z}|X]\) (and hence \(\omega(at,X)\)) will sometimes be negative

Simulation: Low Bias

using Plots, Statistics, Distributions, Printf

# simulate heterogeneous treatment effects with a binary instrument;
# the individual treatment effect is Δ(x) + de
function sim(n; ezx = x->cdf(Normal(),x), Δ = x->x^2, covde=1, vare=2)
    xd = randn(n)
    x = randn(n) + xd                 # control, correlated with treatment via xd
    de = randn(n)                     # heterogeneity in treatment effects
    z = rand(n) .< ezx.(x)            # binary instrument with P(Z=1|X=x) = ezx(x)
    derr = randn(n)
    d = (xd + derr + z + de .> 0)     # observed treatment
    d1 = (xd + derr .+ 1 + de .> 0)   # potential treatment when z=1
    d0 = (xd + derr .+ 0 + de .> 0)   # potential treatment when z=0 (so d1 ≥ d0)
    ϵ = de*covde + randn(n)*sqrt(vare-covde^2)  # error correlated with de
    y = (Δ.(x) + de).*d + ϵ

    return(y=y,x=x,z=z,d=d,Δ=(Δ.(x) + de), d0=d0, d1=d1)
end

function bols(y,d,x)
    n = length(y)
    X = hcat(ones(n), d, x)
    return((X'*X) \ X'*y)
end

function b2sls(y,d,x,z)
    n = length(y)
    Z = hcat(ones(n), z, x)
    X = hcat(ones(n), d, x)
    iZZ = inv(Z'*Z)
    XZ = X'*Z
    return((XZ*iZZ*XZ') \ (XZ*iZZ*(Z'*y)))
end


function plotTE(y,d,x,z,Δ,d0,d1; ezx=x->cdf(Normal(),x))
    te=scatter(x,Δ, group=[(t0,t1) for (t0,t1) in zip(d0,d1)], alpha=0.2, markersize=1,markerstrokewidth=0)
    xlabel!("x")
    ylabel!("Treatment Effect")
    title!("Treatment Effects")
    xy=scatter(x,y,group=d,markersize=1,markerstrokewidth=0, alpha=0.2)
    xlabel!("x")
    ylabel!("y")
    title!("Observed Data")
    xs = sort(x)
    pz=plot(xs,ezx.(xs), xlabel="x",ylabel="P(Z=1|X)",title="P(Z|X)",legend=:none)
    xlims!(pz, quantile(x,[0.01,0.99])...)

    n = length(z)
    X = hcat(ones(n),x)
    lzx = X*inv(X'*X)*X'*z
    scatter!(x,lzx,label="L[Z|X]",markersize=1,markerstrokewidth=0,alpha=0.5)

    bo = bols(y,d,x)[2]
    bi = b2sls(y,d,x,z)[2]
    LATE = mean(Δ[d1.>d0])
    numbers=plot(xlims=(0,1),ylims=(0,1), axis=([], false))
    annotate!([(0,0.8,(@sprintf("E[y1-y0|d1>d0] = %.2f",LATE),:left)),
               (0,0.6,(@sprintf("βols = %.2f",bo),:left)),
               (0,0.4,(@sprintf("βiv = %.2f",bi),:left))])


    plot(xy,te,pz,numbers)
end


Simulation: Low Bias (Linear \(E[Z|X]\))

ezx = x->cdf(Normal(),x/10)
y,x,z,d,Δ,d0,d1 = sim(25_000, Δ=x->(1+x^3/10), ezx = ezx)
plotTE(y,d,x,z,Δ,d0,d1,  ezx = ezx)

Simulation: Low Bias (Constant Treatment Effect)

ezx = x->cdf(Normal(),x)
y,x,z,d,Δ,d0,d1 = sim(25_000, Δ=x->1, ezx=ezx)
plotTE(y,d,x,z,Δ,d0,d1, ezx=ezx)

Simulation: High Bias

ezx = x->cdf(Normal(),x)
y,x,z,d,Δ,d0,d1 = sim(25_000, Δ=x->(1+x^3/10), ezx=ezx)
plotTE(y,d,x,z,Δ,d0,d1, ezx=ezx)

Observations

  • Nonlinearity in \(\Er[Z|X]\) and \(\Er[Y|X]\) can lead to substantial bias in 2SLS

What to do?

  • Flexibly control for \(X\)
  • If discrete, saturated regression
  • Otherwise, doubly robust estimator for average conditional LATE
    • Chernozhukov et al. (2024) chapter 13; DoubleML Python & R packages

Further Reading

  • Mogstad and Torgovitsky (2024)

References

Andrews, Isaiah, James H. Stock, and Liyang Sun. 2019. “Weak Instruments in Instrumental Variables Regression: Theory and Practice.” Annual Review of Economics 11 (1): 727–53. https://doi.org/10.1146/annurev-economics-080218-025643.
Blandhol, Christine, John Bonney, Magne Mogstad, and Alexander Torgovitsky. 2025. “When Is TSLS Actually LATE?” https://a-torgovitsky.github.io/tslslate.pdf.
Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.
Guggenberger, Patrik, Frank Kleibergen, and Sophocles Mavroeidis. 2024. “A Powerful Subvector Anderson–Rubin Test in Linear Instrumental Variables Regression with Conditional Heteroskedasticity.” Econometric Theory 40 (5): 957–1002. https://doi.org/10.1017/S0266466622000627.
Keane, Michael, and Timothy Neal. 2023. “Instrument Strength in IV Estimation and Inference: A Guide to Theory and Practice.” Journal of Econometrics 235 (2): 1625–53. https://doi.org/10.1016/j.jeconom.2022.12.009.
Lee, David S, Justin McCrary, Marcelo J. Moreira, and Jack Porter. 2022. “Valid t-Ratio Inference for IV.” American Economic Review 112 (10): 3260–90. https://doi.org/10.1257/aer.20211063.
Lee, David S, Justin McCrary, Marcelo J Moreira, Jack R Porter, and Luther Yap. 2023. “What to Do When You Can’t Use ’1.96’ Confidence Intervals for IV.” Working Paper 31893. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w31893.
Londschien, Malte, and Peter Bühlmann. 2024. “Weak-Instrument-Robust Subvector Inference in Instrumental Variables Regression: A Subvector Lagrange Multiplier Test and Properties of Subvector Anderson-Rubin Confidence Sets.” https://arxiv.org/abs/2407.15256.
Mogstad, Magne, and Alexander Torgovitsky. 2024. “Instrumental Variables with Unobserved Heterogeneity in Treatment Effects.” Working Paper 32927. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w32927.
Song, Kyungchul. 2021. “Introduction to Econometrics.”
Stock, James H, Jonathan H Wright, and Motohiro Yogo. 2002. “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments.” Journal of Business & Economic Statistics 20 (4): 518–29. https://doi.org/10.1198/073500102288618658.
Stock, James H, and Motohiro Yogo. 2002. “Testing for Weak Instruments in Linear IV Regression.” Working Paper 284. Technical Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/t0284.
Tuvaandorj, Purevdorj. 2024. “Robust Permutation Tests in Linear Instrumental Variables Regression.” Journal of the American Statistical Association 0 (ja): 1–24. https://doi.org/10.1080/01621459.2024.2412363.