ECON 626: Problem Set 5
\[ \def\indep{{\perp\!\!\!\perp}} \def\R{{\mathbb{R}}} \def\Er{{\mathrm{E}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \]
Problem 1
Suppose \(X_n = O_p(a_n)\) and \(Y_n - c = O_p(a_n)\) for \(c \neq 0\) and \(a_n \to 0\). Show that \(\frac{X_n}{Y_n} = O_p(a_n)\).
Hint: write \(\frac{X_n}{Y_n} = \frac{X_n}{c} + \frac{X_n}{Y_n} (c - Y_n) \frac{1}{c}\), then use Lemma 3.3 and Exercise 3.1 from Song (2021).
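The decomposition in the hint can be checked directly by putting its right-hand side over a common denominator (on the event \(Y_n \neq 0\), which is an assumption made here only so that the ratio is defined): \[ \frac{X_n}{c} + \frac{X_n}{Y_n} (c - Y_n) \frac{1}{c} = \frac{X_n Y_n + X_n (c - Y_n)}{c Y_n} = \frac{X_n c}{c Y_n} = \frac{X_n}{Y_n}. \]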
Problem 2
Suppose that each individual \(i\) has potential outcomes \(Y_i(1)\) and \(Y_i(0)\) in the treated and untreated states, respectively. The average treatment effect is defined to be \[ \tau = \Er[Y_i(1) - Y_i(0)]. \] We assume that the econometrician observes \((Y_i,D_i)\), where \(Y_i = D_i Y_i(1) + (1 - D_i)Y_i(0)\), \(D_i\) is a binary variable representing the treatment status, and \((Y_i(1),Y_i(0))\) is independent of \(D_i\). Finally, assume that \(0 < \Er[D_i] < 1\), \(Y_i \in [0,1]\) for all \(i = 1,\dots,n\), and that \((Y_i(1),Y_i(0),D_i)\) are i.i.d. across \(i\).
Show that \[ \hat{\tau} = \frac{\sum_{i=1}^n Y_i D_i}{\sum_{i=1}^n D_i} - \frac{\sum_{i=1}^n Y_i (1-D_i)}{\sum_{i=1}^n (1-D_i)} \] is a consistent estimator for \(\tau\).
Show that \(\hat{\tau} - \tau = O_p(n^{-1/2})\).
Show that \(\sqrt{n}(\hat{\tau}-\tau) \to^d N(0,\sigma_\tau^2)\) and calculate \(\sigma_\tau^2\).
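As a numerical sanity check on the three parts above, the following is a minimal simulation sketch of \(\hat{\tau}\). The data-generating process is purely hypothetical and chosen only to satisfy the stated assumptions: \(Y_i(0)\) is uniform on \([0, 0.5]\), \(Y_i(1)\) adds an independent uniform \([0, 0.5]\) draw (so \(\tau = 0.25\)), and \(D_i\) is Bernoulli with an assumed probability `p = 0.4`. The function name `simulate_tau_hat` is likewise an illustration, not part of the problem.

```python
import random


def simulate_tau_hat(n, p=0.4, seed=0):
    """Return (tau_hat, tau) for one simulated sample of size n.

    Hypothetical design: Y_i(0) ~ U[0, 0.5] and Y_i(1) = Y_i(0) + U[0, 0.5],
    so both potential outcomes lie in [0, 1] and tau = 0.25.
    """
    rng = random.Random(seed)
    sum_yd = 0.0       # running sum of Y_i * D_i
    sum_d = 0.0        # running sum of D_i
    sum_y_ctrl = 0.0   # running sum of Y_i * (1 - D_i)
    for _ in range(n):
        y0 = 0.5 * rng.random()
        y1 = y0 + 0.5 * rng.random()
        # D_i is drawn independently of (Y_i(1), Y_i(0)).
        d = 1 if rng.random() < p else 0
        y = d * y1 + (1 - d) * y0
        sum_yd += y * d
        sum_d += d
        sum_y_ctrl += y * (1 - d)
    # Difference in means; assumes both groups are non-empty,
    # which holds with overwhelming probability for the n used here.
    tau_hat = sum_yd / sum_d - sum_y_ctrl / (n - sum_d)
    return tau_hat, 0.25


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        tau_hat, tau = simulate_tau_hat(n)
        # The raw error should shrink with n, while the sqrt(n)-scaled
        # error should stay of a similar order of magnitude.
        print(n, tau_hat - tau, (n ** 0.5) * (tau_hat - tau))
```

Repeating the last loop over many seeds and plotting a histogram of the scaled errors is one way to compare the simulated spread with the \(\sigma_\tau^2\) derived analytically.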
Problem 3
Consider the simple regression model \[ y_i = \beta_0 + \beta_1 x^*_i + \epsilon_i. \] Assume that \(\mathrm{Var}(x^*)>0\) and \(\Er[x^* \epsilon] = 0\). However, \(x^*\) is measured with error. Instead of observing \(x^*\), you observe \(x_i = x^*_i + u_i\). Assume that \(\Er[u] = 0\), \(\Er[u x^*] = 0\), and \(\Er[\epsilon u] = 0\).
1
Find \(\operatorname{plim} \hat{\beta}_1\), where \(\hat{\beta}_1\) is the OLS estimator from regressing \(y_i\) on the observed \(x_i\).
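An analytic answer to this part can be checked against a simulation. The sketch below is one hypothetical setup, not part of the problem: \(x^*\), \(\epsilon\), and \(u\) are independent normal draws, and the parameter values (`beta0 = 1.0`, `beta1 = 2.0`, `sd_u = 1.0`) and function names are assumptions chosen for illustration. It computes the OLS slope once with the true regressor and once with the mismeasured one.

```python
import random


def ols_slope(x, y):
    """Slope coefficient from a simple OLS regression of y on x and a constant."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def simulate_ols(n=100_000, beta0=1.0, beta1=2.0, sd_u=1.0, seed=0):
    """Return (slope using x_star, slope using x = x_star + u)."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xs + rng.gauss(0.0, 1.0) for xs in x_star]
    # Classical measurement error: u is independent of x_star and epsilon.
    x = [xs + rng.gauss(0.0, sd_u) for xs in x_star]
    return ols_slope(x_star, y), ols_slope(x, y)


if __name__ == "__main__":
    clean, noisy = simulate_ols()
    print("slope with x*:", clean)
    print("slope with x :", noisy)
```

Varying `sd_u` and comparing the second slope with the derived probability limit gives a quick check of the algebra.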
2
In “Do Low Levels of Blood Lead Reduce Children’s Future Test Scores?”, Aizer et al. (2018) examine the relationship between blood lead levels and children’s test scores. Table 4 shows estimates of \(\beta_1\) from regressions of
\[ test_i = \beta_0 + \beta_1 lead_i + \text{ other controls} + \epsilon_i \tag{1}\]
where \(test_i\) is a 3rd grade reading or math test score and \(lead_i\) is a blood lead level measurement taken before the age of six. Some children had multiple measurements of their blood lead levels taken. Each blood lead level measurement has some error. In comparing columns (1) and (2), note that venous tests are known to have less measurement error than capillary tests; in comparing columns (3) and (4), note that the average of all blood lead level measurements has less measurement error than a single one. Are the changes in the estimates across columns what you would expect from part 1? Why or why not?

3
Suppose \(z_i\) is a second measurement of \(x^*\), \[ z_i = x^*_i + e_i \] with \(\Er[e] = 0\), \(\Er[x^* e] = 0\), \(\Er[\epsilon e] = 0\), and \(\Er[e u] = 0\). Show that \[ \hat{\beta}_1^{IV} = \frac{\sum_{i=1}^n (z_i - \bar{z}) y_i} {\sum_{i=1}^n (z_i - \bar{z}) x_i} \] is a consistent estimator of \(\beta_1\).
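The estimator \(\hat{\beta}_1^{IV}\) can also be tried on simulated data. The following is a rough sketch under an assumed design that is not part of the problem: \(x^*\), \(\epsilon\), \(u\), and \(e\) are independent normal draws, with hypothetical values `beta1 = 2.0` and error standard deviations of `0.5`. The function `iv_slope` implements the ratio exactly as written above, and the helper names are illustrative.

```python
import random


def iv_slope(z, x, y):
    """IV slope: sum((z_i - zbar) * y_i) / sum((z_i - zbar) * x_i)."""
    n = len(z)
    zbar = sum(z) / n
    num = sum((zi - zbar) * yi for zi, yi in zip(z, y))
    den = sum((zi - zbar) * xi for zi, xi in zip(z, x))
    return num / den


def simulate_iv(n=100_000, beta0=1.0, beta1=2.0, sd_u=0.5, sd_e=0.5, seed=0):
    """Return the IV slope using z as an instrument for the mismeasured x."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xs + rng.gauss(0.0, 1.0) for xs in x_star]
    # Two measurements of x_star with mutually independent errors u and e.
    x = [xs + rng.gauss(0.0, sd_u) for xs in x_star]
    z = [xs + rng.gauss(0.0, sd_e) for xs in x_star]
    return iv_slope(z, x, y)


if __name__ == "__main__":
    print("IV slope:", simulate_iv())
```

Under this design the second measurement's error is independent of the first's, which is the condition \(\Er[e u] = 0\) the argument relies on; making the two errors correlated in the simulation is a simple way to see what goes wrong when it fails.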
4
Table 5 from Aizer et al. (2018) shows additional estimates of the model in Equation 1. Column (1) shows standard multiple regression estimates. Columns (2) and (3) show estimates using the estimator \(\hat{\beta}_1^{IV}\) from part 3. Is the change in the estimates between columns (1) and (2) what you would expect based on parts 1 and 3? Why or why not?
