ECON 626: Midterm

Published October 16, 2024

Problem 1

\[ \def\R{{\mathbb{R}}} \def\Er{{\mathrm{E}}} \def\var{{\mathrm{Var}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\cov{{\mathrm{Cov}}} \def\En{{\mathbb{En}}} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Suppose \(y_i^* = x_i' \beta + \epsilon_i\) for \(i=1, ..., n\) with \(x_i \in \R^k\). Also suppose \(y_i^*\) is not always observed. Instead, you only observe \(y_i = y_i^*\) if \(o_i = 1\). The observed data is \(\left\lbrace\left(x_i, o_i, y_i = \begin{cases} y_i^* & \text{ if } o_i = 1 \\ \text{missing} & \text{ otherwise} \end{cases}\right)\right\rbrace_{i=1}^n\). Throughout, assume that observations for different \(i\) are independent and identically distributed, \(\Er[\epsilon_i | x_i] = 0\), and \(\Er[x_ix_i' o_i]\) is nonsingular.

Identification

  1. Show that if \(\epsilon_i\) is independent of \(o_i\) conditional on \(x_i\), then \(\beta\) is identified.

  2. No longer assuming \(\epsilon_i\) is independent of \(o_i\), show that \(\beta\) is not identified by finding an observationally equivalent \(\tilde{\beta}\). [Hint: suppose \(\Er[o | x] < 1\) and consider \(\tilde{\epsilon} = \begin{cases} x_i' (\beta - \tilde{\beta}) + \epsilon_i & \text{ if } o_i = 1 \\ x_i'(\tilde{\beta} - \beta) \frac{\Er[o|x_i]}{\Er[1-o|x_i]} + \epsilon_i & \text{ if } o_i = 0 \end{cases}\). You should verify that this \(\tilde{\epsilon}\) still meets the assumption that \(\Er[\tilde{\epsilon} | x_i] = 0\).]

  3. Now suppose you observe \(z_i \in \R^k\) such that \(\Er[o_i z_i x_i']\) is nonsingular and \(\Er[\epsilon z_i | o_i = 1] = 0\). Show that \(\beta\) is identified.

Solution.

  1. Since \(\epsilon_i\) is independent of \(o_i\) given \(x_i\), we have \[ \Er[x_i o_i \epsilon_i ] = \Er[x_i o_i \Er[\epsilon_i | x_i, o_i]] = \Er[x_i o_i \Er[\epsilon_i | x_i]] = 0. \] We can use this for identification: \[ \begin{align*} \Er[x_i o_i (y_i - x_i'\beta)] & = 0 \\ \beta & = \Er[x_ix_i'o_i]^{-1} \Er[x_i y_i o_i], \end{align*} \] which expresses \(\beta\) as a known function of the distribution of the observed data (see the simulation sketch after this list).

  2. As suggested by the hint, given any observed \(\{y_i, x_i, o_i\}\) generated by \(\beta\) and \(\epsilon_i\), note that \[ \begin{align*} y_i & = x_i'\beta + \epsilon_i \text{ if } o_i = 1 \\ & = x_i'\tilde{\beta} + \underbrace{x_i'(\beta - \tilde{\beta}) + \epsilon_i}_{\tilde{\epsilon}_i} \text{ if } o_i = 1, \end{align*} \] so changing \(\beta\) to \(\tilde{\beta}\) and \(\epsilon\) to \(\tilde{\epsilon}\) leaves the distribution of the observed data unchanged. Also, writing \(\tilde{\epsilon} = o x'(\beta - \tilde{\beta}) + (1-o) x'(\tilde{\beta} - \beta)\frac{\Er[o|x]}{\Er[1-o|x]} + \epsilon\) and using \(\Er[\epsilon|x] = 0\), \[ \begin{align*} \Er[\tilde{\epsilon} | x] & = \Er\left[o x'(\beta-\tilde{\beta}) + (1-o)x'(\tilde{\beta} - \beta) \frac{\Er[o|x]}{\Er[1-o|x]} + \epsilon \,\Big|\, x\right] \\ & = \Er[o|x]x'(\beta-\tilde{\beta}) + \Er[1-o|x]x'(\tilde{\beta} - \beta) \frac{\Er[o|x]}{\Er[1-o|x]} \\ & = 0, \end{align*} \] so \(\tilde{\epsilon}\) still meets the restriction that \(\Er[\tilde{\epsilon}|x]=0\).

  3. Similar to part 1, \[ \Er[o_i z_i \epsilon_i ] = \Er[o_i \Er[z_i \epsilon_i | o_i]] = 0, \] so \[ \begin{align*} \Er[z_i o_i (y_i - x_i'\beta)] & = 0 \\ \beta & = \Er[z_ix_i'o_i]^{-1} \Er[z_i y_i o_i] \end{align*} \] identifies \(\beta\).
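
To make part 1 concrete, here is a minimal Monte Carlo sketch (not part of the exam). It assumes an arbitrary data-generating process purely for illustration: \(x_i\) containing an intercept and one standard normal regressor, logistic selection depending only on \(x_i\), and \(\epsilon_i\) drawn independently of \(o_i\). The sample analogue of \(\Er[x_ix_i'o_i]^{-1} \Er[x_i y_i o_i]\) should then be close to \(\beta\).

```python
# Illustrative check of the moment condition from part 1: with selection that
# depends only on x, the sample analogue of E[x x' o]^{-1} E[x y o] recovers beta.
# The DGP below (logistic selection, normal errors) is an assumption of this sketch.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 1_000_000, np.array([1.0, -2.0])

x = np.column_stack([np.ones(n), rng.normal(size=n)])  # x_i = (1, x_{i2})'
eps = rng.normal(size=n)                               # epsilon_i independent of o_i given x_i
o = rng.binomial(1, 1 / (1 + np.exp(-x[:, 1])))        # selection depends only on x_i
y = x @ beta + eps                                     # y_i is used only where o_i = 1 below

# sample analogue of E[x x' o]^{-1} E[x y o]; rows with o_i = 0 get zero weight
beta_hat = np.linalg.solve((o[:, None] * x).T @ x, x.T @ (o * y))
print(beta_hat)  # close to (1.0, -2.0)
```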

Estimation

  1. Construct a sample estimator for \(\beta\) based on your answer to part 1 of the Identification section and show that it is unbiased.

  2. Suppose that \(\epsilon_i \sim N(0, 1)\), independent of \(o_i\) and \(x_i\), and that \(P(o_i=1|x_i) = g(x_i'\alpha)\) for some known function \(g\) and unknown parameter \(\alpha\). Write the loglikelihood for \((\alpha,\beta)\), and show that \(\hat{\beta}^{MLE}\) does not depend on \(\alpha\). Remember that the normal pdf is \(\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\).

Solution.

  1. We have \[ \begin{align*} \Er[\hat{\beta}] & = \Er\left[ (\sum o_i x_i x_i')^{-1} (\sum o_i x_i y_i) \right] \\ & = \Er\left[ (\sum o_i x_i x_i')^{-1} (\sum o_i x_i (x_i'\beta + \epsilon_i)) \right] \\ & = \beta + \Er\left[ (\sum o_i x_i x_i')^{-1} (\sum o_i x_i \epsilon_i) \right] \\ & = \beta + \Er\left[ (\sum o_i x_i x_i')^{-1} (\sum o_i x_i \Er[\epsilon_i|x_1,...,x_n, o_1,...,o_n]) \right] \\ & = \beta \end{align*} \]

  2. The log likelihood is \[ \log \ell(\alpha,\beta) = \sum_{i=1}^n \left( o_i\left(\frac{-\log(2\pi)}{2} - \frac{(y_i - x_i'\beta)^2}{2} + \log(g(x_i'\alpha))\right) + (1-o_i)\log(1-g(x_i'\alpha)) \right) \] The first order condition for \(\hat{\beta}^{MLE}\) is \[ 0 = \sum o_i(y_i -x_i'\hat{\beta}^{MLE}) x_i \] which does not depend on \(\alpha\).
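
As a numerical cross-check (again illustrative, not part of the exam), the sketch below takes \(g\) to be the logistic cdf for concreteness, maximizes the log likelihood above over \((\alpha, \beta)\) jointly, and confirms that \(\hat{\beta}^{MLE}\) coincides with the complete-case least squares estimator from part 1, regardless of \(\alpha\).

```python
# Sketch: with an assumed logistic g, the beta that maximizes the log likelihood
# equals least squares on the o_i = 1 subsample, as the first order condition shows.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, alpha0, beta0 = 5_000, np.array([0.2, 0.5]), np.array([1.0, -2.0])

x = np.column_stack([np.ones(n), rng.normal(size=n)])
o = rng.binomial(1, 1 / (1 + np.exp(-x @ alpha0)))   # P(o=1|x) = logistic(x'alpha), assumed
y = x @ beta0 + rng.normal(size=n)                   # epsilon_i ~ N(0, 1)

def negloglik(theta):
    a, b = theta[:2], theta[2:]
    g = np.clip(1 / (1 + np.exp(-x @ a)), 1e-10, 1 - 1e-10)
    resid2 = np.where(o == 1, (y - x @ b) ** 2, 0.0)  # residuals enter only when o_i = 1
    ll = o * (-0.5 * np.log(2 * np.pi) - 0.5 * resid2 + np.log(g)) + (1 - o) * np.log(1 - g)
    return -ll.sum()

theta_hat = minimize(negloglik, np.zeros(4), method="BFGS").x
beta_ols = np.linalg.solve((o[:, None] * x).T @ x, x.T @ (o * y))
print(theta_hat[2:], beta_ols)                       # the two estimates of beta agree
```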

Efficiency

As in the previous part, suppose that \(\epsilon_i \sim N(0, 1)\), independent of \(o_i\) and \(x_i\), and that \(P(o_i=1|x_i) = g(x_i'\alpha)\) for some known function \(g\) and unknown parameter \(\alpha\).

  1. Derive the Cramér-Rao lower bound for \(\theta = (\alpha,\beta)\)

  2. Now, treat \(\alpha\) as known and derive the Cramér-Rao lower bound for just \(\beta\). Does knowing \(\alpha\) help with estimating \(\beta\)?

Solution.

  1. The score for \(\alpha\) is \(\frac{\partial}{\partial \alpha}\ell(\alpha,\beta) = \sum_{i=1}^n \left(\frac{o_i}{g(x_i'\alpha)} - \frac{1-o_i}{1-g(x_i'\alpha)}\right) g'(x_i'\alpha) x_i\) and the score for \(\beta\) is \(\frac{\partial}{\partial \beta}\ell(\alpha,\beta) = \sum_{i=1}^n o_i(y_i - x_i'\beta) x_i\).
The Hessian is \[ H = \begin{pmatrix} -\sum_{i=1}^n o_i x_i x_i' & 0 \\ 0 & A_n(\alpha) \end{pmatrix} \] where \(A_n(\alpha)\) is the derivative of the score for \(\alpha\) with respect to \(\alpha\); the off-diagonal blocks are zero because the score for \(\alpha\) does not depend on \(\beta\) and the score for \(\beta\) does not depend on \(\alpha\). The Cramér-Rao lower bound is \[ \Er[-H]^{-1} = \begin{pmatrix} \left(n \Er[o_i x_i x_i']\right)^{-1} & 0 \\ 0 & \Er[-A_n(\alpha)]^{-1} \end{pmatrix}. \]
  2. Now the Cramér-Rao lower bound is just \(\left(n \Er[o_i x_i x_i']\right)^{-1}\), which is the same as the corresponding block of the bound when \(\alpha\) was unknown. Because the information matrix is block diagonal, knowing \(\alpha\) does not help estimate \(\beta\) in this situation.
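
The block-diagonal structure behind this answer can be seen numerically. The sketch below (not part of the exam, and again assuming a logistic \(g\) for concreteness) estimates the per-observation information matrix by averaging outer products of the score at the true parameters; the \((\alpha, \beta)\) cross blocks come out approximately zero, and the \(\beta\) block matches \(\Er[g(x_i'\alpha) x_i x_i'] = \Er[o_i x_i x_i']\).

```python
# Estimate E[s s'] per observation by simulation and inspect its block structure.
# Logistic g is an assumption of this sketch; the exam leaves g generic.
import numpy as np

rng = np.random.default_rng(2)
n, alpha0, beta0 = 200_000, np.array([0.2, 0.5]), np.array([1.0, -2.0])

x = np.column_stack([np.ones(n), rng.normal(size=n)])
g = 1 / (1 + np.exp(-x @ alpha0))
o = rng.binomial(1, g)
y = x @ beta0 + rng.normal(size=n)

# per-observation scores at the true (alpha, beta)
s_alpha = (o - g)[:, None] * x                   # (o_i - g(x_i'alpha)) x_i for logistic g
s_beta = (o * (y - x @ beta0))[:, None] * x      # o_i (y_i - x_i'beta) x_i
score = np.hstack([s_alpha, s_beta])

info = score.T @ score / n                       # estimate of E[s s'] per observation
print(np.round(info, 3))                         # (alpha, beta) cross blocks are ~ 0
print(np.round((g[:, None] * x).T @ x / n, 3))   # E[g(x'alpha) x x'] = E[o x x'], the beta block
```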

Problem 2

Consider the probability space with sample space \(\Omega = \{a, b, c, d\}\), sigma field \(\mathscr{F} = 2^\Omega\), the power set of \(\Omega\), and probability measure \(P\).

Generated \(\sigma\)-fields

Let \(X(\omega) = \begin{cases} 1 & \text{ if } \omega = a \text{ or } b \\ 0 & \text{ otherwise} \end{cases}\). List the elements of \(\sigma(X)\).

Solution. \(\sigma(X) = \{ \{a,b\}, \{c,d\}, \Omega, \emptyset \}\).
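
Because \(X\) has a finite range, \(\sigma(X)\) can also be enumerated mechanically as the preimages \(X^{-1}(B)\) over all subsets \(B\) of the range, matching the list above. A small illustrative sketch:

```python
# Enumerate sigma(X) on the four-point sample space as preimages X^{-1}(B)
# over all subsets B of the range of X (enough here because X takes two values).
from itertools import combinations

omega = ["a", "b", "c", "d"]
X = {"a": 1, "b": 1, "c": 0, "d": 0}

values = sorted(set(X.values()))
subsets = [set(c) for r in range(len(values) + 1) for c in combinations(values, r)]
sigma_X = {frozenset(w for w in omega if X[w] in B) for B in subsets}
print([set(s) for s in sigma_X])   # {'a','b'}, {'c','d'}, Omega, and the empty set
```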

Independence

Let \(Y(\omega) = \begin{cases} 1 & \text{ if } \omega = b \text{ or } c \\ 0 & \text{ otherwise} \end{cases}\). Can \(X\) and \(Y\) be independent?

Solution. Yes, for example if \(P(a) = P(b) = P(c) = P(d) = 1/4\), then \(P(X=x,Y=y) = 1/4 = P(X=x)P(Y=y)\) for all \((x, y) \in \{0,1\}^2\).
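
A quick enumeration confirms the factorization claimed above under the uniform measure (the \(1/4\) probabilities below are the example's assumption):

```python
# Verify P(X=x, Y=y) = P(X=x) P(Y=y) for every (x, y) under the uniform P.
P = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}   # uniform measure from the solution
X = {"a": 1, "b": 1, "c": 0, "d": 0}
Y = {"a": 0, "b": 1, "c": 1, "d": 0}

for xv in (0, 1):
    for yv in (0, 1):
        joint = sum(p for w, p in P.items() if X[w] == xv and Y[w] == yv)
        product = (sum(p for w, p in P.items() if X[w] == xv)
                   * sum(p for w, p in P.items() if Y[w] == yv))
        assert abs(joint - product) < 1e-12        # factorization holds
print("X and Y are independent under the uniform P")
```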

Testing

(For this and subsequent parts, ignore any restrictions on \(\theta_x\) and \(\theta_y\) implied by the sample space and form of \(X\) and \(Y\) in the first two parts).

Suppose you observe an independent and identically distributed sample of \(\{(x_i, y_i)\}_{i=1}^n\). For each \(i\), \(x_i = 1\) with probability \(\theta_x\) and 0 otherwise, and \(y_i = 1\) with probability \(\theta_y\) and 0 otherwise. Assume \(x_i\) is independent of \(y_i\).

  1. Find the most powerful test for testing \(H_0: (\theta_x, \theta_y) = (\theta_x^0, \theta_y^0)\) against \(H_1: (\theta_x, \theta_y) = (\theta_x^1, \theta_y^1)\).

  2. Show that there is a most powerful test for testing \(H_0: \theta_x = \theta_x^0\) against \(H_1: \theta_x = \theta_x^1\), where under the null and alternative, \(\theta_y\) is unrestricted.

Solution.

  1. By the Neyman-Pearson lemma, the likelihood ratio test is most powerful. The log likelihood ratio is \[ \begin{align*} \tau = \sum_{i=1}^n & x_i(\log(\theta_x^1) - \log(\theta_x^0)) + (1-x_i)(\log(1-\theta_x^1)-\log(1-\theta_x^0)) + \\ & + y_i(\log(\theta_y^1) - \log(\theta_y^0)) + (1-y_i)(\log(1-\theta_y^1)-\log(1-\theta_y^0)) \\ = & n_x(\log(\theta_x^1) - \log(\theta_x^0)) + (n-n_x)(\log(1-\theta_x^1)-\log(1-\theta_x^0)) + \\ & + n_y(\log(\theta_y^1) - \log(\theta_y^0)) + (n-n_y)(\log(1-\theta_y^1)-\log(1-\theta_y^0)) \\ \end{align*} \]

where \(n_x = \sum_{i=1}^n x_i\) and \(n_y = \sum_{i=1}^n y_i\). For a test of size \(\alpha\), we would find \(c\) such that \(P(\tau>c|H_0) = \alpha\) and reject if \(\tau > c\).

  2. We can interpret this as testing the point null and alternative \(H_0: \theta_x = \theta_x^0, \theta_y = \theta_y^0\), against \(H_1: \theta_x = \theta_x^1, \theta_y = \theta_y^1\) with \(\theta_y^1 = \theta_y^0\). In that case, the test statistic becomes \[ \tau = n_x(\log(\theta_x^1) - \log(\theta_x^0)) + (n-n_x)(\log(1-\theta_x^1)-\log(1-\theta_x^0)) \] and importantly does not depend on \(\theta_y\) nor \(n_y\). Thus, the same likelihood ratio test will be most powerful for any \(\theta_y\).
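
For concreteness, here is a sketch of this test with hypothetical values \(\theta_x^0 = 0.5\), \(\theta_x^1 = 0.6\), and size \(0.05\) (none of these numbers come from the exam). Since \(\tau\) is increasing in \(n_x\) when \(\theta_x^1 > \theta_x^0\), rejecting for large \(\tau\) is equivalent to rejecting for large \(n_x\), and the cutoff can be read from the Binomial\((n, \theta_x^0)\) distribution; randomization at the boundary is ignored, so the test is slightly conservative.

```python
# Most powerful test for H0: theta_x = 0.5 vs H1: theta_x = 0.6, via the
# equivalent rejection rule "n_x large" (hypothetical parameter values).
import numpy as np
from scipy.stats import binom

n, theta0, theta1, size = 200, 0.5, 0.6, 0.05

c = binom.ppf(1 - size, n, theta0)            # smallest c with P(n_x > c | H0) <= size
print("reject H0 when n_x >", c)
print("size:", 1 - binom.cdf(c, n, theta0))
print("power at theta_x^1:", 1 - binom.cdf(c, n, theta1))

x = np.random.default_rng(3).binomial(1, theta1, size=n)   # one simulated sample
print("n_x =", x.sum(), "-> reject:", bool(x.sum() > c))
```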

Behavior of Averages

  1. Note that \(\Er[x_i] = \theta_x\) and \(\Er[(x_i - \theta_x)^2] = \theta_x(1-\theta_x)\). Use Markov’s inequality to show that \[ P\left(\left\vert \frac{1}{n} \sum_{i=1}^n x_i - \theta_x \right\vert > \epsilon \right) \leq \frac{\theta_x(1-\theta_x)}{n \epsilon^2}. \]

  2. Show that \[ \lim_{n \to \infty} P\left(\left\vert \frac{1}{n} \sum_{i=1}^n x_i - \theta_x \right\vert > \epsilon \right) = 0. \]

Solution.

  1. This is Markov’s inequality with \(k=2\), because \(\Er[\left(\frac{1}{n} \sum_{i=1}^n x_i - \theta_x \right)^2] = \frac{\theta_x (1-\theta_x)}{n}\).

  2. Taking the limit as \(n \to \infty\) of the bound from the previous part gives the conclusion, since \(\frac{\theta_x(1-\theta_x)}{n \epsilon^2} \to 0\).
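
Both the bound and the limit are easy to see in a short simulation (the parameter values below are arbitrary): the empirical tail frequency stays below \(\theta_x(1-\theta_x)/(n\epsilon^2)\), and both shrink toward zero as \(n\) grows.

```python
# Monte Carlo check of P(|mean(x) - theta_x| > eps) <= theta_x(1-theta_x)/(n eps^2).
import numpy as np

rng = np.random.default_rng(4)
theta_x, eps, reps = 0.3, 0.05, 20_000             # arbitrary illustration values

for n in (100, 400, 1600):
    means = rng.binomial(1, theta_x, size=(reps, n)).mean(axis=1)
    freq = np.mean(np.abs(means - theta_x) > eps)  # empirical tail probability
    bound = theta_x * (1 - theta_x) / (n * eps ** 2)
    print(f"n={n:5d}  empirical={freq:.4f}  bound={min(bound, 1.0):.4f}")
```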

Definitions and Results

  • Measure and Probability:

    • A collection of subsets, \(\mathscr{F}\), of \(\Omega\) is a \(\sigma\)-field if

      1. \(\Omega \in \mathscr{F}\)
      2. If \(A \in \mathscr{F}\), then \(A^c \in \mathscr{F}\)
      3. If \(A_1, A_2, ... \in \mathscr{F}\), then \(\cup_{j=1}^\infty A_j \in \mathscr{F}\)
    • A measure is a function \(\mu: \mathscr{F} \to [0, \infty]\) s.t.

      1. \(\mu(\emptyset) = 0\)
      2. If \(A_1, A_2, ... \in \mathscr{F}\) are pairwise disjoint, then \(\mu\left(\cup_{j=1}^\infty A_j \right) = \sum_{j=1}^\infty \mu(A_j)\)
    • The Lebesgue integral is

      1. Positive: if \(f \geq 0\) a.e., then \(\int f d\mu \geq 0\)
      2. Linear: \(\int (af + bg) d\mu = a\int f d\mu + b \int g d \mu\)
    • Radon-Nikodym derivative: if \(\nu \ll \mu\), then \(\exists\) a nonnegative measurable function, \(\frac{d\nu}{d\mu}\), s.t. \[ \nu(A) = \int_A \frac{d\nu}{d\mu} d\mu \]

    • Monotone convergence: If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}\geq 0\), and for each \(\omega \in \Omega\), \(f_{n}(\omega )\uparrow f(\omega )\), then \(\int f_{n}d\mu \uparrow \int fd\mu\) as \(n\rightarrow \infty\)

    • Dominated convergence: If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}(\omega )\rightarrow f(\omega )\) for each \(\omega \in \Omega\), and \(|f_{n}|\leq g\) for each \(n\geq 1\) for some \(g\geq 0\) with \(\int gd\mu <\infty\), then \(\int f_{n}d\mu \rightarrow \int fd\mu\)

    • Markov’s inequality: \(P(|X|>\epsilon) \leq \frac{\Er[|X|^k]}{\epsilon^k}\) \(\forall \epsilon > 0, k > 0\)

    • Jensen’s inequality: if \(g\) is convex, then \(g(\Er[X]) \leq \Er[g(X)]\)

    • Cauchy-Schwarz inequality: \(\left(\Er[XY]\right)^2 \leq \Er[X^2] \Er[Y^2]\)

    • \(\sigma(X)\) is \(\sigma\)-field generated by \(X\), it is

      • smallest \(\sigma\)-field w.r.t. which \(X\) is measurable
      • \(\sigma(X) = \{X^{-1}(B): B \in \mathscr{B}(\R)\}\)
    • \(\sigma(W) \subset \sigma(X)\) iff \(\exists\) \(g\) s.t. \(W = g(X)\)

    • Events \(A_1, ..., A_m\) are independent if for any sub-collection \(A_{i_1}, ..., A_{i_s}\) \[ P\left(\cap_{j=1}^s A_{i_j}\right) = \prod_{j=1}^s P(A_{i_j}) \] \(\sigma\)-fields are independent if this is true for any events from them. Random variables are independent if their \(\sigma\)-fields are.

    • Conditional expectation of \(Y\) given \(\sigma\)-field \(\mathscr{G}\) satisfies \(\int_A \Er[Y|\mathscr{G}] dP = \int_A Y dP\) \(\forall A \in \mathscr{G}\)

  • Identification \(X\) observed, distribution \(P_X\), probability model \(\mathcal{P}\)

    • \(\theta_0 \in \R^k\) is identified in \(\mathcal{P}\) if there exists a known \(\psi: \mathcal{P} \to \R^k\) s.t. \(\theta_0 = \psi(P_X)\)
    • \(\mathcal{P} = \{ P(\cdot; s) : s \in S \}\), two structures \(s\) and \(\tilde{s}\) in \(S\) are observationally equivalent if they imply the same distribution for the observed data, i.e. \[ P(B;s) = P(B; \tilde{s}) \] for all \(B \in \sigma(X)\).
    • Let \(\lambda: S \to \R^k\), \(\theta\) is observationally equivalent to \(\tilde{\theta}\) if \(\exists s, \tilde{s} \in S\) that are observationally equivalent and \(\theta = \lambda(s)\) and \(\tilde{\theta} = \lambda(\tilde{s})\)
    • \(s_0 \in S\) is identified if there is no \(s \neq s_0\) that is observationally equivalent to \(s_0\)
    • \(\theta_0\) is identified (in \(S\)) if there is no observationally equivalent \(\theta \neq \theta_0\)
  • Cramér-Rao Bound: in the parametric model \(P_X \in \{P_\theta: \theta \in \R^d\}\) with likelihood \(\ell(\theta;x)\), if appropriate derivatives and integrals can be interchanged, then for any unbiased estimator \(\tau(X)\), \[ \var_\theta(\tau(X)) \geq I(\theta)^{-1} \] where \(I(\theta) = \int s(x,\theta) s(x,\theta)' dP_\theta(x) = -\Er\left[\frac{\partial^2 \log \ell(\theta;x)}{\partial \theta \partial \theta'}\right]\) and \(s(x,\theta) = \frac{\partial \log \ell(\theta;x)}{\partial \theta}\)

  • Hypothesis testing:

    • \(P(\text{reject } H_0 | P_x \in \mathcal{P}_0)\)=Type I error rate \(=P_x(C)\)
    • \(P(\text{fail to reject } H_0 | P_x \in \mathcal{P}_1)\)=Type II error rate
    • \(P(\text{reject } H_0 | P_x \in \mathcal{P}_1)\) = power
    • \(\sup_{P_x \in \mathcal{P}_0} P_x(C)\) = size of test
    • Neyman-Pearson Lemma: Let \(\Theta = \{0, 1\}\), \(f_0\) and \(f_1\) be densities of \(P_0\) and \(P_1\), \(\tau(x) =f_1(x)/f_0(x)\) and \(C^* =\{x \in X: \tau(x) > c\}\). Then among all tests \(C\) s.t. \(P_0(C) = P_0(C^*)\), \(C^*\) is most powerful.
  • Projection: \(P_L y \in L\) is the projection of \(y\) on \(L\) if \[ \norm{y - P_L y } = \inf_{w \in L} \norm{y - w} \]

    1. \(P_L y\) exists, is unique, and is a linear function of \(y\)
    2. For any \(y_1^* \in L\), \(y_1^* = P_L y\) iff \(y- y_1^* \perp L\)
    3. \(G = P_L\) iff \(Gy = y\) \(\forall y \in L\) and \(Gy = 0\) \(\forall y \in L^\perp\)
    4. Linear \(G: V \to V\) is a projection map onto its range, \(\mathcal{R}(G)\), iff \(G\) is idempotent and symmetric.
  • Gauss-Markov: \(Y = \theta + u\) with \(\theta \in L \subset \R^n\), a known subspace. If \(\Er[u] = 0\) and \(\Er[uu'] = \sigma^2 I_n\), then the best linear unbiased estimator (BLUE) of \(a'\theta\) is \(a'\hat{\theta}\), where \(\hat{\theta} = P_L Y\)