ECON 626: Midterm Review

Published

October 24, 2022

\[ \def\R{{\mathbb{R}}} \def\Er{{\mathrm{E}}} \def\var{{\mathrm{Var}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\cov{{\mathrm{Cov}}} \]

List of Definitions and Results

Measure Theory

Measure Space

  1. A set \(\Omega\)
  2. A collection of subsets, \(\mathscr{F}\), of \(\Omega\) that is a \(\sigma\)-field (aka \(\sigma\)-algebra), i.e.
    1. \(\Omega \in \mathscr{F}\)
    2. If \(A \in \mathscr{F}\), then \(A^c \in \mathscr{F}\)
    3. If \(A_1, A_2, ... \in \mathscr{F}\), then \(\cup_{j=1}^\infty A_j \in \mathscr{F}\)
  3. A measure \(\mu: \mathscr{F} \to [0, \infty]\) s.t.
    1. \(\mu(\emptyset) = 0\)
    2. If \(A_1, A_2, ... \in \mathscr{F}\) are pairwise disjoint, then \(\mu\left(\cup_{j=1}^\infty A_j \right) = \sum_{j=1}^\infty \mu(A_j)\)

Given a topology on \(\Omega\), the Borel \(\sigma\)-field, \(\mathscr{B}(\Omega)\), is the smallest \(\sigma\)-field containing all open subsets of \(\Omega\).

  • \(f: \Omega \to \mathbf{R}\) is (\(\mathscr{F}\)-)measurable if \(\forall\) \(B \in \mathscr{B}(\mathbf{R})\), \(f^{-1}(B) \in \mathscr{F}\)
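
For a quick check of the definitions, consider a coin flip: \[ \Omega = \{H,T\}, \quad \mathscr{F} = \{\emptyset, \{H\}, \{T\}, \Omega\}, \quad \mu(\{H\}) = \mu(\{T\}) = \tfrac{1}{2}. \] The requirements above hold, and every \(f: \Omega \to \mathbf{R}\) is \(\mathscr{F}\)-measurable, since \(f^{-1}(B)\) is always one of the four sets in \(\mathscr{F}\).
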
Tip

Chapter 1, exercises 1.1, 1.2

Lebesgue Integral

The Lebesgue integral satisfies:

  1. If \(f \geq 0\) a.e., then \(\int f d\mu \geq 0\)

  2. Linearity: \(\int (af + bg) d\mu = a\int f d\mu + b \int g d \mu\)

  • Measure \(\nu\) is absolutely continuous with respect to \(\mu\) if for \(A \in \mathscr{F}\), \(\mu(A) = 0\) implies \(\nu(A) = 0\)
    • denoted \(\nu \ll \mu\)
    • \(\mu\) is called a dominating measure

Radon-Nikodym Derivative

Let \((\Omega,\mathscr{F},\mu)\) be a measure space, and let \(\nu\) and \(\mu\) be \(\sigma\)-finite measures defined on \(\mathscr{F}\) with \(\nu \ll \mu\). Then there is a nonnegative measurable function \(f\) such that for each set \(A\in \mathscr{F}\), \[ \nu (A)=\int_{A}f\,d\mu \] Such an \(f\) is unique up to \(\mu\)-null sets: for any two such functions \(f\) and \(g\), \(\mu (\{\omega \in \Omega:f(\omega )\neq g(\omega )\})=0\)
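
For example, if \(\mu\) is counting measure on \(\Omega = \{1, \dots, n\}\) with \(\mathscr{F} = 2^\Omega\) and \(\nu\) is any finite measure on \(\mathscr{F}\), then \(\nu \ll \mu\) (the only \(\mu\)-null set is \(\emptyset\)) and \[ \frac{d\nu}{d\mu}(\omega) = \nu(\{\omega\}), \quad \text{since} \quad \nu(A) = \sum_{\omega \in A} \nu(\{\omega\}) = \int_A \frac{d\nu}{d\mu}\, d\mu \]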

Tip

Chapter 1, exercises 3.1, 3.2

Convergence Theorems

Continuity of Measure

Suppose that \(\{E_{n}\}\) is a monotone sequence of sets in \(\mathscr{F}\) (for a decreasing sequence, assume also that \(\mu(E_1) < \infty\)). Then \[ \mu \left( \lim_{n\rightarrow \infty}E_{n}\right) =\lim_{n\rightarrow \infty }\mu (E_{n}). \]

Monotone Convergence Theorem

If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}\geq 0\), and for each \(\omega \in \Omega\), \(f_{n}(\omega )\uparrow f(\omega )\), then \(\int f_{n}d\mu \uparrow \int fd\mu\) as \(n\rightarrow \infty\)

Fatou’s Lemma

If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}\geq 0\), then \[ \int \liminf_{n\rightarrow \infty }f_{n}\,d\mu \leq \liminf_{n\rightarrow \infty }\int f_{n}\,d\mu \]

Dominated Convergence Theorem

If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}(\omega )\rightarrow f(\omega )\) for each \(\omega \in \Omega\), and there is some \(g\geq 0\) with \(\int g\,d\mu <\infty\) such that \(|f_{n}|\leq g\) for each \(n\geq 1\), then \(\int f_{n}\,d\mu \rightarrow \int f\,d\mu\)
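
A standard example of why the dominating \(g\) matters: on \(([0,1], \mathscr{B}([0,1]), \lambda)\) with \(\lambda\) Lebesgue measure, let \(f_n = n \, 1_{(0,1/n)}\). Then \(f_n(\omega) \to 0\) for every \(\omega\), but \[ \int f_n \, d\lambda = 1 \not\to 0 = \int \lim_{n\to\infty} f_n \, d\lambda, \] and no integrable \(g\) dominates every \(f_n\); the same sequence shows the inequality in Fatou's Lemma can be strict.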

Probability

  • Given a measurable space \((\Omega ,\mathscr{F})\), a probability (or probability measure) \(P\) is a measure s.t. \(P(\Omega )=1\)
Tip

Chapter 2, exercise 1.1

Random Variables

A random variable \(X\) is a measurable function from \(\Omega\) to \(\mathbf{R}\)

Distribution

Let \((\Omega ,\mathscr{F},P)\) be a probability space and \(X\) a random variable on \((\Omega ,\mathscr{F})\). The distribution \(P_{X}\) induced by \(X\) is the probability measure on \((\mathbf{R},\mathscr{B}(\mathbf{R}))\) such that for all \(B\in \mathscr{B}(\mathbf{R})\), \[ P_{X}(B)\equiv P\left\{ \omega \in \Omega :X(\omega )\in B\right\} \]

CDF

The CDF of a random variable \(X\) with distribution \(P_{X}\) is defined to be a function \(F:\mathbf{R}\rightarrow [0,1]\) such that \[ F(t)=P_{X}\left( (-\infty ,t]\right) . \]

PDF

Let \(X\) be a random variable with distribution \(P_{X}\), and let \(\lambda\) denote Lebesgue measure on \(\mathbf{R}\). When \(P_{X}\ll \lambda\), we call \(X\) a continuous random variable, and call the Radon-Nikodym derivative \(f\equiv dP_{X}/d\lambda\) the (probability) density function of \(P_{X}\).
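
For example, the uniform distribution on \([0,1]\) has \(P_X \ll \lambda\) with density \(f(x) = 1_{[0,1]}(x)\), so \[ P_X(B) = \int_B 1_{[0,1]}\, d\lambda = \lambda(B \cap [0,1]) \] In contrast, a random variable with \(P_X(\{a\}) > 0\) for some point \(a\) is not absolutely continuous with respect to \(\lambda\), since \(\lambda(\{a\}) = 0\), and so has no density with respect to \(\lambda\).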

Tip

Chapter 2, exercises 2.1, 2.2, 2.3

Inequalities

Markov’s Inequality

\(P(|X|>\epsilon) \leq \frac{\Er[|X|^k]}{\epsilon^k}\) \(\forall \epsilon > 0, k > 0\)
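
A useful special case (Chebyshev's inequality): apply the bound to \(X - \Er X\) with \(k = 2\) to get \[ P(|X - \Er X| > \epsilon) \leq \frac{\var(X)}{\epsilon^2} \quad \forall \epsilon > 0 \]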

Jensen’s Inequality

Suppose that \(g\) is convex and \(X\) and \(g(X)\) are integrable, then \(g(\Er X) \leq \Er[g(X)]\)

Cauchy-Schwarz Inequality

\(\left(\Er[XY]\right)^2 \leq \Er[X^2] \Er[Y^2]\)
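
For example, applying the inequality to the centered variables \(X - \Er X\) and \(Y - \Er Y\) gives \[ \cov(X,Y)^2 \leq \var(X)\var(Y), \] i.e. the correlation coefficient always lies in \([-1, 1]\).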

Tip

Chapter 2, exercise 3.1

Dependence and Information

Generated \(\sigma\)-field

  • \(\sigma(X)\) is the \(\sigma\)-field generated by \(X\)
    • smallest \(\sigma\)-field w.r.t. which \(X\) is measurable
    • \(\sigma(X) = \{X^{-1}(B): B \in \mathscr{B}(\R)\}\)

Information

  • \(\forall E \in \sigma(X)\), observing the value \(x\) of \(X\) tells us whether \(E\) occurred
  • if \(\sigma(X_1) \subset \sigma(X_2)\), then \(\sigma(X_2)\) has more information than \(\sigma(X_1)\)

Dependence

Suppose \(g:\R \to \R\) is Borel measurable, then \(\sigma(g(X)) \subset \sigma(X)\)

Suppose \(\sigma(W) \subset \sigma(X)\), then \(\exists\) Borel measurable \(g\) s.t. \(W=g(X)\)
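
For example, take \(g(x) = x^2\): \[ \sigma(X^2) = \{ (X^2)^{-1}(B) : B \in \mathscr{B}(\R) \} \subset \sigma(X), \] and the inclusion is strict in general, since observing \(X^2\) reveals \(|X|\) but not the sign of \(X\), so an event like \(\{X > 0\}\) need not be in \(\sigma(X^2)\).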

Independence

  • Events \(A_1, ..., A_m\) are independent if for any sub-collection \(A_{i_1}, ..., A_{i_s}\) \[ P\left(\cap_{j=1}^s A_{i_j}\right) = \prod_{j=1}^s P(A_{i_j}) \]
  • \(\sigma\)-fields, \(\mathscr{F}_1, ..., \mathscr{F}_m \subset \mathscr{F}\), are independent if for any sub-collection \(\mathscr{F}_{i_1}, ..., \mathscr{F}_{i_s}\) and \(E_{i_j} \in \mathscr{F}_{i_j}\), \[ P\left(\cap_{j=1}^s E_{i_j}\right) = \prod_{j=1}^s P(E_{i_j}) \]
  • Random variables \(X_1, ..., X_m\) are independent if \(\sigma(X_1), ..., \sigma(X_m)\) are independent

If \(X=(X_1, X_2)\) and \(Y=(Y_1, Y_2)\) are independent and \(f\) and \(g\) are Borel measurable, then \(f(X)\) and \(g(Y)\) are independent

Tip

Chapter 2, exercises 4.1-4.8

Conditional Expectation

Let \(\mathscr{G} \subset \mathscr{F}\) be \(\sigma\)-fields, \(Y\) a random variable with \(\Er |Y| < \infty\), then the conditional expectation of \(Y\) given \(\mathscr{G}\) is \(\Er[Y|\mathscr{G}](\cdot): \Omega \to \R\) s.t.

  1. \(\Er[Y|\mathscr{G}](\cdot)\) is \(\mathscr{G}\) measurable

  2. \(\int_A \Er[Y|\mathscr{G}] dP = \int_A Y dP\) \(\forall A \in \mathscr{G}\)
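
A quick check of the definition: if \(\mathscr{G} = \{\emptyset, \Omega\}\) is the trivial \(\sigma\)-field, then the constant \(\Er[Y|\mathscr{G}] = \Er[Y]\) works, since constants are \(\mathscr{G}\)-measurable and \[ \int_\Omega \Er[Y]\, dP = \Er[Y] = \int_\Omega Y \, dP, \qquad \int_\emptyset \Er[Y]\, dP = 0 = \int_\emptyset Y\, dP \]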

Properties

  • If \(X\) is \(\mathscr{G}\) measurable, then \(\Er[XY| \mathscr{G}] = X \Er[Y|\mathscr{G}]\) a.e.
  • If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[XY|Z] = X \Er[Y|Z]\)
  • If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[\Er[Y|Z]|X] = \Er[Y|X]\)
  • If \(Y\) and \(X\) are independent, then \(\Er[Y | X ] = \Er[Y]\)
Tip

Chapter 2, exercises 5.1, 5.2, 5.3

Identification

Let \(X\) be an observed random vector with distribution \(P_X\). Let \(\mathcal{P}\) be a probability model, i.e. a collection of probabilities such that \(P_X \in \mathcal{P}\). Then \(\theta_0 \in \R^k\) is identified in \(\mathcal{P}\) if there exists a known \(\psi: \mathcal{P} \to \R^k\) s.t.

\[ \theta_0 = \psi(P_X) \]
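
For example, if \(\mathcal{P}\) is the set of distributions on \(\R\) with a finite first moment and \(\theta_0 = \Er[X]\), then \(\theta_0\) is identified with \[ \psi(P_X) = \int x \, dP_X(x) \]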

Observationally Equivalent

  • Let \(\mathcal{P} = \{ P(\cdot; s) : s \in S \}\), two structures \(s\) and \(\tilde{s}\) in \(S\) are observationally equivalent if they imply the same distribution for the observed data, i.e. \[ P(B;s) = P(B; \tilde{s}) \] for all \(B \in \sigma(X)\).

  • Let \(\lambda: S \to \R^k\), \(\theta\) is observationally equivalent to \(\tilde{\theta}\) if \(\exists s, \tilde{s} \in S\) that are observationally equivalent and \(\theta = \lambda(s)\) and \(\tilde{\theta} = \lambda(\tilde{s})\)

    • Let \(\Gamma(\theta, S) = \{P(\cdot; s) : s \in S, \theta = \lambda(s) \}\), then \(\theta\) and \(\tilde{\theta}\) are observationally equivalent iff \(\Gamma(\theta,S) \cap \Gamma(\tilde{\theta}, S) \neq \emptyset\)

(Non-Constructive) Identification

  • \(s_0 \in S\) is identified if there is no \(s\) that is observationally equivalent to \(s_0\)

  • \(\theta_0\) is identified (in \(S\)) if there is no observationally equivalent \(\theta \neq \theta_0\)

    • i.e. \(\Gamma(\theta_0, S) \cap \Gamma(\theta, S) = \emptyset\) \(\forall \theta \neq \theta_0\)
Tip

Chapter 3, exercises 1.1, 1.2

Estimation

Sample Analogue Estimation
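
A sketch of the idea (details in the chapter): when \(\theta_0 = \psi(P_X)\) is identified, the sample analogue (plug-in) estimator replaces \(P_X\) with the empirical distribution \(\hat{P}_n\) of \(X_1, \dots, X_n\), \[ \hat{\theta} = \psi(\hat{P}_n), \qquad \text{e.g. } \theta_0 = \int x \, dP_X \;\implies\; \hat{\theta} = \int x\, d\hat{P}_n = \frac{1}{n}\sum_{i=1}^n X_i \]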

Tip

Chapter 3, exercises 1.3, 1.4

Maximum Likelihood Estimation

Cramér-Rao Lower Bound

  • \(X \in \R^n\) with distribution \(P_X \in \mathcal{P} = \{P_\theta: \theta \in \Theta \subset \R^d \}\), density \(f_X(\cdot\,;\theta) = dP_\theta/d\mu\) with respect to a dominating measure \(\mu\), and likelihood \(\ell(\theta;x) = f_X(x;\theta)\)

Score Equality

  • If \(\frac{\partial}{\partial \theta} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial}{\partial \theta} f_X(x;\theta) d\mu(x)\), then \[ \int \underbrace{\frac{\partial \log \ell(\theta;x)}{\partial \theta}}_{\text{"score"}=s(x,\theta)} dP_\theta(x) = 0 \]
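
A sketch of where the score equality comes from (assuming the interchange condition holds and \(f_X(x;\theta) > 0\)): since \(\int f_X(x;\theta)\, d\mu(x) = 1\) for all \(\theta\), \[ 0 = \frac{\partial}{\partial \theta}\int f_X(x;\theta)\, d\mu(x) = \int \frac{\partial f_X(x;\theta)}{\partial\theta}\, d\mu(x) = \int \frac{\partial \log f_X(x;\theta)}{\partial\theta}\, f_X(x;\theta)\, d\mu(x) = \int s(x,\theta)\, dP_\theta(x) \]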

Information Equality

  • Fisher Information \(I(\theta) = \int s(x,\theta) s(x,\theta)' dP_\theta(x)\)
  • If \(\frac{\partial^2}{\partial \theta\partial \theta'} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial^2}{\partial \theta\partial \theta'} f_X(x;\theta) d\mu(x)\), then \[ I(\theta) = -\int \underbrace{\frac{\partial^2 \log \ell(\theta;x)}{\partial \theta \partial \theta'}}_{\text{"Hessian"}=h(x,\theta)} dP_\theta(x) \]


  • If \(T = \tau(X)\) is an unbiased estimator for \(\theta\) and \[ \frac{\partial}{\partial \theta'} \int \tau(x) f_X(x;\theta) d\mu(x) = \int \tau(x) \frac{\partial f_X(x;\theta)}{\partial \theta'} d\mu(x) \] then \[ \int \tau(x) s(x,\theta)'dP_\theta(x) = I \] where \(I\) is the \(d \times d\) identity matrix (by unbiasedness the left side of the displayed condition equals \(\partial \theta / \partial \theta' = I\), while the right side equals \(\int \tau(x) s(x,\theta)' dP_\theta(x)\)).

Cramér-Rao Bound

Let \(T = \tau(X)\) be an unbiased estimator, and suppose that the condition above and the score equality hold. Then, \[ \var_\theta(\tau(X)) \equiv \int \left(\tau(x) - \int \tau \,dP_\theta\right)\left(\tau(x) - \int \tau \,dP_\theta\right)' dP_\theta(x) \geq I(\theta)^{-1} \] where \(\geq\) is in the positive semidefinite sense.
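
For example, a case where the bound is attained: if \(X = (X_1,\dots,X_n)\) with \(X_i\) i.i.d. \(N(\theta, \sigma^2)\) and \(\sigma^2\) known, then \(I(\theta) = n/\sigma^2\), and the unbiased estimator \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\) satisfies \[ \var_\theta(\bar{X}) = \frac{\sigma^2}{n} = I(\theta)^{-1} \] so the sample mean attains the Cramér-Rao bound.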

Tip

Chapter 3, exercise 1.5

Hypothesis Testing

  • Let \(C\) be the critical (rejection) region of the test, \(\mathcal{P}_0\) the null model, and \(\mathcal{P}_1\) the alternative
  • \(P(\text{reject } H_0 | P_X \in \mathcal{P}_0) = P_X(C)\) = Type I error
  • \(P(\text{fail to reject } H_0 | P_X \in \mathcal{P}_1)\) = Type II error
  • \(P(\text{reject } H_0 | P_X \in \mathcal{P}_1)\) = power
  • \(\sup_{P_X \in \mathcal{P}_0} P_X(C)\) = size of test

Neyman-Pearson Lemma

Let \(\Theta = \{0, 1\}\), \(f_0\) and \(f_1\) be densities of \(P_0\) and \(P_1\), \(\tau(x) = f_1(x)/f_0(x)\), and \(C^* =\{x \in X: \tau(x) > c\}\). Then among all tests \(C\) s.t. \(P_0(C) = P_0(C^*)\), \(C^*\) is most powerful.
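
For example, with a single observation, \(f_0\) the \(N(0,1)\) density, and \(f_1\) the \(N(1,1)\) density, \[ \tau(x) = \frac{f_1(x)}{f_0(x)} = \exp\left(x - \tfrac{1}{2}\right), \] which is increasing in \(x\), so \(C^* = \{x : \tau(x) > c\} = \{x > c'\}\) for some cutoff \(c'\): the most powerful test rejects for large \(x\).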

Projection

Let \(V\) be an inner product space, \(L \subset V\) a subspace, and \(y \in V\). \(P_L y \in L\) is the projection of \(y\) on \(L\) if \[ \norm{y - P_L y } = \inf_{w \in L} \norm{y - w} \]

  1. \(P_L y\) exists, is unique, and is a linear function of \(y\)

  2. For any \(y_1^* \in L\), \(y_1^* = P_L y\) iff \(y- y_1^* \perp L\)

  3. \(G = P_L\) iff \(Gy = y\) \(\forall y \in L\) and \(Gy = 0\) \(\forall y \in L^\perp\)

  4. Linear \(G: V \to V\) is a projection map onto its range, \(\mathcal{R}(G)\), iff \(G\) is idempotent and symmetric.

Let \(L \subset V\) and \(L_0 \subset L\) be subspaces. Then \(P_L - P_{L_0} = P_{L \cap L_0^\perp}\)

Let \(X: \R^k \to \R^n\) be linear. The projection onto \(\mathcal{R}(X)\) is \(P_X = X(X'X)^- X'\) where \((X'X)^{-}\) is any g-inverse of \(X'X\)
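
As a quick check in the full-rank case (where \((X'X)^{-} = (X'X)^{-1}\)), \(P_X\) is symmetric and idempotent: \[ P_X' = X\left((X'X)^{-1}\right)'X' = P_X, \qquad P_X P_X = X(X'X)^{-1}X'X(X'X)^{-1}X' = P_X, \] so by result 4 above it is the projection onto its range \(\mathcal{R}(X)\).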

Gauss-Markov Theorem

\[ Y = \theta + u \] with \(\theta \in L \subset \R^n\), \(L\) a known subspace. If \(\Er[u] = 0\) and \(\Er[uu'] = \sigma^2 I_n\), then the best linear unbiased estimator (BLUE) of \(a'\theta\) is \(a'\hat{\theta}\), where \(\hat{\theta} = P_L Y\)
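
In the familiar regression case, \(L = \mathcal{R}(X)\) for an \(n \times k\) matrix \(X\) with full column rank and \(\theta = X\beta\), so \[ \hat{\theta} = P_L Y = X(X'X)^{-1}X'Y = X\hat{\beta}_{OLS}, \] i.e. OLS fitted values, and hence any \(a'X\hat{\beta}_{OLS}\), are BLUE.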

Tip

Chapter 4, exercise 2.1