ECON 626: Midterm Review
\[ \def\R{{\mathbb{R}}} \def\Er{{\mathrm{E}}} \def\var{{\mathrm{Var}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\cov{{\mathrm{Cov}}} \]
List of Definitions and Results
Measure Theory
Measure Space
- A set \(\Omega\)
- A collection of subsets, \(\mathscr{F}\), of \(\Omega\) that is a \(\sigma\)-field (aka \(\sigma\)-algebra), that is:
- \(\Omega \in \mathscr{F}\)
- If \(A \in \mathscr{F}\), then \(A^c \in \mathscr{F}\)
- If \(A_1, A_2, ... \in \mathscr{F}\), then \(\cup_{j=1}^\infty A_j \in \mathscr{F}\)
- A measure, \(\mu: \mathscr{F} \to [0, \infty]\), s.t.
- \(\mu(\emptyset) = 0\)
- If \(A_1, A_2, ... \in \mathscr{F}\) are pairwise disjoint, then \(\mu\left(\cup_{j=1}^\infty A_j \right) = \sum_{j=1}^\infty \mu(A_j)\)
Given a topology on \(\Omega\), the Borel \(\sigma\)-field, \(\mathscr{B}(\Omega)\), is the smallest \(\sigma\)-field containing all open subsets of \(\Omega\).
- \(f: \Omega \to \mathbf{R}\) is (\(\mathscr{F}\)-)measurable if \(\forall\) \(B \in \mathscr{B}(\mathbf{R})\), \(f^{-1}(B) \in \mathscr{F}\)
Lebesgue Integral
The Lebesgue integral satisfies:
If \(f \geq 0\) a.e., then \(\int f d\mu \geq 0\)
Linearity: \(\int (af + bg) d\mu = a\int f d\mu + b \int g d \mu\)
- Measure \(\nu\) is absolutely continuous with respect to \(\mu\) if for \(A \in \mathscr{F}\), \(\mu(A) = 0\) implies \(\nu(A) = 0\)
- denoted \(\nu \ll \mu\)
- \(\mu\) is called a dominating measure
Radon-Nikodym Derivative
Let \((\Omega,\mathscr{F})\) be a measurable space, let \(\nu\) and \(\mu\) be \(\sigma\)-finite measures on \(\mathscr{F}\), and suppose \(\nu \ll \mu\). Then there is a nonnegative measurable function \(f\) such that for each set \(A\in \mathscr{F}\), \[ \nu (A)=\int_{A}f\, d\mu \] Any two such functions \(f\) and \(g\) agree \(\mu\)-a.e., i.e. \(\mu (\{\omega \in \Omega:f(\omega )\neq g(\omega )\})=0\)
Convergence Theorems
Continuity of Measure
Suppose that \(\{E_{n}\}\) is a monotone sequence of events. Then \[ \mu \left( \lim_{n\rightarrow \infty}E_{n}\right) =\lim_{n\rightarrow \infty }\mu (E_{n}). \]
Monotone Convergence Theorem
If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}\geq 0\), and for each \(\omega \in \Omega\), \(f_{n}(\omega )\uparrow f(\omega )\), then \(\int f_{n}d\mu \uparrow \int fd\mu\) as \(n\rightarrow \infty\)
Fatou’s Lemma
If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}\geq 0\), then \[ \int \liminf_{n\rightarrow \infty }f_{n}\, d\mu \leq \liminf_{n\rightarrow \infty }\int f_{n}\, d\mu \]
Dominated Convergence Theorem
If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}(\omega )\rightarrow f(\omega )\) for each \(\omega \in \Omega\), and \(|f_{n}|\leq g\) for each \(n\geq 1\) for some \(g\geq 0\) with \(\int g\, d\mu <\infty\), then \(\int f_{n}\, d\mu \rightarrow \int f\, d\mu\)
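A minimal numeric sketch of the DCT, using NumPy/SciPy and the example sequence \(f_n(x) = \sin(x/n)e^{-x}\) chosen purely for illustration (not from the course material):

```python
# Illustrative DCT check: f_n(x) = sin(x/n) * exp(-x) on [0, inf) converges
# pointwise to 0 and is dominated by g(x) = exp(-x), which is integrable,
# so the integrals of f_n should converge to 0.
import numpy as np
from scipy.integrate import quad

for n in [1, 10, 100, 1000]:
    integral, _ = quad(lambda x: np.sin(x / n) * np.exp(-x), 0, np.inf)
    print(n, integral)   # values shrink toward 0, the integral of the limit
```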
Probability
- Given a measurable space \((\Omega ,\mathscr{F})\), a probability (or probability measure) \(P\) is a measure s.t. \(P(\Omega )=1\)
Random Variables
A random variable \(X\) is a measurable function from \(\Omega\) to \(\mathbf{R}\)
Distribution
Let \((\Omega ,\mathscr{F},P)\) be a probability space, \(X\) a random variable on \((\Omega ,\mathscr{F})\). A distribution \(P_{X}\) induced by \(X\) is a probability measure on \((\mathbf{R},\mathscr{B}(\mathbf{R}))\) such that : \(\forall B\in \mathscr{B}(\mathbf{R})\), \[ P_{X}(B)\equiv P\left\{ \omega \in \Omega :X(\omega )\in B\right\} \]
CDF
The CDF of a random variable \(X\) with distribution \(P_{X}\) is defined to be a function \(F:\mathbf{R}\rightarrow [0,1]\) such that \[ F(t)=P_{X}\left( (-\infty ,t]\right) . \]
Let \(X\) be a random variable with distribution \(P_{X}\), and let \(\lambda\) denote Lebesgue measure on \(\mathbf{R}\). When \(P_{X}\ll \lambda\), we call \(X\) a continuous random variable, and call the Radon-Nikodym derivative \(f\equiv dP_{X}/d\lambda\) the (probability) density function of \(P_{X}\).
Inequalities
Markov’s Inequality
\(P(|X|>\epsilon) \leq \frac{\Er[|X|^k]}{\epsilon^k}\) \(\forall \epsilon > 0, k > 0\)
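A quick simulation check of Markov's inequality; the Exponential(1) example, \(\epsilon\), and \(k\) are illustrative assumptions:

```python
# Empirically verify P(|X| > eps) <= E[|X|^k] / eps^k for X ~ Exponential(1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
eps, k = 2.0, 2
lhs = np.mean(np.abs(x) > eps)             # empirical P(|X| > eps)
rhs = np.mean(np.abs(x) ** k) / eps ** k   # empirical E[|X|^k] / eps^k
print(lhs, rhs, lhs <= rhs)
```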
Jensen’s Inequality
Suppose that \(g\) is convex and \(X\) and \(g(X)\) are integrable, then \(g(\Er X) \leq \Er[g(X)]\)
Cauchy-Schwarz Inequality
\(\left(\Er[XY]\right)^2 \leq \Er[X^2] \Er[Y^2]\)
Dependence and Information
Generated \(\sigma\)-field
- \(\sigma(X)\) is \(\sigma\)-field generated by \(X\)
- smallest \(\sigma\)-field w.r.t. which \(X\) is measurable
- \(\sigma(X) = \{X^{-1}(B): B \in \mathscr{B}(\R)\}\)
Information
- \(\forall E \in \sigma(X)\), observing the value \(x\) of \(X\) tells us whether \(E\) occurred
- if \(\sigma(X_1) \subset \sigma(X_2)\), then \(\sigma(X_2)\) contains at least as much information as \(\sigma(X_1)\)
Dependence
Suppose \(g:\R \to \R\) is Borel measurable, then \(\sigma(g(X)) \subset \sigma(X)\)
Suppose \(\sigma(W) \subset \sigma(X)\), then \(\exists\) Borel measurable \(g\) s.t. \(W=g(X)\)
Independence
- Events \(A_1, ..., A_m\) are independent if for any sub-collection \(A_{i_1}, ..., A_{i_s}\) \[ P\left(\cap_{j=1}^s A_{i_j}\right) = \prod_{j=1}^s P(A_{i_j}) \]
- \(\sigma\)-fields \(\mathscr{F}_1, .., \mathscr{F}_m \subset \mathscr{F}\) are independent if for any sub-collection \(\mathscr{F}_{i_1}, .., \mathscr{F}_{i_s}\) and \(E_{i_j} \in \mathscr{F}_{i_j}\), \[ P\left(\cap_{j=1}^s E_{i_j}\right) = \prod_{j=1}^s P(E_{i_j}) \]
- Random variables \(X_1, ..., X_m\) are independent if \(\sigma(X_1), ..., \sigma(X_m)\) are independent
Suppose that \(X=(X_1, X_2)\) and \(Y=(Y_1, Y_2)\) are independent and \(f\), \(g\) are Borel measurable; then \(f(X)\) and \(g(Y)\) are independent
Conditional Expectation
Let \(\mathscr{G} \subset \mathscr{F}\) be \(\sigma\)-fields, \(Y\) a random variable with \(\Er |Y| < \infty\), then the conditional expectation of \(Y\) given \(\mathscr{G}\) is \(\Er[Y|\mathscr{G}](\cdot): \Omega \to \R\) s.t.
\(\Er[Y|\mathscr{G}](\cdot)\) is \(\mathscr{G}\) measurable
\(\int_A \Er[Y|\mathscr{G}] dP = \int_A Y dP\) \(\forall A \in \mathscr{G}\)
Properties
- If \(X\) is \(\mathscr{G}\) measurable, then \(\Er[XY| \mathscr{G}] = X \Er[Y|\mathscr{G}]\) a.e.
- If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[XY|Z] = X \Er[Y|Z]\)
- If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[\Er[Y|Z]|X] = \Er[Y|X]\)
- If \(Y\) and \(X\) are independent, then \(\Er[Y | X ] = \Er[Y]\)
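A minimal numeric check of the properties above, in particular that \(\Er[\Er[Y|X]] = \Er[Y]\) (the third property with a trivial conditioning \(\sigma\)-field), using group means as the sample analogue of \(\Er[Y|X]\) for a discrete \(X\); the data-generating process is an illustrative assumption:

```python
# Sample analogue of E[Y | X = v] via group means when X is discrete,
# then check that averaging E[Y|X] over the sample recovers the mean of Y.
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=100_000)         # X takes values 0, 1, 2
y = 2.0 * x + rng.normal(size=x.size)        # Y depends on X plus noise

cond_mean = {v: y[x == v].mean() for v in np.unique(x)}   # E[Y | X = v], estimated
e_y_given_x = np.array([cond_mean[v] for v in x])         # E[Y|X] evaluated at each draw
print(e_y_given_x.mean(), y.mean())                       # approximately equal
```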
Identification
Let \(X\) be an observed random vector with distribution \(P_X\). Let \(\mathcal{P}\) be a probability model, i.e. a collection of probabilities such that \(P_X \in \mathcal{P}\). Then \(\theta_0 \in \R^k\) is identified in \(\mathcal{P}\) if there exists a known \(\psi: \mathcal{P} \to \R^k\) s.t.
\[ \theta_0 = \psi(P_X) \]
Observationally Equivalent
Let \(\mathcal{P} = \{ P(\cdot; s) : s \in S \}\). Two structures \(s\) and \(\tilde{s}\) in \(S\) are observationally equivalent if they imply the same distribution for the observed data, i.e. \[ P(B;s) = P(B; \tilde{s}) \] for all \(B \in \sigma(X)\).
Let \(\lambda: S \to \R^k\), \(\theta\) is observationally equivalent to \(\tilde{\theta}\) if \(\exists s, \tilde{s} \in S\) that are observationally equivalent and \(\theta = \lambda(s)\) and \(\tilde{\theta} = \lambda(\tilde{s})\)
- Let \(\Gamma(\theta, S) = \{P(\cdot; s) : s \in S, \theta = \lambda(s) \}\), then \(\theta\) and \(\tilde{\theta}\) are observationally equivalent iff \(\Gamma(\theta,S) \cap \Gamma(\tilde{\theta}, S) \neq \emptyset\)
(Non-Constructive) Identification
\(s_0 \in S\) is identified if there is no \(s\) that is observationally equivalent to \(s_0\)
\(\theta_0\) is identified (in \(S\)) if there is no observationally equivalent \(\theta \neq \theta_0\)
- i.e. \(\Gamma(\theta_0, S) \cap \Gamma(\theta, S) = \emptyset\) \(\forall \theta \neq \theta_0\)
Estimation
Sample Analogue Estimation
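A minimal sketch of the idea, assuming for illustration that \(\theta_0 = \psi(P_X) = \Er[X]\): replace \(P_X\) with the empirical distribution of the sample (the NumPy example is not from the notes):

```python
# Sample analogue estimation: apply the identifying functional psi to the
# empirical distribution instead of the true P_X.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.0, size=10_000)
theta_hat = x.mean()            # sample analogue of theta_0 = E[X]
cdf_at_1 = np.mean(x <= 1.0)    # sample analogue of F(1) = P_X((-inf, 1])
print(theta_hat, cdf_at_1)
```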
Maximum Likelihood Estimation
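A minimal numerical MLE sketch for an illustrative Exponential(\(\theta\)) model; the model and the use of scipy.optimize are assumptions, not from the notes:

```python
# MLE by numerical optimization for x_i iid Exponential with rate theta:
# log f(x; theta) = log(theta) - theta * x, so the MLE is 1 / x_bar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.exponential(scale=1 / 2.0, size=5_000)   # true rate theta = 2

def neg_log_likelihood(theta):
    return -(np.log(theta) * x.size - theta * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20.0), method="bounded")
print(res.x, 1 / x.mean())   # numerical MLE vs. closed-form MLE 1 / x_bar
```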
Cramér-Rao Lower Bound
- \(X \in \R^n\) with distribution \(P_X \in \mathcal{P} = \{P_\theta: \theta \in \Theta \subset \R^d \}\) and likelihood \(\ell(\theta;x) = f_X(x;\theta)\)
Score Equality
- If \(\frac{\partial}{\partial \theta} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial}{\partial \theta} f_X(x;\theta) d\mu(x)\), then \[ \int \underbrace{\frac{\partial \log \ell(\theta;x)}{\partial \theta}}_{\text{"score"}=s(x,\theta)} dP_\theta(x) = 0 \]
Information Equality
- Fisher information \(I(\theta) = \int s(x,\theta) s(x,\theta)' dP_\theta(x)\)
- If \(\frac{\partial^2}{\partial \theta\partial \theta'} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial^2}{\partial \theta\partial \theta'} f_X(x;\theta) d\mu(x)\), then \[ I(\theta) = -\int \underbrace{\frac{\partial^2 \log \ell(\theta;x)}{\partial \theta \partial \theta'}}_{\text{"Hessian"}=h(x,\theta)} dP_\theta(x) \]
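A quick numeric check of the score and information equalities for an illustrative Bernoulli(\(p\)) observation (the example family is an assumption, not from the notes):

```python
# For Bernoulli(p): s(x, p) = (x - p) / (p (1 - p)) and
# I(p) = 1 / (p (1 - p)) = E[s^2] = -E[Hessian of log f].
import numpy as np

p = 0.3
x = np.array([0.0, 1.0])
probs = np.array([1 - p, p])

score = (x - p) / (p * (1 - p))
hessian = -x / p**2 - (1 - x) / (1 - p) ** 2   # d^2 log f / dp^2 at x = 0, 1

print(np.sum(score * probs))                                                   # score equality: 0
print(np.sum(score**2 * probs), -np.sum(hessian * probs), 1 / (p * (1 - p)))   # all equal
```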
- If \(T = \tau(X)\) is an unbiased estimator for \(\theta\) and \[ \frac{\partial}{\partial \theta'} \int \tau(x) f_X(x;\theta) d\mu(x) = \int \tau(x) \frac{\partial f_X(x;\theta)}{\partial \theta'} d\mu(x) \] then \[ \int \tau(x) s(x,\theta)'dP_\theta(x) = I_d \] where \(I_d\) is the \(d \times d\) identity matrix
Cramér-Rao Bound
Let \(T = \tau(X)\) be an unbiased estimator, and suppose the condition above and the score equality hold. Then, \[ \var_\theta(\tau(X)) \equiv \int \left(\tau(x) - \int \tau dP_\theta\right)\left(\tau(x) - \int \tau dP_\theta\right)' dP_\theta \geq I(\theta)^{-1} \]
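A minimal simulation sketch: for \(X_1, \dots, X_n\) iid \(N(\theta, 1)\) (an illustrative model, not from the notes), \(I(\theta) = n\) and the sample mean, which is unbiased, attains the bound \(I(\theta)^{-1} = 1/n\):

```python
# Simulated variance of the sample mean vs. the Cramer-Rao bound 1 / n.
import numpy as np

rng = np.random.default_rng(4)
n, theta, reps = 50, 1.5, 20_000
means = rng.normal(loc=theta, scale=1.0, size=(reps, n)).mean(axis=1)
print(means.var(), 1 / n)   # simulated variance vs. I(theta)^{-1}
```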
Hypothesis Testing
- \(P(\text{reject } H_0 | P_x \in \mathcal{P}_0) = P_x(C)\) = Type I error, where \(C\) is the rejection (critical) region
- \(P(\text{fail to reject } H_0 | P_x \in \mathcal{P}_1)\)=Type II error
- \(P(\text{reject } H_0 | P_x \in \mathcal{P}_1)\) = power
- \(\sup_{P_x \in \mathcal{P}_0} P_x(C)\) = size of test
Neyman-Pearson Lemma
Let \(\Theta = \{0, 1\}\), \(f_0\) and \(f_1\) be densities of \(P_0\) and \(P_1\), \(\tau(x) = f_1(x)/f_0(x)\), and \(C^* =\{x \in X: \tau(x) > c\}\). Then among all tests \(C\) s.t. \(P_0(C) = P_0(C^*)\), \(C^*\) is most powerful.
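A minimal simulation sketch of the likelihood ratio test for an illustrative pair \(H_0: X \sim N(0,1)\) vs. \(H_1: X \sim N(1,1)\) with one observation, estimating its size and power (the cutoff and sample sizes are assumptions):

```python
# Likelihood ratio test: reject when tau(x) = f_1(x) / f_0(x) > c,
# with c chosen so that the size P_0(tau(X) > c) is 0.05.
import numpy as np
from scipy.stats import norm

c = np.exp(norm.ppf(0.95) - 0.5)   # tau(x) = exp(x - 0.5) here, so this gives size 0.05

def reject(x):
    tau = norm.pdf(x, loc=1) / norm.pdf(x, loc=0)   # likelihood ratio f_1 / f_0
    return tau > c

rng = np.random.default_rng(5)
size = np.mean(reject(rng.normal(0, 1, 200_000)))    # Type I error, about 0.05
power = np.mean(reject(rng.normal(1, 1, 200_000)))   # power under H_1
print(size, power)
```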
Projection
\(P_L y \in L\) is the projection of \(y\) on \(L\) if \[ \norm{y - P_L y } = \inf_{w \in L} \norm{y - w} \]
\(P_L y\) exists, is unique, and is a linear function of \(y\)
For any \(y_1^* \in L\), \(y_1^* = P_L y\) iff \(y- y_1^* \perp L\)
\(G = P_L\) iff \(Gy = y\) \(\forall y \in L\) and \(Gy = 0\) \(\forall y \in L^\perp\)
Linear \(G: V \to V\) is a projection map onto its range, \(\mathcal{R}(G)\), iff \(G\) is idempotent and symmetric.
Let \(L \subset V\) and \(L_0 \subset L\) be subspaces. Then \(P_L - P_{L_0} = P_{L \cap L_0^\perp}\)
Let \(X: \R^k \to \R^n\) be linear. The projection onto \(\mathcal{R}(X)\) is \(P_X = X(X'X)^- X'\) where \((X'X)^{-}\) is any g-inverse of \(X'X\)
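A small NumPy sketch checking that \(P_X = X(X'X)^- X'\) is symmetric, idempotent, and returns the least squares fit (the random \(X\) and \(y\) are illustrative):

```python
# Build P_X with a g-inverse of X'X and verify the projection properties.
import numpy as np

rng = np.random.default_rng(6)
n, k = 8, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

P = X @ np.linalg.pinv(X.T @ X) @ X.T       # pinv is a g-inverse of X'X
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # symmetric, idempotent
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(P @ y, X @ b))            # P_X y equals the least squares fit
```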
Gauss-Markov Theorem
\[ Y = \theta + u \] with \(\theta \in L \subset \R^n\), \(L\) a known subspace. If \(\Er[u] = 0\) and \(\Er[uu'] = \sigma^2 I_n\), then the best linear unbiased estimator (BLUE) of \(a'\theta\) is \(a'\hat{\theta}\), where \(\hat{\theta} = P_L Y\)
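A minimal simulation sketch of the theorem for \(L = \mathcal{R}(X)\); the design, the competing linear unbiased estimator, and \(\sigma^2 = 1\) are illustrative assumptions:

```python
# Compare a'theta_hat (with a = e_1 and theta_hat = P_L y) to another linear
# unbiased estimator of theta_1, namely y_1; the BLUE has smaller variance.
import numpy as np

rng = np.random.default_rng(7)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # L = column space of X
theta = X @ np.array([1.0, 2.0])                        # theta lies in L
P = X @ np.linalg.pinv(X.T @ X) @ X.T                   # P_L

blue, naive = [], []
for _ in range(20_000):
    y = theta + rng.normal(size=n)                      # E[u] = 0, E[uu'] = I_n
    blue.append((P @ y)[0])                             # a'theta_hat with a = e_1
    naive.append(y[0])                                  # y_1 is also linear and unbiased for theta_1
print(np.var(blue), np.var(naive))                      # BLUE variance is smaller
```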