ECON 626: Midterm Review
\[ \def\R{{\mathbb{R}}} \def\Er{{\mathrm{E}}} \def\var{{\mathrm{Var}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\cov{{\mathrm{Cov}}} \]
List of Definitions and Results
Measure Theory
Measure Space
- A set \(\Omega\)
- A collection of subsets, \(\mathscr{F}\), of \(\Omega\) that is a \(\sigma\)-field (aka \(\sigma\)-algebra), that is:
- \(\Omega \in \mathscr{F}\)
- If \(A \in \mathscr{F}\), then \(A^c \in \mathscr{F}\)
- If \(A_1, A_2, ... \in \mathscr{F}\), then \(\cup_{j=1}^\infty A_j \in \mathscr{F}\)
- A measure, \(\mu: \mathscr{F} \to [0, \infty]\), s.t.
- \(\mu(\emptyset) = 0\)
- If \(A_1, A_2, ... \in \mathscr{F}\) are pairwise disjoint, then \(\mu\left(\cup_{j=1}^\infty A_j \right) = \sum_{j=1}^\infty \mu(A_j)\)
Given a topology on \(\Omega\), the Borel \(\sigma\)-field, \(\mathscr{B}(\Omega)\), is the smallest \(\sigma\)-field containing all open subsets of \(\Omega\).
- \(f: \Omega \to \mathbf{R}\) is (\(\mathscr{F}\)-)measurable if \(\forall\) \(B \in \mathscr{B}(\mathbf{R})\), \(f^{-1}(B) \in \mathscr{F}\)
Lebesgue Integral
The Lebesgue integral satisfies:
If \(f \geq 0\) a.e., then \(\int f d\mu \geq 0\)
Linearity: \(\int (af + bg) d\mu = a\int f d\mu + b \int g d \mu\)
- Measure \(\nu\) is absolutely continuous with respect to \(\mu\) if for \(A \in \mathscr{F}\), \(\mu(A) = 0\) implies \(\nu(A) = 0\)
- denoted \(\nu \ll \mu\)
- \(\mu\) is called a dominating measure
Radon-Nikodym Derivative
Let \((\Omega,\mathscr{F})\) be a measurable space, let \(\nu\) and \(\mu\) be \(\sigma\)-finite measures on \(\mathscr{F}\), and suppose \(\nu \ll \mu\). Then there is a nonnegative measurable function \(f\) such that for each set \(A\in \mathscr{F}\), \[ \nu (A)=\int_{A}f\, d\mu \] Any two such functions \(f\) and \(g\) agree \(\mu\)-a.e., i.e. \(\mu (\{\omega \in \Omega:f(\omega )\neq g(\omega )\})=0\)
Convergence Theorems
Continuity of Measure
Suppose that \(\{E_{n}\}\) is a monotone sequence of events. Then \[ \mu \left( \lim_{n\rightarrow \infty}E_{n}\right) =\lim_{n\rightarrow \infty }\mu (E_{n}). \]
Monotone Convergence Theorem
If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}\geq 0\), and for each \(\omega \in \Omega\), \(f_{n}(\omega )\uparrow f(\omega )\), then \(\int f_{n}d\mu \uparrow \int fd\mu\) as \(n\rightarrow \infty\)
Fatou’s Lemma
If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}\geq 0\), then \[ \int \liminf_{n\rightarrow \infty }f_{n}\, d\mu \leq \liminf_{n\rightarrow \infty }\int f_{n}\, d\mu \]
Dominated Convergence Theorem
If \(f_n:\Omega \to \mathbf{R}\) are measurable, \(f_{n}(\omega )\rightarrow f(\omega )\) for each \(\omega \in \Omega\), and \(|f_{n}|\leq g\) for each \(n\geq 1\) for some \(g\geq 0\) with \(\int g\, d\mu <\infty\), then \(\int f_{n}\, d\mu \rightarrow \int f\, d\mu\)
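A minimal numeric sketch of the DCT, using NumPy/SciPy and the example sequence \(f_n(x) = \sin(x/n)e^{-x}\) chosen purely for illustration (not from the course material):

```python
# Illustrative DCT check: f_n(x) = sin(x/n) * exp(-x) on [0, inf) converges
# pointwise to 0 and is dominated by g(x) = exp(-x), which is integrable,
# so the integrals of f_n should converge to 0.
import numpy as np
from scipy.integrate import quad

for n in [1, 10, 100, 1000]:
    integral, _ = quad(lambda x: np.sin(x / n) * np.exp(-x), 0, np.inf)
    print(n, integral)   # values shrink toward 0, the integral of the limit
```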
Probability
- Given a measurable space \((\Omega ,\mathscr{F})\), a probability (or probability measure) \(P\) is a measure s.t. \(P(\Omega )=1\)
Random Variables
A random variable \(X\) is a measurable function from \(\Omega\) to \(\mathbf{R}\)
Distribution
Let \((\Omega ,\mathscr{F},P)\) be a probability space, \(X\) a random variable on \((\Omega ,\mathscr{F})\). A distribution \(P_{X}\) induced by \(X\) is a probability measure on \((\mathbf{R},\mathscr{B}(\mathbf{R}))\) such that : \(\forall B\in \mathscr{B}(\mathbf{R})\), \[ P_{X}(B)\equiv P\left\{ \omega \in \Omega :X(\omega )\in B\right\} \]
CDF
The CDF of a random variable \(X\) with distribution \(P_{X}\) is defined to be a function \(F:\mathbf{R}\rightarrow [0,1]\) such that \[ F(t)=P_{X}\left( (-\infty ,t]\right) . \]
Let \(X\) be a random variable with distribution \(P_{X}\), and let \(\lambda\) denote Lebesgue measure on \(\mathbf{R}\). When \(P_{X}\ll \lambda\), we call \(X\) a continuous random variable, and call the Radon-Nikodym derivative \(f\equiv dP_{X}/d\lambda\) the (probability) density function of \(P_{X}\).
Inequalities
Markov’s Inequality
\(P(|X|>\epsilon) \leq \frac{\Er[|X|^k]}{\epsilon^k}\) \(\forall \epsilon > 0, k > 0\)
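A quick simulation check of Markov's inequality; the Exponential(1) example, \(\epsilon\), and \(k\) are illustrative assumptions:

```python
# Empirically verify P(|X| > eps) <= E[|X|^k] / eps^k for X ~ Exponential(1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
eps, k = 2.0, 2
lhs = np.mean(np.abs(x) > eps)             # empirical P(|X| > eps)
rhs = np.mean(np.abs(x) ** k) / eps ** k   # empirical E[|X|^k] / eps^k
print(lhs, rhs, lhs <= rhs)
```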
Jensen’s Inequality
Suppose that \(g\) is convex and \(X\) and \(g(X)\) are integrable, then \(g(\Er X) \leq \Er[g(X)]\)
Cauchy-Schwarz Inequality
\(\left(\Er[XY]\right)^2 \leq \Er[X^2] \Er[Y^2]\)
Dependence and Information
Generated \(\sigma\)-field
- \(\sigma(X)\) is \(\sigma\)-field generated by \(X\)
- smallest \(\sigma\)-field w.r.t. which \(X\) is measurable
- \(\sigma(X) = \{X^{-1}(B): B \in \mathscr{B}(\R)\}\)
Information
- \(\forall E \in \sigma(X)\), observing the value \(x\) of \(X\) tells us whether \(E\) occurred
- if \(\sigma(X_1) \subset \sigma(X_2)\), then \(\sigma(X_2)\) contains at least as much information as \(\sigma(X_1)\)
Dependence
Suppose \(g:\R \to \R\) is Borel measurable, then \(\sigma(g(X)) \subset \sigma(X)\)
Suppose \(\sigma(W) \subset \sigma(X)\), then \(\exists\) Borel measurable \(g\) s.t. \(W=g(X)\)
Independence
- Events \(A_1, ..., A_m\) are independent if for any sub-collection \(A_{i_1}, ..., A_{i_s}\) \[ P\left(\cap_{j=1}^s A_{i_j}\right) = \prod_{j=1}^s P(A_{i_j}) \]
- \(\sigma\)-fields \(\mathscr{F}_1, .., \mathscr{F}_m \subset \mathscr{F}\) are independent if for any sub-collection \(\mathscr{F}_{i_1}, .., \mathscr{F}_{i_s}\) and \(E_{i_j} \in \mathscr{F}_{i_j}\), \[ P\left(\cap_{j=1}^s E_{i_j}\right) = \prod_{j=1}^s P(E_{i_j}) \]
- Random variables \(X_1, ..., X_m\) are independent if \(\sigma(X_1), ..., \sigma(X_m)\) are independent
Suppose that \(X=(X_1, X_2)\) and \(Y=(Y_1, Y_2)\) are independent and \(f\), \(g\) are Borel measurable; then \(f(X)\) and \(g(Y)\) are independent
Conditional Expectation
Let \(\mathscr{G} \subset \mathscr{F}\) be \(\sigma\)-fields, \(Y\) a random variable with \(\Er |Y| < \infty\), then the conditional expectation of \(Y\) given \(\mathscr{G}\) is \(\Er[Y|\mathscr{G}](\cdot): \Omega \to \R\) s.t.
\(\Er[Y|\mathscr{G}](\cdot)\) is \(\mathscr{G}\) measurable
\(\int_A \Er[Y|\mathscr{G}] dP = \int_A Y dP\) \(\forall A \in \mathscr{G}\)
Properties
- If \(X\) is \(\mathscr{G}\) measurable, then \(\Er[XY| \mathscr{G}] = X \Er[Y|\mathscr{G}]\) a.e.
- If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[XY|Z] = X \Er[Y|Z]\)
- If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[\Er[Y|Z]|X] = \Er[Y|X]\)
- If \(Y\) and \(X\) are independent, then \(\Er[Y | X ] = \Er[Y]\)
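A minimal numeric check of the properties above, in particular that \(\Er[\Er[Y|X]] = \Er[Y]\) (the third property with a trivial conditioning \(\sigma\)-field), using group means as the sample analogue of \(\Er[Y|X]\) for a discrete \(X\); the data-generating process is an illustrative assumption:

```python
# Sample analogue of E[Y | X = v] via group means when X is discrete,
# then check that averaging E[Y|X] over the sample recovers the mean of Y.
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=100_000)         # X takes values 0, 1, 2
y = 2.0 * x + rng.normal(size=x.size)        # Y depends on X plus noise

cond_mean = {v: y[x == v].mean() for v in np.unique(x)}   # E[Y | X = v], estimated
e_y_given_x = np.array([cond_mean[v] for v in x])         # E[Y|X] evaluated at each draw
print(e_y_given_x.mean(), y.mean())                       # approximately equal
```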
Identification
Let \(X\) be an observed random vector with distribution \(P_X\). Let \(\mathcal{P}\) be a probability model, i.e. a collection of probabilities such that \(P_X \in \mathcal{P}\). Then \(\theta_0 \in \R^k\) is identified in \(\mathcal{P}\) if there exists a known \(\psi: \mathcal{P} \to \R^k\) s.t.
\[ \theta_0 = \psi(P_X) \]
Observationally Equivalent
Let \(\mathcal{P} = \{ P(\cdot; s) : s \in S \}\). Two structures \(s\) and \(\tilde{s}\) in \(S\) are observationally equivalent if they imply the same distribution for the observed data, i.e. \[ P(B;s) = P(B; \tilde{s}) \] for all \(B \in \sigma(X)\).
Let \(\lambda: S \to \R^k\), \(\theta\) is observationally equivalent to \(\tilde{\theta}\) if \(\exists s, \tilde{s} \in S\) that are observationally equivalent and \(\theta = \lambda(s)\) and \(\tilde{\theta} = \lambda(\tilde{s})\)
- Let \(\Gamma(\theta, S) = \{P(\cdot; s) : s \in S, \theta = \lambda(s) \}\), then \(\theta\) and \(\tilde{\theta}\) are observationally equivalent iff \(\Gamma(\theta,S) \cap \Gamma(\tilde{\theta}, S) \neq \emptyset\)
(Non-Constructive) Identification
\(s_0 \in S\) is identified if there is no \(s\) that is observationally equivalent to \(s_0\)
\(\theta_0\) is identified (in \(S\)) if there is no observationally equivalent \(\theta \neq \theta_0\)
- i.e. \(\Gamma(\theta_0, S) \cap \Gamma(\theta, S) = \emptyset\) \(\forall \theta \neq \theta_0\)
Estimation
Sample Analogue Estimation
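A minimal sketch of the idea, assuming for illustration that \(\theta_0 = \psi(P_X) = \Er[X]\): replace \(P_X\) with the empirical distribution of the sample (the NumPy example is not from the notes):

```python
# Sample analogue estimation: apply the identifying functional psi to the
# empirical distribution instead of the true P_X.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.0, size=10_000)
theta_hat = x.mean()            # sample analogue of theta_0 = E[X]
cdf_at_1 = np.mean(x <= 1.0)    # sample analogue of F(1) = P_X((-inf, 1])
print(theta_hat, cdf_at_1)
```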
Maximum Likelihood Estimation
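A minimal numerical MLE sketch for an illustrative Exponential(\(\theta\)) model; the model and the use of scipy.optimize are assumptions, not from the notes:

```python
# MLE by numerical optimization for x_i iid Exponential with rate theta:
# log f(x; theta) = log(theta) - theta * x, so the MLE is 1 / x_bar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.exponential(scale=1 / 2.0, size=5_000)   # true rate theta = 2

def neg_log_likelihood(theta):
    return -(np.log(theta) * x.size - theta * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20.0), method="bounded")
print(res.x, 1 / x.mean())   # numerical MLE vs. closed-form MLE 1 / x_bar
```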
Cramér-Rao Lower Bound
- \(X \in \R^n\) with distribution \(P_X \in \mathcal{P} = \{P_\theta: \theta \in \Theta \subset \R^d \}\) and likelihood \(\ell(\theta;x) = f_X(x;\theta)\)
Score Equality
- If \(\frac{\partial}{\partial \theta} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial}{\partial \theta} f_X(x;\theta) d\mu(x)\), then \[ \int \underbrace{\frac{\partial \log \ell(\theta;x)}{\partial \theta}}_{\text{"score"}=s(x,\theta)} dP_\theta(x) = 0 \]
Information Equality
- Fisher information \(I(\theta) = \int s(x,\theta) s(x,\theta)' dP_\theta(x)\)
- If \(\frac{\partial^2}{\partial \theta\partial \theta'} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial^2}{\partial \theta\partial \theta'} f_X(x;\theta) d\mu(x)\), then \[ I(\theta) = -\int \underbrace{\frac{\partial^2 \log \ell(\theta;x)}{\partial \theta \partial \theta'}}_{\text{"Hessian"}=h(x,\theta)} dP_\theta(x) \]
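A quick numeric check of the score and information equalities for an illustrative Bernoulli(\(p\)) observation (the example family is an assumption, not from the notes):

```python
# For Bernoulli(p): s(x, p) = (x - p) / (p (1 - p)) and
# I(p) = 1 / (p (1 - p)) = E[s^2] = -E[Hessian of log f].
import numpy as np

p = 0.3
x = np.array([0.0, 1.0])
probs = np.array([1 - p, p])

score = (x - p) / (p * (1 - p))
hessian = -x / p**2 - (1 - x) / (1 - p) ** 2   # d^2 log f / dp^2 at x = 0, 1

print(np.sum(score * probs))                                                   # score equality: 0
print(np.sum(score**2 * probs), -np.sum(hessian * probs), 1 / (p * (1 - p)))   # all equal
```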
- If \(T = \tau(X)\) is an unbiased estimator for \(\theta\) and \[ \frac{\partial}{\partial \theta'} \int \tau(x) f_X(x;\theta) d\mu(x) = \int \tau(x) \frac{\partial f_X(x;\theta)}{\partial \theta'} d\mu(x) \] then \[ \int \tau(x) s(x,\theta)'dP_\theta(x) = I_d \] where \(I_d\) is the \(d \times d\) identity matrix
Cramér-Rao Bound
Let \(T = \tau(X)\) be an unbiased estimator, and suppose the condition above and the score equality hold. Then, \[ \var_\theta(\tau(X)) \equiv \int \left(\tau(x) - \int \tau dP_\theta\right)\left(\tau(x) - \int \tau dP_\theta\right)' dP_\theta \geq I(\theta)^{-1} \]
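A minimal simulation sketch: for \(X_1, \dots, X_n\) iid \(N(\theta, 1)\) (an illustrative model, not from the notes), \(I(\theta) = n\) and the sample mean, which is unbiased, attains the bound \(I(\theta)^{-1} = 1/n\):

```python
# Simulated variance of the sample mean vs. the Cramer-Rao bound 1 / n.
import numpy as np

rng = np.random.default_rng(4)
n, theta, reps = 50, 1.5, 20_000
means = rng.normal(loc=theta, scale=1.0, size=(reps, n)).mean(axis=1)
print(means.var(), 1 / n)   # simulated variance vs. I(theta)^{-1}
```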
Hypothesis Testing
- \(P(\text{reject } H_0 | P_x \in \mathcal{P}_0) = P_x(C)\) = Type I error, where \(C\) is the rejection (critical) region
- \(P(\text{fail to reject } H_0 | P_x \in \mathcal{P}_1)\)=Type II error
- \(P(\text{reject } H_0 | P_x \in \mathcal{P}_1)\) = power
- \(\sup_{P_x \in \mathcal{P}_0} P_x(C)\) = size of test
Neyman-Pearson Lemma
Let \(\Theta = \{0, 1\}\), \(f_0\) and \(f_1\) be densities of \(P_0\) and \(P_1\), \(\tau(x) = f_1(x)/f_0(x)\), and \(C^* =\{x \in X: \tau(x) > c\}\). Then among all tests \(C\) s.t. \(P_0(C) = P_0(C^*)\), \(C^*\) is most powerful.
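A minimal simulation sketch of the likelihood ratio test for an illustrative pair \(H_0: X \sim N(0,1)\) vs. \(H_1: X \sim N(1,1)\) with one observation, estimating its size and power (the cutoff and sample sizes are assumptions):

```python
# Likelihood ratio test: reject when tau(x) = f_1(x) / f_0(x) > c,
# with c chosen so that the size P_0(tau(X) > c) is 0.05.
import numpy as np
from scipy.stats import norm

c = np.exp(norm.ppf(0.95) - 0.5)   # tau(x) = exp(x - 0.5) here, so this gives size 0.05

def reject(x):
    tau = norm.pdf(x, loc=1) / norm.pdf(x, loc=0)   # likelihood ratio f_1 / f_0
    return tau > c

rng = np.random.default_rng(5)
size = np.mean(reject(rng.normal(0, 1, 200_000)))    # Type I error, about 0.05
power = np.mean(reject(rng.normal(1, 1, 200_000)))   # power under H_1
print(size, power)
```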
Projection
\(P_L y \in L\) is the projection of \(y\) on \(L\) if \[ \norm{y - P_L y } = \inf_{w \in L} \norm{y - w} \]
\(P_L y\) exists, is unique, and is a linear function of \(y\)
For any \(y_1^* \in L\), \(y_1^* = P_L y\) iff \(y- y_1^* \perp L\)
\(G = P_L\) iff \(Gy = y\) \(\forall y \in L\) and \(Gy = 0\) \(\forall y \in L^\perp\)
Linear \(G: V \to V\) is a projection map onto its range, \(\mathcal{R}(G)\), iff \(G\) is idempotent and symmetric.
Let \(L \subset V\) and \(L_0 \subset L\) be subspaces. Then \(P_L - P_{L_0} = P_{L \cap L_0^\perp}\)
Let \(X: \R^k \to \R^n\) be linear. The projection onto \(\mathcal{R}(X)\) is \(P_X = X(X'X)^- X'\) where \((X'X)^{-}\) is any g-inverse of \(X'X\)
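A small NumPy sketch checking that \(P_X = X(X'X)^- X'\) is symmetric, idempotent, and returns the least squares fit (the random \(X\) and \(y\) are illustrative):

```python
# Build P_X with a g-inverse of X'X and verify the projection properties.
import numpy as np

rng = np.random.default_rng(6)
n, k = 8, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

P = X @ np.linalg.pinv(X.T @ X) @ X.T       # pinv is a g-inverse of X'X
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # symmetric, idempotent
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(P @ y, X @ b))            # P_X y equals the least squares fit
```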
Gauss-Markov Theorem
\[ Y = \theta + u \] with \(\theta \in L \subset \R^n\), \(L\) a known subspace. If \(\Er[u] = 0\) and \(\Er[uu'] = \sigma^2 I_n\), then the best linear unbiased estimator (BLUE) of \(a'\theta\) is \(a'\hat{\theta}\), where \(\hat{\theta} = P_L Y\)
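A minimal simulation sketch of the theorem for \(L = \mathcal{R}(X)\); the design, the competing linear unbiased estimator, and \(\sigma^2 = 1\) are illustrative assumptions:

```python
# Compare a'theta_hat (with a = e_1 and theta_hat = P_L y) to another linear
# unbiased estimator of theta_1, namely y_1; the BLUE has smaller variance.
import numpy as np

rng = np.random.default_rng(7)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # L = column space of X
theta = X @ np.array([1.0, 2.0])                        # theta lies in L
P = X @ np.linalg.pinv(X.T @ X) @ X.T                   # P_L

blue, naive = [], []
for _ in range(20_000):
    y = theta + rng.normal(size=n)                      # E[u] = 0, E[uu'] = I_n
    blue.append((P @ y)[0])                             # a'theta_hat with a = e_1
    naive.append(y[0])                                  # y_1 is also linear and unbiased for theta_1
print(np.var(blue), np.var(naive))                      # BLUE variance is smaller
```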