Estimation

Paul Schrimpf

2023-09-19

Reading

  • Required: Song (2021) chapter 4, sections 1.2 and 2 (which is the basis for these slides)
  • Supplemental: Lehmann and Romano (n.d.), Lehmann and Casella (2006)

\[ \def\Er{{\mathrm{E}}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \]

Estimation

Estimator

  • Given a parameter of interest \(\theta_0\), an estimator is a measurable function of an observed random vector \(X\), i.e. \(\hat{\theta} = \tau(X)\) for some known map \(\tau\)

  • An estimate given \(X=x\) is \(\tau(x)\)

Sample Analogue Estimation

  • i.i.d. observations from \(P\), \(X = (X_1, \ldots, X_n)\)
  • constructively identified parameter \(\theta_0 = \psi(P)\)
  • empirical measure: \[ \hat{P}(B) = \frac{1}{n} \sum_{i=1}^n 1\{X_i \in B \}. \]
  • Sample analogue estimator \[ \hat{\theta} = \psi(\hat{P}) \]

Sample Analogue Estimation - Examples

  • Mean
  • OLS
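
A minimal Python sketch of both examples (numpy assumed available; the data-generating values are illustrative, not from the source). Each estimator is the identifying map \(\psi\) applied to the empirical measure \(\hat{P}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Mean: psi(P) = E_P[X]; plugging in the empirical measure gives the sample mean
theta_mean = x.mean()

# OLS: psi(P) = E_P[Z Z']^{-1} E_P[Z Y] with Z = (1, X)'; same plug-in principle
Z = np.column_stack([np.ones(n), x])
theta_ols = np.linalg.solve(Z.T @ Z / n, Z.T @ y / n)

print(theta_mean)  # near 0
print(theta_ols)   # near (1, 2)
```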

Maximum Likelihood Estimation

  • \(X \in \R^n\) with distribution \(P_X \in \mathcal{P} = \{P_\theta: \theta \in \Theta \subset \R^d \}\)

  • \(P_\theta\) dominated by \(\sigma\)-finite \(\mu\) with density \(f_X(\cdot;\theta)\)

  • Likelihood \(\ell(\cdot; X): \Theta \to [0,\infty)\) \[ \ell(\theta; X)= f_X(X; \theta) \]

  • Maximum likelihood estimator \[ \hat{\theta}_{MLE} = \textrm{arg}\max_{\theta \in \Theta} \ell(\theta;X) \]

MLE: Examples

  • \(X_i \sim N(\mu, 1)\)
  • \(Y_i = \alpha_0 + \beta_0 X_i + \epsilon_i\), \(\epsilon_i \sim N(0, \sigma_0^2)\)
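
A minimal numerical sketch of the first example (scipy assumed available; the true \(\mu\) and sample size are illustrative): maximizing the \(N(\mu,1)\) likelihood numerically recovers the sample mean. The regression example works the same way with \((\alpha, \beta, \sigma^2)\) in place of \(\mu\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(loc=0.7, scale=1.0, size=200)

# Negative log-likelihood for X_i ~ N(mu, 1), dropping additive constants
def neg_loglik(mu):
    return 0.5 * np.sum((x - mu) ** 2)

res = minimize_scalar(neg_loglik)
print(res.x, x.mean())  # the numerical MLE coincides with the sample mean
```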

MLE: Equivariance

Theorem 1.1

If \(\hat{\theta}\) is the MLE of \(\theta\), then for any function \(g:\Theta \to G\), the MLE of \(g(\theta)\) is \(g(\hat{\theta})\).
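
A sketch of equivariance, assuming \(X_i \sim N(\mu,1)\) and \(g(\mu) = e^\mu\) (an illustrative choice, not from the source): maximizing the likelihood reparametrized in \(\gamma = g(\mu)\) returns \(g(\hat{\mu}) = e^{\bar{X}}\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.normal(loc=0.5, scale=1.0, size=200)

# Likelihood reparametrized in gamma = g(mu) = exp(mu), i.e. mu = log(gamma)
def neg_loglik_gamma(gamma):
    return 0.5 * np.sum((x - np.log(gamma)) ** 2)

res = minimize_scalar(neg_loglik_gamma, bounds=(1e-6, 10.0), method="bounded")
print(res.x, np.exp(x.mean()))  # the maximizer is g(mu_hat) = exp(xbar)
```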

Quality of Estimators

Mean Squared Error

  • Loss function \(L: \R^d \times \Theta \to [0,\infty)\) with \(L(\theta,\theta)=0\)
  • Risk of \(\hat{\theta}\) at \(\theta_0\): \(\Er[L(\hat{\theta}, \theta_0)]\)
  • Squared error loss \(L_2(\theta, \theta_0) = (\theta-\theta_0)'(\theta-\theta_0)\)
  • Mean squared error \[ MSE(\hat{\theta}) = \Er[ (\hat{\theta}-\theta_0)'(\hat{\theta}-\theta_0) ] \]
  • Bias-variance decomposition \[ MSE(\hat{\theta}) = \textrm{Bias}(\hat{\theta})'\textrm{Bias}(\hat{\theta}) + \textrm{tr}(\var(\hat{\theta})) \] where \(\textrm{Bias}(\hat{\theta}) = \Er[\hat{\theta}] - \theta_0\)
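
A Monte Carlo sketch of the decomposition (numpy assumed; the shrinkage factor 0.9 is an illustrative way to create bias):

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 1.0, 50, 20_000

# A deliberately biased estimator: shrink the sample mean toward zero
est = 0.9 * rng.normal(loc=theta0, scale=1.0, size=(reps, n)).mean(axis=1)

mse = np.mean((est - theta0) ** 2)
bias2 = (est.mean() - theta0) ** 2
var = est.var()
print(mse, bias2 + var)  # equal up to simulation error
```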

Optimal Estimation in Parametric Models

Setup

  • \(X \in \R^n\) with distribution \(P_X \in \mathcal{P} = \{P_\theta: \theta \in \Theta \subset \R^d \}\), likelihood \(\ell(\theta;x) = f_X(x;\theta)\)

  • Question: if an estimator is unbiased, what is the smallest possible variance?

Score Equality

  • If \(\frac{\partial}{\partial \theta} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial}{\partial \theta} f_X(x;\theta) d\mu(x)\), then \[ \int \underbrace{\frac{\partial \log \ell(\theta;x)}{\partial \theta}}_{\text{"score"}=s(x,\theta)} dP_\theta(x) = 0 \]
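
A simulation sketch of the score equality in the \(N(\mu,1)\) model, where the score of one observation is \(s(x,\mu) = x - \mu\) (an illustrative check, not from the source):

```python
import numpy as np

rng = np.random.default_rng(4)
mu0 = 0.3
x = rng.normal(loc=mu0, size=1_000_000)

# Score of one N(mu,1) observation: d/dmu log f(x; mu) = x - mu
score = x - mu0
print(score.mean())  # approximately 0, as the score equality predicts
```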

Information Equality

  • Fisher information \(I(\theta) = \int s(x,\theta) s(x,\theta)' dP_\theta(x)\)
  • If \(\frac{\partial^2}{\partial \theta\partial \theta'} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial^2}{\partial \theta\partial \theta'} f_X(x;\theta) d\mu(x)\), then \[ I(\theta) = -\int \underbrace{\frac{\partial^2 \log \ell(\theta;x)}{\partial \theta \partial \theta'}}_{\text{"Hessian"}=h(x,\theta)} dP_\theta(x) \]

  • If \(T = \tau(X)\) is an unbiased estimator for \(\theta\) and \[ \frac{\partial}{\partial \theta'} \int \tau(x) f_X(x;\theta) d\mu(x) = \int \tau(x) \frac{\partial f_X(x;\theta)}{\partial \theta'} d\mu(x), \] then \[ \int \tau(x) s(x,\theta)'dP_\theta(x) = I_d, \] the \(d \times d\) identity matrix
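
A sketch checking both equalities in the \(N(\mu,1)\) model, where \(s(x,\mu) = x - \mu\), \(h(x,\mu) = -1\), and \(I(\mu) = 1\) (illustrative, with \(\tau(x) = x\) as the unbiased estimator from one observation):

```python
import numpy as np

rng = np.random.default_rng(5)
mu0 = 0.3
x = rng.normal(loc=mu0, size=1_000_000)

s = x - mu0            # score of one N(mu,1) observation
h = -np.ones_like(x)   # Hessian of log f: always -1 in this model

print(np.mean(s * s))  # E[s s'] estimates I(mu) = 1
print(-h.mean())       # information equality: -E[h] = 1
print(np.mean(x * s))  # E[tau(X) s'] with tau(x) = x; equals 1, the identity
```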

Cramér-Rao Bound

Cramér-Rao Bound

Let \(T = \tau(X)\) be an unbiased estimator, and suppose the conditions of the previous slide and of the score equality hold. Then \[ \var_\theta(\tau(X)) \equiv \int \left(\tau(x) - \int \tau dP_\theta\right)\left(\tau(x) - \int \tau dP_\theta\right)' dP_\theta(x) \geq I(\theta)^{-1}, \] where \(\geq\) means the difference is positive semidefinite.
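
A simulation sketch (numpy assumed; values illustrative): for \(X_1, \ldots, X_n\) i.i.d. \(N(\mu,1)\), the sample information is \(n\), and the sample mean, being unbiased with variance \(1/n\), attains the bound.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 20, 50_000

# X_1..X_n iid N(mu,1): I(mu) = n for the whole sample, so the
# Cramer-Rao bound for an unbiased estimator of mu is 1/n
xbar = rng.normal(loc=0.0, scale=1.0, size=(reps, n)).mean(axis=1)
print(xbar.var(), 1 / n)  # the sample mean attains the bound
```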

Hypothesis Testing

Hypothesis Testing

  • \(X \in \mathcal{X} \subset \R^n\), distribution \(P_x \in \mathcal{P}\)
  • Partition \(\mathcal{P} = \mathcal{P}_0 \cup \mathcal{P}_1\)
  • Null and alternative hypotheses:
    • \(H_0: \; P_x \in \mathcal{P}_0\)
    • \(H_1: \; P_x \in \mathcal{P}_1\)

Hypothesis Testing

  • A test partitions \(\mathcal{X} = \underbrace{C}_{\text{critical region}} \cup A\)
    • Reject null if \(X \in C\)
    • Often \(C = \{x \in \mathcal{X}: \underbrace{\tau(x)}_{\text{test statistic}} > \underbrace{c}_{\text{critical value}} \}\)

Hypothesis Testing

  • \(P(\text{reject } H_0 | P_x \in \mathcal{P}_0)\) = Type I error \(= P_x(C)\)
  • \(P(\text{fail to reject } H_0 | P_x \in \mathcal{P}_1)\) = Type II error
  • \(P(\text{reject } H_0 | P_x \in \mathcal{P}_1)\) = power
  • \(\sup_{P_x \in \mathcal{P}_0} P_x(C)\) = size of test
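
A sketch of these error rates for a one-sided test of \(H_0: \mu = 0\) in the \(N(\mu,1)\) model (an illustrative test, not from the source; scipy assumed):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, reps = 25, 100_000
c = norm.ppf(0.95)  # critical value of a one-sided 5% z-test

def rejection_rate(mu):
    # Test statistic tau(X) = sqrt(n) * xbar, rejecting when it exceeds c
    xbar = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)
    return np.mean(np.sqrt(n) * xbar > c)

print(rejection_rate(0.0))  # Type I error rate, approximately 0.05
print(rejection_rate(0.5))  # power at mu = 0.5, approximately 0.80
```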

p-value

  • test statistic \(\tau(X)\), define \[ G_P(t) = P(\tau(X) > t) \]
  • p-value is \[ p= \sup_{P \in \mathcal{P}_0} G_P(\tau(X)) \]
  • if \(\mathcal{P}_0 = \{P_0\}\) and the test has critical value \(c\), let \(\alpha = G_{P_0}(c)\); then \(\tau(X) > c\) iff \(p < \alpha\)
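
A sketch of this equivalence for a one-sided z-test with \(\mathcal{P}_0 = \{N(0,1)^n\}\) (illustrative choices of \(n\), \(\alpha\), and data):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
n, alpha = 25, 0.05
c = norm.ppf(1 - alpha)  # chosen so that alpha = G_{P_0}(c)

x = rng.normal(loc=0.4, scale=1.0, size=n)
tau = np.sqrt(n) * x.mean()  # test statistic, N(0,1) under P_0
p = norm.sf(tau)             # p-value: G_{P_0}(tau(X))
print(tau > c, p < alpha)    # the two rejection rules agree
```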

Testing in Parametric Family

  • Parametric family \(\mathcal{P} = \{P_\theta: \theta \in \Theta\}\)
    • \(\Theta_0 = \{\theta \in \Theta: P_\theta \in \mathcal{P}_0\}\)
    • \(\Theta_1 = \{\theta \in \Theta: P_\theta \in \mathcal{P}_1\}\)
  • Hypotheses
    • \(H_0 : \theta \in \Theta_0\)
    • \(H_1: \theta \in \Theta_1\)
  • Power function of test \(C\): \[\pi:\Theta \to [0,1], \;\; \pi(\theta) = P_\theta(C)\]
  • Size \(= \sup_{\theta \in \Theta_0} \pi(\theta)\)
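
For the one-sided z-test used above, the power function has a closed form; a short sketch (scipy assumed; setting illustrative):

```python
import numpy as np
from scipy.stats import norm

n = 25
c = norm.ppf(0.95)

# Power function of the one-sided z-test C = {sqrt(n)*xbar > c}:
# sqrt(n)*xbar ~ N(sqrt(n)*mu, 1), so pi(mu) = 1 - Phi(c - sqrt(n)*mu)
def power(mu):
    return norm.sf(c - np.sqrt(n) * mu)

for mu in [0.0, 0.2, 0.5]:
    print(mu, power(mu))  # pi(0) = 0.05 is the size; power rises with mu
```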

More Powerful

Definition

  • For tests \(C_1\) and \(C_2\) with the same size, \(C_1\) is more powerful at \(\theta \in \Theta_1\) than \(C_2\) if \(P_\theta(C_1) \geq P_\theta(C_2)\)
  • \(C\) is most powerful at \(\theta \in \Theta_1\) if it is more powerful than any test of the same size
  • \(C\) is uniformly most powerful if it is most powerful at any \(\theta \in \Theta_1\)

Neyman-Pearson

Lemma (Neyman-Pearson)

Let \(\Theta = \{0, 1\}\), \(f_0\) and \(f_1\) be densities of \(P_0\) and \(P_1\), \(\tau(x) = f_1(x)/f_0(x)\), and \(C^* = \{x \in \mathcal{X}: \tau(x) > c\}\). Then among all tests \(C\) s.t. \(P_0(C) = P_0(C^*)\), \(C^*\) is most powerful.

Example

  • \(X_i \sim N(\mu, 1)\)
  • \(H_0: \mu = 0\) against \(H_1: \mu = 1\)
  • Find a most powerful test
  • What is the most powerful test if \(H_1: \mu = a\) for \(a>0\) instead?
  • What is the uniformly most powerful test if \(H_1: \mu > 0\) ?
  • What is the uniformly most powerful test if \(H_1: \mu \neq 0\) ?
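
For the first two questions, a sketch (numpy/scipy assumed; not from the source): with \(H_0: \mu = 0\) and \(H_1: \mu = 1\), the likelihood ratio is \(\tau(x) = \exp(\sum_i x_i - n/2)\), which is increasing in \(\bar{X}\), so the Neyman-Pearson test rejects when \(\bar{X}\) exceeds a threshold.

```python
import numpy as np
from scipy.stats import norm

# H0: mu = 0 vs H1: mu = 1 with X_i iid N(mu,1). The likelihood ratio
# f_1(x)/f_0(x) = exp(sum(x) - n/2) is increasing in xbar, so the
# Neyman-Pearson test rejects for large xbar.
n, alpha = 25, 0.05
c = norm.ppf(1 - alpha) / np.sqrt(n)  # size-alpha threshold for xbar under H0

rng = np.random.default_rng(9)
reps = 100_000
print(np.mean(rng.normal(0.0, 1.0, (reps, n)).mean(axis=1) > c))  # size, ~0.05
print(np.mean(rng.normal(1.0, 1.0, (reps, n)).mean(axis=1) > c))  # power, near 1
```

Since the threshold on \(\bar{X}\) does not depend on which \(a > 0\) is the alternative, the same region is most powerful against every \(\mu = a > 0\), hence uniformly most powerful against \(H_1: \mu > 0\); against \(H_1: \mu \neq 0\), no uniformly most powerful test exists.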

References

Lehmann, Erich L., and George Casella. 2006. Theory of Point Estimation. Springer Science & Business Media. https://link.springer.com/book/10.1007/b98854.
Lehmann, Erich L., and Joseph P. Romano. n.d. Testing Statistical Hypotheses. 3rd ed. Springer. https://link.springer.com/book/10.1007/0-387-27605-X.
Song, Kyungchul. 2021. “Introduction to Econometrics.”