Estimation

Paul Schrimpf

2024-09-03

Reading

  • Required: Song (2021) chapter 4, sections 1.2 and 2 (which is the basis for these slides)
  • Supplemental: Erich Leo Lehmann and Romano (n.d.), Erich L. Lehmann and Casella (2006)

\[ \def\Er{{\mathrm{E}}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \]

Estimation

Estimator

  • Given a parameter of interest \(\theta_0\), an estimator is a measurable function of an observed random vector \(X\), i.e. \(\hat{\theta} = \tau(X)\) for some known map \(\tau\)

  • An estimate given \(X=x\) is \(\tau(x)\)

Sample Analogue Estimation

  • i.i.d. observations from \(P\), \(X = (X_1, \ldots, X_n)\)
  • constructively identified parameter \(\theta_0 = \psi(P)\)
  • empirical measure: \[ \hat{P}(B) = \frac{1}{n} \sum_{i=1}^n 1\{X_i \in B \}. \]
  • Sample analogue estimator \[ \hat{\theta} = \psi(\hat{P}) \]

Sample Analogue Estimation - Examples

  • Mean
  • OLS
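For concreteness, here is a minimal Python sketch of both examples (the simulated design, coefficients, and variable names are assumptions made for illustration): the mean is \(\psi(P) = \int y \, dP(y)\), so \(\psi(\hat{P})\) is the sample average, and OLS is \(\psi(P) = \Er[XX']^{-1}\Er[XY]\), with population moments replaced by empirical-measure moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=(n, 2))
y = 1.0 + z @ np.array([0.5, -0.25]) + rng.normal(size=n)

# Mean: psi(P) = integral of y dP, so psi(Phat) is the sample average
mean_hat = y.mean()

# OLS: psi(P) = E[X X']^{-1} E[X Y]; the sample analogue replaces the
# population moments with sample moments under the empirical measure
X = np.column_stack([np.ones(n), z])
beta_hat = np.linalg.solve(X.T @ X / n, X.T @ y / n)
print(mean_hat, beta_hat)
```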

Maximum Likelihood Estimation

  • \(X \in \R^n\) with distribution \(P_X \in \mathcal{P} = \{P_\theta: \theta \in \Theta \subset \R^d \}\)

  • \(P_\theta\) dominated by \(\sigma\)-finite \(\mu\) with density \(f_X(\cdot;\theta)\)

  • Likelihood \(\ell(\cdot; X): \Theta \to [0,\infty)\) \[ \ell(\theta; X) = f_X(X; \theta) \]

  • Maximum likelihood estimator \[ \hat{\theta}_{MLE} = \textrm{arg}\max_{\theta \in \Theta} \ell(\theta;X) \]

MLE: Examples

  • \(X_i \sim N(\mu, 1)\)
  • \(Y_i = \alpha_0 + \beta_0 X_i + \epsilon_i\), \(\epsilon_i \sim N(0, \sigma_0^2)\)
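For the first example, a sketch (the true \(\mu\), sample size, and optimizer choice are assumptions) confirming numerically that maximizing the \(N(\mu,1)\) likelihood recovers the sample mean:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, size=500)  # X_i ~ N(mu, 1) with assumed true mu = 2

# Negative log-likelihood of N(mu, 1), dropping terms that do not involve mu
def neg_loglik(mu):
    return 0.5 * np.sum((x - mu[0]) ** 2)

res = minimize(neg_loglik, x0=np.array([0.0]))
print(res.x[0], x.mean())  # the numerical MLE coincides with the sample mean
```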

MLE: Equivariance

Theorem 1.1

If \(\hat{\theta}\) is the MLE of \(\theta\), then for any function \(g:\Theta \to G\), the MLE of \(g(\theta)\) is \(g(\hat{\theta})\).
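For example, combining this with the first MLE example above: if \(X_i \sim N(\mu, 1)\), so that \(\hat{\mu}_{MLE} = \bar{X}_n\), then the MLE of \(g(\mu) = e^\mu\) is obtained without re-maximizing: \[ \widehat{e^\mu}_{MLE} = e^{\hat{\mu}_{MLE}} = e^{\bar{X}_n}. \]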

Quality of Estimators

Mean Squared Error

  • Loss function \(L: \R^d \times \Theta \to [0,\infty)\) with \(L(\theta,\theta)=0\)
  • Risk at \(\theta_0\): \(\Er[L(\hat{\theta}, \theta_0)]\)
  • Squared error loss \(L_2(\theta, \theta_0) = (\theta-\theta_0)'(\theta-\theta_0)\)
  • Mean squared error \[ MSE(\hat{\theta}) = \Er[ (\hat{\theta}-\theta_0)'(\hat{\theta}-\theta_0) ] \]
  • Bias-variance decomposition \[ MSE(\hat{\theta}) = \textrm{Bias}(\hat{\theta})'\textrm{Bias}(\hat{\theta}) + \textrm{tr}(\var(\hat{\theta})) \]
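A quick Monte Carlo check of the decomposition (a sketch; the shrinkage estimator and all constants are assumptions chosen to make the bias term visible):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n, reps = 1.0, 20, 100000

# A deliberately biased estimator of mu: shrink the sample mean toward zero
draws = rng.normal(loc=mu, size=(reps, n))
theta_hat = 0.9 * draws.mean(axis=1)

mse = np.mean((theta_hat - mu) ** 2)
bias_sq = (theta_hat.mean() - mu) ** 2
variance = theta_hat.var()
print(mse, bias_sq + variance)  # the two agree up to simulation error
```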

Optimal Estimation in Parametric Models

Setup

  • \(X \in \R^n\) with distribution \(P_X \in \mathcal{P} = \{P_\theta: \theta \in \Theta \subset \R^d \}\), likelihood \(\ell(\theta;x) = f_X(x;\theta)\)

  • Question: if an estimator is unbiased, what is the smallest possible variance?

Score Equality

  • If \(\frac{\partial}{\partial \theta} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial}{\partial \theta} f_X(x;\theta) d\mu(x)\), then \[ \int \underbrace{\frac{\partial \log \ell(\theta;x)}{\partial \theta}}_{\text{"score"}=s(x,\theta)} dP_\theta(x) = 0 \]
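This follows from differentiating the identity \(\int f_X(x;\theta) d\mu(x) = 1\) and using \(\frac{\partial f}{\partial \theta} = f \frac{\partial \log f}{\partial \theta}\): \[ 0 = \int \frac{\partial f_X(x;\theta)}{\partial \theta} d\mu(x) = \int \frac{\partial \log f_X(x;\theta)}{\partial \theta} f_X(x;\theta) d\mu(x) = \int s(x,\theta) \, dP_\theta(x). \]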

Information Equality

  • Fisher Information \(I(\theta) = \int s(x,\theta) s(x,\theta)' dP_\theta(x)\)
  • If \(\frac{\partial^2}{\partial \theta\partial \theta'} \int f_X(x;\theta) d\mu(x) = \int \frac{\partial^2}{\partial \theta\partial \theta'} f_X(x;\theta) d\mu(x)\), then \[ I(\theta) = -\int \underbrace{\frac{\partial^2 \log \ell(\theta;x)}{\partial \theta \partial \theta'}}_{\text{"Hessian"}=h(x,\theta)} dP_\theta(x) \]

  • If \(T = \tau(X)\) is an unbiased estimator for \(\theta\) and \[ \frac{\partial}{\partial \theta'} \int \tau(x) f_X(x;\theta) d\mu(x) = \int \tau(x) \frac{\partial f_X(x;\theta)}{\partial \theta'} d\mu(x), \] then \[ \int \tau(x) s(x,\theta)'dP_\theta(x) = I_d, \] where \(I_d\) is the \(d \times d\) identity matrix
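A numerical check of the information equality in the \(N(\mu,1)\) model (a sketch; the constants are assumptions): the score is \(s(x,\mu) = x - \mu\) and the Hessian of the log-likelihood is \(-1\), so both sides equal \(I(\mu) = 1\).

```python
import numpy as np

rng = np.random.default_rng(6)
mu = 0.5  # assumed parameter value for illustration
x = rng.normal(loc=mu, size=1000000)

# For N(mu, 1): score s(x, mu) = x - mu, Hessian h(x, mu) = -1
score_outer = np.mean((x - mu) ** 2)  # E[s s'] -> 1 as reps grow
neg_hessian = 1.0                     # -E[h] = 1 exactly
print(score_outer, neg_hessian)       # both equal I(mu) = 1
```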

Cramér-Rao Bound

Cramér-Rao Bound

Let \(T = \tau(X)\) be an unbiased estimator, and suppose the conditions of the previous slide and of the score equality hold. Then, \[ \var_\theta(\tau(X)) \equiv \int \left(\tau(x) - \int \tau(x) dP_\theta(x)\right)\left(\tau(x) - \int \tau(x) dP_\theta(x)\right)' dP_\theta(x) \geq I(\theta)^{-1}, \] where \(\geq\) is in the positive semidefinite sense.
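As an illustration in the Gaussian location model (a sketch; the parameter value and simulation sizes are assumptions), the sample mean attains the bound:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, n, reps = 0.5, 50, 50000

# For X_1,...,X_n iid N(mu, 1), the score of the sample is sum_i (X_i - mu),
# so I(mu) = n and the Cramer-Rao bound for unbiased estimators is 1/n.
draws = rng.normal(loc=mu, size=(reps, n))
xbar = draws.mean(axis=1)  # unbiased for mu

print(xbar.var(), 1 / n)   # simulated variance matches the bound I(mu)^{-1}
```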

Hypothesis Testing

Hypothesis Testing

  • \(X \in \mathcal{X} \subset \R^n\), distribution \(P_x \in \mathcal{P}\)
  • Partition \(\mathcal{P} = \mathcal{P}_0 \cup \mathcal{P}_1\)
  • Null and alternative hypotheses:
    • \(H_0: \; P_x \in \mathcal{P}_0\)
    • \(H_1: \; P_x \in \mathcal{P}_1\)

Hypothesis Testing

  • Test partitions \(\mathcal{X} = \underbrace{C}_{\text{critical region}} \cup A\)
    • Reject null if \(X \in C\)
    • Often \(C = \{x \in \mathcal{X}: \underbrace{\tau(x)}_{\text{test statistic}} > \underbrace{c}_{\text{critical value}} \}\)

Hypothesis Testing

  • \(P(\text{reject } H_0 | P_x \in \mathcal{P}_0) = P_x(C)\) = probability of a Type I error
  • \(P(\text{fail to reject } H_0 | P_x \in \mathcal{P}_1)\) = probability of a Type II error
  • \(P(\text{reject } H_0 | P_x \in \mathcal{P}_1)\) = power
  • \(\sup_{P_x \in \mathcal{P}_0} P_x(C)\) = size of the test

p-value

  • for test statistic \(\tau(X)\), define \[ G_P(t) = P(\tau(X) > t) \]
  • the p-value is \[ p = \sup_{P \in \mathcal{P}_0} G_P(\tau(X)) \]
  • if \(\mathcal{P}_0 = \{P_0\}\) and \(G_{P_0}\) is strictly decreasing, with critical value \(c\), let \(\alpha = G_{P_0}(c)\); then \(\tau(X) > c\) iff \(p < \alpha\)
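A sketch for a simple null with \(\tau(X) = \sqrt{n}\bar{X}_n\), so that \(G_{P_0}\) is the standard normal survival function (the data-generating alternative and sample size are assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 100
x = rng.normal(loc=0.2, size=n)  # data generated under an assumed alternative

tau = np.sqrt(n) * x.mean()  # tau(X) ~ N(0, 1) under H0: mu = 0
p_value = norm.sf(tau)       # G_{P0}(tau(X)) = P(Z > tau(X))
print(tau, p_value)          # reject at level alpha iff p_value < alpha
```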

Testing in Parametric Family

  • Parametric family \(\mathcal{P} = \{P_\theta: \theta \in \Theta\}\)
    • \(\Theta_0 = \{\theta \in \Theta: P_\theta \in \mathcal{P}_0\}\)
    • \(\Theta_1 = \{\theta \in \Theta: P_\theta \in \mathcal{P}_1\}\)
  • Hypotheses
    • \(H_0 : \theta \in \Theta_0\)
    • \(H_1: \theta \in \Theta_1\)
  • Power function of test \(C\) \[\pi:\Theta \to [0,1] , \;\; \pi(\theta) = P_\theta(C)\]
  • Size \(= \sup_{\theta \in \Theta_0} \pi(\theta)\)
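A sketch of the power function for a one-sided test in the Gaussian location family (the choices of \(n\), \(\alpha\), and the grid of \(\theta\) values are assumptions):

```python
import numpy as np
from scipy.stats import norm

# One-sided test C = { sqrt(n) * xbar > c } with X_i ~ N(theta, 1) and
# Theta_0 = {theta <= 0}: the size is attained at theta = 0
n, alpha = 25, 0.05
c = norm.ppf(1 - alpha)  # critical value so that pi(0) = alpha

theta = np.linspace(-0.5, 1.0, 7)
power = norm.cdf(np.sqrt(n) * theta - c)  # pi(theta) = P_theta(C)
print(np.round(power, 3))  # equals alpha at theta = 0, increasing in theta
```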

More Powerful

Definition

  • For tests \(C_1\) and \(C_2\) with the same size, \(C_1\) is more powerful at \(\theta \in \Theta_1\) than \(C_2\) if \(P_\theta(C_1) \geq P_\theta(C_2)\)
  • \(C\) is most powerful at \(\theta \in \Theta_1\) if it is more powerful than any test of the same size
  • \(C\) is uniformly most powerful if it is most powerful at every \(\theta \in \Theta_1\)

Neyman-Pearson

Lemma (Neyman-Pearson)

Let \(\Theta = \{0, 1\}\), let \(f_0\) and \(f_1\) be densities of \(P_0\) and \(P_1\), \(\tau(x) = f_1(x)/f_0(x)\), and \(C^* = \{x \in \mathcal{X}: \tau(x) > c\}\). Then among all tests \(C\) s.t. \(P_0(C) = P_0(C^*)\), \(C^*\) is most powerful.
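A simulation sketch (the sample size, level, and the competing test are assumptions chosen for illustration) comparing the Neyman-Pearson test with another test of the same size:

```python
import numpy as np
from scipy.stats import norm

# f0 = N(0,1)^n, f1 = N(1,1)^n: the likelihood ratio f1(x)/f0(x)
# = exp(sum_i x_i - n/2) is increasing in xbar, so C* rejects for large xbar.
rng = np.random.default_rng(5)
n, alpha, reps = 20, 0.05, 100000

c_np = norm.ppf(1 - alpha) / np.sqrt(n)        # P_0(xbar > c_np) = alpha
c_alt = norm.ppf(1 - alpha) / np.sqrt(n // 2)  # same size, uses half the data

x1 = rng.normal(loc=1.0, size=(reps, n))  # samples under P_1
power_np = np.mean(x1.mean(axis=1) > c_np)
power_alt = np.mean(x1[:, : n // 2].mean(axis=1) > c_alt)
print(power_np, power_alt)  # the Neyman-Pearson test has higher power
```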

Example

  • \(X_i \sim N(\mu, 1)\)
  • \(H_0: \mu = 0\) against \(H_1: \mu = 1\)
  • Find a most powerful test
  • What is the most powerful test if \(H_1: \mu = a\) for \(a>0\) instead?
  • What is the uniformly most powerful test if \(H_1: \mu > 0\) ?
  • What is the uniformly most powerful test if \(H_1: \mu \neq 0\) ?

References

Lehmann, Erich L., and George Casella. 2006. Theory of Point Estimation. Springer Science & Business Media. https://link.springer.com/book/10.1007/b98854.
Lehmann, Erich Leo, and Joseph P. Romano. n.d. Testing Statistical Hypotheses. Vol. 3. Springer. https://link.springer.com/book/10.1007/0-387-27605-X.
Song, Kyungchul. 2021. “Introduction to Econometrics.”