Convergence in Distribution

Paul Schrimpf

2023-10-31

Convergence in Distribution

Reading

  • Required: Song (2021) chapter 10

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{En}}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Convergence in Distribution

Definition

Random vectors \(X_1, X_2, ...\) converge in distribution to the random vector \(X\) if for all \(f \in \underbrace{\mathcal{C}_b}\) (continuous and bounded) \[ \Er[ f(X_n) ] \to \Er[f(X)] \] denoted by \(X_n \indist X\)

Relation to Convergence in Probability

Theorem 1.4

  1. If \(X_n \indist X\), then \(X_n = O_p(1)\)

  2. If \(c\) is a constant, then \(X_n \inprob c\) iff \(X_n \indist c\)

  3. If \(Y_n \inprob c\) and \(X_n \indist X\), then \((Y_n, X_n) \indist (c, X)\)

  4. If \(X_n \inprob X\), then \(X_n \indist X\)

Slutsky’s Lemma

Theorem 1.5 (Generalized Slutsky’s Lemma)

If \(Y_n \inprob c\), \(X_n \indist X\), and \(g\) is continuous, then \[ g(Y_n, X_n) \indist g(c,X) \]

  • Implies:
    • \(Y_n + X_n \indist c + X\)
    • \(Y_n X_n \indist c X\)
    • \(X_n/Y_n \indist X/c\)

Central Limit Theorem

Levy’s Continuity Theorem

Lemma 2.1 (Levy’s Continuity Theorem)

\(X_n \indist X\) iff \(\Er[e^{i t'X_n} ] \to \Er[e^{i t' X} ]\) for all \(t \in \R^d\)

  • see Döbler (2022) for a short proof
  • \(\Er[e^{i t' X}] \equiv \varphi(t)\) is the characteristic function of \(X\)

Law of Large Numbers Revisited

Lemma 2.2 (Weak Law of Large Numbers)

If \(X_1, ..., X_n\) are i.i.d. with \(\Er[|X_1|] < \infty\), then \(\frac{1}{n} \sum_{i=1}^n X_i \inprob \Er[X_1]\)

Theorem 2.2 (non-iid WLLN)

If \(\Er[X_i]=0\), \(\Er[X_i X_j] = 0\) for all \(i \neq j\) and \(\frac{1}{n} \max_{1 \leq j \leq n} \Er[X_j^2] \to 0\), then \(\frac{1}{n} \sum_{i=1}^n X_i \inprob 0\)

Central Limit Theorem

Theorem 2.3

Suppose \(X_1, ..., X_n \in \R\) are i.i.d. with \(\Er[X_1] = \mu\) and \(\var(X_1) = \sigma^2\), then \[ \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma} \indist N(0,1) \]

using PlotlyLight, Distributions, Statistics
function simulateCLT(n, s, dist=Uniform())
  μ = mean(dist)
  σ = sqrt(var(dist))
  x = rand(dist, n, s)
  z = 1/sqrt(n)*sum( (x .- μ)./σ, dims=1)
end
dist = Uniform()
x = range(-2.5, 2.5, length=200)
N = [1, 2, 4, 16, 256]
S = 10_000
Fn = [let z=simulateCLT(n,S, dist);
    x->mean(z .<= x)
      end for n in N];

CDF

plotting code
using Cobweb
plt = Plot()
plt.layout = Config()
for i in axes(N)[1]
  plt(x=x, y=Fn[i].(x), name="N=$(N[i])")
end
plt(x=x, y=cdf.(Normal(),x), name="Normal CDF")
fig = plt()
Cobweb.save(Page(fig), "cdf.html")
HTML("<iframe src=\"cdf.html\" width=\"1200\"  height=\"700\"></iframe>")

Size Distortion

plotting code
p = range(0,1,length=200)
plt = Plot()
plt.layout = Config()
plt.layout.yaxis.title.text="p - Fn(Φ^{-1}(p))"
plt.layout.xaxis.title.text="p"
for i in axes(N)[1]
  plt(x=p,y=p - Fn[i].(quantile.(Normal(),p)), name="N=$(N[i])")
end
fig = plt()
Cobweb.save(Page(fig), "size.html")
HTML("<iframe src=\"size.html\" width=\"1200\"  height=\"700\"></iframe>")

Histogram

plotting code
bins = range(-2.5,2.5,length=21)
x = range(-2.5,2.5, length=1000)
plt = Plot();
plt.layout=Config()
function Hn(x, F)
  if (x <= bins[1] || x>bins[end])
    return 0.0
  end
  j = findfirst( x .<= bins )
  return (F(bins[j]) - F(bins[j-1]))
end
plt(x=x,y=Hn.(x,x->cdf(Normal(),x)), name="Normal")
for i in axes(N)[1]
  plt(x=x,y=Hn.(x, Fn[i]), name="N=$(N[i])")
end
fig = plt()
Cobweb.save(Page(fig), "hist.html")
HTML("<iframe src=\"hist.html\" width=\"1200\"  height=\"700\"></iframe>")

Cramér-Wold Device

Lemma 2.2

For \(X_n, X \in \R^d\), \(X_n \indist X\) iff \(t' X_n \indist t' X\) for all \(t \in \R^d\)

Multivariate Central Limit Theorem

Theorem 2.4

Suppose \(X_1, ..., X_n\) are i.i.d. with \(\Er[X_1] = \mu \in \R^d\), \(\Er[\norm{X_i}^2] < \infty\) and \(\var(X_1) = \Sigma > 0\), then \[ \frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \indist N(0,\Sigma) \]

Delta Method

Theorem 3.1 (Delta Method)

Suppose that \(\hat{\theta}\) is a sequence of estimators of \(\theta_0 \in \R^d\), and \[ \sqrt{n}(\hat{\theta} - \theta_0) \indist S \] Also, assume that \(h: \R^d \to \R^k\) is differentiable at \(\theta_0\), then \[ \sqrt{n} \left( h(\hat{\theta}) - h(\theta_0) \right) \indist Dh(\theta_0) S \]

Delta Method: Example

  • What is the asymptotic distribution of \[ \hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left(x_i - \frac{1}{n} \sum_j=1^n x_j \right)^2}? \]

Continuous Mapping Theorem

Continuous Mapping Theorem

Let \(X_n \indist X\) and \(g\) be continuous on a set \(C\) with \(P(X \in C) = 1\), then \[ g(X_n) \indist g(X) \]

Continuous Mapping Theorem: Example

  • In linear regression, \[ y_i = x_i'\beta_0 + \epsilon_i \]
  • What is the asymptotic distribution of \[ M(\beta) = \left\Vert \frac{1}{n} \sum_{i=1} x_i (y_i - x_i'\beta) \right\Vert^2 \] when \(\beta=\beta_0\)?

i. non i.d. Central Limit Theorem

  • Triangular array \[ \begin{array}{ccc} X_{1,1}, & ..., & X_{1,k(1)} \\ X_{2,1}, & ..., & X_{2,k(2)} \\ \vdots & & \vdots \\ X_{n,1}, & ..., & X_{n,k(n)} \end{array} \] with \(k(n) \to \infty\) as \(n \to \infty\)

i. non i.d. Central Limit Theorem

Theorem 2.5 (Lindeberg’s Theorem)

Assume that for each \(n\), \(X_{n,1}, ..., X_{n,k(n)}\) are independent with \(\Er[X_{nj}] = 0\), and \(\frac{1}{k(n)} \sum_{j=1}^{k(n)} \Er[X_{nj}^2] = 1\) and for any \(\epsilon>0\), \[ \lim_{n \to \infty} \frac{1}{k(n)} \sum_{j=1}^{k(n)} \Er\left[ X_{nj}^2 1\{|X_{nj}|>\epsilon \sqrt{k(n)} \right] = 0 \] Then, \[ \frac{1}{\sqrt{k(n)}} \sum_{j=1}^{k(n)} X_{n,j} \indist N(0,1) \]

Characterizing Convergence in Distribution

Characterizing Convergence in Distribution

Lemma 1.2

\(X_n \indist X\) iff for any open \(G \subset \R^d\), \[ \liminf P(X_n \in G) \geq P(X \in G) \]

  • This and additional characterizations of convergence in distribution are called the Portmanteau Theorem

Characterizing Convergence in Distribution

Theorem 1.1

If \(X_n \indist X\) if and only if \(P(X_n \leq t) \to P(X \leq t)\) for all \(t\) where \(P(X \leq t)\) is continuous

Theorem 1.2

If \(X_n \indist X\) and \(X\) is continuous, then \[ \sup_{t \in \R^d} | P(X_n \leq t) - P(X \leq t) | \to 0 \]

References

Döbler, Christian. 2022. “A Short Proof of Lévy’s Continuity Theorem Without Using Tightness.” Statistics & Probability Letters 185: 109438. https://doi.org/https://doi.org/10.1016/j.spl.2022.109438.
Song, Kyunchul. 2021. “Introduction to Econometrics.”