Convergence in Distribution

Paul Schrimpf

2024-12-14

Convergence in Distribution

Reading

  • Required: Song (2021) chapter 10

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{En}}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Convergence in Distribution

Definition

Random vectors \(X_1, X_2, ...\) converge in distribution to the random vector \(X\) if for all \(f \in \mathcal{C}_b\) (the set of continuous, bounded functions) \[ \Er[ f(X_n) ] \to \Er[f(X)] \] denoted by \(X_n \indist X\)
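To make the definition concrete, here is a small Monte Carlo sketch (illustrative choices on my part: the sequence \(X_n \sim N(0, 1 + 1/n)\) and the test function \(f = \cos\) are not from the text):

using Distributions, Statistics, Random
Random.seed!(0)
# X_n ~ N(0, 1 + 1/n) converges in distribution to X ~ N(0,1); for the
# bounded, continuous test function f(x) = cos(x), E[f(X_n)] → E[f(X)]
f = cos
S = 10^6  # Monte Carlo draws
for n in (1, 10, 100)
  Xn = rand(Normal(0, sqrt(1 + 1/n)), S)
  println("n = $n: E[f(Xn)] ≈ ", mean(f, Xn))
end
println("limit:   E[f(X)] ≈ ", mean(f, rand(Normal(), S)))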

Relation to Convergence in Probability

Theorem 1.4

  1. If \(X_n \indist X\), then \(X_n = O_p(1)\)

  2. If \(c\) is a constant, then \(X_n \inprob c\) iff \(X_n \indist c\)

  3. If \(Y_n \inprob c\) and \(X_n \indist X\), then \((Y_n, X_n) \indist (c, X)\)

  4. If \(X_n \inprob X\), then \(X_n \indist X\)

Slutsky’s Lemma

Theorem 1.5 (Generalized Slutsky’s Lemma)

If \(Y_n \inprob c\), \(X_n \indist X\), and \(g\) is continuous, then \[ g(Y_n, X_n) \indist g(c,X) \]

  • Implies (see the simulation sketch below):
    • \(Y_n + X_n \indist c + X\)
    • \(Y_n X_n \indist c X\)
    • \(X_n/Y_n \indist X/c\) (provided \(c \neq 0\))
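A simulation sketch of the last implication (my illustrative setup, with i.i.d. Exponential(1) data): the t-like statistic \(\sqrt{n}(\bar{x} - \mu)/\hat{\sigma}\) is a ratio \(X_n/Y_n\) with \(X_n = \sqrt{n}(\bar{x}-\mu)/\sigma \indist N(0,1)\) and \(Y_n = \hat{\sigma}/\sigma \inprob 1\), so by Slutsky it converges in distribution to \(N(0,1)\).

using Distributions, Statistics, Random
Random.seed!(1)
dist = Exponential(1.0)
μ, σ = mean(dist), std(dist)
n, S = 500, 10_000
# t-like statistic: a CLT term divided by a consistent estimator of σ
t = map(1:S) do _
  x = rand(dist, n)
  sqrt(n)*(mean(x) - μ)/std(x)
end
println("mean ≈ ", mean(t), ", var ≈ ", var(t))  # ≈ 0 and ≈ 1 if N(0,1)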

Central Limit Theorem

Lévy’s Continuity Theorem

Lemma 2.1 (Lévy’s Continuity Theorem)

\(X_n \indist X\) iff \(\Er[e^{i t'X_n} ] \to \Er[e^{i t' X} ]\) for all \(t \in \R^d\)

  • see Döbler (2022) for a short proof
  • \(\Er[e^{i t' X}] \equiv \varphi(t)\) is the characteristic function of \(X\)
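A simulation sketch connecting the theorem to the CLT (illustrative choices: Uniform(0,1) data and a few values of \(t\)): the empirical characteristic function of the standardized sample mean should approach \(\varphi(t) = e^{-t^2/2}\), the characteristic function of \(N(0,1)\).

using Distributions, Statistics, Random
Random.seed!(2)
dist = Uniform()
μ, σ = mean(dist), std(dist)
n, S = 200, 50_000
z = [sqrt(n)*(mean(rand(dist, n)) - μ)/σ for _ in 1:S]
for t in (0.5, 1.0, 2.0)
  ecf = mean(exp.(im*t .* z))  # empirical E[exp(i t Z_n)]
  println("t = $t: |ecf - exp(-t²/2)| ≈ ", abs(ecf - exp(-t^2/2)))
end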

Law of Large Numbers Revisited

Lemma 2.2 (Weak Law of Large Numbers)

If \(X_1, ..., X_n\) are i.i.d. with \(\Er[|X_1|] < \infty\), then \(\frac{1}{n} \sum_{i=1}^n X_i \inprob \Er[X_1]\)

Theorem 2.2 (non-iid WLLN)

If \(\Er[X_i]=0\), \(\Er[X_i X_j] = 0\) for all \(i \neq j\), and \(\frac{1}{n} \max_{1 \leq j \leq n} \Er[X_j^2] \to 0\), then \(\frac{1}{n} \sum_{i=1}^n X_i \inprob 0\)

  • Proof idea: by Chebyshev’s inequality, \(P\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| > \epsilon\right) \leq \frac{1}{n^2 \epsilon^2} \sum_{j=1}^n \Er[X_j^2] \leq \frac{1}{n\epsilon^2} \max_{1 \leq j \leq n} \Er[X_j^2] \to 0\)

Central Limit Theorem

Theorem 2.3

Suppose \(X_1, ..., X_n \in \R\) are i.i.d. with \(\Er[X_1] = \mu\) and \(\var(X_1) = \sigma^2\), then \[ \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma} \indist N(0,1) \]

using PlotlyLight, Distributions, Statistics
# simulate s draws of the scaled, standardized sample mean for sample size n
function simulateCLT(n, s, dist=Uniform())
  μ = mean(dist)
  σ = sqrt(var(dist))
  x = rand(dist, n, s)
  z = 1/sqrt(n)*sum( (x .- μ)./σ, dims=1)
  return z
end
dist = Uniform()
x = range(-2.5, 2.5, length=200)
N = [1, 2, 4, 16, 256]
S = 10_000
# empirical CDF of the simulated statistic for each sample size in N
Fn = [let z=simulateCLT(n,S, dist);
    x->mean(z .<= x)
      end for n in N];

CDF

plotting code
plt = Plot()
plt.layout = Config()
for i in axes(N)[1]
  plt(x=x, y=Fn[i].(x), name="N=$(N[i])")
end
plt(x=x, y=cdf.(Normal(),x), name="Normal CDF")
fig = plt()
fig

Size Distortion

plotting code
# nominal probability p minus the simulated CDF at the normal quantile Φ⁻¹(p)
p = range(0,1,length=200)
plt = Plot()
plt.layout = Config()
plt.layout.yaxis.title.text="p - Fn(Φ^{-1}(p))"
plt.layout.xaxis.title.text="p"
for i in axes(N)[1]
  plt(x=p,y=p - Fn[i].(quantile.(Normal(),p)), name="N=$(N[i])")
end
fig = plt()
fig

Histogram

plotting code
bins = range(-2.5,2.5,length=21)
x = range(-2.5,2.5, length=1000)
plt = Plot();
plt.layout=Config()
# probability that the CDF F assigns to the histogram bin containing x
function Hn(x, F)
  if (x <= bins[1] || x>bins[end])
    return 0.0
  end
  j = findfirst( x .<= bins )
  return (F(bins[j]) - F(bins[j-1]))
end
plt(x=x,y=Hn.(x,x->cdf(Normal(),x)), name="Normal")
for i in axes(N)[1]
  plt(x=x,y=Hn.(x, Fn[i]), name="N=$(N[i])")
end
fig = plt()
fig

Cramér-Wold Device

Lemma 2.2

For \(X_n, X \in \R^d\), \(X_n \indist X\) iff \(t' X_n \indist t' X\) for all \(t \in \R^d\)

Multivariate Central Limit Theorem

Theorem 2.4

Suppose \(X_1, ..., X_n\) are i.i.d. with \(\Er[X_1] = \mu \in \R^d\) and \(\var(X_1) = \Sigma > 0\), then \[ \frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \indist N(0,\Sigma) \]
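A simulation sketch of Theorem 2.4 checked through Cramér-Wold projections (the dgp is my illustrative choice: with \(E_1, E_2\) i.i.d. Exponential(1), \(X = (E_1 - 1, E_1 + E_2 - 2)'\) has mean zero and \(\Sigma = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\)):

using Statistics, LinearAlgebra, Random
Random.seed!(3)
Σ = [1.0 1.0; 1.0 2.0]
# n draws of X = (E₁ - 1, E₁ + E₂ - 2) with E₁, E₂ iid Exponential(1)
function drawX(n)
  e = -log.(rand(n, 2))  # Exponential(1) via inverse CDF
  hcat(e[:,1] .- 1, e[:,1] .+ e[:,2] .- 2)
end
n, S = 200, 20_000
t = [1.0, -2.0]
# Cramér-Wold: the projection t'(scaled sum) should be ≈ N(0, t'Σt)
proj = [dot(t, vec(sum(drawX(n), dims=1)))/sqrt(n) for _ in 1:S]
println("Var(t'Z) ≈ ", var(proj), " vs t'Σt = ", dot(t, Σ*t))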

Delta Method

Theorem 3.1 (Delta Method)

Suppose that \(\hat{\theta}\) is a sequence of estimators of \(\theta_0 \in \R^d\), and \[ \sqrt{n}(\hat{\theta} - \theta_0) \indist S \] Also, assume that \(h: \R^d \to \R^k\) is differentiable at \(\theta_0\), then \[ \sqrt{n} \left( h(\hat{\theta}) - h(\theta_0) \right) \indist Dh(\theta_0) S \]

Delta Method: Example

  • What is the asymptotic distribution of \[ \hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left(x_i - \frac{1}{n} \sum_{j=1}^n x_j \right)^2}? \]
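One route to the answer (a sketch filling in the standard steps; assumes \(\Er[x_1^4] < \infty\)): write \(\hat{\sigma} = h(\hat{m}_1, \hat{m}_2)\) with \(\hat{m}_1 = \frac{1}{n}\sum_i x_i\), \(\hat{m}_2 = \frac{1}{n}\sum_i x_i^2\), and \(h(a,b) = \sqrt{b - a^2}\). The multivariate CLT gives \[ \sqrt{n}\left( \begin{pmatrix} \hat{m}_1 \\ \hat{m}_2 \end{pmatrix} - \begin{pmatrix} m_1 \\ m_2 \end{pmatrix} \right) \indist S \sim N\left(0, \var\begin{pmatrix} x_i \\ x_i^2 \end{pmatrix} \right), \] and \(Dh(m_1, m_2) = \frac{1}{2\sigma}\begin{pmatrix} -2m_1 & 1 \end{pmatrix}\), so the delta method yields \(\sqrt{n}(\hat{\sigma} - \sigma) \indist Dh(m_1, m_2) S\), a normal with variance \(Dh \, \var\begin{pmatrix} x_i \\ x_i^2 \end{pmatrix} Dh'\).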

Continuous Mapping Theorem

Continuous Mapping Theorem

Let \(X_n \indist X\) and \(g\) be continuous on a set \(C\) with \(P(X \in C) = 1\), then \[ g(X_n) \indist g(X) \]

Continuous Mapping Theorem: Example

  • In linear regression, \[ y_i = x_i'\beta_0 + \epsilon_i \]
  • What is the asymptotic distribution of \[ M(\beta) = \left\Vert \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i (y_i - x_i'\beta) \right\Vert^2 \] when \(\beta=\beta_0\)? (a sketch of the answer follows)
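A sketch (assuming \(\Er[x_i \epsilon_i] = 0\) and \(\Omega \equiv \Er[x_i x_i' \epsilon_i^2]\) finite): at \(\beta = \beta_0\), \(\frac{1}{\sqrt{n}} \sum_{i=1}^n x_i \epsilon_i \indist Z \sim N(0, \Omega)\) by the multivariate CLT, and \(g(z) = \Vert z \Vert^2\) is continuous, so \(M(\beta_0) \indist \Vert Z \Vert^2\) by the continuous mapping theorem.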

Independent, Non-Identically Distributed Central Limit Theorem

  • Triangular array \[ \begin{array}{ccc} X_{1,1}, & ..., & X_{1,k(1)} \\ X_{2,1}, & ..., & X_{2,k(2)} \\ \vdots & & \vdots \\ X_{n,1}, & ..., & X_{n,k(n)} \end{array} \] with \(k(n) \to \infty\) as \(n \to \infty\)

Independent, Non-Identically Distributed Central Limit Theorem

Theorem 2.5 (Lindeberg’s Theorem)

Assume that for each \(n\), \(X_{n,1}, ..., X_{n,k(n)}\) are independent with \(\Er[X_{n,j}] = 0\) and \(\frac{1}{k(n)} \sum_{j=1}^{k(n)} \Er[X_{n,j}^2] = 1\), and for any \(\epsilon>0\), \[ \lim_{n \to \infty} \frac{1}{k(n)} \sum_{j=1}^{k(n)} \Er\left[ X_{n,j}^2 1\{|X_{n,j}|>\epsilon \sqrt{k(n)}\} \right] = 0. \] Then \[ \frac{1}{\sqrt{k(n)}} \sum_{j=1}^{k(n)} X_{n,j} \indist N(0,1) \]
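A simulation sketch of the theorem (the triangular array below is my illustrative choice): take \(X_{n,j} = \sigma_j U_j\) with \(U_j\) i.i.d. Uniform\((-\sqrt{3}, \sqrt{3})\) (mean 0, variance 1) and deterministic variances \(\sigma_j^2 = 2j/(k+1)\), so \(\frac{1}{k} \sum_j \Er[X_{n,j}^2] = 1\); the summands are uniformly bounded, so the Lindeberg condition holds automatically.

using Statistics, Random
Random.seed!(4)
# one row sum of the triangular array, scaled by 1/√k
function rowsum(k)
  σ = sqrt.(2 .* (1:k) ./ (k + 1))     # variances σ_j² = 2j/(k+1) average to 1
  u = (rand(k) .- 0.5) .* (2*sqrt(3))  # Uniform(-√3, √3): mean 0, variance 1
  sum(σ .* u)/sqrt(k)
end
k, S = 1_000, 20_000
z = [rowsum(k) for _ in 1:S]
println("mean ≈ ", mean(z), ", var ≈ ", var(z))  # ≈ 0 and ≈ 1 if N(0,1)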

Characterizing Convergence in Distribution

Characterizing Convergence in Distribution

Lemma 1.2

\(X_n \indist X\) iff for any open \(G \subset \R^d\), \[ \liminf P(X_n \in G) \geq P(X \in G) \]

  • This and additional characterizations of convergence in distribution are called the Portmanteau Theorem

Characterizing Convergence in Distribution

Theorem 1.1

\(X_n \indist X\) if and only if \(P(X_n \leq t) \to P(X \leq t)\) at all \(t\) where \(t \mapsto P(X \leq t)\) is continuous

Theorem 1.2

If \(X_n \indist X\) and the CDF of \(X\) is continuous, then \[ \sup_{t \in \R^d} | P(X_n \leq t) - P(X \leq t) | \to 0 \]

Non-asymptotic

Weak Berry-Esseen Theorem

Weak Berry-Esseen Theorem

Let \(X_i\) be i.i.d. with \(\Er[X]=0\), \(\Er[X^2]=1\), and \(\Er[|X|^3]\) finite. Let \(\varphi\) be smooth with its first three derivatives uniformly bounded, and let \(G \sim N(0,1)\). Then \[ \left\vert \Er\left[ \varphi\left( \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \right) \right] - \Er\left[\varphi(G)\right] \right\vert \leq C \frac{\Er[|X|^3]}{\sqrt{n}} \sup_{x \in \R} |\varphi'''(x)| \]

  • see Tao (2010)

Berry-Esseen Theorem

Berry-Esseen Theorem

If \(X_i\) are i.i.d. with \(\Er[X] = 0\) and \(\var(X)=1\), then \[ \sup_{z \in \R} \left\vert P\left(\left[\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i\right] \leq z \right) - \Phi(z) \right\vert \leq 0.5 \Er[|X|^3]/\sqrt{n} \] where \(\Phi\) is the normal CDF.

Multivariate Berry-Esseen Theorem

If \(X_i \in \R^d\) are i.i.d. with \(\Er[X] = 0\) and \(\var(X)=I_d\), then \[ \sup_{A \subset \R^d, \text{convex}} \left\vert P\left(\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \in A \right) - P(N(0,I_d) \in A) \right\vert \leq (42 d^{1/4} + 16) \Er[\Vert X \Vert ^3]/\sqrt{n} \]

  • see Raič (2019)

Simulated Illustration of Berry-Esseen CLT

plotting code
using Plots, Distributions

# two-point distribution: X = xhi with probability p, X = xlo otherwise,
# with p and xlo chosen so that E[X] = 0 and Var(X) = 1
function dgp(n, xhi=2)
  p = 1/(1+xhi^2)
  xlo = -p*xhi/(1-p)
  hi = rand(n) .< p
  x = ifelse.(hi, xhi, xlo)
end

# E[|X|³] for the two-point distribution above
function Ex3(xhi)
  p = 1/(1+xhi^2)
  xlo = -p*xhi/(1-p)
  p*xhi^3 + (1-p)*abs(xlo)^3
end

function plotcdfwithbounds(dgp,  e3, n=[10,100,1000], S=9999)
  cmap = palette(:tab10)
  x = range(-2.5, 2.5, length=200)
  cdfx=x->cdf(Normal(), x)
  fig=Plots.plot(x, cdfx, label="Normal", color="black", linestyle=:dash)
  for (i,ni) in enumerate(n)
    # simulated distribution of the scaled sample mean √n·x̄
    truedist = [mean(dgp(ni))*sqrt(ni) for _ in 1:S]
    ecdf = x-> mean(truedist .<= x)
    Plots.plot!(x, ecdf, label="n=$ni", color=cmap[i])
    # normal CDF with a Berry-Esseen band of half-width 0.5·E[|X|³]/√n
    Plots.plot!(x, cdfx.(x), ribbon = 0.5*e3/√ni, fillalpha=0.2, label="", color=cmap[i])
  end
  xlims!(-2.5,2.5)
  ylims!(0,1)
  title!("Distribution of Scaled Sample Mean")
  return(fig)
end
xhi = 2.5
plotcdfwithbounds(n->dgp(n,xhi), Ex3(xhi))

Simulated Illustration of Berry-Esseen CLT: Slack Bounds
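The body of this slide did not survive extraction; a plausible reconstruction (an assumption on my part: it reruns the simulation with a much more skewed two-point dgp, where the Berry-Esseen band is far wider than the actual error):

# assumption: same plot with a more skewed dgp (larger xhi) to show slack bounds
xhi = 10.0
plotcdfwithbounds(n->dgp(n,xhi), Ex3(xhi))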

References

Döbler, Christian. 2022. “A Short Proof of Lévy’s Continuity Theorem Without Using Tightness.” Statistics & Probability Letters 185: 109438. https://doi.org/10.1016/j.spl.2022.109438.
Raič, Martin. 2019. “A Multivariate Berry–Esseen Theorem with Explicit Constants.” Bernoulli 25 (4A): 2824–53. https://doi.org/10.3150/18-BEJ1072.
Song, Kyungchul. 2021. “Introduction to Econometrics.”
Tao, Terence. 2010. “254A, Notes 2: The Central Limit Theorem.” https://terrytao.wordpress.com/2010/01/05/254a-notes-2-the-central-limit-theorem/.