Convergence in Distribution

Paul Schrimpf

2024-12-14

Convergence in Distribution

Reading

  • Required: Song (2021) chapter 10

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{En}}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \]

Convergence in Distribution

Definition

Random vectors \(X_1, X_2, ...\) converge in distribution to the random vector \(X\) if for all \(f \in \mathcal{C}_b\) (the set of continuous, bounded functions) \[ \Er[ f(X_n) ] \to \Er[f(X)] \] denoted by \(X_n \indist X\)
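To make the definition concrete, here is a small Monte Carlo sketch (illustrative choices on my part: the sequence \(X_n \sim N(0, 1 + 1/n)\) and the test function \(f = \cos\) are not from the text):

using Distributions, Statistics, Random
Random.seed!(0)
# X_n ~ N(0, 1 + 1/n) converges in distribution to X ~ N(0,1); for the
# bounded, continuous test function f(x) = cos(x), E[f(X_n)] → E[f(X)]
f = cos
S = 10^6  # Monte Carlo draws
for n in (1, 10, 100)
  Xn = rand(Normal(0, sqrt(1 + 1/n)), S)
  println("n = $n: E[f(Xn)] ≈ ", mean(f, Xn))
end
println("limit:   E[f(X)] ≈ ", mean(f, rand(Normal(), S)))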

Relation to Convergence in Probability

Theorem 1.4

  1. If \(X_n \indist X\), then \(X_n = O_p(1)\)

  2. If \(c\) is a constant, then \(X_n \inprob c\) iff \(X_n \indist c\)

  3. If \(Y_n \inprob c\) and \(X_n \indist X\), then \((Y_n, X_n) \indist (c, X)\)

  4. If \(X_n \inprob X\), then \(X_n \indist X\)

Slutsky’s Lemma

Theorem 1.5 (Generalized Slutsky’s Lemma)

If \(Y_n \inprob c\), \(X_n \indist X\), and \(g\) is continuous, then \[ g(Y_n, X_n) \indist g(c,X) \]

  • Implies (see the simulation sketch below):
    • \(Y_n + X_n \indist c + X\)
    • \(Y_n X_n \indist c X\)
    • \(X_n/Y_n \indist X/c\) (provided \(c \neq 0\))
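A simulation sketch of the last implication (my illustrative setup, with i.i.d. Exponential(1) data): the t-like statistic \(\sqrt{n}(\bar{x} - \mu)/\hat{\sigma}\) is a ratio \(X_n/Y_n\) with \(X_n = \sqrt{n}(\bar{x}-\mu)/\sigma \indist N(0,1)\) and \(Y_n = \hat{\sigma}/\sigma \inprob 1\), so by Slutsky it converges in distribution to \(N(0,1)\).

using Distributions, Statistics, Random
Random.seed!(1)
dist = Exponential(1.0)
μ, σ = mean(dist), std(dist)
n, S = 500, 10_000
# t-like statistic: a CLT term divided by a consistent estimator of σ
t = map(1:S) do _
  x = rand(dist, n)
  sqrt(n)*(mean(x) - μ)/std(x)
end
println("mean ≈ ", mean(t), ", var ≈ ", var(t))  # ≈ 0 and ≈ 1 if N(0,1)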

Central Limit Theorem

Lévy’s Continuity Theorem

Lemma 2.1 (Lévy’s Continuity Theorem)

\(X_n \indist X\) iff \(\Er[e^{i t'X_n} ] \to \Er[e^{i t' X} ]\) for all \(t \in \R^d\)

  • see Döbler (2022) for a short proof
  • \(\Er[e^{i t' X}] \equiv \varphi(t)\) is the characteristic function of \(X\)
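A simulation sketch connecting the theorem to the CLT (illustrative choices: Uniform(0,1) data and a few values of \(t\)): the empirical characteristic function of the standardized sample mean should approach \(\varphi(t) = e^{-t^2/2}\), the characteristic function of \(N(0,1)\).

using Distributions, Statistics, Random
Random.seed!(2)
dist = Uniform()
μ, σ = mean(dist), std(dist)
n, S = 200, 50_000
z = [sqrt(n)*(mean(rand(dist, n)) - μ)/σ for _ in 1:S]
for t in (0.5, 1.0, 2.0)
  ecf = mean(exp.(im*t .* z))  # empirical E[exp(i t Z_n)]
  println("t = $t: |ecf - exp(-t²/2)| ≈ ", abs(ecf - exp(-t^2/2)))
end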

Law of Large Numbers Revisited

Lemma 2.2 (Weak Law of Large Numbers)

If \(X_1, ..., X_n\) are i.i.d. with \(\Er[|X_1|] < \infty\), then \(\frac{1}{n} \sum_{i=1}^n X_i \inprob \Er[X_1]\)

Theorem 2.2 (non-iid WLLN)

If \(\Er[X_i]=0\), \(\Er[X_i X_j] = 0\) for all \(i \neq j\), and \(\frac{1}{n} \max_{1 \leq j \leq n} \Er[X_j^2] \to 0\), then \(\frac{1}{n} \sum_{i=1}^n X_i \inprob 0\)

  • Proof idea: by Chebyshev’s inequality, \(P\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| > \epsilon\right) \leq \frac{1}{n^2 \epsilon^2} \sum_{j=1}^n \Er[X_j^2] \leq \frac{1}{n\epsilon^2} \max_{1 \leq j \leq n} \Er[X_j^2] \to 0\)

Central Limit Theorem

Theorem 2.3

Suppose \(X_1, ..., X_n \in \R\) are i.i.d. with \(\Er[X_1] = \mu\) and \(\var(X_1) = \sigma^2\), then \[ \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma} \indist N(0,1) \]

using PlotlyLight, Distributions, Statistics
# simulate s draws of the scaled, standardized sample mean for sample size n
function simulateCLT(n, s, dist=Uniform())
  μ = mean(dist)
  σ = sqrt(var(dist))
  x = rand(dist, n, s)
  z = 1/sqrt(n)*sum( (x .- μ)./σ, dims=1)
  return z
end
dist = Uniform()
x = range(-2.5, 2.5, length=200)
N = [1, 2, 4, 16, 256]
S = 10_000
# empirical CDF of the simulated statistic for each sample size in N
Fn = [let z=simulateCLT(n,S, dist);
    x->mean(z .<= x)
      end for n in N];

CDF

plotting code
plt = Plot()
plt.layout = Config()
for i in axes(N)[1]
  plt(x=x, y=Fn[i].(x), name="N=$(N[i])")
end
plt(x=x, y=cdf.(Normal(),x), name="Normal CDF")
fig = plt()
fig

Size Distortion

plotting code
# nominal probability p minus the simulated CDF at the normal quantile Φ⁻¹(p)
p = range(0,1,length=200)
plt = Plot()
plt.layout = Config()
plt.layout.yaxis.title.text="p - Fn(Φ^{-1}(p))"
plt.layout.xaxis.title.text="p"
for i in axes(N)[1]
  plt(x=p,y=p - Fn[i].(quantile.(Normal(),p)), name="N=$(N[i])")
end
fig = plt()
fig

Histogram

plotting code
bins = range(-2.5,2.5,length=21)
x = range(-2.5,2.5, length=1000)
plt = Plot();
plt.layout=Config()
# probability that the CDF F assigns to the histogram bin containing x
function Hn(x, F)
  if (x <= bins[1] || x>bins[end])
    return 0.0
  end
  j = findfirst( x .<= bins )
  return (F(bins[j]) - F(bins[j-1]))
end
plt(x=x,y=Hn.(x,x->cdf(Normal(),x)), name="Normal")
for i in axes(N)[1]
  plt(x=x,y=Hn.(x, Fn[i]), name="N=$(N[i])")
end
fig = plt()
fig

Cramér-Wold Device

Lemma 2.2

For \(X_n, X \in \R^d\), \(X_n \indist X\) iff \(t' X_n \indist t' X\) for all \(t \in \R^d\)

Multivariate Central Limit Theorem

Theorem 2.4

Suppose \(X_1, ..., X_n\) are i.i.d. with \(\Er[X_1] = \mu \in \R^d\) and \(\var(X_1) = \Sigma > 0\), then \[ \frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \indist N(0,\Sigma) \]
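A simulation sketch of Theorem 2.4 checked through Cramér-Wold projections (the dgp is my illustrative choice: with \(E_1, E_2\) i.i.d. Exponential(1), \(X = (E_1 - 1, E_1 + E_2 - 2)'\) has mean zero and \(\Sigma = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\)):

using Statistics, LinearAlgebra, Random
Random.seed!(3)
Σ = [1.0 1.0; 1.0 2.0]
# n draws of X = (E₁ - 1, E₁ + E₂ - 2) with E₁, E₂ iid Exponential(1)
function drawX(n)
  e = -log.(rand(n, 2))  # Exponential(1) via inverse CDF
  hcat(e[:,1] .- 1, e[:,1] .+ e[:,2] .- 2)
end
n, S = 200, 20_000
t = [1.0, -2.0]
# Cramér-Wold: the projection t'(scaled sum) should be ≈ N(0, t'Σt)
proj = [dot(t, vec(sum(drawX(n), dims=1)))/sqrt(n) for _ in 1:S]
println("Var(t'Z) ≈ ", var(proj), " vs t'Σt = ", dot(t, Σ*t))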

Delta Method

Theorem 3.1 (Delta Method)

Suppose that \(\hat{\theta}\) is a sequence of estimators of \(\theta_0 \in \R^d\), and \[ \sqrt{n}(\hat{\theta} - \theta_0) \indist S \] Also, assume that \(h: \R^d \to \R^k\) is differentiable at \(\theta_0\), then \[ \sqrt{n} \left( h(\hat{\theta}) - h(\theta_0) \right) \indist Dh(\theta_0) S \]

Delta Method: Example

  • What is the asymptotic distribution of \[ \hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left(x_i - \frac{1}{n} \sum_{j=1}^n x_j \right)^2}? \]
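One route to the answer (a sketch filling in the standard steps; assumes \(\Er[x_1^4] < \infty\)): write \(\hat{\sigma} = h(\hat{m}_1, \hat{m}_2)\) with \(\hat{m}_1 = \frac{1}{n}\sum_i x_i\), \(\hat{m}_2 = \frac{1}{n}\sum_i x_i^2\), and \(h(a,b) = \sqrt{b - a^2}\). The multivariate CLT gives \[ \sqrt{n}\left( \begin{pmatrix} \hat{m}_1 \\ \hat{m}_2 \end{pmatrix} - \begin{pmatrix} m_1 \\ m_2 \end{pmatrix} \right) \indist S \sim N\left(0, \var\begin{pmatrix} x_i \\ x_i^2 \end{pmatrix} \right), \] and \(Dh(m_1, m_2) = \frac{1}{2\sigma}\begin{pmatrix} -2m_1 & 1 \end{pmatrix}\), so the delta method yields \(\sqrt{n}(\hat{\sigma} - \sigma) \indist Dh(m_1, m_2) S\), a normal with variance \(Dh \, \var\begin{pmatrix} x_i \\ x_i^2 \end{pmatrix} Dh'\).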

Continuous Mapping Theorem

Continuous Mapping Theorem

Let \(X_n \indist X\) and \(g\) be continuous on a set \(C\) with \(P(X \in C) = 1\), then \[ g(X_n) \indist g(X) \]

Continuous Mapping Theorem: Example

  • In linear regression, \[ y_i = x_i'\beta_0 + \epsilon_i \]
  • What is the asymptotic distribution of \[ M(\beta) = \left\Vert \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i (y_i - x_i'\beta) \right\Vert^2 \] when \(\beta=\beta_0\)? (a sketch of the answer follows)
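A sketch (assuming \(\Er[x_i \epsilon_i] = 0\) and \(\Omega \equiv \Er[x_i x_i' \epsilon_i^2]\) finite): at \(\beta = \beta_0\), \(\frac{1}{\sqrt{n}} \sum_{i=1}^n x_i \epsilon_i \indist Z \sim N(0, \Omega)\) by the multivariate CLT, and \(g(z) = \Vert z \Vert^2\) is continuous, so \(M(\beta_0) \indist \Vert Z \Vert^2\) by the continuous mapping theorem.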

Independent, Non-Identically Distributed Central Limit Theorem

  • Triangular array \[ \begin{array}{ccc} X_{1,1}, & ..., & X_{1,k(1)} \\ X_{2,1}, & ..., & X_{2,k(2)} \\ \vdots & & \vdots \\ X_{n,1}, & ..., & X_{n,k(n)} \end{array} \] with \(k(n) \to \infty\) as \(n \to \infty\)

Independent, Non-Identically Distributed Central Limit Theorem

Theorem 2.5 (Lindeberg’s Theorem)

Assume that for each \(n\), \(X_{n,1}, ..., X_{n,k(n)}\) are independent with \(\Er[X_{n,j}] = 0\) and \(\frac{1}{k(n)} \sum_{j=1}^{k(n)} \Er[X_{n,j}^2] = 1\), and for any \(\epsilon>0\), \[ \lim_{n \to \infty} \frac{1}{k(n)} \sum_{j=1}^{k(n)} \Er\left[ X_{n,j}^2 1\{|X_{n,j}|>\epsilon \sqrt{k(n)}\} \right] = 0. \] Then \[ \frac{1}{\sqrt{k(n)}} \sum_{j=1}^{k(n)} X_{n,j} \indist N(0,1) \]
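A simulation sketch of the theorem (the triangular array below is my illustrative choice): take \(X_{n,j} = \sigma_j U_j\) with \(U_j\) i.i.d. Uniform\((-\sqrt{3}, \sqrt{3})\) (mean 0, variance 1) and deterministic variances \(\sigma_j^2 = 2j/(k+1)\), so \(\frac{1}{k} \sum_j \Er[X_{n,j}^2] = 1\); the summands are uniformly bounded, so the Lindeberg condition holds automatically.

using Statistics, Random
Random.seed!(4)
# one row sum of the triangular array, scaled by 1/√k
function rowsum(k)
  σ = sqrt.(2 .* (1:k) ./ (k + 1))     # variances σ_j² = 2j/(k+1) average to 1
  u = (rand(k) .- 0.5) .* (2*sqrt(3))  # Uniform(-√3, √3): mean 0, variance 1
  sum(σ .* u)/sqrt(k)
end
k, S = 1_000, 20_000
z = [rowsum(k) for _ in 1:S]
println("mean ≈ ", mean(z), ", var ≈ ", var(z))  # ≈ 0 and ≈ 1 if N(0,1)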

Characterizing Convergence in Distribution

Characterizing Convergence in Distribution

Lemma 1.2

\(X_n \indist X\) iff for any open \(G \subset \R^d\), \[ \liminf P(X_n \in G) \geq P(X \in G) \]

  • This and additional characterizations of convergence in distribution are called the Portmanteau Theorem

Characterizing Convergence in Distribution

Theorem 1.1

\(X_n \indist X\) if and only if \(P(X_n \leq t) \to P(X \leq t)\) at all \(t\) where \(t \mapsto P(X \leq t)\) is continuous

Theorem 1.2

If \(X_n \indist X\) and the CDF of \(X\) is continuous, then \[ \sup_{t \in \R^d} | P(X_n \leq t) - P(X \leq t) | \to 0 \]

Non-asymptotic

Weak Berry-Esseen Theorem

Weak Berry-Esseen Theorem

Let \(X_i\) be i.i.d. with \(\Er[X]=0\), \(\Er[X^2]=1\), and \(\Er[|X|^3]\) finite. Let \(\varphi\) be smooth with its first three derivatives uniformly bounded, and let \(G \sim N(0,1)\). Then \[ \left\vert \Er\left[ \varphi\left( \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \right) \right] - \Er\left[\varphi(G)\right] \right\vert \leq C \frac{\Er[|X|^3]}{\sqrt{n}} \sup_{x \in \R} |\varphi'''(x)| \]

  • see Tao (2010)

Berry-Esseen Theorem

Berry-Esseen Theorem

If \(X_i\) are i.i.d. with \(\Er[X] = 0\) and \(\var(X)=1\), then \[ \sup_{z \in \R} \left\vert P\left(\left[\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i\right] \leq z \right) - \Phi(z) \right\vert \leq 0.5 \Er[|X|^3]/\sqrt{n} \] where \(\Phi\) is the normal CDF.

Multivariate Berry-Esseen Theorem

If \(X_i \in \R^d\) are i.i.d. with \(\Er[X] = 0\) and \(\var(X)=I_d\), then \[ \sup_{A \subset \R^d, \text{convex}} \left\vert P\left(\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \in A \right) - P(N(0,I_d) \in A) \right\vert \leq (42 d^{1/4} + 16) \Er[\Vert X \Vert ^3]/\sqrt{n} \]

  • see Raič (2019)

Simulated Illustration of Berry-Esseen CLT

plotting code
using Plots, Distributions

# two-point distribution: X = xhi with probability p, X = xlo otherwise,
# with p and xlo chosen so that E[X] = 0 and Var(X) = 1
function dgp(n, xhi=2)
  p = 1/(1+xhi^2)
  xlo = -p*xhi/(1-p)
  hi = rand(n) .< p
  x = ifelse.(hi, xhi, xlo)
end

# E[|X|³] for the two-point distribution above
function Ex3(xhi)
  p = 1/(1+xhi^2)
  xlo = -p*xhi/(1-p)
  p*xhi^3 + (1-p)*abs(xlo)^3
end

function plotcdfwithbounds(dgp,  e3, n=[10,100,1000], S=9999)
  cmap = palette(:tab10)
  x = range(-2.5, 2.5, length=200)
  cdfx=x->cdf(Normal(), x)
  fig=Plots.plot(x, cdfx, label="Normal", color="black", linestyle=:dash)
  for (i,ni) in enumerate(n)
    # simulated distribution of the scaled sample mean √n·x̄
    truedist = [mean(dgp(ni))*sqrt(ni) for _ in 1:S]
    ecdf = x-> mean(truedist .<= x)
    Plots.plot!(x, ecdf, label="n=$ni", color=cmap[i])
    # normal CDF with a Berry-Esseen band of half-width 0.5·E[|X|³]/√n
    Plots.plot!(x, cdfx.(x), ribbon = 0.5*e3/√ni, fillalpha=0.2, label="", color=cmap[i])
  end
  xlims!(-2.5,2.5)
  ylims!(0,1)
  title!("Distribution of Scaled Sample Mean")
  return(fig)
end
xhi = 2.5
plotcdfwithbounds(n->dgp(n,xhi), Ex3(xhi))

Simulated Illustration of Berry-Esseen CLT: Slack Bounds
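The body of this slide did not survive extraction; a plausible reconstruction (an assumption on my part: it reruns the simulation with a much more skewed two-point dgp, where the Berry-Esseen band is far wider than the actual error):

# assumption: same plot with a more skewed dgp (larger xhi) to show slack bounds
xhi = 10.0
plotcdfwithbounds(n->dgp(n,xhi), Ex3(xhi))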

References

Döbler, Christian. 2022. “A Short Proof of Lévy’s Continuity Theorem Without Using Tightness.” Statistics & Probability Letters 185: 109438. https://doi.org/10.1016/j.spl.2022.109438.
Raič, Martin. 2019. “A Multivariate Berry–Esseen Theorem with Explicit Constants.” Bernoulli 25 (4A): 2824–53. https://doi.org/10.3150/18-BEJ1072.
Song, Kyungchul. 2021. “Introduction to Econometrics.”
Tao, Terence. 2010. “254A, Notes 2: The Central Limit Theorem.” https://terrytao.wordpress.com/2010/01/05/254a-notes-2-the-central-limit-theorem/.