Introduction to Causality and Potential Outcomes

ECON526

Paul Schrimpf

University of British Columbia

Overview

\[
\def\indep{\perp\!\!\!\perp}
% \def\idp{\perp\kern-5pt\perp}
\def\Er{\mathrm{E}}
\def\R{\mathbb{R}}
\def\En{{\mathbb{E}_n}}
\def\Pr{\mathrm{P}}
\newcommand{\norm}[1]{\left\Vert {#1} \right\Vert}
\newcommand{\abs}[1]{\left\vert {#1} \right\vert}
\def\inprob{{\,{\buildrel p \over \rightarrow}\,}}
\def\indist{\,{\buildrel d \over \rightarrow}\,}
\DeclareMathOperator*{\plim}{plim}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\]

Summary

  • Potential outcomes, treatment effects
  • Randomized experiments

Potential Outcomes Framework

Treatment

  • \(T_i \in \{0,1\}\)
  • Observed outcome \(Y_i\)
  • We want the causal effect of \(T\) on \(Y\), but what does that mean?
    • Potential outcomes give a rigorous definition

Potential Outcomes

  • Potential outcomes \(Y_i(0), Y_i(1)\) if \(T_i = 0\) or \(1\)
  • Observe \(Y_i = Y_i(T_i)\)
  • Assume: No Interference: \((Y_i(0), Y_i(1))\) is unaffected by \(T_j\) for \(j \neq i\)
    • aka Stable Unit-Treatment Value Assumption (SUTVA)
  • Treatment effect on \(i\) = \(Y_i(1) - Y_i(0)\)
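
  • To fix ideas, a minimal sketch (the distributions and effect size below are arbitrary illustrations, not from the slides): each unit has both potential outcomes, but only \(Y_i(T_i)\) is observed.

import numpy as np

rng = np.random.default_rng(0)
n = 5
Y0 = rng.normal(size=n)         # potential outcomes without treatment
Y1 = Y0 + 1.0                   # potential outcomes with treatment (effect of 1 for everyone)
T = rng.integers(0, 2, size=n)  # treatment status
Y = np.where(T == 1, Y1, Y0)    # observed outcome Y_i = Y_i(T_i)
# Y1 - Y0 is the individual treatment effect; it is known here only because
# both potential outcomes were simulated, never in real data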

Fundamental Problem of Causal Inference

  • Only observe \(Y_i(1)\) or \(Y_i(0)\), never both
  • Individual effects, \(Y_i(1) - Y_i(0)\), generally impossible to recover
  • Summaries of individual effects, e.g. \(\Er[Y_i(1) - Y_i(0)]\), possible to estimate, but require assumptions

Average Treatment Effect

  • Want the average treatment effect \[ ATE = \Er[Y_i(1) - Y_i(0)] \]
  • Can’t directly estimate \(\Er[Y_i(d)]\), because \(Y_i(d)\) is only observed for units with \(T_i = d\)

Average Population Effect

  • Can estimate \(\Er[Y_i(d)|T_i=d]\)
  • Average population effect \[ APE = \Er[Y_i(1)|T_i=1] - \Er[Y_i(0)|T_i=0] \]
  • How does it compare to the ATE?
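
  • Since the APE involves only observables, it can be estimated by a difference of group means; a minimal sketch with made-up data (the data-generating process below is an arbitrary illustration, not from the slides):

import numpy as np

rng = np.random.default_rng(1)
n = 1000
T = rng.integers(0, 2, size=n)             # hypothetical observed treatment indicators
Y = rng.normal(size=n) + 0.5 * T           # hypothetical observed outcomes
ape = Y[T == 1].mean() - Y[T == 0].mean()  # uses only the observed (Y_i, T_i)
print(f"APE estimate = {ape:.2f}")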

Selection Bias

  • Comparing ATE and APE
    \[
    \begin{align*}
    ATE = & \Er[Y_i(1) - Y_i(0)] \\
    = & \overbrace{\Er[Y_i(1) - Y_i(0) | T_i=1]}^{\text{avg treatment effect on treated}} P(T_i=1) + \overbrace{\Er[Y_i(1) - Y_i(0) | T_i=0]}^{\text{avg treatment effect on untreated}} P(T_i=0) \\
    = & \left(APE + \overbrace{\Er[Y_i(0)|T_i=0] - \Er[Y_i(0)|T_i=1]}^{\text{selection bias}}\right) P(T_i=1) + \\
    & + \left(APE + \underbrace{\Er[Y_i(1)|T_i=0] - \Er[Y_i(1)|T_i=1]}_{\text{selection bias}}\right) P(T_i=0)
    \end{align*}
    \]

Selection Bias

  • Or,
    \[
    APE = ATE + \underbrace{\begin{pmatrix} (\Er[Y_i(0) | T_i=1] - \Er[Y_i(0)|T_i=0])P(T_i=1) + \\ + (\Er[Y_i(1) | T_i=1] - \Er[Y_i(1)|T_i=0])P(T_i=0) \end{pmatrix}}_{\text{selection bias}}
    \]
  • Selection bias is nonzero if the average potential outcomes of the treated and untreated groups would differ even under the same treatment status
  • Selection bias is usually nonzero when people choose their own treatment

Selection Bias Example

  • People have some (possibly noisy) information about \(Y_i(0), Y_i(1)\) and choose the \(T_i\) they prefer
    • e.g. \(T_i = \arg\max_{d\in \{0,1\}} \Er[U(Y_i(d)) | \mathcal{I}_i]\)
  • Simulation
    • \(i\) observes signal \(S_i(0) = Y_i(0) + \epsilon_i(0)\) and \(S_i(1) = Y_i(1) + \epsilon_i(1)\)
    • \(\epsilon_i(d) \sim N(0,\sigma^2)\), independent
    • Chooses \(T_i = \argmax_{d\in\{0,1\}} \Er[Y_i(d)|S_i(0), S_i(1)]\), which the simulation implements as picking the larger signal, \(T_i = \argmax_d S_i(d)\)
import numpy as np
np.random.seed(0)

class selectiondata:
    """Simulate self-selected treatment based on noisy signals of the potential outcomes."""
    def __init__(self, n=1000, noisesd=1.0, ate=0.5):
        self.Y0 = np.random.normal(size=n)                      # potential outcome without treatment
        self.Y1 = np.random.normal(size=n) + ate                # potential outcome with treatment
        self.S0 = self.Y0 + np.random.normal(size=n)*noisesd    # noisy signal of Y0
        self.S1 = self.Y1 + np.random.normal(size=n)*noisesd    # noisy signal of Y1
        self.T = (self.S1 > self.S0).astype(int)                # choose the treatment with the higher signal
        self.Y = self.Y0 * (1 - self.T) + self.Y1 * self.T      # observed outcome Y_i = Y_i(T_i)

    def APE(self):
        """Difference in observed means between treated and untreated."""
        return np.mean(self.Y[self.T==1]) - np.mean(self.Y[self.T==0])

    def ATE(self):
        """Average treatment effect, computable here only because both potential outcomes are simulated."""
        return np.mean(self.Y1) - np.mean(self.Y0)

    def selectionbias(self):
        """Total selection bias: APE - ATE."""
        return (self.APE() - self.ATE())

    def selectionbias0(self):
        """E[Y(0)|T=1] - E[Y(0)|T=0], the selection bias term for the untreated potential outcome."""
        return np.mean( self.Y0[self.T==1]) - np.mean( self.Y0[self.T==0] )

    def selectionbias1(self):
        """E[Y(1)|T=1] - E[Y(1)|T=0], the selection bias term for the treated potential outcome."""
        return np.mean( self.Y1[self.T==1]) - np.mean(self.Y1[self.T==0] )


s = 0.5      # signal noise standard deviation
eate = 0.5   # true average treatment effect
data = selectiondata(n=10_000, noisesd=s, ate=eate)

print("|APE|ATE|Selection Bias|\n" +
      "|---|---|---|\n" +
      f"|{data.APE():.2}|{data.ATE():.2}|{data.selectionbias():.2}|\n"
      f"|σ={s:.2}|\n\n")
| APE | ATE | Selection Bias |
|---|---|---|
| 0.27 | 0.53 | -0.26 |

σ = 0.5
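
  • As a numerical check of the decomposition \(APE = ATE + \text{selection bias}\) above (this check is not in the original code; it reuses the data object and the selectionbias0/selectionbias1 methods defined above):

# weight the two selection-bias terms by the sample treatment shares
p1 = data.T.mean()
decomposed = data.ATE() + data.selectionbias0()*p1 + data.selectionbias1()*(1 - p1)
print(f"APE = {data.APE():.2f}, ATE + weighted selection bias = {decomposed:.2f}")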

Random Experiments

Random Experiment

  • Assign treatment randomly \[ T_i \indep (Y_i(0),Y_i(1)) \]
  • Implies \[ \Er[Y_i(1) | T_i=1] = \Er[Y_i(1)] \text{ and } \Er[Y_i(0) | T_i=0] = \Er[Y_i(0)] \]
  • So \[ \begin{align*} APE = & \Er[Y_i(1)|T_i=1] - \Er[Y_i(0)|T_i=0] \\ = & \Er[Y_i(1)] - \Er[Y_i(0)] \\ = & ATE \end{align*} \]
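
  • A quick numerical check of this equality (the data-generating process below is an arbitrary illustration, not from the slides): when \(T_i\) is assigned independently of \((Y_i(0), Y_i(1))\), the difference in observed group means matches the ATE up to sampling noise.

import numpy as np

rng = np.random.default_rng(2)
n, ate = 100_000, 0.5
Y0 = rng.normal(size=n)
Y1 = Y0 + ate + rng.normal(scale=0.1, size=n)  # heterogeneous effects averaging 0.5
T = rng.integers(0, 2, size=n)                 # randomized: independent of (Y0, Y1)
Y = np.where(T == 1, Y1, Y0)
print(f"difference in means = {Y[T==1].mean() - Y[T==0].mean():.3f}, ATE = {(Y1 - Y0).mean():.3f}")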

Example: Pfizer Covid Vaccine RCT

  • Number of participants and number infected by treatment status
| Group | Treated | Placebo |
|---|---|---|
| All | 19965 | 20172 |
| Infected | 9 | 169 |
| 65+ | 4044 | 4067 |
| 65+ Infected | 1 | 19 |

Example: Pfizer Covid Vaccine RCT

import statsmodels.api as sm

class binarybinaryrct:
    """RCT with a binary treatment and binary outcome, summarized by four counts."""
    def __init__(self, NT, NU, NYT, NYU):
        self.NT = NT    # number treated
        self.NU = NU    # number untreated (placebo)
        self.NYT = NYT  # number of treated with the outcome (infections)
        self.NYU = NYU  # number of untreated with the outcome (infections)

    def ATE(self):
        """Difference in outcome rates between treated and untreated."""
        return (self.NYT/self.NT - self.NYU/self.NU)

    def table(self):
        """Markdown table of infection rates per 1000 and their difference."""
        return("|  | Infection Rate per 1000|\n"+
               "|---|---|\n"
               f"|Treated| {self.NYT/self.NT*1000:.2}|\n" +
               f"|Control| {self.NYU/self.NU*1000:.2}|\n" +
               f"|Difference| {self.ATE()*1000:.2}|\n")

    def VE(self):
        """Vaccine efficacy, 1 - risk ratio, with a 95% confidence interval."""
        tb = sm.stats.Table2x2([[self.NYT, self.NT - self.NYT], [self.NYU, self.NU - self.NYU]])
        ve = 1 - tb.riskratio
        ci = tb.riskratio_confint()
        ci = [1 - ci[1], 1 - ci[0]]
        return (ve, ci)

pfizerall = binarybinaryrct(19965, 20172, 9, 169)  # all participants
pfizer65 = binarybinaryrct(4044, 4067, 1, 19)      # participants aged 65+

print("\n- All\n\n" + pfizerall.table() + "\n - 65+\n\n" + pfizer65.table())

- All

|  | Infection Rate per 1000|
|---|---|
|Treated| 0.45|
|Control| 8.4|
|Difference| -7.9|

 - 65+

|  | Infection Rate per 1000|
|---|---|
|Treated| 0.25|
|Control| 4.7|
|Difference| -4.4|
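
  • The VE() method above computes vaccine efficacy (one minus the risk ratio) and a 95% confidence interval, but the printed tables do not use it; a minimal usage sketch (output omitted):

# vaccine efficacy and 95% CI for each group, using the objects created above
for label, rct in [("All", pfizerall), ("65+", pfizer65)]:
    ve, ci = rct.VE()
    print(f"{label}: VE = {ve:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")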

Sources and Further Reading

  • Chapter 2 of Chernozhukov et al. (2024) is the basis for much of these slides, including the Pfizer/BioNTech Covid Vaccine RCT example
  • Chapter 1 of Facure (2022)

References

Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.
Facure, Matheus. 2022. Causal Inference for the Brave and True. https://matheusfacure.github.io/python-causality-handbook/landing-page.html.