Introduction to Causality and Potential Outcomes

ECON526

Paul Schrimpf

University of British Columbia

Overview

\[
\def\indep{\perp\!\!\!\perp}
% \def\idp{\perp\kern-5pt\perp}
\def\Er{\mathrm{E}}
\def\R{\mathbb{R}}
\def\En{{\mathbb{E}_n}}
\def\Pr{\mathrm{P}}
\newcommand{\norm}[1]{\left\Vert {#1} \right\Vert}
\newcommand{\abs}[1]{\left\vert {#1} \right\vert}
\def\inprob{{\,{\buildrel p \over \rightarrow}\,}}
\def\indist{\,{\buildrel d \over \rightarrow}\,}
\DeclareMathOperator*{\plim}{plim}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\]

Summary

  • Potential outcomes, treatment effects
  • Randomized experiments

Potential Outcomes Framework

Treatment

  • \(T_i \in \{0,1\}\)
  • Observed outcome \(Y_i\)
  • We want the causal effect of \(T\) on \(Y\), but what does that mean?
    • Potential outcomes give a rigorous definition

Potential Outcomes

  • Potential outcomes \(Y_i(0), Y_i(1)\) if \(T_i = 0\) or \(1\)
  • Observe \(Y_i = Y_i(T_i)\)
  • Assume: No Interference: \((Y_i(0), Y_i(1))\) is unaffected by \(T_j\) for \(j \neq i\)
    • aka Stable Unit-Treatment Value Assumption (SUTVA)
  • Treatment effect on \(i\) = \(Y_i(1) - Y_i(0)\)
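
  • To fix ideas, a minimal sketch (the distributions and effect size below are arbitrary illustrations, not from the slides): each unit has both potential outcomes, but only \(Y_i(T_i)\) is observed.

import numpy as np

rng = np.random.default_rng(0)
n = 5
Y0 = rng.normal(size=n)         # potential outcomes without treatment
Y1 = Y0 + 1.0                   # potential outcomes with treatment (effect of 1 for everyone)
T = rng.integers(0, 2, size=n)  # treatment status
Y = np.where(T == 1, Y1, Y0)    # observed outcome Y_i = Y_i(T_i)
# Y1 - Y0 is the individual treatment effect; it is known here only because
# both potential outcomes were simulated, never in real data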

Fundamental Problem of Causal Inference

  • Only observe \(Y_i(1)\) or \(Y_i(0)\), never both
  • Individual effects, \(Y_i(1) - Y_i(0)\), generally impossible to recover
  • Summaries of individual effects, e.g. \(\Er[Y_i(1) - Y_i(0)]\), possible to estimate, but require assumptions

Average Treatment Effect

  • Want the average treatment effect \[ ATE = \Er[Y_i(1) - Y_i(0)] \]
  • Can’t directly estimate \(\Er[Y_i(d)]\), because \(Y_i(d)\) is only observed for units with \(T_i = d\)

Average Population Effect

  • Can estimate \(\Er[Y_i(d)|T_i=d]\)
  • Average population effect \[ APE = \Er[Y_i(1)|T_i=1] - \Er[Y_i(0)|T_i=0] \]
  • How does it compare to the ATE?
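
  • Since the APE involves only observables, it can be estimated by a difference of group means; a minimal sketch with made-up data (the data-generating process below is an arbitrary illustration, not from the slides):

import numpy as np

rng = np.random.default_rng(1)
n = 1000
T = rng.integers(0, 2, size=n)             # hypothetical observed treatment indicators
Y = rng.normal(size=n) + 0.5 * T           # hypothetical observed outcomes
ape = Y[T == 1].mean() - Y[T == 0].mean()  # uses only the observed (Y_i, T_i)
print(f"APE estimate = {ape:.2f}")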

Selection Bias

  • Comparing ATE and APE
    \[
    \begin{align*}
    ATE = & \Er[Y_i(1) - Y_i(0)] \\
    = & \overbrace{\Er[Y_i(1) - Y_i(0) | T_i=1]}^{\text{avg treatment effect on treated}} P(T_i=1) + \overbrace{\Er[Y_i(1) - Y_i(0) | T_i=0]}^{\text{avg treatment effect on untreated}} P(T_i=0) \\
    = & \left(APE + \overbrace{\Er[Y_i(0)|T_i=0] - \Er[Y_i(0)|T_i=1]}^{\text{selection bias}}\right) P(T_i=1) + \\
    & + \left(APE + \underbrace{\Er[Y_i(1)|T_i=0] - \Er[Y_i(1)|T_i=1]}_{\text{selection bias}}\right) P(T_i=0)
    \end{align*}
    \]

Selection Bias

  • Or,
    \[
    APE = ATE + \underbrace{\begin{pmatrix} (\Er[Y_i(0) | T_i=1] - \Er[Y_i(0)|T_i=0])P(T_i=1) + \\ + (\Er[Y_i(1) | T_i=1] - \Er[Y_i(1)|T_i=0])P(T_i=0) \end{pmatrix}}_{\text{selection bias}}
    \]
  • Selection bias is nonzero if the average potential outcomes of the treated and untreated groups would differ even under the same treatment status
  • Selection bias is usually nonzero when people choose their own treatment

Selection Bias Example

  • People have some (possibly noisy) information about \(Y_i(0), Y_i(1)\) and choose the \(T_i\) they prefer
    • e.g. \(T_i = \arg\max_{d\in \{0,1\}} \Er[U(Y_i(d)) | \mathcal{I}_i]\)
  • Simulation
    • \(i\) observes signal \(S_i(0) = Y_i(0) + \epsilon_i(0)\) and \(S_i(1) = Y_i(1) + \epsilon_i(1)\)
    • \(\epsilon_i(d) \sim N(0,\sigma^2)\), independent
    • Chooses \(T_i = \argmax_{d\in\{0,1\}} \Er[Y_i(d)|S_i(0), S_i(1)]\), which the simulation implements as picking the larger signal, \(T_i = \argmax_d S_i(d)\)
import numpy as np
np.random.seed(0)

class selectiondata:
    """Simulate self-selected treatment based on noisy signals of the potential outcomes."""
    def __init__(self, n=1000, noisesd=1.0, ate=0.5):
        self.Y0 = np.random.normal(size=n)                      # potential outcome without treatment
        self.Y1 = np.random.normal(size=n) + ate                # potential outcome with treatment
        self.S0 = self.Y0 + np.random.normal(size=n)*noisesd    # noisy signal of Y0
        self.S1 = self.Y1 + np.random.normal(size=n)*noisesd    # noisy signal of Y1
        self.T = (self.S1 > self.S0).astype(int)                # choose the treatment with the higher signal
        self.Y = self.Y0 * (1 - self.T) + self.Y1 * self.T      # observed outcome Y_i = Y_i(T_i)

    def APE(self):
        """Difference in observed means between treated and untreated."""
        return np.mean(self.Y[self.T==1]) - np.mean(self.Y[self.T==0])

    def ATE(self):
        """Average treatment effect, computable here only because both potential outcomes are simulated."""
        return np.mean(self.Y1) - np.mean(self.Y0)

    def selectionbias(self):
        """Total selection bias: APE - ATE."""
        return (self.APE() - self.ATE())

    def selectionbias0(self):
        """E[Y(0)|T=1] - E[Y(0)|T=0], the selection bias term for the untreated potential outcome."""
        return np.mean( self.Y0[self.T==1]) - np.mean( self.Y0[self.T==0] )

    def selectionbias1(self):
        """E[Y(1)|T=1] - E[Y(1)|T=0], the selection bias term for the treated potential outcome."""
        return np.mean( self.Y1[self.T==1]) - np.mean(self.Y1[self.T==0] )


s = 0.5      # signal noise standard deviation
eate = 0.5   # true average treatment effect
data = selectiondata(n=10_000, noisesd=s, ate=eate)

print("|APE|ATE|Selection Bias|\n" +
      "|---|---|---|\n" +
      f"|{data.APE():.2}|{data.ATE():.2}|{data.selectionbias():.2}|\n"
      f"|σ={s:.2}|\n\n")
| APE | ATE | Selection Bias |
|---|---|---|
| 0.27 | 0.53 | -0.26 |

σ = 0.5
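
  • As a numerical check of the decomposition \(APE = ATE + \text{selection bias}\) above (this check is not in the original code; it reuses the data object and the selectionbias0/selectionbias1 methods defined above):

# weight the two selection-bias terms by the sample treatment shares
p1 = data.T.mean()
decomposed = data.ATE() + data.selectionbias0()*p1 + data.selectionbias1()*(1 - p1)
print(f"APE = {data.APE():.2f}, ATE + weighted selection bias = {decomposed:.2f}")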

Random Experiments

Random Experiment

  • Assign treatment randomly \[ T_i \indep (Y_i(0),Y_i(1)) \]
  • Implies \[ \Er[Y_i(1) | T_i=1] = \Er[Y_i(1)] \text{ and } \Er[Y_i(0) | T_i=0] = \Er[Y_i(0)] \]
  • So \[ \begin{align*} APE = & \Er[Y_i(1)|T_i=1] - \Er[Y_i(0)|T_i=0] \\ = & \Er[Y_i(1)] - \Er[Y_i(0)] \\ = & ATE \end{align*} \]
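
  • A quick numerical check of this equality (the data-generating process below is an arbitrary illustration, not from the slides): when \(T_i\) is assigned independently of \((Y_i(0), Y_i(1))\), the difference in observed group means matches the ATE up to sampling noise.

import numpy as np

rng = np.random.default_rng(2)
n, ate = 100_000, 0.5
Y0 = rng.normal(size=n)
Y1 = Y0 + ate + rng.normal(scale=0.1, size=n)  # heterogeneous effects averaging 0.5
T = rng.integers(0, 2, size=n)                 # randomized: independent of (Y0, Y1)
Y = np.where(T == 1, Y1, Y0)
print(f"difference in means = {Y[T==1].mean() - Y[T==0].mean():.3f}, ATE = {(Y1 - Y0).mean():.3f}")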

Example: Pfizer Covid Vaccine RCT

  • Number of participants and number infected by treatment status
| Group | Treated | Placebo |
|---|---|---|
| All | 19965 | 20172 |
| Infected | 9 | 169 |
| 65+ | 4044 | 4067 |
| 65+ Infected | 1 | 19 |

Example: Pfizer Covid Vaccine RCT

import statsmodels.api as sm

class binarybinaryrct:
    """RCT with a binary treatment and binary outcome, summarized by four counts."""
    def __init__(self, NT, NU, NYT, NYU):
        self.NT = NT    # number treated
        self.NU = NU    # number untreated (placebo)
        self.NYT = NYT  # number of treated with the outcome (infections)
        self.NYU = NYU  # number of untreated with the outcome (infections)

    def ATE(self):
        """Difference in outcome rates between treated and untreated."""
        return (self.NYT/self.NT - self.NYU/self.NU)

    def table(self):
        """Markdown table of infection rates per 1000 and their difference."""
        return("|  | Infection Rate per 1000|\n"+
               "|---|---|\n"
               f"|Treated| {self.NYT/self.NT*1000:.2}|\n" +
               f"|Control| {self.NYU/self.NU*1000:.2}|\n" +
               f"|Difference| {self.ATE()*1000:.2}|\n")

    def VE(self):
        """Vaccine efficacy, 1 - risk ratio, with a 95% confidence interval."""
        tb = sm.stats.Table2x2([[self.NYT, self.NT - self.NYT], [self.NYU, self.NU - self.NYU]])
        ve = 1 - tb.riskratio
        ci = tb.riskratio_confint()
        ci = [1 - ci[1], 1 - ci[0]]
        return (ve, ci)

pfizerall = binarybinaryrct(19965, 20172, 9, 169)  # all participants
pfizer65 = binarybinaryrct(4044, 4067, 1, 19)      # participants aged 65+

print("\n- All\n\n" + pfizerall.table() + "\n - 65+\n\n" + pfizer65.table())

- All

|  | Infection Rate per 1000|
|---|---|
|Treated| 0.45|
|Control| 8.4|
|Difference| -7.9|

 - 65+

|  | Infection Rate per 1000|
|---|---|
|Treated| 0.25|
|Control| 4.7|
|Difference| -4.4|
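
  • The VE() method above computes vaccine efficacy (one minus the risk ratio) and a 95% confidence interval, but the printed tables do not use it; a minimal usage sketch (output omitted):

# vaccine efficacy and 95% CI for each group, using the objects created above
for label, rct in [("All", pfizerall), ("65+", pfizer65)]:
    ve, ci = rct.VE()
    print(f"{label}: VE = {ve:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")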

Sources and Further Reading

  • Chapter 2 of Chernozhukov et al. (2024) is the basis for much of these slides, including the Pfizer/BioNTech Covid Vaccine RCT example
  • Chapter 1 of Facure (2022)

References

Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.
Facure, Matheus. 2022. Causal Inference for the Brave and True. https://matheusfacure.github.io/python-causality-handbook/landing-page.html.