Identification

Paul Schrimpf

2024-12-14

Reading

Required: Song (2021) chapter 4 (which is the basis for these slides)
Recommended: Lewbel (2019)
Supplementary: Matzkin (2013), Molinari (2020), Imbens (2020), V. Chernozhukov et al. (2024)

\[ \def\Er{{\mathrm{E}}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \]

Identification

Definition

Let \(X\) be an observed random vector with distribution \(P_X\). Let \(\mathcal{P}\) be a probability model, i.e. a collection of probability distributions such that \(P_X \in \mathcal{P}\). Then \(\theta_0 \in \R^k\) is identified in \(\mathcal{P}\) if there exists a known \(\psi: \mathcal{P} \to \R^k\) s.t.

\[ \theta_0 = \psi(P_X) \]

Examples

Example: Descriptive Statistics

If \(\theta_0 =\) mean of \(X\), then \(\theta_0\) is identified by \[ \psi_\mu(P) = \int x dP(x) \] in \(\mathcal{P} = \{P : \int |x| dP(x) < \infty \}\)
Generally, descriptive statistics are identified in a broad probability model, with only the regularity restrictions needed to ensure the statistics exist
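A minimal Python sketch (standard library only) of the plug-in idea: the same map \(\psi_\mu\) can be applied to a known \(P\) or to the empirical distribution of a sample, which puts mass \(1/n\) on each observation and therefore returns the sample mean. The exponential data-generating process and the helper names `psi_mu` and `empirical` are hypothetical, chosen only for illustration.

```python
import random


def psi_mu(support, probs):
    """psi_mu(P) = integral of x dP(x) for a discrete distribution P."""
    return sum(x * p for x, p in zip(support, probs))


def empirical(sample):
    """Support and probabilities of the empirical distribution (mass 1/n per point)."""
    n = len(sample)
    return sample, [1.0 / n] * n


random.seed(0)
# Hypothetical data-generating process: X ~ Exponential(rate 0.5), so E[X] = 2
sample = [random.expovariate(0.5) for _ in range(100_000)]

# The same map works for a known P and for the empirical estimate of P
plug_in = psi_mu(*empirical(sample))     # the sample mean, close to 2
bernoulli = psi_mu([0, 1], [0.3, 0.7])   # mean of Bernoulli(0.7)
```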

Example: Linear Model

\[ Y = \alpha + \beta X + \epsilon \]

\(\mathcal{P} = \{P_{X,Y}:\) \(Y=\alpha + \beta X + \epsilon\),
1. \(| \mathrm{Cov}(X,Y) | < \infty\), \(0 < \mathrm{Var}(X) < \infty\)
2. \(\mathrm{Cov}(X, \epsilon) = 0\) \(\}\)
\(\beta\) identified as

\[ \beta = \frac{\int (x - \Er X) (y - \Er Y ) dP_{X,Y}(x,y)} {\int (x - \Er X)^2 dP_{X}(x)} = \frac{ \cov(X,Y) }{ \var(X) } \]

Identification requires:
1. Usually innocuous regularity conditions
2. Substantive exogeneity restriction
Evaluating plausibility of exogeneity restrictions requires a priori knowledge of data context and related economic theory

Example: Multiple Regression

\[ Y = X'\beta + \epsilon \]

\(\mathcal{P} = \{P: \Er [X \epsilon] = 0, \Er [X X'] \text{ invertible} \}\)
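The sample analogue of \(\psi(P) = \Er[XX']^{-1}\Er[XY]\) can be sketched in plain Python as below. `solve` is a hypothetical small Gauss-Jordan helper (not a library routine) that refuses to divide by a near-zero pivot, mirroring the requirement that \(\Er[XX']\) be invertible; the coefficients and error distribution are made up.

```python
import random


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def psi_ols(X, Y):
    """Sample analogue of psi(P) = E[XX']^{-1} E[XY]."""
    n, k = len(X), len(X[0])
    Exx = [[sum(row[i] * row[j] for row in X) / n for j in range(k)] for i in range(k)]
    Exy = [sum(row[i] * y for row, y in zip(X, Y)) / n for i in range(k)]
    return solve(Exx, Exy)


random.seed(2)
beta = [1.0, -0.5, 2.0]  # hypothetical true coefficients; first entry multiplies the constant
X, Y = [], []
for _ in range(20_000):
    x1 = random.gauss(0, 1)
    x2 = 0.5 * x1 + random.gauss(0, 1)   # correlated regressors are fine if E[XX'] is invertible
    row = [1.0, x1, x2]
    eps = random.gauss(0, 1)             # drawn independently of row, so E[X eps] = 0
    X.append(row)
    Y.append(sum(b * v for b, v in zip(beta, row)) + eps)

beta_hat = psi_ols(X, Y)                 # close to [1.0, -0.5, 2.0]
```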

Example: Binary Choice

\[ Y = 1\{ \beta_0 + \beta_1 X > u \} \]

\(\mathcal{P} = \{P: u \sim N(0,1) \text{ independent of } X, 0< \var(X) < \infty \}\)

Is \(u \sim N(0,1)\) innocuous?
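A sketch of what the normalization buys, assuming a binary \(X \in \{0,1\}\) and hypothetical values \(\beta_0 = 0.5\), \(\beta_1 = -1\). With \(u \sim N(0,1)\) independent of \(X\), \(P(Y=1|X=x) = \Phi(\beta_0 + \beta_1 x)\), so inverting \(\Phi\) at the two support points recovers both coefficients. If the variance of \(u\) were a free parameter \(\sigma^2\), scaling \((\beta_0, \beta_1, \sigma)\) by any positive constant leaves every choice probability unchanged, so only \(\beta/\sigma\) would be identified. Uses `statistics.NormalDist` (Python 3.8+); `choice_prob` is a hypothetical helper.

```python
from statistics import NormalDist

Phi = NormalDist().cdf          # standard normal CDF
Phi_inv = NormalDist().inv_cdf  # its inverse


def choice_prob(b0, b1, x, sigma=1.0):
    """P(Y = 1 | X = x) when Y = 1{b0 + b1 x > u} and u ~ N(0, sigma^2) is independent of X."""
    return Phi((b0 + b1 * x) / sigma)


b0, b1 = 0.5, -1.0  # hypothetical true parameters

# With sigma normalized to 1, invert Phi at the two support points x = 0 and x = 1
p0, p1 = choice_prob(b0, b1, 0.0), choice_prob(b0, b1, 1.0)
b0_rec = Phi_inv(p0)
b1_rec = Phi_inv(p1) - b0_rec

# Without the normalization, (b0, b1, sigma) and (2 b0, 2 b1, 2 sigma) are indistinguishable
same = all(
    abs(choice_prob(b0, b1, x) - choice_prob(2 * b0, 2 * b1, x, sigma=2.0)) < 1e-12
    for x in (-2.0, 0.0, 1.0, 3.0)
)
```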

Example: Potential Outcomes

Data:
- Treatment \(D_i\)
- Potential outcomes \((Y_{i,0}, Y_{i,1})\), observed outcome \(Y_i = D_i Y_{i,1} + (1-D_i) Y_{i,0}\)
- Covariates \(X_i\)
Parameter: \(\theta_0 = \Er[Y_{i,1} - Y_{i,0}] =\) average treatment effect
Assume:
1. Unconfoundedness: \((Y_{i,0}, Y_{i,1})\) conditionally independent of \(D_i\) given \(X_i\)
2. Overlap: \(\epsilon < P(D=1|X=x) < 1-\epsilon\) for some \(\epsilon > 0\) and all \(x\)
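Under unconfoundedness and overlap, \(\theta_0 = \Er\left[\Er[Y|D=1,X] - \Er[Y|D=0,X]\right]\). A hedged simulation sketch with a hypothetical discrete covariate \(X \in \{0,1,2\}\), made-up propensity scores, and a heterogeneous effect \(1 + X\) (so the true ATE is 2): the adjustment formula recovers it, while the naive difference in means (population value 3.2 in this setup) does not, because treated units tend to have larger \(X\).

```python
import random
from collections import defaultdict

random.seed(3)
n = 100_000
propensity = {0: 0.2, 1: 0.5, 2: 0.8}  # hypothetical P(D=1 | X=x), bounded away from 0 and 1

data = []
for _ in range(n):
    x = random.choice([0, 1, 2])
    y0 = x + random.gauss(0, 1)
    y1 = y0 + 1 + x                                   # heterogeneous effect; true ATE = 1 + E[X] = 2
    d = 1 if random.random() < propensity[x] else 0   # D depends only on X: unconfounded
    y = d * y1 + (1 - d) * y0                         # only one potential outcome is observed
    data.append((x, d, y))

# Cell means E[Y | D=d, X=x] and the marginal distribution of X
sums, counts, x_counts = defaultdict(float), defaultdict(int), defaultdict(int)
for x, d, y in data:
    sums[(x, d)] += y
    counts[(x, d)] += 1
    x_counts[x] += 1
mean = {key: sums[key] / counts[key] for key in sums}

# Adjustment formula: theta = sum_x P(X=x) (E[Y|D=1,X=x] - E[Y|D=0,X=x])
ate = sum(x_counts[x] / n * (mean[(x, 1)] - mean[(x, 0)]) for x in x_counts)

# Naive difference in means ignores that treated units have larger X
treated = [y for _, d, y in data if d == 1]
control = [y for _, d, y in data if d == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
```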

Causal Diagrams

Originate with Wright around 1920, e.g. Wright (1934)
Recently advocated by Pearl, e.g. Pearl (2015), Pearl and Mackenzie (2018)
Recommended introduction Imbens (2020) or CausalMLBook chapters 7 and 8
Sometimes a useful expository tool for explaining identifying restrictions, but should not be your only or primary approach
- e.g. Victor Chernozhukov, Kasahara, and Schrimpf (2021)

Example: Regression

flowchart LR
    X --> Y
    ϵ -.-> Y

Arrow means \(X\) causes \(Y\)
Dashed arrow means \(\epsilon\) causes \(Y\) and is unobserved (this convention is not universal; a dashed box around \(\epsilon\) is also common)
Lack of connection between \(X\) and \(\epsilon\) means they are independent

Example: Regression

If you believe:

flowchart LR
    subgraph " Y"
        Y[Wage]
    end
    subgraph X
       E[Education] --> Y
       T[SAT] --> E
       F[Family SES] --> E
       F --> Y
    end
    subgraph ϵ
        I[Intelligence] -.-> Y
        L[Luck] -.-> Y
        I -.-> T
    end

then regression \(Wage = \beta_1 Education + \beta_2 SAT + \beta_3 FamilySES + \epsilon\) identifies causal effect of education on wages

Example: Regression

But reality is likely more complex …

flowchart LR
   subgraph " Y"
        Y[Wage]
   end
   subgraph X
       E[Education] --> Y
       T[SAT] --> E
       F[Family SES] --> E
       F --> Y
   end
   subgraph ϵ
        I[Intelligence] -.-> Y
        L[Luck] -.-> Y
        G[Grit] -.-> Y
        G -.-> E
        L -.-> E
        I -.-> T
        I -.-> E
        I -.-> G
        G -.-> I
   end
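To make the two graphs concrete, the sketch below simulates a hypothetical linear version of each, with every path coefficient set to one and independent standard normal shocks; these functional forms are assumptions, not part of the diagrams. In the first graph, controlling for SAT and Family SES blocks the paths through Intelligence, so the coefficient on Education is close to its causal value of 1. Adding only the Grit confounder from the second graph (Grit affects both Education and Wage) biases it; in this parameterization its population value is 1.5. `ols` and `education_coef` are hypothetical helpers.

```python
import random


def ols(X, Y):
    """Least squares coefficients: solve (X'X) b = X'Y by Gauss-Jordan elimination."""
    k = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * y for r, y in zip(X, Y))]
         for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def education_coef(grit_confounds, n=20_000, seed=4):
    """Simulate a hypothetical linear DAG; regress Wage on (1, Education, SAT, FamilySES)."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        intel, ses, luck = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        grit = rng.gauss(0, 1) if grit_confounds else 0.0
        sat = intel + rng.gauss(0, 1)                    # Intelligence -> SAT
        educ = sat + ses + grit + rng.gauss(0, 1)        # SAT, SES (and Grit) -> Education
        wage = 1.0 * educ + ses + intel + grit + luck    # true causal effect of Education is 1
        X.append([1.0, educ, sat, ses])
        Y.append(wage)
    return ols(X, Y)[1]                                  # coefficient on Education


first_dag = education_coef(grit_confounds=False)   # close to 1
second_dag = education_coef(grit_confounds=True)   # close to 1.5, not 1
```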

Example: Potential Outcomes

flowchart LR
    u -.-> D
    X --> Y
    X --> D
    D --> Y
    ϵ -.-> Y

Example: Potential Outcomes

flowchart LR
    subgraph Treatment
        D[Naloxone distribution site opens]
    end
    subgraph Outcome
        Y[ER visits for overdose]
    end
    u -.-> D
    D --> Y
    ϵ -.-> Y
    subgraph X
        Income --> Y
        Unemployment --> Y
        Income --> D
        Unemployment --> D
        OD[OD rate prior to opening] --> Y
        OD --> D
        Crime --> Y
        Crime --> D
    end

More on Causal Graphs

Given a graph, what can be identified?
- \(Y\) and \(X\) are d-separated by a collection of nodes \(S\) if \(S\) blocks every path between \(Y\) and \(X\): each path passes through a non-collider in \(S\), or through a collider that is not in \(S\) and has no descendant in \(S\)
- d-separation implies conditional independence \(Y \perp X | S\)
Does conditional independence imply d-separation? Can we test for conditional independence to recover the correct causal graph? (Causal discovery)
- Conditional independence does not imply d-separation in general, but the exceptions (unfaithful distributions) have measure zero, so causal discovery may be possible
- But the neighborhood of the exceptions is large, so causal discovery is very difficult in practice
V. Chernozhukov et al. (2024), especially chapters 7 & 8

Reinterpretation of Estimators

Generalized and/or descriptive interpretation of population estimator
Analyze familiar estimator under more general assumptions
- Understand bias when exogeneity assumptions fail
- Sometimes give more general interpretation of existing estimator

Example: Regression

In the linear model \(Y_i = X_i'\beta + \epsilon_i\), if we only assume \(\Er[X X']\) is invertible,
Population regression \[ \begin{align*} \theta = & \Er[ X X']^{-1} \Er[ X Y] \\ = & \Er[X X']^{-1} \Er[X (X' \beta + \epsilon)] \\ = & \beta + \Er[X X']^{-1} \Er[X\epsilon] \end{align*} \]
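This decomposition also holds exactly in any sample, with sample moments in place of population moments. A sketch with a single regressor and no intercept (so \(\Er[XX']^{-1}\) is the scalar \(1/\Er[X^2]\)), where the error is deliberately correlated with \(X\); the distributions and \(\beta = 2\) are hypothetical. Here \(\Er[X\epsilon] = 0.5\) and \(\Er[X^2] = 2\), so the population regression equals \(2.25\) rather than \(\beta\).

```python
import random

random.seed(6)
beta = 2.0  # hypothetical true coefficient
n = 50_000
x = [random.gauss(1, 1) for _ in range(n)]
eps = [0.5 * (xi - 1) + random.gauss(0, 1) for xi in x]  # E[X eps] = 0.5 != 0: exogeneity fails
y = [beta * xi + ei for xi, ei in zip(x, eps)]

Exx = sum(xi * xi for xi in x) / n
Exy = sum(xi * yi for xi, yi in zip(x, y)) / n
Exe = sum(xi * ei for xi, ei in zip(x, eps)) / n

theta = Exy / Exx                 # regression coefficient E[XX']^{-1} E[XY]
decomposition = beta + Exe / Exx  # beta + E[XX']^{-1} E[X eps]; equal up to rounding
```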

Example: Regression

If relevant moments exist (no linear model required) population regression solves \[ \Er[ X X']^{-1} \Er[ X Y] \in \mathrm{arg}\min_b \Er[ (X'b - \Er[Y|X])^2 ] \]
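A population-level check of this claim under hypothetical assumptions: \(X\) uniform on \(\{0,1,2,3\}\), regressors \((1, X)\), and \(\Er[Y|X=x] = x^2\), which is not linear. Solving \(\Er[XX']\theta = \Er[X\,\Er[Y|X]]\) (equal to \(\Er[XY]\) by the law of iterated expectations) gives \(\theta = (-1, 3)\), and nearby coefficient vectors give a strictly larger value of the objective.

```python
support = [0.0, 1.0, 2.0, 3.0]   # hypothetical: X uniform on {0, 1, 2, 3}
p = 1.0 / len(support)


def m(x):
    """Hypothetical nonlinear conditional mean E[Y | X = x]."""
    return x ** 2


def features(x):
    return [1.0, x]              # regressors: intercept and x


# Population moments E[XX'] and E[X E[Y|X]]
Exx = [[sum(p * features(x)[i] * features(x)[j] for x in support) for j in range(2)] for i in range(2)]
Exy = [sum(p * features(x)[i] * m(x) for x in support) for i in range(2)]

# Solve the 2x2 system E[XX'] theta = E[XY] by Cramer's rule
det = Exx[0][0] * Exx[1][1] - Exx[0][1] * Exx[1][0]
theta = [(Exy[0] * Exx[1][1] - Exx[0][1] * Exy[1]) / det,
         (Exx[0][0] * Exy[1] - Exx[1][0] * Exy[0]) / det]   # [-1.0, 3.0]


def loss(b):
    """E[(X'b - E[Y|X])^2]"""
    return sum(p * (b[0] + b[1] * x - m(x)) ** 2 for x in support)


# Perturbing theta in any direction increases the objective
perturbed = [loss([theta[0] + d0, theta[1] + d1])
             for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1) if (d0, d1) != (0.0, 0.0)]
```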

Caution

Regression being a linear approximation to \(\Er[Y|X]\) does not mean \(\beta = \Er[X X']^{-1} \Er[X Y]\) necessarily has the sign you want
In example below, \(\Er[Y|x_1=1, x_2] > \Er[Y|x_1=0,x_2]\), but \(\beta_1 < 0\)

3-element Vector{Float64}:
  0.5287246116536986
 -0.12031069784398347
 -0.3384637353611731
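The code that produced the output above is not shown. As one hypothetical way to build such an example (not necessarily the one used there), the population calculation below takes \(x_2\) uniform on \(\{0,1,2\}\), makes \(x_1 = 1\) much more likely at \(x_2 = 1\), and sets \(\Er[Y|x_1,x_2] = x_1 + 10(x_2-1)^2\). Then \(x_1\) raises \(\Er[Y|x_1,x_2]\) by exactly 1 at every \(x_2\), but \(x_1\) is correlated with the error from approximating the quadratic in \(x_2\) by a line, and the regression coefficient on \(x_1\) is about \(-6.66\). All distributions and the helper `solve` are assumptions for illustration.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small linear system."""
    k = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


p_x1 = {0: 0.1, 1: 0.9, 2: 0.1}  # hypothetical P(x1 = 1 | x2): x1 concentrated at x2 = 1


def m(x1, x2):
    """Hypothetical E[Y | x1, x2]: x1 raises Y by exactly 1 at every x2."""
    return x1 + 10 * (x2 - 1) ** 2


# Joint distribution: x2 uniform on {0, 1, 2}, then x1 | x2 ~ Bernoulli(p_x1[x2])
points = [(x1, x2, (1 / 3) * (p_x1[x2] if x1 == 1 else 1 - p_x1[x2]))
          for x2 in (0, 1, 2) for x1 in (0, 1)]
rows = [([1.0, x1, x2], w) for x1, x2, w in points]

# Population regression of Y on (1, x1, x2): E[XX']^{-1} E[X E[Y|X]]
Exx = [[sum(w * u[i] * u[j] for u, w in rows) for j in range(3)] for i in range(3)]
Exy = [sum(w * u[i] * m(u[1], u[2]) for u, w in rows) for i in range(3)]
beta = solve(Exx, Exy)

effects = [m(1, x2) - m(0, x2) for x2 in (0, 1, 2)]  # all equal to 1
```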

Example: Potential Outcomes

Matching initially studied with a linear regression model, e.g. Cochran (1953) \[ Y_i = \alpha D_i + X_i' \beta + \epsilon_i \]
Implies constant treatment effect \(Y_{i,1} - Y_{i,0} = \alpha\)

Observational Equivalence

Identification sometimes defined without explicit mapping from data to parameters, e.g. Hsiao (1983), Matzkin (2007)

Definition: Observationally Equivalent

Let \(\mathcal{P} = \{ P(\cdot; s) : s \in S \}\), two structures \(s\) and \(\tilde{s}\) in \(S\) are observationally equivalent if they imply the same distribution for the observed data, i.e. \[ P(B;s) = P(B; \tilde{s}) \] for all \(B \in \sigma(X)\).
Let \(\lambda: S \to \R^k\), \(\theta\) is observationally equivalent to \(\tilde{\theta}\) if \(\exists s, \tilde{s} \in S\) that are observationally equivalent and \(\theta = \lambda(s)\) and \(\tilde{\theta} = \lambda(\tilde{s})\)
- Let \(\Gamma(\theta, S) = \{P(\cdot; s) | s \in S, \theta = \lambda(s) \}\), then \(\theta\) and \(\tilde{\theta}\) are observationally equivalent iff \(\Gamma(\theta,S) \cap \Gamma(\tilde{\theta}, S) \neq \emptyset\)

Non-constructive Identification

Definition: (Non-Constructive) Identification

\(s_0 \in S\) is identified if there is no \(s\) that is observationally equivalent to \(s_0\)
\(\theta_0\) is identified (in \(S\)) if there is no observationally equivalent \(\theta \neq \theta_0\)
- i.e. \(\Gamma(\theta_0, S) \cap \Gamma(\theta, S) = \emptyset\) \(\forall \theta \neq \theta_0\)

Compared to constructive definition with \(\theta_0 = \psi(P)\):
- Less clear how to use identification to estimate
- Easier to show non-identification
- Set of observationally equivalent structures can be of interest

Example: Multiple Regression

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \]

\(X = [X_1\, X_2]'\). If rank \(\Er [X X'] = 1\) and \(\var(X_1) > 0\), then \(X_2 = c X_1\) a.s. with \(c = \frac{\cov(X_1, X_2)}{\var(X_1)}\), so \(\beta_1, \beta_2\) is observationally equivalent to any \(\tilde{\beta}_1, \tilde{\beta}_2\) s.t. \[ \tilde{\beta}_1 + c \tilde{\beta}_2 = \beta_1 + c \beta_2 \]
\(\theta_0 = \lambda( \beta ) = \beta_1 + c \beta_2\) is identified if rank \(\Er [X X'] \geq 1\)
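A numerical sketch of this kind of observational equivalence, assuming hypothetical values \(X_2 = 2X_1\) (so \(c = 2\)), \((\beta_0,\beta_1,\beta_2) = (0.5, 3, 1)\), and \((\tilde\beta_0,\tilde\beta_1,\tilde\beta_2) = (0.5, 1, 2)\). Both give \(\beta_1 + c\beta_2 = 5\), so the systematic parts \(\beta_0 + \beta_1 X_1 + \beta_2 X_2\) coincide for every observation; with the same error distribution, the two structures imply the same distribution of \(Y\) given \(X\).

```python
import random

random.seed(7)
c = 2.0                                            # hypothetical: X2 = c * X1, so rank E[XX'] = 1
x1 = [random.gauss(1, 1) for _ in range(1_000)]
x2 = [c * v for v in x1]

beta = (0.5, 3.0, 1.0)         # (beta0, beta1, beta2)
beta_tilde = (0.5, 1.0, 2.0)   # different coefficients, same beta1 + c * beta2


def index(b):
    """Systematic part beta0 + beta1 X1 + beta2 X2 for every observation."""
    return [b[0] + b[1] * u + b[2] * v for u, v in zip(x1, x2)]


# Largest difference in the systematic part across observations (zero up to rounding)
max_gap = max(abs(a - b) for a, b in zip(index(beta), index(beta_tilde)))

# The identified combination is the same for both structures
theta, theta_tilde = beta[1] + c * beta[2], beta_tilde[1] + c * beta_tilde[2]
```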

Example: Random Coefficients Logit

\(Y_i = 1\{\beta_0 + \beta_i X_i \geq U_i \}\)
- \(U_i\) independent of \((X_i, \beta_i)\)
- \(\beta_i\) independent of \(X_i\)
- \(F_u(z) = \frac{e^z}{1+e^z}\)
\(\Er[Y_i|X_i = x] = \int \frac{e^{\beta_0 + b x}} {1+e^{\beta_0 + b x}} dF_\beta(b)\)
Non-constructive and constructive identification of \(F_\beta\) in Fox et al. (2012)
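A sketch of the formula for \(\Er[Y_i|X_i = x]\) with a hypothetical two-point distribution for \(\beta_i\) (\(-1\) or \(2\) with equal probability) and \(\beta_0 = 0.5\); a general \(F_\beta\) replaces the sum by an integral. The Monte Carlo part simulates the model directly at \(x = 1\) and compares the frequency of \(Y_i = 1\) with the mixture formula. All values and helper names are assumptions for illustration.

```python
import math
import random


def logistic(z):
    """Logistic CDF F_u(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


beta0 = 0.5                                  # hypothetical intercept
support, weights = [-1.0, 2.0], [0.5, 0.5]   # hypothetical discrete F_beta


def choice_prob(x):
    """E[Y | X = x]: the logit probability averaged over the distribution of beta_i."""
    return sum(w * logistic(beta0 + b * x) for b, w in zip(support, weights))


random.seed(8)
x, n, hits = 1.0, 100_000, 0
for _ in range(n):
    b = random.choices(support, weights)[0]   # beta_i drawn independently of X_i
    # U_i <= t iff V_i <= F_u(t) for V_i uniform on [0, 1), so draw V_i directly
    hits += random.random() < logistic(beta0 + b * x)
freq = hits / n                               # close to choice_prob(1.0)
```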

References

Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. CausalMLBook: Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.

Chernozhukov, Victor, Hiroyuki Kasahara, and Paul Schrimpf. 2021. “Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the US.” Journal of Econometrics 220 (1): 23–62.

Christensen, Timothy M. 2015. “Nonparametric Identification of Positive Eigenfunctions.” Econometric Theory 31 (6): 1310–30.

Cochran, William G. 1953. “Matching in Analytical Studies.” American Journal of Public Health and the Nations Health 43 (6_Pt_1): 684–91.

Fox, Jeremy T, Kyoo il Kim, Stephen P Ryan, and Patrick Bajari. 2012. “The Random Coefficients Logit Model Is Identified.” Journal of Econometrics 166 (2): 204–12.

Hsiao, Cheng. 1983. “Chapter 4 Identification.” In Handbook of Econometrics, 1:223–83. Elsevier. https://doi.org/10.1016/S1573-4412(83)01008-9.

Imbens, Guido W. 2020. “Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics.” Journal of Economic Literature 58 (4): 1129–79. https://doi.org/10.1257/jel.20191597.

Lewbel, Arthur. 2019. “The Identification Zoo: Meanings of Identification in Econometrics.” Journal of Economic Literature 57 (4): 835–903. https://doi.org/10.1257/jel.20181361.

Matzkin, Rosa L. 2007. “Chapter 73 Nonparametric Identification.” In Handbook of Econometrics, edited by James J. Heckman and Edward E. Leamer, 6:5307–68. Elsevier. https://doi.org/10.1016/S1573-4412(07)06073-4.

———. 2013. “Nonparametric Identification in Structural Economic Models.” Annu. Rev. Econ. 5 (1): 457–86.

Molinari, Francesca. 2020. “Chapter 5 - Microeconometrics with Partial Identification.” In Handbook of Econometrics, Volume 7A, edited by Steven N. Durlauf, Lars Peter Hansen, James J. Heckman, and Rosa L. Matzkin, 7:355–486. Elsevier. https://doi.org/10.1016/bs.hoe.2020.05.002.

Pearl, Judea. 2015. “Trygve Haavelmo and the Emergence of Causal Calculus.” Econometric Theory 31 (1): 152–79. https://doi.org/10.1017/S0266466614000231.

Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.

Song, Kyunchul. 2021. “Introduction to Econometrics.”

Wright, Sewall. 1934. “The Method of Path Coefficients.” The Annals of Mathematical Statistics 5 (3): 161–215. http://www.jstor.org/stable/2957502.