Identification

Paul Schrimpf

2024-09-23

Reading

  • Required: Song (2021) chapter 4 (which is the basis for these slides)
  • Recommended: Lewbel (2019)
  • Supplementary: Matzkin (2013), Molinari (2020), Imbens (2020), V. Chernozhukov et al. (2024)

\[ \def\Er{{\mathrm{E}}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \]

Identification

Definition

Let \(X\) be an observed random vector with distribution \(P_X\). Let \(\mathcal{P}\) be a probability model — a collection of probabilities such that \(P_X \in \mathcal{P}\). Then \(\theta_0 \in \R^k\) is identified in \(\mathcal{P}\) if there exists a known \(\psi: \mathcal{P} \to \R^k\) s.t.

\[ \theta_0 = \psi(P_X) \]

Examples

Example: Descriptive Statistics

  • \(\theta_0 =\) mean of \(X\), then \(\theta_0\) is identified by \[ \psi_\mu(P) = \int x dP(x) \] in \(\mathcal{P} = \{P : \int |x| dP(x) < \infty \}\)

  • Generally, descriptive statistics are identified in a broad probability model with only the regularity restrictions needed to ensure that the statistics exist

Example: Linear Model

\[ Y = \alpha + \beta X + \epsilon \]

  • \(\mathcal{P} = \{P_{X,Y}:\) \(Y=\alpha + \beta X + \epsilon\),

    1. \(| \mathrm{Cov}(X,Y) | < \infty\), \(0 < \mathrm{Var}(X) < \infty\)

    2. \(\mathrm{Cov}(X, \epsilon) = 0\) \(\}\)

  • \(\beta\) identified as

\[ \beta = \frac{\int (x - \Er X) (y - \Er Y ) dP_{X,Y}(x,y)} {\int (x - \Er X)^2 dP_{X}(x)} = \frac{ \cov(X,Y) }{ \var(X) } \]

  • Identification requires:
    1. Usually innocuous regularity conditions
    2. Substantive exogeneity restriction
  • Evaluating plausibility of exogeneity restrictions requires a priori knowledge of data context and related economic theory
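This identification argument is constructive: the sample analog of \(\psi(P) = \cov(X,Y)/\var(X)\) recovers \(\beta\). A minimal simulation sketch (the values \(\alpha = 1\), \(\beta = 2\) are made up for illustration):

```julia
# Simulate the linear model with Cov(X, ϵ) = 0, then apply the sample
# analog of ψ(P) = Cov(X,Y)/Var(X). Parameter values are illustrative.
using Statistics, Random
Random.seed!(1234)
n = 100_000
α, β = 1.0, 2.0
x = randn(n)
ϵ = randn(n)                 # drawn independently of x, so Cov(x, ϵ) = 0
y = α .+ β .* x .+ ϵ
βhat = cov(x, y) / var(x)    # close to β = 2.0
```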

Example: Multiple Regression

\[ Y = X'\beta + \epsilon \]

  • \(\mathcal{P} = \{P: \Er [X \epsilon] = 0, \Er [X X'] \text{ invertible} \}\)
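The identifying formula for this probability model, \(\beta = \Er[X X']^{-1} \Er[X Y]\), can be checked in a quick simulation (a sketch with made-up coefficients; the constant is included as a column of \(X\)):

```julia
# Moment-based identification of β in the multiple regression model:
# sample analog of E[XX']⁻¹ E[XY]. Coefficient values are illustrative.
using Random, LinearAlgebra
Random.seed!(7)
n = 100_000
β = [1.0, -0.5, 2.0]
X = hcat(ones(n), randn(n, 2))   # constant included; E[XX'] invertible
ϵ = randn(n)                     # satisfies E[Xϵ] = 0
y = X * β + ϵ
βhat = (X' * X) \ (X' * y)       # sample moments in place of population moments
```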

Example: Binary Choice

\[ Y = 1\{ \beta_0 + \beta_1 X > u \} \]

  • \(\mathcal{P} = \{P: u \sim N(0,1), 0< \var(X) < \infty \}\)
  • Is \(u \sim N(0,1)\) innocuous?
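The normality restriction does real identifying work here. With \(u \sim N(0,1)\) and \(u\) independent of \(X\), \(P(Y=1|X=x) = \Phi(\beta_0 + \beta_1 x)\) where \(\Phi\) is the standard normal CDF, so (a standard probit inversion argument)

\[ \beta_0 + \beta_1 x = \Phi^{-1}\left( P(Y=1|X=x) \right) \]

and \((\beta_0, \beta_1)\) is identified from how \(\Phi^{-1}(P(Y=1|X=x))\) varies with \(x\). Replacing \(N(0,1)\) with an unknown distribution changes what (if anything) is identified, so the restriction is substantive, not innocuous.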

Example: Potential Outcomes

  • Data:

    • Treatment \(D_i\)
    • Potential outcomes \((Y_{i,0}, Y_{i,1})\), observed outcome \(Y_i = D_i Y_{i,1} + (1-D_i) Y_{i,0}\)
    • Covariates \(X_i\)
  • Parameter: \(\theta_0 = \Er[Y_{i,1} - Y_{i,0}] =\) average treatment effect

  • Assume:

    1. Unconfoundedness: \((Y_{i,0}, Y_{i,1})\) conditionally independent of \(D_i\) given \(X_i\)
    2. Overlap: \(\epsilon < P(D=1|X=x) < 1-\epsilon\) for some \(\epsilon > 0\) and all \(x\)
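These two assumptions make \(\theta_0\) identified; a standard derivation (sketched here) writes the ATE in terms of the distribution of observables:

\[ \begin{align*} \Er[Y_{i,1}] = & \Er\left[ \Er[Y_{i,1} | X_i] \right] = \Er\left[ \Er[Y_{i,1} | D_i = 1, X_i] \right] & \text{(unconfoundedness)} \\ = & \Er\left[ \Er[Y_i | D_i = 1, X_i] \right] & (Y_i = Y_{i,1} \text{ when } D_i = 1) \end{align*} \]

and similarly for \(\Er[Y_{i,0}]\), so \[ \theta_0 = \Er\left[ \Er[Y_i | D_i = 1, X_i] - \Er[Y_i | D_i = 0, X_i] \right]. \] Overlap ensures both inner conditional expectations are well-defined for (almost) every \(x\).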

Causal Diagrams

Causal Diagrams

  • Originate with Wright around 1920, e.g. Wright (1934)
  • Recently advocated by Pearl, e.g. Pearl (2015), Pearl and Mackenzie (2018)
  • Recommended introductions: Imbens (2020) or CausalMLBook chapters 7 and 8
  • Sometimes a useful expository tool for explaining identifying restrictions, but should not be your only or primary approach
    • e.g. Victor Chernozhukov, Kasahara, and Schrimpf (2021)

Example: Regression

flowchart LR
    X --> Y
    ϵ -.-> Y

  • Arrow means \(X\) causes \(Y\)
  • Dashed arrow means \(\epsilon\) causes \(Y\) and is unobserved (not universal, often dashed box around \(\epsilon\) instead)
  • Lack of connection between \(X\) and \(\epsilon\) means they are independent

Example: Regression

If you believe:

flowchart LR
    subgraph " Y"
        Y[Wage]
    end
    subgraph X
       E[Education] --> Y
       T[SAT] --> E
       F[Family SES] --> E
       F --> Y
    end
    subgraph ϵ
        I[Intelligence] -.-> Y
        L[Luck] -.-> Y
        I -.-> T
    end

then regression \(Wage = \beta_1 Education + \beta_2 SAT + \beta_3 FamilySES + \epsilon\) identifies causal effect of education on wages

Example: Regression

But reality is likely more complex …

flowchart LR
   subgraph " Y"
        Y[Wage]
   end
   subgraph X
       E[Education] --> Y
       T[SAT] --> E
       F[Family SES] --> E
       F --> Y
   end
   subgraph ϵ
        I[Intelligence] -.-> Y
        L[Luck] -.-> Y
        G[Grit] -.-> Y
        G -.-> E
        L -.-> E
        I -.-> T
        I -.-> E
        I -.-> G
        G -.-> I
   end

Example: Potential Outcomes

flowchart LR
    u -.-> D
    X --> Y
    X --> D
    D --> Y
    ϵ -.-> Y

Example: Potential Outcomes

flowchart LR
    subgraph Treatment
        D[Naloxone distribution site opens]
    end
    subgraph Outcome
        Y[ER visits for overdose]
    end
    u -.-> D
    D --> Y
    ϵ -.-> Y
    subgraph X
        Income --> Y
        Unemployment --> Y
        Income --> D
        Unemployment --> D
        OD[OD rate prior to opening] --> Y
        OD --> D
        Crime --> Y
        Crime --> D
    end

More on Causal Graphs

  • Given a graph, what can be identified?
    • \(Y\) and \(X\) are d-separated by a collection of nodes \(S\) if there are no paths between \(Y\) and \(X\) except through \(S\)
    • d-separation implies conditional independence \(Y \perp X | S\)
  • Does conditional independence imply d-separation? Can we test conditional independence in data to discover the correct causal graph? (causal discovery)
    • Conditional independence does not automatically imply d-separation, but the exceptions have measure 0, so causal discovery may be possible
    • But the neighborhood of exceptions is large, so causal discovery is very difficult in practice
  • V. Chernozhukov et al. (2024), especially chapters 7 & 8

Reinterpretation of Estimators

Reinterpretation of Estimators

  • Generalized and/or descriptive interpretation of the population analog of an estimator
  • Analyze familiar estimator under more general assumptions
    • Understand bias when exogeneity assumptions fail
    • Sometimes give more general interpretation of existing estimator

Example: Regression

  • In the linear model \(Y_i = X_i'\beta + \epsilon_i\), assume only that \(\Er[X X']\) is invertible (no exogeneity restriction)
  • Population regression \[ \begin{align*} \theta = & \Er[ X X']^{-1} \Er[ X Y] \\ = & \Er[X X']^{-1} \Er[X (X' \beta + \epsilon)] \\ = & \beta + \Er[X X']^{-1} \Er[X\epsilon] \end{align*} \]
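A quick simulation sketch of the bias term \(\Er[X X']^{-1}\Er[X \epsilon]\) (scalar \(X\), made-up numbers: \(\Er[x\epsilon] = 0.5\) and \(\var(x) = 1.25\), so the bias is \(0.4\)):

```julia
# Population regression ≠ β when E[xϵ] ≠ 0; the gap is E[x²]⁻¹ E[xϵ].
using Random, LinearAlgebra
Random.seed!(42)
n = 100_000
β = 2.0
u = randn(n)
x = randn(n) .+ 0.5 .* u      # x is correlated with the error through u
ϵ = u                         # E[xϵ] = 0.5, Var(x) = 1.25
y = β .* x .+ ϵ
θ = dot(x, y) / dot(x, x)     # ≈ β + 0.5/1.25 = 2.4, not β = 2.0
```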

Example: Regression

  • If the relevant moments exist (no linear model required), the population regression coefficient solves \[ \Er[ X X']^{-1} \Er[ X Y] \in \mathrm{arg}\min_b \Er[ (X'b - \Er[Y|X])^2 ] \]

Caution

  • Regression being a linear approximation to \(\Er[Y|X]\) does not mean \(\beta = \Er[X X']^{-1} \Er[X Y]\) necessarily has the sign you want

  • In example below, \(\Er[Y|x_1=1, x_2] > \Er[Y|x_1=0,x_2]\), but \(\beta_1 < 0\)

using LinearAlgebra
n = 10_000
# E[Y|x1,x2] is increasing in x1 for every x2 (slope 1.0 when x2 > 0,
# slope 0.1 otherwise), but its level drops sharply when x2 > 0
EYX(x) = x[2] > 0 ? -1.0 + 1.0*x[1] : (1.0 + 0.1*x[1])
x2 = randn(n)
x1 = (x2*3 + randn(n) .> 0)   # x1 strongly positively correlated with x2
X = hcat(x1, x2)
y = EYX.(eachrow(X)) + randn(n)
X = hcat(ones(n), X)          # add a constant
β = (X'*X) \ (X'*y)           # β[2], the coefficient on x1, is negative

Example: Potential Outcomes

  • Matching initially studied with a linear regression model, e.g. Cochran (1953) \[ Y_i = \alpha D_i + X_i' \beta + \epsilon_i \]
  • Implies constant treatment effect \(Y_{i,1} - Y_{i,0} = \alpha\)

Observational Equivalence

Observational Equivalence

  • Identification sometimes defined without explicit mapping from data to parameters, e.g. Hsiao (1983), Matzkin (2007)

Definition: Observationally Equivalent

  • Let \(\mathcal{P} = \{ P(\cdot; s) : s \in S \}\), two structures \(s\) and \(\tilde{s}\) in \(S\) are observationally equivalent if they imply the same distribution for the observed data, i.e. \[ P(B;s) = P(B; \tilde{s}) \] for all \(B \in \sigma(X)\).

  • Let \(\lambda: S \to \R^k\), \(\theta\) is observationally equivalent to \(\tilde{\theta}\) if \(\exists s, \tilde{s} \in S\) that are observationally equivalent and \(\theta = \lambda(s)\) and \(\tilde{\theta} = \lambda(\tilde{s})\)

    • Let \(\Gamma(\theta, S) = \{P(\cdot; s) | s \in S, \theta = \lambda(s) \}\), then \(\theta\) and \(\tilde{\theta}\) are observationally equivalent iff \(\Gamma(\theta,S) \cap \Gamma(\tilde{\theta}, S) \neq \emptyset\)

Non-constructive Identification

Definition: (Non-Constructive) Identification

  • \(s_0 \in S\) is identified if there is no \(s \neq s_0\) that is observationally equivalent to \(s_0\)

  • \(\theta_0\) is identified (in \(S\)) if there is no observationally equivalent \(\theta \neq \theta_0\)

    • i.e. \(\Gamma(\theta_0, S) \cap \Gamma(\theta, S) = \emptyset\) \(\forall \theta \neq \theta_0\)
  • Compared to constructive definition with \(\theta_0 = \psi(P)\):
    • Less clear how to use identification to estimate
    • Easier to show non-identification
    • Set of observationally equivalent structures can be of interest

Example: Multiple Regression

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \]

  • \(X = [X_1\, X_2]'\). If rank \(\Er[X X'] = 1\), then \(X_1 = c X_2\) a.s. with \(c = \frac{\cov(X_1, X_2)}{\var(X_2)}\), and \(\beta_1, \beta_2\) is observationally equivalent to any \(\tilde{\beta}_1, \tilde{\beta}_2\) s.t. \[ c \tilde{\beta}_1 + \tilde{\beta}_2 = c \beta_1 + \beta_2 \]

  • \(\theta_0 = \lambda( \beta ) = c \beta_1 + \beta_2\) is identified even when rank \(\Er [X X'] = 1\)
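A minimal sketch of this observational equivalence in the perfectly collinear case \(X_1 = X_2\) (so \(\cov(X_1,X_2)/\var(X_2) = 1\)): two distinct structures with the same \(\beta_1 + \beta_2\) generate identical conditional means, hence the same distribution of the data.

```julia
# With X₁ = X₂ (rank E[XX'] = 1), only a linear combination of β₁ and β₂
# is pinned down: different coefficient pairs give identical E[Y|X].
using Random
Random.seed!(3)
n = 5
x2 = randn(n)
x1 = copy(x2)                 # perfect collinearity
β1, β2 = 1.0, 2.0
β1t, β2t = 2.0, 1.0           # different structure, same β1 + β2
m  = β1 .* x1 .+ β2 .* x2
mt = β1t .* x1 .+ β2t .* x2   # m == mt: observationally equivalent
```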

Example: Random Coefficients Logit

  • \(Y_i = 1\{\beta_0 + \beta_i X_i \geq U_i \}\)

    • \(U_i\) independent of \((X_i, \beta_i)\),
    • \(\beta_i\) independent of \(X_i\),
    • \(F_u(z) = \frac{e^z}{1+e^z}\)
  • \(\Er[Y|X] = \int \frac{e^{\beta_0 + \beta X_i}} {1+e^{\beta_0 + \beta X_i}} dF_\beta(\beta)\)

  • Non-constructive and constructive identification of \(F_\beta\) in Fox et al. (2012)

References

Chernozhukov, V., C. Hansen, N. Kallus, M. Spindler, and V. Syrgkanis. 2024. CausalMLBook: Applied Causal Inference Powered by ML and AI. https://causalml-book.org/.
Chernozhukov, Victor, Hiroyuki Kasahara, and Paul Schrimpf. 2021. “Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the US.” Journal of Econometrics 220 (1): 23–62.
Christensen, Timothy M. 2015. “Nonparametric Identification of Positive Eigenfunctions.” Econometric Theory 31 (6): 1310–30.
Cochran, William G. 1953. “Matching in Analytical Studies.” American Journal of Public Health and the Nations Health 43 (6_Pt_1): 684–91.
Fox, Jeremy T, Kyoo il Kim, Stephen P Ryan, and Patrick Bajari. 2012. “The Random Coefficients Logit Model Is Identified.” Journal of Econometrics 166 (2): 204–12.
Hsiao, Cheng. 1983. “Chapter 4 Identification.” In, 1:223–83. Handbook of Econometrics. Elsevier. https://doi.org/10.1016/S1573-4412(83)01008-9.
Imbens, Guido W. 2020. “Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics.” Journal of Economic Literature 58 (4): 1129–79. https://doi.org/10.1257/jel.20191597.
Lewbel, Arthur. 2019. “The Identification Zoo: Meanings of Identification in Econometrics.” Journal of Economic Literature 57 (4): 835–903. https://doi.org/10.1257/jel.20181361.
Matzkin, Rosa L. 2007. “Chapter 73 Nonparametric Identification.” In, edited by James J. Heckman and Edward E. Leamer, 6:5307–68. Handbook of Econometrics. Elsevier. https://doi.org/10.1016/S1573-4412(07)06073-4.
———. 2013. “Nonparametric Identification in Structural Economic Models.” Annu. Rev. Econ. 5 (1): 457–86.
Molinari, Francesca. 2020. “Chapter 5 - Microeconometrics with Partial Identification.” In Handbook of Econometrics, Volume 7A, edited by Steven N. Durlauf, Lars Peter Hansen, James J. Heckman, and Rosa L. Matzkin, 7:355–486. Handbook of Econometrics. Elsevier. https://doi.org/10.1016/bs.hoe.2020.05.002.
Pearl, Judea. 2015. “Trygve Haavelmo and the Emergence of Causal Calculus.” Econometric Theory 31 (1): 152–79. https://doi.org/10.1017/S0266466614000231.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic books.
Song, Kyungchul. 2021. “Introduction to Econometrics.”
Wright, Sewall. 1934. “The Method of Path Coefficients.” The Annals of Mathematical Statistics 5 (3): 161–215. http://www.jstor.org/stable/2957502.