Probability

Paul Schrimpf

2024-09-16

Reading

  • Song (2021) chapter 2 (which is the basis for these slides)
  • Pollard (2002)

\[ \def\Er{{\mathrm{E}}} \def\R{{\mathbb{R}}} \]

Probability

Mathematically, a probability space is a special measure space where the measure has total mass one. But, our attitude and emotional response toward one is entirely different from those toward the other. On a measure space everything is deterministic and certain, on a probability space we face randomness and uncertainty.

Çinlar (2011)

Probability Space

Definitions

  • Given a measure space \((\Omega ,\mathscr{F})\), a probability (or probability measure) \(P\) is a measure s.t. \(P(\Omega )=1\)
  • \((\Omega ,\mathscr{F}, P)\) is a probability space
  • \(\Omega\) is a sample space
  • \(\omega \in \Omega\) is an outcome
  • \(A \in \mathscr{F}\) is an event

Exercise

Show \(\forall A, B \in \mathscr{F}\)

  1. \(P(\varnothing )=0\)

  2. \(P(A)\leq 1.\)

  3. \(P(A^{c})=1-P(A)\)

  4. \(P(A\cup B)=P(A)+P(B)-P(A\cap B)\)

  5. If \(A\subset B\), then \(P(A)\leq P(B)\)

  6. \(P(B\cap A^{c})=P(B)-P(A\cap B)\)
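
As a hint for item 6 (a sketch, not a full solution): \(B = (A\cap B) \cup (B\cap A^{c})\) is a disjoint union, so additivity gives \[ P(B) = P(A\cap B) + P(B\cap A^{c}), \] and rearranging yields item 6. Items 3 and 4 follow from similar disjoint decompositions.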

Random Variables

Random Variable

Definition

A random variable \(X\) is a measurable function from \(\Omega\) to \(\R\)

Distribution

Definition

Let \((\Omega ,\mathscr{F},P)\) be a probability space, \(X\) a random variable on \((\Omega ,\mathscr{F})\). The distribution \(P_{X}\) induced by \(X\) is the probability measure on \((\R,\mathscr{B}(\R))\) such that \(\forall B\in \mathscr{B}(\R)\), \[ P_{X}(B)\equiv P\left\{ \omega \in \Omega :X(\omega )\in B\right\} \]

Distribution Function

Definition

The cumulative distribution function (CDF) of a random variable \(X\) with distribution \(P_{X}\) is defined to be a function \(F:\R\rightarrow [0,1]\) such that \[ F(t)=P_{X}\left( (-\infty ,t]\right) . \]

Density

Definition

  1. Let \(X\) be a random variable with distribution \(P_{X}\). When \(P_{X}\ll \lambda\), we call \(X\) a continuous random variable, and call the Radon-Nikodym derivative \(f\equiv dP_{X}/d\lambda\) the (probability) density function of \(P_{X}\).

  2. We say that \(X\) is a discrete random variable if there exists a countable set \(A\subset \R\) such that \(P_{X}(A^{c})=0\)

Exercise

Show that when \(X\) is continuous, its CDF is a continuous function
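
One possible route (a sketch): the jump of \(F\) at any \(t\) is \(P_{X}(\{t\})\), and absolute continuity w.r.t. Lebesgue measure kills it, \[ P_{X}(\{t\}) = \int_{\{t\}} f \, d\lambda = 0 \] because \(\lambda(\{t\}) = 0\); a monotone function with no jumps is continuous.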

Expectation

Definition

The expectation of \(X\) is \(\Er X = \int_\Omega X dP\)

  • change of variables: \(\Er X = \int_{\R} x dP_X(x)\)
  • positive: \(X \geq 0\) implies \(\Er X \geq 0\)
  • linear : \(\Er[aX + bY] = a \Er X + b \Er Y\)
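
A quick numerical sanity check of linearity (a minimal Python/numpy sketch; the sample size and the exponential/normal distributions are arbitrary illustrative choices, not from the source):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)       # E X = 2
Y = rng.normal(loc=1.0, scale=1.0, size=n)   # E Y = 1
a, b = 3.0, -0.5

lhs = np.mean(a * X + b * Y)                 # sample analogue of E[aX + bY]
rhs = a * np.mean(X) + b * np.mean(Y)        # a E X + b E Y
print(lhs, rhs)                              # both close to 3*2 - 0.5*1 = 5.5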

Inequalities

Markov’s

Markov’s Inequality

\(P(|X|>\epsilon) \leq \frac{\Er[|X|^k]}{\epsilon^k}\) \(\forall \epsilon > 0, k > 0\)
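
The proof is one line, using positivity and monotonicity of the integral (a sketch): \[ \Er[|X|^k] \geq \Er\left[ |X|^k 1\{|X|>\epsilon\} \right] \geq \epsilon^k \Er\left[ 1\{|X|>\epsilon\} \right] = \epsilon^k P(|X|>\epsilon). \]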

Jensen’s

Jensen’s Inequality

Suppose that \(g\) is convex and \(X\) and \(g(X)\) are integrable. Then \(g(\Er X) \leq \Er[g(X)]\)

Exercise

Show \(\Er[|X|^p] \leq \left(\Er[|X|^q] \right)^{p/q}\) for all \(0 < p \leq q\).
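
As a hint (a sketch): \(t \mapsto |t|^{q/p}\) is convex because \(q/p \geq 1\), so applying Jensen's inequality to the random variable \(|X|^p\) gives \[ \left( \Er[|X|^p] \right)^{q/p} \leq \Er\left[ \left( |X|^p \right)^{q/p} \right] = \Er[|X|^q], \] and raising both sides to the power \(p/q\) yields the result.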

Cauchy-Schwarz

Cauchy-Schwarz Inequality

\(\left(\Er[XY]\right)^2 \leq \Er[X^2] \Er[Y^2]\)
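
One standard proof (a sketch, assuming \(\Er[Y^2] > 0\)): for every \(t \in \R\), \[ 0 \leq \Er[(X - tY)^2] = \Er[X^2] - 2t\,\Er[XY] + t^2\,\Er[Y^2], \] so this quadratic in \(t\) has at most one real root and its discriminant \(4\left(\Er[XY]\right)^2 - 4\Er[X^2]\Er[Y^2]\) must be \(\leq 0\).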

Dependence and Information

Generated \(\sigma\)-field

  • \(\sigma(X)\) is \(\sigma\)-field generated by \(X\)
    • smallest \(\sigma\)-field w.r.t. which \(X\) is measurable
    • \(\sigma(X) = \{X^{-1}(B): B \in \mathscr{B}(\R)\}\)
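  • e.g. (a simple illustrative case): if \(X = 1\{\omega \in A\}\) for some \(A \in \mathscr{F}\), then \(\sigma(X) = \{\varnothing, A, A^{c}, \Omega\}\); knowing \(X\) amounts to knowing whether \(A\) occurred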

Information

  • \(\forall E \in \sigma(X)\), observing the value \(x\) of \(X\) tells us whether \(E\) occurred
  • if \(\sigma(X_1) \subset \sigma(X_2)\), then \(\sigma(X_2)\) contains at least as much information as \(\sigma(X_1)\)

Exercise 4.7

Suppose \(g:\R \to \R\) is Borel measurable, then show \(\sigma(g(X)) \subset \sigma(X)\)
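
As a hint (a sketch): any element of \(\sigma(g(X))\) has the form \[ (g \circ X)^{-1}(B) = X^{-1}\left( g^{-1}(B) \right), \quad B \in \mathscr{B}(\R), \] and Borel measurability of \(g\) means \(g^{-1}(B) \in \mathscr{B}(\R)\), so \(X^{-1}(g^{-1}(B)) \in \sigma(X)\).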

Dependence

Theorem 4.2

If \(\sigma(W) \subset \sigma(X)\), then \(\exists\) Borel measurable \(g\) s.t. \(W=g(X)\)

Independence

Independence

Definition

  • Events \(A_1, ..., A_m\) are independent if for any sub-collection \(A_{i_1}, ..., A_{i_s}\) \[ P\left(\cap_{j=1}^s A_{i_j}\right) = \prod_{j=1}^s P(A_{i_j}) \]
  • \(\sigma\)-fields \(\mathscr{F}_1, ..., \mathscr{F}_m \subset \mathscr{F}\) are independent if for any sub-collection \(\mathscr{F}_{i_1}, ..., \mathscr{F}_{i_s}\) and events \(E_{i_j} \in \mathscr{F}_{i_j}\), \[ P\left(\cap_{j=1}^s E_{i_j}\right) = \prod_{j=1}^s P(E_{i_j}) \]
  • Random variables \(X_1, ..., X_m\) are independent if \(\sigma(X_1), ..., \sigma(X_m)\) are independent

Random Vectors

  • measurable \(X: \Omega \to \R^n\)
  • \(\sigma(X) = \{X^{-1}(B): B \in \mathscr{B}(\R^n)\} =\) smallest \(\sigma\)-field containing \(\cup_{i=1}^n \sigma(X_i)\)

Theorem

Suppose that \(X=(X_1, X_2)\) and \(Y=(Y_1, Y_2)\) are independent. Then for any Borel measurable \(f\) and \(g\), \(f(X)\) and \(g(Y)\) are independent
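
A sketch of why: by Exercise 4.7 (applied to random vectors), \(\sigma(f(X)) \subset \sigma(X)\) and \(\sigma(g(Y)) \subset \sigma(Y)\), and sub-\(\sigma\)-fields of independent \(\sigma\)-fields are independent directly from the definition.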

Conditioning

Conditional Expectation

Definition

Let \(\mathscr{G} \subset \mathscr{F}\) be a sub \(\sigma\)-field and \(Y\) a random variable with \(\Er |Y| < \infty\). The conditional expectation of \(Y\) given \(\mathscr{G}\) is \(\Er[Y|\mathscr{G}](\cdot): \Omega \to \R\) s.t.

  1. \(\Er[Y|\mathscr{G}](\cdot)\) is \(\mathscr{G}\) measurable

  2. \(\int_A \Er[Y|\mathscr{G}] dP = \int_A Y dP\) \(\forall A \in \mathscr{G}\)

  • Ex: \(\{E_k\}_{k=1}^m\) partition of \(\Omega\), let \(\mathscr{G} = \sigma(\{E_k\}_{k=1}^m)\)
    • \(\Er[Y | \mathscr{G}](\omega) = \sum_{k=1}^m c_k 1\{\omega \in E_k\}\), then use 2 to solve for \(c_k\) (see the sketch below)
  • Existence from Radon-Nikodym theorem
  • \(\Er[Y|X] \equiv \Er[Y|\sigma(X)]\)
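
Completing the example (a sketch, assuming \(P(E_k) > 0\) for each \(k\)): take \(A = E_k\) in property 2 and use that the \(E_k\) are disjoint, so \[ \int_{E_k} \Er[Y|\mathscr{G}]\, dP = c_k P(E_k) = \int_{E_k} Y \, dP, \quad \text{hence} \quad c_k = \frac{\Er\left[Y 1\{\omega \in E_k\}\right]}{P(E_k)}. \]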

Conditional Expectation

Exercise

  1. If \(X\) and \(Y\) are discrete with support \(\{x_i\}_{i=1}^I \times \{y_j\}_{j=1}^J\) and PMF \(p\) then \[ \Er[Y|X=x_i] = \frac{\sum_{j=1}^J y_j p(x_i,y_j)} {\sum_{j=1}^J p(x_i,y_j)} \]

  2. If \(X\) and \(Y\) are continuous with density \(f\), then \[ \Er[Y|X=x] = \frac{\int y f(x,y) dy}{\int f(x,y) dy} \]
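
A quick simulation check of the discrete formula (a minimal Python/numpy sketch; the joint PMF below is an arbitrary illustrative choice, not from the source):

import numpy as np

rng = np.random.default_rng(0)
x_vals = np.array([0.0, 1.0])
y_vals = np.array([-1.0, 2.0])
p = np.array([[0.1, 0.3],    # p(x_i, y_j): rows index x, columns index y
              [0.4, 0.2]])

# formula: E[Y|X=x_i] = sum_j y_j p(x_i,y_j) / sum_j p(x_i,y_j)
cond_exp = (p @ y_vals) / p.sum(axis=1)

# Monte Carlo check: draw (X,Y) from the joint PMF, then average Y given X
n = 1_000_000
idx = rng.choice(4, size=n, p=p.ravel())
X, Y = x_vals[idx // 2], y_vals[idx % 2]
for i, x in enumerate(x_vals):
    print(cond_exp[i], Y[X == x].mean())  # formula vs. simulated average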

Properties of Conditional Expectation

  • If \(X\) is \(\mathscr{G}\) measurable, then \(\Er[XY| \mathscr{G}] = X \Er[Y|\mathscr{G}]\) a.e.
  • If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[XY|Z] = X \Er[Y|Z]\)
  • If \(\sigma(X) \subset \sigma(Z)\), then \(\Er[\Er[Y|Z]|X] = \Er[Y|X]\)
  • If \(Y\) and \(X\) are independent, then \(\Er[Y | X ] = \Er[Y]\)
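
A sketch for the third property (law of iterated expectations): for any \(A \in \sigma(X) \subset \sigma(Z)\), \[ \int_A \Er[Y|Z] \, dP = \int_A Y \, dP \] by the definition of \(\Er[Y|Z]\). So \(\Er[Y|Z]\) and \(Y\) have the same integrals over every \(A \in \sigma(X)\), and therefore (by a.e. uniqueness) the same conditional expectation given \(X\), i.e. \(\Er[\Er[Y|Z]|X] = \Er[Y|X]\) a.e.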

\(\Er[Y|\mathscr{G}]\) as Orthogonal Projection

Theorem

Let \((\Omega, \mathscr{F}, P)\) be a probability space, \(\mathscr{G}\) a sub \(\sigma\)-field, then for any \(Y \in \mathcal{L}^2(\Omega, \mathscr{F}, P) = \{X: \Omega \to \mathbb{R} \text{ s.t. } X \text{ }\mathscr{F}\text{-measurable, } \int X^2 dP < \infty \}\), \[ \inf_{W \in \mathcal{L}^2(\Omega, \mathscr{G}, P)} \Er[(Y-W)^2] = \Er[ (Y - \Er[Y | \mathscr{G}])^2] \]
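
A sketch of the key step: for any \(W \in \mathcal{L}^2(\Omega, \mathscr{G}, P)\), expand \[ \Er[(Y-W)^2] = \Er[(Y - \Er[Y|\mathscr{G}])^2] + \Er[(\Er[Y|\mathscr{G}] - W)^2], \] where the cross term vanishes because \(\Er[(Y - \Er[Y|\mathscr{G}])Z] = 0\) for any \(\mathscr{G}\)-measurable \(Z \in \mathcal{L}^2\) (condition on \(\mathscr{G}\) and use \(\Er[YZ|\mathscr{G}] = Z\Er[Y|\mathscr{G}]\)). The second term is minimized, at zero, by \(W = \Er[Y|\mathscr{G}]\).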

Conditional Measure

Definition

Let \(\mathscr{G}\) be a sub \(\sigma\)-field of \(\mathscr{F}\). The conditional probability measure given \(\mathscr{G}\) is defined to be a map \(P(\cdot \mid \mathscr{G})(\cdot ):\mathscr{F}\times \Omega \rightarrow [0,1]\) such that

  1. For each \(A\in \mathscr{F}\), \(P(A \mid \mathscr{G})(\cdot )=\Er\left[ 1\{\omega \in A\} \mid \mathscr{G}\right] (\cdot )\), a.e.

  2. For each \(\omega \in \Omega\), \(P(\cdot \mid \mathscr{G})(\omega )\) is a probability measure on \((\Omega ,\mathscr{F}).\)

Conditional Independence

Definition

  1. Events \(A_1, ..., A_m \in \mathscr{F}\) are conditionally independent given \(\mathscr{G}\) if for any sub-collection, \[ P\left( \cap_{j=1}^s A_{i_j} | \mathscr{G} \right) = \prod_{j=1}^s P(A_{i_j} | \mathscr{G}) \]

  2. Sub \(\sigma\)-fields \(\mathscr{F}_1, ..., \mathscr{F}_m\) are conditionally independent given \(\mathscr{G}\) if for any sub-collection and events \(E_{i_j} \in \mathscr{F}_{i_j}\), \[ P\left( \cap_{j=1}^s E_{i_j} | \mathscr{G} \right) = \prod_{j=1}^s P(E_{i_j} | \mathscr{G}) \]

  3. Random variables \(X_1, ..., X_m\) are conditionally independent given \(\mathscr{G}\) if \(\sigma(X_1), ..., \sigma(X_m)\) are conditionally independent given \(\mathscr{G}\)

References

Çinlar, Erhan. 2011. Probability and Stochastics. Vol. 261. Springer.
Pollard, David. 2002. A User’s Guide to Measure Theoretic Probability. Cambridge Series in Statistical and Probabilistic Mathematics 8. Cambridge University Press.
Song, Kyungchul. 2021. “Introduction to Econometrics.”