ECON526: Quantitative Economics with Data Science Applications

Foundations of Numerical Linear Algebra

Jesse Perla

University of British Columbia

Going Beyond “reg y x, robust”

  • Data science, econometrics, and macroeconomics are built on linear algebra.

  • Numerical linear algebra has all sorts of pitfalls, which become more critical as we scale up to larger problems.

  • Speed differences in choosing better algorithms can be orders of magnitude.

  • Crucial to know what goes on under-the-hood in Stata/R/python packages for applied work, even if you don’t implement it yourself.

  • Material here is related to

Packages

This section uses the following packages:

import numpy as np
import matplotlib.pyplot as plt
import scipy
from numpy.linalg import cond, matrix_rank, norm
from scipy.linalg import inv, solve, det, eig, lu, eigvals
from scipy.linalg import solve_triangular, eigvalsh, cholesky

Basic Computational Complexity

Big-O Notation

For functions \(f(N)\) and \(g(N)\), we say \(f(N)\) is \(O(g(N))\) if there exist positive constants \(C\) and \(N_0\) such that:

\[ 0 \leq f(N) \leq C \cdot g(N) \quad \text{for all } N \geq N_0 \]

  • Often crucial to know how problems scale asymptotically (as \(N\to\infty\))
  • Caution! This is only an asymptotic limit, and can be misleading for small \(N\)
    • \(f_1(N) = N^3 + N\) is \(O(N^3)\)
    • \(f_2(N) = 1000 N^2 + 3 N\) is \(O(N^2)\)
    • For roughly \(N > 1000\) the \(f_2\) algorithm is cheaper; otherwise use \(f_1\)
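
As a rough check of that crossover (a small sketch; \(f_1\) and \(f_2\) are just the illustrative operation counts above, not real algorithms):

N_vals = np.array([10, 100, 1_000, 10_000])
f_1 = N_vals**3 + N_vals             # O(N^3) with a small constant
f_2 = 1000 * N_vals**2 + 3 * N_vals  # O(N^2) with a large constant
print(np.column_stack([N_vals, f_1, f_2]))
print(f"f_2 cheaper at N = 10_000? {f_2[-1] < f_1[-1]}")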

Examples of Computational Complexity

  • Simple examples:
    • \(x \cdot y = \sum_{n=1}^N x_n y_n\) is \(O(N)\) since it requires \(N\) multiplications and additions
    • \(A x\) for \(A\in\mathbb{R}^{N\times N},x\in\mathbb{R}^N\) is \(O(N^2)\) since it requires \(N\) dot products, each \(O(N)\)
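
A crude single-run timing sketch of that scaling (timings depend on hardware and the underlying BLAS, so treat the numbers as illustrative only):

import time

N = 5_000
x_vec = np.random.randn(N)
y_vec = np.random.randn(N)
A_big = np.random.randn(N, N)
t0 = time.perf_counter()
x_vec @ y_vec                      # dot product, O(N)
t1 = time.perf_counter()
A_big @ x_vec                      # matrix-vector product, O(N^2)
t2 = time.perf_counter()
print(f"dot product: {t1 - t0:.1e}s, matrix-vector: {t2 - t1:.1e}s")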

Numerical Precision

Machine Epsilon

For a given datatype, \(\epsilon\) is defined as \(\epsilon = \min_{\delta > 0} \left\{ \delta : 1 + \delta > 1 \right\}\)

  • Computers have finite precision. 64-bit typical, but 32-bit on GPUs
print(f"machine epsilon for float64 = {np.finfo(float).eps}")
print(f"1 + eps/2 == 1? {1.0 + 1.1e-16 == 1.0}")
print(f"machine epsilon for float32 = {np.finfo(np.float32).eps}")
machine epsilon for float64 = 2.220446049250313e-16
1 + eps/2 == 1? True
machine epsilon for float32 = 1.1920928955078125e-07

Basic Linear Algebra

Norms

  • Common measure of size is the Euclidean norm, or \(L^2\) norm, for \(x \in \mathbb{R}^N\)
  • Complexity is \(O(N)\), square \(N\) times then \(N\) additions \[ ||x||_2 = \sqrt{\sum_{n=1}^N x_n^2} \]
x = np.array([1, 2, 3]) # Calculating different ways (in order of preference)
print(np.sqrt(sum(xval**2 for xval in x))) # manual with comprehensions
print(np.sqrt(np.sum(np.square(x)))) # broadcasts
print(norm(x)) # built-in to numpy norm(x, ord=2) alternatively
print(f"||x||_2^2 = {norm(x)**2} = {x.T @ x} = {np.dot(x, x)}")
3.7416573867739413
3.7416573867739413
3.7416573867739413
||x||_2^2 = 14.0 = 14 = 14

Solving Systems of Equations

  • Solving \(A x = b\) for \(x\) is equivalent to \(A^{-1} A x = A^{-1} b\)
  • Then since \(A^{-1}A = I\), and \(I x = x\), we have \(x = A^{-1} b\)
  • Careful since matrix algebra is not commutative!
A = np.array([[0, 2], [3, 4]]) # or ((0, 2), (3, 4))
b = np.array([2,1])  # Column vector
x = solve(A, b)  # Solve Ax = b for x
x
array([-1.,  1.])

Using the Inverse Directly

  • Can replace the solve with a calculation of an inverse
  • But it can be slower or less accurate than solving the system directly
A_inv = inv(A)
A_inv @ b # i.e., A^{-1} * b
array([-1.,  1.])

Linear Combinations

We can think of solving a system as finding the linear combination of columns of \(A\) that equal \(b\)

b_star = x[0] * A[:, 0] + x[1] * A[:, 1] # using x solution
print(f"b = {b}, b_star = {b_star}")
b = [2 1], b_star = [2. 1.]

Column Space and Rank

  • The column space of a matrix represents all possible linear combinations of its columns.
  • A system \(A x = b\) has a solution exactly when \(b\) lies in the column space of \(A\)
  • The rank of a matrix is the dimension of its column space
A = np.array([[0, 2], [3, 4]])
matrix_rank(A)
2

Hence, can solve \(A x = b\) for any \(b \in \mathbb{R}^2\) since the column space is the entire space \(\mathbb{R}^2\)

Singular Matrices

On the other hand, note

A = np.array([[1, 2],
              [2, 4]])
matrix_rank(A)
1

So we can only solve \(A x = b\) for \(b \propto \begin{bmatrix} 1 \\ 2 \end{bmatrix} \propto \begin{bmatrix} 2 \\ 4 \end{bmatrix}\)

Checking Singularity

A = np.array([[1, 2], [2, 4]])
# An (expensive) way to check if A is singular is if det(A) = 0
print(det(A) == 0.0)
print(matrix_rank(A) != A.shape[0]) # or check rank
# Check before inverting or use exceptions
try:
    inv(A)
    print("Matrix is not singular (invertible).")
except np.linalg.LinAlgError:
    print("Matrix is singular (non-invertible).")
True
True
Matrix is singular (non-invertible).

Determinant is Not Scale Invariant

  • Reminder: numerical precision in calculations makes it hard to compare to zero
  • The determinant is useful, but its value depends on the scale of the matrix
  • A more robust alternative is the condition number (more next lecture)
eps, K = 1e-8, 100000
A = np.array([[1, 2], [1 + eps, 2 + eps]])
print(f"det(A)={det(A):.5g}, det(K*A)={det(K*A):.5g}")
print(f"cond(A)={cond(A):.5g}, cond(K*A)={cond(K*A):.5g},")
print(f"det(inv(A))={det(inv(A)):.5g}, cond(inv(A))={cond(inv(A)):.5g}")
det(A)=-1e-08, det(K*A)=-100
cond(A)=1e+09, cond(K*A)=1e+09,
det(inv(A))=-1e+08, cond(inv(A))=1e+09

Interpreting Condition Numbers

  • The condition number of the matrix \(A\) is \(\kappa(A) = ||A|| \cdot ||A^{-1}||\), which for symmetric \(A\) can be written as a ratio of eigenvalue magnitudes
    • \(\kappa(A) = \frac{|\lambda_{\text{max}}|}{|\lambda_{\text{min}}|}\) for \(\lambda\) the eigenvalues of \(A\). More soon!
  • Crude intuition: for machine epsilon \(\epsilon_{\text{mach}}\), when solving \(A x = b\) for \(x\)
    • The relative error \(||x - x_{\text{approx}}||/||x||\) is roughly \(\kappa(A) \cdot \epsilon_{\text{mach}}\)
    • That is, \(\kappa(A)\) amplifies small errors in \(b\) and in the arithmetic itself
    • With \(\epsilon_{\text{mach}} = 10^{-16}\), a \(\kappa(A) \approx 10^{16}\) amplifies errors to the scale of 100% relative error
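
A small sketch of this amplification with an example matrix of our own (the bound \(\kappa(A)\,\epsilon_{\text{mach}}\) is rough; the realized error is often smaller):

A_ill = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-10]])  # nearly collinear columns
x_true = np.array([1.0, 1.0])
b_ill = A_ill @ x_true
x_hat = solve(A_ill, b_ill)
print(f"cond(A) = {cond(A_ill):.2g}")
print(f"relative error = {norm(x_hat - x_true) / norm(x_true):.2g}")
print(f"cond(A) * eps = {cond(A_ill) * np.finfo(float).eps:.2g}")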

Rules of Thumb

  • Rule of thumb for standard 64-bit floats, where \(\epsilon_{\text{mach}} \approx 10^{-16}\):
    • \(\kappa(A) \approx 1\): well-conditioned
    • \(\kappa(A) < 100\): fairly well-conditioned
    • \(\kappa(A) < 10^{5}\): moderately ill-conditioned. Take care
    • \(\kappa(A) < 10^{8}\): ill-conditioned and might introduce significant errors, especially in algorithms which reuse the same calculations repeatedly
    • \(\kappa(A) > 10^{8}\): very ill-conditioned and likely to introduce significant errors
  • Choose solution algorithms based on “numerical stability” and “conditioning” when worried
  • Problems are much more extreme with 32-bit floats, such as when using GPUs (see the sketch below)
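
A sketch of the 32-bit point with an example matrix of our own: the same moderately ill-conditioned system solved in float64 and float32.

A64 = np.array([[1.0, 1.0],
                [1.0, 1.0 + 1e-4]])
x_true = np.array([1.0, 1.0])
b64 = A64 @ x_true
A32, b32 = A64.astype(np.float32), b64.astype(np.float32)
print(f"cond(A) = {cond(A64):.2g}")
print(f"float64 error = {norm(solve(A64, b64) - x_true):.2g}")
print(f"float32 error = {norm(solve(A32, b32) - x_true):.2g}")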

Solving Linear Systems of Equations

Solving Systems with Multiple RHS

  • Inverse is nice because you can reuse the \(A^{-1}\) to solve \(A x = b\) for many \(b\)
  • However, you can do this with solve as well
  • Or can reuse LU factorizations (discussed next)
A = np.array([[0, 2], [3, 4]])
B = np.array([[2,3], [1,2]])  # [2,1] and [3,2] as columns
# or: B = np.column_stack([np.array([2, 1]),np.array([3,2])])
X = solve(A, B)  # Solve AX = B for X
print(X)
print(f"Checking: A*{X[:,0]} = {A@X[:, 0]} = {B[:,0]}, column of B")
[[-1.         -1.33333333]
 [ 1.          1.5       ]]
Checking: A*[-1.  1.] = [2. 1.] = [2 1], column of B

LU(P) Decompositions

  • We can “factor” any square \(A\) as \(P A = L U\) for lower-triangular \(L\) and upper-triangular \(U\), where the permutation matrix \(P\) comes from partial pivoting (the “P”). When no pivoting is needed this is \(A = L U\), called the LU decomposition
  • Singular matrices still factor, but then \(U\) is rank-deficient (zeros appear on its diagonal)
A = np.array([[1, 2], [2, 4]])
P, L, U = lu(A)
print(f"L*U =\n{L @ U}")
print(f"P*A =\n{P @ A}")
L*U =
[[2. 4.]
 [1. 2.]]
P*A =
[[2. 4.]
 [1. 2.]]

P, U, and L

The \(P\) matrix is a permutation matrix of “pivots”; the others are triangular

print(f"P =\n{P}")
print(f"L =\n{L}")
print(f"U =\n{U}")
P =
[[0. 1.]
 [1. 0.]]
L =
[[1.  0. ]
 [0.5 1. ]]
U =
[[2. 4.]
 [0. 0.]]

LU Decompositions and Systems of Equations

  • Pivoting is typically implied when talking about “LU”
  • Used in the default solve algorithm (without more structure)
  • Solving systems of equations with triangular matrices: for \(A x = L U x = b\)
    1. Define \(y = U x\)
    2. Solve \(L y = b\) for \(y\) (forward substitution) and \(U x = y\) for \(x\) (back substitution); see the sketch after this list
  • Since both are triangular, process is \(O(N^2)\) (but LU itself \(O(N^3)\))
  • Could be used to find inv
    • \(A = L U\), then \(A A^{-1} = L U A^{-1} = I\)
    • Solve for \(Y\) in \(L Y = I\), then solve \(U A^{-1} = Y\)
  • Tight connection to textbook Gaussian elimination (including pivoting)
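
A sketch of those two triangular solves using the factors from lu (scipy.linalg also provides lu_factor/lu_solve, which wrap the same idea); note that scipy returns \(P\) with \(A = P L U\), i.e. \(P^T A = L U\), so we apply \(P^T\) to \(b\).

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([7.0, 10.0])
P, L, U = lu(A)                                # A = P @ L @ U
y = solve_triangular(L, P.T @ b, lower=True)   # forward substitution: L y = P^T b
x = solve_triangular(U, y, lower=False)        # back substitution: U x = y
print(f"x = {x}, check with solve: {solve(A, b)}")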

LU for Non-Singular Matrices

A = np.array([[1, 2], [3, 4]])
P, L, U = lu(A)
print(f"L*U =\n{L @ U}")
print(f"P*A =\n{P @ A}")
L*U =
[[3. 4.]
 [1. 2.]]
P*A =
[[3. 4.]
 [1. 2.]]

L, U, P

print(f"P =\n{P}")
print(f"L =\n{L}")
print(f"U =\n{U}")
P =
[[0. 1.]
 [1. 0.]]
L =
[[1.         0.        ]
 [0.33333333 1.        ]]
U =
[[3.         4.        ]
 [0.         0.66666667]]

Backwards Substitution Example

\[ \begin{aligned} U x &= b\\ U &\equiv \begin{bmatrix} 3 & 1 \\ 0 & 2 \\ \end{bmatrix}, \quad b = \begin{bmatrix} 7 \\ 2 \\ \end{bmatrix} \end{aligned} \]

Solving bottom row for \(x_2\)

\[ 2 x_2 = 2,\quad x_2 = 1 \]

Move up a row, solving for \(x_1\), substituting for \(x_2\)

\[ 3 x_1 + 1 x_2 = 7,\quad 3 x_1 + 1 \times 1 = 7,\quad x_1 = 2 \]

Generalizes to many rows. For \(L\) it is “forward substitution”

Use Triangular Structure if Possible

  • Triangular matrices of size \(N\) can be solved with back substitution in \(O(N^2)\)
  • Is \(O(N^2)\) good or bad? It beats the \(O(N^3)\) typical of general methods
U = np.array([[3, 1],
              [0, 2]])
b = np.array([7,2])
solve(U,b) # works, but internally does an LU which is O(N^3)
solve_triangular(U, b, lower=False) # fast O(N^2)
array([2., 1.])

Symmetric Matrix Structure

Another common matrix type are symmetric, \(A = A^{T}\)

A = np.array([[1, 2], [2, 5]]) # also posdef, not singular
b = np.array([1,4])
# With scipy 1.11.3 check with scipy.linalg.issymmetric(A)
solve(A, b, assume_a="sym") # could also use "pos" since positive definite
array([-3.,  2.])

Positive Definite Matrices

  • A symmetric matrix \(A\) is positive definite if \(x^T A x > 0\) for all \(x \neq 0\)
  • Useful in many areas, such as covariance matrices. Example
A = np.array([[1, 2], [2, 5]])
x = np.array([0, 1]) # can't really check for all x
print(f"x^T A x = {x.T @ A @ x}")
x^T A x = 5
  • Example of a symmetric matrix that is not positive definite
A = np.array([[1, 2], [2, 0]])
print(f"x^T A x = {x.T @ A @ x}")  # one counterexample is enough
x^T A x = 0
  • We can check these with eigenvalues

Cholesky Decomposition

  • For symmetric positive definite matrices: \(L = U^T\)
  • Called a Cholesky decomposition: \(A = L L^T\) for a lower triangular matrix \(L\).
  • Equivalently, could find A = \(U^T U\) for an upper triangular matrix \(U\)
A = np.array([[1, 2], [2, 5]])
L = cholesky(A, lower=True) # scipy defaults to the upper factor (lower=False)
print(L)
print(f"L*L^T =\n{L @ L.T}")
[[1. 0.]
 [2. 1.]]
L*L^T =
[[1. 2.]
 [2. 5.]]

Solving Positive Definite Systems

A = np.array([[1, 2], [2, 5]])
b = np.array([1,4])
print(solve(A, b, assume_a="pos")) # uses cholesky internally

L = cholesky(A, lower=True)
y = solve_triangular(L, b, lower=True)
x = solve_triangular(L.T, y, lower=False)
print(x)
[-3.  2.]
[-3.  2.]

Cholesky for Covariance Matrices

  • Covariance matrices are positive-definite, semi-definite if degenerate
  • Key property of Gaussian random variables:
    • \(X \sim N(\mu, \Sigma)\) for \(\mu \in \mathbb{R}^N, \Sigma \in \mathbb{R}^{N \times N}\)
    • \(X = \mu + A Z\) for \(Z \sim N(0_N, I_N)\) where \(A A^T = \Sigma\)
  • That is, \(A\) is the Cholesky decomposition of the covariance matrix
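
A sketch of using this to sample correlated Gaussians (the \(\mu\) and \(\Sigma\) here are just illustrative):

mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])               # positive definite covariance
L_chol = cholesky(Sigma, lower=True)         # Sigma = L L^T
Z = np.random.randn(2, 100_000)              # draws from N(0, I)
X_draws = mu[:, None] + L_chol @ Z           # X = mu + L Z ~ N(mu, Sigma)
print(f"sample covariance =\n{np.cov(X_draws)}")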

Matrices as Linear Transformations

  • Recall: for \(x \in \mathbb{R}^N\) we should think of \(f(x) = A x\) for \(A\in\mathbb{R}^{M \times N}\) as a linear transformation from \(\mathbb{R}^N\) to \(\mathbb{R}^M\)
    • Definition of Linear: \(f(a x_1 + b x_2) = a f(x_1) + b f(x_2)\) for scalar \(a,b\)
  • Similarly, if \(y = f(x) = A x\) then \(f^{-1}(y) = A^{-1} y\) goes from \(\mathbb{R}^M\) back to \(\mathbb{R}^N\)
    • If the matrix is square and invertible, we can go back and forth without losing information (i.e., bijective). Otherwise we may be projected onto a lower-dimensional “manifold”.

Norms and Linear Transformations

  • The vector norm \(||x||_2\) is an important feature in many applications

    • Hence \(||f(x)||_2 = ||A x||_2\) frequently comes up in economics and data science
    • e.g. linear regression is written as minimizing a vector norm

    \[ \min_{\beta} ||y - X \beta||_2 \]

  • Matrix structure or decompositions of \(A\) help us better understand the \(f(x)\) mapping

Orthogonal Matrices

  • A square matrix \(Q\) is orthogonal if: \(Q^{-1} = Q^T\), and hence \(Q^T Q = Q Q^T = I\)
    • For orthogonal \(Q\), \(f(x) = Q x\) is interpreted as rotating \(x\) without stretching
    • \(y = f(x) = Q x\) then \(f^{-1}(y) = Q^{-1} y = Q^T y\) is rotating \(y\) back
    • Columns are orthonormal: \(Q = \begin{bmatrix}q_1 |& \ldots & | q_N\end{bmatrix}\) then
      • \(q_i \cdot q_j = 0\) for \(i \neq j\) and \(q_i \cdot q_i = 1\)
    • Rotation means the length doesn’t change: \(||Q x||_2 = ||x||_2\)
    • Transformations which preserve norms are central in many applications within data science, ML, and economics - especially in high-dimensions
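
A small sketch with a 2D rotation matrix (an example orthogonal matrix of our own):

theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation by theta
x = np.array([1.0, 2.0])
print(f"Q^T Q =\n{Q.T @ Q}")                          # identity, up to rounding
print(f"||Q x|| = {norm(Q @ x)}, ||x|| = {norm(x)}")  # length preserved
print(f"Q^T (Q x) = {Q.T @ (Q @ x)}")                 # rotating back recovers x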

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors

  • For a square \(A\), an eigenvector \(x\) and eigenvalue \(\lambda\) satisfy \[ A x = \lambda x \]

  • \(A\in\mathbb{R}^{N\times N}\) has \(N\) eigenvalue/eigenvector pairs, possible multiplicity of \(\lambda\)

  • Intuition: \(x\) is a direction \(A x \propto x\) and \(\lambda\) says how much it “stretches”

Properties of Eigenvalues and Eigenvectors

  • For any eigenvector \(x\) and nonzero scalar \(c\), \(c x\) is also an eigenvector with the same \(\lambda\)
  • Symmetric matrices have real eigenvalues and orthogonal eigenvectors, i.e. \(x_1 \cdot x_2 = 0\) for eigenvectors \(x_1 \neq x_2\) (with distinct eigenvalues). In general eigenvalues may be complex
  • A matrix is singular if and only if it has an eigenvalue of zero
  • A symmetric matrix is positive (semi)definite if and only if all eigenvalues are strictly (weakly) positive
  • A diagonal matrix has its eigenvalues on the diagonal
  • A triangular matrix also has its eigenvalues on the diagonal (see the check below)
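
A quick check of the last two properties with a triangular example of our own:

T = np.array([[2.0, 5.0, 1.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, -1.0]])   # upper triangular
print(eigvals(T))                  # the eigenvalues are the diagonal: 2, 3, -1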

Positive Definite and Eigenvalues

You cannot check \(x^T A x > 0\) for all \(x\) directly. Instead, check whether the “stretching” (every eigenvalue) is positive

A = np.array([[3, 2], [2, 1]]) # symmetric
# A_eigs = np.real(eigvals(A)) # symmetric matrices have real eigenvalues
A_eigs = eigvalsh(A) # specialized for symmetric/hermitian matrices
print(A_eigs)
is_positive_definite = np.all(A_eigs > 0)
is_positive_semi_definite = np.all(A_eigs >= 0) # or eigvals(A) >= -eps
print(f"pos-def? {is_positive_definite}")
print(f"pos-semi-def? {is_positive_semi_definite}")
[-0.23606798  4.23606798]
pos-def? False
pos-semi-def? False

Positive Semi-Definite Matrices May Have a Zero Eigenvalue

The simplest positive-semi-definite (but not posdef) matrix is

A_eigs = eigvalsh(np.array([[1, 0], [0, 0]]))
print(A_eigs)
is_positive_definite = np.all(A_eigs > 0)
is_positive_semi_definite = np.all(A_eigs >= 0) # or eigvals(A) >= -eps
print(f"pos-def? {is_positive_definite}")
print(f"pos-semi-def? {is_positive_semi_definite}")
[0. 1.]
pos-def? False
pos-semi-def? True

Eigen Decomposition

  • For a square, diagonalizable matrix \(A\) (e.g. any symmetric matrix), factor into

\[ A = Q \Lambda Q^{-1} \]

  • \(Q\) is a matrix of eigenvectors, \(\Lambda\) is a diagonal matrix of paired eigenvalues
  • For symmetric matrices, the eigenvectors are orthogonal, so \(Q^{-1} = Q^T\) (hence \(Q^T Q = I\)) and they form an orthonormal basis
  • Orthogonal matrices can be thought of as rotations without stretching
  • More general matrices all have a Singular Value Decomposition (SVD)
  • With symmetric \(A\), an interpretation of \(A x\) is that we can first rotate \(x\) into the \(Q\) basis, then stretch by \(\Lambda\), then rotate back

Eigendecompositions and Matrix Powers

  • Can be used to find \(A^t\) for large \(t\) (e.g. for Markov chains)
    • \(P^t\), i.e. \(P \cdot P \cdot \ldots \cdot P\) for \(t\) times
    • \(P = Q \Lambda Q^{-1}\) then \(P^t = Q \Lambda^t Q^{-1}\) where \(\Lambda^t\) is just the pointwise power
  • Related tools such as SVD can help with dimensionality reduction
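
A sketch with a small Markov transition matrix of our own, comparing the eigendecomposition route with repeated multiplication:

P_markov = np.array([[0.9, 0.1],
                     [0.2, 0.8]])               # rows sum to one
Lam, Q = eig(P_markov)                          # eigenvalues and eigenvectors
t = 50
P_t = np.real(Q @ np.diag(Lam**t) @ inv(Q))     # P^t = Q Lambda^t Q^{-1}
print(P_t)
print(np.linalg.matrix_power(P_markov, t))      # direct matrix powers agree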

Spectral/Eigendecomposition of Symmetric Matrix Example

A = np.array([[2, 1], [1, 3]])
Lambda, Q = eig(A)
print(f"eigenvectors are column-by-column in Q =\n{Q}")
print(f"eigenvalues are in Lambda = {Lambda}")
print(f"Q Lambda Q^T =\n{Q @ np.diag(np.real(Lambda)) @ Q.T}")
eigenvectors are column-by-column in Q =
[[-0.85065081 -0.52573111]
 [ 0.52573111 -0.85065081]]
eigenvalues are in Lambda = [1.38196601+0.j 3.61803399+0.j]
Q Lambda Q^T =
[[2. 1.]
 [1. 3.]]

Spectral Radius is Maximum Absolute Eigenvalue

  • Useful for seeing whether the iteration \(x_{t+1} = A x_t\) from an \(x_0\) explodes
  • If any \(|\lambda| > 1\) for \(\lambda \in \Lambda\), the iteration can explode; if all \(|\lambda| < 1\), it decays
  • The spectral radius of matrix \(A\) is

\[ \rho(A) = \max_{\lambda \in \Lambda}|\lambda| \]
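
A sketch computing the spectral radius for an example matrix of ours with \(\rho(A) < 1\), so the iteration shrinks:

A_iter = np.array([[0.5, 0.2],
                   [0.1, 0.4]])
rho = np.max(np.abs(eigvals(A_iter)))
print(f"rho(A) = {rho:.4f}")
x_t = np.array([1.0, 1.0])
for _ in range(50):
    x_t = A_iter @ x_t                       # x_{t+1} = A x_t
print(f"||x_50|| = {norm(x_t):.2e}")         # decays roughly like rho(A)^50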

Least Squares and the Normal Equations

Least Squares

Given a matrix \(X \in \mathbb{R}^{N \times M}\) and a vector \(y \in \mathbb{R}^N\), we want to find \(\beta \in \mathbb{R}^M\) such that \[ \begin{aligned} \min_{\beta} &||y - X \beta||_2^2, \text{ that is,}\\ \min_{\beta} &\sum_{n=1}^N (y_n - X_n \cdot \beta)^2 \end{aligned} \]

where \(X_n\) is the n’th row. Take the first-order conditions (FOCs) and rearrange to get

\[ (X^T X) \beta = X^T y \]

Solving the Normal Equations

  • The \(X\) is often referred to as the “design matrix”, and \(X^T X\) as the Gram matrix

  • Can form \(A = X^T X\) and \(b = X^T y\) and solve \(A \beta = b\).

    • Or invert \(X^T X\) to get

    \[ \beta = (X^T X)^{-1} X^T y \]

    • Note that \(X^T X\) is symmetric and, if \(X\) has full column rank, positive definite

Solving Regression Models in Practice

  • In practice, use the lstsq function in scipy
    • It uses more stable algorithms based on matrix factorizations (see next lecture on conditioning)
    • One such algorithm uses another factorization, the QR decomposition
    • There, \(X = Q R\) for \(Q\) orthogonal and \(R\) upper triangular; see the sketch below and QR Decomposition for more
  • Better yet, for applied work use higher-level libraries like statsmodels (integrates well with pandas and seaborn)
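
A sketch of the QR route on simulated data (the data-generating process here is our own): with \(X = Q R\), the normal equations reduce to the triangular system \(R \beta = Q^T y\).

N, M = 100, 3
X = np.random.randn(N, M)
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * np.random.randn(N)
Q, R = scipy.linalg.qr(X, mode="economic")   # "thin" QR: Q is N x M, R is M x M
beta_hat = solve_triangular(R, Q.T @ y, lower=False)
print(f"beta_true = {beta_true}, beta_hat = {beta_hat}")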

Example of LLS using Scipy

N, M = 100, 5
X = np.random.randn(N, M)
beta = np.random.randn(M)
y = X @ beta + 0.05 * np.random.randn(N)
beta_hat, residuals, rank, s = scipy.linalg.lstsq(X, y)
print(f"beta =\n {beta}\nbeta_hat =\n{beta_hat}")
beta =
 [-0.37740033 -0.03482823 -0.62279476  1.50312151  0.45715271]
beta_hat =
[-0.38415748 -0.03090123 -0.61707946  1.50418288  0.45413767]

Solving using the Normal Equations

Or we can solve it directly. Provide matrix structure (so it can use a Cholesky)

beta_hat = solve(X.T @ X, X.T @ y, assume_a="pos")
print(f"beta =\n {beta}\nbeta_hat =\n{beta_hat}")
beta =
 [-0.37740033 -0.03482823 -0.62279476  1.50312151  0.45715271]
beta_hat =
[-0.38415748 -0.03090123 -0.61707946  1.50418288  0.45413767]

Collinearity in “Tall” Matrices

  • Tall \(\mathbb{R}^{N\times M}\) “design matrices” have \(N > M\) and are “overdetermined”
  • A matrix has full column rank if all of its columns are linearly independent
  • You can only identify \(M\) parameters with \(M\) linearly independent columns
X = np.array([[1, 2], [2, 5], [3, 7]]) # 3 observations, 2 variables
X_col = np.array([[1, 2], [2, 4], [3, 6]]) # all proportional
print(f"rank(X) = {matrix_rank(X)}, rank(X_col) = {matrix_rank(X_col)}")
rank(X) = 2, rank(X_col) = 1

Collinearity and Estimation

  • If \(X\) is not full rank, then \(X^T X\) is not invertible. For example:
print(f"cond(X'*X)={cond(X.T@X)}, cond(X_col'*X_col)={cond(X_col.T@X_col)}")
cond(X'*X)=2819.3329786399063, cond(X_col'*X_col)=1.2999933999712892e+16
  • Note that when you start doing operations on matrices, numerical error creeps in, so you will not get an exact number
  • The rule-of-thumb with condition numbers is that if it is \(1\times 10^k\) then you lose about \(k\) digits of precision. So this effectively means it is singular
  • Given the singular matrix, this means a continuum of \(\beta\) will solve the problem

lstsq Solves it? Careful on Interpretation!

  • Since \(X_{col}^T X_{col}\) is singular, we cannot use solve(X_col.T@X_col, X_col.T@y)
  • But what about lstsq methods?
  • As you will see, this gives an answer. Interpretation is hard
  • The key is that in the case of non-full rank, you cannot identify individual parameters
    • Related to “Identification” in econometrics
    • Having low residuals is not enough
y = np.array([5.0, 10.1, 14.9])
beta_hat, residuals, rank, s = scipy.linalg.lstsq(X_col, y)
print(f"beta_hat_col = {beta_hat}")
# note: lstsq returns an empty residuals array when rank-deficient,
# so norm(residuals) prints 0.0 here rather than the true residual norm
print(f"rank={rank}, cols={X_col.shape[1]}, norm(X*beta_hat_col-y)={norm(residuals)}")
beta_hat_col = [0.99857143 1.99714286]
rank=1, cols=2, norm(X*beta_hat_col-y)=0.0

Fat Design Matrices

  • Fat \(\mathbb{R}^{N\times M}\) “design matrices” have \(N < M\) and are “underdetermined”
  • Less common in econometrics, but useful to understand the structure
  • A continuum of solutions exists: an affine subspace of \(\mathbb{R}^M\) of dimension \(M - \text{rank}(X)\)
X = np.array([[1, 2, 3], [0, 5, 7]]) # 2 rows, 3 variables
y = np.array([5, 10])
beta_hat, residuals, rank, s = scipy.linalg.lstsq(X, y)
print(f"beta_hat = {beta_hat}, rank={rank}, ? residuals = {residuals}")
beta_hat = [0.8 0.6 1. ], rank=2, ? residuals = []

Which Solution?

  • Residuals are zero here because there are enough parameters to fit perfectly (i.e., it is underdetermined)
  • Among these multiple solutions, lstsq returns the solution to

\[ \min_{\beta} ||\beta||_2^2 \text{ s.t. } X \beta = y \]

  • i.e., the “smallest” coefficients which interpolate the data exactly
  • Which trivially fulfills the OLS objective: \(\min_{\beta} ||y - X \beta||_2^2\)
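
A sketch checking this with the fat example above: adding any null-space direction of \(X\) gives another exact interpolator, but with a larger norm than the lstsq solution.

from scipy.linalg import null_space

X = np.array([[1, 2, 3], [0, 5, 7]])
y = np.array([5, 10])
beta_min, *_ = scipy.linalg.lstsq(X, y)
z = null_space(X)[:, 0]               # a direction with X z = 0
beta_alt = beta_min + 2.0 * z         # also satisfies X beta = y
print(f"X beta_min = {X @ beta_min}, X beta_alt = {X @ beta_alt}")
print(f"||beta_min|| = {norm(beta_min):.3f} <= ||beta_alt|| = {norm(beta_alt):.3f}")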

Careful Interpreting Underdetermined Solutions

  • Useful and common in ML, but be very careful when interpreting for economics
    • Tight connections to Bayesian versions of statistical tests
    • But until you understand econometrics and “identification” well, stick to full-rank matrices
    • Advanced topics: search for “Regularization”, “Ridgeless Regression” and “Benign Overfitting in Linear Regression.”