Foundations of Numerical Linear Algebra
Data science, econometrics, and macroeconomics are built on linear algebra.
Even a one-line Stata command such as reg y x, robust runs a numerical least-squares solve under the hood.
Numerical linear algebra has all sorts of pitfalls, which become more critical as we scale up to larger problems.
Choosing a better algorithm can make a speed difference of orders of magnitude.
Crucial to know what goes on under the hood in Stata/R/Python packages for applied work, even if you don't implement it yourself.
Material here is related to
This section uses the following packages:
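At a minimum, the code below assumes numpy and scipy.linalg, with the main linear-algebra routines imported into the top-level namespace, e.g.:

import numpy as np
import scipy.linalg
from numpy.linalg import (norm, det, inv, cond, matrix_rank,
                          solve, eig, eigvals, eigvalsh)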
Big-O Notation
For functions \(f(N)\) and \(g(N)\), we say \(f(N)\) is \(O(g(N))\) if there exist positive constants \(C\) and \(N_0\) such that:
\[ 0 \leq f(N) \leq C \cdot g(N) \quad \text{for all } N \geq N_0 \]
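For example, \(f(N) = 3N^2 + 5N\) is \(O(N^2)\): taking \(C = 4\) and \(N_0 = 5\),
\[ 0 \leq 3N^2 + 5N \leq 4 N^2 \quad \text{for all } N \geq 5 \]
This is how costs are summarized below, e.g. a matrix-vector product is \(O(N^2)\), while dense matrix-matrix multiplication and LU factorization are \(O(N^3)\).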
Machine Epsilon
For a given datatype, \(\epsilon\) is defined as \(\epsilon = \min_{\delta > 0} \left\{ \delta : 1 + \delta > 1 \right\}\), where the addition and comparison are carried out in that datatype's floating-point arithmetic
print(f"machine epsilon for float64 = {np.finfo(float).eps}")
print(f"1 + eps/2 == 1? {1.0 + 1.1e-16 == 1.0}")
print(f"machine epsilon for float32 = {np.finfo(np.float32).eps}")
machine epsilon for float64 = 2.220446049250313e-16
1 + eps/2 == 1? True
machine epsilon for float32 = 1.1920928955078125e-07
x = np.array([1, 2, 3]) # Calculating different ways (in order of preference)
print(np.sqrt(sum(xval**2 for xval in x))) # manual with comprehensions
print(np.sqrt(np.sum(np.square(x)))) # broadcasts
print(norm(x)) # built-in to numpy norm(x, ord=2) alternatively
print(f"||x||_2^2 = {norm(x)**2} = {x.T @ x} = {np.dot(x, x)}")
3.7416573867739413
3.7416573867739413
3.7416573867739413
||x||_2^2 = 14.0 = 14 = 14
Use solve rather than replacing it with a calculation of an inverse. We can think of solving a system as finding the linear combination of columns of \(A\) that equals \(b\)
Hence, we can solve \(A x = b\) for any \(b \in \mathbb{R}^2\), since the column space is all of \(\mathbb{R}^2\)
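For instance, with an (arbitrarily chosen, full-rank) matrix, solve finds that linear combination directly:

A_full = np.array([[1.0, 2.0], [2.0, 5.0]])  # linearly independent columns
b = np.array([1.0, 0.0])
x = solve(A_full, b)                         # preferred to inv(A_full) @ b
print(x)
print(A_full @ x)                            # the weighted columns of A_full reproduce b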
On the other hand, note that for \(A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\) the second column is proportional to the first, so we can only solve \(A x = b\) for \(b \propto \begin{bmatrix} 1 \\ 2 \end{bmatrix} \propto \begin{bmatrix} 2 \\ 4 \end{bmatrix}\)
A = np.array([[1, 2], [2, 4]])
# An (expensive) way to check if A is singular is if det(A) = 0
print(det(A) == 0.0)
print(matrix_rank(A) != A.shape[0]) # or check rank
# Check before inverting or use exceptions
try:
inv(A)
print("Matrix is not singular (invertible).")
except np.linalg.LinAlgError:
print("Matrix is singular (non-invertible).")
True
True
Matrix is singular (non-invertible).
eps, K = 1e-8, 100000
A = np.array([[1, 2], [1 + eps, 2 + eps]])
print(f"det(A)={det(A):.5g}, det(K*A)={det(K*A):.5g}")
print(f"cond(A)={cond(A):.5g}, cond(K*A)={cond(K*A):.5g},")
print(f"det(inv(A))={det(inv(A)):.5g}, cond(inv(A))={cond(inv(A)):.5g}")
det(A)=-1e-08, det(K*A)=-100
cond(A)=1e+09, cond(K*A)=1e+09,
det(inv(A))=-1e+08, cond(inv(A))=1e+09
The solve function handles a matrix right-hand side as well
A = np.array([[0, 2], [3, 4]])
B = np.array([[2,3], [1,2]]) # [2,1] and [3,2] as columns
# or: B = np.column_stack([np.array([2, 1]),np.array([3,2])])
X = solve(A, B) # Solve AX = B for X
print(X)
print(f"Checking: A*{X[:,0]} = {A@X[:, 0]} = {B[:,0]}, column of B")
[[-1. -1.33333333]
[ 1. 1.5 ]]
Checking: A*[-1. 1.] = [2. 1.] = [2 1], column of B
In the pivoted LU factorization \(A = P L U\), the \(P\) matrix is a permutation matrix of “pivots”; the other factors (\(L\) and \(U\)) are triangular
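A minimal sketch of inspecting those factors with scipy.linalg.lu:

from scipy.linalg import lu
A = np.array([[0.0, 2.0], [3.0, 4.0]])
P, L, U = lu(A)  # A = P @ L @ U with partial pivoting
print(P)         # permutation matrix recording the row pivots
print(L)         # unit lower-triangular factor
print(U)         # upper-triangular factor
print(np.allclose(A, P @ L @ U))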
Triangular systems are easy to solve by substitution; this is the workhorse of the solve algorithm (without more structure), which never needs to form inv. For example, consider
\[ \begin{aligned} U x &= b\\ U &\equiv \begin{bmatrix} 3 & 1 \\ 0 & 2 \\ \end{bmatrix}, \quad b = \begin{bmatrix} 7 \\ 2 \\ \end{bmatrix} \end{aligned} \]
Solving bottom row for \(x_2\)
\[ 2 x_2 = 2,\quad x_2 = 1 \]
Move up a row, solving for \(x_1\), substituting for \(x_2\)
\[ 3 x_1 + 1 x_2 = 7,\quad 3 x_1 + 1 \times 1 = 7,\quad x_1 = 2 \]
This generalizes to any number of rows (backward substitution). For a lower-triangular \(L\), the analogue is “forward substitution”, working from the top row down
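A small sketch of the backward-substitution steps above, checked against scipy.linalg.solve_triangular:

U = np.array([[3.0, 1.0], [0.0, 2.0]])
b = np.array([7.0, 2.0])
x = np.zeros(2)
x[1] = b[1] / U[1, 1]                       # 2 x_2 = 2  =>  x_2 = 1
x[0] = (b[0] - U[0, 1] * x[1]) / U[0, 0]    # 3 x_1 + x_2 = 7  =>  x_1 = 2
print(x)

from scipy.linalg import solve_triangular
print(solve_triangular(U, b, lower=False))  # same answer; O(N^2) for an N x N triangle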
Another common matrix type is the symmetric matrix, \(A = A^{T}\). A symmetric \(A\) is positive definite if \(x^T A x > 0\) for all \(x \neq 0\)
A = np.array([[1, 2], [2, 5]])
x = np.array([0, 1]) # can't really check for all x
print(f"x^T A x = {x.T @ A @ x}")
x^T A x = 5
The vector norm \(||x||_2\) shows up in many applications, e.g. least squares
\[ \min_{\beta} ||y - X \beta||_2 \]
Matrix structure or decompositions of \(A\) help us better understand the mapping \(x \mapsto A x\)
For a square \(A\), an eigenvector \(x\) and eigenvalue \(\lambda\) satisfy \[ A x = \lambda x \]
A matrix \(A\in\mathbb{R}^{N\times N}\) has \(N\) eigenvalues (counting multiplicity), each with an associated eigenvector
Intuition: \(x\) is a direction \(A x \propto x\) and \(\lambda\) says how much it “stretches”
You cannot check \(x^T A x > 0\) for every \(x\) directly; instead, check whether all of the “stretching” factors (the eigenvalues) are positive
A = np.array([[3, 1], [2, 1]])
# A_eigs = np.real(eigvals(A)) # symmetric matrices have real eigenvalues
# note: eigvalsh assumes symmetry and reads only the lower triangle, so this
# non-symmetric A is implicitly treated as [[3, 2], [2, 1]]
A_eigs = eigvalsh(A) # specialized for symmetric/hermitian matrices
print(A_eigs)
is_positive_definite = np.all(A_eigs > 0)
is_positive_semi_definite = np.all(A_eigs >= 0) # or eigvals(A) >= -eps
print(f"pos-def? {is_positive_definite}")
print(f"pos-semi-def? {is_positive_semi_definite}")
[-0.23606798 4.23606798]
pos-def? False
pos-semi-def? False
The simplest positive-semi-definite (but not positive-definite) matrix is \(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\)
A_eigs = eigvalsh(np.array([[1, 0], [0, 0]]))
print(A_eigs)
is_positive_definite = np.all(A_eigs > 0)
is_positive_semi_definite = np.all(A_eigs >= 0) # or eigvals(A) >= -eps
print(f"pos-def? {is_positive_definite}")
print(f"pos-semi-def? {is_positive_semi_definite}")
[0. 1.]
pos-def? False
pos-semi-def? True
A diagonalizable \(A\) has the eigendecomposition
\[ A = Q \Lambda Q^{-1} \]
where the columns of \(Q\) are the eigenvectors and \(\Lambda\) is diagonal with the eigenvalues. For symmetric \(A\), \(Q\) can be chosen orthonormal, so \(Q^{-1} = Q^T\)
A = np.array([[2, 1], [1, 3]])
Lambda, Q = eig(A)
print(f"eigenvectors are column-by-column in Q =\n{Q}")
print(f"eigenvalues are in Lambda = {Lambda}")
print(f"Q Lambda Q^T =\n{Q @ np.diag(np.real(Lambda)) @ Q.T}")
eigenvectors are column-by-column in Q =
[[-0.85065081 -0.52573111]
[ 0.52573111 -0.85065081]]
eigenvalues are in Lambda = [1.38196601+0.j 3.61803399+0.j]
Q Lambda Q^T =
[[2. 1.]
[1. 3.]]
The spectral radius of \(A\) is
\[ \rho(A) = \max_{\lambda \in \Lambda}|\lambda| \]
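For example, it can be computed directly from the eigenvalues:

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.max(np.abs(eigvals(A))))  # spectral radius: largest absolute eigenvalue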
Given a matrix \(X \in \mathbb{R}^{N \times M}\) and a vector \(y \in \mathbb{R}^N\), we want to find \(\beta \in \mathbb{R}^M\) such that \[ \begin{aligned} \min_{\beta}\; &||y - X \beta||_2^2, \text{ or equivalently (up to the constant } \tfrac{1}{N} \text{),}\\ \min_{\beta}\; &\frac{1}{N}\sum_{n=1}^N (y_n - X_n \cdot \beta)^2 \end{aligned} \]
where \(X_n\) is the \(n\)'th row of \(X\). Take the FOCs and rearrange to get the normal equations
\[ (X^T X) \beta = X^T y \]
The matrix \(X\) is often referred to as the “design matrix” and \(X^T X\) as the Gram matrix
Can form \(A = X^T X\) and \(b = X^T y\) and solve \(A \beta = b\).
\[ \beta = (X^T X)^{-1} X^T y \]
In practice, use the lstsq function in scipy, or statsmodels (integrates well with pandas and seaborn)
N, M = 100, 5
X = np.random.randn(N, M)
beta = np.random.randn(M)
y = X @ beta + 0.05 * np.random.randn(N)
beta_hat, residuals, rank, s = scipy.linalg.lstsq(X, y)
print(f"beta =\n {beta}\nbeta_hat =\n{beta_hat}")
beta =
[-0.37740033 -0.03482823 -0.62279476 1.50312151 0.45715271]
beta_hat =
[-0.38415748 -0.03090123 -0.61707946 1.50418288 0.45413767]
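If you also want standard errors and diagnostics, the same regression can be run through statsmodels; a minimal sketch (fit without a constant, matching how X and y were generated above):

import statsmodels.api as sm
ols_res = sm.OLS(y, X).fit()  # add a constant with sm.add_constant(X) when needed
print(ols_res.params)         # point estimates, matching beta_hat from lstsq
print(ols_res.bse)            # standard errors come along for free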
Or we can solve it directly. Provide matrix structure (so it can use a Cholesky)
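A sketch of that direct approach, using assume_a="pos" so scipy.linalg.solve can exploit a Cholesky factorization; here X_col is assumed to be X with one column duplicated, to show how the Gram matrix's conditioning deteriorates under collinearity:

beta_direct = scipy.linalg.solve(X.T @ X, X.T @ y, assume_a="pos")  # Cholesky-based solve

X_col = np.column_stack([X, X[:, 0]])  # duplicated column => X_col is rank-deficient
print(f"cond(X'*X)={cond(X.T @ X)}, cond(X_col'*X_col)={cond(X_col.T @ X_col)}")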
cond(X'*X)=2819.3329786399063, cond(X_col'*X_col)=1.2999933999712892e+16
lstsq solves it? Careful on interpretation! Solving the normal equations with solve(X_col.T @ X_col, X_col.T @ y) is hopeless once the Gram matrix is numerically singular, yet lstsq (whose default method is SVD-based) still returns an answer. What is lstsq giving in the rank-deficient case? It is giving the minimum-norm solution
\[ \min_{\beta} ||\beta||_2^2 \text{ s.t. } X \beta = y \]
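A quick check of that behavior, assuming the rank-deficient X_col sketched above:

beta_col, _, rank_col, _ = scipy.linalg.lstsq(X_col, y)
print(f"lstsq reports rank {rank_col} for {X_col.shape[1]} columns")
print(beta_col)  # the duplicated columns share the coefficient, minimizing ||beta||_2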