Solving Nonlinear Equations

Paul Schrimpf

2026-03-10

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \def\arg{{\mathrm{arg}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \]

Nonlinear Equations

Nonlinear Equations

  • \(F: \R^n \to \R^n\)
  • Want to solve for \(x\) \[ F(x) = 0 \]

Example: BLP

  • Share equation \[ s_{j} = \int \frac{e^{\delta_{j} + x_{j}'\Sigma \nu}} {\sum_{k = 0}^J e^{\delta_{k} + x_{k}'\Sigma \nu}} dF(\nu) \]
  • \(J\) equations to solve for \(\delta = (\delta_{1}, ..., \delta_{J})\)

Newton’s Method

  • \(F(x)\) differentiable with Jacobian \(F'(x)\)
  • Algorithm:
    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
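The algorithm above can be sketched in a few lines of Julia (the helper `newton` and the example system are illustrative, not from a library):

```julia
using LinearAlgebra

# minimal Newton iteration for F(x) = 0
function newton(F, Fp, x0; tol=1e-10, maxiter=100)
    x = copy(x0)
    for _ in 1:maxiter
        Fx = F(x)
        norm(Fx) < tol && return x
        x = x - Fp(x) \ Fx   # solve F'(x) Δ = F(x), then step x - Δ
    end
    error("Newton's method did not converge in $maxiter iterations")
end

# example: solve x.^2 .- [2, 3] = 0
F(x) = x.^2 .- [2.0, 3.0]
Fp(x) = Diagonal(2 .* x)
newton(F, Fp, [1.0, 1.0])  # ≈ [√2, √3]
```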

Simple Idea, Many Variations

  • Step size
    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - {\color{red}{\lambda}} F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
  • line search or trust region
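A simple backtracking line search halves \(\lambda\) until the residual norm actually decreases (an illustrative sketch, not how NonlinearSolve.jl implements line searches):

```julia
using LinearAlgebra

# damped Newton: shrink λ until the residual decreases
function damped_newton(F, Fp, x0; tol=1e-10, maxiter=100)
    x = copy(x0)
    for _ in 1:maxiter
        Fx = F(x)
        norm(Fx) < tol && return x
        Δ = Fp(x) \ Fx
        λ = 1.0
        while norm(F(x - λ*Δ)) ≥ norm(Fx) && λ > 1e-8
            λ /= 2   # backtrack: the full Newton step overshot
        end
        x = x - λ*Δ
    end
    error("did not converge in $maxiter iterations")
end

# F(x) = atan(x): full Newton steps diverge from x₀ = 3, damping fixes it
damped_newton(x -> atan.(x), x -> Diagonal(1 ./ (1 .+ x.^2)), [3.0])  # ≈ [0.0]
```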

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \end{align*} \] approximately solve \[ F'(x_s) A = -F(x_s) \] update \[ x_{s+1} = x_s + \lambda {\color{red}{A}} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
  • Especially if \(F'(x_s)\) is large and/or sparse
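With a large sparse Jacobian, the linear solve can be done approximately with a Krylov method such as GMRES from Krylov.jl (a sketch; the random sparse matrix below is a stand-in for \(F'(x_s)\)):

```julia
using SparseArrays, LinearAlgebra, Random, Krylov

Random.seed!(1)
n = 1000
Fp = sprandn(n, n, 0.005) + 10I   # stand-in for a sparse, diagonally dominant F'(x_s)
Fx = randn(n)                     # stand-in for the residual F(x_s)

# approximately solve F'(x_s) A = -F(x_s)
A, stats = gmres(Fp, -Fx)
# the update would then be x_{s+1} = x_s + λ*A
```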

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \({\color{red}{\Vert F(x_{s+1}) \Vert < \mathrm{rtol} \Vert F(x_0) \Vert + \mathrm{atol} }}\)
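Such a combined relative/absolute tolerance check is one line (`converged` is an illustrative helper):

```julia
using LinearAlgebra

# stop when the residual is small relative to the initial residual, or absolutely small
converged(Fx, Fx0; rtol=1e-8, atol=1e-10) = norm(Fx) < rtol*norm(Fx0) + atol

converged([1e-12, 0.0], [1.0, 1.0])  # true: residual shrank by 12 orders of magnitude
converged([1e-3, 0.0], [1.0, 1.0])   # false: only 3 orders of magnitude
```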

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion
      • compute \(F'(x_s)\) using:
        • hand written code or
        • finite differences or
        • secant updates (Broyden's method in multiple dimensions), in the scalar case \(F'(x_s) \approx \frac{F(x_s) - F(x_{s-1})}{x_s - x_{s-1}}\) or
        • automatic differentiation \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_{s+1}) \Vert \approx 0\)
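For example, ForwardDiff.jl computes the Jacobian by automatic differentiation, which we can compare against a forward finite difference (the system `F` here is a made-up example):

```julia
using ForwardDiff

F(x) = [x[1]^2 + x[2], sin(x[2])]
x0 = [1.0, 0.0]

# automatic differentiation: exact up to floating point
Jad = ForwardDiff.jacobian(F, x0)   # [2 1; 0 1]

# forward finite differences: accurate only to O(h)
h = 1e-6
Jfd = hcat(((F(x0 + h*e) - F(x0)) / h for e in ([1.0, 0.0], [0.0, 1.0]))...)
```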

Simple Idea, Many Variations

  • Kelley (2022) is a thorough reference on methods for solving nonlinear equations and their properties
  • NonlinearSolve.jl gives unified interface for many methods
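A minimal NonlinearSolve.jl call, solving \(u^2 - p = 0\) with the default method:

```julia
using NonlinearSolve

f(u, p) = u.^2 .- p              # residual: solve u² = p
prob = NonlinearProblem(f, [1.0], 2.0)
sol = solve(prob)
sol.u   # ≈ [√2]
```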

Examples

BLP share equation

@doc raw"""
    share(δ, Σ, dFν, x)

Computes shares in random coefficient logit with mean tastes `δ`, observed characteristics `x`, unobserved taste distribution `dFν`, and taste covariances `Σ`.

# Arguments

- `δ` vector of length `J`
- `Σ` `K` by `K` matrix
- `dFν` distribution of length `K` vector
- `x` `J` by `K` array

# Returns

- vector of length `J` consisting of $s_1$, ..., $s_J$
"""
function share(δ, Σ, dFν, x, ∫ = ∫cuba)
    J,K = size(x)
    (length(δ) == J) || error("length(δ)=$(length(δ)) != size(x,1)=$J")
    ((K,K) === size(Σ)) || error("size(x,2)=$K != size(Σ)=$(size(Σ))")
    function shareν(ν)
        s = δ + x*Σ*ν
        s .-= maximum(s)
        s .= exp.(s)
        s ./= sum(s)
        return(s)
    end
    return(∫(shareν, dFν))
end

using HCubature, Distributions
function ∫cuba(f, dx; rtol=1e-4)
    D = length(dx)
    # map t ∈ (-1,1)^D to x ∈ ℝ^D, with Jacobian Dx
    x(t) = t./(1 .- t.^2)
    Dx(t) = prod((1 .+ t.^2)./(1 .- t.^2).^2)
    hcubature(t->f(x(t))*pdf(dx,x(t))*Dx(t), -ones(D),ones(D), rtol=rtol)[1]
end

using SparseGrids, FastGaussQuadrature, Distributions, LinearAlgebra
function ∫sgq(f, dx::MvNormal; order=5)
    X, W = sparsegrid(length(dx), order, gausshermite, sym=true)
    L = cholesky(dx.Σ).L
    d = length(dx)
    # change of variables ν = √2 L x + μ; the Jacobian √2^d |det L| cancels the
    # normal density's normalization, leaving the Gauss-Hermite constant π^(d/2)
    wnorm = π^(d/2)
    sum(f(√2*L*x + dx.μ)*w for (x,w) in zip(X,W))/wnorm
end

∫mc(f, dx; ndraw=100) = mean(f(rand(dx)) for i in 1:ndraw)
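As a sanity check on the Monte Carlo rule, it should recover \(\Er[\nu'\nu] = K\) for a standard normal (a self-contained rerun of `∫mc` with a fixed seed; the tolerance reflects Monte Carlo error):

```julia
using Distributions, LinearAlgebra, Statistics, Random

∫mc(f, dx; ndraw=100) = mean(f(rand(dx)) for i in 1:ndraw)

Random.seed!(0)
dν = MvNormal(zeros(2), I)
∫mc(ν -> ν'ν, dν; ndraw=10_000)   # ≈ 2, since E[ν'ν] = K = 2
```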

Solving for \(\delta\)

# create a problem to solve
using LinearAlgebra
J = 3
K = 2
x = randn(J,K)
C = randn(K,K)
Σ = C'*C + I
δ = randn(J)
dFν = MvNormal(zeros(K), I)
s = share(δ, Σ, dFν, x, ∫sgq)

# try to recover δ
using NonlinearSolve, LinearAlgebra
p = (Σ=Σ, x=x, ∫=∫sgq, dFν=dFν, s=s)
function F(d, p)
    share([0, d...],p.Σ,p.dFν, p.x,p.∫) - p.s
end
d0 = zeros(length(δ)-1)
prob = NonlinearProblem(F, d0, p)
sol = solve(prob, show_trace=Val(true), trace_level=TraceAll())

δnorm = δ[2:end] .- δ[1]  # apply the δ₁ = 0 normalization used in F
println("True δ: $(δnorm)")
println("Solved δ: $(sol.u)")
println("||F(sol.u)||: $(norm(sol.resid))")
println("Error: $(norm(sol.u - δnorm))")

Algorithm: NewtonRaphson(
    descent = NewtonDescent(),
    autodiff = AutoForwardDiff(),
    vjp_autodiff = AutoFiniteDiff(
        fdtype = Val{:forward}(),
        fdjtype = Val{:forward}(),
        fdhtype = Val{:hcentral}(),
        dir = true
    ),
    jvp_autodiff = AutoForwardDiff(),
    concrete_jac = Val{false}()
)

----        -------------           -----------             -------             
Iter        f(u) inf-norm           Step 2-norm             cond(J)             
----        -------------           -----------             -------             
0           1.52359564e-01          0.00000000e+00          2.07468713e+00      
1           1.65661499e-02          4.95751011e-01          -1.00000000e+00     
2           3.89855724e-04          6.41510368e-02          -1.00000000e+00     
3           2.39515391e-07          1.71979494e-03          -1.00000000e+00     
4           9.12880882e-14          9.92093278e-07          -1.00000000e+00     
Final       9.12880882e-14      
----------------------      
True δ: [-0.027039049695910644, -0.5599894534029164]
Solved δ: [-0.02703904969589968, -0.5599894534025214]
||F(sol.u)||: 1.1197942522895471e-13
Error: 3.9516946465613327e-13

Alternative algorithms

solr = solve(prob, RobustMultiNewton(), show_trace=Val(true), trace_level=TraceAll())
Algorithm: TrustRegion(
    trustregion = GenericTrustRegionScheme(
        method = __Simple(),
        step_threshold = 1//10000,
        shrink_threshold = 1//4,
        shrink_factor = 1//4,
        expand_factor = 2//1,
        expand_threshold = 3//4,
        max_trust_radius = 0//1,
        initial_trust_radius = 0//1
    ),
    descent = Dogleg(
        newton_descent = NewtonDescent(),
        steepest_descent = SteepestDescent()
    ),
    max_shrink_times = 32,
    autodiff = AutoForwardDiff(),
    vjp_autodiff = AutoFiniteDiff(
        fdtype = Val{:forward}(),
        fdjtype = Val{:forward}(),
        fdhtype = Val{:hcentral}(),
        dir = true
    ),
    jvp_autodiff = AutoForwardDiff(),
    concrete_jac = Val{false}()
)

----        -------------           -----------             -------             
Iter        f(u) inf-norm           Step 2-norm             cond(J)             
----        -------------           -----------             -------             
0           1.52359564e-01          0.00000000e+00          2.07468713e+00      
1           1.46330690e-01          1.69855287e-02          -1.00000000e+00     
2           1.34406641e-01          3.39710573e-02          -1.00000000e+00     
3           1.11128094e-01          6.79421147e-02          -1.00000000e+00     
4           6.71535256e-02          1.35884229e-01          -1.00000000e+00     
5           4.01948151e-02          1.86840815e-01          -1.00000000e+00     
6           6.94466492e-03          1.86840815e-01          -1.00000000e+00     
7           7.30014718e-05          2.88929665e-02          -1.00000000e+00     
8           8.45273683e-09          3.09915597e-04          -1.00000000e+00     
9           1.52655666e-16          3.59166821e-08          -1.00000000e+00     
Final       1.52655666e-16      
----------------------      
retcode: Success
u: 2-element Vector{Float64}:
 -0.027039049695910745
 -0.5599894534029162
using FixedPointAcceleration
import NaNMath
function G(δ,p)
    d = NaNMath.log.(p.s) - NaNMath.log.(share([0, δ...], p.Σ, p.dFν, p.x, p.∫))
    return(d[2:end])
end
probfp = NonlinearProblem(G, d0, p)
solfp = solve(probfp,
              FixedPointAccelerationJL(algorithm=:Anderson),
              show_trace=Val(true), trace_level=TraceAll() )
                                          Algorithm: Anderson. Iteration:     1. Convergence:   0.8976354766. Time: 2026-03-11T12:26:40.237
                           Used:  0 lags. Algorithm: Anderson. Iteration:     2. Convergence:    1.709150011. Time: 2026-03-11T12:26:41.828
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:     3. Convergence:   0.3675223765. Time: 2026-03-11T12:26:41.837
Condition number is  38.0. Used:  3 lags. Algorithm: Anderson. Iteration:     4. Convergence:   0.2798232133. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:     5. Convergence:  0.01167858532. Time: 2026-03-11T12:26:41.837
Condition number is  32.0. Used:  3 lags. Algorithm: Anderson. Iteration:     6. Convergence:  0.01434927187. Time: 2026-03-11T12:26:41.837
Condition number is  27.0. Used:  3 lags. Algorithm: Anderson. Iteration:     7. Convergence: 0.001734941312. Time: 2026-03-11T12:26:41.837
Condition number is   2.2. Used:  3 lags. Algorithm: Anderson. Iteration:     8. Convergence: 1.010411201e-5. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:     9. Convergence: 5.621106913e-8. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:    10. Convergence: 1.142894046e-8. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:    11. Convergence: 3.238480595e-10. Time: 2026-03-11T12:26:41.837
Condition number is 190.0. Used:  3 lags. Algorithm: Anderson. Iteration:    12. Convergence: 5.551115123e-16. Time: 2026-03-11T12:26:41.837
retcode: Success
u: 2-element Vector{Float64}:
 -0.02703904969591097
 -0.5599894534029165

Choosing Algorithm

  • Best algorithm depends on problem and goal
  • Can be tradeoff between fast performance in typical case versus robustly solve all cases
  • NonlinearSolve.jl’s recommended methods are usually a good choice
  • Fastest algorithm will depend on problem size, implementation of equations, etc
  • Benchmark if speed is important

Benchmarking

using BenchmarkTools
@benchmark solve(probfp)
BenchmarkTools.Trial: 8821 samples with 1 evaluation per sample.
 Range (min … max):  361.404 μs … 17.326 ms   GC (min … max):  0.00% … 95.65%
 Time  (median):     480.940 μs                GC (median):     0.00%
 Time  (mean ± σ):   564.624 μs ± 732.676 μs   GC (mean ± σ):  12.76% ±  9.39%

  █ ▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  361 μs           Histogram: frequency by time         6.83 ms <

 Memory estimate: 1.21 MiB, allocs estimate: 26289.
@benchmark solve(probfp, FixedPointAccelerationJL())
BenchmarkTools.Trial: 9276 samples with 1 evaluation per sample.
 Range (min … max):  340.043 μs … 14.487 ms   GC (min … max):  0.00% … 94.37%
 Time  (median):     463.587 μs                GC (median):     0.00%
 Time  (mean ± σ):   538.219 μs ± 733.351 μs   GC (mean ± σ):  14.45% ±  9.99%

  █ ▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂ ▂
  340 μs           Histogram: frequency by time         6.77 ms <

 Memory estimate: 1.23 MiB, allocs estimate: 23815.
@benchmark solve(probfp, RobustMultiNewton())
BenchmarkTools.Trial: 7943 samples with 1 evaluation per sample.
 Range (min … max):  409.545 μs … 18.648 ms   GC (min … max):  0.00% … 95.27%
 Time  (median):     522.528 μs                GC (median):     0.00%
 Time  (mean ± σ):   628.151 μs ± 812.485 μs   GC (mean ± σ):  13.12% ±  9.60%

  █ ▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂ ▂
  410 μs           Histogram: frequency by time         7.48 ms <

 Memory estimate: 1.36 MiB, allocs estimate: 29609.

References

Kelley, C. T. 2022. Solving Nonlinear Equations with Iterative Methods: Solvers and Examples in Julia. Philadelphia, PA: Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611977271.