Solving Nonlinear Equations

Paul Schrimpf

2024-09-23

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \def\arg{{\mathrm{arg}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \]

Nonlinear Equations

Nonlinear Equations

  • \(F: \R^n \to \R^n\)
  • Want to solve for \(x\) \[ F(x) = 0 \]

Example: BLP

  • Share equation \[ s_{j} = \int \frac{e^{\delta_{j} + x_{j}'\Sigma \nu}} {\sum_{k = 0}^J e^{\delta_{k} + x_{k}'\Sigma \nu}} dF(\nu) \]
  • \(J\) equations to solve for \(\delta = (\delta_{1}, ..., \delta_{J})\)

Newton’s Method

  • \(F(x)\) differentiable with Jacobian \(F'(x)\)
  • Algorithm:
    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
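
A minimal sketch of the iteration above in Julia (the function `newton` and its tolerances are illustrative, not from any library):

using LinearAlgebra

# basic Newton iteration: solve F'(x)Δ = F(x), step to x - Δ
function newton(F, Fp, x0; tol=1e-10, maxiter=100)
  x = copy(x0)
  for _ in 1:maxiter
    Fx = F(x)
    norm(Fx) < tol && return x
    x = x - Fp(x) \ Fx
  end
  error("no convergence after $maxiter iterations")
end

# example: x² - 2 = 0 from x₀ = 1 converges to √2
newton(x->[x[1]^2 - 2], x->fill(2x[1], 1, 1), [1.0])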

Simple Idea, Many Variations

  • Step size
    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - {\color{red}{\lambda}} F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
  • choose \(\lambda\) by a line search or trust region, e.g. the backtracking sketch below
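
A sketch of the damped update with a naive backtracking line search, halving \(\lambda\) until the residual decreases (the halving rule and cutoff are illustrative):

using LinearAlgebra

function newton_damped(F, Fp, x0; tol=1e-10, maxiter=100)
  x = copy(x0)
  for _ in 1:maxiter
    Fx = F(x)
    norm(Fx) < tol && return x
    Δ = Fp(x) \ Fx
    λ = 1.0
    # shrink the step until it improves the residual
    while norm(F(x - λ*Δ)) >= norm(Fx) && λ > 1e-8
      λ /= 2
    end
    x = x - λ*Δ
  end
  error("no convergence after $maxiter iterations")
end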

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \end{align*} \] approximately solve \[ F'(x_s) A = F(x_s) \] update \[ x_{s+1} = x_s - \lambda {\color{red}{A}} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
  • Especially useful if \(F'(x_s)\) is large and/or sparse; see the Krylov sketch below
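
A sketch replacing the factorization with a Krylov method, assuming `gmres` from IterativeSolvers.jl for the approximate inner solve:

using LinearAlgebra, IterativeSolvers

function newton_krylov(F, Fp, x0; tol=1e-8, maxiter=100, λ=1.0)
  x = copy(x0)
  for _ in 1:maxiter
    Fx = F(x)
    norm(Fx) < tol && return x
    A = gmres(Fp(x), Fx)  # approximately solve F'(x) A = F(x)
    x = x - λ*A
  end
  error("no convergence after $maxiter iterations")
end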

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \({\color{red}{\Vert F(x_{s+1}) \Vert < \mathrm{rtol} \, \Vert F(x_0) \Vert + \mathrm{atol} }}\)
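
In code, the combined relative and absolute tolerance test is a single line (the helper name is illustrative):

using LinearAlgebra

converged(Fx, F0; rtol=1e-8, atol=1e-10) = norm(Fx) < rtol*norm(F0) + atol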

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion
      • compute \(F'(x_s)\) using:
        • hand written code or
        • finite differences or
        • secant method: a matrix \(B_s \approx F'(x_s)\) satisfying \(B_s (x_s - x_{s-1}) = F(x_s) - F(x_{s-1})\) (Broyden’s update in \(\R^n\)) or
        • automatic differentiation
      then apply the usual update \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_{s+1}) \Vert \approx 0\)
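
For example, an automatic differentiation Jacobian from ForwardDiff.jl against a hand-rolled one-sided finite difference (`Fex` and `fdjac` are illustrative, not library functions):

using ForwardDiff, LinearAlgebra

Fex(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]]

function fdjac(F, x; h=1e-6)
  J = zeros(length(F(x)), length(x))
  for j in eachindex(x)
    e = zeros(length(x)); e[j] = h
    J[:, j] = (F(x + e) - F(x))/h
  end
  J
end

x0 = [0.5, 0.5]
norm(ForwardDiff.jacobian(Fex, x0) - fdjac(Fex, x0))  # small, O(h)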

Simple Idea, Many Variations

  • Kelley (2022) is a thorough reference for nonlinear equation solving methods and their properties
  • NonlinearSolve.jl provides a unified interface to many methods, as in the example below
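
A minimal example of the interface: define the problem once, then swap algorithms.

using NonlinearSolve

f(u, p) = u.^2 .- p            # solve u² = 2
prob = NonlinearProblem(f, [1.0], 2.0)
solve(prob, NewtonRaphson())   # ≈ [1.41421...]
solve(prob, TrustRegion())     # same root, different algorithm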

Examples

BLP share equation

@doc raw"""
    share(δ, Σ, dFν, x)

Computes shares in random coefficient logit with mean tastes `δ`, observed characteristics `x`, unobserved taste distribution `dFν`, and taste covariances `Σ`.

# Arguments

- `δ` vector of length `J`
- `Σ` `K` by `K` matrix
- `dFν` distribution of length `K` vector
- `x` `J` by `K` array

# Returns

- vector of length `J` consisting of $s_1$, ..., $s_J$
"""
function share(δ, Σ, dFν, x, ∫ = ∫cuba)
  J,K = size(x)
  (length(δ) == J) || error("length(δ)=$(length(δ)) != size(x,1)=$J")
  (K,K) === size(Σ) || error("size(x,2)=$K != size(Σ)=$(size(Σ))")
  function shareν(ν)
    s = δ + x*Σ*ν
    s .-= maximum(s)  # subtract the max before exponentiating for numerical stability
    s .= exp.(s)
    s ./= sum(s)
    return(s)
  end
  return(∫(shareν, dFν))
end

using HCubature, Distributions
function ∫cuba(f, dx; rtol=1e-4)
  D = length(dx)
  # change of variables mapping (-1,1)^D to R^D
  x(t) = t./(1 .- t.^2)
  Dx(t) = prod((1 .+ t.^2)./(1 .- t.^2).^2)  # Jacobian determinant of the map
  hcubature(t->f(x(t))*pdf(dx, x(t))*Dx(t), -ones(D), ones(D), rtol=rtol)[1]
end

using SparseGrids, FastGaussQuadrature, Distributions, LinearAlgebra
function ∫sgq(f, dx::MvNormal; order=5)
  X, W = sparsegrid(length(dx), order, gausshermite, sym=true)
  L = cholesky(dx.Σ).L
  # Gauss-Hermite: E[f(X)] ≈ π^(-K/2) Σᵢ wᵢ f(√2 L xᵢ + μ)
  sum(f(√2*L*x + dx.μ)*w for (x,w) in zip(X,W))/(π^(length(dx)/2))
end
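
As a quick sanity check, integrating the constant function 1 against the density should return approximately 1 with either routine:

using Distributions, LinearAlgebra
dν = MvNormal(zeros(2), I)
∫sgq(ν->1.0, dν)    # ≈ 1.0
∫cuba(ν->1.0, dν)   # ≈ 1.0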

Solving for \(\delta\)

# create a problem to solve
using LinearAlgebra
J = 3
K = 2
x = randn(J,K)
C = randn(K,K)
Σ = C'*C + I
δ = randn(J)
dFν = MvNormal(zeros(K), I)
s = share(δ, Σ, dFν, x, ∫sgq)

# try to recover δ
using NonlinearSolve, LinearAlgebra
p = (Σ=Σ, x=x)
F(d, p) = share([0, d...], p.Σ, dFν, p.x, ∫sgq) - s
d0 = zeros(length(δ)-1)
prob = NonlinearProblem(F, d0, p)
sol = solve(prob, show_trace=Val(true), trace_level=TraceAll())

δ = δ[2:end] .- δ[1]  # with δ₁ normalized to 0, only differences from good 1 are identified
println("True δ: $(δ)")
println("Solved δ: $(sol.u)")
println("||F(sol.u)||: $(norm(sol.resid))")
println("Error: $(norm(sol.u - δ))")

Algorithm: NewtonRaphson(
   descent = NewtonDescent()
)

----     -------------        -----------          -------             
Iter     f(u) inf-norm        Step 2-norm          cond(J)             
----     -------------        -----------          -------             
0        3.54967143e-01       5.92878775e-322      Inf                 
1        4.13880262e-02       7.72383583e-01       2.73588247e+00      
2        7.64865333e-03       4.31612562e-01       8.06142988e+00      
3        1.14967474e-03       1.14838074e-01       1.42242944e+01      
4        1.65278023e-04       4.87198810e-02       2.44051669e+01      
5        5.85421573e-06       9.24852841e-03       3.32829238e+01      
6        8.40818404e-09       3.51367336e-04       3.57326286e+01      
7        1.75137682e-14       5.05626714e-07       3.58325861e+01      
Final    1.75137682e-14      
----------------------      
True δ: [-0.7263649359693549, 0.4686811504051215]
Solved δ: [-0.7263649359683523, 0.46868115040542213]
||F(sol.u)||: 2.2115203043548142e-14
Error: 1.0467479481157482e-12

Alternative algorithms

solr = solve(prob, RobustMultiNewton(), show_trace=Val(true), trace_level=TraceAll())

Algorithm: TrustRegion(
   trustregion = GenericTrustRegionScheme(method = RadiusUpdateSchemes.Simple),
   descent = Dogleg(newton_descent = NewtonDescent(), steepest_descent = SteepestDescent())
)

----     -------------        -----------          -------             
Iter     f(u) inf-norm        Step 2-norm          cond(J)             
----     -------------        -----------          -------             
0        3.54967143e-01       6.54784263e-310      Inf                 
1        3.36075646e-01       3.98826683e-02       2.73588247e+00      
2        2.97527757e-01       7.97653366e-02       2.84184080e+00      
3        2.17869604e-01       1.59530673e-01       3.08804897e+00      
4        5.25891857e-02       3.19061346e-01       3.75535293e+00      
5        6.77935485e-03       2.22938570e-01       6.56376700e+00      
6        1.72409426e-03       1.54193501e-01       1.24526403e+01      
7        2.80691101e-04       6.27258034e-02       2.18669326e+01      
8        1.55370564e-05       1.50461351e-02       3.17764284e+01      
9        5.87411523e-08       9.28602780e-04       3.55687426e+01      
10       8.49598170e-13       3.53247281e-06       3.58317226e+01      
11       1.11022302e-16       5.10928003e-11       3.58327304e+01      
Final    1.11022302e-16      
----------------------      
retcode: Success
u: 2-element Vector{Float64}:
 -0.7263649359693556
  0.468681150405121

using FixedPointAcceleration
G(δ, p) = log.(s) - log.(share([0, δ...], Σ, dFν, x, ∫sgq))
probfp = NonlinearProblem(G, d0, nothing)
solfp = solve(probfp, FixedPointAccelerationJL(algorithm=:Anderson, m=1))
retcode: Failure
u: 2-element Vector{Float64}:
 0.0
 0.0
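
For comparison, the fixed point that `G` encodes is the classic BLP contraction \(\delta \leftarrow \delta + \log s - \log s(\delta)\). A sketch of the plain, unaccelerated iteration using the functions defined above, with good 1 playing the role of the normalized option (`blp_contraction` is illustrative; it typically converges linearly, more slowly than Newton):

using LinearAlgebra

function blp_contraction(s, Σ, dFν, x; tol=1e-12, maxiter=1000)
  d = zeros(length(s) - 1)  # δ₁ normalized to 0; iterate on δ₂, …, δ_J
  for _ in 1:maxiter
    step = log.(s[2:end]) - log.(share([0, d...], Σ, dFν, x, ∫sgq)[2:end])
    d += step
    norm(step) < tol && return d
  end
  error("fixed point iteration did not converge in $maxiter iterations")
end

blp_contraction(s, Σ, dFν, x)  # should be close to sol.u from Newton above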

References

Kelley, C. T. 2022. Solving Nonlinear Equations with Iterative Methods: Solvers and Examples in Julia. Philadelphia, PA: Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611977271.