Solving Nonlinear Equations

Paul Schrimpf

2026-03-10

\[ \def\Er{{\mathrm{E}}} \def\En{{\mathbb{E}_n}} \def\cov{{\mathrm{Cov}}} \def\var{{\mathrm{Var}}} \def\R{{\mathbb{R}}} \def\arg{{\mathrm{arg}}} \newcommand\norm[1]{\left\lVert#1\right\rVert} \def\rank{{\mathrm{rank}}} \newcommand{\inpr}{ \overset{p^*_{\scriptscriptstyle n}}{\longrightarrow}} \def\inprob{{\,{\buildrel p \over \rightarrow}\,}} \def\indist{\,{\buildrel d \over \rightarrow}\,} \DeclareMathOperator*{\plim}{plim} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \]

Nonlinear Equations

Nonlinear Equations

  • \(F: \R^n \to \R^n\)
  • Want to solve for \(x\) \[ F(x) = 0 \]

Example: BLP

  • Share equation \[ s_{j} = \int \frac{e^{\delta_{j} + x_{j}'\Sigma \nu}} {\sum_{k = 0}^J e^{\delta_{k} + x_{k}'\Sigma \nu}} dF(\nu) \]
  • \(J\) equations to solve for \(\delta = (\delta_{1}, ..., \delta_{J})\)

Newton’s Method

  • \(F(x)\) differentiable with Jacobian \(F'(x)\)
  • Algorithm:
    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
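The algorithm above can be sketched in a few lines of Julia (the helper `newton` and the example system are illustrative, not from a library):

```julia
using LinearAlgebra

# minimal Newton iteration for F(x) = 0
function newton(F, Fp, x0; tol=1e-10, maxiter=100)
    x = copy(x0)
    for _ in 1:maxiter
        Fx = F(x)
        norm(Fx) < tol && return x
        x = x - Fp(x) \ Fx   # solve F'(x) Δ = F(x), then step x - Δ
    end
    error("Newton's method did not converge in $maxiter iterations")
end

# example: solve x.^2 .- [2, 3] = 0
F(x) = x.^2 .- [2.0, 3.0]
Fp(x) = Diagonal(2 .* x)
newton(F, Fp, [1.0, 1.0])  # ≈ [√2, √3]
```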

Simple Idea, Many Variations

  • Step size
    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - {\color{red}{\lambda}} F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
  • line search or trust region
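A simple backtracking line search halves \(\lambda\) until the residual norm actually decreases (an illustrative sketch, not how NonlinearSolve.jl implements line searches):

```julia
using LinearAlgebra

# damped Newton: shrink λ until the residual decreases
function damped_newton(F, Fp, x0; tol=1e-10, maxiter=100)
    x = copy(x0)
    for _ in 1:maxiter
        Fx = F(x)
        norm(Fx) < tol && return x
        Δ = Fp(x) \ Fx
        λ = 1.0
        while norm(F(x - λ*Δ)) ≥ norm(Fx) && λ > 1e-8
            λ /= 2   # backtrack: the full Newton step overshot
        end
        x = x - λ*Δ
    end
    error("did not converge in $maxiter iterations")
end

# F(x) = atan(x): full Newton steps diverge from x₀ = 3, damping fixes it
damped_newton(x -> atan.(x), x -> Diagonal(1 ./ (1 .+ x.^2)), [3.0])  # ≈ [0.0]
```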

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \end{align*} \] approximately solve \[ F'(x_s) A = -F(x_s) \] update \[ x_{s+1} = x_s + \lambda {\color{red}{A}} \]
    3. Repeat until \(\Vert F(x_s) \Vert \approx 0\)
  • Especially if \(F'(x_s)\) is large and/or sparse
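With a large sparse Jacobian, the linear solve can be done approximately with a Krylov method such as GMRES from Krylov.jl (a sketch; the random sparse matrix below is a stand-in for \(F'(x_s)\)):

```julia
using SparseArrays, LinearAlgebra, Random, Krylov

Random.seed!(1)
n = 1000
Fp = sprandn(n, n, 0.005) + 10I   # stand-in for a sparse, diagonally dominant F'(x_s)
Fx = randn(n)                     # stand-in for the residual F(x_s)

# approximately solve F'(x_s) A = -F(x_s)
A, stats = gmres(Fp, -Fx)
# the update would then be x_{s+1} = x_s + λ*A
```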

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \({\color{red}{\Vert F(x_{s+1}) \Vert < \mathrm{rtol} \Vert F(x_0) \Vert + \mathrm{atol} }}\)
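Such a combined relative/absolute tolerance check is one line (`converged` is an illustrative helper):

```julia
using LinearAlgebra

# stop when the residual is small relative to the initial residual, or absolutely small
converged(Fx, Fx0; rtol=1e-8, atol=1e-10) = norm(Fx) < rtol*norm(Fx0) + atol

converged([1e-12, 0.0], [1.0, 1.0])  # true: residual shrank by 12 orders of magnitude
converged([1e-3, 0.0], [1.0, 1.0])   # false: only 3 orders of magnitude
```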

Simple Idea, Many Variations

    1. Initial guess \(x_0\)
    2. Update based on first order expansion
      • compute \(F'(x_s)\) using:
        • hand written code or
        • finite differences or
        • secant updates (Broyden's method in multiple dimensions), in the scalar case \(F'(x_s) \approx \frac{F(x_s) - F(x_{s-1})}{x_s - x_{s-1}}\) or
        • automatic differentiation \[ \begin{align*} F(x_{s+1}) \approx F(x_s) + F'(x_s)(x_{s+1} - x_s) = & 0 \\ x_{s+1} = & x_s - F'(x_s)^{-1} F(x_s) \end{align*} \]
    3. Repeat until \(\Vert F(x_{s+1}) \Vert \approx 0\)
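For example, ForwardDiff.jl computes the Jacobian by automatic differentiation, which we can compare against a forward finite difference (the system `F` here is a made-up example):

```julia
using ForwardDiff

F(x) = [x[1]^2 + x[2], sin(x[2])]
x0 = [1.0, 0.0]

# automatic differentiation: exact up to floating point
Jad = ForwardDiff.jacobian(F, x0)   # [2 1; 0 1]

# forward finite differences: accurate only to O(h)
h = 1e-6
Jfd = hcat(((F(x0 + h*e) - F(x0)) / h for e in ([1.0, 0.0], [0.0, 1.0]))...)
```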

Simple Idea, Many Variations

  • Kelley (2022) is a thorough reference on methods for solving nonlinear equations and their properties
  • NonlinearSolve.jl gives unified interface for many methods
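A minimal NonlinearSolve.jl call, solving \(u^2 - p = 0\) with the default method:

```julia
using NonlinearSolve

f(u, p) = u.^2 .- p              # residual: solve u² = p
prob = NonlinearProblem(f, [1.0], 2.0)
sol = solve(prob)
sol.u   # ≈ [√2]
```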

Examples

BLP share equation

@doc raw"""
    share(δ, Σ, dFν, x)

Computes shares in random coefficient logit with mean tastes `δ`, observed characteristics `x`, unobserved taste distribution `dFν`, and taste covariances `Σ`.

# Arguments

- `δ` vector of length `J`
- `Σ` `K` by `K` matrix
- `dFν` distribution of length `K` vector
- `x` `J` by `K` array

# Returns

- vector of length `J` consisting of $s_1$, ..., $s_J$
"""
function share(δ, Σ, dFν, x, ∫ = ∫cuba)
    J,K = size(x)
    (length(δ) == J) || error("length(δ)=$(length(δ)) != size(x,1)=$J")
    ((K,K) === size(Σ)) || error("size(x,2)=$K != size(Σ)=$(size(Σ))")
    function shareν(ν)
        s = δ + x*Σ*ν
        s .-= maximum(s)
        s .= exp.(s)
        s ./= sum(s)
        return(s)
    end
    return(∫(shareν, dFν))
end

using HCubature, Distributions
function ∫cuba(f, dx; rtol=1e-4)
    D = length(dx)
    # map t ∈ (-1,1)^D to x ∈ ℝ^D, with Jacobian Dx
    x(t) = t./(1 .- t.^2)
    Dx(t) = prod((1 .+ t.^2)./(1 .- t.^2).^2)
    hcubature(t->f(x(t))*pdf(dx,x(t))*Dx(t), -ones(D),ones(D), rtol=rtol)[1]
end

using SparseGrids, FastGaussQuadrature, Distributions, LinearAlgebra
function ∫sgq(f, dx::MvNormal; order=5)
    X, W = sparsegrid(length(dx), order, gausshermite, sym=true)
    L = cholesky(dx.Σ).L
    d = length(dx)
    # change of variables ν = √2 L x + μ; the Jacobian √2^d |det L| cancels the
    # normal density's normalization, leaving the Gauss-Hermite constant π^(d/2)
    wnorm = π^(d/2)
    sum(f(√2*L*x + dx.μ)*w for (x,w) in zip(X,W))/wnorm
end

∫mc(f, dx; ndraw=100) = mean(f(rand(dx)) for i in 1:ndraw)
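As a sanity check on the Monte Carlo rule, it should recover \(\Er[\nu'\nu] = K\) for a standard normal (a self-contained rerun of `∫mc` with a fixed seed; the tolerance reflects Monte Carlo error):

```julia
using Distributions, LinearAlgebra, Statistics, Random

∫mc(f, dx; ndraw=100) = mean(f(rand(dx)) for i in 1:ndraw)

Random.seed!(0)
dν = MvNormal(zeros(2), I)
∫mc(ν -> ν'ν, dν; ndraw=10_000)   # ≈ 2, since E[ν'ν] = K = 2
```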

Solving for \(\delta\)

# create a problem to solve
using LinearAlgebra
J = 3
K = 2
x = randn(J,K)
C = randn(K,K)
Σ = C'*C + I
δ = randn(J)
dFν = MvNormal(zeros(K), I)
s = share(δ, Σ, dFν, x, ∫sgq)

# try to recover δ
using NonlinearSolve, LinearAlgebra
p = (Σ=Σ, x=x, ∫=∫sgq, dFν=dFν, s=s)
function F(d, p)
    share([0, d...],p.Σ,p.dFν, p.x,p.∫) - p.s
end
d0 = zeros(length(δ)-1)
prob = NonlinearProblem(F, d0, p)
sol = solve(prob, show_trace=Val(true), trace_level=TraceAll())

δnorm = δ[2:end] .- δ[1]  # apply the δ₁ = 0 normalization used in F
println("True δ: $(δnorm)")
println("Solved δ: $(sol.u)")
println("||F(sol.u)||: $(norm(sol.resid))")
println("Error: $(norm(sol.u - δnorm))")

Algorithm: NewtonRaphson(
    descent = NewtonDescent(),
    autodiff = AutoForwardDiff(),
    vjp_autodiff = AutoFiniteDiff(
        fdtype = Val{:forward}(),
        fdjtype = Val{:forward}(),
        fdhtype = Val{:hcentral}(),
        dir = true
    ),
    jvp_autodiff = AutoForwardDiff(),
    concrete_jac = Val{false}()
)

----        -------------           -----------             -------             
Iter        f(u) inf-norm           Step 2-norm             cond(J)             
----        -------------           -----------             -------             
0           1.52359564e-01          0.00000000e+00          2.07468713e+00      
1           1.65661499e-02          4.95751011e-01          -1.00000000e+00     
2           3.89855724e-04          6.41510368e-02          -1.00000000e+00     
3           2.39515391e-07          1.71979494e-03          -1.00000000e+00     
4           9.12880882e-14          9.92093278e-07          -1.00000000e+00     
Final       9.12880882e-14      
----------------------      
True δ: [-0.027039049695910644, -0.5599894534029164]
Solved δ: [-0.02703904969589968, -0.5599894534025214]
||F(sol.u)||: 1.1197942522895471e-13
Error: 3.9516946465613327e-13

Alternative algorithms

solr = solve(prob, RobustMultiNewton(), show_trace=Val(true), trace_level=TraceAll())
Algorithm: TrustRegion(
    trustregion = GenericTrustRegionScheme(
        method = __Simple(),
        step_threshold = 1//10000,
        shrink_threshold = 1//4,
        shrink_factor = 1//4,
        expand_factor = 2//1,
        expand_threshold = 3//4,
        max_trust_radius = 0//1,
        initial_trust_radius = 0//1
    ),
    descent = Dogleg(
        newton_descent = NewtonDescent(),
        steepest_descent = SteepestDescent()
    ),
    max_shrink_times = 32,
    autodiff = AutoForwardDiff(),
    vjp_autodiff = AutoFiniteDiff(
        fdtype = Val{:forward}(),
        fdjtype = Val{:forward}(),
        fdhtype = Val{:hcentral}(),
        dir = true
    ),
    jvp_autodiff = AutoForwardDiff(),
    concrete_jac = Val{false}()
)

----        -------------           -----------             -------             
Iter        f(u) inf-norm           Step 2-norm             cond(J)             
----        -------------           -----------             -------             
0           1.52359564e-01          0.00000000e+00          2.07468713e+00      
1           1.46330690e-01          1.69855287e-02          -1.00000000e+00     
2           1.34406641e-01          3.39710573e-02          -1.00000000e+00     
3           1.11128094e-01          6.79421147e-02          -1.00000000e+00     
4           6.71535256e-02          1.35884229e-01          -1.00000000e+00     
5           4.01948151e-02          1.86840815e-01          -1.00000000e+00     
6           6.94466492e-03          1.86840815e-01          -1.00000000e+00     
7           7.30014718e-05          2.88929665e-02          -1.00000000e+00     
8           8.45273683e-09          3.09915597e-04          -1.00000000e+00     
9           1.52655666e-16          3.59166821e-08          -1.00000000e+00     
Final       1.52655666e-16      
----------------------      
retcode: Success
u: 2-element Vector{Float64}:
 -0.027039049695910745
 -0.5599894534029162
using FixedPointAcceleration
import NaNMath
function G(δ,p)
    d = NaNMath.log.(p.s) - NaNMath.log.(share([0, δ...], p.Σ, p.dFν, p.x, p.∫))
    return(d[2:end])
end
probfp = NonlinearProblem(G, d0, p)
solfp = solve(probfp,
              FixedPointAccelerationJL(algorithm=:Anderson),
              show_trace=Val(true), trace_level=TraceAll() )
                                          Algorithm: Anderson. Iteration:     1. Convergence:   0.8976354766. Time: 2026-03-11T12:26:40.237
                           Used:  0 lags. Algorithm: Anderson. Iteration:     2. Convergence:    1.709150011. Time: 2026-03-11T12:26:41.828
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:     3. Convergence:   0.3675223765. Time: 2026-03-11T12:26:41.837
Condition number is  38.0. Used:  3 lags. Algorithm: Anderson. Iteration:     4. Convergence:   0.2798232133. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:     5. Convergence:  0.01167858532. Time: 2026-03-11T12:26:41.837
Condition number is  32.0. Used:  3 lags. Algorithm: Anderson. Iteration:     6. Convergence:  0.01434927187. Time: 2026-03-11T12:26:41.837
Condition number is  27.0. Used:  3 lags. Algorithm: Anderson. Iteration:     7. Convergence: 0.001734941312. Time: 2026-03-11T12:26:41.837
Condition number is   2.2. Used:  3 lags. Algorithm: Anderson. Iteration:     8. Convergence: 1.010411201e-5. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:     9. Convergence: 5.621106913e-8. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:    10. Convergence: 1.142894046e-8. Time: 2026-03-11T12:26:41.837
Condition number is   1.0. Used:  2 lags. Algorithm: Anderson. Iteration:    11. Convergence: 3.238480595e-10. Time: 2026-03-11T12:26:41.837
Condition number is 190.0. Used:  3 lags. Algorithm: Anderson. Iteration:    12. Convergence: 5.551115123e-16. Time: 2026-03-11T12:26:41.837
retcode: Success
u: 2-element Vector{Float64}:
 -0.02703904969591097
 -0.5599894534029165

Choosing Algorithm

  • Best algorithm depends on problem and goal
  • Can be tradeoff between fast performance in typical case versus robustly solve all cases
  • NonlinearSolve.jl’s recommended methods are usually a good choice
  • Fastest algorithm will depend on problem size, implementation of equations, etc
  • Benchmark if speed is important

Benchmarking

using BenchmarkTools
@benchmark solve(probfp)
BenchmarkTools.Trial: 8821 samples with 1 evaluation per sample.
 Range (min … max):  361.404 μs … 17.326 ms   GC (min … max):  0.00% … 95.65%
 Time  (median):     480.940 μs                GC (median):     0.00%
 Time  (mean ± σ):   564.624 μs ± 732.676 μs   GC (mean ± σ):  12.76% ±  9.39%

  █ ▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  361 μs           Histogram: frequency by time         6.83 ms <

 Memory estimate: 1.21 MiB, allocs estimate: 26289.
@benchmark solve(probfp, FixedPointAccelerationJL())
BenchmarkTools.Trial: 9276 samples with 1 evaluation per sample.
 Range (min … max):  340.043 μs … 14.487 ms   GC (min … max):  0.00% … 94.37%
 Time  (median):     463.587 μs                GC (median):     0.00%
 Time  (mean ± σ):   538.219 μs ± 733.351 μs   GC (mean ± σ):  14.45% ±  9.99%

  █ ▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂ ▂
  340 μs           Histogram: frequency by time         6.77 ms <

 Memory estimate: 1.23 MiB, allocs estimate: 23815.
@benchmark solve(probfp, RobustMultiNewton())
BenchmarkTools.Trial: 7943 samples with 1 evaluation per sample.
 Range (min … max):  409.545 μs … 18.648 ms   GC (min … max):  0.00% … 95.27%
 Time  (median):     522.528 μs                GC (median):     0.00%
 Time  (mean ± σ):   628.151 μs ± 812.485 μs   GC (mean ± σ):  13.12% ±  9.60%

  █ ▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂ ▂
  410 μs           Histogram: frequency by time         7.48 ms <

 Memory estimate: 1.36 MiB, allocs estimate: 29609.

References

Kelley, C. T. 2022. Solving Nonlinear Equations with Iterative Methods: Solvers and Examples in Julia. Philadelphia, PA: Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611977271.