
Levenberg–Marquardt Algorithm for Nonlinear Optimization

Updated 28 April 2026
  • Levenberg–Marquardt is a nonlinear least-squares solver that blends Gauss–Newton and gradient descent through adaptive damping.
  • It employs an iterative update based on Jacobian approximations and trust-region strategies to ensure global convergence and efficiency.
  • Recent extensions address constraints, derivative-free contexts, and large-scale applications in neural networks, inverse problems, and tensor decompositions.

The Levenberg–Marquardt (LM) algorithm is a foundational iterative method for solving nonlinear least-squares problems, positioned at the intersection of gradient descent and Gauss–Newton strategies. It features a robust trust-region mechanism through adaptive damping, facilitating reliable parameter estimation across a diverse range of applications in scientific computing, signal processing, neural network training, inverse problems, and tensor decomposition. Recent developments have expanded the LM framework with modifications for constraints, inexactness, parallelism, robustness to ill-posedness, derivative-free contexts, and integration with modern machine learning architectures.

1. Core Principles and Algorithmic Structure

The LM method seeks to minimize a nonlinear least-squares objective of the form

\min_{x \in \mathbb{R}^n} \frac{1}{2} \|F(x)\|^2,

where $F : \mathbb{R}^n \to \mathbb{R}^m$ is at least continuously differentiable. The method combines the Gauss–Newton approximation of the Hessian ($J^T J$, where $J$ is the Jacobian of $F$) with a damping (regularization) term to interpolate between Gauss–Newton and gradient descent. The classic update is

x_{k+1} = x_k - (J_k^T J_k + \lambda_k I)^{-1} J_k^T F(x_k),

where $\lambda_k > 0$ is an adaptively tuned parameter. Decreasing $\lambda_k$ shifts the method toward Gauss–Newton; increasing $\lambda_k$ produces a more conservative, gradient-like step (Karim et al., 2024, Pooladzandi et al., 2022, Bergou et al., 2020).

Acceptance of the trial step is governed by the ratio of actual to predicted decrease in the cost function; the damping parameter is updated accordingly (e.g., decreased on successful steps and increased when the step is rejected) (Kandel et al., 2021, Karim et al., 2024, Pooladzandi et al., 2022).
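The update and acceptance logic above can be sketched as a minimal NumPy implementation. This is illustrative only; the gain-ratio damping schedule below is a common textbook variant, not the scheme of any specific cited paper.

```python
import numpy as np

def levenberg_marquardt(F, J, x0, lam=1e-3, tol=1e-10, max_iter=100):
    """Minimal LM iteration with gain-ratio damping (a sketch, not a tuned solver)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r, Jx = F(x), J(x)
        g = Jx.T @ r                           # gradient of 0.5*||F(x)||^2
        if np.linalg.norm(g) < tol:
            break
        A = Jx.T @ Jx + lam * np.eye(x.size)   # damped Gauss-Newton system
        step = np.linalg.solve(A, -g)
        # actual vs. predicted decrease in 0.5*||F||^2
        actual = 0.5 * (r @ r - F(x + step) @ F(x + step))
        predicted = 0.5 * (step @ (lam * step - g))
        rho = actual / predicted if predicted > 0 else -1.0
        if rho > 0:                            # accept: move toward Gauss-Newton
            x = x + step
            lam *= max(1.0 / 3.0, 1.0 - (2.0 * rho - 1.0) ** 3)
        else:                                  # reject: take a more gradient-like step
            lam *= 2.0
    return x
```

On a standard Rosenbrock-type residual, this sketch converges to the minimizer in a few dozen iterations, shifting between Gauss–Newton-like and gradient-like steps as the gain ratio dictates.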

2. Theoretical Properties: Convergence and Complexity

Under standard smoothness and error-bound conditions, the LM method enjoys robust theoretical guarantees, including global convergence to stationary points and local quadratic convergence rates near the solution set.

3. Extensions: Constraints, Inexactness, and Parallelism

Constrained and Nonsmooth LM

The LM framework has been extended to handle convex constraints and nonsmooth regularizers by iterative minimization of regularized convex models (e.g., via accelerated/projected gradient or proximal gradient methods), with adaptive damping schemes based on majorization and descent verification (Marumo et al., 2020, Aravkin et al., 2023, Marumo et al., 2022). In these settings, global sublinear convergence and local quadratic rates are preserved under mild regularity and error-bound hypotheses.
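For intuition, a box-constrained inner solve can be sketched with plain projected gradient descent on the damped model. This is a toy illustration of the general scheme, not the accelerated inner solvers used in the cited works; all names and tolerances are illustrative.

```python
import numpy as np

def projected_lm_step(Jx, r, lam, lo, hi, x, iters=200):
    """Minimize the damped model 0.5*||J d + r||^2 + 0.5*lam*||d||^2
    subject to lo <= x + d <= hi, via projected gradient descent."""
    d = np.zeros(Jx.shape[1])
    L = np.linalg.norm(Jx, 2) ** 2 + lam           # Lipschitz constant of the model gradient
    for _ in range(iters):
        grad = Jx.T @ (Jx @ d + r) + lam * d       # gradient of the damped model
        d = np.clip(x + d - grad / L, lo, hi) - x  # gradient step, then project onto the box
    return d
```

Each projected step is non-increasing in the model value, so the returned step is feasible and improves on the zero step; accelerated or proximal variants replace the plain gradient iteration.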

Inexact LM and Large-Scale Structure

Inexact LM methods allow the linear system $(J_k^T J_k + \lambda_k I)\,d_k = -J_k^T F(x_k)$ to be solved only approximately, under controlled residual norms (e.g., requiring the relative residual to fall below a fixed forcing tolerance). This inexactness can be exploited for scalability using iterative solvers such as LSQR (particularly for large, sparse, or nearly-separable Jacobians) (Symoens et al., 22 Jul 2025, Fodor et al., 2023). Parallelization is achieved by decomposing the problem into loosely coupled blocks with efficient inter-block communication, yielding scalable implementations suitable for million-variable regimes (Fodor et al., 2023).
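An inexact step of this kind can be sketched with SciPy's LSQR, whose built-in `damp` parameter corresponds exactly to the LM regularization; the tolerances below are illustrative, not taken from the cited papers.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def inexact_lm_step(Jx, r, lam, rtol=1e-2):
    # LSQR with damp=sqrt(lam) solves min_d ||J d + r||^2 + lam*||d||^2,
    # which is equivalent to (J^T J + lam*I) d = -J^T r. Loose atol/btol
    # values stop the iteration early, yielding an inexact LM step.
    return lsqr(Jx, -r, damp=np.sqrt(lam), atol=rtol, btol=rtol)[0]
```

With a tight tolerance the step matches the direct solve of the damped normal equations; loosening `rtol` trades accuracy for iteration count, which is where the scalability gains come from.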

Modified and Accelerated Variants

Enhancements to classical LM include:

  • Two-stage and modified updates (e.g., reusing Jacobians within iterations, combining primary and correction steps), offering significant speedups in tensor decompositions and data fitting (Karim et al., 2024).
  • Incorporation of inertial (heavy-ball/Polyak) extrapolation for improved iteration speed in ill-posed operator equations, with maintained stability and convergence properties (Leitão et al., 2024).
  • Use of singular or problem-structured scaling matrices to regularize ill-posed inverse problems (e.g., enforcing smoothness in parameter reconstructions), subject to algebraic completeness conditions (Boos et al., 2023).
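The Jacobian-reuse idea in the first bullet can be sketched as a two-stage step that solves the same damped system twice, the second time with a re-evaluated residual. This is a simplified illustration of such modified updates, not the exact scheme of Karim et al., 2024.

```python
import numpy as np

def two_stage_lm_step(F, Jx, r, x, lam):
    """Two-stage LM step reusing one Jacobian evaluation: a primary step
    plus a correction step computed with the same J (and, in a real
    implementation, the same matrix factorization)."""
    A = Jx.T @ Jx + lam * np.eye(x.size)
    d1 = np.linalg.solve(A, -Jx.T @ r)            # primary LM step
    d2 = np.linalg.solve(A, -Jx.T @ F(x + d1))    # correction: new residual, reused J
    return d1 + d2
```

The savings come from amortizing one Jacobian evaluation (and factorization of `A`) over two residual evaluations per iteration.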

4. Derivative-Free and Randomized Algorithms

Recent work exploits derivative-free approaches when Jacobian access is infeasible:

  • Sparse problems: Jacobian models are constructed via random directional sampling and $\ell_1$-minimization under sparsity assumptions, yielding probabilistically first-order accurate surrogates. The derivative-free LM algorithm with such models obtains global almost-sure convergence (Feng et al., 9 Jul 2025).
  • Orthogonal spherical smoothing: Randomized orthonormal smoothing and finite differences approximate Jacobians, and high-probability global complexity bounds are established (Chen et al., 2024).
  • Randomized truncated SVD: For extremely high-dimensional or ill-posed systems, low-rank (randomized) approximations of the sensitivity or Jacobian matrices allow efficient computation of regularized LM steps, enabling parallel forward/adjoint solves and dramatic computation time reductions (Bjarkason et al., 2017).
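A bare-bones version of the finite-difference idea is sketched below, using a full random orthonormal basis rather than the smoothing estimators or reduced direction counts of the cited papers; the function name and step size are illustrative.

```python
import numpy as np

def fd_jacobian(F, x, h=1e-6, seed=None):
    """Estimate the Jacobian of F at x by finite differences along a
    random orthonormal basis Q (a sketch of derivative-free Jacobian
    modeling; practical methods use fewer directions plus smoothing)."""
    rng = np.random.default_rng(seed)
    n = x.size
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal directions
    Fx = F(x)
    # directional derivatives: (F(x + h*q_i) - F(x)) / h approximates J @ q_i
    D = np.stack([(F(x + h * Q[:, i]) - Fx) / h for i in range(n)], axis=1)
    return D @ Q.T   # recover J from J @ Q, since Q is orthogonal
```

On a linear map the estimate is exact up to floating-point error, which makes the orthogonal-recovery step easy to verify.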

5. Applications in Modern Learning and Inverse Problems

LM is deeply embedded in current research on parameter estimation, system identification, and scientific machine learning:

  • Neural networks: LM is competitive for regression and classification, as well as for anomaly detection—particularly in moderate-size networks. Enhancements including adaptive momentum, maximum-diagonal damping, line search, and uphill acceptance provide robustness and fast convergence (Pooladzandi et al., 2022, Wang et al., 2021).
  • Physics-informed neural networks (PINNs): Reformulating forward/inverse PDE solution as nonlinear least-squares enables direct application of LM. Explicit Jacobian computation results in rapid, high-precision convergence for shallow PINN architectures, significantly outperforming BFGS in both accuracy and speed (Shahab et al., 9 Feb 2026).
  • Tensor decomposition: For tensor CP decomposition in data compression, tailored LM variants (with reused Jacobian structure and multi-stage updates) yield substantial speedups while preserving accuracy (Karim et al., 2024).
  • Dynamical system identification: LM combined with parallel block-structured solvers enables scalable identification and model sparsification in ODE systems, leveraging problem-specific Markovian or nearly-separable structures (Haring et al., 2022, Fodor et al., 2023).
  • Inverse problems and ill-posed estimation: Adaptive regularization, surrogate functionals, and semi-convergence strategies within LM yield quadratic rates even for nonlinear, nonconvex inverse Robin and parameter identification problems (Daijun et al., 2015, Boos et al., 2023, Leitão et al., 2024).
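As a concrete parameter-estimation example, SciPy exposes the classic MINPACK LM implementation through `scipy.optimize.least_squares`; the exponential model and its parameter values here are hypothetical, chosen only to illustrate the workflow.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical model: recover (a, k) in y = a * exp(-k * t) from clean samples.
t = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * t)

def residuals(p):
    a, k = p
    return a * np.exp(-k * t) - y   # F(p): residual vector to minimize

fit = least_squares(residuals, x0=[1.0, 1.0], method='lm')
# fit.x recovers approximately (2.5, 1.3)
```

The `method='lm'` option requires at least as many residuals as parameters and is well suited to small-to-moderate dense problems like this one.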

6. Practical Considerations, Robustness, and Implementation

The per-iteration cost of LM is typically dominated by evaluation (and, if needed, storage) of the Jacobian and solution of the regularized normal equations. Strategies for managing computational load include inexact solves with iterative solvers such as LSQR, reuse of Jacobians across steps, randomized low-rank approximation of the Jacobian or sensitivity matrices, and parallel block decomposition of the problem.

Overall performance of LM and its variants is empirically confirmed across test suites in optimization, inverse problems, and machine learning, with consistent evidence of rapid local convergence and high practical efficiency compared to first-order or quasi-Newton competitors (Bergou et al., 2020, Marumo et al., 2022, Pooladzandi et al., 2022, Karim et al., 2024).

7. Impact and Continuing Research Directions

The LM method remains a focal point for numerical optimization research, with ongoing innovations spanning constrained and nonsmooth formulations, derivative-free and randomized variants, and large-scale parallel implementations.

Theoretical advances continue to extend global complexity, local convergence, and robustness guarantees to broader classes of nonsmooth, constrained, or composite problems. Such developments position the LM paradigm at the core of robust and scalable solvers for modern nonlinear and data-driven scientific inference.
