Levenberg–Marquardt Algorithm for Nonlinear Optimization
- Levenberg–Marquardt is a nonlinear least-squares solver that blends Gauss–Newton and gradient descent through adaptive damping.
- It employs an iterative update based on Jacobian approximations and trust-region strategies to ensure global convergence and efficiency.
- Recent extensions address constraints, derivative-free contexts, and large-scale applications in neural networks, inverse problems, and tensor decompositions.
The Levenberg–Marquardt (LM) algorithm is a foundational iterative method for solving nonlinear least-squares problems, positioned at the intersection of gradient descent and Gauss–Newton strategies. It features a robust trust-region mechanism through adaptive damping, facilitating reliable parameter estimation across a diverse range of applications in scientific computing, signal processing, neural network training, inverse problems, and tensor decomposition. Recent developments have expanded the LM framework with modifications for constraints, inexactness, parallelism, robustness to ill-posedness, derivative-free contexts, and integration with modern machine learning architectures.
1. Core Principles and Algorithmic Structure
The LM method seeks to minimize a nonlinear least-squares objective of the form
$$\min_{x \in \mathbb{R}^n} \; f(x) = \tfrac{1}{2}\,\|F(x)\|^2,$$
where $F : \mathbb{R}^n \to \mathbb{R}^m$ is at least continuously differentiable. The method combines the Gauss–Newton approximation of the Hessian ($J(x)^\top J(x)$, where $J(x)$ is the Jacobian of $F$) with a damping (regularization) term to interpolate between Gauss–Newton and gradient descent. The classic update is
$$\left(J_k^\top J_k + \lambda_k I\right) d_k = -\,J_k^\top F(x_k), \qquad x_{k+1} = x_k + d_k,$$
where $\lambda_k > 0$ is an adaptively tuned damping parameter. Decreasing $\lambda_k$ shifts the method toward Gauss–Newton; increasing $\lambda_k$ produces a more conservative, gradient-like step (Karim et al., 2024, Pooladzandi et al., 2022, Bergou et al., 2020).
Acceptance of the trial step is governed by the gain ratio, i.e., the ratio of actual to predicted decrease in the cost function; the damping parameter is updated accordingly (e.g., decreased on successful steps and increased when the step is rejected) (Kandel et al., 2021, Karim et al., 2024, Pooladzandi et al., 2022).
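A minimal NumPy sketch of this damped, gain-ratio-controlled loop is given below; the function names, the factor-of-ten damping updates, and the toy exponential fit are illustrative choices rather than the exact schemes of the cited works.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, lam=1e-2, max_iter=100, tol=1e-10):
    """Basic Levenberg-Marquardt loop with gain-ratio damping updates (illustrative)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        F = residual(x)                      # residual vector F(x)
        J = jacobian(x)                      # Jacobian of F at x
        g = J.T @ F                          # gradient of 0.5*||F||^2
        if np.linalg.norm(g) < tol:
            break
        # Damped normal equations: (J^T J + lam*I) d = -g
        A = J.T @ J + lam * np.eye(x.size)
        d = np.linalg.solve(A, -g)
        # Gain ratio: actual vs. predicted decrease of the cost
        actual = 0.5 * (F @ F) - 0.5 * (residual(x + d) @ residual(x + d))
        predicted = -(g @ d) - 0.5 * (d @ (J.T @ J) @ d)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho > 0:                          # accept step, relax damping
            x = x + d
            lam = max(lam / 10.0, 1e-12)
        else:                                # reject step, increase damping
            lam = min(lam * 10.0, 1e12)
    return x

# Usage: fit y = a*exp(b*t) to synthetic data (hypothetical example)
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
print(levenberg_marquardt(res, jac, np.array([1.0, 0.0])))   # approx. [2.0, -1.5]
```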
2. Theoretical Properties: Convergence and Complexity
Under standard smoothness and error-bound conditions, the LM method enjoys robust theoretical guarantees:
- Global convergence is ensured when the model decrease is sufficient, the Jacobian is Lipschitz and bounded, and an adaptive trust-region or regularization strategy is employed (Bergou et al., 2020, Karim et al., 2024).
- Worst-case iteration complexity is $O(\epsilon^{-2})$ to achieve an $\epsilon$-stationary point, matching first-order methods, even in the presence of constraints, nonsmooth regularization, or inexact steps (Bergou et al., 2020, Marumo et al., 2020, Aravkin et al., 2023, Symoens et al., 22 Jul 2025).
- Local convergence rate is quadratic at zero-residual solutions under suitable assumptions (Lipschitz Jacobian, local error bounds), and linear when the minimum attainable residual is small but nonzero (Bergou et al., 2020, Karim et al., 2024, Daijun et al., 2015, Boos et al., 2023); a representative statement of the quadratic rate is sketched after this list.
- Superlinear and quadratic convergence under generalized growth conditions (e.g., Hölderian metric subregularity) can be proved for appropriately damped and inexact variants (Symoens et al., 22 Jul 2025, Marumo et al., 2022).
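One representative formulation of the local quadratic rate, in the style of the classical Yamashita–Fukushima analysis (the damping choice $\lambda_k = \|F(x_k)\|^2$ and the hypotheses below are one common setup, not necessarily the exact conditions of the cited papers):

```latex
% Local error bound near the zero-residual solution set X^* (Lipschitz Jacobian assumed):
\[
  \exists\, c > 0 :\quad c \,\operatorname{dist}(x, X^*) \;\le\; \|F(x)\|
  \quad \text{for all } x \text{ in a neighborhood of } X^* .
\]
% With the damping choice \lambda_k = \|F(x_k)\|^2, the LM iterates converge locally quadratically:
\[
  \operatorname{dist}(x_{k+1}, X^*) \;\le\; C \,\operatorname{dist}(x_k, X^*)^{2}
  \quad \text{for some } C > 0 .
\]
```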
3. Extensions: Constraints, Inexactness, and Parallelism
Constrained and Nonsmooth LM
The LM framework has been extended to handle convex constraints and nonsmooth regularizers by iterative minimization of regularized convex models (e.g., via accelerated/projected gradient or proximal gradient methods), with adaptive damping schemes based on majorization and descent verification (Marumo et al., 2020, Aravkin et al., 2023, Marumo et al., 2022). In these settings, global sublinear convergence and local quadratic rates are preserved under mild regularity and error-bound hypotheses.
Inexact LM and Large-Scale Structure
Inexact LM methods allow the regularized LM linear system to be solved only approximately, under controlled residual norms (e.g., requiring the linear-system residual to be bounded by a fixed fraction of the gradient norm). This inexactness can be exploited for scalability using iterative solvers such as LSQR (particularly for large, sparse, or nearly-separable Jacobians) (Symoens et al., 22 Jul 2025, Fodor et al., 2023). Parallelization is achieved by decomposing the problem into loosely-coupled blocks with efficient inter-block communication, yielding scalable implementations suitable for million-variable regimes (Fodor et al., 2023).
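A minimal sketch of one such inexact step using SciPy's LSQR solver: the `damp` argument applies the Tikhonov-style damping of the LM subproblem without ever forming the normal-equations matrix, and the loose `atol`/`btol` tolerances stand in for a controlled-inexactness rule (the tolerance value and the random sparse Jacobian are illustrative assumptions).

```python
import numpy as np
from scipy.sparse import random as sprandom
from scipy.sparse.linalg import lsqr

def inexact_lm_step(J, F, lam, rtol=1e-3):
    """One inexact LM step: approximately minimize ||J d + F||^2 + lam*||d||^2 via LSQR."""
    # lsqr's `damp` parameter solves the damped least-squares problem, i.e. the
    # LM subproblem, using only products with J and J^T.
    result = lsqr(J, -F, damp=np.sqrt(lam), atol=rtol, btol=rtol)
    return result[0]                      # approximate step d

# Illustrative large sparse Jacobian and residual
J = sprandom(5000, 2000, density=1e-3, format="csr", random_state=0)
F = np.random.default_rng(0).standard_normal(5000)
d = inexact_lm_step(J, F, lam=1e-2)
print(d.shape)   # (2000,)
```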
Modified and Accelerated Variants
Enhancements to classical LM include:
- Two-stage and modified updates (e.g., reusing Jacobians within iterations, combining primary and correction steps), offering significant speedups in tensor decompositions and data fitting (Karim et al., 2024); see the sketch after this list.
- Incorporation of inertial (heavy-ball/Polyak) extrapolation for improved iteration speed in ill-posed operator equations, with maintained stability and convergence properties (Leitão et al., 2024).
- Use of singular or problem-structured scaling matrices to regularize ill-posed inverse problems (e.g., enforcing smoothness in parameter reconstructions), subject to algebraic completeness conditions (Boos et al., 2023).
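A minimal sketch of the Jacobian-reuse idea referenced in the first bullet above: one damped matrix is factorized per iteration and reused for a primary step and a correction step evaluated at the trial point, so a second residual evaluation buys extra progress without a second Jacobian. The unconditional acceptance of the corrected step is an illustrative simplification.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lm_step_with_correction(residual, jacobian, x, lam):
    """Two-stage LM step reusing one Jacobian (and one factorization) per iteration."""
    F = residual(x)
    J = jacobian(x)
    A = J.T @ J + lam * np.eye(x.size)
    chol = cho_factor(A)                       # factor once, reuse for both solves
    d1 = cho_solve(chol, -J.T @ F)             # primary LM step
    F_trial = residual(x + d1)                 # fresh residual at the trial point
    d2 = cho_solve(chol, -J.T @ F_trial)       # correction step with the stale Jacobian
    return x + d1 + d2
```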
4. Derivative-Free and Randomized Algorithms
Recent work exploits derivative-free approaches when Jacobian access is infeasible:
- Sparse problems: Jacobian models are constructed via random directional sampling and $\ell_1$-minimization under sparsity assumptions, yielding probabilistically first-order accurate surrogates. The derivative-free LM algorithm with such models obtains global almost-sure convergence (Feng et al., 9 Jul 2025).
- Orthogonal spherical smoothing: Randomized orthonormal smoothing and finite differences approximate Jacobians, and high-probability global complexity bounds are established (Chen et al., 2024).
- Randomized truncated SVD: For extremely high-dimensional or ill-posed systems, low-rank (randomized) approximations of the sensitivity or Jacobian matrices allow efficient computation of regularized LM steps, enabling parallel forward/adjoint solves and dramatic computation time reductions (Bjarkason et al., 2017).
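A minimal sketch of the randomized truncated-SVD step: a low-rank factorization $J \approx U_r \Sigma_r V_r^\top$ is obtained from random-sketch products, and the damped step is assembled from the small factors. The rank, oversampling amount, and test matrix below are illustrative assumptions.

```python
import numpy as np

def randomized_svd(matvec, rmatvec, m, n, rank, oversample=10, rng=None):
    """Randomized range finder + small SVD; needs only J @ V and J.T @ U products."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((n, rank + oversample))
    Y = matvec(Omega)                          # sample the range of J
    Q, _ = np.linalg.qr(Y)
    B = rmatvec(Q).T                           # B = Q^T J  (small matrix)
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank, :]

def truncated_svd_lm_step(J, F, lam, rank):
    """Regularized LM step d = -V diag(s/(s^2+lam)) U^T F from a rank-`rank` SVD of J."""
    U, s, Vt = randomized_svd(lambda X: J @ X, lambda X: J.T @ X, *J.shape, rank)
    return -Vt.T @ ((s / (s**2 + lam)) * (U.T @ F))

# Illustrative ill-conditioned Jacobian
rng = np.random.default_rng(1)
J = rng.standard_normal((800, 30)) @ np.diag(np.logspace(0, -8, 30)) @ rng.standard_normal((30, 200))
F = rng.standard_normal(800)
print(truncated_svd_lm_step(J, F, lam=1e-4, rank=20).shape)   # (200,)
```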
5. Applications in Modern Learning and Inverse Problems
LM is deeply embedded in current research on parameter estimation, system identification, and scientific machine learning (a minimal fitting sketch follows the list below):
- Neural networks: LM is competitive for regression and classification, as well as for anomaly detection—particularly in moderate-size networks. Enhancements including adaptive momentum, maximum-diagonal damping, line search, and uphill acceptance provide robustness and fast convergence (Pooladzandi et al., 2022, Wang et al., 2021).
- Physics-informed neural networks (PINNs): Reformulating forward/inverse PDE solution as nonlinear least-squares enables direct application of LM. Explicit Jacobian computation results in rapid, high-precision convergence for shallow PINN architectures, significantly outperforming BFGS in both accuracy and speed (Shahab et al., 9 Feb 2026).
- Tensor decomposition: For tensor CP decomposition in data compression, tailored LM variants (with reused Jacobian structure and multi-stage updates) yield substantial speedups while preserving accuracy (Karim et al., 2024).
- Dynamical system identification: LM combined with parallel block-structured solvers enables scalable identification and model sparsification in ODE systems, leveraging problem-specific Markovian or nearly-separable structures (Haring et al., 2022, Fodor et al., 2023).
- Inverse problems and ill-posed estimation: Adaptive regularization, surrogate functionals, and semi-convergence strategies within LM yield quadratic rates even for nonlinear, nonconvex inverse Robin and parameter identification problems (Daijun et al., 2015, Boos et al., 2023, Leitão et al., 2024).
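As a concrete entry point to the parameter-estimation settings above, the sketch below fits a small damped-oscillation model with SciPy's MINPACK-backed LM driver; the model, noise level, and starting guess are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical damped-oscillation model y = a * exp(-b t) * cos(w t) with noise
rng = np.random.default_rng(42)
t = np.linspace(0.0, 4.0, 200)
y = 1.3 * np.exp(-0.8 * t) * np.cos(5.0 * t) + 0.01 * rng.standard_normal(t.size)

def residuals(p):
    a, b, w = p
    return a * np.exp(-b * t) * np.cos(w * t) - y

# method="lm" dispatches to the classic Levenberg-Marquardt implementation (MINPACK);
# it requires an unconstrained problem with at least as many residuals as parameters.
fit = least_squares(residuals, x0=[1.0, 0.5, 4.8], method="lm")
print(fit.x)          # approximately [1.3, 0.8, 5.0]
```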
6. Practical Considerations, Robustness, and Implementation
The per-iteration cost of LM is typically dominated by evaluation (and, if needed, storage) of the Jacobian and solution of the regularized normal equations. Strategies for managing computational load include:
- Matrix-free and automatic differentiation for Jacobian-vector/Jacobian-transpose-vector products (Kandel et al., 2021); a matrix-free sketch follows this list.
- Block/parallel algorithms for large, structured problems (Fodor et al., 2023, Haring et al., 2022).
- Use of iterative linear solvers or low-rank approximations when the normal equations become prohibitive (Bjarkason et al., 2017, Symoens et al., 22 Jul 2025).
- Robust step acceptance criteria (gain-ratio, line search, bold moves), momentum, and inertia to mitigate stagnation and exploit curvature (Transtrum et al., 2012, Pooladzandi et al., 2022, Leitão et al., 2024).
- Regularization scaling, majorization tests, and discrepancy principles to control overfitting or handle noise (Boos et al., 2023, Daijun et al., 2015, Marumo et al., 2020).
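The matrix-free strategy in the first bullet can be sketched as follows: Jacobian-vector and Jacobian-transpose-vector callbacks (in practice supplied by automatic differentiation) define the damped normal-equations operator for a conjugate-gradient solve, so neither $J$ nor $J^\top J$ is ever formed; the dense test Jacobian below is only for verification and is an illustrative assumption.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def matrix_free_lm_step(jvp, vjp, F, n, lam):
    """Solve (J^T J + lam*I) d = -J^T F with CG, using only J@v and J^T@u callbacks.

    `jvp(v)` and `vjp(u)` would typically come from automatic differentiation
    (forward and reverse mode respectively); J is never formed explicitly.
    """
    def normal_matvec(v):
        return vjp(jvp(v)) + lam * v            # (J^T J + lam*I) v
    A = LinearOperator((n, n), matvec=normal_matvec)
    d, info = cg(A, -vjp(F))
    return d

# Illustrative check against a dense Jacobian
rng = np.random.default_rng(0)
J = rng.standard_normal((50, 8))
F = rng.standard_normal(50)
d = matrix_free_lm_step(lambda v: J @ v, lambda u: J.T @ u, F, n=8, lam=1e-3)
print(np.allclose(d, np.linalg.solve(J.T @ J + 1e-3 * np.eye(8), -J.T @ F), atol=1e-4))
```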
Overall performance of LM and its variants is empirically confirmed across test suites in optimization, inverse problems, and machine learning, with consistent evidence of rapid local convergence and high practical efficiency compared to first-order or quasi-Newton competitors (Bergou et al., 2020, Marumo et al., 2022, Pooladzandi et al., 2022, Karim et al., 2024).
7. Impact and Continuing Research Directions
The LM method remains a focal point for numerical optimization research, with ongoing innovations in:
- High-dimensional and distributed optimization leveraging problem structure for scalability (Fodor et al., 2023, Haring et al., 2022).
- Robust regularization for ill-posed and nonconvex problems (Daijun et al., 2015, Boos et al., 2023, Leitão et al., 2024).
- Derivative-free, randomized, and compressed-sensing-inspired variants capable of handling limited model access or black-box contexts (Chen et al., 2024, Feng et al., 9 Jul 2025).
- Integration with advanced machine learning paradigms, including deep scientific learners, PINNs, and neural ODEs (Shahab et al., 9 Feb 2026, Pooladzandi et al., 2022).
- Automated tuning (e.g., adaptive regularization, inexactness control, and inertia parameters) to balance convergence speed, robustness, and computational cost.
Theoretical advances continue to extend global complexity, local convergence, and robustness guarantees to broader classes of nonsmooth, constrained, or composite problems. Such developments position the LM paradigm at the core of robust and scalable solvers for modern nonlinear and data-driven scientific inference.