Levenberg–Marquardt Algorithm

Updated 23 January 2026

The Levenberg–Marquardt algorithm is an iterative optimization technique that solves nonlinear least squares problems by blending Gauss–Newton and gradient descent strategies.
It employs a damping parameter updated via trust-region or gain-ratio methods to ensure robust convergence even on noisy or ill-posed problems.
Advanced variants extend LM to stochastic, derivative-free, and large-scale scenarios, with applications in robotics, neural networks, and inverse problems.

The Levenberg–Marquardt (LM) algorithm is a foundational iterative method for solving nonlinear least squares problems, widely employed in fields such as numerical optimization, inverse problems, machine learning, parameter estimation, and scientific computing. By interpolating between the Gauss–Newton method and gradient descent through the addition of a damping parameter, LM achieves robust convergence properties on a broad variety of problem instances. Continuous theoretical and algorithmic advancements have extended LM’s applicability to stochastic, derivative-free, and large-scale settings while maintaining strong complexity and convergence guarantees.

1. Core Algorithmic Structure

Given a nonlinear least squares objective of the form

$S(x) = \tfrac12 \|r(x)\|^2 = \tfrac12 \sum_{i=1}^m [r_i(x)]^2,$

where $r: \mathbb{R}^n \to \mathbb{R}^m$ is the residual function and $J(x) = \partial r/\partial x \in \mathbb{R}^{m \times n}$ is its Jacobian, the classical LM iteration at $x_k$, with $J_k = J(x_k)$, computes the step $\Delta x_k$ by

$(J_k^\top J_k + \lambda_k I)\, \Delta x_k = -J_k^\top r(x_k)$

with update

$x_{k+1} = x_k + \Delta x_k,$

where $\lambda_k > 0$ is the damping parameter. For large $\lambda_k$, the step approximates a short, scaled steepest-descent step; for small $\lambda_k$, it approaches the Gauss–Newton step.
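The step equation can be read as the minimizer of a damped linearized model. The short derivation below is standard and not specific to any of the cited works; it makes the two limiting regimes explicit, with $\nabla S(x_k) = J_k^\top r(x_k)$.

```latex
m_k(\Delta) = \tfrac12 \|r(x_k) + J_k \Delta\|^2 + \tfrac{\lambda_k}{2} \|\Delta\|^2,
\qquad
\nabla m_k(\Delta) = J_k^\top \big(r(x_k) + J_k \Delta\big) + \lambda_k \Delta = 0
\;\Longleftrightarrow\;
(J_k^\top J_k + \lambda_k I)\, \Delta = -J_k^\top r(x_k).

\lambda_k \to 0 \ (\text{$J_k$ full column rank}): \quad
\Delta x_k \to -(J_k^\top J_k)^{-1} J_k^\top r(x_k) \quad \text{(Gauss--Newton step)},

\lambda_k \to \infty: \quad
\Delta x_k = -\lambda_k^{-1} \nabla S(x_k) + O(\lambda_k^{-2}) \quad \text{(short steepest-descent step)}.
```

The large-damping limit follows from $(J_k^\top J_k + \lambda_k I)^{-1} = \lambda_k^{-1}(I + \lambda_k^{-1} J_k^\top J_k)^{-1} = \lambda_k^{-1} I + O(\lambda_k^{-2})$.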

The damping parameter is updated by a trust-region or gain-ratio mechanism. A typical strategy compares the actual decrease of $S$ with the decrease predicted by the linearized model through the gain ratio

$\rho_k = \dfrac{S(x_k) - S(x_k + \Delta x_k)}{\tfrac12\|r(x_k)\|^2 - \tfrac12\|r(x_k) + J_k \Delta x_k\|^2},$

and then proceeds as follows:

If $\rho_k$ exceeds a threshold $\eta \ge 0$, accept the step and decrease the damping: $\lambda_{k+1} = \lambda_k/\nu$ for a factor $\nu > 1$.
Otherwise, reject the step and increase the damping: $\lambda_{k+1} = \nu \lambda_k$ (Li et al., 2022).
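The accept/reject rule can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the procedure of Li et al. (2022): it assumes an acceptance threshold of 0, a single factor $\nu = 10$ for both decreasing and increasing the damping, a dense solve of the damped normal equations, and a hypothetical function name `levenberg_marquardt`. The Rosenbrock function, written as two residuals, serves as the test problem.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, x0, lam0=1e-3, nu=10.0,
                        grad_tol=1e-10, max_iter=200):
    """Minimal LM loop with a gain-ratio accept/reject rule (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    lam = lam0
    for k in range(max_iter):
        r = residual(x)
        J = jacobian(x)
        g = J.T @ r                       # gradient of S(x) = 0.5 * ||r(x)||^2
        if np.linalg.norm(g) < grad_tol:
            break
        # Damped normal equations: (J^T J + lam I) dx = -J^T r
        dx = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -g)
        # Gain ratio: actual decrease of S divided by the decrease predicted by the linear model
        s_old = 0.5 * (r @ r)
        r_new = residual(x + dx)
        s_new = 0.5 * (r_new @ r_new)
        r_lin = r + J @ dx
        predicted = s_old - 0.5 * (r_lin @ r_lin)
        rho = (s_old - s_new) / predicted if predicted > 0 else -1.0
        if rho > 0:
            x = x + dx                    # accept, move toward Gauss-Newton
            lam /= nu
        else:
            lam *= nu                     # reject, move toward steepest descent
    return x, k


def rosenbrock_residual(x):
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])


def rosenbrock_jacobian(x):
    return np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])


x_opt, iters = levenberg_marquardt(rosenbrock_residual, rosenbrock_jacobian, [-1.2, 1.0])
print(x_opt, iters)
```

Because the Rosenbrock residual vanishes at $(1, 1)$, accepted steps keep shrinking the damping, so the final iterations behave essentially like Gauss–Newton steps.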

LM accommodates further generalization through scaling matrices, adaptive damping, and integration with probabilistic or derivative-free models (Boos et al., 2023, Chen et al., 2024, Feng et al., 9 Jul 2025).

2. Theoretical Convergence and Complexity

LM is globally convergent under standard regularity conditions (Lipschitz continuity of the gradient, positive definiteness, bounded Jacobian). Specifically, given the iteration

$(J_k^\top J_k + \lambda_k I)\, \Delta x_k = -J_k^\top r(x_k), \qquad x_{k+1} = x_k + \Delta x_k,$

with damping $\lambda_k > 0$ governed by a judiciously updated regularization parameter, the method ensures:

$\limsup_{k\to\infty} \|J_k^\top r(x_k)\| = 0$, that is, the gradient $\nabla S(x_k) = J_k^\top r(x_k)$ converges to zero;
an iteration complexity of $O(\epsilon^{-2})$ to achieve $\|J_k^\top r(x_k)\| \le \epsilon$ (Bergou et al., 2020).

Local convergence is quadratic when the residual vanishes at the solution and linear otherwise:

For a zero residual at the solution, $r(x^*) = 0$, LM achieves quadratic convergence;
For a small but nonzero residual, local convergence is linear, with a rate determined by the residual's magnitude.
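The two local regimes can be observed on scalar problems. The sketch below assumes the residual-based damping $\lambda_k = \|r(x_k)\|^2$, one rule used in the LM literature and chosen here only because it makes the behavior easy to see; the two test residuals are small illustrative examples, and `lm_iterates` is a hypothetical helper. Since $x^* = 0$ in both problems, the iterates themselves are the errors $e_k$.

```python
import numpy as np


def lm_iterates(residual, jacobian, x0, n_steps):
    """Scalar LM iterates with the assumed damping lambda_k = ||r(x_k)||^2."""
    xs = [float(x0)]
    for _ in range(n_steps):
        x = np.array([xs[-1]])
        r, J = residual(x), jacobian(x)
        lam = r @ r
        dx = np.linalg.solve(J.T @ J + lam * np.eye(1), -J.T @ r)
        xs.append(float(x[0] + dx[0]))
    return np.array(xs)


# Zero-residual problem: r(x) = (x, x + x^2), solution x* = 0 with r(x*) = 0.
zero_res = lm_iterates(lambda x: np.array([x[0], x[0] + x[0] ** 2]),
                       lambda x: np.array([[1.0], [1.0 + 2.0 * x[0]]]),
                       x0=0.5, n_steps=4)

# Nonzero-residual problem: r(x) = (x + 1, 0.5 x^2 + x - 1),
# stationary point x* = 0 with ||r(x*)||^2 = 2.
nonzero_res = lm_iterates(lambda x: np.array([x[0] + 1.0, 0.5 * x[0] ** 2 + x[0] - 1.0]),
                          lambda x: np.array([[1.0], [x[0] + 1.0]]),
                          x0=0.1, n_steps=40)

quad_ratios = np.abs(zero_res[1:]) / zero_res[:-1] ** 2   # e_{k+1} / e_k^2 stays bounded
lin_ratios = np.abs(nonzero_res[1:] / nonzero_res[:-1])   # e_{k+1} / e_k tends to a constant
print(quad_ratios)
print(lin_ratios[-1])
```

For the second problem, expanding the step around $x^* = 0$ gives $e_{k+1} \approx 0.75\, e_k$. The constant depends on both the residual curvature and the damping rule, so other damping strategies give other linear rates.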

These properties are preserved under several variations, including singular scaling and stochastic/deterministic hybrid models (Boos et al., 2023, Bergou et al., 2018).

3. Algorithmic Enhancements and Generalizations

Multiple enhancements have been developed to mitigate stagnation, improve robustness, and extend applicability:

Randomized and Step-Size Variants: Random rescaling of the step, as in step-size LM (SLM), combines a decaying damping factor with a random step multiplier to escape local minima and accelerate global convergence (Li et al., 2022).
Derivative-Free LM (DFLM): When analytic derivatives are unavailable, surrogate Jacobians are constructed using compressed sensing (sparse interpolation and ℓ₁ minimization) (Feng et al., 9 Jul 2025) or orthogonal spherical smoothing (random projections on the Stiefel manifold) (Chen et al., 2024). These approaches yield first-order accurate models with high probability and retain the $O(\epsilon^{-2})$ global iteration complexity.
Stochastic LM: For problems with stochastic objectives or noisy evaluations, LM integrates random model and function estimates with probabilistic accuracy conditions, updating damping based on model variance and function value confidence (Bergou et al., 2018). Global convergence is guaranteed in expectation under weak probabilistic accuracy assumptions.
Singular Scaling: The use of singular scaling matrices $L$, which replace $\lambda_k I$ in the step equation by $\lambda_k L^\top L$ (e.g., Tikhonov-type regularization with a singular $L$), improves solution quality in ill-posed inverse problems, ensuring quadratic convergence under a completeness condition and an error bound (Boos et al., 2023).
Exponential Family and Maximum Likelihood: LM-type updates extend to maximum likelihood in the exponential family, replacing the Hessian with its diagonal or negative identity for regularization, and using an adaptive gain-ratio mechanism for the damping parameter (Giordan et al., 2014).
Enhanced Step Strategies: Improvements such as geodesic acceleration (second-order corrections), controlled uphill step acceptance ("bold" moves), and Broyden rank-one Jacobian updates can substantially enhance robustness and efficiency in highly curved or stiff problems (Transtrum et al., 2012, Pooladzandi et al., 2022).
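Two of these enhancements are easy to sketch with NumPy. The singular-scaling step solves $(J_k^\top J_k + \lambda_k L^\top L)\,\Delta x_k = -J_k^\top r(x_k)$; the first-difference operator used for $L$ here is a hypothetical choice for illustration, not necessarily the scaling used by Boos et al. (2023). The rank-one update is the classical Broyden formula, which enforces the secant condition $J_{k+1}\Delta x_k = r(x_{k+1}) - r(x_k)$. The helper names `scaled_lm_step` and `broyden_update` are hypothetical.

```python
import numpy as np


def scaled_lm_step(J, r, lam, L):
    """LM step with scaling matrix L: solves (J^T J + lam L^T L) dx = -J^T r."""
    return np.linalg.solve(J.T @ J + lam * (L.T @ L), -J.T @ r)


def broyden_update(J, dx, dr):
    """Rank-one Broyden update; the returned matrix satisfies J_new @ dx = dr."""
    return J + np.outer(dr - J @ dx, dx) / (dx @ dx)


n = 5
J = np.full((1, n), 1.0 / n)          # rank-deficient Jacobian: one residual, the mean of x
r = np.array([1.0])
L = np.diff(np.eye(n), axis=0)        # 4 x 5 first-difference operator; L @ ones = 0, so L is singular

step_identity = scaled_lm_step(J, r, lam=1.0, L=np.eye(n))   # classical damping lam * I
step_singular = scaled_lm_step(J, r, lam=1.0, L=L)           # singular scaling lam * L^T L
print(step_identity)   # all entries -1/6: identity damping shrinks the step
print(step_singular)   # all entries -1: the constant direction is not penalized

# Broyden update on random data: the secant condition holds by construction.
rng = np.random.default_rng(0)
J0 = rng.standard_normal((3, 2))
dx = np.array([0.1, -0.2])
dr = rng.standard_normal(3)
J1 = broyden_update(J0, dx, dr)
print(np.allclose(J1 @ dx, dr))       # True
```

The singular-scaled system stays nonsingular here because the null spaces of $J$ and $L$ intersect only in the zero vector: constant vectors are annihilated by $L$ but not by $J$.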

4. Practical Implementations and Comparative Analysis

LM is implemented in various software environments and optimized for computational efficiency:

marqLevAlg R Package: Incorporates a Newton-type step with diagonal inflation, automatic parallelization of gradient and Hessian estimation, and robust convergence checks including a relative distance to minimum (RDM) criterion to avoid spurious optima or saddle points. Empirical results demonstrate superior performance over BFGS, L-BFGS-B, and EM on complex maximum likelihood estimation tasks (Philipps et al., 2020).
Neural Networks: LM and its enhancements (adaptive momentum, learning rate line search, bold uphill acceptance, maximum-diagonal damping) enable fast convergence—4–5 epochs to 97.5% accuracy on MNIST, outperforming SGD, Adam, KFAC, and L-BFGS in total optimization time and final accuracy (Pooladzandi et al., 2022).

Comparison with first-order and quasi-Newton solvers reveals:

LM requires more per-iteration computation (solving linear systems of dimension $n \times n$), but achieves far fewer iterations to convergence.
Enhanced LM variants robustly escape shallow minima and winding valleys, but may risk loss of global convergence in highly flat manifolds without careful tuning (Transtrum et al., 2012, Pooladzandi et al., 2022).
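The first point, more work per iteration but far fewer iterations, can be checked on a small problem. The sketch below (NumPy, hypothetical helper names) counts iterations until $\|J^\top r\| < 10^{-6}$ for a simplified LM loop, which accepts any decrease in $S$ and uses a damping factor of 10, and for gradient descent with an assumed fixed step of $10^{-3}$. Both start from $(-1.2, 1)$ on the Rosenbrock residuals.

```python
import numpy as np


def residual(x):
    # Rosenbrock written as least squares: S(x) = 0.5 * ||r(x)||^2
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])


def jacobian(x):
    return np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])


def iterations_lm(x, tol=1e-6, max_iter=10_000, lam=1e-3, nu=10.0):
    """Iterations of a simplified LM loop until ||J^T r|| < tol."""
    x = np.asarray(x, dtype=float)
    for k in range(max_iter):
        r, J = residual(x), jacobian(x)
        g = J.T @ r
        if np.linalg.norm(g) < tol:
            return k
        dx = np.linalg.solve(J.T @ J + lam * np.eye(2), -g)   # one 2x2 solve per iteration
        r_new = residual(x + dx)
        if r_new @ r_new < r @ r:     # simplified acceptance: any decrease in S
            x, lam = x + dx, lam / nu
        else:
            lam *= nu
    return max_iter


def iterations_gd(x, tol=1e-6, max_iter=10_000, step=1e-3):
    """Iterations of fixed-step gradient descent until ||J^T r|| < tol."""
    x = np.asarray(x, dtype=float)
    for k in range(max_iter):
        g = jacobian(x).T @ residual(x)
        if np.linalg.norm(g) < tol:
            return k
        x = x - step * g              # one gradient evaluation per iteration
    return max_iter


x0 = np.array([-1.2, 1.0])
print(iterations_lm(x0), iterations_gd(x0))
```

On this problem LM stops after a few dozen iterations at most, while fixed-step gradient descent, slowed by the narrow curved valley, typically exhausts its 10,000-iteration budget. Each LM iteration, however, solves a linear system, and that cost grows with $n$.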

5. Applications and Extensions

Applications span a wide spectrum:

Robotics: High-precision calibration of industrial robots using step-size LM (SLM) combined with pre-processing by unscented Kalman filtering yields substantial accuracy gains and reduced computation time (e.g., 4–7% improvement over SLM alone, and 60% reduction in time vs. standard LM on ABB IRB120) (Li et al., 2022).
Large-Scale Sparse Problems: Exploiting Jacobian sparsity with compressed sensing-based DFLM enables efficient parameter estimation in geodetic adjustment, sensor localization, and large simulation models. For $s$-sparse Jacobians in high dimensions, the number of interpolation points required scales with the sparsity level $s$ rather than with the dimension $n$, up to polylogarithmic factors (Feng et al., 9 Jul 2025).
Inverse Problems and Parameter Identification: LM with singular scaling is effective in PDE-constrained inverse problems such as thermal conductivity and perfusion parameter recovery, reducing relative errors by factors of 3–5 over conventional LM (Boos et al., 2023).
Maximum Likelihood for Exponential Families: LM-inspired regularized Newton methods dramatically improve convergence rates and robustness, particularly in compositional data analysis (Dirichlet, Aitchison models), where classical Newton–Raphson often fails (Giordan et al., 2014).
Stochastic Data Assimilation and Machine Learning: In stochastic settings (e.g., 4DVAR data assimilation with ensemble Kalman filtering, mini-batch neural network training), LM achieves global convergence under scalable complexity using subsampled models and function estimates (Bergou et al., 2018).

6. Limitations and Trade-Offs

Despite its broad applicability, LM carries several limitations:

Computation and Memory Cost: Each LM step requires explicit or approximate solution of a dense or structured linear system of size $n \times n$.
Parameter Tuning Sensitivity: Convergence rate and global robustness depend on appropriate selection of the damping parameter, scaling matrices, and convergence criteria. Aggressive enhancements (step-size randomization, bold acceptance) can lead to faster convergence but may increase the risk of divergence or failure in flat/unstructured problems (Transtrum et al., 2012, Pooladzandi et al., 2022).
Derivative-Free Overhead: DFLM algorithms involve additional overhead in surrogate Jacobian construction (compressed sensing or random smoothing), entailing extra function evaluations and potential randomization-induced variance (Feng et al., 9 Jul 2025, Chen et al., 2024).
Scalability to Extreme Dimensions: While sparse and block-structured variants improve scaling, classical dense LM is impractical for extreme parameter counts without further structural exploitation or Hessian-free methods.
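When forming or factoring $J^\top J$ is too expensive, the step equation can be solved approximately with an iterative method that needs only Jacobian-vector products. The sketch below uses conjugate gradients (CG), a standard Krylov solver named here as an illustration rather than as a method from the cited works. `jvp` and `vjp` are hypothetical user-supplied callables for $v \mapsto Jv$ and $u \mapsto J^\top u$, which could come from automatic differentiation.

```python
import numpy as np


def lm_step_matrix_free(jvp, vjp, r, lam, n, tol=1e-10, max_iter=None):
    """Approximately solve (J^T J + lam I) dx = -J^T r by conjugate gradients,
    accessing J only through jvp(v) = J @ v and vjp(u) = J.T @ u."""
    max_iter = max_iter or 10 * n

    def matvec(v):
        # Symmetric positive definite operator for lam > 0
        return vjp(jvp(v)) + lam * v

    b = -vjp(r)
    x = np.zeros(n)
    res = b.copy()                    # CG residual b - A x, with x = 0 initially
    p = res.copy()
    rs = res @ res
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * np.linalg.norm(b):
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        res -= alpha * Ap
        rs_new = res @ res
        p = res + (rs_new / rs) * p
        rs = rs_new
    return x


# Compare against a dense solve on random data.
rng = np.random.default_rng(0)
J = rng.standard_normal((50, 20))
r = rng.standard_normal(50)
lam = 0.1
dx_cg = lm_step_matrix_free(lambda v: J @ v, lambda u: J.T @ u, r, lam, n=20)
dx_dense = np.linalg.solve(J.T @ J + lam * np.eye(20), -J.T @ r)
print(np.allclose(dx_cg, dx_dense))   # True
```

Because $\lambda > 0$ makes $J^\top J + \lambda I$ symmetric positive definite, CG applies directly. Stopping it early yields an inexact LM step at a cost of two Jacobian products per CG iteration, without ever storing an $n \times n$ matrix.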

7. Outlook and Future Directions

Ongoing research extends LM through several directions:

Enhanced globalization (stochastic/deterministic hybrids, more aggressive bold moves) to ensure convergence on highly nonconvex landscapes (Bergou et al., 2018, Li et al., 2022).
Scalable DFLM frameworks using advanced random projections, compressed sensing, and structured sparsity for ultra-high-dimensional regimes (Feng et al., 9 Jul 2025, Chen et al., 2024).
Integration of LM with advanced statistical modeling (e.g., maximum likelihood in nonstandard families, generalized linear and latent-variable models), leveraging adaptive regularization and convergence diagnostics (Giordan et al., 2014, Philipps et al., 2020).
Advanced step-control and adaptivity (geodesic acceleration, singular scaling, adaptive momentum) to strike an optimal efficiency/robustness balance across a variety of real-world tasks (Transtrum et al., 2012, Pooladzandi et al., 2022).

The Levenberg–Marquardt algorithm thus remains a central paradigm in modern nonlinear optimization, continuously evolving to meet the demands of large-scale, noisy, and high-dimensional estimation problems.