
Derivative-Free LM Optimization

Updated 28 April 2026
  • Derivative-Free LM is a class of methods that solve nonlinear least-squares problems without requiring analytic derivatives, relying instead on function evaluations.
  • They construct probabilistic gradient and Jacobian models using techniques like orthogonal spherical smoothing and ℓ₁-minimization to ensure robust convergence.
  • DFLM algorithms offer provable complexity bounds and almost sure global convergence, with adaptive regularization strategies enhancing performance on large-scale and noisy problems.

A derivative-free Levenberg–Marquardt (DFLM) algorithm is any method for nonlinear least-squares optimization that incorporates a Levenberg–Marquardt-style (LM) trust-region or regularization framework, but does not require analytic derivatives. DFLM methods are designed for problems where the Jacobian of the residuals is unavailable, expensive, or unreliable. They construct gradient or Jacobian models from function evaluations using techniques such as orthogonal spherical smoothing, interpolation, $\ell_1$-minimization, or stochastic sampling, and are especially relevant in high-dimensional, black-box, or sparse-least-squares contexts. Modern DFLM algorithms can provide provable convergence to stationary points, with global complexity bounds matching those of gradient-based methods, and demonstrate strong empirical performance on large-scale or structured least-squares problems.

1. Problem Formulation and Core Principles

The canonical target of DFLM algorithms is the nonlinear least-squares problem

$$\min_{x\in\mathbb R^n} f(x) = \frac{1}{2}\|r(x)\|^2 = \frac{1}{2}\sum_{i=1}^m [r_i(x)]^2,$$

where $r: \mathbb R^n \to \mathbb R^m$ has smooth residuals and the analytic Jacobian $J(x) = [\nabla r_1(x), \ldots, \nabla r_m(x)]^{\top}$ is unavailable or impractical to compute. DFLM techniques replace $J(x)$ with probabilistically accurate models $\widetilde{J}(x)$ derived from function evaluations, estimating the gradient $\nabla f(x) = J(x)^{\top} r(x)$ by $\widetilde{J}(x)^{\top} r(x)$.
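As a concrete (hypothetical) illustration of this setup, the objective and a single LM-regularized step can be coded directly; the residual function below is an invented toy example, not from the cited papers:

```python
import numpy as np

def residuals(x):
    # Toy residual vector r: R^2 -> R^3 (hypothetical test problem).
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1] - 2.0])

def objective(x):
    # f(x) = 0.5 * ||r(x)||^2
    r = residuals(x)
    return 0.5 * float(r @ r)

def lm_step(J, r, lam):
    # LM-regularized normal equations: (J^T J + lam * I) d = -J^T r.
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ r)
```

With an exact Jacobian this is classical LM; the derivative-free variants below replace `J` with a model built from function values only.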

The DFLM paradigm also extends to:

  • Equality-constrained nonlinear least-squares: $\min f(x)$ subject to $c(x)=0$, where neither the residual Jacobian $J(x)$ nor the constraint Jacobian $\nabla c(x)$ is available (Chen et al., 8 Jul 2025).
  • Regularized least-squares: $\min_{x\in\mathbb R^n} f(x) + h(x)$, where $h$ is a convex, possibly nonsmooth regularizer with a cheap proximal operator (Liu et al., 2024).
  • Sparse Jacobian scenarios: when $J(x)$ is known to be sparse but the sparsity pattern is unknown and function evaluation is expensive (Feng et al., 9 Jul 2025).

2. Construction of Derivative-Free Gradient and Jacobian Models

Orthogonal Spherical Smoothing

DFLM algorithms approximate the Jacobian by averaging directional differences over random or orthogonal frames. For each residual $r_i$, a forward-difference estimator of the form

$$\widetilde\nabla r_i(x) = \frac{n}{q}\sum_{j=1}^{q} \frac{r_i(x+\mu u_j)-r_i(x)}{\mu}\, u_j$$

is used, where $\{u_j\}_{j=1}^q$ are $q \le n$ orthonormal directions sampled uniformly from the sphere $\mathbb S^{n-1}$ and $\mu > 0$ is a smoothing radius (Chen et al., 2024, Chen et al., 8 Jul 2025). In expectation this construction recovers the gradient of a spherically smoothed surrogate of $r_i$; provided $\mu$ is small enough, the models have controlled variance and bias as a function of $\mu$, $q$, and the Lipschitz constants of $\nabla r_i$.
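A sketch of this estimator (forward-difference variant; the exact scaling and difference scheme in the cited papers may differ):

```python
import numpy as np

def orthonormal_directions(n, q, rng):
    # Sample q <= n orthonormal directions via QR of a random Gaussian matrix.
    G = rng.standard_normal((n, q))
    Q, _ = np.linalg.qr(G)
    return Q  # columns are orthonormal

def smoothed_jacobian(r, x, mu, q, rng):
    # Forward-difference estimate of the m x n Jacobian of r at x,
    # averaged over q orthonormal directions with smoothing radius mu.
    n = x.size
    U = orthonormal_directions(n, q, rng)
    r0 = r(x)
    J_hat = np.zeros((r0.size, n))
    for j in range(q):
        u = U[:, j]
        J_hat += np.outer((r(x + mu * u) - r0) / mu, u)
    return (n / q) * J_hat
```

For $q = n$ the frame is a full orthogonal basis and, on a linear residual map, the estimate is exact; for $q < n$ it is a cheaper randomized approximation.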

Interpolation and $\ell_1$-Minimization for Sparse Jacobians

When Jacobians are sparse, DFLM can use compressed-sensing-style interpolation. For each residual $r_i$, given $p$ random directions $\{u_j\}_{j=1}^p$ and a radius $\mu$, construct interpolation points $x + \mu u_j$ and solve

$$\min_{g\in\mathbb R^n} \|g\|_1 \quad \text{s.t.} \quad U g = \delta_i,$$

where $U$ comprises the $u_j^{\top}$ as rows and the entries of $\delta_i$ are finite-difference target values $[r_i(x+\mu u_j) - r_i(x)]/\mu$ (Feng et al., 9 Jul 2025). If $U$ satisfies a suitable restricted isometry property, the recovered $g$ is close (in probability) to the true sparse gradient $\nabla r_i(x)$.
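The $\ell_1$ recovery step can be posed as a linear program via the standard split $g = g^+ - g^-$ with $g^+, g^- \ge 0$; a sketch using SciPy, with a randomly generated sparse vector standing in for the unknown gradient:

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(U, delta):
    # Solve min ||g||_1 subject to U g = delta (basis pursuit) as an LP:
    # minimize sum(gp + gm) s.t. U gp - U gm = delta, gp, gm >= 0.
    p, n = U.shape
    c = np.ones(2 * n)
    A_eq = np.hstack([U, -U])
    res = linprog(c, A_eq=A_eq, b_eq=delta, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(1)
n, p = 20, 10                 # ambient dimension, number of samples
g_true = np.zeros(n)
g_true[[3, 7]] = [1.5, -2.0]  # sparse "gradient" to recover
U = rng.standard_normal((p, n)) / np.sqrt(p)
g_hat = l1_recover(U, U @ g_true)
```

The LP solution always satisfies the interpolation conditions and has minimal $\ell_1$ norm; exact recovery of the sparse vector additionally requires the RIP-type conditions discussed above.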

Quadratic and Fully Linear Models

For regularized or trust-region variants, DFLM forms quadratic models $m_i(x+s) = r_i(x) + g_i^{\top} s + \tfrac12 s^{\top} H_i s$ by interpolating $r_i$ at a poised set of points, where the model error satisfies $|m_i(x+s) - r_i(x+s)| \le \kappa \Delta^2$ for $\|s\| \le \Delta$ (Liu et al., 2024). This ensures that the model is "fully linear" on a trust region of radius $\Delta$.
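As a one-dimensional illustration of interpolation-based modelling (the cited works use multivariate poised sets), fitting a quadratic model at three sample points reproduces a quadratic function exactly:

```python
import numpy as np

def quad_model_1d(f, x, delta):
    # Interpolate f at {x - delta, x, x + delta} with the model
    # m(s) = c0 + c1*s + 0.5*c2*s^2, solving a tiny Vandermonde system.
    pts = np.array([-delta, 0.0, delta])
    V = np.column_stack([np.ones(3), pts, 0.5 * pts**2])
    vals = np.array([f(x + t) for t in pts])
    c0, c1, c2 = np.linalg.solve(V, vals)
    return c0, c1, c2  # model value, gradient, and curvature at x
```

For non-quadratic $f$ the same fit yields the $O(\Delta^2)$ model error that the fully linear property formalizes.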

3. Algorithmic Structure and Parameter Updates

A typical DFLM iteration includes the following components (Chen et al., 2024, Feng et al., 9 Jul 2025, Liu et al., 2024):

  1. Model construction — approximate $J(x)$ or $\nabla f(x)$ from function evaluations.
  2. Trial step — solve an LM-type system $(\widetilde J^{\top}\widetilde J + \lambda I)\,d = -\widetilde J^{\top} r$.
  3. Step acceptance — compare the actual and predicted reduction $\rho = \mathrm{ared}/\mathrm{pred}$.
  4. Regularization update — if $\rho \ge \eta$ (accept), decrease $\lambda$ or $\mu$; otherwise, increase them.
  5. Adaptation — reduce the smoothing radius $\mu$ as steps get smaller; increase the number of directions/interpolation points as needed.

Parameter recommendations from the literature:

  • Smoothing radius: $\mu_k$ tied to the current step length or model-gradient norm, and driven toward zero as stationarity is approached
  • Sample size: $q \le n$ orthonormal directions per iteration for orthogonal sampling
  • Thresholds: acceptance constants $0 < \eta_1 \le \eta_2 < 1$ and increase/decrease factors for $\lambda$, together with a safeguard on the model-gradient norm
  • Sparse Jacobian models: on the order of $s \log n$ interpolation points if $s$-sparsity is known (Feng et al., 9 Jul 2025).
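A minimal end-to-end sketch of such an iteration loop, assuming a forward-difference orthogonal-frame Jacobian estimate and illustrative update constants (not the papers' recommended values):

```python
import numpy as np

def dflm(r, x, iters=50, lam=1.0, mu=1e-3, eta=1e-3, seed=0):
    # Derivative-free LM sketch: build a Jacobian estimate from function
    # values, take an LM-regularized step, accept/reject, adapt lam and mu.
    rng = np.random.default_rng(seed)
    n = x.size
    for _ in range(iters):
        rk = r(x)
        # Step 1: model construction from a full orthonormal frame (QR).
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        D = np.column_stack([(r(x + mu * Q[:, j]) - rk) / mu for j in range(n)])
        J = D @ Q.T  # D approximates J Q, so J ~ D Q^T
        # Step 2: LM-type trial step (J^T J + lam I) d = -J^T r.
        d = np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ rk)
        # Step 3: actual vs. predicted reduction.
        pred = 0.5 * (rk @ rk - np.linalg.norm(rk + J @ d) ** 2)
        act = 0.5 * (rk @ rk - np.linalg.norm(r(x + d)) ** 2)
        # Step 4: accept and relax regularization, or reject and tighten it.
        if pred > 0 and act / pred >= eta:
            x = x + d
            lam = max(lam / 2.0, 1e-8)
            # Step 5: shrink the smoothing radius as steps get smaller.
            mu = max(min(mu, np.linalg.norm(d)), 1e-8)
        else:
            lam *= 4.0
    return x
```

This recovers the solution of simple smooth test problems using function values only; production variants add the probabilistic-accuracy checks and safeguards described below.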

4. Probabilistic Accuracy, Complexity, and Convergence Guarantees

DFLM methods offer formal guarantees on the probabilistic accuracy of their stochastic models, and on the global rate of convergence to first-order stationary points.

Probabilistic Model Accuracy

DFLM algorithms attain $(1-\delta)$-probabilistically first-order accurate gradient models: with an appropriately chosen smoothing radius and sample size, the model gradient satisfies

$$\big\|\widetilde J(x_k)^{\top} r(x_k) - \nabla f(x_k)\big\| \le \kappa\,\mu_k$$

with probability at least $1-\delta$, where $\kappa$ and the required sample size scale with $q$, $n$, and the Lipschitz constants. Coupling $\mu_k$ to the norm of the model gradient or step yields tighter bounds as stationarity is approached (Chen et al., 2024).
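A quick numerical check of this behaviour on a smooth test function (illustrative only, applied to $f$ directly rather than to individual residuals): the estimator's error shrinks with the smoothing radius $\mu$:

```python
import numpy as np

def grad_estimate(f, x, mu, q, rng):
    # Orthogonal-frame forward-difference gradient estimate of f at x.
    n = x.size
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    U = Q[:, :q]
    f0 = f(x)
    g = np.zeros(n)
    for j in range(q):
        g += (f(x + mu * U[:, j]) - f0) / mu * U[:, j]
    return (n / q) * g

rng = np.random.default_rng(2)
f = lambda x: np.sum(x**2)          # exact gradient is 2x
x = np.array([1.0, -2.0, 3.0])
errs = [np.linalg.norm(grad_estimate(f, x, mu, 3, rng) - 2 * x)
        for mu in (1e-1, 1e-3)]
```

On this quadratic the bias is proportional to $\mu$, so shrinking the radius by two orders of magnitude shrinks the error correspondingly.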

Complexity Bounds

Under Lipschitz continuity and boundedness of residuals and Jacobians, DFLM with spherical smoothing achieves the following (Chen et al., 2024):

  • With probability at least $1-\delta$,

$$\min_{0 \le k \le K} \|\nabla f(x_k)\| \le O\!\big(K^{-1/2}\big),$$

implying that $O(\epsilon^{-2})$ iterations suffice (up to logarithmic terms) to drive the gradient norm below $\epsilon$ with high probability.

A similar $O(\epsilon^{-2})$ complexity holds for the regularized, nonsmooth, or constrained DFLM variants (Liu et al., 2024, Chen et al., 8 Jul 2025).

Almost Sure Global Convergence

For sparse DFLM, if the interpolation matrix $U$ satisfies the restricted isometry property with probability at least $\beta$ at every iteration (for $\beta$ large enough), the stationarity condition

$$\liminf_{k\to\infty} \|\nabla f(x_k)\| = 0$$

holds almost surely (Feng et al., 9 Jul 2025). For equality-constrained DFLM, the algorithm either achieves approximate KKT contraction with arbitrarily high probability, or the constraint violation converges to zero almost surely (Chen et al., 8 Jul 2025).

5. Variants and Extensions: Constrained, Regularized, and Sparse DFLM

Regularized/Composite DFLM

For objectives of the form $f(x) + h(x)$ where $h$ is convex and nonsmooth, DFLM uses either model-based trust-region steps or a Moreau-envelope smoothing approach. Stationarity is measured by a proximal-gradient-based criticality measure, and the trust-region subproblems include the regularizer (Liu et al., 2024).
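For instance, when $h(x) = t\|x\|_1$, the cheap proximal operator referred to above is componentwise soft thresholding; a minimal sketch:

```python
import numpy as np

def prox_l1(v, t):
    # prox_{t * ||.||_1}(v): soft thresholding, the closed-form proximal
    # map for the l1 regularizer in composite least-squares.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

Entries smaller than the threshold are set exactly to zero, which is what makes $\ell_1$-regularized DFLM steps produce sparse iterates.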

DFLM for Sparse Jacobians

Exploiting sparsity, $\ell_1$-based interpolation reconstructs a sparse Jacobian from far fewer function evaluations than the ambient dimension $n$ would otherwise require, yielding strong empirical and theoretical performance on large-scale problems with structured sparsity (Feng et al., 9 Jul 2025).

Equality-Constrained DFLM

For $\min f(x)$ s.t. $c(x)=0$, DFLM is integrated in a regularized augmented-Lagrangian framework. Smoothing-based Jacobian models are used for both the residuals $r$ and the constraints $c$, with the DFLM step applied when Newton-like steps do not yield sufficient KKT-residual decrease (Chen et al., 8 Jul 2025).

6. Practical Implementation and Empirical Performance

DFLM algorithms are implemented efficiently by:

  • Sampling orthonormal direction sets via QR factorization of random Gaussian matrices
  • Reusing direction sets for several iterations when function evaluations are expensive
  • Automatically adapting the smoothing radius and sample sizes as stationarity is approached
  • Ensuring stability by safeguarding regularization parameters and maintaining bounded curvature conditions

Empirical profiles show that DFLM methods, including sparsity-exploiting and regularized variants, outperform BFGS, classical LM, and finite-difference-based solvers in the number of function evaluations and robustness, especially on large-scale and noisy or structured least-squares test sets (Chen et al., 2024, Liu et al., 2024, Feng et al., 9 Jul 2025).

7. Theoretical and Practical Considerations

Key theoretical principles established for DFLM methods include:

  • Probabilistically first-order accurate stochastic models, controllable via smoothing radius and sampling/embedding dimension
  • Global high-probability complexity bounds matching gradient-based methods under mild smoothness and boundedness assumptions
  • Robustness to model inaccuracy via adaptive regularization and step-rejection/acceptance policies
  • Almost sure global convergence in the compressed sensing/sparse setting when interpolation matrices satisfy appropriate RIP with sufficient probability

The overall DFLM approach unifies several strands of derivative-free optimization, offering scalable, theoretically justified, and practically effective methodologies for high-dimensional, black-box, regularized, constrained, and structured nonlinear least-squares problems (Chen et al., 2024, Liu et al., 2024, Chen et al., 8 Jul 2025, Feng et al., 9 Jul 2025).
