
Derivative-Free LM Optimization

Updated 28 April 2026
  • Derivative-Free LM is a class of methods that solve nonlinear least-squares problems without requiring analytic derivatives, relying instead on function evaluations.
  • They construct probabilistic gradient and Jacobian models using techniques like orthogonal spherical smoothing and ℓ₁-minimization to ensure robust convergence.
  • DFLM algorithms offer provable complexity bounds and almost sure global convergence, with adaptive regularization strategies enhancing performance on large-scale and noisy problems.

A derivative-free Levenberg–Marquardt (DFLM) algorithm is any method for nonlinear least-squares optimization that incorporates a Levenberg–Marquardt-style (LM) trust-region or regularization framework, but does not require analytic derivatives. DFLM methods are designed for problems where the Jacobian of the residuals is unavailable, expensive, or unreliable. They construct gradient or Jacobian models from function evaluations using techniques such as orthogonal spherical smoothing, interpolation, $\ell_1$-minimization, or stochastic sampling, and are especially relevant in high-dimensional, black-box, or sparse-least-squares contexts. Modern DFLM algorithms can provide provable convergence to stationary points, with global complexity bounds matching those of gradient-based methods, and demonstrate strong empirical performance on large-scale or structured least-squares problems.

1. Problem Formulation and Core Principles

The canonical target of DFLM algorithms is the nonlinear least-squares problem

$$\min_{x\in\mathbb R^n} f(x) = \frac{1}{2}\|r(x)\|^2 = \frac{1}{2}\sum_{i=1}^m [r_i(x)]^2,$$

where $r: \mathbb R^n \to \mathbb R^m$ has smooth residuals and the analytic Jacobian $J(x) = [\nabla r_1(x), \ldots, \nabla r_m(x)]^{\top}$ is unavailable or impractical to compute. DFLM techniques replace $J(x)$ with probabilistically accurate models $\widetilde{J}(x)$ derived from function evaluations, estimating the gradient $\nabla f(x) = J(x)^{\top} r(x)$ by $\widetilde{J}(x)^{\top} r(x)$.
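As a concrete (hypothetical) illustration of this setup, the objective and a single LM-regularized step can be coded directly; the residual function below is an invented toy example, not from the cited papers:

```python
import numpy as np

def residuals(x):
    # Toy residual vector r: R^2 -> R^3 (hypothetical test problem).
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1] - 2.0])

def objective(x):
    # f(x) = 0.5 * ||r(x)||^2
    r = residuals(x)
    return 0.5 * float(r @ r)

def lm_step(J, r, lam):
    # LM-regularized normal equations: (J^T J + lam * I) d = -J^T r.
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ r)
```

With an exact Jacobian this is classical LM; the derivative-free variants below replace `J` with a model built from function values only.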

The DFLM paradigm also extends to:

  • Equality-constrained nonlinear least-squares: $\min f(x)$ subject to $c(x)=0$, where neither the residual Jacobian $J(x)$ nor the constraint Jacobian $\nabla c(x)$ is available (Chen et al., 8 Jul 2025).
  • Regularized least-squares: $\min_{x\in\mathbb R^n} f(x) + h(x)$, where $h$ is a convex, possibly nonsmooth regularizer with a cheap proximal operator (Liu et al., 2024).
  • Sparse Jacobian scenarios: when $J(x)$ is known to be sparse but the sparsity pattern is unknown and function evaluation is expensive (Feng et al., 9 Jul 2025).

2. Construction of Derivative-Free Gradient and Jacobian Models

Orthogonal Spherical Smoothing

DFLM algorithms approximate the Jacobian by averaging directional differences over random or orthogonal frames. For each residual $r_i$, a forward-difference estimator of the form

$$\widetilde\nabla r_i(x) = \frac{n}{q}\sum_{j=1}^{q} \frac{r_i(x+\mu u_j)-r_i(x)}{\mu}\, u_j$$

is used, where $\{u_j\}_{j=1}^q$ are $q \le n$ orthonormal directions sampled uniformly from the sphere $\mathbb S^{n-1}$ and $\mu > 0$ is a smoothing radius (Chen et al., 2024, Chen et al., 8 Jul 2025). In expectation this construction recovers the gradient of a spherically smoothed surrogate of $r_i$; provided $\mu$ is small enough, the models have controlled variance and bias as a function of $\mu$, $q$, and the Lipschitz constants of $\nabla r_i$.
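A sketch of this estimator (forward-difference variant; the exact scaling and difference scheme in the cited papers may differ):

```python
import numpy as np

def orthonormal_directions(n, q, rng):
    # Sample q <= n orthonormal directions via QR of a random Gaussian matrix.
    G = rng.standard_normal((n, q))
    Q, _ = np.linalg.qr(G)
    return Q  # columns are orthonormal

def smoothed_jacobian(r, x, mu, q, rng):
    # Forward-difference estimate of the m x n Jacobian of r at x,
    # averaged over q orthonormal directions with smoothing radius mu.
    n = x.size
    U = orthonormal_directions(n, q, rng)
    r0 = r(x)
    J_hat = np.zeros((r0.size, n))
    for j in range(q):
        u = U[:, j]
        J_hat += np.outer((r(x + mu * u) - r0) / mu, u)
    return (n / q) * J_hat
```

For $q = n$ the frame is a full orthogonal basis and, on a linear residual map, the estimate is exact; for $q < n$ it is a cheaper randomized approximation.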

Interpolation and $\ell_1$-Minimization for Sparse Jacobians

When Jacobians are sparse, DFLM can use compressed-sensing-style interpolation. For each residual $r_i$, given $p$ random directions $\{u_j\}_{j=1}^p$ and a radius $\mu$, construct interpolation points $x + \mu u_j$ and solve

$$\min_{g\in\mathbb R^n} \|g\|_1 \quad \text{s.t.} \quad U g = \delta_i,$$

where $U$ comprises the $u_j^{\top}$ as rows and the entries of $\delta_i$ are finite-difference target values $[r_i(x+\mu u_j) - r_i(x)]/\mu$ (Feng et al., 9 Jul 2025). If $U$ satisfies a suitable restricted isometry property, the recovered $g$ is close (in probability) to the true sparse gradient $\nabla r_i(x)$.
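The $\ell_1$ recovery step can be posed as a linear program via the standard split $g = g^+ - g^-$ with $g^+, g^- \ge 0$; a sketch using SciPy, with a randomly generated sparse vector standing in for the unknown gradient:

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(U, delta):
    # Solve min ||g||_1 subject to U g = delta (basis pursuit) as an LP:
    # minimize sum(gp + gm) s.t. U gp - U gm = delta, gp, gm >= 0.
    p, n = U.shape
    c = np.ones(2 * n)
    A_eq = np.hstack([U, -U])
    res = linprog(c, A_eq=A_eq, b_eq=delta, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(1)
n, p = 20, 10                 # ambient dimension, number of samples
g_true = np.zeros(n)
g_true[[3, 7]] = [1.5, -2.0]  # sparse "gradient" to recover
U = rng.standard_normal((p, n)) / np.sqrt(p)
g_hat = l1_recover(U, U @ g_true)
```

The LP solution always satisfies the interpolation conditions and has minimal $\ell_1$ norm; exact recovery of the sparse vector additionally requires the RIP-type conditions discussed above.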

Quadratic and Fully Linear Models

For regularized or trust-region variants, DFLM forms quadratic models $m_i(x+s) = r_i(x) + g_i^{\top} s + \tfrac12 s^{\top} H_i s$ by interpolating $r_i$ at a poised set of points, where the model error satisfies $|m_i(x+s) - r_i(x+s)| \le \kappa \Delta^2$ for $\|s\| \le \Delta$ (Liu et al., 2024). This ensures that the model is "fully linear" on a trust region of radius $\Delta$.
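As a one-dimensional illustration of interpolation-based modelling (the cited works use multivariate poised sets), fitting a quadratic model at three sample points reproduces a quadratic function exactly:

```python
import numpy as np

def quad_model_1d(f, x, delta):
    # Interpolate f at {x - delta, x, x + delta} with the model
    # m(s) = c0 + c1*s + 0.5*c2*s^2, solving a tiny Vandermonde system.
    pts = np.array([-delta, 0.0, delta])
    V = np.column_stack([np.ones(3), pts, 0.5 * pts**2])
    vals = np.array([f(x + t) for t in pts])
    c0, c1, c2 = np.linalg.solve(V, vals)
    return c0, c1, c2  # model value, gradient, and curvature at x
```

For non-quadratic $f$ the same fit yields the $O(\Delta^2)$ model error that the fully linear property formalizes.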

3. Algorithmic Structure and Parameter Updates

A typical DFLM iteration includes the following components (Chen et al., 2024, Feng et al., 9 Jul 2025, Liu et al., 2024):

  1. Model construction — approximate $J(x)$ or $\nabla f(x)$ from function evaluations.
  2. Trial step — solve an LM-type system $(\widetilde J^{\top}\widetilde J + \lambda I)\,d = -\widetilde J^{\top} r$.
  3. Step acceptance — compare the actual and predicted reduction $\rho = \mathrm{ared}/\mathrm{pred}$.
  4. Regularization update — if $\rho \ge \eta$ (accept), decrease $\lambda$ or $\mu$; otherwise, increase them.
  5. Adaptation — reduce the smoothing radius $\mu$ as steps get smaller; increase the number of directions/interpolation points as needed.

Parameter recommendations from the literature:

  • Smoothing radius: $\mu_k$ tied to the current step length or model-gradient norm, and driven toward zero as stationarity is approached
  • Sample size: $q \le n$ orthonormal directions per iteration for orthogonal sampling
  • Thresholds: acceptance constants $0 < \eta_1 \le \eta_2 < 1$ and increase/decrease factors for $\lambda$, together with a safeguard on the model-gradient norm
  • Sparse Jacobian models: on the order of $s \log n$ interpolation points if $s$-sparsity is known (Feng et al., 9 Jul 2025).
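A minimal end-to-end sketch of such an iteration loop, assuming a forward-difference orthogonal-frame Jacobian estimate and illustrative update constants (not the papers' recommended values):

```python
import numpy as np

def dflm(r, x, iters=50, lam=1.0, mu=1e-3, eta=1e-3, seed=0):
    # Derivative-free LM sketch: build a Jacobian estimate from function
    # values, take an LM-regularized step, accept/reject, adapt lam and mu.
    rng = np.random.default_rng(seed)
    n = x.size
    for _ in range(iters):
        rk = r(x)
        # Step 1: model construction from a full orthonormal frame (QR).
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        D = np.column_stack([(r(x + mu * Q[:, j]) - rk) / mu for j in range(n)])
        J = D @ Q.T  # D approximates J Q, so J ~ D Q^T
        # Step 2: LM-type trial step (J^T J + lam I) d = -J^T r.
        d = np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ rk)
        # Step 3: actual vs. predicted reduction.
        pred = 0.5 * (rk @ rk - np.linalg.norm(rk + J @ d) ** 2)
        act = 0.5 * (rk @ rk - np.linalg.norm(r(x + d)) ** 2)
        # Step 4: accept and relax regularization, or reject and tighten it.
        if pred > 0 and act / pred >= eta:
            x = x + d
            lam = max(lam / 2.0, 1e-8)
            # Step 5: shrink the smoothing radius as steps get smaller.
            mu = max(min(mu, np.linalg.norm(d)), 1e-8)
        else:
            lam *= 4.0
    return x
```

This recovers the solution of simple smooth test problems using function values only; production variants add the probabilistic-accuracy checks and safeguards described below.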

4. Probabilistic Accuracy, Complexity, and Convergence Guarantees

DFLM methods offer formal guarantees on the probabilistic accuracy of their stochastic models, and on the global rate of convergence to first-order stationary points.

Probabilistic Model Accuracy

DFLM algorithms attain $(1-\delta)$-probabilistically first-order accurate gradient models: with an appropriately chosen smoothing radius and sample size, the model gradient satisfies

$$\big\|\widetilde J(x_k)^{\top} r(x_k) - \nabla f(x_k)\big\| \le \kappa\,\mu_k$$

with probability at least $1-\delta$, where $\kappa$ and the required sample size scale with $q$, $n$, and the Lipschitz constants. Coupling $\mu_k$ to the norm of the model gradient or step yields tighter bounds as stationarity is approached (Chen et al., 2024).
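A quick numerical check of this behaviour on a smooth test function (illustrative only, applied to $f$ directly rather than to individual residuals): the estimator's error shrinks with the smoothing radius $\mu$:

```python
import numpy as np

def grad_estimate(f, x, mu, q, rng):
    # Orthogonal-frame forward-difference gradient estimate of f at x.
    n = x.size
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    U = Q[:, :q]
    f0 = f(x)
    g = np.zeros(n)
    for j in range(q):
        g += (f(x + mu * U[:, j]) - f0) / mu * U[:, j]
    return (n / q) * g

rng = np.random.default_rng(2)
f = lambda x: np.sum(x**2)          # exact gradient is 2x
x = np.array([1.0, -2.0, 3.0])
errs = [np.linalg.norm(grad_estimate(f, x, mu, 3, rng) - 2 * x)
        for mu in (1e-1, 1e-3)]
```

On this quadratic the bias is proportional to $\mu$, so shrinking the radius by two orders of magnitude shrinks the error correspondingly.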

Complexity Bounds

Under Lipschitz continuity and boundedness of residuals and Jacobians, DFLM with spherical smoothing achieves the following (Chen et al., 2024):

  • With probability at least $1-\delta$,

$$\min_{0 \le k \le K} \|\nabla f(x_k)\| \le O\!\big(K^{-1/2}\big),$$

implying that $O(\epsilon^{-2})$ iterations suffice (up to logarithmic terms) to drive the gradient norm below $\epsilon$ with high probability.

A similar $O(\epsilon^{-2})$ complexity holds for the regularized, nonsmooth, or constrained DFLM variants (Liu et al., 2024, Chen et al., 8 Jul 2025).

Almost Sure Global Convergence

For sparse DFLM, if the interpolation matrix $U$ satisfies the restricted isometry property with probability at least $\beta$ at every iteration (for $\beta$ large enough), the stationarity condition

$$\liminf_{k\to\infty} \|\nabla f(x_k)\| = 0$$

holds almost surely (Feng et al., 9 Jul 2025). For equality-constrained DFLM, the algorithm either achieves approximate KKT contraction with arbitrarily high probability, or the constraint violation converges to zero almost surely (Chen et al., 8 Jul 2025).

5. Variants and Extensions: Constrained, Regularized, and Sparse DFLM

Regularized/Composite DFLM

For objectives of the form $f(x) + h(x)$ where $h$ is convex and nonsmooth, DFLM uses either model-based trust-region steps or a Moreau-envelope smoothing approach. Stationarity is measured by a proximal-gradient-based criticality measure, and the trust-region subproblems include the regularizer (Liu et al., 2024).
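For instance, when $h(x) = t\|x\|_1$, the cheap proximal operator referred to above is componentwise soft thresholding; a minimal sketch:

```python
import numpy as np

def prox_l1(v, t):
    # prox_{t * ||.||_1}(v): soft thresholding, the closed-form proximal
    # map for the l1 regularizer in composite least-squares.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

Entries smaller than the threshold are set exactly to zero, which is what makes $\ell_1$-regularized DFLM steps produce sparse iterates.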

DFLM for Sparse Jacobians

Exploiting sparsity, $\ell_1$-based interpolation reconstructs a sparse Jacobian from far fewer function evaluations than the ambient dimension $n$ would otherwise require, yielding strong empirical and theoretical performance on large-scale problems with structured sparsity (Feng et al., 9 Jul 2025).

Equality-Constrained DFLM

For $\min f(x)$ s.t. $c(x)=0$, DFLM is integrated in a regularized augmented-Lagrangian framework. Smoothing-based Jacobian models are used for both the residuals $r$ and the constraints $c$, with the DFLM step applied when Newton-like steps do not yield sufficient KKT-residual decrease (Chen et al., 8 Jul 2025).

6. Practical Implementation and Empirical Performance

DFLM algorithms are implemented efficiently by:

  • Sampling orthonormal direction sets via QR factorization of random Gaussian matrices
  • Reusing direction sets for several iterations when function evaluations are expensive
  • Automatically adapting the smoothing radius and sample sizes as stationarity is approached
  • Ensuring stability by safeguarding regularization parameters and maintaining bounded curvature conditions

Empirical profiles show that DFLM methods, including sparsity-exploiting and regularized variants, outperform BFGS, classical LM, and finite-difference-based solvers in the number of function evaluations and robustness, especially on large-scale and noisy or structured least-squares test sets (Chen et al., 2024, Liu et al., 2024, Feng et al., 9 Jul 2025).

7. Theoretical and Practical Considerations

Key theoretical principles established for DFLM methods include:

  • Probabilistically first-order accurate stochastic models, controllable via smoothing radius and sampling/embedding dimension
  • Global high-probability complexity bounds matching gradient-based methods under mild smoothness and boundedness assumptions
  • Robustness to model inaccuracy via adaptive regularization and step-rejection/acceptance policies
  • Almost sure global convergence in the compressed sensing/sparse setting when interpolation matrices satisfy appropriate RIP with sufficient probability

The overall DFLM approach unifies several strands of derivative-free optimization, offering scalable, theoretically justified, and practically effective methodologies for high-dimensional, black-box, regularized, constrained, and structured nonlinear least-squares problems (Chen et al., 2024, Liu et al., 2024, Chen et al., 8 Jul 2025, Feng et al., 9 Jul 2025).
