Derivative-Free LM Optimization
- Derivative-Free LM is a class of methods that solve nonlinear least-squares problems without requiring analytic derivatives, relying instead on function evaluations.
- They construct probabilistic gradient and Jacobian models using techniques like orthogonal spherical smoothing and ℓ₁-minimization to ensure robust convergence.
- DFLM algorithms offer provable complexity bounds and almost sure global convergence, with adaptive regularization strategies enhancing performance on large-scale and noisy problems.
A derivative-free Levenberg–Marquardt (DFLM) algorithm is any method for nonlinear least-squares optimization that incorporates a Levenberg–Marquardt-style (LM) trust-region or regularization framework, but does not require analytic derivatives. DFLM methods are designed for problems where the Jacobian of the residuals is unavailable, expensive, or unreliable. They construct gradient or Jacobian models from function evaluations using techniques such as orthogonal spherical smoothing, interpolation, ℓ₁-minimization, or stochastic sampling, and are especially relevant in high-dimensional, black-box, or sparse-least-squares contexts. Modern DFLM algorithms can provide provable convergence to stationary points, with global complexity bounds matching those of gradient-based methods, and demonstrate strong empirical performance on large-scale or structured least-squares problems.
1. Problem Formulation and Core Principles
The canonical target of DFLM algorithms is the nonlinear least-squares problem

$$\min_{x \in \mathbb{R}^n} f(x) := \tfrac{1}{2}\|F(x)\|^2, \qquad F(x) = (F_1(x), \dots, F_m(x))^\top,$$

where the $F_i : \mathbb{R}^n \to \mathbb{R}$ are smooth residuals and the analytic Jacobian $J(x) = F'(x)$ is unavailable or impractical to compute. DFLM techniques replace $J(x)$ with probabilistically accurate models $J_\varepsilon(x)$ derived from function evaluations, estimating the gradient $\nabla f(x) = J(x)^\top F(x)$ by $g_\varepsilon(x) = J_\varepsilon(x)^\top F(x)$.
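To fix ideas, the following minimal Python sketch sets up this black-box interface: an illustrative residual map `F` (a toy example, not from the cited papers), the objective $f$, and the model gradient assembled from any derivative-free Jacobian estimate.

```python
import numpy as np

# Illustrative black-box residual map F: R^2 -> R^2 (Rosenbrock-type).
# DFLM only ever evaluates F; it never queries an analytic Jacobian.
def F(x):
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

def f(x):
    """Least-squares objective f(x) = 0.5 * ||F(x)||^2."""
    r = F(x)
    return 0.5 * r @ r

def model_gradient(J_eps, x):
    """DFLM gradient estimate g_eps(x) = J_eps^T F(x), with J_eps any
    derivative-free Jacobian model (constructions in Section 2)."""
    return J_eps.T @ F(x)
```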
The DFLM paradigm also extends to:
- Equality-constrained nonlinear least-squares: $\min_x \tfrac{1}{2}\|F(x)\|^2$ subject to $c(x) = 0$, where neither the residual Jacobian $F'(x)$ nor the constraint Jacobian $c'(x)$ is available (Chen et al., 8 Jul 2025).
- Regularized least-squares: $\min_x \tfrac{1}{2}\|F(x)\|^2 + h(x)$, where $h$ is a convex, possibly nonsmooth regularizer with a cheap proximal operator (Liu et al., 2024).
- Sparse Jacobian scenarios: when $J(x)$ is known to be sparse but the sparsity pattern is unknown and function evaluation is expensive (Feng et al., 9 Jul 2025).
2. Construction of Derivative-Free Gradient and Jacobian Models
Orthogonal Spherical Smoothing
DFLM algorithms approximate the Jacobian by averaging directional differences over random or orthogonal frames. For each residual $F_i$,

$$\nabla_\varepsilon F_i(x) = \frac{n}{p\,\varepsilon} \sum_{j=1}^{p} \bigl(F_i(x + \varepsilon u_j) - F_i(x)\bigr)\, u_j,$$

where $u_1, \dots, u_p$ are $p$ orthonormal directions sampled uniformly from the sphere $\mathbb{S}^{n-1}$ and $\varepsilon > 0$ is a smoothing radius (Chen et al., 2024, Chen et al., 8 Jul 2025). This construction satisfies $\mathbb{E}\,[\nabla_\varepsilon F_i(x)] = \nabla F_{i,\varepsilon}(x)$, the gradient of the spherically smoothed residual. Provided $p \le n$, these models have controlled variance and bias as a function of $\varepsilon$, $p$, and the Lipschitz constants of $\nabla F_i$.
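A minimal sketch of this estimator, assuming the forward-difference form and $n/p$ scaling shown above; the orthonormal frame is drawn via QR factorization of a random Gaussian matrix (cf. Section 6).

```python
import numpy as np

def spherical_smoothing_jacobian(F, x, eps, p, rng=None):
    """Estimate the Jacobian of F at x by orthogonal spherical smoothing.

    The p directions are orthonormal columns of the Q factor of a random
    Gaussian matrix (each marginally uniform on the sphere); forward
    differences along each direction are combined with the n/p scaling
    from the formula above. Sketch only: QR sign conventions and
    variance-reduction refinements are ignored.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))  # n x p orthonormal frame
    Fx = F(x)
    # One directional difference (F(x + eps*u_j) - F(x)) / eps per column.
    D = np.stack([(F(x + eps * Q[:, j]) - Fx) / eps for j in range(p)], axis=1)
    return (n / p) * D @ Q.T  # m x n Jacobian estimate
```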
Interpolation and ℓ₁-Minimization for Sparse Jacobians
When Jacobians are sparse, DFLM can use compressed-sensing-style interpolation. For each residual $F_i$, given $q$ random directions $u_1, \dots, u_q$ and a sampling radius $\varepsilon$, construct interpolation points $x + \varepsilon u_j$ and solve

$$\min_{g \in \mathbb{R}^n} \|g\|_1 \quad \text{subject to} \quad A g = b,$$

where $A$ comprises the $u_j^\top$ as rows and $b$ collects the finite-difference target values $\bigl(F_i(x + \varepsilon u_j) - F_i(x)\bigr)/\varepsilon$ (Feng et al., 9 Jul 2025). If $A$ satisfies a suitable restricted isometry property, the recovered $g$ is close (in probability) to the true sparse gradient $\nabla F_i(x)$.
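The recovery step can be prototyped as a linear program via the standard split $|g_i| \le t_i$; the sketch below uses `scipy.optimize.linprog` (a dedicated basis-pursuit solver would be preferable at scale).

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover_gradient(A, b):
    """Solve min ||g||_1 s.t. A g = b as an LP in the variables (g, t),
    where the inequalities g - t <= 0 and -g - t <= 0 encode |g_i| <= t_i."""
    q, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])  # objective: sum of t
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((q, n))])        # interpolation conditions
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0.0, None)] * n)
    return res.x[:n]
```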
Quadratic and Fully Linear Models
For regularized or trust-region variants, DFLM forms quadratic models $m(x+s) = f(x) + g^\top s + \tfrac{1}{2} s^\top H s$ by interpolating $f$ at a poised set of points, where $m$ satisfies $|m(x+s) - f(x+s)| \le \kappa_{ef}\,\Delta^2$ and $\|\nabla m(x+s) - \nabla f(x+s)\| \le \kappa_{eg}\,\Delta$ for all $\|s\| \le \Delta$ (Liu et al., 2024). This ensures that the model is "fully linear" on a trust region of radius $\Delta$.
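As a concrete special case (an illustrative assumption, not the paper's exact construction), interpolating $f$ at the coordinate-poised set $\{x, x + \Delta e_1, \dots, x + \Delta e_n\}$ yields a linear model that is fully linear on the ball of radius $\Delta$ for smooth $f$:

```python
import numpy as np

def fully_linear_model(f, x, delta):
    """Interpolate f at the poised set {x, x + delta*e_i} and return the
    linear model m(x + s) = c + g @ s. For f with Lipschitz-continuous
    gradient this model is fully linear on ||s|| <= delta:
    |m - f| = O(delta^2) and ||grad m - grad f|| = O(delta)."""
    n = x.size
    c = f(x)
    E = np.eye(n)
    g = np.array([(f(x + delta * E[:, i]) - c) / delta for i in range(n)])
    return c, g
```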
3. Algorithmic Structure and Parameter Updates
A typical DFLM iteration includes the following components (Chen et al., 2024, Feng et al., 9 Jul 2025, Liu et al., 2024):
| Step | Operation | Purpose |
|---|---|---|
| 1 | Model construction | Approximate $J(x_k)$ or $\nabla f(x_k)$ |
| 2 | Compute predicted step | Solve the LM-type system $(J_k^\top J_k + \lambda_k I)\, s_k = -J_k^\top F(x_k)$ |
| 3 | Step acceptance | Compare actual and predicted reduction: $\rho_k = \dfrac{\|F(x_k)\|^2 - \|F(x_k + s_k)\|^2}{m_k(0) - m_k(s_k)}$ |
| 4 | Update regularization | If $\rho_k \ge \eta_1$ (accept), decrease $\lambda_k$ or $\mu_k$; otherwise, increase them |
| 5 | Adapt smoothing, sample size, or interpolation radius | Reduce $\varepsilon_k$ as steps get smaller; increase the number of directions/interpolation points as needed |
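The table maps onto a compact loop. The sketch below wires the steps together; parameter defaults are illustrative (not the tuned values from the cited papers), and `jac_model` is any derivative-free Jacobian estimator, e.g. the spherical-smoothing sketch of Section 2.

```python
import numpy as np

def dflm(F, x0, jac_model, eps=1e-2, lam=1.0, eta=1e-4,
         gamma_dec=0.5, gamma_inc=4.0, tol=1e-8, max_iter=200):
    """Minimal derivative-free LM loop following the table above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = F(x)
        J = jac_model(F, x, eps)                 # 1. model construction
        g = J.T @ r
        if np.linalg.norm(g) < tol:
            break
        # 2. predicted step from (J^T J + lam*I) s = -J^T F(x)
        s = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -g)
        # 3. actual vs. predicted reduction of the Gauss-Newton model
        predicted = -(g @ s) - 0.5 * s @ (J.T @ (J @ s))
        r_new = F(x + s)
        rho = 0.5 * (r @ r - r_new @ r_new) / max(predicted, 1e-16)
        # 4. accept/reject and update the regularization parameter
        if rho >= eta:
            x, lam = x + s, lam * gamma_dec
            eps = min(eps, np.linalg.norm(s))    # 5. shrink smoothing radius
        else:
            lam *= gamma_inc
    return x
```

With the earlier sketches, a call might read `dflm(F, np.zeros(2), lambda F, x, e: spherical_smoothing_jacobian(F, x, e, p=2))`.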
Parameter recommendations from the literature:
- Smoothing radius: $\varepsilon_k$ tied to the current step length or model gradient norm, shrinking as stationarity is approached
- Sample size: $p \le n$ orthonormal directions for orthogonal sampling
- Thresholds: acceptance ratios $0 < \eta_1 \le \eta_2 < 1$ and regularization update factors $0 < \gamma_{\mathrm{dec}} < 1 < \gamma_{\mathrm{inc}}$, with a dedicated threshold for model gradient control
- Sparse Jacobian models: $O(s \log n)$ interpolation points if $s$-sparsity is known (Feng et al., 9 Jul 2025).
4. Probabilistic Accuracy, Complexity, and Convergence Guarantees
DFLM methods offer formal guarantees on the probabilistic accuracy of their stochastic models, and on the global rate of convergence to first-order stationary points.
Probabilistic Model Accuracy
DFLM algorithms attain $(1-\delta)$-probabilistically first-order accurate gradient models: with probability at least $1 - \delta$,

$$\|g_\varepsilon(x_k) - \nabla f(x_k)\| \le \kappa_g\, \varepsilon_k,$$

where $\kappa_g$ and the achievable $\delta$ scale with $n$, $p$, and the Lipschitz constants. Taking $\varepsilon_k$ proportional to the model gradient norm yields tighter bounds as stationarity is approached (Chen et al., 2024).
Complexity Bounds
Under Lipschitz continuity and boundedness of residuals and Jacobians, DFLM with spherical smoothing achieves the following (Chen et al., 2024):
- With probability at least $1 - \delta$,

$$\min_{0 \le k \le K} \|\nabla f(x_k)\| = O\!\left(K^{-1/2}\right),$$

implying that $O(\epsilon^{-2})$ iterations suffice (up to logarithmic terms) to drive the gradient norm below $\epsilon$ with high probability.
A similar $O(\epsilon^{-2})$ complexity bound holds for the regularized, nonsmooth, and constrained DFLM variants (Liu et al., 2024, Chen et al., 8 Jul 2025).
Almost Sure Global Convergence
For sparse DFLM, if the interpolation matrix $A$ satisfies the restricted isometry property with probability at least $1 - e^{-\beta}$ for sufficiently large $\beta$, the stationarity condition

$$\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$$

holds almost surely (Feng et al., 9 Jul 2025). For equality-constrained DFLM, the algorithm either achieves approximate KKT contraction with arbitrarily high probability, or the constraint violation converges to zero almost surely (Chen et al., 8 Jul 2025).
5. Variants and Extensions: Constrained, Regularized, and Sparse DFLM
Regularized/Composite DFLM
For objectives of the form $f(x) = \tfrac{1}{2}\|F(x)\|^2 + h(x)$, where $h$ is convex and nonsmooth, DFLM uses either model-based trust-region steps or a Moreau-envelope smoothing approach. Stationarity is measured by a proximal-gradient criticality measure, and the trust-region subproblems include the regularizer (Liu et al., 2024).
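For intuition, here are the two ingredients for the common case $h = \mu\|\cdot\|_1$: its cheap prox (soft-thresholding) and a standard prox-gradient criticality measure; the cited paper's exact stationarity measure may differ in form.

```python
import numpy as np

def prox_l1(v, t, mu=1.0):
    """Proximal operator of t * mu * ||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

def prox_criticality(x, g, t=1.0, mu=1.0):
    """Norm of the prox-gradient step for min f(x) + mu*||x||_1, with g a
    (derivative-free) model gradient of f; it vanishes at stationary
    points when g equals the exact gradient."""
    return np.linalg.norm(x - prox_l1(x - t * g, t, mu)) / t
```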
DFLM for Sparse Jacobians
Exploiting sparsity, $\ell_1$-based interpolation reconstructs a sparse Jacobian using $O(s \log n)$ function evaluations, yielding strong empirical and theoretical performance on large-scale problems with structured sparsity (Feng et al., 9 Jul 2025).
Equality-Constrained DFLM
For $\min_x \tfrac{1}{2}\|F(x)\|^2$ subject to $c(x) = 0$, DFLM is integrated into a regularized augmented Lagrangian framework. Smoothing-based Jacobian models are used for both $F$ and $c$, with the DFLM step applied when Newton-like steps do not yield sufficient decrease of the KKT residual (Chen et al., 8 Jul 2025).
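A generic augmented Lagrangian merit function of the kind such a framework builds on is sketched below; the paper's exact regularization may differ, and both Jacobians would in practice be the smoothing-based models of Section 2.

```python
import numpy as np

def augmented_lagrangian(F, c, x, lam, mu):
    """L(x, lam; mu) = 0.5*||F(x)||^2 + lam^T c(x) + 0.5*mu*||c(x)||^2
    for min 0.5*||F(x)||^2 s.t. c(x) = 0. Both F and c are black boxes;
    their Jacobians are approximated derivative-free when needed."""
    r, v = F(x), c(x)
    return 0.5 * (r @ r) + lam @ v + 0.5 * mu * (v @ v)
```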
6. Practical Implementation and Empirical Performance
DFLM algorithms are implemented efficiently by:
- Sampling orthonormal direction sets via QR factorization of random Gaussian matrices
- Reusing direction sets for several iterations when function evaluations are expensive
- Automatically adapting the smoothing radius and sample size (or interpolation radius) as stationarity is approached
- Ensuring stability by safeguarding regularization parameters and maintaining bounded curvature conditions
Empirical profiles show that DFLM methods, including sparsity-exploiting and regularized variants, outperform BFGS, classical LM, and finite-difference-based solvers in function-evaluation counts and robustness, especially on large-scale, noisy, or structured least-squares test sets (Chen et al., 2024, Liu et al., 2024, Feng et al., 9 Jul 2025).
7. Theoretical and Practical Considerations
Key theoretical principles established for DFLM methods include:
- Probabilistically first-order accurate stochastic models, controllable via smoothing radius and sampling/embedding dimension
- Global high-probability complexity bounds matching gradient-based methods under mild smoothness and boundedness assumptions
- Robustness to model inaccuracy via adaptive regularization and step-rejection/acceptance policies
- Almost sure global convergence in the compressed sensing/sparse setting when interpolation matrices satisfy appropriate RIP with sufficient probability
The overall DFLM approach unifies several strands of derivative-free optimization, offering scalable, theoretically justified, and practically effective methodologies for high-dimensional, black-box, regularized, constrained, and structured nonlinear least-squares problems (Chen et al., 2024, Liu et al., 2024, Chen et al., 8 Jul 2025, Feng et al., 9 Jul 2025).