
Separable Non-linear Least Squares (SNLLS)

Updated 25 January 2026
  • SNLLS is a subclass of nonlinear least squares where the objective function is separable into linear and nonlinear parameters, enabling efficient variable elimination.
  • Variable projection reduces dimensionality by solving for linear parameters in closed form, improving conditioning and optimization speed.
  • Robust SNLLS algorithms employ block-structured and dual formulations to handle constraints and non-quadratic losses, with applications in system identification, signal processing, and biological modeling.

Separable Non-linear Least Squares (SNLLS) is a specific subclass of nonlinear least squares problems in which the objective depends linearly on a subset of parameters and nonlinearly on another subset. This structure enables fundamentally different algorithmic strategies compared to general nonlinear least squares, including explicit variable elimination (“variable projection”), block-structured Newton-type methods, and dual formulations for robust norms. SNLLS models arise extensively in system identification, biology, signal processing, and high-dimensional inverse problems.

1. Mathematical Formulation and Problem Structure

The canonical SNLLS problem minimizes a squared residual of the form

\min_{x \in \mathbb{R}^k,\, y \in \mathbb{R}^n}\ \Phi(x, y) = \lVert F(x, y) \rVert_2^2, \quad F(x, y) = A(x) y - b(x),

where:

  • x is the vector of nonlinear parameters,
  • y is the vector of linear parameters,
  • A(x) \in \mathbb{R}^{m \times n} is a matrix whose entries involve nonlinear functions of x (often constructed via basis functions \phi_j(x, t_i)),
  • b(x) \in \mathbb{R}^m is the data vector (potentially nonlinear in x),
  • m is the number of observations.

For fixed x, the residual is affine (linear) in y, allowing the elimination of y via

y^*(x) = A(x)^+ b(x),

with A(x)^+ denoting the Moore–Penrose pseudoinverse. Substituting y^*(x) back yields the reduced variable-projection functional

\psi(x) = \lVert [I - P(x)] b(x) \rVert_2^2, \quad P(x) = A(x) A(x)^+,

which now depends only on the nonlinear parameters x (Gharibi et al., 2011, Herrera-Gomez et al., 2017, Dattner et al., 2019).

This structure generalizes broadly. For models f(t; c, \alpha) = \sum_j c_j \phi_j(t; \alpha), c enters linearly and \alpha nonlinearly, and for any fixed \alpha, the optimal c is given by linear least squares (Herrera-Gomez et al., 2017).
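As a concrete sketch (a hypothetical two-term exponential model, not taken from the cited papers), the elimination step amounts to one linear least-squares solve per evaluation of the reduced functional:

```python
import numpy as np

def reduced_residual(x, t, b):
    """Variable-projection residual r(x) = [I - P(x)] b for the model
    f(t) = sum_j y_j * exp(-x_j * t) (hypothetical example)."""
    # Design matrix A(x): columns are the nonlinear basis functions.
    A = np.exp(-np.outer(t, x))                      # shape (m, n)
    # Optimal linear parameters y*(x) = A(x)^+ b via least squares.
    y_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Residual of the reduced functional psi(x) = ||b - A(x) y*(x)||^2.
    return b - A @ y_star, y_star

t = np.linspace(0.0, 2.0, 50)
b = 2.0 * np.exp(-1.0 * t) + 0.5 * np.exp(-3.0 * t)  # exact model data

r, y_star = reduced_residual(np.array([1.0, 3.0]), t, b)
print(np.linalg.norm(r))   # ~0: data generated exactly by the model
print(y_star)              # recovers [2.0, 0.5]
```

At the true nonlinear parameters, the reduced residual vanishes and the linear block is recovered for free by the inner solve.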

2. Algorithmic Approaches: Variable Projection and Semi-reduced Schemes

Variable Projection

The variable-projection method exploits the separability by analytically solving for the linear parameters. The resulting reduced optimization, expressed solely in the nonlinear variables, dramatically reduces dimensionality and often improves conditioning. The steps are:

  1. For the current nonlinear parameter x, form A(x) and b(x).
  2. Solve for the optimal linear parameters y^*(x) via standard linear least squares.
  3. Compute the reduced cost \psi(x) = \lVert [I - P(x)] b(x) \rVert_2^2.
  4. Update x via gradient-based (Gauss–Newton, Levenberg–Marquardt) methods, using a projected reduced Jacobian (Herrera-Gomez et al., 2017, Shearer et al., 2013).

The gradient and approximate Hessian of \psi(x) are given by

\nabla_x \psi(x) = -2 J_{\text{nonlin}}(x)^\top r^*(x),

H(x) = 2 J_{\text{nonlin}}(x)^\top [I - P(x)] J_{\text{nonlin}}(x),

where J_{\text{nonlin}} is the Jacobian of the residual with respect to the nonlinear parameters, computed at the optimal linear fit (Herrera-Gomez et al., 2017).
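The four steps can be sketched end to end. This minimal version drives the reduced residual with SciPy's trust-region least-squares solver, using a finite-difference Jacobian in place of the analytic projected Jacobian, on a hypothetical two-term exponential model:

```python
import numpy as np
from scipy.optimize import least_squares

def reduced_residual(x, t, b):
    """Variable-projection residual for the hypothetical model
    f(t) = y1*exp(-x1*t) + y2*exp(-x2*t); the linear block is
    eliminated by an inner linear least-squares solve."""
    A = np.exp(-np.outer(t, x))
    y_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    return A @ y_star - b

# Synthetic data from known parameters x = (1, 3), y = (2, 0.5).
t = np.linspace(0.0, 2.0, 50)
b = 2.0 * np.exp(-1.0 * t) + 0.5 * np.exp(-3.0 * t)

# Outer iteration on the nonlinear block only; the Jacobian is
# finite-difference rather than the analytic projected Jacobian.
sol = least_squares(reduced_residual, x0=np.array([0.5, 2.0]), args=(t, b))
print(np.sort(sol.x))   # approaches [1.0, 3.0]
```

Note that the outer solver only ever sees the two nonlinear parameters; the linear coefficients are re-fit inside every residual evaluation.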

Semi-reduced and Generalized Variable Elimination

For cases where closed-form elimination is impossible (Poisson likelihoods, bound constraints, non-quadratic losses), semi-reduced methods generalize the classical variable projection:

  • The overall Newton-type system is partitioned into blocks associated with linear and nonlinear parameters.
  • Block Gaussian elimination (Schur complement) is used to solve for the nonlinear parameters, with trial-point adjustment in the linear parameters—possibly via partial or exact inner minimization.
  • This interpolates between full parameter joint updates and reduced variable projection, maintaining robust convergence properties, especially in large-scale or ill-conditioned regimes (Shearer et al., 2013).

For k = 0, 1, 2,...
  1. Compute gradient g and Hessian B (partitioned by y,z).
  2. Solve reduced system for nonlinear block using Schur complement.
  3. Solve for linear block given the nonlinear update.
  4. Optionally adjust linear parameters via inner solves.
  5. Line search using Armijo or projected Newton criteria.
Repeat until convergence.
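The Schur-complement solve in step 2 can be illustrated on a generic partitioned Newton system; the matrices below are random placeholders, not derived from a specific model:

```python
import numpy as np

def schur_newton_step(B_yy, B_yz, B_zz, g_y, g_z):
    """Solve the partitioned Newton system
        [B_yy  B_yz ] [dy]    [g_y]
        [B_yz' B_zz ] [dz] = -[g_z]
    by block Gaussian elimination of the linear block y."""
    # Schur complement of B_yy in the full Hessian.
    S = B_zz - B_yz.T @ np.linalg.solve(B_yy, B_yz)
    # Reduced right-hand side for the nonlinear block z.
    rhs_z = -g_z + B_yz.T @ np.linalg.solve(B_yy, g_y)
    dz = np.linalg.solve(S, rhs_z)
    # Back-substitute for the linear block y.
    dy = np.linalg.solve(B_yy, -g_y - B_yz @ dz)
    return dy, dz

# Random symmetric positive-definite test system (3 linear, 2 nonlinear).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
B = M @ M.T + 5 * np.eye(5)
g = rng.standard_normal(5)

dy, dz = schur_newton_step(B[:3, :3], B[:3, 3:], B[3:, 3:], g[:3], g[3:])
step = np.concatenate([dy, dz])
print(np.allclose(B @ step, -g))   # True: matches the full Newton solve
```

The elimination reproduces the full joint Newton step exactly; the practical gain is that the two blocks can be solved with structure-exploiting methods of different kinds.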

3. Norm Choices, Duality, and Robust Formulations

SNLLS can be posed under both the Euclidean (2-norm) and Chebyshev (∞-norm) metrics:

  • 2-norm: Differentiable, conventional Gauss–Newton/LM methods apply. Variable projection is efficient and robust in this setting.
  • ∞-norm: Nondifferentiable at maxima; classical gradient-based approaches break down. Dual or subgradient-based schemes are required.

A Lagrangian-dual algorithm is introduced for minimax (∞-norm) problems, transforming

\min_{x, y} \max_i \lvert [A(y) x - b(y)]_i \rvert

into an iterated sequence:

  1. At each dual iterate \lambda, solve a weighted nonlinear least squares:

\min_{x, y} \sum_{i=1}^m \lambda_i [A(y) x - b(y)]_i^2

  2. Update \lambda via a subgradient step (projected onto the probability simplex).
  3. Iterate until the duality-gap/convergence criteria are satisfied.

This approach preserves separable structure within each weighted subproblem and is applicable in robust regression and Chebyshev approximation settings where maximum residual control is required (Gharibi et al., 2011).
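A compact sketch of the dual loop, using a fixed linear model so that the inner weighted subproblem is an ordinary weighted least squares (in the separable setting this inner solve would itself be a weighted SNLLS problem):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u > css / idx)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def minimax_fit(A, b, iters=500, step=0.5):
    """Dual subgradient sketch for min_x max_i |(A x - b)_i|, with a fixed
    linear model standing in for the weighted inner subproblem."""
    m = A.shape[0]
    lam = np.full(m, 1.0 / m)            # dual iterate on the simplex
    best_x, best_val = None, np.inf
    for k in range(iters):
        # Inner solve: weighted LS  min_x sum_i lam_i (A x - b)_i^2.
        w = np.sqrt(lam)
        x, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
        r = A @ x - b
        val = np.abs(r).max()
        if val < best_val:
            best_x, best_val = x, val
        # Dual ascent: diminishing subgradient step, projected to the simplex.
        lam = project_simplex(lam + (step / np.sqrt(k + 1)) * r**2)
    return best_x, best_val

# Best constant fit to {0, 1, 0} in the max-norm: the Chebyshev answer is 0.5.
A = np.ones((3, 1))
b = np.array([0.0, 1.0, 0.0])
x, val = minimax_fit(A, b)
print(x, val)   # x near [0.5], max residual near 0.5
```

The dual weights concentrate on the residuals that attain the maximum, which is exactly the equioscillation behavior expected of a Chebyshev fit.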

4. Computational Complexity, Convergence Properties, and Statistical Interpretation

Computational Complexity

  • Classical SNLLS permits dimensional reduction: for p = n + k total parameters, the variable-projection method reduces the outer optimization to k variables, with linear solves in n (Dattner et al., 2019).
  • Per iteration costs: forming and solving the linear least squares (dominant for large nn), plus Jacobian computations and outer nonlinear updates.
  • Semi-reduced methods leverage block structure in the Hessian, facilitating sparsity-exploiting direct solvers (block-diagonal, circulant, banded) and parallelization (Shearer et al., 2013, Fodor et al., 2023).
Method               | Outer Dim. | Inner Solve  | Overall Complexity
Variable Projection  | k          | n \times n   | O(n^2 k + k^3)
Semi-reduced         | k + n      | Block LS/CG  | Leverages structure

Convergence Theory

  • SNLLS inherits local superlinear (or quadratic) convergence of Gauss–Newton-type methods when residual norms are small and Jacobians well-conditioned.
  • Semi-reduced methods admit global convergence proofs via Armijo and monotonic adjustment operators.
  • Dual subgradient methods for ∞-norm formulations require O(1/\varepsilon^2) iterations to reach \varepsilon-accuracy in nonsmooth settings, with rapid initial residual decrease (Gharibi et al., 2011, Shearer et al., 2013).

Statistical Significance

  • The Schur complement formula for the reduced Hessian ensures that covariance estimates of the nonlinear parameters are identical to what would be obtained from the full nonlinear least squares, preserving statistical validity (Herrera-Gomez et al., 2017).
  • Variable projection does not bias estimation, nor does it alter variance properties, provided the noise is Gaussian and the model structure holds.
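The equivalence underlying the first point can be checked numerically at the Gauss–Newton level (ignoring second-order terms of the projected Jacobian): the Schur complement of the linear block in the full Gauss–Newton Hessian coincides with the reduced variable-projection Hessian. The Jacobian blocks below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 30, 3, 2
A = rng.standard_normal((m, n))      # linear-block Jacobian A(x)
Jx = rng.standard_normal((m, k))     # nonlinear-block Jacobian at the fit

# Schur complement of A^T A in the full Gauss-Newton Hessian [A, Jx]^T [A, Jx].
AtA = A.T @ A
schur = Jx.T @ Jx - Jx.T @ A @ np.linalg.solve(AtA, A.T @ Jx)

# Reduced (variable-projection) Hessian Jx^T (I - P) Jx with P = A A^+.
P = A @ np.linalg.pinv(A)
reduced = Jx.T @ (np.eye(m) - P) @ Jx

print(np.allclose(schur, reduced))   # True: identical nonlinear-block curvature
```

Since covariance estimates are built from the inverse of this block, the reduced problem yields the same uncertainty for the nonlinear parameters as the full one.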

5. Practical Implementations and Extensions

SNLLS methodologies have led to specialized packages and scalable solvers:

  • Non-gradient grid-search methods (e.g., nlstac): exploit separability for robustness and initialization-free parameter estimation, particularly in models such as sums of exponentials, Gaussians, and exponential+sinusoid composites. These methods solve for linear parameters at every grid point over bounded intervals of nonlinear parameters, yielding globally reliable fits but at exponential cost in nonlinear parameter dimension (Torvisco et al., 2024).
  • Nonnegative least squares (NNLS)-driven SNLLS: basis functions parameterized nonlinearly, with nonnegative linear coefficients. Grid discretization in nonlinear parameters and NNLS solvers select an optimal sparse basis expansion (illustrated in rational/exponential function approximation) (Vabishchevich, 2023).
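A minimal sketch of the grid-plus-NNLS idea, assuming a hypothetical exponential-sum target whose true decay rates happen to lie on the grid:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical target: two decaying exponentials with rates 1.0 and 3.0.
t = np.linspace(0.0, 2.0, 100)
b = 2.0 * np.exp(-1.0 * t) + 0.5 * np.exp(-3.0 * t)

# Grid the nonlinear (rate) parameter; each grid point contributes one
# dictionary column, and NNLS selects a sparse nonnegative combination.
rates = np.linspace(0.1, 5.0, 50)
A = np.exp(-np.outer(t, rates))
coef, resid = nnls(A, b)

print(resid)                   # small: the true rates lie on the grid
print(np.count_nonzero(coef))  # NNLS activates only a few columns
```

The nonnegativity constraint acts as an implicit sparsifier here: most grid columns receive zero weight, so the active columns identify candidate nonlinear parameters without any gradient-based search.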

Some SNLLS solvers are designed for large-scale, nearly-block-separable problems, using parallel fixed-point iteration for block-structured Levenberg–Marquardt systems (Fodor et al., 2023). This is relevant in high-dimensional geospatial applications (cadastral map refinement, bundle adjustment) and any scenario with block-sparse Jacobians.

6. Applications and Illustrative Case Studies

Key domains leveraging SNLLS structure include:

  • System identification: best Chebyshev-fit of parameterized models for robust system parameter recovery (Gharibi et al., 2011).
  • Biological modeling: parameter inference in ODE models with linearly embedded rates and nonlinear kinetic orders. SNLLS provides substantial speedups (2–10× over vanilla NLLS) and improved robustness in biochemical systems and epidemic modeling (Dattner et al., 2019).
  • Signal fitting: multi-channel regression, e.g., fitting sums of damped exponentials in spectroscopy or sensor analysis.
  • Computer vision/robotics: reprojection error minimization (with linear pose or scale parameters) under robust norms or constraints (Gharibi et al., 2011, Fodor et al., 2023).
  • Function approximation: rational or exponential sum approximations implemented via NNLS-based SNLLS workflows, guiding selection of basis parameters for high-precision fits of functions like x^{-\alpha} or \exp(-x^\alpha) (Vabishchevich, 2023).
  • Large-scale inverse problems: block-separable and nearly-separable NLS solvers scale efficiently via parallel fixed-point inner iterations and are empirically validated on million-variable test problems (Fodor et al., 2023).

7. Limitations, Extensions, and Open Problems

SNLLS algorithms rely on model separability and an explicit closed-form solution for the linear parameter block. When constraints (e.g., y \geq 0), non-Gaussian likelihoods (Poisson), or nonquadratic losses are present, classical elimination is infeasible; semi-reduced and adjustment-based extensions become necessary (Shearer et al., 2013). Unseparated approaches are useful when A(x) is ill-conditioned or y is constrained, as they avoid repeated inversion and enable handling of general constraints. Robust formulations under the ∞-norm require dual or nonsmooth optimization strategies, as standard variable projection fails in nondifferentiable regimes (Gharibi et al., 2011).

Grid-search and NNLS-based SNLLS methods have exponential complexity in the number of nonlinear parameters and become impractical for high-dimensional nonlinear blocks (Torvisco et al., 2024, Vabishchevich, 2023). In such cases, hybrid approaches using SNLLS for initialization followed by local gradient-based methods are recommended.

The continuing development of SNLLS methodologies is situated in the context of large-scale data-driven modeling, leveraging separable structure for both algorithmic efficiency and statistical reliability across disciplinary boundaries.
