
Random Subspace Gauss-Newton (RS-GN)

Updated 12 February 2026
  • RS-GN is a randomized iterative optimization method that replaces the full Gauss-Newton step with a subspace search, reducing costs for large-scale nonlinear least-squares problems.
  • It employs sketching techniques based on the Johnson-Lindenstrauss lemma to preserve critical first- and second-order information with high probability, ensuring robust global convergence.
  • Adaptive strategies in RS-GN dynamically adjust the subspace dimension, yielding empirical speedups by reducing Jacobian evaluations and the cost of solving linear systems.

Random Subspace Gauss-Newton (RS-GN) methods are randomized iterative algorithms for unconstrained nonlinear least-squares optimization that replace the classical Gauss-Newton step with a search in a randomly selected low-dimensional subspace. Developed to address the computational bottlenecks of Jacobian evaluation and linear system solves in large-scale problems, RS-GN methods employ sketching techniques and probabilistic embedding guarantees—particularly those based on the Johnson-Lindenstrauss (JL) lemma—to ensure that subspace approximations preserve critical first- and second-order problem structure with high probability. RS-GN has become a central framework for randomized second-order optimization in nonlinear least-squares, offering rigorous global convergence theory, dimension-free complexity guarantees, and substantial empirical speedups across a range of regimes (Cartis et al., 2022, Cartis et al., 2022).

1. Algorithmic Structure of RS-GN

The RS-GN method seeks the minimizer of the nonlinear least-squares objective,

f(x)=12r(x)2,r:RdRn,f(x) = \frac{1}{2}\|r(x)\|^2, \quad r:\mathbb{R}^d \to \mathbb{R}^n,

by iteratively constructing and minimizing a quadratic Gauss-Newton model in a randomly chosen p-dimensional subspace. At iteration k, a random sketching matrix P_k \in \mathbb{R}^{d \times p} (with p \ll d) is sampled, and a reduced Jacobian J_k^p = J(x_k) P_k (of size n \times p) is formed. The subspace Gauss-Newton step is defined as

s_k = \underset{s \in \mathbb{R}^p}{\arg\min}\ \|J_k^p s + r_k\|^2 + \lambda_k \|s\|^2,

where r_k = r(x_k) and \lambda_k > 0 is a regularization parameter. The full step is d_k = P_k s_k. Step acceptance is governed by a sufficient decrease condition:

f(x_k) - f(x_k + d_k) \geq \theta \, [m_k(0) - m_k(s_k)],

where m_k(\cdot) is the subspace model and \theta \in (0,1). On success, x_{k+1} = x_k + d_k and \lambda_{k+1} is potentially decreased; otherwise, the regularization parameter is increased and the iterate is retained. Trust-region and quadratic-regularization variants are standard, ensuring robustness and convergence (Cartis et al., 2022, Cartis et al., 2022).
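The iteration above can be sketched as follows. This is an illustrative implementation, assuming a Gaussian sketch and a simple halving/doubling rule for \lambda_k; the function signature and constants are placeholder choices, not the cited papers' exact parameters:

```python
import numpy as np

def rs_gn(r, J, x0, p=10, lam=1.0, theta=1e-4, max_iter=300, tol=1e-6, seed=0):
    """Random Subspace Gauss-Newton sketch.

    r(x): residual vector (n,);  J(x): Jacobian (n, d);  p: subspace dimension.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(max_iter):
        rk, Jk = r(x), J(x)
        if np.linalg.norm(Jk.T @ rk) < tol:        # gradient of f = 0.5 ||r||^2
            break
        P = rng.normal(0.0, 1.0 / np.sqrt(p), size=(d, p))  # Gaussian sketch
        Jp = Jk @ P                                # reduced Jacobian, n x p
        # Regularized subspace step: (Jp^T Jp + lam I) s = -Jp^T r_k
        s = np.linalg.solve(Jp.T @ Jp + lam * np.eye(p), -Jp.T @ rk)
        dk = P @ s
        # Predicted decrease of the model m_k(s) = ||Jp s + r_k||^2 + lam ||s||^2
        pred = rk @ rk - (np.linalg.norm(Jp @ s + rk) ** 2 + lam * (s @ s))
        actual = 0.5 * (rk @ rk) - 0.5 * (r(x + dk) @ r(x + dk))
        if pred > 0 and actual >= theta * pred:    # sufficient decrease: accept
            x, lam = x + dk, max(lam / 2, 1e-10)
        else:                                      # reject: grow regularization
            lam *= 2
    return x
```

On a toy problem with a linear residual, each accepted step projects the residual onto a random p-dimensional subspace, so the error contracts geometrically in expectation.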

2. Theoretical Foundations and Probabilistic Guarantees

Central to RS-GN is the "subspace-gradient" assumption, formalized via the (1-\epsilon)-JL property: for P_k sampled from an appropriate distribution (e.g., Gaussian with entries N(0, 1/p), or s-hashing),

\Pr\left[ (1-\epsilon)\|w\|^2 \leq \|P_k^\top w\|^2 \leq (1+\epsilon)\|w\|^2 \right] \geq 1 - \delta \quad \text{for any fixed } w \in \mathbb{R}^d,

where \epsilon \in (0,1) and \delta \in (0,1) (Cartis et al., 2022, Bellavia et al., 4 Jun 2025, Cartis et al., 2022). The JL lemma establishes that p = O(\epsilon^{-2}\log(1/\delta)) suffices for Gaussian or s-hashing sketches, with no explicit dependence on d or n (for dense sketches). This concentration property ensures that the sketched subspace accurately preserves the geometric structure relevant for descent and curvature calculations, with high probability across iterations.
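The concentration behavior can be checked empirically. The following sketch (dimensions and trial counts are arbitrary illustrative choices) samples the distortion ratio \|P^\top w\|^2 / \|w\|^2 for a fixed vector w under Gaussian sketches of increasing dimension p:

```python
import numpy as np

def jl_distortion(d=500, ps=(20, 80, 320), trials=200, seed=0):
    """Sample ||P^T w||^2 / ||w||^2 for a fixed w and Gaussian sketches P
    with i.i.d. N(0, 1/p) entries; the ratio concentrates around 1 with
    spread on the order of sqrt(2/p)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    w2 = w @ w
    stats = {}
    for p in ps:
        ratios = np.array([
            np.sum((rng.normal(0.0, 1.0 / np.sqrt(p), size=(d, p)).T @ w) ** 2) / w2
            for _ in range(trials)
        ])
        stats[p] = (ratios.mean(), ratios.std())  # mean near 1; std shrinks with p
    return stats
```

Increasing p tightens the concentration, consistent with the p = O(\epsilon^{-2}\log(1/\delta)) scaling: quadrupling p roughly halves the observed spread.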

3. Complexity and Convergence Analysis

Under Lipschitz continuity and boundedness of J(x) and r(x), and with probabilistic sketching satisfying the embedding criteria, RS-GN achieves with high probability a first-order convergence rate matching that of full-dimensional Gauss-Newton:

\#\{k : \|\nabla f(x_k)\| > \epsilon\} = O(\epsilon^{-2}) \quad \text{with probability } \geq 1 - \delta,

for any \epsilon > 0 (assuming p = O(\epsilon^{-2}\log(1/\delta))). The per-iteration cost advantage arises from solving a p-dimensional linear system instead of a d-dimensional one, with the leading O(\epsilon^{-2}) iteration complexity term independent of the subspace dimension p (its influence is only logarithmic via \delta). Trust-region or regularization safeguards are essential for ensuring global convergence and controlling the model-data fit (Cartis et al., 2022, Cartis et al., 2022, Bellavia et al., 4 Jun 2025).

A summary of key formulas includes:

  • Subspace-gradient approximation: g_k = P_k P_k^\top \nabla f(x_k).
  • Subspace Gauss-Newton subproblem: s_k = \arg\min_{s \in \mathbb{R}^p} \|J(x_k) P_k s + r(x_k)\|^2 + \lambda_k \|s\|^2; d_k = P_k s_k.
  • JL embedding: \Pr[\, |\|P_k^\top w\|^2 - \|w\|^2| \leq \epsilon \|w\|^2 \,] \geq 1 - \delta, with p = O(\epsilon^{-2}\log(1/\delta)).

4. Adaptive and Variable Dimension Strategies

Recent variants introduce adaptation of the subspace dimension \ell_k, motivated by the observation that the optimal subspace dimensionality may vary along the optimization trajectory. Variable-dimension RS-GN maintains the fundamental structure but allows \ell_k (the subspace size at iteration k) to grow or shrink based on observed descent quality.

Strategies include:

  • Armijo-based adaptation: on a successful step, reduce \ell_k (down to a minimum); on failure, increase \ell_k (up to a maximum).
  • Model-accuracy-based adaptation: compute the residual measure \theta^*_k = \|\nabla m_k(p_k)\| / \|\nabla f(x_k)\| and shrink or enlarge \ell_k according to whether \theta^*_k falls below a threshold.
  • Both strategies are shown to enjoy theoretical complexity guarantees analogous to the fixed-dimension variants, while often yielding substantial empirical reductions in computational resources (Bellavia et al., 4 Jun 2025).
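Both rules can be sketched as simple update functions. The constants below (shrink/growth factors, bounds, threshold tau) are illustrative placeholders, not the values used in the cited papers:

```python
def update_dim_armijo(l_k, success, l_min=5, l_max=200):
    """Armijo-style rule: shrink the subspace after a successful step,
    grow it after a failed one."""
    return max(l_min, l_k // 2) if success else min(l_max, 2 * l_k)

def update_dim_model(l_k, grad_model_norm, grad_full_norm, tau=0.5,
                     l_min=5, l_max=200):
    """Model-accuracy rule: theta* = ||grad m_k|| / ||grad f||.  A small
    ratio means the subspace captures little of the gradient mass, so the
    dimension is enlarged; otherwise it can be shrunk."""
    theta_star = grad_model_norm / max(grad_full_norm, 1e-16)
    return min(l_max, 2 * l_k) if theta_star < tau else max(l_min, l_k // 2)
```

In a full solver these updates would replace the fixed p of the base method, with the sketch P_k redrawn at the new dimension each iteration.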

5. Sketching Techniques and Subspace Construction

RS-GN relies on a variety of sketching matrices:

  • Gaussian random matrices: entries drawn i.i.d. from \mathcal{N}(0, 1/p); satisfy the JL embedding with minimal sample complexity.
  • Sparse "s-hashing" sketches: reduce computational overhead while still achieving JL properties.
  • Coordinate/block-coordinate sampling: particularly effective in high-sparsity regimes; the dimension requirement may depend on the non-uniformity of the gradient through the factor \nu^2 = \max_i |e_i^\top \nabla f(x)|^2 / \|\nabla f(x)\|^2 (Cartis et al., 2022, Cartis et al., 2022).
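The three sketch families can be constructed as follows. This is a minimal sketch; note that the coordinate sketch is scaled by \sqrt{d/p} so that \|P^\top w\|^2 is unbiased, a normalization convention that varies across papers:

```python
import numpy as np

def gaussian_sketch(d, p, rng):
    """Dense Gaussian sketch: entries i.i.d. N(0, 1/p)."""
    return rng.normal(0.0, 1.0 / np.sqrt(p), size=(d, p))

def s_hashing_sketch(d, p, s, rng):
    """Sparse s-hashing sketch: each of the d rows has s nonzeros equal to
    +-1/sqrt(s), placed in columns chosen uniformly without replacement."""
    P = np.zeros((d, p))
    for i in range(d):
        cols = rng.choice(p, size=s, replace=False)
        P[i, cols] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return P

def coordinate_sketch(d, p, rng):
    """Block-coordinate sampling: p distinct scaled columns of the identity."""
    idx = rng.choice(d, size=p, replace=False)
    P = np.zeros((d, p))
    P[idx, np.arange(p)] = np.sqrt(d / p)  # keeps E ||P^T w||^2 = ||w||^2
    return P
```

Applying P_k^\top to the Jacobian costs O(ndp) for the dense Gaussian sketch but only O(nds) for s-hashing and O(np) for coordinate sampling, which is the cost gap the bullet list above refers to.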

The theoretical justification for all these constructions centers on the embedding guarantees that ensure curvature and descent directions are not significantly distorted in the subspace.

6. Numerical Performance and Empirical Observations

Comprehensive numerical experiments—across CUTEst nonlinear least-squares benchmarks, logistic regression datasets, and large-scale artificial problems—demonstrate:

  • For moderate-accuracy requirements (objective fall to 10% of initial decrease), RS-GN with p = 0.5d or p = 0.75d typically matches full Gauss-Newton while incurring only a fraction of the Jacobian-action cost.
  • Coordinate/block sampling sketches yield the lowest per-iteration costs but slower overall convergence; Gaussian and s-hashing sketches offer a balance between per-iteration cost and descent progress.
  • On very large-scale problems, RS-GN with p/d \lesssim 0.1 can outperform full Gauss-Newton in early-iteration progress windows.
  • Adaptive dimension strategies improve early convergence and can cut per-iteration cost by 50–90% and total cost by factors of 2–5 on medium-scale problems (Cartis et al., 2022, Cartis et al., 2022, Bellavia et al., 4 Jun 2025).

A summary of performance characteristics is captured in the following table:

Sketch Type                  | Cost per Iteration             | Iteration Speedup
---------------------------- | ------------------------------ | ------------------------------
Gaussian / Hashing           | Moderate                       | Balanced cost and progress
Coordinate / Block Sampling  | Lowest                         | Slower (in k)
Adaptive Dimension           | Varies (often lower long-term) | Empirically best for wall-time

7. Applicability, Limitations, and Research Directions

RS-GN methods are particularly effective when the Jacobian is low-rank, its spectrum decays rapidly, or exact Gauss-Newton steps are computationally prohibitive. The dimension-reducing approach enables practical application to problems with large ambient dimension d or where Jacobian formation is costly. Empirical gains are most pronounced in medium- and large-scale regimes, and especially when n \gg \mathrm{rank}(J).

A critical assumption is that the random sketches achieve the required JL embedding properties to ensure theoretical guarantees; violation of embedding accuracy may deteriorate convergence rates. The necessity of trust-region or quadratic regularization safeguards is emphasized: these mechanisms are vital for robust global convergence.

Research continues into optimizing sketching strategies, enhancing adaptive dimension control, and extending RS-GN to non-standard models such as structured nonlinear regression and inverse problems. There is ongoing investigation of local convergence rates under stronger embedding hypotheses, superlinear or quadratic convergence under residual-zero conditions, and improved tradeoffs between per-iteration complexity and global iteration count (Cartis et al., 2022, Cartis et al., 2022, Bellavia et al., 4 Jun 2025).
