
Random Subspace Gauss-Newton (RS-GN)

Updated 12 February 2026
  • RS-GN is a randomized iterative optimization method that replaces the full Gauss-Newton step with a subspace search, reducing costs for large-scale nonlinear least-squares problems.
  • It employs sketching techniques based on the Johnson-Lindenstrauss lemma to preserve critical first- and second-order information with high probability, ensuring robust global convergence.
  • Adaptive strategies in RS-GN dynamically adjust the subspace dimension, yielding empirical speedups by reducing Jacobian evaluations and the cost of solving linear systems.

Random Subspace Gauss-Newton (RS-GN) methods are randomized iterative algorithms for unconstrained nonlinear least-squares optimization that replace the classical Gauss-Newton step with a search in a randomly selected low-dimensional subspace. Developed to address the computational bottlenecks of Jacobian evaluation and linear system solves in large-scale problems, RS-GN methods employ sketching techniques and probabilistic embedding guarantees—particularly those based on the Johnson-Lindenstrauss (JL) lemma—to ensure that subspace approximations preserve critical first- and second-order problem structure with high probability. RS-GN has become a central framework for randomized second-order optimization in nonlinear least-squares, offering rigorous global convergence theory, dimension-free complexity guarantees, and substantial empirical speedups across a range of regimes (Cartis et al., 2022, Cartis et al., 2022).

1. Algorithmic Structure of RS-GN

The RS-GN method seeks the minimizer of the nonlinear least-squares objective,

f(x)=12r(x)2,r:RdRn,f(x) = \frac{1}{2}\|r(x)\|^2, \quad r:\mathbb{R}^d \to \mathbb{R}^n,

by iteratively constructing and minimizing a quadratic Gauss-Newton model in a randomly chosen p-dimensional subspace. At iteration k, a random sketching matrix P_k \in \mathbb{R}^{d \times p} (with p \ll d) is sampled, and a reduced Jacobian J_k^p = J(x_k) P_k (of size n \times p) is formed. The subspace Gauss-Newton step is defined as

s_k = \underset{s \in \mathbb{R}^p}{\arg\min}\ \|J_k^p s + r_k\|^2 + \lambda_k \|s\|^2,

where r_k = r(x_k) and \lambda_k > 0 is a regularization parameter. The full step is d_k = P_k s_k. Step acceptance is governed by a sufficient decrease condition:

f(x_k) - f(x_k + d_k) \geq \theta \, [m_k(0) - m_k(s_k)],

where m_k(\cdot) is the subspace model and \theta \in (0,1). On success, x_{k+1} = x_k + d_k and \lambda_{k+1} is potentially decreased; otherwise, the regularization parameter is increased and the iterate is retained. Trust-region and quadratic-regularization variants are standard, ensuring robustness and convergence (Cartis et al., 2022, Cartis et al., 2022).
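The iteration above can be sketched as follows. This is an illustrative implementation, assuming a Gaussian sketch and a simple halving/doubling rule for \lambda_k; the function signature and constants are placeholder choices, not the cited papers' exact parameters:

```python
import numpy as np

def rs_gn(r, J, x0, p=10, lam=1.0, theta=1e-4, max_iter=300, tol=1e-6, seed=0):
    """Random Subspace Gauss-Newton sketch.

    r(x): residual vector (n,);  J(x): Jacobian (n, d);  p: subspace dimension.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(max_iter):
        rk, Jk = r(x), J(x)
        if np.linalg.norm(Jk.T @ rk) < tol:        # gradient of f = 0.5 ||r||^2
            break
        P = rng.normal(0.0, 1.0 / np.sqrt(p), size=(d, p))  # Gaussian sketch
        Jp = Jk @ P                                # reduced Jacobian, n x p
        # Regularized subspace step: (Jp^T Jp + lam I) s = -Jp^T r_k
        s = np.linalg.solve(Jp.T @ Jp + lam * np.eye(p), -Jp.T @ rk)
        dk = P @ s
        # Predicted decrease of the model m_k(s) = ||Jp s + r_k||^2 + lam ||s||^2
        pred = rk @ rk - (np.linalg.norm(Jp @ s + rk) ** 2 + lam * (s @ s))
        actual = 0.5 * (rk @ rk) - 0.5 * (r(x + dk) @ r(x + dk))
        if pred > 0 and actual >= theta * pred:    # sufficient decrease: accept
            x, lam = x + dk, max(lam / 2, 1e-10)
        else:                                      # reject: grow regularization
            lam *= 2
    return x
```

On a toy problem with a linear residual, each accepted step projects the residual onto a random p-dimensional subspace, so the error contracts geometrically in expectation.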

2. Theoretical Foundations and Probabilistic Guarantees

Central to RS-GN is the "subspace-gradient" assumption, formalized via the (1-\epsilon)-JL property: for P_k sampled from an appropriate distribution (e.g., Gaussian with entries N(0, 1/p), or s-hashing),

\Pr\left[ (1-\epsilon)\|w\|^2 \leq \|P_k^\top w\|^2 \leq (1+\epsilon)\|w\|^2 \right] \geq 1 - \delta \quad \text{for any fixed } w \in \mathbb{R}^d,

where \epsilon \in (0,1) and \delta \in (0,1) (Cartis et al., 2022, Bellavia et al., 4 Jun 2025, Cartis et al., 2022). The JL lemma establishes that p = O(\epsilon^{-2}\log(1/\delta)) suffices for Gaussian or s-hashing sketches, with no explicit dependence on d or n (for dense sketches). This concentration property ensures that the sketched subspace accurately preserves the geometric structure relevant for descent and curvature calculations, with high probability across iterations.
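The concentration behavior can be checked empirically. The following sketch (dimensions and trial counts are arbitrary illustrative choices) samples the distortion ratio \|P^\top w\|^2 / \|w\|^2 for a fixed vector w under Gaussian sketches of increasing dimension p:

```python
import numpy as np

def jl_distortion(d=500, ps=(20, 80, 320), trials=200, seed=0):
    """Sample ||P^T w||^2 / ||w||^2 for a fixed w and Gaussian sketches P
    with i.i.d. N(0, 1/p) entries; the ratio concentrates around 1 with
    spread on the order of sqrt(2/p)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)
    w2 = w @ w
    stats = {}
    for p in ps:
        ratios = np.array([
            np.sum((rng.normal(0.0, 1.0 / np.sqrt(p), size=(d, p)).T @ w) ** 2) / w2
            for _ in range(trials)
        ])
        stats[p] = (ratios.mean(), ratios.std())  # mean near 1; std shrinks with p
    return stats
```

Increasing p tightens the concentration, consistent with the p = O(\epsilon^{-2}\log(1/\delta)) scaling: quadrupling p roughly halves the observed spread.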

3. Complexity and Convergence Analysis

Under Lipschitz continuity and boundedness of J(x) and r(x), and with probabilistic sketching satisfying the embedding criteria, RS-GN achieves with high probability a first-order convergence rate matching that of full-dimensional Gauss-Newton:

\#\{k : \|\nabla f(x_k)\| > \epsilon\} = O(\epsilon^{-2}) \quad \text{with probability } \geq 1 - \delta,

for any \epsilon > 0 (assuming p = O(\epsilon^{-2}\log(1/\delta))). The per-iteration cost advantage arises from solving a p-dimensional linear system instead of a d-dimensional one, with the leading O(\epsilon^{-2}) iteration complexity term independent of the subspace dimension p (its influence is only logarithmic via \delta). Trust-region or regularization safeguards are essential for ensuring global convergence and controlling the model-data fit (Cartis et al., 2022, Cartis et al., 2022, Bellavia et al., 4 Jun 2025).

A summary of key formulas includes:

  • Subspace-gradient approximation: g_k = P_k P_k^\top \nabla f(x_k).
  • Subspace Gauss-Newton subproblem: s_k = \arg\min_{s \in \mathbb{R}^p} \|J(x_k) P_k s + r(x_k)\|^2 + \lambda_k \|s\|^2; d_k = P_k s_k.
  • JL embedding: \Pr[\, |\|P_k^\top w\|^2 - \|w\|^2| \leq \epsilon \|w\|^2 \,] \geq 1 - \delta, with p = O(\epsilon^{-2}\log(1/\delta)).

4. Adaptive and Variable Dimension Strategies

Recent variants introduce adaptation of the subspace dimension \ell_k, motivated by the observation that the optimal subspace dimensionality may vary along the optimization trajectory. Variable-dimension RS-GN maintains the fundamental structure but allows \ell_k (the subspace size at iteration k) to grow or shrink based on observed descent quality.

Strategies include:

  • Armijo-based adaptation: on a successful step, reduce \ell_k (down to a minimum); on failure, increase \ell_k (up to a maximum).
  • Model-accuracy-based adaptation: compute the residual measure \theta^*_k = \|\nabla m_k(p_k)\| / \|\nabla f(x_k)\| and shrink or enlarge \ell_k according to whether \theta^*_k falls below a threshold.
  • Both strategies are shown to enjoy theoretical complexity guarantees analogous to the fixed-dimension variants, while often yielding substantial empirical reductions in computational resources (Bellavia et al., 4 Jun 2025).
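Both rules can be sketched as simple update functions. The constants below (shrink/growth factors, bounds, threshold tau) are illustrative placeholders, not the values used in the cited papers:

```python
def update_dim_armijo(l_k, success, l_min=5, l_max=200):
    """Armijo-style rule: shrink the subspace after a successful step,
    grow it after a failed one."""
    return max(l_min, l_k // 2) if success else min(l_max, 2 * l_k)

def update_dim_model(l_k, grad_model_norm, grad_full_norm, tau=0.5,
                     l_min=5, l_max=200):
    """Model-accuracy rule: theta* = ||grad m_k|| / ||grad f||.  A small
    ratio means the subspace captures little of the gradient mass, so the
    dimension is enlarged; otherwise it can be shrunk."""
    theta_star = grad_model_norm / max(grad_full_norm, 1e-16)
    return min(l_max, 2 * l_k) if theta_star < tau else max(l_min, l_k // 2)
```

In a full solver these updates would replace the fixed p of the base method, with the sketch P_k redrawn at the new dimension each iteration.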

5. Sketching Techniques and Subspace Construction

RS-GN relies on a variety of sketching matrices:

  • Gaussian random matrices: entries drawn i.i.d. from \mathcal{N}(0, 1/p); satisfy the JL embedding with minimal sample complexity.
  • Sparse "s-hashing" sketches: reduce computational overhead while still achieving JL properties.
  • Coordinate/block-coordinate sampling: particularly effective in high-sparsity regimes; the dimension requirement may depend on the non-uniformity of the gradient through the factor \nu^2 = \max_i |e_i^\top \nabla f(x)|^2 / \|\nabla f(x)\|^2 (Cartis et al., 2022, Cartis et al., 2022).
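The three sketch families can be constructed as follows. This is a minimal sketch; note that the coordinate sketch is scaled by \sqrt{d/p} so that \|P^\top w\|^2 is unbiased, a normalization convention that varies across papers:

```python
import numpy as np

def gaussian_sketch(d, p, rng):
    """Dense Gaussian sketch: entries i.i.d. N(0, 1/p)."""
    return rng.normal(0.0, 1.0 / np.sqrt(p), size=(d, p))

def s_hashing_sketch(d, p, s, rng):
    """Sparse s-hashing sketch: each of the d rows has s nonzeros equal to
    +-1/sqrt(s), placed in columns chosen uniformly without replacement."""
    P = np.zeros((d, p))
    for i in range(d):
        cols = rng.choice(p, size=s, replace=False)
        P[i, cols] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return P

def coordinate_sketch(d, p, rng):
    """Block-coordinate sampling: p distinct scaled columns of the identity."""
    idx = rng.choice(d, size=p, replace=False)
    P = np.zeros((d, p))
    P[idx, np.arange(p)] = np.sqrt(d / p)  # keeps E ||P^T w||^2 = ||w||^2
    return P
```

Applying P_k^\top to the Jacobian costs O(ndp) for the dense Gaussian sketch but only O(nds) for s-hashing and O(np) for coordinate sampling, which is the cost gap the bullet list above refers to.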

The theoretical justification for all these constructions centers on the embedding guarantees that ensure curvature and descent directions are not significantly distorted in the subspace.

6. Numerical Performance and Empirical Observations

Comprehensive numerical experiments—across CUTEst nonlinear least-squares benchmarks, logistic regression datasets, and large-scale artificial problems—demonstrate:

  • For moderate-accuracy requirements (objective fall to 10% of initial decrease), RS-GN with p = 0.5d or p = 0.75d typically matches full Gauss-Newton while incurring only a fraction of the Jacobian-action cost.
  • Coordinate/block sampling sketches yield the lowest per-iteration costs but slower overall convergence; Gaussian and s-hashing sketches offer a balance between per-iteration cost and descent progress.
  • On very large-scale problems, RS-GN with p/d \lesssim 0.1 can outperform full Gauss-Newton in early-iteration progress windows.
  • Adaptive dimension strategies improve early convergence and can cut per-iteration cost by 50–90% and total cost by factors of 2–5 on medium-scale problems (Cartis et al., 2022, Cartis et al., 2022, Bellavia et al., 4 Jun 2025).

A summary of performance characteristics is captured in the following table:

Sketch Type                  | Cost per Iteration             | Iteration Speedup
---------------------------- | ------------------------------ | ------------------------------
Gaussian / Hashing           | Moderate                       | Balanced cost and progress
Coordinate / Block Sampling  | Lowest                         | Slower (in k)
Adaptive Dimension           | Varies (often lower long-term) | Empirically best for wall-time

7. Applicability, Limitations, and Research Directions

RS-GN methods are particularly effective when the Jacobian is low-rank, its spectrum decays rapidly, or exact Gauss-Newton steps are computationally prohibitive. The dimension-reducing approach enables practical application to problems with large ambient dimension d or where Jacobian formation is costly. Empirical gains are most pronounced in medium- and large-scale regimes, and especially when n \gg \mathrm{rank}(J).

A critical assumption is that the random sketches achieve the required JL embedding properties to ensure theoretical guarantees; violation of embedding accuracy may deteriorate convergence rates. The necessity of trust-region or quadratic regularization safeguards is emphasized: these mechanisms are vital for robust global convergence.

Research continues into optimizing sketching strategies, enhancing adaptive dimension control, and extending RS-GN to non-standard models such as structured nonlinear regression and inverse problems. There is ongoing investigation of local convergence rates under stronger embedding hypotheses, superlinear or quadratic convergence under residual-zero conditions, and improved tradeoffs between per-iteration complexity and global iteration count (Cartis et al., 2022, Cartis et al., 2022, Bellavia et al., 4 Jun 2025).
