
LSR: Linearized Subspace Refinement

Updated 30 March 2026
  • LSR is a universal, architecture-agnostic framework that refines neural network predictions using a low-dimensional subspace derived from the Jacobian.
  • It constructs a reduced least-squares problem via randomized range finding and SVD, enabling one-shot or iterative accuracy improvements without altering the underlying architecture.
  • Empirical results demonstrate order-of-magnitude error reductions in supervised, operator, and physics-informed neural learning compared to conventional gradient methods.

Linearized Subspace Refinement (LSR) is a universal, architecture-agnostic framework designed for refining neural network predictions beyond the limits typically achieved by gradient-based optimization. LSR leverages the linearized residual model induced by the Jacobian at a fixed trained parameter state and solves a reduced least-squares problem within a data-driven low-rank subspace. This approach provides a tractable and numerically stable mechanism for substantial post-training or in-training accuracy improvement across supervised learning, operator learning, and physics-informed neural operator fine-tuning, without altering model architectures or loss formulations (Cao et al., 20 Jan 2026).

1. Problem Formulation and Core Methodology

LSR operates on a generic neural network predictor $q(\theta, x)\in\mathbb{R}^d$ with a parameter vector $\theta\in\mathbb{R}^m$ and a residual vector $f(\theta)\in\mathbb{R}^n$, where the training objective is to minimize the squared norm $L(\theta) = \frac{1}{2}\|f(\theta)\|_2^2$. Typical choices for $f(\theta)$ encompass residuals for supervised learning, operator learning, or physics-informed learning.

At a pretrained state θ0\theta_0, the first-order Taylor expansion provides

f(θ0+δ)f(θ0)+G0δf(\theta_0+\delta) \approx f(\theta_0) + G_0\,\delta

with $G_0 = \left.\frac{\partial f}{\partial \theta}\right|_{\theta_0}$. Direct solution of the full least-squares problem,

$$\delta^* = \arg\min_{\delta\in\mathbb{R}^m} \|f(\theta_0) + G_0\,\delta\|_2^2,$$

is intractable for large $m$. LSR addresses this by restricting $\delta$ to a low-dimensional subspace: $\delta = V\,y$, where $V\in\mathbb{R}^{m\times r}$ with $V^TV=I_r$ and $r\ll m$. The reduced problem is

$$y^* = \arg\min_{y\in\mathbb{R}^r} \|f(\theta_0) + G_0 V y\|_2^2,$$

with the refined predictor given by

$$q_\text{LSR}(x) = q(\theta_0, x) + J_0(x)\,\delta^*,$$

where $J_0(x) = \left.\frac{\partial q}{\partial\theta}\right|_{\theta_0, x}$.
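The reduced solve and the lifted correction above can be sketched in a few lines of NumPy. All quantities below are illustrative stand-ins: a random $G_0$ and $f_0$, and a random orthonormal $V$ in place of the SVD-derived subspace described in Section 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the quantities in the text (illustrative sizes only):
n, m, r = 200, 50, 5                              # residuals, parameters, rank
G0 = rng.standard_normal((n, m))                  # residual Jacobian at theta_0
f0 = rng.standard_normal(n)                       # residual vector f(theta_0)
V, _ = np.linalg.qr(rng.standard_normal((m, r)))  # orthonormal subspace basis

# Reduced least-squares problem: y* = argmin_y ||f0 + G0 V y||^2
A = G0 @ V                                        # n x r reduced system
y_star, *_ = np.linalg.lstsq(A, -f0, rcond=None)
delta_star = V @ y_star                           # lifted correction in parameter space

# The refined residual norm never exceeds the original one (y = 0 is feasible)
assert np.linalg.norm(f0 + G0 @ delta_star) <= np.linalg.norm(f0)
```

Because $y=0$ is always feasible, the subspace correction can only reduce the linearized residual, whatever basis $V$ is chosen.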

2. Subspace Construction and Linear Residual Modeling

To construct the subspace $V$, LSR employs a randomized range-finding strategy based on the network-output Jacobian $J_0 = \frac{\partial q}{\partial\theta}$. The process involves:

  • Drawing a Gaussian random matrix $\Omega\in\mathbb{R}^{m\times(r+p)}$, with $p$ a small oversampling parameter,
  • Computing $Y = J_0\Omega$ using Jacobian-vector products,
  • QR-factorizing $Y = QR$,
  • Computing an SVD of the reduced matrix $B = Q^TJ_0$ to extract the dominant right singular vectors,
  • Taking the first $r$ right singular vectors as the columns of $V$.

Restricting to this subspace yields a reduced least-squares problem in $r$ dimensions, efficiently solvable via direct methods (thin QR or normal equations). For typical $r\sim 10^2\text{–}10^3$, this procedure enables order-of-magnitude improvements in empirical accuracy.
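The five bullet points above follow the standard randomized range finder. A minimal NumPy sketch, using an explicit low-rank matrix as a stand-in for the matrix-free Jacobian (in practice $Y = J_0\Omega$ would be formed by $r+p$ Jacobian-vector products):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 300, 80                 # outputs x parameters (toy sizes)
# Low-rank stand-in for the network-output Jacobian J_0:
J0 = rng.standard_normal((n, 12)) @ rng.standard_normal((12, m))
r, p = 8, 4                    # target rank and oversampling

# Randomized range finding followed by a small SVD:
Omega = rng.standard_normal((m, r + p))   # Gaussian test matrix
Y = J0 @ Omega                            # in practice: r+p Jacobian-vector products
Q, _ = np.linalg.qr(Y)                    # orthonormal range approximation
B = Q.T @ J0                              # (r+p) x m reduced matrix
_, _, Vt = np.linalg.svd(B, full_matrices=False)
V = Vt[:r].T                              # m x r: dominant right singular vectors

assert np.allclose(V.T @ V, np.eye(r), atol=1e-8)   # orthonormal columns
```

Only the small matrices $Y$, $Q$, and $B$ are ever formed explicitly, which is what makes the construction tractable for large parameter counts $m$.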

The underlying assumption is that most actionable directions for residual minimization lie in a tractable, low-rank local subspace of parameter perturbations, unlocking accuracy that standard gradient-based training cannot reach due to ill-conditioning.

3. One-Shot and Iterative LSR Algorithms

The LSR methodology encompasses two core algorithmic modes:

  • One-Shot LSR: Performed as a post-processing step after standard training has converged. The algorithm consists of subspace identification, construction of a reduced system (using $G_0V$), and solution via direct linear algebra to deliver a refined linear predictor.
  • Iterative LSR: Designed for composite or operator-constrained objectives, such as those arising in PDE-constrained learning. The procedure alternates between one-shot LSR subspace corrections and supervised nonlinear alignment using standard optimizers (e.g., L-BFGS or Adam) to minimize alignment losses. This approach is particularly effective for physics-informed learning where residual minimization is combined with boundary and physical constraints.

High-level pseudocode for both one-shot and iterative variants explicitly specifies the sequence of subspace construction (randomized SVD, QR), system assembly (batching, Jacobian-vector products), and least-squares solution, along with recommended practical choices for the rank $r$ and oversampling parameter $p$ (Cao et al., 20 Jan 2026).
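The alternating structure of the iterative variant can be sketched on a toy nonlinear least-squares problem. Everything here is a stand-in: the residual and Jacobian are synthetic, plain gradient steps replace L-BFGS/Adam for the alignment phase, and a random orthonormal basis replaces the Jacobian-derived subspace.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 40, 120, 6
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

def residual(theta):
    """Mildly nonlinear toy residual f(theta)."""
    return A @ theta + 0.1 * np.sin(theta).sum() * np.ones(n) - b

def jacobian(theta):
    """Analytic Jacobian of the toy residual."""
    return A + 0.1 * np.outer(np.ones(n), np.cos(theta))

theta = np.zeros(m)
for _ in range(5):
    # (1) One-shot LSR correction within a (here: random) orthonormal subspace
    f0, G0 = residual(theta), jacobian(theta)
    V, _ = np.linalg.qr(rng.standard_normal((m, r)))
    y, *_ = np.linalg.lstsq(G0 @ V, -f0, rcond=None)
    theta = theta + V @ y
    # (2) Nonlinear "alignment": a few plain gradient steps on 0.5*||f||^2
    for _ in range(20):
        theta -= 1e-3 * jacobian(theta).T @ residual(theta)

assert np.linalg.norm(residual(theta)) < np.linalg.norm(residual(np.zeros(m)))
```

The loop body mirrors the described alternation: a direct subspace solve on the current linearization, followed by first-order steps that re-align the nonlinear model.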

4. Numerical Conditioning and Trade-Offs Versus Gradient Training

Within convex quadratic regimes, the convergence rate of gradient or quasi-Newton methods is governed by the condition number of the Gauss–Newton Hessian $H = G_0^T G_0$. Ill-conditioning (large $\kappa(H)$) provokes slow or stalled convergence, frequently causing early plateaus in loss minimization.
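This effect is easy to reproduce on a synthetic ill-conditioned linear least-squares problem, where first-order iteration stalls far above the accuracy a direct solve reaches (all sizes and matrices below are illustrative):

```python
import numpy as np

# Ill-conditioned least-squares toy: gradient descent stalls while a
# direct solve reaches (near) machine precision.
n = 50
s = np.logspace(0, -6, n)            # singular values spanning 6 orders
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
G = U * s                            # columns scaled: kappa(G^T G) ~ 1e12
f0 = rng.standard_normal(n)

# Gradient descent on L = 0.5*||f0 + G d||^2, step = 1 / largest eigenvalue
d = np.zeros(n)
lr = 1.0 / s.max() ** 2
for _ in range(1000):
    d -= lr * G.T @ (f0 + G @ d)
gd_loss = 0.5 * np.linalg.norm(f0 + G @ d) ** 2

# Direct least-squares solve of the same linear system
d_star, *_ = np.linalg.lstsq(G, -f0, rcond=None)
direct_loss = 0.5 * np.linalg.norm(f0 + G @ d_star) ** 2

assert direct_loss < 1e-6 * gd_loss  # direct solve is vastly more accurate
```

Directions with small singular values contract at rate $(1 - s_i^2/s_{\max}^2)$ per step, so after 1000 iterations the gradient method has barely touched them, while the direct solve resolves all directions at once.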

Empirical results indicate that for supervised function fitting with modern neural networks, standard optimizers such as Adam can stall at MSE levels orders of magnitude above the minimum attainable by a direct reduced-subspace solve. One-shot LSR drives the loss to machine precision in these cases, while even iterative solvers applied to the full linearized system plateau due to ill-conditioning. For operator learning and PINN fine-tuning, increasing $r$ improves the LSR loss up to a threshold, beyond which numerical errors and subspace ill-conditioning dominate.

The practical selection of the subspace rank $r$ is guided by monotonic improvement in residual loss: the rank is increased until no further improvement is observed.
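A sketch of this monitoring strategy on a synthetic rank-deficient Jacobian (the rank increment of 10 and the 0.999 improvement threshold are arbitrary illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 400, 120, 30
# Synthetic Jacobian with numerical rank k: beyond rank k there is
# nothing left for the subspace correction to exploit.
J = (rng.standard_normal((n, k)) * np.logspace(0, -4, k)) @ rng.standard_normal((k, m))
f0 = rng.standard_normal(n)
_, _, Vt = np.linalg.svd(J, full_matrices=False)

def lsr_loss(r):
    """Residual norm after an LSR correction restricted to rank r."""
    V = Vt[:r].T
    y, *_ = np.linalg.lstsq(J @ V, -f0, rcond=None)
    return np.linalg.norm(f0 + J @ V @ y)

# Grow the rank while the refinement loss keeps improving noticeably.
r, best = 10, lsr_loss(10)
while r < m:
    trial = lsr_loss(r + 10)
    if trial > 0.999 * best:       # no meaningful improvement: stop
        break
    r, best = r + 10, trial

assert r == k   # growth stops once the useful rank is exhausted
```

Because the candidate subspaces are nested, the loss is non-increasing in $r$; the loop simply detects where the gains flatten out.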

5. Empirical Performance Across Applications

Experimental results in multiple regimes demonstrate the broad efficacy of LSR:

  • Supervised Function Approximation: On a 2D sine target, Adam converges to test MSE $\sim10^{-6}$, whereas one-shot LSR (with $r=1000$) reduces this to $\sim10^{-12}$. From random initialization, LSR still surpasses optimizer plateaus.
  • Operator Learning (1D Burgers equation): Across DeepONet and MultiONet architectures, median test-error reduction factors post-LSR range from $12\times$ to $236\times$, depending on nonlinearity choices.
  • Physics-Informed Fine-Tuning: For 300 test instances each, error reduction ranges from $8\times$ (advection) to $240\times$ (linear ODE).
  • Iterative LSR in PDE Solving: Combining LSR and nonlinear alignment accelerates convergence by $>10\times$ compared to standard PINN or TSONN approaches, with alternating steps effectively neutralizing both high- and low-frequency errors.
  • Classification (MNIST): On a $110$k-parameter CNN with random initialization, LSR reduces test error from $90\%$ to $4\%$ with a single application at $r\approx1000$.

A summary table of physics-informed fine-tuning results is below (averaging over 300 instances):

| Equation | Baseline Test Error | Post-LSR Test Error | Error Reduction |
|---|---|---|---|
| Linear ODE | $3.7\times10^{-3}$ | $1.5\times10^{-5}$ | $240\times$ |
| Reaction–Diffusion | $5.6\times10^{-3}$ | $5.7\times10^{-4}$ | $10\times$ |
| Burgers | $1.2\times10^{-2}$ | $3.9\times10^{-4}$ | $30\times$ |
| Advection | $2.2\times10^{-2}$ | $3.1\times10^{-3}$ | $8\times$ |

6. Computational Aspects and Integration Guidelines

  • Complexity: Subspace construction requires $O((r+p)\cdot\mathrm{Cost(JVP)})$ for $Y=J_0\Omega$ and $O(m(r+p)^2)$ for the small QR+SVD. The reduced-system solve costs $O(nr^2+r^3)$, dominated by assembling $G_0V$ for large $n$.
  • Memory: Storing $V$ uses $O(mr)$ memory; batched LSR can reduce peak usage by iterating over the data.
  • Rank Selection: Monitor the refinement loss as a function of $r$ and select the highest rank before numerical issues arise; typical practical values are $r = 100\text{–}2000$.
  • Pipeline Integration: One-shot LSR can be applied post-convergence of standard neural network training, while Iterative LSR is compatible with operator-constrained and PDE-driven training by alternating with supervised alignment steps.
  • Implementation: Automatic differentiation frameworks (e.g., PyTorch, TensorFlow) are leveraged for Jacobian-vector products; standard linear algebra (LAPACK, NumPy) is used for small QR/SVD solves.
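The batching point above can be sketched as follows: the normal equations of $\|f_0 + G_0 V y\|_2^2$ are accumulated over row batches of $G_0 V$, so the full $n \times r$ reduced matrix never has to be materialized at once (random stand-in matrices; the batch size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, r, batch = 10_000, 64, 8, 1_000
G0 = rng.standard_normal((n, m))          # stand-in for the residual Jacobian
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
f0 = rng.standard_normal(n)

# Accumulate the normal equations (G0 V)^T (G0 V) y = (G0 V)^T (-f0)
# batch by batch, keeping only one batch of rows in memory at a time.
AtA = np.zeros((r, r))
Atb = np.zeros(r)
for start in range(0, n, batch):
    A_b = G0[start:start + batch] @ V     # one batch of rows of G0 V
    AtA += A_b.T @ A_b
    Atb += A_b.T @ (-f0[start:start + batch])
y_batched = np.linalg.solve(AtA, Atb)

# Matches the all-at-once reduced least-squares solution
y_full, *_ = np.linalg.lstsq(G0 @ V, -f0, rcond=None)
assert np.allclose(y_batched, y_full, atol=1e-6)
```

Normal equations square the condition number of the reduced system, so for larger $r$ the thin-QR route mentioned in Section 2 is the numerically safer batch-free alternative.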

7. Theoretical Limitations and Research Directions

LSR yields a nonzero correction only when the pretrained parameters are not already a stationary point of the residual objective (i.e., $G_0^T f_0 \ne 0$). In practice, exact stationarity is rarely attained, so LSR typically delivers tangible gains.

Increasing the subspace rank $r$ induces severe ill-conditioning beyond a problem-specific threshold. LSR fundamentally operates only on the local linearized residual, without altering the global nonlinear characteristics of the underlying network. Naïvely using the LSR correction $\delta^*$ as a full Gauss–Newton parameter update breaks down outside the linear regime; LSR is defined only as a mechanism for linear predictor refinement at a fixed $\theta_0$.

Additional limitations include susceptibility to overfitting in low-sample or noisy regimes (necessitating cross-validation or regularization), and its current applicability is restricted to least-squares or similar objectives.

Open research directions include dynamic/adaptive rank selection, incorporation of damping or $\ell_2$ regularization in the subspace solve (e.g., Levenberg–Marquardt), integration with learned preconditioners, characterization of deep network Jacobian spectra, and extension to general classification losses beyond least-squares.
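One of these directions, damping the subspace solve, can be sketched directly: a Levenberg–Marquardt-style $\ell_2$ penalty on the reduced problem is equivalent to a stacked least-squares system (the matrix and damping value below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 100, 20
A = rng.standard_normal((n, r)) * np.logspace(0, -6, r)  # ill-conditioned G0 V
f0 = rng.standard_normal(n)

def damped_solve(lmbda):
    """min_y ||f0 + A y||^2 + lmbda*||y||^2 via the equivalent stacked system
    [A; sqrt(lmbda) I] y = [-f0; 0] (Levenberg-Marquardt-style damping)."""
    A_aug = np.vstack([A, np.sqrt(lmbda) * np.eye(r)])
    b_aug = np.concatenate([-f0, np.zeros(r)])
    y, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return y

# Damping shrinks the correction along poorly resolved directions
y_damped = damped_solve(1e-4)
y_plain = damped_solve(0.0)
assert np.linalg.norm(y_damped) < np.linalg.norm(y_plain)
```

The stacked formulation avoids forming $A^TA + \lambda I$ explicitly and so does not square the condition number of the damped system.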

Summary

Linearized Subspace Refinement addresses the numerical ill-conditioning that limits the effectiveness of gradient-based neural network training. By combining randomized subspace construction with direct reduced least-squares solvers, LSR reliably extracts the accuracy attainable within a convex local linearization, providing measurable and frequently order-of-magnitude reductions in function and operator error while preserving compatibility with standard architectures and training pipelines (Cao et al., 20 Jan 2026).
