LSR: Linearized Subspace Refinement
- LSR is a universal, architecture-agnostic framework that refines neural network predictions using a low-dimensional subspace derived from the Jacobian.
- It constructs a reduced least-squares problem via randomized range finding and SVD, enabling one-shot or iterative accuracy improvements without altering the underlying architecture.
- Empirical results demonstrate order-of-magnitude error reductions in supervised, operator, and physics-informed neural learning compared to conventional gradient methods.
Linearized Subspace Refinement (LSR) is a universal, architecture-agnostic framework designed for refining neural network predictions beyond the limits typically achieved by gradient-based optimization. LSR leverages the linearized residual model induced by the Jacobian at a fixed trained parameter state and solves a reduced least-squares problem within a data-driven low-rank subspace. This approach provides a tractable and numerically stable mechanism for substantial post-training or in-training accuracy improvement across supervised learning, operator learning, and physics-informed neural operator fine-tuning, without altering model architectures or loss formulations (Cao et al., 20 Jan 2026).
1. Problem Formulation and Core Methodology
LSR operates on a generic neural network predictor $f_\theta$ with parameter vector $\theta \in \mathbb{R}^p$ and a residual vector $r(\theta) \in \mathbb{R}^m$, where the training objective is to minimize the squared norm $\tfrac{1}{2}\|r(\theta)\|_2^2$. Typical choices of $r$ encompass residuals for supervised learning, operator learning, or physics-informed learning.
At a pretrained state $\theta_0$, the first-order Taylor expansion provides
$$r(\theta_0 + \delta) \approx r(\theta_0) + J\,\delta,$$
with $J = \left.\partial r / \partial \theta\right|_{\theta_0} \in \mathbb{R}^{m \times p}$. Direct solution of the full least-squares problem,
$$\min_{\delta \in \mathbb{R}^p} \big\| r(\theta_0) + J\,\delta \big\|_2^2,$$
is intractable for large $p$. LSR addresses this by restricting $\delta$ to a low-dimensional subspace: $\delta = V c$, where $V \in \mathbb{R}^{p \times k}$ has orthonormal columns and $k \ll p$. The reduced problem is
$$\min_{c \in \mathbb{R}^k} \big\| r(\theta_0) + J V c \big\|_2^2,$$
with the refined predictor given by
$$\hat f(x) = f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top \delta^*,$$
where $\delta^* = V c^*$ and $c^*$ solves the reduced problem.
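The reduced solve can be illustrated with a minimal NumPy sketch. The dense $J$, the random basis $V$, and all problem sizes are illustrative; in practice $J$ is never materialized and is accessed only through Jacobian-vector products:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, k = 200, 1000, 10          # residual dim, parameter dim, subspace rank

# Toy linearization at theta_0: residual r0 and Jacobian J (dense for clarity).
J = rng.standard_normal((m, p))
r0 = rng.standard_normal(m)

# Orthonormal basis V for a k-dimensional parameter subspace.
V, _ = np.linalg.qr(rng.standard_normal((p, k)))

# Reduced least-squares problem: min_c || r0 + (J V) c ||_2.
JV = J @ V
c, *_ = np.linalg.lstsq(JV, -r0, rcond=None)
delta = V @ c                     # correction delta* = V c*

# The subspace minimum can never exceed the uncorrected residual (c = 0).
refined_norm = np.linalg.norm(r0 + JV @ c)
```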
2. Subspace Construction and Linear Residual Modeling
To construct the subspace , LSR employs a randomized range-finding strategy based on the network-output Jacobian . The process involves:
- Drawing a Gaussian random matrix ,
- Computing using Jacobian-vector products,
- QR factorization of ,
- SVD on the reduced matrix to extract the dominant right singular vectors,
- Selecting the first columns of .
Restricting to this subspace yields a reduced least-squares problem in $k$ dimensions, efficiently solvable via direct methods (thin QR or normal equations). For typical ranks $k \ll p$, this procedure enables order-of-magnitude improvements in empirical accuracy.
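The range-finding steps above can be sketched as follows. $J$ is dense here for clarity, and the function name, sizes, and spectrum of the sanity-check matrix are illustrative; in LSR the products $J\Omega$ and $Q^\top J$ would be formed with Jacobian-vector and vector-Jacobian products:

```python
import numpy as np

def lsr_subspace(J, k, s=5, rng=None):
    """Randomized range finder for the top-k right singular vectors of J."""
    rng = rng or np.random.default_rng(0)
    m, p = J.shape
    Omega = rng.standard_normal((p, k + s))   # Gaussian test matrix
    Y = J @ Omega                             # range sample (JVPs in practice)
    Q, _ = np.linalg.qr(Y)                    # thin QR: orthonormal range basis
    B = Q.T @ J                               # small (k+s) x p reduced matrix
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k].T                           # V in R^{p x k}, orthonormal cols

# Sanity check on a rank-20 matrix with rapidly decaying singular values.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((100, 20)))
W, _ = np.linalg.qr(rng.standard_normal((400, 20)))
J = U @ np.diag(10.0 ** -np.arange(20)) @ W.T
V = lsr_subspace(J, k=10)
```

With fast spectral decay, projecting onto the estimated top-$k$ right singular subspace leaves only a tiny residual, which is the regime LSR assumes.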
The underlying assumption is that most actionable directions for residual minimization are captured in a tractable, low-rank local subspace of parameter perturbations, exposing accuracy not attainable by standard gradient-based training due to ill-conditioning.
3. One-Shot and Iterative LSR Algorithms
The LSR methodology encompasses two core algorithmic modes:
- One-Shot LSR: Performed as a post-processing step after standard training has converged. The algorithm consists of subspace identification, construction of the reduced system (assembling $JV$ and $r(\theta_0)$), and solution via direct linear algebra to deliver a refined linear predictor.
- Iterative LSR: Designed for composite or operator-constrained objectives, such as those arising in PDE-constrained learning. The procedure alternates between one-shot LSR subspace corrections and supervised nonlinear alignment using standard optimizers (e.g., L-BFGS or Adam) to minimize alignment losses. This approach is particularly effective for physics-informed learning where residual minimization is combined with boundary and physical constraints.
High-level pseudocode for both one-shot and iterative variants explicitly specifies the sequence of subspace construction (randomized SVD, QR), system assembly (batching, Jacobian-vector products), and least-squares solution, as well as recommended practical choices for the rank $k$ and the oversampling parameter $s$ (Cao et al., 20 Jan 2026).
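A sketch of the two modes, paraphrasing the steps described above (not verbatim from the paper):

```text
One-shot LSR (post-training refinement):
  1. Draw Gaussian Omega (p x (k+s)); form Y = J Omega via JVPs.
  2. Thin QR: Y = Q R; SVD of Q^T J; take first k right singular vectors as V.
  3. Assemble the reduced system (J V, r(theta0)), batching over data.
  4. Solve min_c || r(theta0) + J V c ||_2 by thin QR or normal equations.
  5. Return the refined linear predictor f_theta0(x) + grad f_theta0(x)^T V c.

Iterative LSR (operator-constrained objectives):
  repeat until converged:
    apply a one-shot LSR subspace correction at the current linearization
    run nonlinear alignment (e.g., Adam or L-BFGS) on the supervised
      alignment loss, pulling the network toward the refined predictor
```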
4. Numerical Conditioning and Trade-Offs Versus Gradient Training
Within convex quadratic regimes, the convergence rate of gradient or quasi-Newton methods is governed by the condition number $\kappa$ of the Hessian (for the linearized least-squares problem, $H = J^\top J$). Ill-conditioning (large $\kappa$) provokes slow or stalled convergence, frequently causing early plateaus in loss minimization.
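A toy ill-conditioned quadratic makes the stall concrete; the condition number $10^6$, the step size, and the iteration budget are illustrative:

```python
import numpy as np

# Minimize 0.5 x^T H x - g^T x with condition number kappa(H) = 1e6.
H = np.diag(np.array([1e6, 1.0]))
g = np.array([1.0, 1.0])
x_star = np.linalg.solve(H, g)            # direct solve: exact minimizer

# Gradient descent with the optimal fixed step 2/(L + mu): the per-step
# contraction factor is (kappa - 1)/(kappa + 1) ~ 1 - 2e-6, so the
# iterates barely move even over many steps.
x = np.zeros(2)
step = 2.0 / (1e6 + 1.0)
for _ in range(1000):
    x -= step * (H @ x - g)

stalled_error = np.linalg.norm(x - x_star)   # still close to the initial error
```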
Empirical results indicate that for supervised function fitting with modern neural networks, standard optimizers such as Adam can stall at MSE levels orders of magnitude above the solution attainable by direct reduced-subspace solutions. One-shot LSR achieves full machine-precision minimization in these cases, while even iterative solvers on the full linearized system plateau due to ill-conditioning. For operator learning and PINN fine-tuning, growing the subspace rank $k$ improves the LSR loss up to a threshold, beyond which numerical errors and subspace ill-conditioning dominate.
The practical selection of the subspace rank $k$ is dictated by monotonic improvement in residual loss; $k$ is increased up to the point where this improvement is no longer observed.
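Because bases built from ordered singular vectors are nested, the reduced-problem residual is non-increasing in $k$, which is what makes this stopping rule well defined. A sweep like the following implements it (sizes are illustrative, and an exact SVD stands in for the randomized subspace of Section 2):

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 300, 500
J = rng.standard_normal((m, p)) * (0.5 ** np.arange(p))  # decaying column scales
r0 = rng.standard_normal(m)

# Exact SVD stand-in for the randomized range finder.
_, _, Vt = np.linalg.svd(J, full_matrices=False)

losses = []
for k in (2, 4, 8, 16, 32):
    V = Vt[:k].T                                  # nested rank-k bases
    c, *_ = np.linalg.lstsq(J @ V, -r0, rcond=None)
    losses.append(np.linalg.norm(r0 + J @ (V @ c)))
# Stop increasing k once the loss no longer improves appreciably.
```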
5. Empirical Performance Across Applications
Experimental results in multiple regimes demonstrate the broad efficacy of LSR:
- Supervised Function Approximation: On a 2D sine target, Adam converges to a test-MSE plateau, whereas one-shot LSR reduces the error by orders of magnitude. From random initialization, LSR still surpasses optimizer plateaus.
- Operator Learning (1D Burgers equation): Across DeepONet and MultiONet architectures, LSR delivers substantial median test-error reductions, with the reduction factor depending on the choice of nonlinearity.
- Physics-Informed Fine-Tuning: For 300 test instances per equation, the error-reduction factor ranges from its smallest value on the advection equation to its largest on the linear ODE.
- Iterative LSR in PDE Solving: Combining LSR and nonlinear alignment accelerates convergence relative to standard PINN or TSONN approaches, with alternating steps effectively neutralizing both high- and low-frequency errors.
- Classification (MNIST): On a 110k-parameter CNN with random initialization, a single application of LSR substantially reduces test error.
A summary table of physics-informed fine-tuning results is below (averaged over 300 instances per equation):

| Equation | Baseline Test Error | Post-LSR Test Error | Error Reduction |
|---|---|---|---|
| Linear ODE | | | |
| Reaction–Diffusion | | | |
| Burgers | | | |
| Advection | | | |
6. Computational Aspects and Integration Guidelines
- Complexity: Subspace construction requires $k+s$ Jacobian-vector products for $Y = J\,\Omega$ and $O\big((m+p)(k+s)^2\big)$ work for the small QR+SVD. The reduced-system solve costs $O(mk^2)$, dominated by assembling $JV$ for large $m$.
- Memory: Storing $V$ and $JV$ uses $O\big((p+m)k\big)$ memory; batch-LSR can reduce peak usage by iterating over data.
- Rank Selection: Monitor the refinement loss as a function of $k$ and select the highest rank before numerical issues arise; practical ranks remain far smaller than the parameter count $p$.
- Pipeline Integration: One-shot LSR can be applied post-convergence of standard neural network training, while Iterative LSR is compatible with operator-constrained and PDE-driven training by alternating with supervised alignment steps.
- Implementation: Automatic differentiation frameworks (e.g., PyTorch, TensorFlow) are leveraged for Jacobian-vector products; standard linear algebra (LAPACK, NumPy) is used for small QR/SVD solves.
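As a sanity check on framework-provided products, a central-difference approximation of $Jv$ can be compared against the analytic Jacobian of a toy map. The map $f(\theta) = A\tanh(\theta)$, the helper name `jvp_fd`, and all sizes here are illustrative, not from the paper:

```python
import numpy as np

def jvp_fd(f, theta, v, eps=1e-6):
    """Central-difference stand-in for an autodiff Jacobian-vector product.

    In LSR these products come from a framework such as PyTorch or
    TensorFlow; this finite-difference version is only for illustration.
    """
    return (f(theta + eps * v) - f(theta - eps * v)) / (2 * eps)

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
f = lambda th: A @ np.tanh(th)           # toy network output
theta = np.array([0.1, -0.2])
v = np.array([1.0, 0.5])

J = A * (1.0 / np.cosh(theta) ** 2)      # exact Jacobian: A diag(sech^2(theta))
```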
7. Theoretical Limitations and Research Directions
LSR yields a nonzero correction only when the pretrained parameters are not already a stationary point of the residual objective (i.e., $J^\top r(\theta_0) \neq 0$). In practice, such exact stationarity is rarely attained, hence LSR often leads to tangible gains.
Increasing the subspace rank $k$ induces severe ill-conditioning beyond a problem-specific threshold. LSR fundamentally operates only on the local linearized residual, without altering the global nonlinear characteristics of the underlying network. Naïvely using the LSR correction as a full Gauss–Newton parameter update leads to breakdowns outside the linear regime; LSR is defined only as a mechanism for linear predictor refinement at a fixed $\theta_0$.
Additional limitations include susceptibility to overfitting in low-sample or noisy regimes (necessitating cross-validation or regularization), and current applicability is restricted to least-squares or similar objectives.
Open research directions include dynamic/adaptive rank selection, incorporation of damping or regularization in the subspace solve (e.g., Levenberg–Marquardt), integration with learned preconditioners, characterization of deep network Jacobian spectra, and extension to general classification losses beyond least-squares.
Summary
Linearized Subspace Refinement addresses the loss-induced numerical ill-conditioning that limits the effectiveness of gradient-based neural network training. By marrying randomized subspace construction with direct reduced least-squares solvers, LSR reliably recovers the accuracy attainable within a convex local linearization, providing measurable and frequently order-of-magnitude reductions in function and operator error while preserving compatibility with standard architectures and training pipelines (Cao et al., 20 Jan 2026).