Orthogonal Residuals in Regression & ML

Updated 10 May 2026

Orthogonal residuals are the components of data or model updates that remain after projection onto a model's subspace; being orthogonal to that subspace, they isolate the unexplained variance.
They underpin methods in OLS, total least squares, and Procrustes problems, ensuring robust estimation and preservation of geometric structure.
Efficient computation via QR and SVD factorizations enables practical extraction of orthogonal residuals in high-dimensional applications.

Orthogonal residuals are a class of residuals arising from the orthogonal decomposition of model or data representations, where the error or deviation is mathematically constrained to be orthogonal to a specified subspace or transformation. This concept is foundational across regression analysis, machine learning, signal processing, and model merging, enabling robust estimation, geometric preservation, and principled decomposition of information.

1. Mathematical Formulations and Core Properties

Orthogonal residuals arise when a residual vector or matrix is constructed to be orthogonal to a particular subspace defined by a model, constraint, or set of features. The archetypal example is in least squares regression, where the residual $r = b - Ax$ is by construction orthogonal to the column space of $A$, i.e., $A^\top r = 0$ (Grcar, 2010). This orthogonality ensures that all variation explained by $A$ is projected away, isolating the unexplained component.
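
As a minimal numerical sketch of this property (the matrix sizes, random seed, and variable names are illustrative assumptions, not taken from the cited work), one can solve a small least squares problem with NumPy and confirm that the residual is orthogonal to every column of $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))      # illustrative design matrix
b = rng.normal(size=50)           # illustrative observation vector

# Least squares solution x minimizing ||b - A x||_2
x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x                     # residual

# Normal equations: A^T r should vanish up to rounding error
ortho_defect = np.linalg.norm(A.T @ r)

# Pythagoras: ||b||^2 = ||A x||^2 + ||r||^2 because A x and r are orthogonal
pythagoras_gap = abs(b @ b - (A @ x) @ (A @ x) - r @ r)
```

Both `ortho_defect` and `pythagoras_gap` should be at the level of floating-point rounding.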

In modern extensions, such as orthogonal model merging, the process involves decomposing a weight update $W_f - W_0$ into an orthogonal transformation $Q^\star$ (solving an orthogonal Procrustes problem) and a residual $R = W_f - W_0 Q^\star$ satisfying Frobenius-orthogonality with respect to the rotated base $W_0 Q^\star$ (Yang et al., 5 Feb 2026). Such constructions generalize seamlessly to block-wise, layer-wise, or even infinite-dimensional contexts, as in orthonormal expansions of function spaces for residual-stress fields (Tiwari et al., 2024).
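
The Procrustes step can be sketched as follows, assuming the standard formulation $Q^\star = \arg\min_Q \|W_f - W_0 Q\|_F$ over orthogonal $Q$; the matrices here are random stand-ins, not real model weights. The check shown is the first-order optimality condition that holds for this standard formulation: $Q^{\star\top} W_0^\top R$ is symmetric, so $R$ is Frobenius-orthogonal to every tangent direction $W_0 Q^\star \Omega$ with $\Omega$ skew-symmetric. The exact orthogonality statement used in the cited work may be phrased differently.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5
W0 = rng.normal(size=(m, n))                 # hypothetical base weights
Wf = W0 + 0.1 * rng.normal(size=(m, n))      # hypothetical fine-tuned weights

# Orthogonal Procrustes: Q = argmin ||Wf - W0 Q||_F over orthogonal Q,
# obtained in closed form from the SVD of W0^T Wf
U, s, Vt = np.linalg.svd(W0.T @ Wf)
Q = U @ Vt                                   # may have det -1 in general
R = Wf - W0 @ Q                              # residual after the rotation

# Q is orthogonal up to rounding
orth_err = np.linalg.norm(Q.T @ Q - np.eye(n))

# Optimality: S = Q^T W0^T R is symmetric
S = Q.T @ W0.T @ R
sym_err = np.linalg.norm(S - S.T)

# Hence <R, W0 Q Omega>_F = 0 for any skew-symmetric Omega
Omega = rng.normal(size=(n, n))
Omega = Omega - Omega.T
tangent_inner = np.sum(R * (W0 @ Q @ Omega))
```

Because the identity is itself an orthogonal matrix, `np.linalg.norm(R)` can never exceed `np.linalg.norm(Wf - W0)`.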

The key characteristic is that the orthogonal residual represents the unexplained or unmodeled component that is strictly orthogonal—under an appropriate inner product or geometric criterion—to the explanatory subspace.

2. Orthogonal Residuals in Classical and Generalized Regression

In linear regression, orthogonal residuals are the classical ordinary least squares (OLS) residuals. If $X \in \mathbb{R}^{n \times p}$ is of full rank, the OLS residual $r = y - X\hat{\beta}$ is orthogonal to the column space of $X$, and $r = (I - H)y$ with $H = X(X^\top X)^{-1}X^\top$ the orthogonal projector onto that space. Under canonical assumptions (independent Gaussian errors with common variance $\sigma^2$), the residual sum of squares $\|r\|^2$ admits a $\sigma^2 \chi^2_{n-p}$ law by Cochran’s theorem (Elton et al., 2022).

In orthogonal regression (total least squares, TLS, or “errors-in-variables”), orthogonal residuals are defined as the minimal orthogonal deviations in the augmented predictor-response space $\mathbb{R}^{p+1}$, corresponding to the shortest perpendicular distance from each data point to the solution hyperplane (Aishima, 2023, Dragović et al., 2022). This setting demands that the residual be orthogonal to both the predictor and response errors, often leading to eigenproblems with explicit orthogonal projectors and Ritz-vector solutions. The geometric interpretation is deeply tied to inertia operators, Jacobi coordinates, and the geometry of confocal quadrics (Dragović et al., 2022).
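
A minimal sketch of the perpendicular-distance view, assuming isotropic noise of equal variance in every coordinate and a hyperplane with intercept (the data, noise level, and coefficients below are synthetic and illustrative): the normal of the best-fit hyperplane is the right singular vector of the centered augmented data matrix belonging to the smallest singular value.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x_true = rng.uniform(-1, 1, size=(n, 2))
y_true = x_true @ np.array([1.5, -0.5]) + 0.3

# Noise in both predictors and response (errors-in-variables)
X = x_true + 0.05 * rng.normal(size=(n, 2))
y = y_true + 0.05 * rng.normal(size=n)

Z = np.column_stack([X, y])          # points in the augmented space
Zc = Z - Z.mean(axis=0)              # best-fit hyperplane passes through the centroid

# Normal vector: right singular vector of the smallest singular value
_, s, Vt = np.linalg.svd(Zc, full_matrices=False)
normal = Vt[-1]

dist = Zc @ normal                   # signed perpendicular distances
resid = np.outer(dist, normal)       # orthogonal residual vectors
fitted = Zc - resid                  # projections onto the hyperplane

# Slope coefficients, valid when the last normal component is nonzero
beta_tls = -normal[:-1] / normal[-1]
```

Each row of `resid` is parallel to `normal`, so it is perpendicular to the fitted hyperplane, and the sum of squared distances equals the square of the smallest singular value.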

In partial least squares (PLS) regression, residuals $r_k$ after $k$ PLS components satisfy $t_j^\top r_k = 0$ for $j = 1, \dots, k$, being orthogonal to the span of the latent score vectors $t_1, \dots, t_k$ (Blazère et al., 2014). These residuals admit explicit expressions in terms of the spectrum of $X^\top X$ and polynomials orthogonal with respect to problem-specific weightings.
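
The orthogonality to the score vectors can be illustrated with a NIPALS-style PLS1 loop. This is a sketch on synthetic data with a hypothetical helper name, not the polynomial construction of the cited paper; it only shows that the deflated response is orthogonal to every score computed so far.

```python
import numpy as np

def pls1_scores_and_residual(X, y, k):
    """NIPALS-style PLS1: return the score matrix (n x k) and the response residual."""
    Xk = X - X.mean(axis=0)
    r = y - y.mean()
    scores = []
    for _ in range(k):
        w = Xk.T @ r
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # latent score vector
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        Xk = Xk - np.outer(t, p)      # deflate X: columns become orthogonal to t
        r = r - t * (t @ r) / tt      # deflate y: remove the part explained by t
        scores.append(t)
    return np.column_stack(scores), r

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=60)

T, r = pls1_scores_and_residual(X, y, k=3)
score_gram = T.T @ T                  # should be diagonal: scores are mutually orthogonal
```

Here `T.T @ r` vanishes up to rounding, and the off-diagonal entries of `score_gram` are numerically zero.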

3. Orthogonal Residuals in Machine Learning and Model Fusion

Orthogonality-based approaches underpin robust estimation in Double Machine Learning (DML). A Neyman-orthogonal moment function is one whose expectation is first-order insensitive to nuisance parameter estimation error, which is achieved by writing moments in terms of appropriately constructed orthogonal residuals (Mackey et al., 2017). For partially linear regression, residuals such as $Y - \mathbb{E}[Y \mid X]$ and $T - \mathbb{E}[T \mid X]$ (outcome and treatment with their conditional means given the covariates removed) enter the moment function in such a way that the influence of nuisance estimation vanishes to first order. Higher-order orthogonality extends this insensitivity to slower convergence rates of nuisance regressors, provided treatment residuals are non-Gaussian.
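
A minimal sketch of the first-order orthogonal (residual-on-residual) estimator for a partially linear model follows. The data-generating process, the polynomial nuisance learner `poly_fit_predict`, and the two-fold split are all illustrative assumptions; the higher-order orthogonal moments of the cited paper are not implemented here.

```python
import numpy as np

def poly_fit_predict(x_train, target_train, x_test, degree=3):
    """Hypothetical nuisance learner: polynomial least squares in a scalar covariate."""
    coef = np.polyfit(x_train, target_train, degree)
    return np.polyval(coef, x_test)

rng = np.random.default_rng(4)
n, theta_true = 4000, 2.0
X = rng.normal(size=n)
T = 0.5 * X + 0.25 * X**2 + rng.normal(size=n)        # treatment
Y = theta_true * T + X**2 - X + rng.normal(size=n)    # outcome

# Two-fold cross-fitting: nuisances are fit on one fold, evaluated on the other
folds = np.arange(n) % 2
Y_res = np.empty(n)
T_res = np.empty(n)
for k in (0, 1):
    train, test = folds != k, folds == k
    Y_res[test] = Y[test] - poly_fit_predict(X[train], Y[train], X[test])
    T_res[test] = T[test] - poly_fit_predict(X[train], T[train], X[test])

# Residual-on-residual regression solves the empirical orthogonal moment
# mean((Y_res - theta * T_res) * T_res) = 0
theta_hat = (T_res @ Y_res) / (T_res @ T_res)
final_res = Y_res - theta_hat * T_res
```

By construction `final_res` is orthogonal to `T_res`, and `theta_hat` should land close to `theta_true` for this well-specified synthetic example.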

In model merging, particularly in LLMs, the Orthogonal Model Merging framework decouples weight adaptation into orthogonal (rotation) components and residual (Euclidean) components (Yang et al., 5 Feb 2026). The orthogonal component, derived via the orthogonal Procrustes solution, is merged on the manifold SO($d$) using Karcher means or Lie algebra averaging, while the residuals are additively combined. This achieves geometric structure preservation and mitigates catastrophic forgetting.
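
Lie-algebra averaging of rotations can be sketched as: take the matrix logarithm of each rotation, average the resulting skew-symmetric matrices, and exponentiate. The helpers below are hypothetical and use a plain eigendecomposition, which is adequate for this small example but fragile near rotations by $\pi$; a library routine such as `scipy.linalg.logm` / `scipy.linalg.expm` would likely be preferable in practice. The residual combination step is not shown.

```python
import numpy as np

def rotation_log(Q):
    """Principal matrix logarithm of a rotation via eigendecomposition."""
    w, V = np.linalg.eig(Q)
    return np.real(V @ np.diag(np.log(w.astype(complex))) @ np.linalg.inv(V))

def skew_exp(A):
    """Matrix exponential of a skew-symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eig(A)
    return np.real(V @ np.diag(np.exp(w)) @ np.linalg.inv(V))

def lie_algebra_mean(rotations):
    """Average rotations in the tangent space at the identity, then map back."""
    logs = [rotation_log(Q) for Q in rotations]
    return skew_exp(np.mean(logs, axis=0))

def rot2d(angle):
    """Planar rotation by the given angle, used only for the demonstration."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

Q_merged = lie_algebra_mean([rot2d(0.2), rot2d(0.6)])
```

For two planar rotations the result is the rotation by the mean angle (here 0.4 radians), and the merged matrix stays exactly on the rotation group, which a plain entrywise average of the two matrices would not.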

4. Applications in Signal Processing, Representation Learning, and Beyond

Orthogonal residual decompositions are instrumental in subspace and representation learning. In multimodal imaging, decomposing the error into a component within the feature subspace (e.g., MRI-explainable) and a component orthogonal to it allows one to characterize the irreducible, modality-specific information carried in another domain (e.g., PET uptake) (Adomeit et al., 8 Apr 2026). The orthogonality is enforced via projection penalties computed after an SVD of the explanatory features, ensuring that the final orthogonal residual is supported only outside the span of those features.

In large-scale approximate nearest neighbor search, especially for maximum inner product search, local orthogonal decomposition (LOD) splits the residual vector $r$ into two orthogonal components, one aligned with a high-variance direction and one in its orthogonal complement (Wu et al., 2019). Each part is quantized separately, tailoring bitrate allocation to minimize inner product errors under a fixed budget.
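
The split itself can be sketched as below. As an assumption for illustration, the "high-variance direction" is taken to be the top principal direction of synthetic residuals; the direction choice and the quantizers of the cited method are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical residuals of datapoints relative to a coarse centroid,
# with deliberately larger variance along the first coordinate
scale = np.array([3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
residuals = rng.normal(size=(500, 8)) * scale

# Assumption: use the top principal direction as the high-variance direction
_, _, Vt = np.linalg.svd(residuals - residuals.mean(axis=0), full_matrices=False)
u = Vt[0]

r_par = np.outer(residuals @ u, u)     # component along u (one scalar per point)
r_perp = residuals - r_par             # component in the orthogonal complement

# For any query q the inner product splits exactly across the two parts
q = rng.normal(size=8)
ip_total = residuals @ q
ip_split = r_par @ q + r_perp @ q
```

Since the parallel part is described by a single scalar per point, it can be quantized on its own, leaving a separate budget for `r_perp`.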

Orthogonal expansions of residual-stress fields (Tiwari et al., 2024) provide a rigorous framework in continuum mechanics for constructing bases whose elements are mutually orthogonal with respect to the prescribed inner products, enabling efficient representation of general residual-stress states and best finite-dimensional approximations with controlled error.

5. Algorithms, Computational Aspects, and Practical Considerations

Stable computation of orthogonal residuals leverages orthogonal factorizations (QR with Householder reflections) to produce explicit bases for orthocomplement spaces (Elton et al., 2022). This enables algorithmic extraction of independent, identically distributed residuals in regression, facilitating exact hypothesis testing and variance estimation.
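
A sketch of the QR route with NumPy (whose `numpy.linalg.qr` relies on LAPACK's Householder-based factorization): the trailing $n - p$ columns of the complete $Q$ factor form an orthonormal basis of the orthocomplement of the column space. Under the usual Gaussian linear model, the coordinates of $y$ in that basis are independent with equal variance. The data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 30, 4
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Complete QR: first p columns of Q span col(X), last n - p span its orthocomplement
Q, R = np.linalg.qr(X, mode="complete")
Q1, Q2 = Q[:, :p], Q[:, p:]

# Coordinates of y in the orthocomplement basis: n - p reduced residuals
z = Q2.T @ y

# Ordinary residual for comparison, from the triangular system R1 beta = Q1^T y
beta = np.linalg.solve(R[:p, :], Q1.T @ y)
r = y - X @ beta
```

The reduced vector `z` has the same squared norm as the ordinary residual `r`, so variance estimates based on either agree.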

For model merging, Procrustes-based decomposition is accomplished via SVD, leading to closed-form solutions for the optimal rotation, with the associated residual provably orthogonal to the rotated base (in the Frobenius inner product) (Yang et al., 5 Feb 2026). Merging on the orthogonal group SO($d$), either by Karcher mean or Lie-algebra averaging, maintains intrinsic geometric invariants, critical for deep model fusion.

In subspace projection, SVD-based thin projections or orthogonal projectors (e.g., $P = U U^\top$ after a thin SVD $F = U \Sigma V^\top$ of the feature matrix $F$) enable efficient computation of the decomposition into parallel and orthogonal parts (Adomeit et al., 8 Apr 2026).
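
A minimal sketch of such a decomposition, with a hypothetical feature matrix `F` and error vector `e` (both synthetic): the thin SVD supplies an orthonormal basis `U` of the feature span, and the projector is applied as two matrix-vector products rather than by forming the full projection matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
F = rng.normal(size=(40, 5))       # hypothetical feature matrix (columns span the subspace)
e = rng.normal(size=40)            # hypothetical error vector to decompose

# Thin SVD: the leading columns of U form an orthonormal basis of col(F)
U, s, Vt = np.linalg.svd(F, full_matrices=False)
rank = int(np.sum(s > s[0] * max(F.shape) * np.finfo(float).eps))
U = U[:, :rank]                    # drop directions with numerically zero singular value

e_par = U @ (U.T @ e)              # part lying in the feature span
e_perp = e - e_par                 # orthogonal residual
```

Truncating `U` to the numerical rank keeps the projector well defined when the features are collinear.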

Several sources of ill-conditioning or instability in orthogonal residuals are identified (Grcar, 2010): poor conditioning of the explanatory matrix, near-collinearity of the observed vector with the subspace, and dominance of low-singular value directions can exacerbate sensitivity.

6. Theoretical and Geometric Insights

The geometry of orthogonal residuals extends to pencils of confocal quadrics and Jacobi coordinate systems (Dragović et al., 2022), providing a unified framework for restricted regression, inertia-based projection, and explicit test statistics for assessing the containment of points in best-fit subspaces. Orthogonal polynomial theory rigorously connects PLS residuals to spectral properties of the design matrix, with explicit residual polynomials and moment-based formulae dictating their decay and statistical properties (Blazère et al., 2014).

In continuum mechanics, extremizing quadratic functionals over divergence-free stress fields yields complete orthonormal sequences, where each function is orthogonal to all prior ones with respect to prescribed inner products, offering a systematic, physically informed approach for parameterizing complex residual stress states (Tiwari et al., 2024).

7. Impact, Limitations, and Future Directions

The orthogonality property of residuals is central to achieving invariance, robustness, and interpretability in high-dimensional statistics, model fusion, and function approximation. It enables decoupling of error or adaptation into interpretable, geometry-preserving, and computationally tractable components. Emerging applications—model merging on Riemannian manifolds, representation disentanglement in multimodal data, and robust inferential methods in machine learning—all critically leverage orthogonal residuals for principled solution structure and statistical guarantees.

Future research directions include generalization of orthogonal residual construction to non-Euclidean geometries, adaptive and data-driven determination of decomposing subspaces (e.g., for LOD in vector quantization), improved theory for condition numbers in complex models, and broader deployment of orthonormal sequence expansions in diverse physical and engineering sciences.

Key references: (Grcar, 2010, Elton et al., 2022, Aishima, 2023, Yang et al., 5 Feb 2026, Adomeit et al., 8 Apr 2026, Tiwari et al., 2024, Mackey et al., 2017, Dragović et al., 2022, Blazère et al., 2014, Wu et al., 2019)