Linear-Residual Decomposition Techniques

Updated 1 May 2026

Linear-residual decomposition is a technique that partitions high-dimensional data into a linear fit and a residual component for explicit error analysis.
It is applied in regression, operator theory, and signal processing to improve interpretability and predictive accuracy by isolating explainable structures.
Advanced methods like regression-aware PCA, dynamic mode decomposition, and recursive frameworks refine residual extraction to boost robustness in complex models.

Linear-residual decomposition refers broadly to representing high-dimensional data, signals, operators, or model outputs as the sum of an explicitly structured “linear” component and an explicit “residual” component, often with the residual itself further structured or minimized. This paradigm appears across linear algebra, operator theory, statistical learning, time-series modeling, and signal processing, where it provides interpretability, error quantification, and improved algorithmic performance. The core principle is to isolate the component that is “explained” by a given model or linear operation, orthogonally decompose the remainder, and analyze or use the residual explicitly.

1. Fundamental Principles: Fit-plus-Residual Decomposition

Central to linear-residual decomposition is the explicit partition

$\text{Data} = \text{Linear fit} + \text{Residual}$

with the linear fit minimizing a normed discrepancy. In the canonical regression scenario with design matrix $A\in\mathbb R^{m\times p}$ and response $B\in\mathbb R^{m\times n}$, the least-squares solution

$X^* = \arg\min_{X \in \mathbb{R}^{p\times n}} \|AX - B\|$

yields the orthogonal splitting

$B = AX^* + R,\quad X^* = A^\dagger B, \quad R = (I - AA^\dagger) B$

where $A^\dagger$ is the Moore–Penrose pseudoinverse and $\|\cdot\|$ is a unitarily invariant norm (Frobenius or operator) (Tygert, 2017). By construction, $AX^*$ is the orthogonal projection of $B$ onto the column space of $A$, and $R$ is uniquely characterized as the minimal-norm residual. This decomposition generalizes to Hilbert-space operator theory, where the defect identity $\|x\|^2 = \|Tx\|^2 + \|D_T x\|^2$ for a contraction $T$ (with defect operator $D_T = (I - T^*T)^{1/2}$) yields a telescoping account of the “principal” and “residual” components under iterative application (Jorgensen et al., 26 Jan 2026).
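The orthogonal splitting above can be checked numerically. The following is a minimal NumPy sketch on synthetic data; the matrix sizes, the random seed, and the variable names are illustrative assumptions, not part of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 50, 4, 3
A = rng.standard_normal((m, p))   # design matrix (synthetic)
B = rng.standard_normal((m, n))   # response matrix (synthetic)

A_pinv = np.linalg.pinv(A)        # Moore-Penrose pseudoinverse A^+
X_star = A_pinv @ B               # least-squares solution X* = A^+ B
fit = A @ X_star                  # orthogonal projection of B onto col(A)
R = B - fit                       # residual, equal to (I - A A^+) B

# The residual is orthogonal to every column of A.
orthogonality_gap = np.abs(A.T @ R).max()

# Pythagoras in the Frobenius norm: ||B||^2 = ||fit||^2 + ||R||^2.
energy_gap = abs(np.sum(B**2) - np.sum(fit**2) - np.sum(R**2))
```

Both `orthogonality_gap` and `energy_gap` should sit at the level of floating-point round-off, which is the numerical signature of an orthogonal fit-plus-residual split.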

2. Regression-aware and Supervised Decomposition Methods

Linear-residual structure is leveraged in advanced supervised matrix approximation and feature extraction schemes. Regression-aware interpolative decomposition (RAID) and regression-aware PCA (RAPCA) improve on classical ID/PCA by first projecting $B$ with a regression-informed operator built from the design matrix $A$, yielding a “supervised” matrix. Column selection or SVD is then performed on this supervised matrix rather than on $B$ itself, focusing approximation or dimensionality reduction on the components most relevant for the predictive model; in the interpolative case, the approximation is a product of selected columns and an interpolation matrix. The error in reconstructing the regression fit is explicitly bounded by the residual in the supervised space. This approach ensures that the retained components are those that are predictive under $A$, suppressing directions orthogonal to the regression (Tygert, 2017).
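A minimal sketch of the supervise-then-truncate idea follows. It rests on an assumption that is not taken from the source: the regression-informed operator is taken to be the orthogonal projector onto the column space of $A$, represented by $Q^\top$ with $Q$ from a thin QR factorization. Under that assumption, the error in approximating the regression fit equals the truncation error in the supervised space. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n, k = 60, 5, 8, 2
A = rng.standard_normal((m, p))    # design matrix (synthetic)
B = rng.standard_normal((m, n))    # response matrix (synthetic)

# ASSUMPTION: the "supervised" matrix is B expressed in an orthonormal
# basis of col(A); this is an illustrative choice, not the paper's notation.
Q, _ = np.linalg.qr(A)             # thin QR, Q has shape (m, p)
S = Q.T @ B                        # supervised matrix, shape (p, n)

# Rank-k truncated SVD of the supervised matrix.
U, s, Vt = np.linalg.svd(S, full_matrices=False)
S_k = (U[:, :k] * s[:k]) @ Vt[:k]

fit = Q @ S                        # regression fit A A^+ B
fit_k = Q @ S_k                    # rank-k approximation of the fit

# Spectral-norm errors in the original and supervised spaces.
fit_error = np.linalg.norm(fit - fit_k, 2)
supervised_error = np.linalg.norm(S - S_k, 2)
```

Because `Q` has orthonormal columns, `fit_error` and `supervised_error` coincide, and both equal the first discarded singular value `s[k]`.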

3. Linear-residual Frameworks in Operator Theory and Iterative Algorithms

Beyond finite matrices, operator-theoretic treatments use defect and telescoping identities for precise accounting of residual energy under iterative linear operations. In the simplest case of a single contraction $T$ with defect operator $D_T = (I - T^*T)^{1/2}$, iterating the defect identity gives $\|x\|^2 - \|T^n x\|^2 = \sum_{k=0}^{n-1} \|D_T T^k x\|^2$, so that the difference between the initial squared norm and that of the iterated signal is the accumulated residual (error) (Jorgensen et al., 26 Jan 2026). Under mild “admissibility” (summability) conditions on the step-size parameters, recursive projection or Kaczmarz-type methods converge strongly, with explicit control of the residual decay. This calculus generalizes directly to kernel methods in RKHS, yielding closed-form, non-spectral residual error representations in greedy kernel PCA, kernel interpolation, and machine-learning tasks (Jorgensen et al., 26 Jan 2026).
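The telescoping energy accounting can be illustrated with a plain cyclic Kaczmarz iteration on a consistent finite-dimensional system. This is a sketch under stated assumptions: unit step size (no relaxation parameter), synthetic data, and hypothetical variable names. Each step is an orthogonal projection, so the squared error drops by exactly the squared length of the step.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 30, 5
A = rng.standard_normal((m, d))
x_true = rng.standard_normal(d)
b = A @ x_true                       # consistent system A x = b

x = np.zeros(d)
initial_error = np.sum((x - x_true) ** 2)
accumulated = 0.0                    # running sum of per-step residual energy

for sweep in range(20):
    for i in range(m):
        a = A[i]
        r = b[i] - a @ x             # scalar residual for row i
        step = (r / (a @ a)) * a     # projection onto the hyperplane a.z = b[i]
        x = x + step
        accumulated += step @ step   # energy removed by this projection

final_error = np.sum((x - x_true) ** 2)
telescoping_gap = abs(initial_error - final_error - accumulated)
```

Here `telescoping_gap` should be at round-off level: the initial squared error equals the final squared error plus the accumulated residual energy.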

An alternative residual-weighted approach for positive (semi-)definite operators employs a monotone sequence of congruence-compressed residuals. The limit of this sequence gives a canonical telescoping decomposition of the operator, and, when exhaustion occurs, the associated system forms a Parseval frame (Tian, 28 Nov 2025).

4. Extensions: Residuals in Structured Statistical Models

Probabilistic modeling and high-dimensional inference commonly use linear-residual decompositions, especially as generalizations of PCA and factor models. Residual Component Analysis (RCA) considers data generated as

$y \sim \mathcal{N}(0,\; WW^\top + \Sigma)$

with known or parameterized $\Sigma$ representing the variance already explained by structured sources (e.g., graphical, temporal, or blockwise models), and $WW^\top$ the low-rank residual variance to be recovered. The maximum-likelihood fit seeks to decompose the sample covariance $S$ as

$S \approx WW^\top + \Sigma$

by solving the generalized eigenproblem

$S v_i = \lambda_i \Sigma v_i$

and retaining directions $v_i$ with $\lambda_i > 1$ (variances exceeding the background). In the EM/RCA extension, both low-rank and structured sparse-inverse residuals are learned variationally, facilitating the separation of latent confounders and context-dependent noise (Kalaitzis et al., 2012).
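The generalized eigenproblem can be sketched with NumPy alone by whitening with a Cholesky factor of the background covariance. The sketch assumes a covariance of the form $WW^\top + \Sigma$ with $\Sigma$ known and positive definite, and uses the exact population covariance in place of a sample covariance so that the recovery is exact. The names `W0`, `Sigma`, and `S`, and the threshold tolerance, are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
d, q = 6, 2
W0 = rng.standard_normal((d, q))           # hypothetical low-rank loadings
G = rng.standard_normal((d, d))
Sigma = G @ G.T + d * np.eye(d)            # known background covariance (SPD)
S = W0 @ W0.T + Sigma                      # population covariance stands in for the sample one

# Whitening: with Sigma = L L^T, S v = lam Sigma v becomes a symmetric
# eigenproblem for M = L^{-1} S L^{-T}.
L = np.linalg.cholesky(Sigma)
L_inv = np.linalg.inv(L)
M = L_inv @ S @ L_inv.T
M = (M + M.T) / 2.0                        # symmetrize against round-off
lam, U = np.linalg.eigh(M)

# Keep directions whose variance exceeds the background (lam > 1).
keep = lam > 1.0 + 1e-8
V = L_inv.T @ U[:, keep]                   # generalized eigenvectors of (S, Sigma)

# Low-rank residual factor implied by the retained eigenpairs.
W_hat = (Sigma @ V) * np.sqrt(lam[keep] - 1.0)
```

In this noise-free setting exactly `q` generalized eigenvalues exceed one, and `W_hat @ W_hat.T` reproduces `W0 @ W0.T`; with a finite-sample covariance the threshold and the recovery are only approximate.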

In financial time series, principal component analysis (PCA) isolates market-wide linear components, but residuals are further orthogonalized via constrained graphical models (MTP2-GGM) to yield residual factors with improved decorrelation, leading to better risk-adjusted portfolio construction and more interpretable idiosyncratic factors (Watanabe et al., 5 Feb 2026).
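The first stage of this pipeline, removing PCA-identified common factors and keeping the residual, can be sketched as follows. The returns are synthetic with a single hypothetical market factor, and the subsequent MTP2-GGM orthogonalization step is not shown; all names and parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, k = 500, 10, 1
market = rng.standard_normal((T, 1))                 # hypothetical market factor
beta = 1.0 + 0.2 * rng.standard_normal((1, N))       # hypothetical factor loadings
returns = market @ beta + 0.3 * rng.standard_normal((T, N))

X = returns - returns.mean(axis=0)                   # center each asset
U, s, Vt = np.linalg.svd(X, full_matrices=False)
linear_part = (U[:, :k] * s[:k]) @ Vt[:k]            # top-k principal components
residual = X - linear_part                           # idiosyncratic residual returns

def mean_abs_offdiag_corr(Z):
    """Average absolute pairwise correlation between columns of Z."""
    C = np.corrcoef(Z, rowvar=False)
    mask = ~np.eye(C.shape[0], dtype=bool)
    return np.abs(C[mask]).mean()

corr_before = mean_abs_offdiag_corr(X)
corr_after = mean_abs_offdiag_corr(residual)
```

On this synthetic panel `corr_after` is much smaller than `corr_before`, though the residuals are not fully decorrelated, which is the gap a further structured residual model is meant to address.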

5. Decomposition of Score Residuals: Input-adaptive Post-hoc Models

In modern post-hoc classification calibration, especially for long-tailed or imbalanced problems, linear-residual decomposition is used to analyze and correct score biases. The residual between Bayes-optimal and base-model logits on a shortlist of classes decomposes into a classwise offset and a pairwise, context-dependent correction. If the pairwise term vanishes, a global logit adjustment suffices; otherwise, context-sensitive (e.g., pairwise linear) corrections are required to recover Bayes-optimal decisions. The REPAIR approach parameterizes and learns this decomposition, with empirical results confirming substantial gains in non-class-separable regimes (Wang et al., 2 Apr 2026).

6. Linear-residual Methods in Signal Processing and Dynamical Systems

Direct-residual subspace decomposition via generalized singular value decomposition (GSVD) is deployed to separate direct (salient) from residual (diffuse/late/noise) signal components in multichannel acoustic impulse data. Given a data window and an estimate of the residual, GSVD finds a common right basis and assigns large generalized singular values (GSVs) to components attributed to the direct signal; the rest is treated as the residual. Detection thresholds and blockwise adaptive updates yield robust, interpretable source separation for spatial acoustics, outperforming subtraction-based approaches (Deppisch et al., 2022).

In dynamical systems, Dynamic Mode Decomposition (DMD) and its recent enhancement ResDMD explicitly quantify the residual error between candidate Koopman modes and the true dynamics by computing the squared residual

$\mathrm{res}(\lambda, g)^2 = \dfrac{\|(\mathcal{K} - \lambda I)\, g\|^2}{\|g\|^2}$

for the Koopman operator $\mathcal{K}$ and candidate eigenpairs $(\lambda, g)$, using data-driven matrix approximations. Modes with large residuals are rejected, ensuring that only “verified” dynamic structures are retained. This enables robust extraction of coherent patterns even in turbulent or stochastic regimes, with all decompositions supplied with explicit a posteriori error bounds (Colbrook et al., 2022, Colbrook et al., 2023). In stochastic systems, the total residual naturally splits into deterministic projection error and variance components, leading to “variance-pseudospectra” for precise quantification of statistical uncertainty (Colbrook et al., 2023).
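A data-driven residual of this kind can be sketched on a toy problem. The example assumes linear dynamics, a dictionary made of the coordinate functions, and equal weights on all snapshots, so the Koopman matrix is recovered exactly and its eigenpairs have near-zero residual. The function name `relative_residual`, the tolerance, and all data are hypothetical; this is not the reference ResDMD implementation.

```python
import numpy as np

def relative_residual(lam, g, Psi_X, Psi_Y):
    """Estimate ||(K - lam) g|| / ||g|| from snapshot data (equal weights)."""
    numerator = np.linalg.norm(Psi_Y @ g - lam * (Psi_X @ g))
    denominator = np.linalg.norm(Psi_X @ g)
    return numerator / denominator

rng = np.random.default_rng(5)
d, n_snapshots = 3, 200
F = 0.5 * rng.standard_normal((d, d))         # hypothetical linear map x -> F x
X = rng.standard_normal((n_snapshots, d))     # snapshots x_j stored as rows
Y = X @ F.T                                   # their images F x_j

# Dictionary = coordinate functions, so the dictionary matrices are the data.
Psi_X, Psi_Y = X, Y
K = np.linalg.pinv(Psi_X) @ Psi_Y             # least-squares (EDMD-style) matrix
eigvals, eigvecs = np.linalg.eig(K)

residuals = np.array([
    relative_residual(eigvals[i], eigvecs[:, i], Psi_X, Psi_Y)
    for i in range(d)
])

# A deliberately wrong eigenvalue paired with a true eigenvector.
bogus = relative_residual(eigvals[0] + 0.5, eigvecs[:, 0], Psi_X, Psi_Y)

tol = 1e-6                                    # illustrative acceptance threshold
verified = residuals < tol
```

The true eigenpairs pass the threshold, while the shifted eigenvalue yields a residual equal to the size of the shift, so it would be rejected. With a nonlinear system or a truncated dictionary the residuals are no longer zero, and the threshold becomes a genuine modeling choice.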

7. Advanced Nonlinear and Recursive Linear-residual Decomposition

Recent advances in deep time-series forecasting utilize recursive, multi-level linear-residual decomposition to explicitly extract sequences of linear and nonlinear patterns. In the LiNo framework, alternating blocks extract linear modes (via AR-convolutional “Li blocks”) and key nonlinear patterns (via “No blocks” that fuse time, frequency, and inter-series dependencies) recursively, with each block operating on the residual left by the previous one. This recursion enables robust separation of complex, entangled signal modes, yielding more accurate forecasts and better interpretability on both multivariate and univariate benchmarks. Ablations confirm that both recursive depth and explicit linear-residual splitting are crucial for performance (Yu et al., 2024).
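The recursive pattern of alternating extractors on successive residuals can be illustrated with a toy decomposition. The extractors here are stand-ins chosen for simplicity and are assumptions of this sketch: a moving-average filter plays the role of the linear step and a soft threshold plays the role of the nonlinear step. Neither is the Li block or the No block of LiNo, and all function names are hypothetical.

```python
import numpy as np

def linear_step(r, width=9):
    """Stand-in linear extractor: moving-average filter."""
    kernel = np.ones(width) / width
    return np.convolve(r, kernel, mode="same")

def nonlinear_step(r, tau):
    """Stand-in nonlinear extractor: soft threshold keeps large excursions."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

def recursive_decompose(signal, levels=3, tau=0.5):
    """Alternate linear and nonlinear extraction on the running residual."""
    residual = np.asarray(signal, dtype=float).copy()
    components = []
    for _ in range(levels):
        lin = linear_step(residual)
        residual = residual - lin
        non = nonlinear_step(residual, tau)
        residual = residual - non
        components.append((lin, non))
    return components, residual

t = np.linspace(0.0, 4.0 * np.pi, 400)
signal = 0.5 * t + np.sin(t)
signal[[50, 200, 333]] += 3.0                 # isolated spikes

components, final_residual = recursive_decompose(signal)
reconstruction = sum(lin + non for lin, non in components) + final_residual
```

By construction the extracted components and the final residual sum back to the input, and after the last soft-threshold step the residual is bounded in magnitude by `tau`.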

Across all these domains, the linear-residual decomposition framework delivers not only algorithmic and statistical benefits but also clarifies the “explainable” and “unexplained” structure inherent in data, operators, or models, often providing direct means for control, error analysis, or further domain-specific model refinement.