Structured Residual Reconstruction (SRR)
- SRR is a modular technique that improves recovery outcomes by decoupling baseline approximations from focused residual corrections.
- It employs methods like deep networks, clustering, and low-rank modeling to address high-frequency errors and quantization artifacts.
- Applications span stereo vision, medical CT, and LLM quantization, leading to significant improvements in metrics such as MAE, RMSE, and perplexity.
Structured Residual Reconstruction (SRR) is a class of techniques for signal, image, and model recovery that enhances an initial coarse estimate by explicitly modeling, learning, or optimizing over structured residuals. Established across domains—from computer vision and medical imaging to LLM compression—SRR frameworks share the principle of decoupling “baseline” approximation from a subsequent, often learnable, residual correction step structured by domain knowledge or statistical priors. This modularity yields improvements in reconstruction accuracy, efficiency, and interpretability, while allowing seamless integration of classical methods, deep networks, and optimization-based modeling.
1. Mathematical Principles and Common Structure
SRR is characterized by a two-stage formulation:
1. Baseline Approximation: An initial solution $x_0$ is computed using traditional, analytical, or coarse algorithms, yielding a prediction (state, depth map, weight, or image patch).
2. Structured Residual Correction: A parameterized function $r_\theta$—often a neural network, learned transform, or low-rank matrix—is trained or optimized to model and add the residual that corrects remaining errors: $\hat{x} = x_0 + r_\theta(x_0)$.
Formally, in many instantiations, the refined reconstruction is

$$\hat{x} = x_0 + r_\theta(x_0),$$

where $r_\theta$ captures the structure of differences between the baseline $x_0$ and the ground truth $x^\ast$, and often takes the baseline itself as part of its input.
This bifurcation simplifies learning or optimization by focusing model capacity exclusively on correcting 'difficult' local artifacts, high-frequency errors, or quantization-induced distortions that escape classical pipelines.
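The two-stage pattern can be sketched in a few lines. The following toy example (hypothetical names; a moving-average filter and a damped correction stand in for the baseline solver and the learned residual model) illustrates the decoupling of stages, not any specific published method:

```python
import numpy as np

def baseline_estimate(y):
    """Stage 1 (stand-in for SGM, FBP, etc.): a coarse moving-average
    smoothing of the noisy observation."""
    kernel = np.ones(5) / 5.0
    return np.convolve(y, kernel, mode="same")

def residual_correction(x0, y):
    """Stage 2 (stand-in for a learned corrector): a damped step that
    restores detail the smoothing removed."""
    return 0.5 * (y - x0)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
truth = np.sin(2 * np.pi * t)
y = truth + 0.1 * rng.standard_normal(t.size)

x0 = baseline_estimate(y)             # coarse baseline
x1 = x0 + residual_correction(x0, y)  # baseline + additive residual fix
mae0 = np.mean(np.abs(x0 - truth))
mae1 = np.mean(np.abs(x1 - truth))
```

In real SRR systems the second stage is a trained or optimized model with access to domain structure; the additive form is what all the instantiations below share.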
2. SRR in Stereo Depth: ResDepth Framework
In stereo reconstruction, SRR is instantiated in the ResDepth approach (Stucker et al., 2020). Given two rectified images $I_1$ and $I_2$, with known camera intrinsics and pose, an initial depth map $D_0$ is produced via a classical method (e.g., semi-global matching). The secondary view $I_2$ is then backward-warped into the frame of $I_1$ based on $D_0$ using a projective warping operator $\mathcal{W}$. A compact U-Net $f_\theta$ is trained to regress the per-pixel residual:

$$r = f_\theta\big(I_1,\ \mathcal{W}(I_2; D_0),\ D_0\big),$$

and the refined depth is $D_1 = D_0 + r$.
The network is supervised using an $\ell_1$ loss relative to the ground-truth depth $D^{\mathrm{gt}}$:

$$\mathcal{L} = \frac{1}{N} \sum_{p} \big| D_1(p) - D^{\mathrm{gt}}(p) \big|.$$
Iterative refinement is enabled by re-warping $I_2$ using the latest estimate $D_k$ and repeatedly applying $f_\theta$, yielding updates $D_{k+1} = D_k + f_\theta\big(I_1,\ \mathcal{W}(I_2; D_k),\ D_k\big)$.
Architecturally, the U-Net is lightweight, purely learns the residual, and uses deep skip connections, ensuring prediction is always local with respect to the baseline estimate. Quantitative results demonstrate that a single SRR pass reduces mean absolute error (MAE) in satellite stereo from 2.81 m to 1.11 m, and further refinement yields marginal additional gain; similar reductions are observed in ETH3D indoor stereo (Stucker et al., 2020).
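The refinement loop above can be sketched as follows. Both `warp` (a column-shift standing in for the projective warping operator) and `residual_net` (a photometric-disagreement rule standing in for the trained U-Net) are toy assumptions, not the ResDepth implementation:

```python
import numpy as np

def warp(img, depth):
    """Toy stand-in for the projective warping operator: shifts
    columns by a depth-dependent integer disparity."""
    disparity = np.clip((depth / depth.max()) * 3, 0, 3).astype(int)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = img[i, max(j - disparity[i, j], 0)]
    return out

def residual_net(i1, i2w, depth):
    """Toy stand-in for the ResDepth U-Net: a small, locally
    supported residual driven by photometric disagreement."""
    return 0.1 * (i1 - i2w)

i1 = np.random.default_rng(1).random((32, 32))
i2 = np.roll(i1, 2, axis=1)        # fake second view
depth = np.full((32, 32), 10.0)    # coarse initial depth map D_0

for _ in range(3):                 # iterative refinement D_{k+1} = D_k + r
    i2w = warp(i2, depth)
    depth = depth + residual_net(i1, i2w, depth)
```

The key structural point survives the simplification: the network never predicts depth directly, only a local correction to the current estimate, so each pass can reuse the same weights.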
3. Multi-layer and Clustered Residual Modeling in Medical Imaging
In medical CT reconstruction, SRR manifests as multi-layer clustering-based residual sparsifying transform (MCST) learning (Yang et al., 2022). Here, the goal is to reconstruct high-quality images from low-dose, noisy X-ray projections.
The MCST model decomposes each image patch across sequential layers. In layer $l$, input residual patches $r_i^{(l)}$ are clustered into $K_l$ groups, and each cluster $k$ is assigned a unitary transform $\Omega_k^{(l)}$, leading to sparse codes via hard thresholding:

$$z_i^{(l)} = H_{\gamma_l}\big(\Omega_{k(i)}^{(l)}\, r_i^{(l)}\big),$$

where $H_{\gamma}$ zeroes entries with magnitude below $\gamma$, and the next-layer residual is recursively defined as $r_i^{(l+1)} = r_i^{(l)} - \big(\Omega_{k(i)}^{(l)}\big)^{\!\top} z_i^{(l)}$.
Inference proceeds by iteratively minimizing a penalized weighted least squares (PWLS) cost regularized by the MCST representation:

$$\hat{x} = \arg\min_{x \ge 0}\ \tfrac{1}{2}\,\| y - A x \|_W^2 + \beta\, \mathcal{R}_{\mathrm{MCST}}(x),$$

with $\mathcal{R}_{\mathrm{MCST}}$ enforcing sparsity in the layered transform domains.
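A minimal PWLS update can be sketched with plain gradient steps on a toy linear forward operator and a Tikhonov-style regularizer gradient (the paper uses relaxed LALM steps and the full MCST regularizer; everything here is a simplified stand-in):

```python
import numpy as np

def pwls_step(x, y, A, w, reg_grad, beta, step):
    """One gradient step on 0.5 * ||y - A x||_W^2 + beta * R(x)."""
    grad = -A.T @ (w * (y - A @ x)) + beta * reg_grad(x)
    return x - step * grad

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20)) / np.sqrt(40)   # toy forward operator
x_true = rng.standard_normal(20)
y = A @ x_true + 0.01 * rng.standard_normal(40)   # noisy projections
w = np.ones(40)                                   # statistical weights W
reg_grad = lambda x: x                            # toy regularizer gradient

x = np.zeros(20)
for _ in range(200):                              # iterative PWLS descent
    x = pwls_step(x, y, A, w, reg_grad, beta=1e-3, step=0.5)
```

In the actual MCST scheme, the regularizer gradient is replaced by the layered sparse-coding residual, and the image update alternates with cluster and code updates.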
Layering enables the model to successively extract signal content and isolate structured residuals; clustering adapts the transforms to local features, enhancing recovery of subtle anatomical detail. Empirically, two MCST layers with per-layer clustering achieve up to 20% reductions in RMSE and substantial SSIM improvement over classical and recent learned methods such as FBP, PWLS-EP, PWLS-ULTRA, and MARS, particularly for edge and vessel recovery (Yang et al., 2022).
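One clustered residual sparsifying layer can be sketched directly from the recursion above. Cluster labels and random unitary transforms (built via QR decomposition) are illustrative assumptions standing in for the learned clustering and transforms:

```python
import numpy as np

def hard_threshold(z, gamma):
    """Keep coefficients with magnitude above gamma; zero the rest."""
    return z * (np.abs(z) > gamma)

def mcst_layer(patches, transforms, labels, gamma):
    """One clustered residual sparsifying transform layer.

    patches:    (n, d) residual patches entering this layer
    transforms: list of (d, d) unitary matrices, one per cluster
    labels:     (n,) cluster index per patch
    gamma:      hard-thresholding level
    Returns sparse codes and the next-layer residuals.
    """
    codes = np.empty_like(patches)
    next_res = np.empty_like(patches)
    for k, omega in enumerate(transforms):
        idx = labels == k
        z = hard_threshold(patches[idx] @ omega.T, gamma)
        codes[idx] = z
        # unitary => inverse is the transpose; the next residual is
        # whatever the thresholded code fails to represent
        next_res[idx] = patches[idx] - z @ omega
    return codes, next_res

rng = np.random.default_rng(0)
d = 16
patches = rng.standard_normal((100, d))
# two random unitary transforms via QR (stand-ins for learned ones)
transforms = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(2)]
labels = rng.integers(0, 2, size=100)
codes, next_res = mcst_layer(patches, transforms, labels, gamma=1.0)
```

Because the transforms are unitary, the next-layer residual energy can never exceed the input residual energy; stacking layers therefore peels off signal content monotonically.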
4. SRR for Quantization Error Reconstruction in LLMs
SRR has been extended to post-training quantization (PTQ) of LLMs in the Preserve-Then-Quantize framework (Cho et al., 2 Feb 2026). Standard quantization error reconstruction (QER) approximates a weight matrix $W$ as $W \approx Q + L_1 L_2$, where $Q$ is a quantized copy and $L_1 L_2$ is a low-rank, trainable correction of rank $r$.

SRR introduces a rank allocation strategy: the leading $r_p$ singular modes of the activation-scaled weight matrix $WS$ (where $S$ is derived from activation statistics) are preserved in full precision as $W_{\mathrm{pres}}$ and never quantized, guaranteeing that the most informative structures survive. Only the residual $W - W_{\mathrm{pres}}$ is quantized, and the induced quantization error is reconstructed with a rank-$r_q$ correction, where $r_p + r_q = r$.
The optimal rank split balances preservation and reconstruction by minimizing the surrogate

$$\mathcal{S}(r_p) = \varepsilon_{WS}(r_p) + \varepsilon_{E}(r - r_p),$$

where

$$\varepsilon_M(k) = 1 - \frac{\sum_{i=1}^{k} \sigma_i(M)^2}{\sum_{i} \sigma_i(M)^2}$$

is the unrecoverable energy ratio for the top $k$ singular values of a matrix $M$, and $E$ is a random probe standing in for quantization effects. This criterion is computationally efficient and empirically stable.
This SRR decomposition natively supports quantized parameter-efficient fine-tuning (QPEFT), where only the low-rank correction is trainable while the preserved and quantized components are fixed. Gradient scaling is applied to limit updates in the preserved subspace, safeguarding dominant model capacity. Benchmarks demonstrate that SRR consistently lowers perplexity and boosts accuracy over standard QER (e.g., perplexity reduced from 14.51 to 11.22 on LLaMA-2 7B; a 5.9-percentage-point GLUE gain under 2-bit QPEFT), particularly in aggressive (2–3 bit) quantization scenarios (Cho et al., 2 Feb 2026).
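The preserve-then-quantize decomposition can be sketched with NumPy. The round-to-nearest quantizer is a toy stand-in, and activation scaling is omitted for simplicity, so this illustrates the structure of the decomposition rather than the published algorithm:

```python
import numpy as np

def fake_quantize(w, bits=3):
    """Symmetric round-to-nearest quantizer (toy stand-in)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def srr_decompose(w, r_p, r_q, bits=3):
    """Preserve the top-r_p singular modes in full precision, quantize
    the residual, correct the induced quantization error at rank r_q."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    preserved = (u[:, :r_p] * s[:r_p]) @ vt[:r_p]    # never quantized
    residual = w - preserved
    q = fake_quantize(residual, bits)
    err = residual - q                               # quantization error
    ue, se, vte = np.linalg.svd(err, full_matrices=False)
    correction = (ue[:, :r_q] * se[:r_q]) @ vte[:r_q]
    return preserved, q, correction

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
preserved, q, correction = srr_decompose(w, r_p=4, r_q=4)
w_hat = preserved + q + correction
err_no_corr = np.linalg.norm(w - preserved - q)      # without correction
err_srr = np.linalg.norm(w - w_hat)                  # with rank-r_q correction
```

Since the correction is the best rank-$r_q$ approximation of the quantization error, it can only reduce the reconstruction error relative to leaving the error uncorrected.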
5. Algorithmic Schemes and Learning Procedures
Across domains, SRR implementations vary by application but share explicit computational stages:
- ResDepth: the U-Net is trained with Adam and no auxiliary photometric or smoothness losses. Warping is fully differentiable, enabling iterative updates.
- MCST in CT: the model is trained with block-coordinate descent (500–1,000 passes), alternating between patch clustering, sparse coding (hard thresholding), and an orthogonal-Procrustes unitary transform update. Reconstruction iterates between image updates (relaxed LALM steps) and cluster and code updates, with hard-thresholding levels matched between learning and inference.
- SRR in PTQ/QPEFT: the algorithm samples a single random probe, computes spectral energy ratios, and chooses the rank split with the minimal loss surrogate. Preserved singular vectors and quantized residuals yield the full-precision preserved component and the rank-constrained error correction. Gradient scaling or singular-gradient projection (SGP) can optionally be employed in fine-tuning.
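The rank-split search in the last scheme can be sketched as a scan over candidate splits scored by spectral tail energies. The exact surrogate belongs to the cited paper; the additive score and the Gaussian probe below are illustrative assumptions:

```python
import numpy as np

def tail_energy_ratio(sigma, k):
    """Fraction of spectral energy outside the top-k singular values."""
    return float(np.sum(sigma[k:] ** 2) / np.sum(sigma ** 2))

def choose_rank_split(w, r_total, noise_scale=0.01, seed=0):
    """Illustrative split search: score each candidate (r_p, r_q) by the
    energy the preserved part misses plus the energy a rank-r_q term
    cannot recover from a random probe of the quantization error."""
    sigma_w = np.linalg.svd(w, compute_uv=False)
    probe = np.random.default_rng(seed).standard_normal(w.shape) * noise_scale
    sigma_e = np.linalg.svd(probe, compute_uv=False)
    scores = [tail_energy_ratio(sigma_w, r_p) +
              tail_energy_ratio(sigma_e, r_total - r_p)
              for r_p in range(r_total + 1)]
    return int(np.argmin(scores)), scores

r_p_best, scores = choose_rank_split(
    np.random.default_rng(1).standard_normal((32, 32)), r_total=8)
```

Only singular values are needed, and the probe is sampled once, which is what makes this style of criterion cheap relative to running the full quantization pipeline per candidate split.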
6. Quantitative Performance and Domain Impact
SRR consistently improves accuracy and reconstruction quality across modalities:
| Domain | Baseline | SRR Variant | Metric | Baseline Value | SRR Value | Relative Gain |
|---|---|---|---|---|---|---|
| Satellite stereo | SGM DEM | ResDepth | MAE | 2.81 m | 1.11 m | >50% reduction |
| Indoor stereo | COLMAP PatchMatch | ResDepth | MAE/RMSE | 0.35 m / 1.13 m | 0.15 m / 0.57 m | >50% MAE reduction |
| Low-dose CT (XCAT) | FBP | PWLS-MCST2 | RMSE/SSIM | 26 HU/0.82 | 12 HU/0.95 | RMSE↓54%, SSIM↑0.13 |
| CT (Mayo Clinic) | FBP | PWLS-MCST2 | RMSE/SSIM | 30 HU/0.78 | 15 HU/0.92 | RMSE↓50%, SSIM↑0.14 |
| LLM PTQ | QERA-exact | SRR | Perplexity | 14.51 | 11.22 | 27.1% reduction |
| LLM QPEFT (2-bit) | QERA | SRR | GLUE Average | 72.51% | 78.43% | +5.9 percentage points |
These improvements are realized without heavy architectural modifications, auxiliary losses, or prohibitive computational overhead. When stacking more residual layers (e.g., MCST3 in CT), marginal improvements continue but may saturate.
7. Limitations, Sensitivity, and Extensions
While SRR schemes offer empirical robustness and generality, they rely on certain modeling assumptions:
- In quantization error reconstruction, the surrogate for optimal rank split assumes constant relative quantization noise and statistical similarity to random matrix spectral decay. Deviation from these can degrade results (Cho et al., 2 Feb 2026).
- SRR typically allocates a single global rank split per layer; finer-grained splits (e.g., per subblock or attention head) may unlock further gains.
- Medical imaging SRR relies on well-chosen patch, clustering, and threshold parameters; excessive layering can have diminishing returns.
- All frameworks require an initial, sufficiently high-quality baseline; pathological failure of the first stage can limit maximum achievable accuracy.
Potential extensions include dynamic, data-adaptive rank splits, application to non-uniform and mixed precisions, and deeper integration with other parameter-efficient optimization and fine-tuning methods.
SRR thus unifies a family of modular enhancement techniques for complex reconstruction and compression tasks, demonstrating consistent efficacy across vision, medical imaging, and LLM domains by structuring the recovery of residual errors through learned, clustered, or low-rank models. Recent studies have established SRR as a reliable paradigm for leveraging classical methods and modern learning frameworks in tandem, with significant quantitative advances in accuracy and practical deployment (Stucker et al., 2020, Yang et al., 2022, Cho et al., 2 Feb 2026).