
SA-ResGS: Self-Augmented Residual 3D Gaussian Splatting

Updated 13 January 2026
  • The paper introduces a novel residual learning mechanism combined with self-augmented points to enhance uncertainty quantification and NBV selection.
  • It employs a 3D Gaussian splatting representation to model complex scenes, enabling robust active scene reconstruction.
  • Empirical evaluations demonstrate superior reconstruction quality and stability over baselines on multiple datasets.

Self-Augmented Residual 3D Gaussian Splatting (SA-ResGS) is a framework designed to enhance the stability of uncertainty quantification and facilitate uncertainty-aware supervision in next-best-view (NBV) selection for active scene reconstruction. The method simultaneously improves the reliability of uncertainty estimates and their effectiveness in guiding supervision by introducing both a novel residual learning mechanism for 3D Gaussian Splatting and a physically grounded view selection scheme based on self-augmented synthetic observations. The approach addresses instability caused by under-supervised Gaussians, particularly prominent in sparse and wide-baseline scenarios, and achieves superior reconstruction quality and robustness in NBV planning relative to contemporary baselines (Jun-Seong et al., 6 Jan 2026).

1. Scene Representation and Self-Augmented Points

SA-ResGS employs a 3D Gaussian Splatting representation for scenes, where the space is modeled as a collection of $N$ Gaussians:

$$G_i = (\mu_i, \Sigma_i, w_i, c_i)$$

with $\mu_i \in \mathbb{R}^3$ as the center, $\Sigma_i \in \mathbb{R}^{3\times 3}$ the anisotropic covariance, $w_i \in \mathbb{R}$ the density weight, and $c_i \in \mathbb{R}^3$ the color. Differentiable splatting and alpha compositing facilitate rendering.
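For intuition on the compositing step, here is a minimal per-ray sketch of front-to-back alpha compositing with scalar colors, assuming the Gaussians have already been projected and depth-sorted. This is an illustration of the standard compositing rule, not the paper's differentiable renderer:

```python
def composite_ray(colors, alphas):
    """Front-to-back alpha compositing along one ray:
    C = sum_i T_i * alpha_i * c_i, where the transmittance
    T_i = prod_{j<i} (1 - alpha_j) accumulates occlusion from nearer Gaussians.
    """
    color_out, transmittance = 0.0, 1.0
    for c, a in zip(colors, alphas):
        color_out += transmittance * a * c
        transmittance *= 1.0 - a
    return color_out
```

A fully opaque first Gaussian (alpha = 1) blocks everything behind it, which is why under-supervised low-opacity Gaussians receive vanishing contribution and, hence, weak gradients.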

To enhance coverage estimation in NBV selection, SA-ResGS introduces Self-Augmented Points (SA-Points), generated as follows:

  1. Select a reference view $I_r$ with pose $P_r$ and generate an extrapolated pose $P_e$ by perturbing the translation.
  2. Render the extrapolated image $I_e = \text{Render}(\{G_i\}, P_e)$.
  3. Use MASt3R to predict dense correspondences $\{(p_r^k, p_e^k)\}$.
  4. For each match, triangulate $X^k = \arg\min_X \left[ \|p_r^k - \pi(P_r X)\|^2 + \|p_e^k - \pi(P_e X)\|^2 \right]$, retaining $X^k$ only if the reprojection error $\epsilon^k$ is below a threshold $\tau$.
  5. Aggregate and hash these points into a voxel grid, producing a binary occupancy map for coverage-driven NBV selection.

This self-augmented occupancy mechanism explicitly guides view selection by physical scene coverage, reducing the risk of uncovered regions.
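Steps 4–5 can be sketched as follows. Here `triangulate_point` uses standard linear (DLT) triangulation as a stand-in for the argmin in step 4, and the function names, camera matrices, and voxel size are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def triangulate_point(p_r, p_e, P_r, P_e):
    """DLT triangulation of one correspondence (p_r, p_e) from two
    3x4 projection matrices P_r, P_e (step 4, linearized)."""
    A = np.stack([
        p_r[0] * P_r[2] - P_r[0],
        p_r[1] * P_r[2] - P_r[1],
        p_e[0] * P_e[2] - P_e[0],
        p_e[1] * P_e[2] - P_e[1],
    ])
    # Least-squares solution: right singular vector of smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def reprojection_error(X, p, P):
    """Pixel distance between the projection of X and the observed match p."""
    x = P @ np.append(X, 1.0)
    return np.linalg.norm(x[:2] / x[2] - p)

def voxelize(points, voxel_size=0.1):
    """Hash 3D points into a set of integer voxel keys (step 5's
    binary occupancy map, represented sparsely)."""
    return {tuple(np.floor(p / voxel_size).astype(int)) for p in points}
```

Points whose `reprojection_error` exceeds $\tau$ would be discarded before voxelization, mirroring the filtering in step 4.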

2. Residual Supervision for 3D Gaussians

The framework addresses the “vanishing gradient” problem for Gaussians of low opacity or large scale. Residual supervision is implemented by:

  • Partitioning the set of Gaussians $\mathcal{G}$ into a supervision subset $\mathcal{G}_{\text{sup}} = \mathcal{G}_{\text{rand}} \cup \mathcal{G}_{\text{uncertain}}$, where $\mathcal{G}_{\text{rand}}$ is a random $\alpha\%$ of the Gaussians (e.g., $\alpha = 90$) and $\mathcal{G}_{\text{uncertain}}$ consists of the top-$\beta$ Gaussians by uncertainty (e.g., $\beta = 10$).
  • Uncertainty $U_i$ per Gaussian is estimated via opacity and spatial spread:

$$U_i = (1-\alpha_i) + \frac{\operatorname{trace}(\Sigma_i)}{\text{max\_trace}}$$

  • For each view, two images are rendered: $I_{\text{full}} = \text{Render}(\mathcal{G}, P)$ and $I_{\text{sup}} = \text{Render}(\mathcal{G}_{\text{sup}}, P)$. The aggregate loss is

$$L = \lambda_{\text{full}}\left[\|I_{\text{full}}-I_{\text{gt}}\|_1 + L_{\text{ssim}}(I_{\text{full}}, I_{\text{gt}})\right] + \lambda_{\text{sup}}\left[\|I_{\text{sup}}-I_{\text{gt}}\|_1 + L_{\text{ssim}}(I_{\text{sup}}, I_{\text{gt}})\right]$$

with $\lambda_{\text{full}} = \lambda_{\text{sup}} = 0.5$.
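A minimal sketch of the subset-selection step, assuming per-Gaussian uncertainty scores are already computed; the function and parameter names are hypothetical, and `beta_frac` interprets the top-uncertainty share as a fraction for generality:

```python
import numpy as np

def select_supervision_subset(uncertainty, alpha_frac=0.9, beta_frac=0.1, rng=None):
    """Boolean mask for G_sup = G_rand ∪ G_uncertain over N Gaussians.

    uncertainty: (N,) per-Gaussian scores U_i (higher = more uncertain).
    alpha_frac:  fraction sampled uniformly at random (the α% in the text).
    beta_frac:   fraction taken as the most uncertain Gaussians.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(uncertainty)
    mask = np.zeros(n, dtype=bool)
    # G_rand: uniform random subset, keeps gradients flowing everywhere.
    mask[rng.choice(n, size=int(alpha_frac * n), replace=False)] = True
    # G_uncertain: always include the most uncertain Gaussians.
    n_unc = max(1, int(beta_frac * n))
    mask[np.argsort(uncertainty)[-n_unc:]] = True
    return mask
```

Rendering with this mask yields $I_{\text{sup}}$; the uncertain Gaussians thus receive gradient signal even when they are occluded or near-transparent in the full render.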

Additionally, uncertainty-weighted sampling, in the spirit of hard-negative mining, can supplement the primary loss:

$$L_{\text{sample}} = -\frac{1}{|\mathcal{G}_{\text{uncertain}}|} \sum_{G_i \in \mathcal{G}_{\text{uncertain}}} U_i$$

In practice, the two-image residual loss is sufficient.

3. Uncertainty Quantification Mechanisms

SA-ResGS quantifies uncertainty per Gaussian both post hoc—using a Laplacian approximation (FisherRF)—and via a proxy based on opacity and spread:

$$U_i = w_{\text{op}}\,(1-\alpha_i) + w_{\text{scale}}\,\frac{\operatorname{trace}(\Sigma_i)}{3}$$

This per-Gaussian uncertainty correlates with rendering error and allows real-time estimation.
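The proxy is cheap to vectorize over all Gaussians. In the sketch below, the defaults $w_{\text{op}} = w_{\text{scale}} = 1$ are assumptions, since the text leaves the weights unspecified:

```python
import numpy as np

def gaussian_uncertainty(opacity, cov, w_op=1.0, w_scale=1.0):
    """Per-Gaussian proxy U_i = w_op*(1 - alpha_i) + w_scale*trace(Sigma_i)/3.

    opacity: (N,) opacities alpha_i in [0, 1].
    cov:     (N, 3, 3) covariance matrices Sigma_i.
    """
    spread = np.trace(cov, axis1=1, axis2=2) / 3.0  # mean eigenvalue of Sigma_i
    return w_op * (1.0 - opacity) + w_scale * spread
```

Transparent or spatially diffuse Gaussians score high, which is exactly the population the residual supervision targets.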

In NBV planning, pixel-wise uncertainty maps and view-aggregated uncertainty $U_j$ inform candidate view scoring. In residual supervision, the most uncertain Gaussians are directly targeted for amplified supervision.

4. Physically Guided Next-Best-View Selection

SA-ResGS’s NBV strategy is physically grounded, prioritizing efficient and uniform scene coverage:

  • The scene’s bounding box is voxelized; voxels containing SA-Points form the observed set $V_{\text{obs}}$, which is dilated for robustness.
  • For candidate camera $j$, the visible voxels $V_{\text{cand}}^{(j)}$ are determined; both $V_{\text{obs}}$ and $V_{\text{cand}}^{(j)}$ are hash-encoded as binary occupancy codes.
  • The normalized Hamming distance

$$d_j = \frac{1}{K}\left\|b_{\text{obs}} \oplus b_{\text{cand}}^{(j)}\right\|_1$$

identifies coverage novelty. The top $N\%$ of candidates by $d_j$ are retained for fine-grained scoring.

Within the narrowed set, the final NBV is selected by maximizing a weighted combination of coverage novelty and uncertainty:

$$\text{NBV} = \arg\max_{j \in \mathcal{C}'} \left[ d_{\text{balance}} \cdot d_j + d_{\text{uncert}} \cdot U_j \right]$$

with $d_{\text{balance}} \gg d_{\text{uncert}}$ early in acquisition to favor coverage.
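The two-stage selection can be sketched as follows; `top_frac` and the weight defaults are illustrative values, not the paper's settings:

```python
import numpy as np

def hamming_coverage_score(b_obs, b_cand):
    """Normalized Hamming distance d_j = (1/K)*||b_obs XOR b_cand||_1
    between two length-K binary occupancy codes."""
    return float(np.mean(b_obs != b_cand))

def select_nbv(b_obs, candidate_codes, candidate_uncert,
               d_balance=1.0, d_uncert=0.1, top_frac=0.5):
    """Two-stage NBV: coarse filtering by coverage novelty d_j, then a
    weighted coverage + uncertainty score over the retained candidates."""
    d = np.array([hamming_coverage_score(b_obs, b) for b in candidate_codes])
    n_keep = max(1, int(top_frac * len(d)))
    keep = np.argsort(d)[-n_keep:]  # top-N% by coverage novelty
    scores = d_balance * d[keep] + d_uncert * np.asarray(candidate_uncert)[keep]
    return int(keep[np.argmax(scores)])
```

With `d_balance` dominating, a candidate seeing mostly unobserved voxels wins even against a higher-uncertainty view, matching the early-acquisition behavior described above.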

5. Algorithmic Implementation

The joint training and NBV planning loop is orchestrated as follows:

Algorithm Train SA-ResGS:
Input: initial Gaussians G, initial views V_train, candidate set V_all
for iter = 1 … MaxIters do
  Sample batch of training views v ∈ V_train
  for each v:
    I_full ← Render(G, v)
    Compute per-G_i uncertainty U_i
    G_rand ← random α% of G
    G_unc ← top-β of G by U_i
    G_sup ← G_rand ∪ G_unc
    I_sup ← Render(G_sup, v)
    L ← λ_full[‖I_full − I_gt(v)‖₁ + L_ssim] + λ_sup[‖I_sup − I_gt(v)‖₁ + L_ssim]
    Backpropagate L, update G
  if iter mod T_nbv == 0 and |V_train| < MaxViews:
    SA-Points ← GenerateSA-Points(G, last view added)
    v_nbv ← SelectNBV(G, SA-Points, V_all ∖ V_train)
    V_train ← V_train ∪ {v_nbv}
end for

The NBV selection function follows the coverage and uncertainty-guided scoring described previously, ensuring both robust sample efficiency and improved scene completeness.

6. Empirical Evaluation and Ablation Analysis

SA-ResGS is benchmarked on Mip-NeRF-360, NeRF-Synthetic, and an extended Tanks & Temples dataset. For active view selection with 20 views (averaged over four seeds), SA-ResGS outperforms the random, ACP, and FisherRF baselines in PSNR and SSIM, with comparable or better LPIPS.

| Dataset | Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| Mip-NeRF360 | Random | 19.97 | 0.584 | 0.456 |
| Mip-NeRF360 | ACP | 20.33 | 0.596 | 0.449 |
| Mip-NeRF360 | FisherRF | 20.64 | 0.595 | 0.450 |
| Mip-NeRF360 | Ours (SA-ResGS) | 21.41 | 0.613 | 0.451 |
| NeRF-Synth | Random | 24.85 | 0.893 | 0.117 |
| NeRF-Synth | FisherRF | 25.19 | 0.892 | 0.116 |
| NeRF-Synth | Ours | 26.58 | 0.907 | 0.110 |
| Extended T&T | Random | 18.92 | 0.694 | 0.390 |
| Extended T&T | FisherRF | 19.46 | 0.710 | 0.381 |
| Extended T&T | Ours | 20.06 | 0.722 | 0.377 |

Ablation studies demonstrate:

  • Omitting residual supervision results in a ~0.3 dB PSNR decrease.
  • Removing SA-Points filtering destabilizes early NBV selection and yields a 0.2 dB loss in PSNR.
  • Full SA-ResGS achieves a 0.71 dB PSNR gain over FisherRF on Mip-NeRF360.

Uncertainty calibration (AUSE) shows improvement over FisherRF (0.297 vs. 0.327). Qualitatively, SA-ResGS reduces floating artifacts, increases coverage, and produces smoother renderings in high-uncertainty regions (Jun-Seong et al., 6 Jan 2026).

7. Significance and Implications

SA-ResGS establishes a new paradigm for integrating residual learning and physically motivated self-augmentation in 3D Gaussian Splatting frameworks. The combination of uncertainty-aware residual supervision and robust NBV selection mitigates the conflicting demands of wide-baseline exploration and sparse-view ambiguity, supporting stable and complete active scene reconstruction. Its methodological innovations—Self-Augmented Point coverage, uncertainty-driven Gaussian sampling, and implicit unbiasing of uncertainty estimates through constrained supervision—are demonstrated to improve both quantitative and qualitative outcomes, suggesting efficacy for broader active vision and scene representation applications (Jun-Seong et al., 6 Jan 2026).
