
SA-ResGS: Self-Augmented Residual 3D Gaussian Splatting

Updated 13 January 2026
  • The paper introduces a novel residual learning mechanism combined with self-augmented points to enhance uncertainty quantification and NBV selection.
  • It employs a 3D Gaussian splatting representation to model complex scenes, enabling robust active scene reconstruction.
  • Empirical evaluations demonstrate superior reconstruction quality and stability over baselines on multiple datasets.

Self-Augmented Residual 3D Gaussian Splatting (SA-ResGS) is a framework designed to enhance the stability of uncertainty quantification and facilitate uncertainty-aware supervision in next-best-view (NBV) selection for active scene reconstruction. The method simultaneously improves the reliability of uncertainty estimates and their effectiveness in guiding supervision by introducing both a novel residual learning mechanism for 3D Gaussian Splatting and a physically grounded view selection scheme based on self-augmented synthetic observations. The approach addresses instability caused by under-supervised Gaussians, particularly prominent in sparse and wide-baseline scenarios, and achieves superior reconstruction quality and robustness in NBV planning relative to contemporary baselines (Jun-Seong et al., 6 Jan 2026).

1. Scene Representation and Self-Augmented Points

SA-ResGS employs a 3D Gaussian Splatting representation for scenes, where the space is modeled as a collection of $N$ Gaussians:

$$G_i = (\mu_i, \Sigma_i, w_i, c_i)$$

with $\mu_i \in \mathbb{R}^3$ as the center, $\Sigma_i \in \mathbb{R}^{3\times 3}$ the anisotropic covariance, $w_i \in \mathbb{R}$ the density weight, and $c_i \in \mathbb{R}^3$ the color. Differentiable splatting and alpha compositing facilitate rendering.
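For intuition on the compositing step, here is a minimal per-ray sketch of front-to-back alpha compositing with scalar colors, assuming the Gaussians have already been projected and depth-sorted. This is an illustration of the standard compositing rule, not the paper's differentiable renderer:

```python
def composite_ray(colors, alphas):
    """Front-to-back alpha compositing along one ray:
    C = sum_i T_i * alpha_i * c_i, where the transmittance
    T_i = prod_{j<i} (1 - alpha_j) accumulates occlusion from nearer Gaussians.
    """
    color_out, transmittance = 0.0, 1.0
    for c, a in zip(colors, alphas):
        color_out += transmittance * a * c
        transmittance *= 1.0 - a
    return color_out
```

A fully opaque first Gaussian (alpha = 1) blocks everything behind it, which is why under-supervised low-opacity Gaussians receive vanishing contribution and, hence, weak gradients.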

To enhance coverage estimation in NBV selection, SA-ResGS introduces Self-Augmented Points (SA-Points), generated as follows:

  1. Select a reference view $I_r$ with pose $P_r$ and generate an extrapolated pose $P_e$ by perturbing the translation.
  2. Render the extrapolated image $I_e = \text{Render}(\{G_i\}, P_e)$.
  3. Use MASt3R to predict dense correspondences $\{(p_r^k, p_e^k)\}$.
  4. For each match, triangulate $X^k = \arg\min_X \left[ \|p_r^k - \pi(P_r X)\|^2 + \|p_e^k - \pi(P_e X)\|^2 \right]$, retaining $X^k$ only if the reprojection error $\epsilon^k$ is below a threshold $\tau$.
  5. Aggregate and hash these points into a voxel grid, producing a binary occupancy map for coverage-driven NBV selection.

This self-augmented occupancy mechanism explicitly guides view selection by physical scene coverage, reducing the risk of uncovered regions.
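Steps 4–5 can be sketched as follows. Here `triangulate_point` uses standard linear (DLT) triangulation as a stand-in for the argmin in step 4, and the function names, camera matrices, and voxel size are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def triangulate_point(p_r, p_e, P_r, P_e):
    """DLT triangulation of one correspondence (p_r, p_e) from two
    3x4 projection matrices P_r, P_e (step 4, linearized)."""
    A = np.stack([
        p_r[0] * P_r[2] - P_r[0],
        p_r[1] * P_r[2] - P_r[1],
        p_e[0] * P_e[2] - P_e[0],
        p_e[1] * P_e[2] - P_e[1],
    ])
    # Least-squares solution: right singular vector of smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def reprojection_error(X, p, P):
    """Pixel distance between the projection of X and the observed match p."""
    x = P @ np.append(X, 1.0)
    return np.linalg.norm(x[:2] / x[2] - p)

def voxelize(points, voxel_size=0.1):
    """Hash 3D points into a set of integer voxel keys (step 5's
    binary occupancy map, represented sparsely)."""
    return {tuple(np.floor(p / voxel_size).astype(int)) for p in points}
```

Points whose `reprojection_error` exceeds $\tau$ would be discarded before voxelization, mirroring the filtering in step 4.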

2. Residual Supervision for 3D Gaussians

The framework addresses the “vanishing gradient” problem for Gaussians of low opacity or large scale. Residual supervision is implemented by:

  • Partitioning the set of Gaussians $\mathcal{G}$ into a supervision subset $\mathcal{G}_{\text{sup}} = \mathcal{G}_{\text{rand}} \cup \mathcal{G}_{\text{uncertain}}$, where $\mathcal{G}_{\text{rand}}$ is a random $\alpha\%$ of the Gaussians (e.g., $\alpha = 90$) and $\mathcal{G}_{\text{uncertain}}$ consists of the top-$\beta$ Gaussians by uncertainty (e.g., $\beta = 10$).
  • Uncertainty $U_i$ per Gaussian is estimated via opacity and spatial spread:

$$U_i = (1-\alpha_i) + \frac{\operatorname{trace}(\Sigma_i)}{\text{max\_trace}}$$

  • For each view, two images are rendered: $I_{\text{full}} = \text{Render}(\mathcal{G}, P)$ and $I_{\text{sup}} = \text{Render}(\mathcal{G}_{\text{sup}}, P)$. The aggregate loss is

$$L = \lambda_{\text{full}}\left[\|I_{\text{full}}-I_{\text{gt}}\|_1 + L_{\text{ssim}}(I_{\text{full}}, I_{\text{gt}})\right] + \lambda_{\text{sup}}\left[\|I_{\text{sup}}-I_{\text{gt}}\|_1 + L_{\text{ssim}}(I_{\text{sup}}, I_{\text{gt}})\right]$$

with $\lambda_{\text{full}} = \lambda_{\text{sup}} = 0.5$.
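A minimal sketch of the subset-selection step, assuming per-Gaussian uncertainty scores are already computed; the function and parameter names are hypothetical, and `beta_frac` interprets the top-uncertainty share as a fraction for generality:

```python
import numpy as np

def select_supervision_subset(uncertainty, alpha_frac=0.9, beta_frac=0.1, rng=None):
    """Boolean mask for G_sup = G_rand ∪ G_uncertain over N Gaussians.

    uncertainty: (N,) per-Gaussian scores U_i (higher = more uncertain).
    alpha_frac:  fraction sampled uniformly at random (the α% in the text).
    beta_frac:   fraction taken as the most uncertain Gaussians.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(uncertainty)
    mask = np.zeros(n, dtype=bool)
    # G_rand: uniform random subset, keeps gradients flowing everywhere.
    mask[rng.choice(n, size=int(alpha_frac * n), replace=False)] = True
    # G_uncertain: always include the most uncertain Gaussians.
    n_unc = max(1, int(beta_frac * n))
    mask[np.argsort(uncertainty)[-n_unc:]] = True
    return mask
```

Rendering with this mask yields $I_{\text{sup}}$; the uncertain Gaussians thus receive gradient signal even when they are occluded or near-transparent in the full render.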

Additionally, uncertainty-weighted sampling, in the spirit of hard-negative mining, can supplement the primary loss:

$$L_{\text{sample}} = -\frac{1}{|\mathcal{G}_{\text{uncertain}}|} \sum_{G_i \in \mathcal{G}_{\text{uncertain}}} U_i$$

In practice, the two-image residual loss is sufficient.

3. Uncertainty Quantification Mechanisms

SA-ResGS quantifies uncertainty per Gaussian both post hoc—using a Laplacian approximation (FisherRF)—and via a proxy based on opacity and spread:

$$U_i = w_{\text{op}}\,(1-\alpha_i) + w_{\text{scale}}\,\frac{\operatorname{trace}(\Sigma_i)}{3}$$

This per-Gaussian uncertainty correlates with rendering error and allows real-time estimation.
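The proxy is cheap to vectorize over all Gaussians. In the sketch below, the defaults $w_{\text{op}} = w_{\text{scale}} = 1$ are assumptions, since the text leaves the weights unspecified:

```python
import numpy as np

def gaussian_uncertainty(opacity, cov, w_op=1.0, w_scale=1.0):
    """Per-Gaussian proxy U_i = w_op*(1 - alpha_i) + w_scale*trace(Sigma_i)/3.

    opacity: (N,) opacities alpha_i in [0, 1].
    cov:     (N, 3, 3) covariance matrices Sigma_i.
    """
    spread = np.trace(cov, axis1=1, axis2=2) / 3.0  # mean eigenvalue of Sigma_i
    return w_op * (1.0 - opacity) + w_scale * spread
```

Transparent or spatially diffuse Gaussians score high, which is exactly the population the residual supervision targets.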

In NBV planning, pixel-wise uncertainty maps and view-aggregated uncertainty $U_j$ inform candidate view scoring. In residual supervision, the most uncertain Gaussians are directly targeted for amplified supervision.

4. Physically Guided Next-Best-View Selection

SA-ResGS’s NBV strategy is physically grounded, prioritizing efficient and uniform scene coverage:

  • The scene’s bounding box is voxelized; voxels containing SA-Points form the observed set $V_{\text{obs}}$, which is dilated for robustness.
  • For candidate camera $j$, the visible voxels $V_{\text{cand}}^{(j)}$ are determined; both $V_{\text{obs}}$ and $V_{\text{cand}}^{(j)}$ are hash-encoded as binary occupancy codes.
  • The normalized Hamming distance

$$d_j = \frac{1}{K}\left\|b_{\text{obs}} \oplus b_{\text{cand}}^{(j)}\right\|_1$$

identifies coverage novelty. The top $N\%$ of candidates by $d_j$ are retained for fine-grained scoring.

Within the narrowed set, the final NBV is selected by maximizing a weighted combination of coverage novelty and uncertainty:

$$\text{NBV} = \arg\max_{j \in \mathcal{C}'} \left[ d_{\text{balance}} \cdot d_j + d_{\text{uncert}} \cdot U_j \right]$$

with $d_{\text{balance}} \gg d_{\text{uncert}}$ early in acquisition to favor coverage.
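The two-stage selection can be sketched as follows; `top_frac` and the weight defaults are illustrative values, not the paper's settings:

```python
import numpy as np

def hamming_coverage_score(b_obs, b_cand):
    """Normalized Hamming distance d_j = (1/K)*||b_obs XOR b_cand||_1
    between two length-K binary occupancy codes."""
    return float(np.mean(b_obs != b_cand))

def select_nbv(b_obs, candidate_codes, candidate_uncert,
               d_balance=1.0, d_uncert=0.1, top_frac=0.5):
    """Two-stage NBV: coarse filtering by coverage novelty d_j, then a
    weighted coverage + uncertainty score over the retained candidates."""
    d = np.array([hamming_coverage_score(b_obs, b) for b in candidate_codes])
    n_keep = max(1, int(top_frac * len(d)))
    keep = np.argsort(d)[-n_keep:]  # top-N% by coverage novelty
    scores = d_balance * d[keep] + d_uncert * np.asarray(candidate_uncert)[keep]
    return int(keep[np.argmax(scores)])
```

With `d_balance` dominating, a candidate seeing mostly unobserved voxels wins even against a higher-uncertainty view, matching the early-acquisition behavior described above.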

5. Algorithmic Implementation

The joint training and NBV planning loop is orchestrated as follows:

Algorithm Train SA-ResGS:
Input: initial Gaussians G, initial views V_train, candidate set V_all
for iter = 1 … MaxIters do
  Sample batch of training views v ∈ V_train
  for each v:
    I_full ← Render(G, v)
    Compute per-G_i uncertainty U_i
    G_rand ← random α% of G
    G_unc ← top-β of G by U_i
    G_sup ← G_rand ∪ G_unc
    I_sup ← Render(G_sup, v)
    L ← λ_full[‖I_full − I_gt(v)‖₁ + L_ssim] + λ_sup[‖I_sup − I_gt(v)‖₁ + L_ssim]
    Backpropagate L, update G
  if iter mod T_nbv == 0 and |V_train| < MaxViews:
    SA-Points ← GenerateSA-Points(G, last view added)
    v_nbv ← SelectNBV(G, SA-Points, V_all ∖ V_train)
    V_train ← V_train ∪ {v_nbv}
end for

The NBV selection function follows the coverage and uncertainty-guided scoring described previously, ensuring both robust sample efficiency and improved scene completeness.

6. Empirical Evaluation and Ablation Analysis

SA-ResGS is benchmarked on Mip-NeRF-360, NeRF-Synthetic, and an extended Tanks & Temples dataset. For active view selection with 20 views (averaged over four seeds), SA-ResGS outperforms the random, ACP, and FisherRF baselines in PSNR and SSIM, with comparable or better LPIPS.

| Dataset | Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| Mip-NeRF360 | Random | 19.97 | 0.584 | 0.456 |
| Mip-NeRF360 | ACP | 20.33 | 0.596 | 0.449 |
| Mip-NeRF360 | FisherRF | 20.64 | 0.595 | 0.450 |
| Mip-NeRF360 | Ours (SA-ResGS) | 21.41 | 0.613 | 0.451 |
| NeRF-Synth | Random | 24.85 | 0.893 | 0.117 |
| NeRF-Synth | FisherRF | 25.19 | 0.892 | 0.116 |
| NeRF-Synth | Ours | 26.58 | 0.907 | 0.110 |
| Extended T&T | Random | 18.92 | 0.694 | 0.390 |
| Extended T&T | FisherRF | 19.46 | 0.710 | 0.381 |
| Extended T&T | Ours | 20.06 | 0.722 | 0.377 |

Ablation studies demonstrate:

  • Omitting residual supervision results in a ~0.3 dB PSNR decrease.
  • Removing SA-Points filtering destabilizes early NBV selection and yields a 0.2 dB loss in PSNR.
  • Full SA-ResGS achieves a 0.71 dB PSNR gain over FisherRF on Mip-NeRF360.

Uncertainty calibration (AUSE) shows improvement over FisherRF (0.297 vs. 0.327). Qualitatively, SA-ResGS reduces floating artifacts, increases coverage, and produces smoother renderings in high-uncertainty regions (Jun-Seong et al., 6 Jan 2026).

7. Significance and Implications

SA-ResGS establishes a new paradigm for integrating residual learning and physically motivated self-augmentation in 3D Gaussian Splatting frameworks. The combination of uncertainty-aware residual supervision and robust NBV selection mitigates the conflicting demands of wide-baseline exploration and sparse-view ambiguity, supporting stable and complete active scene reconstruction. Its methodological innovations—Self-Augmented Point coverage, uncertainty-driven Gaussian sampling, and implicit unbiasing of uncertainty estimates through constrained supervision—are demonstrated to improve both quantitative and qualitative outcomes, suggesting efficacy for broader active vision and scene representation applications (Jun-Seong et al., 6 Jan 2026).
