SA-ResGS: Self-Augmented Residual 3D Gaussian Splatting
- The paper introduces a novel residual learning mechanism combined with self-augmented points to enhance uncertainty quantification and NBV selection.
- It employs a 3D Gaussian splatting representation to model complex scenes, enabling robust active scene reconstruction.
- Empirical evaluations demonstrate superior reconstruction quality and stability over baselines on multiple datasets.
Self-Augmented Residual 3D Gaussian Splatting (SA-ResGS) is a framework designed to enhance the stability of uncertainty quantification and facilitate uncertainty-aware supervision in next-best-view (NBV) selection for active scene reconstruction. The method simultaneously improves the reliability of uncertainty estimates and their effectiveness in guiding supervision by introducing both a novel residual learning mechanism for 3D Gaussian Splatting and a physically grounded view selection scheme based on self-augmented synthetic observations. The approach addresses instability caused by under-supervised Gaussians, particularly prominent in sparse and wide-baseline scenarios, and achieves superior reconstruction quality and robustness in NBV planning relative to contemporary baselines (Jun-Seong et al., 6 Jan 2026).
1. Scene Representation and Self-Augmented Points
SA-ResGS employs a 3D Gaussian Splatting representation for scenes, where the space is modeled as a collection of Gaussians
$$\mathcal{G} = \{ g_i = (\mu_i, \Sigma_i, \alpha_i, c_i) \}_{i=1}^{N},$$
with $\mu_i$ as the center, $\Sigma_i$ the anisotropic covariance, $\alpha_i$ the density (opacity) weight, and $c_i$ the color. Differentiable splatting and alpha compositing facilitate rendering.
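As a minimal illustration of the compositing step (function and variable names are ours, not the paper's), front-to-back alpha compositing accumulates each Gaussian's color weighted by its opacity and the transmittance remaining along the ray:

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing of N Gaussian contributions on one ray.

    colors: (N, 3) RGB contributions, sorted near-to-far.
    alphas: (N,) effective opacities after evaluating each Gaussian.
    Returns the composited RGB pixel value.
    """
    pixel = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c
        transmittance *= (1.0 - a)
    return pixel

# A fully opaque near Gaussian hides everything behind it.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
alphas = np.array([1.0, 0.9])
print(composite(colors, alphas))  # -> [1. 0. 0.]
```

Note how a low-opacity Gaussian contributes almost nothing to the pixel and therefore receives a correspondingly small gradient, which is the vanishing-gradient issue the residual supervision in Section 2 targets.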
To enhance coverage estimation in NBV selection, SA-ResGS introduces Self-Augmented Points (SA-Points), generated as follows:
- Select a reference view with pose $T_{\text{ref}}$ and generate an extrapolated pose $T_{\text{aug}}$ by perturbing the translation.
- Render the extrapolated image $\hat{I}_{\text{aug}}$ from the current Gaussians.
- Use MASt3R to predict dense correspondences between the reference image and $\hat{I}_{\text{aug}}$.
- For each match, triangulate a 3D point, retaining it only if the reprojection error is below a threshold $\tau$.
- Aggregate and hash these points into a voxel grid, producing a binary occupancy map for coverage-driven NBV selection.
This self-augmented occupancy mechanism explicitly guides view selection by physical scene coverage, reducing the risk of uncovered regions.
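The aggregation step above can be sketched as follows. This is a minimal illustration with assumed voxel size, bounds, and function names, not the paper's implementation:

```python
import numpy as np

def voxel_occupancy(points, origin, voxel_size, grid_shape):
    """Hash 3D points into a binary voxel occupancy grid.

    points: (N, 3) triangulated SA-Points.
    origin: (3,) minimum corner of the scene bounding box.
    voxel_size: edge length of a cubic voxel.
    grid_shape: (nx, ny, nz) number of voxels per axis.
    """
    occ = np.zeros(grid_shape, dtype=bool)
    idx = np.floor((points - origin) / voxel_size).astype(int)
    # Discard points falling outside the bounding box.
    valid = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    occ[tuple(idx[valid].T)] = True
    return occ

pts = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [5.0, 0.0, 0.0]])
occ = voxel_occupancy(pts, origin=np.zeros(3), voxel_size=0.5, grid_shape=(2, 2, 2))
print(occ.sum())  # 2 occupied voxels; the out-of-bounds point is dropped
```

The resulting binary map is what the NBV stage later compares against candidate views via Hamming distance.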
2. Residual Supervision for 3D Gaussians
The framework addresses the “vanishing gradient” problem for Gaussians of low opacity or large scale. Residual supervision is implemented by:
- Partitioning the Gaussians into a supervision subset $\mathcal{G}_{\text{sup}} = \mathcal{G}_{\text{rand}} \cup \mathcal{G}_{\text{unc}}$, where $\mathcal{G}_{\text{rand}}$ is a random $\alpha\%$ subset and $\mathcal{G}_{\text{unc}}$ consists of the top-$\beta$ Gaussians by uncertainty.
- Uncertainty per Gaussian is estimated from its opacity and spatial spread, with low-opacity, large-scale Gaussians assigned high uncertainty.
- For each view, two images are rendered: $I_{\text{full}} = \mathrm{Render}(\mathcal{G}, v)$ and $I_{\text{sup}} = \mathrm{Render}(\mathcal{G}_{\text{sup}}, v)$. The aggregate loss is
$$\mathcal{L} = \lambda_{\text{full}}\left(\lVert I_{\text{full}} - I_{\text{gt}}\rVert_1 + \mathcal{L}_{\text{SSIM}}\right) + \lambda_{\text{sup}}\left(\lVert I_{\text{sup}} - I_{\text{gt}}\rVert_1 + \mathcal{L}_{\text{SSIM}}\right),$$
with weights $\lambda_{\text{full}}$ and $\lambda_{\text{sup}}$ balancing full and residual supervision.
Additionally, uncertainty-weighted sampling, in the spirit of hard-negative mining, can supplement the primary loss; in practice, however, the two-image residual loss is sufficient.
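The subset selection and two-term loss can be sketched as follows. The fractions, the stand-in L1-only loss (SSIM omitted for brevity), and all names are our assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_supervision_subset(uncertainty, alpha_frac=0.1, beta_frac=0.1):
    """Union of a random subset and the most-uncertain Gaussians.

    uncertainty: (N,) per-Gaussian uncertainty scores.
    alpha_frac, beta_frac: assumed fractions; the paper's settings may differ.
    Returns a boolean mask over the N Gaussians.
    """
    n = len(uncertainty)
    mask = np.zeros(n, dtype=bool)
    # Random component keeps the subset unbiased on average.
    mask[rng.choice(n, size=max(1, int(alpha_frac * n)), replace=False)] = True
    # Uncertain component amplifies supervision where it is most needed.
    mask[np.argsort(uncertainty)[-max(1, int(beta_frac * n)):]] = True
    return mask

def residual_loss(i_full, i_sup, i_gt, lam_full=1.0, lam_sup=1.0):
    """Two-image aggregate loss (L1 terms only; SSIM terms omitted)."""
    return (lam_full * np.abs(i_full - i_gt).mean()
            + lam_sup * np.abs(i_sup - i_gt).mean())

u = np.array([0.1, 0.9, 0.2, 0.8, 0.05])
mask = select_supervision_subset(u, alpha_frac=0.2, beta_frac=0.4)
assert mask[1] and mask[3]  # the two most uncertain Gaussians are always included
```

Because $\mathcal{G}_{\text{sup}}$ contains far fewer Gaussians than $\mathcal{G}$, each member receives a larger share of the rendered image's gradient, which is what counteracts the vanishing-gradient effect.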
3. Uncertainty Quantification Mechanisms
SA-ResGS quantifies uncertainty per Gaussian both post hoc, using a Laplacian approximation (FisherRF), and via a lightweight proxy based on opacity and spatial spread.
This per-Gaussian uncertainty correlates with rendering error and allows real-time estimation.
In NBV planning, pixel-wise uncertainty maps and view-aggregated uncertainty inform candidate view scoring. In residual supervision, the most uncertain Gaussians are directly targeted for amplified supervision.
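The exact proxy formula is not reproduced here, but one plausible form consistent with the stated intuition (low-opacity, large-scale Gaussians are under-supervised) could look like the following. Both the formula and the names are hypothetical:

```python
import numpy as np

def uncertainty_proxy(opacity, scales):
    """Hypothetical per-Gaussian uncertainty: transparency times spatial extent.

    opacity: (N,) values in [0, 1].
    scales: (N, 3) per-axis Gaussian scales (a stand-in for covariance spread).
    """
    spread = scales.max(axis=1)  # largest axis as a scalar size proxy
    return (1.0 - opacity) * spread

op = np.array([0.95, 0.1, 0.5])
sc = np.array([[0.1, 0.1, 0.1], [2.0, 0.5, 0.5], [0.3, 0.3, 0.3]])
u = uncertainty_proxy(op, sc)
assert u.argmax() == 1  # the transparent, large Gaussian is most uncertain
```

Unlike the Laplacian approximation, such a proxy needs no extra backward passes, which is what makes real-time estimation feasible.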
4. Physically Guided Next-Best-View Selection
SA-ResGS’s NBV strategy is physically grounded, prioritizing efficient and uniform scene coverage:
- The scene’s bounding box is voxelized; voxels containing SA-Points form the observed set $\mathcal{V}_{\text{obs}}$, which is dilated for robustness.
- For each candidate camera $c$, the set of visible voxels $\mathcal{V}_c$ is determined; both $\mathcal{V}_{\text{obs}}$ and $\mathcal{V}_c$ are hash-encoded as binary occupancy codes $b_{\text{obs}}$ and $b_c$.
- The normalized Hamming distance between $b_c$ and $b_{\text{obs}}$ quantifies coverage novelty. The top-$K$ candidates by this score are retained for fine-grained scoring.
Within the narrowed set, the final NBV is selected by maximizing a weighted combination of coverage novelty and view uncertainty, with the coverage weight set high early in acquisition to favor exploration.
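The two-stage scoring can be sketched as follows. The linear combination, the `gamma` weight, and all names are our assumptions for illustration:

```python
import numpy as np

def hamming_novelty(candidate_code, observed_code):
    """Normalized Hamming distance between binary occupancy codes."""
    return np.mean(candidate_code != observed_code)

def nbv_score(candidate_code, observed_code, view_uncertainty, gamma=0.8):
    """Weighted combination of coverage novelty and view uncertainty.

    gamma is assumed high early in acquisition to favor coverage.
    """
    return (gamma * hamming_novelty(candidate_code, observed_code)
            + (1.0 - gamma) * view_uncertainty)

obs = np.array([1, 1, 0, 0], dtype=bool)
cand_a = np.array([1, 1, 0, 1], dtype=bool)  # mostly overlaps observed voxels
cand_b = np.array([0, 0, 1, 1], dtype=bool)  # covers entirely new voxels
assert nbv_score(cand_b, obs, 0.1) > nbv_score(cand_a, obs, 0.9)
```

With a high coverage weight, a view over unobserved voxels wins even against a higher-uncertainty view that mostly revisits covered space, which matches the exploration-first behavior described above.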
5. Algorithmic Implementation
The joint training and NBV planning loop is orchestrated as follows:
```
Algorithm: Train SA-ResGS
Input: initial Gaussians G, initial views V_train, candidate set V_all
for iter = 1 … MaxIters do
    Sample batch of training views v ∈ V_train
    for each v:
        I_full ← Render(G, v)
        Compute per-Gaussian uncertainty U_i
        G_rand ← random α% of G
        G_unc  ← top-β of G by U_i
        G_sup  ← G_rand ∪ G_unc
        I_sup  ← Render(G_sup, v)
        L ← λ_full[‖I_full − I_gt(v)‖₁ + L_ssim]
            + λ_sup[‖I_sup − I_gt(v)‖₁ + L_ssim]
        Backpropagate L, update G
    if iter mod T_nbv == 0 and |V_train| < MaxViews:
        SA-Points ← GenerateSA-Points(G, last view added)
        v_nbv ← SelectNBV(G, SA-Points, V_all − V_train)
        V_train ← V_train ∪ {v_nbv}
end for
```
The NBV selection function follows the coverage and uncertainty-guided scoring described previously, ensuring both robust sample efficiency and improved scene completeness.
6. Empirical Evaluation and Ablation Analysis
SA-ResGS is benchmarked on Mip-NeRF-360, NeRF-Synthetic, and an extended Tanks & Temples dataset. For active view selection with 20 views (averaged over four seeds), SA-ResGS outperforms random, ACP, and FisherRF baselines in PSNR and SSIM, with comparable or better LPIPS.
| Dataset | Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| Mip-NeRF360 | Random | 19.97 | 0.584 | 0.456 |
| Mip-NeRF360 | ACP | 20.33 | 0.596 | 0.449 |
| Mip-NeRF360 | FisherRF | 20.64 | 0.595 | 0.450 |
| Mip-NeRF360 | Ours (SA-ResGS) | 21.41 | 0.613 | 0.451 |
| NeRF-Synthetic | Random | 24.85 | 0.893 | 0.117 |
| NeRF-Synthetic | FisherRF | 25.19 | 0.892 | 0.116 |
| NeRF-Synthetic | Ours (SA-ResGS) | 26.58 | 0.907 | 0.110 |
| Extended T&T | Random | 18.92 | 0.694 | 0.390 |
| Extended T&T | FisherRF | 19.46 | 0.710 | 0.381 |
| Extended T&T | Ours (SA-ResGS) | 20.06 | 0.722 | 0.377 |
Ablation studies demonstrate:
- Omitting residual supervision results in a 0.3 dB PSNR decrease.
- Removing SA-Points filtering destabilizes early NBV selection and yields a 0.2 dB loss in PSNR.
- Full SA-ResGS achieves a 0.71 dB PSNR gain over FisherRF on Mip-NeRF360.
Uncertainty calibration (AUSE) shows improvement over FisherRF (0.297 vs. 0.327). Qualitatively, SA-ResGS reduces floating artifacts, increases coverage, and produces smoother renderings in high-uncertainty regions (Jun-Seong et al., 6 Jan 2026).
7. Significance and Implications
SA-ResGS establishes a new paradigm for integrating residual learning and physically motivated self-augmentation in 3D Gaussian Splatting frameworks. The combination of uncertainty-aware residual supervision and robust NBV selection mitigates the conflicting demands of wide-baseline exploration and sparse-view ambiguity, supporting stable and complete active scene reconstruction. Its methodological innovations—Self-Augmented Point coverage, uncertainty-driven Gaussian sampling, and implicit unbiasing of uncertainty estimates through constrained supervision—are demonstrated to improve both quantitative and qualitative outcomes, suggesting efficacy for broader active vision and scene representation applications (Jun-Seong et al., 6 Jan 2026).