
OUGS: Active View Selection via Object-aware Uncertainty Estimation in 3DGS (2511.09397v1)

Published 12 Nov 2025 in cs.CV, cs.CG, cs.GR, and cs.HC

Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have achieved state-of-the-art results for novel view synthesis. However, efficiently capturing high-fidelity reconstructions of specific objects within complex scenes remains a significant challenge. A key limitation of existing active reconstruction methods is their reliance on scene-level uncertainty metrics, which are often biased by irrelevant background clutter and lead to inefficient view selection for object-centric tasks. We present OUGS, a novel framework that addresses this challenge with a more principled, physically-grounded uncertainty formulation for 3DGS. Our core innovation is to derive uncertainty directly from the explicit physical parameters of the 3D Gaussian primitives (e.g., position, scale, rotation). By propagating the covariance of these parameters through the rendering Jacobian, we establish a highly interpretable uncertainty model. This foundation allows us to then seamlessly integrate semantic segmentation masks to produce a targeted, object-aware uncertainty score that effectively disentangles the object from its environment. This allows for a more effective active view selection strategy that prioritizes views critical to improving object fidelity. Experimental evaluations on public datasets demonstrate that our approach significantly improves the efficiency of the 3DGS reconstruction process and achieves higher quality for targeted objects compared to existing state-of-the-art methods, while also serving as a robust uncertainty estimator for the global scene.

Summary

  • The paper presents a physically-grounded uncertainty model on 3D Gaussian primitives to directly improve view selection for precise object reconstruction.
  • It leverages semantic segmentation masks and Jacobian-based covariance propagation to isolate object-level uncertainties and optimize camera views.
  • Experimental results show enhanced object fidelity in PSNR, SSIM, and LPIPS, outperforming traditional scene-level uncertainty approaches.

Object-aware Uncertainty Estimation for Active View Selection in 3D Gaussian Splatting

Introduction

The paper "OUGS: Active View Selection via Object-aware Uncertainty Estimation in 3DGS" proposes a new framework for efficiently reconstructing specific objects within complex 3D scenes, focusing on the problem of active view selection in the context of 3D Gaussian Splatting (3DGS). The central contribution is the derivation of a physically-grounded uncertainty model that operates directly on the explicit parameters of the 3D Gaussian primitives, rather than relying on abstract neural network weights or scene-level metrics. Through a principled propagation of parameter covariance via the rendering Jacobian and subsequent integration of semantic segmentation masks, OUGS isolates object-level uncertainty, enabling targeted camera view selection to maximize object reconstruction quality.

Figure 1: A complex background can inflate image-level uncertainty and mislead active view selection away from the object of interest.

Methodology

Physically-grounded Parameter Uncertainty in 3DGS

Traditional uncertainty estimation approaches in 3D reconstruction often operate at the scene level, leading to suboptimal active view selection, especially in scenarios where object fidelity is paramount (Figure 1). OUGS departs from these methods by explicitly modeling uncertainty for each 3D Gaussian primitive, parameterized by position, scale, rotation, opacity, and Spherical Harmonics for color. These parameters are treated as random variables with an associated covariance, $\Sigma$.

The covariance propagation to image space follows the classic Jacobian–covariance law. For color $C(u)$ at pixel $u$:

$$\mathrm{Var}[C(u;\theta)] \approx J_u\,\Sigma\,J_u^{\top}$$

where $J_u$ is the pixel-wise Jacobian of color with respect to the full parameter set.

Figure 2: The parameterization of a 3D Gaussian primitive. Each building block is quantified by explicit physical parameters, enabling direct uncertainty estimation.

This pixel-wise uncertainty quantifies how each Gaussian's parameter uncertainty translates into uncertainty in the reconstructed image. Under the diagonal Fisher Information Matrix (FIM) approximation, the variance is a sum of independent per-parameter terms, so it can be decomposed into geometric and appearance contributions.
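The propagation rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Jacobian is random toy data, a single color channel is used, the covariance is diagonal, and the split into "geometric" and "appearance" parameters by index is hypothetical. The function name `pixel_variance` is also made up for this sketch.

```python
import numpy as np

def pixel_variance(J, sigma_diag):
    """First-order variance of one color channel at every pixel.

    J          : (P, N) Jacobian, d color[p] / d theta[n]
    sigma_diag : (N,) diagonal of the parameter covariance Sigma
    Returns a (P,) array holding J_u Sigma J_u^T for each pixel u.
    With a diagonal Sigma this reduces to sum_n J[u, n]^2 * Sigma[n, n].
    """
    return (J ** 2) @ sigma_diag

rng = np.random.default_rng(0)
P, N = 6, 10                                  # toy sizes: 6 pixels, 10 parameters
J = rng.normal(size=(P, N))                   # toy Jacobian (assumption)
sigma_diag = rng.uniform(0.01, 0.1, size=N)   # toy parameter variances (assumption)

# Hypothetical split: the first 7 parameters are geometric, the last 3 appearance.
geo = np.arange(N) < 7
var_total = pixel_variance(J, sigma_diag)
var_geo = pixel_variance(J[:, geo], sigma_diag[geo])
var_app = pixel_variance(J[:, ~geo], sigma_diag[~geo])

# Reference: diagonal of the dense product J Sigma J^T.
var_dense = np.einsum("pn,nm,pm->p", J, np.diag(sigma_diag), J)
```

Because the covariance is diagonal, `var_total` equals `var_geo + var_app` exactly, which is the additive decomposition described in the text.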

Object-aware Uncertainty via Semantic Masking

To realize object-centric view selection, OUGS incorporates a semantic mask $M_k(u)$ for object $k$, ideally derived from a sufficiently accurate semantic segmentation model. The object-aware uncertainty at pixel $u$ is thus:

$$\Sigma_{C,k}(u) = [M_k(u)]^2\, J_u\,\Sigma\,J_u^{\top}$$

By summing over all object pixels, a scalar uncertainty score is obtained that directs active view selection towards regions maximizing the reduction of object uncertainty.

Figure 3: Object-aware uncertainty guides 3DGS view planning for precise object reconstruction, combining physical parameter uncertainty and semantic segmentation.
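A toy example may help show why the mask changes which view is selected. The sketch below assumes per-pixel variance maps are already available for two candidate views; the arrays, the view names, and the helper `object_score` are hypothetical, and the same soft mask is reused for both views only to keep the example short.

```python
import numpy as np

def object_score(pixel_var, mask):
    """Sum of M_k(u)^2 * Var[C(u)] over all pixels of one candidate view."""
    return float(np.sum(mask ** 2 * pixel_var))

# Soft mask in [0, 1]: the bottom row belongs to the object (toy data).
object_mask = np.array([[0.0, 0.0, 0.0],
                        [1.0, 1.0, 0.5]])

# View "a" is uncertain in the background, view "b" on the object.
var_a = np.array([[9.0, 9.0, 9.0],
                  [0.1, 0.1, 0.1]])
var_b = np.array([[0.5, 0.5, 0.5],
                  [2.0, 2.0, 2.0]])
views = {"a": (var_a, object_mask), "b": (var_b, object_mask)}

# Scene-level score: plain sum of variance. Object-aware score: masked sum.
scene_scores = {name: float(v.sum()) for name, (v, m) in views.items()}
object_scores = {name: object_score(v, m) for name, (v, m) in views.items()}

best_scene = max(scene_scores, key=scene_scores.get)
best_object = max(object_scores, key=object_scores.get)
```

In this toy setup the scene-level score prefers view "a" because of its background clutter, while the masked score prefers view "b", which actually observes the uncertain object region.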

Fisher Information Matrix-based Covariance Update

Direct computation of the full $\Sigma$ is impractical. OUGS adopts a diagonal FIM approximation:

$$\Sigma \simeq \sigma^2\,\mathcal{I}^{-1}$$

where $\mathcal{I}$ is tracked online via an exponential moving average (EMA) of squared gradients, with a decaying momentum schedule for robust adaptation:

$$\mathcal{I}_{t,i} = \alpha_t\,\mathcal{I}_{t-1,i} + (1-\alpha_t)\,[\nabla_{\theta_i} \ell_t]^2$$

This update is computationally lightweight and aligns with standard stochastic optimization practices.
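The update rule can be sketched as follows. The summary does not specify the shape of the momentum schedule, the value of $\sigma^2$, or any numerical stabilizer, so the linear decay from 0.99 to 0.9, `sigma2=1.0`, and the `eps` term below are all assumptions made for illustration, as are the function names and the synthetic gradients.

```python
import numpy as np

def momentum(t, total_steps, alpha_start=0.99, alpha_end=0.9):
    """Hypothetical linear decay of alpha_t from alpha_start to alpha_end."""
    frac = min(t / max(total_steps - 1, 1), 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)

def update_fisher(fisher, grad, alpha):
    """One EMA step on the diagonal Fisher estimate (squared gradients)."""
    return alpha * fisher + (1.0 - alpha) * grad ** 2

def covariance_diag(fisher, sigma2=1.0, eps=1e-8):
    """Diagonal covariance sigma^2 / I; eps avoids division by zero (assumption)."""
    return sigma2 / (fisher + eps)

rng = np.random.default_rng(0)
n_params, steps = 4, 200
grad_scale = np.array([0.1, 1.0, 1.0, 10.0])   # synthetic per-parameter gradient scales
fisher = np.zeros(n_params)
for t in range(steps):
    grad = rng.normal(size=n_params) * grad_scale   # stand-in for d loss / d theta
    fisher = update_fisher(fisher, grad, momentum(t, steps))

cov = covariance_diag(fisher)
```

Parameters that receive consistently small gradients accumulate little Fisher information and therefore end up with a large variance, which is the behavior the uncertainty score relies on.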

Experimental Results

Object-aware Versus Scene-level NBV Selection

Extensive evaluations across Mip-NeRF360, Light-Field, and Tanks & Temples datasets establish the efficacy of OUGS in the active view selection loop. When evaluation is constrained to object masks, OUGS surpasses all prior methods (including FisherRF and GauSS-MI) in PSNR, SSIM, and LPIPS for targeted objects, demonstrating superior view allocation and sharper reconstructions.

Figure 4: Object-aware approach speeds up convergence; OUGS rapidly improves object fidelity while FisherRF lags due to scene-level distraction.

In panoramic (scene-level) evaluations, OUGS remains competitive with information-theoretic approaches, substantiating its general robustness even when global coverage is considered.

Qualitative Analysis

Qualitative results reinforce the quantitative findings. OUGS uniquely maintains high fidelity in object regions, mitigating the misallocations caused by scene-level uncertainty metrics that tend to prioritize background clutter.

Figure 5: OUGS reconstructions preserve object fidelity, outperforming FisherRF and Random policies, which dilute view budget across the background.

Uncertainty heatmaps generated by the method are tightly correlated with actual rendering artifacts. High uncertainty regions predicted by OUGS correspond to blurry or mis-reconstructed areas in the render, demonstrating calibration accuracy.

Figure 6: Uncertainty heatmaps localize prediction errors, confirming the physically-grounded model’s reliability in error explanation.

Uncertainty Calibration and Component Analysis

AUSE metrics confirm the FIM-based uncertainty estimator as competitive or superior for error localization compared to variational and ensemble uncertainty methods.
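For readers unfamiliar with AUSE, the sketch below shows one common way to compute it: pixels are removed in order of decreasing predicted uncertainty, the mean error of the remaining pixels is tracked, and the gap to an oracle ordering (by true error) is averaged. Normalization and integration details differ between papers, so the normalization by the initial mean error and the simple Riemann average used here are assumptions, not the paper's exact protocol. All data are synthetic.

```python
import numpy as np

def sparsification_curve(error, order, fractions):
    """Mean error of the pixels left after removing a fraction in the given order."""
    sorted_err = error[order]
    n = len(error)
    return np.array([sorted_err[int(f * n):].mean() for f in fractions])

def ause(error, uncertainty, steps=20):
    """Area between the uncertainty-sorted and oracle-sorted sparsification curves."""
    fractions = np.linspace(0.0, 1.0, steps, endpoint=False)
    by_uncertainty = np.argsort(-uncertainty)   # most uncertain removed first
    by_error = np.argsort(-error)               # oracle: largest error removed first
    curve_unc = sparsification_curve(error, by_uncertainty, fractions)
    curve_oracle = sparsification_curve(error, by_error, fractions)
    gap = (curve_unc - curve_oracle) / curve_unc[0]   # normalize by overall mean error
    return float(np.mean(gap))                        # Riemann approximation of the area

rng = np.random.default_rng(0)
error = rng.random(1000)                               # synthetic per-pixel error
ause_perfect = ause(error, error)                      # uncertainty ranks errors exactly
ause_noisy = ause(error, error + 0.3 * rng.normal(size=1000))
ause_inverse = ause(error, -error)                     # worst possible ranking
```

Lower values are better: an estimator whose ranking matches the true error ordering scores zero, and the score grows as the ranking degrades.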

Sensitivity analysis of semantic mask binarization underscores the importance of mask quality in object-aware uncertainty estimation. Over-permissive masks inflate uncertainty scores via background inclusion; overly strict thresholds erode object coverage and degrade performance.

Figure 7: Semantic mask threshold impacts object-level uncertainty calibration; optimal masking isolates the object without discarding informative pixels.

Ablation studies also demonstrate the significance of the EMA schedule. Decaying momentum achieves higher object PSNR than either constant low or high momentum alternatives, affirming the balance between early uncertainty smoothing and late-stage responsiveness.

Implementation and Computational Considerations

OUGS is designed for efficient deployment. Diagonal FIM tracking and Jacobian-based uncertainty propagation require only lightweight per-parameter updates and gradient computations, readily parallelizable on modern hardware. Strong performance is contingent on reliable semantic masking; current implementation utilizes SAM2 for mask generation. The linear per-view computational cost is significantly below that of ensemble or Bayesian 3DGS uncertainty approaches. OUGS integrates with standard NBV planning pipelines by replacing the view selection score with its object-aware uncertainty metric.
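The integration point described above, swapping the view selection score inside an otherwise standard greedy loop, can be sketched as follows. Everything here is a hypothetical stand-in: `greedy_nbv`, `toy_score`, and `toy_train` are illustrative names, the score table replaces the object-aware uncertainty metric, and "training" is simulated by halving all scores.

```python
def greedy_nbv(candidates, score_fn, train_fn, budget):
    """Greedy next-best-view loop with a pluggable scoring function.

    candidates : iterable of view identifiers
    score_fn   : view -> float, e.g. an object-aware uncertainty score
    train_fn   : view -> None, acquires the view and updates the model
    budget     : maximum number of views to acquire
    """
    remaining = list(candidates)
    chosen = []
    for _ in range(budget):
        if not remaining:
            break
        best = max(remaining, key=score_fn)   # re-scored every round
        remaining.remove(best)
        chosen.append(best)
        train_fn(best)
    return chosen

# Toy stand-ins for the model and its uncertainty (assumptions).
toy_scores = {"v0": 0.2, "v1": 0.9, "v2": 0.5, "v3": 0.4}

def toy_score(view):
    return toy_scores[view]

def toy_train(view):
    # Pretend that training on any view halves the uncertainty everywhere.
    for name in toy_scores:
        toy_scores[name] *= 0.5

selected = greedy_nbv(list(toy_scores), toy_score, toy_train, budget=2)
```

Keeping the score as a callable means a scene-level metric and an object-aware metric can be compared under the same loop without changing the planner.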

Limitations include reliance on mask quality and the independence assumptions implicit in the diagonal FIM approximation. Extending OUGS to multi-object or global coverage scenarios could require structured FIM variants or composite uncertainty weighting. Scalability for large scenes may require further innovations in online FIM tracking or parameter pruning.

Theoretical and Practical Implications

The explicit modeling of uncertainty on physical parameters aligns uncertainty quantification with the interpretability and controllability desirable in robotics and vision applications. Object-aware uncertainty enables principled NBV selection in object-focused acquisition tasks, advancing the fidelity and efficiency of 3DGS-based reconstruction in real-world scenarios. Methodologically, OUGS sets a precedent for separating geometric and appearance uncertainty, which may inform future uncertainty quantification approaches for other explicit representations.

OUGS opens avenues for further research in self-supervised mask estimation, joint multi-object NBV planning, and adaptive FIM structures for finer uncertainty coupling. The separation of object and background uncertainty also facilitates downstream object-centric tasks in AR/VR, SLAM, and robotic manipulation.

Conclusion

OUGS presents a principled framework for object-centric active reconstruction in 3DGS, advancing the state-of-the-art in uncertainty-driven view planning. By directly propagating physical parameter uncertainty and leveraging semantic masks, OUGS achieves targeted fidelity improvements unattainable by scene-level approaches, with competitive scene-level performance and robust uncertainty calibration. The proposed diagonal FIM and online update strategy provide scalable and interpretable uncertainty estimates, laying groundwork for future object-aware reconstruction methodologies in explicit scene representations.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, actionable list of what remains uncertain or unexplored in the paper and where future work could concretely extend the method.

  • Reliance on externally provided semantic masks (SAM‑2): no joint reconstruction–segmentation, no multi‑view consistency, no explicit handling of occlusions/partial visibility, and limited analysis under segmentation domain shift or severe mask noise/outliers.
  • Single‑object focus: no formulation for multi‑object NBV with competing priorities, dynamic weighting, or fairness constraints; no study of inter‑object trade‑offs in shared view budgets.
  • Diagonal Fisher approximation: independence across parameters within and across Gaussians is assumed; missing structured correlations (e.g., position–opacity, neighboring Gaussians) and no evaluation of block‑diagonal/KFAC/Shampoo‑style alternatives versus compute overhead.
  • First‑order uncertainty propagation (JΣJᵀ): no empirical analysis of breakdown regimes (e.g., highly nonlinear alpha compositing, saturated transmittance, sharp occlusion boundaries); second‑order or unscented approximations not benchmarked.
  • Score design: the view score uses the sum of pixel‑wise trace(Σ); no comparison to alternative criteria (e.g., log‑determinant, largest eigenvalue, entropy, mutual information, or geometry‑biased weights) and their impact on NBV.
  • Mask usage and uncertainty: masking is implemented as M(u)² scaling; mask uncertainty is not modeled or propagated, and alternative soft weighting schemes (e.g., calibration with mask reliability, boundary‑aware emphasis) are not explored.
  • Geometry vs appearance decomposition: no experiments that separately score or weight geometric and appearance uncertainty to prioritize geometry early or adapt weighting over time.
  • Calibration of predictive uncertainty: evaluation relies on AUSE only; no reliability diagrams, negative log‑likelihood, or expected calibration error; no separation of aleatoric vs epistemic components.
  • Noise model and σ²: the Fisher–covariance link assumes a (homoscedastic) noise scale, but σ² is unspecified and uncalibrated; heteroscedastic image noise and HDR effects are not modeled.
  • Computational profile: lack of runtime/memory analysis of Jacobian computation and Fisher updates for candidate views; scalability of per‑candidate uncertainty scoring with many Gaussians and large candidate sets is unquantified.
  • Large‑scale scenes: beyond mentioning difficulty, there is no concrete strategy for memory/compute reduction (e.g., tiling, hierarchical LOD, on‑the‑fly pruning/merging, region‑of‑interest Jacobians).
  • NBV planner myopia: the selection is greedy and per‑step; no look‑ahead/planning under motion/occlusion constraints, no trajectory or kinematic cost modeling, and no exploration–exploitation analysis.
  • Candidate view space: the method selects from a discrete set of images; extension to continuous camera spaces, physical feasibility constraints, and real‑robot integration remain unaddressed.
  • Stopping criteria: no uncertainty‑based termination rule or budget‑aware stopping condition beyond a fixed number of views.
  • Robustness to pose/geometry initialization: sensitivity to COLMAP pose errors or poor Gaussian initialization is not analyzed; no closed‑loop pose refinement or uncertainty‑aware re‑localization.
  • Dynamic or non‑Lambertian scenes: method assumes static scenes and SH‑based appearance; moving objects, specular/transparent materials, and lighting changes are not evaluated or explicitly modeled in uncertainty.
  • Occlusion/visibility reasoning: uncertainty is not integrated with learned or geometric visibility priors; no explicit modeling of how candidate views disocclude high‑uncertainty regions.
  • Hyperparameter sensitivity: aside from EMA momentum, key choices (λ regularization, SH order, σ², mask thresholding strategy, number of candidate views) lack sensitivity analyses.
  • Fairness of object‑aware benchmarking: baselines that do not support masked scoring are still evaluated with object‑only metrics; a standardized protocol for object‑centric NBV evaluation (metrics, masks, candidate sets) is not established.
  • Geometry‑specific evaluation: despite geometric/appearance uncertainty decomposition, no geometry‑focused metrics (e.g., depth/normal/mesh error) are reported to validate geometric gains.
  • Combining paradigms: integration of object‑aware Fisher with information‑theoretic selection (e.g., mutual information) or learned policies (RL/IL) is not investigated.
  • Growth/pruning of Gaussians driven by uncertainty: no strategy to use uncertainty for adaptive Gaussian splitting, merging, or pruning during active acquisition.
  • Online compute budget: no study of how often to recompute Fisher/uncertainty, sub‑sampling pixels/Gaussians for fast surrogate scores, or anytime scoring under tight latency constraints.

Glossary

  • 3D Gaussian Splatting (3DGS): An explicit scene representation using many 3D Gaussian primitives, enabling fast differentiable rendering. "Recent advances in 3D Gaussian Splatting (3DGS) have achieved state-of-the-art results for novel view synthesis."
  • Alpha compositing: A standard image blending technique that accumulates colors and transparency along viewing rays. "Rendering in 3DGS uses a differentiable splatting approach based on standard alpha compositing."
  • Anisotropic: Having direction-dependent properties; in 3DGS, Gaussians can scale differently along different axes. "represents a scene as a collection of anisotropic 3D Gaussian primitives"
  • Area Under the Sparsification Error (AUSE): A metric that evaluates how well predicted uncertainty aligns with actual errors (lower is better). "The table reports the Area Under the Sparsification Error (AUSE), a rigorous metric for uncertainty quality (lower is better)."
  • Block-diagonal matrix: A matrix composed of smaller square matrices along the diagonal, used to stack independent covariance blocks. "we stack the per-Gaussian covariances into a block-diagonal matrix"
  • COLMAP: A structure-from-motion and multi-view stereo pipeline often used to initialize 3D scene geometry. "The 3D Gaussians are initialised with COLMAP"
  • Covariance: A measure of joint variability among parameters; here propagated to pixel uncertainty via the Jacobian. "By propagating the covariance of these parameters through the rendering Jacobian"
  • Differentiable rasterization pipeline: A rendering process whose outputs are differentiable with respect to scene parameters, enabling gradient-based optimization. "leveraging a fast, differentiable rasterization pipeline"
  • Diagonal FIM approximation: An assumption that keeps only the Fisher Information Matrix’s diagonal entries to decouple parameters for efficiency. "we make a key simplifying assumption: we approximate the full FIM with its diagonal entries only"
  • Exponential Moving Average (EMA): A smoothing method that updates statistics using exponentially decaying weights over time. "using an exponential moving average (EMA) of the squared gradients"
  • Farthest-point strategy: A selection heuristic that chooses initial views that are maximally separated to improve coverage. "four initial views are selected using the farthest‑point strategy"
  • Fisher Information Matrix (FIM): A matrix that quantifies how sensitive model predictions are to parameter changes; its inverse approximates parameter covariance. "inverse of the Fisher Information Matrix (FIM)"
  • Fisher information gain: An information-theoretic criterion that measures expected improvement from acquiring a new view. "FisherRF proposes using Fisher information gain as a more principled metric."
  • Frontier exploration: A planning strategy that prioritizes views on the boundary between known and unknown regions. "select views based on metrics like Shannon entropy or frontier exploration"
  • Hessian-based metric: A second-derivative-based sensitivity measure used to quantify and prune uncertain Gaussians. "uses a Hessian-based metric to prune Gaussians with high uncertainty."
  • Hierarchical Bayesian priors: Structured prior distributions over parameters that capture multi-level uncertainty. "use hierarchical Bayesian priors"
  • Jacobian: The matrix of first-order partial derivatives mapping parameter perturbations to changes in rendered pixel colors. "through the rendering Jacobian"
  • LPIPS: A learned perceptual image similarity metric used to evaluate visual quality. "Hence, we apply PSNR, SSIM, and LPIPS to evaluate the result."
  • Maximum a posteriori (MAP) estimate: The parameter values that maximize the posterior probability given data and priors. "θ⋆ is the MAP estimate after optimization."
  • Mutual information: A measure of shared information between variables; used to select views that most reduce uncertainty. "selects views that maximize mutual information."
  • Neural Radiance Fields (NeRF): An implicit volumetric representation that synthesizes views by learning radiance and density fields. "The advent of Neural Radiance Fields (NeRF) marked a breakthrough"
  • Neural Visibility Fields: Models that predict which scene regions are visible from a viewpoint to guide view planning. "Neural Visibility Fields learn to predict which parts of a scene are visible from a given viewpoint"
  • Next-Best-View (NBV) planning: The process of selecting the next camera viewpoint to maximally improve reconstruction. "Active view selection, or Next-Best-View (NBV) planning, is a long-standing problem in computer vision and robotics"
  • Occupancy maps: Grids that encode free, occupied, or unknown space, commonly used in robotic exploration. "maximize the exploration of unknown free space using occupancy maps."
  • Opacity: A scalar controlling a Gaussian’s transparency contribution in compositing. "A scalar opacity value $\alpha_i \in \mathbb{R}$"
  • Parallax: Apparent displacement of scene features due to viewpoint changes; important for depth reasoning. "characterised by long-baseline parallax and strong depth discontinuities."
  • PSNR: Peak Signal-to-Noise Ratio, a fidelity metric for reconstructed images. "Hence, we apply PSNR, SSIM, and LPIPS to evaluate the result."
  • Quaternion: A four-dimensional representation of 3D rotation used for Gaussian orientation. "an orientation quaternion $\mathbf{q}_i \in \mathbb{S}^3$."
  • SAM2: A segmentation model used to obtain object masks for object-aware uncertainty. "object masks are obtained from SAM2"
  • Semantic probabilities: Class-likelihoods per pixel used to weight an object’s soft mask. "a soft mask $M_k(u) \in [0, 1]$ based on semantic probabilities."
  • Shannon entropy: An information measure used to evaluate uncertainty or information content in view selection. "select views based on metrics like Shannon entropy or frontier exploration"
  • Soft mask: A probabilistic per-pixel weighting in [0,1] that isolates an object in uncertainty estimation. "we introduce a soft mask $M_k(u) \in [0, 1]$ based on semantic probabilities."
  • Spatial Uncertainty Field: An auxiliary learned field that predicts uncertainty across space. "adds a Spatial Uncertainty Field for sparse inputs"
  • Spherical Harmonics (SH): A basis for representing view-dependent color on the sphere via low-order coefficients. "view-dependent color modeled by Spherical Harmonics (SH)."
  • Splatting: A rendering technique that projects and blends primitives (Gaussians) onto the image plane. "Rendering in 3DGS uses a differentiable splatting approach"
  • SSIM: Structural Similarity Index Measure, a perceptual metric for image quality. "Hence, we apply PSNR, SSIM, and LPIPS to evaluate the result."
  • Variational inference: A method to approximate complex posteriors by optimizing a tractable family of distributions. "employ variational inference."
  • Voxel-grid representations: Discrete volumetric grids used to model and plan in 3D environments. "voxel-grid representations and select views based on metrics like Shannon entropy or frontier exploration"
