NeU-NBV: Uncertainty-Driven NBV for Robotics

Updated 9 November 2025
  • The paper introduces a novel framework that selects next-best views by maximizing predicted rendering uncertainty, eliminating the need for explicit 3D map construction.
  • It employs an adaptive ray-marching LSTM with a probabilistic output head to efficiently estimate per-pixel uncertainty, enhancing view synthesis and scene reconstruction.
  • A domain-invariant variant integrates a pole-like landmark detector and deep Q-learning, enabling robust cross-domain self-localization under varying environmental conditions.

The NeU-NBV Framework is a paradigm for active perception in robotics that addresses the problem of next-best-view (NBV) planning for scene exploration and self-localization under domain shift. It combines a mapless, information-seeking approach grounded in uncertainty-aware neural rendering ("NeU-NBV" in (Jin et al., 2023)) with a domain-invariant, cue-driven RL policy for cross-domain self-localization ("Domain-invariant NBV Planner for Active Cross-domain Self-localization" (Tanaka, 2021)). The core innovation is to drive the view acquisition policy by maximizing information value, measured either as predicted renderer uncertainty or as the presence of robust, domain-invariant landmarks, rather than by heuristics or explicit 3D map construction.

1. System Architecture and NBV Problem Formulation

The NeU-NBV framework formalizes NBV selection as an iterative, data-driven process in which the system maintains:

  • A growing reference set $\mathcal{I}$ of RGB images with associated camera poses.
  • An image-based neural renderer $f_\theta$, trained offline on diverse scenes and fixed at test time.

At each acquisition step, a discrete candidate set of views $\mathcal{V} = \{v_k \mid k=1,\ldots,K\}$ is sampled within neighborhood constraints (bounded azimuth/elevation). For each $v_k$, the $N$ closest existing references in pose space are selected, and the renderer predicts an uncertainty map $U_k$ for all pixels and channels. The mean uncertainty $g(v_k)$ is computed:

$$g(v_k) = \frac{1}{3\,H_r W_r} \sum_{x \in \text{pixels}} U_k(x)$$

The NBV is selected by maximizing this criterion: $v^* = \arg\max_{v_k \in \mathcal{V}} g(v_k)$.

No explicit 3D map is constructed or updated. Instead, the approach uses the internal uncertainty of a photometric renderer as a proxy for unexplored or ambiguous regions of the scene. After capturing the real image at $v^*$, the observation is added to $\mathcal{I}$, and the process continues until the measurement budget $B$ is exhausted.

A related, domain-invariant variant (Tanaka, 2021) targets active self-localization under changing appearance (season, weather). Here, the architecture includes:

  • A multi-scale pole-like landmark detector (PLD) CNN, yielding a compact 4-dimensional feature $f_t$ summarizing the likelihood of domain-stable geometric cues.
  • A lightweight deep Q-network (DQN) policy $\pi: f_t \mapsto a_t$ trained to maximize pole detection rates while minimizing movement cost.
  • An experience replay buffer and Bag-of-Words pose retrieval module.

The pose-estimation process is triggered opportunistically when sufficient landmark cues are detected.
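
For concreteness, here is a minimal sketch of such a policy head: a plain PyTorch Q-network over the 4-dimensional cue feature with epsilon-greedy action selection. Layer sizes, the action count, and the exploration rule are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class PoleCueQNet(nn.Module):
    """Tiny Q-network mapping the 4-D landmark feature f_t to action values.
    (Hidden sizes and action count are hypothetical.)"""
    def __init__(self, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, f_t: torch.Tensor) -> torch.Tensor:
        return self.net(f_t)

def select_action(qnet: PoleCueQNet, f_t: torch.Tensor, eps: float = 0.1) -> int:
    """Epsilon-greedy choice over the discrete motion actions."""
    if torch.rand(()) < eps:
        return int(torch.randint(qnet.net[-1].out_features, ()).item())
    with torch.no_grad():
        return int(qnet(f_t).argmax().item())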

2. Neural Rendering and Uncertainty Estimation

NeU-NBV builds on PixelNeRF but incorporates two critical changes:

  1. Adaptive ray-marching LSTM: Instead of dense volumetric sampling, an LSTM dynamically determines the next sample point along each ray, leveraging previous feature aggregation for efficient view synthesis.
  2. Probabilistic output head: For each pixel, the network predicts both the logit-space mean $\mu_i$ and standard deviation $\sigma_i$ for each color channel. The RGB channel $c_i$ is modeled as logistic-normal:

$$z_i = \operatorname{logit}(c_i) \sim \mathcal{N}(\mu_i, \sigma_i^2)$$

This enables direct aleatoric per-pixel uncertainty estimation without ensembles or dropout. At inference, per-pixel uncertainty $u_i$ is computed as the variance of sigmoid-transformed samples drawn from $\mathcal{N}(\mu_i, \sigma_i^2)$.
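
As a concrete illustration, the sampling-based estimate can be written in a few lines. The sketch below assumes the renderer exposes per-pixel logit-space means and standard deviations as (H, W, 3) tensors; the sample count is an arbitrary choice.

import torch

def pixel_uncertainty(mu: torch.Tensor, sigma: torch.Tensor, n_samples: int = 100) -> torch.Tensor:
    """Aleatoric per-pixel uncertainty: variance of sigmoid-transformed
    samples drawn from the predicted logit-space Gaussian.
    mu, sigma: shape (H, W, 3); returns an (H, W, 3) uncertainty map."""
    z = torch.randn(n_samples, *mu.shape) * sigma + mu  # z_i ~ N(mu_i, sigma_i^2)
    c = torch.sigmoid(z)                                # map samples back to RGB space
    return c.var(dim=0)                                 # per-pixel, per-channel variance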

The network is trained using the negative log-likelihood of the logistic-normal model:

$$\mathcal{L}_\text{photo} = \sum_{i=1}^3 \left[ \frac{1}{2} \ln(\sigma_i^2) + \ln\bigl(y_i(1-y_i)\bigr) + \frac{(\operatorname{logit}(y_i)-\mu_i)^2}{2 \sigma_i^2} \right]$$

No additional regularization, depth supervision, or adversarial loss is applied.
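
The loss translates directly into code. A minimal sketch, assuming the network parameterizes the channel variance through a log-variance output (a common numerical convenience; the paper states the head in terms of $\mu_i$ and $\sigma_i$):

import torch

def photometric_nll(mu: torch.Tensor, log_var: torch.Tensor, y: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Negative log-likelihood of the logistic-normal model, up to an
    additive constant. mu, log_var: predicted logit-space mean and
    log-variance, shape (..., 3); y: ground-truth RGB in (0, 1)."""
    y = y.clamp(eps, 1.0 - eps)                 # keep logit(y) finite
    z = torch.logit(y)
    nll = 0.5 * log_var + torch.log(y * (1.0 - y)) + (z - mu) ** 2 / (2.0 * log_var.exp())
    return nll.sum(dim=-1).mean()               # sum over channels, average over pixels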

3. NBV Selection Algorithm and Planning Loop

At runtime, NeU-NBV executes the following procedure until the acquisition budget $B$ is reached:

# Uncertainty-driven NBV loop (the paper's pseudocode made concrete in
# Python; the helpers are placeholders for pose sampling, pose-space
# nearest-neighbor search, renderer inference, and image capture).
def neu_nbv_plan(f_theta, I, B, K, N, current_pose):
    """f_theta: pretrained renderer; I: list of (image, pose) references;
    B: total measurement budget; K: candidate views per step;
    N: nearest references per candidate."""
    while len(I) < B:
        V = sample_K_candidate_views(current_pose, K)
        best_score, best_view = float("-inf"), None
        for v in V:
            refs = find_N_closest_refs(I, v, N)       # nearest refs in pose space
            U = f_theta.predict_uncertainty(v, refs)  # per-pixel map, H_r x W_r x 3
            score = U.mean()                          # g(v_k): mean uncertainty
            if score > best_score:
                best_score, best_view = score, v
        img, pose = capture_image(best_view)          # acquire the real image at v*
        I.append((img, pose))
        current_pose = pose                           # move to the selected view
    return I

This policy requires only local operations (nearest-neighbor pose search, feedforward inference, empirical averaging) and is mapless: no volumetric or geometric scene model is built or maintained. Because $f_\theta$ is pretrained, there is no per-scene retraining.
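
The nearest-reference search assumed by the loop above admits a very simple realization. A sketch under the assumption that poses are stored as 4x4 camera-to-world matrices and that translation distance is an adequate proxy (a rotation-aware metric would be an equally plausible choice):

import numpy as np

def find_N_closest_refs(I, v_pose, N):
    """Rank stored (image, pose) pairs by translation distance to the
    candidate view and return the N closest."""
    dists = [np.linalg.norm(pose[:3, 3] - v_pose[:3, 3]) for _, pose in I]
    return [I[i] for i in np.argsort(dists)[:N]]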

4. Domain-Invariant NBV for Active Self-Localization

The variant in (Tanaka, 2021) introduces several components to address visual domain shifts:

  • Pole-like Landmark Detector (PLD): A multi-encoder CNN inspired by HED, trained on pole endpoint annotations, robustly detects pole-like structures that are invariant to appearance variations.
  • Spatial Landmark Aggregation (SLA): The PLD's output is binned horizontally and aggregated to form $f_t \in \mathbb{R}^4$.
  • Deep Q-Learning Policy: A model-free DQN maps $f_t$ to discrete forward-motion actions $A$. Rewards favor observations where pole cues are detected and penalize unnecessary moves.
  • Passive Self-Localization (PSL): Upon pole detection, a Bag-of-Words-based retrieval estimates pose by matching $f_t$ to a database.
  • Domain Generalization: The PLD is pretrained on a source domain and transfers directly without domain adaptation or adversarial losses. Mapping policy evaluation to a compact geometry-driven feature enables robust performance across environmental changes.
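
The SLA step can be sketched compactly. The version below assumes the PLD emits a single (H, W) pole-likelihood map and that each of the four horizontal bins is summarized by its peak response; the bin count matches $f_t \in \mathbb{R}^4$, but the aggregation operator is an assumption.

import torch

def spatial_landmark_aggregation(pld_map: torch.Tensor) -> torch.Tensor:
    """Bin an (H, W) pole-likelihood map into 4 horizontal regions and
    reduce each bin to a scalar cue strength, yielding f_t in R^4."""
    bins = pld_map.chunk(4, dim=1)               # split the width into 4 bins
    return torch.stack([b.max() for b in bins])  # peak likelihood per bin (assumed)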

5. Experimental Protocols and Benchmark Results

Datasets:

  • Real: DTU multiview stereo (49 views/scene; 88 train, 15 test), 400x300 px.
  • Synthetic: ShapeNet (car, moto, camera, ship), 100 views/object, 200x200 px.
  • Domain-invariant NBV: University of Michigan NCLT dataset, four seasons, 26k images/sequence.

Training:

  • NeU-NBV: Adam, LR $1 \times 10^{-5}$; LSTM sampling iterations $T=16$; 2 days on one RTX A5000; 3-5 random reference views/scene.
  • DQN NBV: $\gamma=0.99$, Adam, batch size 32, buffer size $10^5$, target update every 1k steps; exploration temperature annealed from 1.0 to 0.1 (a minimal update-step sketch follows this list).
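
Using the hyperparameters above, one TD update of the DQN has the following shape. A minimal sketch, assuming replay batches of (feature, action, reward, next feature, done) tensors and the PoleCueQNet sketched in Section 1; the Huber loss is a common choice, not one stated in the paper.

import torch
import torch.nn.functional as F

def dqn_update(qnet, target_net, optimizer, batch, gamma=0.99):
    """One TD update over a replay batch; target_net is the periodically
    synchronized copy of qnet (every 1k steps in the setup above)."""
    f, a, r, f_next, done = batch                     # tensors of batch size 32
    q = qnet(f).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(f, a) for taken actions
    with torch.no_grad():
        q_next = target_net(f_next).max(dim=1).values
        target = r + gamma * q_next * (1.0 - done)    # bootstrap unless terminal
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()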

Evaluation:

  • Uncertainty Calibration: Spearman's rank correlation coefficient (SRCC) between predicted uncertainty and true MSE; Area Under the Sparsification Error curve (AUSE). A sketch of both metrics follows this list.
    • Aleatoric uncertainty SRCC: $\approx 0.84$ (competing methods: $0.27$–$0.83$); AUSE: $\approx 0.12$ (vs. $0.26$–$0.50$).
  • Planning Quality: Test-time PSNR and SSIM on held-out images after fixed-budget planning (DTU: 9 images, ShapeNet/indoor: 20 images, 50 candidates/step).
    • Uncertainty-based NBV outperforms random and max-distance planners on both DTU and simulator setups.
  • Impact on Downstream Reconstruction: Instant-NGP trained on data acquired by NeU-NBV yields higher PSNR/SSIM than models trained on random or max-distance acquisitions.
  • Domain Transfer in NBV DQN: Median rank of ground-truth pose $\approx 1.2$ after $\approx 5$ moves with the learned policy; baseline heuristics yield rank $\approx 2.8$ after $\approx 6$ moves.
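
Both calibration metrics are straightforward to compute. A minimal sketch over flattened per-pixel arrays, using SciPy's spearmanr; the sparsification protocol (step count, normalization, area approximation) follows common practice and may differ in detail from the paper's.

import numpy as np
from scipy.stats import spearmanr

def srcc_and_ause(uncertainty: np.ndarray, mse: np.ndarray, n_steps: int = 20):
    """SRCC between predicted uncertainty and true MSE, plus a simple AUSE."""
    u, e = uncertainty.ravel(), mse.ravel()
    srcc, _ = spearmanr(u, e)

    # Sparsification: remove the fraction f of pixels ranked most uncertain
    # (oracle: most erroneous) and track the mean error of the remainder,
    # normalized by the overall mean error.
    n, fracs = len(e), np.linspace(0.0, 0.99, n_steps)
    def curve(order):  # order: indices sorted ascending by the ranking key
        return np.array([e[order[: n - int(f * n)]].mean() for f in fracs]) / e.mean()

    ause = float(np.mean(curve(np.argsort(u)) - curve(np.argsort(e))))  # area proxy
    return srcc, ause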

6. Strengths, Limitations, and Future Prospects

Strengths:

  • NeU-NBV achieves efficient, mapless, uncertainty-driven view planning with no per-scene retraining or explicit 3D map construction.
  • Uncertainty estimates are strongly correlated with actual reconstruction error, enabling effective budget utilization.
  • Domain-invariant variant leverages geometry-driven cues, providing robust NBV policies across seasons/lighting without retraining.

Limitations:

  • In domain-invariant NBV, the PLD may confound vertical structures unrelated to poles, especially in cluttered scenes.
  • Neither approach explicitly handles full occlusions, e.g., poles obstructed by dynamic obstacles.
  • The reward structure in the RL-based variant is sparse; more informative shaping (e.g., retrieval-score gain) could accelerate learning.
  • The view planner's action space is restricted (e.g., forward motion only) in the RL variant.

Future Directions:

  • Integrating richer or multi-cue representations (e.g., combining geometric and photometric uncertainty) could further extend robustness.
  • Mechanisms for active disambiguation under occlusion or to support higher-dimensional navigation policies are natural extensions.
  • Exploration of adversarial or contrastive domain-alignment methods may further improve invariance.

7. Context within Active Perception and Neural Rendering

The NeU-NBV framework sits at the intersection of active perception, deep photometric rendering, and robust landmark-based reasoning. By eschewing explicit geometric models in favor of information-driven rendering uncertainty or domain-invariant geometric cues, the framework addresses key bottlenecks of earlier NBV planners: computational scalability, sensitivity to domain shift, and sample efficiency. It contributes both a practical methodology for data acquisition in scene understanding and a benchmark for uncertainty-driven planning in neural rendering and robotic self-localization tasks.
