
NBV-Net: Rapid Boundary Exploration in 3D Scanning

Updated 14 February 2026
  • The paper introduces BENBV-Net, which predicts optimal boundary exploration targets directly from point cloud data without needing a reference model.
  • BENBV-Net uses hierarchical point and normal encoders combined with boundary-feature extraction, context fusion, and self-attention to efficiently score candidate views.
  • Experimental results show BENBV-Net achieves near model-based performance with up to an 8× reduction in runtime, enabling real-time, robust 3D scanning.

The Boundary Exploration Next Best View Network (BENBV-Net) is a specialized deep neural network framework designed for the Next Best View (NBV) problem in 3D robotic scanning, addressing both scan coverage maximization and robust registration through intrinsic overlap awareness. By predicting the optimal boundary exploration target directly from empirical point cloud data—without reliance on a reference model—BENBV-Net achieves near model-based performance at significantly reduced inference times, facilitating efficient and practical deployment in unknown object scanning scenarios (Li et al., 2024).

1. NBV Problem Formulation

The NBV task is formalized using stepwise 3D surface point clouds, where at each step $i$ the acquired scan data are represented as $s_i = \{ s_i^j \in \mathbb{R}^3 \}_{j=1}^{|s_i|}$. Each candidate view is parameterized by $V_i = (V^{cam}_i, V^{tar}_i) \in \mathbb{R}^3 \times \mathbb{R}^3$, comprising the camera position and a focal point constrained to boundary points of the current cloud. The central objective is to select the next view maximizing a utility function $F$ balancing coverage and overlap: $\arg\max_{V_i \in \mathcal{V}} F\bigl(s_i(V^{cam}_i, V^{tar}_i)\bigr)$. The coverage ratio is defined as $C_i = |s_i| / |S|$ and the overlap ratio as $O_i = |p_o| / |s_i|$, where $S$ is the reference model and $p_o$ denotes newly acquired, already-seen points. This dual emphasis enables maximally informative and registration-robust scanning.
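As a minimal sketch of the two ratios, assume scans are discretized into voxel keys so that set operations stand in for point matching; the `model_size` and voxel resolution are hypothetical stand-ins for the reference model $S$, not the paper's implementation:

```python
import numpy as np

def coverage_overlap(scan, seen, model_size, voxel=0.01):
    """Track C_i = |s_i|/|S| and O_i = |p_o|/|s_i| over voxelized scans.

    scan:       (N, 3) newly acquired points
    seen:       set of voxel keys observed so far (updated in place)
    model_size: |S|, voxel count of the reference model (hypothetical)
    """
    keys = {tuple(k) for k in np.floor(scan / voxel).astype(int)}
    overlap = len(keys & seen) / max(len(keys), 1)   # O_i: re-observed fraction
    seen |= keys
    coverage = len(seen) / model_size                # C_i: fraction of model covered
    return coverage, overlap
```

In a real pipeline the overlap set $p_o$ would come from nearest-neighbor matching against the accumulated cloud; voxel hashing is used here only to keep the example self-contained.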

2. Model-Based Boundary-Exploration NBV Policy

The foundational model-based approach iteratively searches for NBVs by evaluating candidate boundary views using a composite score: $s_i = (1 - W_c)\, O_i + W_c\, C_i$, with $W_c = \frac{1}{1 + e^{-10(C_i - 0.6)}}$. Here, $W_c$ is a sigmoid coverage weight, shifting emphasis from overlap to coverage once coverage exceeds $0.6$. At each scan iteration, boundaries are detected via an angle threshold (120°) and clustered into $K = 20$ candidates via K-means. For each boundary, a local frame is estimated, normals are perturbed by $\theta \in \{-45^\circ, 0^\circ, 45^\circ\}$, and the camera is positioned at a controllable working distance $d$ along the (possibly perturbed) normal. Candidate views are evaluated in simulation for $(C_i, O_i)$, with the optimal $s_i$ determining the NBV.
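The composite score follows directly from the formula above; at $C_i = 0.6$ the sigmoid weight is exactly $0.5$, so overlap and coverage contribute equally:

```python
import math

def view_score(C_i, O_i):
    """Composite score s_i = (1 - W_c) * O_i + W_c * C_i.

    W_c = 1 / (1 + exp(-10 * (C_i - 0.6))) is a sigmoid coverage weight
    that shifts emphasis from overlap to coverage as C_i passes 0.6.
    """
    W_c = 1.0 / (1.0 + math.exp(-10.0 * (C_i - 0.6)))
    return (1.0 - W_c) * O_i + W_c * C_i
```

Early in a scan (low $C_i$) the score is dominated by the overlap term, which favors registration robustness; late in a scan it is dominated by coverage gain.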

This search process not only provides a near-optimal NBV policy but also generates supervised (view, score) pairs to train BENBV-Net.

3. BENBV-Net Architecture and Training

BENBV-Net receives as input: the downsampled current point cloud $P \in \mathbb{R}^{4096 \times 6}$ (coordinates and normals), a set of 20 boundary points $B$ (selected per the model-based step), and a context vector per boundary encapsulating local point density and view-order index.

Architectural Components

  1. Point-Feature Encoder: processes the xyz coordinates of $P$ using PointNet-like MLPs with max pooling to yield global features $\mathbf{f}_P$.
  2. Normal-Feature Encoder: processes $P$'s normals through MLPs to obtain global normal features $\mathbf{f}_N$.
  3. Boundary-Feature Extractor: encodes xyz + normals for each boundary ($\in \mathbb{R}^{20 \times 6}$) via lightweight MLPs into per-boundary features $\mathbf{f}_{B,i}$.
  4. Context Fusion Module: fuses per-boundary density $\rho_i$ and normalized view-order index $x_i$ through MLPs, producing $\mathbf{f}_{C,i}$.
  5. Multi-Scale Residual Fusion: broadcasts global features and aggregates $\mathbf{f}_P$, $\mathbf{f}_N$, $\mathbf{f}_{B,i}$, and $\mathbf{f}_{C,i}$ via residual MLP blocks into per-boundary tokens.
  6. Self-Attention Layer: a multi-head self-attention module models dependencies among the 20 boundary candidates.
  7. Prediction Head: outputs a scalar score $y^i_s$ per boundary through an MLP with dropout.
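A minimal NumPy sketch of the fusion-attention-scoring path: feature dimensions, the single attention head, and the random stand-in features are illustrative assumptions (the paper uses multi-head attention and learned encoders), but the data flow mirrors components 5–7 above:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_mlp(x, w, b):
    """One ReLU MLP layer."""
    return np.maximum(x @ w + b, 0.0)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over boundary tokens."""
    d = tokens.shape[-1]
    logits = tokens @ tokens.T / np.sqrt(d)
    a = np.exp(logits - logits.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)        # row-wise softmax
    return a @ tokens

# Hypothetical 64-dim features standing in for the learned encoders.
f_P = rng.normal(size=64)                     # global point features (broadcast)
f_N = rng.normal(size=64)                     # global normal features (broadcast)
f_B = rng.normal(size=(20, 64))               # per-boundary features
f_C = rng.normal(size=(20, 64))               # per-boundary context features

tokens = f_B + f_C + f_P + f_N                # broadcast + aggregate per boundary
w1, b1 = 0.1 * rng.normal(size=(64, 64)), np.zeros(64)
tokens = tokens + relu_mlp(tokens, w1, b1)    # residual MLP block
tokens = self_attention(tokens)               # inter-boundary dependencies
w2 = 0.1 * rng.normal(size=(64, 1))
scores = (tokens @ w2).ravel()                # one scalar score per candidate
best = int(np.argmax(scores))
```

The attention step is what lets each candidate's score depend on the other 19 boundaries, so the network can avoid scoring redundant views highly.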

Loss Function

The learning objective is a position-aware weighted regression loss: $L = \lambda \sum_{i=1}^{20} w_i \bigl( y^i_s - Y^i_s \bigr)^2$, with $w_i = \bigl( \frac{x_i - 12}{10} \bigr)^2 + 0.3$, $x_i \in [0, 19]$, and $\lambda = 5.0$. This weighting scheme assigns different importance to early and late views, which proves crucial for early-stage overlap performance.
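The weight schedule and loss follow directly from the formula: $w_i$ is a parabola with its minimum ($0.3$) at view index 12, so the earliest views receive the largest weights:

```python
import numpy as np

def benbv_loss(y_pred, y_true, lam=5.0):
    """Position-aware weighted regression loss over the 20 boundary scores.

    w_i = ((x_i - 12) / 10)^2 + 0.3 reaches its minimum at view index 12,
    so early views (small x_i) are weighted most heavily.
    """
    x = np.arange(20)
    w = ((x - 12) / 10.0) ** 2 + 0.3
    return lam * np.sum(w * (y_pred - y_true) ** 2)
```

For example, $w_0 = 1.74$ versus $w_{12} = 0.3$, a roughly 6× emphasis on the first view's score error.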

Training is conducted end-to-end with the Adam optimizer (learning rate $10^{-3}$, batch size 128, ~150 epochs), requiring approximately two hours.

4. NBV Prediction and Execution

During inference, boundary detection and clustering produce 20 candidates. BENBV-Net forward-passes these as boundary tokens to obtain scores $\{ y^i_s \}$ and selects the highest-scoring index $i^*$. The NBV is thus:

  • Target: $V^{tar} = b_{i^*}$
  • Camera: $V^{cam} = d\,n' + b_{i^*}$, with $d$ the adjustable working distance and $n'$ the perturbed normal.

Notably, BENBV-Net does not learn the working distance $d$; instead, this parameter remains directly configurable at inference to accommodate sensor-specific requirements.
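Inference-time selection reduces to an argmax plus the camera-placement rule above; array shapes and the default `d` are assumptions for illustration:

```python
import numpy as np

def select_nbv(scores, boundaries, normals, d=0.4):
    """Pick the highest-scoring boundary and place the camera along its normal.

    scores:     (20,)   network outputs y_s^i
    boundaries: (20, 3) candidate boundary points b_i
    normals:    (20, 3) (perturbed) normals n'
    d:          sensor working distance, supplied at call time, not learned
    """
    i_star = int(np.argmax(scores))
    v_tar = boundaries[i_star]          # V^tar = b_{i*}
    v_cam = d * normals[i_star] + v_tar  # V^cam = d * n' + b_{i*}
    return v_cam, v_tar
```

Because `d` enters only in this final placement step, swapping sensors with different standoff distances requires no retraining.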

5. Experimental Evaluation and Performance

Benchmarks were performed on the ShapeNetV1, ModelNet40, and Stanford 3D Repository datasets, comprising approximately 24,000 train/test scans and 128 generalization scans. The evaluation protocol utilized:

  • Final coverage (%) after 15 scans
  • Early overlap (%) over initial 5 scans
  • Chamfer and Hausdorff distances to ground truth
  • Scanning efficiency $e = c \times 100 / v$, with $c$ the coverage and $v$ the number of views to reach 90% coverage
  • Number of views to reach specified coverage milestones (50%, 80%, 90%)
  • Overlap ratio at scan intervals
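The efficiency metric is straightforward to compute; treating $c$ as a coverage fraction is an assumption consistent with the magnitudes reported below:

```python
def scan_efficiency(coverage, views_to_90):
    """Scanning efficiency e = c * 100 / v.

    coverage:    final coverage c as a fraction in [0, 1] (assumed convention)
    views_to_90: number of views v needed to reach 90% coverage
    """
    return coverage * 100.0 / views_to_90
```

Higher values mean more coverage was obtained per view, so methods that reach 90% in fewer scans score better.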
| Method | ShapeNet (Coverage / Overlap / Efficiency) | ModelNet40 (Coverage / Overlap / Efficiency) | Repository (Coverage / Overlap / Efficiency) |
|---|---|---|---|
| BENBV | 89.1% / 59.3% / 8.76 | 89.1% / 62.2% / 9.01 | 95.0% / 53.5% / 13.0 |
| BENBV-Net | 85.9% / 55.3% / 7.51 | 87.3% / 58.2% / 8.03 | 94.3% / 47.0% / 11.2 |
| PC-NBV | 87.4% / 33.8% / 7.18 | 88.2% / 33.1% / 7.49 | 91.9% / 32.6% / 8.59 |
| SEE | 62.9% / 55.2% / 4.13 | 65.8% / 56.2% / 4.40 | 77.7% / 57.9% / 5.53 |

BENBV-Net attains 85.9–94.3% coverage, 47.0–55.3% overlap, and 7.51–11.2 efficiency across the three datasets; these values approach model-based BENBV and consistently outperform the prior methods PC-NBV and SEE, especially on the practical early-overlap and coverage-milestone metrics. BENBV-Net achieves an average $8\times$ reduction in per-object runtime (~7.6–7.8 s) relative to model-based search (~64–67 s), enabling near real-time NBV selection.

Ablation studies indicate the position-aware loss and context fusion significantly enhance early overlap; omitting view-order weighting degrades performance by approximately 5% in initial scans.

6. Applications, Flexibility, and Future Directions

BENBV-Net is architected for efficient NBV selection in object-agnostic, model-free 3D scanning contexts where rapid adaptation to scene geometry is critical. The design enables:

  • Flexible deployment on varying sensors and working distances, as dd is trivially adjustable at test time.
  • Intrinsic registration robustness via overlap-aware boundary prioritization, balancing discovery and scan alignment as coverage accumulates.

However, certain limitations exist: the pipeline is not fully end-to-end, as boundary extraction is performed separately, and key hyperparameters (e.g., cluster count, angle threshold, loss weighting) require tuning. Anticipated extensions include integration of learned boundary detectors, explicit robot motion planning, and joint optimization of camera placement and orientation.

A plausible implication is that the boundary exploration paradigm promoted by BENBV-Net may generalize to active vision frameworks beyond point clouds, especially where incremental, registration-aware information gain is paramount.

7. Summary and Key Insights

BENBV-Net exemplifies a practical NBV-Net in which hierarchical point/boundary-feature encoders and context-fusion strategies enable rapid, boundary-specific NBV prediction. The network’s position-aware regression loss, fusion of geometric context (local density and boundary sequence), and inter-boundary self-attention are central to its efficacy. BENBV-Net closely approximates model-based NBV search accuracy with an order-of-magnitude speed improvement, facilitating real-time usage in complex and unstructured 3D scanning environments (Li et al., 2024).
