BENBV-Net: Rapid Boundary Exploration in 3D Scanning
- The paper introduces BENBV-Net, which predicts optimal boundary exploration targets directly from point cloud data without needing a reference model.
- BENBV-Net uses hierarchical point and normal encoders combined with boundary-feature extraction, context fusion, and self-attention to efficiently score candidate views.
- Experimental results show BENBV-Net achieves near model-based performance with up to an 8× reduction in runtime, enabling real-time, robust 3D scanning.
The Boundary Exploration Next Best View Network (BENBV-Net) is a specialized deep neural network framework designed for the Next Best View (NBV) problem in 3D robotic scanning, addressing both scan coverage maximization and robust registration through intrinsic overlap awareness. By predicting the optimal boundary exploration target directly from empirical point cloud data—without reliance on a reference model—BENBV-Net achieves near model-based performance at significantly reduced inference times, facilitating efficient and practical deployment in unknown object scanning scenarios (Li et al., 2024).
1. NBV Problem Formulation
The NBV task is formalized using stepwise 3D surface point clouds, where at each step $t$ the acquired scan data are represented as a point cloud $P_t$. Each candidate view is parameterized as $v = (c, f)$, comprising the camera position $c$ and a focal point $f$ constrained to boundary points of the current cloud. The central objective is to select the next view maximizing a utility function that balances coverage and overlap. The coverage ratio is defined as $C_t = |P_t \cap M| / |M|$, and the overlap ratio as $O_t = |P^{\mathrm{new}}_t \cap P_{t-1}| / |P^{\mathrm{new}}_t|$, where $M$ is the reference model and $P^{\mathrm{new}}_t \cap P_{t-1}$ denotes the newly acquired, already-seen points. This dual emphasis enables maximally informative and registration-robust scanning.
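The two ratios above can be sketched with voxel-hashed point sets; the voxelization helper and cell size are illustrative assumptions, not details from the paper:

```python
def voxelize(points, size=0.01):
    # Hash xyz points into voxel cells so set operations approximate
    # surface membership. The cell size is an assumed parameter.
    return {(round(x / size), round(y / size), round(z / size))
            for x, y, z in points}

def coverage_ratio(scanned, model):
    # Fraction of the reference model's voxels observed so far.
    return len(scanned & model) / len(model)

def overlap_ratio(new_scan, seen):
    # Fraction of newly acquired voxels that were already seen; high
    # overlap is what makes registration of the new scan reliable.
    return len(new_scan & seen) / len(new_scan)
```

In practice `scanned`, `model`, and `new_scan` would come from `voxelize` applied to the accumulated cloud, the reference mesh samples, and the latest depth frame respectively.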
2. Model-Based Boundary-Exploration NBV Policy
The foundational model-based approach iteratively searches for NBVs by evaluating candidate boundary views using a composite score $s(v) = w \cdot C(v) + (1 - w) \cdot O(v)$, where $w$ is a sigmoid coverage weight that shifts emphasis from overlap to coverage once coverage exceeds $0.6$. At each scan iteration, boundaries are detected via an angle threshold (120°) and clustered into candidates via K-means; for each boundary, a local frame is estimated, the normal is randomly perturbed, and the camera is positioned at a controllable working distance along the (possibly perturbed) normal. Candidate views are evaluated in simulation, with the highest-scoring candidate selected as the NBV.
This search process not only provides a near-optimal NBV policy but also generates supervised (view, score) pairs to train BENBV-Net.
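The sigmoid-weighted scoring of candidate views can be sketched as follows; the sigmoid steepness and the candidate tuple layout are assumptions for illustration:

```python
import math

def coverage_weight(current_coverage, pivot=0.6, steepness=20.0):
    # Sigmoid weight: below ~60% coverage the score favors overlap
    # (safe registration); above it, emphasis shifts to coverage
    # (exploration). The steepness value is an assumption.
    return 1.0 / (1.0 + math.exp(-steepness * (current_coverage - pivot)))

def candidate_score(coverage_gain, overlap, current_coverage):
    w = coverage_weight(current_coverage)
    return w * coverage_gain + (1.0 - w) * overlap

def select_nbv(candidates, current_coverage):
    # candidates: list of (view, coverage_gain, overlap) tuples,
    # each simulated from one clustered boundary.
    return max(candidates,
               key=lambda c: candidate_score(c[1], c[2], current_coverage))[0]
```

Early in a scan (low coverage) the weight is near zero, so overlap-rich views win; late in a scan the same candidates would be ranked by coverage gain instead.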
3. BENBV-Net Architecture and Training
BENBV-Net receives as input: the downsampled current point cloud $P_t$ (per-point xyz coordinates and normals), a set of 20 boundary points (selected per the model-based step), and a per-boundary context vector encapsulating local point density and view-order index.
Architectural Components
- Point-Feature Encoder: Processes the xyz coordinates of $P_t$ using PointNet-like MLPs with max pooling to yield a global point feature.
- Normal-Feature Encoder: Processes the normals of $P_t$ through MLPs to obtain a global normal feature.
- Boundary-Feature Extractor: Encodes xyz+normals for each of the 20 boundary points via lightweight MLPs, yielding per-boundary features.
- Context Fusion Module: Fuses per-boundary local density and the normalized view-order index through MLPs, producing per-boundary context features.
- Multi-Scale Residual Fusion: Broadcasts the global features to every boundary and aggregates the point, normal, boundary, and context features via residual MLP blocks into per-boundary tokens.
- Self-Attention Layer: A multi-head self-attention module models dependencies between the 20 boundary candidates.
- Prediction Head: Outputs scalar scores per boundary through MLP with dropout.
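The permutation-invariant global encoding used by the point and normal encoders can be sketched in pure Python; the single-layer MLP stands in for the paper's deeper shared MLPs:

```python
def linear_relu(x, weights, bias):
    # One shared fully connected layer with ReLU, applied per point.
    # weights: list of per-output-channel weight vectors.
    return [max(0.0, sum(xi * w for xi, w in zip(x, col)) + b)
            for col, b in zip(weights, bias)]

def pointnet_global_feature(points, weights, bias):
    # Shared per-point MLP followed by channel-wise max pooling. The
    # max makes the result permutation-invariant, which is why the
    # encoders can consume unordered scan data directly.
    per_point = [linear_relu(p, weights, bias) for p in points]
    return [max(f[k] for f in per_point) for k in range(len(bias))]
```

Reordering the input points leaves the pooled feature unchanged, which is the key property a set encoder needs.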
Loss Function
The learning objective is a position-aware weighted regression loss of the form $\mathcal{L} = \frac{1}{N} \sum_i w_i (\hat{s}_i - s_i)^2$, where $\hat{s}_i$ is the predicted score for boundary $i$, $s_i$ is the model-based target score, and $w_i$ is a weight depending on the view's position in the scan sequence. This weighting scheme assigns different importance to early and late views, which proves crucial for early-stage overlap performance.
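A minimal sketch of such a position-aware weighted loss is shown below; the linear decay schedule and the `early_weight` value are assumptions, not the paper's exact weighting:

```python
def position_weight(view_index, num_views=15, early_weight=2.0):
    # Linearly decaying weight that emphasizes early views, where
    # overlap errors are most damaging. The schedule is an assumption.
    t = view_index / max(num_views - 1, 1)
    return early_weight * (1.0 - t) + 1.0 * t

def weighted_regression_loss(pred, target, view_indices, num_views=15):
    # Position-aware weighted MSE over a batch of boundary scores.
    total = 0.0
    for p, s, i in zip(pred, target, view_indices):
        total += position_weight(i, num_views) * (p - s) ** 2
    return total / len(pred)
```

Under this schedule an error of a given size on the first view costs twice as much as the same error on the last view.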
Training is conducted end-to-end with the Adam optimizer (batch size 128, 150 epochs) and requires approximately two hours.
4. NBV Prediction and Execution
During inference, boundary detection and clustering produce 20 candidates. BENBV-Net forward-passes these as boundary tokens to obtain scores $\{s_i\}_{i=1}^{20}$ and selects the highest-scoring index $i^{*} = \arg\max_i s_i$. The NBV is thus:
- Target: the boundary point $b_{i^{*}}$
- Camera: $c = b_{i^{*}} + d \, \hat{n}_{i^{*}}$, with $d$ the adjustable working distance and $\hat{n}_{i^{*}}$ the (perturbed) boundary normal.
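The selection-and-placement step can be sketched as a simple argmax plus a step along the normal; the default working distance is a placeholder, not a value from the paper:

```python
def choose_nbv(boundary_points, normals, scores, working_distance=0.3):
    # Pick the boundary with the highest predicted score; the camera
    # sits at the working distance along that boundary's unit normal,
    # looking back at the boundary point. The distance is a
    # sensor-dependent, test-time-adjustable parameter.
    i = max(range(len(scores)), key=lambda k: scores[k])
    target = boundary_points[i]
    camera = tuple(t + working_distance * n
                   for t, n in zip(target, normals[i]))
    return target, camera
```

Because the distance enters only in this final placement, swapping sensors at inference time amounts to changing one argument.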
Notably, BENBV-Net does not embed the working distance $d$ as a learnable parameter; instead, $d$ remains directly configurable at inference to accommodate sensor-specific requirements.
5. Experimental Evaluation and Performance
Benchmarks were performed on the ShapeNetV1, ModelNet40, and Stanford 3D Repository datasets, comprising 24,000 scans for training and testing plus 128 scans for generalization evaluation. The evaluation protocol utilized:
- Final coverage (%) after 15 scans
- Early overlap (%) over initial 5 scans
- Chamfer and Hausdorff distances to ground truth
- Scanning efficiency $E = C / n$, with $C$ the final coverage (%) and $n$ the number of views needed to reach 90% coverage
- Number of views to reach specified coverage milestones (50%, 80%, 90%)
- Overlap ratio at scan intervals
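The milestone and efficiency metrics can be computed from a per-step cumulative coverage curve; reading efficiency as final coverage (%) over views-to-90% is inferred from the reported tables rather than stated verbatim:

```python
def views_to_reach(coverage_curve, milestone):
    # Number of views before cumulative coverage first hits a milestone.
    for n, c in enumerate(coverage_curve, start=1):
        if c >= milestone:
            return n
    return None  # milestone never reached within the scan budget

def scanning_efficiency(coverage_curve, milestone=0.90):
    # Final coverage (%) divided by views needed to reach 90% coverage.
    n = views_to_reach(coverage_curve, milestone)
    if n is None:
        return None
    return 100.0 * coverage_curve[-1] / n
```

For example, a method whose coverage curve is [0.5, 0.8, 0.92, 0.95] reaches 90% in 3 views, so its efficiency is 95/3.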
| Method | ShapeNet (Coverage/Overlap/Efficiency) | ModelNet40 (Coverage/Overlap/Efficiency) | Repository (Coverage/Overlap/Efficiency) |
|---|---|---|---|
| BENBV | 89.1% / 59.3% / 8.76 | 89.1% / 62.2% / 9.01 | 95.0% / 53.5% / 13.0 |
| BENBV-Net | 85.9% / 55.3% / 7.51 | 87.3% / 58.2% / 8.03 | 94.3% / 47.0% / 11.2 |
| PC-NBV | 87.4% / 33.8% / 7.18 | 88.2% / 33.1% / 7.49 | 91.9% / 32.6% / 8.59 |
| SEE | 62.9% / 55.2% / 4.13 | 65.8% / 56.2% / 4.40 | 77.7% / 57.9% / 5.53 |
BENBV-Net attains 85.9–94.3% coverage and 47.0–55.3% overlap, achieving 7.51–11.2 efficiency; these values approach model-based BENBV and consistently outperform the prior works PC-NBV and SEE, especially on the practically important early-overlap and coverage-milestone metrics. BENBV-Net reduces per-object runtime to at most $7.8$ s, compared with up to $67$ s for model-based search, facilitating near real-time NBV selection.
Ablation studies indicate the position-aware loss and context fusion significantly enhance early overlap; omitting view-order weighting degrades performance by approximately 5% in initial scans.
6. Applications, Flexibility, and Future Directions
BENBV-Net is architected for efficient NBV selection in object-agnostic, model-free 3D scanning contexts where rapid adaptation to scene geometry is critical. The design enables:
- Flexible deployment on varying sensors and working distances, as the working distance $d$ is trivially adjustable at test time.
- Intrinsic registration robustness via overlap-aware boundary prioritization, balancing discovery and scan alignment as coverage accumulates.
However, certain limitations exist: the pipeline is not fully end-to-end, as boundary extraction is performed separately, and key hyperparameters (e.g., cluster count, angle threshold, loss weighting) require tuning. Anticipated extensions include integration of learned boundary detectors, explicit robot motion planning, and joint optimization of camera placement and orientation.
A plausible implication is that the boundary exploration paradigm promoted by BENBV-Net may generalize to active vision frameworks beyond point clouds, especially where incremental, registration-aware information gain is paramount.
7. Summary and Key Insights
BENBV-Net exemplifies a practical learning-based NBV predictor in which hierarchical point/boundary-feature encoders and context-fusion strategies enable rapid, boundary-specific NBV prediction. The network’s position-aware regression loss, fusion of geometric context (local density and boundary sequence), and inter-boundary self-attention are central to its efficacy. BENBV-Net closely approximates model-based NBV search accuracy with an order-of-magnitude speed improvement, facilitating real-time usage in complex and unstructured 3D scanning environments (Li et al., 2024).