
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Published 25 Feb 2024 in cs.CV, cs.AI, and cs.RO | (2402.16174v3)

Abstract: While recent advances in neural radiance field enable realistic digitization for large-scale scenes, the image-capturing process is still time-consuming and labor-intensive. Previous works attempt to automate this process using the Next-Best-View (NBV) policy for active 3D reconstruction. However, the existing NBV policies heavily rely on hand-crafted criteria, limited action space, or per-scene optimized representations. These constraints limit their cross-dataset generalizability. To overcome them, we propose GenNBV, an end-to-end generalizable NBV policy. Our policy adopts a reinforcement learning (RL)-based framework and extends typical limited action space to 5D free space. It empowers our agent drone to scan from any viewpoint, and even interact with unseen geometries during training. To boost the cross-dataset generalizability, we also propose a novel multi-source state embedding, including geometric, semantic, and action representations. We establish a benchmark using the Isaac Gym simulator with the Houses3K and OmniObject3D datasets to evaluate this NBV policy. Experiments demonstrate that our policy achieves a 98.26% and 97.12% coverage ratio on unseen building-scale objects from these datasets, respectively, outperforming prior solutions.

References (52)
  1. Vision-only robot navigation in a neural radiance world. IEEE Robotics and Automation Letters, 7(2):4606–4613, 2022.
  2. Jack E Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1):25–30, 1965.
  3. Learning to explore using active neural slam. In International Conference on Learning Representations, 2020.
  4. Tensorf: Tensorial radiance fields. In European Conference on Computer Vision, pages 333–350. Springer, 2022.
  5. Learning exploration policies for navigation. In International Conference on Learning Representations, 2019.
  6. Leveraging procedural generation to benchmark reinforcement learning. In International Conference on Machine Learning, pages 2048–2056. PMLR, 2020.
  7. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.
  8. A reinforcement learning approach to the view planning problem. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6933–6941, 2017.
  9. Crazyflie 2.0 quadrotor as a platform for research and education in robotics and control engineering. In 2017 22nd International Conference on Methods and Models in Automation and Robotics (MMAR), pages 37–42. IEEE, 2017.
  10. Scone: Surface coverage optimization in unknown environments by volumetric integration. In Advances in Neural Information Processing Systems, 2022.
  11. Asynchronous collaborative autoscanning with mode switching for multi-robot scene reconstruction. ACM Transactions on Graphics (TOG), 41(6):1–13, 2022.
  12. Next-best-view planning for surface reconstruction of large-scale 3d environments with multiple uavs. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1567–1574. IEEE, 2020.
  13. An information gain formulation for active volumetric 3d reconstruction. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 3477–3484. IEEE, 2016.
  14. Eslam: Efficient dense slam system based on hybrid representation of signed distance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17408–17419, 2023.
  15. Curl: Contrastive unsupervised representations for reinforcement learning. In International Conference on Machine Learning, pages 5639–5650. PMLR, 2020.
  16. Randomized kinodynamic planning. The International Journal of Robotics Research, 20(5):378–400, 2001.
  17. Uncertainty guided policy for active robotic 3d reconstruction using neural radiance fields. IEEE Robotics and Automation Letters, 7(4):12070–12077, 2022.
  18. Scenarionet: Open-source platform for large-scale traffic scenario simulation and modeling. In Advances in Neural Information Processing Systems, 2022a.
  19. Metadrive: Composing diverse driving scenarios for generalizable reinforcement learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3461–3475, 2022b.
  20. Matrixcity: A large-scale city dataset for city-scale neural rendering and beyond. arXiv e-prints, pages arXiv–2308, 2023.
  21. Learning reconstructability for drone aerial path planning. ACM Transactions on Graphics (TOG), 41(6):1–17, 2022.
  22. Isaac gym: High performance gpu-based physics simulation for robot learning, 2021.
  23. Nerf: Representing scenes as neural radiance fields for view synthesis. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16, pages 405–421. Springer, 2020.
  24. isdf: Real-time neural signed distance fields for robot perception. Robotics: Science and Systems, 2022.
  25. Activenerf: Learning where to see with uncertainty estimation. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXIII, pages 230–246. Springer, 2022.
  26. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 165–174, 2019.
  27. Houses3k dataset. https://github.com/darylperalta/Houses3K, 2020a.
  28. Next-best view policy for 3d reconstruction. In Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16, pages 558–573. Springer, 2020b.
  29. Stable-baselines3: Reliable reinforcement learning implementations. The Journal of Machine Learning Research, 22(1):12348–12355, 2021.
  30. Neurar: Neural uncertainty for autonomous 3d reconstruction with implicit neural representations. IEEE Robotics and Automation Letters, 2023.
  31. Accelerating 3d deep learning with pytorch3d. arXiv:2007.08501, 2020.
  32. Nerf-slam: Real-time dense monocular slam with neural radiance fields. arXiv preprint arXiv:2210.13641, 2022.
  33. Learning to walk in minutes using massively parallel deep reinforcement learning. In 5th Annual Conference on Robot Learning, 2021.
  34. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  35. James A Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences, 93(4):1591–1595, 1996.
  36. Uncertainty-driven active vision for implicit scene reconstruction. arXiv preprint arXiv:2210.00978, 2022.
  37. The replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797, 2019.
  38. imap: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6229–6238, 2021.
  39. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5459–5469, 2022a.
  40. Neuralrecon: Real-time coherent 3d reconstruction from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15598–15607, 2021.
  41. Neural 3D reconstruction in the wild. In SIGGRAPH Conference Proceedings, 2022b.
  42. Sebastian Thrun. Probabilistic robotics. Communications of the ACM, 45(3):52–57, 2002.
  43. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30. IEEE, 2017.
  44. Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 803–814, 2023.
  45. Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In The European Conference on Computer Vision (ECCV), 2022.
  46. Multi-robot active mapping via neural bipartite graph matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14839–14848, 2022.
  47. Cem Yuksel. Sample elimination for generating poisson disk sample sets. Computer Graphics Forum, 34(2):25–32, 2015.
  48. Activermap: Radiance field for active mapping and planning. arXiv preprint arXiv:2211.12656, 2022.
  49. Continuous aerial path planning for 3d urban scene reconstruction. ACM Trans. Graph., 40(6):225–1, 2021.
  50. Nerfusion: Fusing radiance fields for large-scale scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5449–5458, 2022.
  51. Open3D: A modern library for 3D data processing. arXiv:1801.09847, 2018.
  52. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12786–12796, 2022.

Summary

  • The paper introduces an RL-based NBV policy that expands the action space to a 5D free space, enabling flexible viewpoint selection.
  • It employs a multi-source state embedding that fuses geometric, semantic, and action information and optimizes the policy via PPO.
  • Experiments show superior performance with coverage ratios up to 98.26%, highlighting robust generalization across diverse 3D scenes.


Introduction

The paper "GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction" addresses the challenge of automating the image-capturing process for large-scale scene reconstruction, which remains labor-intensive and time-consuming despite advances in neural radiance fields. Traditional Next-Best-View (NBV) policies are limited by their reliance on hand-crafted criteria and constrained action spaces. GenNBV introduces an RL-based framework that expands the action space to 5D free space, allowing a drone to interact with geometries unseen during training and to scan from any viewpoint (Figure 1).

Figure 1: To determine the best view for 3D reconstruction, previous methods select from a hand-crafted action space or rely on object-centric capturing, and thus fail to generalize to unseen scenes (Left). Our end-to-end trained free-space policy generalizes to unseen objects, enabling the drone to capture images from any viewpoint (Right).

Methodology

GenNBV reformulates the NBV problem as a Markov Decision Process (MDP), proposing both a novel large action space and a multi-source state embedding to ensure effective policy generalization across diverse object geometries.
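Under the MDP framing, each episode alternates between choosing a viewpoint, capturing observations, and receiving a coverage-based reward. The rollout structure can be sketched as follows, with `env` and `policy` as stand-in interfaces rather than GenNBV's actual API:

```python
def rollout(env, policy, max_steps=30):
    """Collect one NBV episode as (state, action, reward) transitions for RL training."""
    trajectory = []
    state = env.reset()  # initial observation of the unseen object
    for _ in range(max_steps):
        action = policy(state)                  # next 5D viewpoint
        state, reward, done = env.step(action)  # reward = coverage-ratio gain
        trajectory.append((state, action, reward))
        if done:
            break
    return trajectory
```

The accumulated rewards over such trajectories are what the policy optimizer maximizes in expectation.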

Action Space

The action space in GenNBV is composed of 3D position coordinates and 2D rotation angles (yaw and pitch), enabling free space exploration, unlike previous methods that confined drones to predefined surfaces such as hemispheres.
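The 5D action, a position plus yaw and pitch, can be represented as a bounded continuous vector. A minimal sketch follows; the bounds are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical bounds for the 5D free-space action: (x, y, z, yaw, pitch).
# GenNBV's actual ranges are scene-dependent; these are placeholders.
ACTION_LOW = np.array([-10.0, -10.0, 0.5, -np.pi, -np.pi / 2])
ACTION_HIGH = np.array([10.0, 10.0, 10.0, np.pi, np.pi / 2])

def clip_action(raw_action: np.ndarray) -> np.ndarray:
    """Project an unconstrained policy output into the valid 5D action box."""
    return np.clip(raw_action, ACTION_LOW, ACTION_HIGH)
```

Because the action is a free-space pose rather than an index into a fixed set of candidate views, the policy can place the camera anywhere within these bounds.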

State Embedding

The state embedding incorporates geometric, semantic, and action representations. A probabilistic 3D grid tracks voxel occupancy, distinguishing occupied voxels from those that are scanned-and-empty or not yet scanned. Semantic embeddings are derived from multi-frame RGB images, providing environmental context through visual cues.
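A probabilistic occupancy grid of this kind is commonly maintained with log-odds updates in the style of probabilistic robotics (Thrun, 2002). The sketch below is a generic version of that bookkeeping, not the paper's exact formulation; the increment and clamp values are illustrative:

```python
import numpy as np

L_OCC, L_FREE = 0.85, -0.4  # illustrative log-odds increments per observation
L_MIN, L_MAX = -3.5, 3.5    # clamping keeps voxels responsive to new evidence

def update_grid(log_odds, hit_idx, free_idx):
    """Update voxel log-odds given indices of hit (occupied) and traversed (free) voxels."""
    log_odds[hit_idx] += L_OCC
    log_odds[free_idx] += L_FREE
    return np.clip(log_odds, L_MIN, L_MAX)

def occupancy_prob(log_odds):
    """Convert log-odds back to an occupancy probability in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-log_odds))
```

Voxels never touched by a ray stay at probability 0.5, which is exactly what lets the representation separate unscanned regions from regions observed to be empty.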

Policy Network

GenNBV uses a 3-layer MLP to generate actions sampled from a stochastic policy conditioned on the state embedding. The policy is optimized with PPO, which leverages parallelized sampling for efficiency (Figure 2).
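A 3-layer MLP head mapping the fused state embedding to a Gaussian over the 5D action might look like the following; dimensions, activations, and initialization are illustrative choices, and the paper's exact architecture may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

class MLPPolicy:
    """3-layer MLP producing mean and log-std of a diagonal Gaussian over 5D actions."""

    def __init__(self, state_dim=256, hidden=128, action_dim=5):
        self.w1 = rng.normal(0, 0.1, (state_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, hidden))
        self.w3 = rng.normal(0, 0.1, (hidden, 2 * action_dim))  # mean and log-std
        self.action_dim = action_dim

    def forward(self, state):
        h = np.tanh(state @ self.w1)
        h = np.tanh(h @ self.w2)
        out = h @ self.w3
        return out[: self.action_dim], out[self.action_dim :]

    def sample(self, state):
        """Draw a stochastic action, as PPO requires during rollout collection."""
        mean, log_std = self.forward(state)
        return mean + np.exp(log_std) * rng.normal(size=self.action_dim)
```

Sampling from the Gaussian (rather than taking the mean) is what drives exploration during training; at evaluation time the mean action can be used deterministically.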

Figure 2: Overview of our proposed framework GenNBV. Our end-to-end policy takes the historical multi-source observations as input, transforms them into a more informative scene representation, and produces the next viewpoint position. A reward signal is returned at training time to optimize the end-to-end policy, maximizing the expected cumulative reward per episode. Specifically, the signal is the increase in coverage ratio after collecting a new viewpoint.
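The coverage-gain reward described in Figure 2 can be sketched directly, treating coverage as the fraction of ground-truth surface points observed so far; the set-based bookkeeping here is an illustrative choice, not the paper's implementation:

```python
def coverage_reward(seen_points: set, new_points: set, total_surface_points: int) -> float:
    """Reward = coverage ratio after the new view minus coverage ratio before it."""
    before = len(seen_points) / total_surface_points
    seen_points |= new_points  # in-place update of the visited-point set
    after = len(seen_points) / total_surface_points
    return after - before
```

Because re-observing already-covered surface yields zero reward, the agent is pushed toward viewpoints that reveal genuinely new geometry.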

Experiments

Experimental validation showed that GenNBV achieves superior performance on the Houses3K and OmniObject3D datasets, with coverage ratios of 98.26% and 97.12% respectively on unseen building-scale objects, demonstrating robust generalization (Figure 3).

Figure 3: Visualization of unseen 3D objects reconstructed by Scan-RL (Peralta et al., 2020).

Results

The proposed GenNBV policy significantly outperformed heuristic, information-gain-based, and other RL-based baselines on building-scale object reconstruction, as evidenced by improved AUC and coverage metrics (Figure 4).

Figure 4: Visualization of a highly detailed, unseen 3D outdoor scene from Objaverse, reconstructed by the uncertainty-guided method, Scan-RL, and our model. Compared to the two baselines, the scene reconstructed by our method is more watertight, with fewer holes on the ground and building surfaces, especially in the region highlighted by the red box.

Conclusion

GenNBV demonstrates a scalable approach to active 3D reconstruction in complex environments through its generalizable and efficient view-planning policy. Its evaluation across diverse datasets highlights its potential for real-world applications that require robust scene understanding. Future work may explore refined reward functions or extend generalization to more complex scenarios (Figure 5).

Figure 5: Coverage ratio as the number of training objects increases, evaluated on the unseen OmniObject3D house category.
