Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning (2403.16803v2)
Abstract: Object reconstruction is relevant for many autonomous robotic tasks that require interaction with the environment. A key challenge in such scenarios is planning view configurations to collect informative measurements for reconstructing an initially unknown object. One-shot view planning enables efficient data collection by predicting view configurations and planning the globally shortest path connecting all views at once. However, prior knowledge about the object is required to conduct one-shot view planning. In this work, we propose a novel one-shot view planning approach that utilizes the powerful 3D generation capabilities of diffusion models as priors. By incorporating such geometric priors into our pipeline, we achieve effective one-shot view planning starting with only a single RGB image of the object to be reconstructed. Our planning experiments in simulation and real-world setups indicate that our approach achieves a good trade-off between object reconstruction quality and movement cost.
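The path-planning step described above — connecting all predicted views with the globally shortest path — can be illustrated with a minimal sketch. This is not the authors' implementation (the paper presumably uses a dedicated solver); it is a brute-force shortest-open-path search over a handful of hypothetical viewpoint positions, which is tractable only for the small view budgets typical of one-shot planning:

```python
import itertools
import math

def path_length(order, views):
    """Total Euclidean length of the open path visiting views in the given order."""
    return sum(math.dist(views[a], views[b]) for a, b in zip(order, order[1:]))

def shortest_view_path(views, start=0):
    """Brute-force the globally shortest open path over all predicted views,
    starting from the current camera pose (index `start`)."""
    rest = [i for i in range(len(views)) if i != start]
    best = min(((start,) + p for p in itertools.permutations(rest)),
               key=lambda o: path_length(o, views))
    return best, path_length(best, views)

# Hypothetical predicted view positions on a ring around the object.
views = [(1.0, 0.0, 0.5), (0.0, 1.0, 0.5), (-1.0, 0.0, 0.5), (0.0, -1.0, 0.5)]
order, length = shortest_view_path(views)
```

For larger view sets, the same objective would be handed to an exact or heuristic TSP solver rather than enumerated exhaustively.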