SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects (2403.19607v1)
Abstract: Acquiring accurate depth information of transparent objects with off-the-shelf RGB-D cameras is a well-known challenge in Computer Vision and Robotics. Depth estimation/completion methods are typically employed and trained on datasets whose quality depth labels come from simulation, additional sensors, or specialized data collection setups with known 3D models. However, acquiring reliable depth labels at scale is not straightforward, which limits training scalability and generalization. Neural Radiance Fields (NeRFs) are learning-free approaches that have demonstrated wide success in novel view synthesis and shape recovery, but they often require heuristics and controlled environments (lighting, backgrounds, etc.) to accurately capture specular surfaces. In this paper, we propose using Visual Foundation Models (VFMs) for segmentation in a zero-shot, label-free way to guide the NeRF reconstruction process for these objects by simultaneously reconstructing semantic fields, together with extensions that increase robustness. Our proposed method, Segmentation-AIDed NeRF (SAID-NeRF), shows strong performance on depth completion datasets for transparent objects and in robotic grasping.
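The core idea the abstract describes, volume-rendering a semantic field alongside color and supervising it with zero-shot VFM masks so that transparent regions are reconstructed more reliably, can be sketched as a per-ray training loss. This is a minimal illustrative sketch, not the authors' implementation; the names `said_nerf_loss`, `composite`, and `lambda_sem` are hypothetical, and the VFM mask is assumed to be a per-pixel class label produced offline by a model such as SAM.

```python
# Minimal sketch (not the authors' code): couple NeRF photometric supervision
# with a semantic field supervised by zero-shot segmentation masks.
import torch
import torch.nn.functional as F

def composite(weights, per_sample_values):
    """Volume-render per-sample values (colors or semantic logits) along rays.
    weights: (num_rays, num_samples) rendering weights from the density field.
    per_sample_values: (num_rays, num_samples, channels)."""
    return (weights.unsqueeze(-1) * per_sample_values).sum(dim=1)

def said_nerf_loss(weights, sample_rgb, sample_sem_logits,
                   gt_rgb, vfm_mask, lambda_sem=0.04):
    """Photometric MSE plus cross-entropy against a VFM (e.g. SAM) mask.
    vfm_mask: (num_rays,) integer class labels from zero-shot segmentation.
    lambda_sem is a hypothetical weighting, not a value from the paper."""
    rgb = composite(weights, sample_rgb)                 # (num_rays, 3)
    sem_logits = composite(weights, sample_sem_logits)   # (num_rays, num_classes)
    loss_rgb = F.mse_loss(rgb, gt_rgb)
    loss_sem = F.cross_entropy(sem_logits, vfm_mask)
    return loss_rgb + lambda_sem * loss_sem

# Example with random tensors standing in for one training batch of rays.
R, S, C = 1024, 64, 2                       # rays, samples per ray, classes
weights = torch.rand(R, S).softmax(dim=1)   # stand-in for rendering weights
loss = said_nerf_loss(weights, torch.rand(R, S, 3), torch.randn(R, S, C),
                      torch.rand(R, 3), torch.randint(0, C, (R,)))
```

In this reading, the segmentation term gives the density field a supervision signal on transparent surfaces where the photometric term alone is ambiguous; depth would then be obtained from the rendering weights of the converged field.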