Residual-NeRF: Learning Residual NeRFs for Transparent Object Manipulation (2405.06181v1)
Abstract: Transparent objects are ubiquitous in industry, pharmaceuticals, and households. Grasping and manipulating these objects is a significant challenge for robots. Existing methods have difficulty reconstructing complete depth maps for challenging transparent objects, leaving holes in the depth reconstruction. Recent work has shown that neural radiance fields (NeRFs) work well for depth perception in scenes with transparent objects, and that these depth maps can be used to grasp transparent objects with high accuracy. However, NeRF-based depth reconstruction can still struggle with especially challenging transparent objects and lighting conditions. In this work, we propose Residual-NeRF, a method to improve depth perception and training speed for transparent objects. Robots often operate in the same workspace, such as a kitchen. By first learning a background NeRF of the scene without the transparent objects to be manipulated, we reduce the ambiguity the model faces when learning only the changes introduced by the new object. We propose training two additional networks: a residual NeRF that learns to infer residual RGB values and densities, and a Mixnet that learns how to combine the background and residual NeRFs. We contribute synthetic and real experiments that suggest Residual-NeRF improves depth perception of transparent objects. The results on synthetic data suggest Residual-NeRF outperforms the baselines with a 46.1% lower RMSE and a 29.5% lower MAE. Real-world qualitative experiments suggest Residual-NeRF leads to more robust depth maps with less noise and fewer holes. Website: https://residual-nerf.github.io
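The abstract describes the architecture only at a high level: a background NeRF trained on the empty scene, a residual NeRF that predicts residual densities and RGB, and a Mixnet that combines the two. The PyTorch sketch below illustrates one plausible way these pieces could fit together. The network sizes, the omission of positional encoding and view directions, and the specific weighted-blend combination rule are all illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a Residual-NeRF-style model: a frozen background field, a
# residual field, and a Mixnet that blends them. All layer sizes and the exact
# blending rule are assumptions; the abstract only states that the residual NeRF
# predicts residual RGB/density and that a Mixnet learns how to combine fields.
import torch
import torch.nn as nn


def mlp(in_dim: int, out_dim: int, hidden: int = 64, depth: int = 3) -> nn.Sequential:
    """Small fully connected network used by each field (hypothetical sizes)."""
    layers, dim = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(dim, hidden), nn.ReLU()]
        dim = hidden
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)


class ResidualNeRFSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.background = mlp(3, 4)  # pretrained on the empty scene, then kept fixed
        self.residual = mlp(3, 4)    # learns residual density + RGB for the new object
        self.mixnet = mlp(3, 1)      # learns a per-point blending weight

    def forward(self, xyz: torch.Tensor):
        bg = self.background(xyz)            # (N, 4): background density + RGB
        res = self.residual(xyz)             # (N, 4): residual density + RGB
        w = torch.sigmoid(self.mixnet(xyz))  # (N, 1): mixing weight in [0, 1]
        # Assumed combination: per-point weighted blend of the two fields.
        sigma = torch.relu((1 - w) * bg[:, :1] + w * res[:, :1])
        rgb = torch.sigmoid((1 - w) * bg[:, 1:] + w * res[:, 1:])
        return sigma, rgb


# Usage: query densities and colors for a batch of 3D sample points,
# which would then feed standard NeRF volume rendering.
model = ResidualNeRFSketch()
sigma, rgb = model(torch.rand(1024, 3))
print(sigma.shape, rgb.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])
```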
Authors: Bardienus P. Duisterhof, Yuemin Mao, Si Heng Teng, Jeffrey Ichnowski