IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model (2403.12682v1)
Abstract: We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real-time and eliminates the need for an initial pose guess that is proximate to the sought solution. IFFNeRF utilizes the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and deduce the color for each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a least-squares problem by selecting correspondences between the query image and the resulting ray bundle. We facilitate this process through a learned attention mechanism, bridging the query image embedding with the embedding of parameterized rays, thereby matching rays pertinent to the image. Through synthetic and real evaluation settings, we show that our method reduces angular and translation errors by 80.1% and 67.3%, respectively, compared to iNeRF, while performing at 34 fps on consumer hardware and not requiring an initial pose guess.
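The abstract's first step, sampling surface points from a NeRF via the Metropolis-Hastings algorithm, can be illustrated with a minimal sketch. This is not the paper's implementation: the trained NeRF density field is replaced here by a hypothetical toy Gaussian density, and the proposal distribution, step size, and burn-in are illustrative assumptions.

```python
import numpy as np

def metropolis_hastings_samples(density, n_samples, bounds=(-1.0, 1.0),
                                step=0.1, burn_in=200, seed=0):
    """Draw 3D points distributed proportionally to `density` using
    Metropolis-Hastings with a Gaussian random-walk proposal."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=3)  # initial state inside the volume
    p = density(x)
    samples = []
    for i in range(burn_in + n_samples):
        # Propose a small random move, kept inside the scene bounds
        x_new = np.clip(x + rng.normal(scale=step, size=3), lo, hi)
        p_new = density(x_new)
        # Accept with probability min(1, p_new / p)
        if p == 0.0 or rng.uniform() < p_new / p:
            x, p = x_new, p_new
        if i >= burn_in:  # discard warm-up iterations
            samples.append(x.copy())
    return np.array(samples)

# Toy stand-in for a trained NeRF's volumetric density:
# a single Gaussian blob centered at the origin.
def toy_density(x):
    return np.exp(-8.0 * np.dot(x, x))

pts = metropolis_hastings_samples(toy_density, n_samples=2000)
```

With a real NeRF, `density` would query the model's volumetric density (e.g. from an Instant-NGP hash grid), so accepted samples concentrate on high-density surface regions, which is what makes the subsequent ray casting efficient.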
- P. Marion, P. Florence, L. Manuelli, and R. Tedrake, “Label fusion: A pipeline for generating ground truth labels for real rgbd data of cluttered scenes,” in ICRA, 2018.
- L. Manuelli, W. Gao, P. R. Florence, and R. Tedrake, “kpam: Keypoint affordances for category-level robotic manipulation,” in ISRR, 2019.
- Y. Xu, W. Wan, J. Zhang, H. Liu, Z. Shan, H. Shen, R. Wang, H. Geng, Y. Weng, J. Chen, et al., “Unidexgrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy,” in CVPR, 2023.
- S. Rajeev, Q. Wan, K. Yau, K. Panetta, and S. Agaian, “Augmented reality-based vision-aid indoor navigation system in gps denied environment,” in Mobile Multimedia/Image Processing, Security, and Applications, 2019.
- B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.
- L. Yen-Chen, P. Florence, J. T. Barron, A. Rodriguez, P. Isola, and T.-Y. Lin, “iNeRF: Inverting neural radiance fields for pose estimation,” in IROS, 2021.
- Y. Lin, T. Müller, J. Tremblay, B. Wen, S. Tyree, A. Evans, P. A. Vela, and S. Birchfield, “Parallel inversion of neural radiance fields for robust pose estimation,” in ICRA, 2023.
- M. Bortolon, A. Del Bue, and F. Poiesi, “VM-NeRF: Tackling Sparsity in NeRF with View Morphing,” in ICIAP, 2023.
- N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” The Journal of Chemical Physics, vol. 21, no. 6, 1953.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017.
- Y. Zhu, M. Li, W. Yao, and C. Chen, “A review of 6d object pose estimation,” in ITAIC, 2022.
- G. Marullo, L. Tanzi, P. Piazzolla, and E. Vezzetti, “6d object position estimation from 2d images: a literature review,” Multimedia Tools and Applications, vol. 82, no. 16, 2023.
- D. Maggio, M. Abate, J. Shi, C. Mario, and L. Carlone, “Loc-nerf: Monte carlo localization using neural radiance fields,” in ICRA, 2023.
- A. Moreau, N. Piasco, M. Bennehar, D. Tsishkou, B. Stanciulescu, and A. de La Fortelle, “Crossfire: Camera relocalization on self-supervised features from an implicit representation,” in ICCV, 2023.
- D. Verbin, P. Hedman, B. Mildenhall, T. Zickler, J. T. Barron, and P. P. Srinivasan, “Ref-nerf: Structured view-dependent appearance for neural radiance fields,” in CVPR, 2022.
- A. Chen, Z. Xu, A. Geiger, J. Yu, and H. Su, “Tensorf: Tensorial radiance fields,” in ECCV, 2022.
- T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM Trans. Graph., vol. 41, no. 4, 2022.
- L. Masset, O. Brüls, and G. Kerschen, “Partition of the circle in cells of equal area and shape,” Structural Dynamics Research Group, Aerospace and Mechanical Engineering Department, University of Liège, Institut de Mécanique et Génie Civil (B52/3), Tech. Rep., 2011.
- B. Beckers and P. Beckers, “Fast and accurate view factor generation,” in FICUP, An International Conference on Urban Physics, 2016.
- T. Malley, “A shading method for computer generated images,” Master’s thesis, Dept. of Computer Science, University of Utah, 1988.
- L. Jacques, L. Masset, and G. Kerschen, “Direction and surface sampling in ray tracing for spacecraft radiative heat transfer,” Aerospace Science and Technology, vol. 47, 2015.
- T. Tsesmelis, I. Hasan, M. Cristani, A. Del Bue, and F. Galasso, “RGBD2lux: Dense light intensity estimation with an RGBD sensor,” in WACV, 2018.
- M. Tancik, P. P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. T. Barron, and R. Ng, “Fourier features let networks learn high frequency functions in low dimensional domains,” in NeurIPS, 2020.
- M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, “Dinov2: Learning robust visual features without supervision,” arXiv:2304.07193, 2023.
- A. Knapitsch, J. Park, Q.-Y. Zhou, and V. Koltun, “Tanks and temples: Benchmarking large-scale scene reconstruction,” ACM Transactions on Graphics, vol. 36, no. 4, 2017.
- L. Liu, J. Gu, K. Z. Lin, T.-S. Chua, and C. Theobalt, “Neural sparse voxel fields,” in NeurIPS, 2020.