360Roam: Real-Time Indoor Roaming Using Geometry-Aware 360$^\circ$ Radiance Fields (2208.02705v2)
Abstract: Virtual tours built from sparse 360$^\circ$ images are widely used, yet they hinder smooth and immersive roaming experiences. The emergence of Neural Radiance Fields (NeRF) has brought significant progress to novel view synthesis, unlocking the potential for immersive scene exploration. Nevertheless, previous NeRF works focused primarily on object-centric scenarios and degrade noticeably on outward-facing, large-scale scenes due to limitations in scene parameterization. To achieve seamless and real-time indoor roaming, we propose a novel approach based on geometry-aware radiance fields with adaptively assigned local radiance fields. First, we use multiple 360$^\circ$ images of an indoor scene to progressively reconstruct explicit geometry in the form of a probabilistic occupancy map, derived from a global omnidirectional radiance field. We then assign local radiance fields through an adaptive divide-and-conquer strategy based on the recovered geometry. By combining geometry-aware sampling with this decomposition of the global radiance field, our system exploits positional encoding and compact neural networks to improve both rendering quality and speed. In addition, a floorplan extracted from the scene provides visual guidance, contributing to a realistic roaming experience. To demonstrate the effectiveness of our system, we curated a diverse dataset of 360$^\circ$ images covering a variety of real-life scenes and conducted extensive experiments on it. Quantitative and qualitative comparisons against baseline approaches demonstrate the superior performance of our system for large-scale indoor scene roaming.
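To make the geometry-aware machinery in the abstract concrete, below is a minimal Python sketch of two of its ingredients: fusing per-ray depth estimates rendered from the global radiance field into a probabilistic occupancy grid via a log-odds update (in the spirit of OctoMap), and restricting render-time sampling to occupied voxels. Everything here, from the class and function names to the update constants, is an illustrative assumption rather than the paper's released implementation.

```python
# Sketch: probabilistic occupancy map + geometry-aware ray sampling.
# Assumed log-odds update rule and constants (classic occupancy mapping);
# not the paper's actual code.
import numpy as np

L_OCC, L_FREE = 0.85, -0.4   # assumed occupied/free log-odds increments
L_MIN, L_MAX = -4.0, 4.0     # clamping bounds to keep the map responsive


class OccupancyGrid:
    def __init__(self, resolution, voxel_size, origin):
        self.logodds = np.zeros(resolution, dtype=np.float32)
        self.voxel_size = voxel_size
        self.origin = np.asarray(origin, dtype=np.float32)

    def world_to_index(self, pts):
        idx = np.floor((pts - self.origin) / self.voxel_size).astype(int)
        return np.clip(idx, 0, np.array(self.logodds.shape) - 1)

    def update_ray(self, origin, direction, depth, step=None):
        """Fuse one ray whose depth was rendered from the global radiance
        field: voxels before the surface get a 'free' update, the terminal
        voxel an 'occupied' one."""
        step = step or self.voxel_size
        ts = np.arange(step, depth, step)
        for i, j, k in self.world_to_index(origin + ts[:, None] * direction):
            self.logodds[i, j, k] = np.clip(
                self.logodds[i, j, k] + L_FREE, L_MIN, L_MAX)
        i, j, k = self.world_to_index(origin + depth * direction)
        self.logodds[i, j, k] = np.clip(
            self.logodds[i, j, k] + L_OCC, L_MIN, L_MAX)

    def occupied(self, pts, threshold=0.0):
        i = self.world_to_index(pts)
        return self.logodds[i[:, 0], i[:, 1], i[:, 2]] > threshold


def geometry_aware_samples(grid, origin, direction, near, far,
                           n_coarse=128, n_keep=32):
    """Place coarse candidates along a ray, then keep only those inside
    occupied voxels, so compact networks are evaluated near surfaces
    instead of in empty space."""
    ts = np.linspace(near, far, n_coarse)
    pts = origin + ts[:, None] * direction
    kept = ts[grid.occupied(pts)][:n_keep]
    # Fall back to uniform samples if the ray misses the recovered geometry.
    return kept if kept.size else np.linspace(near, far, n_keep)
```

Progressively replaying all training rays through `update_ray` yields the explicit occupancy map the abstract describes; `geometry_aware_samples` then concentrates the sample budget where the geometry actually is.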
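The adaptive divide-and-conquer assignment of local radiance fields can likewise be pictured as a recursive split of the occupied extent, with one compact network per resulting cell. The split criterion, thresholds, and the `make_field` factory below are hypothetical stand-ins for illustration; the paper does not prescribe this exact procedure.

```python
# Sketch: adaptive divide-and-conquer assignment of local radiance fields.
# Split rule and constants are assumptions, not the paper's method.
import numpy as np


def split_adaptively(bounds, occupied_fraction,
                     max_extent=3.0, depth=0, max_depth=3):
    """Recursively halve an axis-aligned box along its longest axis until
    each cell is small enough; skip cells with negligible occupancy."""
    lo, hi = bounds
    if occupied_fraction(lo, hi) < 0.01:
        return []                              # empty space: no local field
    if depth == max_depth or np.max(hi - lo) <= max_extent:
        return [(lo, hi)]                      # leaf cell gets one local field
    axis = int(np.argmax(hi - lo))
    mid = (lo[axis] + hi[axis]) / 2.0
    hi_l, lo_r = hi.copy(), lo.copy()
    hi_l[axis], lo_r[axis] = mid, mid
    return (split_adaptively((lo, hi_l), occupied_fraction,
                             max_extent, depth + 1, max_depth)
            + split_adaptively((lo_r, hi), occupied_fraction,
                               max_extent, depth + 1, max_depth))


class LocalFieldDispatcher:
    """Maps a sample point to the compact radiance field owning its cell."""

    def __init__(self, cells, make_field):
        self.cells = cells
        self.fields = [make_field() for _ in cells]  # e.g., one tiny MLP each

    def query(self, pt):
        for (lo, hi), field in zip(self.cells, self.fields):
            if np.all(pt >= lo) and np.all(pt <= hi):
                return field                   # evaluate this local field at pt
        return None                            # outside the recovered geometry
```

Because each cell is small, each local field can use a short positional encoding and a compact network, which is where the rendering-speed gains claimed in the abstract would come from under this reading.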