DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid (2405.04416v2)
Abstract: Neural Radiance Field (NeRF) achieves extremely high quality in object-scale and indoor scene reconstruction. However, reconstructing large-scale scenes remains challenging. MLP-based NeRFs suffer from limited network capacity, while volume-based NeRFs consume heavy memory as scene resolution increases. Recent approaches partition the scene geographically and learn each sub-region with an individual NeRF. Such partitioning lets volume-based NeRFs exceed the memory limit of a single GPU and scale to larger scenes. However, it requires multiple background NeRFs to handle out-of-partition rays, which leads to redundant learning. Inspired by the fact that the background of the current partition is the foreground of an adjacent partition, we propose DistGrid, a scalable scene reconstruction method based on joint Multi-resolution Hash Grids. In this method, the scene is divided into multiple tightly tiled, non-overlapping Axis-Aligned Bounding Boxes (AABBs), and a novel segmented volume rendering method handles cross-boundary rays, thereby eliminating the need for background NeRFs. Experiments demonstrate that our method outperforms existing methods on all evaluated large-scale scenes and provides visually plausible scene reconstruction. The scalability of our method in terms of reconstruction quality is further evaluated qualitatively and quantitatively.
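The abstract does not spell out how segmented volume rendering works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each partition renders the ray segment inside its own Axis-Aligned Bounding Box and returns an accumulated color plus the transmittance remaining at the segment's end, and that segments are then merged front to back. The function names (`ray_aabb`, `render_segment`, `composite_segments`), the scalar grayscale color, and the uniform step size are all hypothetical choices made for illustration.

```python
import math

def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) for the part of the ray inside the box,
    or None if the ray misses it. Only t >= 0 is considered."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it must already lie between the planes.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return (t_near, t_far) if t_near < t_far else None

def render_segment(sigmas, colors, delta):
    """Standard volume rendering quadrature over one ray segment.
    Returns (accumulated color, transmittance left at the end of the segment)."""
    color, trans = 0.0, 1.0
    for sigma, c in zip(sigmas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        color += trans * alpha * c
        trans *= 1.0 - alpha
    return color, trans

def composite_segments(segments):
    """Merge per-partition (color, transmittance) pairs in front-to-back order."""
    color, trans = 0.0, 1.0
    for seg_color, seg_trans in segments:
        color += trans * seg_color  # attenuate by everything in front
        trans *= seg_trans
    return color, trans

# Hypothetical usage: two adjacent unit boxes along x, one ray crossing both.
origin, direction = (-0.5, 0.5, 0.5), (1.0, 0.0, 0.0)
span_a = ray_aabb(origin, direction, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
span_b = ray_aabb(origin, direction, (1.0, 0.0, 0.0), (2.0, 1.0, 1.0))

# Toy densities and colors; a real system would query each partition's hash grid.
seg_a = render_segment([0.5, 1.0], [0.1, 0.9], delta=0.25)
seg_b = render_segment([2.0, 0.3], [0.4, 0.7], delta=0.25)
pixel, remaining = composite_segments([seg_a, seg_b])
```

Because front-to-back compositing is associative, merging the per-box results gives the same pixel value as rendering all samples along the ray in a single pass. Under the assumptions above, this is why a neighboring partition's foreground can stand in for the current partition's background.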
- Sidun Liu
- Peng Qiao
- Zongxin Ye
- Wenyu Li
- Yong Dou