fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence (2407.01781v1)
Abstract: We present fVDB, a novel GPU-optimized framework for deep learning on large-scale 3D data. fVDB provides a complete set of differentiable primitives, such as convolution, pooling, attention, ray tracing, and meshing, for building deep learning architectures for common tasks in 3D learning. fVDB provides a much larger feature set (primitives and operators) than established frameworks with no loss in efficiency: our operators match or exceed the performance of other frameworks with narrower scope. Furthermore, fVDB can process datasets with a much larger footprint and spatial resolution than prior work, while keeping a competitive memory footprint on small inputs. To achieve this combination of versatility and performance, fVDB relies on a single novel VDB index grid acceleration structure paired with several key innovations, including GPU-accelerated sparse grid construction, convolution using Tensor Cores, fast ray-tracing kernels using a Hierarchical Digital Differential Analyzer (HDDA) algorithm, and jagged tensors. Our framework is fully integrated with PyTorch, enabling interoperability with existing pipelines, and we demonstrate its effectiveness on a number of representative tasks such as large-scale point-cloud segmentation, high-resolution 3D generative modeling, unbounded-scale Neural Radiance Fields, and large-scale point-cloud reconstruction.
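To make two of the ideas named in the abstract concrete, here is a minimal CPU sketch of sparse grid construction (quantizing points to the set of occupied voxels) and of a jagged tensor (a batch of variable-length voxel lists packed into one flat list plus offsets). This is not fVDB's API or its GPU implementation: the names `voxelize`, `build_jagged`, and `sample` are hypothetical, and plain Python lists stand in for the VDB index grid and GPU tensors.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Sequence[Point], voxel_size: float) -> List[Voxel]:
    """Return the sorted, de-duplicated integer voxel coordinates hit by points."""
    # Floor division maps each coordinate to the index of the voxel containing it,
    # including for negative coordinates.
    voxels = {tuple(int(c // voxel_size) for c in p) for p in points}
    return sorted(voxels)


def build_jagged(
    batch: Sequence[Sequence[Point]], voxel_size: float
) -> Tuple[List[Voxel], List[int]]:
    """Pack per-sample voxel lists into one flat list plus an offsets list.

    Sample i occupies data[offsets[i]:offsets[i + 1]], so samples with
    different voxel counts share one contiguous buffer without padding.
    """
    data: List[Voxel] = []
    offsets = [0]
    for points in batch:
        data.extend(voxelize(points, voxel_size))
        offsets.append(len(data))
    return data, offsets


def sample(data: List[Voxel], offsets: List[int], i: int) -> List[Voxel]:
    """Return the voxels that belong to the i-th sample of the batch."""
    return data[offsets[i]:offsets[i + 1]]


# Two point clouds of different sizes, voxelized at a 0.5 unit resolution.
batch = [
    [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, 0.0)],
    [(-0.1, 0.0, 0.0)],
]
data, offsets = build_jagged(batch, voxel_size=0.5)
```

In this sketch the first two points of the first cloud fall into the same voxel, so only occupied voxels are stored, and the offsets list is what lets a single flat buffer hold a batch whose samples differ in size.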