Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation (2310.05133v1)
Published 8 Oct 2023 in cs.CV and cs.LG
Abstract: We present a novel approach to perform 3D semantic segmentation solely from 2D supervision by leveraging Neural Radiance Fields (NeRFs). By extracting features along a surface point cloud, we achieve a compact representation of the scene which is sample-efficient and conducive to 3D reasoning. Learning this feature space in an unsupervised manner via masked autoencoding enables few-shot segmentation. Our method is agnostic to the scene parameterization, working on scenes fit with any type of NeRF.
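The abstract does not say how the surface point cloud is obtained from a fitted NeRF. One common option, shown here purely as an assumption, is to take the expected ray-termination depth under the standard volume-rendering weights and place one point per ray at that depth. The function name `surface_points_from_density` and all of its parameters are hypothetical and are not taken from the paper. Because it needs only densities sampled along rays, the sketch is independent of how the field itself is parameterized.

```python
import numpy as np

def surface_points_from_density(origins, directions, t_vals, sigmas):
    """Estimate one surface point per ray from sampled densities (illustrative only).

    origins, directions: (R, 3) ray origins and unit directions.
    t_vals: (R, S) increasing sample distances along each ray, S >= 2.
    sigmas: (R, S) non-negative densities queried from any radiance field.
    """
    # Spacing between consecutive samples; the last interval reuses the previous one.
    deltas = np.diff(t_vals, axis=-1)
    deltas = np.concatenate([deltas, deltas[:, -1:]], axis=-1)

    # Per-sample opacity from the standard volume-rendering model.
    alpha = 1.0 - np.exp(-sigmas * deltas)

    # Transmittance: fraction of light reaching sample i without being absorbed earlier.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=-1)

    weights = trans * alpha
    opacity = weights.sum(axis=-1)

    # Expected termination depth, normalized so partially transparent rays are not biased toward 0.
    depth = (weights * t_vals).sum(axis=-1) / np.maximum(opacity, 1e-10)
    points = origins + directions * depth[:, None]
    return points, opacity


if __name__ == "__main__":
    # Toy scene (an assumption for illustration): empty space followed by a dense wall near t = 2.
    t = np.linspace(0.0, 4.0, 401)[None, :]
    sigma = np.where(t >= 2.0, 1e3, 0.0)
    o = np.zeros((1, 3))
    d = np.array([[0.0, 0.0, 1.0]])
    pts, acc = surface_points_from_density(o, d, t, sigma)
    print(pts, acc)
```

In practice, rays with low accumulated opacity would probably be discarded before any features are attached to the points, since they never hit a surface.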