Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation (2310.05133v1)

Published 8 Oct 2023 in cs.CV and cs.LG

Abstract: We present a novel approach to perform 3D semantic segmentation solely from 2D supervision by leveraging Neural Radiance Fields (NeRFs). By extracting features along a surface point cloud, we achieve a compact representation of the scene which is sample-efficient and conducive to 3D reasoning. Learning this feature space in an unsupervised manner via masked autoencoding enables few-shot segmentation. Our method is agnostic to the scene parameterization, working on scenes fit with any type of NeRF.
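The abstract does not say how the surface point cloud is extracted or how the masking is applied. The sketch below shows one common way to do each, under our own assumptions; it is not the authors' implementation. Surface points are placed at the expected ray-termination depth, computed from standard NeRF volume-rendering weights. Masking follows the random-subset scheme used by masked autoencoders (MAE), applied here per point, with the reconstruction loss taken only over the hidden points. The function names (`render_weights`, `surface_points`, `random_mask`, `masked_reconstruction_loss`) and the `mask_ratio` parameter are hypothetical. Querying the field for per-point features at the extracted locations is not shown.

```python
import numpy as np


def render_weights(sigmas, t_vals):
    """NeRF volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).

    sigmas, t_vals: arrays of shape (num_rays, num_samples), t_vals sorted per ray.
    """
    deltas = np.diff(t_vals, axis=-1)
    # Treat the last interval as effectively infinite, as in the original NeRF.
    deltas = np.concatenate([deltas, np.full(deltas.shape[:-1] + (1,), 1e10)], axis=-1)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Exclusive cumulative product: T_i = prod_{j<i} (1 - alpha_j).
    trans = np.cumprod(1.0 - alphas + 1e-10, axis=-1)
    trans = np.concatenate([np.ones(trans.shape[:-1] + (1,)), trans[..., :-1]], axis=-1)
    return alphas * trans


def surface_points(origins, dirs, sigmas, t_vals):
    """Place one surface point per ray at the expected termination depth (assumption)."""
    weights = render_weights(sigmas, t_vals)
    depth = np.sum(weights * t_vals, axis=-1) / np.maximum(np.sum(weights, axis=-1), 1e-8)
    return origins + depth[:, None] * dirs


def random_mask(num_points, mask_ratio, rng):
    """MAE-style masking: hide a random subset of points; return (visible, masked) indices."""
    num_masked = int(round(num_points * mask_ratio))
    perm = rng.permutation(num_points)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])


def masked_reconstruction_loss(pred, target, masked_idx):
    """Mean squared error computed only on the masked points, as in MAE."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    # One ray along +z with an opaque sample at t = 2.0.
    t = np.linspace(0.0, 4.0, 41)[None, :]
    sig = np.zeros_like(t)
    sig[0, 20] = 1e3
    pts = surface_points(np.zeros((1, 3)), np.array([[0.0, 0.0, 1.0]]), sig, t)
    print(pts)  # approximately (0, 0, 2)

    rng = np.random.default_rng(0)
    visible, masked = random_mask(100, 0.6, rng)
    print(len(visible), len(masked))
```

Using the expected depth gives a single point per ray and is cheap to compute. Taking the sample with the largest weight, or thresholding accumulated opacity, are common alternatives that behave better when a ray's weight is spread over several surfaces.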
