LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels (2403.15173v1)
Abstract: Autonomous systems need to process large-scale, sparse, and irregular point clouds with limited compute resources. Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective. Although naively enlarging the 3D kernel size can enhance performance, it also leads to a cubically increasing overhead, because the number of weights in a 3D kernel grows with the cube of its side length. It is therefore crucial to develop streamlined 3D large-kernel designs that eliminate redundant weights and work effectively with larger kernels. In this paper, we propose an efficient and effective Large Sparse Kernel 3D Neural Network (LSK3DNet) that leverages dynamic pruning to amplify the 3D kernel size. Our method comprises two core components: Spatial-wise Dynamic Sparsity (SDS) and Channel-wise Weight Selection (CWS). SDS dynamically prunes and regrows volumetric weights from the start of training to learn a large sparse 3D kernel. It not only boosts performance but also significantly reduces model size and computational cost. Moreover, CWS selects the most important channels for 3D convolution during training and subsequently prunes the redundant channels to accelerate inference for 3D vision tasks. We demonstrate the effectiveness of LSK3DNet on three benchmark datasets and five tracks, comparing it with classical models and large-kernel designs. Notably, LSK3DNet achieves state-of-the-art performance on SemanticKITTI (75.6% on single-scan and 63.4% on multi-scan), with a roughly 40% smaller model and roughly 60% fewer computing operations than the naive large 3D kernel model.
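The cubic overhead follows from simple arithmetic: a dense 3D convolution with side length k holds k^3 · C_in · C_out weights, so enlarging the kernel from 3 to 9 multiplies the parameter count by 27. The sketch below works this out. The channel width of 64 and the 60% sparsity level are hypothetical values chosen for illustration, not the paper's configuration, and the helper names are illustrative only.

```python
def conv3d_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a dense k x k x k convolution (bias ignored)."""
    return kernel_size ** 3 * in_channels * out_channels


def sparse_weight_count(kernel_size: int, in_channels: int, out_channels: int,
                        sparsity: float) -> int:
    """Weights left after pruning a fraction `sparsity` of a dense kernel."""
    dense = conv3d_weight_count(kernel_size, in_channels, out_channels)
    return round(dense * (1.0 - sparsity))


if __name__ == "__main__":
    channels = 64  # hypothetical channel width
    small = conv3d_weight_count(3, channels, channels)
    large = conv3d_weight_count(9, channels, channels)
    print(small, large, large // small)
    # Hypothetical 60% sparsity applied to the 9x9x9 kernel.
    print(sparse_weight_count(9, channels, channels, sparsity=0.6))
```

At this hypothetical sparsity the 9x9x9 kernel still carries roughly 11 times the weights of a dense 3x3x3 kernel, so spatial sparsity alone reduces, but does not remove, the cost of a large kernel.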
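SDS is described here only at a high level. The following NumPy sketch illustrates the general prune-and-regrow cycle of dynamic sparse training on a single 3D kernel, assuming magnitude-based pruning and random regrowth in the style of SET; the paper's actual pruning and regrowth criteria, update schedule, and sparsity level may differ. The kernel shape, the sparsity of 0.6, the update fraction of 0.3, and the names `init_mask` and `prune_and_regrow` are hypothetical, not the authors' API.

```python
import numpy as np


def init_mask(shape, sparsity, rng):
    """Random binary mask keeping a (1 - sparsity) fraction of positions.

    The kernel is sparse from initialization rather than pruned from a dense model.
    """
    n = int(np.prod(shape))
    n_keep = int(round(n * (1.0 - sparsity)))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=n_keep, replace=False)] = True
    return flat.reshape(shape)


def prune_and_regrow(weights, mask, update_fraction, rng):
    """One dynamic-sparsity step (assumed: magnitude prune, random regrow).

    Deactivates the smallest-magnitude active weights, then activates the same
    number of previously inactive positions, so the sparsity level is unchanged.
    """
    flat_w = weights.ravel().copy()
    flat_m = mask.ravel().copy()
    active = np.flatnonzero(flat_m)
    inactive = np.flatnonzero(~flat_m)
    n_update = min(int(round(len(active) * update_fraction)), len(inactive))
    if n_update == 0:
        return weights.copy(), mask.copy()

    # Prune: drop the active weights with the smallest |w|.
    pruned = active[np.argsort(np.abs(flat_w[active]))[:n_update]]
    flat_m[pruned] = False
    flat_w[pruned] = 0.0

    # Regrow: pick random positions that were inactive before this step.
    grown = rng.choice(inactive, size=n_update, replace=False)
    flat_m[grown] = True
    flat_w[grown] = 0.0  # assumed zero initialization for regrown weights

    return flat_w.reshape(weights.shape), flat_m.reshape(mask.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 4, 9, 9, 9)  # hypothetical (C_out, C_in, k, k, k)
    mask = init_mask(shape, sparsity=0.6, rng=rng)
    weights = rng.standard_normal(shape) * mask
    new_w, new_mask = prune_and_regrow(weights, mask, update_fraction=0.3, rng=rng)
    print(mask.sum(), new_mask.sum(), int((mask != new_mask).sum()))
```

In real training, gradient updates between successive prune-and-regrow steps would move regrown weights away from zero; calling the function twice in a row without training would simply prune them again.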
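CWS is likewise summarized only briefly. One plausible reading, assumed here rather than taken from the paper, is to score each output channel by the L1 norm of its weights during training, keep the top fraction, and then slice both this layer's output channels and the next layer's matching input channels for inference. The L1 criterion, the keep ratio, and the helper names `select_channels` and `prune_channels` are hypothetical.

```python
import numpy as np


def select_channels(weights, keep_ratio):
    """Indices of the output channels with the largest L1 weight norm.

    `weights` has shape (C_out, C_in, k, k, k); the L1 criterion is an assumption.
    """
    c_out = weights.shape[0]
    scores = np.abs(weights).reshape(c_out, -1).sum(axis=1)
    k = max(1, int(round(c_out * keep_ratio)))
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top)  # preserve the original channel order


def prune_channels(weights, next_weights, keep):
    """Slice this layer's output channels and the next layer's input channels."""
    return weights[keep], next_weights[:, keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((8, 4, 3, 3, 3))   # hypothetical layer
    w2 = rng.standard_normal((16, 8, 3, 3, 3))  # hypothetical following layer
    keep = select_channels(w1, keep_ratio=0.5)
    w1_small, w2_small = prune_channels(w1, w2, keep)
    print(keep, w1_small.shape, w2_small.shape)
```

Slicing the following layer is what turns channel selection into an actual inference saving; in a network with residual connections, every consumer of the pruned tensor would need the same slicing.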