LinK: Linear Kernel for LiDAR-based 3D Perception (2303.16094v1)

Published 28 Mar 2023 in cs.CV and cs.AI

Abstract: Extending the success of 2D Large Kernel to 3D perception is challenging due to: 1. the cubically-increasing overhead in processing 3D data; 2. the optimization difficulties from data scarcity and sparsity. Previous work has taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by introducing block-shared weights. However, to reduce the feature variations within a block, it only employs modest block size and fails to achieve larger kernels like the 21x21x21. To address this issue, we propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs. The first is to replace the static kernel matrix with a linear kernel generator, which adaptively provides weights only for non-empty voxels. The second is to reuse the pre-computed aggregation results in the overlapped blocks to reduce computation complexity. The proposed method successfully enables each voxel to perceive context within a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D object detection and 3D semantic segmentation, demonstrate the effectiveness of our method. Notably, we rank 1st on the public leaderboard of the 3D detection benchmark of nuScenes (LiDAR track), by simply incorporating a LinK-based backbone into the basic detector, CenterPoint. We also boost the strong segmentation baseline's mIoU with 2.7% in the SemanticKITTI test set. Code is available at https://github.com/MCG-NJU/LinK.

Authors (5)

Tao Lu (72 papers)
Xiang Ding (6 papers)
Haisong Liu (7 papers)
Gangshan Wu (70 papers)
Limin Wang (221 papers)

Citations (25)

Summary

The paper presents LinK, which replaces the static convolution kernel with a linear kernel generator so that each voxel can perceive context within a 21×21×21 range.
It generates weights only for non-empty voxels and reuses pre-computed aggregation results across overlapping blocks, which lowers the computational cost on sparse data.
With a LinK-based backbone in CenterPoint, the authors report first place on the public nuScenes 3D detection leaderboard (LiDAR track) and a 2.7% mIoU gain over a strong segmentation baseline on the SemanticKITTI test set.

Overview of LinK: Linear Kernel for LiDAR-based 3D Perception

The paper introduces LinK, a method for enlarging the receptive field of LiDAR-based 3D perception models. It targets two obstacles to carrying large-kernel designs over from 2D to 3D: the cubic growth of computational overhead with kernel size, and the optimization difficulties caused by data scarcity and sparsity. Earlier work scaled kernels from 3×3×3 to 7×7×7 using block-shared weights but did not reach sizes such as 21×21×21. LinK lets each voxel perceive context within a $21 \times 21 \times 21$ range in a convolution-like manner.
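To make the first challenge concrete: the number of positions in a dense cubic kernel grows with the cube of its edge length. The short Python sketch below counts positions for the kernel sizes named in the abstract. The function name is illustrative and does not come from the paper or its code.

```python
def kernel_positions(edge: int) -> int:
    """Number of spatial positions in a dense edge x edge x edge kernel."""
    return edge ** 3


for edge in (3, 7, 21):
    print(f"{edge}x{edge}x{edge}: {kernel_positions(edge)} positions")
# 3x3x3: 27 positions
# 7x7x7: 343 positions
# 21x21x21: 9261 positions
```

A static dense kernel stores a weight per position (per channel pair), so moving from 3×3×3 to 21×21×21 multiplies the position count by 343.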

Methodology

The LinK method introduces two core designs:

Linear Kernel Generator: Instead of storing a static kernel matrix, LinK produces kernel weights with a linear kernel generator. Weights are generated on the fly and only for non-empty voxels, so no computation is spent on the empty positions that dominate sparse LiDAR data. Because the weights come from a generator rather than a stored table, the number of learnable parameters does not have to grow with the kernel's extent.
Reusing Aggregation Results: The receptive ranges of nearby voxels share blocks, so LinK pre-computes aggregation results once and reuses them wherever blocks overlap. This reduces computational complexity and keeps the cost from growing cubically as the kernel is enlarged.
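The two designs above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a purely linear generator `w(d) = A . d + B` on scalar features, a block edge of 7 voxels, and a block-aligned neighborhood of 3 blocks per axis (3 × 7 = 21). All names and constants are hypothetical, and the paper's actual generator and block layout may differ. Under these assumptions, linearity lets the weight split into a term that depends only on the neighbor and a term that depends only on the query voxel, so per-block sums can be computed once and shared.

```python
from collections import defaultdict
from itertools import product

# Hypothetical parameters of a purely linear kernel generator: w(d) = A . d + B
A = (0.5, -0.25, 0.125)
B = 1.0
BLOCK = 7   # assumed block edge length, in voxels
REACH = 1   # blocks on each side: (2 * 1 + 1) * 7 = 21 voxels per axis


def dot(a, p):
    return sum(x * y for x, y in zip(a, p))


def block_of(p):
    """Index of the block containing voxel coordinate p."""
    return tuple(c // BLOCK for c in p)


def neighbor_blocks(p):
    """The 3 x 3 x 3 blocks centered on the block that contains p."""
    bx, by, bz = block_of(p)
    rng = range(-REACH, REACH + 1)
    return [(bx + dx, by + dy, bz + dz) for dx, dy, dz in product(rng, rng, rng)]


def direct(voxels):
    """Reference: generate one weight per (query, non-empty neighbor) pair."""
    out = {}
    for p in voxels:
        blocks = set(neighbor_blocks(p))
        total = 0.0
        for q, feat in voxels.items():
            if block_of(q) in blocks:
                offset = tuple(qc - pc for qc, pc in zip(q, p))
                total += (dot(A, offset) + B) * feat
        out[p] = total
    return out


def with_reuse(voxels):
    """Same result, but each block is aggregated once and then reused."""
    s0 = defaultdict(float)  # per-block sum of features
    s1 = defaultdict(float)  # per-block sum of (A . q + B) * feature
    for q, feat in voxels.items():
        b = block_of(q)
        s0[b] += feat
        s1[b] += (dot(A, q) + B) * feat
    out = {}
    for p in voxels:
        blocks = neighbor_blocks(p)
        sum0 = sum(s0.get(b, 0.0) for b in blocks)
        sum1 = sum(s1.get(b, 0.0) for b in blocks)
        # A . (q - p) + B == (A . q + B) - A . p, so the query term factors out
        out[p] = sum1 - dot(A, p) * sum0
    return out


# Only non-empty voxels are stored: integer coordinate -> scalar feature
voxels = {(0, 0, 0): 1.0, (3, 5, 2): 2.0, (10, 0, 0): 0.5, (40, 40, 40): 4.0}
assert all(abs(direct(voxels)[p] - with_reuse(voxels)[p]) < 1e-9 for p in voxels)
```

In this sketch the per-query work in `with_reuse` depends on the number of blocks (27 here) rather than on the number of voxel positions in the 21×21×21 range, which is the intuition behind reusing block-level aggregation.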

Empirical Findings

The authors validate LinK on two basic perception tasks, 3D object detection and 3D semantic segmentation. With a LinK-based backbone incorporated into the basic detector CenterPoint, the method ranked first on the public nuScenes 3D detection leaderboard (LiDAR track) at the time of the paper. For segmentation, it raises the mean Intersection over Union (mIoU) of a strong baseline by 2.7% on the SemanticKITTI test set.
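For readers unfamiliar with the segmentation metric, mIoU averages the per-class intersection over union, TP / (TP + FP + FN). The Python sketch below is a generic illustration on toy labels. The function name and the choice to skip classes absent from both prediction and ground truth are assumptions, and benchmark toolkits may apply additional rules such as ignored labels.

```python
def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2
print(mean_iou(pred, target, num_classes=3))
```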

Performance and Implications

LinK extends the receptive field without a proportional increase in computation. This matters in settings such as autonomous driving, where context from a wider spatial extent can inform downstream decisions. Generating weights only for non-empty voxels and reusing aggregation results are the two mechanisms behind the efficiency gain, and the authors present them as a response to the optimization difficulties that scarce, sparse data poses for large static kernels.

Future Developments

Further research could combine LinK with other architectures, such as Transformer-based models, to test how broadly it applies in 3D perception. The paper also invites work on handling sparsity in 3D data more efficiently, including large-scale 3D datasets outside autonomous-vehicle perception.

In conclusion, LinK expands the effective receptive field of LiDAR-based models to a 21×21×21 range by combining a linear kernel generator with aggregation reuse, and it reports gains on both 3D detection and 3D semantic segmentation benchmarks.

GitHub

GitHub - MCG-NJU/LinK: [CVPR 2023] LinK: Linear Kernel for LiDAR-based 3D Perception (88 stars)