LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception (2303.12194v2)

Published 21 Mar 2023 in cs.CV

Abstract: There is a recent trend in the LiDAR perception field towards unifying multiple tasks in a single strong network with improved performance, as opposed to using separate networks for each task. In this paper, we introduce a new LiDAR multi-task learning paradigm based on the transformer. The proposed LiDARFormer utilizes cross-space global contextual feature information and exploits cross-task synergy to boost the performance of LiDAR perception tasks across multiple large-scale datasets and benchmarks. Our novel transformer-based framework includes a cross-space transformer module that learns attentive features between the 2D dense Bird's Eye View (BEV) and 3D sparse voxel feature maps. Additionally, we propose a transformer decoder for the segmentation task to dynamically adjust the learned features by leveraging the categorical feature representations. Furthermore, we combine the segmentation and detection features in a shared transformer decoder with cross-task attention layers to enhance and integrate the object-level and class-level features. LiDARFormer is evaluated on the large-scale nuScenes and Waymo Open datasets for both 3D detection and semantic segmentation tasks, and it outperforms all previously published methods on both tasks. Notably, LiDARFormer achieves state-of-the-art performance of 76.4% L2 mAPH and 74.3% NDS on the challenging Waymo and nuScenes detection benchmarks, respectively, for a single-model LiDAR-only method.

Authors (7)

Zixiang Zhou (22 papers)
Dongqiangzi Ye (5 papers)
Weijia Chen (7 papers)
Yufei Xie (10 papers)
Yu Wang (939 papers)
Panqu Wang (14 papers)
Hassan Foroosh (48 papers)

Citations (9)
