PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection (2312.08371v2)
Abstract: Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach. They generate 3D box candidates from the first-stage dense detector, followed by different temporal aggregation methods. However, these approaches require per-frame objects or whole point clouds, posing challenges related to memory bank utilization. Moreover, point clouds and trajectory features are combined solely based on concatenation, which may neglect effective interactions between them. In this paper, we propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection. To this end, we only utilize point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement. Furthermore, we introduce modules to encode trajectory features, focusing on long short-term and future-aware perspectives, and then effectively aggregate them with point cloud features. We conduct extensive experiments on the large-scale Waymo dataset to demonstrate that our approach performs well against state-of-the-art methods. Code and models will be made publicly available at https://github.com/kuanchihhuang/PTT.
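The memory-saving idea in the abstract (keep only past trajectories per object, not past point clouds) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TrajectoryMemoryBank`, the 7-value box layout, the `max_frames` cap, and the simple long/short split by recency are all assumptions made for illustration. Only the Python standard library is used.

```python
from collections import defaultdict, deque

# Assumed box layout: (x, y, z, length, width, height, heading).
BOX_DIM = 7


class TrajectoryMemoryBank:
    """Hypothetical memory bank that stores past boxes per track, never past points."""

    def __init__(self, max_frames=64):
        self.max_frames = max_frames
        # Each track keeps at most `max_frames` boxes; older ones are dropped.
        self._tracks = defaultdict(lambda: deque(maxlen=max_frames))

    def update(self, track_id, box):
        """Append the current-frame box of one tracked object."""
        if len(box) != BOX_DIM:
            raise ValueError(f"expected {BOX_DIM} box values, got {len(box)}")
        self._tracks[track_id].append(tuple(float(v) for v in box))

    def history(self, track_id):
        """Return the stored boxes for a track, oldest first."""
        return list(self._tracks.get(track_id, ()))

    def split_long_short(self, track_id, short_len=4):
        """Split a trajectory into (older boxes, most recent `short_len` boxes)."""
        if short_len < 1:
            raise ValueError("short_len must be at least 1")
        boxes = self.history(track_id)
        return boxes[:-short_len], boxes[-short_len:]

    def stored_floats(self):
        """Number of floats held in the bank across all tracks."""
        return sum(len(boxes) for boxes in self._tracks.values()) * BOX_DIM


def point_bank_floats(num_frames, points_per_object, point_dim=3):
    """Floats needed if every past frame kept the object's points instead."""
    return num_frames * points_per_object * point_dim


bank = TrajectoryMemoryBank(max_frames=8)
for t in range(10):
    # A toy object moving along x; the size and heading values are made up.
    bank.update("car_0", [float(t), 0.0, 0.0, 4.5, 1.9, 1.6, 0.0])

long_term, short_term = bank.split_long_short("car_0", short_len=3)
trajectory_cost = bank.stored_floats()
point_cost = point_bank_floats(num_frames=8, points_per_object=128)
```

With these toy numbers, the trajectory bank holds 8 boxes of 7 floats for the object, while keeping 128 three-dimensional points for each of the same 8 frames would need far more storage. In a full detector, the long-term and short-term box sequences would be encoded separately and then fused with the current-frame point features; that encoding step is outside this sketch.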