MotionTrack: End-to-End Transformer-based Multi-Object Tracking with LiDAR-Camera Fusion (2306.17000v1)

Published 29 Jun 2023 in cs.CV

Abstract: Multiple Object Tracking (MOT) is crucial to autonomous vehicle perception. End-to-end transformer-based algorithms, which detect and track objects simultaneously, show great potential for the MOT task. However, most existing methods focus on image-based tracking with a single object category. In this paper, we propose an end-to-end transformer-based MOT algorithm (MotionTrack) with multi-modality sensor inputs to track objects of multiple classes. Our objective is to establish a transformer baseline for MOT in an autonomous driving environment. The proposed algorithm consists of a transformer-based data association (DA) module and a transformer-based query enhancement module that achieve MOT and Multiple Object Detection (MOD) simultaneously. MotionTrack and its variations achieve better results (AMOTA score of 0.55) on the nuScenes dataset than classical baseline models such as AB3DMOT, CenterTrack, and the probabilistic 3D Kalman filter. In addition, we show that a modified attention mechanism can be used for DA to accomplish MOT, and that aggregating history features enhances MOD performance.
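The abstract's core idea — reusing an attention mechanism as the data-association step — can be sketched as follows. This is a minimal illustration, not the paper's actual module: it assumes each existing track is represented by a query vector and each new detection by an embedding, and treats the row-softmaxed scaled dot-product scores as association probabilities (function and variable names are hypothetical).

```python
import numpy as np

def attention_association(track_queries, det_embeddings):
    """Score track-detection pairs with scaled dot-product attention.

    track_queries:  (T, d) array, one query vector per existing track
    det_embeddings: (D, d) array, one embedding per new detection
    Returns a (T, D) matrix of association probabilities (rows sum to 1).
    """
    d = track_queries.shape[1]
    # Affinity between every track and every detection, scaled as in attention.
    scores = track_queries @ det_embeddings.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy usage: 3 tracks, 4 detections, greedy per-track matching.
rng = np.random.default_rng(0)
tracks = rng.normal(size=(3, 8))
dets = rng.normal(size=(4, 8))
probs = attention_association(tracks, dets)
matches = probs.argmax(axis=1)  # index of the best detection for each track
```

A real tracker would additionally gate low-probability rows (to terminate tracks) and spawn new tracks for unmatched detections; a one-to-one assignment (e.g. Hungarian matching on the score matrix) could replace the greedy argmax here.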

References (52)
  1. TransFusion: Robust LiDAR-camera fusion for 3d object detection with transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1080–1089, 2022.
  2. Multiple object tracking in recent times: A literature review. ArXiv, abs/2209.04796, 2022.
  3. Tracking without bells and whistles. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 941–951, 2019.
  4. Simple online and realtime tracking. 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468, 2016.
  5. nuScenes: A multimodal dataset for autonomous driving. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11618–11628, 2020.
  6. MeMOT: Multi-object tracking with memory. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8080–8090, 2022.
  7. MOT20: A benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003, 2020.
  8. Research advances and challenges of autonomous and connected ground vehicles. IEEE Transactions on Intelligent Transportation Systems, 22(2):683–711, 2021.
  9. Visual object tracking based on mutual learning between cohort multiscale feature-fusion networks with weighted loss. IEEE Transactions on Circuits and Systems for Video Technology, 31(3):1055–1065, 2021.
  10. Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):142–158, 2016.
  11. Extended object tracking: Introduction, overview and applications. ArXiv, abs/1604.00970, 2016.
  12. Minkowski tracker: A sparse spatio-temporal R-CNN for joint object detection and tracking. ArXiv, abs/2208.10056, 2022.
  13. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
  14. Joint multi-object detection and tracking with camera-LiDAR fusion for autonomous driving. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6983–6989, 2021.
  15. Multi-modal motion prediction with transformer-based neural network for autonomous driving. In 2022 International Conference on Robotics and Automation (ICRA), pages 2605–2611, 2022.
  16. EagerMOT: 3d multi-object tracking via sensor fusion. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 11315–11321, 2021.
  17. Joint 3d object detection and tracking using spatio-temporal representation of camera image and lidar point clouds. ArXiv, abs/2112.07116, 2021.
  18. PointPillars: Fast encoders for object detection from point clouds. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12689–12697, 2019.
  19. Discriminative correlation filter with channel and spatial reliability. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6309–6318, 2017.
  20. Exploring simple 3d multi-object tracking for autonomous driving. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 10468–10477, 2021.
  21. TrackFormer: Multi-object tracking with transformers. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8834–8844, 2022.
  22. Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4293–4302, 2016.
  23. Standing between past and future: Spatio-temporal modeling for multi-camera 3d multi-object tracking. ArXiv, abs/2302.03802, 2023.
  24. SimpleTrack: Understanding and rethinking 3d multi-object tracking. ArXiv, abs/2111.09621, 2021.
  25. Data association in multiple object tracking: A survey of recent techniques. Expert Systems with Applications, 192:116300, 2022.
  26. Transformers for multi-object tracking on point clouds. 2022 IEEE Intelligent Vehicles Symposium (IV), pages 852–859, 2022.
  27. Mono-camera 3d multi-object tracking using deep learning detections and PMBM filtering. 2018 IEEE Intelligent Vehicles Symposium (IV), pages 433–440, 2018.
  28. Deep network flow for multi-object tracking. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2730–2739, 2017.
  29. Beyond pixels: Leveraging geometry and shape cues for online multi-object tracking. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3508–3515, 2018.
  30. Apoorv Singh. Transformer-based sensor fusion for autonomous driving: A survey. ArXiv, abs/2302.11481, 2023.
  31. Surround-view vision-based 3d detection for autonomous driving: A survey. ArXiv, abs/2302.06650, 2023.
  32. TransTrack: Multiple object tracking with transformer. arXiv preprint arXiv:2012.15460, 2020.
  33. Learning to track with object permanence. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 10840–10849, 2021.
  34. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  35. Camo-mot: Combined appearance-motion optimization for 3d multi-object tracking with camera-LiDAR fusion. ArXiv, abs/2209.02540, 2022.
  36. Immortal tracker: Tracklet never dies. ArXiv, abs/2111.13672, 2021.
  37. Towards real-time multi-object tracking. ArXiv, abs/1909.12605, 2019.
  38. Ab3dmot: A baseline for 3d multi-object tracking and new evaluation metrics. ArXiv, abs/2008.08063, 2020.
  39. Simple online and realtime tracking with a deep association metric. 2017 IEEE International Conference on Image Processing (ICIP), pages 3645–3649, 2017.
  40. TransCenter: Transformers with dense representations for multiple-object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP, 2021.
  41. Vitpose: Simple vision transformer baselines for human pose estimation. arXiv preprint arXiv:2204.12484, 2022.
  42. MOTR: End-to-end multiple-object tracking with transformer. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVII, page 659–675, Berlin, Heidelberg, 2022. Springer-Verlag.
  43. A quality index metric and method for online self-assessment of autonomous vehicles sensory perception. arXiv preprint arXiv:2203.02588, 2022.
  44. Attention-based neural network for driving environment complexity perception. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pages 2781–2787. IEEE, 2021.
  45. MonoDETR: Depth-aware transformer for monocular 3d object detection. arXiv preprint arXiv:2203.13310, 2022.
  46. Mutr3d: A multi-camera tracking framework via 3d-to-2d queries. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 4536–4545, 2022.
  47. ByteTrack: Multi-object tracking by associating every detection box. In European Conference on Computer Vision, 2022.
  48. Number-adaptive prototype learning for 3d point cloud semantic segmentation. In Leonid Karlinsky, Tomer Michaeli, and Ko Nishino, editors, Computer Vision – ECCV 2022 Workshops, pages 695–703, Cham, 2023. Springer Nature Switzerland.
  49. Tracking objects as points. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV, pages 474–490. Springer, 2020.
  50. Global tracking transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8761–8770, 2022.
  51. VoxelNet: End-to-end learning for point cloud based 3d object detection. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4490–4499, 2018.
  52. Looking beyond two frames: End-to-end multi-object tracking using spatial and temporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP, 2021.
Citations (12)
