
UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation (2312.08952v2)

Published 14 Dec 2023 in cs.CV

Abstract: Multi-object tracking (MOT) in video sequences remains a challenging task, especially in scenarios with significant camera movements. This is because targets can drift considerably on the image plane, leading to erroneous tracking outcomes. Addressing such challenges typically requires supplementary appearance cues or Camera Motion Compensation (CMC). While these strategies are effective, they also introduce a considerable computational burden, posing challenges for real-time MOT. In response to this, we introduce UCMCTrack, a novel motion model-based tracker robust to camera movements. Unlike conventional CMC that computes compensation parameters frame-by-frame, UCMCTrack consistently applies the same compensation parameters throughout a video sequence. It employs a Kalman filter on the ground plane and introduces the Mapped Mahalanobis Distance (MMD) as an alternative to the traditional Intersection over Union (IoU) distance measure. By leveraging projected probability distributions on the ground plane, our approach efficiently captures motion patterns and adeptly manages uncertainties introduced by homography projections. Remarkably, UCMCTrack, relying solely on motion cues, achieves state-of-the-art performance across a variety of challenging datasets, including MOT17, MOT20, DanceTrack and KITTI. More details and code are available at https://github.com/corfyi/UCMCTrack


Summary

  • The paper introduces a novel method using uniform camera motion compensation on the ground plane, replacing traditional IoU with the Mapped Mahalanobis Distance.
  • The framework leverages a Kalman filter to achieve over 1000 FPS on a single CPU while delivering robust performance on datasets like MOT17, MOT20, DanceTrack, and KITTI.
  • The approach reduces computational overhead and improves resilience to camera-induced errors, paving the way for efficient real-time multi-object tracking.

Introduction to Multi-Object Tracking

Multi-Object Tracking (MOT) within video sequences is a complex challenge in computer vision, particularly in environments with significant camera motion. Traditional methods often employ additional appearance cues or Camera Motion Compensation (CMC) to address inaccuracies arising from camera movements. These methods, while effective, can substantially increase computational overhead, making real-time tracking more difficult.

A Novel Approach to Motion-Based Tracking

Enter UCMCTrack, a novel tracking framework designed to withstand camera movements. This motion-based tracker diverges from conventional methods by applying the same compensation parameters uniformly across an entire video sequence, rather than recomputing them frame by frame. UCMCTrack is built on a Kalman filter that operates on the ground plane rather than the image plane, and it introduces the Mapped Mahalanobis Distance (MMD) in place of the traditional Intersection over Union (IoU) measure. MMD captures ground-plane motion patterns and manages the uncertainties introduced by homography projections.
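The core of the MMD idea can be sketched as a standard Mahalanobis distance on the ground plane, where both the track's Kalman prediction and the detection (after projection through the homography) carry their own covariances. This is a minimal illustration, not the paper's exact formulation; the function name and interface are assumptions for the sketch.

```python
import numpy as np

def mapped_mahalanobis(track_mean, track_cov, det_xy, det_cov):
    """Squared Mahalanobis distance between a track's predicted
    ground-plane position and a detection mapped onto the ground plane.

    track_mean: (2,) predicted (x, y) on the ground plane
    track_cov:  (2, 2) covariance of the Kalman prediction
    det_xy:     (2,) detection projected to the ground plane
    det_cov:    (2, 2) detection noise propagated through the homography
    """
    residual = det_xy - track_mean
    S = track_cov + det_cov                  # combined uncertainty
    return float(residual @ np.linalg.inv(S) @ residual)
```

The key difference from a plain Euclidean gate is that `det_cov` grows for detections whose projection is more uncertain (e.g. far from the camera), so the distance automatically down-weights those matches.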

Advantages of Ground Plane Motion Modeling

Assessing motion patterns on the ground plane offers greater resilience to camera-induced errors. Where IoU can fail when detection and track boxes no longer overlap (particularly in dynamic scenes), ground-plane association minimizes the impact of camera movements. Shifting from the image plane to the ground plane both improves tracking accuracy and simplifies the association step.
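The mapping from image to ground plane can be illustrated with a homography applied to a box's bottom-center point (a common choice for anchoring a pedestrian box to its footprint). This is a hedged sketch; the function name and box convention are assumptions, not taken from the paper.

```python
import numpy as np

def bbox_to_ground(bbox, H):
    """Project a bounding box's bottom-center point onto the ground
    plane with a 3x3 homography H (image -> ground).

    bbox: (x1, y1, x2, y2) in pixels.
    """
    u = (bbox[0] + bbox[2]) / 2.0        # horizontal center of the box
    v = bbox[3]                          # bottom edge (the feet)
    p = H @ np.array([u, v, 1.0])        # homogeneous projection
    return p[:2] / p[2]                  # dehomogenize
```

Because the Kalman filter then tracks this ground-plane point, a camera pan that shifts every box by hundreds of pixels leaves the targets' physical positions, and hence the association, largely unchanged.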

UCMCTrack Performance and Contributions

UCMCTrack has demonstrated impressive efficiency and speed, surpassing 1000 frames per second (FPS) on a single CPU. It achieves state-of-the-art performance on multiple challenging datasets, including MOT17, MOT20, DanceTrack, and KITTI, using motion cues alone. The paper outlines three primary contributions: a non-IoU distance measure based purely on motion cues, a uniform application of camera motion compensation parameters that reduces computational load, and the UCMCTrack tracker itself, which can also complement existing distance metrics for improved MOT performance.

Experimentation and Results

Experiments conducted across various datasets corroborate the efficacy of UCMCTrack. The model's reliance on the ground plane and the innovative use of MMD have proven effective in various scenarios, including irregular target motion (DanceTrack) and intense camera motion (KITTI). The tracker performs well even in the presence of camera parameter estimation errors, highlighting its robustness.

A series of ablation studies affirm the importance of individual components within the UCMCTrack system. The tracker's adaptability is further demonstrated through its capacity to adjust to different scenes (dynamic vs. static) by altering process noise compensation factors. The results emphasize the potential advantages of pairing UCMCTrack with established MOT methodologies, setting the stage for future research.
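The scene-dependent tuning mentioned above can be pictured as scaling the Kalman filter's process noise: a larger compensation factor tells the filter to trust its motion model less in dynamic-camera scenes. The sketch below uses a standard constant-acceleration noise model for one ground-plane axis; the factor's name and exact role here are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def process_noise(dt, sigma_a, compensation):
    """Process noise covariance for one ground-plane axis with state
    (position, velocity), scaled by a scene-dependent compensation
    factor (larger for dynamic scenes, smaller for static ones).
    """
    G = np.array([[0.5 * dt**2], [dt]])  # maps acceleration noise to (pos, vel)
    Q = G @ G.T * sigma_a**2             # constant-acceleration noise model
    return compensation * Q
```

Raising the factor widens the predicted covariance each step, which in turn loosens the Mahalanobis gate, a plausible mechanism for absorbing unmodeled camera motion.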

UCMCTrack showcases a new frontier in motion-based multi-object tracking, addressing camera motion challenges efficiently. This development could have far-reaching implications for real-time applications requiring fast and accurate object tracking in video footage.