
DiffusionTrack: Diffusion Model For Multi-Object Tracking (2308.09905v2)

Published 19 Aug 2023 in cs.CV

Abstract: Multi-object tracking (MOT) is a challenging vision task that aims to detect individual objects within a single frame and associate them across multiple frames. Recent MOT approaches can be categorized into two-stage tracking-by-detection (TBD) methods and one-stage joint detection and tracking (JDT) methods. Despite their success, these approaches share common problems, such as harmful global or local inconsistency, a poor trade-off between robustness and model complexity, and a lack of flexibility across different scenes within the same video. In this paper, we propose a simple but robust framework that formulates object detection and association jointly as a consistent denoising diffusion process from paired noise boxes to paired ground-truth boxes. This novel progressive denoising diffusion strategy substantially improves the tracker's effectiveness, enabling it to discriminate between different objects. During training, paired object boxes diffuse from paired ground-truth boxes to a random distribution, and the model learns detection and tracking simultaneously by reversing this noising process. At inference, the model refines a set of randomly generated paired boxes into the detection and tracking results through a flexible one-step or multi-step denoising diffusion process. Extensive experiments on three widely used MOT benchmarks, MOT17, MOT20, and DanceTrack, demonstrate that our approach achieves competitive performance compared with current state-of-the-art methods.
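The abstract describes training as diffusing paired ground-truth boxes toward noise and inference as denoising random paired boxes in one or several steps. The NumPy sketch below illustrates only that box-level diffusion bookkeeping, under stated assumptions that do not come from the paper: a DDPM linear noise schedule, a DiffusionDet-style rescaling of normalized boxes to [-SCALE, SCALE] with SCALE = 2.0, a hypothetical 8-number layout per paired box (box in frame k-1 followed by the same object's box in frame k), and deterministic DDIM (eta = 0) updates for the multi-step sampler. All function names are hypothetical, and `denoiser` stands in for the learned network that would predict clean paired boxes.

```python
import numpy as np

# Assumed layout: each row is one paired box, [cx, cy, w, h] in frame k-1
# followed by [cx, cy, w, h] in frame k, normalized to [0, 1].
SCALE = 2.0  # assumed signal-scaling constant (DiffusionDet-style)


def linear_alpha_bar(num_steps: int = 1000) -> np.ndarray:
    """DDPM linear beta schedule, returned as the cumulative product alpha_bar_t."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.cumprod(1.0 - betas)


def to_signal(boxes: np.ndarray) -> np.ndarray:
    """Map normalized boxes in [0, 1] into the diffusion space [-SCALE, SCALE]."""
    return (boxes * 2.0 - 1.0) * SCALE


def from_signal(x: np.ndarray) -> np.ndarray:
    """Inverse of to_signal, clipped back to valid normalized coordinates."""
    x = np.clip(x, -SCALE, SCALE)
    return np.clip((x / SCALE + 1.0) / 2.0, 0.0, 1.0)


def q_sample(x0, t, alpha_bar, rng):
    """Training-time forward process: noise paired ground-truth boxes to step t."""
    noise = rng.standard_normal(x0.shape)
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise, noise


def ddim_step(x_t, x0_pred, t, t_prev, alpha_bar):
    """Deterministic DDIM update (eta = 0) from step t to t_prev; t_prev < 0 ends sampling."""
    if t_prev < 0:
        return x0_pred
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    # Recover the noise implied by the current sample and the predicted clean boxes.
    eps = (x_t - np.sqrt(ab_t) * x0_pred) / np.sqrt(1.0 - ab_t)
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps


def refine(denoiser, num_pairs, sampling_steps, alpha_bar, rng):
    """Inference: start from random paired boxes and denoise in one or more steps."""
    T = len(alpha_bar)
    x = rng.standard_normal((num_pairs, 8))
    # e.g. T = 1000 and 2 steps gives the (t, t_prev) pairs (999, 499), (499, -1)
    times = np.linspace(-1, T - 1, sampling_steps + 1).astype(int)[::-1]
    for t, t_prev in zip(times[:-1], times[1:]):
        x0_pred = denoiser(x, t)  # hypothetical network call
        x = ddim_step(x, x0_pred, t, t_prev, alpha_bar)
    return from_signal(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = linear_alpha_bar()
    gt_pairs = rng.uniform(0.2, 0.8, size=(5, 8))
    x0 = to_signal(gt_pairs)

    # A training input: ground-truth pairs noised to step 500.
    x_t, noise = q_sample(x0, 500, alpha_bar, rng)

    def oracle(x, t):
        # Stand-in for a trained denoiser: always predicts the true clean pairs.
        return x0

    for steps in (1, 4):
        boxes = refine(oracle, 5, steps, alpha_bar, rng)
        print(steps, np.allclose(boxes, gt_pairs))
```

Changing `sampling_steps` between 1 and a larger value mirrors the one-step versus multi-step choice the abstract mentions. With the oracle, every setting recovers the ground-truth pairs; that only sanity-checks the update rule and says nothing about tracking accuracy.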

Authors (6)
  1. Run Luo
  2. Zikai Song
  3. Lintao Ma
  4. Jinlin Wei
  5. Wei Yang
  6. Min Yang
Citations (13)