DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction (2403.02075v2)

Published 4 Mar 2024 in cs.CV

Abstract: In Multiple Object Tracking, objects often exhibit non-linear motion of acceleration and deceleration, with irregular direction changes. Tracking-by-detection (TBD) trackers with Kalman Filter motion prediction work well in pedestrian-dominant scenarios but fall short in complex situations when multiple objects perform non-linear and diverse motion simultaneously. To tackle the complex non-linear motion, we propose a real-time diffusion-based MOT approach named DiffMOT. Specifically, for the motion predictor component, we propose a novel Decoupled Diffusion-based Motion Predictor (D$^2$MP). It models the entire distribution of various motion presented by the data as a whole. It also predicts an individual object's motion conditioned on that individual's historical motion information. Furthermore, it optimizes the diffusion process with far fewer sampling steps. As a MOT tracker, DiffMOT runs in real time at 22.7 FPS and outperforms the state of the art on the DanceTrack and SportsMOT datasets, with $62.3\%$ and $76.2\%$ HOTA, respectively. To the best of our knowledge, DiffMOT is the first to introduce a diffusion probabilistic model into MOT to tackle non-linear motion prediction.
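To make the abstract's point about linear motion models concrete, here is a minimal sketch (not taken from the paper) of the predicted position that a constant-velocity model, the assumption behind the Kalman Filter used in many TBD trackers, produces. For simplicity it works on a single coordinate and estimates velocity as the last frame-to-frame difference rather than a filtered estimate; the function names are hypothetical.

```python
from typing import List, Sequence


def constant_velocity_predict(history: Sequence[float]) -> float:
    """Predicted next position under a constant-velocity assumption.

    Mirrors the mean of a Kalman predict step with transition
    F = [[1, 1], [0, 1]], except that velocity is taken naively as the
    last frame-to-frame difference instead of a filtered estimate.
    """
    prev, last = history[-2], history[-1]
    velocity = last - prev
    return last + velocity


def one_step_errors(trajectory: Sequence[float]) -> List[float]:
    """Absolute one-step-ahead error for every frame that has two predecessors."""
    return [
        abs(constant_velocity_predict(trajectory[:t]) - trajectory[t])
        for t in range(2, len(trajectory))
    ]


linear = [3 * t for t in range(8)]        # uniform motion: x_t = 3t
accelerating = [t * t for t in range(8)]  # accelerating motion: x_t = t^2

print(one_step_errors(linear))        # [0, 0, 0, 0, 0, 0]
print(one_step_errors(accelerating))  # [2, 2, 2, 2, 2, 2]
```

The constant error of 2 on the accelerating track is its second difference: a constant-velocity model has no term for acceleration, so it lags by the same amount every frame. Abrupt direction changes violate the same assumption.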

Summary

  • The paper presents a novel diffusion-based MOT framework with a decoupled diffusion motion predictor that models complex non-linear trajectories.
  • It outperforms prior state-of-the-art trackers, reaching 62.3% HOTA on DanceTrack and 76.2% on SportsMOT.
  • Its motion predictor uses one-step sampling, and the full tracker runs at 22.7 FPS, fast enough for real-time tracking.

Analysis of DiffMOT: A Real-time Diffusion-based Multiple Object Tracker

The paper "DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction" introduces an approach to multiple object tracking (MOT) that targets complex non-linear motion. It uses diffusion probabilistic models to improve tracking when objects move dynamically and non-linearly, as in the DanceTrack and SportsMOT datasets.

Methodology Overview

The core contribution of this paper is DiffMOT, a real-time MOT framework built around a Decoupled Diffusion-based Motion Predictor (D$^2$MP). Whereas traditional trackers rely on linear motion models such as the Kalman Filter, DiffMOT handles non-linear motion with a probabilistic diffusion model. D$^2$MP models the distribution of motion patterns across the data as a whole and predicts each object's next motion conditioned on that object's own motion history, which improves tracking accuracy when motion is non-linear.
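Conditioning on an individual object's history can be sketched as follows. The feature layout here, with each past frame contributing its box and its change from the previous frame, is an illustrative assumption and not necessarily the exact input D$^2$MP uses; `build_history_condition` and `apply_predicted_delta` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def build_history_condition(boxes: Sequence[Box], n: int) -> List[List[float]]:
    """Stack the last n boxes of one track with their frame-to-frame deltas.

    Hypothetical layout: each row is [cx, cy, w, h, dcx, dcy, dw, dh].
    Requires at least n + 1 boxes so every row has a defined delta.
    """
    if len(boxes) < n + 1:
        raise ValueError(f"need at least {n + 1} boxes, got {len(boxes)}")
    rows = []
    for i in range(len(boxes) - n, len(boxes)):
        cur, prev = boxes[i], boxes[i - 1]
        delta = [c - p for c, p in zip(cur, prev)]
        rows.append(list(cur) + delta)
    return rows


def apply_predicted_delta(last_box: Box, delta: Sequence[float]) -> Box:
    """Turn a predicted motion offset back into the next box."""
    return tuple(b + d for b, d in zip(last_box, delta))


track = [
    (10.0, 20.0, 4.0, 8.0),
    (12.0, 20.0, 4.0, 8.0),
    (15.0, 21.0, 4.0, 8.0),
    (19.0, 23.0, 4.0, 8.0),
]
cond = build_history_condition(track, n=3)
print(cond[-1])  # [19.0, 23.0, 4.0, 8.0, 4.0, 2.0, 0.0, 0.0]

# The offset below stands in for the output of a (hypothetical) trained predictor.
next_box = apply_predicted_delta(track[-1], [5.0, 3.0, 0.0, 0.0])
print(next_box)  # (24.0, 26.0, 4.0, 8.0)
```

Predicting an offset rather than an absolute box keeps the target small and translation-invariant, which is one reason a learned predictor can reuse motion patterns seen elsewhere in the data.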

Another notable feature of D$^2$MP is how it streamlines the diffusion process. Standard diffusion models generate samples through many iterative denoising steps, which makes inference slow. By building on decoupled diffusion theory, D$^2$MP instead samples in a single step, keeping the predictor fast without sacrificing performance.
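The toy sketch below illustrates why a decoupled formulation permits one-step sampling. It assumes a forward process $x_t = x_0 + c\,t + \sqrt{t}\,\epsilon$ with constant attenuation $c = -x_0$, in the spirit of decoupled diffusion models, so that $x_1$ is pure noise and $x_0 = x_1 - c - \epsilon$. A trained network would estimate $c$ and $\epsilon$ from the noisy sample and the track history; here an oracle stands in for it. This shows the principle only and is not the paper's implementation.

```python
import math
import random
from typing import List, Sequence


def forward_sample(x0: Sequence[float], t: float, eps: Sequence[float]) -> List[float]:
    """Decoupled forward process with a constant data-attenuation term.

    Assumes x_t = x0 + c * t + sqrt(t) * eps with c = -x0, so the data term
    decays linearly to zero and x_1 is pure Gaussian noise.
    """
    c = [-v for v in x0]
    return [v + ci * t + math.sqrt(t) * e for v, ci, e in zip(x0, c, eps)]


def one_step_sample(
    x1: Sequence[float], c_hat: Sequence[float], eps_hat: Sequence[float]
) -> List[float]:
    """Jump from t = 1 straight to t = 0 by inverting x_1 = x0 + c + eps."""
    return [x - c - e for x, c, e in zip(x1, c_hat, eps_hat)]


rng = random.Random(0)
x0 = [4.0, 2.0, 0.0, 0.0]                # a motion offset (dcx, dcy, dw, dh)
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x1 = forward_sample(x0, 1.0, eps)        # equals eps: the data term has fully decayed
oracle_c = [-v for v in x0]              # stand-in for a perfect network estimate
recovered = one_step_sample(x1, oracle_c, eps)
print([round(v, 6) for v in recovered])  # [4.0, 2.0, 0.0, 0.0]
```

Because the attenuation is known in closed form, the whole reverse trajectory collapses into one subtraction; the quality of the result then depends entirely on how well the network estimates $c$ and $\epsilon$.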

Empirical Results

The paper reports that DiffMOT outperforms existing state-of-the-art methods on highly dynamic datasets. It achieves 62.3% HOTA on DanceTrack and 76.2% on SportsMOT, outperforming competitors by a margin of about 2.1% in HOTA. DiffMOT also runs at 22.7 FPS, or roughly 44 ms per frame, which makes real-time use feasible.
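For reference, HOTA balances detection and association quality. At a localization threshold $\alpha$, $\mathrm{HOTA}_\alpha = \sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha}$ with $\mathrm{DetA}_\alpha = |TP| / (|TP| + |FN| + |FP|)$, and the reported score averages $\mathrm{HOTA}_\alpha$ over $\alpha \in \{0.05, 0.10, \ldots, 0.95\}$. The sketch below performs only that final aggregation from per-threshold scores; computing AssA requires the full matching procedure and is omitted, and the input numbers are made up.

```python
import math
from typing import Sequence

ALPHAS = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95


def det_a(tp: int, fn: int, fp: int) -> float:
    """Detection accuracy at one localization threshold."""
    return tp / (tp + fn + fp)


def hota(det_accs: Sequence[float], ass_accs: Sequence[float]) -> float:
    """Average of sqrt(DetA_alpha * AssA_alpha) over the localization thresholds."""
    if len(det_accs) != len(ass_accs) or not det_accs:
        raise ValueError("need matching, non-empty per-threshold scores")
    return sum(math.sqrt(d * a) for d, a in zip(det_accs, ass_accs)) / len(det_accs)


n = len(ALPHAS)
score = hota([0.81] * n, [0.49] * n)  # illustrative inputs, not DiffMOT's numbers
print(round(score, 4))                # 0.63
```

The geometric mean means that strong detection cannot hide weak association: a tracker that swaps IDs often is penalized even if every box is found.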

Beyond the headline metrics, the paper argues that embedding motion prediction in a diffusion-based framework helps avoid common failure modes such as ID swaps when tracking objects that move non-linearly.
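How a poor motion prediction turns into an ID swap can be shown with a toy association step. In tracking-by-detection, each track's predicted box is matched to the current detections, typically by solving an assignment problem with the Hungarian method. This sketch scores pairs by IoU and, for brevity, replaces the Hungarian method with an exhaustive search over permutations, which finds the same optimal matching but only scales to a handful of objects. All boxes and predictions are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(predicted: Sequence[Box], detections: Sequence[Box]) -> List[int]:
    """Map track i to detection result[i], maximizing total IoU.

    Exhaustive search over permutations: exact, but only practical for a few
    objects. Assumes len(predicted) == len(detections).
    """
    best, best_score = None, -1.0
    for perm in permutations(range(len(detections))):
        score = sum(iou(p, detections[j]) for p, j in zip(predicted, perm))
        if score > best_score:
            best, best_score = list(perm), score
    return best


dets = [(10, 0, 20, 10), (22, 0, 32, 10)]  # true positions of tracks 0 and 1
good = [(11, 0, 21, 10), (21, 0, 31, 10)]  # accurate predictions
bad = [(21, 0, 31, 10), (12, 0, 22, 10)]   # track 0 overshoots, track 1 lags

print(associate(good, dets))  # [0, 1]
print(associate(bad, dets))   # [1, 0]  -> the two identities are swapped
```

With accurate predictions each track keeps its own detection; when the predictor overshoots one object and lags the other, the optimal matching exchanges their identities. For realistic numbers of objects, SciPy's `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently.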

Implications and Future Directions

Using a diffusion-based model to track non-linear motion points to a promising direction for future MOT research. Motion models built on linear assumptions, such as the constant-velocity Kalman Filter, struggle with the irregular dynamics of real-world objects; diffusion-based predictors, which learn the distribution of observed motion, could offer a more robust alternative.

DiffMOT's success could pave the way for more nuanced applications in areas such as autonomous driving and sports analysis, where anticipating object trajectories accurately under non-linear motion is crucial.

Future research could address remaining limitations, such as improving long-term tracking and handling occlusions. Further improving computational efficiency while keeping the approach scalable also remains a challenge. Hybrid designs that combine diffusion-based motion prediction with other learned components could help balance accuracy and speed across diverse motion scenarios.

Conclusion

The paper presents a significant advance in multiple object tracking: a diffusion-based motion model that handles complex non-linear trajectories in real time. DiffMOT's strong results on challenging datasets set a new state of the art on those benchmarks and open avenues for integrating probabilistic approaches into computer vision tasks that demand adaptability and precision.