DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction (2403.02075v2)
Abstract: In Multiple Object Tracking (MOT), objects often exhibit non-linear motion, accelerating and decelerating with irregular direction changes. Tracking-by-detection (TBD) trackers with Kalman Filter motion prediction work well in pedestrian-dominant scenarios but fall short in complex situations where multiple objects perform non-linear and diverse motion simultaneously. To tackle complex non-linear motion, we propose a real-time diffusion-based MOT approach named DiffMOT. Specifically, for the motion predictor component, we propose a novel Decoupled Diffusion-based Motion Predictor (D$^2$MP). It models the entire distribution of the various motions present in the data as a whole. It also predicts an individual object's motion conditioned on that object's historical motion information. Furthermore, it optimizes the diffusion process to use far fewer sampling steps. As an MOT tracker, DiffMOT runs in real time at 22.7 FPS and outperforms the state of the art on the DanceTrack and SportsMOT datasets with $62.3\%$ and $76.2\%$ HOTA, respectively. To the best of our knowledge, DiffMOT is the first to introduce a diffusion probabilistic model into MOT to tackle non-linear motion prediction.
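The limitation of linear motion prediction mentioned in the abstract can be illustrated with a small sketch. This is not the D$^2$MP predictor and not a full Kalman Filter; it is a plain constant-velocity extrapolation, the kind of linear motion assumption commonly paired with Kalman filtering in TBD trackers. The 1-D tracks, the function name `constant_velocity_predict`, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def constant_velocity_predict(positions):
    """Extrapolate the next position from the last two observations.

    Hypothetical helper: assumes the displacement over the last frame
    repeats unchanged in the next frame (a linear motion model).
    """
    positions = np.asarray(positions, dtype=float)
    velocity = positions[-1] - positions[-2]
    return positions[-1] + velocity

# Hypothetical 1-D centre coordinate of one object over six frames.
t = np.arange(6)
linear_track = 2.0 * t             # constant velocity of 2 units per frame
accel_track = 0.5 * 3.0 * t ** 2   # constant acceleration of 3 units per frame^2

# Predict frame 5 from frames 0..4 and compare with the true position.
linear_err = abs(constant_velocity_predict(linear_track[:5]) - linear_track[5])
accel_err = abs(constant_velocity_predict(accel_track[:5]) - accel_track[5])

print(f"constant-velocity track error: {linear_err:.1f}")
print(f"accelerating track error:      {accel_err:.1f}")
```

Under these assumptions the extrapolation is exact for the constant-velocity track but misses the accelerating one by a fixed offset every frame, which is the kind of error a learned non-linear motion predictor is meant to reduce.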