MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking (2307.15700v3)
Abstract: As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method is able to make the same object's track embedding more stable and distinguishable by leveraging long-term memory injection with a customized memory-attention layer. This significantly improves the target association ability of our model. Experimental results on DanceTrack show that MeMOTR impressively surpasses the state-of-the-art method by 7.9% and 13.0% on HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods on association performance on MOT17 and generalizes well on BDD100K. Code is available at https://github.com/MCG-NJU/MeMOTR.