MapTrack: Tracking in the Map (2402.12968v1)
Abstract: Multi-Object Tracking (MOT) aims to maintain stable and uninterrupted trajectories for each target. Most state-of-the-art approaches first detect objects in each frame and then associate new detections with existing tracks using motion models and appearance similarities. Despite achieving satisfactory results, occlusion and crowding can easily lead to missing and distorted detections, followed by missed and false associations. In this paper, we first revisit the classic tracker DeepSORT, significantly enhancing its robustness to crowds and occlusion by placing greater trust in predictions when detections are unavailable or of low quality in crowded and occluded scenes. Specifically, we propose a new framework comprising three lightweight, plug-and-play algorithms: the probability map, the prediction map, and the covariance-adaptive Kalman filter. The probability map identifies whether undetected objects have genuinely disappeared from view (e.g., left the image or entered a building) or are only temporarily undetected due to occlusion or other reasons. Trajectories of undetected targets that remain within the probability map are extended directly by state estimation. The prediction map determines whether an object is in a crowd, and we prioritize state estimations over observations when observations are severely deformed, accomplished through the covariance-adaptive Kalman filter. The proposed method, named MapTrack, achieves state-of-the-art results on popular multi-object tracking benchmarks such as MOT17 and MOT20. Despite its superior performance, our method remains simple, online, and real-time. The code will be open-sourced later.
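The covariance-adaptive idea described above, trusting the motion prediction over the observation when a detection is unreliable, can be sketched as a standard Kalman update whose measurement noise is inflated for low-confidence detections. This is a minimal illustration, not the paper's exact formulation: the function name `kalman_update_adaptive` and the confidence-based scaling rule for `R` are assumptions introduced here for clarity.

```python
import numpy as np

def kalman_update_adaptive(x, P, z, H, R_base, confidence):
    """One Kalman measurement update with confidence-adaptive noise.

    x : (n,)   prior state mean          P : (n, n) prior covariance
    z : (m,)   measurement               H : (m, n) measurement matrix
    R_base : (m, m) nominal measurement noise
    confidence : detection quality in (0, 1]; lower values inflate R,
                 so the posterior stays closer to the prediction.
    """
    # Hypothetical scaling rule: inflate measurement noise as
    # confidence drops (the paper's exact adaptation may differ).
    R = R_base / max(confidence, 1e-6)
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ (z - H @ x)           # posterior mean
    P_new = (np.eye(len(x)) - K @ H) @ P  # posterior covariance
    return x_new, P_new
```

With a high-confidence detection the update moves strongly toward the measurement; with a low-confidence one the gain shrinks and the track largely follows the predicted state, which is the behavior the abstract describes for crowded and occluded scenes.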
Authors: Fei Wang, Ruohui Zhang, Chenglin Chen, Min Yang, Yun Bai