OccluTrack: Rethinking Awareness of Occlusion for Enhancing Multiple Pedestrian Tracking (2309.10360v1)
Abstract: Multiple pedestrian tracking faces the challenge of tracking pedestrians under occlusion. Existing methods suffer from inaccurate motion estimation, appearance feature extraction, and association due to occlusion, leading to an inadequate Identification F1-Score (IDF1), excessive ID switches (IDSw), and insufficient association accuracy and recall (AssA and AssR). We find that the main cause is abnormal detections produced under partial occlusion. In this paper, we argue that the key is explicit motion estimation, reliable appearance features, and fair association in occluded scenes. Specifically, we propose an adaptive occlusion-aware multiple pedestrian tracker, OccluTrack. We first introduce an abnormal motion suppression mechanism into the Kalman Filter to adaptively detect and suppress outlier motions caused by partial occlusion. Second, we propose a pose-guided re-ID module to extract discriminative part features for partially occluded pedestrians. Last, we design a new occlusion-aware association method for fair IoU and appearance embedding distance measurement of occluded pedestrians. Extensive evaluations demonstrate that OccluTrack outperforms state-of-the-art methods on the MOT-Challenge datasets. In particular, the improvements on IDF1, IDSw, AssA, and AssR demonstrate the effectiveness of OccluTrack in tracking and association performance.
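The first component described above, abnormal motion suppression inside the Kalman filter, can be illustrated with a short sketch. The code below is not the paper's implementation; it is a minimal illustration under assumed details: a constant-velocity filter over box centers whose correction is down-weighted when the innovation (measurement residual) is statistically abnormal, as can happen when partial occlusion shifts or shrinks a detection. The class name, gating threshold, and noise values are all illustrative assumptions.

```python
import numpy as np

# Minimal constant-velocity Kalman filter over 2D box centers, with an
# innovation-gating step standing in for "abnormal motion suppression":
# when a detection jumps further than the predicted covariance allows
# (e.g. the box shrinks or shifts under partial occlusion), the correction
# is scaled down instead of being applied in full.
# Thresholds and noise levels here are illustrative assumptions, not the
# paper's actual parameters.

class SuppressedKalmanFilter:
    def __init__(self, dim=2, dt=1.0, gate=9.21):   # gate ~ chi-square 0.99 quantile, 2 dof
        self.F = np.eye(2 * dim)                     # state transition (position + velocity)
        self.F[:dim, dim:] = dt * np.eye(dim)
        self.H = np.hstack([np.eye(dim), np.zeros((dim, dim))])  # observe position only
        self.Q = 1e-2 * np.eye(2 * dim)              # process noise (assumed)
        self.R = 1e-1 * np.eye(dim)                  # measurement noise (assumed)
        self.x = np.zeros(2 * dim)                   # state: [cx, cy, vx, vy]
        self.P = np.eye(2 * dim)                     # state covariance
        self.gate = gate

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = z - self.H @ self.x                      # innovation (residual)
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        d2 = float(y @ np.linalg.solve(S, y))        # squared Mahalanobis distance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        if d2 > self.gate:                           # abnormal motion: suppress the correction
            K = K * (self.gate / d2)
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return d2                                    # caller can log/inspect abnormality
```

A tracker loop would call `predict()` once per frame and `update()` with the matched detection's center; a large returned distance flags the detection as likely distorted by occlusion, and the scaled gain keeps the trajectory from being dragged off course by it.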