Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking (2401.09942v1)
Abstract: Effective tracking and re-identification of players is essential for analyzing soccer videos. However, it is a challenging task due to the non-linear motion of players, the similarity in appearance of players from the same team, and frequent occlusions. The ability to extract meaningful embeddings to represent players is therefore crucial in developing an effective tracking and re-identification system. In this paper, a multi-purpose part-based person representation method, called PRTreID, is proposed that simultaneously performs three tasks: role classification, team affiliation, and re-identification. In contrast to the available literature, a single network is trained with multi-task supervision to solve all three tasks jointly. The proposed joint method is computationally efficient due to the shared backbone. Moreover, the multi-task learning leads to richer and more discriminative representations, as demonstrated by both quantitative and qualitative results. To demonstrate the effectiveness of PRTreID, it is integrated with a state-of-the-art tracking method, using a part-based post-processing module to handle long-term tracking. The proposed tracking method outperforms all existing tracking methods on the challenging SoccerNet tracking dataset.
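The idea of one shared backbone supervised by three task losses can be illustrated with a minimal sketch. This is not the PRTreID architecture: the linear "backbone", the layer sizes, the triplet loss for re-identification, the cross-entropy losses for team and role, and the equal loss weights are all assumptions chosen only to show how a single embedding can feed several objectives at once.

```python
import math
import random


def linear(x, w):
    """Multiply vector x (length n) by weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss on Euclidean distances between embeddings."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def init(rows, cols, rng):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
IN_DIM, EMB_DIM, N_TEAMS, N_ROLES = 8, 4, 2, 4  # hypothetical sizes
backbone_w = init(IN_DIM, EMB_DIM, rng)  # shared by all three tasks
team_w = init(EMB_DIM, N_TEAMS, rng)     # team-affiliation head
role_w = init(EMB_DIM, N_ROLES, rng)     # role-classification head


def forward(x):
    """Run the shared backbone once, then both classification heads."""
    emb = linear(x, backbone_w)
    return emb, linear(emb, team_w), linear(emb, role_w)


def multitask_loss(anchor, positive, negative, team_label, role_label,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of re-id, team, and role losses for one triplet."""
    emb_a, team_logits, role_logits = forward(anchor)
    emb_p, _, _ = forward(positive)
    emb_n, _, _ = forward(negative)
    l_reid = triplet_loss(emb_a, emb_p, emb_n)
    l_team = cross_entropy(team_logits, team_label)
    l_role = cross_entropy(role_logits, role_label)
    return weights[0] * l_reid + weights[1] * l_team + weights[2] * l_role
```

Because `forward` computes the embedding once and reuses it for every head, adding a task costs only one extra small head, which is the efficiency argument the abstract makes for a shared backbone. In a real system the weights would be updated by gradient descent on `multitask_loss`; that step is omitted here.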