Domain-Guided Masked Autoencoders for Unique Player Identification (2403.11328v1)

Published 17 Mar 2024 in cs.CV and cs.AI

Abstract: Unique player identification is a fundamental module in vision-driven sports analytics. Identifying players from broadcast videos can aid various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatic detection of jersey numbers using deep features is challenging primarily due to: a) motion blur, b) low-resolution video feeds, and c) occlusions. With their recent success in various vision tasks, masked autoencoders (MAEs) have emerged as a superior alternative to conventional feature extractors. However, most MAEs either zero out image patches at random or focus on where to mask rather than how to mask. Motivated by human vision, we devise a novel domain-guided masking policy for MAEs, termed d-MAE, to facilitate robust feature extraction in the presence of motion blur for player identification. We further introduce a new spatio-temporal network leveraging our novel d-MAE for unique player identification. We conduct experiments on three large-scale sports datasets: a curated baseball dataset, the SoccerNet dataset, and an in-house ice hockey dataset. We preprocess the datasets using an upgraded keyframe identification (KfID) module that focuses on frames containing jersey numbers. Additionally, we propose a keyframe-fusion technique to augment keyframes, preserving spatial and temporal context. Our spatio-temporal network shows significant improvements, surpassing the current state of the art by 8.58%, 4.29%, and 1.20% in test set accuracy on the three datasets, respectively. Rigorous ablations highlight the effectiveness of our domain-guided masking approach and the refined KfID module, which yield performance gains of 1.48% and 1.84%, respectively, over the original architectures.
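The abstract's central idea, a masking policy guided by domain knowledge rather than uniform random patch dropping, can be illustrated with a small sketch. The snippet below is not the authors' d-MAE implementation; it assumes a hypothetical per-patch prior (for example, a heatmap concentrated on the jersey region of a player crop) and biases the MAE mask toward high-prior patches while keeping the usual mask ratio. All function and variable names are illustrative.

```python
# Hedged sketch (not the paper's exact policy): domain-guided masking for an MAE.
# Instead of masking patches uniformly at random, patches are sampled in
# proportion to a domain prior such as a hypothetical jersey-region heatmap.
import torch

def domain_guided_mask(prior: torch.Tensor, mask_ratio: float = 0.75) -> torch.Tensor:
    """prior: (B, N) non-negative patch scores from a domain prior.
    Returns a boolean mask of shape (B, N); True means the patch is masked."""
    B, N = prior.shape
    num_mask = int(mask_ratio * N)
    # Convert scores to sampling probabilities; a small floor keeps every
    # patch with a nonzero chance of being masked.
    probs = prior.clamp(min=0) + 1e-6
    probs = probs / probs.sum(dim=1, keepdim=True)
    # Sample num_mask patch indices per image without replacement,
    # weighted by the domain prior.
    idx = torch.multinomial(probs, num_mask, replacement=False)
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

# Example: 196 patches (14x14 grid) with a random stand-in prior.
prior = torch.rand(2, 196)
mask = domain_guided_mask(prior, mask_ratio=0.75)
print(mask.shape, mask.sum(dim=1))  # torch.Size([2, 196]) tensor([147, 147])
```

The design choice illustrated here is only the "how to mask" half of the idea: the prior decides which patches the encoder never sees, so reconstruction is forced to rely on the domain-relevant context rather than easy texture cues.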

Authors (7)
  1. Bavesh Balaji (5 papers)
  2. Jerrin Bright (9 papers)
  3. Sirisha Rambhatla (27 papers)
  4. Yuhao Chen (84 papers)
  5. Alexander Wong (230 papers)
  6. John Zelek (31 papers)
  7. David A. Clausi (26 papers)
Citations (1)
