Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives (2309.12067v1)
Abstract: Action scene understanding in soccer is a challenging task due to the complex and dynamic nature of the game, as well as the interactions between players. This article provides a comprehensive overview of this task divided into action recognition, spotting, and spatio-temporal action localization, with a particular emphasis on the modalities used and multimodal methods. We explore the publicly available data sources and metrics used to evaluate models' performance. The article reviews recent state-of-the-art methods that leverage deep learning techniques, as well as traditional methods. We focus on multimodal methods, which integrate information from multiple sources, such as video and audio data, and also those that represent one source in various ways. The advantages and limitations of these methods are discussed, along with their potential for improving the accuracy and robustness of models. Finally, the article highlights some of the open research questions and future directions in the field of soccer action recognition, including the potential for multimodal methods to advance this field. Overall, this survey provides a valuable resource for researchers interested in the field of action scene understanding in soccer.