Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating (2307.02730v3)
Abstract: Fine-grained action analysis on existing action datasets is limited by insufficient action categories, coarse granularity, and a narrow range of modalities and tasks. In this paper, we propose a Multi-modality and Multi-task dataset of Figure Skating (MMFS), collected from the World Figure Skating Championships. MMFS supports both action recognition and action quality assessment: it provides RGB and skeleton modalities together with action scores for 11671 clips spanning 256 categories, including spatial and temporal labels. The key contributions of our dataset fall into three aspects. (1) Independent spatial and temporal categories are proposed for the first time to further explore fine-grained action recognition and quality assessment. (2) MMFS is the first to introduce the skeleton modality for complex fine-grained action quality assessment. (3) Our multi-modality and multi-task dataset encourages the development of more action analysis models. To benchmark our dataset, we adopt RGB-based and skeleton-based baseline methods for action recognition and action quality assessment.
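To make the multi-modality, multi-task structure concrete, the following is a minimal sketch of what a single clip record might look like. The class name `SkatingClip`, all field names, and the two helper functions are hypothetical and chosen only for illustration; they are not the dataset's published schema, and treating the spatial and temporal labels as two separate integer fields is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical record layout (NOT the dataset's actual schema).
# Skeleton is stored as frames x joints x coordinates.
Skeleton = List[List[List[float]]]


@dataclass
class SkatingClip:
    clip_id: str
    rgb_path: Optional[str]        # assumed: location of the RGB video
    skeleton: Optional[Skeleton]   # assumed: per-frame joint coordinates
    spatial_label: int             # assumed: index of the spatial category
    temporal_label: int            # assumed: index of the temporal category
    score: float                   # score of the action, used for quality assessment


def recognition_target(clip: SkatingClip) -> Tuple[int, int]:
    """Target for the action recognition task: the two category labels."""
    return (clip.spatial_label, clip.temporal_label)


def assessment_target(clip: SkatingClip) -> float:
    """Target for the action quality assessment task: the action score."""
    return clip.score


# Illustrative values only; they do not come from the dataset.
example = SkatingClip(
    clip_id="clip_0001",
    rgb_path=None,
    skeleton=[[[0.0, 0.0], [0.1, 0.2]]],  # 1 frame, 2 joints, 2 coordinates
    spatial_label=3,
    temporal_label=7,
    score=5.5,
)
```

Under this layout, the same record can feed either task: a recognition model would read one modality and predict `recognition_target(example)`, while a quality assessment model would regress `assessment_target(example)`.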
Authors:
- Sheng-Lan Liu
- Yu-Ning Ding
- Gang Yan
- Si-Fan Zhang
- Jin-Rong Zhang
- Wen-Yue Chen
- Xue-Hai Xu