
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment (2404.05029v1)

Published 7 Apr 2024 in cs.CV

Abstract: Action quality assessment (AQA) has become an emerging topic since it can be extensively applied in numerous scenarios. However, most existing methods and datasets focus on single-person short-sequence scenes, hindering the application of AQA in more complex situations. To address this issue, we construct a new multi-person long-form video dataset for action quality assessment named LOGO. Distinguished in scenario complexity, our dataset contains 200 videos from 26 artistic swimming events with 8 athletes in each sample along with an average duration of 204.2 seconds. As for richness in annotations, LOGO includes formation labels to depict group information of multiple athletes and detailed annotations on action procedures. Furthermore, we propose a simple yet effective method to model relations among athletes and reason about the potential temporal logic in long-form videos. Specifically, we design a group-aware attention module, which can be easily plugged into existing AQA methods, to enrich the clip-wise representations based on contextual group information. To benchmark LOGO, we systematically conduct investigations on the performance of several popular methods in AQA and action segmentation. The results reveal the challenges our dataset brings. Extensive experiments also show that our approach achieves state-of-the-art on the LOGO dataset. The dataset and code will be released at https://github.com/shiyi-zh0408/LOGO.
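
To make the idea of enriching clip-wise representations with group context more concrete, here is a minimal sketch in plain Python. It is not the authors' group-aware attention module: it swaps in a generic scaled dot-product cross-attention with a residual connection, and the function name `group_aware_attention`, the inputs `clip_feats` and `group_feats`, and the toy values are all hypothetical, chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def group_aware_attention(clip_feats, group_feats):
    """Enrich each clip feature with an attention-weighted group context.

    Assumption (not from the paper): clip_feats and group_feats are lists of
    equal-length feature vectors, e.g. one clip embedding and one formation
    embedding per clip. Each clip feature acts as a query over the group
    features, and the resulting context vector is added back residually.
    """
    d = len(clip_feats[0])
    scale = math.sqrt(d)
    enriched, weights = [], []
    for q in clip_feats:
        # Scaled dot-product score between this clip and every group feature.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in group_feats]
        w = softmax(scores)
        # Weighted sum of group features gives the group context for this clip.
        context = [
            sum(w[j] * group_feats[j][i] for j in range(len(group_feats)))
            for i in range(d)
        ]
        # Residual connection keeps the original clip information.
        enriched.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(w)
    return enriched, weights


# Toy inputs: 3 clips with 2-dimensional features (illustrative values only).
clip_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
group_feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
enriched, weights = group_aware_attention(clip_feats, group_feats)
print(len(enriched), len(enriched[0]))
```

Because the output has the same shape as the clip features, a module of this kind could in principle sit between a video backbone and an existing score regressor, which is one way to read the abstract's "easily plugged into existing AQA methods".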

