CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment (2404.13999v1)
Abstract: Action Quality Assessment (AQA) is pivotal for quantifying actions across domains like sports and medical care. Existing methods often rely on pre-trained backbones from large-scale action recognition datasets to boost performance on smaller AQA datasets. However, this common strategy yields suboptimal results due to the inherent struggle of these backbones to capture the subtle cues essential for AQA. Moreover, fine-tuning on smaller datasets risks overfitting. To address these issues, we propose Coarse-to-Fine Instruction Alignment (CoFInAl). Inspired by recent advances in LLM tuning, CoFInAl aligns AQA with broader pre-trained tasks by reformulating it as a coarse-to-fine classification task. Initially, it learns grade prototypes for coarse assessment and then utilizes fixed sub-grade prototypes for fine-grained assessment. This hierarchical approach mirrors the judging process, enhancing interpretability within the AQA framework. Experimental results on two long-term AQA datasets demonstrate CoFInAl achieves state-of-the-art performance with significant correlation gains of 5.49% and 3.55% on Rhythmic Gymnastics and Fis-V, respectively. Our code is available at https://github.com/ZhouKanglei/CoFInAl_AQA.
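The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). Several details here are assumptions made only for illustration: the fixed prototypes are built as a simplex equiangular tight frame, the nearest prototype is chosen by cosine similarity, and a hypothetical `coarse_to_fine_score` maps a (grade, sub-grade) pair to the midpoint of an evenly spaced score bin. In the paper's setting the grade prototypes are learned; here both sets are fixed to keep the example self-contained.

```python
import numpy as np


def simplex_etf(k, d, seed=0):
    """Return k unit-norm prototypes (rows) in d dims whose pairwise cosine is -1/(k-1).

    Assumption: a simplex ETF is one common way to build fixed, maximally
    separated prototypes; the abstract only says the prototypes are fixed.
    """
    if k < 2 or d < k:
        raise ValueError("need k >= 2 and d >= k")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random (d, k) matrix gives k orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((d, k)))
    center = np.eye(k) - np.ones((k, k)) / k
    return (np.sqrt(k / (k - 1)) * (u @ center)).T


def nearest(feature, prototypes):
    """Index of the prototype with the highest cosine similarity to `feature`."""
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int(np.argmax(p @ f))


def coarse_to_fine_score(coarse_feat, fine_feat, grade_protos, sub_protos,
                         lo=0.0, hi=1.0):
    """Pick a grade, then a sub-grade, and map the pair to a score in [lo, hi].

    Hypothetical mapping: the score range is split into
    len(grade_protos) * len(sub_protos) equal bins and the bin midpoint
    is returned.
    """
    g = nearest(coarse_feat, grade_protos)   # coarse assessment
    s = nearest(fine_feat, sub_protos)       # fine-grained assessment
    n_sub = len(sub_protos)
    n_bins = len(grade_protos) * n_sub
    score = lo + (g * n_sub + s + 0.5) / n_bins * (hi - lo)
    return g, s, score


if __name__ == "__main__":
    grades = simplex_etf(4, 16, seed=0)      # 4 coarse grades
    subs = simplex_etf(5, 16, seed=1)        # 5 sub-grades per grade
    print(coarse_to_fine_score(grades[2], subs[3], grades, subs))
```

With 4 grades and 5 sub-grades the score range is split into 20 bins, so the resolution of the predicted score is set by the product of the two prototype counts rather than by a regression head.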