Continual Action Assessment via Task-Consistent Score-Discriminative Feature Distribution Modeling (2309.17105v5)
Abstract: Action Quality Assessment (AQA) aims to answer how well an action is performed. While remarkable progress has been achieved, existing AQA works assume that all training data are available at once and cannot continually learn to assess new technical actions. In this work, we address this Continual Learning problem in AQA (Continual-AQA), which requires a unified model to learn AQA tasks sequentially without forgetting. Our idea for Continual-AQA is to sequentially learn a task-consistent, score-discriminative feature distribution, in which the latent features correlate strongly with the score labels regardless of the task or action type. From this perspective, we mitigate forgetting in Continual-AQA from two aspects. First, to fuse the features of new and previous data into a score-discriminative distribution, we propose a novel Feature-Score Correlation-Aware Rehearsal that stores and reuses data from previous tasks under a limited memory budget. Second, we develop an Action General-Specific Graph that learns and decouples action-general and action-specific knowledge, so that task-consistent, score-discriminative features can be better extracted across various tasks. Extensive experiments evaluate the contributions of the proposed components, and comparisons with existing continual learning methods further verify the effectiveness and versatility of our approach. Data and code are available at https://github.com/iSEE-Laboratory/Continual-AQA.
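One ingredient of the rehearsal idea above — keeping a small memory of past samples whose score labels still span the full score range, so the stored subset remains score-discriminative — can be sketched in a few lines. This is a hypothetical illustration of score-spread exemplar selection, not the paper's actual Feature-Score Correlation-Aware Rehearsal algorithm; `select_exemplars` and its signature are assumptions for the sketch.

```python
def select_exemplars(scores, memory_size):
    """Pick exemplar indices spread evenly across the score range,
    so the rehearsal buffer covers low-, mid-, and high-quality
    executions of the previous task.

    Hypothetical sketch: the real method additionally accounts for
    the correlation between stored features and score labels.
    """
    # Indices of samples sorted by their quality score label.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Evenly spaced positions along the sorted score axis.
    step = (len(scores) - 1) / (memory_size - 1)
    return [order[round(k * step)] for k in range(memory_size)]


if __name__ == "__main__":
    # Toy score labels for 100 past-task samples.
    scores = [(i * 37) % 100 / 10.0 for i in range(100)]
    kept = select_exemplars(scores, memory_size=10)
    print(len(kept))  # 10 exemplars kept under the memory budget
```

Selecting by score spread (rather than, say, random sampling) is one simple way to keep the limited memory representative of the whole quality spectrum, which the abstract identifies as the key property of the fused feature distribution.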