Multi-Stage Contrastive Regression for Action Quality Assessment (2401.02841v1)
Abstract: In recent years, there has been growing interest in video-based action quality assessment (AQA). Most existing methods solve the AQA problem by considering the entire video while overlooking the inherent stage-level characteristics of actions. To address this issue, we design a novel Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. By segmenting the input video into multiple stages or procedures, this approach extracts spatial-temporal information efficiently while reducing computational cost. Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function to further enhance performance. As a result, MCoRe achieves state-of-the-art results on the widely adopted fine-grained AQA dataset.
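To make the stage-wise contrastive idea concrete, the sketch below pairs per-stage features from a query video and an exemplar video and applies an InfoNCE-style loss that pulls matching stages together and pushes mismatched stages apart. This is a minimal illustration under assumptions, not the paper's implementation: the function name, the symmetric cross-entropy formulation, and the temperature value are all hypothetical choices standing in for the stage-wise contrastive loss the abstract describes.

```python
import torch
import torch.nn.functional as F

def stage_contrastive_loss(query_feats: torch.Tensor,
                           exemplar_feats: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Hypothetical stage-wise contrastive loss sketch.

    query_feats, exemplar_feats: (num_stages, dim) tensors holding one
    feature vector per segmented stage of the query and exemplar videos.
    Features of the same stage across the two videos are treated as
    positive pairs; all other stage combinations act as negatives.
    """
    q = F.normalize(query_feats, dim=-1)
    e = F.normalize(exemplar_feats, dim=-1)
    # Pairwise cosine similarities between query stages and exemplar stages.
    logits = q @ e.t() / temperature            # (num_stages, num_stages)
    # Stage i of the query should match stage i of the exemplar.
    targets = torch.arange(q.size(0), device=q.device)
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

In a contrastive-regression setup of this kind, such a term would typically be added to the score-regression objective (e.g., a loss on the predicted score difference between the query and exemplar), so that stage features stay discriminative across stages while remaining comparable across videos.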