
Multi-Stage Contrastive Regression for Action Quality Assessment (2401.02841v1)

Published 5 Jan 2024 in cs.CV

Abstract: In recent years, there has been growing interest in video-based action quality assessment (AQA). Most existing methods solve the AQA problem by considering the entire video, overlooking the inherent stage-level structure of actions. To address this issue, we design a novel Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. By segmenting the input video into multiple stages or procedures, this approach efficiently extracts spatial-temporal information while reducing computational cost. Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function that further improves performance. As a result, MCoRe achieves state-of-the-art results on the widely adopted fine-grained AQA dataset.
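The abstract describes a stage-wise contrastive loss that pulls corresponding stages of two videos together while pushing non-corresponding stages apart. The sketch below illustrates that general idea with an NT-Xent-style loss over per-stage feature vectors; it is a minimal illustration under assumed shapes and hyperparameters, not the exact loss used in the MCoRe paper.

```python
import numpy as np

def stage_contrastive_loss(stages_a, stages_b, temperature=0.5):
    """Illustrative stage-wise contrastive loss (NT-Xent style).

    stages_a, stages_b: (num_stages, dim) feature matrices for two
    videos of the same action. Matching stages (same row index) are
    treated as positive pairs; all other stage pairs are negatives.
    This is a generic sketch, not the MCoRe loss itself.
    """
    # L2-normalize the stage embeddings so similarities are cosines
    a = stages_a / np.linalg.norm(stages_a, axis=1, keepdims=True)
    b = stages_b / np.linalg.norm(stages_b, axis=1, keepdims=True)
    # temperature-scaled similarity between every pair of stages
    sim = (a @ b.T) / temperature                     # (S, S)
    # softmax cross-entropy with the matching stage on the diagonal
    logits = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(3, 16))                    # 3 stages, 16-dim features
feats_b = feats_a + 0.05 * rng.normal(size=(3, 16))   # slightly perturbed copy
print(stage_contrastive_loss(feats_a, feats_b))       # small: stages align
print(stage_contrastive_loss(feats_a, rng.normal(size=(3, 16))))  # larger
```

Because matching stages of two executions of the same action should encode the same sub-procedure, the loss is low when row-aligned stage features agree and high when they do not, which is what lets the loss supervise stage-level representations.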

