FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment (2204.03646v1)

Published 7 Apr 2022 in cs.CV

Abstract: Most existing action quality assessment methods rely on the deep features of an entire video to predict the score, which is less reliable due to the non-transparent inference process and poor interpretability. We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable. Towards this goal, we construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures. We also propose a procedure-aware approach for action quality assessment, learned by a new Temporal Segmentation Attention module. Specifically, we propose to parse pairwise query and exemplar action instances into consecutive steps with diverse semantic and temporal correspondences. The procedure-aware cross-attention is proposed to learn embeddings between query and exemplar steps to discover their semantic, spatial, and temporal correspondences, and further serve for fine-grained contrastive regression to derive a reliable scoring mechanism. Extensive experiments demonstrate that our approach achieves substantial improvements over state-of-the-art methods with better interpretability. The dataset and code are available at https://github.com/xujinglin/FineDiving.

Summary

The paper introduces the FineDiving dataset with detailed action annotations and a procedure-aware method leveraging temporal segmentation to improve action quality assessment transparency and accuracy.
The proposed method uses a Temporal Segmentation Attention module and fine-grained contrastive regression to process segmented video steps for more reliable action scoring.
This approach aims to make AI-based action quality assessment more interpretable and reliable, potentially aiding athletes and informing future AI judging systems.

FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment

The paper "FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment" presents an approach to action quality assessment (AQA) that combines fine-grained data with a procedure-aware method. The work is built on the FineDiving dataset, which is designed for evaluating and explaining action quality in competitive sports videos, specifically diving.

Dataset and Motivation

The FineDiving dataset is distinguished by its detailed annotations for both semantic and temporal structures. Each diving action is decomposed into consecutive steps, supported by two levels of annotations: action type and sub-action type. This fine-grained annotation marks a departure from previous datasets, which relied on simpler, coarser labels, and it opens new avenues for investigating nuances in action procedures and quality.

A key motivation behind this work is to address the limitations of existing AQA approaches, which predominantly rely on deep features of the whole video without representing the structure of the action procedure. Such methods produce scores through an opaque inference process and offer no transparent, reliable explanation of why a performance received its score. By coupling semantic knowledge with temporal segmentation, FineDiving aims to enhance the transparency and interpretability of AQA systems.

Methodology

The paper introduces a procedure-aware method, built around a Temporal Segmentation Attention (TSA) module, to make predicted AQA scores more accurate and easier to interpret. The TSA module parses each input video into consecutive procedural steps based on detected step transitions. It applies the same parsing to an exemplar instance, so that the steps of the query can be compared with the corresponding steps of the exemplar, giving a more structured and semantically rich action representation.
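The idea of segmenting a video into steps at detected transitions can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `split_into_steps`, the per-frame `transition_probs` scores, and the rule of simply taking the highest-scoring frames as transitions are all assumptions made for illustration.

```python
def split_into_steps(transition_probs, num_transitions):
    """Split a frame sequence into consecutive steps.

    transition_probs[t] is a hypothetical per-frame score that a step
    transition occurs at frame t (assumed to come from some upstream model).
    Returns a list of (start, end) frame ranges, end exclusive.
    """
    n = len(transition_probs)
    # Rank candidate frames by score; frame 0 cannot start a second step.
    candidates = sorted(range(1, n), key=lambda t: transition_probs[t], reverse=True)
    # Keep the top-scoring frames as cut points, restored to temporal order.
    cuts = sorted(candidates[:num_transitions])
    bounds = [0] + cuts + [n]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Eight frames, two transitions: three consecutive steps.
probs = [0.0, 0.1, 0.9, 0.1, 0.0, 0.8, 0.1, 0.0]
print(split_into_steps(probs, 2))  # [(0, 2), (2, 5), (5, 8)]
```

Applying the same function to both the query and the exemplar yields two step sequences of equal length, which is what makes a step-by-step comparison possible.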

The approach incorporates procedure-aware cross-attention between query and exemplar steps, which learns embeddings that capture their semantic, spatial, and temporal correspondences. Fine-grained contrastive regression is then performed on these embeddings to quantify the quality difference between the query and the exemplar and to predict action scores more reliably.
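The two ideas in this paragraph can be sketched in a few lines, under strong simplifying assumptions. The sketch uses plain scaled dot-product attention with no learned projections, treats each step as a single feature vector, and takes the relative score differences as given rather than predicting them with a trained regressor. The names `cross_attention` and `contrastive_score` are hypothetical and do not come from the paper's code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_steps, exemplar_steps):
    """Each query step attends over all exemplar steps.

    Both arguments are lists of equal-length feature vectors, one per step.
    Exemplar features serve as both keys and values (an assumption).
    """
    d = len(query_steps[0])
    attended = []
    for q in query_steps:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in exemplar_steps]
        weights = softmax(logits)
        attended.append(
            [sum(w * k[i] for w, k in zip(weights, exemplar_steps)) for i in range(d)]
        )
    return attended


def contrastive_score(exemplar_scores, predicted_deltas):
    """Contrastive regression: score = exemplar score + predicted difference.

    predicted_deltas stands in for the output of a learned regressor.
    With several exemplars, the individual estimates are averaged.
    """
    votes = [s + d for s, d in zip(exemplar_scores, predicted_deltas)]
    return sum(votes) / len(votes)


query = [[1.0, 0.0], [0.0, 1.0]]
exemplar = [[0.9, 0.1], [0.2, 0.8]]
print(cross_attention(query, exemplar))
print(contrastive_score([80.0, 70.0], [2.0, 10.0]))  # 81.0
```

The design point is that the model regresses a difference relative to an exemplar with a known score, not an absolute score, so each prediction is anchored to a reference performance.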

The strengths of the methodology are supported by experimental comparisons. The method achieves a higher Spearman's rank correlation and a lower relative $\ell_2$-distance than state-of-the-art techniques, without sacrificing interpretability.
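For readers unfamiliar with these two metrics, the following sketch computes both from predicted and ground-truth scores. Spearman's rank correlation is the Pearson correlation of the ranks. The relative $\ell_2$-distance is written here in the form commonly used in the AQA literature, the mean of squared errors normalized by the score range; that definition, the function names, and the `s_min`/`s_max` parameters are assumptions, since this summary does not spell them out.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(pred, true):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rp, rt = average_ranks(pred), average_ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    return cov / math.sqrt(var_p * var_t)


def relative_l2(pred, true, s_min, s_max):
    """Mean squared error after normalizing each error by the score range."""
    n = len(pred)
    return sum((abs(t - p) / (s_max - s_min)) ** 2 for p, t in zip(pred, true)) / n


pred = [62.0, 75.5, 48.0, 90.0]
true = [60.0, 80.0, 50.0, 88.5]
print(spearman(pred, true), relative_l2(pred, true, 0.0, 100.0))
```

Spearman's correlation only measures whether performances are ordered correctly, while the relative $\ell_2$-distance also penalizes numerical error in the scores, which is why the two are reported together.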

Implications and Future Work

From a theoretical perspective, the FineDiving dataset and the associated methodology represent a step towards more interpretable AI systems in the sports domain. They demonstrate that integrating detailed annotations and attention mechanisms can improve both the performance and the transparency of action quality assessment.

Practically, the insights gleaned from the FineDiving approach could inform the development of AI judges for competitive sports, reducing subjectivity in scores and aiding athletes in performance analysis. Future research could explore expanding such detailed annotations across broader sports datasets, potentially covering a wider array of sports beyond diving.

In conclusion, the paper contributes to sports video analysis by connecting fine-grained action understanding with practical AQA applications. It sets a direction for research in interpretable and reliable action quality assessment, and because the dataset is openly available, it offers a useful resource for subsequent studies in this field.

GitHub

GitHub - xujinglin/FineDiving: FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment (138 stars)