
ExpertAF: Expert Actionable Feedback from Video (2408.00672v3)

Published 1 Aug 2024 in cs.CV

Abstract: Feedback is essential for learning a new skill or improving one's current skill-level. However, current methods for skill-assessment from video only provide scores or compare demonstrations, leaving the burden of knowing what to do differently on the user. We introduce a novel method to generate actionable feedback (AF) from video of a person doing a physical activity, such as basketball or soccer. Our method takes a video demonstration and its accompanying 3D body pose and generates (1) free-form expert commentary describing what the person is doing well and what they could improve, and (2) a visual expert demonstration that incorporates the required corrections. We show how to leverage Ego-Exo4D's [29] videos of skilled activity and expert commentary together with a strong LLM to create a weakly-supervised training dataset for this task, and we devise a multimodal video-LLM to infer coaching feedback. Our method is able to reason across multi-modal input combinations to output full-spectrum, actionable coaching (expert commentary, expert video retrieval, and expert pose generation), outperforming strong vision-LLMs on both established metrics and human preference studies.
