Papers
Topics
Authors
Recent
Search
2000 character limit reached

Latent Visual Diffusion Reasoning with Monte Carlo Tree Search

Published 26 Jun 2026 in cs.CV | (2606.27988v1)

Abstract: Analyzing fine-grained skill activities (e.g., sports, surgery) requires not only recognizing visual patterns but also performing step-by-step visual reasoning that leads to the final judgment. While recent advances in action quality assessment have achieved remarkable progress in evaluating performance, existing models remain black boxes, where they lack the ability to explicitly reveal the reasoning processes underlying their judgments. To address this limitation, we propose Latent Visual Diffusion Reasoning (LVDR), a novel framework that integrates keypoint-guided Monte Carlo Tree Search (MCTS) to model and visualize the latent visual reasoning process. LVDR not only produces more accurate skill assessments but also uncovers the critical visual reasoning sequences that contribute to the final evaluation. Extensive experiments across four datasets spanning diverse sports and surgical domains demonstrate that LVDR achieves competitive quantitative performance while providing interpretable visual reasoning trajectories leading to the final predictions. Source codes and models can be found through the following link: https://github.com/XiruiTeng/LVDR_Official.git.

Authors (3)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.