Improved TokenPose with Sparsity

Published 16 Nov 2023 in cs.CV | (2311.09653v1)

Abstract: Over the past few years, vision transformers and their variants have become increasingly important in human pose estimation. By treating image patches as tokens, transformers can capture global relationships effectively, estimate the keypoint tokens by leveraging the visual tokens, and recognize the posture of the human body. However, global attention is computationally demanding, which makes it difficult to scale transformer-based methods to high-resolution features. In this paper, we introduce sparsity in both keypoint token attention and visual token attention to improve human pose estimation. Experimental results on the MPII dataset show that our model achieves higher accuracy, demonstrating the feasibility of the method and achieving new state-of-the-art results. The idea can also serve as a reference for other transformer-based models.
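
The abstract does not say how sparsity is imposed on the attention, so the following is only a minimal sketch of one common option, top-k sparse attention, in which each query token keeps its k highest-scoring keys and ignores the rest. The function name `topk_sparse_attention`, the `top_k` parameter, and the toy data are hypothetical and are not taken from the paper. The sketch uses plain Python lists to stay dependency-free.

```python
import math

def topk_sparse_attention(queries, keys, values, top_k):
    """Scaled dot-product attention where each query attends only to its
    top_k highest-scoring keys (an assumed sparsity rule, not the paper's)."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Dense scores between this query and every key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        # Indices of the top_k largest scores; all other keys are dropped.
        keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
        # Softmax restricted to the kept keys (shifted by the max for stability).
        m = max(scores[j] for j in keep)
        exps = {j: math.exp(scores[j] - m) for j in keep}
        total = sum(exps.values())
        weights = [exps.get(j, 0.0) / total for j in range(len(keys))]
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
        all_weights.append(weights)
    return outputs, all_weights

# Toy example: one keypoint token (query) attending over three visual tokens.
keypoint_tokens = [[1.0, 0.0]]
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
visual_values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
out, w = topk_sparse_attention(keypoint_tokens, visual_tokens, visual_values, top_k=2)
```

In this toy call the third visual token has the lowest score, so it receives zero weight and its value vector does not contribute to the output. Note that this sketch still computes every score before discarding entries, so it illustrates the sparsity pattern rather than any computational saving.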

Authors (1)
