Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching (2310.05056v4)

Published 8 Oct 2023 in cs.CV

Abstract: Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches. The former typically relies on laborious and time-consuming manual annotations, posing considerable challenges in expanding keypoint detection to a broader range of keypoint categories and animal species. The latter, though less dependent on extensive manual input, still requires necessary support images with annotation for reference during testing. To realize zero-shot keypoint detection without any prior annotation, we introduce the Open-Vocabulary Keypoint Detection (OVKD) task, which is innovatively designed to use text prompts for identifying arbitrary keypoints across any species. In pursuit of this goal, we have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM). This framework synergistically combines vision and LLMs, creating an interplay between language features and local keypoint visual features. KDSM enhances its capabilities by integrating Domain Distribution Matrix Matching (DDMM) and other special modules, such as the Vision-Keypoint Relational Awareness (VKRA) module, improving the framework's generalizability and overall performance.Our comprehensive experiments demonstrate that KDSM significantly outperforms the baseline in terms of performance and achieves remarkable success in the OVKD task.Impressively, our method, operating in a zero-shot fashion, still yields results comparable to state-of-the-art few-shot species class-agnostic keypoint detection methods.We will make the source code publicly accessible.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (8)
  1. Hao Zhang (948 papers)
  2. Lumin Xu (13 papers)
  3. Shenqi Lai (7 papers)
  4. Wenqi Shao (89 papers)
  5. Nanning Zheng (146 papers)
  6. Ping Luo (340 papers)
  7. Yu Qiao (563 papers)
  8. Kaipeng Zhang (73 papers)
Citations (3)

Summary

We haven't generated a summary for this paper yet.