Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Human Pose Descriptions and Subject-Focused Attention for Improved Zero-Shot Transfer in Human-Centric Classification Tasks (2403.06904v3)

Published 11 Mar 2024 in cs.CV

Abstract: We present a novel LLM-based pipeline for creating contextual descriptions of human body poses in images using only auxiliary attributes. This approach facilitates the creation of the MPII Pose Descriptions dataset, which includes natural language annotations for 17,367 images containing people engaged in 410 distinct activities. We demonstrate the effectiveness of our pose descriptions in enabling zero-shot human-centric classification using CLIP. Moreover, we introduce the FocusCLIP framework, which incorporates Subject-Focused Attention (SFA) in CLIP for improved text-to-image alignment. Our models were pretrained on the MPII Pose Descriptions dataset and their zero-shot performance was evaluated on five unseen datasets covering three tasks. FocusCLIP outperformed the baseline CLIP model, achieving an average accuracy increase of 8.61\% (33.65\% compared to CLIP's 25.04\%). Notably, our approach yielded improvements of 3.98\% in activity recognition, 14.78\% in age classification, and 7.06\% in emotion recognition. These results highlight the potential of integrating detailed pose descriptions and subject-level guidance into general pretraining frameworks for enhanced performance in downstream tasks.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Muhammad Saif Ullah Khan (10 papers)
  2. Muhammad Ferjad Naeem (21 papers)
  3. Federico Tombari (214 papers)
  4. Luc Van Gool (569 papers)
  5. Didier Stricker (144 papers)
  6. Muhammad Zeshan Afzal (35 papers)
Citations (3)