Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning (2405.01885v1)

Published 3 May 2024 in cs.CV

Abstract: Psychological studies have shown that Micro Gestures (MG) are closely linked to human emotions. MG-based emotion understanding has attracted much attention because it allows for emotion understanding through nonverbal body gestures without relying on identity information (e.g., facial and electrocardiogram data). Therefore, it is essential to recognize MG effectively for advanced emotion understanding. However, existing Micro Gesture Recognition (MGR) methods utilize only a single modality (e.g., RGB or skeleton) while overlooking crucial textual information. In this letter, we propose a simple but effective visual-text contrastive learning solution that utilizes text information for MGR. In addition, instead of using handcrafted prompts for visual-text contrastive learning, we propose a novel module called Adaptive prompting to generate context-aware prompts. The experimental results show that the proposed method achieves state-of-the-art performance on two public datasets. Furthermore, based on an empirical study utilizing the results of MGR for emotion understanding, we demonstrate that using the textual results of MGR significantly improves performance by 6%+ compared to directly using video as input.
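
The abstract describes a CLIP-style visual-text contrastive objective in which learned, context-aware prompts replace handcrafted text prompts. As a rough illustration of that general idea only, the PyTorch sketch below pairs placeholder video features with class-level text embeddings built from learnable context tokens and trains them with a symmetric contrastive (cross-entropy over cosine similarity) loss. All module names, dimensions, and the random video features are assumptions for illustration; the sketch is not the authors' implementation and omits how the paper conditions prompts on visual context.

```python
# Minimal sketch of visual-text contrastive learning with learnable prompt
# (context) tokens. Names and shapes are illustrative placeholders only,
# not the implementation described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedTextHead(nn.Module):
    """Builds one text embedding per gesture class from learnable context
    tokens plus a learnable class-name token, pooled by a tiny transformer."""
    def __init__(self, num_classes, embed_dim=256, num_context_tokens=4):
        super().__init__()
        # Learnable context tokens shared across classes (a stand-in for
        # context-aware prompts; the paper generates these from visual cues).
        self.context = nn.Parameter(torch.randn(num_context_tokens, embed_dim) * 0.02)
        # One learnable token per class name (a stand-in for embeddings from
        # a frozen text encoder).
        self.class_tokens = nn.Parameter(torch.randn(num_classes, 1, embed_dim) * 0.02)
        encoder_layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)

    def forward(self):
        # Shape: (num_classes, num_context_tokens + 1, embed_dim)
        tokens = torch.cat(
            [self.context.unsqueeze(0).expand(self.class_tokens.size(0), -1, -1),
             self.class_tokens], dim=1)
        encoded = self.encoder(tokens)
        return encoded.mean(dim=1)              # (num_classes, embed_dim)

def contrastive_loss(video_feats, text_feats, labels, temperature=0.07):
    """Video-to-text cross-entropy over temperature-scaled cosine similarities."""
    v = F.normalize(video_feats, dim=-1)        # (batch, embed_dim)
    t = F.normalize(text_feats, dim=-1)         # (num_classes, embed_dim)
    logits = v @ t.t() / temperature            # (batch, num_classes)
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    batch, num_classes, dim = 8, 32, 256
    video_feats = torch.randn(batch, dim)       # placeholder for a video backbone output
    labels = torch.randint(0, num_classes, (batch,))
    text_head = PromptedTextHead(num_classes, dim)
    loss = contrastive_loss(video_feats, text_head(), labels)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")
```

In the paper's adaptive-prompting module the prompt tokens would presumably be generated from the visual input rather than learned as free parameters; they are free parameters here purely to keep the example self-contained.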
