MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings (2309.10765v1)

Published 19 Sep 2023 in cs.CV, cs.HC, and cs.MM

Abstract: Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR
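The abstract notes that MAGIC-TBR fuses video features with their Discrete Cosine Transform coefficients. As a rough illustrative sketch (not the authors' implementation, which is in the linked repository), the 2D DCT-II coefficients of an 8x8 pixel block can be computed directly from the standard formula:

```python
import math

def dct_2d(block):
    """Naive 2D DCT-II of a square pixel block (illustrative only)."""
    n = len(block)

    def alpha(k):
        # Orthonormal scaling factor for the DCT-II basis.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs

# A constant block concentrates all energy in the DC coefficient (0, 0):
# for an 8x8 block of value 100, coeffs[0][0] = 100 * 64 / 8 = 800.
flat_block = [[100.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
```

In practice a vectorized routine such as `scipy.fft.dctn` would be used instead of these nested loops; the point is only that DCT coefficients compactly summarize spatial frequency content, which is why they can complement raw video features in a fusion model.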

