GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing (2403.05916v2)

Published 9 Mar 2024 in cs.CV and cs.AI

Abstract: Multimodal LLMs (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite their success in language understanding, it is critical to evaluate their performance on downstream tasks for better human-centric applications. This paper assesses MLLMs on five crucial abilities for affective computing, spanning visual affective tasks and reasoning tasks. The results show that GPT-4V achieves high accuracy in facial action unit recognition and micro-expression detection, while its general facial expression recognition performance is not accurate. We also highlight the challenges of achieving fine-grained micro-expression recognition and the potential for further study, and demonstrate the versatility and potential of GPT-4V for handling advanced tasks in emotion recognition and related fields by integrating it with task-related agents for more complex tasks, such as heart rate estimation through signal processing. In conclusion, this paper provides valuable insights into the potential applications and challenges of MLLMs in human-centric computing. Our interesting examples are at https://github.com/EnVision-Research/GPT4Affectivity.
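The agent-integration idea mentioned above pairs the MLLM with conventional signal processing rather than having the model regress physiological signals directly. For remote heart-rate estimation, the agent's core step amounts to a spectral peak pick on a remote photoplethysmography (rPPG) trace. A minimal sketch of that step (the function name, parameters, and synthetic trace are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def estimate_heart_rate(signal, fps=30.0, lo=0.7, hi=4.0):
    """Estimate heart rate (bpm) from a 1-D rPPG trace via an FFT peak pick.

    In a real pipeline the trace would be, e.g., the mean green-channel
    intensity of the face region over time; here it is any 1-D array
    sampled at `fps` Hz. The peak is searched only in the physiologically
    plausible band [lo, hi] Hz (42-240 bpm by default).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo) & (freqs <= hi)  # restrict to the heart-rate band
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0                    # Hz -> beats per minute

# Synthetic pulse at 72 bpm (1.2 Hz) with mild noise: 10 s at 30 fps
t = np.arange(0, 10, 1.0 / 30.0)
trace = np.sin(2 * np.pi * 1.2 * t) \
    + 0.1 * np.random.default_rng(0).normal(size=t.size)
print(round(estimate_heart_rate(trace)))  # -> 72
```

In the paper's framing, the MLLM would orchestrate such a tool (face detection, trace extraction, spectral analysis) rather than compute the spectrum itself.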

Authors (15)
  1. Hao Lu (99 papers)
  2. Xuesong Niu (16 papers)
  3. Jiyao Wang (18 papers)
  4. Yin Wang (58 papers)
  5. Qingyong Hu (29 papers)
  6. Jiaqi Tang (20 papers)
  7. Yuting Zhang (30 papers)
  8. Kaishen Yuan (9 papers)
  9. Bin Huang (56 papers)
  10. Zitong Yu (119 papers)
  11. Dengbo He (15 papers)
  12. Shuiguang Deng (45 papers)
  13. Hao Chen (1006 papers)
  14. Yingcong Chen (35 papers)
  15. Shiguang Shan (136 papers)
Citations (9)