
Robust Light-Weight Facial Affective Behavior Recognition with CLIP (2403.09915v2)

Published 14 Mar 2024 in cs.CV

Abstract: Human affective behavior analysis aims to delve into human expressions and behaviors to deepen our understanding of human emotions. Basic expression categories (EXPR) and Action Units (AUs) are two essential components in this analysis, which categorize emotions and break down facial movements into elemental units, respectively. Despite advancements, existing approaches in expression classification and AU detection often necessitate complex models and substantial computational resources, limiting their applicability in everyday settings. In this work, we introduce the first lightweight framework adept at efficiently tackling both expression classification and AU detection. This framework employs a frozen CLIP image encoder alongside a trainable multilayer perceptron (MLP), enhanced with Conditional Value at Risk (CVaR) for robustness and a loss landscape flattening strategy for improved generalization. Experimental results on the Aff-wild2 dataset demonstrate superior performance in comparison to the baseline while maintaining minimal computational demands, offering a practical solution for affective behavior analysis. The code is available at https://github.com/Purdue-M2/Affective_Behavior_Analysis_M2_PURDUE
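The robustness component described in the abstract can be illustrated with a small sketch: Conditional Value at Risk (CVaR) at level α replaces the mean training loss with the mean of the worst α-fraction of per-sample losses, focusing optimization on hard examples. The function name and the standalone setting below are illustrative assumptions, not the authors' implementation.

```python
def cvar_loss(per_sample_losses, alpha=0.3):
    """Average of the worst alpha-fraction of per-sample losses,
    a common rank-based robust aggregate (CVaR).

    Illustrative sketch only, not the paper's code.
    """
    n = len(per_sample_losses)
    k = max(1, int(alpha * n))                    # number of worst samples kept
    worst = sorted(per_sample_losses, reverse=True)[:k]
    return sum(worst) / k

# With alpha=0.5, the two largest of four losses are averaged:
# cvar_loss([0.25, 1.0, 0.5, 0.75], alpha=0.5) -> (1.0 + 0.75) / 2 = 0.875
```

With α = 1.0 this reduces to the ordinary mean loss, so α interpolates between standard training and worst-case-focused training.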


