Compound Expression Recognition via Multi Model Ensemble (2403.12572v1)
Abstract: Compound Expression Recognition (CER) plays a crucial role in interpersonal interactions. Because compound expressions exist, human emotional expressions are complex, and judging them requires considering both local and global facial cues. In this paper, we address this issue with an ensemble-learning solution for Compound Expression Recognition. Specifically, we cast the task as classification and train three expression classification models based on convolutional networks, Vision Transformers, and multi-scale local attention networks. We then ensemble the models through late fusion, merging their outputs to produce the final prediction. Our method achieves high accuracy on RAF-DB and can recognize expressions zero-shot on certain portions of C-EXPR-DB.
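The late-fusion scheme described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' implementation: the backbone choices (torchvision ResNet-50, ViT-B/16, and a ResNet-18 standing in for the multi-scale local attention branch), the seven RAF-DB expression classes, and the equal fusion weights are all assumptions made for illustration. It simply averages the per-branch softmax probabilities and takes the argmax.

```python
# Minimal late-fusion ensemble sketch (illustrative assumptions: backbones,
# 7 RAF-DB classes, equal fusion weights; requires a recent torchvision).
import torch
import torch.nn.functional as F
from torchvision import models

NUM_CLASSES = 7  # RAF-DB basic expression classes (assumption)

def build_cnn():
    # Convolutional branch: ResNet-50 with a replaced classification head.
    net = models.resnet50(weights=None)
    net.fc = torch.nn.Linear(net.fc.in_features, NUM_CLASSES)
    return net

def build_vit():
    # Vision Transformer branch: torchvision's ViT-B/16.
    net = models.vit_b_16(weights=None)
    net.heads.head = torch.nn.Linear(net.heads.head.in_features, NUM_CLASSES)
    return net

def build_local_attention():
    # Stand-in for the multi-scale local attention branch (e.g., MA-Net),
    # which is defined in the cited work; a ResNet-18 is used here only
    # so the sketch runs end to end.
    net = models.resnet18(weights=None)
    net.fc = torch.nn.Linear(net.fc.in_features, NUM_CLASSES)
    return net

@torch.no_grad()
def late_fusion_predict(branches, images, weights=None):
    """Average (optionally weighted) softmax probabilities across branches."""
    if weights is None:
        weights = [1.0 / len(branches)] * len(branches)
    probs = sum(w * F.softmax(m(images), dim=1) for m, w in zip(branches, weights))
    return probs.argmax(dim=1)

if __name__ == "__main__":
    branches = [build_cnn(), build_vit(), build_local_attention()]
    for m in branches:
        m.eval()
    dummy = torch.randn(4, 3, 224, 224)  # batch of face crops
    print(late_fusion_predict(branches, dummy))
```

Swapping in trained weights, a weighted average, or an actual multi-scale local attention model follows the same pattern; only the branch constructors and the `weights` list change.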
- Emotion detection from facial expressions using augmented reality. 2023 5th International Conference on Inventive Research in Computing Applications (ICIRCA), pages 1–5, 2023.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
- Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377, 2021.
- Robovie: an interactive humanoid robot. Industrial Robot: An International Journal, 28:498–503, 2001.
- Dimitrios Kollias. Abaw: Valence-arousal estimation, expression recognition, action unit detection & multi-task learning challenges. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2328–2336, 2022a.
- Dimitrios Kollias. Abaw: Learning from synthetic data & multi-task learning challenges. arXiv preprint arXiv:2207.01138, 2022b.
- Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface. arXiv preprint arXiv:1910.04855, 2019.
- Affect analysis in-the-wild: Valence-arousal, expressions, action units and a unified framework. arXiv preprint arXiv:2103.15792, 2021a.
- Analysing affective behavior in the second abaw2 competition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3652–3660, 2021b.
- Analysing affective behavior in the first abaw 2020 competition. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pages 794–800, 2020.
- Face behavior a la carte: Expressions, affect and action units in a single network. arXiv preprint arXiv:1910.11111, 2019a.
- Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond. International Journal of Computer Vision, pages 1–23, 2019b.
- Distribution matching for heterogeneous multi-task learning: a large-scale face study. arXiv preprint arXiv:2105.03790, 2021.
- The 6th affective behavior analysis in-the-wild (abaw) competition. arXiv preprint arXiv:2402.19344, 2024.
- Dimitrios Kollias. Multi-label compound expression recognition: C-expr database & network. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5589–5598, 2023.
- A system of driving fatigue detection based on machine vision and its application on smart device. Journal of Sensors, 2015:548602:1–548602:11, 2015.
- Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60:84–90, 2012.
- Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2584–2593, 2017.
- Compound expression recognition in-the-wild with au-assisted meta multi-task learning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 5735–5744, 2023.
- Affectnet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 10:18–31, 2017.
- Blended emotion in-the-wild: Multi-label facial expression recognition using crowdsourced annotations and deep locality feature learning. International Journal of Computer Vision, 127:884–906, 2018.
- Attention is all you need. In Neural Information Processing Systems, 2017.
- Learning deep global multi-scale and local attention features for facial expression recognition in the wild. IEEE Transactions on Image Processing, 30:6544–6556, 2021.
- Eye fixation versus pupil diameter as eye-tracking features for virtual reality emotion classification. 2021 IEEE International Conference on Computing (ICOCO), pages 315–319, 2021.
- Two birds with one stone: Knowledge-embedded temporal convolutional transformer for depression detection and emotion recognition. IEEE Transactions on Affective Computing, 14:2595–2613, 2023.