Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition (2404.06443v1)
Abstract: Human facial action units (AUs) are mutually related in a hierarchical manner: not only are they associated with each other in both the spatial and temporal domains, but AUs located in the same or nearby facial regions also show stronger relationships than those in different facial regions. While no existing approach thoroughly models such hierarchical inter-dependencies among AUs, this paper proposes to comprehensively model multi-scale AU-related dynamics and hierarchical spatio-temporal relationships among AUs for AU occurrence recognition. Specifically, we first propose a novel multi-scale temporal differencing network with an adaptive weighting block that explicitly captures facial dynamics across frames at different spatial scales, accounting for the heterogeneous range and magnitude of different AUs' activations. Then, a two-stage strategy is introduced to hierarchically model the relationships among AUs based on their spatial distribution (i.e., local and cross-region AU relationship modelling). Experimental results on BP4D and DISFA show that our approach sets a new state of the art in AU occurrence recognition. Our code is publicly available at https://github.com/CVI-SZU/MDHR.
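To make the core idea of multi-scale temporal differencing concrete, the following is a minimal NumPy sketch, not the authors' implementation (which is at the GitHub link above). It assumes per-frame feature maps and illustrates two ingredients named in the abstract: inter-frame differences taken at several temporal strides (a stand-in for "multiple scales"), and an adaptive weighting block, here approximated by a softmax over per-scale difference energies; the function name and the fixed stride set are hypothetical.

```python
import numpy as np

def multiscale_temporal_difference(frames, scales=(1, 2, 4)):
    """Hypothetical sketch of multi-scale temporal differencing.

    frames: array of shape (T, H, W) holding per-frame feature maps.
    For each temporal stride s, compute differences between frames s
    steps apart (zero-padded back to length T), then fuse the per-scale
    difference maps with softmax weights over their mean magnitudes --
    a crude stand-in for a learned adaptive weighting block.
    """
    diffs = []
    for s in scales:
        d = np.zeros_like(frames)
        d[s:] = frames[s:] - frames[:-s]  # difference at temporal stride s
        diffs.append(d)
    # "Adaptive" weights: softmax over each scale's mean absolute motion
    energies = np.array([np.abs(d).mean() for d in diffs])
    w = np.exp(energies) / np.exp(energies).sum()
    return sum(wi * di for wi, di in zip(w, diffs))
```

For a static face (identical frames), every difference map is zero, so the fused output is zero; frames with fast motion contribute more through the short-stride branches, which is the intuition behind weighting scales by AU-dependent motion range and magnitude.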