
Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition

Published 2 May 2022 in cs.CV and cs.AI (arXiv:2205.01782v2)

Abstract: The activations of Facial Action Units (AUs) mutually influence one another. While the relationship between a pair of AUs can be complex and unique, existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display. This paper proposes an AU relationship modelling approach that deep learns a unique graph to explicitly describe the relationship between each pair of AUs of the target facial display. Our approach first encodes each AU's activation status and its association with other AUs into a node feature. Then, it learns a pair of multi-dimensional edge features to describe multiple task-specific relationship cues between each pair of AUs. During both node and edge feature learning, our approach also considers the influence of the unique facial display on AUs' relationship by taking the full face representation as an input. Experimental results on BP4D and DISFA datasets show that both node and edge feature learning modules provide large performance improvements for CNN and transformer-based backbones, with our best systems achieving the state-of-the-art AU recognition results. Our approach not only has a strong capability in modelling relationship cues for AU recognition but also can be easily incorporated into various backbones. Our PyTorch code is made available.

Citations (99)

Summary

  • The paper proposes a novel graph-based framework that leverages multi-dimensional edge features for accurate facial action unit recognition.
  • It integrates AU-specific feature extraction with dynamic graph generation using a GatedGCN to capture nuanced facial expressions.
  • Experimental results on BP4D and DISFA datasets show state-of-the-art F1 scores, surpassing traditional binary relationship models.

Overview of Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition

The paper "Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition" presents an approach to facial action unit (AU) recognition that explicitly models the relationships between AUs. It addresses a limitation of existing AU recognition techniques by developing a deep-learning framework that captures the interactions between AUs separately for each facial display, rather than assuming one fixed relationship pattern. This is significant for multi-label classification tasks that require a nuanced understanding of facial expressions.

Unlike traditional models that apply a uniform graph topology or treat AU relationships as simple binary associations, this study introduces a graph-based approach that utilizes multi-dimensional edge features to account for complex inter-AU relationships. The proposed system deep learns a unique graph representation for each facial display, providing a tailored view of AU interactions informed by the appearance of that particular face.
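To make the contrast concrete, the following minimal NumPy sketch compares the two representations. All names and sizes here are illustrative assumptions, not taken from the paper's released code:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 12   # number of AU nodes (e.g. 12 AUs annotated in BP4D)
D = 16   # dimensionality of each learned edge feature (illustrative)

# Prior work: one fixed, binary adjacency shared by every face.
binary_adjacency = rng.integers(0, 2, size=(N, N))

# This paper: for each facial display, a D-dimensional edge feature
# per directed AU pair, so the relationship between AU i and AU j is
# described by a learned vector rather than a single bit, and can
# differ in each direction (i -> j vs. j -> i).
edge_features = rng.standard_normal((N, N, D))

# A scalar graph carries one number per pair; the multi-dimensional
# graph carries D numbers per direction per pair.
print(binary_adjacency.size)   # prints 144
print(edge_features.size)      # prints 2304
```

The key point is capacity: a vector per directed edge can encode several task-specific relationship cues at once, which a single binary link cannot.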

Methodology

The authors introduce a novel architecture that comprises two principal modules: AUs Relationship-aware Node Feature Learning (ANFL) and Multi-dimensional Edge Feature Learning (MEFL).

  1. AUs Relationship-aware Node Feature Learning: This module, through the AU-specific Feature Generator (AFG) and Facial Graph Generator (FGG), learns individual representations for each AU and their interconnections. The AFG extracts AU-specific features from a broader facial representation, while the FGG constructs a dynamic graph unique to each face by leveraging the extracted AU features. This adaptability allows for capturing specific relationships relevant to the facial context.
  2. Multi-dimensional Edge Feature Learning: In this phase, the Facial display-specific AU Representation Modeling (FAM) and AU Relationship Modeling (ARM) blocks work in tandem to derive multi-dimensional edge features. These edge features encapsulate the complexity of AU interactions by drawing on specific face-related cues, leading to a richer graph representation that surpasses the simple binary or single-dimensional approaches of previous studies.
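As a rough sketch of how FGG-style dynamic graph construction can work, the snippet below connects each AU node to its most similar nodes by feature similarity, producing a different topology for each face. The dot-product similarity and neighbour count used here are assumptions for illustration; the paper's learned criterion may differ:

```python
import numpy as np

def build_face_specific_graph(node_feats, k=3):
    """Connect each AU node to its k most similar AU nodes.

    node_feats : (N, C) array of AU-specific features for one face.
    Returns an (N, N) binary adjacency unique to this face.
    Similarity is a plain dot product -- an illustrative assumption.
    """
    sim = node_feats @ node_feats.T        # (N, N) pairwise similarity
    np.fill_diagonal(sim, -np.inf)         # exclude self-loops
    adj = np.zeros_like(sim)
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # top-k neighbours per node
    rows = np.repeat(np.arange(sim.shape[0]), k)
    adj[rows, nbrs.ravel()] = 1.0
    return adj

rng = np.random.default_rng(0)
feats = rng.standard_normal((12, 64))      # 12 AUs, 64-dim AFG features
adj = build_face_specific_graph(feats, k=3)
print(adj.sum(axis=1))                     # each node has exactly 3 neighbours
```

Because the adjacency is recomputed from the extracted AU features of each input face, two different facial displays yield two different graphs, which is the adaptability the ANFL module is designed to provide.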

The integration of these components, coupled with a Gated Graph Convolutional Network (GatedGCN) for processing the constructed graph, enhances the recognition accuracy for different AUs by exploiting task-specific cues embedded in both node and edge features.
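A minimal edge-gated message-passing step in the spirit of GatedGCN is sketched below. The gating form, weight shapes, and initialisation are illustrative assumptions; the actual network also includes normalisation, residual connections, and learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_gcn_step(h, e, U, V, adj):
    """One simplified GatedGCN-style node update.

    h   : (N, C) node features     e   : (N, N, C) edge features
    U,V : (C, C) weight matrices   adj : (N, N) binary adjacency
    Each neighbour's message is modulated element-wise by a gate
    derived from the edge feature, so different relationship cues
    flow along different edges.
    """
    gates = sigmoid(e) * adj[..., None]                   # (N, N, C)
    gates = gates / (gates.sum(axis=1, keepdims=True) + 1e-6)
    msgs = np.einsum('ijc,jc->ic', gates, h @ V)          # gated aggregation
    return np.maximum(h @ U + msgs, 0.0)                  # ReLU

rng = np.random.default_rng(0)
N, C = 12, 32
h = rng.standard_normal((N, C))
e = rng.standard_normal((N, N, C))
U = rng.standard_normal((C, C)) * 0.1
V = rng.standard_normal((C, C)) * 0.1
adj = (rng.random((N, N)) < 0.3).astype(float)
h_next = gated_gcn_step(h, e, U, V, adj)
print(h_next.shape)   # prints (12, 32)
```

The design choice worth noting is that the gate is a per-channel function of the edge feature, so the multi-dimensional edges learned by MEFL directly control which information each AU node receives from each neighbour.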

Results

The paper reports experimental results demonstrating the efficacy of the proposed approach on well-recognized datasets, BP4D and DISFA. Models incorporating this novel framework achieved state-of-the-art results, significantly outperforming existing methods in terms of F1 scores. The methodology proved effective with both convolutional neural network (CNN) and transformer-based backbones, highlighting the versatility of the proposed graph-based relationship modeling.

Implications and Future Directions

This study contributes substantially to the field of facial recognition and multi-label classification by addressing the need for more sophisticated AU relationship modeling. The implications are broad, extending to enhanced human-computer interaction, improved emotion recognition systems, and applications in areas such as mental health assessments and entertainment.

Moving forward, further refining the scalability and efficiency of this approach remains a pertinent avenue for research, especially as real-world applications necessitate handling more diverse and larger datasets. Exploring the extension of such graph-based models to other domains with complex inter-object relations also holds potential for future research endeavors.

This research not only provides a robust framework for AU recognition but also sets a precedent for addressing relational complexities within multi-label classification tasks through advanced graph-based modeling.
