MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception

Published 11 May 2025 in cs.CV (arXiv:2505.07007v1)

Abstract: Micro-expressions (MEs) are crucial psychological responses with significant potential for affective computing. However, current automatic micro-expression recognition (MER) research primarily focuses on discrete emotion classification, neglecting a convincing analysis of the subtle dynamic movements and inherent emotional cues. The rapid progress in multimodal LLMs (MLLMs), known for their strong multimodal comprehension and language generation abilities, offers new possibilities. MLLMs have shown success in various vision-language tasks, indicating their potential to understand MEs comprehensively, including both fine-grained motion patterns and underlying emotional semantics. Nevertheless, challenges remain due to the subtle intensity and short duration of MEs, as existing MLLMs are not designed to capture such delicate frame-level facial dynamics. In this paper, we propose a novel Micro-Expression LLM (MELLM), which incorporates a subtle facial motion perception strategy with the strong inference capabilities of MLLMs, representing the first exploration of MLLMs in the domain of ME analysis. Specifically, to explicitly guide the MLLM toward motion-sensitive regions, we construct an interpretable motion-enhanced color map by fusing onset-apex optical flow dynamics with the corresponding grayscale onset frame as the model input. Additionally, specialized fine-tuning strategies are incorporated to further enhance the model's visual perception of MEs. Furthermore, we construct an instruction-description dataset based on Facial Action Coding System (FACS) annotations and emotion labels to train our MELLM. Comprehensive evaluations across multiple benchmark datasets demonstrate that our model exhibits superior robustness and generalization capabilities in ME understanding (MEU). Code is available at https://github.com/zyzhangUstc/MELLM.

Summary

Micro-Expression Large Language Model (MELLM): A Novel Approach to Micro-Expression Understanding

The paper "MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception" presents a pioneering approach in affective computing: it applies Multimodal Large Language Models (MLLMs) to the analysis of micro-expressions (MEs). MEs are fleeting, involuntary facial expressions that offer important insight into concealed emotions. The traditional challenge lies in accurately perceiving their subtle and transient dynamics, which typically last less than 0.5 seconds, while the scarcity of annotated data makes models prone to overfitting.

Methodology

The authors propose a Micro-Expression Large Language Model (MELLM) that leverages the strong inference capabilities of MLLMs. The core innovation is Micro-Expression Motion-enhanced Color Mapping (MMC-Mapping). This technique enriches facial motion representation by encoding pixel-level dynamics as a color map, with hue indicating motion direction and luminance indicating magnitude, superimposed on the grayscale onset frame. This visual enhancement makes subtle motion cues more perceptible to MLLMs, which are naturally attuned to visual comprehension.
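
The following is a minimal sketch of how such a motion-enhanced color map can be constructed with OpenCV. It is an approximation for illustration, not the paper's exact pipeline: the flow estimator (Farneback here), the blending weight, and the function name are assumptions.

```python
import cv2
import numpy as np

def motion_enhanced_color_map(onset_gray, apex_gray, alpha=0.5):
    """Illustrative MMC-Mapping-style fusion (hypothetical implementation).

    Hue encodes optical-flow direction, brightness encodes flow magnitude,
    and the resulting color map is blended onto the grayscale onset frame.
    """
    # Dense optical flow from onset to apex (Farneback is an assumed choice).
    flow = cv2.calcOpticalFlowFarneback(
        onset_gray, apex_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    hsv = np.zeros((*onset_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)  # hue: direction
    hsv[..., 1] = 255                                       # full saturation
    hsv[..., 2] = cv2.normalize(                            # brightness: magnitude
        mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    flow_bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    onset_bgr = cv2.cvtColor(onset_gray, cv2.COLOR_GRAY2BGR)
    return cv2.addWeighted(onset_bgr, 1 - alpha, flow_bgr, alpha, 0)
```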

The model is fine-tuned with Low-Rank Adaptation (LoRA), which lets it adapt efficiently despite limited data: only small low-rank adapter matrices are trained while the extensive pre-trained weights stay frozen, making the approach well suited to small-scale micro-expression datasets. The paper also details a clear process for annotating datasets with the Facial Action Coding System (FACS) for Action Unit (AU) and emotion labeling, followed by a structured reasoning framework called Flow-Guided Micro-Expression Understanding (FGMU). FGMU bridges low-level motion cues with high-level affective reasoning, enabling a comprehensive analysis of facial dynamics.
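
A minimal sketch of LoRA-based adapter tuning with the Hugging Face peft library is shown below. The base model checkpoint, target modules, and hyperparameters are illustrative assumptions, not the paper's reported configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; MELLM's actual MLLM backbone may differ.
base = AutoModelForCausalLM.from_pretrained("path/to/mllm-backbone")

lora_cfg = LoraConfig(
    r=16,                                 # low-rank dimension (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
# Only the small adapter matrices are trainable; the pre-trained
# weights stay frozen, which suits small ME datasets.
model.print_trainable_parameters()
```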

Results

The evaluation indicates that MELLM outperforms several baselines in robustness and generalization, especially when compared against traditional frameworks that focus exclusively on discrete emotion classification. Across multiple benchmark datasets, MELLM delivers stronger micro-expression analysis and more interpretable insight into the underlying emotions. These findings are supported by standard metrics: Accuracy (ACC), Unweighted F1-score (UF1), and Unweighted Average Recall (UAR).
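
Since the paper does not spell out a metric implementation, the sketch below shows how these measures are conventionally computed, with UF1 and UAR as macro-averaged (class-unweighted) F1 and recall via scikit-learn.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

def mer_metrics(y_true, y_pred):
    """ACC, UF1, and UAR as commonly used in MER evaluation.

    'Unweighted' means a plain macro average over classes, so rare
    emotion categories count as much as frequent ones.
    """
    acc = accuracy_score(y_true, y_pred)
    uf1 = f1_score(y_true, y_pred, average="macro")
    uar = recall_score(y_true, y_pred, average="macro")
    return acc, uf1, uar

# Toy example with three emotion classes (labels are illustrative).
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 0, 2]
print(mer_metrics(y_true, y_pred))
```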

Implications and Future Work

The paper points to significant practical and theoretical advances in affective computing. Practically, MELLM can be applied where fine-grained emotional assessment is needed, such as security screening, psychological evaluation, and adaptive learning environments. Theoretically, it can inform future research in multimodal AI and emotion recognition. Future work might further refine AU recognition and integrate additional modalities, such as voice or text, to enrich contextual understanding.

In summary, this research contributes a novel dimension to ME analysis through MLLM integration, laying a foundation for advanced emotional reasoning models that provide interpretable and actionable insights.
