
Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-expressions

Published 15 Jan 2019 in cs.CV | (1901.04656v1)

Abstract: Recently, the task of recognizing spontaneous facial micro-expressions has attracted much attention owing to its various real-world applications. Many handcrafted and learned features have been employed with a variety of classifiers, achieving promising performance in recognizing micro-expressions. However, micro-expression recognition remains challenging due to the subtle spatiotemporal changes involved. To exploit the merits of deep learning, we propose a novel micro-expression recognition approach based on deep recurrent convolutional networks, which captures the spatiotemporal deformations of a micro-expression sequence. Specifically, the proposed deep model is composed of several recurrent convolutional layers for extracting visual features and a classification layer for recognition. It is optimized in an end-to-end manner and obviates manual feature design. To handle sequential data, we exploit two ways of extending the connectivity of convolutional networks across the temporal domain, in which the spatiotemporal deformations are modeled from the viewpoints of facial appearance and geometry, respectively. In addition, to overcome the shortcomings of limited and imbalanced training samples, temporal data augmentation strategies and a balanced loss are jointly used in our deep network. Through experiments on three spontaneous micro-expression datasets, we verify the effectiveness of the proposed approach compared with state-of-the-art methods.

Citations (179)

Summary

  • The paper introduces a novel STRCN architecture that leverages recurrent layers to capture subtle spatiotemporal patterns in micro-expression videos.
  • It employs both appearance-based and geometric-based connectivity to automatically extract discriminative facial features.
  • Experimental results on SMIC, CASME II, and SAMM databases demonstrate improved recognition performance under diverse evaluation protocols.

Evaluating Spatiotemporal Recurrent Convolutional Networks for Micro-Expression Recognition

This paper advances automatic micro-expression recognition (MER) using deep learning, specifically spatiotemporal recurrent convolutional networks (STRCN). Micro-expressions are brief, involuntary facial movements that reveal genuine emotions and can be instrumental in applications such as lie detection and psychological analysis. Despite the promising capabilities of existing machine learning techniques, MER remains challenging because the spatiotemporal deformations of micro-expressions are subtle.

The authors propose an innovative architecture based on recurrent convolutional networks (RCNs), which extends conventional convolutional networks to account for temporal connectivity across video sequences. This setup integrates spatial and temporal feature extraction using two key approaches: appearance-based and geometric-based connectivity. The appearance-based connectivity (STRCN-A) processes sequences by selecting micro-expression-prone areas through a mask derived from a difference heat map, while the geometric-based connectivity (STRCN-G) leverages optical flow between sequence onset and apex frames to capture facial dynamics.
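The appearance-based masking step can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes grayscale frames stacked in a NumPy array and uses a simple percentile threshold on the mean absolute frame difference to stand in for the paper's difference heat map.

```python
import numpy as np

def motion_mask(frames: np.ndarray, keep_ratio: float = 0.05) -> np.ndarray:
    """Select micro-expression-prone pixels from a difference heat map.

    frames: (T, H, W) grayscale sequence, values in [0, 1].
    Returns a boolean (H, W) mask keeping roughly the top `keep_ratio`
    fraction of pixels by average inter-frame change. (With very sparse
    motion the percentile can fall among the zero pixels, so keep_ratio
    should be small relative to the moving area.)
    """
    # Absolute differences between consecutive frames, averaged over time
    heat = np.abs(np.diff(frames, axis=0)).mean(axis=0)
    # Keep pixels whose accumulated motion exceeds the chosen percentile
    thresh = np.percentile(heat, 100 * (1 - keep_ratio))
    return heat >= thresh

# Toy usage: 10 frames of a 32x32 scene with a small patch sliding rightward
seq = np.zeros((10, 32, 32))
for t in range(10):
    seq[t, 10:14, t:t + 4] = 1.0  # bright patch occupies different columns per frame
mask = motion_mask(seq, keep_ratio=0.05)
print(mask.shape, int(mask.sum()))
```

The mask would then be used to restrict which pixels the appearance stream of the network attends to, concentrating capacity on regions that actually move.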

Key features of the proposed STRCN framework include:

  • End-to-end Optimization: This method eliminates manual feature design, facilitating the automatic learning of discriminative features across facial video sequences.
  • Handling Imbalanced Datasets: The integration of temporal data augmentation and balanced loss functions addresses the issues of limited and skewed datasets, common in MER tasks.
  • Enhanced Feature Representation: By employing recurrent layers, STRCN effectively increases receptive fields, enabling the capture of more nuanced spatiotemporal patterns within micro-expression videos.
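The class-balancing idea above can be sketched as a weighted cross-entropy with per-class weights inversely proportional to class frequency. This is an illustrative stand-in for the paper's balanced loss, not its exact formulation; the function name and weighting scheme are assumptions for the sketch.

```python
import numpy as np

def balanced_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Cross-entropy with inverse-frequency class weights.

    logits: (N, C) unnormalized scores; labels: (N,) integer class ids.
    Rare classes receive proportionally larger weights, so the loss is
    not dominated by the majority class.
    """
    n_classes = logits.shape[1]
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    # Inverse-frequency weights (sklearn-style "balanced" weighting):
    # weight_c = n_samples / (n_classes * count_c)
    weights = counts.sum() / (n_classes * np.maximum(counts, 1.0))
    # Numerically stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return float((weights[labels] * per_sample).mean())

# Toy usage: 3 samples of class 0 vs. 1 sample of class 1
loss = balanced_cross_entropy(np.zeros((4, 2)), np.array([0, 0, 0, 1]))
print(loss)
```

With uniform (all-zero) logits the weighted and unweighted losses coincide at log(C); the weighting only changes the gradient balance once the model starts favoring the majority class.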

Experimental evaluations on three prominent databases (SMIC, CASME II, and SAMM) underscore the efficacy of STRCN-G, particularly under the leave-one-subject-out (LOSO) protocol, where it outperforms existing approaches and STRCN-A. STRCN-A showed superior performance under the leave-one-video-out (LOVO) protocol due to its nuanced appearance feature extraction.

Implications of these findings extend both practically and theoretically. Practically, STRCN offers an automated means to enhance emotion detection accuracy in real-time applications such as video surveillance or human-computer interaction systems. Theoretically, the integration of recurrent layers within convolutional networks opens avenues for future exploration in sequential data modeling, potentially enhancing various fields where capturing subtle temporal shifts is paramount.

While STRCN presents significant advances, future work could explore more scalable architectures or even incorporate data fusion techniques for deeper semantic understanding of facial expressions. Moreover, cross-dataset evaluation could promote generalizability and uncover latent biases, further solidifying STRCN's applicability in diverse settings.
