- The paper introduces a novel framework that fuses deep features and psychologically motivated affective features to classify four emotional states with 80.07% accuracy.
- It uses 3D pose estimation and an LSTM network to extract temporal and geometric information from walking gaits, and a Random Forest classifier to predict the perceived emotion.
- The research paves the way for enhanced applications in human-computer interaction, security, and affective computing using non-verbal cues.
Overview of "Identifying Emotions from Walking Using Affective and Deep Features"
The paper "Identifying Emotions from Walking Using Affective and Deep Features" presents a novel framework combining data-driven models with psychological characterization to classify perceived emotions based on individuals' walking styles from RGB video inputs. The research pioneers a methodology leveraging both deep learning and heuristic-driven features to discern emotions such as happy, sad, angry, and neutral, enhancing applications in domains including HCI, security, and affective computing.
Methodological Approach
The paper introduces an integrated pipeline in which walking gaits are extracted from RGB videos via 3D pose estimation, yielding sequences of 3D joint positions. These sequences are fed to a Long Short-Term Memory (LSTM) network, which models the temporal dynamics of the gait and produces deep features. In parallel, affective features are computed from posture and movement cues identified in psychological studies, including joint angles, inter-joint distances, and speeds of body parts that carry emotional signatures.
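The sketch below illustrates the two feature streams described above. It is not the authors' implementation: the skeleton layout, joint indices, layer sizes, and the specific posture/movement cues (neck bend, hand separation, stride, root-joint speed) are assumptions chosen for illustration.

```python
# Illustrative sketch of the two feature streams (deep + affective) from 3D joints.
# Joint indices, skeleton size, and feature choices are assumptions, not the paper's exact setup.
import numpy as np
import torch
import torch.nn as nn

NUM_JOINTS = 16                          # assumed skeleton size
HEAD, NECK, ROOT = 0, 1, 8               # hypothetical joint indices
L_HAND, R_HAND, L_FOOT, R_FOOT = 7, 4, 15, 12

class GaitLSTM(nn.Module):
    """Encodes a (batch, T, NUM_JOINTS*3) joint sequence into a fixed-size deep feature."""
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=NUM_JOINTS * 3, hidden_size=hidden, batch_first=True)

    def forward(self, joints):
        _, (h_n, _) = self.lstm(joints)
        return h_n[-1]                    # (batch, hidden) deep feature vector

def affective_features(joints, fps=30.0):
    """joints: (T, NUM_JOINTS, 3) array -> small vector of posture/movement cues."""
    # Posture cue: mean angle at the neck between head and root (spine bend).
    v1 = joints[:, HEAD] - joints[:, NECK]
    v2 = joints[:, ROOT] - joints[:, NECK]
    cos = np.sum(v1 * v2, axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1) + 1e-8)
    neck_angle = np.arccos(np.clip(cos, -1.0, 1.0)).mean()

    # Posture cue: average hand-to-hand distance (body expansion).
    hand_dist = np.linalg.norm(joints[:, L_HAND] - joints[:, R_HAND], axis=1).mean()

    # Movement cues: stride length (max foot separation) and mean root-joint speed.
    stride = np.linalg.norm(joints[:, L_FOOT] - joints[:, R_FOOT], axis=1).max()
    speed = (np.linalg.norm(np.diff(joints[:, ROOT], axis=0), axis=1) * fps).mean()

    return np.array([neck_angle, hand_dist, stride, speed])
```

In this sketch the LSTM's final hidden state serves as the deep feature vector, while the hand-crafted function returns a few angle, distance, and speed cues; the actual framework uses a richer set of affective features.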
The two feature streams are concatenated and fed into a Random Forest classifier, which achieves 80.07% accuracy in categorizing the four emotional states, a marked improvement over previous methods. The work also presents a gait video dataset (EWalk) and a mapping from the perceived emotions onto affective dimensions, allowing valence and arousal levels to be predicted.
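A minimal sketch of this classification step follows, assuming deep features from an LSTM encoder and affective features like those above; the array shapes, random placeholder data, and hyperparameters are illustrative only.

```python
# Sketch: concatenate deep and affective features, then classify with a Random Forest.
# Placeholder data stands in for real gait features; shapes and settings are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
deep_feats = rng.normal(size=(200, 128))    # placeholder LSTM deep features
aff_feats = rng.normal(size=(200, 4))       # placeholder affective features
labels = rng.integers(0, 4, size=200)       # 0=happy, 1=sad, 2=angry, 3=neutral

X = np.hstack([deep_feats, aff_feats])      # fuse the two feature streams
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```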
Results and Implications
Combining affective features with deep features yields a clear improvement over baseline and existing methods, with a 13.85% increase in accuracy over prior gait-based emotion classification models. The model also remains robust across varied environments and conditions, as assessed on extensive datasets containing both natural and synthetic gait motions.
The implications of this research extend broadly within affective computing and related areas of AI. The ability to infer emotional states from non-verbal cues such as walking opens new avenues for interaction in autonomous systems, enriching their capacity to respond contextually to human affective states. Sectors ranging from security, where detecting agitation could preemptively flag concerns, to personalized entertainment and well-being applications stand to benefit from such perceptive systems.
Limitations and Future Directions
A key limitation is the dependency on accurate 3D pose estimation; occlusions or errors in pose extraction can degrade emotion classification. The method also focuses exclusively on walking, so it does not apply to stationary subjects or to emotions expressed through other, non-gait behaviors. Future work could address these constraints by incorporating multimodal data streams, such as facial or vocal cues, and by extending beyond pedestrian data to a wider range of human activities.
This research complements existing emotion recognition technologies and enriches human-centric AI systems, inviting further investigation into nuanced, dynamic emotional perception from non-verbal behavior in naturalistic settings.