The Emotionally Intelligent Robot: Improving Social Navigation in Crowded Environments
(1903.03217v1)
Published 7 Mar 2019 in cs.RO
Abstract: We present a real-time algorithm for emotion-aware navigation of a robot among pedestrians. Our approach estimates time-varying emotional behaviors of pedestrians from their faces and trajectories using a combination of Bayesian-inference, CNN-based learning, and the PAD (Pleasure-Arousal-Dominance) model from psychology. These PAD characteristics are used for long-term path prediction and generating proxemic constraints for each pedestrian. We use a multi-channel model to classify pedestrian characteristics into four emotion categories (happy, sad, angry, neutral). In our validation results, we observe an emotion detection accuracy of 85.33%. We formulate emotion-based proxemic constraints to perform socially-aware robot navigation in low- to medium-density environments. We demonstrate the benefits of our algorithm in simulated environments with tens of pedestrians as well as in a real-world setting with Pepper, a social humanoid robot.
The paper introduces a data-driven algorithm that integrates facial and trajectory emotion analysis to generate proxemic constraints for socially-aware robot navigation.
It employs a convolutional neural network and Bayesian inference to classify pedestrian emotions, achieving 85.33% emotion classification accuracy in validation.
Quantitative experiments with a Pepper humanoid robot demonstrate navigation with less than 25% time overhead while maintaining socially acceptable distances.
The paper introduces a real-time, data-driven planning algorithm designed to enable robots to navigate in a socially-aware manner by considering the emotional states of pedestrians. The approach leverages the Pleasure-Arousal-Dominance (PAD) model to predict pedestrian emotions from both facial cues and trajectories, integrating these predictions to guide robot navigation within socially acceptable proxemic constraints.
The methodology encompasses several key components:
Emotion Detection: A multi-channel model classifies pedestrian emotions into four categories: happy, sad, angry, and neutral. This classification is achieved by combining emotion analysis from facial expressions and trajectory data.
Facial Expression Analysis: A fully-convolutional neural network (CNN), pre-trained on the FER-2013 dataset (Goodfellow et al., 2013), is employed to discern emotions from facial expressions.
Trajectory Analysis: A data-driven approach maps pedestrian motion to emotional states. This Trajectory-based Emotion Model (TEM) is derived from a Mechanical Turk user study that correlates motion model parameters with perceived emotions.
Proxemic Constraints: Emotion estimates are used to generate proxemic constraints that define comfortable interaction distances for each pedestrian. These constraints are then integrated into the robot's navigation planning to ensure socially-aware behavior.
Robot Navigation: The algorithm is validated in both simulated and real-world environments, including experiments with a Pepper humanoid robot. The robot is shown to navigate through crowds while respecting proxemic and emotional cues.
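A minimal sketch of how these components could fit together in a per-frame planning loop is shown below. All names (face_cnn, tem, fuse_emotions, proxemic_constraint, plan_step) are illustrative stand-ins rather than the authors' API, and the loop is a simplification of the paper's pipeline.

```python
import numpy as np

def navigation_step(robot, pedestrians, face_cnn, tem, planner):
    """One planning cycle: estimate each pedestrian's emotion, derive a
    proxemic constraint from it, and replan the robot's motion."""
    constraints = []
    for ped in pedestrians:
        # Channel 1: facial expression (may be unavailable when the face is
        # occluded or too far away).
        e_face = face_cnn.predict(ped.face_crop) if ped.face_crop is not None else None
        # Channel 2: trajectory-based emotion via the TEM.
        e_traj = tem.predict(ped.motion_parameters)
        # Fuse the two channels (see the joint-emotion formula below) and
        # pick the dominant emotion category.
        e_joint = fuse_emotions(e_traj, e_face, ped.tracking_confidence)
        label = int(np.argmax(e_joint))
        # The emotion label selects comfort/reachability distances that
        # become proxemic constraints for the planner.
        constraints.append(proxemic_constraint(ped, label))
    return planner.plan_step(robot, constraints)
```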
The technical approach involves:
Using Bayesian inference and CNN-based learning to estimate time-varying emotional behaviors.
Employing the PAD model from psychology to represent emotional states.
Developing a multi-channel model to classify pedestrian characteristics based on facial and trajectory cues.
Formulating emotion-based proxemic constraints to enable socially-aware robot navigation.
The paper details the emotion learning process, which combines emotion learning from trajectories and facial features. The TEM is constructed from a perception study aimed at obtaining emotion labels for pedestrian videos. Participants in the study were asked to label the emotions of pedestrians in various scenarios, and the collected data was used to train a linear regression model that maps motion model parameters to emotional states. The accuracy of the TEM was evaluated using 10-fold cross-validation, achieving an average accuracy of 85.33%. The emotion coefficients are obtained from the multiple linear regression that maps the motion parameter vector $\vec{P}$ to the trajectory-based emotion vector $\vec{E}^t$ as follows:
$\vec{E}^t = \begin{pmatrix} -0.15 & 0.00 & -0.12 \\ 0.24 & -0.61 & 0.20 \\ -0.02 & 0.79 & 0.11 \end{pmatrix} \vec{P}$

$\vec{E}^t$ is the trajectory-based emotion vector.
$\vec{P}$ is the motion parameter vector.
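In code, the TEM prediction reduces to this single matrix-vector product. The sketch below plugs in the coefficients quoted above; treating $\vec{P}$ as a 3-component motion-parameter vector is an assumption implied by the 3x3 coefficient matrix, and tem_predict is a hypothetical helper name.

```python
import numpy as np

# Coefficients of the multiple linear regression quoted above (TEM).
TEM_COEFFS = np.array([
    [-0.15,  0.00, -0.12],
    [ 0.24, -0.61,  0.20],
    [-0.02,  0.79,  0.11],
])

def tem_predict(P):
    """Map the motion parameter vector P (3 components, per the 3x3 matrix)
    to the trajectory-based emotion vector E^t = M @ P."""
    return TEM_COEFFS @ np.asarray(P, dtype=float)
```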
The paper addresses the challenge of understanding pedestrian emotions by integrating facial expressions with trajectory information. The authors argue that relying solely on facial expressions can be unreliable due to factors such as partial visibility or distance. By combining facial expression analysis with trajectory analysis, the algorithm aims to provide a more accurate prediction of human emotional states. The joint pedestrian emotion E is computed as:
$E = \dfrac{\alpha \vec{E}^t + \lfloor \max(\vec{E}^f) + 1/2 \rfloor \, \vec{E}^f}{\alpha + \lfloor \max(\vec{E}^f) + 1/2 \rfloor}$

$\vec{E}^t$ is the emotion vector obtained from the trajectory.
$\vec{E}^f$ is the emotion vector obtained from the facial expression.
$\alpha$ is the pedestrian tracking confidence metric.
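Reading the floor term as a gate that admits the face channel only when its strongest score is at least 0.5, the fusion can be sketched as a normalized weighted average. The helper fuse_emotions, the handling of a missing face detection, and the assumption that both vectors use the same emotion ordering are illustrative choices, not the paper's implementation.

```python
import numpy as np

def fuse_emotions(e_traj, e_face, alpha):
    """Joint pedestrian emotion from trajectory and face channels.

    e_traj : trajectory-based emotion vector E^t
    e_face : face-based emotion vector E^f (classifier scores), or None
    alpha  : pedestrian tracking confidence in (0, 1]
    """
    e_traj = np.asarray(e_traj, dtype=float)
    if e_face is None:
        return e_traj
    e_face = np.asarray(e_face, dtype=float)
    # Gate = floor(max(E^f) + 1/2): 1 when the face classifier is confident
    # (max score >= 0.5), 0 otherwise.
    gate = np.floor(np.max(e_face) + 0.5)
    return (alpha * e_traj + gate * e_face) / (alpha + gate)
```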
The paper uses proxemic distances to enable socially-aware navigation. It incorporates the concepts of comfort distance ($cd_e$) and reachability distance ($rd_e$) into the navigation algorithm, where $e$ indicates the computed emotion label. The robot alters its goal position and preferred velocity to account for both comfort and reachability, yielding the adjusted quantities $p_r^{pref+prox}$ and $v_r^{pref+prox}$.
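A toy illustration of how an emotion-dependent comfort distance could reshape the robot's preferred velocity is given below. The distance values are placeholders rather than the paper's $cd_e$ values, and the simple repulsion rule stands in for the paper's full collision-avoidance planner.

```python
import numpy as np

# Placeholder comfort distances cd_e (metres) per emotion label; illustrative
# values only, not the ones used in the paper.
COMFORT_DISTANCE = {"happy": 0.9, "neutral": 1.0, "sad": 1.2, "angry": 1.5}

def preferred_velocity(robot_pos, goal_pos, ped_pos, emotion, pref_speed=0.5):
    """Preferred velocity toward the goal, pushed away from a pedestrian
    whose emotion-dependent comfort distance would otherwise be violated."""
    robot_pos, goal_pos, ped_pos = map(np.asarray, (robot_pos, goal_pos, ped_pos))
    to_goal = goal_pos - robot_pos
    v_pref = pref_speed * to_goal / (np.linalg.norm(to_goal) + 1e-9)
    to_ped = ped_pos - robot_pos
    dist = np.linalg.norm(to_ped)
    if dist < COMFORT_DISTANCE[emotion]:
        # Inside the comfort zone: add a repulsive component away from the pedestrian.
        v_pref -= pref_speed * to_ped / (dist + 1e-9)
    return v_pref
```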
Quantitative evaluations in the paper show that the robot can reach its goal with less than 25% time overhead while ensuring that the proxemic spaces of the pedestrians are not violated.