Conveying Emotions to Robots through Touch and Sound (2412.03300v1)

Published 4 Dec 2024 in cs.RO and cs.LG

Abstract: Human emotions can be conveyed through nuanced touch gestures. However, there is a lack of understanding of how consistently emotions can be conveyed to robots through touch. This study explores the consistency of touch-based emotional expression toward a robot by integrating tactile and auditory sensory reading of affective haptic expressions. We developed a piezoresistive pressure sensor and used a microphone to mimic touch and sound channels, respectively. In a study with 28 participants, each conveyed 10 emotions to a robot using spontaneous touch gestures. Our findings reveal a statistically significant consistency in emotion expression among participants. However, some emotions obtained low intraclass correlation values. Additionally, certain emotions with similar levels of arousal or valence did not exhibit significant differences in the way they were conveyed. We subsequently constructed a multi-modal model integrating touch and audio features to decode the 10 emotions. A support vector machine (SVM) model demonstrated the highest accuracy, achieving 40% for 10 classes, with "Attention" being the most accurately conveyed emotion at a balanced accuracy of 87.65%.

Citations (1)

Summary

  • The paper investigates how consistently humans convey emotions to robots using touch and sound, analyzing multimodal sensory data.
  • Using custom tactile and audio sensors and 28 participants, the study found statistically significant consistency in touch-based emotion expressions to robots, though this varied by emotion.
  • Machine learning models achieved 40% accuracy (SVM), recognizing 'attention' best while 'sadness' and 'surprise' were often misclassified, reflecting participant feedback.

The paper investigates the consistency with which humans express emotions to robots through touch, integrating tactile and auditory sensory data from affective haptic expressions. The authors posit that while touch is a crucial medium for social exchange, there is a gap in understanding how consistently emotions can be conveyed via touch to robots. The paper employs a custom-developed piezoresistive pressure sensor and a microphone to simulate touch and sound channels, respectively. Twenty-eight participants conveyed ten emotions to a Pepper robot using spontaneous touch gestures, and the consistency of these expressions was analyzed.

The authors focus on whether people are consistent with one another in their touch-based emotion expression, which emotions are distinguishable through touch, and whether multi-modal integration can decode different emotions.

The experimental setup involved a Pepper robot equipped with a 5x5 piezoresistive pressure grid tactile sensor and a USB microphone. The tactile sensor, similar to a Smart Textile developed previously, consists of top and bottom electrode patterns with a piezoresistive Velostat™ sheet in between. The electrodes are made by screen-printing DuPont™ PE874™ conductive ink onto a Bemis™ 3914 thermoplastic polyurethane (TPU) sheet. A NINA-B306 microcontroller unit (MCU), powered by a lithium-ion polymer (LiPo) battery, reads out the sensor data and communicates it wirelessly over Bluetooth Low Energy (BLE) at approximately 45 Hz. The transduction mechanism relies on changes in resistance of the Velostat™ sheet under pressure. The microphone recorded audio at a sample rate of 44.1 kHz, processed in chunks of 1024 frames per buffer, and saved the data in WAV format.
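
A minimal sketch of what such an audio capture loop might look like (not the authors' code), using PyAudio with the reported 44.1 kHz sample rate, 1024-frame buffers, a 10-second recording window, and WAV output; the mono channel setting, sample format, and output filename are assumptions:

```python
import wave
import pyaudio

RATE = 44100    # sample rate reported in the paper
CHUNK = 1024    # frames per buffer reported in the paper
SECONDS = 10    # maximum gesture duration

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

# Read the microphone in fixed-size chunks for the full recording window.
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]

stream.stop_stream()
stream.close()

# Save the raw frames as a 16-bit mono WAV file.
with wave.open("gesture_audio.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

p.terminate()
```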

Participants were tasked with conveying ten emotions: anger, fear, disgust, happiness, surprise, sadness, confusion, comfort, calm, and attention. These emotions were selected based on Russell's model, spanning different arousal and valence zones. Participants were provided with definitions of these emotions. Each participant performed three rounds of interactions, with each gesture recorded for up to 10 seconds. Subjective feedback was also collected regarding the difficulty of expressing each emotion.

The dataset comprises tactile sensor and microphone recordings from 28 participants across three rounds of interactions, resulting in 84 samples per emotion. Tactile sensor data were sampled at 45 Hz, while audio data were recorded as 10-second WAV files.

Audio features extracted include Mel-Frequency Cepstral Coefficients (MFCCs), Spectral Centroid, Spectral Bandwidth, Zero Crossing Rate, and Root Mean Square Energy (RMSE). Tactile features include Mean Pressure, Max Pressure, Pressure Variance, Pressure Gradient, Median Force, Interquartile Range Force (IQR Force), Contact Area, Rate of Pressure Change, Pressure Standard Deviation (Pressure Std), Number of Touches (Num_touches), Max Touch Duration, Min Touch Duration, and Mean Duration.
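
These feature names map onto standard signal-processing routines. A sketch of how per-sample features might be extracted, using librosa for the audio descriptors and NumPy statistics over the 5x5 pressure frames; the touch-detection threshold and the exact feature definitions are assumptions rather than values taken from the paper:

```python
import numpy as np
import librosa

def audio_features(wav_path, sr=44100):
    """Summary statistics of common spectral descriptors for one recording."""
    y, sr = librosa.load(wav_path, sr=sr)
    return {
        "mfcc_mean": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),
        "spectral_centroid": librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
        "spectral_bandwidth": librosa.feature.spectral_bandwidth(y=y, sr=sr).mean(),
        "zero_crossing_rate": librosa.feature.zero_crossing_rate(y).mean(),
        "rms_energy": librosa.feature.rms(y=y).mean(),
    }

def tactile_features(frames, threshold=0.05):
    """Statistics over a (T, 5, 5) array of pressure frames sampled at ~45 Hz."""
    frames = np.asarray(frames, dtype=float)
    per_frame_mean = frames.mean(axis=(1, 2))
    return {
        "mean_pressure": frames.mean(),
        "max_pressure": frames.max(),
        "pressure_variance": frames.var(),
        "pressure_std": frames.std(),
        "pressure_gradient": np.abs(np.diff(per_frame_mean)).mean(),
        "median_force": np.median(frames),
        "iqr_force": np.percentile(frames, 75) - np.percentile(frames, 25),
        # Number of taxels that ever exceed the (assumed) contact threshold.
        "contact_area": int((frames > threshold).any(axis=0).sum()),
    }
```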

After standardizing tactile and audio features, Principal Component Analysis (PCA) was performed for dimensionality reduction, and Intraclass Correlation Coefficients (ICC) were calculated to evaluate the consistency of emotion conveyance. Results indicated statistically significant consistency across participants for all emotions, although ICC values varied. "Attention" showed the highest ICC, while "Surprise" had the lowest.
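
A sketch of the mechanics of this step, assuming scikit-learn's StandardScaler and PCA followed by pingouin's intraclass_corr on the first principal component. The placeholder data and the per-emotion layout (rounds as targets, participants as raters) are assumptions; the paper's exact ICC design is not restated in this summary:

```python
import numpy as np
import pandas as pd
import pingouin as pg
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the real recordings: 28 participants x
# 3 rounds x 10 emotions, with an arbitrary number of combined features.
rng = np.random.default_rng(0)
emotion_list = ["anger", "fear", "disgust", "happiness", "surprise",
                "sadness", "confusion", "comfort", "calm", "attention"]
rows = [(p, r, e) for p in range(28) for r in range(3) for e in emotion_list]
participants, rounds, emotions = map(np.array, zip(*rows))
X = rng.normal(size=(len(rows), 18))  # stand-in for the extracted features

# Standardize, project onto the first principal component, then compute one
# ICC per emotion.
X_scaled = StandardScaler().fit_transform(X)
pc1 = PCA(n_components=1).fit_transform(X_scaled).ravel()

for emo in emotion_list:
    mask = emotions == emo
    df = pd.DataFrame({"round": rounds[mask],            # ICC "targets"
                       "participant": participants[mask],  # ICC "raters"
                       "score": pc1[mask]})
    icc = pg.intraclass_corr(data=df, targets="round", raters="participant",
                             ratings="score")
    print(emo)
    print(icc[["Type", "ICC", "pval"]])
```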

Permutational multivariate analysis of variance (PERMANOVA) revealed a significant effect of emotion on the scaled combined features (F = 14.06, p < 0.001). However, pairwise comparisons with Holm correction indicated no significant difference between calming and sadness (F = 3.78, p = 0.09), or between comfort and sadness (F = 4.89, p = 0.09).
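
A sketch of how such a PERMANOVA could be run with scikit-bio, continuing from the arrays in the previous sketch; the Euclidean distance metric and the permutation count are assumptions:

```python
from scipy.spatial.distance import pdist, squareform
from skbio.stats.distance import DistanceMatrix, permanova

# Euclidean distances between all samples in the scaled feature space.
dm = DistanceMatrix(squareform(pdist(X_scaled, metric="euclidean")))

# Test whether emotion labels explain the distance structure.
result = permanova(dm, grouping=list(emotions), permutations=999)
print(result)  # pseudo-F statistic and permutation p-value

# Pairwise follow-ups would repeat this on each emotion pair and adjust the
# p-values with a Holm correction, e.g. via
# statsmodels.stats.multitest.multipletests(pvals, method="holm").
```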

Various machine learning techniques, including Random Forest, Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Gradient Boosting Machine (GBM), Decision Tree, and Naive Bayes (NB), were used to decode the emotions. An SVM model with a linear kernel achieved the highest overall accuracy on the test data, with an accuracy of 40.00% and a mean accuracy of 39.02% (SD = 5.50%) during 10-fold cross-validation. The SVM model achieved balanced accuracies of 70.68% for anger, 87.65% for attention, 70.68% for happiness, and 72.53% for calming. However, it struggled with classes such as disgust and fear, showing balanced accuracies of 60.49% and 59.26%, respectively, and performed worst in sadness and surprise, with balanced accuracies of 54.01% and 54.63%, respectively. Analysis of feature importance indicated that Max Pressure, Mean Touch Duration, and RMSE were the top contributing features.
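
A sketch of the classification step, again continuing from the earlier arrays: a linear-kernel SVM from scikit-learn evaluated with 10-fold cross-validation and a held-out test split. The split proportion and random seed are assumptions, not values from the paper:

```python
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stratified hold-out split so all ten emotions appear in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, emotions, test_size=0.2, stratify=emotions, random_state=0)

clf = SVC(kernel="linear")                                 # linear-kernel SVM
cv_scores = cross_val_score(clf, X_train, y_train, cv=10)  # 10-fold CV
print(f"CV accuracy: {cv_scores.mean():.2%} (SD {cv_scores.std():.2%})")

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.2%}")
```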

The confusion matrix revealed that attention was the most recognizable emotion. Misclassifications included anger being mistaken for attention, comfort confused with calming, and happiness misidentified as attention. Sadness was commonly confused with both comfort and calming, while surprise was often misclassified as fear and attention.
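
Per-class balanced accuracies such as those quoted above can be read off a confusion matrix by treating each emotion one-vs-rest and averaging sensitivity and specificity. A sketch continuing from the SVM snippet; this convention is an assumption, not necessarily the paper's exact computation:

```python
from sklearn.metrics import confusion_matrix

labels = sorted(set(y_test))
cm = confusion_matrix(y_test, y_pred, labels=labels)

for i, emo in enumerate(labels):
    tp = cm[i, i]                   # correctly predicted samples of this class
    fn = cm[i, :].sum() - tp        # missed samples of this class
    fp = cm[:, i].sum() - tp        # other classes predicted as this class
    tn = cm.sum() - tp - fn - fp    # everything else
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    print(f"{emo}: balanced accuracy = {(sens + spec) / 2:.2%}")
```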

The authors conclude that while emotions are expressed with significant consistency, the degree of consistency varies. Emotions with similar levels of arousal or valence are easily misclassified. Subjective feedback indicated that surprise and confusion were the most challenging emotions to express, aligning with their lower consistency scores. The authors note the paper's limitations, including the limited sample size and the tactile sensor's restriction to the forearm. They suggest that future research extend tactile sensor coverage to multiple body parts and feed the decoded emotion back to the robot, which can help the robot build an affective response during the interaction.
