In the Blink of an Eye: Event-based Emotion Recognition (2310.04043v1)
Abstract: We introduce a wearable single-eye emotion recognition device and a real-time approach to recognizing emotions from partial observations of an emotion, robust to changes in lighting conditions. At the heart of our method is a bio-inspired event-based camera setup and a newly designed lightweight Spiking Eye Emotion Network (SEEN). Compared to conventional cameras, event-based cameras offer a higher dynamic range (up to 140 dB vs. 80 dB) and a higher temporal resolution. Thus, the captured events can encode rich temporal cues under challenging lighting conditions. However, these events lack texture information, making it difficult to decode temporal information effectively. SEEN tackles this issue from two different perspectives. First, we adopt convolutional spiking layers to take advantage of the spiking neural network's ability to decode pertinent temporal information. Second, SEEN learns to extract essential spatial cues from corresponding intensity frames and leverages a novel weight-copy scheme to convey spatial attention to the convolutional spiking layers during training and inference. We extensively validate and demonstrate the effectiveness of our approach on a specially collected Single-eye Event-based Emotion (SEE) dataset. To the best of our knowledge, our method is the first eye-based emotion recognition method that leverages event-based cameras and spiking neural networks.
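The convolutional spiking layers mentioned above are built from spiking neurons, which fire discrete spikes rather than emitting continuous activations and so naturally encode the timing of sparse event-camera input. As a minimal sketch of this behavior (a generic textbook leaky integrate-and-fire model, not the paper's SEEN implementation; the `decay` and `threshold` constants are assumed values chosen for illustration):

```python
def lif_neuron(inputs, decay=0.5, threshold=1.0):
    """Run a single leaky integrate-and-fire (LIF) neuron over a
    sequence of input currents and return its binary spike train.

    The neuron integrates each input into its membrane potential,
    leaks a fraction of the potential at every step, and fires a
    spike (1) whenever the potential crosses the threshold, after
    which the potential is reset.
    """
    v = 0.0          # membrane potential
    spikes = []
    for x in inputs:
        v = decay * v + x      # leaky integration of input current
        if v >= threshold:
            spikes.append(1)   # fire a spike
            v = 0.0            # hard reset after firing
        else:
            spikes.append(0)
    return spikes

# A strong burst of events drives the potential over threshold and
# produces a spike; weak or sparse input leaks away without firing.
print(lif_neuron([0.6, 0.6, 0.0, 0.0, 1.2, 0.1]))
# → [0, 0, 0, 0, 1, 0]
```

Because the output depends on when inputs arrive, not just their sum, stacks of such neurons can decode the temporal structure of an event stream even when texture information is absent, which is the property SEEN exploits.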