Emotion Recognition from the perspective of Activity Recognition (2403.16263v1)

Published 24 Mar 2024 in cs.CV

Abstract: Applications of an efficient emotion recognition system can be found in several domains, such as medicine, driver fatigue surveillance, social robotics, and human-computer interaction. Appraising human emotional states, behaviors, and reactions displayed in real-world settings can be accomplished using latent continuous dimensions. Continuous dimensional models of human affect, such as those based on valence and arousal, are more accurate in describing a broad range of spontaneous everyday emotions than traditional models of discrete stereotypical emotion categories (e.g., happiness, surprise). Most prior work on estimating valence and arousal considers laboratory settings and acted data. But for emotion recognition systems to be deployed and integrated into real-world mobile and computing devices, we need to consider data collected in the wild. Action recognition is a domain of computer vision that involves capturing complementary information on appearance from still frames and motion between frames. In this paper, we treat emotion recognition from the perspective of action recognition by exploring the application of deep learning architectures specifically designed for action recognition to continuous affect recognition. We propose a novel three-stream end-to-end deep learning regression pipeline with an attention mechanism, an ensemble design based on sub-modules of multiple state-of-the-art action recognition systems. The pipeline incorporates a novel data pre-processing approach with a spatial self-attention mechanism to extract keyframes. The optical flow of high-attention regions of the face is extracted to capture temporal context. The AFEW-VA in-the-wild dataset is used to conduct comparative experiments. Quantitative analysis shows that the proposed model outperforms multiple standard baselines from both emotion recognition and action recognition.
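To make the kind of pipeline the abstract describes concrete, below is a minimal PyTorch sketch of a three-stream valence/arousal regressor: one stream over RGB keyframes, one over optical-flow maps of face regions, and one over full-face context frames, each attention-pooled over time and fused before a two-unit regression head. Every detail here (the toy backbones, feature dimensions, and the shared attention module) is an illustrative assumption, not the authors' actual architecture.

# Hypothetical sketch of a three-stream valence/arousal regressor.
# Stream choices, backbones, and dimensions are assumptions for
# illustration; this is not the architecture from the paper.
import torch
import torch.nn as nn

class TemporalAttentionPool(nn.Module):
    """Scores each frame so high-attention frames dominate the clip feature."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats):                              # feats: (B, T, D)
        weights = torch.softmax(self.score(feats), dim=1)  # (B, T, 1)
        return (weights * feats).sum(dim=1)                # (B, D)

def small_cnn(in_ch: int, out_dim: int) -> nn.Sequential:
    """Tiny stand-in backbone; a real system would use pretrained sub-modules."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, out_dim),
    )

class ThreeStreamRegressor(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.appearance = small_cnn(3, feat_dim)   # RGB keyframes
        self.motion = small_cnn(2, feat_dim)       # optical flow (dx, dy)
        self.context = small_cnn(3, feat_dim)      # full-face frames
        self.pool = TemporalAttentionPool(feat_dim)  # shared across streams
        self.head = nn.Linear(3 * feat_dim, 2)     # -> (valence, arousal)

    def encode(self, clip, backbone):              # clip: (B, T, C, H, W)
        b, t = clip.shape[:2]
        feats = backbone(clip.flatten(0, 1)).view(b, t, -1)
        return self.pool(feats)                    # attention-pooled (B, D)

    def forward(self, rgb, flow, face):
        fused = torch.cat([self.encode(rgb, self.appearance),
                           self.encode(flow, self.motion),
                           self.encode(face, self.context)], dim=-1)
        return self.head(fused)                    # (B, 2) regression output

model = ThreeStreamRegressor()
rgb = torch.randn(4, 8, 3, 112, 112)    # batch of four 8-frame clips
flow = torch.randn(4, 8, 2, 112, 112)   # 2-channel dense flow fields
print(model(rgb, flow, rgb).shape)      # torch.Size([4, 2])

In practice, the flow stream's input could come from any dense optical flow estimator (for example, OpenCV's cv2.calcOpticalFlowFarneback applied to consecutive crops of the high-attention face regions), and the toy CNN backbones above would be replaced by sub-modules of pretrained action recognition networks, as the abstract suggests.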

Authors (2)
  1. Savinay Nagendra (8 papers)
  2. Prapti Panigrahi (1 paper)
Citations (1)
