TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
Abstract: Engagement analysis has applications in healthcare, education, advertising, and services. The deep neural networks commonly used for this analysis have complex architectures and require large amounts of input data, computational power, and inference time. These constraints make it challenging to embed such systems into devices for real-time use. To address these limitations, we present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain and boost processing speed, we introduce a "TC" stream that uses the Continuous Wavelet Transform (CWT) to represent information in 2D tensor form. Evaluated on the EngageNet dataset, the proposed method outperforms existing baselines while using only two behavioral features (head pose rotations), compared with the 98 features used in the baseline models. Furthermore, comparative analysis shows that TCCT-Net's architecture offers an order-of-magnitude improvement in inference speed over state-of-the-art image-based Recurrent Neural Network (RNN) methods. The code will be released at https://github.com/vedernikovphoto/TCCT_Net.
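The TC stream's input representation can be sketched as follows: each 1D behavioral signal (here, the two head-pose rotation channels) is transformed with a CWT into a scales-by-time map, and the channels are stacked into a 2D-per-channel tensor. This is a minimal illustrative sketch, not the paper's implementation; the Morlet mother wavelet, the 30 fps sampling rate, the synthetic pitch/yaw signals, and the `morlet_cwt` helper are all assumptions for demonstration.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Naive CWT via direct convolution with a Morlet wavelet (an assumed
    choice). Returns the magnitude of the coefficients as a 2D array of
    shape (len(scales), len(signal)) -- the "tensor" fed to the TC stream."""
    n = len(signal)
    out = np.empty((len(scales), n))
    t = np.arange(-n // 2, n // 2)
    for i, s in enumerate(scales):
        x = t / s
        # Complex Morlet wavelet, L2-normalized by 1/sqrt(scale)
        wavelet = np.exp(1j * w0 * x) * np.exp(-0.5 * x**2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(signal, np.conj(wavelet)[::-1], mode="same"))
    return out

# Two synthetic head-pose rotation channels over 10 s of assumed 30 fps video
fs = 30
tt = np.arange(0, 10, 1 / fs)
pitch = np.sin(2 * np.pi * 0.5 * tt)      # slow nodding component
yaw = 0.3 * np.sin(2 * np.pi * 2.0 * tt)  # faster side-to-side component
scales = np.arange(1, 65)

# Stack per-channel CWT maps: shape (channels, scales, time) = (2, 64, 300)
tensor = np.stack([morlet_cwt(ch, scales) for ch in (pitch, yaw)])
print(tensor.shape)
```

A production pipeline would instead use a vetted CWT routine (e.g. from a wavelet library) and the dataset's real head-pose streams, but the resulting tensor layout is the same: one 2D time-frequency map per behavioral channel.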