Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks (2403.17175v2)
Abstract: Engagement in virtual learning is crucial for a variety of factors including student satisfaction, performance, and compliance with learning programs, but measuring it is a challenging task. There is therefore considerable interest in using artificial intelligence and affective computing to measure engagement in natural settings and at scale. This paper introduces a novel, privacy-preserving method for measuring engagement from videos. It uses facial landmarks, which carry no personally identifiable information, extracted from videos via the MediaPipe deep learning solution. The extracted facial landmarks are fed to Spatial-Temporal Graph Convolutional Networks (ST-GCNs), which output the engagement level of the student in the video. To incorporate the ordinal nature of the engagement variable into the training process, the ST-GCNs are trained in a novel ordinal learning framework based on transfer learning. Experimental results on two video-based student engagement datasets show the superiority of the proposed method over previous methods, improving the state of the art by 3.1% in four-class engagement level classification accuracy on the EngageNet dataset and by 1.5% in binary engagement classification accuracy on the Online Student Engagement dataset. Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to the trained ST-GCNs to interpret the engagement measurements in both the spatial and temporal domains. The relatively lightweight, fast ST-GCN and its integration with the real-time MediaPipe pipeline make the proposed approach deployable on virtual learning platforms for real-time engagement measurement.
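The abstract's "ordinal learning framework" exploits the fact that engagement levels are ordered categories rather than independent classes. A standard way to do this, described by Frank and Hall (cited below), is to decompose a K-class ordinal problem into K−1 cumulative binary problems. The sketch below illustrates that decomposition only; it is not the paper's exact training procedure, and the function names are illustrative:

```python
def ordinal_encode(label, num_classes=4):
    """Encode an ordinal label in 0..K-1 as K-1 cumulative binary targets.

    Target k answers the question: "is the true level greater than k?"
    E.g. with K=4, label 2 -> [1, 1, 0].
    """
    return [1 if label > k else 0 for k in range(num_classes - 1)]


def ordinal_decode(binary_probs):
    """Recover a class prediction from K-1 binary probabilities p_k = P(y > k).

    P(y = 0)   = 1 - p_0
    P(y = k)   = p_{k-1} - p_k   for 0 < k < K-1
    P(y = K-1) = p_{K-2}
    Returns the argmax class.
    """
    n = len(binary_probs)  # K - 1 binary classifiers
    probs = []
    for k in range(n + 1):
        lo = binary_probs[k - 1] if k > 0 else 1.0
        hi = binary_probs[k] if k < n else 0.0
        probs.append(lo - hi)
    return max(range(len(probs)), key=probs.__getitem__)
```

In this scheme each binary model (here, each ST-GCN head) only has to decide whether engagement exceeds one threshold, so mistakes between distant levels are penalized more consistently than with plain one-hot classification.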
- Brandon M. Booth, Nigel Bosch and Sidney K. D’Mello “Engagement Detection and Its Applications in Learning: A Tutorial and Selective Review” In Proceedings of the IEEE 111.10, 2023, pp. 1398–1422 DOI: 10.1109/JPROC.2023.3309560
- Suzanne Hidi and K Ann Renninger “The four-phase model of interest development” In Educational psychologist 41.2 Taylor & Francis, 2006, pp. 111–127
- “Graph-based facial affect analysis: A review” In IEEE Transactions on Affective Computing IEEE, 2022
- “Rendering of eyes for eye-shape registration and gaze estimation” In Proceedings of the IEEE international conference on computer vision, 2015, pp. 3756–3764
- “A review on mental stress detection using wearable sensors and machine learning techniques” In IEEE Access 9 IEEE, 2021, pp. 84045–84066
- Jennifer A Fredricks, Phyllis C Blumenfeld and Alison H Paris “School engagement: Potential of the concept, state of the evidence” In Review of educational research 74.1 Sage Publications Sage CA: Thousand Oaks, CA, 2004, pp. 59–109
- “Dynamics of affective states during complex learning” In Learning and Instruction 22.2 Elsevier, 2012, pp. 145–157
- Jaclyn Ocumpaugh “Baker Rodrigo Ocumpaugh monitoring protocol (BROMP) 2.0 technical and training manual” In New York, NY and Manila, Philippines: Teachers College, Columbia University and Ateneo Laboratory for the Learning Sciences 60, 2015
- Shofiyati Nur Karimah and Shinobu Hasegawa “Automatic engagement estimation in smart education/learning settings: a systematic review of engagement definitions, datasets, and methods” In Smart Learning Environments 9.1 SpringerOpen, 2022, pp. 1–48
- M Dewan, Mahbub Murshed and Fuhua Lin “Engagement detection in online learning: a review” In Smart Learning Environments 6.1 Springer, 2019, pp. 1–20
- Shehroz S Khan, Ali Abedi and Tracey Colella “Inconsistencies in Measuring Student Engagement in Virtual Learning–A Critical Review” In arXiv preprint arXiv:2208.04548, 2022
- Ali Abedi and Shehroz S Khan “Affect-driven ordinal engagement measurement from video” In Multimedia Tools and Applications Springer, 2023, pp. 1–20
- “Mediapipe: A framework for building perception pipelines” In arXiv preprint arXiv:1906.08172, 2019
- “Geometric graph representation with learnable graph structure and adaptive au constraint for micro-expression recognition” In IEEE Transactions on Affective Computing IEEE, 2023
- “A Skeleton-Based Rehabilitation Exercise Assessment System With Rotation Invariance” In IEEE Transactions on Neural Systems and Rehabilitation Engineering 31, 2023, pp. 2612–2621 DOI: 10.1109/TNSRE.2023.3282675
- Pratik K Mishra, Alex Mihailidis and Shehroz S Khan “Skeletal Video Anomaly Detection using Deep Learning: Survey, Challenges and Future Directions” In arXiv preprint arXiv:2301.00114, 2022
- “FaceEngage: robust estimation of gameplay engagement from user-contributed (YouTube) videos” In IEEE Transactions on Affective Computing IEEE, 2019
- “Advanced multi-instance learning method with multi-features engineering and conservative optimization for engagement intensity prediction” In Proceedings of the 2020 International Conference on Multimodal Interaction, 2020, pp. 777–783
- “Automatic student engagement in online learning environment based on neural turing machine” In International Journal of Information and Education Technology 11.3, 2021, pp. 107–111
- “Engagement Detection with Multi-Task Training in E-Learning Environments” In International Conference on Image Analysis and Processing, 2022, pp. 411–422 Springer
- “Detecting disengagement in virtual learning as an anomaly using temporal convolutional network autoencoder” In Signal, Image and Video Processing Springer, 2023, pp. 1–9
- Chinchu Thomas, Nitin Nair and Dinesh Babu Jayagopi “Predicting engagement intensity in the wild using temporal convolutional network” In Proceedings of the 20th ACM International Conference on Multimodal Interaction, 2018, pp. 604–610
- “Automatic prediction of presentation style and student engagement from videos” In Computers and Education: Artificial Intelligence Elsevier, 2022, pp. 100079
- “Daisee: Towards user engagement recognition in the wild” In arXiv preprint arXiv:1609.01885, 2016
- “A novel end-to-end network for automatic student engagement recognition” In 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication (ICIEEC), 2019, pp. 342–345 IEEE
- Ali Abedi and Shehroz S Khan “Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network” In 2021 18th Conference on Robots and Vision (CRV), 2021, pp. 151–157 IEEE
- “Three-dimensional DenseNet self-attention neural network for automatic detection of student’s engagement” In Applied Intelligence Springer, 2022, pp. 1–21
- Hua Leong Fwa “Fine-grained detection of academic emotions with spatial temporal graph attention networks using facial landmarks”, 2022
- Sijie Yan, Yuanjun Xiong and Dahua Lin “Spatial temporal graph convolutional networks for skeleton-based action recognition” In Proceedings of the AAAI conference on artificial intelligence 32.1, 2018
- Xusheng Ai, Victor S Sheng and Chunhua Li “Class-attention Video Transformer for Engagement Intensity Prediction” In arXiv preprint arXiv:2208.07216, 2022
- Jiacheng Liao, Yan Liang and Jiahui Pan “Deep facial spatiotemporal network for engagement prediction in online learning” In Applied Intelligence 51.10 Springer, 2021, pp. 6609–6621
- Tasneem Selim, Islam Elkabani and Mohamed A Abdou “Students Engagement Level Detection in Online e-Learning Using Hybrid EfficientNetB7 Together With TCN, LSTM, and Bi-LSTM” In IEEE Access 10 IEEE, 2022, pp. 99573–99583
- “Marlin: Masked autoencoder for facial video representation learning” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1493–1504
- “Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines” In Proceedings of the 25th International Conference on Multimodal Interaction, ICMI ’23 Paris, France: Association for Computing Machinery, 2023, pp. 174–182 DOI: 10.1145/3577190.3614164
- “Estimation of continuous valence and arousal levels from faces in naturalistic conditions” In Nature Machine Intelligence 3.1 Nature Publishing Group, 2021, pp. 42–50
- “Openface 2.0: Facial behavior analysis toolkit” In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), 2018, pp. 59–66 IEEE
- “Bag of States: A Non-sequential Approach to Video-based Engagement Measurement” In arXiv preprint arXiv:2301.06730, 2023
- “Predicting student engagement using sequential ensemble model” In IEEE Transactions on Learning Technologies IEEE, 2023
- “The faces of engagement: Automatic recognition of student engagement from facial expressions” In IEEE Transactions on Affective Computing 5.1 IEEE, 2014, pp. 86–98
- “Automatic engagement prediction with GAP feature” In Proceedings of the 20th ACM International Conference on Multimodal Interaction, 2018, pp. 599–603
- “Fine-grained engagement recognition in online learning environment” In 2019 IEEE 9th international conference on electronics information and emergency communication (ICEIEC), 2019, pp. 338–341 IEEE
- “Multimodal approach to engagement and disengagement detection with highly imbalanced in-the-wild data” In Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data, 2018, pp. 1–9
- “Toward active and unobtrusive engagement assessment of distance learners” In 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), 2017, pp. 470–476 IEEE
- “Prediction and localization of student engagement in the wild” In 2018 Digital Image Computing: Techniques and Applications (DICTA), 2018, pp. 1–8 IEEE
- “Convolutional experts constrained local model for 3d facial landmark detection” In Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 2519–2528
- “Joint face detection and alignment using multitask cascaded convolutional networks” In IEEE signal processing letters 23.10 IEEE, 2016, pp. 1499–1503
- “Facial Expression Recognition Using Spatial-Temporal Semantic Graph Network” In 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 1961–1965 DOI: 10.1109/ICIP40778.2020.9191181
- Davis King “Dlib C++ Library”, http://dlib.net/, 2024
- “Attention mesh: High-fidelity face mesh prediction in real-time” In arXiv preprint arXiv:2006.10962, 2020
- “Reliability and validity analysis of MediaPipe-based measurement system for some human rehabilitation motions” In Measurement 214 Elsevier, 2023, pp. 112826
- “Head pose estimation using facial-landmarks classification for children rehabilitation games” In Pattern Recognition Letters 152 Elsevier, 2021, pp. 406–412
- Geethu Miriam Jacob and Bjorn Stenger “Facial action unit detection with transformers” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7680–7689
- “Facial expression recognition via deep action units graph network based on psychological mechanism” In IEEE Transactions on Cognitive and Developmental Systems 12.2 IEEE, 2019, pp. 311–322
- Thomas N Kipf and Max Welling “Semi-supervised classification with graph convolutional networks” In arXiv preprint arXiv:1609.02907, 2016
- Georgios N Yannakakis, Roddy Cowie and Carlos Busso “The ordinal nature of emotions: An emerging approach” In IEEE Transactions on Affective Computing 12.1 IEEE, 2018, pp. 16–35
- “A simple approach to ordinal classification” In Machine Learning: ECML 2001: 12th European Conference on Machine Learning Freiburg, Germany, September 5–7, 2001 Proceedings 12, 2001, pp. 145–156 Springer
- “Human expert labeling process (HELP): towards a reliable higher-order user state labeling process and tool to assess student engagement” In Educational Technology JSTOR, 2017, pp. 53–59
- “Gradient-Weighted Class Activation Mapping for Spatio Temporal Graph Convolutional Network” In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 4043–4047 DOI: 10.1109/ICASSP43922.2022.9746621
- Lilang Lin, Jiahang Zhang and Jiaying Liu “Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2363–2372
- Ali Abedi, Mobin Malmirian and Shehroz S Khan “Cross-Modal Video to Body-joints Augmentation for Rehabilitation Exercise Quality Assessment” In arXiv preprint arXiv:2306.09546, 2023