
M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment (2403.09451v1)

Published 14 Mar 2024 in cs.CV, cs.MM, cs.SD, and eess.AS

Abstract: This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking. Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to AVCAffe's single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
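
To make the architecture in the abstract more concrete, the following is a minimal, stdlib-only sketch of the general idea: two modality streams, cross-modality multihead attention in both directions, and three task-specific output branches. It is not the authors' implementation. The feature dimensions, the mean pooling, the sigmoid outputs, the omission of learned projections, and the names `mm_sketch`, `label_a`, `label_b`, and `label_c` are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values, num_heads):
    """Scaled dot-product attention where one modality queries the other.

    Heads are contiguous slices of the feature dimension. Learned
    query/key/value projections are omitted (assumption, for brevity).
    """
    d = len(queries[0])
    head_dim = d // num_heads
    out = []
    for q in queries:
        fused = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(head_dim)
                for k in keys_values
            ]
            weights = softmax(scores)
            for j in range(lo, hi):
                fused.append(sum(w * k[j] for w, k in zip(weights, keys_values)))
        out.append(fused)
    return out


def mean_pool(frames):
    """Average a sequence of feature vectors over time."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mm_sketch(audio, video, branch_weights, num_heads=2):
    """Fuse audio and video features, then score each task branch."""
    audio_to_video = cross_attention(audio, video, num_heads)
    video_to_audio = cross_attention(video, audio, num_heads)
    # Concatenate the pooled outputs of both attention directions.
    fused = mean_pool(audio_to_video) + mean_pool(video_to_audio)
    # One linear branch per label (hypothetical label names).
    return {
        name: sigmoid(sum(w * x for w, x in zip(weights, fused)))
        for name, weights in branch_weights.items()
    }


rng = random.Random(0)
d = 4  # assumed feature size per frame
audio = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]  # 5 audio frames
video = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]  # 3 video frames
branch_weights = {
    name: [rng.gauss(0, 1) for _ in range(2 * d)]
    for name in ("label_a", "label_b", "label_c")
}
scores = mm_sketch(audio, video, branch_weights)
```

In this sketch each branch shares the same fused representation and differs only in its output weights, which is one common way to set up multitask heads; the paper's branches may be deeper or structured differently.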

