Real-Time Multimodal Cognitive Assistant for Emergency Medical Services (2403.06734v1)
Abstract: Emergency Medical Services (EMS) responders often operate under time-sensitive, high-risk conditions that induce cognitive overload and demand critical thinking and rapid decision-making. This paper presents CognitiveEMS, an end-to-end wearable cognitive assistant system that acts as a collaborative virtual partner, acquiring and analyzing multimodal data from the emergency scene in real time and interacting with EMS responders through Augmented Reality (AR) smart glasses. CognitiveEMS processes continuous data streams in real time and leverages edge computing to assist with EMS protocol selection and intervention recognition. We address key technical challenges in real-time cognitive assistance by introducing three novel components: (i) a Speech Recognition model fine-tuned for real-world medical emergency conversations on simulated EMS audio recordings augmented with synthetic data generated by large language models (LLMs); (ii) an EMS Protocol Prediction model that combines state-of-the-art (SOTA) tiny LLMs with EMS domain knowledge using graph-based attention mechanisms; (iii) an EMS Action Recognition module that leverages multimodal audio and video data, together with the protocol predictions, to infer the intervention/treatment actions taken by responders at the incident scene. Our results show that our speech recognition component outperforms SOTA on conversational data (WER of 0.290 vs. 0.618), our protocol prediction component significantly outperforms SOTA (top-3 accuracy of 0.800 vs. 0.200), and our action recognition achieves an accuracy of 0.727, while maintaining an end-to-end latency for protocol prediction of 3.78 s on the edge and 0.31 s on the server.
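The abstract describes a three-stage pipeline: speech recognition feeds protocol prediction, and both feed action recognition. The sketch below is a minimal illustration of how such a staged pipeline could be wired together, with end-to-end latency measured per pass; it is not the paper's implementation, and every class and function name here (SpeechRecognizer, ProtocolPredictor, ActionRecognizer, process_scene) is a hypothetical placeholder standing in for the actual models.

```python
# Minimal sketch of a CognitiveEMS-style three-stage pipeline.
# All names are hypothetical stand-ins, not the paper's code: the real
# system uses a fine-tuned conversational ASR model, a tiny-LLM +
# knowledge-graph protocol predictor, and a multimodal action recognizer.
import time
from dataclasses import dataclass


@dataclass
class PipelineResult:
    transcript: str
    top3_protocols: list[str]
    action: str
    latency_s: float


class SpeechRecognizer:
    """Stand-in for the fine-tuned emergency-conversation ASR model."""

    def transcribe(self, audio_chunk: bytes) -> str:
        return "patient unresponsive, no pulse, starting compressions"


class ProtocolPredictor:
    """Stand-in for the tiny-LLM + domain-knowledge protocol model."""

    def predict_top3(self, transcript: str) -> list[str]:
        return ["Cardiac Arrest", "Chest Pain", "Respiratory Distress"]


class ActionRecognizer:
    """Stand-in for the multimodal (audio + video + protocol) recognizer."""

    def recognize(self, video_frame, transcript: str,
                  protocols: list[str]) -> str:
        return "chest compressions"


def process_scene(audio_chunk: bytes, video_frame) -> PipelineResult:
    """Run one pass of the pipeline and time it end to end."""
    start = time.perf_counter()
    transcript = SpeechRecognizer().transcribe(audio_chunk)      # stage (i)
    protocols = ProtocolPredictor().predict_top3(transcript)     # stage (ii)
    action = ActionRecognizer().recognize(                       # stage (iii)
        video_frame, transcript, protocols)
    return PipelineResult(transcript, protocols, action,
                          time.perf_counter() - start)


if __name__ == "__main__":
    print(process_scene(b"\x00" * 16000, None))
```

In the actual system these stages run continuously over streaming audio and video, with inference placed either on an edge device or a server; the reported 3.78 s (edge) vs. 0.31 s (server) protocol-prediction latencies reflect that deployment choice, which this single-pass sketch does not model.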