Vision-language models for decoding provider attention during neonatal resuscitation (2404.01207v1)

Published 1 Apr 2024 in cs.CV

Abstract: Neonatal resuscitations demand an exceptional level of attentiveness from providers, who must process multiple streams of information simultaneously. Gaze strongly influences decision making; thus, understanding where a provider is looking during neonatal resuscitations could inform provider training, enhance real-time decision support, and improve the design of delivery rooms and neonatal intensive care units (NICUs). Current approaches to quantifying neonatal providers' gaze rely on manual coding or simulations, which limit scalability and utility. Here, we introduce an automated, real-time, deep learning approach capable of decoding provider gaze into semantic classes directly from first-person point-of-view videos recorded during live resuscitations. Combining state-of-the-art, real-time segmentation with vision-language models (CLIP), our low-shot pipeline attains 91% classification accuracy in identifying gaze targets without training. Upon fine-tuning, the performance of our gaze-guided vision transformer exceeds 98% accuracy in gaze classification, approaching human-level precision. This system, capable of real-time inference, enables objective quantification of provider attention dynamics during live neonatal resuscitation. Our approach offers a scalable solution that seamlessly integrates with existing infrastructure for data-scarce gaze analysis, thereby offering new opportunities for understanding and refining clinical decision making.
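The training-free stage described above can be sketched in a few lines: take a first-person video frame, extract the region the provider is looking at, and score it against text prompts with CLIP zero-shot. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: a fixed gaze-centered crop stands in for the real-time segmentation the authors use to propose the gaze-target region, and the class names in `GAZE_CLASSES` are hypothetical, not the paper's actual label set.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative gaze-target classes; the paper's semantic label set may differ.
GAZE_CLASSES = ["infant", "respiratory function monitor", "medical equipment", "another provider"]
PROMPTS = [f"a photo of a {c}" for c in GAZE_CLASSES]

# Off-the-shelf CLIP checkpoint; no task-specific training is performed.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_gaze_target(frame: Image.Image, gaze_xy: tuple[int, int], crop_size: int = 224):
    """Crop a window centered on the gaze point and classify it zero-shot
    by comparing the crop's CLIP image embedding against text prompts."""
    x, y = gaze_xy
    half = crop_size // 2
    crop = frame.crop((x - half, y - half, x + half, y + half))
    inputs = processor(text=PROMPTS, images=crop, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_classes)
    probs = logits.softmax(dim=-1).squeeze(0)
    return GAZE_CLASSES[int(probs.argmax())], probs
```

In the full pipeline, a real-time segmenter would replace the fixed crop with the actual object under the gaze point, and the fine-tuned gaze-guided vision transformer reported in the abstract supersedes this zero-shot stage when labeled data are available.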
