Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium (2403.01628v2)
Abstract: The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four virtual roundtables at ML4H 2023. The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables. Each roundtable session included invited senior chairs (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with interest in the session's topic. Herein we detail the organization process and compile takeaways from these roundtable discussions, including recent advances, applications, and open challenges for each topic. We conclude with a summary and lessons learned across all roundtables. This document serves as a comprehensive review paper, summarizing the recent advancements in machine learning for healthcare as contributed by foremost researchers in the field.