
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images (2310.18652v2)

Published 28 Oct 2023 in cs.CL, cs.AI, and cs.CV

Abstract: Electronic Health Records (EHRs) contain patients' medical histories in various multi-modal formats, yet the potential for joint reasoning across imaging and table modalities remains underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop our dataset, we first construct two uni-modal resources: 1) the MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset. By integrating these two uni-modal resources, we construct a multi-modal EHR QA dataset that requires both uni-modal and cross-modal reasoning. To address the unique challenges of multi-modal questions within EHRs, we propose a NeuralSQL-based strategy equipped with an external VQA API. We believe this work broadens engagement with multi-modal EHR sources and that our dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research. EHRXQA is available at https://github.com/baeseongsu/ehrxqa.
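The abstract describes a NeuralSQL-based strategy in which a SQL-style query can hand image questions to an external VQA API. The following is a minimal sketch of that general idea, assuming it can be approximated by registering a Python function as a SQL user-defined function with the standard `sqlite3` module. The function name `FUNC_VQA`, the `tb_cxr` table and its columns, and the lookup-table "model" are hypothetical placeholders, not the EHRXQA schema or the authors' implementation; a real system would run an actual VQA model on each chest X-ray.

```python
import sqlite3

# Hypothetical stand-in for an external VQA model. A real system would run an
# image model on the chest X-ray for each study; here answers come from a fixed table.
_FAKE_VQA_ANSWERS = {
    (101, "is there pleural effusion?"): True,
    (102, "is there pleural effusion?"): False,
    (103, "is there pleural effusion?"): True,
    (104, "is there pleural effusion?"): True,
}


def func_vqa(question: str, study_id: int) -> int:
    """Hypothetical VQA API: return 1 (yes) or 0 (no) for a yes/no question about one study."""
    return int(_FAKE_VQA_ANSWERS.get((study_id, question.lower()), False))


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Toy table loosely modeled on a chest X-ray study index (hypothetical schema).
    conn.execute(
        "CREATE TABLE tb_cxr (subject_id INTEGER, study_id INTEGER, studydatetime TEXT)"
    )
    conn.executemany(
        "INSERT INTO tb_cxr VALUES (?, ?, ?)",
        [
            (1, 101, "2105-01-02"),
            (1, 102, "2105-03-04"),
            (2, 103, "2106-05-06"),
            (2, 104, "2106-07-08"),
        ],
    )
    # Register the Python function so SQL can call it like a built-in (2 arguments).
    conn.create_function("FUNC_VQA", 2, func_vqa)
    return conn


def count_effusion_studies(conn: sqlite3.Connection, subject_id: int) -> int:
    # Table logic (filter by patient) stays in SQL; the image question is
    # delegated row by row to the hypothetical VQA function.
    query = """
        SELECT COUNT(*) FROM tb_cxr
        WHERE subject_id = ?
          AND FUNC_VQA('is there pleural effusion?', study_id) = 1
    """
    return conn.execute(query, (subject_id,)).fetchone()[0]


conn = build_demo_db()
print(count_effusion_studies(conn, 1))  # 1
print(count_effusion_studies(conn, 2))  # 2
```

This design keeps the structured part of a question in ordinary SQL while the image part is answered per row by the model. Whether the actual NeuralSQL executor evaluates VQA calls row by row like this is an assumption of the sketch.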

Authors (11)
  1. Seongsu Bae
  2. Daeun Kyung
  3. Jaehee Ryu
  4. Eunbyeol Cho
  5. Gyubok Lee
  6. Sunjun Kweon
  7. Jungwoo Oh
  8. Lei Ji
  9. Eric I-Chao Chang
  10. Tackeun Kim
  11. Edward Choi