
EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records (2301.07695v5)

Published 16 Jan 2023 in cs.CL and cs.AI

Abstract: We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset, which were also collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable. We believe our dataset, EHRSQL, can serve as a practical benchmark for developing and assessing QA models on structured EHR data and take a step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL.
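The three challenges listed in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical toy: the `admissions` table, its columns, the two questions, and the `answer` helper are invented for illustration and are not taken from EHRSQL, MIMIC-III, or eICU. The sketch pairs one question containing a time expression with a SQL label, and marks one out-of-scope question as unanswerable by giving it no SQL.

```python
import sqlite3

# Hypothetical toy schema and rows; NOT the MIMIC-III or eICU schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (patient_id INTEGER, admit_time TEXT);
INSERT INTO admissions VALUES
  (1, '2023-01-05 08:00:00'),
  (2, '2023-01-20 14:30:00'),
  (3, '2022-12-28 09:15:00');
""")

# Each example pairs a question with a SQL label, or None when the
# question cannot be answered from the database (assumed convention).
examples = [
    {
        # Time expression "this month" is resolved against a reference time.
        "question": "How many patients were admitted this month?",
        "sql": "SELECT COUNT(DISTINCT patient_id) FROM admissions "
               "WHERE strftime('%Y-%m', admit_time) = strftime('%Y-%m', :now)",
    },
    {
        # Out of scope for the database, so there is no SQL label.
        "question": "What is the weather forecast for tomorrow?",
        "sql": None,
    },
]

def answer(example, now):
    """Execute the SQL label, or abstain (return None) if unanswerable."""
    if example["sql"] is None:
        return None
    return conn.execute(example["sql"], {"now": now}).fetchone()[0]

# The reference "current time" is passed explicitly so results are reproducible.
results = [answer(ex, "2023-01-31 00:00:00") for ex in examples]
print(results)
```

Passing the reference time as a query parameter, rather than calling the database clock, is one way to keep time-sensitive questions reproducible; whether the dataset itself does this is not stated in the abstract.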

Authors (9)
  1. Gyubok Lee
  2. Hyeonji Hwang
  3. Seongsu Bae
  4. Yeonsu Kwon
  5. Woncheol Shin
  6. Seongjun Yang
  7. Minjoon Seo
  8. Jong-Yeup Kim
  9. Edward Choi
Citations (15)
