RJUA-QA: A Comprehensive QA Dataset for Urology (2312.09785v3)
Abstract: We introduce RJUA-QA, a novel medical dataset for question answering (QA) and reasoning with clinical evidence, helping to bridge the gap between general LLMs and medical-specific LLM applications. RJUA-QA is derived from realistic clinical scenarios and aims to facilitate LLMs in generating reliable diagnoses and advice. The dataset contains 2,132 curated Question-Context-Answer pairs, corresponding to about 25,000 diagnostic records and clinical cases. It covers 67 common urological disease categories, accounting for over 97.6\% of the population seeking medical services in urology. Each data instance in RJUA-QA comprises: (1) a question mirroring a real patient's inquiry about clinical symptoms and medical conditions, (2) a context of comprehensive expert knowledge, serving as a reference for medical examination and diagnosis, (3) a doctor response offering the diagnostic conclusion and suggested examination guidance, (4) a diagnosed clinical disease as the recommended diagnostic outcome, and (5) clinical advice providing recommendations for medical examination. RJUA-QA is the first medical QA dataset for clinical reasoning over patient inquiries, where expert-level knowledge and experience are required to yield diagnostic conclusions and medical examination advice. We conduct a comprehensive evaluation of both medical-specific and general LLMs on the RJUA-QA dataset. Our data is publicly available at \url{https://github.com/alipay/RJU_Ant_QA}.
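The five-field instance structure described in the abstract can be sketched as a simple record. This is only an illustrative assumption of the schema; the field names, content, and serialization format of the released dataset may differ.

```python
# Hypothetical sketch of one RJUA-QA data instance, following the five
# components described in the abstract. Field names are assumptions,
# not the dataset's actual schema.
from dataclasses import dataclass, asdict


@dataclass
class RJUAQAInstance:
    question: str  # (1) patient-style inquiry about symptoms and conditions
    context: str   # (2) expert knowledge used as a diagnostic reference
    answer: str    # (3) doctor response: conclusion + examination guidance
    disease: str   # (4) diagnosed clinical disease (recommended outcome)
    advice: str    # (5) recommended medical examinations

# Fabricated example content for illustration only.
example = RJUAQAInstance(
    question="I have had flank pain and blood in my urine for two days.",
    context="Flank pain accompanied by hematuria is commonly associated "
            "with urinary stones; ultrasound is a first-line examination.",
    answer="Your symptoms suggest a possible urinary stone; "
           "an ultrasound examination is advised.",
    disease="urolithiasis",
    advice="urinary ultrasound; urinalysis",
)

print(sorted(asdict(example).keys()))
```

A record type like this makes it easy to validate that every instance carries all five components before feeding it to a QA or evaluation pipeline.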
- Shiwei Lyu
- Chenfei Chi
- Hongbo Cai
- Lei Shi
- Xiaoyan Yang
- Lei Liu
- Xiang Chen
- Deng Zhao
- Zhiqiang Zhang
- Xianguo Lyu
- Ming Zhang
- Fangzhou Li
- Xiaowei Ma
- Yue Shen
- Jinjie Gu
- Wei Xue
- Yiran Huang