The Radiation Oncology NLP Database (2401.10995v1)

Published 19 Jan 2024 in cs.CL and physics.med-ph

Abstract: We present the Radiation Oncology NLP Database (ROND), the first dedicated NLP dataset for radiation oncology, an important medical specialty that has received limited attention from the NLP community. With the advent of AGI, there is an increasing need for specialized datasets and benchmarks to facilitate research and development. ROND is specifically designed to address this gap in the domain of radiation oncology, a field that offers many opportunities for NLP exploration. It encompasses various NLP tasks including Logic Reasoning, Text Classification, Named Entity Recognition (NER), Question Answering (QA), Text Summarization, and Patient-Clinician Conversations, each with a distinct focus on radiation oncology concepts and application cases. In addition, we have developed an instruction-tuning dataset consisting of over 20k instruction pairs (based on ROND) and trained an LLM, CancerChat. This serves to demonstrate the potential of instruction-tuning LLMs within a highly specialized medical domain. The evaluation results in this study could serve as baselines for future research. ROND aims to stimulate advancements in radiation oncology and clinical NLP by offering a platform for testing and improving algorithms and models in a domain-specific context. The ROND dataset is a joint effort of multiple U.S. health institutions. The data is available at https://github.com/zl-liu/Radiation-Oncology-NLP-Database.
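
The abstract doesn't spell out how the ROND files are organized or how the 20k instruction pairs were constructed, so the following is only a minimal sketch of how a QA-style ROND example might be converted into an instruction pair of the kind used to tune CancerChat. The file name and the question/answer field names are assumptions for illustration, not the published schema.

```python
import json
from pathlib import Path

# Assumed location of a cloned ROND repository; the real layout may differ.
ROND_DIR = Path("Radiation-Oncology-NLP-Database")

def load_examples(task_file: str) -> list[dict]:
    """Load one ROND task split from a JSON file (format assumed)."""
    with open(ROND_DIR / task_file, encoding="utf-8") as f:
        return json.load(f)

def to_instruction_pair(example: dict) -> dict:
    """Map a QA example onto the common instruction/input/output format.

    The 'question' and 'answer' keys are hypothetical stand-ins for
    whatever field names the released files actually use.
    """
    return {
        "instruction": "Answer the following radiation oncology question.",
        "input": example["question"],
        "output": example["answer"],
    }

if __name__ == "__main__":
    qa_examples = load_examples("qa.json")  # hypothetical file name
    pairs = [to_instruction_pair(ex) for ex in qa_examples]
    print(f"Built {len(pairs)} instruction pairs")
```

The instruction/input/output layout mirrors common instruction-tuning corpora (e.g., the Alpaca-style format); the pairs actually released with ROND may use a different structure.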

Authors (15)
  1. Zhengliang Liu (91 papers)
  2. Jason Holmes (19 papers)
  3. Wenxiong Liao (9 papers)
  4. Chenbin Liu (8 papers)
  5. Lian Zhang (32 papers)
  6. Hongying Feng (14 papers)
  7. Peilong Wang (16 papers)
  8. Muhammad Ali Elahi (1 paper)
  9. Hongmin Cai (18 papers)
  10. Lichao Sun (186 papers)
  11. Quanzheng Li (122 papers)
  12. Xiang Li (1002 papers)
  13. Tianming Liu (161 papers)
  14. Jiajian Shen (12 papers)
  15. Wei Liu (1135 papers)