The Radiation Oncology NLP Database (2401.10995v1)

Published 19 Jan 2024 in cs.CL and physics.med-ph

Abstract: We present the Radiation Oncology NLP Database (ROND), the first dedicated NLP dataset for radiation oncology, an important medical specialty that has received limited attention from the NLP community. With the advent of AGI, there is an increasing need for specialized datasets and benchmarks to facilitate research and development. ROND is specifically designed to address this gap in the domain of radiation oncology, a field that offers many opportunities for NLP exploration. It encompasses various NLP tasks including Logic Reasoning, Text Classification, Named Entity Recognition (NER), Question Answering (QA), Text Summarization, and Patient-Clinician Conversations, each with a distinct focus on radiation oncology concepts and application cases. In addition, we have developed an instruction-tuning dataset consisting of over 20k instruction pairs (based on ROND) and trained an LLM, CancerChat. This serves to demonstrate the potential of instruction-tuning LLMs within a highly specialized medical domain. The evaluation results in this study could serve as baselines for future research. ROND aims to stimulate advancements in radiation oncology and clinical NLP by offering a platform for testing and improving algorithms and models in a domain-specific context. The ROND dataset is a joint effort of multiple U.S. health institutions. The data is available at https://github.com/zl-liu/Radiation-Oncology-NLP-Database.
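
The abstract doesn't spell out how the ROND files are organized or how the 20k instruction pairs were constructed, so the following is only a minimal sketch of how a QA-style ROND example might be converted into an instruction pair of the kind used to tune CancerChat. The file name and the question/answer field names are assumptions for illustration, not the published schema.

```python
import json
from pathlib import Path

# Assumed location of a cloned ROND repository; the real layout may differ.
ROND_DIR = Path("Radiation-Oncology-NLP-Database")

def load_examples(task_file: str) -> list[dict]:
    """Load one ROND task split from a JSON file (format assumed)."""
    with open(ROND_DIR / task_file, encoding="utf-8") as f:
        return json.load(f)

def to_instruction_pair(example: dict) -> dict:
    """Map a QA example onto the common instruction/input/output format.

    The 'question' and 'answer' keys are hypothetical stand-ins for
    whatever field names the released files actually use.
    """
    return {
        "instruction": "Answer the following radiation oncology question.",
        "input": example["question"],
        "output": example["answer"],
    }

if __name__ == "__main__":
    qa_examples = load_examples("qa.json")  # hypothetical file name
    pairs = [to_instruction_pair(ex) for ex in qa_examples]
    print(f"Built {len(pairs)} instruction pairs")
```

The instruction/input/output layout mirrors common instruction-tuning corpora (e.g., the Alpaca-style format); the pairs actually released with ROND may use a different structure.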

Authors (15)
  1. Zhengliang Liu (91 papers)
  2. Jason Holmes (19 papers)
  3. Wenxiong Liao (9 papers)
  4. Chenbin Liu (8 papers)
  5. Lian Zhang (32 papers)
  6. Hongying Feng (14 papers)
  7. Peilong Wang (16 papers)
  8. Muhammad Ali Elahi (1 paper)
  9. Hongmin Cai (18 papers)
  10. Lichao Sun (186 papers)
  11. Quanzheng Li (122 papers)
  12. Xiang Li (1002 papers)
  13. Tianming Liu (161 papers)
  14. Jiajian Shen (12 papers)
  15. Wei Liu (1135 papers)