From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT (2405.11040v1)

Published 17 May 2024 in cs.CL and physics.med-ph

Abstract: LLMs have achieved remarkable progress, yet their application in specialized fields, such as medical physics, remains challenging due to the need for domain-specific knowledge. This study introduces ARCoT (Adaptable Retrieval-based Chain of Thought), a framework designed to enhance the domain-specific accuracy of LLMs without requiring fine-tuning or extensive retraining. ARCoT integrates a retrieval mechanism to access relevant domain-specific information and employs step-back and chain-of-thought prompting techniques to guide the LLM's reasoning process, ensuring more accurate and context-aware responses. On a medical physics multiple-choice exam benchmark, our model outperformed standard LLMs and the reported average human performance, with improvements of up to 68% and a top score of 90%. This method reduces hallucinations and increases domain-specific performance. The versatility and model-agnostic nature of ARCoT make it easily adaptable to various domains, showcasing its significant potential for enhancing the accuracy and reliability of LLMs in specialized fields.
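
The abstract describes ARCoT as a combination of three pieces: a retrieval mechanism over domain material, a step-back prompt that first elicits the underlying principle, and a chain-of-thought prompt grounded in the retrieved context. The sketch below is a minimal, assumption-laden illustration of how such a pipeline could be wired together; it is not the authors' implementation. `call_llm` is a hypothetical stand-in for any chat-completion API, and `retrieve` uses toy keyword overlap in place of a real retriever or vector store.

```python
from typing import Callable, List


def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Rank corpus passages by naive keyword overlap with the query (toy retriever)."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(q_terms & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def arcot_answer(question: str, corpus: List[str], call_llm: Callable[[str], str]) -> str:
    """Illustrative ARCoT-style pipeline: step-back prompt, retrieval, then CoT answer."""
    # Step 1 (step-back prompting): ask for the general principle behind the question.
    principle = call_llm(
        "What general medical-physics principle underlies this question?\n" + question
    )
    # Step 2 (retrieval): pull domain passages relevant to the question and the principle.
    context = "\n".join(retrieve(question + " " + principle, corpus))
    # Step 3 (chain-of-thought): reason step by step over the retrieved context only.
    return call_llm(
        "Use only the context below. Reason step by step, then give the final answer.\n"
        "Context:\n" + context + "\n\nQuestion: " + question
    )
```

In this sketch, a caller would supply a passage list built from domain references and wrap their LLM client of choice as `call_llm`; nothing depends on a particular model, which mirrors the model-agnostic framing in the abstract.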

Authors (2)
  1. Jace Grandinetti
  2. Rafe McBeth