Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models (2403.03744v5)

Published 6 Mar 2024 in cs.AI

Abstract: As LLMs develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights. However, there is little to no understanding of the notion of medical safety in the context of LLMs, let alone how to evaluate and improve it. To address this gap, we first define the notion of medical safety in LLMs based on the Principles of Medical Ethics set forth by the American Medical Association. We then leverage this understanding to introduce MedSafetyBench, the first benchmark dataset designed to measure the medical safety of LLMs. We demonstrate the utility of MedSafetyBench by using it to evaluate and improve the medical safety of LLMs. Our results show that publicly-available medical LLMs do not meet standards of medical safety and that fine-tuning them using MedSafetyBench improves their medical safety while preserving their medical performance. By introducing this new benchmark dataset, our work enables a systematic study of the state of medical safety in LLMs and motivates future work in this area, paving the way to mitigate the safety risks of LLMs in medicine. The benchmark dataset and code are available at https://github.com/AI4LIFE-GROUP/med-safety-bench.

Towards Safe LLMs for Medicine: A Detailed Expert Overview

In the expanding domain of artificial intelligence, LLMs have demonstrated remarkable capabilities across a multitude of applications. However, their deployment in specialized fields such as medicine demands a thorough evaluation of their safety and alignment with ethical and professional standards. The paper "Towards Safe LLMs for Medicine" spearheads critical exploration into the safety concerns unique to medical LLMs and presents methodologies to enhance their reliability in clinical contexts.

The paper identifies an urgent need to assess and mitigate risks posed by medical LLMs—a subset of LLMs trained on vast corpora of medical data. Unlike generic LLMs, medical LLMs have the potential to engage in activities that may gravely impact individual health outcomes, privacy, and broader public health systems. The research manuscript systematically illuminates current deficiencies within existing medical LLMs concerning compliance with harmful and ethically inappropriate requests, thereby positioning its research as a crucial step toward safer AI in healthcare.

Key Findings and Methods

The authors divide their investigation into three primary sections: defining medical safety, evaluating current LLMs, and implementing techniques to improve these models' safety profiles.

  1. Defining Medical Safety: The paper asserts a foundation based on the American Medical Association's Principles of Medical Ethics. By aligning LLM output to these ethical benchmarks, the paper proposes a framework to assess whether the technology honors patient rights, confidentiality, and contributes to public health.
  2. Evaluating Medical LLMs: Employing harmfulness scores, the research quantifies the tendency of LLMs to comply with harmful requests. Findings reveal that existing medical LLMs such as Medalpaca, Meditron, and others, lack robust safety alignment and often respond to harmful prompts in ways that may lead to ethical violations and potential patient harm. While state-of-the-art, safety-aligned models like GPT-4 set benchmarks, medical models fall short of these standards.
  3. Improving Safety via Fine-Tuning: Demonstrating a substantial reduction in harmfulness post fine-tuning, the research highlights fine-tuning with safety demonstrations as a competent method to elevate model safety without degrading medical task performance. The strategy encompasses the use of both general and medical safety datasets, showcasing that improved safety does not inherently compromise medical efficacy.

Implications for Future Research and Practice

The implications of these findings are profoundly intertwined with both practical medical practice and theoretical advancements in AI ethics. For practitioners and developers, this research provides a roadmap suggesting that responsible LLM deployment in sensitive environments like healthcare can be viable through structured safety protocols and iterative refinement of ethical guidelines.

Theoretically, the paper opens avenues for expansion beyond the initial evaluation and fine-tuning methods. Future efforts may explore the integration of reinforcement learning methodologies, such as human feedback loops, thereby embracing a more comprehensive approach to model training. Moreover, the adoption of multi-dimensional safety assessments, which consider domain-specific ethical nuances, can further hone AI alignment with complex moral landscapes inherent in diverse medical practices.

Conclusion

In summary, "Towards Safe LLMs for Medicine" is a seminal contribution to the discourse on AI safety, particularly within the medical field. By highlighting current LLM deficiencies and presenting viable paths to strengthen their safety, this research sets a precedent for the development of aligned, ethically sound AI models. The paper serves as both a cautionary tale and a guide for developers to integrate safety considerations into the foundation of LLM creation, use, and evolution, aligning with global calls for responsible AI development.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (23)
  1. Large language models encode clinical knowledge. Nature, 620(7972):172–180, 2023.
  2. Can generalist foundation models outcompete special-purpose tuning? case study in medicine. arXiv preprint arXiv:2311.16452, 2023.
  3. Alignment of language agents. arXiv preprint arXiv:2103.14659, 2021.
  4. Bletchley Declaration. Bletchley Declaration. https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration/the-bletchley-declaration-by-countries-attending-the-ai-safety-summit-1-2-november-2023, 2023. [Online; accessed 05-March-2024].
  5. To do no harm—and the most good—with ai in health care. Nature Medicine, pages 1–4, 2024a.
  6. To do no harm—and the most good—with ai in health care. New England Journal of Medicine AI, 2024b.
  7. American Medical Association. Code of Medical Ethics. https://code-medical-ethics.ama-assn.org/, 2001a. [Online; accessed 05-March-2024].
  8. American Medical Association. Principles of Medical Ethics. https://code-medical-ethics.ama-assn.org/principles, 2001b. [Online; accessed 05-March-2024].
  9. American Medical Association. History of the Code. https://www.ama-assn.org/sites/ama-assn.org/files/corp/media-browser/public/ethics/ama-code-ethics-history.pdf, 2017. [Online; accessed 05-March-2024].
  10. Fine-tuning aligned language models compromises safety, even when users do not intend to! In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=hTEGyKf0dZ.
  11. Red-teaming large language models using chain of utterances for safety-alignment, 2023.
  12. Safetybench: Evaluating the safety of large language models with multiple choice questions, 2023.
  13. Gpt-4 technical report, 2024.
  14. Llama 2: Open foundation and fine-tuned chat models, 2023a.
  15. Universal and transferable adversarial attacks on aligned language models, 2023.
  16. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023b.
  17. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  18. Medalpaca–an open-source collection of medical conversational ai models and training data. arXiv preprint arXiv:2304.08247, 2023.
  19. Meditron-70b: Scaling medical pretraining for large language models. arXiv preprint arXiv:2311.16079, 2023.
  20. Clinical camel: An open-source expert-level medical language model with dialogue-based knowledge encoding. arXiv preprint arXiv:2305.12031, 2023.
  21. Med42 - A Clinical Large Language Model. https://huggingface.co/m42-health/med42-70b, 2023. [Online; accessed 05-March-2024].
  22. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
  23. Meta. Llama 2 Acceptable Use Policy. https://ai.meta.com/llama/use-policy/, 2023. [Online; accessed December 2023].
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Tessa Han (7 papers)
  2. Aounon Kumar (16 papers)
  3. Chirag Agarwal (39 papers)
  4. Himabindu Lakkaraju (88 papers)
Citations (2)