MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
Abstract: As LLMs develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights. However, there is little to no understanding of the notion of medical safety in the context of LLMs, let alone how to evaluate and improve it. To address this gap, we first define the notion of medical safety in LLMs based on the Principles of Medical Ethics set forth by the American Medical Association. We then leverage this understanding to introduce MedSafetyBench, the first benchmark dataset designed to measure the medical safety of LLMs. We demonstrate the utility of MedSafetyBench by using it to evaluate and improve the medical safety of LLMs. Our results show that publicly-available medical LLMs do not meet standards of medical safety and that fine-tuning them using MedSafetyBench improves their medical safety while preserving their medical performance. By introducing this new benchmark dataset, our work enables a systematic study of the state of medical safety in LLMs and motivates future work in this area, paving the way to mitigate the safety risks of LLMs in medicine. The benchmark dataset and code are available at https://github.com/AI4LIFE-GROUP/med-safety-bench.