A Clinical Trial Design Approach to Auditing Language Models in Healthcare Setting (2411.16702v2)
Published 11 Nov 2024 in cs.CY and cs.LG
Abstract: We present an audit mechanism for LLMs, with a focus on models deployed in the healthcare setting. The mechanism takes inspiration from clinical trial design: we frame the LLM audit as a single-blind equivalence trial in which subject matter experts serve as the comparator. We show that this framing supports principled sample size and power calculations, so that an audit needs to sample only the minimum number of records required to maintain its integrity and statistical soundness. Finally, we provide a real-world example of the audit used in a production environment in a large-scale public health network.
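
To make the sample size claim concrete, the following is a minimal sketch of a textbook sample size calculation for an equivalence trial comparing two proportions. It is not taken from the paper, whose exact formulas the abstract does not state. It assumes two independent groups of equal size, a normal approximation, a true difference of zero, and a two one-sided tests (TOST) procedure. The function name `equivalence_sample_size` and the parameters `p_expected` (anticipated accuracy of both the LLM and the experts) and `margin` (the equivalence margin) are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist


def equivalence_sample_size(p_expected, margin, alpha=0.05, power=0.8):
    """Approximate records needed per arm for a two-proportion equivalence test.

    Assumptions (illustrative, not from the paper): independent arms of equal
    size, both arms share the same true proportion p_expected, and the
    normal approximation holds.
    """
    if not 0 < p_expected < 1:
        raise ValueError("p_expected must lie strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    if not 0 < alpha < 1 or not 0 < power < 1:
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # Each of the two one-sided tests is run at level alpha.
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # With a true difference of zero, the type II error is split across both tests.
    beta = 1 - power
    z_beta = std_normal.inv_cdf(1 - beta / 2)

    # Variance of the difference between two independent proportions (per record).
    variance = 2 * p_expected * (1 - p_expected)
    n = (z_alpha + z_beta) ** 2 * variance / margin ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: 90% expected accuracy, 5 percentage point margin.
    print(equivalence_sample_size(p_expected=0.9, margin=0.05))
```

Under these assumptions, a tighter equivalence margin or a higher target power increases the number of records to review. A paired design, in which the same records are assessed by both the LLM and the experts, would call for a different variance term.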