Deciding When Legal AI Systems Are Good Enough for Use
Determine the task-specific conditions under which an AI system is sufficiently reliable for use in legal settings, based on auditing metrics such as accuracy, consistency, and groundedness, while recognizing that acceptable baselines may differ across tasks (e.g., court reporting versus hallucination detection).
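To make the idea of task-specific baselines concrete, the sketch below shows one way an acceptance check over audit scores might look in Python. The metric set, threshold values, and task names (`court_reporting`, `hallucination_detection`) are illustrative assumptions for this example only, not figures or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    """Aggregate scores from an audit run, each assumed to lie in [0, 1]."""
    accuracy: float
    consistency: float
    groundedness: float


# Hypothetical per-task baselines: acceptable minimums differ by task.
# For instance, court reporting might demand near-verbatim accuracy,
# while a hallucination-detection aid might prioritize groundedness.
TASK_THRESHOLDS = {
    "court_reporting": AuditResult(accuracy=0.99, consistency=0.95, groundedness=0.98),
    "hallucination_detection": AuditResult(accuracy=0.90, consistency=0.85, groundedness=0.95),
}


def is_good_enough(task: str, result: AuditResult) -> bool:
    """Return True only if every audited metric meets the task's baseline."""
    baseline = TASK_THRESHOLDS[task]
    return (
        result.accuracy >= baseline.accuracy
        and result.consistency >= baseline.consistency
        and result.groundedness >= baseline.groundedness
    )


if __name__ == "__main__":
    audit = AuditResult(accuracy=0.97, consistency=0.96, groundedness=0.99)
    print(is_good_enough("court_reporting", audit))          # False: accuracy below baseline
    print(is_good_enough("hallucination_detection", audit))  # True: all metrics meet baseline
```

The key design point is that the thresholds are indexed by task rather than fixed globally, reflecting the section's claim that what counts as "good enough" depends on the task being audited.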
References
Conducting audits of AI systems — ideally across a suite of metrics that may address accuracy, consistency, and groundedness as is relevant to the task at hand — allows practitioners to better understand the limitations of the AI system for that task. An open question may remain: when is a system good enough to use?
— Tasks and Roles in Legal AI: Data Curation, Annotation, and Verification
(arXiv:2504.01349, Koenecke et al., 2 Apr 2025) in Challenge 3: Output Verification, paragraph beginning “Conducting audits of AI systems”