Required rigor for high-stakes AI evaluations
Determine the appropriate level of rigor for evaluating decision-making AI systems in high-stakes settings, relative to other applications, so that the confidence an evaluation provides matches the potential consequences of deployment.
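As a rough illustration of why higher stakes call for more evaluation effort, consider one narrow slice of the problem: if an evaluation consists of independent pass/fail trials and observes zero failures, how many trials are needed to rule out a given failure rate at a given confidence? The sketch below uses the standard binomial zero-failure bound, (1 - p)^n <= 1 - c. The function name, the independence assumption, and the example thresholds are all assumptions made here for illustration; they do not come from the cited paper, and the calculation does not resolve how to choose the tolerable failure rate or the confidence level for a use case, which is the open question.

```python
import math


def min_trials_zero_failures(max_failure_rate: float, confidence: float) -> int:
    """Smallest n such that n failure-free independent trials rule out a
    true failure rate >= max_failure_rate at the given confidence level.

    Derived from (1 - p)^n <= 1 - c, i.e. n >= ln(1 - c) / ln(1 - p).
    Hypothetical helper for illustration only.
    """
    if not (0.0 < max_failure_rate < 1.0):
        raise ValueError("max_failure_rate must be strictly between 0 and 1")
    if not (0.0 < confidence < 1.0):
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_rate))


if __name__ == "__main__":
    # Illustrative (made-up) settings: a lower-stakes and a higher-stakes use.
    settings = {
        "lower-stakes (assumed)": (0.01, 0.95),
        "higher-stakes (assumed)": (0.001, 0.99),
    }
    for label, (rate, conf) in settings.items():
        n = min_trials_zero_failures(rate, conf)
        print(f"{label}: tolerate <{rate:.1%} failures at {conf:.0%} confidence -> {n} trials")
```

Tightening either the tolerable failure rate or the confidence level increases the required number of trials, which is one concrete sense in which required rigor scales with consequences. Real evaluations rarely satisfy the independence assumption, so this should be read as a lower bound on effort under idealized conditions.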
References
In addition, evaluations for decision-making systems in high-stakes settings will likely demand a higher level of confidence than other applications, but it is unclear how to determine the required level of rigor based on use case.
— Open Problems in Technical AI Governance
(2407.14981 - Reuel et al., 20 Jul 2024) in Section 3.3.1 Reliable Evaluations