Guidelines for public release of red teaming findings and exploits
Determine whether and how researchers conducting evaluation and red teaming of deployed generative AI systems should publicly release their findings, methods, and discovered exploits, and specify appropriate protocols, timing, and scope of disclosure so that sharing is neither so broad nor so limited that it harms the community.
References
It is unclear whether and how researchers should publicly release their findings, methods or the exploits themselves.
— A Safe Harbor for AI Evaluation and Red Teaming
(arXiv:2403.04893, Longpre et al., 7 Mar 2024), Table “Themes and observations,” row “Chilling Effect on Vulnerability Disclosure,” Section 3 (Challenges to Independent AI Evaluation)