Design reporting structures for AI‑specific vulnerabilities
Develop and specify appropriate vulnerability reporting structures tailored to AI‑specific model vulnerabilities (such as jailbreaks, prompt injection, and model data leakage), including safe‑harbor and disclosure policies that account for the difficulty of patching such vulnerabilities and the offense–defense balance of relevant knowledge.
References
For AI-specific vulnerabilities (e.g. a model leaking sensitive training data in its outputs), it is less clear what appropriate reporting structures would look like. This is because it is often unclear how to patch such vulnerabilities and what the offense-defense balance of relevant knowledge is.
— From Principles to Rules: A Regulatory Approach for Frontier AI
(2407.07300 - Schuett, 10 Jul 2024) in Section IV.A.5 (Reporting structure for vulnerabilities)