Design reporting structures for AI‑specific vulnerabilities

Develop and specify appropriate vulnerability reporting structures tailored to AI‑specific model vulnerabilities (such as jailbreaks, prompt injection, and model data leakage), including safe‑harbor and disclosure policies that account for the difficulty of patching such vulnerabilities and the offense–defense balance of relevant knowledge.

Background

The authors distinguish traditional software vulnerabilities from AI‑specific ones (e.g., jailbreaks, prompt injection, and data leakage). While established bug‑bounty norms can guide software vulnerability reporting, analogous structures for AI‑specific issues are not yet well‑defined.

They note that it is often unclear how to patch model‑level vulnerabilities, and that disclosure can itself cause harm when the disclosed knowledge helps attackers more than defenders (the offense–defense balance). Both points underscore the need for carefully designed reporting frameworks that enable responsible discovery and timely mitigation.

References

For AI-specific vulnerabilities (e.g. a model leaking sensitive training data in its outputs), it is less clear what appropriate reporting structures would look like. This is because it is often unclear how to patch such vulnerabilities and what the offense-defense balance of relevant knowledge is.

— From Principles to Rules: A Regulatory Approach for Frontier AI (2407.07300 - Schuett, 10 Jul 2024) in Section IV.A.5 (Reporting structure for vulnerabilities)
